Bayesian Data Analysis - Part I (Beta-Binomial Models)
Even though Einstein accepted all the empirical outcomes of quantum physics and nominated several physicists in the field for the Nobel Prize, he was not convinced by the attempts to explain them. An avid believer in Spinoza's deterministic Nature-God, he said, "God does not play dice." Bohr's response to this famous statement was, "Do not tell God what to do." A vast philosophical discussion around empiricism and the critique of metaphysics is still ongoing. What resembles this discussion is the essential difference in how the population parameter, typically named θ (theta) in statistics, is understood. The classical approach holds that there is a true unknown value of θ, and that statistical methods bring us closer to it. The Bayesian approach, however, only assumes that there is a distribution over the population parameter θ; with new information, we update our beliefs about this distribution. In this sense, the Bayesian approach is less susceptible to metaphysical assumptions (the existence of a fixed unknown parameter θ), though it may seem the opposite when we make assumptions about the prior distribution. Even in physics, we seem to keep moving away from the comforting presupposition of certainty, and it is eye-opening to read on this topic. There are various better sources for the theoretical and philosophical fundamentals of the topic; in this work, I will share some of my notes on practical examples.
Assume that there is a student representative election coming up at a university, that Adam is one of the candidates, and that we want to estimate his percentage of support. We know that in the cycling club presidential election, Adam got 45% of the votes, and we believe this information is relevant to the student representative election. However, that election was some time ago, so we ask 20 random students in the cafeteria, and 12 of them say they will support Adam in the representative election. What would the Bayesian estimate of Adam's election result be?
The information we have in this example is not precise, and some beliefs are involved, which is often the case in the real world. Our observation is best described by a binomial distribution. We can take the 45% club presidential election result as a prior and express it with a beta distribution, since the beta is conjugate to the binomial. This also makes the math straightforward when multiplying the prior and observation distributions. This is nicely explained in Hoff's book, referenced below.
I am assuming that we do not know the number of participants in the club presidential election, so I am choosing the prior distribution conservatively. Let's take a Beta(2.7, 3.3) distribution (Beta(a, b)) as the prior for the parameter.
Prior: θ ~ Beta(2.7,3.3)
This distribution has the expected value E[θ] = .45. We could have chosen any Beta(a, b) with a mean of .45, but this one has a relatively high variance, so it takes more uncertainty into account. The mean is .45, but we obtained this figure from a cycling club presidential election; Adam might be more popular among students who are into cycling, and I do not want this prior information to be too dominant in our analysis.
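To make the variance comparison concrete, the closed-form beta moments can be computed directly (a small helper of my own, not from the original analysis):

```r
# Mean and variance of a Beta(a, b) distribution:
# E[theta] = a / (a + b),  Var(theta) = a*b / ((a + b)^2 * (a + b + 1))
beta_moments <- function(a, b) {
  c(mean = a / (a + b),
    var  = (a * b) / ((a + b)^2 * (a + b + 1)))
}
beta_moments(2.7, 3.3)   # mean 0.45, variance ~0.0354 (most uncertain)
beta_moments(4.5, 5.5)   # mean 0.45, variance ~0.0225
beta_moments(9, 11)      # mean 0.45, variance ~0.0118 (most confident)
```

All three candidate priors share the mean of .45; only their concentration differs.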
ix <- seq(0,1,0.01)
plot(ix,dbeta(ix,9,11),t="l",col="blue", lwd=2, xlab="", ylab="Density")
lines(ix,dbeta(ix,4.5,5.5),col="orange", lwd=2)
lines(ix,dbeta(ix,2.7,3.3),col="red", lwd=2)
legend("topright",lwd = 1.5,col = c("blue","orange","red"),c("Beta(9,11)","Beta(4.5,5.5)", "Beta(2.7,3.3)"))
title(main="Beta Distributions with E[θ]=.45")
The dataset (observation) we obtained in the cafeteria can be modeled with a binomial distribution (Binomial(n, θ)), with n = 20 trials and y = 12 observed successes (a sample proportion of .6).
Observation: Y|θ ~ Binomial(20, θ), plotted below at the sample proportion θ = .6
success <- 0:20
plot(success, dbinom(success, size=20, prob=.6), type="h", lwd=5,
main="Binomial Distribution of Y|θ", ylab="Probability", xlab="Success")
Now, how do we use this prior and observation to reach a posterior distribution? Here is Bayes' rule for the posterior distribution:
p(θ|y) = p(y|θ) p(θ) / p(y),  where  p(y) = ∫Θ p(y|θ) p(θ) dθ
The denominator p(y) is calculated by integrating over every possible value of θ (Θ in the integral stands for the parameter space), which is often difficult to compute. However, the denominator is only a normalizing constant, and the posterior is proportional to the numerator. Therefore, we do not need to find this constant.
In this multiplication, the data enter only through their sufficient statistics. Combining this information rests on some proofs and concepts, including conjugacy and exchangeability, which I will not detail here; they are nicely explained in Hoff's book, Chapter 3.
Combining the prior and the observation gives us a new beta distribution in this example. To be precise, the new beta distribution is:
Posterior: θ|Y ~ Beta(a + y, b + n - y)
Posterior: θ|Y ~ Beta(2.7 + 12, 3.3 + 20 - 12) = Beta(14.7, 11.3)
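The conjugate update can be wrapped in a small helper to verify the arithmetic (`update_beta` is my own sketch, not from the original analysis):

```r
# Conjugate Beta-Binomial update: a prior Beta(a, b) combined with
# y successes in n trials gives the posterior Beta(a + y, b + n - y).
update_beta <- function(a, b, y, n) {
  c(a = a + y, b = b + n - y)
}
update_beta(2.7, 3.3, y = 12, n = 20)   # a = 14.7, b = 11.3
```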
ix <- seq(0,1,0.01)
plot(ix,dbeta(ix,14.7,11.3),lty=1, t="l", lwd=2, ylab="Density", xlab="")
lines(ix,dbeta(ix,2.7,3.3),lty=2, lwd=2)
legend("topright",lwd = 2, lty = c(2,1),c("Prior","Posterior"))
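To see concretely that the normalizing constant was never needed, a quick grid check (my own addition, not part of the original analysis) multiplies the prior and the binomial likelihood on a grid of θ values, normalizes numerically, and recovers the closed-form Beta(14.7, 11.3) posterior:

```r
# Grid approximation of the posterior: evaluate prior * likelihood on a
# grid of theta values and normalize numerically, so the area is 1.
theta <- seq(0.001, 0.999, by = 0.001)
unnorm <- dbeta(theta, 2.7, 3.3) * dbinom(12, size = 20, prob = theta)
grid_post <- unnorm / sum(unnorm * 0.001)
# Compare against the closed-form conjugate result Beta(14.7, 11.3)
exact_post <- dbeta(theta, 14.7, 11.3)
max(abs(grid_post - exact_post))   # small numerical error
```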
We can also draw the likelihood function of the observation on the same plot, which gives some sense of the role the observation plays. It is worth noting that when a likelihood function is drawn, the y-axis is not a density; since the likelihood is not a probability function, the area under the curve does not integrate to 1. The y-axis is therefore labeled Density/Likelihood.
# Encode the observation directly: 12 supporters out of the 20 students asked
data <- c(rep(1, 12), rep(0, 8))
likelihood_bern <- function(obs, theta){
  # Bernoulli likelihood for N observations with x successes
  N <- length(obs)
  x <- sum(obs)
  likelihood <- (theta^x) * ((1 - theta)^(N - x))
  return(likelihood)
}
likelihood_s <- likelihood_bern(obs = data, theta = ix)
plot(ix,dbeta(ix,14.7,11.3),lty=1, t="l", lwd=2, xlab="Thetas", yaxt = 'n', ylab="Density/Likelihood")
lines(ix,dbeta(ix,2.7,3.3),lty=2, lwd=2)
par(new = TRUE)
plot(ix, likelihood_s, col='orange', lwd=2, type='l', yaxt = 'n', ylab="", xlab="")
legend("topright", lwd = 2, lty = c(2,1), c("Prior","Posterior"))
legend("topleft", lwd = 2, col= c('orange'), c("Likelihood"))
The posterior distribution we have reached has these statistics:
E[θ|Y] = a / (a + b) = 14.7 / (14.7 + 11.3) = 0.5654
mode(θ|Y) = (a - 1) / ((a - 1) + (b - 1)) = 13.7 / 24 = 0.5708
Var(θ|Y) = (a * b) / ((a + b)² * (a + b + 1)) = 0.0091
sd(θ|Y) = 0.0954
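These summaries are easy to verify in R (a small sketch; the credible interval via `qbeta` is an extra I am adding, not from the original analysis):

```r
# Summaries of the posterior Beta(14.7, 11.3)
a <- 14.7; b <- 11.3
post_mean <- a / (a + b)                        # 0.5654
post_mode <- (a - 1) / (a + b - 2)              # 0.5708
post_var  <- a * b / ((a + b)^2 * (a + b + 1))  # 0.0091
post_sd   <- sqrt(post_var)                     # 0.0954
# A 95% credible interval comes directly from the beta quantiles:
qbeta(c(0.025, 0.975), a, b)
```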
Let’s see how different priors change the posterior distribution:
par(mfrow=c(2,2))
ix <- seq(0,1,0.01)
plot(ix,dbeta(ix,9,11),t="l",col="#397CA5", lwd=2, xlab="", ylab="Density")
lines(ix,dbeta(ix,4.5,5.5),col="orange", lwd=2)
lines(ix,dbeta(ix,2.7,3.3),col="#F84D65", lwd=2)
legend("topright",lwd = 1.5,col = c("#397CA5","orange","#F84D65"),c("Beta(9,11)","Beta(4.5,5.5)", "Beta(2.7,3.3)"),cex=0.5)
title(main="Beta Distributions with E[θ]=.45")
ix <- seq(0,1,0.01)
plot(ix,dbeta(ix,14.7,11.3),lty=1, t="l", lwd=2, yaxt = 'n', ylab="Density/Likelihood", xlab="")
lines(ix,dbeta(ix,2.7,3.3),lty=2, lwd=2)
par(new = TRUE)
plot(ix, likelihood_s, col='orange', lwd=2, type='l', yaxt = 'n', ylab="", xlab="")
legend("topleft", lwd = 2, col= c('orange'), c("Likelihood"), cex=0.5)
legend("topright",lwd = 2, lty = c(2,1),c("Prior","Posterior"), cex=0.5)
title(main="Using Prior beta(2.7,3.3)")
ix <- seq(0,1,0.01)
plot(ix,dbeta(ix,16.5,13.5),lty=1, t="l", lwd=2, yaxt = 'n', ylab="Density/Likelihood", xlab="")
lines(ix,dbeta(ix,4.5,5.5),lty=2, lwd=2)
par(new = TRUE)
plot(ix, likelihood_s, col='orange', lwd=2, type='l', yaxt = 'n', ylab="", xlab="")
legend("topleft", lwd = 2, col= c('orange'), c("Likelihood"), cex=0.5)
legend("topright",lwd = 2, lty = c(2,1),c("Prior","Posterior"), cex=0.5)
title(main="Using Prior beta(4.5,5.5)")
ix <- seq(0,1,0.01)
plot(ix,dbeta(ix,21,19),lty=1, t="l", lwd=2, yaxt = 'n', ylab="Density/Likelihood", xlab="")
lines(ix,dbeta(ix,9,11),lty=2, lwd=2)
par(new = TRUE)
plot(ix, likelihood_s, col='orange', lwd=2, type='l', yaxt = 'n', ylab="", xlab="")
legend("topleft", lwd = 2, col= c('orange'), c("Likelihood"), cex=0.5)
legend("topright",lwd = 2, lty = c(2,1),c("Prior","Posterior"), cex=0.5)
title(main="Using Prior beta(9,11)")
See how a more concentrated (steeper) prior would dominate the outcome:
ix <- seq(0,1,0.01)
plot(ix,dbeta(ix,45+12,55+20-12),lty=1, t="l", lwd=2, ylab="Density", xlab="")
lines(ix,dbeta(ix,45,55),lty=2, lwd=2)
legend("topright",lwd = 2, lty = c(2,1),c("Prior","Posterior"))
title(main="Using Prior beta(45,55)")
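The dominance of a sharp prior has a simple arithmetic explanation: the posterior mean is a weighted average of the prior mean a/(a+b) and the sample proportion y/n, where the prior carries a weight equivalent to a + b pseudo-observations. A small sketch (`post_mean_weighted` is my own helper, not from the original analysis):

```r
# Posterior mean as a weighted average of prior mean and sample proportion:
# E[theta | y] = w * a/(a+b) + (1 - w) * y/n,  with w = (a+b)/(a+b+n).
# This equals (a + y)/(a + b + n), the mean of the conjugate posterior.
post_mean_weighted <- function(a, b, y, n) {
  w <- (a + b) / (a + b + n)
  w * a / (a + b) + (1 - w) * y / n
}
post_mean_weighted(2.7, 3.3, 12, 20)  # 0.5654: data (n = 20) outweigh a weak prior
post_mean_weighted(45, 55, 12, 20)    # 0.4750: a strong prior pulls the estimate back
```

With Beta(45, 55), the prior carries the weight of 100 pseudo-observations against only 20 real ones, which is exactly why the posterior barely moves from .45.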
This was a quick, basic example using the beta and binomial distributions, and I am aiming to gather more examples on this platform. Thanks for reading.
- Peter D. Hoff (2009), A First Course in Bayesian Statistical Methods, Springer Texts in Statistics, Springer.