Bayesian Data Analysis - Part II (Gamma-Poisson Models)

Fatih Boyar
5 min read · Dec 13, 2022

--

The Poisson distribution is another distribution frequently seen in nature: it models counts of events occurring over a continuous interval of time or space, e.g. the number of customers walking into a store during a period of time, or plane landings in a day. It is remarkably convenient because it has a single parameter, which equals both the expected value and the variance. It is almost too good to be true!
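A quick simulation illustrates the mean-equals-variance property; the rate 1.3 used here is just an illustrative choice:

```r
# Draw a large Poisson sample and check that its mean and variance agree.
set.seed(1)
x <- rpois(100000, lambda = 1.3)  # 1.3 is an arbitrary rate for illustration
cat("mean:", mean(x), " variance:", var(x), "\n")
# both estimates should be close to 1.3
```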

Although more than one distribution could serve as a prior, the gamma distribution is conjugate to the Poisson likelihood, so it is the one used in most cases.

An uncertain positive quantity θ has a gamma(a, b) distribution if its density is:

p(θ) = (b^a / Γ(a)) θ^(a−1) e^(−bθ), for θ > 0, a > 0, b > 0 (the gamma PDF)

There are two common parameterizations of the gamma distribution: shape and scale, or shape and rate. The scale indicates the average time between events, while the rate indicates the number of events per unit of time; scale is the reciprocal of rate (think b and 1/b, respectively). The PDF above uses shape and rate, as do the gamma distribution functions in R. In this notation a is the shape and b is the rate, so the expected value and variance of the gamma distribution are:

E[θ] = a/b, Var[θ] = a/b²
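The gamma mean a/b and variance a/b² can be sanity-checked against sample moments of simulated draws; the parameter values below are arbitrary:

```r
# Compare sample mean/variance of gamma draws with the analytic a/b and a/b^2.
set.seed(1)
a <- 5; b <- 5  # arbitrary shape and rate, chosen only for illustration
draws <- rgamma(100000, shape = a, rate = b)
cat("sample mean:", mean(draws), " analytic mean:", a / b, "\n")
cat("sample var: ", var(draws),  " analytic var: ", a / b^2, "\n")
```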

Let’s see the process in an example. Patients arriving at the burn unit of Hospital A were counted over a 20-day interval:

{2,2,4,0,0,1,1,1,1,0,2,2,0,1,5,0,1,0,3,0}

We can safely model this variable as Poisson-distributed, with the sample mean of 1.3 as the maximum likelihood estimate of its expected value.

y1 <- c(2,2,4,0,0,1,1,1,1,0,2,2,0,1,5,0,1,0,3,0)
n1 <- length(y1)
sumy1 <- sum(y1)
cat('MLE (mean):', sumy1/n1)

# output >> MLE (mean): 1.3
ix <- seq(0, 6, 1)
plot(ix, dpois(ix, 1.3), col="#527f94", lwd=5, type='h', xlab="# of Patients on Average", ylab="Probability")
title(main="Poisson Distribution of Y|θ")

Assume we have prior information from the past about the number of patients in the burn unit. For practice, we will try a few different priors.

Prior_1: θ ~ Gamma(1,1)

Prior_2: θ ~ Gamma(5,5)

Prior_3: θ ~ Gamma(58,35)

Prior_1 and Prior_2 have the same expected value (1), but the variance of Prior_2 is much lower (1 and 0.2, respectively). Prior_3 has an expected value of about 1.66 and a variance of about 0.05; it is included to show how dominant a strong prior can be in the analysis.

ix <- seq(0,3,0.01)
plot(ix,dgamma(ix,1,1),t="l",col="#527f94", lwd=3, xlab="", ylab="Density", xlim=c(0,3), ylim=c(0,2))
lines(ix,dgamma(ix,5,5),col="#e0b024", lwd=3)
lines(ix,dgamma(ix,58,35),col="#da5d68", lwd=3)
legend("topright",lwd = 1.5,col = c("#527f94","#e0b024","#da5d68"),c("Gamma(1,1)","Gamma(5,5)", "Gamma(58,35)"))
title(main="Gamma Distributions", ylab="Density", xlab="# of Patients on Average")

To find the posterior distribution, apply Bayes’ formula:

p(θ | y1, …, yn) ∝ p(y1, …, yn | θ) p(θ) ∝ θ^(Σyi) e^(−nθ) · θ^(a−1) e^(−bθ) = θ^(a + Σyi − 1) e^(−(b + n)θ)

This is evidently a gamma distribution again, with shape a plus the sum of the data and rate b plus the number of observations:

θ | y1, …, yn ~ Gamma(a + Σyi, b + n)

Now it is quite easy to find the parameters of the posterior distribution.

For prior Gamma(1,1), posterior is Gamma(27,21)

For prior Gamma(5,5), posterior is Gamma(31,25)

For prior Gamma(58,35), posterior is Gamma(84,55)
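The conjugate update is simple enough to wrap in a small helper function; `posterior_params` is my own name for it, not from the article:

```r
# Conjugate Gamma-Poisson update: a Gamma(a, b) prior combined with counts y
# yields a Gamma(a + sum(y), b + length(y)) posterior.
posterior_params <- function(a, b, y) {
  c(shape = a + sum(y), rate = b + length(y))
}

y1 <- c(2,2,4,0,0,1,1,1,1,0,2,2,0,1,5,0,1,0,3,0)  # Hospital A counts
posterior_params(1, 1, y1)    # shape 27, rate 21
posterior_params(5, 5, y1)    # shape 31, rate 25
posterior_params(58, 35, y1)  # shape 84, rate 55
```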

par(mfrow=c(3,1))

ix <- seq(0,3,0.01)
plot(ix,dgamma(ix,1,1),lty=2, t="l", lwd=2, ylab="Density", xlab="", ylim=c(0,2.75)) #prior
lines(ix,dgamma(ix,1+sumy1,1+n1),lty=1, lwd=2) #posterior
legend("topright",lwd = 2, lty = c(2,1),c("Prior~Gamma(1,1)","Posterior~Gamma(27,21)"))

plot(ix,dgamma(ix,5,5),lty=2, t="l", lwd=2, ylab="Density", xlab="", ylim=c(0,2.75)) #prior
lines(ix,dgamma(ix,5+sumy1,5+n1),lty=1, lwd=2) #posterior
legend("topright",lwd = 2, lty = c(2,1),c("Prior~Gamma(5,5)","Posterior~Gamma(31,25)"))

plot(ix,dgamma(ix,58,35),lty=2, t="l", lwd=2, ylab="Density", xlab="", ylim=c(0,2.75)) #prior
lines(ix,dgamma(ix,58+sumy1,35+n1),lty=1, lwd=2) #posterior
legend("topright",lwd = 2, lty = c(2,1),c("Prior~Gamma(58,35)","Posterior~Gamma(84,55)"))

Comparison between different priors

See how the third prior dominates the analysis: it pulls the posterior mean up toward its own mean of about 1.66, landing at 84/55 ≈ 1.53 instead of the MLE of 1.3.

Let’s use this method to compare two different hospitals. Hospital B counted its burn-unit patients over a 20-day interval and got:

{1,2,3,2,1,1,2,2,2,2,1,0,1,1,2,3,1,1,2,2}

y2 <- c(1,2,3,2,1,1,2,2,2,2,1,0,1,1,2,3,1,1,2,2)
n2 <- length(y2)
sumy2 <- sum(y2)
cat('MLE (mean):', sumy2/n2)

# output >> MLE (mean): 1.6

Assume we use the same prior, Gamma(5,4), for both hospitals. After calculating the posteriors, we have:

{θ_1|Y1_1, Y2_1, … , Yn_1} ~ Gamma(31,24)

{θ_2|Y1_2, Y2_2, … , Yn_2} ~ Gamma(37,24)

ix <- seq(0,3,0.01)
plot(ix,dgamma(ix, 5+sumy1, 4+n1), col='#527f94', t="l", lwd=3, ylab="Density", xlab="") #posterior_A
lines(ix,dgamma(ix,5+sumy2, 4+n2), col='#FF6071', lwd=3) #posterior_B
legend("topright", lwd=2, col=c('#527f94', '#FF6071'), c("Posterior_HospitalA","Posterior_HospitalB"))

Posteriors for comparison

Now it seems Hospital B has more patients, but there is also significant overlap. The probability that Hospital B’s underlying rate exceeds Hospital A’s, P(θ_2 > θ_1 | data), can be estimated with a Monte Carlo draw from the two posteriors:

theta_A <- rgamma(10000, 5+sumy1, 4+n1)
theta_B <- rgamma(10000, 5+sumy2, 4+n2)
round(mean(theta_B > theta_A), 3)
# output >> roughly 0.77 (the exact value varies with the random seed)

We can interpret this result as: given the 20 days of data from each hospital, the probability that Hospital B’s average patient rate is higher than Hospital A’s is roughly 0.77.

In addition, here is another interesting question: on a future day, what is the probability that Hospital B admits more patients than Hospital A? To answer this, we need the posterior predictive distribution. I will not go into the daunting mathematical detail of the derivation, but it follows the same Bayesian updating principle: the posterior predictive distribution is the integral, over θ, of the likelihood of the new data weighted by the posterior distribution.

Evaluating this integral, we arrive at a negative binomial distribution. We are back to a discrete distribution. How convenient for prediction!
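Before plotting it, we can convince ourselves of this fact by simulating the predictive distribution directly by composition: draw θ from the posterior, then a Poisson count given θ, and compare with R’s `dnbinom` in its size/mean parameterization. The Gamma(31, 24) below is Hospital A’s posterior under the Gamma(5, 4) prior:

```r
# Posterior predictive by composition: theta ~ Gamma(a, b), y_new ~ Pois(theta).
# The empirical frequency of y_new == 0 should match the negative binomial pmf
# with size = a and mean mu = a/b.
set.seed(1)
a_post <- 31; b_post <- 24  # Hospital A posterior (Gamma(5,4) prior + data)
theta  <- rgamma(100000, shape = a_post, rate = b_post)
y_new  <- rpois(100000, lambda = theta)
empirical <- mean(y_new == 0)
analytic  <- dnbinom(0, size = a_post, mu = a_post / b_post)
cat("P(y_new = 0): empirical", empirical, " analytic", analytic, "\n")
```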

Posterior Predictive Distribution: ỹ | y1, …, yn ~ NegBin with size = a + Σyi and mean μ = (a + Σyi)/(b + n), in R’s size/mu parameterization

ix <- seq(0,6,1)

plot((ix-.03),dnbinom(ix, size=(5+sumy1), mu=(5+sumy1)/(4+n1)),col="#527f94", lwd=5, type='h', xlab="# of Patients", ylab="Probability") # the small x offsets keep the two sets of bars from overlapping
points((ix+.03),dnbinom(ix, size=(5+sumy2), mu=(5+sumy2)/(4+n2)),col="#FF6071", lwd=5, type='h')
title(main="Negative Binomial Predictive Distributions")
legend("topright", lwd=2, col=c('#527f94', '#FF6071'), c("Predictive_HospitalA","Predictive_HospitalB"))

Hospital B’s predictive distribution puts more mass on the higher counts, but what is the overall probability that it sees more patients on a given day? We can approach this with a simple Monte Carlo simulation: draw 1,000 samples from each posterior predictive distribution and compare the results.

sample_A <- rnbinom(n = 1000, size = (5+sumy1), mu = (5+sumy1)/(4+n1))
sample_B <- rnbinom(n = 1000, size = (5+sumy2), mu = (5+sumy2)/(4+n2))
mean(sample_B > sample_A)
# output >> 0.427

mean(sample_B == sample_A)
# output >> 0.266

mean(sample_B < sample_A)
# output >> 0.307

The overlap is clearer here. We can interpret these results as follows: on a future day, the probability that more patients come to Hospital B is about 0.427, the probability that both hospitals receive the same number is about 0.266, and the probability that Hospital A has more patients is about 0.307 (these Monte Carlo estimates vary slightly with the random seed).
