3.1 Bias
Imagine repeatedly sampling and computing the estimate \(\hat{\theta}\) of the parameter \(\theta\) for each sample. In this thought experiment, \(\hat{\theta}\) is a random variable. We say that \(\hat{\theta}\) is biased if \(E(\hat{\theta}) \neq \theta\). We say that \(\hat{\theta}\) is unbiased if \(E(\hat{\theta}) = \theta\). We say that the bias of \(\hat{\theta}\) is \(E(\hat{\theta}) - \theta\).
For example, we can compute the bias of our ML estimator of \(\pi\) in the toothpaste cap problem.
\[ \begin{aligned} E\left[ \frac{k}{N}\right] &= \frac{1}{N} E(k) = \frac{1}{N} E \overbrace{ \left( \sum_{n = 1}^N x_n \right) }^{\text{recall } k = \sum_{n = 1}^N x_n } = \frac{1}{N} \sum_{n = 1}^N E(x_n) = \frac{1}{N} \sum_{n = 1}^N \pi = \frac{1}{N}N\pi \\ &= \pi \end{aligned} \]
Thus, \(\hat{\pi}^{ML}\) is an unbiased estimator of \(\pi\) in the toothpaste cap problem.
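We can also check this result with a quick simulation. The sketch below uses illustrative values for \(\pi\) and the number of tosses per sample (these are assumptions chosen for the example, not the toothpaste cap data).

pi_true <- 0.15             # illustrative value of pi (an assumption)
sample_size <- 150          # tosses per "study" (an assumption)
n_repeated_samples <- 10000 # the number of times we repeat the "study"
pi_hat <- numeric(n_repeated_samples) # a container
for (i in 1:n_repeated_samples) {
  x <- rbinom(sample_size, size = 1, prob = pi_true) # N Bernoulli tosses
  pi_hat[i] <- mean(x) # k/N for this sample
}
# long-run average; should be close to pi_true
mean(pi_hat)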
We need to be cautious about evaluating the frequentist properties of Bayesian estimates. Bayesians approach inference by describing prior beliefs and then updating those beliefs in light of the data. This is a logical process: so long as the math is correct, we don't really need to ask whether the posterior is “good” or “bad” or “better” or “worse”; it's just the posterior.
However, setting philosophical orientations aside, nothing prevents us from evaluating the frequentist properties of a posterior mean.
\[ \begin{aligned} E\left[ \frac{\alpha^* + k}{\alpha^* + \beta^* + N}\right] &= \frac{1}{\alpha^* + \beta^* + N} E(k + \alpha^*) = \frac{1}{\alpha^* + \beta^* + N} \left[ E(k) + \alpha^* \right] \\ & = \frac{1}{\alpha^* + \beta^* + N} \left[ \sum_{n = 1}^N E(x_n) + \alpha^* \right] \\ & = \frac{1}{\alpha^* + \beta^* + N} \left[ \sum_{n = 1}^N \pi + \alpha^* \right] \\ & = \frac{N\pi + \alpha^*}{\alpha^* + \beta^* + N} \end{aligned} \] Subtracting \(\pi\) gives the bias: \(\frac{N\pi + \alpha^*}{\alpha^* + \beta^* + N} - \pi = \frac{\alpha^* - (\alpha^* + \beta^*)\pi}{\alpha^* + \beta^* + N}\). As \(N \rightarrow \infty\), the expected value approaches \(\pi\), but in finite samples the posterior mean is biased unless \(\pi\) happens to equal the prior mean \(\frac{\alpha^*}{\alpha^* + \beta^*}\). Remember that both prior parameters must be positive, so we cannot remove the bias by setting them to zero.
We get a nice, intuitive result, though. Remember that if \(\alpha = \beta\), then the beta distribution is symmetric about one-half, and as the parameters grow larger, it becomes more concentrated around one-half. You can see from the expression above that for \(\alpha^* = \beta^*\), the expected value of the posterior mean is pulled toward \(\frac{1}{2}\), and this bias toward \(\frac{1}{2}\) grows as \(\alpha^*\) and \(\beta^*\) grow larger.
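A small numerical illustration (with hypothetical values of \(\pi\) and \(N\)) makes the shrinkage visible: plugging increasing values of \(\alpha^* = \beta^*\) into the expression above pulls the expected posterior mean from \(\pi\) toward one-half.

pi_true <- 0.15                # a hypothetical true value of pi
N <- 150                       # a hypothetical sample size
alpha_star <- c(1, 5, 25, 100) # prior parameters, with alpha* = beta*
beta_star <- alpha_star
# expected value of the posterior mean, from the derivation above
(N*pi_true + alpha_star) / (alpha_star + beta_star + N)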
3.1.1 Example: Sample Average
Similarly, we can show that, for a simple random sample, the sample average is an unbiased estimate of the population average.
\[ \begin{aligned} E\left[ \frac{\sum_{n = 1}^N x_n}{N}\right] &= \frac{1}{N} E\left[ \sum_{n = 1}^N x_n \right] = \frac{1}{N} \sum_{n = 1}^N E(x_n) \\ &= \frac{1}{N} \sum_{n = 1}^N \text{(pop. avg.)} = \frac{1}{N}N \text{(pop. avg.)} \\ & = \text{pop. avg.} \end{aligned} \]
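As a check, a short simulation shows the long-run average of the sample average landing on the population average. The particular population below (a normal distribution with the mean and SD shown) is just an illustrative assumption; any population with the same average would do.

pop_avg <- 10               # the population average (an illustrative assumption)
sample_size <- 25           # the sample size in each "study" (an assumption)
n_repeated_samples <- 10000 # the number of times we repeat the "study"
avg_hat <- numeric(n_repeated_samples) # a container
for (i in 1:n_repeated_samples) {
  x <- rnorm(sample_size, mean = pop_avg, sd = 3) # a simple random sample
  avg_hat[i] <- mean(x) # the sample average
}
# long-run average; should be close to pop_avg
mean(avg_hat)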
3.1.2 Example: Poisson Distribution
Using math almost identical to the toothpaste cap problem, we can show that the ML estimator \(\hat{\lambda} = \text{avg}(x)\) is an unbiased estimator of \(\lambda\).
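Here is a sketch of that calculation: since \(E(x_n) = \lambda\) for a Poisson random variable,

\[ \begin{aligned} E\left[ \frac{\sum_{n = 1}^N x_n}{N}\right] &= \frac{1}{N} \sum_{n = 1}^N E(x_n) = \frac{1}{N} \sum_{n = 1}^N \lambda = \frac{1}{N}N\lambda = \lambda. \end{aligned} \]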
We can also illustrate the unbiasedness with a computer simulation.
lambda <- 4.0               # the parameter we're trying to estimate
sample_size <- 10           # the sample size we're using in each "study"
n_repeated_samples <- 10000 # the number of times we repeat the "study"
lambda_hat <- numeric(n_repeated_samples) # a container
for (i in 1:n_repeated_samples) {
  x <- rpois(sample_size, lambda = lambda)
  lambda_hat[i] <- mean(x)
}
# long-run average
mean(lambda_hat)
## [1] 4.00389