3.2 Consistency
Imagine taking a sample of size \(N\) and computing the estimate \(\hat{\theta}_N\) of the parameter \(\theta\). We say that \(\hat{\theta}_N\) is a consistent estimator of \(\theta\) if \(\hat{\theta}_N\) converges in probability to \(\theta\) as \(N \to \infty\).
Intuitively, this means the following:
- For a large enough sample, the estimator returns (essentially) the right answer.
- For a large enough sample, the estimate \(\hat{\theta}_N\) no longer varies: its distribution collapses onto a single point, and that point is \(\theta\). (The small simulation after this list illustrates the idea.)
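To make the definition concrete, here is a minimal simulation sketch (an illustrative example, not from the text: the Bernoulli model, the value of \(\pi\), and the tolerance are all assumptions). It estimates, by Monte Carlo, the probability that the sample proportion lands more than a small distance from the true \(\pi\); for a consistent estimator this probability should shrink toward zero as \(N\) grows.

```python
import numpy as np

rng = np.random.default_rng(0)
pi_true = 0.3   # true Bernoulli parameter (illustrative value)
eps = 0.05      # tolerance used to measure "far from the truth"

# For each sample size N, estimate P(|pi_hat - pi_true| > eps) over 1,000 replications.
for N in [10, 100, 1_000, 10_000]:
    samples = rng.binomial(1, pi_true, size=(1_000, N))  # 1,000 samples of size N
    pi_hat = samples.mean(axis=1)                        # sample proportion per replication
    prob_far = np.mean(np.abs(pi_hat - pi_true) > eps)
    print(f"N={N:>6}: P(|pi_hat - pi| > {eps}) ~ {prob_far:.3f}")
```

The printed probabilities decrease toward zero as \(N\) increases, which is exactly what convergence in probability means.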
Under weak, but somewhat technical, assumptions that usually hold, ML estimators are consistent. Under even weaker assumptions, MM estimators are consistent.
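As an illustration of an ML estimator behaving consistently (again a sketch under assumed values, not an example from the text), consider the MLE of an exponential rate, \(\hat{\lambda} = 1/\bar{x}\), computed on increasingly large samples:

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true = 2.0  # true exponential rate (illustrative value)

# The ML estimator of the rate is 1 / sample mean; it should approach lam_true as N grows.
for N in [10, 100, 10_000, 1_000_000]:
    x = rng.exponential(scale=1 / lam_true, size=N)  # numpy parameterizes by scale = 1/rate
    lam_hat = 1 / x.mean()
    print(f"N={N:>9}: lam_hat = {lam_hat:.4f}")
```

The estimates wander noticeably for small \(N\) but settle near \(\lambda = 2\) as the sample grows.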
Given that we always have finite samples, why is consistency valuable? In short, it is not directly valuable. However, consistent estimators tend to perform reasonably well even with small samples.
But it does not follow that every consistent estimator works well in small samples. Consider the estimator \(\hat{\pi}^{Bayes}\). By choosing appropriate (i.e., large) values for \(\alpha^*\) and \(\beta^*\), we can make \(E(\hat{\pi}^{Bayes})\) essentially whatever we like in the \((0, 1)\) interval for a fixed sample size. Yet \(\hat{\pi}^{Bayes}\) is consistent regardless of the values we choose for \(\alpha^*\) and \(\beta^*\). Even though the posterior mean is consistent, it can be highly biased in finite samples.
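A minimal sketch of this counterexample, assuming the usual Beta–Binomial setup in which the posterior mean is \((\alpha^* + \sum_i x_i)/(\alpha^* + \beta^* + N)\); the specific prior values and \(\pi\) below are illustrative choices, not ones from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
pi_true = 0.3                    # true parameter (illustrative value)
alpha_star, beta_star = 50, 50   # deliberately large prior parameters, pulling the estimate toward 0.5

# Posterior mean under a Beta(alpha*, beta*) prior: (alpha* + sum(x)) / (alpha* + beta* + N).
for N in [10, 100, 10_000]:
    samples = rng.binomial(1, pi_true, size=(1_000, N))   # 1,000 replications of size N
    pi_hat = (alpha_star + samples.sum(axis=1)) / (alpha_star + beta_star + N)
    print(f"N={N:>6}: average pi_hat_Bayes ~ {pi_hat.mean():.3f}  (true pi = {pi_true})")
```

For \(N = 10\) the average estimate sits near 0.5 because the prior dominates, i.e., the estimator is badly biased; as \(N\) grows, the data overwhelm the prior and the estimate approaches the true \(\pi = 0.3\), consistent with the claim above.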
Still, as a rough guideline, consistent estimators tend to work well for small samples. Whether they actually work well in any particular situation requires a more careful investigation.