So far, we have been concerned with the parameters of probability distributions, never questioning the fit of the distribution itself against empirical data. For instance, we might assume that a population is normally distributed, and we may estimate and test its expected value and variance. However, normality should not be taken for granted, just like any other claim about the underlying distribution. Sometimes, domain knowledge provides strong reasons that justify the assumption; otherwise, it should be tested in some way. When we test whether experimental data fit a given probability distribution, we are not really testing a hypothesis about one or two parameters; in fact, we are running a nonparametric test. The chi-square test is one example of such a test.
The idea is fairly intuitive and basically relies on a relative frequency histogram, although the technicalities do require some care. The first step is to divide the range of possible observed values into $J$ disjoint intervals, corresponding to the bins of a frequency histogram. Given a probability distribution, we can compute the probability $p_j$, $j = 1, \ldots, J$, that a random variable with that distribution falls in bin $j$. If we have $n$ observations, the number of observations that should fall in interval $j$, if the assumed distribution is indeed the true one, is $E_j = np_j$. This number should be compared against the number $O_j$ of observations that actually fall in interval $j$; a large discrepancy would suggest that the hypothesis about the underlying distribution should be rejected. Like any statistical test, the chi-square test relies on a distributional property of a statistic. It can be shown that, for a large number of observations, the statistic

$$\chi^2 = \sum_{j=1}^{J} \frac{(O_j - E_j)^2}{E_j}$$
has (approximately) a chi-square distribution. We should reject the hypothesis if $\chi^2$ is too large, i.e., if $\chi^2 > \chi^2_{1-\alpha,\, m}$, where
- $\chi^2_{1-\alpha,\, m}$ is the $(1-\alpha)$-quantile of the chi-square distribution
- α is the significance level of the test
- m is the number of degrees of freedom
What we are missing here is $m$, which depends on how many parameters of the distribution we have estimated from the data. If no parameter has been estimated, i.e., if we assumed a fully specified distribution prior to observing the data, the degrees of freedom are $m = J - 1$; if we estimated $p$ parameters, we should use $m = J - p - 1$.
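To make the procedure concrete, here is a minimal sketch in Python, assuming NumPy and SciPy are available. The sample, the standard normal null hypothesis, and the choice of $J = 10$ equiprobable bins are all illustrative assumptions, not prescriptions:

```python
import numpy as np
from scipy import stats

# Illustrative data: 500 draws, tested against the standard normal N(0, 1).
rng = np.random.default_rng(42)
sample = rng.normal(size=500)
n, J = sample.size, 10

# Equiprobable bins under the hypothesized N(0, 1): each p_j = 1/J,
# so E_j = n * p_j = n/J. The outer edges are -inf and +inf.
edges = stats.norm.ppf(np.linspace(0.0, 1.0, J + 1))

# Observed counts O_j per bin (searchsorted copes with the infinite edges).
O = np.bincount(np.searchsorted(edges, sample, side="right") - 1, minlength=J)
E = np.full(J, n / J)

# The chi-square statistic and the (1 - alpha)-quantile with m = J - 1
# degrees of freedom (no parameters were estimated from the data here;
# with p estimated parameters, use m = J - p - 1).
chi2_stat = np.sum((O - E) ** 2 / E)
alpha = 0.05
m = J - 1
critical = stats.chi2.ppf(1 - alpha, df=m)

print(f"chi2 = {chi2_stat:.2f}, critical value = {critical:.2f}")
print("reject H0" if chi2_stat > critical else "cannot reject H0")
```

SciPy also bundles this computation as `scipy.stats.chisquare(O, f_exp=E, ddof=p)`, where `ddof` reduces the degrees of freedom to account for `p` parameters estimated from the data.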
The idea of the test, as we stressed, is pretty intuitive. However, it relies on approximate distributional results that hold only for large samples, which may be a critical limitation. Another tricky point is that the outcome of the test may depend on the number and placement of bins. Rules of thumb have been proposed and are typically embedded in statistical software. Nevertheless, we should mention that there are other general strategies to test goodness of fit, such as the Kolmogorov–Smirnov test, as well as ad hoc testing procedures for specific distributions.
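For comparison, a similarly hedged sketch of the Kolmogorov–Smirnov test via `scipy.stats.kstest`, which compares the empirical distribution function against the hypothesized CDF and involves no binning choices; the sample is again an illustrative assumption:

```python
import numpy as np
from scipy import stats

# Illustrative sample; "norm" names the hypothesized standard normal CDF.
sample = np.random.default_rng(42).normal(size=500)
d_stat, p_value = stats.kstest(sample, "norm")
print(f"KS statistic = {d_stat:.3f}, p-value = {p_value:.3f}")
```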