The general idea of statistical inference is to find out a certain "truth" about a population by investigating a sample rather than the entire population. The investigation can be descriptive (for example, to find out the true occurrence of a disease) or analytical (for example, to test the hypothesis that people who have eaten home-preserved green olives are more at risk of developing botulism than those who did not eat those olives).

Statistical Inference is the process of drawing conclusions about the entire population, based on the investigation of a sample. So it is a form of generalisation.

This process differs from causal inference, which is explained elsewhere.

Significance tests

In order to make the conclusions objective, statistical tests are usually applied to the observed data, with the aim of reaching a decision ('yes' or 'no') about a difference (or 'effect') on a probabilistic basis. Such statistical tests are also called significance tests. They all have in common that they require a Null Hypothesis (H0): "There is no difference (no effect) between the groups that we compare".

A Null Hypothesis (H0) will always have a complementary Alternative Hypothesis (H1): "There is a difference between the groups that we compare" (in other words: the Null Hypothesis is not true).

The aim of a significance test is to help us decide whether or not to reject the Null Hypothesis.

In our example, we could write the Null Hypothesis like this:

"There is no difference in occurrence of botulism in the population between the people that have eaten home preserved green olives (=exposed) and those that did not (=unexposed)".

Such a hypothesis makes it easier to design a study to test it: we need to take a representative sample of the people who were exposed and a representative sample of those who were unexposed. In both samples we measure the occurrence of botulism, and we compare the results.
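As a minimal sketch of this comparison, the occurrence of disease in each sample can be expressed as a proportion and the two proportions compared. The counts below are invented purely for illustration and do not come from any real outbreak; the names attack_rate, risk_difference and risk_ratio are assumptions of this example, not terms defined in the text above.

```python
# Hypothetical counts, invented for illustration only.
exposed_ill, exposed_total = 12, 40      # ate home-preserved green olives
unexposed_ill, unexposed_total = 3, 60   # did not eat the olives


def attack_rate(ill, total):
    """Proportion of a group that developed the disease."""
    return ill / total


ar_exposed = attack_rate(exposed_ill, exposed_total)
ar_unexposed = attack_rate(unexposed_ill, unexposed_total)

# Two simple ways to summarise how different the samples are.
risk_difference = ar_exposed - ar_unexposed
risk_ratio = ar_exposed / ar_unexposed

print(f"Occurrence among exposed:   {ar_exposed:.1%}")
print(f"Occurrence among unexposed: {ar_unexposed:.1%}")
print(f"Difference: {risk_difference:.1%}, ratio: {risk_ratio:.1f}")
```

These summaries describe the samples only. Whether a difference of this size is compatible with H0 is the question a significance test addresses.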

The next challenge is: how different do the results need to be to make us decide to reject the H0?

This is the point where the p-value will help our decision. This value tells us the probability (p) of finding a difference at least as large as the one we have observed (between our samples) if the Null Hypothesis H0 is true. The lower the p-value, the less plausible it is that chance alone explains the difference between the results in our samples when there really is no difference in the total population.

This requires that we quantify how probable it is, if H0 is true, for the observed results to differ from the expected results by as much as they do.
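One possible way to obtain such a p-value for a two-by-two table is Fisher's exact test. The sketch below is a plain standard-library implementation written for this example (it is not the method prescribed by the text), and the counts passed to it are hypothetical. It keeps the row and column totals fixed and adds up the probabilities of all tables that are no more likely under H0 than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a = exposed ill,   b = exposed not ill,
    c = unexposed ill, d = unexposed not ill.
    """
    exposed, unexposed = a + b, c + d
    ill = a + c
    total = exposed + unexposed

    def prob(x):
        # Hypergeometric probability of x ill people among the exposed,
        # given fixed margins, if exposure and illness are unrelated (H0).
        return comb(exposed, x) * comb(unexposed, ill - x) / comb(total, ill)

    smallest = max(0, ill - unexposed)
    largest = min(exposed, ill)
    p_observed = prob(a)
    # Sum every table that is as likely as, or less likely than, the observed
    # one (a small tolerance guards against floating-point ties).
    return sum(
        p
        for p in (prob(x) for x in range(smallest, largest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts: 12 of 40 exposed ill, 3 of 60 unexposed ill.
p_value = fisher_exact_two_sided(12, 28, 3, 57)
print(f"p-value: {p_value:.4f}")
```

With these invented counts the p-value is small, so a difference this large would rarely arise by chance alone if H0 were true. Other tests (for example a chi-square test) could be used in the same role.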


Making a decision on H0

If we have convinced ourselves that the occurrence of botulism is significantly different between the exposed (who ate olives) and the non-exposed, then we can decide to reject the Null Hypothesis.

Now in taking a decision on H0, we can make two possible errors:

  • The null hypothesis is true but rejected: Type I error (α-error)
  • The alternative hypothesis is true, but the null hypothesis is not rejected: Type II error (β-error)

Please note that statistical tests only allow us to decide to reject H0 or not to reject. This is different from deciding to accept H0, or accept H1.
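The two types of error can be made concrete with a small simulation. The sketch below is illustrative only: the risks, the sample size of 100 per group, the cut-off of 0.05 and the use of a two-proportion z-test (normal approximation) are all assumptions chosen for this example. It repeats the same study many times, first in a world where H0 is true and then in a world where H1 is true, and counts how often H0 is rejected.

```python
import math
import random


def two_proportion_p_value(ill_a, n_a, ill_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (ill_a + ill_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation in the data: no evidence against H0
    z = (ill_a / n_a - ill_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(risk_exposed, risk_unexposed, n=100, alpha=0.05,
                   runs=2000, seed=1):
    """Fraction of simulated studies in which H0 is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(runs):
        ill_exposed = sum(rng.random() < risk_exposed for _ in range(n))
        ill_unexposed = sum(rng.random() < risk_unexposed for _ in range(n))
        if two_proportion_p_value(ill_exposed, n, ill_unexposed, n) < alpha:
            rejected += 1
    return rejected / runs


# H0 true (same risk in both groups): every rejection is a Type I error.
type_i_rate = rejection_rate(0.2, 0.2)
# H1 true (risks differ): every failure to reject is a Type II error.
type_ii_rate = 1 - rejection_rate(0.4, 0.2)

print(f"Type I error rate:  {type_i_rate:.3f}")
print(f"Type II error rate: {type_ii_rate:.3f}")
```

In this setup the Type I error rate should come out close to the chosen cut-off, while the Type II error rate depends on how large the true difference is and on the sample size.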

Problems in applying significance tests in observational studies

In these examples we have applied significance tests to an observational study: an outbreak has occurred within a population at risk (guests in a restaurant), and retrospectively we test hypotheses on data observed from events that took place before we formulated the hypotheses.

One of the criticisms often made of the interpretation of such epidemiological studies is that no random assignment of subjects to groups (exposed, non-exposed) took place. The aim of randomisation is to get an equal distribution of other risk factors, including those which have not been measured (or even discovered). The gold standard for such studies is the randomised controlled trial, preferably one where the investigators and subjects are blinded to the assignment to the exposed and unexposed groups.

In such designs, where subjects are assigned to the exposure of interest at random so that all other factors are distributed by chance alone, the significance test produces a p-value that truly reflects the probability that chance would produce differences at least as large as those observed between the study groups.
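The link between random assignment and the p-value can be illustrated with a permutation test, sketched below as one possible approach (it is an assumption of this example, not a method named in the text). If assignment to the groups was random and H0 is true, the group labels are interchangeable, so reshuffling them many times shows how often chance alone yields a difference at least as large as the observed one. The outcome data are hypothetical.

```python
import random


def permutation_p_value(outcomes_exposed, outcomes_unexposed,
                        runs=5000, seed=1):
    """Two-sided permutation p-value for a difference in proportions.

    Outcomes are coded 1 (ill) or 0 (not ill).
    """
    rng = random.Random(seed)
    n_exposed = len(outcomes_exposed)
    pooled = list(outcomes_exposed) + list(outcomes_unexposed)

    def difference(values):
        first, second = values[:n_exposed], values[n_exposed:]
        return sum(first) / len(first) - sum(second) / len(second)

    observed = abs(difference(pooled))
    at_least_as_large = 0
    for _ in range(runs):
        rng.shuffle(pooled)  # re-assign subjects to groups at random
        if abs(difference(pooled)) >= observed - 1e-12:
            at_least_as_large += 1
    # Count the observed assignment as one of the possible assignments.
    return (at_least_as_large + 1) / (runs + 1)


# Hypothetical data: 12 of 40 exposed ill, 3 of 60 unexposed ill.
exposed = [1] * 12 + [0] * 28
unexposed = [1] * 3 + [0] * 57
p_value = permutation_p_value(exposed, unexposed)
print(f"Permutation p-value: {p_value:.4f}")
```

The reshuffling step is only a faithful model of the study when assignment really was random, which is exactly what cannot be assumed in most observational studies.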

In observational studies, we have to be aware that we observe 'experiments of nature' (such as outbreaks) where the assignment of people to exposed and non-exposed is rarely a fully random process. For this reason, many critics say that the p-value in such circumstances should be considered to have a descriptive nature, and that caution should be exercised when drawing statistical inferences from it.

Part of this problem is related to concepts of bias and confounding.