Adding a p-value to the description of our study results is useful because it indicates how probable it is that a difference as large as the one we observed between the groups could have arisen by chance alone. However, epidemiological practice needs more than a decision as to whether chance alone could have produced the association (Rothman et al., 2008). We are therefore also interested in estimating an effect measure (e.g. RR, OR), rather than in significance testing alone.
In the fictitious example of a botulism outbreak among the guests of a restaurant, the null hypothesis H0 is that "there is no difference in the occurrence of botulism between guests who ate home-preserved green olives and guests who did not eat olives". Now consider the following result of the outbreak investigation:
"The relative risk (RR) of botulism among diners who ate home preserved green olives was 3.6 (p=0.016)."
Are we now confident in our results? Confident enough to reject H0, since we find it highly improbable (a probability of 1.6%) that chance alone produced a difference this large.
Yet critics may say: "We may agree that there is probably a difference, but how confident are you that the difference truly has a magnitude of 3.6? This is only an estimate! In reality it may be lower."
To answer this question, we can calculate a confidence interval around our estimated effect.
The range of values, based on the sample data, in which the population value (or true value) may lie is called the confidence interval (CI).
For example, a 95% CI includes the true value with 95% certainty.
Another way of expressing this is that if the investigation (data collection and analysis) were repeated many times, 95% of the confidence intervals calculated from these repetitions would include the true value of the effect measure.
Imagine that we discover, in our outbreak investigation example, that the home-preserved green olives were part of a very large batch that was distributed among 100 very similar restaurants within the same geographical area. This would, in theory, offer us the opportunity to repeat the outbreak investigation that we did in Restaurant X 99 more times. At the end we would have 100 separate outbreak investigations, which could be considered as 100 repeated 'experiments of nature' performed in the same population.
Intuitively, we assume that it is highly unlikely that the relative risk will be exactly 3.6 in each of these 100 investigations. However, we can be quite confident (in fact, 95% certain) that the true relative risk will be included in 95% of the confidence intervals that we calculate.
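The repeated-investigations idea above can be illustrated with a small simulation. The baseline risk, true relative risk and group sizes below are assumptions chosen for illustration, not the Restaurant X data; the confidence interval uses the standard Wald formula on the log scale.

```python
# Sketch: simulate repeated cohort studies with a known "true" relative risk
# and count how often the calculated 95% CI captures that true value.
import math
import random

random.seed(1)

TRUE_RR = 3.6
RISK_UNEXPOSED = 0.05            # hypothetical baseline risk among non-eaters
RISK_EXPOSED = TRUE_RR * RISK_UNEXPOSED
N_PER_GROUP = 500                # hypothetical diners per group in each study
Z_95 = 1.96

def simulate_study():
    """One 'experiment of nature': returns (RR, lower, upper), or None if RR is undefined."""
    a = sum(random.random() < RISK_EXPOSED for _ in range(N_PER_GROUP))    # cases, exposed
    c = sum(random.random() < RISK_UNEXPOSED for _ in range(N_PER_GROUP))  # cases, unexposed
    if a == 0 or c == 0:
        return None
    rr = (a / N_PER_GROUP) / (c / N_PER_GROUP)
    # standard error of ln(RR) for a cohort study
    se = math.sqrt(1/a - 1/N_PER_GROUP + 1/c - 1/N_PER_GROUP)
    return (rr,
            math.exp(math.log(rr) - Z_95 * se),
            math.exp(math.log(rr) + Z_95 * se))

results = [r for r in (simulate_study() for _ in range(2000)) if r is not None]
coverage = sum(low <= TRUE_RR <= high for _, low, high in results) / len(results)
print(f"CIs containing the true RR: {coverage:.1%}")   # close to 95%
```

The estimated RR varies from study to study, but roughly 95 out of every 100 intervals contain the true value, which is exactly what the 95% label promises.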
Please note again that it is a common convention to choose 95% as the confidence level, but it might just as well have been 90%, 99% or even 93.37%: it is a matter of choice.
Each confidence interval has a lower limit and an upper limit, which are derived from the point estimate plus or minus a 'deviation'. This deviation can be symmetrical or asymmetrical and depends on the variability of the data, the sample size and the level of confidence chosen. It is based on the standard error of the estimate.
In principle, high data variability leads to wide confidence intervals. The larger the sample size, the narrower the confidence interval. And finally, the higher the confidence level chosen, the wider the interval will be.
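A minimal sketch of how the limits are obtained for a relative risk, and of the two effects just described. The cohort counts are hypothetical, not the Restaurant X data; the interval is the usual Wald CI computed on the log scale, exp(ln(RR) ± z·SE).

```python
# Wald confidence interval for a relative risk from a 2x2 cohort table.
import math

def rr_ci(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    """Relative risk with CI: exp(ln(RR) +/- z * SE of ln(RR))."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    se = math.sqrt(1/cases_exp - 1/n_exp + 1/cases_unexp - 1/n_unexp)
    return (rr,
            math.exp(math.log(rr) - z * se),
            math.exp(math.log(rr) + z * se))

# Hypothetical cohort: 8/30 ill among olive eaters, 5/70 among the rest.
rr, lo, hi = rr_ci(8, 30, 5, 70)
print(f"RR={rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")

# Higher confidence level -> wider interval (z = 2.576 for 99%).
_, lo99, hi99 = rr_ci(8, 30, 5, 70, z=2.576)
# Larger sample (same risks, ten times the size) -> narrower interval.
_, lo_big, hi_big = rr_ci(80, 300, 50, 700)
print(f"99% CI: {lo99:.2f} to {hi99:.2f}")
print(f"95% CI at 10x sample size: {lo_big:.2f} to {hi_big:.2f}")
```

Running this shows the 99% interval is wider than the 95% interval from the same data, while the tenfold-larger cohort yields a much narrower interval around the same point estimate.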
If the null value is included within the CI, the result should be considered statistically non-significant. For a relative risk, for example, the null hypothesis is that RR = 1.0 (and the same applies to the odds ratio).
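This rule is simple enough to express as a one-line check; the helper name below is illustrative, not from any particular library.

```python
# Minimal sketch: a result is conventionally "significant" at the 5% level
# when the 95% CI for the RR (or OR) excludes the null value of 1.0.
def excludes_null(lower, upper, null_value=1.0):
    return not (lower <= null_value <= upper)

print(excludes_null(1.17, 11.07))  # True: the whole interval lies above 1.0
print(excludes_null(0.80, 2.50))   # False: the interval includes 1.0
```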
Most analytical software will give the confidence intervals automatically, together with the point estimates and p-values.
In the example of the outbreak of botulism among the 135 guests at Restaurant X, the results may be given as follows:
"The risk of developing botulism was higher among diners who ate home preserved green olives (RR=3.6, 95% CI: 1.17 to 11.07)."
This can be interpreted as follows: with 95% confidence, the true relative risk linking olive consumption and botulism lies between 1.17 and 11.07.
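As a quick consistency check on the reported figures: a Wald CI for a relative risk is symmetric on the log scale, so the geometric mean of the two limits should recover (approximately) the point estimate.

```python
# Check that the reported interval (1.17 to 11.07) is centred, on the log
# scale, on the reported point estimate of 3.6.
import math

lower, upper = 1.17, 11.07
midpoint = math.exp((math.log(lower) + math.log(upper)) / 2)
print(f"geometric mean of the limits: {midpoint:.2f}")  # ~3.6
```

The interval looks very asymmetric around 3.6 on the ordinary scale (3.6 − 1.17 = 2.43 below versus 11.07 − 3.6 = 7.47 above), yet it is symmetric on the log scale, which is typical for ratio measures such as RR and OR.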
A confidence interval represents the range of effects that are compatible with the data. The CI therefore provides more information than the point estimate and p-value alone, since it conveys both the magnitude of the effect and the precision of the estimate.