Case cohort studies


In case cohort studies we aim to achieve the same goal as in cohort studies, but more efficiently, by using a sample of the denominators of the exposed and unexposed cohorts [1]. Properly conducted case cohort studies provide information that should mirror what could have been learned from a cohort study.

We will call "source population" the population which gives rise to cases. The source population includes exposed and unexposed cohorts and in that source population we could have conducted a cohort study comparing risk or rates of disease between exposed and unexposed cohorts.

If instead we decide to do a case cohort study, we will include the same cases, and classify them as exposed or unexposed. But, instead of getting exposure information from all individuals constituting the denominators of exposed and unexposed cohorts, we will only use a sample of them. The purpose of this sample is to estimate the relative size of exposed and unexposed components of the source population (the proportion of exposed in the source population at the beginning of the cohort).

To do so we select a random sample from the entire source population. If that sample is unbiased (sampling done independently from exposure status) we expect (disregarding sampling variation) the distribution of exposed and unexposed persons in the sample to reflect the exposure distribution in the source population at the beginning of the cohort. This is an important aspect of case cohort studies. The sample should be representative of the population giving rise to cases (the source population) regarding exposure.
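The sampling step described above can be illustrated with a small simulation. This is only a sketch under assumed numbers: the population size, the 30% exposure prevalence, the sample size and the fixed seed are all hypothetical and chosen purely for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical source population at baseline: 3,000 exposed, 7,000 unexposed
population = [True] * 3000 + [False] * 7000

# Simple random sample, drawn without regard to exposure status
sample = rng.sample(population, 1000)

prop_population = sum(population) / len(population)
prop_sample = sum(sample) / len(sample)

# The two proportions should be close, differing only by sampling variation
print(prop_population, prop_sample)
```

The sampled proportion will not match the population proportion exactly. The difference is the sampling variation that the text sets aside.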

One way to imagine case cohort studies is therefore to think of them as nested within cohorts of exposed and unexposed people. Any case cohort study can be thought of as nested within the source population. The sample (the control group) is a sample of the denominator present at the beginning of the cohort.

From a cohort study measuring risk of disease in exposed and unexposed cohorts we can draw the following results table:

Table 1

Exposure   Cases   Population at risk   Incidence proportion (IP)   Risk ratio
Yes        a       Ne                   a/Ne                        (a/Ne) / (c/Nu)
No         c       Nu                   c/Nu

If instead of studying the entire denominators of exposed and unexposed we sampled them (let's say 10%), we would have the following table:

Table 2

Exposure   Cases   Sample from source population
Yes        a       Ne/10
No         c       Nu/10


Obviously, the risk of disease cannot be computed from the above table, since the sampled denominators are only a fraction of the exposed and unexposed cohorts. However, although risk can no longer be computed for exposed and unexposed, the risk ratio remains the same. If in the risk ratio calculation we replace the denominators by the 10% samples representing them, the sampling fraction cancels out and we obtain the same value for the risk ratio.


When the sample is randomly selected from the source population, the risk ratio computed using the sample equals (disregarding sampling variation) the risk ratio computed within the entire cohorts.
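The cancellation of the sampling fraction can be checked with a short calculation. The sketch below uses the notation of Table 1 (a, c, Ne, Nu). The counts are hypothetical, and the 10% sample is taken at its expected size, ignoring sampling variation.

```python
from fractions import Fraction

def risk_ratio_cohort(a, n_exposed, c, n_unexposed):
    """Risk ratio from the full cohort denominators: (a/Ne) / (c/Nu)."""
    return Fraction(a, n_exposed) / Fraction(c, n_unexposed)

def risk_ratio_case_cohort(a, sample_exposed, c, sample_unexposed):
    """Same ratio, with the sampled denominators in place of Ne and Nu."""
    return Fraction(a, sample_exposed) / Fraction(c, sample_unexposed)

# Hypothetical cohort: 100 cases among 1,000 exposed, 50 among 2,000 unexposed
a, n_exposed = 100, 1000
c, n_unexposed = 50, 2000

rr_cohort = risk_ratio_cohort(a, n_exposed, c, n_unexposed)

# Expected composition of a 10% sample of the source population
rr_sample = risk_ratio_case_cohort(a, n_exposed // 10, c, n_unexposed // 10)

print(rr_cohort, rr_sample)  # both ratios are 4
```

Exact fractions are used rather than floating point so that the two results can be compared for strict equality.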

Since we are randomly selecting controls from the source population as it was at the beginning of the study (before disease occurrence), it may happen that persons who will later become cases are selected as controls. Therefore some persons may appear in both the case and control groups. This should not come as a surprise. In a cohort study cases are counted in both the numerators and the denominators of exposed and unexposed. The same applies to case cohort studies, since we use a sample of the exposed and unexposed people of the source population. We are not concerned with the disease status of the control group but with its exposure status. The aim of the control group is to properly reflect exposure in the source population, and this source population originally includes people who will later become cases. Excluding future cases would lead to overestimating the risk ratio, particularly when disease occurrence is high.
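The effect of excluding future cases from the control group can be illustrated numerically. In the sketch below all counts are hypothetical, and the controls are taken at their expected exposure split (the sampling fraction cancels, so the full denominators are used directly). Restricting controls to people who never become cases yields the odds ratio rather than the risk ratio.

```python
from fractions import Fraction

def ratio_with_controls(a, c, controls_exposed, controls_unexposed):
    """(a / exposed controls) / (c / unexposed controls)."""
    return Fraction(a, controls_exposed) / Fraction(c, controls_unexposed)

def compare(a, n_exposed, c, n_unexposed):
    # Controls drawn from the whole baseline population (future cases kept):
    # the expected exposure split is Ne : Nu, which gives the risk ratio.
    kept = ratio_with_controls(a, c, n_exposed, n_unexposed)
    # Controls restricted to people who never become cases:
    # the expected exposure split is (Ne - a) : (Nu - c).
    excluded = ratio_with_controls(a, c, n_exposed - a, n_unexposed - c)
    return kept, excluded

# Hypothetical cohorts with the same risk ratio but different incidence
low_kept, low_excluded = compare(a=100, n_exposed=1000, c=50, n_unexposed=2000)
high_kept, high_excluded = compare(a=400, n_exposed=1000, c=200, n_unexposed=2000)

print(float(low_kept), float(low_excluded))    # lower incidence
print(float(high_kept), float(high_excluded))  # higher incidence
```

In both scenarios the risk ratio is 4. Excluding future cases gives about 4.3 at the lower incidence and 6 at the higher incidence, so the overestimation grows with disease occurrence.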

When to conduct a case cohort study?

Case cohort studies are not very popular. Their concept is not well understood, to the point that some journals would reject a case cohort study on the grounds that the control group includes cases. Case cohort studies are a very suitable design when disease incidence is high. They provide a direct estimate of the risk ratio. They are not suited to situations where exposure changes over time (that is, if exposure as measured at the beginning of the follow-up period differs from the overall exposure experience during the entire study period).


[1] Rothman KJ. Epidemiology: An Introduction. Oxford University Press, New York, 2002.