Case-cohort studies

In case-cohort studies, we aim to achieve the same goal as in cohort studies, but more efficiently, using a sample of the denominators of the exposed and unexposed cohorts [1]. Properly conducted case-cohort studies provide information that should mirror what could have been learned from a cohort study.

We will call "source population" the population which gives rise to cases. The source population includes exposed and unexposed cohorts and in that source population we could have conducted a cohort study comparing risk or rates of disease between exposed and unexposed cohorts.

If, instead, we decide to do a case-cohort study, we will include the same cases and classify them as exposed or unexposed. In other words, we start by choosing the cases, which is a case-control study characteristic. Instead of getting exposure information from all individuals constituting the denominators of exposed and unexposed cohorts, which would have been a cohort study characteristic, we only use a sample of them. The purpose of this sample is to estimate the relative size of exposed and unexposed components of the source population (the proportion of exposed in the source population at the beginning of the cohort).

To do so, we select a random sample from the entire source population. If that sample is unbiased (sampling is done independently of exposure status), we expect, disregarding sampling variation, the distribution of exposed and unexposed persons in the sample to reflect the exposure distribution in the source population at the beginning of the cohort. This is an important aspect of case-cohort studies: the sample should be representative, with regard to exposure, of the population giving rise to cases (the source population).
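The sampling step can be illustrated with a minimal sketch. The population size, the 30% exposure prevalence, the 10% sampling fraction and the random seed below are all made-up values chosen for illustration; they do not come from any real study.

```python
import random


def exposed_proportion(exposure_flags):
    """Share of exposed persons (True) in a list of exposure flags."""
    return sum(exposure_flags) / len(exposure_flags)


# Hypothetical source population at the beginning of follow-up:
# 3,000 exposed (True) and 7,000 unexposed (False) persons.
source_population = [True] * 3000 + [False] * 7000

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Simple random sample of 10%, drawn without regard to exposure status.
sample = rng.sample(source_population, k=1000)

p_source = exposed_proportion(source_population)
p_sample = exposed_proportion(sample)

print(f"Exposed in source population: {p_source:.3f}")
print(f"Exposed in 10% sample:        {p_sample:.3f}")
```

The two proportions should be close but not identical; the difference is the sampling variation that the text disregards.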

One way to picture case-cohort studies is therefore to think of them as nested within cohorts of exposed and unexposed people. Any case-cohort study can be thought of as nested within the source population. The sample group (control group) is a sample of the denominator present at the beginning of the cohort.

From a cohort study measuring risk of disease in exposed and unexposed cohorts we can draw the following results table:

Table 1

Exposure   Cases   Population at risk   Incidence proportion (IP)   Risk ratio
Yes        a       Ne                   a/Ne                        (a/Ne) / (c/Nu)
No         c       Nu                   c/Nu

If, instead of studying the entire denominators of the exposed and unexposed cohorts, we sampled them (say, a 10% random sample), we would expect the following table:

Table 2

Exposure   Cases   Sample from source population
Yes        a       Ne/10
No         c       Nu/10

The risk of disease cannot be computed from the above table, since the sampled denominators represent only a fraction (here 10%) of the exposed and unexposed cohorts. However, although the risk can no longer be computed for the exposed and the unexposed, the risk ratio remains the same: if we replace the denominators in the risk ratio calculation by the 10% samples representing them, the sampling fraction cancels out and we obtain the same value, (a / (Ne/10)) / (c / (Nu/10)) = (a/Ne) / (c/Nu).


When the sample is randomly selected from the source population, the risk ratio computed using the sample is expected to equal (apart from sampling variation) the risk ratio computed within the entire cohorts.
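The arithmetic behind this statement can be checked with a short sketch. The counts used here (60 and 40 cases, cohorts of 1,000 and 2,000 persons) are hypothetical, and the sampled denominators are set to their expected values, so sampling variation is ignored.

```python
import math


def risk_ratio(cases_exposed, denom_exposed, cases_unexposed, denom_unexposed):
    """Ratio of (cases / denominator) in the exposed to that in the unexposed."""
    return (cases_exposed / denom_exposed) / (cases_unexposed / denom_unexposed)


# Hypothetical cohort (illustrative numbers only).
a, c = 60, 40          # cases among exposed and unexposed
Ne, Nu = 1000, 2000    # persons at risk at the beginning of the cohort

# Full cohort analysis: true denominators.
rr_cohort = risk_ratio(a, Ne, c, Nu)

# Case-cohort analysis: same cases, denominators replaced by the expected
# size of a 10% random sample of each cohort.
rr_case_cohort = risk_ratio(a, Ne / 10, c, Nu / 10)

# The two ratios agree up to floating-point rounding.
ratios_agree = math.isclose(rr_cohort, rr_case_cohort)

print(f"Risk ratio, full cohort:  {rr_cohort:.2f}")
print(f"Risk ratio, 10% sample:   {rr_case_cohort:.2f}")
print(f"Same value: {ratios_agree}")
```

Note that a / (Ne / 10) is not a risk, since it is ten times too large; only the ratio between the exposed and unexposed groups is interpretable.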

Since we randomly select controls from the source population as it was at the beginning of the study (before disease occurrence), persons who later become cases may be selected as controls. Some persons may therefore appear in both the case and control groups. This should not come as a surprise. In a cohort study, cases are counted in both the numerators and the denominators of the exposed and unexposed cohorts. The same applies to case-cohort studies, since we use a sample of the exposed and unexposed people in the source population. We are concerned not with the disease status of the control group but with its exposure status. The aim of the control group is to properly reflect exposure in the source population, and this source population originally includes people who will later become cases. Excluding future cases from the control group would lead to overestimating the risk ratio, particularly when disease occurrence is high.
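The effect of excluding future cases can be sketched with expected values. All numbers below are hypothetical, the function name is introduced only for this illustration, sampling variation is ignored, and the sketch assumes an exposure that increases risk (true risk ratio of 2 in both scenarios).

```python
import math


def ratio_from_sampled_denominators(a, c, Ne, Nu, fraction=0.10,
                                    exclude_future_cases=False):
    """Expected ratio when denominators are replaced by a random sample.

    a, c   : cases among exposed and unexposed
    Ne, Nu : persons at risk at the beginning of the cohort
    If exclude_future_cases is True, persons who later become cases are
    removed from the sampled denominators (the practice warned against).
    """
    if exclude_future_cases:
        exposed_controls = fraction * (Ne - a)
        unexposed_controls = fraction * (Nu - c)
    else:
        exposed_controls = fraction * Ne
        unexposed_controls = fraction * Nu
    return (a / exposed_controls) / (c / unexposed_controls)


# Low incidence: risks of 2% and 1%.
low_keep = ratio_from_sampled_denominators(20, 10, 1000, 1000)
low_excl = ratio_from_sampled_denominators(20, 10, 1000, 1000,
                                           exclude_future_cases=True)

# High incidence: risks of 60% and 30%.
high_keep = ratio_from_sampled_denominators(600, 300, 1000, 1000)
high_excl = ratio_from_sampled_denominators(600, 300, 1000, 1000,
                                            exclude_future_cases=True)

print(f"Low incidence:  cases kept {low_keep:.2f}, cases excluded {low_excl:.2f}")
print(f"High incidence: cases kept {high_keep:.2f}, cases excluded {high_excl:.2f}")
```

When future cases are kept in the control group, the expected estimate is the true risk ratio in both scenarios. When they are excluded, the estimate is slightly too high at low incidence and markedly too high at high incidence.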

When to conduct a case-cohort study?

Case-cohort studies are not very popular. The concept is not well understood, to the point that some journals would reject a case-cohort study on the grounds that the control group includes cases. Case-cohort studies are a very suitable design when disease incidence is high, and they provide a direct estimate of the risk ratio. They are not suited to situations where exposure changes over time (that is, when exposure measured at the beginning of the follow-up period differs from the overall exposure experience during the entire study period).

NB. Case-cohort studies are a type of case-control study in which controls are simply representative of the source population in terms of exposure (as controls should always be). In the literature, you may find "case-cohort studies" referred to as "case-control studies" and "traditional case-control studies" referred to as "case-non-case studies", since in the latter the controls are actually non-cases.


1. Rothman KJ. Epidemiology: An Introduction. New York: Oxford University Press; 2002.

2. Le Polain de Waroux O, Maguire H, Moren A. The case-cohort design in outbreak investigations. Euro Surveill. 2012;17(25):pii=20202. Available online: