Field Epidemiology Manual Wiki

Selection bias and cohort studies

Last modified at 10/27/2014 11:21 AM by Vladimir Prikazsky

Selection bias may occur in cohort studies if the exposed and unexposed groups are not truly comparable [1], e.g. comparing an occupational cohort with the general population.

Selection biases in cohort studies include: healthy worker effect, diagnostic bias, non-response bias and loss to follow-up.

Healthy worker effect

The healthy worker effect (HWE) is a selection bias that leads to underestimation of the mortality/morbidity related to occupational exposures [2]. It reflects the healthier status of the workforce compared with the general population (which includes people who are too sick to work), so that a direct comparison of the workforce with the general population will be biased. It is a problem for those who study occupational cohorts.

The healthy worker effect often leads, paradoxically, to lower mortality/morbidity rates being observed in subjects exposed to workplace toxins than in the general population. Any excess risk associated with an occupation will therefore tend to be underestimated by a comparison with the general population [1], leading to an underestimate of the relative risk (RR) for occupational exposure and disease.

The following table illustrates the incidence rate of disease X in an exposed group of workers compared with the incidence rate in the general population (see the 'Total' row in the table).

                        Person-time (years)   Cases   Cases / 100 person-years
Exposed workers                      50,000     500   1.0
General population
  Total                             500,000   7,000   1.4
  Workers                           450,000   4,500   1.0
  Non-workers                        50,000   2,500   5.0

In this hypothetical example, the incidence rate observed among exposed workers is 1.0 case per 100 person-years, compared with 1.4 cases per 100 person-years in the general population, suggesting that exposed workers have a lower rate of illness than the general population. The general population, however, is composed of two groups: people who are healthy enough to work (workers) and non-workers, many of whom cannot work because of ill health. The people who are too sick to work are counted among the non-workers in the table, which gives non-workers a higher incidence than the rest of the general population, i.e. current workers [2].

In the above example, the incidence rate among workers in the general population is the same as that among exposed workers at our study site. However, because non-workers in the general population have a rate five times that of workers, the overall rate in the general population is higher than that of exposed workers.

As a consequence, any study comparing rates of disease X between exposed workers and the general population would give a biased estimate (with the exposed workers having a substantially lower rate of disease X than the general population), due to the 'healthy worker effect' selection bias.
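The arithmetic behind this example can be checked with a short sketch. It uses only the person-time and case counts from the table above; the function and variable names are illustrative assumptions, not part of the original manual.

```python
# Incidence rates per 100 person-years, using the figures from the table above.
# Function and variable names are hypothetical, chosen for illustration only.

def rate_per_100(cases, person_years):
    """Incidence rate expressed as cases per 100 person-years."""
    return 100.0 * cases / person_years

exposed = rate_per_100(500, 50_000)                # exposed workers
general_total = rate_per_100(7_000, 500_000)       # whole general population
general_workers = rate_per_100(4_500, 450_000)     # workers in the general population
general_non_workers = rate_per_100(2_500, 50_000)  # non-workers in the general population

# Rate ratio using the whole general population as the comparison group
rr_vs_total = exposed / general_total
# Rate ratio using only workers in the general population as the comparison group
rr_vs_workers = exposed / general_workers

print(f"RR vs general population (total): {rr_vs_total:.2f}")
print(f"RR vs workers only:               {rr_vs_workers:.2f}")
```

Comparing against the whole general population gives a rate ratio of about 0.71, which would wrongly suggest a protective effect, whereas comparing against workers only gives 1.0.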

Two components of HWE bias have been suggested [3]:

  1. healthy worker hire effect: the selection of healthier workers at hire, either due to self-selection (e.g. perceived health status) or employer selection (e.g. healthier subjects at lower risk of disease being employed preferentially)
  2. healthy worker survivor effect: once hired, less healthy workers are more likely than healthy co-workers to leave high-exposure jobs, either by ending employment or by being transferred out. While this selection away from exposed jobs may reduce the impact of exposure on a given worker (protecting that person's health), it may lead to the false (biased) conclusion that the higher-exposure jobs are safe.

Factors that determine the size of the HWE bias [3][4] have been identified for mortality studies (some of which may also affect this bias in morbidity studies), and include:

  1. sociodemographic factors: gender, age at hire, ethnic group, community unemployment rate
  2. employment factors: occupational class, length of employment, time since hire/length of follow-up, time since termination
  3. outcome factors: cause of death

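Efforts should be made to avoid bias from the HWE, for example by comparing exposed workers with other workers rather than with the general population, as the 'Workers' row in the table above illustrates.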

Diagnostic bias

Diagnostic bias can also occur in cohort studies if the diagnosis depends on the knowledge of the exposure status.

Example: the identification of mesothelioma is based on a difficult histological diagnosis. In a cohort study of risk factors for mesothelioma, histopathologists may therefore be more likely to diagnose a biopsy as mesothelioma if a history of asbestos exposure is reported.

Non-response bias

In a cohort study, non-response matters only if it is associated with both the exposure and the outcome/disease (see also non-response bias in case-control studies). Efforts should be made to prevent non-response bias.

Example: the table below illustrates the results of a hypothetical cohort study where the following scenarios occur:

  1. all exposed and unexposed participate in the study (i.e. no non-response)
  2. non-response is associated with outcome (being a case)
  3. non-response is associated with exposure
  4. non-response is associated with both exposure and outcome (being an exposed case)

All respond
            Total    Cases   Non-cases   Rate / 1000   Rate ratio (RR)
Exposed     10,000   100     9,900       10            10
Unexposed   10,000   10      9,990       1             reference

Non-response among cases (only 10% respond)
            Total    Cases   Non-cases   Rate / 1000   Rate ratio (RR)
Exposed     9,910    10      9,900       1             10
Unexposed   9,991    1       9,990       0.1           reference

Non-response among exposed (only 10% respond)
            Total    Cases   Non-cases   Rate / 1000   Rate ratio (RR)
Exposed     1,000    10      990         10            10
Unexposed   10,000   10      9,990       1             reference

Non-response among exposed cases (only 10% respond)
            Total    Cases   Non-cases   Rate / 1000   Rate ratio (RR)
Exposed     9,910    10      9,900       1             1
Unexposed   10,000   10      9,990       1             reference
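
The four scenarios can be reproduced with a minimal sketch that applies a response proportion to each cell of the full cohort and recomputes the rate ratio. The cell counts come from the 'All respond' table; the dictionary keys and function names are assumptions made for illustration. The tables show rounded values, so the unrounded results differ slightly (for example, about 10.1 rather than 10 in the second scenario).

```python
# Full cohort, taken from the "All respond" table.
FULL = {
    "exp_cases": 100,
    "exp_noncases": 9_900,
    "unexp_cases": 10,
    "unexp_noncases": 9_990,
}

def rate_ratio(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Rate in the exposed divided by rate in the unexposed."""
    rate_exp = exp_cases / (exp_cases + exp_noncases)
    rate_unexp = unexp_cases / (unexp_cases + unexp_noncases)
    return rate_exp / rate_unexp

def observed_rr(response):
    """response maps a cell name to the proportion responding (default 1.0)."""
    cells = {name: n * response.get(name, 1.0) for name, n in FULL.items()}
    return rate_ratio(**cells)

# Hypothetical scenario labels mirroring the four tables above.
scenarios = {
    "all respond": {},
    "non-response among cases": {"exp_cases": 0.1, "unexp_cases": 0.1},
    "non-response among exposed": {"exp_cases": 0.1, "exp_noncases": 0.1},
    "non-response among exposed cases": {"exp_cases": 0.1},
}

results = {name: observed_rr(resp) for name, resp in scenarios.items()}
for name, rr in results.items():
    print(f"{name}: RR = {rr:.2f}")
```

Only the last scenario, where non-response depends on both exposure and outcome, moves the rate ratio materially away from the value of 10 seen when everyone responds.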

Loss to follow-up

This bias reflects differences in completeness of follow-up between the comparison (exposure) groups, i.e. exposed and unexposed. It is a problem for cohort studies because the length of time a cohort needs to be followed up can make it difficult to follow all subjects until the end of the study, e.g. due to people moving or losing contact. If subjects are lost at random in both exposure groups, this does not create loss to follow-up bias [5]; the study simply has a smaller population on which to base the RR calculation, and wider confidence intervals [5].

Loss to follow-up bias occurs if loss to follow-up is associated with both exposure and outcome, e.g. if exposed cases are more likely to be lost. It behaves similarly to non-response bias in cohort studies. Differences in loss to follow-up between exposure groups can lead to bias because the people who are lost to follow-up may be more (or less) likely to have developed the outcome of interest [1].

Example: in a cohort study looking at smoking as a risk factor for development of lung cancer, loss to follow-up bias occurs if smokers who have lung cancer are more likely to be lost to follow-up (e.g. if they are more likely to die from lung cancer) than non-smokers with lung cancer.

Loss to follow-up among exposed cases (50% of smokers with lung cancer lost to follow-up)
                          Total   Cases   Non-cases   Rate / 1000   Rate ratio (RR)
Exposed (smokers)         955     45      910         47            4.7
Unexposed (non-smokers)   1,000   10      990         10            reference
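
A short sketch shows how far this loss pulls the rate ratio from its unbiased value. It assumes, as the table heading implies, that the 45 observed cases among smokers are half of the cases that actually occurred, and that the 10 cases among 1,000 non-smokers leave 990 non-cases; variable names are illustrative.

```python
def rate_per_1000(cases, noncases):
    """Cases per 1000 subjects under follow-up."""
    return 1000.0 * cases / (cases + noncases)

# Observed data after loss to follow-up (from the table above).
smoker_cases_observed = 45
smoker_noncases = 910
nonsmoker_cases = 10
nonsmoker_noncases = 990  # assumption: 1,000 non-smokers minus 10 cases

# Assumption from the table heading: 50% of smokers with lung cancer were lost,
# so the complete cohort would have contained twice as many exposed cases.
smoker_cases_full = smoker_cases_observed / 0.5

unexposed_rate = rate_per_1000(nonsmoker_cases, nonsmoker_noncases)
true_rr = rate_per_1000(smoker_cases_full, smoker_noncases) / unexposed_rate
observed_rr = rate_per_1000(smoker_cases_observed, smoker_noncases) / unexposed_rate

print(f"RR with complete follow-up: {true_rr:.1f}")
print(f"RR after loss to follow-up: {observed_rr:.1f}")
```

Under these assumptions the rate ratio with complete follow-up would be 9.0, so losing half of the exposed cases roughly halves the observed estimate, to about 4.7.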



1. Bailey L, Vardulaki K, Langham J, Chandramohan D. Introduction to Epidemiology. Black N, Raine R, editors. London: Open University Press in collaboration with LSHTM; 2006.

2. Rothman KJ. Epidemiology - An Introduction. New York: Oxford University Press; 2002.

3. Le Moual N, Kauffmann F, Eisen EA, Kennedy SM. The healthy worker effect in asthma: work may cause asthma, but asthma may also influence work. Am J Respir Crit Care Med. 2008 Jan 1; 177(1):4-10. Epub 2007 Sep 13.

4. Baillargeon J. Characteristics of the healthy worker effect. Occup Med. 2001 Apr-Jun;16(2):359-66.

5. Giesecke J. Modern Infectious Disease Epidemiology. 2nd ed. London: Arnold; 2002.