All epidemiological studies, even randomised clinical trials, are susceptible to bias (systematic error). The objective of the epidemiologist is to minimise these biases. This can be done by considering, at each stage of the development and execution of a study, where and how bias may occur: the design stage (protocol writing), subject selection (case/control, exposed/unexposed, intervention/control group etc), data collection, data analysis and interpretation of results.

At the design stage, bias should be considered while the protocol is being written. Considerable care should be taken at this stage to anticipate all potential selection and information biases that may be encountered. Despite all precautions, some biases will persist; they then need to be taken into account when interpreting the results of the study.

When writing the report or manuscript, potential sources of bias in the study must be discussed openly. In particular, the first part of the discussion section of a scientific paper should include a detailed paragraph in which the authors consider all potential biases that could have distorted the study results. If possible, the direction of each bias (overestimation or underestimation) and its magnitude should also be discussed.

While case-control and cohort studies are both susceptible to bias, the case-control study is susceptible to more sources of bias. Through careful study design, we can try to minimise selection bias and prevent information bias in both cohort and case-control studies.

How to minimise selection bias

In epidemiological studies, all efforts should be made to avoid biasing the selection of study participants. Selection bias can be reduced by paying attention to the following:

  1. The study population should be clearly identified, i.e. it should have a clear definition.
  2. The choice of the right comparison/ reference group (unexposed or controls) is crucial
    • for example, in an occupational cohort study, rather than comparing workers with the general population (which includes people who are too ill to work), ensure that all subjects in the comparison are workers, so as to avoid bias from the Healthy Worker Effect (HWE) [1]. Compare workers in a specific job with those in jobs that differ in occupational exposures or hazards, for example:
      • select an external comparison group from another workforce e.g. in a situation where all workers of an occupational cohort had some degree of exposure [2]
      • select an internal comparison group within the same workforce e.g. if some workers had exposure while others did not [2].
    • exposed and unexposed groups should be comparable in all respects other than the exposure
    • in a retrospective cohort study, the selection of exposed and unexposed groups should be done without knowing the outcome (disease status).
    • the control group should reflect the exposure of the population which gave rise to the cases
    • controls should be selected independently of the exposure status
      • for example, non-response bias happens when participation in a study is related to the exposure status
    • precise case definition and exposure definition should be used by all investigators.
  3. In an intervention study, allocate participants to the intervention and control groups by randomisation, so that each participant has an equal chance of receiving the intervention.
    • this allocation to intervention and control groups should rely on a mechanism that is not within the control of the study participant or the investigator, termed 'allocation concealment' [2], thus avoiding a situation where the investigator might be more inclined to allocate sicker patients to the intervention/ treatment arm of the study, and less ill patients to the control arm
    • whether the randomisation has been successful or not can be checked by comparing baseline factors between the intervention and control groups afterwards, and seeing if the groups are similar in all other respects apart from receiving the intervention [2].
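
The allocation principles above can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed method: it uses permuted-block randomisation (a common variant chosen here for illustration; the text only requires an equal chance of receiving the intervention), keeps the pre-generated sequence inside a hypothetical `ConcealedAllocator` class that reveals an arm only once a participant has been enrolled, and then compares the mean of one baseline variable between arms. All function names, participant IDs and ages are hypothetical.

```python
import random
import statistics


def permuted_block_sequence(n, block_size=4, seed=None):
    """Allocation list in which each block of `block_size` participants holds
    equal numbers of 'intervention' and 'control' in random order."""
    if block_size <= 0 or block_size % 2:
        raise ValueError("block_size must be a positive even number")
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n:
        block = ["intervention", "control"] * (block_size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n]


class ConcealedAllocator:
    """Hypothetical helper: holds a sequence generated in advance (e.g. by
    someone outside the recruiting team) and reveals each arm only when a
    participant is enrolled, so recruiters cannot foresee the next allocation."""

    def __init__(self, sequence):
        self._sequence = list(sequence)
        self._next = 0
        self.assigned = {}

    def enrol(self, participant_id):
        if participant_id in self.assigned:
            raise ValueError(f"{participant_id} is already enrolled")
        if self._next >= len(self._sequence):
            raise RuntimeError("allocation sequence exhausted")
        arm = self._sequence[self._next]
        self._next += 1
        self.assigned[participant_id] = arm
        return arm


def baseline_means(baseline, assigned):
    """Mean of a baseline variable (e.g. age) in each arm, to check whether
    randomisation produced comparable groups."""
    by_arm = {}
    for participant_id, arm in assigned.items():
        by_arm.setdefault(arm, []).append(baseline[participant_id])
    return {arm: statistics.mean(values) for arm, values in by_arm.items()}


if __name__ == "__main__":
    # Hypothetical participants and their baseline ages.
    ages = {"p1": 54, "p2": 61, "p3": 47, "p4": 58,
            "p5": 66, "p6": 50, "p7": 59, "p8": 45}
    allocator = ConcealedAllocator(permuted_block_sequence(len(ages), seed=2024))
    for pid in ages:
        allocator.enrol(pid)
    print(baseline_means(ages, allocator.assigned))
```

In a real trial the baseline comparison would cover all important prognostic factors, not just one; with small numbers, some imbalance can arise by chance even when randomisation was carried out correctly.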

Preventing non-response bias

Non-response bias can be prevented by achieving high response rates (≥80% by convention) [3]. High response rates may be facilitated by:

  • offering incentives to participate in the study, e.g. entry into a raffle for a prize
  • making it easy to contribute e.g. by using questionnaires that are not too long and don't take too much time to complete (see the chapter on Questionnaire Design for further hints on creating a well-designed questionnaire)
  • setting aside protected time for the study e.g. in a school-based questionnaire study, asking teachers to allow pupils to complete the questionnaire during a class period rather than giving them the questionnaire to take home with them
  • sending reminders e.g. a first reminder by post at 1 week and a second reminder at 2 weeks after the initial questionnaire.
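
The reminder schedule and the conventional 80% target can be tracked with a short sketch. It assumes a postal survey with reminders at 7 and 14 days, as in the example above; the function names, mailing date and response counts are hypothetical.

```python
from datetime import date, timedelta


def reminder_dates(sent_on, offsets_days=(7, 14)):
    """Dates for reminders sent a fixed number of days after the initial mailing."""
    return [sent_on + timedelta(days=d) for d in offsets_days]


def cumulative_response_rates(responses_per_wave, n_invited):
    """Cumulative response rate after the initial mailing and after each reminder."""
    if n_invited <= 0:
        raise ValueError("n_invited must be positive")
    rates, total = [], 0
    for responses in responses_per_wave:
        total += responses
        if total > n_invited:
            raise ValueError("more responses than people invited")
        rates.append(total / n_invited)
    return rates


def meets_threshold(rate, threshold=0.80):
    """Compare a response rate with the conventional 80% target."""
    return rate >= threshold


if __name__ == "__main__":
    # Hypothetical survey: 500 people invited; responses received after the
    # initial mailing, after the first reminder and after the second reminder.
    print(reminder_dates(date(2024, 3, 4)))
    rates = cumulative_response_rates([260, 110, 50], n_invited=500)
    for wave, rate in zip(["initial", "reminder 1", "reminder 2"], rates):
        print(f"{wave}: {rate:.0%} (meets 80%: {meets_threshold(rate)})")
```

With these hypothetical numbers the cumulative rate rises from 52% to 74% and then 84%, so the 80% target is reached only after the second reminder.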

Information on the characteristics of non-responders should be obtained if possible, e.g. by asking a subset of non-participants to complete a non-response questionnaire (NRQ), or by obtaining some demographic information on non-respondents so that they can be compared with respondents. This can give important insights into the extent of selection bias. However, obtaining this information on non-respondents is time-consuming and not always successful.

  • For example, in a case-control study by Vrijheid et al of mobile phone use and brain tumours [4], selection bias factors were estimated from the prevalence of mobile phone use reported by non-participants in the NRQ data. In this example, non-participation in the study appeared to be related to less prevalent use of mobile phones, and the investigators estimated that this could result in an underestimation of the odds ratio for 'regular mobile phone use' by about 10% [4].
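
The reasoning behind such estimates can be illustrated with a selection bias factor: the ratio of participation probabilities for exposed and unexposed cases, divided by the same ratio among controls, so that the observed odds ratio equals the true odds ratio multiplied by this factor. The sketch below uses entirely hypothetical counts and participation probabilities (not the Vrijheid et al data) and assumes that controls who are regular phone users take part slightly more often than controls who are not, which pulls the observed odds ratio below the true one.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return (a * d) / (b * c)


def selection_bias_factor(p_case_exp, p_case_unexp, p_ctrl_exp, p_ctrl_unexp):
    """Ratio of participation odds: observed OR = true OR * this factor."""
    return (p_case_exp * p_ctrl_unexp) / (p_case_unexp * p_ctrl_exp)


def observed_odds_ratio(a, b, c, d, p_case_exp, p_case_unexp, p_ctrl_exp, p_ctrl_unexp):
    """Expected OR among participants when each cell of the 2x2 table
    takes part with its own probability."""
    return odds_ratio(a * p_case_exp, b * p_case_unexp,
                      c * p_ctrl_exp, d * p_ctrl_unexp)


if __name__ == "__main__":
    # Hypothetical source population: 500 cases and 1000 controls.
    a, b, c, d = 300, 200, 400, 600
    # Hypothetical participation: cases take part equally regardless of
    # exposure; exposed controls take part slightly more often than unexposed.
    probs = dict(p_case_exp=0.90, p_case_unexp=0.90, p_ctrl_exp=0.66, p_ctrl_unexp=0.60)
    true_or = odds_ratio(a, b, c, d)
    factor = selection_bias_factor(**probs)
    obs_or = observed_odds_ratio(a, b, c, d, **probs)
    print(f"true OR {true_or:.2f}, bias factor {factor:.2f}, observed OR {obs_or:.2f}")
```

Under these assumed values the bias factor is about 0.91, so the observed odds ratio is roughly 9% below the true value of 2.25.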

Preventing information bias

Information (measurement) biases can be easier to prevent and measure than selection biases [3].

They can be prevented by:

  1. Using standard measurement instruments e.g. questionnaires, automated measuring devices (for measurement of blood pressure etc)
  2. Collecting information similarly from the groups that are compared
    • cases/ controls, exposed/ unexposed
    • several sources of information can be used to validate each other, but all sources should be used for each subject
  3. Using multiple sources of information
    • questionnaires (e.g. postal/ online/ face-to-face via interview)
      • should favour closed, precise questions and avoid open-ended questions
      • test the same hypothesis using different questions
      • field-testing/ piloting the questionnaire in order to improve and refine it
      • standardise interviewers' techniques through training (with the questionnaire) to ask questions the same way
    • direct measurements
    • registries (e.g. cancer registries)
    • case records (e.g. from GPs, hospital notes etc)
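
One reason for collecting information in the same way from all compared groups is that any measurement error in a binary exposure is then non-differential, which typically biases the odds ratio towards the null rather than creating a spurious association. The sketch below illustrates this with hypothetical counts and an assumed measurement instrument with 80% sensitivity and 90% specificity, applied identically to cases and controls; the function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return (a * d) / (b * c)


def classify(exposed, unexposed, sensitivity, specificity):
    """Expected numbers recorded as exposed / unexposed when the true counts
    are measured with the given sensitivity and specificity."""
    recorded_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    recorded_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return recorded_exposed, recorded_unexposed


def misclassified_odds_ratio(a, b, c, d, se_cases, sp_cases, se_controls, sp_controls):
    """Expected OR after exposure is measured with possibly different
    accuracy in cases and in controls."""
    a2, b2 = classify(a, b, se_cases, sp_cases)
    c2, d2 = classify(c, d, se_controls, sp_controls)
    return odds_ratio(a2, b2, c2, d2)


if __name__ == "__main__":
    a, b, c, d = 300, 200, 400, 600  # hypothetical true counts
    print(f"true OR: {odds_ratio(a, b, c, d):.2f}")
    # Same instrument, used in the same way, for cases and controls.
    nondiff = misclassified_odds_ratio(a, b, c, d, 0.8, 0.9, 0.8, 0.9)
    print(f"non-differential misclassification: {nondiff:.2f}")
```

With these assumed values the true odds ratio of 2.25 is attenuated to about 1.77. If the instrument were more accurate in one group than the other, the bias could go in either direction.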

Preventing interviewer/ observer bias

  • 'Blinding' of investigator / interviewer to the study participant's outcome/ exposure status
    • in case-control studies those who are determining the exposure status of a study participant should be unaware of whether the participant is a case or a control  
    • collecting information about exposure prior to definitive diagnosis / knowledge of outcome
      • e.g. in a nested case-control study, information on exposures is likely to have been collected at baseline, before cases were diagnosed, rather than data on exposure and outcome being recorded at the same time, thus reducing observer (and recall) bias [2]
    • in cohort studies, data on outcomes should be collected without knowledge of exposure status of a participant i.e. 'blinding' of interviewer to exposure status
  • 'Blinding' of study participant (more difficult) by not revealing the exact research question in a study [2]
  • 'Blinding' the interviewers to the study hypothesis
  • Establishing explicit, objective criteria for exposures and outcomes [3]
  • Using standard questionnaires, with good questionnaire design; the questionnaire should be valid and reliable [2]
  • Using a small number of interviewers to prevent too much variation between observers [2]
  • Training interviewers to ask questions the same way
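
Variation between observers can be checked during piloting by having two interviewers code the same set of recorded answers and measuring how well they agree. The sketch below uses Cohen's kappa, a standard chance-corrected agreement statistic that the text itself does not prescribe; the question and the codings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single identical category")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical: two interviewers code the same ten recorded answers
    # to "Did you smoke during pregnancy?"
    interviewer_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no"]
    interviewer_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]
    print(f"kappa = {cohens_kappa(interviewer_1, interviewer_2):.2f}")
```

Here the interviewers agree on 8 of 10 answers, but half of that agreement would be expected by chance alone, giving a kappa of 0.60; low values would suggest that further training or clearer coding rules are needed.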

Preventing recall bias

Approaches taken to prevent recall bias include:

  • improving timeliness of information gathering, so that the interval between the event/ illness of interest and the study (the recall period) is as short as possible, thus reducing non-differential recall bias; data on exposures should be collected as near as possible to the time of exposure
  • framing questions to aid accurate recall [1], so that inaccurate recall is limited among controls as well as among cases, thus reducing differential recall bias
  • choosing a control group whose recall is likely to resemble that of the cases, i.e. using as controls individuals with a disease considered to have a similar impact on recall to the one being studied
    • e.g. case-other disease approach: to reduce maternal recall bias in a case-control study, select as controls mothers of babies born with birth defects other than the one under study, who may recall early pregnancy exposures in a similar way to case mothers [1][5]; however, McCarthy and Giesecke suggest that this approach should be treated cautiously, because mothers' awareness of different hypotheses about the causes of different birth defects means that recall could still be differentially biased, and because exposures relevant to the control group's birth defect mean that these controls do not represent the real exposure experience of the population under study [5]
    • e.g. case-case study design: in analysing an outbreak of one Salmonella strain, we could use exposure data from a recent outbreak of another Salmonella strain as a 'control', instead of looking for controls in the present outbreak [5]. The notified cases from the previous outbreak are more representative of the background population of the diagnosed Salmonella cases in our outbreak, namely the subpopulation of people who would consult a doctor when they have gastroenteritis and have a specimen taken; these people may recall exposures differently from individuals who do not do this despite having similar symptoms [5].
  • using information from medical records/ other independent sources recorded before the diagnosis/ disease outcome was known rather than information from questionnaires collected after the outcome [1], i.e. use objective records rather than relying on recall; see [6] for an example of a nested case-control study where symptoms were recorded at the time they were reported rather than being recalled retrospectively.
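
The difference between differential and non-differential recall can be illustrated with a sketch using hypothetical numbers. It assumes that there is truly no association (odds ratio 1.0), that nobody falsely reports an exposure (specificity 1.0), and that cases recall a past exposure more completely (sensitivity 0.9) than healthy controls (0.7). It then assumes a control group whose recall matches that of the cases, which is what the case-other disease approach aims for. Names and values are illustrative only.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return (a * d) / (b * c)


def reported(exposed, unexposed, sensitivity, specificity=1.0):
    """Expected numbers reporting exposure / no exposure, given how completely
    (sensitivity) and how accurately (specificity) people recall it."""
    yes = sensitivity * exposed + (1 - specificity) * unexposed
    no = (1 - sensitivity) * exposed + specificity * unexposed
    return yes, no


def reported_odds_ratio(true_counts, recall_cases, recall_controls):
    """Expected OR from recalled exposure data, with separate recall
    sensitivities for cases and controls."""
    a, b, c, d = true_counts
    a2, b2 = reported(a, b, recall_cases)
    c2, d2 = reported(c, d, recall_controls)
    return odds_ratio(a2, b2, c2, d2)


if __name__ == "__main__":
    counts = (200, 300, 400, 600)  # hypothetical: no true association
    print(f"true OR:             {odds_ratio(*counts):.2f}")
    # Cases recall exposure more completely than healthy controls.
    print(f"differential recall: {reported_odds_ratio(counts, 0.9, 0.7):.2f}")
    # Controls chosen so that their recall resembles that of the cases.
    print(f"similar recall:      {reported_odds_ratio(counts, 0.9, 0.9):.2f}")
```

With these assumed values, unequal recall produces an apparent odds ratio of about 1.45 where no association exists, whereas equal recall in both groups leaves the odds ratio at 1.0.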



1. Rothman KJ. Epidemiology: An Introduction. New York: Oxford University Press; 2002.

2. Bailey L, Vardulaki K, Langham J, Chandramohan D. Introduction to Epidemiology. Black N, Raine R, editors. London: Open University Press in collaboration with LSHTM; 2006.

3. Sackett DL. Bias in analytic research. J Chronic Dis. 1979;32(1-2):51-63.

4. Vrijheid M, Richardson L, Armstrong BK, Auvinen A, Berg G, Carroll M, et al. Quantifying the impact of selection bias caused by nonparticipation in a case-control study of mobile phone use. Ann Epidemiol. 2009 Jan;19(1):33-41.

5. McCarthy N, Giesecke J. Case-case comparisons to study causation of common infectious diseases. Int J Epidemiol. 1999 Aug;28(4):764-8.

6. Black C, Kaye J, Jick H. Relation of childhood gastrointestinal disorders to autism: nested case-control study using data from the UK General Practice Research Database. BMJ. 2002 Aug 24;325(7361):419-21.