Information (or measurement) bias refers to a systematic error in the measurement or classification of participants in a study. It occurs when the accuracy of information collected about or from study participants is not equal between cases and controls (i.e. differences in accuracy of exposure data), or between exposed and unexposed (i.e. differences in accuracy of outcome data). Lack of accuracy could mean that study subjects are assigned to the wrong category of exposure (exposed/unexposed) or outcome (case/control), or both. Every effort should be made to minimise or prevent information bias.
The term "misclassification" is frequently used to describe this bias. Cases and controls can be misclassified, and so can exposed and unexposed participants, e.g. a heavy smoker who is categorised as a light smoker is misclassified. Misclassification results in an incorrect estimate of the association between exposure and outcome; the size and direction of the error depend on the type of misclassification of exposure or outcome. The mechanism of misclassification can be differential or non-differential.
Differential misclassification occurs when one group of study participants is more likely to be misclassified than the other. Misclassification of exposure is differential if it differs according to a person's disease status, e.g. if, in a case-control study, cases are more or less likely than controls to be classified as exposed. Misclassification of outcome (disease) is differential if it differs between exposed and unexposed, e.g. if, in a cohort study, a person's exposure status makes them more or less likely to be classified as having the disease. Differential misclassification can either increase or decrease the measured effect.
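The two directions of bias can be illustrated with a small numerical sketch. All counts and the sensitivity values (the proportion of truly exposed people who are recorded as exposed) are hypothetical and chosen only for illustration; specificity is assumed to be perfect to keep the arithmetic simple.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)


def observed_counts(exposed, unexposed, sensitivity):
    """Expected (exposed, unexposed) counts as recorded by the study,
    when only a fraction `sensitivity` of truly exposed people are
    recorded as exposed (specificity assumed to be 100%)."""
    seen_exposed = exposed * sensitivity
    return seen_exposed, (exposed + unexposed) - seen_exposed


# Hypothetical true counts (not from any real study).
cases = (60, 40)      # truly exposed, truly unexposed
controls = (30, 70)
true_or = odds_ratio(60, 30, 40, 70)

# Scenario 1: exposure is captured more completely among cases than
# among controls (e.g. cases are probed harder or recall better).
a1, c1 = observed_counts(*cases, sensitivity=0.95)
b1, d1 = observed_counts(*controls, sensitivity=0.70)
or_cases_better = odds_ratio(a1, b1, c1, d1)

# Scenario 2: exposure is captured less completely among cases
# (e.g. exposure is denied on behalf of cases).
a2, c2 = observed_counts(*cases, sensitivity=0.70)
b2, d2 = observed_counts(*controls, sensitivity=0.95)
or_cases_worse = odds_ratio(a2, b2, c2, d2)

print(f"true OR:                    {true_or:.2f}")
print(f"cases reported more fully:  {or_cases_better:.2f}")
print(f"cases reported less fully:  {or_cases_worse:.2f}")
```

Under these assumptions the measured OR is inflated in the first scenario and deflated in the second, although the true association is the same in both.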
Non-differential (random) misclassification occurs when there is an equal likelihood of both groups (cases or controls, exposed or unexposed) being misclassified. With this type of misclassification, either exposure or outcome (or both) is misclassified, but the misclassification is independent of a person's status for the other variable. Misclassification of exposure is non-differential if it is similar among cases and controls, i.e. the exposure (mis)classification is not related to the person's disease status. Misclassification of outcome (disease) is non-differential if it is equal between exposed and unexposed, i.e. the outcome (mis)classification is not related to the person's exposure status.
The consequence of non-differential misclassification of a dichotomous exposure (e.g. exposed/unexposed) is, if there is an association, a weakening or dilution of the measured association: the estimated OR or RR lies closer to the no-effect (null) value than the true value, i.e. a "bias towards the null". The dilution can be large enough that a statistically significant difference becomes non-significant. If there is no association to begin with, random misclassification of the exposure will not bias the estimate of the measure of association, and it will not create a spurious association that makes a factor seem significant for the development of disease.
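A minimal numerical sketch of this bias towards the null follows. The counts, sensitivity and specificity are hypothetical values chosen for illustration; the same sensitivity and specificity are applied to cases and controls, which is what makes the misclassification non-differential.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)


def misclassify_exposure(exposed, unexposed, sensitivity, specificity):
    """Expected (exposed, unexposed) counts after imperfect
    classification of a dichotomous exposure."""
    seen_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    seen_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return seen_exposed, seen_unexposed


SENS, SPEC = 0.8, 0.9  # hypothetical, identical for cases and controls

# Hypothetical table with a true association.
true_or = odds_ratio(60, 30, 40, 70)
a, c = misclassify_exposure(60, 40, SENS, SPEC)   # cases
b, d = misclassify_exposure(30, 70, SENS, SPEC)   # controls
observed_or = odds_ratio(a, b, c, d)

# Hypothetical table with no association (true OR = 1).
null_a, null_c = misclassify_exposure(50, 50, SENS, SPEC)
null_b, null_d = misclassify_exposure(50, 50, SENS, SPEC)
observed_null_or = odds_ratio(null_a, null_b, null_c, null_d)

print(f"true OR {true_or:.2f} -> observed OR {observed_or:.2f}")
print(f"true OR 1.00 -> observed OR {observed_null_or:.2f}")
```

With these assumed values the observed OR falls from 3.5 to about 2.4, still above 1 but closer to it, while the table with no true association keeps an OR of 1.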
According to Rothman, if the exposure is not dichotomous, there may be bias away from or towards the null value; it depends on the categories to which individuals are misclassified. However, in general, random misclassification between two exposure categories will make the estimates of measures of association for those categories converge towards one another.
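The convergence of estimates can be sketched with three exposure levels (none, low, high). The counts and the 20% swap rate between the low and high categories are hypothetical, and the same swap rate is applied to cases and controls so that the misclassification is non-differential.

```python
def odds_ratio(exposed_cases, exposed_controls, ref_cases, ref_controls):
    """Odds ratio for one exposure category against the reference
    (unexposed) category."""
    return (exposed_cases * ref_controls) / (exposed_controls * ref_cases)


def swap_low_high(low, high, swap_rate):
    """Expected (low, high) counts when a fraction `swap_rate` of each
    category is recorded in the other category."""
    seen_low = low * (1 - swap_rate) + high * swap_rate
    seen_high = high * (1 - swap_rate) + low * swap_rate
    return seen_low, seen_high


# Hypothetical true counts per exposure level (none, low, high).
cases = {"none": 100, "low": 100, "high": 100}
controls = {"none": 200, "low": 100, "high": 50}

true_or_low = odds_ratio(cases["low"], controls["low"],
                         cases["none"], controls["none"])
true_or_high = odds_ratio(cases["high"], controls["high"],
                          cases["none"], controls["none"])

SWAP = 0.2  # hypothetical, identical for cases and controls
case_low, case_high = swap_low_high(cases["low"], cases["high"], SWAP)
ctrl_low, ctrl_high = swap_low_high(controls["low"], controls["high"], SWAP)

obs_or_low = odds_ratio(case_low, ctrl_low, cases["none"], controls["none"])
obs_or_high = odds_ratio(case_high, ctrl_high, cases["none"], controls["none"])

print(f"low  vs none: true {true_or_low:.2f}, observed {obs_or_low:.2f}")
print(f"high vs none: true {true_or_high:.2f}, observed {obs_or_high:.2f}")
```

In this sketch the estimate for the high category moves towards the null while the estimate for the low category moves away from it, so the two estimates end up closer to each other than the true values are.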
Misclassifications might be introduced by the observer (interviewer bias, biased follow-up), by the study participants (recall bias, prevarication), or by measurement tools such as questionnaires or instruments such as weighing scales or blood pressure cuffs.
Observer bias occurs when data gathering is influenced by knowledge of the exposure or outcome/disease status of the subject, or by the hypothesis under study.
Interviewer bias happens when interviewers ask questions about exposure differently to cases and controls in a case-control study, or ask questions about outcome differently to exposed and unexposed in a cohort study. Knowledge of the patient's disease/outcome status may influence both the intensity and outcome of a search for exposure to the putative cause (Sackett described this as 'exposure suspicion bias').
Example: in an EU-wide foodborne outbreak of listeriosis, British investigators in a case-control study may probe listeriosis cases about consumption of a suspected food item (French non-pasteurised milk soft cheese) more than controls. This can lead to an overestimation of 'a' (the number of exposed cases in the 2x2 table), falsely increasing the odds ratio (OR).
Interviewer bias may also happen when different interviewing techniques are used for cases and controls, e.g. self-administered questionnaires (postal, email or web-based), interviewer-administered questionnaires (by phone or face-to-face), or proxy interviews. Different approaches can be taken to prevent interviewer bias.
In biased follow-up, a type of differential misclassification, unexposed people are less likely to be diagnosed with the disease than exposed people.
Example: in a study looking at risk factors for mesothelioma, which can be difficult to diagnose histologically, a histopathologist may be more likely to report a biopsy specimen as mesothelioma if a history of asbestos exposure is reported. The diagnosis of mesothelioma might be less likely to be reported among those without a history of asbestos exposure, leading to a differential misclassification of disease.
Recall bias is a systematic error that occurs due to differences in the accuracy or completeness of recall of past events or exposures (e.g. between cases and controls) that are not independent of outcome/disease (or exposure) status, e.g. a person may be more likely to recall an exposure to a potential risk factor if they become ill (become a case). It is a differential misclassification because the information on exposure is misclassified differentially for those with and without disease. It has also been described as response bias, responder bias, or reporting bias.
Example: in a case-control study to identify the vehicle of a foodborne outbreak of Salmonella, study participants are interviewed to obtain exposure information after (Salmonella) disease has already occurred. Cases may be more likely to remember exactly what they ate than controls, since they may already have suspected a particular food (e.g. eggs), and/or thought about the possible dishes that could be responsible. This would result in an increase in the measured OR for the suspected food item.
Example: in a case-control study of babies born with birth defects or malformations, mothers who have given birth to a baby with a malformation may be more likely to recall accurately many exposures or events during early pregnancy, e.g. taking non-prescription drugs, experiencing trauma, or having a febrile rash. The adverse pregnancy outcome serves as a stimulus for the mother to remember and consider potential exposures, a stimulus that mothers who give birth to babies without malformations do not have. This particular type of recall bias has been described as maternal recall bias.
Example: case-control studies on self-reported sun exposure as a risk factor for melanoma have been described as having the potential for recall bias, as there is considerable public awareness of the relationship between melanoma and ultraviolet radiation.
Note: as described by Rothman, this type of recall bias (a differential misclassification) is distinct from the general problem, which to some extent affects all people, of remembering and reporting exposures accurately, which tends to result in a non-differential misclassification. Different approaches can be taken to prevent recall bias and to reduce maternal recall bias.
Prevarication occurs when some subjects deliberately lie when responding to the interviewer. Depending on how the subjects respond, this could increase or decrease the measure of effect.
Example: in a case-control study looking at risk factors for death among elderly people during a heatwave, interviewed relatives may deny all behaviour which would suggest isolation or abandonment of their elderly relatives. As a result, 'isolation' as a risk factor for heatwave-related death may be under-reported by relatives of elderly people who have died. Underestimation of 'a' (the number of exposed cases in the 2x2 table) will result in an underestimation of the measure of effect, in this case the odds ratio (OR).
1. Bailey L, Vardulaki K, Langham J, Chandramohan D. Introduction to Epidemiology. Black N, Raine R, editors. London: Open University Press in collaboration with LSHTM; 2006.
2. Rothman KJ. Epidemiology - An Introduction. New York: Oxford University Press; 2002.
3. Giesecke J. Modern Infectious Disease Epidemiology. 2nd ed. London: Arnold; 2002.
4. Rothman KJ, Greenland S, Lash TL, editors. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008.
5. Sackett DL. Bias in analytic research. J Chronic Dis. 1979;32(1-2):51-63.
6. Parr CL, Hjartåker A, Laake P, Lund E, Veierød MB. Recall bias in melanoma risk factors and measurement error effects: a nested case-control study within the Norwegian Women and Cancer Study. Am J Epidemiol. 2009 Feb 1;169(3):257-66. Epub 2008 Nov 14.
7. Gefeller O. Invited commentary: Recall bias in melanoma - much ado about almost nothing? Am J Epidemiol. 2009 Feb 1;169(3):267-70; discussion 271-2. Epub 2008 Nov 14.