A set of training materials for professionals working in intervention epidemiology, public health microbiology and infection control and hospital hygiene.
Draft provided by: James Stuart, Alain Moren
Making comparisons is fundamental to epidemiological investigations and studies. We need to compare risks or rates of illness in exposed and unexposed groups, or odds of exposure in cases and controls. Without a comparison with a reference group, we cannot conclude from data analysis that an association with a given outcome is anything other than spurious. This reference group is called the control group in case control studies and the unexposed group in cohort studies (see Chapter X). For the field epidemiologist, difficulties arise more often in choosing controls for case control studies than in choosing an unexposed group in cohort studies. This lecture will focus mainly on the former.
It is helpful first to be clear about who the cases are, in other words,
to start with a case definition. The case definition then helps to define
the population from which the cases arise. This population is also the population
from which controls should be drawn.
The most important principle is that controls should be representative of the population from which cases arise, the source population. Cases can be defined in any way the investigator decides, but this definition is key to determining the source population of cases, and hence the source population of controls.
Controls should then have the following characteristics:
(i) be representative of the exposure distribution in the source population
(ii) have an equal chance of being identified as cases if they had the disease
(iii) have the same exclusion and restriction criteria as cases
Example case definition: resident of London aged under 10 years with a faecal isolate of E. coli O157 during June 2006.
Exclusion criterion: travel abroad in the week before onset of illness.
(i) The source population for cases is London residents aged under 10 years in June 2006. Controls should be representative of this source population with regard to the exposure of interest.
(ii) Since E. coli O157 causes severe infection in children, we would expect all children in London with this infection to have a similarly high chance of being detected as cases. However, the proportion of cases diagnosed may vary by geographical area because of differences in health-seeking behaviour, primary care sampling and diagnostic facilities. This may introduce selection bias when we come to choose controls, as it will be difficult to identify this same source population. This bias will not matter unless the proportion exposed differs between the cases identified for our study and those cases who were not diagnosed.
(iii) In this definition, cases have been excluded if they travelled abroad in the week before onset of illness. An equivalent exclusion for controls might be travel abroad in the week before interview. However, if cases mostly arise during school term and controls are interviewed in the summer holidays, some controls may be excluded unnecessarily. Another option might be to exclude those who travelled abroad in June. Or, if controls are individually matched on potential time of exposure, the travel exclusion could be restricted to the week before onset of illness in the matched case.
Example control definition: resident of London aged under 10 years during June 2006.
Exclusion criterion: travel abroad in the week before interview.
Let us now
return to the important decision about selecting as controls a sample that is
representative of the source population.
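Options for selecting controls include:
a. Population controls: selected at random from a population register/list/directory, or stratified by some characteristic such as age/sex/general practice, known as matching (Chapter Y)
b. Neighbourhood controls
c. Friend controls
d. Family controls
e. Hospital controls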
a. Population controls. As the aim is to obtain a random sample of the population that gives rise to cases, it is preferable to seek controls from a population register. A random sample of this population should be achievable if the register has a high level of completeness; contains the cases (it should be possible to check that all the cases are identified in the register); can identify the parameters of the control definition (in this example, city of residence and age); and is accessible to the investigator.
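The register checks and the random draw can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the `Person` record, its fields and the function names are hypothetical, and a real register will have its own structure and access rules. Travel abroad is not recorded on a register, so that exclusion is left to the interview.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical register entry; real registers will differ."""
    person_id: str
    age: int
    city: str


def cases_missing_from_register(register, case_ids):
    """Return case IDs that cannot be found in the register (ideally none)."""
    known_ids = {p.person_id for p in register}
    return [cid for cid in case_ids if cid not in known_ids]


def sample_population_controls(register, case_ids, n_controls,
                               exclude_cases=True, seed=None):
    """Simple random sample of people meeting the control definition
    (London resident aged under 10). Travel abroad is not on the register,
    so that exclusion must be applied at interview."""
    case_ids = set(case_ids)
    eligible = [
        p for p in register
        if p.city == "London" and p.age < 10
        and not (exclude_cases and p.person_id in case_ids)
    ]
    if n_controls > len(eligible):
        raise ValueError("register has too few eligible people")
    rng = random.Random(seed)
    return rng.sample(eligible, n_controls)


# Toy register of 200 invented people with mixed ages and cities.
register = [
    Person(f"P{i:03d}", age=i % 15, city="London" if i % 4 else "Leeds")
    for i in range(200)
]
print(cases_missing_from_register(register, ["P001", "P002", "P999"]))
controls = sample_population_controls(register, ["P001", "P002"], n_controls=6, seed=1)
print([p.person_id for p in controls])
```

Setting `exclude_cases=False` keeps cases eligible for selection, which corresponds to sampling from the whole source population rather than from non-cases only.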
If a register is not available or is not suitable, other methods of population sampling can be considered. A commonly used method is random digit dialling. This involves phoning random numbers (cold calling), a system that has the advantage of speed and convenience but has important limitations. The source population is limited to those who have a phone and who are available to answer. It may be difficult to be sure that the relevant geographical area is covered or, alternatively, the phone listings may cover such a large area that it is difficult to find controls from the (smaller) source area. This is more of a problem when the numbers used have no area code, e.g. mobile phone numbers. Co-operation from those receiving such calls may also be low.
b. Neighbourhood controls. This involves selecting controls from the same neighbourhood as the cases, i.e. they are matched for neighbourhood. One advantage is that there is no need for a population register. Also, controls are likely to be similar to cases with respect to socio-economic factors. This may be helpful if we wish to control for such complex factors but cannot measure them adequately. We cover this in more detail in the lecture on matching (Chapter Y).
The disadvantages are that co-operation may be low (selection bias), that recruitment may be time-consuming and expensive (low efficiency), and that if we wish to measure the risk associated with socio-economic factors, we may not be able to do so. In a case control study of a disease that has a socio-economic gradient, e.g. invasive meningococcal disease, picking neighbourhood controls may not show any association between illness and level of income, because people living in the same neighbourhood are likely to have the same or similar socio-economic characteristics.
c. Friend controls are another way of selecting matched controls. Where speed of investigation is of the essence, e.g. in a suspected outbreak of E. coli O157, friends offer a rapid and convenient means of finding controls. Because friends tend to share socio-economic characteristics and social behaviours with cases, they have the same advantages and disadvantages as neighbourhood controls. In investigations of outbreaks of foodborne infection, our aim is to identify a common source. Although friends may be more likely than other people to share the food habits of their corresponding case, leading to an underestimate of the strength of association, the relative risk estimates can still be very high (Killalea). A bigger problem may be reluctance on the part of the case to give the names of friends to be interviewed (Boccia).
d. Family controls are rarely used in field epidemiology, as exposures in family controls are often so similar to those of the cases that the association of interest may not be shown.
e. Hospital controls are useful if the cases have all been admitted to hospital or are on a specific disease register. Controls are easily identified and available at low cost from the same dataset that contains the cases, e.g. hospital episode statistics or a cancer register. One disadvantage may be that different diseases have different catchment populations, so that the controls are not representative of the source population for the cases. More particularly, the same causative factors can be responsible both for the disease under study and for other diseases that result in hospital admission. This will reduce the chance of showing a true association with the causative factor (biasing the OR towards 1). For example, in a study of any disease caused by smoking, hospital controls would have a high chance of including people admitted with other conditions caused by smoking.
(a) Controls in different types of case control studies: case cohort, traditional case control, density case control
Let us come back to one of the characteristics of the control population: that they should be representative of exposures in the source population. In a case cohort study, controls are a random sample of the whole source population (including people who later become cases) and should, if selected correctly, be representative of the exposure distribution in the population that gives rise to the cases. In a traditional case control study, where cases are excluded from control selection, a bias is introduced because the exposure distribution in potential controls is no longer representative of the source population. If the attack rate is low, this bias will also be low, but if the attack rate is high, the potential for bias will also be high (Chapter X). In a density case control study, where cases occur over a long time period, controls should be selected from the source population still free of disease at the time each case occurs. In this way they should be representative of the person-time experience of the source population.
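The effect of the attack rate can be illustrated with expected counts (no sampling error) in a hypothetical cohort; the cohort sizes and attack rates below are invented for illustration. Controls drawn from the whole cohort, as in a case cohort study, have the same exposure odds as the cohort, so the odds ratio reproduces the risk ratio. Controls drawn only from non-cases, as in a traditional case control study, give an odds ratio that moves away from the risk ratio as the attack rate rises.

```python
def expected_estimates(n_exposed, n_unexposed, risk_exposed, risk_unexposed):
    """Expected risk ratio and odds ratios (no sampling error) for a cohort."""
    cases_exp = n_exposed * risk_exposed
    cases_unexp = n_unexposed * risk_unexposed
    case_odds = cases_exp / cases_unexp  # exposure odds among cases

    risk_ratio = risk_exposed / risk_unexposed
    # Case cohort: controls are a random sample of everyone, cases included.
    or_case_cohort = case_odds / (n_exposed / n_unexposed)
    # Traditional case control: controls sampled from non-cases only.
    or_traditional = case_odds / ((n_exposed - cases_exp) / (n_unexposed - cases_unexp))
    return risk_ratio, or_case_cohort, or_traditional


for label, r1, r0 in [("low attack rate", 0.02, 0.01), ("high attack rate", 0.60, 0.30)]:
    rr, or_cc, or_trad = expected_estimates(1000, 1000, r1, r0)
    print(f"{label}: RR={rr:.2f}  case-cohort OR={or_cc:.2f}  traditional OR={or_trad:.2f}")
```

With these numbers the risk ratio is 2 in both scenarios. The traditional odds ratio is about 2.02 when the attack rates are 2% and 1%, but 3.5 when they are 60% and 30%.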
Does failure to identify those with mild or asymptomatic infection as cases introduce bias? This situation is analogous to non-response among cases. If exposures among symptomatic and asymptomatic cases are the same, then no bias is introduced; there is only a reduction in the power of the study. Control selection is unaffected, as controls should be representative of the source population.
In a hypothetical case control study with 40 cases and 40 controls, suppose 50% of cases (20/40) and 25% of controls (10/40) are exposed. Then Odds Ratio = (20 × 30)/(20 × 10) = 600/200 = 3.0.
If we detect only 20 cases (10 exposed, 10 unexposed) with the same number of controls, the Odds Ratio is unchanged: (10 × 30)/(10 × 10) = 300/100 = 3.0, as long as the % exposure is the same in detected and undetected cases.
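The same arithmetic as a small function, using the cross-product odds ratio of a 2×2 table. The first two calls reproduce the figures above; the third uses invented counts in which the 20 detected cases are more often exposed (75%) than all cases, to show how the estimate is biased when the condition fails.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Cross-product odds ratio for a 2x2 table: (a * d) / (b * c)."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


controls = (10, 30)  # exposed, unexposed controls (25% exposed)

print(odds_ratio(20, 20, *controls))  # all 40 cases detected: 600 / 200 = 3.0
print(odds_ratio(10, 10, *controls))  # 20 detected, same 50% exposure: 300 / 100 = 3.0
print(odds_ratio(15, 5, *controls))   # 20 detected, but 75% exposed: 450 / 50 = 9.0
```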
(c) Immune subjects
If some of the population is immune at the start of the study, then they are not eligible to be cases. They should then also be excluded as controls, as they are not part of the source population. In practice we do not usually know who is immune. Again, this may not matter if the % exposed is the same in immune and non-immune subjects. However, subjects may be immune because they have already been cases in the past, and they may have a similar level of exposure to the risk factor that caused the cases in the outbreak under study. This introduces bias that moves the OR towards 1 and may result in a failure to detect a true association, especially if the proportion immune is high. For example, the inclusion of immune subjects in the control group is thought to explain why some case control studies fail to show an association between contaminated drinking water and cryptosporidiosis (Hunter).
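A sketch of how the OR is diluted as immune subjects enter the control group. The figures are assumptions chosen for illustration: 50% of cases exposed, 25% of truly susceptible controls exposed, and immune subjects (as past cases) exposed at the same 50% as cases.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)


def or_with_immune_controls(p_cases, p_susceptible, p_immune, fraction_immune):
    """Expected OR when a fraction of the control group is immune.

    p_cases:         proportion exposed among cases
    p_susceptible:   proportion exposed among eligible (non-immune) controls
    p_immune:        proportion exposed among immune people wrongly included
    fraction_immune: share of the control group who are immune
    """
    p_controls = (1 - fraction_immune) * p_susceptible + fraction_immune * p_immune
    return odds(p_cases) / odds(p_controls)


for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{f:.0%} immune controls: OR = {or_with_immune_controls(0.5, 0.25, 0.5, f):.2f}")
```

Under these assumptions the OR falls from 3.0 with no immune controls to 1.0 when the whole control group is immune.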
(d) Power and sample size in case control studies
A question often arises about the number of controls to select when the number of cases is limited. Statistical programs such as Epi Info can be used to estimate the sample size required to detect a specified odds ratio. It is unusual to select more than 3 or 4 controls per case, as little statistical advantage is gained beyond this number (Kirkwood and Sterne, Figure). Alternatively, we could show that power increases and then plateaus as the number of controls per case increases; that graph would have the same shape, but inverted.
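The plateau can be illustrated with the usual normal-approximation formula for comparing two proportions with unequal group sizes (Fleiss formula, without continuity correction). This is an assumption for illustration: Epi Info's own calculations may differ in detail, for example by applying a continuity correction. The scenario (40 cases, 25% exposure among controls, true OR of 3) is hypothetical. The relative efficiency m/(m+1) of m controls per case, compared with an unlimited number of controls, is also printed as a commonly quoted rule of thumb.

```python
import math
from statistics import NormalDist


def case_exposure(p0, odds_ratio):
    """Proportion exposed among cases implied by control exposure p0 and the OR."""
    return odds_ratio * p0 / (1 - p0 + odds_ratio * p0)


def power_case_control(n_cases, controls_per_case, p0, odds_ratio, alpha=0.05):
    """Approximate two-sided power (normal approximation, no continuity correction)."""
    m = controls_per_case
    p1 = case_exposure(p0, odds_ratio)
    p_bar = (p1 + m * p0) / (1 + m)  # pooled exposure under the null hypothesis
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se_null = math.sqrt((1 + 1 / m) * p_bar * (1 - p_bar))
    se_alt = math.sqrt(p1 * (1 - p1) + p0 * (1 - p0) / m)
    z_beta = (abs(p1 - p0) * math.sqrt(n_cases) - z_alpha * se_null) / se_alt
    return NormalDist().cdf(z_beta)


for m in range(1, 9):
    power = power_case_control(n_cases=40, controls_per_case=m, p0=0.25, odds_ratio=3)
    efficiency = m / (m + 1)
    print(f"{m} controls/case: power = {power:.2f}, relative efficiency = {efficiency:.2f}")
```

In this scenario power rises steeply from one to two controls per case, but each additional control beyond four adds only a little.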
Returning now to the control definition for the investigation of the E. coli O157 outbreak: a decision is taken to select population controls from the same general practice as the case. These controls will have some geographical and social similarities to the cases, but are likely to provide a representative sample of the population giving rise to the cases.
controls per case will be selected at random from the same primary care
register as the case.
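Individually matched selection from the same practice register can be sketched as below, under stated assumptions: the register is a list of records with an ID, practice code and age (the field names are hypothetical), the number of controls per case is a parameter, and each person serves as a control for at most one case. Cases are removed from the sampling pool, as in a traditional case control study.

```python
import random
from collections import defaultdict


def select_practice_matched_controls(cases, register, controls_per_case, seed=None):
    """Pick controls at random from the same general practice as each case.

    cases:    list of dicts with hypothetical keys "id" and "practice"
    register: list of dicts with hypothetical keys "id", "practice" and "age"
    Returns {case_id: [control_id, ...]}; a case gets fewer controls if its
    practice has too few eligible people left.
    """
    rng = random.Random(seed)
    case_ids = {c["id"] for c in cases}

    # Eligible controls per practice: aged under 10 and not a case.
    pools = defaultdict(list)
    for person in register:
        if person["age"] < 10 and person["id"] not in case_ids:
            pools[person["practice"]].append(person["id"])

    used = set()  # each person is a control for at most one case
    matched = {}
    for case in cases:
        available = [pid for pid in pools[case["practice"]] if pid not in used]
        chosen = rng.sample(available, min(controls_per_case, len(available)))
        used.update(chosen)
        matched[case["id"]] = chosen
    return matched


# Invented register of 60 people across three practices.
register = [{"id": f"R{i}", "practice": f"GP{i % 3}", "age": i % 12} for i in range(60)]
cases = [{"id": "R1", "practice": "GP1"}, {"id": "R5", "practice": "GP2"}]
print(select_practice_matched_controls(cases, register, controls_per_case=4, seed=2))
```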
For cohort studies, the field epidemiologist is likely to be involved in retrospective studies; in other words, the investigation takes place after both exposure and disease have occurred. The commonest situation is an outbreak of food poisoning after a clearly defined event such as a party or wedding. Following the same principles as for the case control study, it is first essential to define the source population. This population then forms the cohort, usually defined as those who attended the function in question. Individuals within the cohort are then classified as exposed or unexposed, for example according to whether they ate or did not eat specified items of food or drink. The unexposed constitute the reference group for each item.
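The per-item comparison can be sketched as attack rates and a risk ratio for each food item. The line-list format and all counts are invented for illustration. This sketch counts attendees who ate nothing among the unexposed for every item; whether to do so depends on how the cohort is defined.

```python
def food_item_risks(cohort, item):
    """Attack rates and risk ratio for one food item in a retrospective cohort.

    cohort: list of (items_eaten, ill) pairs, one per attendee (hypothetical format).
    The unexposed (reference) group is everyone who did not eat `item`.
    """
    exposed = [ill for eaten, ill in cohort if item in eaten]
    unexposed = [ill for eaten, ill in cohort if item not in eaten]
    ar_exposed = sum(exposed) / len(exposed)
    ar_unexposed = sum(unexposed) / len(unexposed)
    risk_ratio = ar_exposed / ar_unexposed if ar_unexposed else float("inf")
    return ar_exposed, ar_unexposed, risk_ratio


def build_cohort(groups):
    """Expand (items, ill, count) rows into one record per attendee."""
    return [(frozenset(items), ill) for items, ill, count in groups for _ in range(count)]


# Invented wedding line list: 40 attendees, two food items.
cohort = build_cohort([
    ({"chicken", "salad"}, True, 10), ({"chicken", "salad"}, False, 4),
    ({"chicken"}, True, 5),           ({"chicken"}, False, 1),
    ({"salad"}, True, 1),             ({"salad"}, False, 9),
    (set(), True, 1),                 (set(), False, 9),  # ate nothing
])

for item in ("chicken", "salad"):
    ar1, ar0, rr = food_item_risks(cohort, item)
    print(f"{item}: AR exposed {ar1:.0%}, AR unexposed {ar0:.0%}, RR {rr:.1f}")
```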
Questions may then arise about whether the unexposed should include those who did not eat any food. As for case control studies, this depends on your definition of the source population: is the cohort defined as everyone who attended, or everyone who attended AND ate something? As the number who did not eat anything will probably be small, it may be sensible to include them. If we discover a substantial proportion of cases among those who attended but did not eat any food, food may not be the source of the outbreak.
What happens if everyone ate the food in question, i.e. there is no unexposed group? Luckily for the epidemiologist, our investigations involve human behaviour, which usually offers a rich variety of exposures. In a foodborne outbreak where everyone ate the delectable tiramisu, we can instead try to measure different levels of exposure (different amounts of tiramisu consumed). The reference group then becomes those with the lowest level of exposure.
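A minimal sketch of risk ratios by level of consumption, with the lowest level as the reference group. The level names and counts are invented for illustration. The function refuses to run if there are no cases at the reference level, since the risk ratios would then be undefined.

```python
def risk_ratios_by_level(counts):
    """Risk ratio at each exposure level, using the lowest level as reference.

    counts: dict {level: (ill, total)} listed from lowest to highest exposure.
    """
    levels = list(counts)
    ref_ill, ref_total = counts[levels[0]]
    ref_risk = ref_ill / ref_total
    if ref_risk == 0:
        raise ValueError("no cases at the lowest level; risk ratios are undefined")
    return {level: (ill / total) / ref_risk for level, (ill, total) in counts.items()}


# Invented counts for an outbreak where everyone ate some tiramisu.
tiramisu = {
    "a taste": (2, 20),
    "one portion": (12, 40),
    "two or more portions": (15, 20),
}
for level, rr in risk_ratios_by_level(tiramisu).items():
    print(f"{level}: RR = {rr:.1f}")
```

A risk ratio that rises with the amount eaten (here 1.0, 3.0 and 7.5) is the dose-response pattern that points towards the tiramisu even without an unexposed group.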
Define the source population. It is helpful to imagine the cohort study that could have been done instead: the total of exposed and unexposed in that cohort represents the source population.
Aim for a sample that is
representative of the source population
Review the advantages and disadvantages of the available options, taking account of urgency and available resources.
Controls selected from a population list are preferable, but this is not always feasible.
No control group is
Make a decision and do
1. Rothman KJ. Epidemiology: An Introduction. Oxford University Press 2002.
2. Hennekens CH. Epidemiology in Medicine. Lippincott Williams and Wilkins 1987.
3. Gregg MB. Field Epidemiology. Oxford University Press 1996.
4. Wacholder S, McLaughlin JK, Silverman DT, Mandel JS. Selection of controls in case-control studies I-III. Am J Epidemiol 1992; 135: 1019-50.
5. Kirkwood BR, Sterne JAC. Essential Medical Statistics (2nd ed). Blackwell Science 2003.
6. Killalea D, Ward LR, Roberts D, de Louvois J, Sufi F et al. International epidemiological and microbiological study of outbreak of Salmonella agona infection from a ready to eat savoury snack - I: England and Wales and the United States. BMJ 1996; 313: 1105-7.
sdesai posted on 9/21/2010 10:45:08 AM:
This chapter is well-written, covers the subject area comprehensively and is easy to read. I have made a few minor editorial changes to the text and below are a few comments and suggestions on specific sections of the chapter.
1. Should biases be mentioned here with a link to the chapter on biases? It is an important consideration when selecting controls and maybe a sentence or two could then really highlight the role of controls.
1. In the text you have differentiated controls by random sampling and by matching and I think it would also be clearer if you made this separation at the beginning. Your options could be:
1. Unmatched controls/Randomly selected
a. Population etc
2. Matched controls
a. Neighbourhood etc
This way the text has the same chronological order as the above list.
2. I think it would be a shame not to include control selection in case-case, and case-cross over designs as these are used even if not as commonly as classical case control studies. Their inclusion would complete the picture of control selection.
3. Would it be useful to provide links/references to articles for each type of control selection? For case-case you could use
a. Aiken et al Risk of Salmonella infection with exposure to reptiles in England, 2004-2007. Euro Surveill. 2010; 15(22).
b. McCarthy and Giesecke. Case-case comparisons to study causation of common infectious diseases. Int J Epidemiol 1999; 28:764-8.
For case-crossover you could use
a. Soverow et al. Infectious disease in a warming world: how weather influenced West Nile virus in the United States (2001-2005). Environ Health Perspect. 2009; 117:1049-52.
There is an article by Grimes that might be nice to reference (Grimes DA and Schulz KF. Compared to what? Finding controls for case-control studies. Lancet. 2005;365:1429-33).
“Special considerations in control selection”
1. I think it would be useful to have links to other sections of the manual embedded into the text for “case cohort, traditional case control, density case control”.
“Developing a control definition”
1. I feel it would be more appropriate if this section came straight after the summary page as for me it is more logical to define controls and then determine how to select them.