Surveillance systems typically provide descriptive statistics on the frequency and distribution of cases, and on temporal trends in both. Information on the distribution of cases may be categorised by geography, demographic characteristics, and occupational and other risk factors. Surveillance systems, particularly those for infectious disease and other environmental threats to health, may collect, analyse and disseminate information on hazards (e.g. sources of environmental contamination), exposures (e.g. occupational exposures to blood-borne viruses) and control or prevention measures (e.g. vaccinations given), as well as information on cases of disease or injury. The objective of providing such information is to enable the recipients to:
To inform action that will result in improved control or prevention, surveillance must provide information that is not only timely and accurate, but is also interpreted and presented in a format, and through channels, appropriate to those who have responsibility for taking action. The range of target audiences is large, but can be categorised broadly as follows:
Of these groups, the public are often overlooked as a potential target audience for surveillance outputs, and yet many of the public health actions with the greatest potential for improved control and prevention, such as improved food hygiene, safer sexual practice, and uptake of vaccinations, require significant public action.
The analysis of surveillance data can range from producing simple tabulations of descriptive statistics by time, place and person, to sophisticated time trend analyses and analyses within geographical information systems.
Although many significant outbreaks are first detected and reported by clinical staff or members of the public, before they are identifiable through surveillance systems (largely because of the delays inherent in surveillance systems), surveillance remains an important mechanism for detecting outbreaks, particularly of uncommon infections and diffuse outbreaks (i.e. outbreaks occurring over wide geographical areas, with relatively small numbers in any one locality).
The ability to detect emerging epidemics or outbreaks at an early stage in their evolution is an important element of communicable disease control. As a result, a number of analytical techniques have been developed that can be applied to surveillance data to detect possible outbreaks or to assess the statistical significance of an apparent increase in reports. The development of typing schemes, such as serotyping, phage-typing and newer molecular techniques, means that we can now undertake surveillance of several distinct subtypes of an organism where previously it was only possible to detect and monitor the organism at the species level. For many organisms this means that surveillance is now focused on smaller numbers of many different subtypes, which are generally indistinguishable clinically. This provides new opportunities for surveillance, through analysis of the data on different subtypes, to detect outbreaks that might not be detected through the more traditional route of alerting by clinicians. This has been the case for salmonella surveillance for several years, where the ability to undertake surveillance of many different serotypes and phage-types of salmonella has greatly increased the ability to detect outbreaks that might otherwise not have been detected until considerably later, or not at all (see box).
Between March and June 2006, the Health Protection Agency (HPA) Centre for Infections (CfI) received 56 Salmonella Montevideo isolates from cases of infection in England and Wales. During the same time period in 2005, the CfI received 14 isolates of Salmonella Montevideo. Of the cases identified in March - June 2006, 49 were primary cases, of which 37 shared the pulsed field gel electrophoresis (PFGE) profile SmvdX07. Cases were distributed widely across the country.
The HPA CfI attempted to contact all cases, and detailed food histories were obtained from 15 cases, all of which were confirmed to have the SmvdX07 profile. Thirteen (87%) of the cases interviewed reported eating products from one particular manufacturer. The clustering in time of this particular subtype indicated that the cases were part of an outbreak. Two S. Montevideo strains, isolated from samples taken at factories of that manufacturer immediately before the onset of illness among the first human cases, were also confirmed as PFGE profile SmvdX07. No other brands, retail outlets, catering chains or single food types were identified as common factors.
The frequency of cases of S. Montevideo PFGE SmvdX07 decreased following the voluntary recall of a number of chocolate products produced by the implicated manufacturer. These were considered potentially contaminated with S. Montevideo PFGE SmvdX07 after a risk assessment of the results of microbiological sampling and environmental investigations at a number of factory premises.
After carefully considering all the available evidence, the Outbreak Control Team concluded that consumption of products made by the manufacturer was the most credible explanation for the outbreak of S. Montevideo.
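As a rough illustration of how the significance of such an increase might be assessed, the sketch below treats the 14 isolates received in March to June 2005 as the expected count and asks how probable 56 or more isolates would be under a Poisson model. Using the previous year's count as the baseline, and the function name `poisson_upper_tail`, are assumptions made here for illustration; this is not presented as the method used in the investigation described in the box.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed <= 0:
        return 1.0
    total = 0.0
    k = observed
    while True:
        # Poisson probability of exactly k events, computed on the log scale
        # to avoid overflow in the factorial.
        term = math.exp(k * math.log(expected) - expected - math.lgamma(k + 1))
        total += term
        # Past the mean the terms shrink steadily; stop once negligible.
        if k > expected and term <= total * 1e-16:
            break
        k += 1
    return min(total, 1.0)


# Illustrative baseline: the 2005 count is assumed to be the expected value.
p_value = poisson_upper_tail(observed=56, expected=14.0)
print(f"P(56 or more isolates | expected 14) = {p_value:.2e}")
```

A single prior year is a crude baseline. In practice the expected count would usually be estimated from several years of data, with allowance for seasonality and trend.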
Simple graphs can be used to show trends over time, and to compare those trends between different geographic, demographic or exposure groups. Calculating rates from appropriate denominators, and graphing them, can similarly show how risks have changed over time and between different groups. Statistical techniques that have been applied to surveillance data, for the purpose of detecting outbreaks or assessing the significance of observed changes in frequency, include the Cusum technique [2], particularly for rare events, the scan statistic [3], and more complex modelling approaches [4].
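The Cusum idea can be sketched in a few lines: accumulate the excess of each observed count over a reference value, reset to zero whenever the sum goes negative, and flag periods in which the sum exceeds a threshold. The version below is a minimal one-sided sketch; the function name, the weekly counts, and the `reference` and `threshold` values are hypothetical choices for illustration, and in practice both parameters would be tuned to the baseline incidence of the disease.

```python
from typing import List, Sequence, Tuple


def upper_cusum(
    counts: Sequence[float], reference: float, threshold: float
) -> Tuple[List[float], List[int]]:
    """One-sided (upper) Cusum.

    Returns the running Cusum values and the indices of the periods
    in which the Cusum exceeds the threshold.
    """
    cusum_values: List[float] = []
    alarms: List[int] = []
    running = 0.0
    for index, count in enumerate(counts):
        # Add this period's excess over the reference; never drop below zero.
        running = max(0.0, running + (count - reference))
        cusum_values.append(running)
        if running > threshold:
            alarms.append(index)
    return cusum_values, alarms


# Hypothetical weekly counts of a rare infection (not real data).
weekly_counts = [2, 1, 3, 2, 6, 7, 8]
values, alarm_weeks = upper_cusum(weekly_counts, reference=3, threshold=5)
print(values)       # [0.0, 0.0, 0.0, 0.0, 3.0, 7.0, 12.0]
print(alarm_weeks)  # [5, 6]
```

This sketch does not reset the sum after an alarm, so it keeps signalling while counts stay high. Whether to reset is a design choice that depends on how alarms are followed up.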
Interpretation of such analyses may need to take into account the seasonality of many communicable diseases and the periodicity, sometimes stretching over several years, shown by several diseases that predominantly occur in childhood. Discontinuities in long-term time trends may be the result of interventions, such as the introduction of a new vaccine, but may also arise from changes in factors unrelated to the true incidence of disease, such as the introduction of new diagnostic tests, changes in clinical practice that result in increased case ascertainment (e.g. the introduction of a new screening programme), or changes in coding systems (e.g. changes to the ICD system have resulted in significant discontinuities in trends in deaths attributed to some causes). Reporting delay can be an important factor in some surveillance systems, where there can be significant delays between onset or detection of disease and the date of reporting to the surveillance system. This can be adjusted for if the delay varies little over time, but those interpreting the data must be aware of such delays, since the most recent data could otherwise be incorrectly interpreted as showing a fall in case numbers.
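One simple way to make such an adjustment, assuming the delay distribution is stable, is to divide each recent count by the proportion of cases that are normally reported within the time elapsed so far. The sketch below illustrates this; the function name, the counts, and the completeness proportions are hypothetical, and a real system would estimate completeness from its own historical reporting data.

```python
from typing import List, Sequence


def adjust_for_reporting_delay(
    observed: Sequence[float], completeness: Sequence[float]
) -> List[float]:
    """Scale up recent counts to allow for reports not yet received.

    observed: counts by week of onset, oldest first, most recent last.
    completeness[d]: proportion of cases expected to have been reported
        d weeks after the week of onset (d = 0 is the most recent week).
        Weeks older than len(completeness) are treated as complete.
    """
    n_weeks = len(observed)
    adjusted: List[float] = []
    for index, count in enumerate(observed):
        weeks_elapsed = n_weeks - 1 - index
        if weeks_elapsed < len(completeness):
            proportion = completeness[weeks_elapsed]
            if not 0 < proportion <= 1:
                raise ValueError("completeness must lie in (0, 1]")
        else:
            proportion = 1.0
        adjusted.append(count / proportion)
    return adjusted


# Hypothetical data: the raw counts appear to fall in the last two weeks.
raw_counts = [40, 38, 30, 20]
# Assumed: 50% of cases reported in the onset week, 75% within one week.
print(adjust_for_reporting_delay(raw_counts, completeness=[0.5, 0.75]))
# [40.0, 38.0, 40.0, 40.0]
```

In this invented example the apparent fall disappears once the expected late reports are allowed for, which is the misinterpretation the adjustment is meant to prevent.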
Analysis by person can include tabulation, graphical display, or statistical comparison of counts or rates by age, sex, ethnicity or other risk or exposure factor. This type of analysis can provide pointers to the aetiology of, or risk factors for acquisition of, disease, and is increasingly used to demonstrate and monitor inequalities in morbidity between different population groups. Interpretation of apparent differences between populations or population sub-groups must take into account reporting biases. One of the most common reporting biases seen in surveillance systems relates to age, where particular age groups are relatively over- or under-reported. The very young and the very old are often better represented in surveillance data collected from laboratories, since these age groups are more frequently subject to laboratory investigation for some common forms of infectious disease, such as respiratory or gastrointestinal infection. In the case of laboratory reporting of rubella infection, in contrast, it is women of child-bearing age who are often over-represented compared to other age and sex groups, because they are more likely to be investigated and reported. In countries with a significant mix of private and public healthcare services, particular population groups may preferentially attend one type of service rather than another, which would give rise to potential bias if surveillance were based on data from one type of service only, or if reporting were consistently better from one type of service. Misclassification and data errors can also affect comparisons between different population groups.
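A minimal sketch of a rate comparison by age group is shown below. The age bands, case counts and population denominators are invented for illustration, and `rates_per_100k` is a hypothetical helper name. Because the numerators are reported cases, the resulting rates reflect ascertainment as well as true incidence, so the reporting biases described above apply to them in full.

```python
from typing import Dict


def rates_per_100k(
    cases: Dict[str, int], population: Dict[str, int]
) -> Dict[str, float]:
    """Reported cases per 100,000 population for each group."""
    rates: Dict[str, float] = {}
    for group, case_count in cases.items():
        denominator = population[group]
        if denominator <= 0:
            raise ValueError(f"population for {group!r} must be positive")
        rates[group] = case_count * 100_000 / denominator
    return rates


# Hypothetical reported cases and mid-year population estimates.
reported_cases = {"0-4": 30, "5-64": 50, "65+": 40}
population_estimates = {"0-4": 60_000, "5-64": 800_000, "65+": 160_000}

for group, rate in rates_per_100k(reported_cases, population_estimates).items():
    print(f"{group}: {rate:.2f} per 100,000")
```

In this invented example the 5-64 group has the most cases but the lowest rate, which is why rates rather than raw counts are compared between groups.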
Geographical information systems are increasingly being used to analyse surveillance data. These systems can be used either to increase the visual impact of geographical variations (i.e. to produce maps that show how cases are distributed geographically), or for spatial analysis of surveillance data, such as testing for geographical clustering.
Common problems in the geographical analysis of surveillance data are missing information on the geographic location of cases, and the geocoding of data to the source of the report rather than the likely place of acquisition of infection. For example, an analysis of data from an outbreak of salmonella infection in England and Wales in 2000 shows a considerable difference between the geographic distribution of cases and that of the laboratories that reported them (Figure 1).
Figure 1. Geo-spatial analysis: Salmonella Typhimurium DT104 Outbreak, England & Wales, August 2000
Surveillance can only achieve its purpose of providing information for action if the information reaches those who have the responsibility for taking action. Although significant thought and investment are often put into data collection and analysis when developing surveillance systems, the equally important process of disseminating the resulting information can sometimes be given less attention.
The production of regular and timely surveillance outputs, and their dissemination in an appropriate format with relevant interpretation, requires significant investment. Outputs should be developed in close consultation with their target audience, to ensure that they are fit for purpose. Some users of surveillance outputs will only require high-level summaries that focus on key messages about overall changes in frequency or distribution, while others may require detailed line listings of cases in order to inform their own operational activities. Some users may wish to manipulate surveillance data in their own systems (e.g. in their local geographical information systems), where they can undertake linkage or ecological analyses against other data that they hold. It is only through regular consultation with the relevant stakeholders that surveillance system managers can ensure that their outputs continue to meet recipients' requirements.
Advances in information technology, particularly browser-based web technologies, provide the opportunity to make surveillance outputs available, or even to push them through email or technologies such as RSS, to a large audience as soon as the outputs are ready. This is clearly of benefit in terms of speed and cost of delivery, but such benefits will only be realised if the outputs are relevant to, and easily understood by, the intended audience; if not, they are likely to be overlooked in the face of increasing information overload.
1. Teutsch SM, Churchill RE. Principles and practice of public health surveillance. 2nd ed. Oxford, New York: Oxford University Press, 2000.
2. Gallus G, Mandelli C, Marchi M, Radaelli G. On surveillance methods for congenital malformations. Statistics in Medicine 1986; 5: 567.
3. Wallenstein S. A test for detection of clustering over time. Am J Epidemiol 1980; 3: 367.
4. Farrington CP, Beale AD, Andrews NJ, Catchpole MA. A statistical algorithm for the early detection of outbreaks of infectious disease. J R Statist Soc 1996; 159: 547-63.