The first step in analyzing surveillance data is to assess their quality by detecting data entry errors, inconsistent data and incomplete reporting. This is done by computing the frequency distribution of each variable in the data set. Reviewing these frequency distributions makes it possible to detect data entry errors and missing fields, and to correct them where the true value can be recovered.
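As a minimal sketch of such a review, the Python snippet below tabulates one field of a small line list. The records, the field names and the `frequency_table` helper are hypothetical and serve only as illustration; empty and absent values are grouped under a single `<missing>` label so that incomplete reporting shows up in the same table as unexpected codes.

```python
from collections import Counter

# Hypothetical line list; field names and values are illustrative only.
records = [
    {"sex": "F", "district": "North"},
    {"sex": "M", "district": "North"},
    {"sex": "m", "district": "South"},   # inconsistent coding of "M"
    {"sex": "", "district": "South"},    # empty field
    {"sex": "F", "district": None},      # missing district
]

def frequency_table(rows, field):
    """Count each value of `field`, grouping empty or absent values as '<missing>'."""
    counts = Counter()
    for row in rows:
        value = row.get(field)
        counts[value if value not in (None, "") else "<missing>"] += 1
    return counts

sex_counts = frequency_table(records, "sex")
for value, n in sex_counts.most_common():
    print(f"{value!r:12} {n}")
```

In this made-up example the table for `sex` shows a stray lower-case `m` and one missing value, which are the kinds of entries a reviewer would then check against the source forms.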
It is not uncommon to notice an attraction to round digits in numeric fields such as age (ages ending in 0 and 5 being more frequent than expected) or in dates (days 01, 10, 15 and 20 being overrepresented compared with other days of the month). Such a lack of precision in the data cannot be corrected at the time of analysis, but needs to be taken into account when interpreting data plotted by age or date.
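One rough way to look for digit attraction in an age field is to tabulate the last digit of each age. The sketch below uses hypothetical ages and a hypothetical helper, `terminal_digit_shares`. It assumes, as an approximation only, that without digit attraction each terminal digit would account for roughly 10% of records, so digits 0 and 5 together would account for roughly 20%.

```python
from collections import Counter

# Hypothetical ages in years, for illustration only.
ages = [20, 25, 30, 30, 35, 40, 40, 45, 50, 23,
        37, 41, 50, 60, 65, 18, 30, 45, 52, 70]

def terminal_digit_shares(values):
    """Return the share of values ending in each digit 0-9."""
    counts = Counter(v % 10 for v in values)
    total = len(values)
    return {digit: counts.get(digit, 0) / total for digit in range(10)}

shares = terminal_digit_shares(ages)
share_0_or_5 = shares[0] + shares[5]

# Assumption: about 0.20 would be expected if terminal digits were evenly spread.
print(f"Share of ages ending in 0 or 5: {share_0_or_5:.2f}")
```

A share well above the assumed 20% suggests that ages were rounded when reported. The same tabulation can be applied to the day of the month of a date field.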
When the data set contains several date fields, such as date of onset, admission, confirmation or notification, calculating the delays between these sequential steps may highlight data entry errors (e.g. large delays due to an error in the year) or inconsistencies (e.g. negative delays due to confirmation being recorded before onset).
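The delay check described above can be sketched as follows. The records, the field names and the 60-day cut-off for an "implausibly long" delay are assumptions made for this example; an appropriate cut-off depends on the disease and on the surveillance system.

```python
from datetime import date

# Hypothetical case records; field names and dates are illustrative only.
cases = [
    {"id": 1, "onset": date(2011, 3, 1), "notification": date(2011, 3, 4)},
    {"id": 2, "onset": date(2011, 3, 10), "notification": date(2011, 3, 8)},
    {"id": 3, "onset": date(2010, 3, 12), "notification": date(2011, 3, 15)},
    {"id": 4, "onset": None, "notification": date(2011, 3, 20)},
]

# Assumed cut-off; choose a value suited to the disease under surveillance.
MAX_PLAUSIBLE_DELAY_DAYS = 60

def check_delay(case, start="onset", end="notification"):
    """Classify the delay between two date fields of one record."""
    if case[start] is None or case[end] is None:
        return "missing date"
    delay_days = (case[end] - case[start]).days
    if delay_days < 0:
        return "negative delay"
    if delay_days > MAX_PLAUSIBLE_DELAY_DAYS:
        return "implausibly long delay"
    return "ok"

flags = {case["id"]: check_delay(case) for case in cases}
for case_id, flag in flags.items():
    print(case_id, flag)
```

Here record 2 is flagged because notification precedes onset, and record 3 because the delay exceeds a year, which is the pattern typically produced by a mistyped year.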
Frequency distributions by disease and by age or sex may help detect additional errors (e.g. neonatal tetanus among adults).
Not all errors can be corrected at the time of analysis. However, it is crucial to gain a good understanding of the quality of the data and their limitations before analyzing and interpreting the results.
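A cross-tabulation of disease by age group can be sketched as below. The records, the age groups and the plausibility rule (a neonatal tetanus case should have an age of 0 completed years) are assumptions made for illustration. A real system would define its own rules for each disease.

```python
from collections import Counter

# Hypothetical case records, for illustration only.
cases = [
    {"id": 1, "disease": "neonatal tetanus", "age_years": 0},
    {"id": 2, "disease": "neonatal tetanus", "age_years": 34},
    {"id": 3, "disease": "measles", "age_years": 4},
    {"id": 4, "disease": "measles", "age_years": 27},
]

def age_group(age_years):
    """Assign an age in completed years to one of three illustrative groups."""
    if age_years < 1:
        return "<1"
    if age_years < 15:
        return "1-14"
    return "15+"

# Frequency distribution by disease and age group.
crosstab = Counter((c["disease"], age_group(c["age_years"])) for c in cases)

# Assumed plausibility rule: maximum age in completed years per disease.
MAX_AGE_YEARS = {"neonatal tetanus": 0}

suspect_ids = [
    c["id"]
    for c in cases
    if c["disease"] in MAX_AGE_YEARS and c["age_years"] > MAX_AGE_YEARS[c["disease"]]
]

for (disease, group), n in sorted(crosstab.items()):
    print(f"{disease:18} {group:5} {n}")
print("Records to verify:", suspect_ids)
```

The flagged record may reflect an error in either the age or the disease code, so it is a prompt for verification rather than an automatic correction.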
To design an effective surveillance system, it is necessary to define, for each disease, which surveillance indicators are best suited to trigger signals and which value of the indicator (the threshold) is considered abnormal or unusual.
Indicators can be expressed as absolute numbers (usually appropriate for rare diseases under immediate notification), as proportions of all notifications (proportional morbidity, when no denominator is available) or as incidence rates (e.g. the weekly number of notified cases divided by the population, for common diseases).
Indicators have to be defined in terms of time and place (e.g. number of cases/week/district).
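The three forms of indicator can be illustrated with a short calculation for one district and one week. All figures and function names below are hypothetical, and the rate is expressed per 100,000 population as an assumed convention.

```python
# Hypothetical notifications for one district in one week; figures are illustrative.
weekly_notifications = {"malaria": 120, "measles": 6, "cholera": 0, "other": 274}
district_population = 50_000

def proportional_morbidity(counts, disease):
    """Share of all notifications that are due to `disease` (no population denominator)."""
    total = sum(counts.values())
    return counts[disease] / total if total else None

def incidence_rate(cases, population, per=100_000):
    """Cases per `per` population for the reporting period."""
    return cases / population * per

malaria_count = weekly_notifications["malaria"]                            # absolute number
malaria_share = proportional_morbidity(weekly_notifications, "malaria")    # proportion
malaria_rate = incidence_rate(malaria_count, district_population)          # incidence rate

print(f"cases/week/district: {malaria_count}")
print(f"proportional morbidity: {malaria_share:.1%}")
print(f"incidence per 100,000 per week: {malaria_rate:.1f}")
```

Proportional morbidity depends on how many notifications are received for other diseases, so a rise in the proportion does not by itself show a rise in the number of cases.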
Thresholds are indicator values above which the disease pattern is considered abnormal or unusual and may require a public health intervention. For most epidemic-prone diseases under immediate notification, the threshold is set to 1, because a single case is considered to require a public health intervention (e.g. acute flaccid paralysis (AFP), rabies, plague). For more common diseases, thresholds can be set on the rate observed over a given time period (e.g. meningitis in Africa) or on an increase relative to baseline data (e.g. influenza-like illness). Methods for setting thresholds are presented in the chapter "Methods for setting thresholds in time series analysis".
At this stage, it is also important to define indicators for monitoring the surveillance process itself (e.g. timeliness, completeness).
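The three kinds of threshold can be sketched as simple checks. The numeric values below are placeholders, not recommended thresholds. The baseline rule used here (mean plus two standard deviations of historical counts) is only one simple possibility chosen for this example, and is not necessarily the method presented in the chapter referred to above.

```python
from statistics import mean, stdev

def exceeds_count_threshold(cases, threshold=1):
    """Fixed count threshold: a signal as soon as `threshold` cases are reported."""
    return cases >= threshold

def exceeds_rate_threshold(cases, population, rate_threshold, per=100_000):
    """Rate threshold: a signal when the rate per `per` population reaches the threshold."""
    return cases / population * per >= rate_threshold

def exceeds_baseline(current, baseline, n_sd=2.0):
    """Baseline threshold (assumed rule): current count above mean + n_sd * SD of history."""
    return current > mean(baseline) + n_sd * stdev(baseline)

# Hypothetical values, for illustration only.
single_case_signal = exceeds_count_threshold(1)
rate_signal = exceeds_rate_threshold(cases=8, population=100_000, rate_threshold=5)
baseline_counts = [12, 15, 11, 14, 13, 16, 12, 15]   # same week in earlier periods
baseline_signal = exceeds_baseline(25, baseline_counts)

print(single_case_signal, rate_signal, baseline_signal)
```

A signal from any of these checks indicates that the data should be verified and the situation assessed. It does not by itself confirm an outbreak.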
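Two such process indicators can be computed as in the sketch below. It assumes working definitions that a given system may word differently: completeness is the proportion of expected reporting units that sent a report, and timeliness is the proportion of expected units whose report arrived on or before the deadline. Unit names and dates are hypothetical.

```python
from datetime import date

# Hypothetical reporting units and receipt dates for one reporting week.
expected_units = ["HC-A", "HC-B", "HC-C", "HC-D"]
deadline = date(2011, 9, 26)
received = {
    "HC-A": date(2011, 9, 23),
    "HC-B": date(2011, 9, 28),   # received after the deadline
    "HC-C": date(2011, 9, 26),
}                                # HC-D sent no report

def completeness(expected, received_dates):
    """Proportion of expected units from which a report was received."""
    return sum(1 for unit in expected if unit in received_dates) / len(expected)

def timeliness(expected, received_dates, deadline):
    """Proportion of expected units whose report arrived on or before the deadline."""
    on_time = sum(
        1 for unit in expected
        if unit in received_dates and received_dates[unit] <= deadline
    )
    return on_time / len(expected)

week_completeness = completeness(expected_units, received)
week_timeliness = timeliness(expected_units, received, deadline)
print(f"completeness: {week_completeness:.0%}, timeliness: {week_timeliness:.0%}")
```

Tracking these two proportions by week and by district helps distinguish a true fall in cases from a fall in reporting.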
Arnold Bosman posted on 9/28/2011 6:04:11 AM:
Very rich page with many relevant tips and tricks for checking data quality.
It would be great if there were examples included (graphs or tables) to illustrate how to really spot the error. For example a graph with real data showing digit attraction would be helpful, and then a short description of why this was a problem and how it was solved.
I will try to look for some and invite other readers to do so too.