# 10 common errors in secondary analyses of surveillance data

**1. Lack of focus on one specific disease or health problem**

*Description of the error*

The report does not focus on one specific disease or health problem and instead superficially reviews many diseases under surveillance.

*Rationale to change*

Surveillance data analysis is a careful, systematic exercise that requires focus to generate information useful for decision-making.

**2. Failure to report the methods used**

*Description of the error*

The report does not state which methods were used to analyze the surveillance data.

*Rationale to change*

The "count, divide and compare" approach cannot be assumed to be obvious or intuitive. The author needs to describe explicitly every step taken in the data analysis. A description of the methods is all the more necessary when sophisticated analysis techniques are used.

**3. Failure to calculate population-based incidence**

*Description of the error*

The report presents absolute numbers of cases without calculating rates.

*Rationale to change*

The "count, divide and compare" approach is key to surveillance data analysis. Skipping the "divide" step prevents any meaningful comparison. A count of cases over time does not account for population growth. A map of the number of cases by geographical area does not adjust for population density. A distribution of cases by age and sex does not reflect the population structure.
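The "divide" step can be sketched in a few lines of Python. The district names, case counts and populations below are hypothetical values invented for illustration, and `incidence_per_100k` is an assumed helper name, not part of any standard tool.

```python
# Hypothetical data: case counts and mid-year populations for two districts.
cases = {"District A": 120, "District B": 90}
population = {"District A": 600_000, "District B": 150_000}


def incidence_per_100k(case_count, pop):
    """Divide: convert a case count into a rate per 100,000 population."""
    if pop <= 0:
        raise ValueError("population must be positive")
    return case_count * 100_000 / pop


# Count, divide, then compare the rates rather than the raw counts.
rates = {area: incidence_per_100k(cases[area], population[area]) for area in cases}

# List areas from highest to lowest incidence.
for area in sorted(rates, key=rates.get, reverse=True):
    print(f"{area}: {cases[area]} cases, {rates[area]:.1f} per 100,000")
```

In this made-up example, District A reports more cases, but District B has three times the incidence (60.0 versus 20.0 per 100,000). Only the rates allow that comparison.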

**4. Failure to use maps to display geographical observations**

*Description of the error*

The distribution of cases by geographical area is presented in a table or a graph.

*Rationale to change*

A map is the primary tool for showing the spatial distribution of cases. It is the only way to present in two dimensions how cases are distributed in space.

**5. Failure to use graphs to present time series**

*Description of the error*

Tables of numbers are used to present incidence over time.

*Rationale to change*

Time series are best presented as line graphs that show the rates over time.

**6. Display of raw data / insufficient data reduction**

*Description of the error*

The report displays insufficiently analyzed data in the form of large, complex tables from which no trend can be seen.

*Rationale to change*

Surveillance data analysis is about data reduction so that raw data can be processed into information that can be used for decision-making. This systematic, careful and scientific process must generate outputs in the form of graphs (e.g., time series), tables (e.g., incidence by age and sex) and figures (e.g., maps) that display the message in a clear, summarized, explicit and scientifically honest manner.

**7. Misuse of statistical tests**

*Description of the error*

Statistical tests are used excessively and inappropriately, including for testing hypotheses on the data that generated them.

*Rationale to change*

Surveillance data analysis is mainly done to generate hypotheses. Using the data to test hypotheses requires care. A test can be used to determine whether a specific distribution may have occurred by chance. However, if statistical testing is needed at all, the author must always go through the following quick checklist:

- Is it the right test?

- Is the test calculated correctly?

- Is the interpretation of the results of the test appropriate?

**8. Analysis by more than one criterion at a time**

*Description of the error*

The analysis breaks down the data immediately by more than one criterion at a time (e.g., by time and space or by person and time).

*Rationale to change*

Data analysis is like peeling an onion: it is done one step at a time. Initially, when looking at the data by one of the three criteria (time, place and person), the two others must be kept constant. When looking at the incidence over time, use all population subgroups and the whole geographical area. When looking at the incidence by area, use an average over the whole study period (or the last year) and all population subgroups. When looking at the incidence by population subgroup, use an average yearly incidence (or the last year) and include the whole geographical area. Only when the data have been examined systematically through these steps can more advanced analysis be done to understand the patterns that emerge. For example, if the incidence goes up, an analysis by population group over time or by area over time can point to where the increase in the number of cases comes from.
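The one-criterion-at-a-time sequence can be sketched as follows. This is a minimal illustration under stated assumptions: the records, area names, age groups and populations are hypothetical, the populations are assumed constant across both years, and `rate_by` is an assumed helper name. Rates are expressed per 100,000 person-years so that pooling over the study period stays comparable.

```python
from collections import defaultdict

# Hypothetical aggregated records: (year, area, age_group, cases).
records = [
    (2021, "North", "0-14", 10), (2021, "North", "15+", 30),
    (2021, "South", "0-14", 5),  (2021, "South", "15+", 15),
    (2022, "North", "0-14", 12), (2022, "North", "15+", 48),
    (2022, "South", "0-14", 6),  (2022, "South", "15+", 14),
]

# Hypothetical populations, assumed constant over both years.
population = {
    ("North", "0-14"): 20_000, ("North", "15+"): 60_000,
    ("South", "0-14"): 10_000, ("South", "15+"): 30_000,
}

FIELDS = {"year": 0, "area": 1, "age_group": 2}


def rate_by(*fields):
    """Incidence per 100,000 person-years by the given criteria,
    pooling over every criterion that is not listed."""
    cases = defaultdict(int)
    person_years = defaultdict(int)
    for rec in records:
        key = tuple(rec[FIELDS[f]] for f in fields)
        if len(key) == 1:
            key = key[0]  # plain key when only one criterion is used
        cases[key] += rec[3]
        # Each record covers one stratum for one year.
        person_years[key] += population[(rec[1], rec[2])]
    return {k: cases[k] * 100_000 / person_years[k] for k in cases}


# Step 1: one criterion at a time (time, then place, then person).
for field in ("year", "area", "age_group"):
    print(field, rate_by(field))

# Step 2: only after step 1, combine two criteria to locate a change.
print(rate_by("area", "year"))
```

With these invented numbers, the analysis by year shows an increase, and the follow-up analysis by area over time shows that the North rises from 50 to 75 per 100,000 person-years while the South stays at 50.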

**9. Over-interpretation of surveillance data**

*Description of the error*

The analysis is over-interpreted with final conclusions not supported by the data.

*Rationale to change*

In most cases, surveillance data are analyzed to generate hypotheses. Thus, in most cases, they cannot be used to test hypotheses. They should never be used to test the hypotheses that they generated. That would be the worst possible error. For example, there is a peak of disease in the summer, so the hypothesis is generated that the disease is more common in the summer. The same data are then used in a chi-square test to compare rates in the summer with rates during other seasons.

**10. Poor recommendations**

*Description of the error*

The recommendations are absent or not based upon the data presented.

*Rationale to change*

Recommendations need to be focused, based upon the results presented, specific, feasible, ethical, and practical. In field epidemiology, it is best to propose recommendations in the form of (a) additional investigations and/or (b) public health action.