A set of training materials for professionals working in intervention epidemiology, public health microbiology and infection control and hospital hygiene.
In order to use information for decision making, epidemiologists
first need to organise the collected data in a standard format that allows
many observations to be summarised. Available tools include tables, graphics
and diagrams. They facilitate the description and interpretation of
distributions, trends and relationships in the data. Organising data in this
way also serves to communicate results to various audiences. This is a
rigorous task and, even if there are no fixed rules, some guiding principles
can be defined.
A table is a data set organised in rows and columns. The
simplest table includes 2 columns. The first column lists the categories in
which data are grouped. The second column shows the number of events or
individuals falling in each category. A third column may show the percentage of
the total that each category represents.
Number of cases of disease X by age groups, among residents of
Age groups           Number
0-9 years               422
10-19 years             783
20-29 years             565
30-39 years             904
40-49 years             237
50-59 years             676
60-69 years             898
70-79 years             239
80 and more years       120
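The optional third column can be derived directly from the counts. A minimal Python sketch, using the counts from the table above (the variable names are illustrative and not part of the original material):

```python
# Counts copied from the table above (cases of disease X by age group).
cases_by_age = {
    "0-9 years": 422,
    "10-19 years": 783,
    "20-29 years": 565,
    "30-39 years": 904,
    "40-49 years": 237,
    "50-59 years": 676,
    "60-69 years": 898,
    "70-79 years": 239,
    "80 and more years": 120,
}

total = sum(cases_by_age.values())

# Third column: each category expressed as a percentage of the total.
percentages = {group: 100 * n / total for group, n in cases_by_age.items()}

print(f"{'Age groups':<20}{'Number':>8}{'%':>8}")
for group, n in cases_by_age.items():
    print(f"{group:<20}{n:>8}{percentages[group]:>8.1f}")
print(f"{'Total':<20}{total:>8}{100.0:>8.1f}")
```

Percentages are rounded only for display, so the printed column may not add up to exactly 100.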
The above table shows case counts according to only one
variable (age group). Data could be broken down further by a second variable
or by several other variables. This can be illustrated as follows.
Number of cases of disease X by age groups, sex and
X/Y characteristic, among residents of sample-city, 2012.
Age groups Gender X Y Total
10-19 years
80 and more years
Illustration of footnotes.
Two by two tables
Cohort studies and case-control studies are classical
methods used by epidemiologists to identify associations between an exposure and
a disease. The crude results of such studies are frequently presented as
contingency (2 by 2) tables. They can be illustrated as follows.
Table III: Cases of disease X according to consumption
of food X, among customers of restaurant Y, 29 February 2012.
Case-control study
Table IV: Cases of disease X and controls according to
consumption of food X, among customers of restaurant Y, 29 February 2012.
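As an illustration of how such tables are read, the sketch below lays out a 2 by 2 table and derives two commonly used crude measures of association: the risk ratio for a cohort study and the odds ratio for a case-control study. The counts are hypothetical and the function names are assumptions made for this example; neither comes from the tables in this chapter.

```python
def risk_ratio(exposed_ill, exposed_well, unexposed_ill, unexposed_well):
    """Cohort study: risk among the exposed divided by risk among the unexposed."""
    risk_exposed = exposed_ill / (exposed_ill + exposed_well)
    risk_unexposed = unexposed_ill / (unexposed_ill + unexposed_well)
    return risk_exposed / risk_unexposed


def odds_ratio(cases_exposed, controls_exposed, cases_unexposed, controls_unexposed):
    """Case-control study: cross-product ratio of the 2 by 2 table."""
    return (cases_exposed * controls_unexposed) / (controls_exposed * cases_unexposed)


def format_two_by_two(column_labels, rows):
    """Return the table as text lines, adding a Total column."""
    header = f"{'':<16}" + "".join(f"{label:>10}" for label in column_labels + ["Total"])
    lines = [header]
    for label, (first, second) in rows.items():
        lines.append(f"{label:<16}{first:>10}{second:>10}{first + second:>10}")
    return lines


# Hypothetical cohort data: (ill, not ill) by exposure.
cohort = {"Ate food X": (30, 20), "Did not eat": (10, 40)}
print("\n".join(format_two_by_two(["Ill", "Not ill"], cohort)))
cohort_rr = risk_ratio(30, 20, 10, 40)
print("Risk ratio:", round(cohort_rr, 2))

# Hypothetical case-control data: (cases, controls) by exposure.
case_control = {"Ate food X": (30, 20), "Did not eat": (10, 40)}
print("\n".join(format_two_by_two(["Cases", "Controls"], case_control)))
case_control_or = odds_ratio(30, 20, 10, 40)
print("Odds ratio:", round(case_control_or, 2))
```

In a case-control study the row totals are not populations at risk, which is why the sketch does not compute risks for the second table.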
Although epidemiologists cannot analyse data before
they are collected, they usually prepare their analysis by designing dummy
tables (empty shells) which will later hold the results. This is an important
part of any plan of analysis. It helps to make sure that the responses to be
obtained will fit with the study design, the hypothesis tested and the way questions
Dummy tables for food specific
Table V: Cases of gastroenteritis according to
consumption of specific food items and beverages, among customers of
restaurant X, date.
Did not eat
95 % CL*
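A dummy table can also be prepared programmatically before any data are collected. The sketch below builds an empty shell with one row per food item. The food items and the column headings are assumptions chosen for illustration, since the headings of Table V are only partly shown above.

```python
# Hypothetical food items and column headings; adapt them to the questionnaire.
food_items = ["Food item A", "Food item B", "Beverage C"]
columns = ["Ate: cases", "Ate: total", "Did not eat: cases", "Did not eat: total"]

# Every cell stays None until the data have been collected and analysed.
dummy_table = {item: {column: None for column in columns} for item in food_items}


def render(table, columns):
    """Return the shell as text lines, showing empty cells as blanks."""
    header = f"{'Food item':<14}" + "".join(f"{column:>22}" for column in columns)
    lines = [header]
    for item, cells in table.items():
        texts = ["" if cells[column] is None else str(cells[column]) for column in columns]
        lines.append(f"{item:<14}" + "".join(f"{text:>22}" for text in texts))
    return lines


print("\n".join(render(dummy_table, columns)))
```

Filling the shell later only requires assigning values to the existing cells, so the layout agreed in the plan of analysis does not change.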
A graphic is a way to visualize quantitative data
using a system of coordinates. It helps us to see magnitude, trends,
differences and similarities in the data. It is a key aspect in scientific
communication whatever the audience.
In epidemiology we use rectangular coordinates. They
consist of a vertical and a horizontal line, each with specific units of
measurement, which intersect at a right angle. These are the x (horizontal)
and y (vertical) axes. The scale used for x is arithmetic. Scales used for y
can be arithmetic or logarithmic. We usually express y according to values of
x. X (also called the independent variable) usually represents classes of a
variable or time. Y (the dependent variable) represents counts, proportions or
rates.
Arithmetic line graphs
An arithmetic line graph shows the distribution of an
event (y) according to x (frequently time in epidemiology). Several events, or
several series of data, can be shown in the same line graph. The scale used on
the x axis depends on the interval (time) used to collect data. The y unit of
measurement depends on the magnitude of the highest value for y.
Example of a line graph showing the number of tetanus cases reported in France
from 1945 to 2003.
Source: InVS, Saint Maurice, France
Tips to select scales on axis:
The following graphic is incorrect.
Arithmetic line graphs can show several categories of the same
characteristic (e.g. age groups) on the same graph. The following example shows
reported incidence rates of gonorrhoea in Sweden by sex.
It is, however, difficult to show many series of data on
the same line graph. The following example is a further breakdown of the above
data into 6 age groups. Interpretation becomes more difficult.
Semilogarithmic-scale line graphs
If we use a logarithmic scale on the y axis and if the
x axis remains the same (arithmetic scale), we create a semi-logarithmic scale
line graph. With a logarithmic scale on the y axis we represent the relative
change of y over time rather than its absolute change over time. Semi-logarithmic
scale line graphs are used to present and interpret rates of change over time
rather than magnitude of change. They also allow showing very different
magnitudes and ranges of rates between two lines (e.g. high incidence and low mortality
rates for the same disease).
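The difference between absolute and relative change can be checked numerically. In the sketch below, two hypothetical series (not taken from the examples in this chapter) both halve at each time step. Their absolute changes differ by a factor of 1000, but their changes on a log10 scale are identical, which is why the two lines would run parallel on a semi-logarithmic graph.

```python
import math

# Hypothetical rates per 100 000 at four successive time points.
incidence = [800.0, 400.0, 200.0, 100.0]
mortality = [0.8, 0.4, 0.2, 0.1]


def absolute_changes(series):
    """Differences between successive values (what an arithmetic scale shows)."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def log10_changes(series):
    """Differences between successive log10 values (what a log scale shows)."""
    return [
        math.log10(later) - math.log10(earlier)
        for earlier, later in zip(series, series[1:])
    ]


for name, series in (("incidence", incidence), ("mortality", mortality)):
    print(name)
    print("  absolute changes:", absolute_changes(series))
    print("  log10 changes:   ", [round(c, 3) for c in log10_changes(series)])
```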
Semi-logarithmic scale paper
The following characteristics are noteworthy:
Source: Istituto Superiore di Sanità, Rome
The following example shows the occurrence of cases and
deaths of measles in England from 1940 to 2002. On an arithmetic scale line
graph it is impossible to see the trend in death rates over time. The different
magnitude of incidence and mortality rates does not allow both to be shown on
the same graph. The solution is to use a logarithmic scale for the y axis,
which allows very different rates to be shown.
Source: CDSC, HPA, Colindale, UK.
The following example
The decision to use arithmetic or
semi-logarithmic-scale line graphs depends on what we want to show, absolute
magnitudes or rates of change over time.
A histogram shows the frequency distribution of a
continuous variable. Adjoining columns are used to represent the number of
observations in each class interval of the distribution. The surface of each
column is proportional to the number of observations in the column. There
should be no scale break in the x axis otherwise the graph would not represent
100% of the data and surface units would no longer be proportional to the number
In intervention epidemiology histograms are frequently
used to present occurrence (distribution) of onsets of illness according to
time. This is frequently called an epidemic curve even if it is not a curve.
Several principles apply:
The following histogram shows cases of tetanus
reported after the Tsunami in Banda
Source: Prof. Leegross, WHO
We may show a second or several additional variables
on a histogram by shading the different components of a bar. However, too many
components in a bar are difficult to interpret. In this case it is better to
draw one histogram for each component.
Source: Prof Leegross, WHO
Histograms with unequal class intervals can also be
constructed. They are more difficult to draw and to interpret. Whatever the
interval, the unit of surface used should always be proportional to the amount
of information (number of cases).
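One way to keep the surface proportional to the number of cases when class intervals are unequal is to draw each bar with a height equal to the number of cases divided by the width of the interval. The sketch below uses hypothetical age classes, and the function name is illustrative.

```python
# Hypothetical classes: (lower bound, upper bound, number of cases).
classes = [(0, 5, 20), (5, 15, 20), (15, 45, 60)]


def bar_heights(classes):
    """Height of each bar = cases per unit of class width."""
    return [count / (upper - lower) for lower, upper, count in classes]


heights = bar_heights(classes)

# Surface of each bar (height x width) gives back the number of cases.
surfaces = [
    height * (upper - lower)
    for height, (lower, upper, _count) in zip(heights, classes)
]

for (lower, upper, count), height, surface in zip(classes, heights, surfaces):
    print(f"{lower}-{upper}: cases={count}, height={height}, surface={surface}")
```

The first two classes hold the same number of cases, but the second bar is half as high because its interval is twice as wide.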
A frequency polygon shows a frequency distribution. It
is constructed from a histogram. The frequency polygon is a polygon joining the
midpoints of the tops of the bars of a histogram. The first point is on the x
axis (y = 0) and is placed in the middle of the interval which precedes the
first bar of the histogram. The last point is located on the x axis in the
middle of the interval immediately following the last bar of the histogram. The
important point is that, by joining the midpoint of each bar and the x axis at
each end, the surface under the frequency polygon is exactly the same as the
surface of the histogram. Therefore the principle of the histogram is
respected: the same surface represents the same amount of data (cases).
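The equality of the two surfaces can be verified numerically for a histogram with equal class intervals. The sketch below builds the polygon vertices as described above and compares the surface under the polygon (computed with the trapezoid rule) with the surface of the histogram. The counts are hypothetical and the function names are illustrative.

```python
def frequency_polygon(start, width, counts):
    """Return polygon vertices (x, y) for a histogram with equal class intervals."""
    midpoints = [start + width * (i + 0.5) for i in range(len(counts))]
    first = (midpoints[0] - width, 0)   # middle of the interval before the first bar
    last = (midpoints[-1] + width, 0)   # middle of the interval after the last bar
    return [first] + list(zip(midpoints, counts)) + [last]


def surface_under(points):
    """Surface under a polyline, summed trapezoid by trapezoid."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


counts = [2, 5, 9, 4, 1]   # hypothetical number of cases per class
width = 1.0                # equal class intervals
polygon = frequency_polygon(0.0, width, counts)
histogram_surface = width * sum(counts)

print("Polygon vertices:", polygon)
print("Surface under polygon:", surface_under(polygon))
print("Surface of histogram: ", histogram_surface)
```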
Frequency polygons represent an easy way to show
several histograms on the same graphic.
Bar graphs are methods to display information using
only one coordinate. They are mainly used to compare data between discrete
The simplest bar chart displays data from a table with
one variable. Each bar represents one category. Bar graphs can be organised
horizontally or vertically. Vertical bars differ from histograms since they are
separated by a space. The height of the bar is proportional to the number of events
(e.g. cases) in the category. But the surface is not always proportional to the
width of the category on the x axis (e.g. different width of age groups). If there
is a logical order between categories, it should be respected. Otherwise
categories can be ordered by the decreasing or increasing values of their
respective bars. Variables in a bar graph are discrete (sex, region, race,
etc.) or continuous but organised in categories (e.g. age groups). The x axis
does not need to be continuous. The following bar graph shows distribution of
number of EPIET fellows by country
Grouped bar graphs
Sometimes several subcategories can be shown and placed
close to each other within a larger category (grouped bar graphs). The following
grouped bar graph represents the distribution of cases of Ebola haemorrhagic
fever in Bumba zone, Zaire, in 1976. The graphic shows two variables, age group
and sex.
The following example illustrates the difficulty of
interpreting too many bars on the same grouped bar graph.
Stacked bar graphs
An alternative to the grouped bar graph is the
stacked bar graph. On this type of graph a bar is subdivided into components. The
height of each component is proportional to the part it takes in the bar. The
following stacked bar graph shows the distribution of cases of Salmonella Typhimurium
infection by age and sex in Norway.
In each age group category the bar is divided into two sub-categories, males
Source: National institute of health, Norway
Component bar graphs
Component bar graphs, also called proportional
component bar graphs, differ from stacked bar graphs in that
they represent proportions rather than absolute values. Each bar has the same
height and represents 100% of the data in that bar. Components of the bar are
expressed as the percentage of the total bar they represent. The following 100%
component bar graph shows the same data as above. Proportions are visible but
the absolute magnitude of the distribution is no longer visible.
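Converting a stacked bar graph into a component bar graph amounts to dividing each component by the total of its own bar. A minimal sketch with hypothetical counts (not the Norwegian data referred to above):

```python
# Hypothetical number of cases by age group and sex (stacked bar data).
stacked = {
    "0-14 years": {"males": 30, "females": 20},
    "15-44 years": {"males": 45, "females": 55},
    "45+ years": {"males": 10, "females": 30},
}


def to_component(bar):
    """Express each component as a percentage of the total of its bar."""
    bar_total = sum(bar.values())
    return {name: 100 * count / bar_total for name, count in bar.items()}


component = {group: to_component(bar) for group, bar in stacked.items()}

for group, bar in component.items():
    shares = ", ".join(f"{name} {share:.0f}%" for name, share in bar.items())
    print(f"{group}: {shares}")
```

Every bar now adds up to 100%, so the 50 cases in the first age group and the 100 cases in the second can no longer be told apart, as noted above.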
A pie graph is a graph in which the size of each
"slice" is proportional to the amount of data (e.g. number of cases) it
represents. Pie graphs are used to show the components of a larger group.
However, small differences between slices are more difficult to see than
differences between bars on a bar graph. This is why the proportion (%) of the
total that each slice represents is frequently added to the slice. Different
shading or colours can also be used to identify the various slices. In addition,
slices can be ordered by decreasing or increasing magnitude. This can also be
supported by a regular darkening gradient of a colour.
notifiable diseases reported during the World football cup, 4 June - 10 July
Maps use geographic coordinates to locate events
(cases by place of onset, residence, etc.). Field epidemiologists frequently
use spot maps and area maps to illustrate the occurrence of disease by place.
In a spot map, a dot or any other symbol on the map is
located at the exact place where the event occurred. This can be very precise
when a global positioning system (GPS) is used to locate events. A spot map is
useful to locate an event but, since it does not take into account the size of
the population, it does not show the risk of occurrence of the event by
place. Even when many dots are located in an area, this does not tell us
whether they simply reflect population density or a higher risk of occurrence
of the event.
Area maps use shaded or coloured areas to show counts,
risks or rates of an event by place. Shading or colour patterns are organised in
a logical order. Usually the darker the area, the higher the count, risk or
rate. If risks or rates are used, they are computed for each area taking into
account the numerator (number of events in the area during a specific period of
time) and the denominator (persons living in the area or person-time
experienced by the area population). The range of the risk distribution by
area is divided into mutually exclusive categories and shaded or coloured accordingly.
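The computation behind an area map can be sketched as follows: a rate is computed for each area and then assigned to one of several mutually exclusive categories. The districts, counts, populations and class limits below are all hypothetical, and the denominator is simplified to the resident population over one year.

```python
from bisect import bisect_right

# Hypothetical areas: (new cases during one year, resident population).
areas = {
    "District A": (12, 150_000),
    "District B": (45, 300_000),
    "District C": (3, 60_000),
}

# Hypothetical class limits (cases per 100 000 per year). The categories are
# mutually exclusive: below 5, 5 to below 10, 10 to below 20, 20 and above.
class_limits = [5, 10, 20]
class_labels = ["<5", "5-<10", "10-<20", "20+"]


def rate_per_100000(cases, population):
    """Numerator divided by denominator, expressed per 100 000 population."""
    return cases * 100_000 / population


def rate_class(rate):
    """Return the label of the single category that contains the rate."""
    return class_labels[bisect_right(class_limits, rate)]


rates = {
    name: rate_per_100000(cases, population)
    for name, (cases, population) in areas.items()
}
classes = {name: rate_class(rate) for name, rate in rates.items()}

for name in areas:
    print(f"{name}: {rates[name]:.1f} per 100 000, category {classes[name]}")
```

On the map, the categories would then be shaded from lightest to darkest in the order of `class_labels`.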
The following area map illustrates rates of
tuberculosis cases in France in 1996. Rates are expressed as the number of new
cases per 100 000 population per year. Different shading patterns are used to
describe categories of magnitude of incidence rates in each of the 100 French
health districts.
We need to
present and comment on the following type of map.
Diagrams and pictograms
Diagrams are popular tools among epidemiologists. They
are frequently used to illustrate the transmission and spread patterns during
an epidemic. The following diagram illustrates some aspects of the transmission
during the SARS outbreak in Singapore
When do we
introduce this type of graphic?
Use of computers
Graphs are now exclusively computer generated. Tasks
are becoming very easy to carry out. On the other hand, epidemiologists are
becoming more and more dependent upon the possibilities and the limits of computer graphic
Whatever the software used, some principles should
probably be respected. We should avoid three-dimensional graphs. They do not
improve communication and they are not easier to interpret. Colours should be
selected according to complementary colour criteria (see chapter on visual
aids). Many software packages cannot produce a histogram. What they
call a histogram is frequently in fact a bar graph. Units and scales on the x
and y axes are not always clear in some software packages. Most software
packages cannot produce a perfect epidemic curve (one square = one case).
Particularly when the x axis represents time, most software is not flexible
enough to do what the epidemiologist would like.
Tables, graphics and diagrams are useful and effective
tools to summarise and communicate findings from epidemiological studies. A
few guiding principles bear repeating here. Two small and simple
tables are better than one large complicated table. What is difficult to show in
a table may become simpler and clearer on a graph. All may one day be used out
of context. They therefore need precise titles, labels, legends, footnotes and
sources. These tools help us to interpret and communicate. However
attractive computer technology is, one should always have a clear purpose in
mind when choosing a specific type of table, graphic or diagram.
Webster posted on 8/8/2010 9:24:14 PM:
Hey Alain and Ágnes,
Congratulations on a clear and concise delineation of how to present data.
I have only one general issue, which I would like to bring to your attention briefly. The web-based format offers some advantages over the classical book form, and I am unsure whether the medium has so far been suitably exploited/explored. Namely, the "introduction" quite rightly states that the way data are most appropriately displayed depends on the scale of the variable of interest. Well, imagine a new intrepid field epidemiologist (NIFE) is eager to display his/her freshly collected data and has appropriately started by determining the scale of his/her variables. What now? (S)he knows that the variable is measured on, say, a nominal scale, but is unsure how best to summarise the information of the variable. (S)he needs to read the entire chapter to find out which of all the different possibilities applies to a particular scale.
An alternative would be to have a page where, for every variable scale (type), one would find the tables, graphs, etc. that are an appropriate display. For example:
Nominal variable: Tables: frequency table. Bars: simple bar chart (depending on how many groups). And so on...
One could also display it in a table with two columns (gridlines invisible). On the left are the scales of the variables (nominal, etc.), and on the right the different presentation formats, ideally blockwise (according to tables, graphs, etc.). If you then click on the scale of a variable, the "Appropriate Presentation Formats" (APFs) are highlighted in bold, or arrows would point from the scale to the different APFs, or otherwise. By clicking on the single APFs, one would jump to the appropriate text.
Even without such a (new) "decision tree", for each display tool (e.g. stacked bar chart), that is for each APF, it should be stated for which variable type it is suitable (ideally at the beginning or end). For example, it doesn't tell you for which variable type line graphs are appropriate.
The rest of my few comments are not nitty-gritty but rather picky:
- The order of the headings (links) of the chapter at the bottom is not the same as on the right-hand side.
- It would be good to have a button at the end of each page that jumps you back to the beginning rather than having to scroll up there.
- "case-control study" should be hyphenated.
Subchapter "Types of variables":
- Numerical variable is introduced in plural; sometimes an "A" precedes the type of variable, sometimes not.
- The point "organisation of data" is rather slim. Consider deleting it from the title of the subheading chapter and placing the text (even with a line list example) before the introduction of variable types.
Subchapter "Types of variables":
- There is a table heading called "two-by-two tables". Consider using the generic term "contingency tables".
- Dummy tables: I tend to put the column "cases" before "total", anyway.
Subchapter "Other types ..":
- It should be explained what a box-and-whisker plot displays.
Hope this mail finds you well and you will find some of it helpful.
Agnes Hajdu replied on 8/12/2010 11:17:34 PM:
Thank you for the thorough review of the chapter and for the valuable suggestions! (and sorry for the late reply...)
Currently I am exploring possible formats for the guide you recommended - summarising appropriate displays for each type of data (with conditions that may apply). More challenging than I thought. :-) Your minor comments are also appreciated, will do the modifications.
Alain is on holidays now, he promised to reply after the 16th. I hope you are available to discuss any issues remaining.