Types of variables

Methods for describing epidemiological measurements (by person, place and time) depend on the type of data or variables used [1, 2, 3]. A variable is a characteristic of the data under consideration. Variables can be classified in a number of ways. One common way is by measurement scale, which distinguishes four scale types: nominal, ordinal, interval and ratio [4]. Another common classification has two main classes: categorical (qualitative) and numerical (quantitative) variables. In general, categorical variables correspond to the first two measurement scales (nominal and ordinal), while numerical variables correspond to the last two (interval and ratio).
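The correspondence between the two classification systems can be written as a small lookup table. This is a minimal Python sketch for illustration only; the names `Scale`, `BROAD_CLASS` and `broad_class` are assumptions and not part of any standard library.

```python
from enum import Enum


class Scale(Enum):
    """The four measurement scales."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


# Categorical variables cover the first two scales, numerical variables the last two.
BROAD_CLASS = {
    Scale.NOMINAL: "categorical",
    Scale.ORDINAL: "categorical",
    Scale.INTERVAL: "numerical",
    Scale.RATIO: "numerical",
}


def broad_class(scale: Scale) -> str:
    """Return 'categorical' or 'numerical' for a given measurement scale."""
    return BROAD_CLASS[scale]


print(broad_class(Scale.ORDINAL))  # categorical
print(broad_class(Scale.RATIO))    # numerical
```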

A categorical variable (also known as a qualitative variable) is one for which each response can be put in a specific category. Categorical variables can be either nominal or ordinal.

A nominal variable is one that describes a name or category, e.g. occupation, place of birth or diagnosis. There is no inherent order among the possible names or categories. Nominal data are called dichotomous when they have only two classes, e.g. sex (male/female) or exposure history (yes/no).

An ordinal variable is a categorical variable whose possible categories can be placed in a natural order that conveys additional information, e.g. severity of illness may be categorised and ordered as "mild", "moderate" or "severe".
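The difference between nominal and ordinal variables can be made concrete in code: ordinal categories support "greater than" comparisons and sorting, whereas nominal categories do not. The sketch below is illustrative and assumes Python's `enum` module; the classes `Severity` and `Occupation` are hypothetical.

```python
from enum import Enum, IntEnum


class Severity(IntEnum):
    """Ordinal: the integer values encode order only, not distance between categories."""
    MILD = 1
    MODERATE = 2
    SEVERE = 3


class Occupation(Enum):
    """Nominal: categories have names but no inherent order."""
    FARMER = "farmer"
    TEACHER = "teacher"


cases = [Severity.SEVERE, Severity.MILD, Severity.MODERATE, Severity.MILD]

# Ordering is meaningful for an ordinal variable.
print(Severity.SEVERE > Severity.MILD)   # True
print([s.name for s in sorted(cases)])   # ['MILD', 'MILD', 'MODERATE', 'SEVERE']

# A plain Enum defines no ordering, so comparing nominal categories fails.
try:
    Occupation.FARMER < Occupation.TEACHER
    nominal_orderable = True
except TypeError:
    nominal_orderable = False
print(nominal_orderable)  # False
```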

A numerical variable (also known as a quantitative variable) is one that can take a range of real values expressed in units of measurement. Not all variables described by numbers are numerical. When a patient is asked to rate the severity of his/her disease on a scale from 1 to 5, numbers are used, but the variable itself (severity) is ordinal.
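Because the 1 to 5 scores are labels for ordered categories rather than measurements, a common approach is to summarise them with order-based statistics such as the median and category frequencies. The scores below are invented for illustration; `median_low` is used so that the result is always one of the observed categories rather than a value between two of them.

```python
from collections import Counter
from statistics import median_low

# Hypothetical self-reported severity scores (1 = very mild, 5 = very severe).
scores = [2, 4, 4, 5, 1, 3, 4]

# Order-based summaries respect the ordinal nature of the scores.
middle = median_low(scores)
frequencies = sorted(Counter(scores).items())

print(middle)       # 4
print(frequencies)  # [(1, 1), (2, 1), (3, 1), (4, 3), (5, 1)]
```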

A discrete variable can take only a finite number of values, usually whole numbers. Discrete variables often relate to counted items, e.g. the number of new cases of salmonellosis in a given year or the number of people in a household. Discrete variables may also be grouped.
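Counting events yields a discrete variable, and such counts can then be grouped. In this sketch the notification dates, household sizes and grouping cut-offs are all hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter
from datetime import date

# Hypothetical notification dates for salmonellosis cases.
notifications = [date(2021, 3, 2), date(2021, 7, 19), date(2022, 1, 5),
                 date(2022, 8, 30), date(2022, 9, 1), date(2023, 6, 14)]

# Number of new cases per year: whole numbers only.
cases_per_year = Counter(d.year for d in notifications)
print(sorted(cases_per_year.items()))  # [(2021, 2), (2022, 3), (2023, 1)]


def household_group(size: int) -> str:
    """Group a household size (a discrete count) into hypothetical categories."""
    if size < 1:
        raise ValueError("household size must be at least 1")
    if size == 1:
        return "1"
    if size <= 3:
        return "2-3"
    return "4+"


household_sizes = [1, 4, 2, 3, 6, 2]
grouped = sorted(Counter(household_group(s) for s in household_sizes).items())
print(grouped)  # [('1', 1), ('2-3', 3), ('4+', 2)]
```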

A continuous variable can take an infinite number of real values, although in practice it is recorded to a predetermined degree of precision. Continuous variables often relate to measured items such as age, weight or temperature. To make them easier to handle, they are usually grouped into "class intervals" (e.g. age groups).
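Grouping a continuous variable into class intervals amounts to finding which interval each value falls into. The sketch below uses Python's `bisect` module; the 15-year age bands are an assumption for illustration, not a recommended standard. Each interval includes its lower bound, so an age of exactly 15 falls into "15-29".

```python
from bisect import bisect_right

# Hypothetical age groups, given by their lower bounds in years.
BOUNDS = [0, 15, 30, 45, 60]
LABELS = ["0-14", "15-29", "30-44", "45-59", "60+"]


def age_group(age: float) -> str:
    """Place a continuous age (in years) into its class interval."""
    if age < 0:
        raise ValueError("age cannot be negative")
    return LABELS[bisect_right(BOUNDS, age) - 1]


ages = [34, 52, 26, 40, 37, 61]  # ages from the partial line listing in this section
print([age_group(a) for a in ages])
# ['30-44', '45-59', '15-29', '30-44', '30-44', '60+']
```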

Line listing

When individual records are collected, they are typically entered and organised in a spreadsheet on a computer (or on paper if no computer is available), where each row represents a case and each column represents a variable of interest (e.g. demographic information, clinical details, and epidemiological information such as risk and exposure factors), creating a line listing. Such a list is useful for viewing the entire database as time progresses, filling in gaps in information, sharing results with others on the team, and simply "eyeballing" the data for obvious errors, outliers and trends [5]. It is a working document that also makes it easier to regroup and count cases by their characteristics, for example by using pivot tables [6].
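A line listing maps naturally onto a list of records, one per case, and a pivot-table style count is a tally over combinations of two columns. This is a minimal sketch in plain Python; the key names and the `cross_tab` helper are hypothetical, and the rows are taken from the partial gastroenteritis line listing in this section.

```python
from collections import Counter

# One dict per case (row); one key per variable (column).
line_listing = [
    {"initials": "N.L.", "age": 34, "sex": "F", "severity": "severe", "pathogen": "Salmonella"},
    {"initials": "G.D.", "age": 52, "sex": "F", "severity": "mild", "pathogen": "pending"},
    {"initials": "I.P.", "age": 26, "sex": "M", "severity": "moderate", "pathogen": "pending"},
    {"initials": "F.R.", "age": 40, "sex": "F", "severity": "mild", "pathogen": "Norovirus"},
    {"initials": "D.A.", "age": 37, "sex": "F", "severity": "severe", "pathogen": "Salmonella"},
    {"initials": "E.J.", "age": 61, "sex": "M", "severity": "severe", "pathogen": "Salmonella"},
]


def cross_tab(rows, row_var, col_var):
    """Count cases for each (row_var, col_var) combination, like a simple pivot table."""
    return Counter((r[row_var], r[col_var]) for r in rows)


table = cross_tab(line_listing, "pathogen", "severity")
for (pathogen, severity), n in sorted(table.items()):
    print(f"{pathogen:<10} {severity:<9} {n}")
```

Spreadsheet pivot tables or a data-frame library would do the same job; the plain-Python version simply shows what the count consists of.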

New cases should be added to the list as they are identified, and all cases should be updated throughout the study or investigation as new information is obtained. Line listings that contain only basic critical information have the advantage of allowing a quick visual assessment of different aspects of the data. However, a line listing with additional information may be more useful for assessing and characterising the event of interest. All line listings should include the components of the case definition. Where more than one person enters data, it is recommended to record the initials of the person who entered each record, in case questions arise about the data entered [7].
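Adding cases as they are identified, updating them as information arrives, and recording who entered each change can be sketched as follows. All field names, the `update_case` helper and the initials "AB" and "CD" are hypothetical; a real investigation would usually rely on its spreadsheet or database tool for this.

```python
from dataclasses import dataclass, field


@dataclass
class CaseRecord:
    """One row of a line listing. All field names here are hypothetical."""
    case_id: int
    onset: str
    meets_case_definition: bool
    pathogen: str = "pending"
    entered_by: str = ""  # initials of the person who last entered or edited this row
    history: list = field(default_factory=list)  # audit trail of (initials, field, old, new)


def update_case(case: CaseRecord, initials: str, **changes) -> None:
    """Apply new information to an existing case and log who changed what."""
    for name, new_value in changes.items():
        # Refuse unknown fields (typos) and fields that must not be edited directly.
        if not hasattr(case, name) or name in ("case_id", "history", "entered_by"):
            raise AttributeError(f"cannot update field {name!r}")
        case.history.append((initials, name, getattr(case, name), new_value))
        setattr(case, name, new_value)
    case.entered_by = initials


# A new case is added as soon as it is identified...
listing = [CaseRecord(case_id=1, onset="May 5", meets_case_definition=True, entered_by="AB")]
# ...and updated later, e.g. when the laboratory result comes back.
update_case(listing[0], "CD", pathogen="Salmonella")
print(listing[0].pathogen, listing[0].entered_by)  # Salmonella CD
print(listing[0].history)  # [('CD', 'pathogen', 'pending', 'Salmonella')]
```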

A line listing enables the investigator to quickly summarize, visualize and analyze the key components of the data. See an example below.

Table: Partial line listing of a gastroenteritis outbreak

| Initials (1) | Age (2) | Sex (3) | Date of onset (4*) | Diarrhoea (5) | Vomiting (5) | Fever (5) | Other symptoms (5) | No. of diarrhoeal episodes per day (6) | Duration of illness in days (7) | Severity of illness (8) | Pathogen (9) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| N.L. | 34 | F | May 4 | 1 | 1 | 0 | 0 | 3 | 5 | severe | Salmonella |
| G.D. | 52 | F | May 5 | 1 | 0 | 0 | nausea | 2 | 4 | mild | pending |
| I.P. | 26 | M | May 5 | 1 | 0 | 1 | 0 | 2 | 3 | moderate | pending |
| F.R. | 40 | F | May 8 | 1 | 1 | 0 | nausea | 2 | 1 | mild | Norovirus |
| D.A. | 37 | F | May 5 | 1 | 0 | 1 | abdominal cramps | 3 | 6 | severe | Salmonella |
| E.J. | 61 | M | May 9 | 1 | 0 | 1 | headache | 3 | 4 | severe | Salmonella |

Nominal variables: 1, 3, 5, 9; ordinal variables: 4, 8; discrete variable: 6; continuous variables: 2, 7. (*Although time is continuous, date of onset recorded as a calendar day is treated here as ordinal.)
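Once each column is tagged with its variable type, a suitable summary can be chosen automatically. The sketch below applies one reasonable choice per type (frequencies for nominal, median category for ordinal, numeric median for discrete and continuous) to four columns of the table above. The column keys, the `ORDER` mapping and the `summarise` helper are hypothetical, and other summaries (e.g. the mean for continuous data) may be equally appropriate.

```python
from collections import Counter
from statistics import median, median_low

# Variable type of each column, following the classification above (subset of columns).
COLUMN_TYPES = {
    "sex": "nominal",
    "severity": "ordinal",
    "episodes_per_day": "discrete",
    "duration_days": "continuous",
}
# Category order for each ordinal column.
ORDER = {"severity": ["mild", "moderate", "severe"]}

# Values of those columns for the six cases in the table.
data = {
    "sex": ["F", "F", "M", "F", "F", "M"],
    "severity": ["severe", "mild", "moderate", "mild", "severe", "severe"],
    "episodes_per_day": [3, 2, 2, 2, 3, 3],
    "duration_days": [5, 4, 3, 1, 6, 4],
}


def summarise(column: str, values: list):
    """Pick a summary suited to the column's variable type."""
    kind = COLUMN_TYPES[column]
    if kind == "nominal":
        return Counter(values).most_common()  # frequencies; no order implied
    if kind == "ordinal":
        categories = ORDER[column]
        ranks = [categories.index(v) for v in values]
        return categories[median_low(ranks)]  # median category, always an observed one
    return median(values)  # discrete or continuous: numeric median


for col in COLUMN_TYPES:
    print(col, summarise(col, data[col]))
```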

Because line listings contain individual patient data, including identifiers, disease outcomes and risk factors, they must be treated with the same confidentiality and care as regular medical files. Proper data protection procedures need to be in place and monitored.

References

1. Merrill RM, Timmreck TC. An introduction to epidemiology. 4th ed. Sudbury, Massachusetts: Jones and Bartlett Publishers; 2005.

2. McLennan W. 1331.0 Statistics - a powerful edge! 2nd ed. Australian Bureau of Statistics; 1998. p. 103.

3. Altman DG. Practical statistics for medical research. London: Chapman & Hall; 1991. p. 10-13.

4. Level of measurement. Wikipedia. Available from: http://en.wikipedia.org/wiki/Level_of_measurement

5. Gregg MB. Appendix: a walk-through exercise: a food-borne epidemic in Oswego County, New York. In: Gregg MB, editor. Field epidemiology. New York: Oxford University Press; 2002. p. 420.

6. Fontaine RE, Goodman RA. Describing the findings. In: Gregg MB, editor. Field epidemiology. New York: Oxford University Press; 2002. p. 79.

7. Torok M. Case finding and line listing: a guide for investigators. Focus on Field Epidemiology. Vol. 1, Issue 4. North Carolina Institute for Public Health.