Stratification is one of the pillars of epidemiological analysis. It allows investigators to become familiar with the distribution of the data according to the variables of interest, to estimate the effect of a variable adjusted for the effect of covariates or confounding factors, and to study interaction, or effect modification, between two factors.

However, stratification is limited in the number of variables that can be examined simultaneously. Each additional variable multiplies the number of strata, so the number of subjects in each stratum may drop to 0 or 1, and even statistical methods designed for sparse data may then no longer be applicable.
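To see how quickly strata empty out, note that cross-classifying k binary variables produces 2^k strata. The minimal Python sketch below assumes a hypothetical study of 200 subjects with six binary covariates. The helper names (`number_of_strata`, `mean_subjects_per_stratum`, `stratum_sizes`) and the synthetic data are made up for illustration only.

```python
import random
from collections import Counter
from math import prod


def number_of_strata(levels):
    """Number of strata produced by cross-classifying variables with these numbers of levels."""
    return prod(levels)


def mean_subjects_per_stratum(n_subjects, levels):
    """Average stratum size if subjects were spread evenly over all strata."""
    return n_subjects / number_of_strata(levels)


def stratum_sizes(rows):
    """Count subjects in each observed combination of covariate values."""
    return Counter(tuple(row) for row in rows)


# Hypothetical study: 200 subjects, six binary covariates.
levels = [2] * 6
n = 200
print(number_of_strata(levels))              # 64
print(mean_subjects_per_stratum(n, levels))  # 3.125

# Synthetic data (assumption): each covariate present with probability 0.3,
# so real data are unevenly spread and some strata receive no subjects at all.
random.seed(1)
rows = [[int(random.random() < 0.3) for _ in levels] for _ in range(n)]
sizes = stratum_sizes(rows)
empty = number_of_strata(levels) - len(sizes)
single = sum(1 for count in sizes.values() if count == 1)
print(f"empty strata: {empty}, strata with a single subject: {single}")
```

Even with an average of about three subjects per stratum, the uneven distribution of covariates leaves many strata empty or with a single subject, which is the situation described above.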

Regression analysis overcomes this limitation by fitting a model that approximates the function describing the relationship between a dependent variable and several independent variables at once.

Regression techniques are very efficient at estimating the independent effect of several covariates and at studying interactions. On the other hand, modelling the data involves underlying assumptions. Researchers should be familiar with regression techniques and with the interpretation of their results to ensure that the model assumptions are realistic. In addition, researchers using regression analysis may lose track of the patterns in the data distribution, and the process may not be well understood by the target audience.

A combination of both techniques, stratification and regression, is probably the best approach for the analysis of epidemiological data.

In this chapter, we will focus on "logistic regression models", a regression analysis technique suited for the analysis of case-control data.
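As a hedged illustration of why the logistic model suits case-control data, the sketch below fits logit P(case) = b0 + b1*x for a single binary exposure x by Newton-Raphson in plain Python. The 2x2 counts are hypothetical. With one binary exposure, exp(b1) reproduces the cross-product odds ratio of the table, whereas b0 depends on how many controls were sampled per case and therefore does not estimate the baseline risk.

```python
from math import exp, log


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson (one covariate plus intercept)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information) * step = score and update.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


# Hypothetical case-control counts.
exposed_cases, unexposed_cases = 30, 20
exposed_controls, unexposed_controls = 70, 130

# One record per subject: x = exposure (1/0), y = case status (1/0).
x = [1] * exposed_cases + [0] * unexposed_cases + [1] * exposed_controls + [0] * unexposed_controls
y = [1] * (exposed_cases + unexposed_cases) + [0] * (exposed_controls + unexposed_controls)

b0, b1 = fit_logistic(x, y)
cross_product_or = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
print(f"OR from model: {exp(b1):.3f}")            # 2.786
print(f"cross-product OR: {cross_product_or:.3f}")  # 2.786
```

A real analysis would use an established routine (for example R's `glm` with `family = binomial`) rather than a hand-written solver; this one only shows what the coefficients mean.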

Topics covered in this chapter include:

  1. Linear models
  2. The logistic model
  3. Fitting logistic regression models
    1. Interpreting model coefficients
    2. Estimating Odds Ratios in the presence of interaction
  4. Model building strategies