The aim of model building is to select the variables which will result in
the best model to explain the observed data. Model building will be based
on methods, experience and common sense. The epidemiologist, not the software, is
responsible for the analysis and model building process.
The most frequent approach to model building is to achieve the smallest
model (number of variables) that still explains the data. The smallest is preferred
because it is also the most stable. Another objective is to provide the
best possible control of confounding within the data set.
The selection of variables should start with a careful univariate
analysis of each variable. This involves defining whether the variable is best
described as dichotomous, polytomous or continuous and verifying
assumptions. This also involves, prior to the logistic regression,
doing a careful stratified analysis by means of 2xn contingency tables. This
provides a unique way to look at the data (what is in each cell of the 2x2 tables).
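The stratified look at the data described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cell counts are invented, the names TwoByTwo and mantel_haenszel_or are hypothetical helpers, and the Mantel-Haenszel summary odds ratio is one common way (not named in the text) to summarise stratum-specific tables.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts of one 2x2 table (exposure by outcome)."""
    a: int  # exposed cases
    b: int  # exposed non-cases
    c: int  # unexposed cases
    d: int  # unexposed non-cases

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def odds_ratio(self) -> float:
        # Cross-product ratio; assumes no empty cell in b or c.
        return (self.a * self.d) / (self.b * self.c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata."""
    numerator = sum(t.a * t.d / t.n for t in strata)
    denominator = sum(t.b * t.c / t.n for t in strata)
    return numerator / denominator


# Hypothetical counts for two strata of a third variable.
strata = [TwoByTwo(50, 20, 20, 8), TwoByTwo(10, 20, 20, 40)]

# The crude table is the cell-by-cell sum of the strata.
crude = TwoByTwo(
    sum(t.a for t in strata),
    sum(t.b for t in strata),
    sum(t.c for t in strata),
    sum(t.d for t in strata),
)

print("crude OR:", crude.odds_ratio())
print("stratum ORs:", [t.odds_ratio() for t in strata])
print("Mantel-Haenszel OR:", mantel_haenszel_or(strata))
```

In this invented data set the crude odds ratio differs from the stratum-specific ones, which is the kind of pattern that inspecting each cell before modelling is meant to reveal.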
Once the univariate analysis is completed we will select all variables
with a statistical test leading to a p-value below a predefined cut-off level. A cut-off level of p-value < 0.25 is often used. We should also
include all variables we believe have biological or public health importance.
According to the literature, the use of more conservative or traditional levels (p
< 0.05) does not always allow for identifying all variables known to be
important. One should also keep in mind that a group of variables which are not
individually important in the model may play a collective role.
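The screening step can be illustrated with a short Python sketch. It assumes dichotomous variables summarised as 2x2 counts (a, b, c, d) and uses an uncorrected Pearson chi-square test as the univariate test. The choice of test, the variable names, the counts and the helper names chi2_p_value_2x2 and screen_variables are all assumptions made for illustration.

```python
import math


def chi2_p_value_2x2(a, b, c, d):
    """P-value of the Pearson chi-square test (1 df, no continuity correction)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def screen_variables(tables, cutoff=0.25, forced=()):
    """Keep variables with p < cutoff, plus those forced in on subject-matter grounds."""
    selected = []
    for name, cells in tables.items():
        if chi2_p_value_2x2(*cells) < cutoff or name in forced:
            selected.append(name)
    return selected


# Hypothetical 2x2 counts (a, b, c, d) for four candidate variables.
tables = {
    "smoking": (40, 60, 20, 80),
    "sex": (30, 70, 29, 71),
    "age_group": (35, 65, 27, 73),
    "vaccination": (31, 69, 29, 71),
}

liberal = screen_variables(tables, cutoff=0.25, forced={"vaccination"})
strict = screen_variables(tables, cutoff=0.05)
print("p < 0.25 plus forced variables:", liberal)
print("p < 0.05 only:", strict)
```

With these invented counts, the 0.25 cut-off retains a variable that the 0.05 cut-off would discard, and the forced list keeps a variable judged important regardless of its p-value.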
Several methods can be used to assess the fit of a best model.
Following the achievement of the best model fit, the importance of each
variable should then be verified by comparing the crude association and the
results of the model, including comparison of confidence intervals and statistical significance. The process of
adding, fitting, dropping and refitting continues until all variables in the model
are judged either statistically or biologically important.
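The comparison of crude and adjusted results can be sketched as follows. The coefficients and standard errors are invented numbers standing in for the output of a fitted logistic model. The helper names odds_ratio_with_ci and relative_change are hypothetical, and the Wald interval with z = 1.96 is an assumption about how the confidence intervals are obtained.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def relative_change(crude_or, adjusted_or):
    """Relative change of the adjusted odds ratio compared with the crude one."""
    return (adjusted_or - crude_or) / crude_or


# Hypothetical estimates for one exposure: (coefficient, standard error).
crude_beta, crude_se = 0.59, 0.29        # from the univariate analysis
adjusted_beta, adjusted_se = 0.10, 0.33  # from the multivariable model

crude_or, crude_lo, crude_hi = odds_ratio_with_ci(crude_beta, crude_se)
adj_or, adj_lo, adj_hi = odds_ratio_with_ci(adjusted_beta, adjusted_se)

print(f"crude OR    {crude_or:.2f} ({crude_lo:.2f}-{crude_hi:.2f})")
print(f"adjusted OR {adj_or:.2f} ({adj_lo:.2f}-{adj_hi:.2f})")
print(f"relative change: {relative_change(crude_or, adj_or):+.0%}")
```

How large a change matters, and whether a variable stays in the model, remains a judgement for the epidemiologist rather than something the code decides.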
Once we have a model with all relevant variables we then should check
whether interaction terms should be added. This implies that categories or
assumptions have been verified for polytomous and continuous variables.
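For two dichotomous variables, one way to examine an interaction term is sketched below. In a logistic model containing both variables and their product, the interaction coefficient equals the difference between the stratum-specific log odds ratios, and its Wald variance is the sum of the reciprocals of the eight cell counts. The counts and the helper names log_or_and_var and interaction_wald_test are hypothetical, and the Wald test is only one possible choice of test.

```python
import math


def log_or_and_var(a, b, c, d):
    """Log odds ratio of a 2x2 table and its Woolf variance estimate."""
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def interaction_wald_test(stratum0, stratum1):
    """Wald test for the interaction between an exposure and a binary modifier.

    Each stratum is a tuple (a, b, c, d) of exposure-by-outcome counts.
    Returns the interaction coefficient, its standard error and a two-sided p-value.
    """
    log_or0, var0 = log_or_and_var(*stratum0)
    log_or1, var1 = log_or_and_var(*stratum1)
    beta = log_or1 - log_or0
    se = math.sqrt(var0 + var1)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p_value


# Hypothetical counts for the two levels of the second variable.
beta, se, p_value = interaction_wald_test((20, 80, 10, 90), (40, 60, 10, 90))
print(f"ratio of odds ratios: {math.exp(beta):.2f}, p = {p_value:.3f}")
```

For polytomous or continuous variables the interaction term would instead be assessed within the fitted model itself, which this sketch does not cover.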