Studying the effect of an exposure (risk factor, behaviour, intervention, etc.) on a health outcome within a population is a key part of epidemiology. If life were truly simple, measuring the distributions of the exposure and the outcome of interest in a population and presenting these variables in a single two-by-two table would be enough to determine this effect (relative risk, odds ratio, vaccine effectiveness, etc.).
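To make that "simple" scenario concrete, here is a minimal Python sketch that computes a relative risk and an odds ratio from a single two-by-two table. The class name `TwoByTwo`, the cell labels `a` to `d` and the counts are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """A 2x2 table: rows = exposure status, columns = outcome status."""

    a: int  # exposed, with outcome
    b: int  # exposed, without outcome
    c: int  # unexposed, with outcome
    d: int  # unexposed, without outcome

    def risk_ratio(self) -> float:
        # Risk in exposed divided by risk in unexposed (meaningful for cohort data)
        return (self.a / (self.a + self.b)) / (self.c / (self.c + self.d))

    def odds_ratio(self) -> float:
        # Cross-product ratio (a * d) / (b * c)
        return (self.a * self.d) / (self.b * self.c)


if __name__ == "__main__":
    table = TwoByTwo(a=30, b=70, c=10, d=90)  # hypothetical counts
    print(f"RR = {table.risk_ratio():.2f}")  # RR = 3.00
    print(f"OR = {table.odds_ratio():.2f}")  # OR = 3.86
```

With no third variable in play, these two numbers would be the whole answer.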

However, life is always more complex: 'third variables' can distort (confound) our observation of the effect of interest. These third variables are called confounders, and some studies may have many of them.

In epidemiology there are different ways to address confounding.

- Matching is a way to *prevent* confounding at the study design stage.
- Restriction is another way to *prevent* confounding, and is also planned at the study design stage.
- Performing a multivariate or stratified analysis is a way to *control* confounding during the analysis, rather than during the design of the study.

Matching is most often used in a case control design, but it is also possible to use it with a cohort study design.

A confounding factor is a factor associated with the outcome (independently of exposure) and also associated with exposure (without being on the biological pathway between exposure and outcome). The confounding factor distorts the measure of effect (RR or OR) between the exposure and the outcome. Matching is the process of ensuring that the confounding factor has the same distribution among cases and controls.

If the study was not planned with a matched design, an alternative way to *control* confounding is to perform a stratified analysis or to use multivariate models (for example, a logistic regression model).
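A minimal Python sketch of a stratified analysis shows how a confounder can create an apparent effect. It compares the crude OR with the stratum-specific ORs and the Mantel-Haenszel pooled OR. The Mantel-Haenszel estimator is one common choice for pooling strata, not the only one; the stratification by age group, the function names and all counts are hypothetical.

```python
from typing import List, Tuple

# Hypothetical stratum layout (a, b, c, d):
# a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls
Stratum = Tuple[int, int, int, int]


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio of a single 2x2 table."""
    return (a * d) / (b * c)


def crude_odds_ratio(strata: List[Stratum]) -> float:
    """Collapse the strata into one table, ignoring the third variable."""
    a, b, c, d = (sum(s[i] for s in strata) for i in range(4))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(strata: List[Stratum]) -> float:
    """Pooled OR adjusted for the stratifying variable: sum(ad/n) / sum(bc/n)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical data stratified by age group (the potential confounder)
    strata = [
        (40, 20, 20, 10),  # older: exposure and outcome more common
        (10, 20, 20, 40),  # younger: exposure and outcome less common
    ]
    for s in strata:
        print(f"stratum OR = {odds_ratio(*s):.2f}")  # 1.00 in both strata
    print(f"crude OR = {crude_odds_ratio(strata):.4f}")  # 1.5625
    print(f"MH OR    = {mantel_haenszel_odds_ratio(strata):.2f}")  # 1.00
```

In this made-up example the crude OR (50 × 50) / (40 × 40) = 1.5625 suggests an association, while both stratum-specific ORs and the Mantel-Haenszel OR equal 1: the apparent effect comes entirely from the different age distributions of the exposure and the outcome.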

If matching was performed during the study design, it must be taken into account during the analysis. In this case, the formula used to calculate the OR is different, and a special type of logistic regression (conditional logistic regression) should be used. The table format and the analysis used in a matched case control study therefore differ from those used in an unmatched case control study.
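For the simplest case, 1:1 individually matched pairs with a binary exposure, the matched OR is the ratio of the two kinds of discordant pairs: pairs where only the case is exposed divided by pairs where only the control is exposed. The minimal Python sketch below contrasts this with the OR obtained by breaking the pairs and analysing them as an ordinary two-by-two table. The function names and pair counts are hypothetical.

```python
from typing import List, Tuple

# Hypothetical 1:1 matched pair: (case_exposed, control_exposed)
Pair = Tuple[bool, bool]


def matched_odds_ratio(pairs: List[Pair]) -> float:
    """Matched-pair OR: only discordant pairs carry information."""
    case_only = sum(1 for case, ctrl in pairs if case and not ctrl)
    control_only = sum(1 for case, ctrl in pairs if ctrl and not case)
    return case_only / control_only


def unmatched_odds_ratio(pairs: List[Pair]) -> float:
    """OR obtained by ignoring the pairing and using an ordinary 2x2 table."""
    a = sum(case for case, _ in pairs)  # exposed cases
    b = sum(ctrl for _, ctrl in pairs)  # exposed controls
    c = len(pairs) - a  # unexposed cases
    d = len(pairs) - b  # unexposed controls
    return (a * d) / (b * c)


if __name__ == "__main__":
    pairs = (
        [(True, True)] * 20  # both exposed (concordant)
        + [(True, False)] * 30  # only the case exposed (discordant)
        + [(False, True)] * 10  # only the control exposed (discordant)
        + [(False, False)] * 40  # neither exposed (concordant)
    )
    print(f"matched OR   = {matched_odds_ratio(pairs):.2f}")  # 3.00
    print(f"unmatched OR = {unmatched_odds_ratio(pairs):.2f}")  # 2.33
```

In this hypothetical example, ignoring the matching gives an OR closer to 1 (2.33 versus 3.00). With several exposures or covariates, the matched analysis needs conditional logistic regression instead; in Python, for instance, statsmodels provides `ConditionalLogit` in `statsmodels.discrete.conditional_models`. For a single binary exposure in 1:1 pairs, its point estimate equals the discordant-pair ratio above.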

During the study design, matching can follow one of two principles: frequency matching or individual matching.
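The difference between the two principles can be sketched in Python. In frequency matching, controls are selected so that the distribution of the matching factor among controls is the same as among cases, at group level; in individual matching, each case is assigned its own control(s) sharing the same value of the matching factor. The record layout, the `age_group` matching factor, the function names and the random selection from a pool of candidate controls are all hypothetical assumptions for illustration, not a prescribed procedure.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Person = Dict[str, str]  # hypothetical record, e.g. {"id": "ctrl1", "age_group": "40+"}


def frequency_match(cases: List[Person], candidates: List[Person], key: str,
                    ratio: int = 1, seed: int = 0) -> List[Person]:
    """Frequency matching: controls reproduce the cases' distribution of `key`."""
    rng = random.Random(seed)
    by_stratum: Dict[str, List[Person]] = defaultdict(list)
    for person in candidates:
        by_stratum[person[key]].append(person)
    controls: List[Person] = []
    for stratum, n_cases in Counter(case[key] for case in cases).items():
        # rng.sample raises ValueError if a stratum has too few candidates
        controls.extend(rng.sample(by_stratum[stratum], n_cases * ratio))
    return controls


def individual_match(cases: List[Person], candidates: List[Person], key: str,
                     ratio: int = 1, seed: int = 0) -> List[Tuple[Person, List[Person]]]:
    """Individual matching: each case gets its own controls with the same `key`."""
    rng = random.Random(seed)
    available = list(candidates)
    matched_sets = []
    for case in cases:
        eligible = [p for p in available if p[key] == case[key]]
        chosen = rng.sample(eligible, ratio)
        for person in chosen:
            available.remove(person)  # a control belongs to one matched set only
        matched_sets.append((case, chosen))
    return matched_sets


if __name__ == "__main__":
    groups = ["<40"] * 3 + ["40+"] * 5
    cases = [{"id": f"case{i}", "age_group": g} for i, g in enumerate(groups)]
    candidates = [{"id": f"ctrl{i}", "age_group": "<40" if i % 2 == 0 else "40+"}
                  for i in range(40)]

    controls = frequency_match(cases, candidates, "age_group", ratio=2)
    print(Counter(c["age_group"] for c in controls))  # 6 "<40" and 10 "40+"

    for case, chosen in individual_match(cases, candidates, "age_group", ratio=2):
        print(case["id"], [p["id"] for p in chosen])
```

The design choice carries into the analysis: individually matched sets have to be kept together (for example with the discordant-pair OR or conditional logistic regression), whereas frequency-matched data are usually analysed stratified by, or adjusted for, the matching factor.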

Matching has *some* advantages and *many* disadvantages. The decision whether to use a matched design must therefore be carefully considered, especially now that epidemiologists no longer perform calculations by hand and multivariate models such as logistic regression are available in many software packages. The greatest advantage of a matched design is that it ensures no stratum contains few or no observations, which increases the efficiency of the analysis: a smaller sample size is needed and each subject contributes more information.

Matching is often used for convenience, e.g. when it is difficult to obtain a random sample of the source population as controls. However, matching is often unnecessary, and it comes with many limitations and traps. If resources are available, a larger sample and an "a posteriori" stratified analysis may be easier to design and conduct, especially if we are confident that we can collect data on the main confounding variables. If we decide to match, we should make sure that the matching factor is a confounder, that we do not need to study that factor further (its effect on the outcome can no longer be estimated once cases and controls have been matched on it), and that identifying matched controls will be logistically feasible and easier than an unmatched selection of more controls.