## Advantages of matching

Matching is a useful method to optimize resources in a case control study.

Matching on a factor linked to other factors may automatically control for the confounding role of those factors (e.g. matching on neighborhood may control for socio-economic factors).

Matching allows a smaller sample size because the stratified analysis is prepared "a priori" (before the study, when cases and controls are selected), whereas an unmatched sample is stratified "a posteriori" and needs more subjects to reach the same precision.

Matching avoids a stratified analysis with too many strata, some of which may contain no cases or no controls, when several confounding factors have to be controlled at the same time. In an unmatched case control study, logistic regression, or even a simple stratified analysis, may end up with empty strata (no cases or no controls in some strata). Matching avoids this situation.
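The arithmetic behind empty strata can be sketched in a few lines. The confounders, their numbers of categories, and the number of cases below are hypothetical values chosen only for illustration.

```python
from math import prod

# Hypothetical number of categories for each confounder (illustrative only).
levels = {
    "age_group": 5,
    "sex": 2,
    "socioeconomic_status": 3,
    "occupation": 4,
}

# Cross-classifying all confounders gives one stratum per combination.
n_strata = prod(levels.values())

# Hypothetical number of cases in an unmatched study.
n_cases = 100

# Even if every case fell in a different stratum, this many strata
# would still contain no case at all.
min_strata_without_case = max(0, n_strata - n_cases)

print(f"strata: {n_strata}, strata with no case (at least): {min_strata_without_case}")
```

With these assumed values there are 120 strata for 100 cases, so at least 20 strata cannot contain a case, and in practice many more will be empty because cases cluster in some strata.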

## Disadvantages of matching

The efficiency in data analysis that matching provides is limited by several disadvantages.

The greatest disadvantage of matching is that the effect of the matching factor on the occurrence of the disease of interest can no longer be studied. *One should therefore limit matching to factors that are already known to be risk factors for the studied outcome.*

If statistical software with logistic regression is available, it is possible to *control* for many confounding factors during the analysis. *Preventing* confounding by matching at the design stage might therefore not be needed, especially if the study includes a large population and there is little chance of ending up with empty strata.

If matching is performed, it must also be taken into account in the statistical analysis: a matched OR needs to be calculated, and conditional logistic regression needs to be used.
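For 1:1 matching with a binary exposure, the matched OR is the ratio of the two kinds of discordant pairs (case exposed and control unexposed, divided by case unexposed and control exposed). The sketch below illustrates this with made-up pair counts; the function name `matched_pair_odds_ratio` and the data are hypothetical and only serve the example.

```python
from collections import Counter

def matched_pair_odds_ratio(pairs):
    """Matched OR for 1:1 matched pairs.

    pairs: iterable of (case_exposed, control_exposed) booleans,
    one tuple per matched case-control pair.
    """
    counts = Counter((bool(case), bool(control)) for case, control in pairs)
    case_only = counts[(True, False)]      # case exposed, control unexposed
    control_only = counts[(False, True)]   # case unexposed, control exposed
    if control_only == 0:
        raise ValueError("matched OR is undefined without such discordant pairs")
    # Concordant pairs carry no information and are ignored.
    return case_only / control_only

# Hypothetical data: 50 matched pairs (illustrative counts only).
pairs = (
    [(True, True)] * 10      # both exposed (concordant)
    + [(True, False)] * 20   # only the case exposed (discordant)
    + [(False, True)] * 5    # only the control exposed (discordant)
    + [(False, False)] * 15  # neither exposed (concordant)
)

matched_or = matched_pair_odds_ratio(pairs)

# Crude OR that wrongly ignores the pairing, for comparison.
a = sum(case for case, _ in pairs)            # exposed cases
c = sum(control for _, control in pairs)      # exposed controls
b = len(pairs) - a                            # unexposed cases
d = len(pairs) - c                            # unexposed controls
crude_or = (a * d) / (b * c)

print(f"matched OR = {matched_or}, unmatched (crude) OR = {crude_or}")
```

With these assumed counts the matched OR is 20/5 = 4.0, whereas breaking the pairs gives 3.5. For more than one control per case, or for additional covariates, conditional logistic regression plays the same role.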

Although its main effect cannot be estimated, the matching factor can still be studied as an effect modifier by doing a stratified analysis over several categories of the matching factor. For example, when matching on age, the analysis is still feasible within each age stratum created. However, using age categories different from those used for matching would require a multivariable analysis. Trying to identify a dose response involving a matching factor would also require a multivariable model.

Matching on criteria that are associated only with exposure, and not with outcome, biases the measurement of the effect. In this situation the matching factor is not a confounding factor, and matching brings the OR towards 1.

Another difficulty occurs when matching on several factors. Identifying and recruiting controls then becomes logistically difficult (in time and energy) because of the high number of matching factors (e.g. same age, sex, socio-economic status, occupation, etc.). Matching on several criteria may improve the efficiency of the statistical analysis with a reduced sample size, but the difficulty of recruiting controls may jeopardize that efficiency. It may also exclude cases for which no matched controls can be identified. In addition, matching on many criteria increases the risk of matching on exposure (therefore bringing the OR closer to 1). This is sometimes called *overmatching*.

One major challenge when matching is to properly define the various strata of the matching variable. For example, when frequency matching on age, we need to make sure that, within each of the age groups created, age is no longer a confounding factor. Confounding that persists within a stratum is sometimes called residual confounding. Several analyses with several widths of age strata may be tested. For example, suppose we stratify on age groups 20 years wide (0-19, 20-39, 40-59, 60-79, 80+). To assess whether age is still a confounder within one 20-year age group, we could further stratify it into five-year age groups and test whether the estimate changes. It may therefore still be important to take account of age as a potential confounder in a multivariable analysis.
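The check described above can be sketched by comparing the crude OR inside one wide age group with the Mantel-Haenszel OR pooled over its five-year subgroups. The counts below are hypothetical and chosen only to illustrate the comparison; a clear difference between the two estimates suggests residual confounding by age within the wide group.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    """
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled OR over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

# Hypothetical counts for the 20-39 age group, split into five-year subgroups.
# Each tuple is (exposed cases, unexposed cases, exposed controls, unexposed controls).
substrata = [
    (2, 10, 5, 50),    # 20-24
    (5, 10, 10, 40),   # 25-29
    (10, 10, 10, 20),  # 30-34
    (20, 5, 20, 10),   # 35-39
]

# Crude OR for the whole 20-39 group: collapse the four subgroup tables.
a, b, c, d = (sum(column) for column in zip(*substrata))
crude_or = odds_ratio(a, b, c, d)

# OR adjusted for age within the group, using the five-year subgroups.
adjusted_or = mantel_haenszel_or(substrata)

print(f"crude OR (20-39): {crude_or:.2f}")
print(f"Mantel-Haenszel OR over 5-year subgroups: {adjusted_or:.2f}")
```

In this made-up example the OR is 2 in every five-year subgroup, yet the crude OR for the 20-year group is about 2.8, so age still confounds the association inside the wide stratum.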