Performance of a diagnostic test

Draft provided by: Julia Fitzner and Alain Moren (Oct 2006)

To diagnose a specific disease in a patient, physicians use a strategy that combines several categories of information. This involves interpreting the results of interviews, clinical observation and examination, as well as those of a wide range of laboratory, radiological and histological tests, whose number and sophistication increase steadily.

One important aspect of using various tools to support a diagnosis is measuring the capacity of each tool to correctly predict the presence of the specific illness. In other words, we are interested in measuring the performance of each diagnostic tool.

The performance can be assessed by measuring four indicators: the sensitivity (Se), the specificity (Sp), the positive predictive value (PPV) and the negative predictive value (NPV) of the diagnostic tool.

Sensitivity

The sensitivity of a diagnostic tool measures its capacity to correctly identify those patients who have the disease. Sensitivity corresponds to the proportion of patients with the disease in whom the test is positive. The sensitivity of a test can only be measured among patients for whom the diagnosis has already been confirmed by means other than the test under study.

Let's suppose we want to study the sensitivity of a new diagnostic test for disease X. We include in the study 100 patients whose diagnosis was confirmed by another method.

We apply the new test to each of them and count the positive results.

                   Disease present
___________________________
Test positive             90
Test negative             10
___________________________
Total                    100

Those who test positive are called the true positives (TP). Those who test negative are called the false negatives (FN), since they have the disease and the test failed to identify them as having it.

In the above example the sensitivity of the test is 90 / 100 = 90% since the test correctly identified 90% of those with the disease.

Specificity

The second criterion to measure the performance of a test is its capacity to correctly identify those people who do not have the disease. The specificity of a test is the proportion of people without the disease who are correctly identified by the test as not having the disease. In order to measure the specificity of a test, we apply it to a series of persons for whom we have already verified that they do not have the disease.

For disease X we would select 100 persons free of the disease and apply the test to them. Absence of disease would have been confirmed by means other than the test under study. We would then measure the proportion who test negative.

                   Disease absent
___________________________
Test positive             15
Test negative             85
___________________________
Total                    100

Those who test negative are called the true negatives (TN) and those who test positive are called the false positives (FP), since they tested positive without having the disease. In the above example the specificity (Sp) is 85 / 100 = 85%.
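The two calculations above can be written as a short Python sketch. The function names are illustrative; the counts are those of the two disease X examples.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of diseased persons who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of non-diseased persons who test negative."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the two disease X tables above.
se = sensitivity(true_pos=90, false_neg=10)
sp = specificity(true_neg=85, false_pos=15)
print(f"Se = {se:.0%}, Sp = {sp:.0%}")
```

Note that each indicator uses only one column of the table: Se is computed among the diseased, Sp among the non-diseased.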

Choice of a cutoff value

Most of the tests used to support diagnostic procedures (particularly laboratory tests) are not based on a dichotomous measurement. Results frequently correspond to continuous variables (e.g. glycemia expressed in mmol/l, optical density, etc.). In such situations we usually set a cutoff value above which (or below which) the test is considered positive.

With an ideal test the distributions of results for people with and without the disease would not overlap. This is illustrated in figure 1. In such a situation a cutoff value at 11 would perfectly discriminate between the two distributions. This ideal situation is in fact very rare. Most likely we would face the situation illustrated in figure 2, in which test values overlap between those with and without the disease.

In such a situation, defining the most appropriate cutoff value for deciding if the test is positive or not is crucial. It is important to fix a cutoff value which will offer the best compromise to reduce false negative and false positive results, i.e. a compromise between the sensitivity and the specificity of the test.

In practice the choice of a cutoff value will depend upon the severity of the disease and upon the consequences of misclassification. Assuming that high values indicate disease, the lower the cutoff value, the higher the number of TP but also the higher the number of FP. Conversely, the higher the cutoff, the higher the number of TN and of FN. Moving the cutoff therefore increases either Se or Sp at the expense of the other. It is a trade-off.
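The trade-off can be made concrete with a small Python sketch. The measurements below are hypothetical values invented for illustration (arbitrary units), and the sketch assumes that a result at or above the cutoff counts as positive.

```python
# Hypothetical measurements (arbitrary units); higher values suggest disease.
diseased = [9, 10, 11, 12, 13, 14, 15, 16]
healthy = [5, 6, 7, 8, 9, 10, 11, 12]


def classify(cutoff):
    """Return (TP, FN, FP, TN) when values >= cutoff count as positive."""
    tp = sum(v >= cutoff for v in diseased)
    fn = len(diseased) - tp
    fp = sum(v >= cutoff for v in healthy)
    tn = len(healthy) - fp
    return tp, fn, fp, tn


for cutoff in (8, 10, 12, 14):
    tp, fn, fp, tn = classify(cutoff)
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    print(f"cutoff {cutoff:>2}: TP={tp} FN={fn} FP={fp} TN={tn}  Se={se:.2f}  Sp={sp:.2f}")
```

With these invented data, raising the cutoff from 8 to 14 lowers Se from 1.00 to about 0.38 while Sp rises from about 0.38 to 1.00.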

This is illustrated by the following series of four graphs, in which various cutoff values are shown with the resulting values for TP and FP. The graphs are taken from http://www.anaesthetist.com/mnm/stats/roc/, which we recommend you visit since it lets you visualise the concept through an animated example.

In the example the fraction of TP (TPF, or sensitivity) and the fraction of FP (FPF, or 1 - specificity) are shown. The curve in the box shows the value of Se (TPF) and 1 - Sp (FPF) for various cutoff values. The curve described by the relation between Se and 1 - Sp is called a "receiver operating characteristic" curve (ROC curve). ROC curves were developed in the 1950s as a by-product of research into making sense of radio signals contaminated by noise.

What matters is the relationship between TP and FP. Where should we put the cutoff point for diagnosing a disease? The answer is not simple. There are many possible criteria on which to base a decision. These include:

• Financial costs both direct and indirect of treating a disease (present or not), and of failing to treat a disease;
• Costs of further investigation (if appropriate);
• Side effects or complications caused by disease treatment, or failure to treat;
• Mortality / severity associated with treatment or non-treatment.

The area under the ROC curve summarises the overall capacity of a test to discriminate between diseased and non-diseased persons: the larger the area, the better the test. For a given ROC curve, the best mathematical compromise between Se and Sp corresponds to the point located closest to the upper left corner of the graph.
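As a rough sketch of these two ideas, the Python code below builds the ROC points for a set of hypothetical measurements, estimates the area under the curve with the trapezoidal rule, and picks the cutoff whose point lies closest to the upper left corner. The data, the function names and the "value >= cutoff is positive" convention are assumptions made for the illustration.

```python
import math

# Hypothetical measurements (arbitrary units); higher values suggest disease.
diseased = [9, 10, 11, 12, 13, 14, 15, 16]
healthy = [5, 6, 7, 8, 9, 10, 11, 12]


def roc_points(diseased, healthy):
    """Return (cutoff, FPF, TPF) for every candidate cutoff."""
    cutoffs = sorted(set(diseased) | set(healthy))
    cutoffs.append(max(cutoffs) + 1)  # cutoff above every value: nobody tests positive
    points = []
    for c in cutoffs:
        tpf = sum(v >= c for v in diseased) / len(diseased)  # Se
        fpf = sum(v >= c for v in healthy) / len(healthy)    # 1 - Sp
        points.append((c, fpf, tpf))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    xy = sorted((fpf, tpf) for _, fpf, tpf in points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


points = roc_points(diseased, healthy)
# Distance from each ROC point to the upper left corner (FPF = 0, TPF = 1).
best = min(points, key=lambda p: math.hypot(p[1], 1 - p[2]))
print(f"AUC = {auc(points):.3f}")
print(f"Cutoff closest to the upper left corner: {best[0]}")
```

The distance to the corner is only one possible mathematical criterion; it gives equal weight to false positives and false negatives, which is rarely appropriate in practice.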

However, considerations other than the mathematical compromise also apply.

• When doing prenatal screening for congenital toxoplasmosis, false positive results have heavy consequences (interruption of pregnancy, treatment). Specificity should therefore be high to avoid false positive results.
• When screening for PKU (Guthrie test) at birth, a false negative test would miss the disease, whereas a false positive result would only lead to unnecessary preventive measures that can be stopped later on. The aim here is a high sensitivity.
• Blood screening for malaria requires a highly sensitive test in order to avoid false negatives, even if some blood is discarded because of false positives.

The following series of graphs illustrates the difficulty of choosing an appropriate cutoff value according to the degree of overlap between the distributions of the measured value in the diseased and non-diseased populations.

From the above we can see that the more the distributions overlap, the smaller the area under the ROC curve (the ROC curve becomes the diagonal when the two distributions fully overlap).

Predictive values

We perform a diagnostic test because we do not know the diagnosis. The real questions a physician wants to answer are:

"What proportion of the patients I have tested as positive really have the disease?"

and

"What proportion of the patients the test identifies as negative do not have the disease?"

These questions are answered by the positive (PPV) and negative (NPV) predictive values of the test.

Positive predictive value (PPV)

The PPV is the proportion of positive tests that correspond to true disease. It is the number of TP divided by the number of all positive tests. The higher the PPV, the higher our capacity to confirm that the disease is present. For a given prevalence, the PPV is high when the specificity is high.

                     Disease present     Disease absent
______________________________________________________
Test positive              TP                  FP
Test negative              FN                  TN
______________________________________________________
Total                    TP + FN             FP + TN

PPV = TP / (TP + FP)

Alternatively PPV can be computed as:

PPV = (Se x Pr) / (Se x Pr + (1 - Sp) x (1 - Pr))

in which Pr = prevalence.
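The following Python sketch checks that the two ways of computing the PPV agree. As an assumption made for the illustration, it pools the two disease X study groups into a single 2x2 table, which gives a prevalence of 50% in the pooled group; the function names are illustrative.

```python
import math


def ppv_from_counts(tp: int, fp: int) -> float:
    """PPV as true positives divided by all positive tests."""
    return tp / (tp + fp)


def ppv_from_prevalence(se: float, sp: float, pr: float) -> float:
    """PPV from sensitivity, specificity and prevalence."""
    return se * pr / (se * pr + (1 - sp) * (1 - pr))


# Assumed 2x2 table: the two disease X study groups pooled together.
tp, fn, fp, tn = 90, 10, 15, 85
se = tp / (tp + fn)
sp = tn / (tn + fp)
pr = (tp + fn) / (tp + fn + fp + tn)  # 50% by construction of the pooled group

direct = ppv_from_counts(tp, fp)
via_prevalence = ppv_from_prevalence(se, sp, pr)
print(f"PPV from counts: {direct:.3f}, from Se, Sp and Pr: {via_prevalence:.3f}")
assert math.isclose(direct, via_prevalence)
```

The second form is the useful one in practice, because it lets us apply Se and Sp measured in a study to a population with a different prevalence.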

Negative predictive value (NPV)

The NPV is the proportion of negative tests that correspond to true absence of disease. It is the number of TN divided by the number of all negative tests. The higher the NPV, the higher our capacity to confirm that the disease is absent. For a given prevalence, the NPV is high when the sensitivity is high.

NPV = TN / (TN + FN)

Alternatively NPV can be computed as:

NPV = (Sp x (1 - Pr)) / (Sp x (1 - Pr) + (1 - Se) x Pr)
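The same check can be sketched for the NPV in Python. Again, pooling the two disease X study groups into one table (prevalence 50%) is an assumption made for the illustration, and the function names are illustrative.

```python
import math


def npv_from_counts(tn: int, fn: int) -> float:
    """NPV as true negatives divided by all negative tests."""
    return tn / (tn + fn)


def npv_from_prevalence(se: float, sp: float, pr: float) -> float:
    """NPV from sensitivity, specificity and prevalence."""
    return sp * (1 - pr) / (sp * (1 - pr) + (1 - se) * pr)


# Assumed 2x2 table: the two disease X study groups pooled together.
tp, fn, fp, tn = 90, 10, 15, 85
se = tp / (tp + fn)
sp = tn / (tn + fp)
pr = (tp + fn) / (tp + fn + fp + tn)  # 50% by construction of the pooled group

direct = npv_from_counts(tn, fn)
via_prevalence = npv_from_prevalence(se, sp, pr)
print(f"NPV from counts: {direct:.3f}, from Se, Sp and Pr: {via_prevalence:.3f}")
assert math.isclose(direct, via_prevalence)
```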

Predictive values and prevalence

We have seen that predictive values are dependent upon Se (for NPV) and Sp (for PPV). Those values also depend upon the prevalence of the disease in the population within which we are using the test.

The following examples illustrate how the PPV and NPV of the same test (same Se and Sp) are modified by the prevalence of the disease in two populations.

Example with high prevalence

                     Disease present     Disease absent
______________________________________________________
Test positive
Test negative
______________________________________________________
Total

PPV =

Example with low prevalence

                     Disease present     Disease absent
______________________________________________________
Test positive
Test negative
______________________________________________________
Total

PPV =
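The two tables above can be filled in for any chosen prevalence. As a sketch, the Python code below computes the expected cell counts for a hypothetical population of 10,000 persons tested with Se = 90% and Sp = 85% (the values from the disease X examples) at two assumed prevalences, 30% and 1%. The population size, the prevalences and the function name are illustrative assumptions, not figures from the original examples.

```python
def expected_table(se, sp, prevalence, n):
    """Expected (TP, FN, FP, TN) counts in a population of n persons."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = se * diseased
    fn = diseased - tp
    tn = sp * healthy
    fp = healthy - tn
    return tp, fn, fp, tn


# Assumed inputs: Se and Sp from the disease X examples, prevalences chosen for illustration.
for label, prevalence in (("high prevalence", 0.30), ("low prevalence", 0.01)):
    tp, fn, fp, tn = expected_table(se=0.90, sp=0.85, prevalence=prevalence, n=10_000)
    print(f"Example with {label} ({prevalence:.0%})")
    print("                 Disease present  Disease absent")
    print(f"  Test positive  {tp:15.0f}  {fp:14.0f}")
    print(f"  Test negative  {fn:15.0f}  {tn:14.0f}")
    print(f"  PPV = {tp / (tp + fp):.1%}, NPV = {tn / (tn + fn):.1%}")
```

With these assumed inputs the PPV falls from about 72% to about 6% when the prevalence drops from 30% to 1%, while the NPV rises from about 95% to nearly 100%.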

As suggested above, the performance of a test, once used for screening in a population, does not depend only on its characteristics (Se and Sp) but also on the prevalence of the disease in the population. The numbers of FP and FN vary according to disease prevalence.

If Se and Sp are kept constant, the PPV increases and the NPV decreases with increasing prevalence.

If the prevalence is low, even a test with a good Se and Sp will have a low PPV. Even though only a small proportion of non-diseased persons test positive, these false positives can represent the majority of the positive tests. On the other hand the NPV will be high, because false negatives will represent only a very small proportion of all negative results.

The following graph shows the variation of PPV and NPV with prevalence (Se and Sp both equal to 0.8).

The following graph illustrates the change in predictive values with prevalence for various values of Se and Sp (0.7, 0.8, 0.9, 0.95).

The PPV of a test depends upon prevalence and specificity.

The NPV of a test depends upon prevalence and sensitivity.
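The relationship shown in the graphs can be tabulated with a few lines of Python. This is a minimal sketch using Se = Sp = 0.8 as in the first graph; the grid of prevalences and the function name are assumptions chosen for illustration.

```python
def predictive_values(se, sp, pr):
    """Return (PPV, NPV) for a given sensitivity, specificity and prevalence."""
    ppv = se * pr / (se * pr + (1 - sp) * (1 - pr))
    npv = sp * (1 - pr) / (sp * (1 - pr) + (1 - se) * pr)
    return ppv, npv


prevalences = [0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90]  # illustrative grid
rows = [(pr, *predictive_values(0.8, 0.8, pr)) for pr in prevalences]
for pr, ppv, npv in rows:
    print(f"Pr = {pr:4.0%}   PPV = {ppv:.3f}   NPV = {npv:.3f}")
```

Changing the two 0.8 arguments to 0.7, 0.9 or 0.95 reproduces the comparison made in the second graph.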

References

www.anaesthetist.com/mnm/stats/roc/

Dabis F., Drucker J, Moren A. Epidémiologie d'intervention, Arnette, 1992.

Ancelle T. Statistique épidémiologique. Maloine. 2002.