Inter-rater reliability

Inter-rater reliability, or concordance, is the degree of agreement among raters judging the same test result. Calculating inter-rater reliability is useful when refining the user-friendliness of diagnostic tests: a high-quality test that is difficult to perform, and therefore prone to interpretation errors, performs in practice only as well as a test with lower intrinsic performance that is easier to administer. Testing for inter-rater reliability can also help determine whether a particular scale is appropriate for measuring a particular variable. If raters do not agree, either the scale is defective, the test is too difficult to administer correctly, or the raters need to be retrained.

There are a number of statistics that can be used to determine inter-rater reliability, and different statistics are appropriate for different types of measurement. Common options are the joint probability of agreement, Cohen's kappa and the related Fleiss' kappa, inter-rater correlation, the concordance correlation coefficient and the intra-class correlation (1).
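As an illustration of the first two options, the sketch below computes the joint probability of agreement and Cohen's kappa for two raters classifying the same ten test results. The ratings are invented example data, not taken from any particular study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical ratings of ten test results by two raters ("pos"/"neg").
rater_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "neg"]
rater_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "neg"]

joint_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"Joint probability of agreement: {joint_agreement:.2f}")                 # 0.80
print(f"Cohen's kappa:                  {cohens_kappa(rater_a, rater_b):.2f}")  # 0.60
```

The two raters here agree on 8 of 10 results, but because half of that agreement would be expected by chance alone, kappa (0.60) is noticeably lower than the raw agreement (0.80).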

Test-retest reliability

Test-retest is a statistical method used to determine a test's reliability, here understood as the variation in measurements taken by a single person on the same patient under the same conditions. A test may be said to be repeatable when this variation is smaller than some agreed limit. Repeatability conditions include: the same test procedure, the same observer, the same test used under the same conditions, the same location, and repetition over a short period of time (2).
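As a rough sketch of how this might be quantified, the example below uses invented paired scores from a single observer and computes two common summaries: the Pearson correlation between the two measurement occasions and a repeatability coefficient (1.96 times the standard deviation of the paired differences). The data and the choice of summaries are illustrative assumptions, not part of the cited definition.

```python
import statistics

# Hypothetical scores from the same observer measuring the same eight
# patients twice, under the same conditions, a short time apart.
first  = [12.1, 14.3, 9.8, 11.5, 13.0, 10.2, 12.7, 15.1]
second = [12.4, 14.0, 10.1, 11.2, 13.3, 10.0, 12.5, 15.4]

# Pearson correlation between the two occasions (requires Python 3.10+).
retest_correlation = statistics.correlation(first, second)

# Repeatability coefficient: 1.96 times the standard deviation of the
# paired differences; about 95% of repeat measurements on the same
# patient are expected to differ by less than this amount.
differences = [a - b for a, b in zip(first, second)]
repeatability = 1.96 * statistics.stdev(differences)

print(f"Test-retest correlation:   {retest_correlation:.3f}")
print(f"Repeatability coefficient: {repeatability:.3f}")
```

In practice the repeatability coefficient would be compared against the agreed limit mentioned above: if it is smaller, the test is considered repeatable under these conditions.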

 


1. Wikipedia; http://en.wikipedia.org/wiki/Inter-rater_reliability

2. Wikipedia; http://en.wikipedia.org/wiki/Repeatability