Inter-rater reliability, or concordance, is the degree of agreement among raters of a test result. Calculating inter-rater reliability is useful in refining the user-friendliness of diagnostic tests. A high-quality test that is difficult to perform, and therefore prone to interpretation errors, may in practice be no better than a lower-performing test that is easier to administer correctly. Testing for inter-rater reliability can help to determine whether a particular scale is appropriate for measuring a particular variable. If raters do not agree, either the scale is defective or too difficult to administer correctly, or the raters need to be re-trained.
A number of statistics can be calculated to determine the inter-rater reliability of a test, and different statistics are appropriate for different types of measurement. Two options for categorical ratings are Cohen's kappa, for two raters, and the related Fleiss' kappa, which extends to more than two raters.
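As an illustration of how such a statistic is computed, the following is a minimal sketch of Cohen's kappa for two raters who assign categorical labels to the same items. It uses the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's label frequencies. The function name `cohens_kappa` and the example ratings are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of items")

    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category for every item;
        # kappa is undefined here, so this sketch reports full agreement.
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten test results by two raters.
ratings_a = ["pos", "pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]
ratings_b = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos"]

kappa = cohens_kappa(ratings_a, ratings_b)
print(round(kappa, 2))
```

In this made-up example the raters agree on 8 of 10 items (p_o = 0.8) and each uses both labels equally often (p_e = 0.5), so kappa works out to 0.6. A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.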
Test-retest is a statistical method used to determine a test's reliability. Test-retest reliability, or repeatability, is the variation in measurements taken by a single person on the same patient and under the same conditions. A test may be said to be repeatable when this variation is smaller than an agreed limit. Repeatability conditions include: the same test procedure, the same observer, the same test instrument used under the same conditions, the same location, and repetition over a short period of time.
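The repeatability check described above can be sketched as follows. The text does not say how the variation should be quantified, so this example assumes the pooled within-subject standard deviation as the measure of variation. The function names, the example measurements and the agreed limit of 2.0 are all hypothetical.

```python
from math import sqrt
from statistics import variance


def within_subject_sd(repeats_by_subject):
    """Pooled within-subject SD from repeated measurements on each subject.

    Assumes every subject was measured at least twice by the same observer
    under the same conditions (illustrative sketch).
    """
    if not repeats_by_subject:
        raise ValueError("at least one subject is required")

    sum_of_squares = 0.0
    degrees_of_freedom = 0
    for values in repeats_by_subject:
        if len(values) < 2:
            raise ValueError("each subject needs at least two measurements")
        # Sample variance times (k - 1) gives that subject's sum of squared
        # deviations from their own mean.
        sum_of_squares += variance(values) * (len(values) - 1)
        degrees_of_freedom += len(values) - 1

    return sqrt(sum_of_squares / degrees_of_freedom)


def is_repeatable(repeats_by_subject, agreed_limit):
    """True when the within-subject variation is smaller than the agreed limit."""
    return within_subject_sd(repeats_by_subject) < agreed_limit


# Hypothetical repeated measurements: one inner list per patient.
measurements = [[10.0, 12.0], [20.0, 22.0], [15.0, 15.0]]

sd = within_subject_sd(measurements)
print(round(sd, 3), is_repeatable(measurements, agreed_limit=2.0))
```

Other summaries of variation could be substituted for the within-subject standard deviation. The point of the sketch is only the comparison of a measured variation against a limit agreed in advance.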
1. Wikipedia; http://en.wikipedia.org/wiki/Inter-rater_reliability
2. Wikipedia; http://en.wikipedia.org/wiki/Repeatability