# Comparisons for agreement

Let’s say we have only rank-order data from two or more evaluators (people, algorithms, etc.), and we want to determine whether the evaluators agree.

Agreement here means the results from the evaluators are concordant. Kendall’s coefficient of concordance is the non-parametric method typically used for 3 or more evaluators. For a comparison of two evaluators, consider Cohen’s kappa or Spearman’s rank correlation coefficient instead, as they are more appropriate.

As an example, let’s ask three people to rank order ten popular movies, with 1 being the least favorite and 10 the favorite of the list. Here’s the data from evaluators A, B, and C:

| Movie | A | B | C |
|-------|---|---|---|
| 1 | 1 | 7 | 6 |
| 2 | 5 | 6 | 4 |
| 3 | 6 | 2 | 8 |
| 4 | 7 | 5 | 5 |
| 5 | 10 | 9 | 10 |
| 6 | 4 | 3 | 1 |
| 7 | 8 | 1 | 3 |
| 8 | 3 | 10 | 9 |
| 9 | 9 | 4 | 7 |
| 10 | 2 | 8 | 2 |

If these three had perfect agreement, we wouldn’t need to evaluate whether they agreed. So the question is: do they agree well enough to conclude they tend to like the same movies?

## Compute Rᵢ

$\displaystyle {{R}_{i}}=\sum\limits_{j=1}^{m}{{{r}_{ij}}}$

where i indexes the items being ranked, in this case 1 through 10, and j indexes the evaluators. Basically, tally up the ranks given to each item by every evaluator.

m is the number of evaluators, in this case 3.

Therefore, we find

| i | Rᵢ |
|---|----|
| 1 | 14 |
| 2 | 15 |
| 3 | 16 |
| 4 | 17 |
| 5 | 29 |
| 6 | 8 |
| 7 | 12 |
| 8 | 22 |
| 9 | 20 |
| 10 | 12 |
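As a quick sketch in Python (not from the original article), the `ranks` list restates the table of rankings and the row sums reproduce the Rᵢ column:

```python
# Ranks from evaluators A, B, and C for movies 1..10 (one row per movie).
ranks = [
    (1, 7, 6),    # movie 1
    (5, 6, 4),    # movie 2
    (6, 2, 8),    # movie 3
    (7, 5, 5),    # movie 4
    (10, 9, 10),  # movie 5
    (4, 3, 1),    # movie 6
    (8, 1, 3),    # movie 7
    (3, 10, 9),   # movie 8
    (9, 4, 7),    # movie 9
    (2, 8, 2),    # movie 10
]

# R_i: sum of the ranks each movie received across the m evaluators.
R = [sum(row) for row in ranks]
print(R)  # → [14, 15, 16, 17, 29, 8, 12, 22, 20, 12]
```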

## Compute R̄

$\displaystyle \bar{R}=m(n+1)/2$

n is the number of items being ranked, in this case 10.

Therefore,

R̄ = 16.5.
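The arithmetic for R̄ is a one-liner; as a sketch with the example’s values:

```python
m, n = 3, 10  # number of evaluators and number of items, from the example
R_bar = m * (n + 1) / 2
print(R_bar)  # → 16.5
```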

## Compute S, sum of squared deviations

$\displaystyle S=\sum\limits_{i=1}^{n}{{{\left( {{R}_{i}}-\bar{R} \right)}^{2}}}$

S = 320.5

This is the sum of the squared differences between each movie’s rank total, Rᵢ, and the overall average rank, R̄.
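Sketching the same sum in Python, with the Rᵢ values restated from the table above:

```python
R = [14, 15, 16, 17, 29, 8, 12, 22, 20, 12]  # the R_i rank totals from above
R_bar = 16.5

# Sum of squared deviations of each rank total from the average rank total.
S = sum((Ri - R_bar) ** 2 for Ri in R)
print(S)  # → 320.5
```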

## Compute Kendall’s coefficient of concordance, W

W is determined with

$\displaystyle W=\frac{12S}{{{m}^{2}}\left( {{n}^{3}}-n \right)}$

W will be between zero and one. Values close to zero imply no agreement and W values closer to one imply agreement.

Working out the example we find

W = 0.432
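A quick check of the formula in Python, using the example’s values:

```python
m, n, S = 3, 10, 320.5  # evaluators, items, and S from the example

# Kendall's coefficient of concordance.
W = 12 * S / (m**2 * (n**3 - n))
print(round(W, 3))  # → 0.432
```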

## Compute the test statistic

The test statistic, T.S., comes from the data and is compared to the critical value to determine if there is concordance or not.

$\displaystyle T.S.=\frac{12S}{mn\left( n+1 \right)}$

which is equivalent to T.S. = m(n − 1)W.

In the example, this works out to T.S. = 11.654.
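As a sketch in Python, computing the test statistic via the equivalent form T.S. = m(n − 1)W:

```python
m, n, S = 3, 10, 320.5  # values from the example

W = 12 * S / (m**2 * (n**3 - n))
TS = m * (n - 1) * W  # equivalent to 12*S / (m * n * (n + 1))
print(round(TS, 2))  # → 11.65
```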

## Compute critical value

With 10 items, n = 10, and since the degrees of freedom, df, is one less than n, we have df = n − 1 = 9.

We use a chi-squared (χ²) critical value with confidence 1 − α and df = n − 1.

For n > 7, use the χ² table; for n ≤ 7, use the exact critical values for W from a table in your statistics book (one example is Siegel, S. and Castellan, N.J. Jr., *Nonparametric Statistics for the Behavioral Sciences*, 1988, International Edition, McGraw-Hill, New York, ISBN 0-07-057357-3, Table T, “Critical values for Kendall coefficient of concordance W,” p. 365).

In our example, $\chi _{0.10,\,9}^{2}=14.684$.
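The table lookup can also be done numerically; a sketch assuming SciPy is available, using `chi2.ppf` (the inverse CDF) for the upper-tail critical value:

```python
from scipy.stats import chi2

alpha, df = 0.10, 9  # significance level and degrees of freedom from the example
crit = chi2.ppf(1 - alpha, df)  # upper-tail chi-squared critical value
print(round(crit, 3))  # → 14.684
```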

## Compare the test statistic to the critical value

Ho: The evaluators do not agree (no concordance).

Ha: The evaluators agree (concordance).

Reject Ho in favor of Ha when the T.S. is greater than the critical value.

In our example, the T.S. = 11.654 and the critical value is 14.684; the T.S. is less than the critical value, so we fail to reject Ho and conclude that while there appears to be some agreement, it is not sufficient to conclude the evaluators agree.
