# Paired-Comparison Hypothesis Tests

The hypothesis tests discussed previously (link to past posts) generally compared independent samples from two populations. The experiments might have explored design changes, different component vendors, or two groups of customers. Occasionally, though, the samples are related to each other, or come from the same population. Paired (or matched) data involves samples that are related in some meaningful way.

If we wanted to compare the diagnostic capability of two shops, for example, we could take the same set of bikes to both shops and ask each to inspect them and provide an estimate for repairs. The two shops inspect the same samples, so the samples are paired. Another example involves very similar samples that are separated during testing and exposed to different conditions. The idea is that each sample has a partner sample (or is the same sample) in the two sets of samples or measurements under consideration.

## Test Setup

The null hypothesis for a paired t-test is Ho: μd = Do, where μd is the mean of the paired differences and Do is the hypothesized difference (often zero).

A paired t-test is often a two-sided test, which looks for a mean difference that departs from Do in either direction. You can also test whether the mean difference is greater than or less than Do, where Do is zero or some other value. The three alternative hypotheses become:

μd > Do
μd < Do
μd ≠ Do

Note: we are assuming the differences are normally distributed. If the differences are not normally distributed, use the binomial (sign) test or the Wilcoxon signed-rank test instead.

Here d is the difference between the measurements or readings of a paired sample, d-bar is the average of the differences, and sd is the standard deviation of the differences. With n pairs, the degrees of freedom used to determine the critical value is df = n-1. For a two-sided test the critical value is tα/2,df, where (1 – α)100% is the confidence level and α is the type I error rate. We calculate the test statistic using

$\displaystyle t=\frac{\bar{d}-D_{o}}{s_{d}/\sqrt{n}}$
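The formula above can be sketched as a small Python function. This is a minimal illustration, not a library routine: the function name `paired_t_statistic` and its parameter names are assumptions made for this example, and the input numbers in the call are hypothetical values chosen only so the arithmetic is easy to follow.

```python
import math


def paired_t_statistic(d_bar, s_d, n, d0=0.0):
    """Return t = (d_bar - d0) / (s_d / sqrt(n)) for paired differences.

    d_bar: average of the paired differences
    s_d:   sample standard deviation of the differences
    n:     number of pairs
    d0:    hypothesized mean difference (Do in the text)
    """
    if n < 2:
        raise ValueError("need at least two pairs")
    if s_d <= 0:
        raise ValueError("standard deviation must be positive")
    standard_error = s_d / math.sqrt(n)
    return (d_bar - d0) / standard_error


# Hypothetical summary values: 0.5 / (1.0 / sqrt(16)) = 0.5 / 0.25
t_example = paired_t_statistic(d_bar=0.5, s_d=1.0, n=16)
print(t_example)  # 2.0
```

The function works from summary statistics, so it can be reused whenever d-bar, sd, and n are already known.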

For a (1-α)100% confidence level, the rejection regions for the three tests, in the same order as the alternative hypotheses above, become:

Reject Ho if t > tα,df
Reject Ho if t < -tα,df
Reject Ho if |t| > tα/2,df
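The three decision rules can be expressed as one small function. This is a sketch under stated assumptions: the name `reject_null` and the labels `"greater"`, `"less"`, and `"two-sided"` are choices made for this example, and `t_crit` is assumed to be the positive table value (tα,df for a one-sided test, tα/2,df for a two-sided test). Note that the lower-tailed rule compares t against the negative of the table value.

```python
def reject_null(t_stat, t_crit, alternative="two-sided"):
    """Return True if the test statistic falls in the rejection region.

    t_stat:      calculated test statistic
    t_crit:      positive critical value read from a t table
    alternative: "greater" (mu_d > Do), "less" (mu_d < Do),
                 or "two-sided" (mu_d != Do)
    """
    if t_crit <= 0:
        raise ValueError("t_crit must be the positive table value")
    if alternative == "greater":
        return t_stat > t_crit
    if alternative == "less":
        return t_stat < -t_crit
    if alternative == "two-sided":
        return abs(t_stat) > t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")
```

For example, `reject_null(-3.0, 2.776)` returns `True` for the two-sided test because |-3.0| exceeds 2.776, while the same statistic would not reject under the `"greater"` alternative.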

Let’s say we have two technicians measuring the diameter of bicycle fork tubes with calipers. We suspect the two technicians' measurement methods differ and want to learn whether the difference is significant. Therefore, we asked each technician to measure the diameter of the same five tubes. Because we are looking for any difference, this is a two-sided test with Do = 0, that is, Ho: μd = 0 against μd ≠ 0. The data follow:

| Sample | Technician A | Technician B | Difference (d) |
|---|---|---|---|
| 1 | 3.125 | 3.110 | 0.015 |
| 2 | 3.120 | 3.095 | 0.025 |
| 3 | 3.135 | 3.115 | 0.020 |
| 4 | 3.130 | 3.120 | 0.010 |
| 5 | 3.125 | 3.125 | 0.000 |

The average of the differences, d-bar, is 0.014 and the standard deviation of the differences is sd = 0.0096. The five pairs, n = 5, give df = n-1 = 5-1 = 4 degrees of freedom.
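As a check on these summary values, the arithmetic can be written out from the five differences in the data. The sample standard deviation here is assumed to use the n-1 divisor, which is consistent with the 0.0096 reported above.

$\displaystyle \bar{d}=\frac{0.015+0.025+0.020+0.010+0}{5}=\frac{0.070}{5}=0.014$

$\displaystyle s_{d}=\sqrt{\frac{\sum{(d_{i}-\bar{d})^{2}}}{n-1}}=\sqrt{\frac{0.00037}{4}}\approx 0.0096$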

For the two-sided test with α = 0.05 (a 95% confidence level), the critical value is t0.025,4 = 2.776.

Carrying the unrounded standard deviation (sd ≈ 0.009618) through the calculation, the test statistic is

$\displaystyle t=\frac{\bar{d}-D_{o}}{s_{d}/\sqrt{n}}=\frac{0.014-0}{0.009618/\sqrt{5}}\approx 3.255$

Since 3.255 is larger than 2.776, the test statistic falls in the rejection region and we reject the null hypothesis. There is convincing evidence that the two technicians do not arrive at the same results when measuring the same fork tubes.
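The whole example can be reproduced in a few lines of Python using only the standard library. This is a minimal sketch, not a definitive implementation: the variable names are assumptions made for illustration, and the critical value 2.776 is entered by hand from the t table rather than computed.

```python
import math
import statistics

# Tube diameters recorded by each technician for the same five tubes
tech_a = [3.125, 3.120, 3.135, 3.130, 3.125]
tech_b = [3.110, 3.095, 3.115, 3.120, 3.125]

D0 = 0.0        # hypothesized mean difference (Do)
T_CRIT = 2.776  # t(0.025, 4) from a t table, two-sided test at alpha = 0.05

# Paired differences, one per tube
diffs = [a - b for a, b in zip(tech_a, tech_b)]

n = len(diffs)
d_bar = statistics.mean(diffs)   # average of the differences
s_d = statistics.stdev(diffs)    # sample standard deviation (n - 1 divisor)

# Test statistic and two-sided decision
t_stat = (d_bar - D0) / (s_d / math.sqrt(n))
reject = abs(t_stat) > T_CRIT

print(f"n={n} d_bar={d_bar:.4f} s_d={s_d:.4f} t={t_stat:.3f} reject={reject}")
```

The script works from the raw measurements rather than rounded summary values, so small rounding differences from hand calculation are expected. If SciPy is available, `scipy.stats.ttest_rel(tech_a, tech_b)` should give the same statistic along with a two-sided p-value.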