Hypothesis Testing

Fisher’s Exact Test

This is how to do Fisher’s exact test. First, construct a 2x2 contingency table (this one is a perfect result for the lady tasting tea: all four milk-first cups identified correctly)

(C = matrix(c(4,0,0,4), 2, 2))
##      [,1] [,2]
## [1,]    4    0
## [2,]    0    4

Now run Fisher’s test on the table. The alternative = "greater" option means that we are computing the probability of getting an equal or even better table by random luck (i.e., one with larger counts on the diagonal of the table)

fisher.test(C, alternative = "greater")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  C
## p-value = 0.01429
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  2.003768      Inf
## sample estimates:
## odds ratio 
##        Inf

Notice this is the same as the probability of getting exactly this table by random luck, because there is no better table. There are “8 choose 4” possible guesses, so the probability of guessing all cups correctly is:

(p = 1 / choose(8, 4))
## [1] 0.01428571

Now try the case where she misclassifies two cups

(C = matrix(c(3,1,1,3), 2, 2))
##      [,1] [,2]
## [1,]    3    1
## [2,]    1    3
fisher.test(C, alternative = "greater")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  C
## p-value = 0.2429
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  0.3135693       Inf
## sample estimates:
## odds ratio 
##   6.408309

And now 4 misclassified cups

(C = matrix(c(2,2,2,2), 2, 2))
##      [,1] [,2]
## [1,]    2    2
## [2,]    2    2
fisher.test(C, alternative = "greater")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  C
## p-value = 0.7571
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  0.05093776        Inf
## sample estimates:
## odds ratio 
##          1

Now 6 misclassified cups. Notice the p-value will be just \(1 - p\), where \(p\) was the probability of a perfect answer computed above. This is because there is exactly one table that is worse (everything wrong)

(C = matrix(c(1,3,3,1), 2, 2))
##      [,1] [,2]
## [1,]    1    3
## [2,]    3    1
fisher.test(C, alternative = "greater")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  C
## p-value = 0.9857
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  0.00326534        Inf
## sample estimates:
## odds ratio 
##   0.156047

Everything wrong! Notice the probability of doing this well or better is 1.

(C = matrix(c(0,4,4,0), 2, 2))
##      [,1] [,2]
## [1,]    0    4
## [2,]    4    0
fisher.test(C, alternative = "greater")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  C
## p-value = 1
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##    0 Inf
## sample estimates:
## odds ratio 
##          0
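
As a cross-check (a verification sketch, not part of the original walkthrough), all five p-values above can be recovered from the hypergeometric null distribution of the number of correctly identified milk-first cups:

# P(k correct milk-first guesses), k = 0..4, when labeling 4 of the
# 8 cups (4 truly milk-first, 4 tea-first) as milk-first at random
probs = dhyper(0:4, 4, 4, 4)
round(probs, 4)
## [1] 0.0143 0.2286 0.5143 0.2286 0.0143

# Upper-tail sums: P(this many or more correct), matching the five p-values above
round(rev(cumsum(rev(probs))), 4)
## [1] 1.0000 0.9857 0.7571 0.2429 0.0143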

Testing the sample mean from a Gaussian RV

Let’s test the hypothesis that the Michelson-Morley speed-of-light measurements were on average higher than the true speed of light

Null hypothesis, \(H_0 : \mu =\) “true speed”

Alternative hypothesis, \(H_1 : \mu >\) “true speed”

True speed of light on the scale of the morley data, which records speed in km/s minus 299,000 km/s (the true value is 299,792.458 km/s)

trueSpeed = 792.458

Let’s test the sample mean from just the 4th run (this was the closest to correct)

x = morley$Speed[morley$Expt == 4] 
(sampleMean = mean(x) )
## [1] 820.5
sampleSigma = sd(x) 
n = length(x)

Here is the critical value for an \(\alpha = 0.05\) significance level

(criticalValue = trueSpeed + sampleSigma / sqrt(n) * qt(1 - 0.05, df = n - 1))
## [1] 815.6729

We reject the null hypothesis if the sampleMean is greater than this (it is)

(sampleMean > criticalValue)
## [1] TRUE

Use a one-sided t test to get a p-value

(tStat = (sampleMean - trueSpeed) / (sampleSigma / sqrt(n))) 
## [1] 2.088677
(pValue = 1 - pt(tStat, df = n - 1))
## [1] 0.02521578
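
As an aside, pt can return the upper tail directly, which avoids the “1 -” subtraction (and a loss of precision when p-values are very small):

pt(tStat, df = n - 1, lower.tail = FALSE)
## [1] 0.02521578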

We reject the null hypothesis if this p-value is < 0.05 (our significance level). Note that the final answer (reject or don’t reject) is the same as that of the critical value test

(pValue < 0.05)
## [1] TRUE

All of these steps can be verified with a single R command. (This is what you would use in practice if you did not want to do all of the individual steps)

t.test(x - trueSpeed, alternative = "greater")
## 
##  One Sample t-test
## 
## data:  x - trueSpeed
## t = 2.0887, df = 19, p-value = 0.02522
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  4.827144      Inf
## sample estimates:
## mean of x 
##    28.042
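
Equivalently (a small variation on the same command), t.test accepts the hypothesized mean through its mu argument, so we do not have to subtract trueSpeed ourselves:

t.test(x, mu = trueSpeed, alternative = "greater")

This reproduces the same t statistic, degrees of freedom, and p-value; the confidence interval is simply shifted up by trueSpeed.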

Testing a sample proportion from a Bernoulli RV

Let’s test the hypothesis that Obama was below 50% in Florida in 2012. We’ll use the last Rasmussen poll from 2012, which had him at 48%.
The sample size was 750.

sampleMean = 0.48 
n = 750 
sampleSigma = sqrt(0.48 * (1 - 0.48))  # sd of a Bernoulli(0.48) random variable

Null hypothesis, \(H_0 : p = 0.5\)

Alternative hypothesis, \(H_1 : p < 0.5\)

Let’s use the t test again on the sample mean. It is the same procedure as above (just note the sign change, because we are now testing the hypothesis that the proportion is less than 0.5)

(criticalValue = 0.5 - sampleSigma / sqrt(n) * qt(1 - 0.05, df = n - 1))
## [1] 0.4699561

The test is now whether the sample mean is less than the critical value

(sampleMean < criticalValue)
## [1] FALSE

Same p-value procedure as above, but now with the left tail (notice the “1 -” goes away)

(tStat = (sampleMean - 0.5) / (sampleSigma / sqrt(n))) 
## [1] -1.096323
(pValue = pt(tStat, df = n - 1)) 
## [1] 0.136645
(pValue < 0.05)
## [1] FALSE
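
Since each poll response is a Bernoulli trial, an exact binomial test can serve as a cross-check. This sketch assumes the 48% corresponds to a raw count of 0.48 * 750 = 360 supporters (the raw count is not reported, so this is a reconstruction):

# hypothetical raw count: 0.48 * 750 = 360
binom.test(360, 750, p = 0.5, alternative = "less")

This likewise fails to reject \(H_0\) at the 0.05 level.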

Comparing Two Distributions

Let’s look at the iris data again and test for differences between the species. First, reduce the data down to just two of the three species

iris2 = iris[iris$Species == "virginica" | iris$Species == "versicolor",]
iris2$Species = factor(iris2$Species, levels = c("versicolor", "virginica"))

Say we are interested in whether these two species have different sepal lengths. Our null hypothesis is

\(H_0\) : “average sepal lengths of the two species are equal”

Our alternative hypothesis is

\(H_1\) : “average sepal lengths are different”

First, let’s do some boxplots of the data

boxplot(Sepal.Length ~ Species, data = iris2, ylab = "Sepal Length")

Looks like there may be a difference, but there is some overlap in the boxplots

Set up the two lists of data

vers = iris2$Sepal.Length[iris2$Species == "versicolor"]
virg = iris2$Sepal.Length[iris2$Species == "virginica"]
n = length(vers)
m = length(virg)

Assuming equal variances, we start by computing the pooled variance \(s_p^2 = \frac{(n-1) s_x^2 + (m-1) s_y^2}{n + m - 2}\); the extra factor \(1/n + 1/m\) below makes sp the squared standard error of the difference in sample means

# pooled variance times (1/n + 1/m): squared standard error of the mean difference
sp = ((n - 1) * var(vers) + (m - 1) * var(virg)) / (n + m - 2) * (1/n + 1/m)

Now, we construct the t-statistic

(t = (mean(vers) - mean(virg)) / sqrt(sp))
## [1] -5.629165

Finally, we get a p-value by looking it up in the t distribution. Note this is a two-sided test because of our alternative hypothesis. In the two-sided test we have to calculate the probability of being “more extreme”: to the right of \(|t|\) and to the left of \(-|t|\)

(p = (1-pt(abs(t), df = m + n - 2)) + pt(-abs(t), df = m + n - 2))
## [1] 1.724856e-07
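
By the symmetry of the t distribution, the two tails are equal, so an equivalent one-liner is:

2 * pt(-abs(t), df = m + n - 2)
## [1] 1.724856e-07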

That is how to do this hypothesis test step by step. In practice it is easiest to use the built-in t.test function in R

t.test(vers, virg, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  vers and virg
## t = -5.6292, df = 98, p-value = 1.725e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.8818516 -0.4221484
## sample estimates:
## mean of x mean of y 
##     5.936     6.588
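
As an aside, omitting var.equal = TRUE gives R’s default for two samples, the Welch t-test, which does not assume the two species have equal variances (the conclusion here would be the same):

t.test(vers, virg)  # Welch two-sample t-test; df is adjusted for unequal variances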

Now let’s say that we had a hunch, before we started the experiment, that versicolor would have shorter sepals than virginica. Then we might have a one-sided alternative hypothesis,

\(H_1\) : (versicolor mean) < (virginica mean)

Now our p-value is

p = pt(t, df = m + n - 2)

Notice that the t statistic and the degrees of freedom don’t change, just the computation of the p-value. Since \(t\) is negative (in the direction of the alternative), this one-sided p-value is exactly half of the two-sided one.

Again, the simple t.test way looks like this:

t.test(vers, virg, var.equal = TRUE, alternative = "less")
## 
##  Two Sample t-test
## 
## data:  vers and virg
## t = -5.6292, df = 98, p-value = 8.624e-08
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.4596661
## sample estimates:
## mean of x mean of y 
##     5.936     6.588