## Let's look at the iris data again and test differences in the Species

## Let's reduce it down to just two of the three species
iris2 = iris[iris$Species == "virginica" | iris$Species == "versicolor",]
iris2$Species = factor(iris2$Species, levels = c("versicolor", "virginica"))

## Say we are interested in whether these two species have different sepal lengths
## Our null hypothesis is
## H0 : "average sepal lengths of the two species are equal"
## Our alternative hypothesis is
## H1 : "average sepal lengths are different"

## First, let's do some boxplots of the data
boxplot(Sepal.Length ~ Species, data = iris2, ylab = "Sepal Length")

## Looks like there may be a difference, but there is some overlap in the boxplots

## Set up the two vectors of data
vers = iris2$Sepal.Length[iris2$Species == "versicolor"]
virg = iris2$Sepal.Length[iris2$Species == "virginica"]
n = length(vers)
m = length(virg)

## Assuming equal variances, we start with computing the pooled variance
sp = ((n - 1) * var(vers) + (m - 1) * var(virg)) / (n + m - 2)

## Now, we construct the t-statistic. The denominator is the standard error of
## the difference in means, sqrt(sp * (1/n + 1/m))
t = (mean(vers) - mean(virg)) / sqrt(sp * (1/n + 1/m))

## Finally, we get a p-value by looking up in the t distribution.
## Note this is a two-sided test because of our alternative hypothesis.
## In the two-sided test we have to calculate the probability of being "more
## extreme" to the left of -|t| and to the right of |t|
p = pt(-abs(t), df = m + n - 2) + (1 - pt(abs(t), df = m + n - 2))

## This is how to do this hypothesis test step-by-step.
## In practice it is easiest to use the built-in "t.test" function in R
t.test(vers, virg, var.equal = TRUE)

## Now let's say that we had a hunch that versicolor would have shorter sepals
## than virginica before we started the experiment. Then we might have a
## one-sided alternative hypothesis,
## H1 : (versicolor mean) < (virginica mean)

## Now our p-value is the probability to the left of t only
p = pt(t, df = m + n - 2)

## Notice that the t statistic and the degrees of freedom don't change, just the
## computation of the p-value

## Again, the simple t.test way looks like this:
t.test(vers, virg, var.equal = TRUE, alternative = "less")
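
## A quick sanity check, as a sketch rather than part of the walkthrough itself:
## the hand-computed statistic and p-values should agree with what t.test
## reports. The names used here (x, y, nx, ny, dof, pooled_var, t_manual,
## p_two_sided, p_less, fit_two, fit_less) are introduced only for this
## illustration, and the data are re-extracted from iris so that this part
## runs on its own.
x = iris$Sepal.Length[iris$Species == "versicolor"]
y = iris$Sepal.Length[iris$Species == "virginica"]
nx = length(x)
ny = length(y)
dof = nx + ny - 2

## Pooled variance, then the t-statistic using the standard error of the
## difference in means
pooled_var = ((nx - 1) * var(x) + (ny - 1) * var(y)) / dof
t_manual = (mean(x) - mean(y)) / sqrt(pooled_var * (1/nx + 1/ny))

## Two-sided: both tails beyond |t|. By symmetry of the t distribution this is
## twice the left tail. One-sided ("less"): the left tail only.
p_two_sided = 2 * pt(-abs(t_manual), df = dof)
p_less = pt(t_manual, df = dof)

fit_two = t.test(x, y, var.equal = TRUE)
fit_less = t.test(x, y, var.equal = TRUE, alternative = "less")

## all.equal compares numbers up to a small numerical tolerance; each of these
## should print TRUE if the manual steps match t.test
all.equal(t_manual, unname(fit_two$statistic))
all.equal(p_two_sided, fit_two$p.value)
all.equal(p_less, fit_less$p.value)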