wiki:SOPs/anova

Note that ANOVA and post-hoc tests can be performed in Prism too.

One-way ANOVA

See the Prism help page for some general considerations.

Reading in data

  • Use one of the read.* functions (e.g., read.delim) or create an appropriate data frame by hand
# Input data from a tab-delimited text file of the format
# weight group
# 56   a
# 29   b
# ...
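# stringsAsFactors=TRUE makes "group" a factor (not the default in R >= 4.0); relevel() requires a factor
strains = read.delim("brain_weights.txt", header=TRUE, stringsAsFactors=TRUE)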

# Input data for 4 different groups by creating a dataframe by hand
a = c(56,60,44,53)
b = c(29,38,18,35)
c = c(11,25,7,18)
d = c(26,44,20,32)
strains.frame = data.frame(a, b, c, d)
# stack() converts to long format with columns "values" and "ind"; rename them
strains = stack(strains.frame)
colnames(strains) = c("weight", "group")

Creating an ANOVA table

  • Use the command anova (on an lm fit) or aov with summary. The first argument is a model formula: the dependent variable, followed by ~, and then the independent variable(s).
  • So if we want to set up a model where weight is a function of the group (i.e., the weight potentially depends on the group):
# Syntax 1
anova( lm(weight ~ group, data=strains) )

# Syntax 2
summary( aov(weight ~ group, data=strains) )
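
As a quick check, the sketch below rebuilds the hand-entered example data and confirms that both syntaxes report the same F statistic and p-value for the group effect. The object names (tab.lm, tab.aov, f.lm, and so on) are illustrative and not part of the original SOP.

```r
# Rebuild the hand-entered example data (same values as in "Reading in data")
strains = stack(data.frame(a = c(56,60,44,53), b = c(29,38,18,35),
                           c = c(11,25,7,18),  d = c(26,44,20,32)))
colnames(strains) = c("weight", "group")

# Syntax 1: ANOVA table from a linear model
tab.lm = anova( lm(weight ~ group, data=strains) )

# Syntax 2: summary() of an aov fit returns a list holding one table
tab.aov = summary( aov(weight ~ group, data=strains) )[[1]]

# Pull out the F statistic and p-value for the group effect (first row)
f.lm  = tab.lm[1, "F value"]
p.lm  = tab.lm[1, "Pr(>F)"]
f.aov = tab.aov[1, "F value"]
p.aov = tab.aov[1, "Pr(>F)"]

# Both syntaxes should agree
all.equal(f.lm, f.aov)
all.equal(p.lm, p.aov)
```

Extracting the values this way is handy when the p-value has to be reported or reused in a script instead of read off the printed table.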

Post-test: Comparing all pairs of means

Tukey

  • "Tukey's method is more conservative but may miss real differences too often" - Intuitive Biostatistics (p.259)
TukeyHSD( aov(weight ~ group, data=strains) )
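
TukeyHSD returns a list with one matrix per factor. Each row is one pairwise comparison, giving the difference in means, a confidence interval, and an adjusted p-value. A minimal sketch of listing the pairs below an assumed 0.05 cutoff, using the hand-entered example data (the names tukey and sig.pairs are illustrative):

```r
# Rebuild the hand-entered example data
strains = stack(data.frame(a = c(56,60,44,53), b = c(29,38,18,35),
                           c = c(11,25,7,18),  d = c(26,44,20,32)))
colnames(strains) = c("weight", "group")

tukey = TukeyHSD( aov(weight ~ group, data=strains) )

# One row per pair (e.g. "b-a"); columns are diff, lwr, upr, p adj
tukey$group

# Pairs whose adjusted p-value falls below the assumed 0.05 cutoff
sig.pairs = rownames(tukey$group)[ tukey$group[, "p adj"] < 0.05 ]
sig.pairs
```

Calling plot(tukey) draws the same confidence intervals; intervals that exclude zero correspond to pairs that differ at the chosen level.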

Dunnett

  • Useful if you want to compare a reference group to all other groups (instead of doing an all vs. all comparison)
  • The first level of the group factor ("a" in this example) is used as the reference group. To use a different reference, set it with the relevel() command (group must be a factor), like strains$group = relevel(strains$group, "b")
library(multcomp)
summary(glht(aov(weight ~ group, data=strains), linfct=mcp(group="Dunnett")))
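
The sketch below shows the releveling step on its own, assuming the group labels arrive as plain character strings (as they do from read.delim in R 4.0 or later unless stringsAsFactors=TRUE is given). The data are the hand-entered example values. The Dunnett call is only run if multcomp is installed.

```r
# Group labels as they might arrive from a text file (character, not factor)
strains = data.frame(weight = c(56,60,44,53, 29,38,18,35, 11,25,7,18, 26,44,20,32),
                     group  = rep(c("a","b","c","d"), each=4))

# relevel() only accepts factors, so convert first
strains$group = factor(strains$group)          # levels: a, b, c, d
strains$group = relevel(strains$group, "b")    # "b" is now the reference level
levels(strains$group)

# Dunnett's test against the new reference, if multcomp is available
if (requireNamespace("multcomp", quietly=TRUE)) {
  library(multcomp)
  dunnett = summary(glht(aov(weight ~ group, data=strains), linfct=mcp(group="Dunnett")))
  print(dunnett)
}
```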

Repeated-measures ANOVA

Repeated-measures ANOVA is typically needed if multiple measurements are made on the same sample (such as assaying a mouse's weight during a time course).
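
One common base-R way to set this up is aov with an Error() term naming the repeatedly measured unit. This is a sketch under stated assumptions, not necessarily the approach intended by this SOP: the data frame mice, its columns, and all values are invented for illustration.

```r
# Hypothetical long-format data: 4 mice, each weighed on 3 days (values invented)
mice = data.frame(
  mouse  = factor(rep(c("m1","m2","m3","m4"), times=3)),
  day    = factor(rep(c("d0","d7","d14"), each=4), levels=c("d0","d7","d14")),
  weight = c(20.1, 21.4, 19.8, 22.0,
             21.0, 22.9, 20.5, 23.1,
             22.4, 23.8, 21.9, 24.6)
)

# Error(mouse/day) tells aov that day is measured within each mouse,
# so the day effect is tested against within-mouse variation
rm.fit = aov(weight ~ day + Error(mouse/day), data=mice)
summary(rm.fit)
```

The data must be in long format (one row per mouse per time point), with both the subject identifier and the time point stored as factors.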

Two-way ANOVA

Two-way ANOVA should be used for experiments where two different factors are being studied (such as comparing different treatments of different genotypes of mice).

Reading in data, plotting, and summarizing

  • Use one of the read.* functions (e.g., read.delim) or create an appropriate data frame by hand
# Input data from a tab-delimited text file of the format
# weight treatment genotype
# 56   a   ko
# 29   b   wt
# 60   a   wt
# ...
strains = read.delim("brain_weights.txt",header=TRUE)

# Plot the data by group
boxplot(weight ~ paste(genotype, treatment), data=strains)
stripchart(weight ~ paste(genotype, treatment), data=strains, vertical=TRUE, method="jitter", jitter=0.4, pch=19, cex=2, add=TRUE)

# Summarize the data by group
tapply(strains$weight, paste(strains$genotype, strains$treatment), mean)

Creating an ANOVA Table

  • Use the command anova (on an lm fit) or aov with summary. The first argument is a model formula: the dependent variable, followed by ~, and then the independent variable(s).
  • So if we want to set up a model where weight is a function of the treatment and/or the genotype, with a potential interaction (i.e., the difference between treatments depends on the genotype), the typical analysis would look like
# Syntax 1
anova( lm(weight ~ treatment * genotype, data=strains) )
anova( lm(weight ~ genotype * treatment, data=strains) )

# Syntax 2
summary( aov(weight ~ treatment * genotype, data=strains) )
summary( aov(weight ~ genotype * treatment, data=strains) )

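Note that in an unbalanced design the p-value for each factor depends on the order of the factors in the above formulas, because anova and aov compute sequential (Type I) sums of squares. In a balanced design the order makes no difference.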
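
To see this effect, the sketch below fits both factor orders to a small, deliberately unbalanced data set and compares the sum of squares attributed to treatment. The data frame unbal and all of its values are invented for illustration; they do not come from brain_weights.txt.

```r
# Made-up unbalanced data: cell sizes are 3, 1, 1, 3 (values invented)
unbal = data.frame(
  weight    = c(50, 52, 54,  40,  45,  30, 32, 34),
  treatment = factor(c("a","a","a",  "a", "b", "b","b","b")),
  genotype  = factor(c("ko","ko","ko", "wt", "ko", "wt","wt","wt"))
)

# Same model, two factor orders
tab1 = anova( lm(weight ~ treatment * genotype, data=unbal) )
tab2 = anova( lm(weight ~ genotype * treatment, data=unbal) )

# Sum of squares credited to treatment when entered first vs. second
ss.first  = tab1["treatment", "Sum Sq"]
ss.second = tab2["treatment", "Sum Sq"]
c(first=ss.first, second=ss.second)
```

When treatment is entered second, it is only credited with the variation left over after genotype has been accounted for, which is why the two values differ here.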

Post-test: Comparing all pairs of means

As with one-way ANOVA, use TukeyHSD:

TukeyHSD( aov(weight ~ treatment * genotype, data=strains) )

If the experimental design is unbalanced (e.g., some groups have more replicates than others), we need a more complex model.