Performing and reporting statistical tests

A statistical analysis usually involves several steps, and the details depend on whether one is doing a single test (such as comparing two vectors of measurements) or many tests (as in differential expression analysis). Ideally, much of this is discussed before the experiment is performed. In brief:

  1. Understand the details of the experiment
  2. Look at the raw data
  3. Produce a draft figure showing the desired comparison
  4. Determine the statistical test(s)
  5. Organize the data
  6. Perform the statistical test(s)
  7. Do multiple hypothesis correction (if needed)
  8. Interpret the statistical test(s)
  9. Report the results and the detailed methods

In more detail:

  • Understand the details of the experiment
  1. What is the big picture? What is being studied? Why was the experiment performed?
  2. What is the experimental design? Is there biological and/or technical replication?
  3. What is the sex of each sample and/or cell line? (Relevant if they're not all the same.)
  4. Were the samples obtained in different batches? Or at different times (which can be thought of as batches)?
  5. Are there other characteristics/parameters of cells or animals that could be considered?
  6. Does the experiment have a paired design? Are some samples more similar to each other than to the other samples, implying some type of grouping?
  7. Are there potential quality control issues that we should know about?
  8. Are there any potential confounding problems (like batch being correlated with treatment)?
  9. For time courses, are measurements repeated on the same samples or on different samples?
  10. What measurements were taken?
  11. Is there any missing data? How should that be handled?
  12. Are there potential extreme data points? If so, are those valid measurements or more likely due to technical errors?
  13. What specific question is the statistical analysis meant to address? This can be tricky, as slightly different questions may require completely different tests.
  • Look at the raw data
  1. Make a scatterplot (or a similar plot) that shows all the data.
  2. For large-scale experiments, can correlations or PCA plots show similarity/relationships between all samples?
  3. What does a global look at the data tell you?
  4. Did the experiment work?
  5. How are the values distributed?
  6. Are there any obvious problems?
  7. Are there obvious batch effects?
  8. Are there any obvious "outliers" due to some technical (or biological) reason? Be very careful about removing these (and be prepared to report how/why they were removed). If at all in doubt, keep all the data.
  9. Could the data need some form of normalization or scaling to make factors more comparable?
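
As a rough illustration of the "look at the raw data" questions above, here is a minimal Python sketch that summarizes one group of values and flags points that deserve a closer look. The measurements are made up, and the 1.5 x IQR rule (Tukey's fences) is just one common convention chosen for the example, not a rule this page prescribes. Flagging is not the same as removing: a flagged point still needs a technical or biological explanation before anything is done with it.

```python
from statistics import median, quantiles

# Hypothetical measurements for one group; the last value looks suspicious.
values = [4.1, 3.9, 4.3, 4.0, 4.2, 9.8]

def flag_extreme_points(data, k=1.5):
    """Return (index, value) pairs lying outside Tukey's fences."""
    q1, _, q3 = quantiles(data, n=4)  # quartiles (default "exclusive" method)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(i, x) for i, x in enumerate(data) if x < lower or x > upper]

print("min / median / max:", min(values), median(values), max(values))
print("flagged for a closer look:", flag_extreme_points(values))
```

With very few points the quartiles themselves are unstable, so a plot of every value remains the more reliable check.
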
  • Produce a draft figure showing the desired comparison
  1. What comparisons should be made?
  2. Does it look, by eye, as if something is going on? It may not be worth going any further if summary measures show nothing interesting.
  • Determine the statistical test(s)
  1. What test best addresses the question the scientist is asking?
  2. Is there a standard test for this method?
  3. Do your data meet all the assumptions of the test? If your data need to come from a normal distribution, for example, you can run the Shapiro-Wilk test of normality (see shapiro.test() in R).
  4. Given the answers to the above questions, the best test may be the simplest one.
  5. GraphPad (makers of Prism) has a nice but brief presentation, "How to Choose the Right Statistical Test," that's a good place to start. Prism documentation includes lots of additional informative details for scientists who aren't experts in statistics.
  • Organize the data
  1. Are the data in a format that can be used by the statistical software? R, for example, has a concept of "wide" and "long" data formats.
  2. Does it make sense to scale, normalize, or transform (such as with logs) the data? This is often done so that the data better satisfy the assumptions (requirements) of the statistical test.
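
To make the "wide" versus "long" distinction concrete, here is a small Python sketch (standard library only) that reshapes a wide table into long records and adds a log2-transformed value. The gene names, the sample naming scheme (group_replicate), and the pseudocount of 1 are all assumptions made up for the example; the same reshaping is normally done with dedicated functions in R or another statistics package.

```python
import math

# Hypothetical "wide" table: one row per gene, one column per sample.
wide = {
    "geneA": {"ctrl_1": 100.0, "ctrl_2": 120.0, "treat_1": 400.0, "treat_2": 380.0},
    "geneB": {"ctrl_1": 50.0, "ctrl_2": 45.0, "treat_1": 55.0, "treat_2": 0.0},
}

def wide_to_long(table, pseudocount=1.0):
    """Return one record per (gene, sample) with a log2-transformed value."""
    records = []
    for gene, samples in table.items():
        for sample, value in samples.items():
            # Assumes sample names look like "<group>_<replicate>".
            group, replicate = sample.rsplit("_", 1)
            records.append({
                "gene": gene,
                "group": group,
                "replicate": int(replicate),
                "value": value,
                # The pseudocount avoids log2(0); its size is an assumption.
                "log2_value": math.log2(value + pseudocount),
            })
    return records

long_records = wide_to_long(wide)
for record in long_records[:3]:
    print(record)
```

Each long record carries its own group label, which is the layout most modeling functions expect.
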
  • Perform the statistical test(s)
  1. Run the analysis with a tool like R, Prism, or Excel.
  2. Capture all of the appropriate results and details of the test, often more than just the p-value.
  3. Does the software produce any warnings? If so, how should you react?
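
As one hedged example of performing a test and capturing more than the p-value, the Python sketch below runs an exact two-sided permutation test on the difference in means of two small groups. The permutation test is used here only because it needs nothing beyond the standard library; it is not the test this page recommends for any particular experiment, and the data and function name are invented for illustration.

```python
from itertools import combinations
from statistics import mean

# Hypothetical measurements for two independent groups.
control = [4.1, 3.9, 4.3, 4.0]
treated = [5.2, 5.6, 5.1, 5.4]

def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Returns (observed difference, p-value, number of splits examined).
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = mean(group_a) - mean(group_b)
    n_extreme = 0
    n_splits = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        # The small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            n_extreme += 1
        n_splits += 1
    return observed, n_extreme / n_splits, n_splits

observed, p_value, n_splits = permutation_test(control, treated)
print(f"difference in means = {observed:.3g}")
print(f"p = {p_value:.3g} from {n_splits} splits")
```

Full enumeration is only practical for small groups; for larger ones, a random subset of splits is the usual substitute. Note that the smallest p-value this test can produce is limited by the number of possible splits.
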
  • Do multiple hypothesis correction
  1. If more than one test is performed, you probably need multiple hypothesis correction.
  2. Which is better: Bonferroni, FDR, or another correction method?
  3. Be prepared to explain why this is necessary.
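
To show what the two corrections named above actually do to a set of p-values, here is a minimal Python sketch of Bonferroni adjustment and of Benjamini-Hochberg adjustment (one common way of controlling the FDR). The p-values are invented, and the function names are hypothetical; in practice a maintained implementation such as R's p.adjust() is the safer choice.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.20]
print("Bonferroni:", [round(p, 4) for p in bonferroni(raw)])
print("BH (FDR):  ", [round(p, 4) for p in benjamini_hochberg(raw)])
```

Bonferroni controls the chance of any false positive and is therefore stricter; Benjamini-Hochberg controls the expected fraction of false positives among the results called significant, which is usually the more useful target when thousands of tests are run.
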
  • Interpret the statistical test(s)
  1. What do the p-value and/or the confidence interval tell you?
  2. Does the statistical model of the data effectively describe the data? How valid is the p-value?
  3. Do the statistics imply that the result is biologically meaningful? (This is perhaps the hardest question.)
  • Report the results and the detailed methods
  1. What figure(s) best display the data and summarize the analysis? Unless there are lots of data points (> 10 or 100, for example), display them all (perhaps along with summary metrics).
  2. Describe and/or show the confidence interval, the effect size, and/or another measure of the magnitude of an effect/change (and not just the p-value).
  3. What can be concluded from the analysis? How generally does this apply?
  4. How many significant figures should be reported?
  5. Report the method with enough detail so that, with the original data, a knowledgeable person could replicate the statistical analysis.
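
As a sketch of reporting the magnitude of an effect rather than only a p-value, the Python example below computes the difference in means, Cohen's d, and a percentile bootstrap confidence interval for two small groups, then prints them to three significant figures. Everything here is an assumption made for illustration: the data are invented, the bootstrap is only one of several ways to get an interval (and is rough with so few points), and three significant figures is a placeholder, not a recommendation.

```python
import random
from statistics import mean, stdev

# Hypothetical measurements for two independent groups.
control = [4.1, 3.9, 4.3, 4.0]
treated = [5.2, 5.6, 5.1, 5.4]

def cohens_d(a, b):
    """Standardized mean difference (b minus a) using the pooled SD."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * stdev(a) ** 2 + (n_b - 1) * stdev(b) ** 2) / (n_a + n_b - 2)
    return (mean(b) - mean(a)) / pooled_var ** 0.5

def bootstrap_ci(a, b, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(b) - mean(a)."""
    rng = random.Random(seed)  # fixed seed so the report is reproducible
    diffs = sorted(
        mean(rng.choices(b, k=len(b))) - mean(rng.choices(a, k=len(a)))
        for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lower = diffs[int(tail * n_resamples)]
    upper = diffs[int((1 - tail) * n_resamples) - 1]
    return lower, upper

difference = mean(treated) - mean(control)
effect_size = cohens_d(control, treated)
ci_low, ci_high = bootstrap_ci(control, treated)
print(f"difference in means = {difference:.3g} "
      f"(95% bootstrap CI {ci_low:.3g} to {ci_high:.3g}), "
      f"Cohen's d = {effect_size:.3g}, n = {len(control)} vs {len(treated)}")
```

Recording the seed, the number of resamples, and the interval method alongside the result is part of what lets a knowledgeable person replicate the analysis from the original data.
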

Through all of this, resist any temptation to bias the analysis to produce the result one wants or expects.

Some journals explicitly state statistics guidelines
