wiki:SOPs/rna-seq-diff-expressions/DESeq

Version 2 (modified by thiruvil, 12 years ago) ( diff )

--

Using DESeq

  • For experiments with or without replication
  • See the DESeq home page for official documentation
  • Also see our Hot Topics RNA-Seq vignette (session 3, third part) for another discussion of the use of DESeq.
  • Package summary: "Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution"
  • From the DESeq publication: "DESeq owes its basic idea to edgeR, yet differs in several aspects."
  • Input is also a matrix of counts, with gene identifiers used as row names.
  • Sample code to use with replication:

 # Use package
 library(DESeq)
 # Read counts
 countsTable = read.delim("Gene_counts.txt")
 # Describe groups
 groups = c(rep("C",3), rep("T", 3))
 # Make a CountDataSet
 cds = newCountDataSet(countsTable, groups)
 # Estimate effective library size
 cds = estimateSizeFactors(cds)
 # Core assumption of the method: variance is a function of expression level
 # Estimate variance for each gene
 cds = estimateDispersions(cds)
 # Do stats based on a negative binomial distribution
 results = nbinomTest(cds, "T", "C")
 write.table(results, file="DESeq_output.txt", sep="\t", quote=F)

 # Data quality assessment by sample clustering and visualisation (optional)
 library(gplots)
 library("RColorBrewer")
 #re-estimate dispersion with method "blind" to ensure analysis is unbiased.
 cdsFullBlind=estimateDispersions(cds, method="blind")
 #vst needed for PCA and heatmap
 vsd=varianceStabilizingTransformation(cdsFullBlind)

 #do heatmap 
 dists=dist(t(exprs(vsd)))
 mat=as.matrix(dists)
 rownames(mat) = colnames(mat) = with(pData(cds), paste(groups, sep=" : "))
 hmcol = colorRampPalette(brewer.pal(9, "GnBu"))(100)
 heatmap.2(mat, trace="none", col = rev(hmcol), margin=c(13, 13))

 #do PCA
 print(plotPCA(vsd, intgroup=c("condition")))

  • Sample code to use without replication:
# Show that all samples are different
 conds = c("a", "b", "c", "d", "e", "f")
  • Then apply the same code as above except for estimateVarianceFunctions():
    • cds = estimateVarianceFunctions(cds, method='blind')
  • Make desired comparison
    • results = nbinomTest(cds, "a", "b")
Note: See TracWiki for help on using the wiki.