Changes between Version 40 and Version 41 of SOP/scRNA-seq

Timestamp: 05/04/21 12:22:37
Author: twhitfie
Comment: --

SOP/scRNA-seq

-              v40
+              v41
 Here is the [http://barcwiki.wi.mit.edu/wiki/SOP/scRNA-seq/Slingshot sample R code for use with Slingshot], which is the top ranked method for bifurcation.
+=== Detect genes that are differentially expressed between conditions ===
+Often one is interested in how a treatment or condition alters the gene expression profile of a selected cell type.  In the desirable case where multiple biological replicates are present for each condition, K. D. Zimmerman, M. A. Espeland and C. D. Langefeld have recently [https://www.nature.com/articles/s41467-021-21038-1 highlighted] the importance of properly accounting for the correlation among cells from the same sample in such hierarchically structured data.  One strategy, the so-called pseudobulk approach, is to aggregate counts across cells from the same biological sample or subject.  Another strategy is mixed-effects modeling, in which sample is treated as a random effect.  The code below uses mixed-effects modeling within [https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5 MAST] and has been adapted from [https://github.com/kdzimm/PseudoreplicationPaper K. D. Zimmerman et al.]
+   * Testing for differential expression with mixed-effects models in MAST:
+{{{
+require(MAST)
+require(dplyr)
+require(gdata)
+require(reshape2)
+# Assume that differential expression is being tested within a single
+# cluster/cell type from a single-cell RNA-seq dataset.  The counts matrix
+# is the starting point.  From a Seurat object, it can be extracted as follows:
+allgenesCT<-t(as.data.frame(clusterCells@assays$RNA@counts)) # Un-normalized
+allgenesCT<-as.data.frame(allgenesCT)
+ngenes<-ncol(allgenesCT)
+allgenesCT$Sample<-clusterCells@meta.data$Sample
+allgenesCT$Treat<-clusterCells@meta.data$Treat
+allgenesCT$wellKey<-paste(clusterCells@meta.data$Sample, rownames(clusterCells@meta.data), sep = "_")
+rownames(allgenesCT)<-allgenesCT$wellKey
+allgenesCT<-allgenesCT[,c(ngenes+3,ngenes+2,ngenes+1,1:ngenes)]
+genecountsRaw<-as.matrix(t(allgenesCT[,c(-1,-2,-3)]))
+genecounts<-log2(genecountsRaw + 1) # Log2 transform with a pseudocount of 1 (counts are not depth-normalized).
+coldata<-allgenesCT[,1:3]
+coldata$Sample<-as.factor(coldata$Sample)
+# Filter out genes expressed in <10% of cells
+genecounts<-genecounts[rowSums(genecounts>0)/(ncol(genecounts))>=0.1,]
+genecounts<-genecounts[,rownames(coldata)]
+fDataCT<-data.frame(primerid=rownames(genecounts))
+sca<-MAST::FromMatrix(exprsArray=genecounts, cData=coldata, fData=fDataCT)
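+# The cellular detection rate (CDR) is defined in G. Finak et al., Genome
+# Biology 16, 278 (2015) as the proportion of genes detected in each cell.
+# Here it is computed as the number of genes detected per cell and then
+# centered and scaled, which gives the same values as scaling the proportion.
+# CDR functions as a regularizer in the model.
+cdr2<-colSums(SummarizedExperiment::assay(sca)>0)
+# Store the centered and scaled CDR as the covariate "ngeneson".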
+SummarizedExperiment::colData(sca)$ngeneson<-scale(cdr2)
+# Assign predictors as factors.
+SummarizedExperiment::colData(sca)$Sample<-factor(SummarizedExperiment::colData(sca)$Sample)
+SummarizedExperiment::colData(sca)$Treat<-factor(SummarizedExperiment::colData(sca)$Treat)
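+# Sketch (an illustration added here, not part of the adapted code): R uses
+# the first factor level as the reference level, so the name of the treatment
+# coefficient depends on how the labels sort.  The toy labels "C" and "T" and
+# the variable toyTreat are assumptions made for this example only.
+toyTreat<-factor(c("T","C","T","C"))
+levels(toyTreat) # "C" sorts before "T", so "C" is the reference level.
+colnames(model.matrix(~ toyTreat)) # The non-reference level names the coefficient.
+# If the control label does not sort first, the reference level can be set
+# explicitly, for example with relevel(toyTreat, ref="C").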
+# In the model below, "Treat" is a categorical fixed effect, while "Sample"
+# is a categorical random effect with intercept varying by sample.
+options(mc.cores = 4) # The number of available cores is hardware-dependent.
+zlmCond<-zlm(~ Treat + ngeneson + (1 | Sample), sca, method='glmer', ebayes = FALSE, fitArgsD = list(nAGQ = 0), strictConvergence = FALSE, parallel=TRUE)
+# Tabulate statistics for treatment ("TreatT") versus control ("TreatC" is
+# the reference level of "Treat").
+summaryCond<-MAST::summary(zlmCond,doLRT=TRUE)
+summaryDt<-summaryCond$datatable
+fcHurdle<-merge(summaryDt[contrast=='TreatT' & component=='H',.(primerid, `Pr(>Chisq)`)], summaryDt[contrast=='TreatT' & component=='logFC', .(primerid, coef, ci.hi, ci.lo)], by='primerid')
+fcHurdle[,fdr:=p.adjust(`Pr(>Chisq)`, 'fdr')] # Correct for multiple testing.
+fcHurdle<-stats::na.omit(as.data.frame(fcHurdle)) # Omit NAs, if present.
+fcHurdle<-fcHurdle[order(fcHurdle$fdr),]
+# Examine the top differentially expressed genes, ranked by FDR.
+head(fcHurdle)
+}}}
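+   * Aggregating counts into pseudobulk profiles.  The pseudobulk strategy mentioned above can be sketched in base R as follows.  This is only a minimal illustration: the toy matrix toyCounts and the labels in toySample are hypothetical stand-ins for a dense raw counts matrix (genes x cells) and the per-cell sample labels.  The resulting genes x samples matrix could then be passed to a bulk RNA-seq differential expression tool of one's choice.
+{{{
+# Toy raw counts: 3 genes (rows) x 6 cells (columns).  Hypothetical data.
+toyCounts<-matrix(c(1,0,2, 0,1,3, 2,2,0, 4,0,1, 0,0,5, 1,3,1), nrow=3, dimnames=list(paste0("gene",1:3), paste0("cell",1:6)))
+# Biological sample of origin for each cell (hypothetical labels).
+toySample<-c("S1","S1","S2","S2","S3","S3")
+# rowsum() sums rows by group, so transpose to put cells in rows, sum the
+# cells belonging to each sample, and transpose back to genes x samples.
+pseudobulk<-t(rowsum(t(toyCounts), group=toySample))
+pseudobulk
+}}}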