Context Navigation

Changes between Version 41 and Version 42 of SOP/scRNA-seq

Timestamp:: 05/04/21 15:32:37 (4 years ago)
Author:: twhitfie
Comment:: --

Legend:

: Unmodified
: Added
: Removed
: Modified

SOP/scRNA-seq

-              v41
+              v42
 === Detect genes that are differentially expressed between conditions ===
 Often one is interested in how a treatment or condition alters the gene expression profile in a selected cell type.  In the desirable case where multiple biological replicates are present for each condition, K. D. Zimmerman, M. A. Espeland and Carl D. Langefeld have recently [https://www.nature.com/articles/s41467-021-21038-1 highlighted] the importance of properly taking account of the correlation present in such hierarchically structured data.  One strategy, the so-called pseudobulk approach, is to aggregate counts across cells from the same biological sample or subject.  Mixed-effects modeling, where sample is treated as a random effect, is another strategy.  The code below uses mixed effects modeling within [https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5 MAST] and has been adapted from [https://github.com/kdzimm/PseudoreplicationPaper K. D. Zimmerman et al.]
+Often one is interested in how a treatment or condition alters the gene expression profile in a selected cell type.  In the desirable case where multiple biological replicates are present for each condition, K. D. Zimmerman, M. A. Espeland and Carl D. Langefeld have recently [https://www.nature.com/articles/s41467-021-21038-1 highlighted] the importance of properly taking account of the correlation present in such hierarchically structured data.  One strategy, the so-called pseudobulk approach, is to aggregate counts across cells from the same biological sample or subject.  Mixed-effects modeling, where sample is treated as a random effect, is another strategy.  The code below uses mixed effects modeling within [https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5 MAST] and has been adapted from [https://github.com/kdzimm/PseudoreplicationPaper/blob/master/Type_1_Error/Type%201%20-%20MAST%20RE.Rmd K. D. Zimmerman et al.].  Note that the additional complexity and potential benefit of these mixed-effects models are accompanied by computational expense: fitting these models to thousands of genes in thousands of cells can be slow. A vignette outlining how to use MAST for differential expression in the more traditional fixed-effect mode (i.e. ''without'' including any random effects) can be found [https://www.bioconductor.org/packages/release/bioc/vignettes/MAST/inst/doc/MAITAnalysis.html | here].
    * Testing for differential expression with mixed-effects models in MAST:
 …
 SummarizedExperiment::colData(sca)$Treat<-factor(SummarizedExperiment::colData(sca)$Treat)
+# In the model below, "Treat" is a categorical fixed effect, while "Sample"
+# is a categorical random effect with intercept varying by sample.
+# In the model below, "Treat" is a categorical fixed effect, while "Sample" is a categorical random effect with intercept
+# varying by sample.
+#
+# A note on the nAGQ=0 argument for fitting the discrete component of MAST's hurdle model:
+#
+# Fitting the model involves optimizing an objective function, the Laplace approximation to the deviance, with respect to the
+# parameters. The Laplace approximation to the deviance requires determining the conditional modes of the random effects.
+# These are the values that maximize the conditional density of the random effects, given the model parameters and the data.
+# This is done using Penalized Iteratively Reweighted Least Squares (PIRLS). In most cases PIRLS is fast and stable. It is simply
+# a penalized version of the IRLS algorithm used in fitting GLMs.
+#
+# The distinction between the "fast" (nAGQ=0) and "slow" (nAGQ=1) algorithms in lme4 (to which glmer belongs)
+# is whether the fixed-effects parameters are optimized in PIRLS or a nonlinear optimizer.
 options(mc.cores = 4) # The number of available cores is hardware-dependent.
 zlmCond<-zlm(~ Treat + ngeneson + (1 | Sample), sca, method='glmer', ebayes = FALSE, fitArgsD = list(nAGQ = 0), strictConvergence = FALSE, parallel=TRUE)
 # Tabulate statistics for for treatment ("TreatT") versus control ("TreatC" is
+# Tabulate statistics for for treatment ("TreatT") versus control ("TreatC") is
 # the reference level of "Treat").
 summaryCond<-MAST::summary(zlmCond,doLRT=TRUE)