Changes between Version 43 and Version 44 of SOP/scRNA-seq


Timestamp:
05/04/21 15:46:43
Author:
twhitfie
Comment:

--

  • SOP/scRNA-seq

    v43 v44  
    212 212
    213 213 === Detect genes that are differentially expressed between conditions ===
    214 Often one is interested in how a treatment or condition alters the gene expression profile in a selected cell type.  In the desirable case where multiple biological replicates are present for each condition, K. D. Zimmerman, M. A. Espeland and Carl D. Langefeld have recently [https://www.nature.com/articles/s41467-021-21038-1 highlighted] the importance of properly taking account of the correlation present in such hierarchically structured data.  One strategy, the so-called pseudobulk approach, is to aggregate counts across cells from the same biological sample or subject.  Mixed-effects modeling, where sample is treated as a random effect, is another strategy.  The code below uses mixed effects modeling within [https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5 MAST] and has been adapted from [https://github.com/kdzimm/PseudoreplicationPaper/blob/master/Type_1_Error/Type%201%20-%20MAST%20RE.Rmd K. D. Zimmerman et al.].  Note that the additional complexity and potential benefit of these mixed-effects models are accompanied by increased computational expense: fitting these models to thousands of genes in thousands of cells can be slow. A vignette outlining how to use MAST for differential expression in the more traditional fixed-effect mode (i.e. ''without'' including any random effects) can be found [https://www.bioconductor.org/packages/release/bioc/vignettes/MAST/inst/doc/MAITAnalysis.html | here].
     214 Often one is interested in how a treatment or condition alters the gene expression profile in a selected cell type.  In the desirable case where multiple biological replicates are present for each condition, K. D. Zimmerman, M. A. Espeland and C. D. Langefeld have recently [https://www.nature.com/articles/s41467-021-21038-1 highlighted] the importance of properly taking account of the correlation present in such hierarchically structured data.  One strategy, the so-called pseudobulk approach, is to aggregate counts across cells from the same biological sample or subject (for examples of how to implement pseudobulk models see this [https://biocellgen-public.svi.edu.au/mig_2019_scrnaseq-workshop/public/dechapter.html link]).  Mixed-effects modeling, where sample is treated as a random effect, is another strategy.  The code below uses mixed-effects modeling within [https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5 MAST] and has been adapted from [https://github.com/kdzimm/PseudoreplicationPaper/blob/master/Type_1_Error/Type%201%20-%20MAST%20RE.Rmd K. D. Zimmerman et al.].  Note that the additional complexity and potential benefit of these mixed-effects models are accompanied by increased computational expense: fitting these models to thousands of genes in thousands of cells can be slow. A vignette outlining how to use MAST for differential expression in the more traditional fixed-effect mode (i.e. ''without'' including any random effects) can be found [https://www.bioconductor.org/packages/release/bioc/vignettes/MAST/inst/doc/MAITAnalysis.html here].
    215 215
    216 216   * Testing for differential expression with mixed-effects models in MAST:
     
    223 223 # Assume that differential expression is being tested within a single
    224 224 # cluster/cell type from a single-cell RNA-seq dataset.  The counts matrix
    225 # is the starting point.  From a Seurat object, it can be extracted as follows:
     225 # is the starting point.  From a Seurat object clusterCells, it can be extracted as follows:
    226 226 allgenesCT<-t(as.data.frame(clusterCells@assays$RNA@counts)) # Un-normalized
    227 227 allgenesCT<-as.data.frame(allgenesCT)
     
    273 273 zlmCond<-zlm(~ Treat + ngeneson + (1 | Sample), sca, method='glmer', ebayes = FALSE, fitArgsD = list(nAGQ = 0), strictConvergence = FALSE, parallel=TRUE)
    274 274
    275 # Tabulate statistics for for treatment ("TreatT") versus control ("TreatC") is
    276 # the reference level of "Treat").
     275 # Tabulate statistics for treatment ("TreatT") versus control ("TreatC" is the reference level of "Treat").
    277 276 summaryCond<-MAST::summary(zlmCond,doLRT=TRUE)
    278 277 summaryDt<-summaryCond$datatable
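
The pseudobulk strategy mentioned in the revised paragraph (aggregating counts across cells from the same biological sample or subject) can be illustrated with a minimal base R sketch. This is not part of the SOP code: the toy matrix and the names `counts`, `sampleOfCell` and `pseudobulk` are hypothetical, chosen only to show the aggregation step.

```r
# Toy counts matrix: genes in rows, cells in columns (hypothetical data).
counts <- matrix(
  c(1, 0, 2, 3,
    0, 4, 1, 1,
    5, 0, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("cell1", "cell2", "cell3", "cell4"))
)

# Hypothetical per-cell metadata: the biological sample each cell came from.
sampleOfCell <- c(cell1 = "S1", cell2 = "S1", cell3 = "S2", cell4 = "S2")

# rowsum() sums the rows of a matrix by group, so transpose to put cells
# in rows, sum within each sample, then transpose back to genes x samples.
pseudobulk <- t(rowsum(t(counts), group = sampleOfCell[colnames(counts)]))
print(pseudobulk)
```

The result has one column per biological sample rather than one per cell, so the replicate (not the cell) becomes the unit of analysis. Such a matrix would typically then be analyzed with a bulk RNA-seq method; which one is appropriate is outside the scope of this sketch.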