Changes between Version 19 and Version 20 of SOPs/rna-seq-diff-expressions


Ignore:
Timestamp:
09/11/14 15:53:13 (10 years ago)
Author:
gbell
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SOPs/rna-seq-diff-expressions

    v19 v20  
    112112    * Fold change thresholds may be used in addition, if desired, to select genes that are changing an amount that is biologically meaningful.
    113113
    114  * **Remove batch effects**
    115     * With [[http://www.bioconductor.org/packages/release/bioc/html/edgeR.html|edgeR]], the treatments can be adjusted for differences between the batch with addictive model formula by design = model.matrix( ~Batch+Treatment). For example, with target.txt like this:
     114 * **Accounting for a batch effect in a differential expression model**
     115    * With [[http://www.bioconductor.org/packages/release/bioc/html/edgeR.html|edgeR]], the comparison can be adjusted for differences between batches (such as date of sample preparation) with an addictive model formula using the equation
     116
     117  '''design = model.matrix(~Batch+Treatment)'''.
     118
     119  For example, with a targets file (target.txt) like this:
    116120
    117121
    118122   ||= Sample =||= genotype =||= date =||
    119    ||  KO.1  || KO || old ||
    120    ||  KO.2  || KO      || old ||
    121    ||  WT.1  || WT      || old ||
    122    ||  KO.3  || KO      || new ||
    123    ||  KO.4  || KO      || new ||
    124    ||  WT.2  || WT      || new ||
     123   ||  KO.1  ||  KO  ||  old  ||
     124   ||  WT.1  ||  WT  ||  old  ||
     125   ||  KO.2  ||  KO  ||  new  ||
     126   ||  WT.2  ||  WT  ||  new  ||
    125127
    126128
    127   Sample R codes for reducing the date effect (old/new) are
     129  Sample R code for reducing the date effect (old vs. new) is
    128130
    129131 {{{
    130    target   <- read.delim("target.txt", header=T)
    131    genotype <- factor(target$genotype, levels=c("WT", "KO"))
    132    mydate   <- factor(target$date, levels=c("old", "new"))
    133    Design   <-model.matrix(~mydate+genotype)
    134    colnames(Design)
    135 # [1] "(Intercept)" "mydatenew"   "genotypeKO"
     132        target   <- read.delim("target.txt", header=T)
     133        genotype <- factor(target$genotype, levels=c("WT", "KO"))
     134        mydate   <- factor(target$date, levels=c("old", "new"))
     135        Design   <- model.matrix(~mydate+genotype)
     136        colnames(Design)
     137        # [1] "(Intercept)" "mydatenew"   "genotypeKO"
    136138
    137    Y2  <- calcNormFactors(Y) # Y is the DGEList
    138    Y2  <- estimateGLMCommonDisp(Y2, Design, verbose=TRUE)
    139    Y2  <- estimateGLMTrendedDisp(Y2, Design)
    140    Y2  <- estimateGLMTagwiseDisp(Y2, Design)
    141    Fit <- glmFit(Y2, Design)
    142    Lrt <- glmLRT(Fit, coef="genotypeKO") # coef is pointed to the column name in Design
     139        Y2  <- calcNormFactors(Y) # Y is the DGEList
     140        Y2  <- estimateGLMCommonDisp(Y2, Design, verbose=TRUE)
     141        Y2  <- estimateGLMTrendedDisp(Y2, Design)
     142        Y2  <- estimateGLMTagwiseDisp(Y2, Design)
     143        Fit <- glmFit(Y2, Design)
     144        Lrt <- glmLRT(Fit, coef="genotypeKO") # where 'coef' points to the column name in 'Design'
    143145 }}}
    144146
    145   Lrt$table has the log2FC(Fold change) and P-value.
     147  The data structure Lrt$table contains the values for log2FC (log 2 fold change) and p-value.
    146148   
    147   Detail information can be found in [[http://www.bioconductor.org/packages/release/bioc/html/edgeR.html|edgeR]]
     149  Detailed information can be found in the [[http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf|edgeR User Guide]] (after a search for "batch effects").
    148150
    149151