Changes between Version 34 and Version 35 of SOPs/rna-seq-diff-expressions

Timestamp: 04/27/17 08:18:41
Author: gbell
Comment: --

SOPs/rna-seq-diff-expressions

-              v34
+              v35
         * Example, using an extremely precise balance:  If Dick weighs more than Sally, we cannot conclude that males weigh more than females because we know nothing about the variability of weights among males and among females.  Even if we weighed several individuals together, we'd still be missing information about within-group variability.
     * Sample commands to get raw counts from an alignment file:
         * ''coverageBed -split -abam accepted_hits.bam -b transcripts.gtf > transcript.coverage.bed'' (See the [[http://code.google.com/p/bedtools/wiki/Usage#coverageBed|bedTools]] page for details)
+        * ''coverageBed -split -abam accepted_hits.bam -b transcripts.gtf > transcript.coverage.bed'' (See the [http://bedtools.readthedocs.io/en/latest/content/tools/coverage.html bedTools coverage] page for details)
         * ''htseq-count -m intersection-strict --stranded=no accepted_hits.sam transcripts.gff > transcript.coverage.txt''  (See the [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] page for details)
         * In our view, htseq-count is better at handling reads that map to a genome region with overlapping genes.
 …
   * **Mapping**
     * Use [[http://tophat.cbcb.umd.edu/manual.html|TopHat]] (or another spliced mapper) to map short reads to the genome of choice.
     * See the [[http://barcwiki.wi.mit.edu/wiki/SOPs/mapping|mapping SOP]] for more details.
+    * Use [https://ccb.jhu.edu/software/tophat/manual.shtml TopHat] (or another spliced mapper) to map short reads to the genome of choice.
+    * See our [http://barcwiki.wi.mit.edu/wiki/SOPs/mapping mapping SOP] for more details.
   * **Quantification of raw counts**
 …
     * For some analyses (or for visualization), you can add a pseudocount (such as 1 or another small number) to all genes in all samples to prevent log2 ratios that require dividing by 0 and reduce background count noise -- BUT be aware that some statistical methods (like DESeq) require raw input values without any pseudocounts or normalization.
   * **Quantification of FPKM values**
     * Use [[http://cufflinks.cbcb.umd.edu/manual.html|cufflinks]]
+  * **Quantification by FPKM (Fragments Per Kilobase of transcript per Million mapped reads)**
+    * Use [http://cole-trapnell-lab.github.io/cufflinks/manual/ cufflinks]
       * to quantify transcripts and genes in a GTF file (ex: bsub cufflinks -G gene_models.gtf accepted_hits.bam)
       * to quantify transcripts, potentially novel, annotated by cufflinks (ex: bsub cufflinks accepted_hits.bam)
 …
         * If you only want to quantify genes in your GTF file, use the -G option (instead of -g, which also reports transcripts found by Cufflinks and takes counts away from transcripts in your GTF file).
      * A second option to get fpkm is to use Cuffquant and Cuffnorm as described here  [[http://cole-trapnell-lab.github.io/monocle-release/getting-started/]] and here [[http://cole-trapnell-lab.github.io/cufflinks/cuffnorm/|cuffnorm]]. The default normalization for Cuffnorm is the same than the normalization performed by DEseq.
+     * A second option to get FPKM is to use Cuffquant, as described with [http://cole-trapnell-lab.github.io/monocle-release/getting-started/ monocle], and [http://cole-trapnell-lab.github.io/cufflinks/cuffnorm/#running-cuffnorm Cuffnorm]. The default normalization for Cuffnorm is the same as the normalization performed by DESeq.
   * **Gene filtering**
     * Remove from the analysis any genes with 0 counts across all samples.  Some analysis tools do this themselves.
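
A minimal Python sketch of the gene filtering step above, combined with the pseudocount-based log2 ratio described under raw counts. The gene names, count values, and function names (`filter_all_zero`, `log2_ratio`) are hypothetical and only illustrate the idea; this is not part of the SOP's own tooling. The pseudocounted ratios are meant for exploration or visualization only, since tools like DESeq expect the raw counts.

```python
import math


def filter_all_zero(counts):
    """Drop genes whose raw count is 0 in every sample."""
    return {
        gene: values
        for gene, values in counts.items()
        if any(v > 0 for v in values)
    }


def log2_ratio(numerator, denominator, pseudocount=1.0):
    """log2 ratio with a pseudocount added to both values to avoid dividing by 0."""
    return math.log2((numerator + pseudocount) / (denominator + pseudocount))


# Hypothetical raw counts: gene -> [control_1, control_2, treated_1, treated_2]
counts = {
    "geneA": [0, 0, 0, 0],    # 0 counts across all samples: removed
    "geneB": [10, 0, 35, 40],
    "geneC": [7, 7, 0, 1],
}

kept = filter_all_zero(counts)

# Compare treated_1 (index 2) against control_1 (index 0) for each kept gene.
ratios = {gene: log2_ratio(v[2], v[0]) for gene, v in kept.items()}

for gene, ratio in sorted(ratios.items()):
    print(f"{gene}\t{ratio:.2f}")
```

Keeping the filter and the pseudocount in separate functions makes it easy to pass the filtered raw counts unchanged to a statistical method and use the pseudocounted ratios only for plots.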