Changes between Version 46 and Version 47 of SOPs/rna-seq-diff-expressions

Timestamp: 06/15/17 10:15:51 (8 years ago)
Author: gbell
Comment: --

SOPs/rna-seq-diff-expressions

-              v46
+              v47
   * **Quantification by FPKM (Fragments Per Kilobase of transcript per Million mapped reads)**
+    * Use [http://cole-trapnell-lab.github.io/cufflinks/manual/ cufflinks]
+      * to quantify transcripts and genes in a GTF file (e.g., bsub cufflinks -G gene_models.gtf accepted_hits.bam)
+      * to assemble and quantify transcripts, including potentially novel ones, identified by cufflinks itself (e.g., bsub cufflinks accepted_hits.bam)
+    * '''Method 1''': Use [http://cole-trapnell-lab.github.io/cufflinks/manual/ cufflinks]
+      * This is the traditional method.
+      * To quantify transcripts and genes in a GTF file, use a command like
+{{{
+bsub cufflinks -G gene_models.gtf accepted_hits.bam
+}}}
+      * To assemble and quantify transcripts, including potentially novel ones, identified by cufflinks itself, use a command like
+{{{
+bsub cufflinks accepted_hits.bam
+}}}
+      * Gene-level FPKM values are calculated by taking the sum of all transcript FPKMs for a gene.  As a result, no "gene length" needs to be calculated.
       * NOTE: Some genes, although present in a GTF annotation file, may not get quantified by cufflinks.  This occurs for genes found in very long regions of overlapping genes (which exceed the default value for --max-bundle-length).  When this occurs, the standard error output of cufflinks (contained in the long LSF email when cufflinks is run via 'bsub') will contain the message "Warning: Skipping large bundle."  To correct this (or prevent it in the first place), add an argument like '--max-bundle-length 10000000' to the cufflinks command.  You may want to compare the list of genes in the GTF file to that of the cufflinks output to verify that they match.
 {{{
             awk -F"\t" '{print $9}' genes.gtf | awk '{print $2}' | perl -pe 's/\"//g;s/;//g' | sort -u > gtf_genes.txt
 …
 }}}
+        * If you only want to quantify the genes in your GTF file, use the -G option.  The -g option instead uses the GTF file as a guide for assembly, so it also reports transcripts found by Cufflinks, and reads assigned to those transcripts are taken away from the transcripts in your GTF file.
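The two checks described under Method 1 (summing transcript FPKMs to get a gene-level FPKM, and comparing the genes in the GTF file with those in the cufflinks output) can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the toy tracking table keeps just the tracking_id, gene_id and FPKM columns, which are assumed to match the column headers of a real isoforms.fpkm_tracking file.

```python
import csv
from collections import defaultdict


def gtf_gene_ids(gtf_lines):
    """Collect gene_id values from the attribute column (column 9) of GTF lines."""
    genes = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        for attr in fields[8].split(";"):
            attr = attr.strip()
            if attr.startswith("gene_id "):
                genes.add(attr[len("gene_id "):].strip().strip('"'))
    return genes


def gene_fpkm_from_isoforms(tracking_lines):
    """Sum transcript FPKMs per gene (assumed columns: gene_id, FPKM)."""
    totals = defaultdict(float)
    for row in csv.DictReader(tracking_lines, delimiter="\t"):
        totals[row["gene_id"]] += float(row["FPKM"])
    return dict(totals)


# Toy data (hypothetical): geneB is annotated but absent from the output,
# as would happen if its bundle had been skipped.
gtf = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "tA1";\n',
    'chr1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id "geneA"; transcript_id "tA2";\n',
    'chr1\tsrc\texon\t900\t950\t.\t-\t.\tgene_id "geneB"; transcript_id "tB1";\n',
]
tracking = [
    "tracking_id\tgene_id\tFPKM\n",
    "tA1\tgeneA\t2.5\n",
    "tA2\tgeneA\t1.5\n",
]

gene_fpkm = gene_fpkm_from_isoforms(tracking)
missing = sorted(gtf_gene_ids(gtf) - set(gene_fpkm))
print(gene_fpkm)  # gene-level FPKM = sum of transcript FPKMs
print(missing)    # genes in the GTF file that were not quantified
```

With real data, the same functions could be given open file handles for the GTF file and the tracking file instead of the in-memory lists.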
+     * A second option to get FPKM is to use Cuffquant as described in the [http://cole-trapnell-lab.github.io/monocle-release/getting-started/ monocle] and [http://cole-trapnell-lab.github.io/cufflinks/cuffnorm/#running-cuffnorm Cuffnorm] documentation. The default normalization for Cuffnorm is the same as the normalization performed by DESeq.
+    * '''Method 2''': Use the fpkm() function from DESeq2.  This method requires raw counts and a measure of gene length (such as the mean or median length of the gene's transcripts), which must be produced independently or supplied through a GRanges data structure.  It can produce raw FPKMs [using fpkm(..., robust = FALSE)] and normalized FPKMs [using fpkm(..., robust = TRUE)].
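The difference between raw and normalized FPKMs in Method 2 can be illustrated outside R. The sketch below is written in Python and is not DESeq2 itself: it assumes that the robust variant replaces each sample's total count with its median-of-ratios size factor multiplied by the geometric mean of the total counts, which is our reading of DESeq2's behavior and should be checked against the fpkm() documentation. All names and numbers are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts maps gene -> list of per-sample counts."""
    n = len(next(iter(counts.values())))
    # Log geometric mean per gene, using only genes with no zero counts.
    log_geo = {
        g: sum(math.log(c) for c in row) / n
        for g, row in counts.items()
        if all(c > 0 for c in row)
    }
    return [
        math.exp(median(math.log(counts[g][j]) - lg for g, lg in log_geo.items()))
        for j in range(n)
    ]


def fpkm(counts, lengths, robust=False):
    """FPKM per gene and sample; lengths maps gene -> length in bp."""
    genes = list(counts)
    n = len(counts[genes[0]])
    col_sums = [sum(counts[g][j] for g in genes) for j in range(n)]
    if robust:
        # Assumed robust library size: size factor x geometric mean of totals.
        geo_mean_depth = math.exp(sum(math.log(s) for s in col_sums) / n)
        lib = [sf * geo_mean_depth for sf in size_factors(counts)]
    else:
        lib = col_sums
    return {
        g: [counts[g][j] * 1e9 / (lib[j] * lengths[g]) for j in range(n)]
        for g in genes
    }


# Toy data: sample 2 is sequenced exactly twice as deeply as sample 1.
counts = {"g1": [10, 20], "g2": [30, 60], "g3": [60, 120]}
lengths = {"g1": 1000, "g2": 2000, "g3": 500}

raw = fpkm(counts, lengths, robust=False)
norm = fpkm(counts, lengths, robust=True)
print(raw["g1"], norm["g1"])
```

In this toy case the samples differ only in depth, so both variants agree; they diverge when a few highly expressed genes dominate the total count of one sample.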
+    * '''Method 3''': Use Cuffquant to get '''normalized''' FPKMs, as described in the [http://cole-trapnell-lab.github.io/monocle-release/getting-started/ monocle] and [http://cole-trapnell-lab.github.io/cufflinks/cuffnorm/#running-cuffnorm Cuffnorm] documentation. The default normalization for Cuffnorm is the same as the normalization performed by DESeq.
+    * '''Method 4''': Use featureCounts
+      * Gene-level FPKM values are calculated from gene-level counts, normalized by library size and "gene length", as defined by the length of the non-redundant overlap of all exons of all transcripts of that gene.  Typically this method overestimates gene length, since the union of all exons is by definition at least as long as every transcript of the gene.
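The "gene length" used in Method 4 can be made concrete with a small Python sketch. It assumes exons are given as 1-based inclusive (start, end) pairs on a single chromosome and strand; the function name, coordinates, count, and library size are all hypothetical.

```python
def union_exon_length(exons):
    """Length of the non-redundant union of 1-based inclusive (start, end) exons."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end + 1:
            # Gap before this exon: close the current merged interval.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent exon: extend the merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# Two transcripts of one gene (toy coordinates).
transcript_1 = [(100, 199), (300, 399)]  # 200 bp
transcript_2 = [(150, 249), (300, 449)]  # 250 bp

gene_len = union_exon_length(transcript_1 + transcript_2)

# FPKM from a gene-level count, total mapped fragments, and gene length.
gene_count = 600
total_fragments = 2_000_000
gene_fpkm_value = gene_count * 1e9 / (gene_len * total_fragments)
print(gene_len, gene_fpkm_value)
```

The union (300 bp here) is longer than either transcript, which is why this definition tends to give a lower FPKM than a length based on individual transcripts.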
   * **Gene filtering**
     * Remove from the analysis any genes with 0 counts across all samples.  Some analysis tools do this themselves.
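The filtering step above can be sketched in Python as follows, assuming the counts are held as a mapping from gene ID to a list of per-sample counts; the function name and the data are hypothetical.

```python
def drop_all_zero_genes(counts):
    """Keep only genes with a non-zero count in at least one sample."""
    return {gene: row for gene, row in counts.items() if any(c != 0 for c in row)}


# Toy count table: three samples per gene.
count_table = {
    "geneA": [12, 0, 7],
    "geneB": [0, 0, 0],  # removed: zero counts in every sample
    "geneC": [0, 3, 0],
}

filtered = drop_all_zero_genes(count_table)
print(sorted(filtered))
```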