Changes between Version 46 and Version 47 of SOPs/rna-seq-diff-expressions


Timestamp: 06/15/17 10:15:51
Author: gbell
Comment: --


  * **Quantification by FPKM (Fragments Per Kilobase of transcript per Million mapped reads)**

    * '''Method 1''': Use [http://cole-trapnell-lab.github.io/cufflinks/manual/ cufflinks]
      * This is the traditional method.

      * To quantify the transcripts and genes in a GTF file, use a command like
{{{
bsub cufflinks -G gene_models.gtf accepted_hits.bam
}}}

      * To quantify the transcripts that cufflinks assembles itself (including potentially novel ones), run it without an annotation file, using a command like
{{{
bsub cufflinks accepted_hits.bam
}}}

      * Gene-level FPKM values are calculated by summing the FPKMs of all transcripts of a gene.  As a result, no "gene length" needs to be calculated.
      * NOTE: Some genes, although present in a GTF annotation file, may not be quantified by cufflinks.  This occurs for genes in very long regions of overlapping genes (regions that exceed the default value of --max-bundle-length).  When this happens, the standard error output of cufflinks (contained in the long LSF email when cufflinks is run via 'bsub') will contain the message "Warning: Skipping large bundle."  To correct this (or prevent it in the first place), add an argument like '--max-bundle-length 10000000' to the cufflinks command.  You may also want to compare the list of genes in the GTF file with the list in the cufflinks output to verify that they match; the gene IDs in the GTF file can be listed with a command like
{{{
# Assumes gene_id is the first attribute in column 9 of genes.gtf
awk -F"\t" '{print $9}' genes.gtf | awk '{print $2}' | perl -pe 's/\"//g;s/;//g' | sort -u > gtf_genes.txt
}}}

      * If you only want to quantify the genes in your GTF file, use the -G option rather than -g.  With -g, cufflinks also reports transcripts it assembles itself, and these can take reads away from the transcripts in your GTF file.

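      * Two small checks for Method 1 can be sketched in shell, assuming cufflinks wrote tab-separated genes.fpkm_tracking and isoforms.fpkm_tracking files whose header rows include gene_id and FPKM columns (the columns are located by name, not by position).  missing_from_cufflinks lists the IDs in gtf_genes.txt (from the awk command above) that are absent from the cufflinks gene output, and sum_isoform_fpkm recomputes gene-level FPKMs as the sum of transcript FPKMs.  Both function names are hypothetical; this is a sketch, not part of cufflinks.
{{{
# Hypothetical helpers; both assume tab-separated cufflinks tracking files with a header row.

# Usage: missing_from_cufflinks gtf_genes.txt genes.fpkm_tracking > missing_genes.txt
missing_from_cufflinks() {
  awk -F'\t' '
    FNR == NR {                          # first file: cufflinks gene-level output
      if (FNR == 1) {                    # header row: locate the gene_id column
        for (i = 1; i <= NF; i++) if ($i == "gene_id") c = i
        next
      }
      if (c) seen[$c] = 1
      next
    }
    !($1 in seen)                        # second file: GTF gene IDs cufflinks did not report
  ' "$2" "$1"
}

# Usage: sum_isoform_fpkm isoforms.fpkm_tracking > gene_fpkm_from_isoforms.txt
sum_isoform_fpkm() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {                            # header row: locate gene_id and FPKM by name
      for (i = 1; i <= NF; i++) { if ($i == "gene_id") g = i; if ($i == "FPKM") f = i }
      next
    }
    { fpkm[$g] += $f }                   # add each transcript FPKM to its gene total
    END { for (id in fpkm) print id, fpkm[id] }   # output order is unspecified
  ' "$1"
}
}}}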
    * '''Method 2''': Use the fpkm() function from DESeq2.  This method requires raw counts plus some measure of gene length (such as the mean or median length of the transcripts of a gene), which must be produced independently (or derived from a GRanges data structure).  It can produce raw FPKMs [using fpkm(..., robust = FALSE)], which are scaled by each sample's total count, and normalized FPKMs [using fpkm(..., robust = TRUE)], which are scaled using DESeq2's size factors.

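      * One way to produce the gene lengths is to average the lengths of each gene's transcripts.  A minimal sketch, assuming a GTF file whose exon lines carry quoted gene_id and transcript_id attributes and whose transcript IDs are unique; a transcript's length is taken as the sum of its exon lengths, and the function name mean_tx_length is hypothetical.  According to the DESeq2 documentation, fpkm() uses a basepairs column in mcols(dds) as the gene lengths when that column is present, which is one way to supply the output of this sketch.
{{{
# mean_tx_length is a hypothetical helper name.
# Usage: mean_tx_length genes.gtf > gene_lengths.txt   (gene_id <TAB> mean transcript length)
mean_tx_length() {
  awk -F'\t' -v OFS='\t' '
    $3 == "exon" {
      # Pull the quoted gene_id and transcript_id values out of column 9.
      if (!match($9, /gene_id "[^"]*"/)) next
      gene = substr($9, RSTART + 9, RLENGTH - 10)
      if (!match($9, /transcript_id "[^"]*"/)) next
      tx = substr($9, RSTART + 15, RLENGTH - 16)
      txlen[tx] += $5 - $4 + 1           # GTF coordinates are 1-based and inclusive
      txgene[tx] = gene
    }
    END {
      # Average the transcript lengths within each gene.
      for (tx in txlen) { total[txgene[tx]] += txlen[tx]; n[txgene[tx]]++ }
      for (gene in total) print gene, total[gene] / n[gene]
    }
  ' "$1"
}
}}}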
    * '''Method 3''': Use Cuffquant followed by Cuffnorm to get '''normalized''' FPKMs, as described in the [http://cole-trapnell-lab.github.io/monocle-release/getting-started/ monocle] documentation and the [http://cole-trapnell-lab.github.io/cufflinks/cuffnorm/#running-cuffnorm Cuffnorm] manual.  The default normalization in Cuffnorm is the same as the normalization performed by DESeq.

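      * The DESeq normalization referred to here is the median-of-ratios method: each sample's size factor is the median, over genes with no zero counts, of that gene's count divided by its geometric mean across all samples.  A minimal sketch of that calculation (an illustration of the method, not Cuffnorm's or DESeq's own code), assuming a tab-separated count table with a header row of sample names and the gene ID in column 1; the function name size_factors is hypothetical.
{{{
# size_factors is a hypothetical helper name: median-of-ratios size factors.
# Usage: size_factors counts.txt   (prints: sample name <TAB> size factor)
# Step 1 (awk): for each gene with no zero counts, print "column <TAB> count / geometric mean".
# Step 2 (sort + awk): sort the ratios within each sample column and report their median.
size_factors() {
  awk -F'\t' '
    NR == 1 { next }                              # skip the header row
    {
      logsum = 0
      for (i = 2; i <= NF; i++) { if ($i <= 0) next; logsum += log($i) }
      logmean = logsum / (NF - 1)                 # log of the geometric mean
      for (i = 2; i <= NF; i++) printf "%d\t%.10f\n", i, exp(log($i) - logmean)
    }
  ' "$1" | LC_ALL=C sort -k1,1n -k2,2n | awk -F'\t' -v header="$(head -n 1 "$1")" '
    function emit(   m) {                         # median of v[1..n] for column cur
      m = (n % 2) ? v[(n + 1) / 2] : (v[n / 2] + v[n / 2 + 1]) / 2
      printf "%s\t%.10f\n", names[cur], m
    }
    BEGIN { split(header, names, "\t") }          # column number -> sample name
    $1 != cur { if (n) emit(); cur = $1; n = 0 }
    { v[++n] = $2 }
    END { if (n) emit() }
  '
}
}}}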
    * '''Method 4''': Use featureCounts
      * Gene-level FPKM values are calculated from gene-level counts, corrected for library size and "gene length", where gene length is the length of the non-redundant union of all exons of all transcripts of that gene.  This method typically overestimates gene length, since that union is by definition at least as long as every transcript of the gene.

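      * A minimal sketch of this calculation from a featureCounts output table, assuming the usual layout: a "#" command-line comment, then a header row, with gene IDs in column 1, gene length in column 6 (Length) and one count column per BAM file from column 7 on.  As a simplifying assumption, library size is taken as the sum of each sample's counts in the table rather than the total number of mapped reads; the function name featurecounts_fpkm is hypothetical.
{{{
# featurecounts_fpkm is a hypothetical helper name.
# Usage: featurecounts_fpkm counts.txt > fpkm.txt
# FPKM = count * 10^9 / (Length in bp * library size); library size is assumed here
# to be the sum of that sample's counts in the table, not the total mapped reads.
featurecounts_fpkm() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                                 # skip the featureCounts command-line comment
    FNR == NR {                                   # first pass: per-sample library sizes
      if (!header_seen) { header_seen = 1; next }
      for (i = 7; i <= NF; i++) total[i] += $i
      next
    }
    {                                             # second pass: gene ID plus one FPKM per sample
      out = $1
      for (i = 7; i <= NF; i++) {
        if (!header_done) value = $i              # header row: keep the sample names
        else if ($6 > 0 && total[i] > 0) value = sprintf("%.3f", $i * 1e9 / ($6 * total[i]))
        else value = "NA"
        out = out OFS value
      }
      header_done = 1
      print out
    }
  ' "$1" "$1"
}
}}}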
  * **Gene filtering**
    * Remove from the analysis any genes with 0 counts across all samples.  Some analysis tools do this themselves.
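    * A minimal sketch of this filter for a tab-separated count table with a header row, assuming the gene ID is in column 1 and the count columns start at column 2 (for a raw featureCounts table they start at column 7, so pass 7 as the second argument).  The function name drop_zero_genes is hypothetical.
{{{
# drop_zero_genes is a hypothetical helper name.
# Usage: drop_zero_genes counts.txt [first_count_column] > counts.filtered.txt
# first_count_column defaults to 2; use 7 for a raw featureCounts table.
drop_zero_genes() {
  awk -F'\t' -v first="${2:-2}" '
    /^#/ { print; next }                          # keep comment lines unchanged
    !header_seen { header_seen = 1; print; next } # keep the header row
    {
      sum = 0
      for (i = first; i <= NF; i++) sum += $i
      if (sum > 0) print                          # drop genes with 0 counts in every sample
    }
  ' "$1"
}
}}}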