Changes between Version 52 and Version 53 of SOPs/rna-seq-diff-expressions


Timestamp:
11/02/17 12:37:01 (7 years ago)
Author:
gbell
Comment:

--

  • SOPs/rna-seq-diff-expressions

    v52 v53  
    28 28
    29 29  * **Quantification of raw counts**
    30     * Typically we use [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] to get counts for each gene
     30
     31    * Currently our favorite tool for this is [[http://bioinf.wehi.edu.au/featureCounts/|featureCounts]], part of the [[http://subread.sourceforge.net/|Subread]] package.
     32      * featureCounts is much faster than htseq-count, but the details of its counting method are quite different from those of htseq-count, especially for paired-end reads
     33      * See [[http://www.ncbi.nlm.nih.gov/pubmed/24227677|Liao et al., 2014]] for details of the method (and comparisons with other counting tools)
     34      * featureCounts needs the paired-read BAM file to be sorted by read ID, but if it isn't, it'll do the sorting.
     35      * Sample commands:
     36 {{{
     37 # single-end reads (unstranded)
     38 featureCounts -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
     39 # single-end reads (forward stranded)
     40 featureCounts -s 1 -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
     41 # single-end reads (reverse stranded)
     42 featureCounts -s 2 -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
     43
     44
     45 # paired-end reads (unstranded)
     46 featureCounts -p -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
     47 # paired-end reads (forward stranded)
     48 featureCounts -p -s 1 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam
     49 # paired-end reads (reverse stranded)
     50 featureCounts -p -s 2 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam
     51 }}}
     52
     53    * [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] works fine to get counts for each gene, but it's quite slow.
    31 54      * Include the same GTF file describing gene models that was used for mapping -- but think carefully about what genes should be included (such as long non-coding RNAs, microRNAs, or piRNAs)
    32 55      * Is your sequencing library stranded or unstranded?  This information is needed to help htseq-count accurately count features.  If the library prep method is "TruSeqStrandedPolyA", for example, the reads will be stranded in the reverse direction (relative to the transcript orientation).
     
    54 77 
    55 78 }}}
    56        
    57     * Another tool to use is [[http://bioinf.wehi.edu.au/featureCounts/|featureCounts]], part of the [[http://subread.sourceforge.net/|Subread]] package
    58       * featureCounts is much faster than htseq-count, but the details of its counting method are quite different from those of htseq-count, especially for paired-end reads
    59       * See [[http://www.ncbi.nlm.nih.gov/pubmed/24227677|Liao et al., 2014]] for details of the method (and comparisons with other counting tools)
    60       * featureCounts needs the paired-read BAM file to be sorted by read ID, but if it isn't, it'll do the sorting.
    61       * Sample commands:
    62 {{{
    63 # single-end reads (unstranded)
    64 featureCounts -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
    65 # single-end reads (forward stranded)
    66 featureCounts -s 1 -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
    67 # single-end reads (reverse stranded)
    68 featureCounts -s 2 -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
    69 
    70 
    71 # paired-end reads (unstranded)
    72 featureCounts -p -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
    73 # paired-end reads (forward stranded)
    74 featureCounts -p -s 1 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam
    75 # paired-end reads (reverse stranded)
    76 featureCounts -p -s 2 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam
    77 }}}
    78 
    79     * For some analyses (or for visualization), you can add a pseudocount (such as 1 or another small number) to all genes in all samples to prevent log2 ratios that require dividing by 0 and reduce background count noise -- BUT be aware that some statistical methods (like DESeq) require raw input values without any pseudocounts or normalization.
     79
    80 80    * **NOTE:**
    81 81      * Both htseq-count and featureCounts ignore multi-mapped reads (i.e., these will not get counted) by default.  In featureCounts, use the -M option to count multi-mapped reads, if needed.
    82 82      * Summary metrics reported by both htseq-count and featureCounts are with respect to the number of records (i.e., lines) in the BAM file.  To summarize by reads, further parsing/processing may be needed; extra information can be obtained with i) the -o option in htseq-count and ii) the -R option in featureCounts.
    83 83
     84    * For some analyses (or for visualization), you can add a pseudocount (such as 1 or another small number) to all genes in all samples to prevent log2 ratios that require dividing by 0 and reduce background count noise -- BUT be aware that some statistical methods (like DESeq2) require raw input values without any pseudocounts or normalization.
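
The pseudocount idea above can be sketched in a few lines of Python. This is only an illustration, not part of the SOP itself: the gene names, the counts, and the helper `log2_ratio` are all hypothetical, and the pseudocount of 1 is just one of the "small numbers" the text mentions. The adjusted values are meant for ratios or visualization only; the raw counts should still go to tools such as DESeq2.

```python
import math


def log2_ratio(count_a, count_b, pseudocount=1.0):
    """Return log2((count_a + pseudocount) / (count_b + pseudocount)).

    Adding the same pseudocount to both counts keeps the ratio defined
    when either raw count is 0.
    """
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))


# Hypothetical raw counts for three genes in two samples
treated = {"geneA": 0, "geneB": 15, "geneC": 7}
control = {"geneA": 3, "geneB": 0, "geneC": 7}

# Without a pseudocount, geneA would give log2(0) and geneB a division by 0
ratios = {gene: log2_ratio(treated[gene], control[gene]) for gene in treated}

for gene, ratio in ratios.items():
    print(f"{gene}\t{ratio:.2f}")
```

A larger pseudocount pulls the ratios of low-count genes more strongly toward 0, which is the noise-dampening effect described above, at the cost of shrinking genuine differences between weakly expressed genes.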
    84 85
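
As a rough sketch of what can be done with the featureCounts output file, the Python below reads it into a per-gene list of counts. It assumes the usual layout of that file: a leading comment line starting with `#`, then a tab-delimited header whose first six columns are Geneid, Chr, Start, End, Strand, and Length, followed by one count column per BAM file. The function name `read_featurecounts` and the example data are hypothetical, and the parser assumes integer counts.

```python
import csv
import io


def read_featurecounts(handle):
    """Return (sample_names, {gene_id: [count, ...]}) from featureCounts output.

    Assumes comment lines start with '#' and that the first six columns
    are annotation columns (Geneid, Chr, Start, End, Strand, Length).
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    samples = header[6:]
    counts = {row[0]: [int(value) for value in row[6:]] for row in rows if row}
    return samples, counts


# Hypothetical two-sample output, inlined so the sketch is self-contained
example = io.StringIO(
    "# Program:featureCounts (hypothetical header line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tA_sortedByName.bam\tB_sortedByName.bam\n"
    "gene1\tchr1\t100\t900\t+\t801\t12\t0\n"
    "gene2\tchr2\t50\t450\t-\t401\t3\t27\n"
)

samples, counts = read_featurecounts(example)
print(samples)
print(counts)
```

For a real run, the `io.StringIO` object would be replaced by an open file handle on something like `MySamples.featureCounts.txt`.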
    85 86  * **Quantification by FPKM (Fragments Per Kilobase of transcript per Million mapped reads)**