Changes between Version 25 and Version 26 of SOPs/rna-seq-diff-expressions


Timestamp:
01/13/16 09:19:57 (9 years ago)
Author:
gbell
Comment:

--

  • SOPs/rna-seq-diff-expressions

    v25 v26  
    2828
    2929  * **Quantification of raw counts**
    30     * Use [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] to get counts for each gene
     30    * Typically we use [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] to get counts for each gene
    3131      * Include same GTF file describing gene models as was used for mapping -- but think carefully about what genes should be included (such as long non-coding RNAs, microRNAs, or piRNAs)
    3232      * Carefully choose the best "mode" to handle reads that don't completely map to exactly one gene
     
    3838         * For paired-end reads, the SAM/BAM file has to be sorted by read name or by coordinate, e.g. ''bsub  "samtools sort -n -o accepted_hits.sortedByName.bam -m 5G -O bam -T temp accepted_hits.bam"''
    3939         * To request a certain amount of memory and a specific node, use ''bsub  -R "rusage[mem=50000]" -m NodeName ''
    40     * Remove the rows at the bottom with descriptions like no_feature, ambiguous, etc.
     40         * Remove the rows at the bottom with descriptions like no_feature, ambiguous, etc.
     41    * Another tool to use is [[http://bioinf.wehi.edu.au/featureCounts/|featureCounts]], part of the [[http://subread.sourceforge.net/|Subread]] package
     42      * featureCounts is much faster than htseq-count, but the details of its counting method are quite different from those of htseq-count, especially for paired-end reads
     43      * See [[http://www.ncbi.nlm.nih.gov/pubmed/24227677|Liao et al., 2014]] for details of the method (and comparisons with other counting tools)
     44      * Sample commands:
     45         * ''featureCounts -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam'' (single-end reads)
     46         * ''featureCounts -p -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam'' (paired-end reads)
     47
    4148    * For custom analyses, you can add a pseudocount (such as 1 or another small number) to all genes in all samples, to avoid log2 ratios that would require dividing by 0 and to reduce noise from low background counts -- BUT be aware that some statistical methods (like DESeq) require raw input values without any pseudocounts or normalization.
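
The pseudocount idea in the last bullet can be sketched as a small shell function. This is only an illustration for custom analyses, not input preparation for DESeq, which needs the raw counts. The function name ''log2_ratio_with_pseudocount'' and the three-column layout (gene, control count, treated count, tab-delimited) are assumptions made for the example; the pseudocount defaults to 1.

```shell
# Hypothetical helper: print gene and log2((treated + pc) / (control + pc)).
# Assumed input: tab-delimited file with columns gene, control count, treated count.
# $1 = input file, $2 = pseudocount (optional, defaults to 1).
log2_ratio_with_pseudocount() {
    awk -v pc="${2:-1}" 'BEGIN { FS = "\t" }
        {
            # Adding pc to both counts keeps the ratio defined when a count is 0.
            ratio = ($3 + pc) / ($2 + pc)
            # awk only has a natural log, so divide by log(2) to get log2.
            printf "%s\t%.3f\n", $1, log(ratio) / log(2)
        }' "$1"
}
```

A call such as ''log2_ratio_with_pseudocount MyCounts.txt 1'' (file name hypothetical) prints one line per gene. A larger pseudocount pulls the ratios of low-count genes closer to 0, which is the noise-damping effect mentioned above.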
    4249
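
Removing the htseq-count summary rows (no_feature, ambiguous, etc.) can be sketched as below. The function name ''strip_htseq_summary'' is an assumption for illustration. The pattern covers both the bare row names and the "__"-prefixed names that newer htseq-count releases write, so check it against the actual tail of your own output before relying on it.

```shell
# Hypothetical helper: drop htseq-count's trailing summary rows from a
# count table (gene ID, whitespace, count). $1 = htseq-count output file.
strip_htseq_summary() {
    # -v keeps only the lines that do NOT match a summary row name;
    # the optional (__)? prefix covers both naming styles.
    grep -v -E '^(__)?(no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique)[[:space:]]' "$1"
}
```

For example, ''strip_htseq_summary MySample.htseq.txt > MySample.counts.txt'' (file names hypothetical) leaves only the per-gene rows.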