Changes between Version 25 and Version 26 of SOPs/rna-seq-diff-expressions

Timestamp: 01/13/16 09:19:57
Author: gbell
Comment: --

SOPs/rna-seq-diff-expressions

-              v25
+              v26
   * **Quantification of raw counts**
     * Use [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] to get counts for each gene
+    * Typically we use [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] to get counts for each gene
       * Include the same GTF file describing gene models as was used for mapping -- but think carefully about which genes should be included (such as long non-coding RNAs, microRNAs, or piRNAs)
       * Carefully choose the best "mode" to handle reads that don't completely map to exactly one gene
 …
          * For paired-end reads the SAM/BAM file has to be sorted by read name or by coordinate, e.g. (sorting by read name) ''bsub  "samtools sort -n -o accepted_hits.sortedByName.bam -m 5G -O bam -T temp accepted_hits.bam"''
          * To request a certain amount of memory and a specific node use ''bsub  -R "rusage[mem=50000]" -m NodeName ''
+    * Remove the rows at the bottom of the htseq-count output with descriptions like no_feature, ambiguous, etc.
+    * Another tool to use is [[http://bioinf.wehi.edu.au/featureCounts/|featureCounts]], part of the [[http://subread.sourceforge.net/|Subread]] package
+      * featureCounts is much faster than htseq-count, but the details of its counting method are quite different from those of htseq-count, especially for paired-end reads
+      * See [[http://www.ncbi.nlm.nih.gov/pubmed/24227677|Liao et al., 2014]] for details of the method (and comparisons with other counting tools)
+      * Sample commands:
+         * ''featureCounts -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam'' (single-end reads)
+         * ''featureCounts -p -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam'' (paired-end reads)
     * For custom analyses, you can add a pseudocount (such as 1 or another small number) to every gene in every sample to avoid dividing by 0 when computing log2 ratios and to reduce noise from low background counts -- BUT be aware that some statistical methods (like DESeq) require raw counts as input, without any pseudocounts or normalization.
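
The row-removal and pseudocount steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SOP's own script: the function names, gene IDs, and counts are made up, and the list of summary labels assumes htseq-count's usual special counters, which appear bare (no_feature) in some releases and with a leading double underscore (__no_feature) in others.

```python
import math

# Summary counters that htseq-count appends after the per-gene rows.
# Assumption: labels may appear bare or prefixed with "__" depending on version.
SUMMARY_LABELS = {
    "no_feature",
    "ambiguous",
    "too_low_aQual",
    "not_aligned",
    "alignment_not_unique",
}


def parse_htseq_counts(lines):
    """Return {gene_id: count}, skipping the htseq-count summary rows."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene_id, value = line.split("\t")
        if gene_id.lstrip("_") in SUMMARY_LABELS:
            continue  # not a gene, so leave it out of downstream analysis
        counts[gene_id] = int(value)
    return counts


def log2_ratios(treated, control, pseudocount=1.0):
    """log2((treated + p) / (control + p)) for genes present in both samples."""
    return {
        gene: math.log2(
            (treated[gene] + pseudocount) / (control[gene] + pseudocount)
        )
        for gene in treated
        if gene in control
    }


# Hypothetical two-sample example (made-up gene IDs and counts).
control_lines = ["geneA\t0", "geneB\t3", "__no_feature\t120", "__ambiguous\t8"]
treated_lines = ["geneA\t3", "geneB\t15", "no_feature\t95", "ambiguous\t5"]

control = parse_htseq_counts(control_lines)
treated = parse_htseq_counts(treated_lines)
ratios = log2_ratios(treated, control)
for gene, ratio in sorted(ratios.items()):
    print(f"{gene}\t{ratio:.2f}")
```

Without the pseudocount, geneA (0 reads in the control) would require dividing by 0. Ratios computed this way are meant for custom analyses only; tools such as DESeq should still be given the raw counts returned by the parsing step.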