Changes between Version 25 and Version 26 of SOPs/rna-seq-diff-expressions
- Timestamp: 01/13/16 09:19:57
SOPs/rna-seq-diff-expressions
v25  v26
 28   28
 29   29   * **Quantification of raw counts**
 30        * Use [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] to get counts for each gene
      30   * Typically we use [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] to get counts for each gene
 31   31   * Include same GTF file describing gene models as was used for mapping -- but think carefully about what genes should be included (such as long non-coding RNAs, microRNAs, or piRNAs)
 32   32   * Carefully choose the best "mode" to handle reads that don't completely map to exactly one gene
  …    …
 38   38   * For paired-end reads the sam file has to be sorted by read name, or coordinate, e.g. ''bsub "samtools sort -n -o accepted_hits.sortedByName.bam -m 5G -O bam -T temp accepted_hits.bam"''
 39   39   * To request a certain amount of memory and a specific node use ''bsub -R "rusage[mem=50000]" -m NodeName ''
 40        * Remove the rows at the bottom with descriptions like no_feature, ambiguous, etc.
      40   * Remove the rows at the bottom with descriptions like no_feature, ambiguous, etc.
      41   * Another tool to use is [[http://bioinf.wehi.edu.au/featureCounts/|featureCounts]], part of the [[http://subread.sourceforge.net/|Subread]] package
      42   * featureCounts is much faster than htseq-count, but the details of its counting method are quite different from those of htseq-count, especially for paired-end reads
      43   * See [[http://www.ncbi.nlm.nih.gov/pubmed/24227677|Liao et al., 2014]] for details of the method (and comparisons with other counting tools)
      44   * Sample commands:
      45   * ''featureCounts -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam'' (single-end reads)
      46   * ''featureCounts -p -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam'' (paired-end reads)
      47
 41   48   * For custom analyses, you can add a pseudocount (such as 1 or another small number) to all genes in all samples to prevent log2 ratios that require dividing by 0 and reduce background count noise -- BUT be aware that some statistical methods (like DESeq) require raw input values without any pseudocounts or normalization.
 42   49
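
The two post-processing steps mentioned in the text (dropping the summary rows at the bottom of the htseq-count output, and adding a pseudocount before taking log2 ratios) can be illustrated with a minimal Python sketch. It is not part of the SOP. The gene names, counts, and function names are hypothetical, and the input is assumed to be tab-separated "gene, count" lines. The summary row names are assumed to appear either bare (no_feature) or with a leading "__" (__no_feature), depending on the HTSeq version. As noted above, pseudocounts are only for custom analyses; tools such as DESeq expect the raw counts.

```python
import math

# Summary rows that htseq-count appends after the gene rows.
# Assumption: they appear either bare or with a leading "__".
SPECIAL_ROWS = {
    "no_feature",
    "ambiguous",
    "too_low_aQual",
    "not_aligned",
    "alignment_not_unique",
}


def parse_counts(lines):
    """Parse 'gene<TAB>count' lines, skipping htseq-count summary rows."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene, value = line.split("\t")
        if gene.lstrip("_") in SPECIAL_ROWS:
            continue  # not a gene, so leave it out of the count table
        counts[gene] = int(value)
    return counts


def log2_ratios(treated, control, pseudocount=1.0):
    """Return log2((treated + p) / (control + p)) for genes in both samples."""
    return {
        gene: math.log2(
            (treated[gene] + pseudocount) / (control[gene] + pseudocount)
        )
        for gene in treated
        if gene in control
    }


# Hypothetical example data (not from the SOP).
control_lines = [
    "geneA\t0", "geneB\t7", "geneC\t99",
    "__no_feature\t1200", "__ambiguous\t35",
]
treated_lines = [
    "geneA\t3", "geneB\t7", "geneC\t24",
    "__no_feature\t1100", "__ambiguous\t40",
]

control = parse_counts(control_lines)
treated = parse_counts(treated_lines)
ratios = log2_ratios(treated, control, pseudocount=1.0)

for gene in sorted(ratios):
    print(gene, ratios[gene])
```

Without the pseudocount, geneA (0 reads in the control sample) would require dividing by 0. With a pseudocount of 1 its ratio becomes log2(4/1). The size of the pseudocount is a judgment call, and larger values shrink the ratios of low-count genes more strongly.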