Changes between Version 1 and Version 2 of SOPs/rna-seq-diff-expressions_TE

Timestamp: 02/23/21 13:30:39
Author: twhitfie
Comment: --

SOPs/rna-seq-diff-expressions_TE

-              v1
+              v2
 === Background ===
+    * For each sample, map reads to genome using splice-aware mapper.
+    * Count reads mapped to each gene (or other set of features).
+    * Use gene counts to identify differentially expressed genes.
+    * Transposable elements make up between 20 and 80% of the genome sequence in many eukaryotes, yet they are typically excluded from the analysis that follows transcriptomic profiling with RNA-seq.  This exclusion is due to the repetitive nature of transposons and the resulting ambiguity in assigning multi-mapping reads.
-=== General suggestions ===
-  * **Preliminary issues**
-    * Statistics for all methods require a matrix of counts (non-negative integer values) for each gene for each sample.
-    * Create a tab-delimited matrix of integer counts, with column labels for each sample.
-    * Genes with no counts in any sample should generally be removed to permit higher statistical power to identify differential expression.
-    * According to [[http://www.ncbi.nlm.nih.gov/pubmed/20167110|Bullard et al., 2010]], differential expression analysis is influenced more by the normalization method than by the choice of differential expression statistic.
-    * Note that without replication, one cannot make very strong conclusions.  High-throughput sequencing, just like every other technology, needs biological replication.
-        * One can conclude that certain genes in sample A have a different RNA abundance than in sample B, but the results cannot be generalized.
-        * Example, using an extremely precise balance:  If Dick weighs more than Sally, we cannot conclude that males weigh more than females because we know nothing about the variability of weights among males and among females.  Even if we weighed several individuals together, we'd still be missing information about within-group variability.
-    * Sample commands to get raw counts from an alignment file:
-        * ''coverageBed -split -abam accepted_hits.bam -b transcripts.gtf > transcript.coverage.bed'' (See the [http://bedtools.readthedocs.io/en/latest/content/tools/coverage.html bedTools coverage] page for details)
-        * ''htseq-count -m intersection-strict --stranded=no accepted_hits.sam transcripts.gff > transcript.coverage.txt''  (See the [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] page for details)
-        * In our view, htseq-count is better at handling reads that map to a genome region with overlapping genes.
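The count-matrix preparation described in the preliminary issues above can be sketched in Python. This is a minimal illustration, not part of the original SOP: the function names (read_count_matrix, drop_all_zero_genes), the sample labels, and the example counts are all hypothetical, and it assumes a tab-delimited matrix whose first column holds gene identifiers and whose remaining columns hold non-negative integer counts, one column per sample.

```python
import csv
import io


def read_count_matrix(handle):
    """Parse a tab-delimited count matrix (hypothetical layout).

    Assumes a header row of 'gene' followed by sample labels,
    then one row per gene with one integer count per sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene = row[0]
        values = [int(v) for v in row[1:]]  # raises ValueError on non-integers
        if len(values) != len(samples):
            raise ValueError(f"{gene}: expected {len(samples)} counts, got {len(values)}")
        if any(v < 0 for v in values):
            raise ValueError(f"{gene}: counts must be non-negative")
        counts[gene] = values
    return samples, counts


def drop_all_zero_genes(counts):
    """Keep only genes with at least one read in at least one sample."""
    return {gene: values for gene, values in counts.items() if any(values)}


# Made-up example data: two conditions, two replicates each.
example = (
    "gene\tA_rep1\tA_rep2\tB_rep1\tB_rep2\n"
    "geneA\t10\t12\t0\t3\n"
    "geneB\t0\t0\t0\t0\n"
    "geneC\t5\t0\t7\t9\n"
)

samples, counts = read_count_matrix(io.StringIO(example))
filtered = drop_all_zero_genes(counts)
print(samples)
print(sorted(filtered))
```

In this sketch only genes whose counts are zero in every sample are dropped; a gene with reads in a single sample is kept. Stricter filters (for example, a minimum total count) are a separate choice that the text above does not specify.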
 === Step by step analysis ===