== Using RNA-Seq to quantify gene levels and assay for differential expression for transposable elements ==

=== Background ===

    * Transposable elements make up between 20 and 80% of the genome sequence in many eukaryotes, yet they are typically excluded from the analysis that follows transcriptomic profiling with RNA-seq.  This exclusion is due to the repetitive nature of transposons and the ambiguity of assigning multi-mapping reads.

=== Step by step analysis ===

  * **Mapping**
    * Use [https://github.com/alexdobin/STAR STAR] or another spliced mapper to map short reads to the genome of choice.
    * See our [http://barcwiki.wi.mit.edu/wiki/SOPs/mapping mapping SOP] for more details.
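
Because the difficulty with transposons lies in multi-mapping reads, the mapping step may need to retain them. The sketch below only prints a STAR command rather than running it. The function name {{{star_cmd}}}, the thread count, the multimap limits of 100, and the assumption of gzipped FASTQ input are all illustrative choices and not part of this SOP; check the STAR manual and the mapping SOP before using them.

{{{
# Print (do not run) a STAR command that keeps multi-mapping reads.
# Arguments: sample name, STAR index directory, then one or two FASTQ files.
# All values below are illustrative assumptions, not tuned recommendations.
star_cmd() {
    star_sample=$1
    star_index=$2
    shift 2
    # "$*" joins the remaining arguments (the FASTQ files) with spaces.
    echo "STAR --runThreadN 8 --genomeDir $star_index --readFilesIn $* --readFilesCommand zcat --outFilterMultimapNmax 100 --winAnchorMultimapNmax 100 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix $star_sample."
}

# Example usage (hypothetical paths); inspect the command before running it:
#   star_cmd MySample /path/to/STAR_index MySample_R1.fastq.gz MySample_R2.fastq.gz
}}}

Printing the command first makes it easy to review the parameters or paste them into a job submission script.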

  * **Quantification of raw counts**

    * Is your sequencing library stranded or unstranded?  This information is needed to help these tools accurately count features.  Strandedness of some library prep methods:
      * TruSeq Stranded mRNA Kits ("TruSeqStrandedPolyA") reads are reverse stranded (stranded in the reverse direction relative to the transcript orientation).
      * SMART-Seq v4 Ultra Low Input RNA Kit ("SMARTerUltra-lowPOLYA-V4") reads are unstranded.
      * KAPA RNA HyperPrep Kits ("KAPAHyperPrepmRNA") reads are reverse stranded. 
      * The Whitehead Genome Core has some more [http://genomecore.wi.mit.edu/index.php/NCBISubmission Library Prep Descriptions].
    * See [[SAMBAMqc]] (and/or look at mapped reads in a genome browser) to determine or verify strandedness.

{{{
# -s sets strandedness: 0 (default) = unstranded, 1 = forward, 2 = reverse
# single-end reads (unstranded)
featureCounts -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
# single-end reads (forward stranded)
featureCounts -s 1 -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
# single-end reads (reverse stranded)
featureCounts -s 2 -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam

# -p marks the input as paired-end; featureCounts v2.0.2 and later
# also needs --countReadPairs to count fragments instead of reads
# paired-end reads (unstranded)
featureCounts -p -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
# paired-end reads (forward stranded), all name-sorted BAMs counted into one table
featureCounts -p -s 1 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam
# paired-end reads (reverse stranded), all name-sorted BAMs counted into one table
featureCounts -p -s 2 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam
}}}
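
If the strandedness of a library is still uncertain, one possible check is to run featureCounts on the same BAM file with both {{{-s 1}}} and {{{-s 2}}} and compare the "Assigned" rows of the resulting {{{.summary}}} files. The helper functions below are a minimal sketch of that comparison. The function names and the 3-fold cutoff are assumptions made for illustration, and only the first sample column of each summary file is read.

{{{
# Extract the "Assigned" count (first sample column) from a featureCounts .summary file.
assigned_reads() {
    awk -F'\t' '$1 == "Assigned" { print $2 }' "$1"
}

# Compare assigned counts from a "-s 1" run and a "-s 2" run of the same BAM file.
# The 3-fold cutoff is an arbitrary, illustrative threshold.
guess_strandedness() {
    fwd=$(assigned_reads "$1")
    rev=$(assigned_reads "$2")
    fwd=${fwd:-0}
    rev=${rev:-0}
    if [ "$fwd" -gt $((3 * rev)) ]; then
        echo "forward (-s 1)"
    elif [ "$rev" -gt $((3 * fwd)) ]; then
        echo "reverse (-s 2)"
    else
        echo "unstranded (-s 0)"
    fi
}

# Example usage (hypothetical file names):
#   featureCounts -s 1 -a gene_annotations.gtf -o fwd.txt MySample.bam
#   featureCounts -s 2 -a gene_annotations.gtf -o rev.txt MySample.bam
#   guess_strandedness fwd.txt.summary rev.txt.summary
}}}

A stranded library should assign far more reads in one orientation than the other, while an unstranded library should give roughly similar counts for both. Treat the result as a hint to confirm in a genome browser.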


 * **Other**
   * Review articles:
      * [[http://www.ncbi.nlm.nih.gov/pubmed/24020486|Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data.]] - Rapaport F, Khanin R, Liang Y, Pirun M, Krek A, Zumbo P, Mason CE, Socci ND, Betel D.  Genome Biol. 2013 Sep 10;14(9):R95. 
      * [[http://www.ncbi.nlm.nih.gov/pubmed/21106489|A survey of statistical software for analyzing RNA-seq data]] - Gao D, Kim J, Kim H, Phang TL, Selby H, Tan AC, Tong T.  Hum Genomics. 2010 Oct;5(1):56-60.
      * [[http://www.ncbi.nlm.nih.gov/pubmed/21176179|From RNA-seq reads to differential expression results]] - Oshlack A, Robinson MD, Young MD.  Genome Biol. 2010;11(12):220. Epub 2010 Dec 22.
   * For more practical information, see the third session of [http://jura.wi.mit.edu/bio/education/R2011/ An introduction to R and Bioconductor: A BaRC Short Course] and the [http://jura.wi.mit.edu/bio/hot_topics/ BaRC Hot Topic] (under "Short Read Sequencing", see "Practical RNA-Seq analysis").