
Changes between Version 46 and Version 47 of SOPs/mapping

Timestamp:: 06/02/17 08:16:19
Author:: gbell
Comment:: --

SOPs/mapping

-              v46
+              v47
 '''[https://github.com/alexdobin/STAR STAR]'''
 STAR is an ultrafast universal RNA-seq aligner.  It maps >60 times faster than Tophat2. To use STAR, a genome directory specific for the STAR mapper needs to be generated first.  STAR tends to align more reads to pseudogenes compared to Tophat2.  However, the pseudogene problem can be significantly minimized by providing an annotation file containing known splice junctions. If no annotation is available for a genome of interest, a 2-pass mapping procedure is recommended. The first pass generates a splice junctions file, which is then used as the annotation file to run the second pass mapping.
+STAR ([https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf manual]) is an ultrafast universal RNA-seq aligner that maps reads more than 60 times faster than Tophat2. Before mapping with STAR, a STAR-specific genome directory must be generated. STAR tends to align more reads to pseudogenes than Tophat2 does, but this can be substantially reduced by providing an annotation file containing known splice junctions. If no annotation is available for the genome of interest, a 2-pass mapping procedure is recommended: the first pass produces a splice junctions file, which is then used as the annotation file for the second mapping pass.
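The 2-pass procedure described above can be sketched in shell as follows. This is a minimal sketch, not the SOP's prescribed pipeline: it assumes STAR 2.4.1a or later, which accepts `--sjdbFileChrStartEnd` at the mapping step and inserts those junctions on the fly. The function name `star_two_pass_cmds` and the `pass1/` and `pass2/` output directories are hypothetical. The function only prints the commands, so they can be checked before being run or wrapped in bsub.

```shell
#!/usr/bin/env bash
# Minimal sketch of a manual 2-pass STAR run (hypothetical helper).
# Assumes STAR >= 2.4.1a, which accepts --sjdbFileChrStartEnd at the mapping
# step and inserts those junctions on the fly. The function only prints the
# commands; paths must not contain spaces.
star_two_pass_cmds() {
  local genome_dir="$1" threads="$2"
  shift 2
  local reads="$*"   # one FASTQ (single-end) or two FASTQs (paired-end)

  # STAR writes into the --outFileNamePrefix directories, so create them first
  echo "mkdir -p pass1 pass2"
  # Pass 1: ordinary mapping; detected junctions are written to pass1/SJ.out.tab
  echo "STAR --genomeDir $genome_dir --readFilesIn $reads --runThreadN $threads --outFileNamePrefix pass1/"
  # Pass 2: re-map, using the pass-1 junctions as the annotation
  echo "STAR --genomeDir $genome_dir --readFilesIn $reads --runThreadN $threads --sjdbFileChrStartEnd pass1/SJ.out.tab --outFileNamePrefix pass2/"
}
```

For example, `star_two_pass_cmds /path/to/GenomeDir 8 read1.fastq read2.fastq | bash` would run both passes locally. Newer STAR releases can also perform per-sample 2-pass mapping in a single run with `--twopassMode Basic`.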
 Sample command:
 …
 To generate genome index files for STAR:
 {{{
+# Short reads (<50 bp): --sjdbOverhang = readLength - 1 (e.g. 39 for 40 bp reads)
+bsub STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --sjdbGTFfile /path/to/GTF/FileName.gtf --sjdbOverhang 39 --runThreadN 8
+# Longer reads: the generic value of 100
+bsub STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --sjdbGTFfile /path/to/GTF/FileName.gtf --sjdbOverhang 100 --runThreadN 8
+}}}
+Key parameters in the sample commands above (--genomeDir, --genomeFastaFiles and --runThreadN are described under the mapping step below):
+  * '''--sjdbOverhang <n>''' Specifies the length of the genomic sequence on each side of an annotated junction that is used to construct the splice junctions database. For short reads (<50 bp), use readLength - 1; otherwise, a generic value of 100 works as well (see the manual for details).
+  * '''--sjdbGTFfile <GTF_file.gtf>''' Supplies STAR with a GTF annotation file during the genomeGenerate step. Combined with the --sjdbScore <n> option during mapping, this biases alignments toward annotated junctions and reduces alignment to pseudogenes.
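The --sjdbOverhang rule of thumb above can be expressed as a small shell helper. This is a sketch under stated assumptions: the function names `sjdb_overhang_for` and `first_read_length` are hypothetical, and the read length is taken from the first record of an uncompressed FASTQ file, which assumes all reads have the same length.

```shell
#!/usr/bin/env bash
# Hypothetical helper encoding the --sjdbOverhang rule of thumb:
# reads shorter than 50 bp -> readLength - 1; otherwise the generic value 100.
sjdb_overhang_for() {
  local read_len="$1"
  if (( read_len < 50 )); then
    echo $(( read_len - 1 ))
  else
    echo 100
  fi
}

# Hypothetical helper: read length of the first record in a FASTQ file.
# Assumes an uncompressed, 4-lines-per-record FASTQ with fixed-length reads,
# so line 2 is the first read's sequence.
first_read_length() {
  local seq
  seq="$(head -n 2 "$1" | tail -n 1)"
  echo "${#seq}"
}
```

For example, `sjdb_overhang_for 40` prints 39, while `sjdb_overhang_for 100` prints 100.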
 To map:
 {{{
 …
   * '''--genomeDir </path/to/GenomeDir>''' Specifies the directory holding the genome information created in the genomeGenerate step.
   * '''--genomeFastaFiles <genome FASTA files>''' Specifies the genome FASTA file(s) to be indexed.
-  * '''--sjdbGTFfile <GTF_file.gtf>''' Supplies STAR with a GTF file during the genomeGenerate step.  Combined with the --sjdbScore <n> option during mapping, this will bias the alignment toward annotated junctions, and reduces alignment to pseudogenes.
   * '''--readFilesIn <read1.fastq [read2.fastq]>''' Specifies the FASTQ file(s) containing the reads: one file for single-end data, or two files (read 1 and read 2) for paired-end data.
   * '''--sjdbScore <n>''' Provides an extra alignment score for alignments that cross annotated (database) junctions (default = 2). A positive score biases alignments toward annotated junctions. This option only takes effect if a splice junction annotation file was supplied during the genomeGenerate step.
   * '''--runThreadN <n>''' Specifies the number of threads to use.
-  * '''--sjdbOverhang  ''' Specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junctions database.  For short reads (<50) use readLength - 1, otherwise a generic value of 100 will work as well (see manual for more info).
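The mapping options above can be combined as in the following sketch, which shows how the same command covers single-end and paired-end data. The function name `star_map_cmd` and its argument order are hypothetical; the function prints the command rather than running it, so it can be checked or prefixed with bsub before submission. Paths containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a STAR mapping command from the options above.
# Arguments: GENOME_DIR THREADS SJDB_SCORE READ1 [READ2]
# One FASTQ means single-end data; two FASTQs mean paired-end data.
star_map_cmd() {
  local genome_dir="$1" threads="$2" sjdb_score="$3"
  shift 3
  # --readFilesIn takes one (single-end) or two (paired-end) read files
  if (( $# < 1 || $# > 2 )); then
    echo "usage: star_map_cmd GENOME_DIR THREADS SJDB_SCORE READ1 [READ2]" >&2
    return 1
  fi
  echo "STAR --genomeDir $genome_dir --readFilesIn $* --sjdbScore $sjdb_score --runThreadN $threads"
}
```

For example, `star_map_cmd /path/to/GenomeDir 8 2 read1.fastq read2.fastq` prints a paired-end command using the default --sjdbScore of 2.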