Changes between Version 46 and Version 47 of SOPs/mapping
Timestamp: 06/02/17 08:16:19
--- SOPs/mapping (v46)
+++ SOPs/mapping (v47)
@@ -197,3 +197,3 @@
 '''[https://github.com/alexdobin/STAR STAR]'''
 
-STAR is an ultrafast universal RNA-seq aligner. It maps >60 times faster than Tophat2. To use STAR, a genome directory specific for the STAR mapper needs to be generated first. STAR tends to align more reads to pseudogenes compared to Tophat2. However, the pseudogene problem can be significantly minimized by providing an annotation file containing known splice junctions. If no annotation is available for a genome of interest, a 2-pass mapping procedure is recommended. The first pass generates a splice junctions file, which is then used as the annotation file to run the second pass mapping.
+STAR ([https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf manual]) is an ultrafast universal RNA-seq aligner. It maps >60 times faster than Tophat2. To use STAR, a genome directory specific for the STAR mapper needs to be generated first. STAR tends to align more reads to pseudogenes compared to Tophat2. However, the pseudogene problem can be significantly minimized by providing an annotation file containing known splice junctions. If no annotation is available for a genome of interest, a 2-pass mapping procedure is recommended. The first pass generates a splice junctions file, which is then used as the annotation file to run the second pass mapping.
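Reviewer note: the 2-pass procedure described in the paragraph above could be sketched roughly as follows. This is a dry run that only prints the two commands; it is not taken from the SOP. The paths, the `pass1_`/`pass2_` output prefixes, and the thread count are placeholders. It also assumes a STAR version that accepts `--sjdbFileChrStartEnd` at the mapping step; with older versions the junctions file would instead be supplied when regenerating the genome directory, so check the manual for the installed version.

```shell
#!/bin/sh
# Dry-run sketch of a manual 2-pass STAR mapping (commands are printed, not run).
# All paths and prefixes below are placeholders, not values from the SOP.
GENOME_DIR=/path/to/GenomeDir
READS="read1.fastq read2.fastq"

# Pass 1: map without an annotation file.
# STAR writes the detected junctions to <prefix>SJ.out.tab.
pass1_cmd="STAR --genomeDir $GENOME_DIR --readFilesIn $READS --runThreadN 8 --outFileNamePrefix pass1_"

# Pass 2: map again, supplying the pass-1 junctions in place of an annotation.
# Assumption: the installed STAR accepts --sjdbFileChrStartEnd during mapping.
pass2_cmd="STAR --genomeDir $GENOME_DIR --readFilesIn $READS --runThreadN 8 --sjdbFileChrStartEnd pass1_SJ.out.tab --outFileNamePrefix pass2_"

echo "$pass1_cmd"
echo "$pass2_cmd"
```

On the cluster used in this SOP, each printed command would presumably be submitted with `bsub`, as in the sample commands.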
@@ -200,2 +200,2 @@
 
 Sample command:
@@ -203,6 +203,11 @@
 To generate genome index files for STAR:
 {{{
-bsub STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --sjdbGTFfile /path/to/GTF/FileName.gtf --sjdbOverhang 39 --runThreadN 8
-}}}
+bsub STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --sjdbGTFfile /path/to/GTF/FileName.gtf --sjdbOverhang 100 --runThreadN 8
+}}}
+
+The parameters included in the above sample command are:
+* '''--sjdbOverhang ''' Specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junctions database. For short reads (<50) use readLength - 1, otherwise a generic value of 100 will work as well (see manual for more info).
+* '''--sjdbGTFfile <GTF_file.gtf>''' Supplies STAR with a GTF file during the genomeGenerate step. Combined with the --sjdbScore <n> option during mapping, this will bias the alignment toward annotated junctions, and reduces alignment to pseudogenes.
+
 To map:
 {{{
@@ -214,9 +219,7 @@
 * '''--genomeDir </path/to/GenomeDir>''' Specifies the path to the directory used for storing the genome information created in the genomeGenerate step.
 * '''--genomeFastaFiles <genome FASTA files>''' Specifies genome FASTA files to be used.
-* '''--sjdbGTFfile <GTF_file.gtf>''' Supplies STAR with a GTF file during the genomeGenerate step. Combined with the --sjdbScore <n> option during mapping, this will bias the alignment toward annotated junctions, and reduces alignment to pseudogenes.
 * '''--readFilesIn <read1.fastq read2.fastq> ''' Specifies the fastq files containing the reads, can be single-end or paired-end.
 * '''--sjdbScore <n> ''' Provides extra alignment score for alignments that cross database junctions (default = 2). If this score is positive, it will bias the alignment toward annotated junctions. This is only used if during the genomeGenerate step a splice junction annotation file is used.
 * '''--runThreadN <n> ''' Specifies the number of threads to use.
-* '''--sjdbOverhang ''' Specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junctions database. For short reads (<50) use readLength - 1, otherwise a generic value of 100 will work as well (see manual for more info).
 
 
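Reviewer note: the `--sjdbOverhang` rule of thumb quoted in the parameter description (readLength - 1 for reads shorter than 50, otherwise 100) can be written as a small shell helper. This is only an illustrative sketch; the function name `sjdb_overhang` is hypothetical and is not part of STAR or of this SOP.

```shell
#!/bin/sh
# Hypothetical helper: pick a --sjdbOverhang value from the read length,
# following the rule of thumb stated in the parameter description.
sjdb_overhang() {
    read_length=$1
    if [ "$read_length" -lt 50 ]; then
        # Short reads (<50): use readLength - 1.
        echo $((read_length - 1))
    else
        # Otherwise the generic value of 100 is used.
        echo 100
    fi
}

sjdb_overhang 40    # prints 39
sjdb_overhang 101   # prints 100
```

For 40-base reads the helper prints 39, the value in the v46 sample command; for longer reads it prints the generic 100 that v47 switches to.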