| 130 | '''[https://github.com/alexdobin/STAR STAR]''' |
| 131 | |
| 132 | STAR is an ultrafast universal RNA-seq aligner that maps more than 60 times faster than TopHat2. Before mapping, a STAR-specific genome directory must be generated. STAR tends to align more reads to pseudogenes than TopHat2 does, but this problem can be substantially reduced by providing an annotation file containing known splice junctions. If no annotation is available for the genome of interest, a 2-pass mapping procedure is recommended: the first pass generates a splice junctions file, which is then used as the annotation for the second mapping pass. |
| 133 | |
| 134 | Sample commands: |
| 135 | |
| 136 | To generate the genome directory for STAR: |
| 137 | {{{ |
| 138 | bsub STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --sjdbGTFfile /path/to/GTF/FileName.gtf --runThreadN 8 |
| 139 | }}} |
| 140 | To map: |
| 141 | {{{ |
| 142 | bsub STAR --genomeDir /path/to/GenomeDir --readFilesIn /path/to/read1.fastq /path/to/read2.fastq --sjdbScore 2 --outFileNamePrefix whateverPrefix --runThreadN 8 |
| 143 | }}} |
| 144 | |
| 145 | The parameters included in the sample command are: |
| 146 | * '''--runMode <alignReads, genomeGenerate>''' "alignReads" does the actual mapping. "genomeGenerate" generates the genomeDir required for mapping (default = alignReads). |
| 147 | * '''--genomeDir </path/to/GenomeDir>''' Specifies the path to the directory used for storing the genome information created in the genomeGenerate step. |
| 148 | * '''--genomeFastaFiles <genome FASTA files>''' Specifies genome FASTA files to be used. |
| 149 | * '''--sjdbGTFfile <GTF_file.gtf>''' Supplies STAR with a GTF annotation file during the genomeGenerate step. Combined with the --sjdbScore <n> option during mapping, this biases alignments toward annotated junctions and reduces alignment to pseudogenes. |
| 150 | * '''--readFilesIn <read1.fastq read2.fastq>''' Specifies the FASTQ file(s) containing the reads: one file for single-end data, two files for paired-end data. |
| 151 | * '''--sjdbScore <n>''' Adds an extra alignment score to alignments that cross database junctions (default = 2). A positive score biases alignments toward annotated junctions. It has an effect only if a splice junction annotation file was supplied during the genomeGenerate step. |
| 152 | * '''--runThreadN <n> ''' Specifies the number of threads to use. |