Changes between Version 18 and Version 19 of SOPs/mapping


Timestamp:
12/04/14 15:21:38 (10 years ago)
Author:
yhuang
Comment:

--

  • SOPs/mapping

    v18 v19  
    128128  * For "Sanger / Illumina 1.8" or "Sanger / Illumina 1.9", bowtie can use the default "phred33" encoding
    129129
     130'''[https://github.com/alexdobin/STAR STAR]'''
     131
     132STAR is an ultrafast universal RNA-seq aligner. It maps more than 60 times faster than Tophat2. Before mapping, a genome directory specific to STAR must be generated. STAR tends to align more reads to pseudogenes than Tophat2 does, but this problem can be greatly reduced by providing an annotation file containing known splice junctions. If no annotation is available for the genome of interest, a 2-pass mapping procedure is recommended: the first pass generates a splice junctions file, which is then used as the annotation for the second pass.
     133
     134Sample commands:
     135
     136To generate the genome directory for STAR:
     137{{{
     138bsub STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --sjdbGTFfile /path/to/GTF/FileName.gtf --runThreadN 8
     139}}}
     140To map:
     141{{{
     142bsub STAR --genomeDir /path/to/GenomeDir --readFilesIn /path/to/read1.fastq /path/to/read2.fastq --sjdbScore 2 --outFileNamePrefix whateverPrefix --runThreadN 8
     143}}}
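
The 2-pass procedure described above could be sketched as follows. This is a minimal outline, not a tested pipeline: the `_pass1`/`_pass2` directory names, the `pass1_`/`pass2_` prefixes, and the `run` helper are placeholders introduced for illustration. `--sjdbFileChrStartEnd`, `--sjdbOverhang`, and the `SJ.out.tab` output name are taken from the STAR manual and should be checked against the installed version. The steps depend on each other, so they are shown without `bsub`; on a cluster they would need to run sequentially within one job. Commands are only printed unless `RUN=1` is set.

```shell
#!/bin/sh
# Sketch of 2-pass STAR mapping for a genome without annotation.
# All paths and prefixes below are placeholders.
GENOME_FASTA="${GENOME_FASTA:-/path/to/genome/fasta1}"
PASS1_DIR="${PASS1_DIR:-/path/to/GenomeDir_pass1}"
PASS2_DIR="${PASS2_DIR:-/path/to/GenomeDir_pass2}"
READ1="${READ1:-/path/to/read1.fastq}"
READ2="${READ2:-/path/to/read2.fastq}"
THREADS="${THREADS:-8}"
# Assumption: 100 is STAR's default; the manual suggests read length minus 1.
OVERHANG="${OVERHANG:-100}"

# Print the command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

two_pass() {
    # The genome directories are created up front, before STAR writes to them.
    run mkdir -p "$PASS1_DIR" "$PASS2_DIR"

    # Pass 1: build an index without annotation, then map to discover junctions.
    run STAR --runMode genomeGenerate --genomeDir "$PASS1_DIR" \
        --genomeFastaFiles "$GENOME_FASTA" --runThreadN "$THREADS"
    run STAR --genomeDir "$PASS1_DIR" --readFilesIn "$READ1" "$READ2" \
        --outFileNamePrefix pass1_ --runThreadN "$THREADS"

    # Pass 2: rebuild the index with the junctions found in pass 1
    # (written by STAR to <prefix>SJ.out.tab), then map again.
    run STAR --runMode genomeGenerate --genomeDir "$PASS2_DIR" \
        --genomeFastaFiles "$GENOME_FASTA" \
        --sjdbFileChrStartEnd pass1_SJ.out.tab --sjdbOverhang "$OVERHANG" \
        --runThreadN "$THREADS"
    run STAR --genomeDir "$PASS2_DIR" --readFilesIn "$READ1" "$READ2" \
        --outFileNamePrefix pass2_ --runThreadN "$THREADS"
}

two_pass
```

Run as is, the script prints the five commands for review; setting `RUN=1` executes them in order.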
     144
     145The parameters included in the sample commands are:
     146  * '''--runMode <alignReads, genomeGenerate>'''   "alignReads" does the actual mapping. "genomeGenerate" generates the genomeDir required for mapping (default = alignReads).
     147  * '''--genomeDir </path/to/GenomeDir>'''  Specifies the path to the directory used for storing the genome information created in the genomeGenerate step.
     148  * '''--genomeFastaFiles <genome FASTA files>''' Specifies genome FASTA files to be used.
     149  * '''--sjdbGTFfile <GTF_file.gtf>''' Supplies STAR with a GTF file during the genomeGenerate step.  Combined with the --sjdbScore <n> option during mapping, this biases alignment toward annotated junctions and reduces alignment to pseudogenes.
     150  * '''--readFilesIn <read1.fastq read2.fastq>''' Specifies the FASTQ file(s) containing the reads. Input can be single-end (one file) or paired-end (two files).
     151  * '''--sjdbScore <n>''' Adds an extra alignment score to alignments that cross database junctions (default = 2). A positive score biases alignment toward annotated junctions. This option takes effect only if a splice junction annotation file was supplied during the genomeGenerate step.
     152  * '''--runThreadN <n>''' Specifies the number of threads to use.
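
As an illustration of how the mapping parameters above fit together, the sketch below assembles the mapping command for either single-end or paired-end input. The function name `build_star_map_cmd` and its argument order are hypothetical, chosen only for this example. It also passes `--outFileNamePrefix`, which appears in the sample mapping command and sets the prefix prepended to STAR's output file names. The function only prints the command, so the result can be reviewed or passed to `bsub`.

```shell
#!/bin/sh
# build_star_map_cmd GENOME_DIR PREFIX THREADS READ1 [READ2]
# Prints a STAR mapping command; one read file means single-end,
# two read files mean paired-end. Hypothetical helper for illustration.
build_star_map_cmd() {
    if [ "$#" -lt 4 ] || [ "$#" -gt 5 ]; then
        echo "usage: build_star_map_cmd GENOME_DIR PREFIX THREADS READ1 [READ2]" >&2
        return 1
    fi
    genome_dir=$1
    prefix=$2
    threads=$3
    shift 3
    # "$*" now holds the one or two read files, joined by a space.
    printf 'STAR --genomeDir %s --readFilesIn %s --sjdbScore 2 --outFileNamePrefix %s --runThreadN %s\n' \
        "$genome_dir" "$*" "$prefix" "$threads"
}

# Paired-end example (placeholder paths and prefix).
build_star_map_cmd /path/to/GenomeDir sampleA_ 8 /path/to/read1.fastq /path/to/read2.fastq
# Single-end example.
build_star_map_cmd /path/to/GenomeDir sampleB_ 8 /path/to/reads.fastq
```

Because the paths are interpolated without quoting, this sketch assumes they contain no spaces.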