Changes between Version 46 and Version 47 of SOPs/mapping


Timestamp:
06/02/17 08:16:19 (8 years ago)
Author:
gbell
Comment:

--

  • SOPs/mapping

    v46 v47  
    197197'''[https://github.com/alexdobin/STAR STAR]'''
    198198
    199 STAR is an ultrafast universal RNA-seq aligner.  It maps >60 times faster than TopHat2. To use STAR, a genome directory specific to the STAR mapper must be generated first.  STAR tends to align more reads to pseudogenes than TopHat2 does.  However, the pseudogene problem can be substantially reduced by providing an annotation file containing known splice junctions. If no annotation is available for a genome of interest, a 2-pass mapping procedure is recommended. The first pass generates a splice junctions file, which is then used as the annotation file for the second-pass mapping. 
     199STAR ([https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf manual]) is an ultrafast universal RNA-seq aligner.  It maps >60 times faster than TopHat2. To use STAR, a genome directory specific to the STAR mapper must be generated first.  STAR tends to align more reads to pseudogenes than TopHat2 does.  However, the pseudogene problem can be substantially reduced by providing an annotation file containing known splice junctions. If no annotation is available for a genome of interest, a 2-pass mapping procedure is recommended. The first pass generates a splice junctions file, which is then used as the annotation file for the second-pass mapping. 
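The 2-pass procedure described above could be sketched roughly as follows. This is a dry-run illustration, not the SOP's own commands: the function name `two_pass_commands` and the `pass1_`/`pass2_` output prefixes are hypothetical, and feeding the first-pass junctions back with `--sjdbFileChrStartEnd` at the mapping step assumes a STAR version that can insert junctions on the fly (older versions need the genome regenerated with that file instead). On an LSF cluster the printed commands would be prefixed with `bsub`, as in the samples on this page.

```shell
# Dry-run sketch: prints the two STAR commands instead of running them.
# two_pass_commands and the pass1_/pass2_ prefixes are hypothetical names.
two_pass_commands() {
    genome_dir=$1
    reads=$2
    # Pass 1: map without an annotation; STAR writes <prefix>SJ.out.tab,
    # the splice junctions it discovered.
    echo "STAR --genomeDir $genome_dir --readFilesIn $reads --outFileNamePrefix pass1_ --runThreadN 8"
    # Pass 2: map again, supplying the pass-1 junctions as the annotation.
    echo "STAR --genomeDir $genome_dir --readFilesIn $reads --sjdbFileChrStartEnd pass1_SJ.out.tab --outFileNamePrefix pass2_ --runThreadN 8"
}

# Example (prints two command lines):
#   two_pass_commands /path/to/GenomeDir read1.fastq
```

Printing the commands rather than executing them makes it easy to review them before submitting to the cluster.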
    200200
    201201Sample command:
     
    203203To generate genome index files for STAR:
    204204{{{
    205 bsub STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --sjdbGTFfile /path/to/GTF/FileName.gtf --sjdbOverhang 39 --runThreadN 8
    206 }}}
     205bsub STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --sjdbGTFfile /path/to/GTF/FileName.gtf --sjdbOverhang 100 --runThreadN 8
     206}}}
     207
     208The parameters included in the above sample command are:
     209  * '''--sjdbOverhang <n>''' Specifies the length of the genomic sequence around each annotated junction used to construct the splice junction database.  For short reads (<50 bases) use readLength - 1; otherwise a generic value of 100 works as well (see the manual for more information).
     210  * '''--sjdbGTFfile <GTF_file.gtf>''' Supplies STAR with a GTF file during the genomeGenerate step.  Combined with the --sjdbScore <n> option during mapping, this biases alignment toward annotated junctions and reduces alignment to pseudogenes.
     211
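The rule of thumb for `--sjdbOverhang` given above can be written as a small shell helper. This is only a sketch of that rule; the function name `sjdb_overhang` is hypothetical and not part of STAR.

```shell
# Hypothetical helper: pick a --sjdbOverhang value from the read length,
# following the rule above (readLength - 1 for reads shorter than 50,
# otherwise the generic value 100).
sjdb_overhang() {
    read_length=$1
    if [ "$read_length" -lt 50 ]; then
        echo $((read_length - 1))
    else
        echo 100
    fi
}

# Example use when building the index for 40-base reads:
#   bsub STAR --runMode genomeGenerate ... --sjdbOverhang "$(sjdb_overhang 40)"
```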
    207212To map:
    208213{{{
     
    214219  * '''--genomeDir </path/to/GenomeDir>'''  Specifies the path to the directory used for storing the genome information created in the genomeGenerate step.
    215220  * '''--genomeFastaFiles <genome FASTA files>''' Specifies genome FASTA files to be used.
    216   * '''--sjdbGTFfile <GTF_file.gtf>''' Supplies STAR with a GTF file during the genomeGenerate step.  Combined with the --sjdbScore <n> option during mapping, this biases alignment toward annotated junctions and reduces alignment to pseudogenes.
    217221  * '''--readFilesIn <read1.fastq read2.fastq> ''' Specifies the FASTQ files containing the reads; these can be single-end or paired-end.
    218222  * '''--sjdbScore <n> ''' Provides an extra alignment score for alignments that cross database junctions (default = 2). If this score is positive, it biases alignment toward annotated junctions. This option takes effect only if a splice junction annotation file was supplied during the genomeGenerate step. 
    219223  * '''--runThreadN <n> ''' Specifies the number of threads to use.
    220   * '''--sjdbOverhang <n>''' Specifies the length of the genomic sequence around each annotated junction used to construct the splice junction database.  For short reads (<50 bases) use readLength - 1; otherwise a generic value of 100 works as well (see the manual for more information).
    221224
    222225