Changes between Version 56 and Version 57 of SOPs/mapping


Timestamp:
11/03/17 14:36:23 (8 years ago)
Author:
gbell
Comment:

--

  • SOPs/mapping

    v56 v57  
    135135
    136136These mappers permit the beginning and end of a read to map to (originate from) different places in the genome, which is common for spliced RNA.
    137 
    138 '''tophat version 1 (old)'''
    139 
    140 '''TopHat version 1 is no longer recommended.'''
    141 Running TopHat version 1 requires a change to a user's environment on tak and only applies to the specific tak session.  First run this command:
    142 {{{
    143 export PATH="/usr/local/share/tophat1:$PATH"
    144 }}}
    145 and then check that your terminal will use the correct TopHat version:
    146 {{{
    147 tophat --version
    148 }}}
    149 
    150 Sample command:
    151 {{{
    152 bsub tophat -o s_7_tophat_out -p 6 --phred64-quals --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt
    153 }}}
    154 
    155 The parameters included in the sample command are:
    156   * '''-o/--output-dir <word>'''     All output files will be created in this directory (default = tophat_out)
    157   * '''--segment-length <int>'''  Shortest length of a spliced read that can map to one side of the junction.  For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads).  For longer reads, the default length (25) can be used.
    158   * '''-I <int>''' Maximum intron length.  If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value.
    159   * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models.  This can help TopHat identify junctions that may otherwise be missed.
    160   * '''--no-novel-juncs ''' Only look for spliced reads across junctions in the supplied GTF file.  Not used if looking for novel isoforms.
    161   * '''-p/--num-threads''' Use this many threads to align reads (default is 1)
    162 Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file).  See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details.
    163   * '''--solexa-quals'''         (for input quality scores from Illumina versions 1.2 and earlier)
    164   * '''--solexa1.3-quals''' or '''--phred64-quals'''     (for input quality scores from Illumina versions 1.3-1.7)
    165 
    166 '''[http://ccb.jhu.edu/software/tophat/index.shtml tophat version 2]'''
    167 
    168 '''TopHat version 2 is no longer recommended.'''  The authors of TopHat currently recommend [http://ccb.jhu.edu/software/hisat2/index.shtml HISAT2].
    169 TopHat version 2 uses bowtie2, rather than bowtie, for its mapping.  As a result, TopHat 2 uses a different set of genome index files (*.bt2) than TopHat 1 (*.ebwt).
    170 
    171 Sample command:
    172 {{{
    173 # Single-end reads
    174 bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt
    175 # Paired-end reads
    176 # For PE reads, specify the expected (mean) inner distance using the -r option (default is 50bp).  The inner distance, or insert size, does not include the length of the reads/mates.  For example, for a PE run with fragments selected at 300bp, where each end is 50bp, set -r to 200.
    177 bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.1.txt s_7.2.txt
    178 }}}
    179 
    180 The parameters included in the sample command are:
    181   * '''-o/--output-dir <word>'''     All output files will be created in this directory (default = tophat_out)
    182   * '''--segment-length <int>'''  Shortest length of a spliced read that can map to one side of the junction.  For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads).  For longer reads, the default length (25) can be used.
    183   * '''-I <int>''' Maximum intron length.  If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value.
    184   * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models.  This can help TopHat identify junctions that may otherwise be missed.
    185   * '''--no-novel-juncs ''' Only look for spliced reads across junctions in the supplied GTF file.  Not used if looking for novel isoforms.
    186   * '''--library-type''' Take advantage of the strandedness of the library for mapping (especially across splice junctions); can be fr-unstranded, fr-firststrand, or fr-secondstrand
    187 
    188 Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file).  See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details.
    189   * '''--solexa-quals'''         (for input quality scores from Illumina versions 1.2 and earlier)
    190   * '''--solexa1.3-quals''' or '''--phred64-quals'''     (for input quality scores from Illumina versions 1.3-1.7)
    191   * For "Sanger / Illumina 1.8" or "Sanger / Illumina 1.9", bowtie can use the default "phred33" encoding
    192 
    193 Choices for controlling alignment (e.g. mismatches)
    194   * '''--read-mismatches/-N''' Final read alignments with more than this many mismatches are discarded (default is 2).
    195   * '''--read-gap-length''' Final read alignments with a total gap length greater than this value are discarded (default is 2).
    196   * '''--read-edit-dist''' Final read alignments with an edit distance (i.e. mismatches+indels) greater than this value are discarded (default is 2).
    197   * '''--segment-mismatches'''  Read segments are mapped independently, allowing up to this many mismatches in each segment alignment (default is 2).
    198137
    199138'''[https://github.com/alexdobin/STAR STAR]'''
     
    274213
    275214
     215'''tophat version 1 (old)'''
     216
     217'''TopHat version 1 is no longer recommended.'''
     218Running TopHat version 1 requires a change to a user's environment on tak and only applies to the specific tak session.  First run this command:
     219{{{
     220export PATH="/usr/local/share/tophat1:$PATH"
     221}}}
     222and then check that your terminal will use the correct TopHat version:
     223{{{
     224tophat --version
     225}}}
     226
     227Sample command:
     228{{{
     229bsub tophat -o s_7_tophat_out -p 6 --phred64-quals --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt
     230}}}
     231
     232The parameters included in the sample command are:
     233  * '''-o/--output-dir <word>'''     All output files will be created in this directory (default = tophat_out)
     234  * '''--segment-length <int>'''  Shortest length of a spliced read that can map to one side of the junction.  For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads).  For longer reads, the default length (25) can be used.
     235  * '''-I <int>''' Maximum intron length.  If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value.
     236  * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models.  This can help TopHat identify junctions that may otherwise be missed.
     237  * '''--no-novel-juncs ''' Only look for spliced reads across junctions in the supplied GTF file.  Not used if looking for novel isoforms.
     238  * '''-p/--num-threads''' Use this many threads to align reads (default is 1)
     239Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file).  See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details.
     240  * '''--solexa-quals'''         (for input quality scores from Illumina versions 1.2 and earlier)
     241  * '''--solexa1.3-quals''' or '''--phred64-quals'''     (for input quality scores from Illumina versions 1.3-1.7)
     242
     243'''[http://ccb.jhu.edu/software/tophat/index.shtml tophat version 2]'''
     244
     245'''TopHat version 2 is no longer recommended.'''  The authors of TopHat currently recommend [http://ccb.jhu.edu/software/hisat2/index.shtml HISAT2].
     246TopHat version 2 uses bowtie2, rather than bowtie, for its mapping.  As a result, TopHat 2 uses a different set of genome index files (*.bt2) than TopHat 1 (*.ebwt).
     247
     248Sample command:
     249{{{
     250# Single-end reads
     251bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt
     252# Paired-end reads
     253# For PE reads, specify the expected (mean) inner distance using the -r option (default is 50bp).  The inner distance, or insert size, does not include the length of the reads/mates.  For example, for a PE run with fragments selected at 300bp, where each end is 50bp, set -r to 200.
     254bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.1.txt s_7.2.txt
     255}}}
     256
     257The parameters included in the sample command are:
     258  * '''-o/--output-dir <word>'''     All output files will be created in this directory (default = tophat_out)
     259  * '''--segment-length <int>'''  Shortest length of a spliced read that can map to one side of the junction.  For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads).  For longer reads, the default length (25) can be used.
     260  * '''-I <int>''' Maximum intron length.  If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value.
     261  * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models.  This can help TopHat identify junctions that may otherwise be missed.
     262  * '''--no-novel-juncs ''' Only look for spliced reads across junctions in the supplied GTF file.  Not used if looking for novel isoforms.
     263  * '''--library-type''' Take advantage of the strandedness of the library for mapping (especially across splice junctions); can be fr-unstranded, fr-firststrand, or fr-secondstrand
     264
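The arithmetic behind two of the options above (the -r inner distance for paired-end reads and the --segment-length rule for short reads) can be sketched as a pair of shell helpers. This is only an illustrative sketch: the function names `inner_distance` and `segment_length` are hypothetical and not part of TopHat, and the 45-nt cutoff is the approximate threshold quoted in the parameter list.

```shell
# Hypothetical helpers for choosing TopHat option values; not part of TopHat.

# Value for -r: fragment length minus the length of both mates.
# Usage: inner_distance <fragment_length> <read_length>
inner_distance() {
    echo $(( $1 - 2 * $2 ))
}

# Value for --segment-length: half the read length for reads shorter
# than ~45 nt (assumed cutoff), otherwise the default of 25.
# Usage: segment_length <read_length>
segment_length() {
    if [ "$1" -lt 45 ]; then
        echo $(( $1 / 2 ))
    else
        echo 25
    fi
}

inner_distance 300 50   # 300bp fragments, 50bp mates
segment_length 40       # 40-nt reads
```

The printed values could then be passed to -r and --segment-length in the sample commands.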
     265Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file).  See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details.
     266  * '''--solexa-quals'''         (for input quality scores from Illumina versions 1.2 and earlier)
     267  * '''--solexa1.3-quals''' or '''--phred64-quals'''     (for input quality scores from Illumina versions 1.3-1.7)
     268  * For "Sanger / Illumina 1.8" or "Sanger / Illumina 1.9", bowtie can use the default "phred33" encoding
     269
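As a rough sketch, the encoding choices above can be expressed as a lookup from the FastQC "Encoding" value to the quality flag. The helper name `quality_flag` is hypothetical, and the exact encoding strings matched here (for example "Illumina 1.5" or "Illumina <1.3") are assumptions about what FastQC reports; check them against your own FastQC output.

```shell
# Hypothetical helper: map a FastQC "Encoding" string to a TopHat quality flag.
# The matched strings are assumptions about FastQC's wording.
quality_flag() {
    case "$1" in
        # Illumina 1.8 and 1.9: default phred33 encoding, no flag needed
        *"Illumina 1.8"*|*"Illumina 1.9"*)
            echo "" ;;
        # Illumina 1.3-1.7: phred64 quality scores
        *"Illumina 1.3"*|*"Illumina 1.4"*|*"Illumina 1.5"*|*"Illumina 1.6"*|*"Illumina 1.7"*)
            echo "--phred64-quals" ;;
        # Illumina 1.2 and earlier: Solexa quality scores
        *"Illumina <1.3"*)
            echo "--solexa-quals" ;;
        *)
            echo "unrecognized encoding: $1" >&2
            return 1 ;;
    esac
}

quality_flag "Illumina 1.5"
```

An empty result means the default encoding applies and no quality flag has to be added to the tophat command.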
     270Choices for controlling alignment (e.g. mismatches)
     271  * '''--read-mismatches/-N''' Final read alignments with more than this many mismatches are discarded (default is 2).
     272  * '''--read-gap-length''' Final read alignments with a total gap length greater than this value are discarded (default is 2).
     273  * '''--read-edit-dist''' Final read alignments with an edit distance (i.e. mismatches+indels) greater than this value are discarded (default is 2).
     274  * '''--segment-mismatches'''  Read segments are mapped independently, allowing up to this many mismatches in each segment alignment (default is 2).
     275
    276276== Others ==
    277277