Changes between Version 56 and Version 57 of SOPs/mapping

Timestamp: 11/03/17 14:36:23
Author: gbell
Comment: --

SOPs/mapping

-              v56
+              v57
 These mappers permit the beginning and end of a read to map to (originate from) different places in the genome, which is common for spliced RNA.
-'''tophat version 1 (old)'''
-'''TopHat version 1 is no longer recommended.'''
-Running TopHat version 1 requires changing your environment on tak; the change applies only to the current tak session.  First run this command:
-{{{
-export PATH="/usr/local/share/tophat1:$PATH"
-}}}
-and then check that your terminal will use the correct TopHat version:
-{{{
-tophat --version
-}}}
-Sample command:
-{{{
-bsub tophat -o s_7_tophat_out -p 6 --phred64-quals --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt
-}}}
-The parameters included in the sample command are:
-  * '''-o/--output-dir <word>'''     All output files will be created in this directory (default = tophat_out)
-  * '''--segment-length <int>'''  Shortest length of a spliced read that can map to one side of the junction.  For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads).  For longer reads, the default length (25) can be used.
-  * '''-I <int>''' Maximum intron length.  If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value.
-  * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models.  This can help TopHat identify junctions that may otherwise be missed.
-  * '''--no-novel-juncs''' Only look for spliced reads across junctions in the supplied GTF file.  Do not use this option when looking for novel isoforms.
-  * '''-p/--num-threads''' Use this many threads to align reads (default is 1)
-Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file).  See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details.
-  * '''--solexa-quals'''         (for input quality scores from Illumina versions 1.2 and earlier)
-  * '''--solexa1.3-quals''' or '''--phred64-quals'''     (for input quality scores from Illumina versions 1.3-1.7)
-'''[http://ccb.jhu.edu/software/tophat/index.shtml tophat version 2]'''
-'''TopHat version 2 is no longer recommended.'''  The authors of TopHat currently recommend [http://ccb.jhu.edu/software/hisat2/index.shtml HISAT2].
-TopHat version 2 uses bowtie2, rather than bowtie, for its mapping.  As a result, TopHat 2 uses a different set of genome index files (*.bt2) than TopHat 1 (*.ebwt).
-Sample command:
-{{{
-# Single-end reads
-bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt
-# Paired-end reads
-# For PE reads, specify the expected (mean) inner distance with the -r option (default is 50 bp).  The inner distance, or insert size, does not include the length of the reads/mates.  For example, for a PE run with fragments selected at 300 bp, where each end is 50 bp, set -r to 200.
-bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.1.txt s_7.2.txt
-}}}
-The parameters included in the sample command are:
-  * '''-o/--output-dir <word>'''     All output files will be created in this directory (default = tophat_out)
-  * '''--segment-length <int>'''  Shortest length of a spliced read that can map to one side of the junction.  For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads).  For longer reads, the default length (25) can be used.
-  * '''-I <int>''' Maximum intron length.  If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value.
-  * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models.  This can help TopHat identify junctions that may otherwise be missed.
-  * '''--no-novel-juncs''' Only look for spliced reads across junctions in the supplied GTF file.  Do not use this option when looking for novel isoforms.
-  * '''--library-type''' Take advantage of the strandedness of the library for mapping (especially across splice junctions); can be fr-unstranded, fr-firststrand, or fr-secondstrand
-Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file).  See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details.
-  * '''--solexa-quals'''         (for input quality scores from Illumina versions 1.2 and earlier)
-  * '''--solexa1.3-quals''' or '''--phred64-quals'''     (for input quality scores from Illumina versions 1.3-1.7)
-  * For "Sanger / Illumina 1.8" or "Sanger / Illumina 1.9", bowtie can use the default "phred33" encoding
-Choices for controlling alignment (e.g., mismatches)
-  * '''--read-mismatches/-N''' Final read alignments with more than this many mismatches are discarded (default is 2).
-  * '''--read-gap-length''' Final read alignments with a total gap length greater than this value are discarded (default is 2).
-  * '''--read-edit-dist''' Final read alignments with an edit distance (i.e., mismatches + indels) greater than this value are discarded (default is 2).
-  * '''--segment-mismatches'''  Read segments are mapped independently, allowing up to this many mismatches in each segment alignment (default is 2).
 '''[https://github.com/alexdobin/STAR STAR]'''
 …
+'''tophat version 1 (old)'''
+'''TopHat version 1 is no longer recommended.'''
+Running TopHat version 1 requires changing your environment on tak; the change applies only to the current tak session.  First run this command:
+{{{
+export PATH="/usr/local/share/tophat1:$PATH"
+}}}
+and then check that your terminal will use the correct TopHat version:
+{{{
+tophat --version
+}}}
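The PATH check above can also be sketched as follows. This is only an illustration: `first_path_entry` is a hypothetical helper (not part of TopHat or tak), and `command -v` simply reports which executable the shell would run, assuming one is installed.

```shell
# Hypothetical helper: print the first directory on PATH,
# i.e. the one searched first when the shell resolves a command name.
first_path_entry() {
    printf '%s\n' "${PATH%%:*}"
}

# Prepend the TopHat 1 directory (path taken from this SOP); this affects the current session only.
export PATH="/usr/local/share/tophat1:$PATH"
first_path_entry

# Show which tophat executable would run, if one is found on PATH.
command -v tophat || echo "tophat not found on PATH"
```

If the reported path is not under /usr/local/share/tophat1, the export was probably run in a different session.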
+Sample command:
+{{{
+bsub tophat -o s_7_tophat_out -p 6 --phred64-quals --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt
+}}}
+The parameters included in the sample command are:
+  * '''-o/--output-dir <word>'''     All output files will be created in this directory (default = tophat_out)
+  * '''--segment-length <int>'''  Shortest length of a spliced read that can map to one side of the junction.  For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads).  For longer reads, the default length (25) can be used.
+  * '''-I <int>''' Maximum intron length.  If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value.
+  * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models.  This can help TopHat identify junctions that may otherwise be missed.
+  * '''--no-novel-juncs''' Only look for spliced reads across junctions in the supplied GTF file.  Do not use this option when looking for novel isoforms.
+  * '''-p/--num-threads''' Use this many threads to align reads (default is 1)
+Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file).  See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details.
+  * '''--solexa-quals'''         (for input quality scores from Illumina versions 1.2 and earlier)
+  * '''--solexa1.3-quals''' or '''--phred64-quals'''     (for input quality scores from Illumina versions 1.3-1.7)
+'''[http://ccb.jhu.edu/software/tophat/index.shtml tophat version 2]'''
+'''TopHat version 2 is no longer recommended.'''  The authors of TopHat currently recommend [http://ccb.jhu.edu/software/hisat2/index.shtml HISAT2].
+TopHat version 2 uses bowtie2, rather than bowtie, for its mapping.  As a result, TopHat 2 uses a different set of genome index files (*.bt2) than TopHat 1 (*.ebwt).
+Sample command:
+{{{
+# Single-end reads
+bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt
+# Paired-end reads
+# For PE reads, specify the expected (mean) inner distance with the -r option (default is 50 bp).  The inner distance, or insert size, does not include the length of the reads/mates.  For example, for a PE run with fragments selected at 300 bp, where each end is 50 bp, set -r to 200.
+bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.1.txt s_7.2.txt
+}}}
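The inner-distance arithmetic described in the paired-end comment can be sketched in shell. `inner_distance` is a hypothetical helper name, and the sketch assumes both mates have the same length.

```shell
# Hypothetical helper: inner distance = fragment length minus the two read lengths.
# $1 = selected fragment length (bp), $2 = length of each read/mate (bp)
inner_distance() {
    echo $(( $1 - 2 * $2 ))
}

# Example from the comment above: 300 bp fragments, 50 bp reads.
inner_distance 300 50   # prints 200
```

The result is the value to pass to -r.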
+The parameters included in the sample command are:
+  * '''-o/--output-dir <word>'''     All output files will be created in this directory (default = tophat_out)
+  * '''--segment-length <int>'''  Shortest length of a spliced read that can map to one side of the junction.  For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads).  For longer reads, the default length (25) can be used.
+  * '''-I <int>''' Maximum intron length.  If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value.
+  * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models.  This can help TopHat identify junctions that may otherwise be missed.
+  * '''--no-novel-juncs''' Only look for spliced reads across junctions in the supplied GTF file.  Do not use this option when looking for novel isoforms.
+  * '''--library-type''' Take advantage of the strandedness of the library for mapping (especially across splice junctions); can be fr-unstranded, fr-firststrand, or fr-secondstrand
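The --segment-length rule of thumb from the list above can be written as a small shell sketch. `segment_length` is a hypothetical helper, and treating "~45 nt" as a hard cutoff of 45 is an assumption made only for illustration.

```shell
# Hypothetical helper: suggest a --segment-length value for a given read length.
# Reads shorter than 45 nt (assumed cutoff for "~45"): half the read length.
# Longer reads: the default of 25.
segment_length() {
    if [ "$1" -lt 45 ]; then
        echo $(( $1 / 2 ))
    else
        echo 25
    fi
}

segment_length 40    # prints 20, matching the 40-nt example in the list
segment_length 100   # prints 25
```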
+Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file).  See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details.
+  * '''--solexa-quals'''         (for input quality scores from Illumina versions 1.2 and earlier)
+  * '''--solexa1.3-quals''' or '''--phred64-quals'''     (for input quality scores from Illumina versions 1.3-1.7)
+  * For "Sanger / Illumina 1.8" or "Sanger / Illumina 1.9", bowtie can use the default "phred33" encoding
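The encoding choices above can be summarized as a lookup from Illumina pipeline version to quality flag. This is a sketch only: `quals_flag` is a hypothetical helper, and it assumes you have already read the version number (for example 1.5) from the "Encoding" field of the FastQC report.

```shell
# Hypothetical helper: print the quality-encoding flag for an Illumina pipeline version.
# An empty result means no flag is needed (the default phred33 encoding applies).
quals_flag() {
    case "$1" in
        1.8|1.9)             echo "" ;;
        1.3|1.4|1.5|1.6|1.7) echo "--phred64-quals" ;;
        1.0|1.1|1.2)         echo "--solexa-quals" ;;
        *) echo "unknown Illumina version: $1" >&2; return 1 ;;
    esac
}

quals_flag 1.5   # prints --phred64-quals
```

Unrecognized versions return a non-zero status rather than guessing a flag.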
+Choices for controlling alignment (e.g., mismatches)
+  * '''--read-mismatches/-N''' Final read alignments with more than this many mismatches are discarded (default is 2).
+  * '''--read-gap-length''' Final read alignments with a total gap length greater than this value are discarded (default is 2).
+  * '''--read-edit-dist''' Final read alignments with an edit distance (i.e., mismatches + indels) greater than this value are discarded (default is 2).
+  * '''--segment-mismatches'''  Read segments are mapped independently, allowing up to this many mismatches in each segment alignment (default is 2).
 == Others ==