137 | | |
138 | | '''tophat version 1 (old)''' |
139 | | |
140 | | '''TopHat version 1 is no longer recommended.''' |
141 | | Running TopHat version 1 requires changing your environment on tak; the change applies only to the current tak session. First run this command:
142 | | {{{ |
143 | | export PATH="/usr/local/share/tophat1:$PATH" |
144 | | }}} |
145 | | and then check that your terminal will use the correct TopHat version: |
146 | | {{{ |
147 | | tophat --version |
148 | | }}} |
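The PATH change above can be wrapped so that repeating it in the same session does not stack duplicate entries. This is only a sketch: `prepend_path` is a hypothetical helper name, not something installed on tak, and the directory is the one given above.

```shell
# Hypothetical helper: put a directory at the front of PATH
# unless it is already the first entry.
prepend_path() {
    case "$PATH" in
        "$1":*|"$1") ;;                      # already first: nothing to do
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}

prepend_path /usr/local/share/tophat1

# Show which tophat executable the shell would now run.
command -v tophat || echo "tophat not found on PATH"
```

As with the plain `export`, the change lasts only for the current shell session.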
149 | | |
150 | | Sample command: |
151 | | {{{ |
152 | | bsub tophat -o s_7_tophat_out -p 6 --phred64-quals --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt |
153 | | }}} |
154 | | |
155 | | The parameters included in the sample command are: |
156 | | * '''-o/--output-dir <word>''' All output files will be created in this directory (default = tophat_out) |
157 | | * '''--segment-length <int>''' Shortest length of a spliced read that can map to one side of the junction. For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads). For longer reads, the default length (25) can be used. |
158 | | * '''-I <int>''' Maximum intron length. If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value. |
159 | | * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models. This can help TopHat identify junctions that may otherwise be missed.
160 | | * '''--no-novel-juncs ''' Only look for spliced reads across junctions in the supplied GTF file. Omit this option when looking for novel isoforms.
161 | | * '''-p/--num-threads''' Use this many threads to align reads (default is 1) |
162 | | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
163 | | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
164 | | * '''--solexa1.3-quals''' or '''--phred64-quals''' (for input quality scores from Illumina versions 1.3-1.7) |
165 | | |
166 | | '''[http://ccb.jhu.edu/software/tophat/index.shtml tophat version 2]''' |
167 | | |
168 | | '''TopHat version 2 is no longer recommended.''' The authors of TopHat currently recommend [http://ccb.jhu.edu/software/hisat2/index.shtml HISAT2]. |
169 | | TopHat version 2 uses bowtie2, rather than bowtie, for its mapping. As a result, TopHat 2 uses a different set of genome index files (*.bt2) than TopHat 1 (*.ebwt). |
170 | | |
171 | | Sample command: |
172 | | {{{ |
173 | | # Single-end reads |
174 | | bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt |
175 | | # Paired-end reads |
176 | | # For PE reads, specify the expected (mean) inner distance using the -r option (default is 50 bp). The inner distance, or insert size, does not include the length of the reads/mates. For example, for a PE run with fragments selected at 300 bp, where each end is 50 bp, set -r to 200.
177 | | bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.1.txt s_7.2.txt |
178 | | }}} |
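The inner-distance arithmetic from the paired-end comment above can be sketched as follows. The function name `inner_distance` is made up for illustration; the calculation is simply the fragment length minus the two read lengths.

```shell
# Hypothetical helper: inner_distance FRAGMENT_LEN READ_LEN
# Inner distance = fragment length minus both mates.
inner_distance() {
    echo $(( $1 - 2 * $2 ))
}

# 300 bp fragments with 50 bp reads at each end
r=$(inner_distance 300 50)
echo "-r $r"    # prints "-r 200"
```

The resulting `-r` value would be added to the paired-end tophat command.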
179 | | |
180 | | The parameters included in the sample command are: |
181 | | * '''-o/--output-dir <word>''' All output files will be created in this directory (default = tophat_out) |
182 | | * '''--segment-length <int>''' Shortest length of a spliced read that can map to one side of the junction. For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads). For longer reads, the default length (25) can be used. |
183 | | * '''-I <int>''' Maximum intron length. If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value. |
184 | | * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models. This can help TopHat identify junctions that may otherwise be missed.
185 | | * '''--no-novel-juncs ''' Only look for spliced reads across junctions in the supplied GTF file. Omit this option when looking for novel isoforms.
186 | | * '''--library-type ''' Take advantage of strandedness of library for mapping (especially across splice junctions); can be fr-unstranded, fr-firststrand, or fr-secondstrand
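The rule of thumb for `--segment-length` given in the list above can be written as a small helper. This is a sketch under stated assumptions: `segment_length` is a hypothetical name, the text gives the cutoff only as "~45 nt" so the exact value 45 is an assumption, and odd read lengths are rounded down.

```shell
# Hypothetical helper: segment_length READ_LEN
# Reads shorter than the (approximate, assumed) 45-nt cutoff get half
# the read length; longer reads use the TopHat default of 25.
segment_length() {
    if [ "$1" -lt 45 ]; then
        echo $(( $1 / 2 ))      # integer division rounds down
    else
        echo 25
    fi
}

segment_length 40    # prints 20
segment_length 76    # prints 25
```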
187 | | |
188 | | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
189 | | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
190 | | * '''--solexa1.3-quals''' or '''--phred64-quals''' (for input quality scores from Illumina versions 1.3-1.7) |
191 | | * For "Sanger / Illumina 1.8" or "Sanger / Illumina 1.9", no option is needed; the default "phred33" encoding applies
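The choices above can be sketched as a lookup from the FastQC "Encoding" value to the matching option. The function name `quality_flag` is hypothetical, and the exact encoding strings that FastQC prints are assumed here; check them against your own FastQC report.

```shell
# Hypothetical helper: quality_flag "ENCODING"
# Prints the quality option for a FastQC "Encoding" value
# (prints nothing when the phred33 default applies).
quality_flag() {
    case "$1" in
        *"Illumina 1."[89]*)  : ;;                       # phred33 default, no flag
        *"Illumina 1."[3-7]*) echo "--phred64-quals" ;;
        *"Illumina <1.3"*|*"Illumina 1."[0-2]*) echo "--solexa-quals" ;;
        *) echo "unrecognized encoding: $1" >&2; return 1 ;;
    esac
}

quality_flag "Illumina 1.5"    # prints --phred64-quals
```

An unrecognized value prints a message to standard error and returns a non-zero status rather than guessing.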
192 | | |
193 | | Choices for controlling alignment (e.g., mismatches)
194 | | * '''--read-mismatches/-N''' Final read alignments with more than this many mismatches are discarded (default is 2).
195 | | * '''--read-gap-length''' Final read alignments with a total gap length greater than this are discarded (default is 2).
196 | | * '''--read-edit-dist''' Final read alignments with an edit distance (i.e., mismatches + indels) greater than this are discarded (default is 2).
197 | | * '''--segment-mismatches''' Read segments are mapped independently, allowing up to this many mismatches in each segment alignment (default is 2).