| 137 | | |
| 138 | | '''tophat version 1 (old)''' |
| 139 | | |
| 140 | | '''TopHat version 1 is no longer recommended.''' |
| 141 | | Running TopHat version 1 requires changing your environment on tak, and the change applies only to the current tak session. First run this command: |
| 142 | | {{{ |
| 143 | | export PATH="/usr/local/share/tophat1:$PATH" |
| 144 | | }}} |
| 145 | | and then check that your terminal will use the correct TopHat version: |
| 146 | | {{{ |
| 147 | | tophat --version |
| 148 | | }}} |
| 149 | | |
| 150 | | Sample command: |
| 151 | | {{{ |
| 152 | | bsub tophat -o s_7_tophat_out -p 6 --phred64-quals --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt |
| 153 | | }}} |
| 154 | | |
| 155 | | The parameters included in the sample command are: |
| 156 | | * '''-o/--output-dir <word>''' All output files will be created in this directory (default = tophat_out) |
| 157 | | * '''--segment-length <int>''' Shortest length of a spliced read that can map to one side of the junction. For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads). For longer reads, the default length (25) can be used. |
| 158 | | * '''-I <int>''' Maximum intron length. If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value. |
| 159 | | * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models. This can help TopHat identify junctions that may otherwise be missed. |
| 160 | | * '''--no-novel-juncs''' Only look for spliced reads across junctions in the supplied GTF file. Omit this option if you are looking for novel isoforms. |
| 161 | | * '''-p/--num-threads''' Use this many threads to align reads (default is 1) |
| 162 | | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
| 163 | | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
| 164 | | * '''--solexa1.3-quals''' or '''--phred64-quals''' (for input quality scores from Illumina versions 1.3-1.7) |
| 165 | | |
| 166 | | '''[http://ccb.jhu.edu/software/tophat/index.shtml tophat version 2]''' |
| 167 | | |
| 168 | | '''TopHat version 2 is no longer recommended.''' The authors of TopHat currently recommend [http://ccb.jhu.edu/software/hisat2/index.shtml HISAT2]. |
| 169 | | TopHat version 2 uses bowtie2, rather than bowtie, for its mapping. As a result, TopHat 2 uses a different set of genome index files (*.bt2) than TopHat 1 (*.ebwt). |
| 170 | | |
| 171 | | Sample command: |
| 172 | | {{{ |
| 173 | | # Single-end reads |
| 174 | | bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt |
| 175 | | # Paired-end reads |
| 176 | | # For PE reads, specify the expected (mean) inner distance between mates using the -r option (default is 50 bp). The inner distance (sometimes loosely called the insert size) does not include the length of the reads/mates. For example, for a PE run with fragments selected at 300 bp and 50-bp reads at each end, set -r to 200 (300 - 2*50). |
| 177 | | bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.1.txt s_7.2.txt |
| 178 | | }}} |
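The inner-distance arithmetic described in the paired-end comment above can be sketched in shell. The variable names and values below are illustrative assumptions taken from the 300 bp / 50 bp example, not measurements from a real library.

```shell
# Hypothetical example values: size-selected fragment length and
# per-mate read length, both in bp.
fragment_size=300
read_length=50

# Inner distance = fragment length minus the two mates.
inner_distance=$(( fragment_size - 2 * read_length ))

# Show the option that would be added to the paired-end tophat command.
echo "-r ${inner_distance}"
```

The printed option can then be added to the paired-end command before the index and read files.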
| 179 | | |
| 180 | | The parameters included in the sample command are: |
| 181 | | * '''-o/--output-dir <word>''' All output files will be created in this directory (default = tophat_out) |
| 182 | | * '''--segment-length <int>''' Shortest length of a spliced read that can map to one side of the junction. For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads). For longer reads, the default length (25) can be used. |
| 183 | | * '''-I <int>''' Maximum intron length. If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value. |
| 184 | | * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models. This can help TopHat identify junctions that may otherwise be missed. |
| 185 | | * '''--no-novel-juncs''' Only look for spliced reads across junctions in the supplied GTF file. Omit this option if you are looking for novel isoforms. |
| 186 | | * '''--library-type''' Take advantage of strandedness of library for mapping (especially across splice junctions); can be fr-unstranded, fr-firststrand, or fr-secondstrand |
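The rule of thumb for --segment-length in the list above can be written as a small shell helper. This is only a sketch: the function name suggest_segment_length is hypothetical, and treating "~45 nt" as a hard cutoff at 45 is an assumption.

```shell
# Suggest a --segment-length value from the read length (nt):
# half the read length for reads shorter than 45 nt (assumed cutoff),
# otherwise the TopHat default of 25.
suggest_segment_length() {
    if [ "$1" -lt 45 ]; then
        echo $(( $1 / 2 ))   # integer division rounds down for odd lengths
    else
        echo 25
    fi
}

suggest_segment_length 40   # 40-nt reads, as in the sample command
suggest_segment_length 75   # longer reads fall back to the default
```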
| 187 | | |
| 188 | | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
| 189 | | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
| 190 | | * '''--solexa1.3-quals''' or '''--phred64-quals''' (for input quality scores from Illumina versions 1.3-1.7) |
| 191 | | * For "Sanger / Illumina 1.8" or "Sanger / Illumina 1.9", no option is needed, since TopHat uses "phred33" encoding by default |
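If no FastQC report is at hand, the encoding can be roughly guessed from the lowest quality character in the file. The sketch below is a heuristic under stated assumptions, not a replacement for the FastQC "Encoding" field: the function names are hypothetical, the input is assumed to be an uncompressed FASTQ file with four lines per record, only the first 10,000 reads are sampled, and the thresholds (59 and 64) are the lowest ASCII codes used by the Solexa+64 and Phred+64 encodings. A file whose lowest code is 64 or higher is ambiguous and is reported as phred64 here.

```shell
# Print the smallest ASCII code among the quality lines (every 4th line)
# of the first 10,000 reads of a FASTQ file.
min_quality_code() {
    head -n 40000 "$1" | awk 'NR % 4 == 0' | tr -d '\n' \
        | od -An -v -tu1 | tr -s ' ' '\n' | grep -v '^$' \
        | sort -n -u | head -n 1
}

# Map the minimum code to a TopHat quality option (heuristic only).
guess_quality_option() {
    min=$(min_quality_code "$1")
    if [ "$min" -lt 59 ]; then
        echo "phred33 (default, no option needed)"
    elif [ "$min" -lt 64 ]; then
        echo "--solexa-quals"
    else
        echo "--phred64-quals"
    fi
}

# Usage (file name taken from the sample command):
# guess_quality_option s_7.txt
```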
| 192 | | |
| 193 | | Choices for controlling alignment (e.g. mismatches) |
| 194 | | * '''--read-mismatches/-N''' Final read alignments with more than this many mismatches are discarded (default is 2). |
| 195 | | * '''--read-gap-length''' Final read alignments with a total gap length greater than this value are discarded (default is 2). |
| 196 | | * '''--read-edit-dist''' Final read alignments with an edit distance (i.e. mismatches+indels) greater than this value are discarded (default is 2). |
| 197 | | * '''--segment-mismatches''' Read segments are mapped independently, allowing up to this many mismatches in each segment alignment (default is 2). |