Changes between Initial Version and Version 1 of SOPs/trimming_mapping_shortReads


Timestamp:
01/23/13 16:49:43 (13 years ago)
Author:
trac

  • SOPs/trimming_mapping_shortReads

    v1 v1  
     1= Remove linker (adapter) RNA: =
     2  * What is the sequence of the linker (adapter) to be removed?
     3    * Biologists generally know which linker (adapter) RNA is used for their sample(s).
     4    * In addition, when you run quality control with ShortRead or FastQC, check:
     5         * the "Overrepresented sequences" section for repetitive segments
     6         * the "Per base sequence content" plot for any patterns at the beginning of your reads
     7    * See [[http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastx_clipper_usage|fastx_clipper usage]] (or ''fastx_clipper -h'') for more arguments
     8  * sample command:
     9
     10{{{
     11bsub "fastx_clipper -a CTGTAGGCACCATCAAT -i s2_sequence.txt -v -l 22 -o s2_sequence_noLinker.txt"
     12In the above command:
     13   -a CTGTAGGCACCATCAAT is the linker sequence
     14   -i s2_sequence.txt is the input Solexa FASTQ file
     15   -v is verbose mode [reports the number of sequences in the output and the number discarded]
     16   -l 22 discards sequences shorter than 22 nucleotides
     17   -o s2_sequence_noLinker.txt is the output file
     18}}}
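Before clipping, it can help to confirm that the suspected linker really occurs in the reads. The following is a minimal sketch, not part of fastx_toolkit: the function name count_adapter_reads is hypothetical, it assumes plain 4-line FASTQ records (no wrapped sequence lines), and it only counts exact matches of the full linker.

```shell
# Hypothetical helper (not part of fastx_toolkit): count reads whose sequence
# contains the linker as an exact substring.
# Assumes 4-line FASTQ records, so every 2nd line of 4 is a sequence line.
count_adapter_reads() {
    adapter=$1
    fastq=$2
    awk -v a="$adapter" '
        NR % 4 == 2 { total++; if (index($0, a) > 0) hits++ }
        END { printf "%d of %d reads contain %s\n", hits, total, a }
    ' "$fastq"
}

# Example call (file name taken from the sample command above):
#   count_adapter_reads CTGTAGGCACCATCAAT s2_sequence.txt
```

A very low count may mean the linker sequence is wrong or that only partial linkers are present at the read ends, which this exact-match sketch does not detect.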
     19
     20
     21  * If you get the message "Invalid quality score value...", your quality scores are probably Phred+33 encoded (Sanger / Illumina 1.8+), whereas fastx_clipper assumes Phred+64 by default.
     22    * Add the argument -Q 33, for example:
     23    * fastx_clipper -a CTGTAGGCACCATCAAT -Q 33 -i s2_sequence.txt -v -l 22 -o s2_sequence_noLinker.txt
     24
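To decide whether -Q 33 is needed before running the tool, one option is to look at the lowest quality character in the file. This is a rough sketch under stated assumptions: the function name guess_phred_offset is hypothetical, records are assumed to be 4 lines each, and the result is only a guess, since a file with uniformly high Phred+33 qualities looks the same as Phred+64.

```shell
# Hypothetical helper: guess the quality offset from the lowest quality character.
# Characters below ';' (ASCII 59) cannot occur in Phred+64 or Solexa+64 data,
# so finding one implies Phred+33. Assumes 4-line FASTQ records.
guess_phred_offset() {
    LC_ALL=C awk '
        BEGIN {
            # Build a character -> ASCII code lookup for printable characters.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {
            n = length($0)
            for (j = 1; j <= n; j++) {
                ch = substr($0, j, 1)
                if ((ch in ord) && ord[ch] < min) min = ord[ch]
            }
        }
        END {
            if (min < 59) print "Phred+33"
            else print "Phred+64 (or high-quality Phred+33)"
        }
    ' "$1"
}

# Example call (file name taken from the sample command above):
#   guess_phred_offset s2_sequence.txt
```

If the function reports Phred+33, adding -Q 33 to the fastx commands is the likely fix.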
     25= Trim reads =
     26   * If the reads have different lengths (//e.g.//, because the adapter sequences were clipped out), we can trim them all to the same length. Use **fastx_trimmer** for that.
     27   * sample command:
     28
     29 
     30{{{
     31bsub "fastx_trimmer -f 1 -l 22 -i s7_sequence_clipped.txt -o s7_sequence_clipped_trimmed.txt"
     32     
     33[-i INFILE]  = FASTA/Q input file. default is STDIN.
     34[-o OUTFILE] = FASTA/Q output file. default is STDOUT.
     35[-l N] = Last base to keep
     36[-f N] = First base to keep. Default is 1 (=first base).
     37
     38}}}
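After trimming, a quick check is to tabulate the read lengths in the output file. The sketch below is an illustration, not part of fastx_toolkit: the function name read_length_table is hypothetical and 4-line FASTQ records are assumed.

```shell
# Hypothetical helper: print "length<TAB>count" for every read length found,
# sorted by length. Assumes 4-line FASTQ records.
read_length_table() {
    awk '
        NR % 4 == 2 { n[length($0)]++ }
        END { for (len in n) print len "\t" n[len] }
    ' "$1" | sort -n
}

# Example call (file name taken from the sample command above):
#   read_length_table s7_sequence_clipped_trimmed.txt
```

If clipping discarded everything shorter than 22 nucleotides and trimming kept bases 1 to 22, the table should contain a single row for length 22.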