wiki:SOPs/trimming_mapping_shortReads

Remove linker (adapter) RNA:

  • What is the sequence of the linker (adapter) to be removed?
    • The biologists who prepared the sample(s) generally know which linker (adapter) RNA was used.
    • In addition, when you run quality control with ShortRead or FastQC, check
      • the "Overrepresented sequences" section for repetitive segments.
      • the "Per base sequence content" plot for any patterns at the beginning of your reads.
    • See the fastx_clipper usage (fastx_clipper -h) for more arguments.
  • sample command:
bsub "fastx_clipper -a CTGTAGGCACCATCAAT -i s2_sequence.txt -v -l 22 -o s2_sequence_noLinker.txt"
In the above command:
   -a CTGTAGGCACCATCAAT is the linker (adapter) sequence to remove
   -i s2_sequence.txt is the input Solexa FASTQ file
   -v is verbose [report number of sequences in output and discarded]
   -l 22 discards sequences shorter than 22 nucleotides
   -o s2_sequence_noLinker.txt is the output file
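
One way to check the effect of -l 22 is to tabulate the read lengths in the output file. The following is a minimal sketch, not part of the FASTX-Toolkit: the function name fastq_length_counts is hypothetical, and it assumes plain 4-line FASTQ records (sequence and quality not wrapped over several lines).

```shell
# Hypothetical helper: print "length<TAB>count" for every read length
# found in a 4-line-per-record FASTQ file, sorted by length.
fastq_length_counts() {
    # The sequence is the 2nd line of each 4-line record (NR % 4 == 2).
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len "\t" n[len] }' "$1" | sort -n
}
```

For example, fastq_length_counts s2_sequence_noLinker.txt should list no length below 22 if the clipping command above was run with -l 22.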
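  • If you get the message "Invalid quality score value...", the quality scores in your file are most likely Phred+33 encoded (Sanger / Illumina 1.8+), whereas fastx_clipper assumes the Phred+64 range (earlier Solexa/Illumina pipelines) by default.
    • Add the argument -Q 33, such as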
    • fastx_clipper -a CTGTAGGCACCATCAAT -Q 33 -i s2_sequence.txt -v -l 22 -o s2_sequence_noLinker.txt
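
To decide whether -Q 33 is needed before running the job, the quality lines can be inspected directly. The sketch below is a heuristic only, not a FASTX-Toolkit feature: the function name guess_fastq_offset is hypothetical, it assumes 4-line FASTQ records, and it relies on the fact that characters with ASCII codes below 59 can only occur in Phred+33 files. A result of 64 is a guess, since a Phred+33 file that happens to contain only high scores looks the same.

```shell
# Hypothetical helper: guess the quality-score offset (33 or 64) of a
# 4-line-per-record FASTQ file from the smallest quality character seen.
guess_fastq_offset() {
    awk '
        BEGIN {
            # Build a character -> ASCII code lookup for printable characters.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {                       # quality line of each record
            n = length($0)
            for (j = 1; j <= n; j++) {
                ch = substr($0, j, 1)
                if ((ch in ord) && ord[ch] < min) min = ord[ch]
            }
        }
        END {
            if (min == 127)     print "unknown"   # no quality characters seen
            else if (min < 59)  print 33          # only possible with Phred+33
            else                print 64          # probably Phred+64 / Solexa+64
        }
    ' "$1"
}
```

If guess_fastq_offset s2_sequence.txt prints 33, add -Q 33 to the fastx_clipper command as shown above.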

Trim reads

  • If the reads have different lengths (e.g. because the adapter sequences were clipped out), they can be trimmed to a common length with fastx_trimmer.
  • sample command:

bsub "fastx_trimmer -f 1 -l 22 -i s7_sequence_clipped.txt -o s7_sequence_clipped_trimmed.txt"
      
[-f N] = First base to keep. Default is 1 (=first base).
[-l N] = Last base to keep. Default is entire read.
[-i INFILE]  = FASTA/Q input file. Default is STDIN.
[-o OUTFILE] = FASTA/Q output file. Default is STDOUT.
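
To make the meaning of -f and -l concrete, the sketch below imitates the coordinate handling with awk: both positions are 1-based and inclusive, and the sequence and quality lines are cut identically. This is an illustration under stated assumptions, not a replacement for fastx_trimmer: the function name trim_fastq is hypothetical, and the input is assumed to be a 4-line-per-record FASTQ file.

```shell
# Hypothetical helper: keep bases FIRST..LAST (1-based, inclusive) of every
# read, mirroring the meaning of -f and -l described above.
# usage: trim_fastq FIRST LAST INFILE
trim_fastq() {
    awk -v first="$1" -v last="$2" '
        # Lines 2 (sequence) and 4 (quality) of each record are trimmed.
        NR % 4 == 2 || NR % 4 == 0 {
            print substr($0, first, last - first + 1)
            next
        }
        # Header (@...) and separator (+) lines are passed through unchanged.
        { print }
    ' "$3"
}
```

With this sketch, trim_fastq 1 22 s7_sequence_clipped.txt keeps bases 1 through 22 of each read, which is the same range the sample command requests with -f 1 -l 22.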
