Remove linker (adapter) RNA:
- What is the sequence of the linker (adapter) to be removed?
- Biologists generally know which linker (adapter) RNA is used for their sample(s).
- Alternatively, or in addition, when you run quality control with ShortRead or FastQC, check:
- repetitive segments in the "Overrepresented sequences" section;
- "Per base sequence content" for any patterns at the beginning of your reads.
- See fastx_clipper usage (or fastx_clipper -h) for more arguments
- sample command:
bsub "fastx_clipper -a CTGTAGGCACCATCAAT -i s2_sequence.txt -v -l 22 -o s2_sequence_noLinker.txt"
In the above command:
-a CTGTAGGCACCATCAAT is the linker sequence
-i s2_sequence.txt is the input Solexa FASTQ file
-v is verbose (report the number of sequences in the output and the number discarded)
-l 22 discards sequences shorter than 22 nucleotides
-o s2_sequence_noLinker.txt is the output file
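- If you get the message "Invalid quality score value...", your file most likely uses Phred+33 (Sanger / Illumina 1.8+) quality scores, whereas fastx_clipper expects the Phred+64 (older Solexa/Illumina) range by default.
- In that case, add the argument -Q 33, for example: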
- fastx_clipper -a CTGTAGGCACCATCAAT -Q 33 -i s2_sequence.txt -v -l 22 -o s2_sequence_noLinker.txt
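Before deciding whether -Q 33 is needed, it can help to look at the quality characters in the file directly. The following is a minimal sketch, not part of the FASTX-Toolkit: the function name guess_phred_offset is hypothetical, and it assumes a standard 4-line-per-record FASTQ file. It reports 33 when it sees a quality character that cannot occur in a Phred+64 file. A result of 64 is only a best guess, because a file of uniformly high-quality Phred+33 reads can look the same.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the FASTX-Toolkit): guess the quality offset
# of a FASTQ file from the lowest quality character it contains.
# Assumes 4-line records, i.e. sequence and quality lines are not wrapped.
guess_phred_offset() {
    LC_ALL=C awk '
        BEGIN {
            # Build a character -> ASCII code lookup for printable characters.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {                      # every 4th line is a quality line
            n = length($0)
            for (j = 1; j <= n; j++) {
                c = ord[substr($0, j, 1)]
                if (c < min) min = c
            }
        }
        END {
            # Characters below ASCII 59 cannot appear in Phred+64 or Solexa+64 files.
            if (min < 59) print 33; else print 64
        }
    ' "$1"
}

# Example use (file name taken from the sample command above):
#   if [ "$(guess_phred_offset s2_sequence.txt)" = 33 ]; then
#       echo "add -Q 33 to the fastx_clipper command"
#   fi
```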
Trim reads
- If the reads have different lengths (e.g. because the adapter sequences were clipped out), they can be trimmed so that they all have the same length. Use fastx_trimmer for that.
- sample command:
bsub "fastx_trimmer -f 1 -l 22 -i s7_sequence_clipped.txt -o s7_sequence_clipped_trimmed.txt"
[-i INFILE] = FASTA/Q input file. Default is STDIN.
[-o OUTFILE] = FASTA/Q output file. Default is STDOUT.
[-l N] = Last base to keep.
[-f N] = First base to keep. Default is 1 (= first base).
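To make the meaning of -f and -l concrete, here is a rough awk illustration of keeping bases FIRST through LAST of every read. It is a sketch only, not a replacement for fastx_trimmer: the function name trim_fastq is hypothetical, it assumes 4-line FASTQ records, and it does not try to reproduce how fastx_trimmer handles edge cases such as reads shorter than the requested range.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of what "-f FIRST -l LAST" means.
# Usage: trim_fastq FIRST LAST < in.fastq > out.fastq
# Assumes 4-line FASTQ records (lines 2 and 4 of each record are the
# sequence and the quality string).
trim_fastq() {
    awk -v f="$1" -v l="$2" '
        NR % 2 == 0 { print substr($0, f, l - f + 1); next }  # sequence and quality lines
        { print }                                             # header and "+" lines
    '
}

# Example: keep bases 1 to 22, as in the sample command above:
#   trim_fastq 1 22 < s7_sequence_clipped.txt > s7_sequence_clipped_trimmed.txt
# Tabulate read lengths afterwards to confirm they are all the same:
#   awk 'NR % 4 == 2 { print length($0) }' s7_sequence_clipped_trimmed.txt | sort -n | uniq -c
```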