= Remove linker (adapter) RNA =
 * What is the sequence of the linker (adapter) to be removed?
 * Biologists generally know which linker (adapter) RNA was used for their sample(s).
 * In addition, when you run quality control with ShortRead or FastQC, check:
  * the "Overrepresented sequences" section for repetitive segments
  * the "Per base sequence content" section for any patterns at the beginning of your reads
 * See [[http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastx_clipper_usage|fastx_clipper usage]] (or ''fastx_clipper -h'') for more arguments.
 * Sample command:

{{{
bsub "fastx_clipper -a CTGTAGGCACCATCAAT -i s2_sequence.txt -v -l 22 -o s2_sequence_noLinker.txt"

In the above command:
-a CTGTAGGCACCATCAAT          is the linker (adapter) sequence
-i s2_sequence.txt            is the input Solexa FASTQ file
-v                            is verbose [report number of sequences in output and discarded]
-l 22                         discards sequences shorter than 22 nucleotides
-o s2_sequence_noLinker.txt   is the output file
}}}

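Before clipping, it can help to confirm that the suspected linker really occurs in the reads. The following is a minimal sketch, not part of the FASTX-Toolkit: the function name `count_adapter_reads` is hypothetical, and it assumes a FASTQ file with exactly four lines per record. It counts only exact, full-length occurrences of the adapter, so it may undercount reads that carry a partial adapter at their end.

```shell
# Hypothetical helper (not part of the FASTX-Toolkit).
# Usage: count_adapter_reads ADAPTER FASTQ_FILE
# Prints: <reads containing the adapter> TAB <total reads>
# Assumes 4-line FASTQ records, so the sequence is line 2 of each record.
count_adapter_reads() {
    awk -v adapter="$1" '
        NR % 4 == 2 {
            total++
            # index() returns 0 when the adapter is absent from the read
            if (index($0, adapter) > 0) hits++
        }
        END { printf "%d\t%d\n", hits, total }
    ' "$2"
}
```

For example, `count_adapter_reads CTGTAGGCACCATCAAT s2_sequence.txt` would print the two counts for the sample file used above. If very few reads contain the sequence, it is worth rechecking the adapter with whoever prepared the library.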
 * If you get the message "Invalid quality score value...", your file most likely uses Phred+33 quality scores (Sanger / Illumina 1.8+) rather than the Phred+64 range that fastx_clipper assumes by default.
  * Add the argument -Q 33, for example:
  * fastx_clipper -a CTGTAGGCACCATCAAT -Q 33 -i s2_sequence.txt -v -l 22 -o s2_sequence_noLinker.txt

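To decide up front whether -Q 33 is likely to be needed, the quality lines can be inspected directly. This is a rough sketch under stated assumptions: the function name `guess_phred_offset` is hypothetical, the file is assumed to have four lines per FASTQ record, and the check relies on the fact that characters from `!` to `:` (ASCII 33 to 58) cannot occur in a +64-offset file. A result of 64 is only a guess, since a Phred+33 file with uniformly high qualities would look the same.

```shell
# Hypothetical helper (not part of the FASTX-Toolkit).
# Usage: guess_phred_offset FASTQ_FILE
# Prints 33 if any quality character lies in the range '!'..':' (ASCII 33-58),
# which is impossible with a +64 offset; otherwise prints 64 as a best guess.
# Assumes 4-line FASTQ records, so the quality string is line 4 of each record.
guess_phred_offset() {
    # LC_ALL=C makes the bracket range follow plain ASCII order
    LC_ALL=C awk '
        NR % 4 == 0 && /[!-:]/ { low = 1; exit }   # END still runs after exit
        END { print (low ? 33 : 64) }
    ' "$1"
}
```

If `guess_phred_offset s2_sequence.txt` prints 33, add -Q 33 to the fastx commands for that file.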
= Trim reads =
 * If the reads have different lengths (//e.g.// because we clipped out the adapter sequences), we can trim them all to the same length with **fastx_trimmer**.
 * Sample command:

{{{
bsub "fastx_trimmer -f 1 -l 22 -i s7_sequence_clipped.txt -o s7_sequence_clipped_trimmed.txt"

[-i INFILE]  = FASTA/Q input file. Default is STDIN.
[-o OUTFILE] = FASTA/Q output file. Default is STDOUT.
[-l N]       = Last base to keep.
[-f N]       = First base to keep. Default is 1 (=first base).
}}}
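To choose a sensible value for -l, or to confirm that trimming produced reads of a single length, the read-length distribution can be tabulated. This is a minimal sketch: the function name `read_length_counts` is hypothetical, and it assumes a FASTQ file with four lines per record.

```shell
# Hypothetical helper (not part of the FASTX-Toolkit).
# Usage: read_length_counts FASTQ_FILE
# Prints one line per distinct read length: <length> TAB <number of reads>,
# sorted by length. Assumes 4-line FASTQ records (sequence on line 2).
read_length_counts() {
    awk 'NR % 4 == 2 { print length($0) }' "$1" \
        | sort -n \
        | uniq -c \
        | awk '{ print $2 "\t" $1 }'   # uniq -c prints "count value"; swap the order
}
```

Running `read_length_counts s7_sequence_clipped_trimmed.txt` on the output of the sample command above should list a single length if every clipped read was at least as long as the -l value.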