= Remove linker (adapter) RNA =
* First, find out the sequence of the linker (adapter) to be removed.
  * Biologists generally know which linker (adapter) RNA was used for their sample(s).
  * Alternatively, or in addition, run quality control with ShortRead or FastQC and check:
    * the "Overrepresented sequences" section, for repetitive segments;
    * the "Per base sequence content" section, for patterns at the beginning of your reads.
* See [[http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastx_clipper_usage|fastx_clipper usage]] (or ''fastx_clipper -h'') for more arguments.
* Sample command:
{{{
bsub "fastx_clipper -a CTGTAGGCACCATCAAT -i s2_sequence.txt -v -l 22 -o s2_sequence_noLinker.txt"

In the above command:
-a CTGTAGGCACCATCAAT          is the linker sequence
-i s2_sequence.txt            is the input Solexa FASTQ file
-v                            is verbose (report the number of sequences in the output and the number discarded)
-l 22                         discards sequences shorter than 22 nucleotides
-o s2_sequence_noLinker.txt   is the output file
}}}
* If you get the message "Invalid quality score value...", your file most likely uses Phred+33 (Sanger-style, Illumina 1.8+) quality scores, while the FASTX-Toolkit assumes the older Illumina Phred+64 range by default.
  * Add the argument -Q 33, for example:
{{{
fastx_clipper -a CTGTAGGCACCATCAAT -Q 33 -i s2_sequence.txt -v -l 22 -o s2_sequence_noLinker.txt
}}}

= Trim reads =
* If the reads have different lengths (//e.g.// because the adapter sequences were clipped off), we can trim them so that they all have the same length. Use **fastx_trimmer** for that.
* When the reads were clipped with ''-l 22'' (so nothing shorter than 22 nt remains), trimming with ''-l 22'' leaves every read exactly 22 nt long.
* Sample command:
{{{
bsub "fastx_trimmer -f 1 -l 22 -i s7_sequence_clipped.txt -o s7_sequence_clipped_trimmed.txt"

[-f N]       = First base to keep. Default is 1 (=first base).
[-l N]       = Last base to keep. Default is the entire read.
[-i INFILE]  = FASTA/Q input file. Default is STDIN.
[-o OUTFILE] = FASTA/Q output file. Default is STDOUT.
}}}
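To make the ''-f''/''-l'' semantics of fastx_trimmer concrete, here is a minimal awk sketch of the same idea. It is not how the FASTX-Toolkit is implemented; the function name ''trim_fastq'' is hypothetical, and it assumes plain 4-line FASTQ records (no line-wrapped sequences). Sequence and quality lines are cut identically so they stay the same length; reads shorter than LAST simply keep all bases from FIRST onward.

{{{
# Hypothetical helper (not part of FASTX-Toolkit): keep bases FIRST..LAST
# of every read, roughly like `fastx_trimmer -f FIRST -l LAST`.
# Assumes 4-line FASTQ records with no wrapped sequence/quality lines.
trim_fastq() {
  first=$1
  last=$2
  infile=$3
  awk -v f="$first" -v l="$last" '
    # Line 2 (sequence) and line 4 (quality) of each record are cut
    # to the same 1-based window, so they keep matching lengths.
    NR % 4 == 2 || NR % 4 == 0 { $0 = substr($0, f, l - f + 1) }
    # Header (@...) and separator (+) lines pass through unchanged.
    { print }
  ' "$infile"
}
}}}

For example, ''trim_fastq 1 22 s7_sequence_clipped.txt > s7_sequence_clipped_trimmed.txt'' should give the same reads as the bsub command above for well-formed 4-line input, though for real data the FASTX tool is the safer choice.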