Searching for patterns or motifs in a DNA or protein sequence
This is a traditional bioinformatics task, and many tools do it in a variety of ways. The main factor in choosing a tool is how you represent what you're looking for.
Search for a pattern (text, with optional choices at some positions)
dreg (EMBOSS suite) - for nucleic acids (where "pattern" is a regular expression)
dreg -pattern "GGCC[ACGT]" -sequence My_promoters.fa -outfile My_promoters.GGCCN.dreg_out.txt
preg (EMBOSS suite) - for proteins (where "pattern" is a regular expression)
preg -pattern "LPE[ACS]G" -sequence My_proteins.fa -outfile My_proteins.fa.LPEMG.preg_out.txt
fuzznuc (EMBOSS suite) - for nucleic acids (where "pmismatch" is the number of mismatches allowed in the pattern)
fuzznuc -pattern "nnnGGCCTnnn" -sequence My_promoters.fa -pmismatch 1 -outfile My_promoters.GGCCT.1mis.fuzznuc_out.txt
fuzzpro (EMBOSS suite) - for proteins (where "pmismatch" is the number of mismatches allowed in the pattern)
fuzzpro -pattern "xxxxLPEAGxxxx" -sequence My_proteins.fa -pmismatch 1 -outfile My_proteins.LPEAG.1mis.fuzzpro_out.txt
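To make the difference between the two kinds of pattern search concrete, here is a minimal Python sketch of what a regular-expression search and a mismatch-tolerant search do conceptually. It is not how the EMBOSS tools are implemented and does not reproduce their output. The function names, the example sequence, and the use of "N" as the only wildcard are assumptions made for illustration, and the sketch scans only the forward strand.

```python
import re

def regex_hits(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping regex match."""
    return [(m.start() + 1, m.group()) for m in re.finditer(pattern, sequence)]

def mismatch_hits(pattern, sequence, max_mismatches=1, wildcard="N"):
    """Slide the pattern along the sequence; keep windows within the mismatch limit."""
    pattern = pattern.upper()
    sequence = sequence.upper()
    hits = []
    for i in range(len(sequence) - len(pattern) + 1):
        window = sequence[i:i + len(pattern)]
        # A wildcard position in the pattern never counts as a mismatch.
        mismatches = sum(
            1 for p, s in zip(pattern, window) if p != wildcard and p != s
        )
        if mismatches <= max_mismatches:
            hits.append((i + 1, window, mismatches))
    return hits

# Hypothetical promoter fragment, used only for this example.
promoter = "TTGGCCAAAGGCGTTT"

print(regex_hits("GGCC[ACGT]", promoter))
# [(3, 'GGCCA')]
print(mismatch_hits("GGCCT", promoter, max_mismatches=1))
# [(3, 'GGCCA', 1), (10, 'GGCGT', 1)]
```

The regular expression finds only the site that fits the pattern exactly, while allowing one mismatch also picks up the nearby variant GGCGT.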
Search for a motif (a probability matrix, with optional choices at all positions)
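As a rough illustration of what a probability matrix represents, the Python sketch below scores every window of a sequence by multiplying the per-position probabilities of its bases. The matrix values, the function names, and the example sequence are all hypothetical. Real motif-scanning tools typically use log-odds scores against a background model rather than raw probabilities, so treat this only as a sketch of the idea.

```python
# Hypothetical probability matrix for a 4-base motif.
# One dict per motif position; each position's probabilities sum to 1.
MOTIF = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

def window_probability(matrix, window):
    """Probability of the window under the matrix: product over positions."""
    prob = 1.0
    for column, base in zip(matrix, window):
        prob *= column[base]
    return prob

def best_window(matrix, sequence):
    """Return (1-based start, window, probability) of the highest-scoring window."""
    width = len(matrix)
    starts = range(len(sequence) - width + 1)
    best = max(starts, key=lambda i: window_probability(matrix, sequence[i:i + width]))
    window = sequence[best:best + width]
    return best + 1, window, window_probability(matrix, window)

# Hypothetical promoter fragment, used only for this example.
start, window, prob = best_window(MOTIF, "TTGGCCAAAGGCGTTT")
print(start, window, round(prob, 4))
```

Unlike a text pattern, the matrix gives every window some score, so a search of this kind usually reports windows above a chosen threshold instead of exact matches.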