=== De novo search for overrepresented DNA motifs that could represent Transcription Factor Binding Sites (TFBS) ===
Different methods sample the DNA motifs and estimate their overrepresentation in different ways.  [http://barcwiki/wiki/SOP/PatternsMotifs Searching for all sites] that match a known motif is a related but different task.

Below are links to review articles:

 * [[http://www.springerlink.com/content/k712222066900072/#section=93033&page=1|Discovering sequence motifs]]
 * [[http://www.biomedcentral.com/1471-2105/8/S7/S21#IDAW3YLD|A survey of DNA motif finding algorithms]]

==== MEME ====
MEME is based on expectation maximization (deterministic optimization).  Spurious motifs can be reduced by filtering the input sequences, for example by fold enrichment and/or by shortening the sequences (e.g., ~200 bp regions centered on the summit reported by MACS) for TFs in ChIP-Seq data.
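
The summit-centered trimming described above can be sketched as follows. This is a minimal illustration, not part of MEME or MACS: it assumes a MACS-style summits BED file in which column 2 holds the 0-based summit position and column 5 holds a score. The function name `summit_windows`, the `min_score` cutoff, and the use of the score column as a stand-in for an enrichment filter are hypothetical and should be adapted to the actual peak file.

```python
def summit_windows(bed_lines, half_width=100, min_score=None):
    """Yield (chrom, start, end, name) windows centered on each summit.

    Assumption: each line is tab-separated BED with the summit position
    in column 2 and an optional score in column 5.
    """
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, summit = fields[0], int(fields[1])
        name = fields[3] if len(fields) > 3 else "."
        # Optional filter on the score column (hypothetical cutoff).
        if min_score is not None and len(fields) > 4:
            if float(fields[4]) < min_score:
                continue
        # Clamp at 0 so windows near the chromosome start stay valid.
        start = max(0, summit - half_width)
        end = summit + half_width
        yield chrom, start, end, name


if __name__ == "__main__":
    example = ["chr1\t1000\t1001\tpeak_1\t25.3", "chr1\t50\t51\tpeak_2\t3.0"]
    for window in summit_windows(example, half_width=100, min_score=5):
        print("\t".join(str(x) for x in window))
```

The resulting windows could be written out as a BED file and converted to FASTA (for example with `bedtools getfasta`) before running MEME. Window ends are not clamped to chromosome lengths here, so that check would need to be added for summits near chromosome ends.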

Sample commands: 

{{{
meme testsmall.FA -oc TEST-OUT -dna
meme seq.fa -minw 6 -maxw 50 -mod oops
# Look for up to 5 motifs of width 5-15 on both strands, expecting zero or one occurrence of each motif per sequence
meme Promoters.fa -dna -oc . -mod zoops -nmotifs 5 -minw 5 -maxw 15 -revcomp
}}}

{{{
[-oc <output dir>]      name of directory for output files; replaces an existing directory
[-dna]                  sequences use DNA alphabet
[-nmotifs <nmotifs>]    maximum number of motifs to find
[-minw <minw>]          minimum motif width
[-maxw <maxw>]          maximum motif width
[-revcomp]              allow sites on + or - DNA strands
[-mod oops|zoops|anr]   distribution of motifs
     oops    One per sequence
     zoops   Zero or one per sequence
     anr     Any number
}}}

Tomtom, which is part of the MEME suite, can then be run to compare the MEME motifs to database(s) of known motifs.

{{{
tomtom -no-ssc -verbosity 1 -min-overlap 5 -dist pearson -evalue -thresh 10.0 -o tomtom_out meme_out/meme.txt /nfs/BaRC_datasets/MEME_matrix_databases/Jaspar.meme.2016.txt /nfs/BaRC_datasets/MEME_matrix_databases/MotifDb.matrices.txt /nfs/BaRC_datasets/MEME_matrix_databases/Transfac_2014.1.dat.txt
}}}

==== MEME-ChIP ====
MEME-ChIP performs motif analysis of large DNA datasets. It is especially appropriate for analyzing the bound genomic regions identified in a transcription factor (TF) ChIP-seq experiment.  Note that MEME-ChIP pre-processes each sequence around the center of the region: "Prior to motif discovery and motif enrichment analysis, MEME-ChIP centers and trims each sequence to 100 bp; the full-length sequences are used in the subsequent motif visualization step." ([[http://bioinformatics.oxfordjournals.org/content/27/12/1696.full | MEME-ChIP]])


[[http://meme.nbcr.net/meme/memechip-intro.html|MEME-ChIP Documentation]]

[[http://meme.nbcr.net/meme/cgi-bin/meme-chip.cgi|MEME-ChIP Submission form]]
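
The centering and trimming step quoted above can be illustrated with a short sketch. This is not MEME-ChIP's own code: the function name `center_trim` is hypothetical, and the handling of sequences shorter than the target width (returned unchanged) and of odd length differences (the extra base is dropped from the right) are assumptions made for the example.

```python
def center_trim(seq, width=100):
    """Return the central `width` bases of `seq`.

    Assumption: sequences no longer than `width` are returned unchanged.
    """
    if len(seq) <= width:
        return seq
    # Integer division keeps the window as close to the middle as possible.
    start = (len(seq) - width) // 2
    return seq[start:start + width]


if __name__ == "__main__":
    # A 200 bp toy sequence whose central 100 bp are all C.
    toy = "A" * 50 + "C" * 100 + "G" * 50
    print(center_trim(toy))
```

This shows why peak regions should be roughly centered on the expected binding site: anything outside the central window is not seen by the motif discovery step.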



Sample files:
[[enrichFileTest|enrichFileTest]]
[[AllSequencesTest.txt|AllSequencesTest.txt]][[br]][[br]]



=== Search for all occurrences of known DNA motifs that could represent Transcription Factor Binding Sites (TFBS) ===

Sources of motifs:
       * Databases such as TRANSFAC or JASPAR
       * Protein binding microarrays (PBM)
       * TFBS prediction programs
  
Depending on the source of the motif, the program used to scan for potential binding sites may be different.

[http://gene-regulation.com/Match_command_line.txt TRANSFAC's match] - for transcription factor binding sites
  * commercial application requiring a license for the most up-to-date version
  * Whitehead only: See BaRC_datasets/Transfac for the command-line program and (old) data files
{{{
# Search using all Transfac profiles
match matrix.dat MyPromoters.fa MyPromoters.match_out.txt minSUM_good.prf

# Search using a subset of profiles
match matrix.dat MyPromoters.fa MyPromoters.vert.match_out.txt vertebrate_non_redundant_minSUM.prf
}}}
  * Publication: [http://www.ncbi.nlm.nih.gov/pubmed/12824369 Kel et al., 2003]
  * Public web site (older data): http://www.gene-regulation.com/cgi-bin/pub/programs/match/bin/match.cgi

For position weight matrices (PWM) or regular expressions we can use programs like MAST. Most motif prediction programs also have a setting to scan sequences for TFBS that match a given motif.

**Example of MAST commands:**
{{{
mast motif.txt  Sequence.fasta
mast p53_BMC.txt  Promoters.fasta
}}}
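
To make the idea of scanning with a position weight matrix concrete, here is a minimal sketch. It is not how MAST works internally (MAST reports statistical significance and can search both strands): this example only scores the forward strand with raw log-odds values. The function names, the pseudocount, the uniform background of 0.25, and the score threshold are all assumptions chosen for illustration.

```python
import math


def pwm_from_counts(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    `counts` is a list with one dict per motif position, e.g. {"A": 10}.
    """
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((col.get(b, 0) + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, site, score) for forward-strand windows scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        # Skip windows containing ambiguous bases such as N.
        if any(b not in "ACGT" for b in window):
            continue
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, window, round(score, 2)))
    return hits


if __name__ == "__main__":
    # Toy motif "TGCA" observed 10 times at every position.
    counts = [{"T": 10}, {"G": 10}, {"C": 10}, {"A": 10}]
    pwm = pwm_from_counts(counts)
    for hit in scan("AATGCATT", pwm, threshold=5.0):
        print(hit)
```

In practice the threshold is the main tuning knob: a low cutoff reports many weak matches, which is why dedicated tools report p-values or use precomputed profile cutoffs instead of a fixed raw score.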