Changes between Version 8 and Version 9 of SOPs/enriched_tf_binding_sites


Timestamp:
11/04/20 14:37:34 (4 years ago)
Author:
gbell
Comment:

--

  • SOPs/enriched_tf_binding_sites

    v8 v9  
    4646
    4747
    48 
    49 ==== FIRE ====
    50 FIRE can only be applied to several distinct groups of sequences that share a feature that could arise from specific TF binding (an expression pattern, being bound by a TF, etc.). Motifs are selected based on how informative they are in predicting membership in one or more of the groups. Saying that a motif is overrepresented in a group means that it is overrepresented in that group relative to the other groups of sequences, not relative to the background of that group or of the organism. FIRE makes no assumptions about the background sequences and does not need to model the background; it is background independent, which makes it very different from other prediction programs. FIRE is based on mutual information.
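As a rough illustration of what "informative" means here, the sketch below computes the mutual information between motif presence and group label for a toy data set. It is not FIRE's implementation; the function name and the toy data are assumptions made for this example only.

```python
import math
from collections import Counter

def mutual_information(presence, groups):
    """Mutual information (in bits) between motif presence and group label.

    presence: one boolean per sequence (does the motif occur in it?)
    groups:   one group label per sequence, in the same order
    """
    n = len(presence)
    joint = Counter(zip(presence, groups))
    presence_counts = Counter(presence)
    group_counts = Counter(groups)
    mi = 0.0
    for (p, g), count in joint.items():
        # p(p, g) * log2( p(p, g) / (p(p) * p(g)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (presence_counts[p] * group_counts[g]))
    return mi

# Hypothetical toy data: four sequences in two groups.
groups = [1, 1, 2, 2]
informative = [True, True, False, False]    # motif only in group 1
uninformative = [True, False, True, False]  # motif spread evenly

mi_informative = mutual_information(informative, groups)
mi_uninformative = mutual_information(uninformative, groups)
```

A motif found only in one group carries the most information about group membership, while a motif spread evenly across groups carries none, regardless of any background model.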
    51 
    52 [[https://tavazoielab.c2b2.columbia.edu/FIRE/|FIRE Web site]] [[https://tavazoielab.c2b2.columbia.edu/lab/publications/Elemento_etal_Mol_Cell_2007.pdf|FIRE Paper[PDF]]]
    53 
    54 
    55 Sequences are divided into several groups, //e.g.// corresponding to different expression profiles, or bound by different TFs in ChIP-Seq experiments.
    56 One input file has all the sequences in fasta format; the other input file has a list of the sequence names followed by the group they belong to, like:
    57 
    58 ID           cluster
    59 
    60 sequenceName1   1
    61 sequenceName2   1
    62 sequenceName3   2
    63 sequenceName4   2
    64 
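A minimal sketch of building such a group file from the FASTA names, assuming the two columns are tab-separated and that the group of each sequence is already known. The helper names and the `group_of` mapping are hypothetical; this is not the PrepareFilesForFIRE_keepall.pl script.

```python
import io

def read_fasta_names(handle):
    """Return sequence names (first word after '>') in file order."""
    return [line[1:].split()[0]
            for line in handle
            if line.startswith(">") and line[1:].strip()]

def write_groups(names, group_of, out):
    """Write a header line plus one 'name<TAB>group' row per sequence."""
    out.write("ID\tcluster\n")
    for name in names:
        out.write(f"{name}\t{group_of[name]}\n")

# In-memory stand-ins for the real files, so the sketch runs as-is.
fasta = io.StringIO(">sequenceName1 some description\nACGT\n>sequenceName3\nGGCC\n")
group_of = {"sequenceName1": 1, "sequenceName3": 2}  # hypothetical assignment

out = io.StringIO()
write_groups(read_fasta_names(fasta), group_of, out)
groups_text = out.getvalue()
```

Taking the names from the FASTA file itself keeps the two input files consistent; a name missing from `group_of` raises a `KeyError` rather than being silently skipped.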
    65 
    66 This is a sample command.
    67 
    68 {{{
    69 fire.pl  --expfiles=enrichFileTest  --exptype=discrete --fastafile_dna=AllSequencesTest.txt --nodups=1
    70 }}}
    71 
    72 This script generates the file specifying the groups (enrichment file) and the file containing the sequences: [[PrepareFilesForFIRE_keepall.txt|"PrepareFilesForFIRE_keepall.pl"]]
    73  
    74 
    7548Sample files:
    7649[[enrichFileTest|enrichFileTest]]
    7750[[AllSequencesTest.txt|AllSequencesTest.txt]][[br]][[br]]
    7851
    79 ==== OTHER PROGRAMS YOU MAY WANT TO EXPLORE ====
    80 ===== Amadeus =====
    81 [[http://bioinfo-out.curie.fr/training/CGH-PATHWAYworkshop/pathway_charting_materials/amadeusPaper.pdf|Amadeus paper]]
    82 
    83 ===== Weeder and YMF =====
    84 Enumerate the n-mers and look for overrepresentation of the n-mers versus background.  [[http://iona/barcwiki/doku.php?id=identifying_all_and_or_enriched_transcription_factor_binding_sites|See review articles]]
    85  
    86 
    87 ===== Gibbs sampling =====
    88 Based on probabilistic optimization.  [[http://iona/barcwiki/doku.php?id=identifying_all_and_or_enriched_transcription_factor_binding_sites|See review articles]]
    89  
    90 ===== ConTra =====
    91 
    92 Combines multiz alignments (from UCSC) and PWMs from JASPAR and TRANSFAC, to predict TFBS. [[http://bioit.dmbr.ugent.be/contrav2/index.php|ConTra]]
    9352
    9453
    95 ===== TRAP =====
     54=== De novo search for all DNA motifs that could represent Transcription Factor Binding Sites (TFBS)  ===
    9655
    97 [[http://trap.molgen.mpg.de/cgi-bin/home.cgi | TF Affinity Prediction (TRAP)]] uses binding affinities to predict associations between TFs and co-regulated genes. [[http://bioinformatics.oxfordjournals.org/content/25/4/435.full|PASTAA: identifying transcription factors associated with sets of co-regulated genes]]
    98 
    99 
    100 ===== RSA-Tools: Peak-motifs =====
    101 [[http://rsat.ulb.ac.be/peak-motifs_form.cgi| RSA-Tools: Peak-motifs]], discover motifs in ChIP-Seq peak sequences.
    102 
    103 
    104 ===== TAMO =====
    105 
    106 [[http://fraenkel.mit.edu/TAMO/ | TAMO ]]: motif discovery package (including interfaces to other motif-search tools, e.g. MEME) along with integration of expression and other databases.
    107 
    108 ===== WebMOTIFS =====
    109 
    110 [[http://fraenkel.mit.edu/webmotifs.html | WebMOTIFS ]]: motif discovery using TAMO and other tools (e.g. MEME).
    111 
    112 === Scan for TFBS using known motifs  ===
    113 1. Source of the Motifs:
    114        * Databases: Transfac, Jaspar, other.
     56Source of the Motifs:
     57       * Databases such as TRANSFAC or JASPAR
    11558       * Protein binding arrays (PBM).
    11659       * TFBS prediction programs.
    11760 
    118        
    119 2. Depending on the source of the motif the program used to scan will be different.
    120     * For position weight matrices (PWM) or regular expressions we can use programs like MAST or FIRE. Most prediction programs have a setting to scan for TFBS for a given motif.
     61Depending on the source of the motif, the program used to scan for potential binding sites may be different.
     62
     63[http://www.biobase-international.com/wp-content/uploads/2012/03/Match_command_line.txt TRANSFAC's match] - for transcription factor binding sites
     64  * commercial application requiring a license for the most up-to-date version
     65  * Whitehead only: See BaRC_datasets/Transfac for the command-line program and data files
     66{{{
     67# Search using all Transfac profiles
     68match matrix.dat MyPromoters.fa MyPromoters.match_out.txt minSUM_good.prf
     69# Search using a subset of profiles
     70match matrix.dat MyPromoters.fa MyPromoters.vert.match_out.txt vertebrate_non_redundant_minSUM.prf
     71}}}
     72  * Publication: [http://www.ncbi.nlm.nih.gov/pubmed/12824369 Kel et al., 2003]
     73  * Public web site (older data): http://www.gene-regulation.com/cgi-bin/pub/programs/match/bin/match.cgi
     74
     75For position weight matrices (PWM) or regular expressions we can use programs like MAST. Most prediction programs have a setting to scan for TFBS for a given motif.
    12176   
    122       **Example of a mast command:** (files are in the system_testing -> Aug2010_Testing folder)
     77      **Example of mast commands:**
    12378{{{
    12479mast motif.txt  Sequence.fasta
    125 mast p53_BMC.txt  Sengupta.fasta
     80mast p53_BMC.txt  Promoters.fasta
    12681 }}}
    127      
    128 **Example of a FIRE command**
    129 {{{
    130 fire.pl --expfiles=groups.txt --exptype=discrete --fastafile_dna=FileWithSeqsFASTa.txt --nodups=1 --doskipdiscovery=1 --motiffile_dna=dnamotifs.txt
    131  }}}
    132      
    133 INPUT FILES[[br]][[br]]
    134      
    135 //groups.txt// defines the sequences in each group
    136      
    137 {{{
    138 ID           cluster
    139 sequenceName1   1
    140 sequenceName2   1
    141 sequenceName3   2
    142 sequenceName4   2
    143 }}}
    144      
    145 //FileWithSeqsFASTa.txt//:
    146 has all the sequences in fasta format[[br]][[br]]
    147      
    148 //dnamotifs.txt//:
    149 has the DNA motifs //i.e.// .AGATA[AT]..
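To see what a motif such as .AGATA[AT].. matches, the sketch below scans a sequence on both strands, assuming the motif can be read as an ordinary regular expression. The function names and the test sequence are made up for illustration; this is not how FIRE or MAST score sites.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def scan(motif, seq):
    """Return (start, strand, match) for every hit, including overlapping ones.

    start is 0-based on the forward strand; match is the matched text
    as read on the strand where it was found.
    """
    # A lookahead lets finditer report overlapping matches.
    pattern = re.compile(f"(?=({motif}))", re.IGNORECASE)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert the reverse-strand offset to a forward-strand start.
        start = len(seq) - m.start() - len(m.group(1))
        hits.append((start, "-", m.group(1)))
    return hits

forward_hits = scan(".AGATA[AT]..", "CCAGATAAGGTT")
reverse_hits = scan(".AGATA[AT]..", reverse_complement("CCAGATAAGGTT"))
```

Scanning the reverse complement as well matters because a binding site can sit on either strand of the promoter.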
    150 
    151 
    152 
    153 
    154     * For other inputs, like files coming from PBM, using a PWM is a simplification that throws out part of the data. It is more appropriate to use a specific script.
     82