=== Motifs analysis ===

Search for motifs with [[http://homer.ucsd.edu/homer/ngs/peakMotifs.html | homer findMotifsGenome.pl]]

By default, it performs de novo motif discovery and also checks for enrichment of known motifs (using the known.motifs file in the downloaded homer folder).

Note: findMotifsGenome.pl calls tab2fasta.pl, which shares its name with one of our BaRC scripts. Make sure that the homer version of the script is the one being called.

{{{
findMotifsGenome.pl peak.bed hg38 out_dir -size 300 -S 2 -p 5 -cache 100 -fdr 5 -mask -mknown Jaspar_hs_core_homer.motifs -mcheck Jaspar_hs_core_homer.motifs

input parameters:
 -mask: use the repeat-masked sequence
 -size: size of the region used for motif finding (default 200). Explanation from the homer website: "If analyzing ChIP-Seq peaks from a transcription factor, Chuck would recommend 50 bp for establishing the primary motif bound by a given transcription factor and 200 bp for finding both primary and "co-enriched" motifs for a transcription factor. When looking at histone marked regions, 500-1000 bp is probably a good idea (i.e. H3K4me or H3/H4 acetylated regions)."
 -mknown <motif file>: known motifs to check for enrichment
 -mcheck <motif file>: known motifs to check against de novo motifs
 -S: number of motifs to find (default 25)
 -p: number of processors to use (default 1)

}}}
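One way to act on the note about tab2fasta.pl is to check which copy the shell resolves first and, if needed, put the homer bin directory at the front of PATH. The sketch below assumes a POSIX shell; `HOMER_BIN` and its path are hypothetical placeholders for the local homer install.

```shell
# Hypothetical location of the homer bin directory; adjust to your install.
HOMER_BIN="/path/to/homer/bin"

# Prepend it so that homer's tab2fasta.pl is found before any other copy.
export PATH="$HOMER_BIN:$PATH"

# Report the tab2fasta.pl that will actually run (or that none was found).
resolved=$(command -v tab2fasta.pl || echo "not found")
echo "tab2fasta.pl resolves to: $resolved"
```

If the reported path is not inside the homer folder, the BaRC script is still taking precedence and PATH needs adjusting before running findMotifsGenome.pl.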


To use species-specific JASPAR motifs, download them, convert them to homer motif format, and save them to a motif file.

In this example, human core motifs are downloaded from JASPAR2016 and saved to Jaspar_hs_core_homer.motifs

{{{

library(TFBSTools)

# query options: motifs from the CORE collection for human (NCBI taxonomy ID 9606)
opts <- list()
opts[["collection"]] <- "CORE"
opts[["species"]] <- 9606

Jaspar_hs_core <- getMatrixSet(JASPAR2016::JASPAR2016, opts)

# convert to homer motif format:

library(universalmotif)

write_homer(Jaspar_hs_core, file="Jaspar_hs_core_homer.motifs")

}}}


findMotifsGenome.pl creates two html files in the output directory: one for the de novo motifs (homerResults.html) and one for the known motifs (knownResults.html).


Annotate motifs with homer [[http://homer.ucsd.edu/homer/ngs/quantification.html | annotatePeaks.pl]]


{{{
annotatePeaks.pl peak.bed hg38 -m input_motifs -mbed motif.bed > annotated_motifs.txt

Where:
 -m <motif file>: motifs to scan for. Multiple motifs can first be combined and saved as a single file. In the output file, the motifs found in a peak are then listed together with that peak.
 -mbed <filename>: output motif positions to a BED file that can be loaded in a genome browser

}}}
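As a minimal sketch of combining motifs for -m: homer motif files are plain text in which each motif starts with a ">" header line, so several files can be concatenated with cat. The two motif files below are made-up placeholders written to a temporary directory so that the example runs on its own; in practice they would be real motif files, such as those produced by findMotifsGenome.pl.

```shell
# Temporary working directory holding two made-up single-motif files (placeholders only).
workdir=$(mktemp -d)
printf '>ACGT\tmotifA\t5.0\n0.97\t0.01\t0.01\t0.01\n0.01\t0.97\t0.01\t0.01\n0.01\t0.01\t0.97\t0.01\n0.01\t0.01\t0.01\t0.97\n' > "$workdir/motifA.motif"
printf '>TTAA\tmotifB\t5.0\n0.01\t0.01\t0.01\t0.97\n0.01\t0.01\t0.01\t0.97\n0.97\t0.01\t0.01\t0.01\n0.97\t0.01\t0.01\t0.01\n' > "$workdir/motifB.motif"

# Concatenate the individual motif files into one multi-motif file for -m.
cat "$workdir/motifA.motif" "$workdir/motifB.motif" > "$workdir/input_motifs"

# Sanity check: count the motifs by counting ">" header lines.
n_motifs=$(grep -c '^>' "$workdir/input_motifs")
echo "$n_motifs motifs combined into $workdir/input_motifs"
```

The combined file can then be passed to annotatePeaks.pl with -m in place of input_motifs.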

For each peak, it gives the distance to the nearest feature and categorizes the peak location as promoter, intergenic, intron#, exon#, or TSS.

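To get a quick overview of these categories, the annotation column can be tallied. This is a sketch under assumptions: the table below is a made-up, reduced stand-in for annotated_motifs.txt (real output has many more columns), and the category is taken to be the text before the first parenthesis in the column headed "Annotation".

```shell
# Made-up, reduced stand-in for annotated_motifs.txt (placeholder rows only).
annot=$(mktemp)
printf 'PeakID\tChr\tAnnotation\n' > "$annot"
printf 'p1\tchr1\tpromoter-TSS (geneA)\n' >> "$annot"
printf 'p2\tchr1\tintron (geneB, intron 1 of 5)\n' >> "$annot"
printf 'p3\tchr2\tIntergenic\n' >> "$annot"
printf 'p4\tchr2\tintron (geneC, intron 2 of 4)\n' >> "$annot"

# Find the column headed exactly "Annotation", drop the parenthetical detail,
# and count the number of peaks per category.
summary=$(awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Annotation") col = i; next }
  col     { category = $col; sub(/ \(.*/, "", category); count[category]++ }
  END     { for (k in count) print k "\t" count[k] }
' "$annot" | sort)
echo "$summary"
```

Looking the column up by its header name avoids hard-coding a column number, which may differ depending on the options passed to annotatePeaks.pl.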