| | 120 | === Motifs analysis === |
| | 121 | |
| | 122 | Search for motifs with [[http://homer.ucsd.edu/homer/ngs/peakMotifs.html | homer findMotifsGenome.pl]] |
| | 123 | |
| | 124 | By default, it performs de novo motif discovery and also checks the enrichment of known motifs (unless another file is specified, the known.motifs file in the downloaded homer folder is used). |
| | 125 | |
| | 126 | Note: findMotifsGenome.pl calls tab2fasta.pl, which shares its name with one of our BaRC scripts. Make sure that you call the copy of the script that comes with homer. |
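One way to check which copy of tab2fasta.pl will be picked up is to look at what the shell resolves first on PATH. The sketch below is only an illustration: the function name `script_from_dir` and the `HOMER_BIN` location are hypothetical, so substitute the bin directory of your own homer installation.

```shell
# Succeed only if the first copy of a script found on PATH lives in the given directory.
# Hypothetical helper for illustration; it is not part of homer or the BaRC scripts.
script_from_dir() {
    script="$1"
    dir="$2"
    # command -v prints the path the shell would actually run
    resolved=$(command -v "$script") || return 1
    [ "$(dirname "$resolved")" = "$dir" ]
}

# Example use (HOMER_BIN is an assumed location; adjust to your installation):
# HOMER_BIN=/path/to/homer/bin
# export PATH="$HOMER_BIN:$PATH"
# script_from_dir tab2fasta.pl "$HOMER_BIN" || echo "tab2fasta.pl is not the homer copy"
```

Putting the homer bin directory at the front of PATH, as in the commented example, is what makes findMotifsGenome.pl find the homer copy first.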
| | 127 | |
| | 128 | {{{ |
| | 129 | findMotifsGenome.pl peak.bed hg38 out_dir -size 300 -S 2 -p 5 -cache 100 -fdr 5 -mask -mknown Jaspar_hs_core_homer.motifs -mcheck Jaspar_hs_core_homer.motifs |
| | 130 | |
| | 131 | input parameters: |
| | 132 | -mask: use the repeat-masked sequence |
| | 133 | -size: size of the region (in bp) used for motif finding (default 200). Explanation from the homer website: "If analyzing ChIP-Seq peaks from a transcription factor, Chuck would recommend 50 bp for establishing the primary motif bound by a given transcription factor and 200 bp for finding both primary and "co-enriched" motifs for a transcription factor. When looking at histone marked regions, 500-1000 bp is probably a good idea (i.e. H3K4me or H3/H4 acetylated regions)." |
| | 134 | -mknown <motif file>: known motifs to check for enrichment |
| | 135 | -mcheck <motif file>: known motifs to check against de novo motifs |
| | 136 | -S: number of motifs to find (default 25) |
| | 137 | -p: number of processors to use (default 1) |
| | 138 | |
| | 139 | }}} |
| | 140 | |
| | 141 | |
| | 142 | To use species-specific JASPAR motifs, download them, convert them to homer motif format, and save them to a motif file. |
| | 143 | |
| | 144 | In this example, the human core motifs are downloaded from JASPAR2016 and saved to Jaspar_hs_core_homer.motifs |
| | 145 | |
| | 146 | {{{ |
| | 147 | |
| | 148 | library(TFBSTools) |
| | 149 | # selection criteria: CORE collection, human (NCBI taxonomy ID 9606) |
| | 150 | opts <- list() |
| | 151 | opts[["collection"]] <- "CORE" |
| | 152 | opts[["species"]] <- 9606 |
| | 153 | Jaspar_hs_core <- getMatrixSet(JASPAR2016::JASPAR2016, opts) |
| | 154 | |
| | 155 | # convert to homer motif format: |
| | 156 | |
| | 157 | library(universalmotif) |
| | 158 | |
| | 159 | write_homer(Jaspar_hs_core, file = "Jaspar_hs_core_homer.motifs") |
| | 160 | |
| | 161 | }}} |
| | 162 | |
| | 163 | |
| | 164 | findMotifsGenome.pl creates two HTML files in the output directory: one for the de novo motifs (homerResults.html) and one for the known motifs (knownResults.html). |
| | 165 | |
| | 166 | |
| | 167 | Annotate motifs with homer [[http://homer.ucsd.edu/homer/ngs/quantification.html | annotatePeaks.pl]] |
| | 168 | |
| | 169 | |
| | 170 | {{{ |
| | 171 | annotatePeaks.pl peak.bed hg38 -m input_motifs -mbed motif.bed > annotated_motifs.txt |
| | 172 | |
| | 173 | Where: |
| | 174 | -m <motif file>: motifs to look for in the peaks. Several motifs can first be combined and saved as one file. In the output file, the motifs associated with a peak are then listed together. |
| | 175 | -mbed <filename>: output motif positions to a BED file that can be loaded in a genome browser |
| | 176 | |
| | 177 | }}} |
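The combined motif file passed to -m can be built by concatenating individual homer motif files, each of which starts with a ">" header line. A minimal sketch follows; the helper name `combine_motifs` and the file names motif1.motif and motif2.motif are placeholders, and it assumes every input file ends with a newline.

```shell
# Combine several homer motif files into one file for annotatePeaks.pl -m.
# combine_motifs is a hypothetical helper: combine_motifs out.motifs in1.motif in2.motif ...
combine_motifs() {
    out="$1"
    shift
    cat "$@" > "$out"
    # Each motif starts with a ">" header line, so this prints the number of motifs written
    grep -c '^>' "$out"
}

# Example (placeholder file names):
# combine_motifs input_motifs motif1.motif motif2.motif
# annotatePeaks.pl peak.bed hg38 -m input_motifs -mbed motif.bed > annotated_motifs.txt
```

Comparing the printed count with the number of motifs you expected is a quick check that nothing was lost when combining.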
| | 178 | |
| | 179 | For each peak, it gives the distance to the nearest feature and categorizes the peak (promoter, intergenic, intron#, exon#, TSS). |
| | 180 | |