Changes between Version 54 and Version 55 of SOPs/chip_seq_peaks
- Timestamp:
- 08/06/20 11:43:21
Legend:
- Unmodified
- Added
- Removed
- Modified
SOPs/chip_seq_peaks
Unmodified (v54 and v55, line 4):
=== Basic Approach ===

Removed (v54, lines 5-9):
 * For each sample map the reads to the appropriate genome (e.g. with Bowtie2).
 * Call peaks (eg. MACS) or use another method to analyze enrichment and/or presumed binding. For ChIP-seq experiments profiling transcription factors (with discrete binding sites), binding site identification (via peak calling) is typically recommended. Peaks can be called without control; however, it's highly recommended to include a control sample (e.g. IgG or input, with input generally preferred over IgG). For ChIP-seq experiments profiling epigenetic (such as histone) modifications, however, modeling ChIP enrichment as peaks may not accurately describe the actual data, and some other (such as sliding window) quantification may be more relevant.
 * Note that quality control is important after read mapping and after peak calling. The ENCODE consortium recommends some [[https://genome.ucsc.edu/ENCODE/qualityMetrics.html|quality metrics]].

 * [[http://barc.wi.mit.edu/education/hot_topics/ChIPseq_2018/ChIPseq_2018.commands.txt| A simple pipeline can be found in our Hot Topics workshop]]. You need to change it based on your experiment as mentioned below.

Added (v55, lines 5-15):
 * [#map Step 1: Map reads]
 * [#cca Step 2: Perform strand cross correlation analysis]
 * [#peaks Step 3: Call peaks]
 * [#repro Step 4: Identify reproducible peaks] (for peak-calling experiments with replication)
 * [#link Step 5: Link "bound" regions to genomic features]
 * [#compare Step 6: Compare binding across different samples]

Note that quality control is important after read mapping and after peak calling. The ENCODE consortium recommends some [[https://genome.ucsc.edu/ENCODE/qualityMetrics.html|quality metrics]].

A basic analysis pipeline can be found in our [[http://barc.wi.mit.edu/education/hot_topics/ChIPseq_2018/ChIPseq_2018.commands.txt| Hot Topics workshop]]. It should be customized, based on your specific experiment, as mentioned below.
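The "sliding window" quantification mentioned above for histone-modification experiments can be illustrated with a minimal Python sketch. It assumes reads have already been reduced to 0-based 5' start positions on a single chromosome, and it ignores read shifting/extension and control normalization. The window and step sizes are arbitrary illustrative values, not recommendations from this SOP, and the function name `sliding_window_counts` is hypothetical.

```python
from bisect import bisect_left


def sliding_window_counts(read_starts, chrom_length, window=1000, step=500):
    """Count reads whose 5' start falls in each window [start, start + window).

    read_starts: 0-based 5' positions of reads on one chromosome (assumed input).
    Windows overlap when step < window; the last windows are clipped at chrom_length.
    Returns a list of (window_start, window_end, count) tuples.
    """
    positions = sorted(read_starts)
    counts = []
    for start in range(0, chrom_length, step):
        end = min(start + window, chrom_length)
        # Two binary searches give the number of positions in [start, end).
        n = bisect_left(positions, end) - bisect_left(positions, start)
        counts.append((start, end, n))
    return counts


if __name__ == "__main__":
    reads = [10, 20, 600, 1200, 1500, 1999]
    for row in sliding_window_counts(reads, chrom_length=2000):
        print(row)
    # (0, 1000, 3)
    # (500, 1500, 2)
    # (1000, 2000, 3)
    # (1500, 2000, 2)
```

Sorting once and using binary search keeps each window query logarithmic, so the same idea scales to many windows without rescanning all reads.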
Unmodified (v54 line 11, v55 line 17):
=== Reviews ===

…

Unmodified (v54 line 16, v55 line 22):
[http://genome.cshlp.org/content/22/9/1813.full ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia] (Genome Research, 2012)

Modified (v54 line 18, v55 line 24):
- v54: === Step 1: Map reads===
- v55: === [=#map Step 1: Map reads] ===

Unmodified (v54 line 20, v55 line 26):
 * Use [[http://bowtie-bio.sourceforge.net/tutorial.shtml|Bowtie]], [[http://bowtie-bio.sourceforge.net/bowtie2/index.shtml|bowtie2]], or another unspliced mapping tool. We recommend using Bowtie2 (default parameters), filtering to remove multi-mapped reads (eg. a mapping quality of 2 or higher) is not needed as Bowtie2 randomly chooses a best hit when multiple good hits exist (see [[http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#reporting | Bowtie2 manual [under Reporting] ]). Low mapping quality reads may need to be removed if using another aligner since MACS does not filter on mapping quality.

…

Unmodified (v54 line 22, v55 line 28):
 * See the [[http://barcwiki.wi.mit.edu/wiki/SOPs/mapping|mapping SOP]] for more details.

Modified (v54 line 24, v55 line 30):
- v54: === Step 2: Perform strand cross correlation analysis===
- v55: === [=#cca Step 2: Perform strand cross correlation analysis] ===

Unmodified (v54 line 26, v55 line 32):
 * The goal of this step is to assess the quality of the IP and to estimate the fragment size of the immunoprecipitated DNA.

…

Unmodified (v54 line 32, v55 line 38):
 * After this analysis a good ChIP-seq experiment will have a second peak (reflecting the fragment size) at least as tall as the first peak (a "phantom" peak reflecting read length). This is how the graph should look: ([[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431496/figure/F4/|Fig4E]]). If the second peak is smaller than the first, ([[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431496/figure/F4/|like the example shown in Fig4G Marginal]]), macs will not estimate fragment size correctly. In that case we recommend running macs with parameters "--nomodel" and "--shiftsize=half_of_the_fragment_size", or "--nomodel" and "--extsize=fragment_size". The fragment size is detected on the strand cross correlation analysis.
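To make the strand cross-correlation idea concrete, here is a small Python sketch under simplifying assumptions: one toy chromosome, reads reduced to their 5' ends, + strand reads recorded at their 0-based start and - strand reads at their BED-style end (one past the 5'-most base), and a plain Pearson correlation computed for each shift. The shift with the highest correlation, after skipping small shifts where the read-length "phantom" peak would appear in real data, is taken as the fragment size and converted into the "--extsize" and "--shiftsize" values described above. The function names and the `min_shift` cutoff are hypothetical; dedicated tools compute this genome-wide and far more efficiently.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def strand_cross_correlation(plus_5p, minus_5p, chrom_length, max_shift):
    """Correlation between + and - strand 5'-end counts for shifts 0..max_shift.

    plus_5p: 0-based start of each + strand read (its 5' end).
    minus_5p: BED-style end of each - strand read, so a fragment [s, s + L)
              contributes a + count at s and a - count at s + L.
    """
    plus = [0] * (chrom_length + 1)
    minus = [0] * (chrom_length + 1)
    for p in plus_5p:
        plus[p] += 1
    for m in minus_5p:
        minus[m] += 1
    scores = {}
    for d in range(max_shift + 1):
        # Compare plus[x] with minus[x + d] wherever both positions exist.
        scores[d] = pearson(plus[: len(plus) - d], minus[d:])
    return scores


def estimate_fragment_size(scores, min_shift):
    """Shift with the highest correlation, ignoring shifts below min_shift
    (hypothetical cutoff standing in for skipping the phantom peak)."""
    candidates = {d: r for d, r in scores.items() if d >= min_shift}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Toy data: five 150-bp fragments, one read from each end.
    starts = [100, 400, 700, 1000, 1300]
    scores = strand_cross_correlation(
        starts, [s + 150 for s in starts], chrom_length=1600, max_shift=200
    )
    fragment_size = estimate_fragment_size(scores, min_shift=50)
    print(fragment_size)  # 150
    print(f"--extsize={fragment_size} --shiftsize={fragment_size // 2}")
    # --extsize=150 --shiftsize=75
```

In real data the correlation curve is noisy and has the read-length phantom peak, so the quality comparison in the text (fragment peak versus phantom peak height) matters more than the raw argmax.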
Removed (v54, lines 34-35):
=== Step 3: Call peaks (presumed bound regions), especially for transcription factors ===
Some of the parameters to consider when comparing programs are:

Added (v55, lines 40-45):
=== [=#peaks Step 3: Call peaks] ===

Peak-calling identifies presumed bound regions. This is especially relevant for transcription factors, which tend to have narrow peaks. For other ChIP-seq experiments that assay binding proteins exhibiting broad "peaks" (such as specific histone modifications), identifying "peaks" may be less helpful (a less useful way of representing the data), and some other (such as sliding window) quantification may be more relevant.
Peaks can be called without the use of a control sample; however, it's highly recommended to include one (e.g. IgG or input, with input generally preferred over IgG).

Some of the parameters to consider when comparing peak-calling programs are:

Unmodified (v54 lines 36-37, v55 lines 46-47):
 * Adjustment of sequence tags to better represent the original DNA fragment (by shifting tags in the 3′ direction or by extending tags to the estimated length of the original fragments)
 * Background model used

…

Unmodified (v54 line 125, v55 line 135):
}}}

Modified (v54 line 127, v55 line 137):
- v54: SIS Rs input is a bed file. Convert mapped reads from SAM to BAM and from BAM to bed format
- v55: SISSRs input is a bed file. Convert mapped reads from SAM to BAM and from BAM to bed format

Unmodified (v54 line 129, v55 line 139):
{{{

…

Modified (v54 line 152, v55 line 162):
- v54: === Step 4 [for peak-calling experiments with replication]: Identify reproducible peaks===
- v55: === [=#repro Step 4 [for peak-calling experiments with replication]: Identify reproducible peaks] ===

Unmodified (v54 line 154, v55 line 164):
For more information about the method, see the main [[https://sites.google.com/site/anshulkundaje/projects/idr| IDR page]].

…

Unmodified (v54 line 203, v55 line 213):
}}}

Removed (v54, line 205):
=== Step 5: Link "bound" (or other interesting) regions to genomic features (genes, promoters, enhancers, etc.) ===

Added (v55, lines 215-216):
=== [=#link Step 5: Link "bound" regions to genomic features] ===

Unmodified (v54 line 206, v55 line 217):
Both MACS and SISSRs provide bed files with the set of peaks, presumably indicating bound regions.

Modified (v54 line 208, v55 line 219):
- v54: To link this regions to genes check out this SOP: [[genome_regions_annotations|Linking genome regions to genome annotations]]
- v55: To Link "bound" (or other interesting) regions to genomic features (genes, promoters, enhancers, etc.), check out this SOP: [[genome_regions_annotations|Linking genome regions to genome annotations]]

Unmodified (v54 line 210, v55 line 221):
Below it is a detailed example of how to do the following annotation:

…

Modified (v54 line 286, v55 line 297):
- v54: === Step 6 [if appropriate]: Comparing binding across different samples===
- v55: === [=#compare Step 6: Compare binding across different samples] ===

Unmodified (v54 line 288, v55 line 299):
 * Method 1: Run IDR to identify reproducible peaks
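The SISSRs line in this diff notes that SISSRs expects a BED file converted from the mapped reads (SAM to BAM to BED). That conversion is normally done with dedicated tools (for example bedtools `bamtobed` on the BAM file); the Python sketch below only illustrates what the conversion involves when reading SAM text directly: the 1-based SAM POS becomes a 0-based BED start, the end is computed from the reference-consuming CIGAR operations, and the strand comes from FLAG bit 0x10. The optional `min_mapq` filter echoes the Step 1 note that low mapping quality reads may need to be removed with some aligners; the function names, the example reads, and the default of no filtering are assumptions for illustration.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_length(cigar):
    """Number of reference bases covered by an alignment's CIGAR string."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def sam_to_bed(sam_lines, min_mapq=0):
    """Yield BED6 tuples (chrom, start, end, name, score, strand) from SAM text.

    Header lines and unmapped reads (FLAG 0x4) are skipped; reads with
    MAPQ below min_mapq are dropped. MAPQ is used as the BED score.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        qname, flag, rname, pos, mapq, cigar = line.rstrip("\n").split("\t")[:6]
        flag, pos, mapq = int(flag), int(pos), int(mapq)
        if flag & 0x4 or cigar == "*" or mapq < min_mapq:
            continue
        start = pos - 1  # SAM POS is 1-based; BED start is 0-based
        end = start + reference_length(cigar)
        strand = "-" if flag & 0x10 else "+"
        yield (rname, start, end, qname, mapq, strand)


# Hypothetical example alignments: one good read, one low-MAPQ read, one unmapped.
SAM_EXAMPLE = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t101\t42\t36M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1\t201\t1\t10M2D26M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

if __name__ == "__main__":
    for record in sam_to_bed(SAM_EXAMPLE, min_mapq=2):
        print(record)  # ('chr1', 100, 136, 'r1', 42, '+')
```

Computing the end from the CIGAR string (rather than the read length) matters for reads with deletions or soft clipping, such as `r2` above, whose 36-base read spans 38 reference bases.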