Changes between Version 54 and Version 55 of SOPs/chip_seq_peaks


Timestamp:
08/06/20 11:43:21 (4 years ago)
Author:
gbell
Comment:

--

  • SOPs/chip_seq_peaks

    v54 v55  
    33 
    44=== Basic Approach ===
    5  * For each sample map the reads to the appropriate genome (e.g. with Bowtie2).
    6  * Call peaks (eg. MACS) or use another method to analyze enrichment and/or presumed binding.  For ChIP-seq experiments profiling transcription factors (with discrete binding sites), binding site identification (via peak calling) is typically recommended.  Peaks can be called without control; however, it's highly recommended to include a control sample (e.g. IgG or input, with input generally preferred over IgG).  For ChIP-seq experiments profiling epigenetic (such as histone) modifications, however, modeling ChIP enrichment as peaks may not accurately describe the actual data, and some other (such as sliding window) quantification may be more relevant.
    7  * Note that quality control is important after read mapping and after peak calling.  The ENCODE consortium recommends some [[https://genome.ucsc.edu/ENCODE/qualityMetrics.html|quality metrics]].
    8 
    9 * [[http://barc.wi.mit.edu/education/hot_topics/ChIPseq_2018/ChIPseq_2018.commands.txt| A simple pipeline can be found in our Hot Topics workshop]]. You need to change it based on your experiment as mentioned below.
     5 * [#map Step 1: Map reads]
     6 * [#cca Step 2: Perform strand cross correlation analysis]
     7 * [#peaks Step 3: Call peaks]
     8 * [#repro Step 4: Identify reproducible peaks] (for peak-calling experiments with replication)
     9 * [#link Step 5: Link "bound" regions to genomic features]
     10 * [#compare Step 6: Compare binding across different samples]
     11
     12Note that quality control is important after read mapping and after peak calling.  The ENCODE consortium recommends some [[https://genome.ucsc.edu/ENCODE/qualityMetrics.html|quality metrics]].
     13
     14
     15A basic analysis pipeline can be found in our [[http://barc.wi.mit.edu/education/hot_topics/ChIPseq_2018/ChIPseq_2018.commands.txt| Hot Topics workshop]]. Customize it for your specific experiment, as described below.
    1016
    1117=== Reviews ===
     
    1622[http://genome.cshlp.org/content/22/9/1813.full ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia] (Genome Research, 2012)
    1723
    18 === Step 1: Map reads ===
     24=== [=#map Step 1: Map reads] ===
    1925
    2026 * Use [[http://bowtie-bio.sourceforge.net/tutorial.shtml|Bowtie]], [[http://bowtie-bio.sourceforge.net/bowtie2/index.shtml|Bowtie2]], or another unspliced mapping tool. We recommend Bowtie2 with default parameters. Filtering to remove multi-mapped reads (e.g. requiring a mapping quality of 2 or higher) is not needed, as Bowtie2 randomly chooses a best hit when multiple good hits exist (see [[http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#reporting | Bowtie2 manual [under Reporting] ]]).  Low mapping quality reads may need to be removed if using another aligner, since MACS does not filter on mapping quality.
     
    2228 * See the [[http://barcwiki.wi.mit.edu/wiki/SOPs/mapping|mapping SOP]] for more details.
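
The mapping-quality filter mentioned above (for aligners other than Bowtie2) can be sketched as follows. This is a minimal illustration, not part of the SOP pipeline: the function name and the example reads are hypothetical, and the threshold of 2 is taken from the example in the text.

```python
def filter_sam_by_mapq(sam_lines, min_mapq=2):
    """Yield SAM header lines and alignments with MAPQ >= min_mapq.

    Hypothetical helper for illustration only.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # A SAM alignment has at least 11 mandatory columns;
        # column 5 (index 4) holds the mapping quality.
        if len(fields) >= 11 and int(fields[4]) >= min_mapq:
            yield line


# Made-up records: read1 has MAPQ 42, read2 has MAPQ 1.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t0\tchr1\t200\t1\t5M\t*\t0\t0\tACGTA\tIIIII",
]
kept = list(filter_sam_by_mapq(example))
```

In practice the same filter is usually applied directly to the alignment file, for example with the `-q` option of `samtools view`.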
    2329
    24 === Step 2: Perform strand cross correlation analysis ===
     30=== [=#cca Step 2: Perform strand cross correlation analysis] ===
    2531
    2632 * The goal of this step is to assess the quality of the IP and to estimate the fragment size of the immunoprecipitated DNA.
     
    3238  * After this analysis, a good ChIP-seq experiment will have a second peak (reflecting the fragment size) at least as tall as the first peak (a "phantom" peak reflecting read length). This is how the graph should look: ([[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431496/figure/F4/|Fig4E]]). If the second peak is smaller than the first ([[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431496/figure/F4/|like the example shown in Fig4G Marginal]]), MACS will not estimate the fragment size correctly. In that case we recommend running MACS with parameters "--nomodel" and "--shiftsize=half_of_the_fragment_size" (MACS 1.4), or "--nomodel" and "--extsize=fragment_size" (MACS2), using the fragment size estimated by the strand cross-correlation analysis.
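
The decision rule above can be written as a small helper. This is only a sketch: the function name and its arguments are hypothetical, the peak heights are assumed to be read off the cross-correlation plot, and it assumes that "--shiftsize" applies to MACS 1.4 and "--extsize" to MACS2.

```python
def macs_fragment_options(phantom_peak_height, fragment_peak_height,
                          fragment_size, macs_version=2):
    """Return extra MACS options based on the cross-correlation profile.

    Hypothetical helper: if the fragment-size peak is at least as tall as
    the "phantom" (read-length) peak, MACS can build its own model and no
    extra options are returned.
    """
    if fragment_peak_height >= phantom_peak_height:
        return []
    if macs_version == 2:
        # Assumption: MACS2 extends reads to the full fragment size.
        return ["--nomodel", f"--extsize={fragment_size}"]
    # Assumption: MACS 1.4 shifts reads by half of the fragment size.
    return ["--nomodel", f"--shiftsize={fragment_size // 2}"]


# Example with made-up values: a marginal profile and a 200 bp fragment.
options = macs_fragment_options(0.6, 0.5, 200)
```

The returned list can be appended to the rest of the MACS command line.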
    3339 
    34 === Step 3: Call peaks (presumed bound regions), especially for transcription factors ===
    35 Some of the parameters to consider when comparing programs are:
     40=== [=#peaks Step 3: Call peaks]  ===
     41
     42Peak calling identifies presumed bound regions.  This is especially relevant for transcription factors, which tend to have narrow peaks.  For ChIP-seq experiments that assay proteins exhibiting broad "peaks" (such as specific histone modifications), peaks may be a less useful way of representing the data, and some other quantification (such as a sliding window) may be more relevant.
     43Peaks can be called without the use of a control sample; however, it's highly recommended to include one (e.g. IgG or input, with input generally preferred over IgG).
     44
     45Some of the parameters to consider when comparing peak-calling programs are:
    3646  * Adjustment of sequence tags to better represent the original DNA fragment (by shifting tags in the 3′ direction or by extending tags to the estimated length of the original fragments)
    3747  * Background model used
     
    125135}}}
    126136
    127 SISRs input is a bed file. Convert mapped reads from SAM to BAM and from BAM to bed format
     137SISSRs takes a bed file as input. Convert the mapped reads from SAM to BAM, and then from BAM to bed format:
    128138
    129139{{{
     
    150160
    151161
    152 === Step 4 [for peak-calling experiments with replication]: Identify reproducible peaks ===
     162=== [=#repro Step 4: Identify reproducible peaks] (for peak-calling experiments with replication) ===
    153163
    154164For more information about the method, see the main [[https://sites.google.com/site/anshulkundaje/projects/idr| IDR page]].
     
    203213}}}
    204214
    205 === Step 5: Link "bound" (or other interesting) regions to genomic features (genes, promoters, enhancers, etc.) ===
     215=== [=#link Step 5: Link "bound" regions to genomic features] ===
     216
    206217Both MACS and SISSRs provide bed files with the set of peaks, presumably indicating bound regions.
    207218
    208 To link this regions to genes check out this SOP: [[genome_regions_annotations|Linking genome regions to genome annotations]]
     219To link "bound" (or other interesting) regions to genomic features (genes, promoters, enhancers, etc.), check out this SOP: [[genome_regions_annotations|Linking genome regions to genome annotations]]
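
As a rough illustration of what linking a region to a feature means, the sketch below assigns each peak to the gene with the nearest transcription start site (TSS). It is not the method of the linked SOP: the function name, the tuple layout, and the example coordinates are all hypothetical, and BED-style 0-based, half-open coordinates are assumed.

```python
def nearest_gene(peak, genes):
    """Return (gene_name, distance) for the TSS closest to the peak midpoint.

    Hypothetical helper; returns None if no gene is on the peak's chromosome.
    """
    chrom, start, end = peak
    midpoint = (start + end) // 2
    best = None
    for g_chrom, g_start, g_end, name, strand in genes:
        if g_chrom != chrom:
            continue
        # With 0-based, half-open coordinates the TSS is the start of a
        # "+" strand gene and the last base (end - 1) of a "-" strand gene.
        tss = g_start if strand == "+" else g_end - 1
        distance = abs(midpoint - tss)
        if best is None or distance < best[1]:
            best = (name, distance)
    return best


# Made-up peaks and genes for illustration.
peaks = [("chr1", 1000, 1200), ("chr2", 500, 700)]
genes = [
    ("chr1", 1500, 3000, "geneA", "+"),
    ("chr1", 100, 900, "geneB", "-"),
    ("chr2", 5000, 6000, "geneC", "+"),
]
annotated = {peak: nearest_gene(peak, genes) for peak in peaks}
```

For real data sets, interval tools such as `bedtools closest` do this far more efficiently than a nested loop.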
    209220
    210221Below is a detailed example of how to do the following annotation:
     
    284295
    285296
    286 === Step 6 [if appropriate]: Comparing binding across different samples ===
     297=== [=#compare Step 6: Compare binding across different samples] ===
    287298
    288299  * Method 1: Run IDR to identify reproducible peaks