Changes between Version 25 and Version 26 of SOPs/InProgress


Ignore:
Timestamp:
05/15/14 09:36:25 (11 years ago)
Author:
gbell
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SOPs/InProgress

    v25 v26  
    1 == Get reproducible peaks from multiple ChIP-seq replicates with IDR (after using macs for peak calling) ==
    2 
    3 For more information about the method, see the main [[https://sites.google.com/site/anshulkundaje/projects/idr| IDR page]].
    4 
    5 === Run macs2 (macs version 2, not version 1) to call ChIP-seq peaks. ===
    6 
    7 The IDR analysis requires that you call lots of peaks, including all "good" ones (signal) and some bad ones (noise).
    8 
    9 Adjust the fragment length with --nomodel --shiftsize if NSC/RSC plots are not good.
    10 
    11 {{{
    12 # -B => create bedgraph output files
    13 # -p 1e -3 => Set p-value cutoff to 1e-3 (which is more relaxed than the default setting)
    14 
    15 bsub macs2 callpeak -t IP_1.bam -c control_1.bam -f BAM -g hs -n IP.1_vs_control.1 -B -p 1e-3
    16 bsub macs2 callpeak -t IP_2.bam -c control_2.bam -f BAM -g hs -n IP.2_vs_control.2 -B -p 1e-3
    17 }}}
    18 
    19 === Sort peaks in .narrowPeak files (created by macs2) ===
    20 
    21 Sort from best to worst using the -log10(p-value) column (column 8), and only keep the top 100k peaks (at most).
    22 
    23 {{{
    24 bsub "sort -k 8,8nr IP.1_vs_control.1_peaks.narrowPeak | head -n 100000 | gzip -c >| IP.1_vs_control.1.regionPeak.gz"
    25 bsub "sort -k 8,8nr IP.2_vs_control.2_peaks.narrowPeak | head -n 100000 | gzip -c >| IP.2_vs_control.2.regionPeak.gz"
    26 }}}
    27 
    28 === Estimate Irreproducibility Discovery Rate (IDR) between replicates ===
    29 
    30 The interface of this command has been slightly modified by BaRC, but the main method remains exactly the same.
    31 
    32 The command requires names and lengths of all chromosomes in a file (ex: chromInfo) of format chromosome<tab>size.
    33 
    34 {{{
    35 # USAGE: batch-consistency-analysis.r [peakfile1] [peakfile2] [peak.half.width] [outfile.prefix] [min.overlap.ratio] [is.broadpeak] [ranking.measure]\
    36 #
    37 # peak.half.width => set the peak.half.width to the reported peak width in the peak files (-1 (minus one) is recommended)
    38 # outfile.prefix => the one-word name of your analysis
    39 # min.overlap.ratio => fractional bp overlap (0 to 1) between peaks in replicates to be considered as overlapping peaks. (0 is recommended) 
    40 # is.broadpeak => Is the peak file format broadPeak (or else is narrowPeak)? (F for false)  The IDR method is not recommended for broad peaks.
    41 # ranking.measure => p.value is recommended
    42 
    43 batch-consistency-analysis.r IP.1_vs_control.1.regionPeak.gz IP.2_vs_control.2.regionPeak.gz -1 rep1_vs_rep2_IDR 0 F p.value chromInfo.txt
    44 }}}
    45 
    46 === Create the IDR plots ===
    47 {{{
    48 # USAGE: batch-consistency-plot.r [npairs] [output.prefix] [input.file.prefix1] [input.file.prefix2] [input.file.prefix3] ....
    49 # This method can plot 1 or more pairs of replicates
    50 
    51 batch-consistency-plot.r 1 rep1_vs_rep2_IDR_plot rep1_vs_rep2_IDR
    52 }}}
    53 
    54 === Generate a conservative and an optimal final set of peak calls ===
    55 {{{
    56 # Use an IDR cutoff of 0.01 to 0.05, depending on the number of pre-IDR peaks and size of the genome:
    57 # IDR <= 0.05 for < 100K pre-IDR peaks for large genomes (human/mouse)
    58 # IDR <= 0.01 or 0.02 for ~15K to 40K peaks in smaller genomes such as worm
    59 
    60 awk '{ if($NF < 0.05) print $0 }' rep1_vs_rep2_IDR-overlapped-peaks.txt > rep1_vs_rep2_conserved_peaks_by_IDR.txt
    61 }}}
     1None at this time