SOPs/InProgress – BaRC Wiki

Version 16 (modified by byuan, 12 years ago)

Get reproducible peaks from multiple ChIP-seq replicates with IDR: https://sites.google.com/site/anshulkundaje/projects/idr

Use macs2 (not macs14) to call peaks. If the NSC/RSC plots do not look good, you can set the fragment length manually with --nomodel --shiftsize (newer MACS2 releases replace --shiftsize with --extsize).

bsub macs2 callpeak -t IP_1.bam -c control_1.bam  -f BAM -g hs -n IP.1_vs_control.1 -B  -p 1e-3
bsub macs2 callpeak -t IP_2.bam -c control_2.bam  -f BAM -g hs -n IP.2_vs_control.2 -B  -p 1e-3

Sort the .narrowPeak files (macs2 output) from best to worst by the -log10(pvalue) column, i.e. column 8, and keep only the top 100k peaks:

bsub "sort -k 8nr,8nr IP.1_vs_control.1_peaks.narrowPeak |head -n 100000|gzip -c >| IP.1_vs_control.1.regionPeak.gz"
bsub "sort -k 8nr,8nr IP.2_vs_control.2_peaks.narrowPeak |head -n 100000|gzip -c >| IP.2_vs_control.2.regionPeak.gz"
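The same sort-and-truncate step can be wrapped in a small helper so that both replicates are handled identically. This is only a sketch: top_peaks is a hypothetical function name, and it assumes tab-separated narrowPeak input with -log10(pvalue) in column 8.

```shell
# Hypothetical helper: print the top N peaks of a narrowPeak file,
# ranked from best to worst by column 8 (-log10 p-value).
# Usage: top_peaks <file.narrowPeak> [N]   (N defaults to 100000)
top_peaks() {
    in_file=$1
    n=${2:-100000}
    tab=$(printf '\t')
    # LC_ALL=C keeps the numeric sort independent of the user's locale
    LC_ALL=C sort -t "$tab" -k8,8nr "$in_file" | head -n "$n"
}
```

For example, top_peaks IP.1_vs_control.1_peaks.narrowPeak | gzip -c > IP.1_vs_control.1.regionPeak.gz would produce the same kind of file as the commands above.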

Estimate the Irreproducible Discovery Rate (IDR) between replicates:

# chromInfo.txt format: chromosome<tab>size
Rscript batch-consistency-analysis.r IP.1_vs_control.1.regionPeak.gz IP.2_vs_control.2.regionPeak.gz -1 rep1_vs_rep2_IDR 0 F p.value chromInfo.txt
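If you do not already have a chromInfo.txt, one way to build it is from a FASTA index, whose first two columns are the sequence name and its length. This is a sketch under assumptions: make_chrominfo is a hypothetical helper, and it presumes a .fai index (for example one created with samtools faidx) exists for your genome.

```shell
# Hypothetical helper: write a chromosome<tab>size table from a FASTA index.
# Usage: make_chrominfo <genome.fa.fai> <chromInfo.txt>
make_chrominfo() {
    fai=$1
    out=$2
    # Columns 1 and 2 of a .fai file are sequence name and sequence length
    cut -f1,2 "$fai" > "$out"
}
```

For example, make_chrominfo genome.fa.fai chromInfo.txt, where genome.fa.fai is a placeholder for your own index file.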

Plot the IDR plots:

 Rscript batch-consistency-plot.r 1 rep1_vs_rep2_IDR_plot rep1_vs_rep2_IDR

Generate a conservative and an optimal final set of peak calls:

 # The IDR cutoff is between 0.01 and 0.05, depending on the number of pre-IDR peaks and the genome size:
 # IDR <= 0.05 for < 100K pre-IDR peaks in large genomes (human/mouse)
 # IDR <= 0.01 or 0.02 for ~15K to 40K peaks in smaller genomes such as worm

awk '$NF <= 0.05' rep1_vs_rep2_IDR-overlapped-peaks.txt > rep1_vs_rep2_conserved_peaks_by_IDR.txt
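Before settling on a cutoff, it can help to see how many peaks pass each candidate threshold. The sketch below assumes, as the awk command above does, that the IDR value is the last column of the overlapped-peaks file; count_idr_peaks is a hypothetical helper name, and rows whose last column is not numeric (such as a header line) are skipped.

```shell
# Hypothetical helper: count peaks whose IDR (last column) is <= a threshold.
# Usage: count_idr_peaks <overlapped-peaks.txt> <threshold>
count_idr_peaks() {
    awk -v t="$2" '
        # Only consider rows whose last field looks like a number
        $NF ~ /^[0-9.eE+-]+$/ && $NF + 0 <= t + 0 { n++ }
        END { print n + 0 }
    ' "$1"
}
```

For example, running count_idr_peaks rep1_vs_rep2_IDR-overlapped-peaks.txt with 0.01, 0.02 and 0.05 in turn shows how the peak count changes across the cutoffs listed above.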
