Changes between Version 15 and Version 16 of SOPs/InProgress


Ignore:
Timestamp:
05/13/14 18:17:23 (11 years ago)
Author:
byuan
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SOPs/InProgress

    v15 v16  
    1 Get reproducible peaks from multiple chip-seq replicates with IDR (https://sites.google.com/site/anshulkundaje/projects/idr)
     1== Get reproducible peaks from multiple chip-seq replicates with IDR https://sites.google.com/site/anshulkundaje/projects/idr ==
    22
    3 == use macs2 to get peaks: You can adjust fragment length with --nomodel --shiftsize if NSC/RSC plots are not good ==
    4 * bsub macs2 callpeak -t IP_1.bam -c control_1.bam  -f BAM -g hs -n IP.1_vs_control.1 -B  -p 1e-3
    5 * bsub macs2 callpeak -t IP_2.bam -c control_2.bam  -f BAM -g hs -n IP.2_vs_control.2 -B  -p 1e-3
    63
    7 == sort .narrowPeak files ( macs2 output) from best to worst using the -log10(pvalue) column i.e. column 8, and only keep the top 100k peaks ==
    8 * bsub "sort -k 8nr,8nr IP.1_vs_control.1_peaks.narrowPeak |head -n 100000|gzip -c >| IP.1_vs_control.1.regionPeak.gz"
    9 * bsub "sort -k 8nr,8nr IP.2_vs_control.2_peaks.narrowPeak |head -n 100000|gzip -c >| IP.2_vs_control.2.regionPeak.gz"
     4=== Use macs2 to get peaks (not macs14): You can adjust fragment length with --nomodel --shiftsize if NSC/RSC plots are not good ===
     5{{{
     6bsub macs2 callpeak -t IP_1.bam -c control_1.bam  -f BAM -g hs -n IP.1_vs_control.1 -B  -p 1e-3
     7bsub macs2 callpeak -t IP_2.bam -c control_2.bam  -f BAM -g hs -n IP.2_vs_control.2 -B  -p 1e-3
     8}}}
    109
    11 == estimate Irreproducibility Discovery Rate (IDR) between replicates: ==
    12 * chromInfo.txt format: chromosome<tab>size
    13 * Rscript batch-consistency-analysis.r IP.1_vs_control.1.regionPeak.gz IP.2_vs_control.2.regionPeak.gz -1 rep1_vs_rep2_IDR 0 F p.value chromInfo.txt
     10=== Sort .narrowPeak files ( macs2 output) from best to worst using the -log10(pvalue) column i.e. column 8, and only keep the top 100k peaks ===
     11{{{
     12bsub "sort -k 8nr,8nr IP.1_vs_control.1_peaks.narrowPeak |head -n 100000|gzip -c >| IP.1_vs_control.1.regionPeak.gz"
     13bsub "sort -k 8nr,8nr IP.2_vs_control.2_peaks.narrowPeak |head -n 100000|gzip -c >| IP.2_vs_control.2.regionPeak.gz"
     14}}}
    1415
    15 == plot the IDR plots ==
    16 * Rscript batch-consistency-plot.r 1 rep1_vs_rep2_IDR_plot rep1_vs_rep2_IDR
     16=== Estimate Irreproducibility Discovery Rate (IDR) between replicates: ===
     17{{{
     18# chromInfo.txt format: chromosome<tab>size
     19Rscript batch-consistency-analysis.r IP.1_vs_control.1.regionPeak.gz IP.2_vs_control.2.regionPeak.gz -1 rep1_vs_rep2_IDR 0 F p.value chromInfo.txt
     20}}}
    1721
    18 == generate a conservative and an optimal final set of peak calls: ==
    19 == IDR cutoff is between 0.01 or 0.05 depends on the number of pre-IDR peaks or size of genomes:==
    20 == IDR<=0.05 for < 100K pre-IDR peaks for large genomes (human/mouse) ==
    21 == IDR <= 0.01 or 0.02 for ~15K to 40K peaks in smaller genomes such as worm ==
     22=== Plot the IDR plots: ===
     23{{{
     24 Rscript batch-consistency-plot.r 1 rep1_vs_rep2_IDR_plot rep1_vs_rep2_IDR
     25}}}
    2226
    23 * awk '{ if($NF < 0.05) print $0 }'  rep1_vs_rep2_IDR-overlapped-peaks.txt >  rep1_vs_rep2_conserved_peaks_by_IDR.txt
     27=== Generate a conservative and an optimal final set of peak calls: ===
     28{{{
     29 #IDR cutoff is between 0.01 or 0.05 depending on the number of pre-IDR peaks and size of genomes:
     30 #IDR <=0.05 for < 100K pre-IDR peaks for large genomes (human/mouse)
     31 #IDR <= 0.01 or 0.02 for ~15K to 40K peaks in smaller genomes such as worm
     32
     33awk '{ if($NF < 0.05) print $0 }'  rep1_vs_rep2_IDR-overlapped-peaks.txt >  rep1_vs_rep2_conserved_peaks_by_IDR.txt
     34}}}