Changes between Version 14 and Version 15 of SOPs/InProgress


Ignore:
Timestamp:
05/13/14 18:00:20 (11 years ago)
Author:
byuan
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SOPs/InProgress

    v14 v15  
    66
    77== sort .narrowPeak files ( macs2 output) from best to worst using the -log10(pvalue) column i.e. column 8, and only keep the top 100k peaks ==
    8 bsub "sort -k 8nr,8nr IP.1_vs_control.1_peaks.narrowPeak |head -n 100000|gzip -c >| IP.1_vs_control.1.regionPeak.gz"
    9 bsub "sort -k 8nr,8nr IP.2_vs_control.2_peaks.narrowPeak |head -n 100000|gzip -c >| IP.2_vs_control.2.regionPeak.gz"
     8* bsub "sort -k 8nr,8nr IP.1_vs_control.1_peaks.narrowPeak |head -n 100000|gzip -c >| IP.1_vs_control.1.regionPeak.gz"
     9* bsub "sort -k 8nr,8nr IP.2_vs_control.2_peaks.narrowPeak |head -n 100000|gzip -c >| IP.2_vs_control.2.regionPeak.gz"
    1010
    11 # estimate Irreproducibility Discovery Rate (IDR) between replicates:
    12 # chromInfo.txt format: chromosome<tab>size
    13 Rscript batch-consistency-analysis.r IP.1_vs_control.1.regionPeak.gz IP.2_vs_control.2.regionPeak.gz -1 rep1_vs_rep2_IDR 0 F p.value chromInfo.txt
     11== estimate Irreproducibility Discovery Rate (IDR) between replicates: ==
     12* chromInfo.txt format: chromosome<tab>size
     13* Rscript batch-consistency-analysis.r IP.1_vs_control.1.regionPeak.gz IP.2_vs_control.2.regionPeak.gz -1 rep1_vs_rep2_IDR 0 F p.value chromInfo.txt
    1414
    15 # plot the IDR plots
    16 Rscript batch-consistency-plot.r 1 rep1_vs_rep2_IDR_plot rep1_vs_rep2_IDR
     15== plot the IDR plots ==
     16* Rscript batch-consistency-plot.r 1 rep1_vs_rep2_IDR_plot rep1_vs_rep2_IDR
    1717
    18 # generate a conservative and an optimal final set of peak calls:
    19 # IDR cutoff is between 0.01 or 0.05 depends on the number of pre-IDR peaks or size of genomes:
    20 # IDR<=0.05 for < 100K pre-IDR peaks for large genomes (human/mouse)
    21 # IDR <= 0.01 or 0.02 for ~15K to 40K peaks in smaller genomes such as worm
     18== generate a conservative and an optimal final set of peak calls: ==
     19== IDR cutoff is between 0.01 or 0.05 depends on the number of pre-IDR peaks or size of genomes:==
     20== IDR<=0.05 for < 100K pre-IDR peaks for large genomes (human/mouse) ==
     21== IDR <= 0.01 or 0.02 for ~15K to 40K peaks in smaller genomes such as worm ==
    2222
    23 awk '{ if($NF < 0.05) print $0 }'  rep1_vs_rep2_IDR-overlapped-peaks.txt >  rep1_vs_rep2_conserved_peaks_by_IDR.txt
     23* awk '{ if($NF < 0.05) print $0 }'  rep1_vs_rep2_IDR-overlapped-peaks.txt >  rep1_vs_rep2_conserved_peaks_by_IDR.txt