CUT&Tag

Cleavage Under Targets & Tagmentation (CUT&Tag) is a tethering method that uses a protein-A-Tn5 (pA-Tn5) transposome fusion protein. It is an alternative to ChIP-seq and CUT&Run for detecting enrichment of protein-DNA interactions or histone modifications. A detailed description of the experimental method, together with a protocol for computational analysis, has been published by the Henikoff laboratory. Our preferences for specific steps include:

  • As in the analysis protocol from the Henikoff lab, we recommend using an IgG control.
  • As an alternative to calling peaks with SEACR, we recommend using MACS2 because the resulting peaks tend to be narrower and better capture the tagged regions.
     macs2 callpeak --keep-dup all -t sample.mapped.bam -g hs -f BAMPE -n OutputName
    
  • We recommend not removing duplicates from any of the samples.
  • Spike-in calibration using the number of fragments mapped to the E. coli genome, as described in the analysis protocol published by the Henikoff lab, is useful for visualization of the CUT&Tag profile with a genome browser.
  • Spike-in normalized bedgraph files are not an appropriate input for MACS2, since MACS2 will renormalize to the library size.
  • Spike-in normalization using the commands described in "Call peaks using MACS2 subcommands", step 4, has not worked well for us.
  • We recommend using the spike-in scale factors in subsequent steps when comparing binding between conditions using tools like DESeq2.
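
The spike-in calibration mentioned in the list above can be sketched in the shell. This is a minimal sketch, assuming the scale factor is a constant divided by the number of fragments mapped to the E. coli genome, which is how we understand the published protocol to define it. The function name spikein_scale_factor, the default constant of 10000, and the fragment counts are illustrative assumptions, not values taken from our pipeline.

```shell
#!/usr/bin/env bash
# Sketch only: scale factor = constant / (fragments mapped to the E. coli genome).
# The function name and the default constant (10000) are assumptions for illustration.
spikein_scale_factor() {
    local spikein_fragments="$1"
    local constant="${2:-10000}"
    awk -v n="$spikein_fragments" -v c="$constant" 'BEGIN {
        # Refuse zero or negative counts to avoid dividing by zero.
        if (n + 0 <= 0) {
            print "spike-in fragment count must be positive" > "/dev/stderr"
            exit 1
        }
        printf "%.4f\n", c / n
    }'
}

# Hypothetical spike-in fragment counts for two samples.
spikein_scale_factor 25000   # prints 0.4000
spikein_scale_factor 8000    # prints 1.2500
```

A factor computed this way could then be supplied to a coverage tool that accepts a scaling option (for example, the -scale option of bedtools genomecov) when producing bedgraph files for the genome browser.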

For a working example of how to run the published analysis workflow using the computing resources at the Whitehead Institute, please follow /nfs/BaRC_Public/BaRC_code/pipelines/analyze_CUTnTag/README and find the associated scripts within the parent directory.

To run the analysis on the same example input with a single Nextflow command, run the following commands on fry:

mkdir /nfs/BaRC_training/CUTTAG/yourUserName
cd /nfs/BaRC_training/CUTTAG/yourUserName
sbatch --partition=20 --job-name=NextF_CT --output=NextF_CT_1sample-%j.out --mem=150gb --nodes=1 --ntasks=1 --cpus-per-task=20 --wrap "/nfs/BaRC_Public/apps/nextflow/nextflow run nf-core/cutandrun -profile singularity --input /nfs/BaRC_Public/Hot_Topics/CUTandTag/nextFlow/samplesheet.csv --normalisation_mode CPM --igg_scale_factor 1 --peakcaller 'MACS2' --multiqc_title 'multiQCReport' --skip_removeduplicates true --skip_preseq false --skip_dt_qc false --skip_multiqc false --skip_reporting false --dump_scale_factors true --email 'userName@wi.mit.edu' --genome GRCh38 --extend_fragments false --macs2_qvalue 0.1 --minimum_alignment_q_score 0 --outdir ./OutNextF_keepAllReads_CPM_q0"

###Alternative more stringent peak calling
#Change these parameters to increase the stringency:
# --minimum_alignment_q_score 20  #to filter out low quality mapping 
#and
# --macs2_qvalue 0.01 or 0.001 #to increase macs2 stringency

sbatch --partition=20 --job-name=NextF_CT --output=NextF_CT_1sample-%j.out --mem=150gb --nodes=1 --ntasks=1 --cpus-per-task=20 --wrap "/nfs/BaRC_Public/apps/nextflow/nextflow run nf-core/cutandrun -profile singularity --input /nfs/BaRC_Public/Hot_Topics/CUTandTag/nextFlow/samplesheet.csv --normalisation_mode CPM --igg_scale_factor 1 --peakcaller 'MACS2' --multiqc_title 'multiQCReport' --skip_removeduplicates true --skip_preseq false --skip_dt_qc false --skip_multiqc false --skip_reporting false --dump_scale_factors true --email 'userName@wi.mit.edu' --genome GRCh38 --extend_fragments false --macs2_qvalue 0.01 --minimum_alignment_q_score 20 --outdir ./OutNextF_keepAllReads_CPM_q20"

These are our recommended options:

--end_to_end FALSE  
--save_spikein_aligned  TRUE  
--save_align_intermed  TRUE 
--skip_removeduplicates true  
--skip_preseq false   
--skip_dt_qc false 
--skip_multiqc false 
--skip_reporting false 
--dump_scale_factors true
--normalisation_binsize 1 (default 50) 
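
As an alternative to typing these options on the command line each time, they can be kept in a parameters file and passed to Nextflow with its -params-file option. The following is a minimal sketch under that assumption: the file name cutandrun_params.yaml and the output directory in the commented command are hypothetical, and the parameter names are simply copied from the list above with booleans written in lowercase.

```shell
#!/usr/bin/env bash
# Sketch only: write the recommended options to a YAML parameters file.
# The file name is an assumption; parameter names are copied from the list above.
params_file="${PARAMS_FILE:-cutandrun_params.yaml}"

cat > "$params_file" <<'EOF'
end_to_end: false
save_spikein_aligned: true
save_align_intermed: true
skip_removeduplicates: true
skip_preseq: false
skip_dt_qc: false
skip_multiqc: false
skip_reporting: false
dump_scale_factors: true
normalisation_binsize: 1
EOF

echo "Wrote parameters to $params_file"

# The file could then be passed to Nextflow (not run here; ./results is a placeholder):
#   nextflow run nf-core/cutandrun -profile singularity -params-file cutandrun_params.yaml \
#       --input ./samplesheet.csv --genome GRCh38 --outdir ./results
```

Keeping the options in one file makes it easier to rerun the pipeline with identical settings and to record which settings produced a given output directory.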

To run macs2 with the "--keep-dup auto" setting, you can supply a custom configuration file like the one below (macs2CustomCUTRUN.config):

process {
    withName: '.*:CUTANDRUN:MACS2_.*' {
        ext.args   = [
            '--keep-dup auto',
            '--nomodel',
            '--shift -75',
            '--extsize 150',
            '--format BAM',
            '--bdg',
            '--qvalue 0.01'
        ].join(' ').trim()

    }
}

The command to be run using that configuration file is:

sbatch --partition=20 --job-name=NextF --output=NextF-%j.out --mem=300gb --nodes=1 --ntasks=1 --cpus-per-task=20 --wrap "\
nextflow run nf-core/cutandrun -profile singularity --normalisation_binsize 1 --input ./samplesheet.csv -c macs2CustomCUTRUN.config --normalisation_mode CPM \
--save_align_intermed TRUE --peakcaller 'MACS2' --replicate_threshold 2 --end_to_end FALSE --multiqc_title 'multiQCReport' --skip_removeduplicates true \
--skip_preseq false --skip_dt_qc false --skip_multiqc false --skip_reporting false --dump_scale_factors true --email 'username@wi.mit.edu' --genome GRCh38 \
--extend_fragments false --macs2_qvalue 0.01 --outdir ./nextFlow_macs2auto"

Pipeline reference pages:

nf-core CUT&Tag pipeline

CUT&Tag pipeline parameters
