Using Hi-C to capture chromatin structure
The Hi-C method generalizes earlier experimental techniques, such as 3C or 5C, which characterize contacts between specific chromosomal loci, to enable unbiased identification of chromatin interactions across an entire genome. Two software pipelines for analyzing data from Hi-C experiments are Juicer and HiC-Pro. An example of using HiC-Pro on Whitehead computing resources, with data collected with a kit from Arima Genomics, is outlined below. Note that this example uses a reference genome that excludes unlocalized and unplaced contigs from the assembly. This choice was made to ease downstream analysis. Please see the HiC-Pro documentation for additional examples.
Analysis outline
- Set up the HiC-Pro configuration file
- Submit data processing to the LSF batch queue
- Generate contact map visualizations from processed Hi-C data
Set up the configuration file
- The configuration file (here called config.txt) sets the essential parameters for the analysis. In this example, the samples are human and an e-mail is sent to user@wi.mit.edu (the JOB_MAIL setting) on job completion.
# Please change the variable settings below if necessary

#########################################################################
## Paths and Settings - Do not edit !
#########################################################################

TMP_DIR = tmp
LOGS_DIR = logs
BOWTIE2_OUTPUT_DIR = bowtie_results
MAPC_OUTPUT = hic_results
RAW_DIR = rawdata

#######################################################################
## SYSTEM - PBS - Start Editing Here !!
#######################################################################

N_CPU = 8
LOGFILE = hicpro.log
JOB_NAME = HiC
JOB_MEM = 20gb
JOB_WALLTIME = 12:00:00
JOB_QUEUE = batch
JOB_MAIL = user@wi.mit.edu

#########################################################################
## Data
#########################################################################

PAIR1_EXT = _R1
PAIR2_EXT = _R2

#######################################################################
## Alignment options
#######################################################################

FORMAT = phred33
MIN_MAPQ = 0
BOWTIE2_IDX_PATH = /nfs/genomes/human_hg38_dec13_no_random/bowtie
BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS = --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder

#######################################################################
## Annotation files
#######################################################################

REFERENCE_GENOME = hg38
GENOME_SIZE = /nfs/genomes/human_hg38_dec13_no_random/anno/chromInfo.txt

#######################################################################
## Allele specific
#######################################################################

ALLELE_SPECIFIC_SNP =

#######################################################################
## Digestion Hi-C
#######################################################################

GENOME_FRAGMENT = /nfs/genomes/human_hg38_dec13/arima/hg38_GATC_GANTC.bed
LIGATION_SITE = GAATAATC,GAATACTC,GAATAGTC,GAATATTC,GAATGATC,GACTAATC,GACTACTC,GACTAGTC,GACTATTC,GACTGATC,GAGTAATC,GAGTACTC,GAGTAGTC,GAGTATTC,GAGTGATC,GATCAATC,GATCACTC,GATCAGTC,GATCATTC,GATCGATC,GATTAATC,GATTACTC,GATTAGTC,GATTATTC,GATTGATC
MIN_FRAG_SIZE = 10
MAX_FRAG_SIZE = 100000
MIN_INSERT_SIZE = 100
MAX_INSERT_SIZE = 1000

#######################################################################
## Hi-C processing
#######################################################################

MIN_CIS_DIST =
GET_ALL_INTERACTION_CLASSES = 1
GET_PROCESS_SAM = 1
RM_SINGLETON = 1
RM_MULTI = 1
RM_DUP = 1

#######################################################################
## Contact Maps
#######################################################################

BIN_SIZE = 1000 5000 10000 20000 50000 100000
MATRIX_FORMAT = upper

#######################################################################
## ICE Normalization
#######################################################################

MAX_ITER = 100
FILTER_LOW_COUNT_PERC = 0.02
FILTER_HIGH_COUNT_PERC = 0
EPS = 0.1
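The long LIGATION_SITE value in the configuration is easier to check when it is regenerated rather than read by eye. The Python sketch below rebuilds the list under two assumptions that are not stated on this page: that the kit digests with enzymes cutting at ^GATC and G^ANTC (inferred from the fragment file name hg38_GATC_GANTC.bed), and that the 5' overhangs are filled in before blunt-end ligation. The function names are illustrative and not part of HiC-Pro.

```python
from itertools import product

# IUPAC expansion; only N is needed for the GANTC motif.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def expand(motif):
    """Return every concrete sequence matching an IUPAC motif."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in motif))]


def ligation_sites(cut_sites):
    """Build ligation junctions from cut sites written with '^' at the cut.

    Assumes palindromic sites leaving 5' overhangs that are filled in, so the
    upstream fragment ends with the motif minus its last `cut` bases and the
    downstream fragment starts with the motif minus its first `cut` bases.
    """
    lefts, rights = set(), set()
    for site in cut_sites:
        cut = site.index("^")
        motif = site.replace("^", "")
        lefts.update(expand(motif[: len(motif) - cut]))
        rights.update(expand(motif[cut:]))
    # Any filled-in upstream end can be ligated to any downstream end.
    return sorted(left + right for left in lefts for right in rights)


# Assumed cut sites (see the note above).
sites = ligation_sites(["^GATC", "G^ANTC"])
print(len(sites))
print(",".join(sites))
```

If the printed list matches the LIGATION_SITE value in config.txt, the setting is consistent with the assumed enzymes. For a different kit, replace the cut sites and regenerate GENOME_FRAGMENT accordingly.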
Submit data processing to the LSF batch queue
The essential command to include when submitting HiC-Pro computations to the LSF batch queue is given below. It assumes that config.txt and the rawdata and hicproOut subdirectories are all located in the working directory. As specified in config.txt, HiC-Pro creates the bowtie_results and hic_results subdirectories beneath hicproOut.
/usr/local/HiC-Pro_2.11.1/bin/HiC-Pro -c config.txt -i rawdata/ -o hicproOut
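One possible way to wrap the command above in an LSF submission is sketched here in Python. The script only assembles and prints a bsub command line; it does not submit anything. The bsub options chosen (-n, -R, -J, -u, -N, -o) are standard LSF options, but the slot count, job name, and output file name are assumptions that mirror N_CPU, JOB_NAME, and JOB_MAIL in config.txt, and the helper name build_bsub_command is hypothetical. Queue and memory options are site-specific and are left out.

```python
import shlex  # shlex.join requires Python 3.8 or later

HICPRO = "/usr/local/HiC-Pro_2.11.1/bin/HiC-Pro"


def build_bsub_command(config="config.txt", raw_dir="rawdata/",
                       out_dir="hicproOut", n_cpu=8, job_name="HiC",
                       mail="user@wi.mit.edu"):
    """Return the bsub invocation as a list of arguments (nothing is run)."""
    return [
        "bsub",
        "-n", str(n_cpu),          # number of job slots, matching N_CPU
        "-R", "span[hosts=1]",     # keep all slots on a single host
        "-J", job_name,            # job name shown by bjobs
        "-u", mail, "-N",          # mail the job report to this address
        "-o", "hicpro.%J.out",     # LSF replaces %J with the job ID
        HICPRO, "-c", config, "-i", raw_dir, "-o", out_dir,
    ]


command = build_bsub_command()
# Print a copy-and-paste ready, shell-quoted command line for review.
print(shlex.join(command))
```

Printing the command before running it makes it easy to confirm the paths and options against local queue policy; the printed line can then be pasted into a shell in the working directory.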
Generate contact map visualizations from processed Hi-C data
Hi-C experiments enable the visualization of chromatin contact maps. Output from HiC-Pro can be translated into contact maps using HiTC, the Juicebox viewer, or HiCPlotter. To use the Juicebox viewer, the HiC-Pro output must first be converted with /usr/local/HiC-Pro_2.11.1/bin/utils/hicpro2juicebox.sh. The example below uses HiCPlotter to visualize the contact map for the Y chromosome at 100 kb resolution, naming the graphical output file with a "cellType" prefix.
python2 /usr/local/bin/python2.7/HiCPlotter.py -f /path/to/hic_results/matrix/sample1/iced/100000/sample1_100000_iced.matrix -chr chrY -o cellType -n " " -r 100000 -tri 1 -bed /path/to/hic_results/matrix/sample1/raw/100000/sample1_100000_abs.bed -mm 6
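To inspect the same files without a plotting tool, the two inputs passed to HiCPlotter can be read directly. HiC-Pro writes each matrix as tab-separated triplets (bin index, bin index, count) and each *_abs.bed file as chromosome, start, end, and bin index. The sketch below, using only the Python standard library, builds a dense symmetric matrix for one chromosome. The function names and the toy data are illustrative only; real paths would follow the /path/to/hic_results/matrix/... pattern shown above.

```python
import csv
import os
import tempfile


def read_bins(bed_path, chrom):
    """Map global HiC-Pro bin index -> local row index for one chromosome."""
    local = {}
    with open(bed_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 4 and row[0] == chrom:
                local[int(row[3])] = len(local)
    return local


def read_dense(matrix_path, local):
    """Fill a dense symmetric matrix from upper-triangle sparse triplets."""
    size = len(local)
    dense = [[0.0] * size for _ in range(size)]
    with open(matrix_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 3:
                continue
            i, j, count = int(row[0]), int(row[1]), float(row[2])
            if i in local and j in local:
                a, b = local[i], local[j]
                dense[a][b] = dense[b][a] = count
    return dense


# Toy files standing in for sample1_100000_abs.bed and sample1_100000_iced.matrix.
with tempfile.TemporaryDirectory() as tmp:
    bed_path = os.path.join(tmp, "toy_abs.bed")
    matrix_path = os.path.join(tmp, "toy_iced.matrix")
    with open(bed_path, "w") as handle:
        handle.write("chrX\t0\t100000\t1\nchrX\t100000\t200000\t2\n"
                     "chrY\t0\t100000\t3\nchrY\t100000\t200000\t4\n")
    with open(matrix_path, "w") as handle:
        handle.write("1\t2\t5.0\n2\t4\t7.0\n3\t3\t2.5\n3\t4\t1.5\n")
    bins_y = read_bins(bed_path, "chrY")
    dense_y = read_dense(matrix_path, bins_y)

print(dense_y)
```

Contacts between chrY and other chromosomes (the 2-4 entry in the toy data) are dropped, so the result is the intrachromosomal map. For whole-genome matrices at fine resolution a sparse representation would be more appropriate than nested lists.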