== Using Hi-C to capture chromatin structure ==
The Hi-C method generalizes earlier experimental techniques for characterizing contacts between specific chromosomal loci, such as 3C and 5C, to enable unbiased identification of chromatin interactions across an entire genome.  Two software pipelines for analyzing data from Hi-C experiments are Juicer and HiC-Pro.  The example outlined below runs HiC-Pro on Whitehead computing resources with data generated using a kit from Arima Genomics.  Note that the reference genome used in this example excludes unlocalized and unplaced contigs from the assembly, which simplifies downstream analysis.  Please see the HiC-Pro documentation for additional examples.
=== Basic Approach ===
=== [=#SetConfig Set up the configuration file] ===
  * The configuration file (here called config.txt) dictates essential settings for the analysis.  In this example, human samples have been sequenced.
# Please change the variable settings below if necessary
## Paths and Settings  - Do not edit !
TMP_DIR = tmp
LOGS_DIR = logs
BOWTIE2_OUTPUT_DIR = bowtie_results
MAPC_OUTPUT = hic_results
RAW_DIR = rawdata
## SYSTEM - PBS - Start Editing Here !!
N_CPU = 8
LOGFILE = hicpro.log
JOB_NAME = HiC
JOB_MEM = 20gb
JOB_WALLTIME = 12:00:00
JOB_QUEUE = batch
JOB_MAIL =
## Data
PAIR1_EXT = _R1
PAIR2_EXT = _R2
## Alignment options
FORMAT = phred33
MIN_MAPQ = 0
BOWTIE2_IDX_PATH = /nfs/genomes/human_hg38_dec13_no_random/bowtie
BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS = --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder
## Annotation files
REFERENCE_GENOME = hg38
GENOME_SIZE = /nfs/genomes/human_hg38_dec13_no_random/anno/chromInfo.txt
## Allele specific
## Digestion Hi-C
GENOME_FRAGMENT = /nfs/genomes/arimaCutsites/hg38_GATC_GANTC.bed
MIN_FRAG_SIZE = 10
MAX_FRAG_SIZE = 100000
MIN_INSERT_SIZE = 100
MAX_INSERT_SIZE = 1000
## Hi-C processing
MIN_CIS_DIST =
GET_PROCESS_SAM = 1
RM_SINGLETON = 1
RM_MULTI = 1
RM_DUP = 1
## Contact Maps
BIN_SIZE = 1000 5000 10000 20000 50000 100000
MATRIX_FORMAT = upper
## ICE Normalization
MAX_ITER = 100
FILTER_LOW_COUNT_PERC = 0.02
EPS = 0.1
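Because the configuration file is a flat list of KEY = VALUE lines with # comments, a few basic consistency checks can be run before a long job is submitted. The following is a minimal sketch of such a check, not part of HiC-Pro: the helper names parse_hicpro_config and check_settings are hypothetical, and the rules they apply (minimum below maximum, positive integer bin sizes, distinct read-pair extensions) are assumptions about what a sensible configuration looks like rather than HiC-Pro's own validation.

```python
def parse_hicpro_config(text):
    """Parse 'KEY = VALUE' lines into a dict (hypothetical helper).

    Blank lines and lines starting with '#' are skipped. Empty values,
    such as 'MIN_CIS_DIST =', are kept as empty strings.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY = VALUE line
        settings[key.strip()] = value.strip()
    return settings


def check_settings(settings):
    """Return a list of problems found (empty list if none)."""
    problems = []
    # Size ranges: the minimum should be strictly below the maximum.
    for low, high in (("MIN_FRAG_SIZE", "MAX_FRAG_SIZE"),
                      ("MIN_INSERT_SIZE", "MAX_INSERT_SIZE")):
        if settings.get(low) and settings.get(high):
            if int(settings[low]) >= int(settings[high]):
                problems.append(f"{low} must be smaller than {high}")
    # Bin sizes: a whitespace-separated list of positive integers.
    bins = settings.get("BIN_SIZE", "").split()
    if not bins:
        problems.append("BIN_SIZE is empty")
    elif any(not b.isdigit() or int(b) <= 0 for b in bins):
        problems.append("BIN_SIZE must list positive integers")
    # Read-pair extensions must be distinguishable.
    if "PAIR1_EXT" in settings and settings["PAIR1_EXT"] == settings.get("PAIR2_EXT"):
        problems.append("PAIR1_EXT and PAIR2_EXT must differ")
    return problems


# A fragment of the configuration shown above, embedded for illustration.
EXAMPLE = """\
## Data
PAIR1_EXT = _R1
PAIR2_EXT = _R2
MIN_FRAG_SIZE = 10
MAX_FRAG_SIZE = 100000
MIN_INSERT_SIZE = 100
MAX_INSERT_SIZE = 1000
MIN_CIS_DIST =
BIN_SIZE = 1000 5000 10000 20000 50000 100000
"""

settings = parse_hicpro_config(EXAMPLE)
problems = check_settings(settings)
print(problems or "no problems found")
```

In practice the text would be read from config.txt (for example with open("config.txt").read()) rather than from an embedded string.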
=== [=#Submit Submit data processing to the LSF batch queue] ===
The essential command for submitting a HiC-Pro run to the LSF batch queue is given below.  It assumes that config.txt, the rawdata subdirectory, and the hicproOut subdirectory are all located in the working directory.  As specified in config.txt, HiC-Pro creates the bowtie_results and hic_results subdirectories beneath hicproOut.
/usr/local/HiC-Pro_2.11.1/bin/HiC-Pro -c config.txt -i rawdata/ -o hicproOut
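The HiC-Pro command above still has to be handed to the scheduler. The sketch below assembles one possible bsub invocation around it and prints it for inspection without running anything. It rests on several assumptions: the bsub options used (-J for the job name, -n for the number of slots, -R "span[hosts=1]" to keep the slots on one host, -o for the output log) are standard LSF options, but the values chosen here simply mirror N_CPU and JOB_NAME from config.txt and may not match local queue policy. The function name build_bsub_command is hypothetical.

```python
import shlex

# Path to the HiC-Pro executable, as given in the text above.
HICPRO = "/usr/local/HiC-Pro_2.11.1/bin/HiC-Pro"


def build_bsub_command(config="config.txt", raw_dir="rawdata/",
                       out_dir="hicproOut", n_cpu=8, job_name="HiC"):
    """Return the bsub command as an argument list (hypothetical helper)."""
    hicpro_cmd = [HICPRO, "-c", config, "-i", raw_dir, "-o", out_dir]
    return [
        "bsub",
        "-J", job_name,               # job name (assumed to mirror JOB_NAME)
        "-n", str(n_cpu),             # slots (assumed to mirror N_CPU)
        "-R", "span[hosts=1]",        # keep all slots on a single host
        "-o", f"{job_name}.%J.out",   # LSF replaces %J with the job ID
        *hicpro_cmd,
    ]


command = build_bsub_command()
# Print a shell-quoted version for review before submitting by hand.
print(shlex.join(command))
```

Keeping the command as a list makes it straightforward to pass to subprocess.run later, once the options have been confirmed against the local LSF setup.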