== Using Hi-C to capture chromatin structure ==


The [[https://science.sciencemag.org/content/326/5950/289 | Hi-C method]] generalizes earlier experimental techniques for characterizing contacts between specific chromosomal loci, such as 3C or 5C, to enable unbiased identification of chromatin interactions across an entire genome.  Two software pipelines for analyzing data from Hi-C experiments are [[https://github.com/aidenlab/juicer | juicer]] and [[https://github.com/nservant/HiC-Pro | HiC-Pro]].

Instructions for running juicer 1.6 on Whitehead computing resources follow, using a shell script designed to run on a slurm cluster.  Start in the folder that contains your fastq files.

{{{
# 1 - Set up the directory and file structure expected by juicer

# Link to the slurm scripts folder
ln -s /nfs/BaRC_Public/apps/juicer/juicer-1.6/scripts
# Link to the genome files (for TAIR10) or create your own, with a similar organization
ln -s /nfs/BaRC_Public/apps/juicer/juicer-1.6/genome
ln -s /nfs/BaRC_Public/apps/juicer/juicer-1.6/references
ln -s /nfs/BaRC_Public/apps/juicer/juicer-1.6/restriction_sites
# Create a folder called 'fastq' and symlink to your fastq sequences
# Fastq sequence files need to be plain text (not gzipped) and have names like [SAMPLE]_R1.fastq and [SAMPLE]_R2.fastq
mkdir fastq; cd fastq
ln -s ../*.fastq .
# Go back to your original working directory
cd ..

# 2 - Run the main juicer command, replacing MY_WORKING_DIR with your current working directory
#     (which contains the scripts, fastq, genome, etc. folders), TAIR10 with the name and files
#     of your desired genome, and '20' with your queue name
./scripts/juicer.sh -g TAIR10 -z MY_WORKING_DIR/references/TAIR10.fa -q 20 -l 20 -s DpnII -p MY_WORKING_DIR/genome/TAIR10.chrom.sizes -y MY_WORKING_DIR/restriction_sites/TAIR10_DpnII.txt -D MY_WORKING_DIR -t 8
}}}
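Because juicer expects each sample to have both an _R1 and an _R2 file, it can be worth checking the fastq folder before launching the run. The following is a minimal bash sketch, not part of juicer; the function name `check_fastq_pairs` is hypothetical, and it assumes the uncompressed [SAMPLE]_R1.fastq / [SAMPLE]_R2.fastq naming described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of juicer): list every sample in a folder
# that has a [SAMPLE]_R1.fastq file but no matching [SAMPLE]_R2.fastq.
# Prints one unpaired sample name per line; returns 1 if any are found.
check_fastq_pairs() {
    local dir="${1:-fastq}"
    local r1 sample status=0
    for r1 in "$dir"/*_R1.fastq; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        sample="$(basename "$r1" _R1.fastq)"
        if [ ! -e "$dir/${sample}_R2.fastq" ]; then
            echo "$sample"
            status=1
        fi
    done
    return "$status"
}
```

Running `check_fastq_pairs fastq` from the working directory prints nothing when every sample is paired. Files that are still gzipped (for example, ending in .fastq.gz) are not matched by this check and need to be decompressed first.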

An example of using HiC-Pro on Whitehead computing resources, with data generated using a kit from Arima Genomics, is outlined below.  Note that this example uses a reference genome that excludes unlocalized and unplaced contigs from the assembly, a choice made to simplify downstream analysis.  Please see the HiC-Pro [[http://nservant.github.io/HiC-Pro/ | documentation]] for additional examples.

=== Analysis outline ===
 * [#SetConfig Set up the HiC-Pro configuration file]
 * [#Submit Submit data processing to the computing cluster]
 * [#ContactMaps Generate contact map visualizations from processed Hi-C data]

=== [=#SetConfig Set up the configuration file] ===

  * The configuration file (here called config.txt) dictates essential settings for the analysis.  In this example, human samples have been sequenced and an e-mail will be sent to "user at wi.mit.edu" on job completion.

{{{
# Please change the variable settings below if necessary

#########################################################################
## Paths and Settings  - Do not edit !
#########################################################################

TMP_DIR = tmp
LOGS_DIR = logs
BOWTIE2_OUTPUT_DIR = bowtie_results
MAPC_OUTPUT = hic_results
RAW_DIR = rawdata

#######################################################################
## SYSTEM - PBS - Start Editing Here !!
#######################################################################
N_CPU = 8
LOGFILE = hicpro.log

JOB_NAME = HiC
JOB_MEM = 20gb
JOB_WALLTIME = 12:00:00
JOB_QUEUE = batch
JOB_MAIL = user@wi.mit.edu

#########################################################################
## Data
#########################################################################

PAIR1_EXT = _R1
PAIR2_EXT = _R2

#######################################################################
## Alignment options
#######################################################################

FORMAT = phred33
MIN_MAPQ = 0

BOWTIE2_IDX_PATH = /nfs/genomes/human_hg38_dec13_no_random/bowtie
BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS =  --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder

#######################################################################
## Annotation files
#######################################################################

REFERENCE_GENOME = hg38
GENOME_SIZE = /nfs/genomes/human_hg38_dec13_no_random/anno/chromInfo.txt

#######################################################################
## Allele specific
#######################################################################

ALLELE_SPECIFIC_SNP = 

#######################################################################
## Digestion Hi-C
#######################################################################

GENOME_FRAGMENT = /nfs/genomes/human_hg38_dec13/arima/hg38_GATC_GANTC.bed
LIGATION_SITE = GAATAATC,GAATACTC,GAATAGTC,GAATATTC,GAATGATC,GACTAATC,GACTACTC,GACTAGTC,GACTATTC,GACTGATC,GAGTAATC,GAGTACTC,GAGTAGTC,GAGTATTC,GAGTGATC,GATCAATC,GATCACTC,GATCAGTC,GATCATTC,GATCGATC,GATTAATC,GATTACTC,GATTAGTC,GATTATTC,GATTGATC
MIN_FRAG_SIZE = 10
MAX_FRAG_SIZE = 100000
MIN_INSERT_SIZE = 100
MAX_INSERT_SIZE = 1000

#######################################################################
## Hi-C processing
#######################################################################

MIN_CIS_DIST =
GET_ALL_INTERACTION_CLASSES = 1
GET_PROCESS_SAM = 1
RM_SINGLETON = 1
RM_MULTI = 1
RM_DUP = 1

#######################################################################
## Contact Maps
#######################################################################

BIN_SIZE = 1000 5000  10000 20000 50000 100000
MATRIX_FORMAT = upper

#######################################################################
## ICE Normalization
#######################################################################
MAX_ITER = 100
FILTER_LOW_COUNT_PERC = 0.02
FILTER_HIGH_COUNT_PERC = 0
EPS = 0.1
}}}
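Several settings in the configuration file point to files on disk, and a mistyped path typically only surfaces after the job has started. The following is a minimal bash sketch for reading values out of config.txt and checking those paths beforehand. The function names `config_get` and `check_config_paths` are hypothetical and not part of HiC-Pro, and the sketch assumes that BOWTIE2_IDX_PATH, GENOME_SIZE and GENOME_FRAGMENT are all given as full paths, as in this example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of HiC-Pro) for inspecting a
# "KEY = value" style configuration file such as config.txt.

# Print the value assigned to KEY (empty if the key has no value).
config_get() {
    local key="$1" file="${2:-config.txt}"
    sed -n "s/^${key}[[:space:]]*=[[:space:]]*//p" "$file" | sed 's/[[:space:]]*$//'
}

# Check that the path-valued settings used in this example exist.
# Problems are reported on stderr; returns 1 if any path is missing.
check_config_paths() {
    local file="${1:-config.txt}"
    local key path status=0
    for key in BOWTIE2_IDX_PATH GENOME_SIZE GENOME_FRAGMENT; do
        path="$(config_get "$key" "$file")"
        if [ -z "$path" ] || [ ! -e "$path" ]; then
            echo "$key: not found: $path" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `config_get BIN_SIZE config.txt` prints the list of resolutions at which contact maps will be built, and `check_config_paths config.txt` exits with a non-zero status if any of the three paths is missing.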



=== [=#Submit Submit data processing to the computing cluster] ===

The essential command to include when submitting HiC-Pro computations to the computing cluster is given below.  It assumes that config.txt and the rawdata and hicproOut subdirectories are all located in the working directory.  As specified in config.txt, HiC-Pro will create the bowtie_results and hic_results subdirectories beneath hicproOut.
{{{
/usr/local/HiC-Pro_2.11.1/bin/HiC-Pro -c config.txt -i rawdata/ -o hicproOut
}}}
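HiC-Pro expects the input folder to contain one subfolder per sample, each holding that sample's paired read files. The following is a minimal bash sketch for building such a rawdata folder from a flat directory of reads using symbolic links. The function name `link_rawdata` is hypothetical, and the sketch assumes file names of the form [SAMPLE]_R1.fastq and [SAMPLE]_R2.fastq (optionally gzipped), matching the PAIR1_EXT and PAIR2_EXT settings in the configuration file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HiC-Pro): symlink each read file from a
# flat source folder into DEST/<sample>/, creating one subfolder per sample.
link_rawdata() {
    local src dest="${2:-rawdata}"
    local fq name sample
    src="$(cd "$1" && pwd)" || return 1     # absolute path, so links stay valid
    for fq in "$src"/*_R[12].fastq*; do
        [ -e "$fq" ] || continue            # the glob matched nothing
        name="$(basename "$fq")"
        sample="${name%_R[12].fastq*}"      # strip the _R1/_R2 suffix and extension
        mkdir -p "$dest/$sample"
        ln -sf "$fq" "$dest/$sample/$name"
    done
}
```

For example, `link_rawdata /path/to/reads rawdata` would place sample1_R1.fastq and sample1_R2.fastq under rawdata/sample1/, ready to be passed to HiC-Pro with `-i rawdata/`.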

=== [=#ContactMaps Generate contact map visualizations from processed Hi-C data] ===

Hi-C experiments enable the visualization of chromatin contact maps.  Output from HiC-Pro can be translated into contact maps using [[http://bioconductor.org/packages/release/bioc/html/HiTC.html | HiTC]], the [[https://www.aidenlab.org/juicebox/ | juicebox viewer]] or [[https://github.com/kcakdemir/HiCPlotter | HiCPlotter]].  To use the juicebox viewer, the HiC-Pro output must first be processed using /usr/local/HiC-Pro_2.11.1/bin/utils/hicpro2juicebox.sh.  The example below illustrates how to use HiCPlotter to visualize the contact map for the Y chromosome at 100 kb resolution, naming the graphical output file with a "cellType" prefix.
{{{
python2 /usr/local/bin/python2.7/HiCPlotter.py -f /path/to/hic_results/matrix/sample1/iced/100000/sample1_100000_iced.matrix -chr chrY -o cellType -n " " -r 100000 -tri 1 -bed /path/to/hic_results/matrix/sample1/raw/100000/sample1_100000_abs.bed -mm 6
}}}
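To plot several chromosomes or resolutions, the same command can be generated in a loop. The following is a minimal bash sketch that only prints the commands so they can be reviewed before being run. The function name `hicplotter_cmd` is hypothetical, and the sketch assumes the HiC-Pro output layout and HiCPlotter options shown in the example above, with paths that contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the HiCPlotter command for one sample,
# chromosome and resolution, mirroring the example command above.
# Arguments: <hic_results dir> <sample> <chromosome> <resolution> <output prefix>
hicplotter_cmd() {
    local results="$1" sample="$2" chr="$3" res="$4" prefix="$5"
    local matrix_dir="$results/matrix/$sample"
    printf 'python2 /usr/local/bin/python2.7/HiCPlotter.py -f %s -chr %s -o %s -n " " -r %s -tri 1 -bed %s -mm 6\n' \
        "$matrix_dir/iced/$res/${sample}_${res}_iced.matrix" \
        "$chr" "$prefix" "$res" \
        "$matrix_dir/raw/$res/${sample}_${res}_abs.bed"
}
```

For example, `for chr in chr1 chr2 chrX; do hicplotter_cmd /path/to/hic_results sample1 "$chr" 100000 cellType; done > plot_commands.sh` writes one command per chromosome to a file, which can be inspected and then executed with `bash plot_commands.sh`. The resolution passed in must be one of the BIN_SIZE values from the configuration file, since those are the only matrices HiC-Pro produces.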