Changes between Initial Version and Version 1 of SOPs/HiChIP


Timestamp: 01/07/21 16:25:08
Author: twhitfie
== Using Hi-C to capture chromatin structure ==


The [[https://science.sciencemag.org/content/326/5950/289 | Hi-C method]] generalizes earlier experimental techniques for characterizing contacts between specific chromosomal loci, such as 3C or 5C, to enable unbiased identification of chromatin interactions across an entire genome. Two software pipelines for analyzing data from Hi-C experiments are [[https://github.com/aidenlab/juicer | juicer]] and [[https://github.com/nservant/HiC-Pro | HiC-Pro]]. An example of using HiC-Pro on Whitehead computing resources, with data collected with a kit from Arima Genomics, is outlined below. Note that this example uses a reference genome that excludes unlocalized and unplaced contigs from the assembly, which simplifies downstream analysis. Please see the HiC-Pro [[http://nservant.github.io/HiC-Pro/ | documentation]] for additional examples.

=== Analysis outline ===
 * [#SetConfig Set up the HiC-Pro configuration file]
 * [#Submit Submit data processing to the LSF batch queue]
 * [#ContactMaps Generate contact map visualizations from processed Hi-C data]

=== [=#SetConfig Set up the configuration file] ===

  * The configuration file (here called config.txt) dictates essential settings for the analysis. In this example, human samples have been sequenced, reads are aligned to hg38, and an e-mail will be sent to user@wi.mit.edu on job completion.

{{{
# Please change the variable settings below if necessary

#########################################################################
## Paths and Settings  - Do not edit !
#########################################################################

TMP_DIR = tmp
LOGS_DIR = logs
BOWTIE2_OUTPUT_DIR = bowtie_results
MAPC_OUTPUT = hic_results
RAW_DIR = rawdata

#######################################################################
## SYSTEM - PBS - Start Editing Here !!
#######################################################################
N_CPU = 8
LOGFILE = hicpro.log

JOB_NAME = HiC
JOB_MEM = 20gb
JOB_WALLTIME = 12:00:00
JOB_QUEUE = batch
JOB_MAIL = user@wi.mit.edu

#########################################################################
## Data
#########################################################################

PAIR1_EXT = _R1
PAIR2_EXT = _R2

#######################################################################
## Alignment options
#######################################################################

FORMAT = phred33
MIN_MAPQ = 0

BOWTIE2_IDX_PATH = /nfs/genomes/human_hg38_dec13_no_random/bowtie
BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS = --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder

#######################################################################
## Annotation files
#######################################################################

REFERENCE_GENOME = hg38
GENOME_SIZE = /nfs/genomes/human_hg38_dec13_no_random/anno/chromInfo.txt

#######################################################################
## Allele specific
#######################################################################

ALLELE_SPECIFIC_SNP =

#######################################################################
## Digestion Hi-C
#######################################################################

GENOME_FRAGMENT = /nfs/genomes/human_hg38_dec13/arima/hg38_GATC_GANTC.bed
LIGATION_SITE = GAATAATC,GAATACTC,GAATAGTC,GAATATTC,GAATGATC,GACTAATC,GACTACTC,GACTAGTC,GACTATTC,GACTGATC,GAGTAATC,GAGTACTC,GAGTAGTC,GAGTATTC,GAGTGATC,GATCAATC,GATCACTC,GATCAGTC,GATCATTC,GATCGATC,GATTAATC,GATTACTC,GATTAGTC,GATTATTC,GATTGATC
MIN_FRAG_SIZE = 10
MAX_FRAG_SIZE = 100000
MIN_INSERT_SIZE = 100
MAX_INSERT_SIZE = 1000

#######################################################################
## Hi-C processing
#######################################################################

MIN_CIS_DIST =
GET_ALL_INTERACTION_CLASSES = 1
GET_PROCESS_SAM = 1
RM_SINGLETON = 1
RM_MULTI = 1
RM_DUP = 1

#######################################################################
## Contact Maps
#######################################################################

BIN_SIZE = 1000 5000 10000 20000 50000 100000
MATRIX_FORMAT = upper

#######################################################################
## ICE Normalization
#######################################################################
MAX_ITER = 100
FILTER_LOW_COUNT_PERC = 0.02
FILTER_HIGH_COUNT_PERC = 0
EPS = 0.1
}}}
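
The long LIGATION_SITE entry above can be reproduced rather than typed by hand. The sketch below is only an illustration and rests on one assumption: that the two-enzyme digestion cuts at ^GATC and G^ANTC, which is inferred from the "GATC_GANTC" part of the GENOME_FRAGMENT file name and not taken from kit documentation. Under that assumption, each ligation junction is a filled-in fragment end (GATC or GANT) followed by the start of the next fragment (GATC or ANTC), with N expanded to all four bases. The function name is hypothetical.

```python
from itertools import product

BASES = "ACGT"  # expansion of the IUPAC code N


def ligation_sites():
    """List every junction sequence for the assumed ^GATC / G^ANTC digestion."""
    # After end fill-in, the upstream fragment ends in GATC (from ^GATC)
    # or in GANT (from G^ANTC).
    left_ends = ["GATC"] + ["GA" + n + "T" for n in BASES]
    # The downstream fragment starts with GATC or with ANTC.
    right_starts = ["GATC"] + ["A" + n + "TC" for n in BASES]
    # Every end can be ligated to every start: 5 x 5 = 25 junctions.
    return sorted(left + right for left, right in product(left_ends, right_starts))


if __name__ == "__main__":
    sites = ligation_sites()
    print(len(sites))
    print("LIGATION_SITE = " + ",".join(sites))
```

Comparing the printed line with the LIGATION_SITE entry in config.txt is a quick way to catch a typo in that setting.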



=== [=#Submit Submit data processing to the LSF batch queue] ===

The essential command to include when submitting HiC-Pro computations to the LSF batch queue is given below. It assumes that config.txt, the rawdata subdirectory, and the hicproOut subdirectory are all located in the working directory. HiC-Pro creates the bowtie_results and hic_results subdirectories named in config.txt beneath hicproOut.
{{{
/usr/local/HiC-Pro_2.11.1/bin/HiC-Pro -c config.txt -i rawdata/ -o hicproOut
}}}
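
One possible way to wrap the command above for LSF is sketched below. It is a minimal illustration, not the site's prescribed submission procedure. The bsub options chosen here (job name, core count matching N_CPU, a single-host resource request, and an output log named hicpro.%J.out) are assumptions that may need adjusting for the local queue. The helper names are hypothetical. The pairing check assumes the layout HiC-Pro expects, with one subfolder per sample under rawdata and file names containing the PAIR1_EXT and PAIR2_EXT strings.

```python
import shlex
from pathlib import Path

HICPRO = "/usr/local/HiC-Pro_2.11.1/bin/HiC-Pro"  # path taken from the command above


def unpaired_fastqs(raw_dir, ext1="_R1", ext2="_R2"):
    """Return read-1 files (one subfolder per sample) with no matching read-2 file."""
    missing = []
    for r1 in sorted(Path(raw_dir).glob("*/*" + ext1 + "*")):
        r2 = r1.with_name(r1.name.replace(ext1, ext2, 1))
        if not r2.exists():
            missing.append(r1)
    return missing


def bsub_command(config="config.txt", raw_dir="rawdata/", out_dir="hicproOut",
                 n_cpu=8, job_name="HiC"):
    """Build (but do not run) a bsub argument list wrapping the HiC-Pro call."""
    hicpro = [HICPRO, "-c", config, "-i", raw_dir, "-o", out_dir]
    # The bsub options below are assumptions; adapt them to the local LSF setup.
    return ["bsub", "-J", job_name, "-n", str(n_cpu), "-R", "span[hosts=1]",
            "-o", "hicpro.%J.out", shlex.join(hicpro)]


if __name__ == "__main__":
    for path in unpaired_fastqs("rawdata"):
        print("no mate found for", path)
    # Print the command for inspection instead of submitting it.
    print(shlex.join(bsub_command()))
```

Printing the command first makes it easy to inspect before pasting it into a shell or passing the list to subprocess.run.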

=== [=#ContactMaps Generate contact map visualizations from processed Hi-C data] ===

Hi-C experiments enable the visualization of chromatin contact maps. Output from HiC-Pro can be translated into contact maps using [[http://bioconductor.org/packages/release/bioc/html/HiTC.html | HiTC]], the [[https://www.aidenlab.org/juicebox/ | juicebox viewer]] or [[https://github.com/kcakdemir/HiCPlotter | HiCPlotter]]. To use the juicebox viewer, the HiC-Pro output must first be processed using /usr/local/HiC-Pro_2.11.1/bin/utils/hicpro2juicebox.sh. The example below uses HiCPlotter to visualize the ICE-normalized contact map for the Y chromosome at 100 kb resolution, naming the graphical output file with a "cellType" prefix.
{{{
python2 /usr/local/bin/python2.7/HiCPlotter.py -f /path/to/hic_results/matrix/sample1/iced/100000/sample1_100000_iced.matrix -chr chrY -o cellType -n " " -r 100000 -tri 1 -bed /path/to/hic_results/matrix/sample1/raw/100000/sample1_100000_abs.bed -mm 6
}}}
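
For a quick look at the same data without a plotting tool, the two files passed to HiCPlotter above can also be read directly. The sketch below assumes the sparse layout that HiC-Pro is understood to write: the *_abs.bed file holds four whitespace-separated columns (chromosome, start, end, bin index), and the .matrix file holds three (bin i, bin j, value), with only one triangle stored when MATRIX_FORMAT = upper. The function names are hypothetical, and the result is a plain list of lists, not a HiCPlotter or HiTC object.

```python
def read_bins(bed_path, chrom):
    """Map bin index -> (start, end) for one chromosome from an *_abs.bed file."""
    bins = {}
    with open(bed_path) as fh:
        for line in fh:
            if not line.strip():
                continue
            c, start, end, idx = line.split()[:4]
            if c == chrom:
                bins[int(idx)] = (int(start), int(end))
    return bins


def dense_matrix(matrix_path, bins):
    """Build a symmetric dense matrix restricted to the given bins."""
    order = sorted(bins)
    pos = {b: k for k, b in enumerate(order)}
    dense = [[0.0] * len(order) for _ in order]
    with open(matrix_path) as fh:
        for line in fh:
            if not line.strip():
                continue
            i, j, value = line.split()[:3]
            i, j = int(i), int(j)
            # Skip contacts that involve bins outside the chosen chromosome.
            if i in pos and j in pos:
                dense[pos[i]][pos[j]] = float(value)
                dense[pos[j]][pos[i]] = float(value)  # mirror the stored triangle
    return dense


# Usage, with the placeholder paths from the command above:
#   bins = read_bins("/path/to/hic_results/matrix/sample1/raw/100000/sample1_100000_abs.bed", "chrY")
#   m = dense_matrix("/path/to/hic_results/matrix/sample1/iced/100000/sample1_100000_iced.matrix", bins)
```

A dense list of lists is only practical for a single chromosome at coarse resolution. At the smaller BIN_SIZE values in config.txt, a sparse representation would be the safer choice.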