Changes between Initial Version and Version 1 of SOPs/Hi-C


Timestamp: 11/17/20 13:55:07
Author: twhitfie
== Using Hi-C to capture chromatin structure ==

The [[https://science.sciencemag.org/content/326/5950/289 | Hi-C method]] generalizes earlier experimental techniques for characterizing contacts between specific chromosomal loci, such as 3C and 5C, to enable unbiased identification of chromatin interactions across an entire genome.  Two software pipelines for analyzing data from Hi-C experiments are [[https://github.com/aidenlab/juicer | juicer]] and [[https://github.com/nservant/HiC-Pro | HiC-Pro]].  An example of using HiC-Pro on Whitehead computing resources, with data collected using a kit from Arima Genomics, is outlined below.  Note that this example uses a reference genome that excludes unlocalized and unplaced contigs from the assembly; this choice was made to ease downstream analysis.  Please see the HiC-Pro [[http://nservant.github.io/HiC-Pro/ | documentation]] for additional examples.

=== Basic Approach ===
 * [#SetConfig Set up the HiC-Pro configuration file]
 * [#Submit Submit data processing to the LSF batch queue]

=== [=#SetConfig Set up the configuration file] ===

  * The configuration file (here called config.txt) specifies the essential settings for the analysis.  In this example, the sequenced samples are human, so the hg38 reference genome is used.

{{{
# Please change the variable settings below if necessary

#########################################################################
## Paths and Settings  - Do not edit !
#########################################################################

TMP_DIR = tmp
LOGS_DIR = logs
BOWTIE2_OUTPUT_DIR = bowtie_results
MAPC_OUTPUT = hic_results
RAW_DIR = rawdata

#######################################################################
## SYSTEM - PBS - Start Editing Here !!
#######################################################################
N_CPU = 8
LOGFILE = hicpro.log

JOB_NAME = HiC
JOB_MEM = 20gb
JOB_WALLTIME = 12:00:00
JOB_QUEUE = batch
JOB_MAIL = user@wi.mit.edu

#########################################################################
## Data
#########################################################################

PAIR1_EXT = _R1
PAIR2_EXT = _R2

#######################################################################
## Alignment options
#######################################################################

FORMAT = phred33
MIN_MAPQ = 0

BOWTIE2_IDX_PATH = /nfs/genomes/human_hg38_dec13_no_random/bowtie
BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS =  --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder

#######################################################################
## Annotation files
#######################################################################

REFERENCE_GENOME = hg38
GENOME_SIZE = /nfs/genomes/human_hg38_dec13_no_random/anno/chromInfo.txt

#######################################################################
## Allele specific
#######################################################################

ALLELE_SPECIFIC_SNP =

#######################################################################
## Digestion Hi-C
#######################################################################

GENOME_FRAGMENT = /nfs/genomes/arimaCutsites/hg38_GATC_GANTC.bed
LIGATION_SITE = GAATAATC,GAATACTC,GAATAGTC,GAATATTC,GAATGATC,GACTAATC,GACTACTC,GACTAGTC,GACTATTC,GACTGATC,GAGTAATC,GAGTACTC,GAGTAGTC,GAGTATTC,GAGTGATC,GATCAATC,GATCACTC,GATCAGTC,GATCATTC,GATCGATC,GATTAATC,GATTACTC,GATTAGTC,GATTATTC,GATTGATC
MIN_FRAG_SIZE = 10
MAX_FRAG_SIZE = 100000
MIN_INSERT_SIZE = 100
MAX_INSERT_SIZE = 1000

#######################################################################
## Hi-C processing
#######################################################################

MIN_CIS_DIST =
GET_ALL_INTERACTION_CLASSES = 1
GET_PROCESS_SAM = 1
RM_SINGLETON = 1
RM_MULTI = 1
RM_DUP = 1

#######################################################################
## Contact Maps
#######################################################################

BIN_SIZE = 1000 5000  10000 20000 50000 100000
MATRIX_FORMAT = upper

#######################################################################
## ICE Normalization
#######################################################################
MAX_ITER = 100
FILTER_LOW_COUNT_PERC = 0.02
FILTER_HIGH_COUNT_PERC = 0
EPS = 0.1
}}}
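
The long LIGATION_SITE value in the configuration above need not be typed by hand. The following is a minimal Python sketch of how it can be derived, assuming the two restriction sites are ^GATC and G^ANTC (as the name of the GENOME_FRAGMENT file suggests, with ^ marking the cut position) and that ends are filled in before ligation. The variable names are illustrative and are not part of HiC-Pro.

```python
from itertools import product

# Assumption: the kit cuts at ^GATC and G^ANTC, where N is any nucleotide.
NUCLEOTIDES = "ACGT"

# Upstream half of a junction after end fill-in:
# GATC (from ^GATC) or GANT (from G^ANTC).
left_halves = ["GATC"] + [f"GA{n}T" for n in NUCLEOTIDES]

# Downstream half of a junction after end fill-in:
# GATC (from ^GATC) or ANTC (from G^ANTC).
right_halves = ["GATC"] + [f"A{n}TC" for n in NUCLEOTIDES]

# Every upstream half can be ligated to every downstream half (5 x 5 = 25).
ligation_sites = sorted(left + right for left, right in product(left_halves, right_halves))
ligation_site_value = ",".join(ligation_sites)

print(f"LIGATION_SITE = {ligation_site_value}")
```

Under these assumptions the sketch yields 25 junction sequences, and the same approach could be adapted to a different enzyme combination by changing the two lists of halves.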

=== [=#Submit Submit data processing to the LSF batch queue] ===

The essential command to include when submitting HiC-Pro computations to the LSF batch queue is given below.  The command assumes that config.txt, the rawdata subdirectory (input), and the hicproOut subdirectory (output) all reside in the working directory.  As specified in config.txt, HiC-Pro creates the bowtie_results and hic_results subdirectories beneath hicproOut.
{{{
/usr/local/HiC-Pro_2.11.1/bin/HiC-Pro -c config.txt -i rawdata/ -o hicproOut
}}}
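
One way to wrap the command above for submission is sketched below in Python. It reads N_CPU and JOB_NAME from the configuration text so that the number of cores requested from LSF matches what HiC-Pro will use. The helper names (parse_hicpro_config, build_bsub_command) and the log file name hicpro.%J.out are hypothetical; only the standard bsub options -J (job name), -n (number of cores), and -o (output file) are used, and any queue or memory options would need to follow local LSF policy.

```python
import shlex


def parse_hicpro_config(text):
    """Parse 'KEY = VALUE' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def build_bsub_command(settings, hicpro_command):
    """Build a bsub argument list (hypothetical helper; nothing is submitted here)."""
    n_cpu = settings.get("N_CPU", "1")
    job_name = settings.get("JOB_NAME", "HiC")
    # %J is expanded by LSF to the job ID; the log file name is an assumption.
    return ["bsub", "-J", job_name, "-n", n_cpu, "-o", "hicpro.%J.out", hicpro_command]


# A short excerpt of config.txt; in practice the text would be read from the file.
example_config = """
## SYSTEM
N_CPU = 8
JOB_NAME = HiC
MIN_CIS_DIST =
"""

hicpro = "/usr/local/HiC-Pro_2.11.1/bin/HiC-Pro -c config.txt -i rawdata/ -o hicproOut"
command = build_bsub_command(parse_hicpro_config(example_config), hicpro)

# Print the command for inspection before running it by hand.
print(shlex.join(command))
```

The sketch only prints the command; passing the same list to subprocess.run would submit the job.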