Changes between Version 1 and Version 2 of SOPs/HiChIP


Timestamp:
01/07/21 18:07:26 (3 years ago)
Author:
twhitfie
Comment:

--

  • SOPs/HiChIP

    v1 v2  
    1 == Using Hi-C to capture chromatin structure ==
     1== Using HiChIP experiments to characterize genome-wide chromatin contacts between regulatory elements ==
    22
    33
    4 The  [[https://science.sciencemag.org/content/326/5950/289 | Hi-C method]] generalizes earlier experimental techniques, such as 3C or 5C,  for characterizing contacts between specific chromosomal loci, to enable unbiased identification of chromatin interactions across an entire genome.  Two software pipelines for analyzing data from Hi-C experiments are [[https://github.com/aidenlab/juicer | juicer]] and  [[https://github.com/nservant/HiC-Pro | HiC-Pro]]. An example for using HiC-Pro on Whitehead computing resources with data collected with a kit from Arima Genomics is outlined below. Note that in this example, a reference genome is used that excludes unlocalized or unplaced contigs from the assembly.  This choice is taken to ease downstream analysis.  Please see the HiC-Pro [[http://nservant.github.io/HiC-Pro/ | documentation]] for additional examples.
     4The  [[https://www.nature.com/articles/nmeth.3999 | HiChIP method]] combines the [[https://science.sciencemag.org/content/326/5950/289 | Hi-C technique]] for high-throughput chromosome conformation capture with chromatin immunoprecipitation-sequencing (ChIP-seq) to characterize genome-wide chromatin contacts between regulatory elements, such as those marked by specific histone modifications or bound by other proteins (e.g. cohesin). [[https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006982 | MAPS]] is an analysis [[https://github.com/ijuric/MAPS | pipeline]] that can be used to extract such significant interactions from HiChIP (or the closely related PLAC-seq) data and visualize them in genome browsers.
    55
    66=== Analysis outline ===
    7  * [#SetConfig Set up the HiC-Pro configuration file]
     7 * [#SetConfig Set up the MAPS configuration file]
    88 * [#Submit Submit data processing to the LSF batch queue]
    9  * [#ContactMaps Generate contact map visualizations from processed Hi-C data]
    109
    1110=== [=#SetConfig Set up the configuration file] ===
    1211
    13   * The configuration file (here called config.txt) dictates essential settings for the analysis.  In this example, human samples have been sequenced and an e-mail will be sent to "user at wi.mit.edu" on job completion.
     12  * The MAPS pipeline is run from a shell script that specifies important configuration settings, including the file paths to interpreters and to software for manipulating sequencing data.  For the specific case of data collected using the HiChIP kit from Arima Genomics, the [[https://github.com/ijuric/MAPS/tree/master/Arima_Genomics | pipeline]] comes with the Arima-MAPS_v2.0.sh shell script, which should be edited before running on the Whitehead cluster to include the following:
    1413
    1514{{{
    16 # Please change the variable settings below if necessary
    17 
    18 #########################################################################
    19 ## Paths and Settings  - Do not edit !
    20 #########################################################################
    21 
    22 TMP_DIR = tmp
    23 LOGS_DIR = logs
    24 BOWTIE2_OUTPUT_DIR = bowtie_results
    25 MAPC_OUTPUT = hic_results
    26 RAW_DIR = rawdata
    27 
    28 #######################################################################
    29 ## SYSTEM - PBS - Start Editing Here !!
    30 #######################################################################
    31 N_CPU = 8
    32 LOGFILE = hicpro.log
    33 
    34 JOB_NAME = HiC
    35 JOB_MEM = 20gb
    36 JOB_WALLTIME = 12:00:00
    37 JOB_QUEUE = batch
    38 JOB_MAIL = user@wi.mit.edu
    39 
    40 #########################################################################
    41 ## Data
    42 #########################################################################
    43 
    44 PAIR1_EXT = _R1
    45 PAIR2_EXT = _R2
    46 
    47 #######################################################################
    48 ## Alignment options
    49 #######################################################################
    50 
    51 FORMAT = phred33
    52 MIN_MAPQ = 0
    53 
    54 BOWTIE2_IDX_PATH = /nfs/genomes/human_hg38_dec13_no_random/bowtie
    55 BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
    56 BOWTIE2_LOCAL_OPTIONS =  --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder
    57 
    58 #######################################################################
    59 ## Annotation files
    60 #######################################################################
    61 
    62 REFERENCE_GENOME = hg38
    63 GENOME_SIZE = /nfs/genomes/human_hg38_dec13_no_random/anno/chromInfo.txt
    64 
    65 #######################################################################
    66 ## Allele specific
    67 #######################################################################
    68 
    69 ALLELE_SPECIFIC_SNP =
    70 
    71 #######################################################################
    72 ## Digestion Hi-C
    73 #######################################################################
    74 
    75 GENOME_FRAGMENT = /nfs/genomes/human_hg38_dec13/arima/hg38_GATC_GANTC.bed
    76 LIGATION_SITE = GAATAATC,GAATACTC,GAATAGTC,GAATATTC,GAATGATC,GACTAATC,GACTACTC,GACTAGTC,GACTATTC,GACTGATC,GAGTAATC,GAGTACTC,GAGTAGTC,GAGTATTC,GAGTGATC,GATCAATC,GATCACTC,GATCAGTC,GATCATTC,GATCGATC,GATTAATC,GATTACTC,GATTAGTC,GATTATTC,GATTGATC
    77 MIN_FRAG_SIZE = 10
    78 MAX_FRAG_SIZE = 100000
    79 MIN_INSERT_SIZE = 100
    80 MAX_INSERT_SIZE = 1000
    81 
    82 #######################################################################
    83 ## Hi-C processing
    84 #######################################################################
    85 
    86 MIN_CIS_DIST =
    87 GET_ALL_INTERACTION_CLASSES = 1
    88 GET_PROCESS_SAM = 1
    89 RM_SINGLETON = 1
    90 RM_MULTI = 1
    91 RM_DUP = 1
    92 
    93 #######################################################################
    94 ## Contact Maps
    95 #######################################################################
    96 
    97 BIN_SIZE = 1000 5000  10000 20000 50000 100000
    98 MATRIX_FORMAT = upper
    99 
    100 #######################################################################
    101 ## ICE Normalization
    102 #######################################################################
    103 MAX_ITER = 100
    104 FILTER_LOW_COUNT_PERC = 0.02
    105 FILTER_HIGH_COUNT_PERC = 0
    106 EPS = 0.1
     15python_path=/usr/bin/python
     16Rscript_path=/usr/bin/Rscript
     17MACS2_path=/usr/local/bin/python3.6/macs2
    10718}}}
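Before submitting a job, it can help to confirm that the three paths set in Arima-MAPS_v2.0.sh actually point to executable files on the node where the job will run. The sketch below is only an illustration: the variable names and values are copied from the settings above, while the `check_tools` helper is a hypothetical function written for this example and is not part of MAPS.

```shell
#!/bin/sh
# Values copied from the edited Arima-MAPS_v2.0.sh settings shown above.
python_path=/usr/bin/python
Rscript_path=/usr/bin/Rscript
MACS2_path=/usr/local/bin/python3.6/macs2

# check_tools (hypothetical helper, not part of MAPS): print every argument
# that is not an executable regular file; return non-zero if any is missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if [ ! -f "$tool" ] || [ ! -x "$tool" ]; then
            echo "not executable: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Report any problems without aborting the calling shell.
check_tools "$python_path" "$Rscript_path" "$MACS2_path" \
    || echo "Fix the paths in Arima-MAPS_v2.0.sh before submitting."
```

If any path is reported, the corresponding line in the shell script should be corrected for the local installation before the pipeline is run.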
    108 
    109 
    11019
    11120=== [=#Submit Submit data processing to the LSF batch queue] ===
    11221
    113 The essential command to include when submitting HiC-Pro computations to the LSF batch queue is given below, where config.txt is located in the working directory, as are the rawdata and hicproOut subdirectories.  As also specified in the (config.txt) configuration file, HiC-Pro will create bowtie_results and hic_results subdirectories beneath hicproOut.
     22The command below is an example of how to run MAPS and should be used when submitting these computations to the LSF batch queue.  In this example, ChIP peaks are provided to the pipeline rather than being called by it (using MACS2), and the reference genome is human hg19 (-o specifies the "organism" here).
    11423{{{
    115 /usr/local/HiC-Pro_2.11.1/bin/HiC-Pro -c config.txt -i rawdata/ -o hicproOut
     24Arima-MAPS_v2.0.sh -C 0 -I /path/to/fastqFiles/fastqFileNamePrefix -O /path/to/output -m /path/to/peaks/peaks.bed -o hg19 -b /nfs/genomes/human_gp_feb_09_no_random/bwa_alt_name/hg19.fa -t 8 -f 1
    11625}}}
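One possible way to wrap the MAPS command for LSF is sketched below as a dry run that only prints the submission line. It assumes the standard `bsub` options `-n` (job slots), `-J` (job name) and `-o`/`-e` (log files, with `%J` expanding to the job ID). The job name and log file names are hypothetical, the /path/to/... values are the placeholders from the example above, and any queue or memory settings required on a particular cluster are not shown.

```shell
#!/bin/sh
# Thread count passed to MAPS (-t) and requested from LSF (-n); keep them equal.
threads=8

# MAPS command from the example above; /path/to/... values are placeholders.
maps_cmd="Arima-MAPS_v2.0.sh -C 0 -I /path/to/fastqFiles/fastqFileNamePrefix -O /path/to/output -m /path/to/peaks/peaks.bed -o hg19 -b /nfs/genomes/human_gp_feb_09_no_random/bwa_alt_name/hg19.fa -t $threads -f 1"

# Assumed bsub options: -n slots, -J job name, -o/-e log files (%J = job ID).
# The job name "maps_hichip" and the log file names are hypothetical.
submit_cmd="bsub -n $threads -J maps_hichip -o maps.%J.out -e maps.%J.err \"$maps_cmd\""

# Dry run: print the submission line for review instead of executing it.
echo "$submit_cmd"
```

Defining the thread count once keeps the number of slots requested from LSF consistent with the `-t` value passed to MAPS. After reviewing the printed line, it can be pasted into a shell on a submission host.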
    117 
    118 === [=#ContactMaps Generate contact map visualizations from processed Hi-C data] ===
    119 
    120 Hi-C experiments enable the visualization of chromatin contact maps.  Output from HiC-Pro can be translated into contact maps using [[http://bioconductor.org/packages/release/bioc/html/HiTC.html | HiTC]], the [[https://www.aidenlab.org/juicebox/ | juicebox viewer]] or [[https://github.com/kcakdemir/HiCPlotter | HiCPlotter ]].  To use the juicebox viewer, the HiC-Pro output must first be processed using /usr/local/HiC-Pro_2.11.1/bin/utils/hicpro2juicebox.sh.  The example below illustrates how to use HiCPlotter to visualize the contact map for the Y chromosome using a 100 kb resolution and naming the graphical output file with a "cellType" prefix.
    121 {{{
    122 python2 /usr/local/bin/python2.7/HiCPlotter.py -f /path/to/hic_results/matrix/sample1/iced/100000/sample1_100000_iced.matrix -chr chrY -o cellType -n " " -r 100000 -tri 1 -bed /path/to/hic_results/matrix/sample1/raw/100000/sample1_100000_abs.bed -mm 6
    123 }}}