Changes between Initial Version and Version 1 of SOPs/integrateExpressionIP


Ignore:
Timestamp:
01/23/13 16:49:43 (12 years ago)
Author:
trac
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SOPs/integrateExpressionIP

    v1 v1  
     1== Introduction ==
     2
     3Given a ChIP-seq experiment for a transcription factor (TF) of interest and a microarray experiment comparing WT samples to KO or RNAi for the same TF, the goal of this analysis is to find out if the genes potentially regulated by the TF are changing in expression and if there is an enrichment of genes differentially expressed (DE) within the genes bound by the TF and vice versa.
     4
     5We will be using the term “genes bound” or “genes potentially regulated by the TF” to refer to genes that are within a certain distance from a ChIP-Seq peak. This is defined by the user. Some examples of criteria to define “genes bound” by the TF would be: 1) The region from 2kb upstream to 500 pb downstream of the TSS overlaps with a peak. 2) The gene body or the 10Kb upstream or 10 Kb downstream of the gene overlap with a peak.
     6
     7
     8== Step 1: Define peaks ("bound regions") from a ChIP-Seq experiment ==
     9
     10[wiki:SOPs/chip_seq_peaks Follow the standard operating procedure for identifying ChIP-Seq peaks]
     11
     12== Step 2: Find genes differentially expressed from an expression microarray experiment ==
     13
     14Follow standard operating procedures for [wiki:SOPs/normalize_ma normalizing microarray experiments] and [wiki:SOPs/diff_microarry identifying differentially expressed genes] to get a list of genes differentially expressed or a list of all the genes and their fold change on the microarray experiment.
     15
     16== Step 3: Find out if the genes potentially regulated by the TF are changing in expression ==
     17
     18'''Method 1'''. Use R to draw cumulative distribution function (cdf) plots that show the distribution of expression changes for all genes and for genes bound. If binding results in activation or repression of the genes bound we should see a shift in the distributions. We can test if the difference is significant with a Kolmogorov–Smirnov (KS) test.
     19
     20Naming: FC stands for fold change in expression between control cells and cells where the TF of interest has been knock down. GenesBound is the list of genes potentially regulated by the TF according to the ChIP-seq experiment (see introduction above). AllTranscripts.FC.txt and GenesBound.FC.txt files have the FC for all genes or for the subset of genes bound. We plot both distributions and look for differences.
     21
     22The input files used for the R code below are like:
     23
     24AllTranscripts.FC.txt
     25
     26{{{
     27 NM_000022        0.13    -0.02   -0.53     0.29   -0.06   -0.22   -0.18   -0.07
     28 NM_000073        0.03    -0.39   -0.42     0.29    0.01   -0.23   -0.02   -0.14
     29 NM_000188       -0.18    -0.43   -0.41    -0.72   -0.04   -0.29   -0.38   -0.47
     30 NM_000270        0.31     0.54    0.50     0.67    0.32    0.18    0.31    0.54
     31 NM_000314       -0.12     0.13    0.04    -0.23    0.03    0.14    0.11   -0.06
     32 NM_000401        0.00    -0.40    0.13    -0.06   -0.32    0.10    0.15    0.18
     33 NM_000404       -0.08     0.17    0.30    -0.20   -0.03    0.22   -0.08    0.11
     34    .....
     35}}}
     36
     37GenesBound.FC.txt
     38{{{
     39 NM_000073       0.03    -0.39  -0.42   0.29    0.01  -0.23   -0.02   -0.14
     40 NM_000401       0.00    -0.40   0.13  -0.06   -0.32   0.10    0.15    0.18
     41 ...
     42}}}
     43
     44These input files have the logFC of 8 know-down experiments. The expression changes we are interested in plotting are on the 6th expression column.
     45{{{
     46 control <- read.delim("AllTranscripts.FC.txt")
     47 GenesBound <- read.delim("GenesBound.FC.txt")
     48 png(file="cdfPlot.png")
     49 plot(ecdf(control[,6]), xlab="logFC_hsTF", ylab="Cumulative Fraction")
     50 lines(ecdf(GenesBound[,6]), verticals=T,do.points=FALSE, col="purple", lty=1, xlim=c(-6,6))
     51 dev.off()
     52}}}
     53{{{
     54 # Perform a KS test
     55 ks.test(control[,6], GenesBound[,6])
     56}}}
     57
     58'''Method 2'''. Pick a cut-off for genes that are differentially expressed and test if the bound genes are enriched for differentially expressed genes using Fisher's exact test.
     59
     60For example: Of the 20,000 transcripts assayed by the microarray, 700 were differentially expressed and 400 of the DE expressed genes were also found to be bound by the TF. The total number of genes bound by the TF was 3000.
     61
     62{{{
     63              DE  non-DE  Totals
     64   Bound      400   2600  3000
     65   Not-Bound  300  16700  17000
     66   Totals     700  19300  20000
     67}}}
     68
     69You can perform a Chi-squared test or a Fisher's exact test in R:
     70{{{
     71   data <- matrix(c(400, 2600, 300, 16700), ncol=2, byrow=T)
     72   chisq.test(data)
     73   fisher.test(data)
     74}}}
     75'''Method 3'''. Use the GSEA tool from the Broad Institute. Use the raw array data and genes bound as the gene data set to look for enrichment [http://www.broadinstitute.org/gsea/index.jsp GSEA]