Processing After Normalization
- Two packages in Bioconductor can be used to determined differentially expressed genes. Significant Analysis of Microarrays (SAM) is implement by the package siggenes and the other is limma (recommended). Also see, Limma User's Guide (Oct 28, 2010) (or a more recent version) for examples of lots of experimental designs.
- Processing Affy data:
Create an expression set (ie. an eset object) from normalized expression values or ratios, either can be used but it needs to be log-transformed (see limma: Linear Models for Microarray Data, Section 23.2). If using the rma function, output values are log2 intensities.
library(limma) eset=rma(Data) eset.rma.expr=exprs(eset)
Create a design file which shows your targets, example
#If using read.delim, then column/header names are shifted to the left (see below) # Make sure that sample names in the first field is sorted Control Experiment Sample1 0 1 Sample2 0 1 Sample3 1 0 Sample4 1 0 #read in the design file design <- read.delim("myDesignFile.txt")
For other examples of design: Affy and Agilent
Fit linear model
rma.fit = lmFit(eset.rma.expr,design)
Set Contrast depending on question(s) being asked, eg. experiment vs control
contrast.matrix=makeContrasts(Experiment-Control, levels=design) #check if contrast is correct contrast
Get linear model fit for the contrasts. Use eBayes to estimate the variance using emperical Bayes approach.
rma.fit2=contrasts.fit(rma.fit, contrast.matrix) rma.fit2.ebayes = eBayes(rma.fit2)
Create MA and Volcano Plot to look for outliers (ie. differentially expressed genes)
plot(rma.fit2.ebayes$Amean, rma.fit2.ebayes$coefficients[,1]) plot(rma.fit2.ebayes$coeficients[,1], -log10(rma.fit2.ebayes$p.value[,1])
Write results to file after adjusting using FDR
write.fit(rma.fit2.ebayes, file="limma.out", adjust="fdr")
To find differentially expressed genes, sort the data by both log2 ratio and adjusted p-value. Choose a log2 ratio and p-value cutoff (eg. abs(log2 ratio) > 1 and adjusted p-value of < 0.05), a combination of both p-values and FC should be used to select DE genes [BioC] topTable threshold on p-value and logFC, Reply from Gordon Smyth. Another way to examine DE genes is to look at the volcano-plot, the tips of "V" in the volcano would generally have these DE genes.
Another Example: Affy_hairpin_analysis_GB.R
Other Issues
- For limma, the design file that lists the files should be in the same order as what is read-in.
Correcting for dye bias
Example from the User's Guide (8.1.2):
- design = cbind(DyeEffect=1,MUvsWT=c(1,-1,1,-1))
- fit = lmFit(MA, design); fit = eBayes(fit)
- To get genes with a dye effect: topTable(fit, coef="DyeEffect")
- To get MUvsWT diff: topTable(fit, coef="MUvsWT")
- "The fold changes and significant tests in this list are corrected for dye-effects. Including the dye-effect in the model in this way uses up one degree of freedom which might otherwise be used to estimate the residual variability, but it is valuable if many genes show non-negligible dye-effects."
For more information, see the third session of | An introduction to R and Bioconductor: A BaRC Short Course