= Identifying enriched biological themes in gene sets =

=== DAVID ===
[[http://david.abcc.ncifcrf.gov/home.jsp | DAVID]]
 * DAVID is generally the best place to start your enrichment analysis.
 * Instructions for using DAVID can be found under //Functional Annotation// on the DAVID web site.
 * You'll probably end up running DAVID multiple times, with different types of annotations, to find the most informative combination.
 * The full output can be downloaded and viewed as a spreadsheet.

=== Gene Set Enrichment Analysis (GSEA) ===
 * [[http://www.broadinstitute.org/gsea/index.jsp|Broad GSEA]]
 * [[https://www.gsea-msigdb.org/gsea/login.jsp|Download the GSEA software]] and additional resources to analyze, annotate, and interpret enrichment results.
 * [[https://www.gsea-msigdb.org/gsea/msigdb/index.jsp|Explore the Molecular Signatures Database (MSigDB)]], a collection of annotated gene sets for use with the GSEA software.
 * [[http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Main_Page | GSEA and MSigDB documentation]]
 * [[http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Using_RNA-seq_Datasets_with_GSEA | Guidelines for using RNA-seq datasets with GSEA]]

==== Ranked List ====
 1. Create a two-column, tab-delimited file with gene names in the first column and numeric values in the second column (e.g. log2 fold change, log2 ratio). The file does not need to be sorted, and it should have the extension ".rnk".
   * The second column, used to rank the genes, could be the log2 fold change, a t-statistic, or another score that takes into account both the log ratio and the p-value.
 1. To run using the GUI:
   1. Click on "Steps in GSEA analysis -> Load data" and load your ranked file (e.g. "file.rnk").
   1. Click on "Tools -> GseaPreranked".
   1. Select a gene set collection from the "Gene sets database". We recommend starting with the Hallmark set (h.all). You can find more information about the sets [[https://www.gsea-msigdb.org/gsea/msigdb/index.jsp|here]].
   1. Select your uploaded ranked list and click the Run button.
 1. To run the same type of analysis on the command line, use a command like the one below. This example uses the GSEA 2.2.2 jar; replace ./MY_COMPARISON.rnk with the path to your own ranked file.
{{{
java -Xmx512m -cp /usr/lib/share/gsea2/gsea2-2.2.2.jar xtools.gsea.GseaPreranked \
  -gmx gseaftp.broadinstitute.org://pub/gsea/gene_sets/h.all.v5.2.symbols.gmt \
  -collapse false -mode Max_probe -norm meandiv -nperm 1000 \
  -rnk ./MY_COMPARISON.rnk -scoring_scheme weighted -rpt_label GSEA_out_v1 \
  -chip gseaftp.broadinstitute.org://pub/gsea/annotations/GENE_SYMBOL.chip \
  -include_only_symbols true -make_sets true -plot_top_x 20 -rnd_seed timestamp \
  -set_max 500 -set_min 2 -zip_report false -out GSEA_OUT.TEST_v1 -gui false
}}}

==== Fast gene set enrichment analysis (fgsea) ====
[[https://bioconductor.org/packages/release/bioc/vignettes/fgsea/inst/doc/fgsea-tutorial.html/ | fgsea]]: an R/Bioconductor package for fast preranked gene set enrichment analysis.

==== Unranked List ====
In this mode, GSEA ranks the genes itself.
 1. Create the necessary files, in the correct formats, for the expression data, phenotypes, and chip annotation ([[http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Using_RNA-seq_Datasets_with_GSEA | see GSEA wiki]]).
 1. Use MSigDB for gene sets, or create custom gene sets in the correct format.
 1. Run GSEA, using the default options to start.

==== Single-sample GSEA (ssGSEA) ====
An extension of GSEA that can be used to determine the enrichment of gene sets in individual samples.
 * [[http://rowley.mit.edu/caw_web/ssGSEAProjection/ssGSEA_caw_BIG_120314.pptx | ssGSEA Presentation from MIT BMC]]
 * [[http://software.broadinstitute.org/webservices/gpModuleRepository/download/prod/module/?file=/ssGSEAProjection/broad.mit.edu:cancer.software.genepattern.module.analysis/00270/7.6/ssGSEAProjection.zip | Broad's ssGSEA from GenePattern R/jar scripts]]
 * NOTE: GSEA should be run on the entire dataset, not on a subset of genes, as a subset may bias the results.
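Following the note above, a ranked list for GseaPreranked should contain every gene with a valid statistic rather than a pre-filtered subset. The sketch below is one way to build such a .rnk file in Python. It assumes a hypothetical tab-separated differential-expression table with columns named gene, log2FoldChange and pvalue (adjust these to your own file), and it scores each gene as sign(log2 fold change) * -log10(p-value), one of the schemes that combines log ratio and p-value. It is an illustration only and is not part of the GSEA software.

{{{#!python
import csv
import math


def rank_scores(rows, gene_col="gene", lfc_col="log2FoldChange", p_col="pvalue"):
    """Yield (gene, score) pairs with score = sign(log2FC) * -log10(p).

    Column names are hypothetical defaults; change them to match your table.
    Only genes with missing or non-numeric statistics are skipped, so the
    ranked list covers the whole dataset rather than a filtered subset.
    """
    for row in rows:
        try:
            lfc = float(row[lfc_col])
            p = float(row[p_col])
        except (TypeError, ValueError):
            continue  # skip genes whose statistics are missing, e.g. "NA"
        if math.isnan(lfc) or math.isnan(p):
            continue
        p = max(p, 1e-300)  # guard against log10(0) for extremely small p-values
        yield row[gene_col], math.copysign(-math.log10(p), lfc)


def write_rnk(de_table_path, rnk_path, **columns):
    """Read a tab-separated DE table and write a two-column .rnk file.

    Returns the number of genes written. The output is unsorted, which
    GSEA accepts for .rnk files.
    """
    n_written = 0
    with open(de_table_path, newline="") as src, open(rnk_path, "w") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        for gene, score in rank_scores(reader, **columns):
            dst.write(f"{gene}\t{score:.6g}\n")
            n_written += 1
    return n_written
}}}

For example, write_rnk("MY_COMPARISON_results.tsv", "MY_COMPARISON.rnk") writes one line per gene with valid statistics; both file names are placeholders. Using a signed score keeps up- and down-regulated genes at opposite ends of the ranking, which is what GseaPreranked expects.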
For a discussion of why GSEA should be run on all genes rather than a pre-selected subset, see [[https://groups.google.com/forum/#!searchin/gsea-help/subset%7Csort:date/gsea-help/INJ1RpLOBWk/1qwnsMOYAQAJ | GSEA pre-ranked questions (Google Groups)]].

=== BiNGO ===
[[http://chianti.ucsd.edu/cyto_web/plugins/displayplugininfo.php?name=BiNGO|BiNGO Plugin]]

You need to have [[http://cytoscape.org | Cytoscape]] installed to use BiNGO.
 1. Start BiNGO via Cytoscape: Plugins -> Start BiNGO.
 1. Get genes from a cluster/network, or paste a gene list.
 1. Select the correct options (e.g. species).
 1. Run BiNGO.

=== GeneGO ===
[[http://portal.genego.com/ | GeneGO Login (Password Required)]]
 1. Upload your gene list and activate it.
 1. One-click analysis -> Select GeneGo Pathway Maps.

== Other/Useful Links ==
 * [[http://go.princeton.edu/cgi-bin/GOTermFinder | GO Term Finder]]: finds significant GO terms shared among a list of genes from your organism.
 * [[http://go.princeton.edu/cgi-bin/GOTermMapper | GO Term Mapper]]: maps the granular GO annotations for genes in a list to a set of GO slim terms, allowing you to bin your genes into broad categories.
 * [[http://www.ingenuity.com/products/ipa | Ingenuity IPA]]: subscription required.
 * [[http://www.advaitabio.com/ipathwayguide.html | Advaita iPathwayGuide]]: login required; a subscription is required for downloading.

== More Information ==
Hot Topics: [[http://jura.wi.mit.edu/bio/education/hot_topics/enrichment/Gene_list_enrichment_Mar10.pdf | Gene List Enrichment]]
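Over-representation tools such as GO Term Finder (and BiNGO, when its hypergeometric option is selected) score each annotation term by asking how likely it is to see at least the observed overlap between your gene list and the term's genes if the list were drawn at random from the background. A minimal Python sketch of that calculation, followed by Benjamini-Hochberg adjustment across terms, is below. The function names are hypothetical, and this is not the exact statistic of any particular tool: DAVID, for example, reports the EASE score, a more conservative modified Fisher exact test, so its p-values will differ.

{{{#!python
from math import comb  # requires Python 3.8+


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    top = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, top + 1)
    )
    return favourable / total


def term_pvalue(gene_list, background, term_genes):
    """Overlap count and one-sided over-representation p-value for one term.

    Only genes present in the background are counted, so the background
    should be every gene that could have appeared in your list.
    """
    universe = set(background)
    selected = set(gene_list) & universe
    annotated = set(term_genes) & universe
    overlap = len(selected & annotated)
    p = hypergeom_upper_tail(overlap, len(universe), len(annotated), len(selected))
    return overlap, p


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
}}}

In practice you would call term_pvalue once per annotation term (GO term, pathway, and so on) and pass the resulting list of p-values to benjamini_hochberg, since testing many terms at once otherwise inflates the number of false positives.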