Version 25 (modified by gbell, 4 years ago)

Identifying enriched biological themes in gene sets

DAVID

  • DAVID is generally the best place to start your enrichment analysis.
  • DAVID is a tool that analyzes a subset of assayed genes, asking the general question, "What's special about these genes compared to a random list of genes of the same size?"
  • Instructions for using DAVID can be found under Functional Annotation on the DAVID web site.
  • You'll probably end up running DAVID multiple times, with different types of annotations, to find the most informative combination.
  • Full output can be downloaded and viewed as a spreadsheet.
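
The question DAVID asks can be illustrated with a plain hypergeometric tail probability. This is only a sketch of the general idea, not DAVID's own statistic (DAVID reports a modified Fisher's exact p-value, the EASE score); the function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Probability of seeing at least `overlap` annotated genes when
    `sample` genes are drawn at random (without replacement) from a
    background of `population` genes, `annotated` of which carry the term.
    """
    max_overlap = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for overlap, overlap+1, ...
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

For example, hypergeom_enrichment_p(20000, 200, 100, 8) would give the chance that a random list of 100 genes, drawn from 20000 assayed genes, contains 8 or more of the 200 genes carrying an annotation (all numbers hypothetical). A small value suggests the annotation is over-represented in your list.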

Gene Set Enrichment Analysis (GSEA)

GSEA is very different from tools like DAVID. GSEA takes as input all assayed genes, along with a metric that GSEA uses to order the genes. Then it asks the general question, "What's special about the order of these genes compared to a randomly ordered list of the same genes?" In other words, it looks for gene annotations that are enriched at the top or bottom of your ordered genes.
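
The idea of "enriched at the top or bottom" can be sketched as a running sum over the ordered genes. This is a simplified, unweighted version for illustration only: GSEA itself weights each step by the ranking metric by default and uses permutations to assess significance. The function name is hypothetical.

```python
def running_enrichment_score(ranked_genes, gene_set):
    """Walk down a ranked gene list, stepping up at genes in the set and
    down at all other genes; return the largest deviation from zero.

    `ranked_genes` must already be sorted by the ranking metric.
    A positive score means the set clusters near the top of the list,
    a negative score means it clusters near the bottom.
    """
    gene_set = set(gene_set)
    n_hits = sum(1 for gene in ranked_genes if gene in gene_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    hit_step = 1.0 / n_hits    # all hits together add up to +1
    miss_step = 1.0 / n_miss   # all misses together add up to -1
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += hit_step if gene in gene_set else -miss_step
        if abs(running) > abs(best):
            best = running
    return best
```

With a ranked list of six genes where the two genes of a set occupy the first two positions, the score reaches its maximum of 1.0; if they occupy the last two positions, it reaches -1.0.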

GSEA can be run on almost any operating system (so on your own computer or on a Whitehead Linux server like tak).

Introductory information about GSEA

GSEAPreranked: start with a list of genes and values

  1. Create a two-column file with gene names in the first column and numeric values in the second column (e.g. log2 fold change, log2 ratio). The file does not need to be sorted, and it should have the extension ".rnk".
    • The second column, used to rank genes, could be log2 fold change, t-statistic, or another scoring scheme that takes into account both log ratio and p-value.
  2. To run using the GUI:
    1. Start GSEA. On tak, the command is 'gsea'.
    2. Load your ranked file "file.rnk": click on "Steps in GSEA analysis -> Load data".
    3. Click on "Tools -> GseaPreranked".
    4. Select one of the gene sets from the "Gene sets database". We recommend starting with the Hallmarks set (h.all). You can find more information about the sets here
    5. Select your uploaded ranked list and click the run button.
  3. To run the same type of analysis on the command line, click the "Command" button in the GUI to see the command it used, then run that command on your Linux machine. You will need Java 11. For Whitehead users: switch to Java 11 with the command "ml java/10" (after this, running "java --version" should print "openjdk 11.0.8 2020-07-14").
    gsea4cli GSEAPreranked -gmx ftp.broadinstitute.org://pub/gsea/gene_sets/h.all.v7.2.symbols.gmt -norm meandiv -nperm 1000 -rnk myFile.rnk -scoring_scheme weighted -rpt_label my_analysis -create_svgs false -make_sets true -plot_top_x 20 -rnd_seed timestamp -set_max 500 -set_min 15 -zip_report false -out ./output
    
    • Note that on our system gsea4cli points to gsea-cli.sh. If you download your own copy of GSEA, you will have to run gsea-cli.sh instead.
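
The ".rnk" file from step 1 can be produced with a short script. The sketch below assumes you already have, for each gene, a log2 fold change and a p-value, and it uses one possible scoring scheme that combines both: the sign of the log2 fold change times -log10(p-value). The function names, the p-value floor, and the choice of score are illustrative assumptions, not a GSEA requirement; the file is written tab-delimited.

```python
import csv
import math


def signed_score(log2fc, pvalue, floor=1e-300):
    """Combine direction and significance into one ranking value:
    -log10(p-value), carrying the sign of the log2 fold change.
    `floor` (an arbitrary choice) avoids log10(0) for p-values of 0.
    A log2 fold change of exactly 0 is treated as positive.
    """
    magnitude = -math.log10(max(pvalue, floor))
    return math.copysign(magnitude, log2fc)


def write_rnk(rows, path):
    """Write (gene, log2fc, pvalue) tuples as a two-column,
    tab-delimited ranked list. Sorting is not needed.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for gene, log2fc, pvalue in rows:
            score = signed_score(log2fc, pvalue)
            writer.writerow([gene, f"{score:.6g}"])
```

Calling write_rnk(rows, "myFile.rnk") with your own list of tuples gives a file that can be loaded in the GUI or passed to the command above with -rnk.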

Traditional GSEA

  1. Create the necessary files in the correct format for expression, phenotype and chip annotation (see GSEA wiki).
  2. Use MSigDB for gene sets, or create custom gene sets in the correct format.
  3. Run GSEA, using the default options to start.

Fast gene set enrichment analysis (fgsea)

fgsea

Single-sample GSEA (ssGSEA)

An extension of GSEA that can be used to determine enrichment of gene sets in individual samples.

More information

BiNGO

BiNGO Plugin
You need to have Cytoscape installed to use BiNGO.

  1. Start BiNGO via Cytoscape: Plugins -> Start BiNGO
  2. Get genes from cluster/network or paste gene list
  3. Select the correct options (e.g. species)
  4. Run BiNGO

GeneGO

GeneGO Login (Password Required)

  1. Upload gene list and activate
  2. One-click analysis -> Select GeneGo Pathway Maps

GO Term Finder: finds significant GO terms shared among a list of genes from your organism.

GO Term Mapper: maps the granular GO annotations for genes in a list to a set of GO slim terms, allowing you to bin your genes into broad categories.

Ingenuity IPA, subscription required.

Advaita iPathwayGuide, login required - subscription required for downloading.

More Information

Hot Topics: Gene List Enrichment
