Version 17 (modified by ibarrasa, 5 years ago)

Identifying enriched biological themes in gene sets

DAVID

DAVID is generally the best place to start your enrichment analysis.
Instructions for using DAVID can be found under Functional Annotation on the DAVID web site.
You'll probably end up running DAVID multiple times, with different types of annotations, to find the most informative combination.
Full output can be downloaded and viewed as a spreadsheet.

Gene Set Enrichment Analysis (GSEA)

Broad GSEA

Download the GSEA software and additional resources to analyze, annotate and interpret enrichment results.

Explore the Molecular Signatures Database (MSigDB), a collection of annotated gene sets for use with GSEA software.

GSEA and MSigDB documentation

Guidelines for using RNA-seq datasets with GSEA

Ranked List

Create a two-column file with gene names in the first column and numeric values in the second column (e.g. log2 fold change, log2 ratio). The file does not need to be sorted, and it should have the extension ".rnk".
- The second column, used to rank genes, could be a log2 fold change, a t-statistic, or another scoring scheme that takes into account both log ratio and p-value.
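As a minimal sketch of building such a file, the Python below formats a table of differential expression results as ".rnk" lines. The input layout (gene, log2 fold change, p-value), the function names, and the scoring scheme (sign of the log2 fold change times -log10 of the p-value) are assumptions chosen for illustration, not a recommendation from GSEA; substitute whatever statistic suits your data.

```python
import math


def signed_score(log2_fc, p_value, min_p=1e-300):
    """Illustrative score: sign(log2FC) * -log10(p)."""
    if log2_fc == 0:
        return 0.0
    p = max(p_value, min_p)  # avoid log10(0) for p-values reported as 0
    return math.copysign(-math.log10(p), log2_fc)


def rnk_lines(rows):
    """Format (gene, log2_fc, p_value) tuples as tab-delimited .rnk lines."""
    return [
        f"{gene}\t{signed_score(log2_fc, p_value):.6g}"
        for gene, log2_fc, p_value in rows
    ]


def write_rnk(rows, path):
    """Write the ranked list to a file; sorting is not required."""
    with open(path, "w") as handle:
        for line in rnk_lines(rows):
            handle.write(line + "\n")


# Hypothetical values for illustration only, not real results.
example_rows = [
    ("TP53", 2.0, 0.001),
    ("MYC", -1.5, 0.01),
    ("GAPDH", 0.0, 0.9),
]
for line in rnk_lines(example_rows):
    print(line)
```

Calling `write_rnk(example_rows, "myFile.rnk")` would then produce a file that can be loaded in the GUI or passed to the command line.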
To run using the GUI:
1. Click on "Steps in GSEA analysis -> Load data" and upload your ranked file "file.rnk".
2. Click on "Tools -> GseaPreranked".
3. Select one of the gene sets from the "Gene sets database". We recommend starting with the Hallmarks set (h.all). More information about the sets is available on the MSigDB website.
4. Select your uploaded ranked list and click the run button.
To run the same type of analysis on the command line, click the "Command" button in the GUI to see the command it used, then run that command on your Linux machine. You will need Java 11. For Whitehead users: switch to Java 11 with the command "ml java/10" (afterwards, running "java --version" should report "openjdk 11.0.8 2020-07-14").
```
gsea-cli.sh GSEAPreranked -gmx ftp.broadinstitute.org://pub/gsea/gene_sets/h.all.v7.2.symbols.gmt -norm meandiv -nperm 1000 -rnk myFile.rnk -scoring_scheme weighted -rpt_label my_analysis -create_svgs false -make_sets true -plot_top_x 20 -rnd_seed timestamp -set_max 500 -set_min 15 -zip_report false -out ./output/sep28
```
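When the same preranked analysis has to be run on several ranked lists, the command above can be assembled programmatically. The sketch below only builds and prints the argument list, reusing the flags and values from the command shown above; the helper name and the ".rnk" file names are hypothetical, and it assumes "gsea-cli.sh" is on your PATH.

```python
import shlex
from pathlib import Path

# Gene set location copied from the command above.
GENE_SETS = "ftp.broadinstitute.org://pub/gsea/gene_sets/h.all.v7.2.symbols.gmt"


def preranked_command(rnk_file, out_dir, gene_sets=GENE_SETS):
    """Return the GSEAPreranked argument list for one ranked file."""
    label = Path(rnk_file).stem  # report label taken from the file name
    return [
        "gsea-cli.sh", "GSEAPreranked",
        "-gmx", gene_sets,
        "-norm", "meandiv",
        "-nperm", "1000",
        "-rnk", str(rnk_file),
        "-scoring_scheme", "weighted",
        "-rpt_label", label,
        "-create_svgs", "false",
        "-make_sets", "true",
        "-plot_top_x", "20",
        "-rnd_seed", "timestamp",
        "-set_max", "500",
        "-set_min", "15",
        "-zip_report", "false",
        "-out", str(out_dir),
    ]


# Hypothetical ranked files; print each command for inspection.
for rnk in ["treated_vs_control.rnk", "mutant_vs_wt.rnk"]:
    print(shlex.join(preranked_command(rnk, "./output")))
```

Once the printed commands look right, each list can be passed to `subprocess.run(cmd, check=True)` to launch the analysis.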

Unranked List

GSEA will rank the genes

Create the necessary files in the correct format for expression, phenotype and chip annotation (see GSEA wiki)
Use MSigDB for gene sets or create custom gene sets in the correct format
Run GSEA, starting with the default options
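The input files mentioned above are plain tab- or space-delimited text. The following is a minimal sketch, assuming the commonly used GCT (expression), categorical CLS (phenotype) and GMT (gene set) layouts; the function names and example values are hypothetical, and the layouts should be checked against the GSEA data format documentation before use.

```python
def gct_text(samples, expression):
    """GCT: '#1.2', then row/column counts, a header line, one row per gene."""
    lines = ["#1.2", f"{len(expression)}\t{len(samples)}"]
    lines.append("\t".join(["NAME", "Description"] + list(samples)))
    for gene, values in expression.items():
        if len(values) != len(samples):
            raise ValueError(f"{gene}: expected {len(samples)} values")
        lines.append("\t".join([gene, "na"] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"


def cls_text(labels):
    """CLS: sample/class counts, class names, then one label per sample."""
    classes = list(dict.fromkeys(labels))  # unique labels, first-seen order
    header = f"{len(labels)} {len(classes)} 1"
    return "\n".join([header, "# " + " ".join(classes), " ".join(labels)]) + "\n"


def gmt_text(gene_sets):
    """GMT: one gene set per line as name, description, then gene symbols."""
    lines = [
        "\t".join([name, description] + list(genes))
        for name, (description, genes) in gene_sets.items()
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example data for illustration only.
samples = ["ctrl_1", "ctrl_2", "treated_1", "treated_2"]
expression = {"TP53": [5.1, 4.9, 7.2, 7.0], "MYC": [8.0, 8.3, 6.1, 6.4]}
print(gct_text(samples, expression))
print(cls_text(["ctrl", "ctrl", "treated", "treated"]))
print(gmt_text({"MY_CUSTOM_SET": ("example custom set", ["TP53", "MYC"])}))
```

The sample order in the phenotype labels must match the sample columns of the expression file.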

Fast gene set enrichment analysis (fgsea)

fgsea

Single-sample GSEA (ssGSEA)

An extension of GSEA that can be used to determine enrichment of gene sets in individual samples.

ssGSEA Presentation from MIT BMC

Broad's ssGSEA from GenePattern R/jar scripts

NOTE: GSEA should be run on the entire dataset, not on a subset of genes, since using a subset may bias the results. See GSEA pre-ranked questions (Google Groups)

BiNGO

BiNGO Plugin
You need to have Cytoscape installed to use BiNGO

Start BiNGO via Cytoscape: Plugins -> Start BiNGO
Get genes from a cluster/network or paste a gene list
Select the correct options (e.g. species)
Run BiNGO

GeneGO

GeneGO Login (Password Required)

Upload gene list and activate
One-click analysis -> Select GeneGo Pathway Maps

Other/Useful Links

GO Term Finder: finds significant GO terms shared among a list of genes from your organism.

GO Term Mapper: maps the granular GO annotations for genes in a list to a set of GO slim terms, allowing you to bin your genes into broad categories.

Ingenuity IPA, subscription required.

Advaita iPathwayGuide, login required - subscription required for downloading.

More Information

Hot Topics: Gene List Enrichment
