Changes between Initial Version and Version 1 of SOPs/homologous


Timestamp:
01/23/13 16:49:43 (12 years ago)
Author:
trac
Comment:

--

  • SOPs/homologous

    v1 v1  
     1= Identifying homologous genes, proteins, or genome regions =
     2Several ways to identify homologs are listed below.  Given that homology is the presence of shared ancestry, which is difficult to assess directly, most of these resources and methods predict homology from sequence similarity.
     3
     4Homologs can be either **orthologs** (produced by a speciation event) in different species or **paralogs** (produced by a gene duplication event) in the same species.
     5
     6== Use a gene-centric homology database ==
     7
     8=== HomoloGene ===
     9
     10[[http://www.ncbi.nlm.nih.gov/homologene|HomoloGene]] is a system for automated detection of homologs among the annotated genes of several completely sequenced eukaryotic genomes. We have a local MySQL copy of the HomoloGene database on canna.  It is updated on the first Wednesday of each month.
     11
     12Database fields include homologene_group_id, taxon_id, gene_id_key, gene_symbol, protein_gi and protein_acc.
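
The fields above are enough to look up the other members of a gene's homology group. The following is a minimal sketch, not the actual BaRC setup: the table name `homologene` is an assumption (only the field names come from this page), an in-memory SQLite database stands in for the MySQL database on canna, and the inserted rows are made-up placeholders. It also assumes that gene_id_key is unique per gene.

```python
import sqlite3

# Hypothetical table name; this page lists the fields but not the table.
TABLE = "homologene"


def find_homologs(conn, gene_symbol, taxon_id):
    """Return (taxon_id, gene_symbol, protein_acc) for every other member
    of the HomoloGene group that contains the query gene."""
    sql = f"""
        SELECT h2.taxon_id, h2.gene_symbol, h2.protein_acc
        FROM {TABLE} AS h1
        JOIN {TABLE} AS h2
          ON h1.homologene_group_id = h2.homologene_group_id
        WHERE h1.gene_symbol = ? AND h1.taxon_id = ?
          AND h2.gene_id_key <> h1.gene_id_key
        ORDER BY h2.taxon_id, h2.gene_symbol
    """
    return conn.execute(sql, (gene_symbol, taxon_id)).fetchall()


# In-memory stand-in for the MySQL database, filled with placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    f"""CREATE TABLE {TABLE} (
        homologene_group_id INTEGER, taxon_id INTEGER, gene_id_key INTEGER,
        gene_symbol TEXT, protein_gi INTEGER, protein_acc TEXT)"""
)
conn.executemany(
    f"INSERT INTO {TABLE} VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 9606, 101, "GATA4", 0, "NP_PLACEHOLDER_1"),   # human (taxon 9606)
        (1, 10090, 102, "Gata4", 0, "NP_PLACEHOLDER_2"),  # mouse (taxon 10090)
        (2, 9606, 103, "OTHER1", 0, "NP_PLACEHOLDER_3"),  # unrelated group
    ],
)

print(find_homologs(conn, "GATA4", 9606))
```

The self-join on homologene_group_id returns all group members, so hits from another taxon are candidate orthologs and hits from the query taxon are candidate paralogs. With a MySQL driver the parameter placeholder is typically `%s` rather than `?`.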
     13
     14The database is used for our BaRC tool [[http://iona.wi.mit.edu/bell/homology/homologene.php|Find orthologs]].
     15
     16=== Ensembl ===
     17
     18[[http://www.ensembl.org|Ensembl]] is a comprehensive system for genome annotation that has been applied to a wide variety of organisms.  Ensembl groups genes by homology.  To obtain a homolog set, go to the gene page in your reference organism, such as [[http://useast.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000136574;r=8:11534468-11617511|human GATA4]].  Clicking the "Orthologues" link in the left-side menu opens an [[http://useast.ensembl.org/Homo_sapiens/Gene/Compara_Ortholog?g=ENSG00000136574|Orthologues]] page that lists orthologs, while clicking a "Gene Tree" link provides
     19   * an [[http://useast.ensembl.org/Homo_sapiens/Gene/Compara_Tree?db=core;g=ENSG00000136574|interactive tree]]
     20   * a computer-readable [[http://useast.ensembl.org/Homo_sapiens/Gene/Compara_Tree/Text?db=core;g=ENSG00000136574|representation of the tree]]
     21   * a [[http://useast.ensembl.org/Homo_sapiens/Gene/Compara_Tree/Align?db=core;g=ENSG0000013657|multiple sequence alignment]] that can be customized by clicking on the "Configure this page" box at left
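
If the computer-readable tree is exported in Newick format (an assumption here; check the format actually selected on the export page), the gene or protein labels at the leaves can be pulled out with a few lines of Python. This is a minimal sketch with a made-up tree and hypothetical labels. It does not handle quoted labels or a tree that consists of a single leaf.

```python
import re


def newick_leaf_names(newick):
    """Return leaf labels from a Newick string.

    A leaf label directly follows "(" or ","; labels that follow ")" belong
    to internal nodes and are skipped.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


# Placeholder tree; the labels are hypothetical, not real Ensembl identifiers.
tree = "((geneA_sp1:0.1,geneA_sp2:0.2)node1:0.05,geneA_sp3:0.3);"
print(newick_leaf_names(tree))
```

For anything beyond listing leaves (rerooting, pruning, distances), a dedicated tree library is a better choice than a regular expression.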
     22
     23
     24For genome-wide analysis, all Ensembl data (like ortholog alignments) can also be downloaded as large text files.  Most homology data is found in the ensembl_compara database.
     25
     26== Extract information from genome alignments ==
     27
     28=== UCSC Genome Bioinformatics ===
     29
     30[[http://genome.ucsc.edu/|UCSC Genome Bioinformatics]] displays and provides the data behind genome assemblies, many kinds of data mapped to these assemblies, and genome-genome alignments.  To get a genome-genome alignment of a region of interest, point the genome browser to the desired location, such as [[http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr8:11561717-11617509|human GATA4]], and set the [[http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons46way|Conservation]] track (or a similar track for another reference genome) to "full".  Clicking on the [[http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons46way|Conservation]] link lets you select the genomes to include in the alignment.  Clicking on the "Multiz Alignments of 46 Vertebrates" (or similar) track then creates a configurable, detailed alignment in MAF format (for regions of at most 30,000 nt).  Alignments of multiple regions can be obtained using the [[http://genome.ucsc.edu/cgi-bin/hgTables?org=Human&db=hg19|Table Browser]] and selecting
     31  * the desired clade, genome, and assembly
     32  * group = Comparative Genomics; track = Conservation
     33  * table = multiz46way (or related table for another assembly)
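
MAF output from either route is plain text: each alignment block starts with an "a" line, followed by one "s" line per aligned sequence with the fields source, start, size, strand, source size, and aligned text. The following is a minimal reader sketched under those assumptions. The sample text is made up, and other line types (such as "i", "e", and "q") are ignored.

```python
def parse_maf(lines):
    """Yield one dict per alignment block: {'score': float or None, 'rows': [...]}."""
    block = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            # A blank line terminates the current block.
            if block is not None:
                yield block
                block = None
        elif fields[0] == "a":
            # An "a" line opens a new block and carries key=value attributes.
            if block is not None:
                yield block
            attrs = dict(f.split("=", 1) for f in fields[1:])
            score = float(attrs["score"]) if "score" in attrs else None
            block = {"score": score, "rows": []}
        elif fields[0] == "s" and block is not None:
            src, start, size, strand, src_size, text = fields[1:7]
            block["rows"].append({
                "src": src,               # usually "assembly.chromosome"
                "start": int(start),      # zero-based start on the given strand
                "size": int(size),        # aligned bases, not counting gaps
                "strand": strand,
                "src_size": int(src_size),
                "text": text,             # aligned sequence including "-" gaps
            })
        # Comment lines ("#...") and other line types are skipped.
    if block is not None:
        yield block


# Made-up example with placeholder sources and coordinates.
sample = """##maf version=1
a score=100.0
s ref.chr1 10 8 + 1000 ACGT-ACGT
s other.chr2 20 9 + 2000 ACGTTACGT

a score=50.0
s ref.chr1 30 4 + 1000 GGCC
s other.chr2 45 4 - 2000 GGCC
"""

blocks = list(parse_maf(sample.splitlines()))
for b in blocks:
    print(b["score"], [row["src"] for row in b["rows"]])
```

Because the function accepts any iterable of lines, it can also be given an open file handle, which avoids loading a large bulk-download MAF file into memory.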
     34
     35Genome alignment and conservation metrics can also be downloaded in bulk.  BaRC has placed some alignment files (in MAF format) on tak at /nfs/genomes/GENOME_NAME/maf/ .  Others are available from sites like these:
     36
     37  * [[http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/|Multiple alignments of 45 vertebrate genomes with Human]]
     38  * [[http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/|Conservation scores for alignments of 45 vertebrate genomes with Human]]
     39  * [[http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phyloP46way/|Basewise conservation scores (phyloP) of 45 vertebrate genomes with Human]]
     40  * [[http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/alignments/|FASTA alignments of 45 vertebrate genomes with Human for CDS regions]]
     41
     42=== VISTA ===
     43[[http://genome.lbl.gov/vista/index.shtml|VISTA]] also has genome-genome alignments available for download, but the last update appears to have been in May 2008.
     44
     45== Extract information about protein families ==
     46
     47Many databases are available that contain pre-aligned sequences for protein families.
     48
     49=== Pfam ===
     50The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). It is available at [[http://pfam.janelia.org/|several sites]] and appears to have last been updated in May 2010.
     51
     52
     53== Do database searches ==
     54
     55If your favorite species or genes are not included in the above resources, you will have to identify homologs yourself.  Even if they are included, you may want to verify known homologs or identify new ones with the methods below:
     56
     57=== Sequence Searching ===
     58Our tool [[http://iona.wi.mit.edu/bell/comparative/index.php|Find similar genes in another species]] runs an all-vs-all blastp comparison to identify similar genes in another species.  The BLAST searches are rerun on the second Wednesday of each month.
     59
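     60A reciprocal BLAST search is a good way of finding homologs: two genes are treated as a candidate ortholog pair when each is the other's best hit in the opposite species.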
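
A reciprocal search keeps a pair of genes only when each one is the other's best hit. The following is a minimal sketch, assuming both searches were saved in BLAST tabular format (`-outfmt 6`, where column 1 is the query, column 2 the subject, and column 12 the bit score) and that "best" means highest bit score. The input lines are made-up placeholders.

```python
def best_hits(tabular_lines):
    """Map each query to its highest-scoring subject (by bit score)."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def row(query, subject, bitscore):
    """Build one placeholder tabular line; only columns 1, 2, and 12 matter here."""
    cols = [query, subject, "90.0", "100", "10", "0", "1", "100", "1", "100", "1e-50"]
    return "\t".join(cols + [str(bitscore)])


species_a_vs_b = [row("a1", "b1", 200), row("a1", "b2", 150), row("a2", "b2", 90)]
species_b_vs_a = [row("b1", "a1", 210), row("b2", "a1", 160), row("b2", "a2", 80)]

print(reciprocal_best_hits(species_a_vs_b, species_b_vs_a))
```

Here a2 is dropped because its best hit, b2, prefers a1. Ties and additional filters (E-value, alignment coverage) are not handled in this sketch.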
     61=== Profile Searching ===
     62[[http://hmmer.janelia.org/|HMMER]] is an excellent tool for finding distant homologs.  Rather than searching a database with a single sequence, HMMER builds a profile from a set of related sequences and can therefore search a sequence database more sensitively. The complete [[ftp://selab.janelia.org/pub/software/hmmer3/3.0/Userguide.pdf|user's guide]] for HMMER3 is online. HMMER3 is installed on tak and the cluster.
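
A typical profile search builds a profile from an alignment with hmmbuild and then searches a sequence database with hmmsearch; the `--tblout` option of hmmsearch writes a whitespace-delimited table of per-sequence hits. The following is a minimal sketch for filtering such a table, assuming the HMMER3 column order (target name, target accession, query name, query accession, full-sequence E-value, score, ...). The E-value cutoff of 1e-5 is an arbitrary illustration, and the sample lines are made-up placeholders.

```python
def parse_tblout(lines, max_evalue=1e-5):
    """Return (target, query, evalue, score) tuples that pass the E-value
    cutoff, sorted from most to least significant."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header/footer
        # 18 fixed columns, then a free-text description that may contain spaces.
        fields = line.split(None, 18)
        target, query = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append((target, query, evalue, score))
    return sorted(hits, key=lambda hit: hit[2])


# Placeholder table in the assumed column layout.
sample_tblout = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "seqB - fam1 - 3e-12 45.0 0.0 4e-12 44.5 0.0 1.0 1 0 0 1 1 1 1 placeholder protein B",
    "seqA - fam1 - 1e-30 100.0 0.1 2e-30 99.0 0.1 1.0 1 0 0 1 1 1 1 placeholder protein A",
    "seqC - fam1 - 0.5 8.0 0.0 0.6 7.5 0.0 1.0 1 0 0 1 1 1 0 weak placeholder hit",
]

for hit in parse_tblout(sample_tblout):
    print(hit)
```

The appropriate cutoff depends on the database size and the protein family, so treat the default here as a starting point only.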