Review Articles

Review article by Cedric Notredame
Genome Technology article by Fran and George

Start with sequences from a database

Useful alignment algorithms - available on the web, for desktop computers, and for Linux systems

muscle:    muscle -in MySeqs.fa -out MySeqs.aligned.fa   (MUSCLE 3.x syntax; MUSCLE 5 uses -align and -output instead)

t_coffee:  t_coffee MySeqs.fa   (creates MySeqs.aln)

mafft:     mafft --auto MySeqs.fa > MySeqs.aligned.fa

The output from these programs can then be visualized in ClustalX or Jalview.
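
Before opening an alignment in a viewer, it can be worth confirming that the aligner produced a rectangular alignment, i.e. that every sequence has the same length once gaps are counted. The following is a minimal sketch of such a check using awk. The function name check_alignment is hypothetical (it is not part of any of the programs above), and the sketch assumes FASTA-format output such as MySeqs.aligned.fa.

```shell
# check_alignment FILE
# Hypothetical helper: succeeds only if every FASTA record in FILE has the
# same length (gap characters included), as an aligned file should.
check_alignment() {
    awk '
        # Close the record being read and compare its length to the first one.
        function finish_record() {
            if (!in_record) return
            count++
            if (count == 1) first = n
            else if (n != first) bad++
        }
        /^>/ { finish_record(); in_record = 1; n = 0; next }
        in_record { gsub(/[ \t\r]/, ""); n += length($0) }
        END {
            finish_record()
            if (count == 0) { print "no sequences found"; exit 1 }
            if (bad > 0) {
                printf "%d of %d sequences differ in length from the first (%d columns)\n", bad, count, first
                exit 1
            }
            printf "OK: %d sequences, %d columns\n", count, first
        }
    ' "$1"
}
```

Usage: check_alignment MySeqs.aligned.fa. The default MySeqs.aln written by t_coffee is in ClustalW format rather than FASTA, so this check does not apply to it.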

Our favorite method is to use the T-COFFEE suite (more specifically, M-Coffee) to run several alignment methods and then combine their results into a consensus alignment, a sort of meta-alignment. This can be done with a single command such as:

t_coffee my_proteins.fa -method=t_coffee_msa,mafft_msa,probcons_msa,muscle_msa -output=fasta_aln

The final consensus alignment will appear in the file my_proteins.fasta_aln, which can then be viewed in ClustalX.
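
If the set of aligners varies from run to run, the method list can be assembled rather than typed by hand. The sketch below only prints the command so it can be inspected before running. The function name mcoffee_command is hypothetical, and the flags are simply those of the command shown above; the method names passed in must be ones your T-COFFEE installation recognizes.

```shell
# mcoffee_command INPUT METHOD...
# Hypothetical helper: print (do not run) an M-Coffee command line that
# combines the given methods, using the same flags as the command above.
# The body runs in a subshell so its variables do not leak into the caller.
mcoffee_command() (
    input=$1
    shift
    # Join the remaining arguments with commas.
    methods=$(IFS=,; printf '%s' "$*")
    printf 't_coffee %s -method=%s -output=fasta_aln\n' "$input" "$methods"
)
```

Usage: mcoffee_command my_proteins.fa t_coffee_msa mafft_msa muscle_msa prints the command; run the printed line once it looks right.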

Start with genome coordinates

This method describes how to extract a slice of a pre-computed genome-genome alignment from the UCSC Genome Browser.

  • Select region in UCSC Genome Browser (ex: chr19:100,000-100,100 in human hg19 assembly)
  • Click on Tools > Table Browser
  • In the Table Browser, make these selections:
    • group => Comparative Genomics
    • table => Multiz Align
    • region => position (which should show the region you began with in the genome browser)
    • output format => maf
  • Click "get output" button
  • Copy/paste or save MAF file (ex: MyRegion.maf)
  • Few programs understand MAF format, so you may need to convert the alignment to a format like FASTA. The script below does this; its arguments are:
    • 'refGenome' should be the UCSC assembly name (like hg19 or mm9).
    • 'strand' is + or -. Selecting the "-" strand makes the program reverse-complement the alignment.
# USAGE: mafFile refGenome regionName strand[+-] > regionName.fa
Sample command:
  /nfs/BaRC_Public/BaRC_code/Perl/maf_alignment_to_fasta/ MyRegion.maf hg19 myRegion + > Gata4.NEW.fa
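
For readers without access to the script above, the core of a MAF-to-FASTA conversion can be sketched with awk. This is not the BaRC script and does not reproduce its options: the function name maf_to_fasta is hypothetical, it takes only the MAF file as an argument, it does not reverse-complement, and it assumes that the species name is the part of each source name before the first dot (hg19.chr19 becomes hg19). Species missing from a block are filled with gaps so that all output sequences stay the same length.

```shell
# maf_to_fasta FILE
# Hypothetical helper (not the BaRC script): concatenate the MAF blocks in FILE
# into one FASTA record per species, padding absent species with gaps.
maf_to_fasta() {
    awk '
        # Return a string of n gap characters.
        function pad(n,    s) { s = ""; while (n-- > 0) s = s "-"; return s }

        # Finish the current block: gap-fill every species missing from it.
        function end_block(    sp) {
            if (blocklen == 0) return
            for (sp in seq) if (!(sp in inblock)) seq[sp] = seq[sp] pad(blocklen)
            total += blocklen
            blocklen = 0
            split("", inblock)      # portable way to empty an array
        }

        BEGIN { total = 0; blocklen = 0; nsp = 0 }

        $1 == "a" { end_block(); next }     # "a" lines open a new block

        # "s" lines have the fields: s src start size strand srcSize text
        $1 == "s" && NF >= 7 {
            sp = $2; sub(/\..*/, "", sp)    # hg19.chr19 -> hg19
            if (sp in inblock) next         # assumption: keep first row per species
            if (!(sp in seq)) { order[++nsp] = sp; seq[sp] = pad(total) }
            seq[sp] = seq[sp] $7
            inblock[sp] = 1
            blocklen = length($7)
        }

        END {
            end_block()
            for (i = 1; i <= nsp; i++) printf ">%s\n%s\n", order[i], seq[order[i]]
        }
    ' "$1"
}
```

Usage: maf_to_fasta MyRegion.maf > MyRegion.fa. Sequences are written in the order in which each species first appears in the MAF file, so the reference genome normally comes first.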