Review Articles
Review article by Cedric Notredame
Genome Technology article by Fran and George
Start with sequences from a database
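Useful alignment programs (available on the web, for desktop computers, and for Linux systems), with sample Linux commands:
- muscle: muscle -in MySeqs.fa -out MySeqs.aligned.fa (MUSCLE v3 syntax; MUSCLE v5 uses muscle -align MySeqs.fa -output MySeqs.aligned.fa)
- t_coffee: t_coffee MySeqs.fa (creates MySeqs.aln, in Clustal format)
- mafft: mafft --auto MySeqs.fa > MySeqs.aligned.fa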
The output from these programs can then be visualized in ClustalX or Jalview.
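Before opening an alignment in ClustalX or Jalview, a quick programmatic check can catch a truncated or malformed file. Below is a minimal Python sketch. It assumes the aligned output is in FASTA format (as from the MUSCLE and MAFFT commands above; the default t_coffee .aln file is in Clustal format and would need a different parser). The function names are hypothetical, and the conservation line is a simplified version of the Clustal one: it marks only fully conserved columns with '*' and omits the ':' and '.' similarity marks.

```python
from typing import Dict


def read_aligned_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: aligned_row}, joining wrapped sequence lines."""
    rows: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the row name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            rows[name] = ""
        elif name is not None:
            rows[name] += line
    return rows


def alignment_length(rows: Dict[str, str]) -> int:
    """Return the number of columns, or raise if the rows are not all the same length."""
    lengths = {len(seq) for seq in rows.values()}
    if len(lengths) != 1:
        raise ValueError(f"not an alignment: row lengths {sorted(lengths)}")
    return lengths.pop()


def conservation_line(rows: Dict[str, str]) -> str:
    """'*' under columns where every row has the same residue and no gap, else ' '."""
    width = alignment_length(rows)
    marks = []
    for i in range(width):
        column = {seq[i].upper() for seq in rows.values()}
        marks.append("*" if len(column) == 1 and "-" not in column else " ")
    return "".join(marks)


if __name__ == "__main__":
    # Tiny made-up example; in practice read MySeqs.aligned.fa instead.
    example = ">seq1\nMKV-LA\n>seq2 second protein\nMKVQLS\n"
    aln = read_aligned_fasta(example)
    for name, seq in aln.items():
        print(f"{name:<6}{seq}")
    print(" " * 6 + conservation_line(aln))
```

For a real file, pass the contents of MySeqs.aligned.fa to read_aligned_fasta; a ValueError from alignment_length suggests the file is not a complete alignment.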
Our favorite method is to use the T-Coffee suite (more specifically, M-Coffee) to run several multiple alignment methods and then combine their results into a consensus alignment, a kind of meta-alignment. This can be done with a single command such as:
t_coffee my_proteins.fa -method=t_coffee_msa,mafft_msa,probcons_msa,muscle_msa -output=fasta_aln
The final consensus alignment will be written to the file my_proteins.fasta_aln (FASTA format), which can then be viewed in ClustalX.
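If the consensus step is part of a larger pipeline, the command above can be built and run from Python. This is a minimal sketch: the helper names are hypothetical, and expected_output_file assumes, as in the example above, that T-Coffee names the result after the input file stem plus the output format, in the working directory. T-Coffee must be on PATH, and the external aligners behind each method (MAFFT, ProbCons, MUSCLE) must be installed for those methods to run.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence

# The four methods from the command above.
MCOFFEE_METHODS = ["t_coffee_msa", "mafft_msa", "probcons_msa", "muscle_msa"]


def mcoffee_command(fasta: str, methods: Sequence[str] = MCOFFEE_METHODS,
                    output: str = "fasta_aln") -> List[str]:
    """Return the t_coffee argument list for a multi-method consensus run."""
    return [
        "t_coffee",
        fasta,
        "-method=" + ",".join(methods),  # methods are comma-separated, no spaces
        "-output=" + output,
    ]


def expected_output_file(fasta: str, output: str = "fasta_aln") -> str:
    """Assumed result name: input file stem plus the output format."""
    return f"{Path(fasta).stem}.{output}"


def run_mcoffee(fasta: str, methods: Sequence[str] = MCOFFEE_METHODS) -> str:
    """Run t_coffee and return the expected consensus file name."""
    if shutil.which("t_coffee") is None:
        raise FileNotFoundError("t_coffee was not found on PATH")
    subprocess.run(mcoffee_command(fasta, methods), check=True)
    return expected_output_file(fasta)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without T-Coffee installed.
    print(" ".join(mcoffee_command("my_proteins.fa")))
```

Building the command as a list rather than a single shell string avoids quoting problems with file names that contain spaces.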
Start with genome coordinates
This method describes how to extract a slice of a precomputed genome-genome alignment from the UCSC Genome Browser:
- Select a region in the UCSC Genome Browser (e.g., chr19:100,000-100,100 in the human hg19 assembly)
- Click on Tools > Table Browser
- In the Table Browser, make these selections:
  - group => Comparative Genomics
  - table => Multiz Align
  - region => position (which should show the region you began with in the Genome Browser)
  - output format => maf
- Click the "get output" button
- Copy and paste or save the MAF file (e.g., MyRegion.maf)
- Few programs understand MAF format, so you may need to convert the alignment to another format such as FASTA, for example with the maf_alignment_to_fasta.pl script:
  USAGE: maf_alignment_to_fasta.pl mafFile refGenome regionName strand[+-] > regionName.fa
  Sample command: /nfs/BaRC_Public/BaRC_code/Perl/maf_alignment_to_fasta/maf_alignment_to_fasta.pl MyRegion.maf hg19 myRegion + > Gata4.NEW.fa
- 'refGenome' should be the UCSC Genome Browser assembly name (such as hg19 or mm9).
- Selecting the "-" strand makes the script reverse-complement the alignment.
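Where the BaRC Perl script is not available, the MAF-to-FASTA conversion can be sketched in Python. This is not maf_alignment_to_fasta.pl and will not reproduce its exact output. It is a minimal sketch assuming UCSC-style source names (assembly.chromosome, e.g. hg19.chr19); the FASTA header format (>regionName.species), the gap padding for species missing from a block, and the script name maf_to_fasta_sketch.py are hypothetical, illustrative choices. It ignores MAF i, e and q lines, and if a species appears twice in one block only the last row is kept.

```python
import sys
from typing import Dict, List

# Complement table for reverse-complementing; gaps ('-') pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def maf_to_rows(maf_text: str) -> Dict[str, str]:
    """Concatenate each species' aligned text across all MAF blocks.

    A species is the part of an 's' line's source name before the first dot.
    Species missing from a block are padded with gaps (an assumption of this
    sketch) so that every row keeps the same length.
    """
    rows: Dict[str, List[str]] = {}
    block: Dict[str, str] = {}
    done = 0  # alignment columns in blocks already processed

    def flush() -> None:
        nonlocal done
        if not block:
            return
        width = len(next(iter(block.values())))  # all 's' lines in a block share one width
        for species in set(rows) | set(block):
            if species not in rows:
                rows[species] = ["-" * done]  # first seen here: pad earlier blocks
            rows[species].append(block.get(species, "-" * width))
        done += width
        block.clear()

    for line in maf_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":      # start of a new alignment block
            flush()
        elif fields[0] == "s":    # s src start size strand srcSize text
            block[fields[1].split(".")[0]] = fields[6]
    flush()
    return {species: "".join(parts) for species, parts in rows.items()}


def maf_to_fasta(maf_text: str, ref_genome: str, region_name: str, strand: str) -> str:
    """Return FASTA text, reference first; reverse-complement every row if strand is '-'."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    rows = maf_to_rows(maf_text)
    if ref_genome not in rows:
        raise ValueError(f"reference {ref_genome!r} not found in the MAF file")
    order = [ref_genome] + sorted(s for s in rows if s != ref_genome)
    records = []
    for species in order:
        seq = rows[species]
        if strand == "-":
            seq = seq[::-1].translate(COMPLEMENT)
        records.append(f">{region_name}.{species}\n{seq}\n")
    return "".join(records)


def main(argv: List[str]) -> None:
    if len(argv) != 5:
        print("usage: maf_to_fasta_sketch.py mafFile refGenome regionName strand[+-]",
              file=sys.stderr)
        return
    maf_file, ref_genome, region_name, strand = argv[1:]
    with open(maf_file) as handle:
        sys.stdout.write(maf_to_fasta(handle.read(), ref_genome, region_name, strand))


if __name__ == "__main__":
    main(sys.argv)
```

Invoked like the Perl script, e.g. python maf_to_fasta_sketch.py MyRegion.maf hg19 myRegion + > myRegion.fa. Padding absent species with gaps keeps the output a valid alignment that ClustalX or Jalview can open.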