== Predicting protein structure from sequence using AlphaFold ==

=== Background ===

The success of [https://www.nature.com/articles/s41586-021-03819-2 DeepMind's AlphaFold protein folding algorithm] in the [https://onlinelibrary.wiley.com/toc/10970134/2021/89/12 CASP14 structure prediction assessment] has been widely [https://www.nature.com/articles/d41586-020-03348-4 celebrated] and has profoundly invigorated the structural biology community. Today, if you have a protein sequence for which you would like a high-quality predicted structure, an excellent place to start is the [https://alphafold.ebi.ac.uk/ AlphaFold Protein Structure Database]. An alternative database to search is the [https://esmatlas.com/resources?action=fold ESM Metagenomic Atlas], where you may find predicted structures for orphan proteins with few sequence homologs.

=== Running AlphaFold using ChimeraX ===

If you cannot find a predicted structure for your protein in the databases listed above, perhaps because your sequence carries amino acid substitutions relative to the reference sequence, [https://www.cgl.ucsf.edu/chimerax/ ChimeraX] is an [https://www.youtube.com/watch?v=gIbCAcMDM7E easy place to start due to its graphical user interface] and convenient visualization tools. You will need to install ChimeraX on a desktop or laptop computer, but the AlphaFold predictions themselves are computed on cloud resources via the [https://www.nature.com/articles/s41592-022-01488-1 ColabFold] implementation of AlphaFold, which uses [https://www.nature.com/articles/nbt.3988 MMseqs2] to efficiently compute the initial multiple sequence alignment.
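For example, once ChimeraX is installed, you can fetch an existing model or launch a new prediction directly from the ChimeraX command line. The sketch below assumes a recent ChimeraX release with the AlphaFold tool; the UniProt accession and the sequence fragment are placeholders for your own protein:

{{{
alphafold fetch P69905
alphafold predict MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSF
}}}

The first command retrieves a precomputed model from the AlphaFold Database by UniProt accession; the second submits a new ColabFold prediction for the pasted sequence.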
=== Running AlphaFold using computing resources at Whitehead ===

The freely available computational resources accessed via ChimeraX may become a constraint on completing your AlphaFold predictions. In that case, you can run the predictions locally by submitting a job with a command like the following:

{{{
sbatch --export=ALL,FASTA_NAME=example.fa,USERNAME='user',FASTA_PATH=/path/to/fasta/file,AF2_WORK_DIR=/path/to/working/directory ./RunAlphaFold_2.3.2_slurm.sh
}}}

In the command above, substitute your own user ID, FASTA file name, and the paths to both the FASTA file and the working directory. In this example, the job submitted to the SLURM scheduler might look like:

{{{
#!/bin/bash
#SBATCH --job-name=AF2            # friendly name for job.
#SBATCH --nodes=1                 # ensure cores are on one node
#SBATCH --ntasks=1                # run a single task
#SBATCH --cpus-per-task=8         # number of cores/threads requested.
#SBATCH --mem=64gb                # memory requested.
#SBATCH --partition=nvidia-t4-20  # partition (queue) to use
#SBATCH --output output-%j.out    # %j inserts jobid to STDOUT
#SBATCH --gres=gpu:1              # required for GPU access

# Let JAX spill over into host memory so large proteins fit on the GPU.
export TF_FORCE_UNIFIED_MEMORY=1
export XLA_PYTHON_CLIENT_MEM_FRACTION=4
export OUTPUT_NAME='model_1'
export ALPHAFOLD_DATA_PATH='/alphafold/data.2023b'  # location of the sequence and structure databases

cd $AF2_WORK_DIR

# Run AlphaFold 2.3.2 from a Singularity container, binding the working
# directory and the database directory into the container.
singularity run -B $AF2_WORK_DIR:/af2 -B $ALPHAFOLD_DATA_PATH:/data -B .:/etc \
    --pwd /app/alphafold --nv /alphafold/alphafold_2.3.2.sif \
    --data_dir=/data/ \
    --output_dir=/af2/$FASTA_PATH \
    --fasta_paths=/af2/$FASTA_PATH/$FASTA_NAME \
    --max_template_date=2050-01-01 \
    --db_preset=full_dbs \
    --bfd_database_path=/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniref30_database_path=/data/uniref30/UniRef30_2023_02 \
    --uniref90_database_path=/data/uniref90/uniref90.fasta \
    --mgnify_database_path=/data/mgnify/mgy_clusters_2022_05.fa \
    --template_mmcif_dir=/data/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/data/pdb_mmcif/obsolete.dat \
    --use_gpu_relax=True \
    --model_preset=monomer \
    --pdb70_database_path=/data/pdb70/pdb70

# Email the STDOUT output file to the specified address.
/usr/bin/mail -s "$SLURM_JOB_NAME $SLURM_JOB_ID" $USERNAME@wi.mit.edu < $AF2_WORK_DIR/output-${SLURM_JOB_ID}.out
}}}
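After submission, standard SLURM tools can be used to follow the job. A minimal sketch, where <jobid> stands for the job ID printed by sbatch:

{{{
squeue -u $USER                                        # is the job queued or running?
tail -f output-<jobid>.out                             # follow AlphaFold's log as it runs
sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS   # resource usage after completion
}}}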
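When the run finishes, AlphaFold writes its results beneath the output directory, in a subdirectory named after the FASTA file (without its extension). For the monomer preset used above, the key files you should typically find are listed below; per-residue pLDDT confidence scores are stored in the B-factor column of each PDB file:

{{{
ranked_0.pdb ... ranked_4.pdb   # predicted models, ordered from highest to lowest confidence
ranking_debug.json              # the pLDDT values used to rank the models
timings.json                    # wall-clock time spent in each pipeline stage
msas/                           # the multiple sequence alignments used as input
}}}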