Changes between Version 4 and Version 5 of SOPs/AlphaFold


Timestamp:
04/22/24 22:15:17 (11 months ago)
Author:
twhitfie

  • SOPs/AlphaFold

=== Background ===
The success of [https://www.nature.com/articles/s41586-021-03819-2 DeepMind's AlphaFold protein folding algorithm] in the CASP14 structural prediction assessment has been widely celebrated and has profoundly invigorated the structural biology community. Today, if you have a protein sequence for which you'd like a high-quality predicted structure, an excellent place to start is the [https://alphafold.ebi.ac.uk/ AlphaFold Protein Structure Database]. An alternative database to search is the [https://esmatlas.com/resources?action=fold ESM Metagenomic Atlas], where you may find predicted structures for orphan proteins with few sequence homologs.

=== Running AlphaFold using ChimeraX ===

If you cannot find a predicted structure for your protein in the databases listed above, perhaps because your sequence carries amino acid substitutions relative to the reference sequence, [https://www.cgl.ucsf.edu/chimerax/ ChimeraX] is an [https://www.youtube.com/watch?v=gIbCAcMDM7E easy place to start due to its graphical user interface] and convenient visualization tools.

=== Running AlphaFold locally ===

The freely available computational resources accessed via ChimeraX may be too limited to complete your AlphaFold predictions. In that case, you can run the predictions locally by submitting a job with a command like the following:

{{{
sbatch --export=ALL,FASTA_NAME=example.fa,USERNAME='user',FASTA_PATH=proteins,AF2_WORK_DIR=/path/to/working/directory ./RunAlphaFold_2.3.2_slurm.sh
}}}
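
Because the job script reads its input from `$AF2_WORK_DIR/$FASTA_PATH/$FASTA_NAME` (bind-mounted into the container under `/af2`), it can save a queue wait to confirm that the file exists before submitting. The following is a minimal sketch of such a check; the function name `check_af2_input` and its messages are hypothetical and not part of the SOP.

```shell
#!/bin/bash
# Hypothetical pre-submission check (illustrative; not part of the original SOP).
# Prints the FASTA file the job would read, or fails if it is missing or empty.
check_af2_input() {
    local work_dir="$1" fasta_path="$2" fasta_name="$3"
    local fasta_file="${work_dir}/${fasta_path}/${fasta_name}"
    if [ ! -s "$fasta_file" ]; then
        echo "Missing or empty FASTA file: $fasta_file" >&2
        return 1
    fi
    echo "$fasta_file"
}
```

One way to use it is to chain it in front of the submission, for example `check_af2_input /path/to/working/directory proteins example.fa && sbatch ...`, so that `sbatch` only runs when the input is in place.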

In this example, the batch script submitted to the SLURM scheduler might look like:

{{{
#!/bin/bash

#SBATCH --job-name=AF2                  # friendly name for job.
#SBATCH --nodes=1                       # ensure cores are on one node
#SBATCH --ntasks=1                      # run a single task
#SBATCH --cpus-per-task=8               # number of cores/threads requested.
#SBATCH --mem=64gb                      # memory requested.
#SBATCH --partition=nvidia-t4-20        # partition (queue) to use
#SBATCH --output output-%j.out          # STDOUT file; %j is replaced by the job ID
#SBATCH --gres=gpu:1                    # Required for GPU access

# Enable unified memory so that host RAM can be used when GPU memory runs out.
export TF_FORCE_UNIFIED_MEMORY=1
export XLA_PYTHON_CLIENT_MEM_FRACTION=4

export OUTPUT_NAME='model_1'
export ALPHAFOLD_DATA_PATH='/alphafold/data.2023b' # Specify ALPHAFOLD_DATA_PATH

cd "$AF2_WORK_DIR" || exit 1
singularity run -B "$AF2_WORK_DIR":/af2 -B "$ALPHAFOLD_DATA_PATH":/data -B .:/etc \
    --pwd /app/alphafold --nv /alphafold/alphafold_2.3.2.sif \
    --data_dir=/data/ \
    --output_dir="/af2/$FASTA_PATH" \
    --fasta_paths="/af2/$FASTA_PATH/$FASTA_NAME" \
    --max_template_date=2050-01-01 \
    --db_preset=full_dbs \
    --bfd_database_path=/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniref30_database_path=/data/uniref30/UniRef30_2023_02 \
    --uniref90_database_path=/data/uniref90/uniref90.fasta \
    --mgnify_database_path=/data/mgnify/mgy_clusters_2022_05.fa \
    --template_mmcif_dir=/data/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/data/pdb_mmcif/obsolete.dat \
    --use_gpu_relax=True \
    --model_preset=monomer \
    --pdb70_database_path=/data/pdb70/pdb70

# Email the STDOUT output file to specified address.
/usr/bin/mail -s "$SLURM_JOB_NAME $SLURM_JOB_ID" "$USERNAME@wi.mit.edu" < "$AF2_WORK_DIR/output-${SLURM_JOB_ID}.out"
}}}
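
Once the job completes, the predictions should be found under the directory passed as `--output_dir`. The sketch below assumes AlphaFold's standard output layout, in which results are written to a subdirectory named after the FASTA file (minus its extension) and the best-ranked structure is saved as `ranked_0.pdb`. The helper name `af2_top_model` is hypothetical and not part of the SOP.

```shell
#!/bin/bash
# Hypothetical helper (illustrative; not part of the original SOP).
# Prints the path of the top-ranked model for a finished job, assuming the
# layout <work_dir>/<fasta_path>/<fasta basename without extension>/ranked_0.pdb.
af2_top_model() {
    local work_dir="$1" fasta_path="$2" fasta_name="$3"
    local target="${fasta_name%.*}"   # e.g. example.fa -> example
    local model="${work_dir}/${fasta_path}/${target}/ranked_0.pdb"
    if [ ! -f "$model" ]; then
        echo "No ranked_0.pdb found under ${work_dir}/${fasta_path}/${target}" >&2
        return 1
    fi
    echo "$model"
}
```

With the values from the `sbatch` example, `af2_top_model /path/to/working/directory proteins example.fa` would print the path of the model to open in a viewer such as ChimeraX, or report that no ranked model exists yet.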