
chenxingqiang/alphafold-notebooks



A reference 'AlphaFold2 Codec' covering everything in AlphaFold2.



Learning Resources

Papers

PPT

  • My public talk on the AlphaFold2 paper, by Xingqiang Chen (.key/.pptx in the AF2-PPT folder).
  • Sergey Ovchinnikov's talk on AF2 (slides/.pptx in the AF2-PPT folder).

Learning by Code

📓 AlphaFold2 Algorithm Notebooks (32 Complete!)

We provide 32 Jupyter Notebooks covering every algorithm from the AlphaFold2 supplementary materials. Each notebook includes:

  • Algorithm pseudocode/image reference
  • Source code location mapping
  • NumPy implementation
  • Executable test cases with verification
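
For a feel of the notebook style, here is a minimal NumPy sketch in the spirit of Algorithms 4 and 5 (relpos / one_hot), with the learned linear projection omitted; the notebooks contain the faithful, tested versions:

import numpy as np

def one_hot(x, v_bins):
    """Algorithm 5 (one_hot): encode each value by its nearest bin."""
    diff = np.abs(x[..., None] - v_bins)               # distance to every bin
    return np.eye(len(v_bins))[np.argmin(diff, axis=-1)]

def relpos(residue_index, v_max=32):
    """Algorithm 4 (relpos), final linear projection omitted:
    one-hot of clipped pairwise residue-index offsets."""
    d = residue_index[None, :] - residue_index[:, None]   # (N, N) offsets
    d = np.clip(d, -v_max, v_max)
    return one_hot(d, np.arange(-v_max, v_max + 1))       # (N, N, 2*v_max + 1)

feats = relpos(np.arange(8))
print(feats.shape)  # (8, 8, 65)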

👉 Full Algorithm Index

Quick Links by Category

| Category | Algorithms | Notebooks |
| --- | --- | --- |
| Data Preprocessing | MSA Block Deletion | Algorithm 1 |
| Embedding | Input Embedder, relpos, one_hot | Alg 3, Alg 4, Alg 5 |
| Evoformer | Stack, MSA Attention, Triangle Ops | Alg 6-15 |
| Templates | Pair Stack, Pointwise Attention | Alg 16, Alg 17 |
| Extra MSA | Stack, Global Attention | Alg 18, Alg 19 |
| Structure Module | IPA, Backbone, Atom Coords | Alg 20-25 |
| Losses | FAPE, Torsion, pLDDT | Alg 26-29 |
| Recycling | Inference, Training, Embedder | Alg 30, Alg 31, Alg 32 |
| Main Pipeline | Full Inference | Algorithm 2 |
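
To make one of these building blocks concrete, below is a bare-bones NumPy sketch of the "outgoing" triangular multiplicative update (Algorithm 11), with the sigmoid gates and layer norms omitted: edge (i, j) is updated from all two-edge paths i→k, j→k through a third node k.

import numpy as np

def triangle_multiply_outgoing(z, Wa, Wb, Wo):
    """Skeleton of Algorithm 11 (gates and layer norms omitted).
    z: (N, N, c) pair representation; Wa, Wb, Wo: (c, c) projections."""
    a = z @ Wa                               # left projection:  edges i -> k
    b = z @ Wb                               # right projection: edges j -> k
    # sum over the shared node k, elementwise over channels c
    return np.einsum('ikc,jkc->ijc', a, b) @ Wo

rng = np.random.default_rng(0)
N, c = 5, 8
z = rng.normal(size=(N, N, c))
Wa, Wb, Wo = [rng.normal(size=(c, c)) for _ in range(3)]
print(triangle_multiply_outgoing(z, Wa, Wb, Wo).shape)  # (5, 5, 8)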
📋 Complete Algorithm List
| # | Algorithm | Notebook |
| --- | --- | --- |
| 1 | MSA Block Deletion | algorithm-1-MSABlockDeletion.ipynb |
| 2 | Inference | algorithm-2-Inference.ipynb |
| 3 | Input Embedder | algorithm-3-InputEmbedder.ipynb |
| 4 | relpos | algorithm-4-relpos.ipynb |
| 5 | one_hot | algorithm-5-one_hot.ipynb |
| 6 | Evoformer Stack | algorithm-6-EvoformerStack.ipynb |
| 7 | MSA Row Attention with Pair Bias | algorithm-7-MSARowAttentionWithPairBias.ipynb |
| 8 | MSA Column Attention | algorithm-8-MSAColumnAttention.ipynb |
| 9 | MSA Transition | algorithm-9-MSATransition.ipynb |
| 10 | Outer Product Mean | algorithm-10-OuterProductMean.ipynb |
| 11 | Triangle Multiplication (Outgoing) | algorithm-11-TriangleMultiplicationOutgoing.ipynb |
| 12 | Triangle Multiplication (Incoming) | algorithm-12-TriangleMultiplicationIncoming.ipynb |
| 13 | Triangle Attention (Starting Node) | algorithm-13-TriangleAttentionStartingNode.ipynb |
| 14 | Triangle Attention (Ending Node) | algorithm-14-TriangleAttentionEndingNode.ipynb |
| 15 | Pair Transition | algorithm-15-PairTransition.ipynb |
| 16 | Template Pair Stack | algorithm-16-TemplatePairStack.ipynb |
| 17 | Template Pointwise Attention | algorithm-17-TemplatePointwiseAttention.ipynb |
| 18 | Extra MSA Stack | algorithm-18-ExtraMsaStack.ipynb |
| 19 | MSA Column Global Attention | algorithm-19-MSAColumnGlobalAttention.ipynb |
| 20 | Structure Module | algorithm-20-StructureModule.ipynb |
| 21 | Rigid from 3 Points | algorithm-21-rigidFrom3Points.ipynb |
| 22 | Invariant Point Attention | algorithm-22-InvariantPointAttention.ipynb |
| 23 | Backbone Update | algorithm-23-BackboneUpdate.ipynb |
| 24 | Compute All Atom Coordinates | algorithm-24-computeAllAtomCoordinates.ipynb |
| 25 | makeRotX | algorithm-25-makeRotX.ipynb |
| 26 | Rename Symmetric Ground Truth Atoms | algorithm-26-renameSymmetricGroundTruthAtoms.ipynb |
| 27 | Torsion Angle Loss | algorithm-27-torsionAngleLoss.ipynb |
| 28 | Compute FAPE | algorithm-28-computeFAPE.ipynb |
| 29 | Predict Per-Residue LDDT | algorithm-29-predictPerResidueLDDT.ipynb |
| 30 | Recycling (Inference) | algorithm-30-RecyclingInference.ipynb |
| 31 | Recycling (Training) | algorithm-31-RecyclingTraining.ipynb |
| 32 | Recycling Embedder | algorithm-32-RecyclingEmbedder.ipynb |

📓 AlphaFold3 Algorithm Notebooks (NEW!)

We now include AlphaFold3 algorithm notebooks! AF3 introduces significant architectural changes, including diffusion-based structure prediction.

👉 AlphaFold3 Algorithm Index
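
The core architectural shift in AF3 is generating atom coordinates with a denoising diffusion module rather than AF2's structure module. As a toy illustration only (not the paper's exact sampler, noise schedule, or conditioning), one Euler-style denoising step looks like this:

import numpy as np

def diffusion_step(x_t, sigma_t, sigma_next, denoise_fn):
    """One toy Euler step of an EDM-style diffusion sampler.
    x_t: (num_atoms, 3) noisy coordinates at noise level sigma_t.
    denoise_fn: a network predicting clean coordinates from (x_t, sigma_t)."""
    x_hat = denoise_fn(x_t, sigma_t)            # predicted clean structure
    d = (x_t - x_hat) / sigma_t                 # update direction (score estimate)
    return x_t + (sigma_next - sigma_t) * d     # step toward lower noise

# Demo with a stand-in "denoiser" that just shrinks coordinates slightly:
x = np.random.default_rng(0).normal(size=(10, 3)) * 5.0
x = diffusion_step(x, sigma_t=5.0, sigma_next=2.0, denoise_fn=lambda x, s: 0.9 * x)
print(x.shape)  # (10, 3)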

Key AF3 Components

| Category | Key Algorithms | Notebooks |
| --- | --- | --- |
| Input | MSA Features, Templates, Atom Features | Alg 1-4 |
| MSA Module | Outer Product, MSA Attention | Alg 5-7 |
| Pairformer | Triangle Ops, Single Attention | Alg 8-14 |
| Diffusion | Diffusion Module, AdaLN, Transformer | Alg 15, Alg 16 |
| Confidence | Distogram, Confidence, LDDT | Alg 20-23 |

AF3 Source Code Submodules

# Official AlphaFold3
AF3-Ref-src/alphafold3-official/

# PyTorch Implementation (lucidrains)
AF3-Ref-src/alphafold3-pytorch/

# Architecture Walkthrough
AF3-Ref-src/alphafold3-walkthrough/

📓 Boltz Algorithm Notebooks (NEW!)

We now include Boltz algorithm notebooks! Boltz is a family of models for biomolecular interaction prediction:

  • Boltz-1: First fully open source model to approach AlphaFold3 accuracy
  • Boltz-2: Adds binding affinity prediction, approaching FEP accuracy 1000x faster

👉 Boltz Algorithm Index

Key Boltz Components

| Category | Key Algorithms | Notebooks |
| --- | --- | --- |
| Input Processing | Input Embedder, Atom Encoder, RelPos | Alg 1-3 |
| MSA Module | MSA Module, Outer Product, Pair Averaging | Alg 4-6 |
| Pairformer | Pairformer, Triangle Ops, Attention | Alg 7-11 |
| Diffusion | Diffusion Module, Transformer, Fourier | Alg 12-15 |
| Confidence & Affinity | Confidence, Distogram, Affinity (Boltz-2) | Alg 16-18 |
| Loss Functions | Diffusion Loss, Confidence Loss | Alg 19-20 |
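
The "Fourier" entry in the diffusion row refers to embedding the scalar noise level into a feature vector before it conditions the diffusion transformer. A minimal sketch of such a random Fourier embedding (the dimension and frequency distribution here are illustrative, not Boltz's exact values):

import numpy as np

def fourier_embedding(t, dim=256, seed=0):
    """Embed a scalar t (e.g. a diffusion noise level) as cos(2*pi*(t*w + b))
    with frozen random frequencies w and phases b."""
    rng = np.random.default_rng(seed)           # fixed once, never trained
    w = rng.normal(size=dim)                    # random frequencies
    b = rng.normal(size=dim)                    # random phases
    return np.cos(2.0 * np.pi * (t * w + b))    # (dim,) feature vector

print(fourier_embedding(0.5, dim=8))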

Boltz Source Code Submodule

# Official Boltz Repository
Boltz-Ref-src/boltz-official/

Papers:

📓 Boltz-2 Specific Notebooks (NEW!)

Boltz-2 introduces binding affinity prediction: it is the first deep learning model to approach FEP accuracy while being 1000x faster.

👉 Boltz-2 Algorithm Index

Boltz-2 New Features

| Category | Key Algorithms | Notebooks |
| --- | --- | --- |
| Affinity Prediction | Affinity Module, Gaussian Smearing | Alg 1-2 |
| Contact Guidance | Contact Conditioning | Alg 3 |
| Enhanced v2 Modules | Input v2, Template v2, Diffusion v2 | Alg 5-7 |
| Improved Confidence | Confidence v2, B-Factor | Alg 8, Alg 10 |
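
Gaussian smearing, listed with the Affinity Module above, is the standard trick of expanding a scalar distance into smooth radial-basis features before feeding it to a network. A minimal NumPy sketch, with illustrative centers and width rather than Boltz-2's exact hyperparameters:

import numpy as np

def gaussian_smearing(d, d_min=0.0, d_max=20.0, num_centers=32):
    """Expand distances into Gaussian radial-basis features.
    d: (...,) distances in Angstroms -> (..., num_centers) features."""
    centers = np.linspace(d_min, d_max, num_centers)
    width = centers[1] - centers[0]             # one RBF per grid spacing
    return np.exp(-((d[..., None] - centers) ** 2) / (2.0 * width ** 2))

dists = np.array([1.5, 3.2, 7.8])
print(gaussian_smearing(dists).shape)  # (3, 32)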

Boltz-2 Submodules

# Official Repository (contains both Boltz-1 and Boltz-2)
Boltz-Ref-src/boltz-official/

# Boltzina - Virtual Screening with Boltz-2
Boltz-Ref-src/boltzina/

Practice: Modeling Tests with AF2

MD + AlphaFold2


🔧 Fine-tuning Framework (NEW!)

We provide a comprehensive fine-tuning framework for adapting protein structure prediction models to downstream tasks.

👉 Full Fine-tuning Guide

Supported Models

| Model | Framework | Fine-tuning Support |
| --- | --- | --- |
| AlphaFold2 | JAX/Haiku | ✅ Full, Head-only, LoRA |
| AlphaFold3 | JAX/Haiku | ✅ Full, Head-only, LoRA |
| Boltz-1 | PyTorch | ✅ Full, LoRA, Adapter |
| Boltz-2 | PyTorch | ✅ Full, LoRA, Adapter |

Fine-tuning Strategies

| Strategy | Trainable Params | Use Case |
| --- | --- | --- |
| LoRA | ~0.1% | Small datasets, efficient fine-tuning |
| Adapter | ~1% | Modular, multiple tasks |
| Head-only | ~5% | New prediction tasks |
| Full | 100% | Large datasets, maximum performance |
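
The LoRA row is easiest to see in code: the pretrained weight W stays frozen and only a low-rank update (alpha / r) * B @ A is trained, so the trainable fraction shrinks as models get wider. A minimal NumPy sketch of the idea (independent of the LoRAModule API in finetuning/modules/lora.py):

import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Forward pass through a frozen weight W plus a low-rank LoRA update.
    x: (batch, d_in); W: (d_in, d_out); A: (r, d_in); B: (d_out, r)."""
    r = A.shape[0]
    delta = (alpha / r) * (B @ A).T             # (d_in, d_out) low-rank update
    return x @ (W + delta)

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 8
x = rng.normal(size=(2, d_in))
W = rng.normal(size=(d_in, d_out))              # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01           # trainable down-projection
B = np.zeros((d_out, r))                        # trainable up-projection, zero init
print(np.allclose(lora_forward(x, W, A, B), x @ W))  # True: zero init is a no-op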

Supported Tasks (50+ Task Types)

We support comprehensive task coverage inspired by production platforms like ProteinBase.com:

💊 Drug Discovery

| Task | Outputs | Applications |
| --- | --- | --- |
| Binding Affinity | pKd, pIC50, ΔG, Ki | Lead optimization, SAR |
| Virtual Screening | Hit probability, ranking | HTS prioritization |
| ADMET | Absorption, metabolism, toxicity | Compound triage |

🔬 Protein Engineering

| Task | Outputs | Applications |
| --- | --- | --- |
| Stability | ΔΔG, Tm shift | Thermostabilization |
| Solubility | Expression score | Biomanufacturing |
| Mutation Effects | Fitness, pathogenicity | Variant analysis |

🧫 Antibody Design

| Task | Outputs | Applications |
| --- | --- | --- |
| Affinity Maturation | CDR binding, ΔΔG | Therapeutic optimization |
| Humanization | Humanness score | Drug development |
| Developability | Aggregation, viscosity | Manufacturing |

⚗️ Enzyme Engineering

| Task | Outputs | Applications |
| --- | --- | --- |
| Activity | kcat, Km, kcat/Km | Catalyst design |
| Specificity | Substrate profiles | Industrial enzymes |
| Directed Evolution | Fitness landscapes | Protein engineering |

🔗 Protein-Protein Interactions

| Task | Outputs | Applications |
| --- | --- | --- |
| PPI Binding | Kd, interface stability | Complex analysis |
| Interface Prediction | Contact residues | Structure analysis |
| Hot Spot Detection | ΔΔG per residue | PPI drug targets |

🧬 Function Prediction

| Task | Outputs | Applications |
| --- | --- | --- |
| GO Terms | MF, BP, CC | Annotation |
| EC Numbers | Enzyme classification | Function discovery |
| Localization | Subcellular compartment | Systems biology |

🛡️ Immunology

| Task | Outputs | Applications |
| --- | --- | --- |
| B-cell Epitopes | Epitope probability | Vaccine design |
| T-cell Epitopes | MHC binding | Immunotherapy |
| Immunogenicity | ADA risk | Drug safety |

📊 Structure Quality

| Task | Outputs | Applications |
| --- | --- | --- |
| Confidence | pLDDT, pAE, pTM | Model validation |
| Disorder | IDR prediction | Structure analysis |
| Contacts | Distance maps | Validation |

Quick Start

from finetuning import TaskRegistry, create_finetuning_pipeline
from finetuning.modules import LoRAModule
from finetuning.heads import AffinityHead, AffinityHeadConfig

# Load a pretrained model first; both options below use it
model = load_pretrained_boltz2()

# Option 1: Use the Task Registry (recommended)
# List all available tasks
print(TaskRegistry.list_all_tasks())  # 50+ tasks

# Get task info and recommendations
info = TaskRegistry.get_task_info("binding_affinity")
print(f"Recommended LoRA rank: {info.recommended_rank}")

# Create a pipeline automatically
pipeline = create_finetuning_pipeline(
    task="binding_affinity",
    base_model=model,
    strategy="lora",
)

# Option 2: Manual setup
from finetuning import FineTuningConfig, Trainer

# 1. Apply LoRA (only ~0.1% of parameters trainable)
lora_model = LoRAModule(model, rank=8, alpha=16.0)

# 2. Add a task-specific head
affinity_head = AffinityHead(AffinityHeadConfig())

# 3. Train (train_loader / val_loader are your task's data loaders)
config = FineTuningConfig(
    strategy="lora",
    task="binding_affinity",
    lora_rank=8,
)
trainer = Trainer(lora_model, config, train_loader, val_loader)
trainer.train()

# 4. Save the lightweight LoRA weights
lora_model.save_lora_weights("./lora_weights.pt")

Module Overview

finetuning/
├── configs/           # Configuration classes
│   ├── base_config.py      # FineTuningConfig, ModelConfig, TrainingConfig
│   ├── lora_config.py      # LoRA-specific configuration
│   └── task_config.py      # 25+ task configurations (ProteinBase-style)
├── modules/           # Fine-tuning modules
│   ├── lora.py             # LoRA implementation (PyTorch & JAX)
│   ├── adapter.py          # Adapter modules
│   └── prompt_tuning.py    # Prompt tuning
├── heads/             # Task-specific prediction heads (15+ specialized heads)
│   ├── affinity_head.py    # Binding affinity (Boltz-2 style)
│   ├── property_head.py    # Protein property prediction
│   ├── contact_head.py     # Contact prediction
│   ├── antibody_head.py    # Affinity maturation, humanization, developability
│   ├── ppi_head.py         # PPI binding, interface, hot spots
│   ├── enzyme_head.py      # Activity, specificity, evolution
│   ├── function_head.py    # GO terms, EC numbers, localization
│   └── epitope_head.py     # B-cell, T-cell epitopes, immunogenicity
├── trainers/          # Training utilities
│   ├── trainer.py          # Main trainer class
│   ├── distributed_trainer.py  # Multi-GPU training
│   └── callbacks.py        # Training callbacks (EarlyStopping, Wandb, etc.)
├── data/              # Data utilities
│   ├── datasets.py         # 10+ dataset classes for all task types
│   └── transforms.py       # Data augmentation (rotation, MSA dropout)
├── examples/          # Tutorial notebooks
│   └── finetuning_tutorial.ipynb  # Complete walkthrough
├── registry.py        # Task registry and factory pattern
└── utils/             # Utility functions
    ├── checkpoint.py       # Model checkpointing
    └── metrics.py          # Evaluation metrics (lDDT, TM-score, AUROC, etc.)
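
As a flavour of what utils/metrics.py evaluates, here is a simplified global lDDT over C-alpha coordinates (a sketch only: the full metric is computed per residue and excludes within-residue pairs, which this version glosses over):

import numpy as np

def lddt(pred, ref, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Fraction of local reference distances preserved in the prediction,
    averaged over the tolerance thresholds. pred, ref: (N, 3) coordinates."""
    dp = np.linalg.norm(pred[:, None] - pred[None, :], axis=-1)   # predicted
    dr = np.linalg.norm(ref[:, None] - ref[None, :], axis=-1)     # reference
    mask = (dr < cutoff) & ~np.eye(len(ref), dtype=bool)          # local pairs
    diff = np.abs(dp - dr)[mask]
    return np.mean([(diff < t).mean() for t in thresholds])

rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 3)) * 10.0
print(lddt(ref + 0.3 * rng.normal(size=(20, 3)), ref))  # high, near 1.0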

Blogs

References

Reference papers

📦 AlphaFold2 Reference Source Code (Submodules)

# Official AlphaFold (DeepMind)
AF2-Ref-src/alphafold-official/

# OpenFold (PyTorch implementation)
AF2-Ref-src/openfold/

# ColabFold (Colab-friendly version)
AF2-Ref-src/colabfold/

# MMseqs2 (Sequence search)
AF2-Ref-src/mmseqs2/

# HH-suite (Template search)
AF2-Ref-src/hh-suite/

# trRosetta2 (Predecessor model)
AF2-Ref-src/trRosetta2/

# ESM (Facebook protein language model)
AF2-Ref-src/esm/

# UniRep (Protein representations)
AF2-Ref-src/unirep/

# SeqVec (Sequence embeddings)
AF2-Ref-src/seqvec/

To initialize submodules after cloning:

git submodule update --init --recursive

Data availability

All input data are freely available from public sources.

Structures from the PDB were used for training and as templates (https://www.wwpdb.org/ftp/pdb-ftp-sites; for the associated sequence data and 40% sequence clustering see also https://ftp.wwpdb.org/pub/pdb/derived_data/ and https://cdn.rcsb.org/resources/sequence/clusters/bc-40.out).

Training used a version of the PDB downloaded 28/08/2019, while CASP14 template search used a version downloaded 14/05/2020. Template search also used the PDB70 database, downloaded 13/05/2020 (https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/).

We show experimental structures from the PDB with accessions 6Y4F, 6YJ1, 6VR4, 6SK0, 6FES, 6W6W, 6T1Z, and 7JTL.

For MSA lookup at both training and prediction time, we used UniRef90 v2020_01 (https://ftp.ebi.ac.uk/pub/databases/uniprot/previous_releases/release-2020_01/uniref/), BFD (https://bfd.mmseqs.com), Uniclust30 v2018_08 (https://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/), and MGnify clusters v2018_12 (https://ftp.ebi.ac.uk/pub/databases/metagenomics/peptide_database/2018_12/). Uniclust30 v2018_08 was further used as input for constructing a distillation structure dataset.

Code and software availability

Source code

Source code for the AlphaFold model, trained weights, and an inference script are available under an open-source license at https://github.com/deepmind/alphafold.

Neural networks

Neural networks were developed with TensorFlow v1, Sonnet v1, JAX v0.1.69, and Haiku v0.0.4.

MSA search

For MSA search on UniRef90, MGnify clusters, and reduced BFD we used jackhmmer, and for template search on the PDB SEQRES we used hmmsearch, both from HMMER v3.3 (http://eddylab.org/software/hmmer/).

For template search against PDB70, we used HHsearch from HH-suite v3.0-beta.3 14/07/2017 (https://github.com/soedinglab/hh-suite). For constrained relaxation of structures, we used OpenMM v7.3.1 (https://github.com/openmm/openmm) with the Amber99sb force field.

Docking analysis

Docking analysis on DGAT used

Data analysis

Data analysis used

Structure analysis

Structure analysis used PyMOL v2.3.0 (https://github.com/schrodinger/pymol-open-source).
