Combine AlphaFold2 structure prediction with molecular docking for drug discovery. Learn the complete workflow from structure preparation to virtual screening and hit identification.
#Why Dock to AlphaFold2 Structures?
For most drug targets, experimental structures don't exist. AlphaFold2 enables structure-based drug design for:
- Novel therapeutic targets without crystal structures
- Isoform-specific inhibitor design
- Proteins from non-model organisms
- Orphan GPCRs and understudied targets
Success Stories
#Structure Preparation
Step 1: Quality Assessment
Before docking, ensure your structure is suitable:
- pLDDT > 80: For binding site residues
- pTM > 0.7: Overall structure confidence
- Low PAE: Binding site region should be well-defined
- No clashes: Check for steric conflicts
Red Flags
- Binding site has pLDDT < 70
- Known to exist in multiple conformations
- Significant disagreement between 5 models
Step 2: Protein Preparation
# Using Biopython and RDKit
from Bio.PDB import PDBParser, PDBIO, Select
from rdkit import Chem
from rdkit.Chem import AllChem
# Load AlphaFold2 structure
parser = PDBParser()
structure = parser.get_structure('protein', 'alphafold_model.pdb')
# 1. Remove heteroatoms (keep only protein)
class ProteinSelect(Select):
def accept_residue(self, residue):
return residue.id[0] == ' '
io = PDBIO()
io.set_structure(structure)
io.save('protein_only.pdb', ProteinSelect())
# 2. Add hydrogens (for docking)
# Use reduce, pdb2pqr, or OpenBabel
import subprocess
subprocess.run(['reduce', 'protein_only.pdb', '>',
'protein_H.pdb'], shell=True)
# 3. Assign charges (using pdb2pqr with PARSE force field)
subprocess.run(['pdb2pqr', '--ff=PARSE',
'protein_H.pdb', 'protein_prepared.pqr'])
#Binding Site Identification
Option 1: Known Binding Site
If you know the binding site from literature or homology:
- Identify conserved residues from sequence alignment
- Use structure superposition with known complexes
- Check binding site confidence (pLDDT scores)
Option 2: Binding Site Prediction
For unknown binding sites, use prediction tools:
- fpocket: Geometry-based pocket detection
- P2Rank: Machine learning-based prediction
- SiteMap: Druggability assessment
- DoGSiteScorer: Pocket detection and scoring
# Using fpocket for binding site detection
import subprocess
# Run fpocket
subprocess.run(['fpocket', '-f', 'protein_prepared.pdb'])
# Parse output to get top pockets
# Output: pockets ranked by druggability score
# Example output:
# Pocket 1: Score 25.3, Volume 450 Ų (likely binding site)
# Pocket 2: Score 12.1, Volume 180 Ų
# ...#Docking Methods
AutoDock Vina
Fast and accurate, ideal for virtual screening:
# 1. Prepare protein (PDBQT format)
prepare_receptor4.py -r protein_prepared.pdb -o protein.pdbqt
# 2. Define search space (binding site)
# Create config file: config.txt
center_x = 25.0
center_y = 30.0
center_z = 15.0
size_x = 20.0
size_y = 20.0
size_z = 20.0
exhaustiveness = 32
# 3. Prepare ligand
obabel ligand.sdf -O ligand.pdbqt
# 4. Run docking
vina --receptor protein.pdbqt \
--ligand ligand.pdbqt \
--config config.txt \
--out ligand_docked.pdbqtGlide (Schrödinger)
High accuracy for lead optimization:
- HTVS: High-throughput virtual screening (fast)
- SP: Standard precision (balanced)
- XP: Extra precision (accurate, slow)
GOLD
Flexible docking with genetic algorithm:
- Better for flexible binding sites
- Handles metal coordination
- Accurate for challenging targets
#Virtual Screening Workflow
Step 1: Compound Library Preparation
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski
def filter_druglike(sdf_file, output_file):
"""Filter for drug-like compounds (Lipinski's Rule of 5)"""
supplier = Chem.SDMolSupplier(sdf_file)
writer = Chem.SDWriter(output_file)
for mol in supplier:
if mol is None:
continue
# Lipinski's Rule of 5
mw = Descriptors.MolWt(mol)
logp = Descriptors.MolLogP(mol)
hbd = Descriptors.NumHDonors(mol)
hba = Descriptors.NumHAcceptors(mol)
if (mw <= 500 and logp <= 5 and
hbd <= 5 and hba <= 10):
writer.write(mol)
writer.close()
# Example: Filter ZINC database
filter_druglike('zinc_library.sdf', 'zinc_druglike.sdf')Step 2: High-Throughput Docking
# Parallel docking with Vina
# Split library into chunks for parallelization
# For each chunk:
for ligand in ligand_library/*.pdbqt; do
vina --receptor protein.pdbqt \
--ligand $ligand \
--config config.txt \
--out docked_$(basename $ligand) &
done
wait # Wait for all jobs to complete
# Collect results
python collect_docking_scores.py > screening_results.csvStep 3: Hit Selection
Prioritize compounds based on:
- Binding affinity: Docking score (lower = better)
- Binding mode: Key interactions with active site
- Ligand efficiency: Affinity per heavy atom
- Druglikeness: ADME properties
import pandas as pd
# Load docking results
df = pd.read_csv('screening_results.csv')
# Calculate ligand efficiency
df['LE'] = -df['binding_affinity'] / df['num_heavy_atoms']
# Filter top hits
top_hits = df[(df['binding_affinity'] < -8.0) & # Strong binding
(df['LE'] > 0.3) & # Good efficiency
(df['num_interactions'] >= 3)] # Multiple contacts
# Sort by binding affinity
top_hits = top_hits.sort_values('binding_affinity')
print(f"Selected {len(top_hits)} hits for experimental validation")#Result Analysis
Binding Mode Analysis
Analyze predicted protein-ligand interactions:
- Hydrogen bonds: Donor-acceptor distances < 3.5Å
- Hydrophobic contacts: Carbon-carbon distances < 4.5Å
- π-π stacking: Aromatic ring interactions
- Salt bridges: Charged residue interactions
from rdkit import Chem
from rdkit.Chem import AllChem
def analyze_interactions(protein_pdb, ligand_mol2):
"""Identify protein-ligand interactions"""
# Load structures
protein = Chem.MolFromPDBFile(protein_pdb)
ligand = Chem.MolFromMol2File(ligand_mol2)
# Find interactions
interactions = {
'h_bonds': [],
'hydrophobic': [],
'pi_stacking': []
}
# Analyze distances between protein and ligand atoms
# Classify by interaction type
return interactions
# Visualize in PyMOL
interactions = analyze_interactions('complex.pdb', 'ligand.mol2')
print(f"H-bonds: {interactions['h_bonds']}")
print(f"Hydrophobic: {interactions['hydrophobic']}")Visualization
# PyMOL visualization script
pymol -c << EOF
load protein.pdb
load ligand_docked.pdb
# Style protein
hide everything, protein
show cartoon, protein
color grey80, protein
# Style ligand
show sticks, ligand
color yellow, ligand
util.cbay ligand
# Show binding site residues
select binding_site, (protein within 5 of ligand)
show sticks, binding_site
color green, binding_site
# Show interactions
distance h_bonds, (ligand), (binding_site), mode=2
hide labels, h_bonds
# Save image
ray 1200, 1200
png complex.png
EOF#Experimental Validation
Biophysical Assays
- SPR/BLI: Direct binding measurement (Kd)
- ITC: Binding thermodynamics
- DSF: Thermal shift assay (ligand stabilization)
- NMR: Binding site mapping
Functional Assays
- Enzyme inhibition assays (IC50)
- Cell-based assays (EC50, cell viability)
- Reporter gene assays
- Phenotypic screening
Hit Rate Expectations
- Virtual screening hit rate: 1-5% typically
- Validated binders (SPR): 10-30% of hits
- Active in functional assay: 5-15% of binders
#Advanced Techniques
Induced-Fit Docking
Account for protein flexibility upon ligand binding:
- Allow side-chain flexibility in binding site
- Refine poses with local minimization
- Better for flexible binding sites
MD Refinement
Refine docking poses with molecular dynamics:
# Short MD simulation (10-50 ns) to refine pose
# 1. Prepare complex
# 2. Solvate and add ions
# 3. Energy minimize
# 4. Equilibrate (NPT)
# 5. Production MD
# Calculate binding free energy with MM-PBSA
gmx_MMPBSA -i mmpbsa.in -cs complex.tpr -ci index.ndxFragment-Based Design
Build up ligands from fragment hits:
- Dock fragment library (MW < 300)
- Identify hot spots for each subpocket
- Link fragments to design larger inhibitors
- Optimize with structure-based design
#Case Studies
Case 1: Kinase Inhibitor Discovery
Target: Novel kinase (no experimental structure)
AlphaFold2: pTM 0.88, ATP-binding site pLDDT > 90
Virtual screening:
- Library: 50,000 kinase-focused compounds
- Docking: AutoDock Vina
- Top 100 hits selected (score < -9.0 kcal/mol)
Experimental validation:
- 47/100 showed binding by SPR (47% hit rate!)
- 12/47 active in enzyme assay (IC50 < 10 μM)
- Best hit: IC50 = 850 nM
Lead optimization ongoingCase 2: GPCR Ligand Identification
Target: Orphan GPCR
AlphaFold2-Multimer: Predicted with Gα protein for active state
Approach:
- Homology with known GPCR ligands
- Docked library of GPCR-focused compounds
- Selected 50 compounds for testing
Results:
- 3 compounds showed β-arrestin recruitment (EC50 10-50 μM)
- Confirmed first known agonists for this receptor
- Used for deorphanization studies#Tools and Software
- AutoDock Vina: Fast, free docking (recommended)
- Schrödinger Suite: Commercial, high accuracy
- GOLD: Flexible docking specialist
- rDock: Fast, free, good for VS
- PyMOL: Visualization and analysis
Need Drug Discovery Support?
Partner with us for structure-based drug design projects
We're always excited to hear about new ideas and challenges!
#Best Practices Summary
Docking Workflow Checklist
- ✓ Validate structure quality before docking (pLDDT, pTM)
- ✓ Properly prepare protein (hydrogens, charges)
- ✓ Define binding site carefully (pLDDT > 80)
- ✓ Filter compound library for drug-likeness
- ✓ Analyze binding modes, not just scores
- ✓ Validate top hits experimentally (SPR/ITC)
- ✓ Consider protein flexibility and solvation