Protein-Ligand Docking with AlphaFold2 Structures
Back to Academy
case-study20 min read

Protein-Ligand Docking with AlphaFold2 Structures

Integrate structure prediction with virtual screening: prepare structures, define binding sites, and discover hit compounds.

P

Protogen Team

Drug Discovery

February 10, 2025

Combine AlphaFold2 structure prediction with molecular docking for drug discovery. Learn the complete workflow from structure preparation to virtual screening and hit identification.

#Why Dock to AlphaFold2 Structures?

For most drug targets, experimental structures don't exist. AlphaFold2 enables structure-based drug design for:

  • Novel therapeutic targets without crystal structures
  • Isoform-specific inhibitor design
  • Proteins from non-model organisms
  • Orphan GPCRs and understudied targets

Success Stories

Several groups have successfully identified hit compounds using AlphaFold2 structures for docking, with experimental validation confirming activity.

#Structure Preparation

Step 1: Quality Assessment

Before docking, ensure your structure is suitable:

  • pLDDT > 80: For binding site residues
  • pTM > 0.7: Overall structure confidence
  • Low PAE: Binding site region should be well-defined
  • No clashes: Check for steric conflicts

Red Flags

Don't use for docking if:
  • Binding site has pLDDT < 70
  • Known to exist in multiple conformations
  • Significant disagreement between 5 models

Step 2: Protein Preparation

python
# Using Biopython and RDKit
from Bio.PDB import PDBParser, PDBIO, Select
from rdkit import Chem
from rdkit.Chem import AllChem

# Load AlphaFold2 structure
parser = PDBParser()
structure = parser.get_structure('protein', 'alphafold_model.pdb')

# 1. Remove heteroatoms (keep only protein)
class ProteinSelect(Select):
    def accept_residue(self, residue):
        return residue.id[0] == ' '

io = PDBIO()
io.set_structure(structure)
io.save('protein_only.pdb', ProteinSelect())

# 2. Add hydrogens (for docking)
# Use reduce, pdb2pqr, or OpenBabel
import subprocess
subprocess.run(['reduce', 'protein_only.pdb', '>',
                'protein_H.pdb'], shell=True)

# 3. Assign charges (using pdb2pqr with PARSE force field)
subprocess.run(['pdb2pqr', '--ff=PARSE',
                'protein_H.pdb', 'protein_prepared.pqr'])

#Binding Site Identification

Option 1: Known Binding Site

If you know the binding site from literature or homology:

  • Identify conserved residues from sequence alignment
  • Use structure superposition with known complexes
  • Check binding site confidence (pLDDT scores)

Option 2: Binding Site Prediction

For unknown binding sites, use prediction tools:

  • fpocket: Geometry-based pocket detection
  • P2Rank: Machine learning-based prediction
  • SiteMap: Druggability assessment
  • DoGSiteScorer: Pocket detection and scoring
python
# Using fpocket for binding site detection
import subprocess

# Run fpocket
subprocess.run(['fpocket', '-f', 'protein_prepared.pdb'])

# Parse output to get top pockets
# Output: pockets ranked by druggability score

# Example output:
# Pocket 1: Score 25.3, Volume 450 Ų (likely binding site)
# Pocket 2: Score 12.1, Volume 180 Ų
# ...

#Docking Methods

AutoDock Vina

Fast and accurate, ideal for virtual screening:

bash
# 1. Prepare protein (PDBQT format)
prepare_receptor4.py -r protein_prepared.pdb -o protein.pdbqt

# 2. Define search space (binding site)
# Create config file: config.txt
center_x = 25.0
center_y = 30.0
center_z = 15.0
size_x = 20.0
size_y = 20.0
size_z = 20.0
exhaustiveness = 32

# 3. Prepare ligand
obabel ligand.sdf -O ligand.pdbqt

# 4. Run docking
vina --receptor protein.pdbqt \
     --ligand ligand.pdbqt \
     --config config.txt \
     --out ligand_docked.pdbqt

Glide (Schrödinger)

High accuracy for lead optimization:

  • HTVS: High-throughput virtual screening (fast)
  • SP: Standard precision (balanced)
  • XP: Extra precision (accurate, slow)

GOLD

Flexible docking with genetic algorithm:

  • Better for flexible binding sites
  • Handles metal coordination
  • Accurate for challenging targets

#Virtual Screening Workflow

Step 1: Compound Library Preparation

python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def filter_druglike(sdf_file, output_file):
    """Filter for drug-like compounds (Lipinski's Rule of 5)"""
    supplier = Chem.SDMolSupplier(sdf_file)
    writer = Chem.SDWriter(output_file)

    for mol in supplier:
        if mol is None:
            continue

        # Lipinski's Rule of 5
        mw = Descriptors.MolWt(mol)
        logp = Descriptors.MolLogP(mol)
        hbd = Descriptors.NumHDonors(mol)
        hba = Descriptors.NumHAcceptors(mol)

        if (mw <= 500 and logp <= 5 and
            hbd <= 5 and hba <= 10):
            writer.write(mol)

    writer.close()

# Example: Filter ZINC database
filter_druglike('zinc_library.sdf', 'zinc_druglike.sdf')

Step 2: High-Throughput Docking

bash
# Parallel docking with Vina
# Split library into chunks for parallelization

# For each chunk:
for ligand in ligand_library/*.pdbqt; do
    vina --receptor protein.pdbqt \
         --ligand $ligand \
         --config config.txt \
         --out docked_$(basename $ligand) &
done

wait  # Wait for all jobs to complete

# Collect results
python collect_docking_scores.py > screening_results.csv

Step 3: Hit Selection

Prioritize compounds based on:

  • Binding affinity: Docking score (lower = better)
  • Binding mode: Key interactions with active site
  • Ligand efficiency: Affinity per heavy atom
  • Druglikeness: ADME properties
python
import pandas as pd

# Load docking results
df = pd.read_csv('screening_results.csv')

# Calculate ligand efficiency
df['LE'] = -df['binding_affinity'] / df['num_heavy_atoms']

# Filter top hits
top_hits = df[(df['binding_affinity'] < -8.0) &  # Strong binding
              (df['LE'] > 0.3) &                  # Good efficiency
              (df['num_interactions'] >= 3)]       # Multiple contacts

# Sort by binding affinity
top_hits = top_hits.sort_values('binding_affinity')
print(f"Selected {len(top_hits)} hits for experimental validation")

#Result Analysis

Binding Mode Analysis

Analyze predicted protein-ligand interactions:

  • Hydrogen bonds: Donor-acceptor distances < 3.5Å
  • Hydrophobic contacts: Carbon-carbon distances < 4.5Å
  • π-π stacking: Aromatic ring interactions
  • Salt bridges: Charged residue interactions
python
from rdkit import Chem
from rdkit.Chem import AllChem

def analyze_interactions(protein_pdb, ligand_mol2):
    """Identify protein-ligand interactions"""

    # Load structures
    protein = Chem.MolFromPDBFile(protein_pdb)
    ligand = Chem.MolFromMol2File(ligand_mol2)

    # Find interactions
    interactions = {
        'h_bonds': [],
        'hydrophobic': [],
        'pi_stacking': []
    }

    # Analyze distances between protein and ligand atoms
    # Classify by interaction type

    return interactions

# Visualize in PyMOL
interactions = analyze_interactions('complex.pdb', 'ligand.mol2')
print(f"H-bonds: {interactions['h_bonds']}")
print(f"Hydrophobic: {interactions['hydrophobic']}")

Visualization

python
# PyMOL visualization script
pymol -c << EOF
load protein.pdb
load ligand_docked.pdb

# Style protein
hide everything, protein
show cartoon, protein
color grey80, protein

# Style ligand
show sticks, ligand
color yellow, ligand
util.cbay ligand

# Show binding site residues
select binding_site, (protein within 5 of ligand)
show sticks, binding_site
color green, binding_site

# Show interactions
distance h_bonds, (ligand), (binding_site), mode=2
hide labels, h_bonds

# Save image
ray 1200, 1200
png complex.png
EOF

#Experimental Validation

Biophysical Assays

  • SPR/BLI: Direct binding measurement (Kd)
  • ITC: Binding thermodynamics
  • DSF: Thermal shift assay (ligand stabilization)
  • NMR: Binding site mapping

Functional Assays

  • Enzyme inhibition assays (IC50)
  • Cell-based assays (EC50, cell viability)
  • Reporter gene assays
  • Phenotypic screening

Hit Rate Expectations

  • Virtual screening hit rate: 1-5% typically
  • Validated binders (SPR): 10-30% of hits
  • Active in functional assay: 5-15% of binders

#Advanced Techniques

Induced-Fit Docking

Account for protein flexibility upon ligand binding:

  • Allow side-chain flexibility in binding site
  • Refine poses with local minimization
  • Better for flexible binding sites

MD Refinement

Refine docking poses with molecular dynamics:

bash
# Short MD simulation (10-50 ns) to refine pose
# 1. Prepare complex
# 2. Solvate and add ions
# 3. Energy minimize
# 4. Equilibrate (NPT)
# 5. Production MD

# Calculate binding free energy with MM-PBSA
gmx_MMPBSA -i mmpbsa.in -cs complex.tpr -ci index.ndx

Fragment-Based Design

Build up ligands from fragment hits:

  • Dock fragment library (MW < 300)
  • Identify hot spots for each subpocket
  • Link fragments to design larger inhibitors
  • Optimize with structure-based design

#Case Studies

Case 1: Kinase Inhibitor Discovery

bash
Target: Novel kinase (no experimental structure)
AlphaFold2: pTM 0.88, ATP-binding site pLDDT > 90

Virtual screening:
- Library: 50,000 kinase-focused compounds
- Docking: AutoDock Vina
- Top 100 hits selected (score < -9.0 kcal/mol)

Experimental validation:
- 47/100 showed binding by SPR (47% hit rate!)
- 12/47 active in enzyme assay (IC50 < 10 μM)
- Best hit: IC50 = 850 nM

Lead optimization ongoing

Case 2: GPCR Ligand Identification

bash
Target: Orphan GPCR
AlphaFold2-Multimer: Predicted with Gα protein for active state

Approach:
- Homology with known GPCR ligands
- Docked library of GPCR-focused compounds
- Selected 50 compounds for testing

Results:
- 3 compounds showed β-arrestin recruitment (EC50 10-50 μM)
- Confirmed first known agonists for this receptor
- Used for deorphanization studies

#Tools and Software

  • AutoDock Vina: Fast, free docking (recommended)
  • Schrödinger Suite: Commercial, high accuracy
  • GOLD: Flexible docking specialist
  • rDock: Fast, free, good for VS
  • PyMOL: Visualization and analysis

Need Drug Discovery Support?

Partner with us for structure-based drug design projects

We're always excited to hear about new ideas and challenges!

Let's Chat

#Best Practices Summary

Docking Workflow Checklist

  • ✓ Validate structure quality before docking (pLDDT, pTM)
  • ✓ Properly prepare protein (hydrogens, charges)
  • ✓ Define binding site carefully (pLDDT > 80)
  • ✓ Filter compound library for drug-likeness
  • ✓ Analyze binding modes, not just scores
  • ✓ Validate top hits experimentally (SPR/ITC)
  • ✓ Consider protein flexibility and solvation