Protein-Ligand Docking with AlphaFold2 Structures

Combine AlphaFold2 structure prediction with molecular docking for drug discovery. Learn the complete workflow from structure preparation to virtual screening and hit identification.

#Why Dock to AlphaFold2 Structures?

For most drug targets, experimental structures don't exist. AlphaFold2 enables structure-based drug design for:

Novel therapeutic targets without crystal structures
Isoform-specific inhibitor design
Proteins from non-model organisms
Orphan GPCRs and understudied targets

Success Stories

Several groups have successfully identified hit compounds using AlphaFold2 structures for docking, with experimental validation confirming activity.

#Structure Preparation

Step 1: Quality Assessment

Before docking, ensure your structure is suitable:

pLDDT > 80: For binding site residues
pTM > 0.7: Overall structure confidence
Low PAE: Binding site region should be well-defined
No clashes: Check for steric conflicts

Red Flags

Don't use for docking if:

Binding site has pLDDT < 70
Known to exist in multiple conformations
Significant disagreement between 5 models

Step 2: Protein Preparation

python

# Using Biopython and RDKit
from Bio.PDB import PDBParser, PDBIO, Select
from rdkit import Chem
from rdkit.Chem import AllChem

# Load AlphaFold2 structure
parser = PDBParser()
structure = parser.get_structure('protein', 'alphafold_model.pdb')

# 1. Remove heteroatoms (keep only protein)
class ProteinSelect(Select):
    def accept_residue(self, residue):
        return residue.id[0] == ' '

io = PDBIO()
io.set_structure(structure)
io.save('protein_only.pdb', ProteinSelect())

# 2. Add hydrogens (for docking)
# Use reduce, pdb2pqr, or OpenBabel
import subprocess
subprocess.run(['reduce', 'protein_only.pdb', '>',
                'protein_H.pdb'], shell=True)

# 3. Assign charges (using pdb2pqr with PARSE force field)
subprocess.run(['pdb2pqr', '--ff=PARSE',
                'protein_H.pdb', 'protein_prepared.pqr'])

#Binding Site Identification

Option 1: Known Binding Site

If you know the binding site from literature or homology:

Identify conserved residues from sequence alignment
Use structure superposition with known complexes
Check binding site confidence (pLDDT scores)

Option 2: Binding Site Prediction

For unknown binding sites, use prediction tools:

fpocket: Geometry-based pocket detection
P2Rank: Machine learning-based prediction
SiteMap: Druggability assessment
DoGSiteScorer: Pocket detection and scoring

python

# Using fpocket for binding site detection
import subprocess

# Run fpocket
subprocess.run(['fpocket', '-f', 'protein_prepared.pdb'])

# Parse output to get top pockets
# Output: pockets ranked by druggability score

# Example output:
# Pocket 1: Score 25.3, Volume 450 Ų (likely binding site)
# Pocket 2: Score 12.1, Volume 180 Ų
# ...

#Docking Methods

AutoDock Vina

Fast and accurate, ideal for virtual screening:

bash

# 1. Prepare protein (PDBQT format)
prepare_receptor4.py -r protein_prepared.pdb -o protein.pdbqt

# 2. Define search space (binding site)
# Create config file: config.txt
center_x = 25.0
center_y = 30.0
center_z = 15.0
size_x = 20.0
size_y = 20.0
size_z = 20.0
exhaustiveness = 32

# 3. Prepare ligand
obabel ligand.sdf -O ligand.pdbqt

# 4. Run docking
vina --receptor protein.pdbqt \
     --ligand ligand.pdbqt \
     --config config.txt \
     --out ligand_docked.pdbqt

Glide (Schrödinger)

High accuracy for lead optimization:

HTVS: High-throughput virtual screening (fast)
SP: Standard precision (balanced)
XP: Extra precision (accurate, slow)

GOLD

Flexible docking with genetic algorithm:

Better for flexible binding sites
Handles metal coordination
Accurate for challenging targets

#Virtual Screening Workflow

Step 1: Compound Library Preparation

python

from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def filter_druglike(sdf_file, output_file):
    """Filter for drug-like compounds (Lipinski's Rule of 5)"""
    supplier = Chem.SDMolSupplier(sdf_file)
    writer = Chem.SDWriter(output_file)

    for mol in supplier:
        if mol is None:
            continue

        # Lipinski's Rule of 5
        mw = Descriptors.MolWt(mol)
        logp = Descriptors.MolLogP(mol)
        hbd = Descriptors.NumHDonors(mol)
        hba = Descriptors.NumHAcceptors(mol)

        if (mw <= 500 and logp <= 5 and
            hbd <= 5 and hba <= 10):
            writer.write(mol)

    writer.close()

# Example: Filter ZINC database
filter_druglike('zinc_library.sdf', 'zinc_druglike.sdf')

Step 2: High-Throughput Docking

bash

# Parallel docking with Vina
# Split library into chunks for parallelization

# For each chunk:
for ligand in ligand_library/*.pdbqt; do
    vina --receptor protein.pdbqt \
         --ligand $ligand \
         --config config.txt \
         --out docked_$(basename $ligand) &
done

wait  # Wait for all jobs to complete

# Collect results
python collect_docking_scores.py > screening_results.csv

Step 3: Hit Selection

Prioritize compounds based on:

Binding affinity: Docking score (lower = better)
Binding mode: Key interactions with active site
Ligand efficiency: Affinity per heavy atom
Druglikeness: ADME properties

python

import pandas as pd

# Load docking results
df = pd.read_csv('screening_results.csv')

# Calculate ligand efficiency
df['LE'] = -df['binding_affinity'] / df['num_heavy_atoms']

# Filter top hits
top_hits = df[(df['binding_affinity'] < -8.0) &  # Strong binding
              (df['LE'] > 0.3) &                  # Good efficiency
              (df['num_interactions'] >= 3)]       # Multiple contacts

# Sort by binding affinity
top_hits = top_hits.sort_values('binding_affinity')
print(f"Selected {len(top_hits)} hits for experimental validation")

#Result Analysis

Binding Mode Analysis

Analyze predicted protein-ligand interactions:

Hydrogen bonds: Donor-acceptor distances < 3.5Å
Hydrophobic contacts: Carbon-carbon distances < 4.5Å
π-π stacking: Aromatic ring interactions
Salt bridges: Charged residue interactions

python

from rdkit import Chem
from rdkit.Chem import AllChem

def analyze_interactions(protein_pdb, ligand_mol2):
    """Identify protein-ligand interactions"""

    # Load structures
    protein = Chem.MolFromPDBFile(protein_pdb)
    ligand = Chem.MolFromMol2File(ligand_mol2)

    # Find interactions
    interactions = {
        'h_bonds': [],
        'hydrophobic': [],
        'pi_stacking': []
    }

    # Analyze distances between protein and ligand atoms
    # Classify by interaction type

    return interactions

# Visualize in PyMOL
interactions = analyze_interactions('complex.pdb', 'ligand.mol2')
print(f"H-bonds: {interactions['h_bonds']}")
print(f"Hydrophobic: {interactions['hydrophobic']}")

Visualization

python

# PyMOL visualization script
pymol -c << EOF
load protein.pdb
load ligand_docked.pdb

# Style protein
hide everything, protein
show cartoon, protein
color grey80, protein

# Style ligand
show sticks, ligand
color yellow, ligand
util.cbay ligand

# Show binding site residues
select binding_site, (protein within 5 of ligand)
show sticks, binding_site
color green, binding_site

# Show interactions
distance h_bonds, (ligand), (binding_site), mode=2
hide labels, h_bonds

# Save image
ray 1200, 1200
png complex.png
EOF

#Experimental Validation

Biophysical Assays

SPR/BLI: Direct binding measurement (Kd)
ITC: Binding thermodynamics
DSF: Thermal shift assay (ligand stabilization)
NMR: Binding site mapping

Functional Assays

Enzyme inhibition assays (IC50)
Cell-based assays (EC50, cell viability)
Reporter gene assays
Phenotypic screening

Hit Rate Expectations

Virtual screening hit rate: 1-5% typically
Validated binders (SPR): 10-30% of hits
Active in functional assay: 5-15% of binders

#Advanced Techniques

Induced-Fit Docking

Account for protein flexibility upon ligand binding:

Allow side-chain flexibility in binding site
Refine poses with local minimization
Better for flexible binding sites

MD Refinement

Refine docking poses with molecular dynamics:

bash

# Short MD simulation (10-50 ns) to refine pose
# 1. Prepare complex
# 2. Solvate and add ions
# 3. Energy minimize
# 4. Equilibrate (NPT)
# 5. Production MD

# Calculate binding free energy with MM-PBSA
gmx_MMPBSA -i mmpbsa.in -cs complex.tpr -ci index.ndx

Fragment-Based Design

Build up ligands from fragment hits:

Dock fragment library (MW < 300)
Identify hot spots for each subpocket
Link fragments to design larger inhibitors
Optimize with structure-based design

#Case Studies

Case 1: Kinase Inhibitor Discovery

bash

Target: Novel kinase (no experimental structure)
AlphaFold2: pTM 0.88, ATP-binding site pLDDT > 90

Virtual screening:
- Library: 50,000 kinase-focused compounds
- Docking: AutoDock Vina
- Top 100 hits selected (score < -9.0 kcal/mol)

Experimental validation:
- 47/100 showed binding by SPR (47% hit rate!)
- 12/47 active in enzyme assay (IC50 < 10 μM)
- Best hit: IC50 = 850 nM

Lead optimization ongoing

Case 2: GPCR Ligand Identification

bash

Target: Orphan GPCR
AlphaFold2-Multimer: Predicted with Gα protein for active state

Approach:
- Homology with known GPCR ligands
- Docked library of GPCR-focused compounds
- Selected 50 compounds for testing

Results:
- 3 compounds showed β-arrestin recruitment (EC50 10-50 μM)
- Confirmed first known agonists for this receptor
- Used for deorphanization studies

#Tools and Software

AutoDock Vina: Fast, free docking (recommended)
Schrödinger Suite: Commercial, high accuracy
GOLD: Flexible docking specialist
rDock: Fast, free, good for VS
PyMOL: Visualization and analysis

Need Drug Discovery Support?

Partner with us for structure-based drug design projects

We're always excited to hear about new ideas and challenges!

Let's Chat

#Best Practices Summary

Docking Workflow Checklist

✓ Validate structure quality before docking (pLDDT, pTM)
✓ Properly prepare protein (hydrogens, charges)
✓ Define binding site carefully (pLDDT > 80)
✓ Filter compound library for drug-likeness
✓ Analyze binding modes, not just scores
✓ Validate top hits experimentally (SPR/ITC)
✓ Consider protein flexibility and solvation

Protein-Ligand Docking with AlphaFold2 Structures

On This Page

#Why Dock to AlphaFold2 Structures?

Success Stories

#Structure Preparation

Step 1: Quality Assessment

Red Flags

Step 2: Protein Preparation

#Binding Site Identification

Option 1: Known Binding Site

Option 2: Binding Site Prediction

#Docking Methods

AutoDock Vina

Glide (Schrödinger)

GOLD

#Virtual Screening Workflow

Step 1: Compound Library Preparation

Step 2: High-Throughput Docking

Step 3: Hit Selection

#Result Analysis

Binding Mode Analysis

Visualization

#Experimental Validation

Biophysical Assays

Functional Assays

Hit Rate Expectations

#Advanced Techniques

Induced-Fit Docking

MD Refinement

Fragment-Based Design

#Case Studies

Case 1: Kinase Inhibitor Discovery

Case 2: GPCR Ligand Identification

#Tools and Software

Need Drug Discovery Support?

#Best Practices Summary

Docking Workflow Checklist

Related Articles

Membrane Protein Structure Prediction: GPCRs, Ion Channels, and Transporters

Antibody-Antigen Structure Prediction for Therapeutic Development

Drug Discovery with AlphaFold2: A Complete Workflow