Structural superposition and RMSD calculation are fundamental techniques for comparing predicted structures with experimental data and validating AlphaFold2 predictions.
#Why Structural Comparison Matters
Comparing structures allows you to:
- Validate predictions against experimental structures
- Assess model consistency across predictions
- Identify conformational differences
- Quantify structural similarity for homologs
#Understanding RMSD
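RMSD (Root Mean Square Deviation) measures the typical distance between corresponding atoms in two superposed structures: the square root of the mean of the squared atom-to-atom distances. Because distances are squared, a few badly matched atoms can dominate the value.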
The RMSD Formula
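Written out, for $N$ pairs of corresponding atoms, where $\mathbf{x}_i$ and $\mathbf{y}_i$ are the positions of the $i$-th pair after superposition, the definition above is:

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{y}_i \rVert^2}$$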
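```python
import numpy as np

def calculate_rmsd(coords1, coords2):
    """RMSD between two (N, 3) arrays of already-superposed, paired coordinates."""
    diff = np.asarray(coords1, dtype=float) - np.asarray(coords2, dtype=float)
    # Squared distance per atom, averaged over atoms, then square root
    return np.sqrt(np.mean(np.sum(diff**2, axis=1)))
```
Interpreting RMSD Values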
RMSD Ranges
- < 1.0 Å: Nearly identical structures
- 1-2 Å: Very similar, minor differences
- 2-4 Å: Moderate similarity, same fold
- > 4 Å: Significant differences or different folds
Context Matters
#Superposition Methods
Kabsch Algorithm (Least Squares)
The most common method: given paired atoms, it finds the rotation and translation that minimize the RMSD between them.
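To make the least-squares step concrete, here is a minimal NumPy sketch of the Kabsch algorithm: center both point sets, take the SVD of their covariance matrix, and correct for reflections. `kabsch_superpose` is a hypothetical helper, not part of Bio.PDB, and it assumes the two arrays are already paired row by row.

```python
import numpy as np

def kabsch_superpose(mobile, reference):
    """Least-squares superposition of mobile onto reference (both (N, 3), rows paired).

    Returns the transformed mobile coordinates and the resulting RMSD.
    """
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # 1. Remove translation by centering both sets on their centroids
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    P = mobile - mob_center
    Q = reference - ref_center
    # 2. SVD of the 3x3 covariance matrix
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # 3. Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # 4. Rotate the centered mobile set, then move it onto the reference centroid
    fitted = P @ R.T + ref_center
    rmsd = np.sqrt(np.mean(np.sum((fitted - reference) ** 2, axis=1)))
    return fitted, rmsd
```

Bio.PDB's `Superimposer`, used below, performs an equivalent SVD-based least-squares fit, so on the same atom pairs the two RMSD values should agree.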
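```python
from Bio.PDB import PDBParser, Superimposer

# Load structures (QUIET=True suppresses warnings about irregular records)
parser = PDBParser(QUIET=True)
ref_structure = parser.get_structure('reference', 'experimental.pdb')
model_structure = parser.get_structure('model', 'alphafold.pdb')

def ca_atoms(structure):
    """Cα atoms of standard residues in the first model (skips ions such as Ca2+, also named 'CA')."""
    return [res['CA'] for res in structure[0].get_residues()
            if res.id[0] == ' ' and 'CA' in res]

ref_atoms = ca_atoms(ref_structure)
model_atoms = ca_atoms(model_structure)

# Superimposer needs equal-length lists pairing the same residues in the same order
if len(ref_atoms) != len(model_atoms):
    raise ValueError("Cα lists differ in length; align the sequences first")

# Superpose: fixed (reference) atoms first, moving (model) atoms second
super_imposer = Superimposer()
super_imposer.set_atoms(ref_atoms, model_atoms)
super_imposer.apply(list(model_structure.get_atoms()))  # Move the whole model

print(f"RMSD: {super_imposer.rms:.2f} Å")
```
Sequence-Independent Methods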
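When residue correspondences cannot be taken from a sequence alignment (for example, distant homologs), these methods derive the alignment from the structures themselves: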
- TM-align: Finds optimal structural alignment
- CE (Combinatorial Extension): Fragment-based alignment
- DALI: Distance matrix-based comparison
#Practical Comparison Workflow
Step 1: Structure Preparation
- Remove heteroatoms (ligands, waters) if not relevant
- Ensure consistent chain naming
- Align sequences if needed
- Select atoms for comparison (usually Cα)
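These preparation steps are usually done with Bio.PDB or a viewer, but the filtering logic is simple enough to show directly. A dependency-light sketch, assuming standard fixed-column PDB text (not mmCIF): `read_ca_coords` is a hypothetical helper that keeps Cα atoms of one chain from ATOM records only (dropping HETATM ligands, waters, and ions), reads only the first model, and keeps only the first alternate location.

```python
import numpy as np

def read_ca_coords(pdb_text, chain_id="A"):
    """Return residue numbers and an (N, 3) array of Cα coordinates for one chain."""
    resnums, coords = [], []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # Stop after the first model (NMR files contain several)
        if not line.startswith("ATOM"):
            continue  # Skips HETATM (ligands, waters, ions) and other record types
        # PDB columns: atom name 13-16, altLoc 17, chain 22, residue number 23-26
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        if line[16] not in (" ", "A"):
            continue  # Keep only the first alternate location
        resnums.append(int(line[22:26]))
        # x, y, z occupy columns 31-38, 39-46, 47-54
        coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
    return resnums, np.array(coords)
```

The returned residue numbers make it easy to check that two structures cover the same residues before pairing their coordinates.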
Step 2: Sequence Alignment
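```python
from Bio import Align

# Global alignment scored match = 1, mismatch = 0, gaps = 0 (the PairwiseAligner
# defaults, equivalent to pairwise2.align.globalxx; Bio.pairwise2 is deprecated
# in recent Biopython releases)
aligner = Align.PairwiseAligner()
aligner.mode = "global"

seq1 = "MKTAYIAKQRQISFVKSHF"  # e.g. the experimental construct
seq2 = "MKTAYIAKQRQISFVK"     # e.g. a shorter model sequence; inputs carry no gap characters

alignments = aligner.align(seq1, seq2)
best_alignment = alignments[0]
print(f"Alignment score: {best_alignment.score}")
print(best_alignment)  # Shows both sequences with gaps inserted
```
Step 3: Structural Superposition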
Options for superposition:
- Global: Superpose entire structure
- Core: Superpose only conserved core regions
- Domain: Superpose individual domains separately
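Whichever option you choose, superposition needs residue pairs. A minimal sketch, assuming you have the two gapped rows of a pairwise alignment as strings: `aligned_index_pairs` is a hypothetical helper that maps alignment columns to 0-based residue indices, which can then be filtered to a core or domain subset before indexing Cα coordinate arrays.

```python
def aligned_index_pairs(aligned_a, aligned_b, gap="-"):
    """Map columns of a gapped pairwise alignment to (i, j) residue index pairs.

    i indexes residues of sequence A and j residues of sequence B (0-based,
    ungapped). Only columns where both sequences have a residue are kept.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("Aligned sequences must have equal length")
    pairs, i, j = [], 0, 0
    for a, b in zip(aligned_a, aligned_b):
        if a != gap and b != gap:
            pairs.append((i, j))
        # Advance each residue counter only when that sequence has a residue here
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


# Usage: global superposition uses every pair; core or domain superposition
# keeps a subset (residues 0-9 of sequence A here, a hypothetical core region)
pairs = aligned_index_pairs("MKTAYIAKQRQISFVKSHF", "MKTAYIAKQRQISFVK---")
core_pairs = [(i, j) for i, j in pairs if i < 10]
```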
#Beyond Basic RMSD
TM-score
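Unlike RMSD, which tends to grow with protein length and is dominated by the worst-fitting residues, TM-score weights well-aligned residues most heavily and normalizes by length, which makes it more suitable for comparing proteins of different sizes: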
TM-score Advantages
- Range: 0 to 1 (easier interpretation)
- Length-independent (fair comparison across protein sizes)
- > 0.5 generally indicates the same fold
- > 0.6 means strong similarity
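The per-residue term behind TM-score is easy to write down. A minimal sketch, assuming paired, already-superposed Cα coordinates; `tm_score_fixed` is a hypothetical helper. It uses the standard distance scale d0 = 1.24·(L − 15)^(1/3) − 1.8 with a floor of 0.5 Å for very short chains (following the convention in the TM-score programs). The real TM-score is the maximum over superpositions, which TM-score and TM-align search for, so a single fixed superposition gives a lower bound for the same residue pairing.

```python
import numpy as np

def tm_score_fixed(model_ca, ref_ca, l_target=None):
    """TM-score of paired, already-superposed Cα coordinates for one fixed superposition.

    Normalized by the target (reference) length, which may exceed the number of
    aligned pairs passed in.
    """
    model_ca = np.asarray(model_ca, dtype=float)
    ref_ca = np.asarray(ref_ca, dtype=float)
    if l_target is None:
        l_target = len(ref_ca)
    # Length-dependent distance scale d0 in Å (cbrt handles L < 15 without NaN)
    d0 = max(1.24 * np.cbrt(l_target - 15) - 1.8, 0.5)
    d = np.linalg.norm(model_ca - ref_ca, axis=1)
    # Each aligned pair contributes between 0 and 1; a pair at distance d0 contributes 0.5
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / l_target)
```

Because each term is bounded by 1 and divided by the target length, unaligned or distant residues simply add nothing, instead of inflating the score the way large deviations inflate RMSD.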
GDT-TS (Global Distance Test)
Used in CASP assessments, GDT-TS averages the percentage of residues (Cα atoms) within 1, 2, 4, and 8 Å of their reference positions:
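```python
import numpy as np

def calculate_gdt_ts(coords1, coords2, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """GDT-TS (0-100) for paired, already-superposed (N, 3) coordinates.

    The CASP definition searches many superpositions and keeps the best
    fraction for each cutoff; this version uses one fixed superposition.
    """
    # Per-residue distance, computed once rather than once per threshold
    distances = np.linalg.norm(np.asarray(coords1) - np.asarray(coords2), axis=1)
    fractions = [np.mean(distances <= t) for t in thresholds]
    return float(np.mean(fractions) * 100)

# GDT-TS > 50 is often taken as a sign of a good model (rule of thumb)
```
lDDT (Local Distance Difference Test)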
Evaluates local structural quality without superposition:
- Considers local environment of each residue
- No global superposition needed
- Basis of AlphaFold's pLDDT (predicted lDDT), the model's own per-residue estimate of its lDDT-Cα score
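To see why no superposition is needed, here is a simplified Cα-only sketch. For each residue it takes the neighbors within 15 Å in the reference and checks whether the same distances in the model agree within 0.5, 1, 2, and 4 Å. `simple_lddt_ca` is a hypothetical helper; it omits the stereochemistry checks and all-atom details of the official lDDT and averages over residues, so treat its values as approximate.

```python
import numpy as np

def simple_lddt_ca(model_ca, ref_ca, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified Cα-only lDDT: per-residue scores in [0, 1] and their mean.

    Superposition-free: it compares intramolecular distance matrices, so any
    rotation or translation of the model leaves the score unchanged.
    """
    model_ca = np.asarray(model_ca, dtype=float)
    ref_ca = np.asarray(ref_ca, dtype=float)
    # All-vs-all Cα distance matrices within each structure
    d_ref = np.linalg.norm(ref_ca[:, None, :] - ref_ca[None, :, :], axis=-1)
    d_model = np.linalg.norm(model_ca[:, None, :] - model_ca[None, :, :], axis=-1)
    # Pairs that count: within the inclusion radius in the reference, excluding self-pairs
    n = len(ref_ca)
    mask = (d_ref < radius) & ~np.eye(n, dtype=bool)
    diff = np.abs(d_model - d_ref)
    # For each pair, fraction of tolerances met; then summed per residue
    preserved = np.mean([(diff < t) & mask for t in thresholds], axis=0).sum(axis=1)
    n_pairs = mask.sum(axis=1)
    per_residue = np.divide(preserved, n_pairs, out=np.zeros(n), where=n_pairs > 0)
    return per_residue, float(per_residue.mean())
```

A rigid-body move of the whole model gives the same score as the untouched model, while a locally distorted loop lowers only the scores of the residues around it.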
#Validation Workflow for AlphaFold2
Comparing with Experimental Structure
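```
# Using PyMOL: start it from the shell with both structures loaded
pymol experimental.pdb alphafold.pdb

# In the PyMOL console:
# Superpose the model onto the experimental structure and print the RMSD.
# align rejects outliers over 5 refinement cycles by default, so this RMSD covers only the retained atoms.
align alphafold, experimental
# Cα RMSD over all aligned pairs, without outlier rejection
align alphafold and name CA, experimental and name CA, cycles=0

# Visualize differences (AlphaFold writes pLDDT to the B-factor column)
spectrum b, rainbow, alphafold
color green, experimental
```
Assessing Model Consistency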
Compare all 5 AlphaFold2 models:
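```python
from itertools import combinations

import numpy as np
from Bio.PDB import PDBParser, Superimposer

parser = PDBParser(QUIET=True)

def calculate_rmsd_between_pdbs(pdb1, pdb2):
    """Cα RMSD after superposition; assumes both files contain the same residues."""
    ca1 = [a for a in parser.get_structure('m1', pdb1)[0].get_atoms() if a.name == 'CA']
    ca2 = [a for a in parser.get_structure('m2', pdb2)[0].get_atoms() if a.name == 'CA']
    sup = Superimposer()
    sup.set_atoms(ca1, ca2)  # Fixed atoms first, moving atoms second
    return sup.rms

models = ['model_1.pdb', 'model_2.pdb', 'model_3.pdb',
          'model_4.pdb', 'model_5.pdb']

# One RMSD per unordered pair of models (10 pairs for 5 models)
pairwise_rmsds = [calculate_rmsd_between_pdbs(m1, m2)
                  for m1, m2 in combinations(models, 2)]
mean_rmsd = np.mean(pairwise_rmsds)
print(f"Average pairwise RMSD: {mean_rmsd:.2f} Å")

if mean_rmsd < 1.0:
    print("Highly consistent models")
elif mean_rmsd < 3.0:
    print("Moderate consistency")
else:
    print("High model variability - check PAE matrix")
```
#Domain-Level Analysis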
Handling Flexible Regions
Best Practice
- Identify rigid core from pLDDT scores (> 80)
- Calculate separate RMSD for core vs. flexible regions
- Report both global and core RMSD
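The best-practice steps above can be sketched in NumPy. This is a minimal sketch under stated assumptions: Cα coordinates already paired residue by residue, per-residue pLDDT taken from the model (AlphaFold stores it in the B-factor column), and the > 80 cutoff from the list. `fit_transform`, `rmsd`, and `core_and_global_rmsd` are hypothetical helpers. The key design choice is to superpose on the core only and then apply that same transform to every residue, so flexible regions cannot pull the fit away from the rigid core.

```python
import numpy as np

def fit_transform(mobile, reference):
    """Kabsch fit: rotation R and translation t with mobile @ R.T + t ≈ reference."""
    mc, rc = mobile.mean(axis=0), reference.mean(axis=0)
    U, _, Vt = np.linalg.svd((mobile - mc).T @ (reference - rc))
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # Guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, rc - mc @ R.T

def rmsd(a, b):
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def core_and_global_rmsd(model_ca, ref_ca, plddt, cutoff=80.0):
    """Superpose on high-pLDDT residues only, then report (core RMSD, global RMSD)."""
    model_ca = np.asarray(model_ca, dtype=float)
    ref_ca = np.asarray(ref_ca, dtype=float)
    core = np.asarray(plddt) > cutoff
    R, t = fit_transform(model_ca[core], ref_ca[core])
    moved = model_ca @ R.T + t  # Core-derived transform applied to every residue
    return rmsd(moved[core], ref_ca[core]), rmsd(moved, ref_ca)
```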
Multi-Domain Proteins
For proteins with multiple domains:
- Superpose each domain independently
- Calculate domain-wise RMSD
- Assess inter-domain orientation separately
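The three steps above can be combined in one sketch, assuming paired Cα arrays and known domain boundaries. `domain_rmsds` and `interdomain_angle` are hypothetical helpers, and the `domains` dictionary (domain name mapped to 0-based residue indices) is an illustrative input you would fill from your own domain assignment. The orientation check relies on this reasoning: if the model keeps the reference's inter-domain arrangement, the rotation that superposes domain A equals the one that superposes domain B, so the angle of the rotation separating them measures the hinge difference.

```python
import numpy as np

def fit_transform(mobile, reference):
    """Kabsch fit: rotation R and translation t with mobile @ R.T + t ≈ reference."""
    mc, rc = mobile.mean(axis=0), reference.mean(axis=0)
    U, _, Vt = np.linalg.svd((mobile - mc).T @ (reference - rc))
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # Guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, rc - mc @ R.T

def rmsd(a, b):
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def domain_rmsds(model_ca, ref_ca, domains):
    """Superpose each domain independently; return per-domain RMSD and rotation."""
    model_ca = np.asarray(model_ca, dtype=float)
    ref_ca = np.asarray(ref_ca, dtype=float)
    rmsds, rotations = {}, {}
    for name, idx in domains.items():
        R, t = fit_transform(model_ca[idx], ref_ca[idx])
        rmsds[name] = rmsd(model_ca[idx] @ R.T + t, ref_ca[idx])
        rotations[name] = R
    return rmsds, rotations

def interdomain_angle(R_a, R_b):
    """Angle (degrees) of the rotation separating two domains' superpositions.

    0 degrees means the model keeps the reference's inter-domain orientation.
    """
    R_rel = R_b @ R_a.T
    # For a rotation matrix, trace = 1 + 2 cos(angle)
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))
```

Low per-domain RMSDs combined with a large inter-domain angle is the numerical signature of correctly predicted domains in a different relative orientation.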
#Visualization Techniques
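Both PyMOL (`spectrum b`) and ChimeraX (`color byattribute bfactor`) color by the B-factor column, which in AlphaFold models holds pLDDT. To color by deviation from the reference instead, one option is to write per-residue Cα distances into that column of a copy of the file. A minimal sketch, assuming the model has already been superposed and its Cα atoms paired with the reference; `per_residue_deviation` and `write_bfactors` are hypothetical helpers working on fixed-column PDB text.

```python
import numpy as np

def per_residue_deviation(model_ca, ref_ca):
    """Cα-Cα distance (Å) for each paired residue; coordinates must already be superposed."""
    return np.linalg.norm(np.asarray(model_ca, float) - np.asarray(ref_ca, float), axis=1)

def write_bfactors(pdb_text, values_by_resnum, chain_id="A"):
    """Return PDB text with the B-factor field (columns 61-66) replaced per residue.

    values_by_resnum maps residue number to value; atoms of residues missing
    from the mapping get 0.00. Only ATOM/HETATM lines of chain_id are changed.
    """
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and line[21] == chain_id:
            value = values_by_resnum.get(int(line[22:26]), 0.0)
            line = f"{line[:60].ljust(60)}{value:6.2f}{line[66:]}"
        out.append(line)
    return "\n".join(out)
```

Write the result to a new file rather than overwriting the original model, since its pLDDT values would otherwise be lost.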
PyMOL Visualization
```
# Superpose, then color the prediction by its B-factor column (pLDDT for AlphaFold models)
pymol> align predicted, experimental
pymol> show cartoon, predicted
pymol> spectrum b, rainbow, predicted
```
UCSF ChimeraX
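```
# Load and superpose the structures (matchmaker reports the RMSD)
open experimental.pdb
open alphafold.pdb
matchmaker #2 to #1
# Color the model by its B-factor column (pLDDT for AlphaFold models)
color byattribute bfactor #2 palette rainbow
# Show the experimental structure as a translucent surface
surface #1
transparency #1 70
```
#Case Studies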
Case 1: High-Accuracy Prediction
Protein: Well-studied kinase (350 residues)
Global RMSD (Cα): 0.87 Å
Core RMSD: 0.52 Å
TM-score: 0.94
Assessment: Excellent agreement with experimental structure.
Loop regions show minor differences (expected flexibility).
Case 2: Conformational Differences
Protein: Allosteric enzyme
Global RMSD: 3.2 Å
Domain A RMSD: 0.9 Å
Domain B RMSD: 1.1 Å
Assessment: Individual domains well-predicted, but relative
orientation differs. Likely different conformational state.
AlphaFold predicted the apo state; the experimental structure is ligand-bound.
#Tools and Software
- PyMOL: Interactive visualization and alignment
- ChimeraX: Advanced analysis and scripting
- TM-align: Structure alignment and TM-score
- Bio.PDB (Python): Programmatic structure analysis
- ProDy: Protein dynamics and normal mode analysis
#Best Practices Summary
Structural Comparison Checklist
- ✓ Use Cα atoms for standard RMSD comparisons
- ✓ Report both global and core RMSD
- ✓ Consider TM-score for length-independent comparison
- ✓ Exclude flexible regions when appropriate
- ✓ Compare all 5 models for consistency
- ✓ Visualize differences, don't just rely on numbers
- ✓ Consider biological context (conformational states)