from IPython.core.display import HTML
def _set_css_style(css_file_path):
"""
Read the custom CSS file and load it into Jupyter.
Pass the file path to the CSS file.
"""
styles = open(css_file_path, "r").read()
s = '<style>%s</style>' % styles
return HTML(s)
_set_css_style('rise.css')
Molecular dynamics analysis¶
Molecular dynamics¶
The simulation of the physical motions of atoms using classical physics (Newton's equations of motion).
- Forces determined by a force field
- Biological force fields are primarily calibrated for proteins
- Simulation done in a small box (usually only a few proteins)
- Result is an approximation
Why simulate?¶
- assess stability of model
- observe dynamics of interactions
- observe effects of mutations
- compute properties of the ensemble
- ...
A typically simulation ranges from 10ns to 100ns, so we are limited in the sorts of processes that can be observed.
MD packages¶
Amber http://ambermd.org
- Fastest GPU implementation (in my experience)
Gromacs http://www.gromacs.org
- Open-source (LGPL)
NAMD http://www.ks.uiuc.edu/Research/namd/
- Highly optimized for cluster computing
- Integrated with VMD
LAMMPS http://lammps.sandia.gov
- Open-source (GPL)
Forcefields¶
The forcefield determines what forces are applied to each atom. For example, electrostatic interactions and torsion angle potentials. Examples of force field families:
- CHARMM
- Gromacs
- OPLS
- Amber
$$V(r^N)=\sum_\text{bonds} k_b (l-l_0)^2 + \sum_\text{angles} k_a (\theta - \theta_0)^2 $$ $$+ \sum_\text{torsions} \frac{1}{2} V_n [1+\cos(n \omega- \gamma)] +\sum_{j=1} ^{N-1} \sum_{i=j+1} ^N \biggl\{\epsilon_{i,j}\biggl[\left(\frac{r_{0ij}}{r_{ij}} \right)^{12} - 2\left(\frac{r_{0ij}}{r_{ij}} \right)^{6} \biggr]+ \frac{q_iq_j}{4\pi \epsilon_0 r_{ij}}\biggr\} $$
Timestep¶
Every timestep of the simulation (1 or 2 femtoseconds (10-15s)) all the forces exerted on every atom are calculated and the positions and velocities are updated appropriately according to Newton's laws of motion.
File formats¶
Different packages use different file formats and have different approaches to setting up and running a simulation. Typically, to start a simulation you need
- a configuration file that specifies how the simulation should be run
- a topology of your system
- initial coordinates of your system (may include velocities)
The output of the simulation is a trajectory file.
!wget http://mscbio2025.csb.pitt.edu/files/shmt2.prmtop
!wget http://mscbio2025.csb.pitt.edu/files/shmt2.dcd
!wget http://mscbio2025.csb.pitt.edu/files/shmt2.inpcrd
--2025-10-08 12:17:12-- http://mscbio2025.csb.pitt.edu/files/shmt2.prmtop Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139 Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 17559824 (17M) Saving to: ‘shmt2.prmtop’ shmt2.prmtop 100%[===================>] 16.75M 93.9MB/s in 0.2s 2025-10-08 12:17:12 (93.9 MB/s) - ‘shmt2.prmtop’ saved [17559824/17559824] --2025-10-08 12:17:12-- http://mscbio2025.csb.pitt.edu/files/shmt2.dcd Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139 Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 102836596 (98M) [application/DCD] Saving to: ‘shmt2.dcd’ shmt2.dcd 100%[===================>] 98.07M 94.5MB/s in 1.0s 2025-10-08 12:17:13 (94.5 MB/s) - ‘shmt2.dcd’ saved [102836596/102836596] --2025-10-08 12:17:13-- http://mscbio2025.csb.pitt.edu/files/shmt2.inpcrd Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139 Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 3127961 (3.0M) Saving to: ‘shmt2.inpcrd’ shmt2.inpcrd 100%[===================>] 2.98M --.-KB/s in 0.03s 2025-10-08 12:17:13 (95.1 MB/s) - ‘shmt2.inpcrd’ saved [3127961/3127961]
Configuration file¶
These are usually plain text and extremely obtuse without reading the documentation.
Amber:
md of hsp27 implicit
&cntrl
imin = 0, ntb = 0,
igb = 1, ntpr = 100, ntwx = 5000, ntwr = 5000, ntpr = 5000,
ntt = 3, gamma_ln = 1.0,
tempi = 300.0, temp0 = 300.0
nstlim = 50000000, dt = 0.002,
ntc = 2,
cut = 999
/
Topology¶
The topology stores connectivity and atom type information about a model, but no coordinates.
- .prmtop - Amber
- .psf - NAMD
- .top - Gromacs
Typically you will create a topology from a PDB using the tools provided with the simulation package. For example, you might use tleap in Amber to solvate the protein and create initial coordinates and a topology that includes information about your chosen forcefield.
!head -15 ../files/shmt2.prmtop
Coordinates¶
A coordinate file provides the (x,y,z) coordinates of each atom in your system.
For most coordinate formats, the file is useless without the topology.
Some formats also include velocities and/or energies and can be used to restart the simulation.
- .pdb - Gromacs, NAMD
- .inpcrd - Amber
- .rst - Amber Restart
!head ../files/shmt2.inpcrd
Trajectory¶
The trajectory is the result of the simulation. These files store the coordinates of every atom of the simulation for every output timestep of the simulation.
These files can be huge, so the minimum amount of information is usually stored (i.e. just coordinates) and the file is useless without the topology.
- .trj, .trr - Gromacs full trajectory
- .xtc - Gromacs compressed trajectory
- .dcd - NAMD
- .mdcrd, .nc - Amber
!head ../files/shmt2.dcd
mdanalysis¶
https://code.google.com/p/mdanalysis/
"MDAnalysis is an object-oriented python toolkit to analyze molecular dynamics trajectories generated by CHARMM, Gromacs, NAMD, LAMMPS, or Amber."
import MDAnalysis
universe = MDAnalysis.Universe('shmt2.prmtop', 'shmt2.dcd')
MDAnalysis starts with a topology and a trajectory.
Atom groups¶
universe.atoms
You can select a specific group of atoms (very similar to ProDy) using atom selections.
universe.select_atoms("protein")
Selections can work directly on AtomGroups
universe.select_atoms("resname PRO")
universe.select_atoms("byres around 5 resid 370")
prot = universe.select_atoms("protein")
prot.select_atoms("byres around 5 resid 370") #select whole residues within 5 of residue 100
Like ProDy, can iterate over atoms
for a in prot.atoms[:3]:
print(a)
Note that atoms retain information about residues (but generally not chains)
print(a.resid,a.resname,a.resnum)
Residues¶
AtomGroups can also be traversed and viewed at a residue level
prot.residues
Trajectories¶
universe.trajectory
The coordinates of atoms are determined by the current position in the trajectory (trajectory.frame)
The coordinates of selections refer to whatever the current trajectory frame is
The current frame is set by iterating over the trajectory or indexing into it.
for ts in universe.trajectory[:5]:
print(ts.frame, universe.trajectory.frame, ts.time, prot.center_of_mass())
Analysis¶
A number of packages have been contributed to MDAnalysis to perform common tasks.
import MDAnalysis.analysis
PACKAGE CONTENTS
- align - aligning structures
- contacts - native contact analysis
- density - compute water densities
- distances - for computing distances
- gnm
- hbonds - hydrogen bond analysis
- helanal - analysis of helices
- hole - for analyzing pores
- leaflet
- nuclinfo - analysis of nucleic acids
- psa - path simularity
- rms
- waterdynamics - water analysis
- x3dna - a different nucleic analysis
Alignment¶
from MDAnalysis.analysis.align import *
Can align a single structure with alignto
Use AlignTraj to align and write out a full trajectory (trajectories are not kept in memory)
universe.trajectory[0]
#if we align to ourselves, will fit to current frame
alignment = AlignTraj(universe, universe, select='protein',filename='rmsfit.dcd')
alignment.run()
MDAnalysis.rms¶
from MDAnalysis.analysis.rms import * #this pulls in an rmsd function
Root mean squared deviation (RMSD) $$\sqrt{\frac{\sum_i^n(x_i^a-x_i^b)^2+(y_i^a-y_i^b)^2+(z_i^a-z_i^b)^2}{n}}$$
universe.trajectory[0] #sets the current frame to the start
refcoord = prot.positions # once stored, _coordinates_ do NOT change with trajectory
refcoord
universe.trajectory[-1] #last frame
print(rmsd(refcoord,prot.positions))
Plot both the all-protein and calpha only (name CA) rmsd with the first frame.
universe.trajectory[0]
protref = prot.positions
caref = prot.select_atoms('name CA').positions
protrmsd = []
carmsd = []
for ts in universe.trajectory:
protrmsd.append(rmsd(protref,prot.positions))
carmsd.append(rmsd(caref,prot.select_atoms('name CA').positions))
import matplotlib.pylab as plt
%matplotlib inline
n = universe.trajectory.n_frames
plt.plot(range(n),protrmsd,range(n),carmsd)
plt.xlabel("Frame #")
plt.ylabel('RMSD')
plt.legend(['Protein','CA'],loc='lower right');
Activity¶
- Compute RMSDs
- between frame 0 and frame 40
- between frame 40 and frame 80
- between frame 0 and frame 80
- Compute all RMSDs (put in a list)
- to starting frame
- to ending frame
- Plot start/end RMSDs
- Align protein to only resids 760-924, redo above