In [ ]:
from IPython.core.display import HTML

def _set_css_style(css_file_path):
   """
   Read the custom CSS file and load it into Jupyter.
   Pass the file path to the CSS file.
   """

   styles = open(css_file_path, "r").read()
   s = '<style>%s</style>' % styles     
   return HTML(s)

_set_css_style('rise.css')

Molecular dynamics analysis¶

print view
notebook

Molecular dynamics¶

The simulation of the physical motions of atoms using classical physics (Newton's equations of motion).

  • Forces determined by a force field
  • Biological force fields are primarily calibrated for proteins
  • Simulation done in a small box (usually only a few proteins)
  • Result is an approximation

Why simulate?¶

  • assess stability of model
  • observe dynamics of interactions
  • observe effects of mutations
  • compute properties of the ensemble
  • ...

A typically simulation ranges from 10ns to 100ns, so we are limited in the sorts of processes that can be observed.

MD packages¶

Amber http://ambermd.org

  • Fastest GPU implementation (in my experience)

Gromacs http://www.gromacs.org

  • Open-source (LGPL)

NAMD http://www.ks.uiuc.edu/Research/namd/

  • Highly optimized for cluster computing
  • Integrated with VMD

LAMMPS http://lammps.sandia.gov

  • Open-source (GPL)

Forcefields¶

The forcefield determines what forces are applied to each atom. For example, electrostatic interactions and torsion angle potentials. Examples of force field families:

  • CHARMM
  • Gromacs
  • OPLS
  • Amber

$$V(r^N)=\sum_\text{bonds} k_b (l-l_0)^2 + \sum_\text{angles} k_a (\theta - \theta_0)^2 $$ $$+ \sum_\text{torsions} \frac{1}{2} V_n [1+\cos(n \omega- \gamma)] +\sum_{j=1} ^{N-1} \sum_{i=j+1} ^N \biggl\{\epsilon_{i,j}\biggl[\left(\frac{r_{0ij}}{r_{ij}} \right)^{12} - 2\left(\frac{r_{0ij}}{r_{ij}} \right)^{6} \biggr]+ \frac{q_iq_j}{4\pi \epsilon_0 r_{ij}}\biggr\} $$

Timestep¶

Every timestep of the simulation (1 or 2 femtoseconds (10-15s)) all the forces exerted on every atom are calculated and the positions and velocities are updated appropriately according to Newton's laws of motion.

File formats¶

Different packages use different file formats and have different approaches to setting up and running a simulation. Typically, to start a simulation you need

  • a configuration file that specifies how the simulation should be run
  • a topology of your system
  • initial coordinates of your system (may include velocities)

The output of the simulation is a trajectory file.

In [1]:
!wget http://mscbio2025.csb.pitt.edu/files/shmt2.prmtop
!wget http://mscbio2025.csb.pitt.edu/files/shmt2.dcd
!wget http://mscbio2025.csb.pitt.edu/files/shmt2.inpcrd
--2025-10-08 12:17:12--  http://mscbio2025.csb.pitt.edu/files/shmt2.prmtop
Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139
Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17559824 (17M)
Saving to: ‘shmt2.prmtop’

shmt2.prmtop        100%[===================>]  16.75M  93.9MB/s    in 0.2s    

2025-10-08 12:17:12 (93.9 MB/s) - ‘shmt2.prmtop’ saved [17559824/17559824]

--2025-10-08 12:17:12--  http://mscbio2025.csb.pitt.edu/files/shmt2.dcd
Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139
Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 102836596 (98M) [application/DCD]
Saving to: ‘shmt2.dcd’

shmt2.dcd           100%[===================>]  98.07M  94.5MB/s    in 1.0s    

2025-10-08 12:17:13 (94.5 MB/s) - ‘shmt2.dcd’ saved [102836596/102836596]

--2025-10-08 12:17:13--  http://mscbio2025.csb.pitt.edu/files/shmt2.inpcrd
Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139
Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3127961 (3.0M)
Saving to: ‘shmt2.inpcrd’

shmt2.inpcrd        100%[===================>]   2.98M  --.-KB/s    in 0.03s   

2025-10-08 12:17:13 (95.1 MB/s) - ‘shmt2.inpcrd’ saved [3127961/3127961]

Configuration file¶

These are usually plain text and extremely obtuse without reading the documentation.

Amber:

    md of hsp27 implicit
     &cntrl
      imin = 0, ntb = 0,
      igb = 1, ntpr = 100, ntwx = 5000, ntwr = 5000, ntpr = 5000,
      ntt = 3, gamma_ln = 1.0,
      tempi = 300.0, temp0 = 300.0
      nstlim = 50000000, dt = 0.002,
      ntc = 2, 
      cut = 999
     /
     
 

Topology¶

The topology stores connectivity and atom type information about a model, but no coordinates.

  • .prmtop - Amber
  • .psf - NAMD
  • .top - Gromacs

Typically you will create a topology from a PDB using the tools provided with the simulation package. For example, you might use tleap in Amber to solvate the protein and create initial coordinates and a topology that includes information about your chosen forcefield.

In [ ]:
!head -15 ../files/shmt2.prmtop

Coordinates¶

A coordinate file provides the (x,y,z) coordinates of each atom in your system.

For most coordinate formats, the file is useless without the topology.

Some formats also include velocities and/or energies and can be used to restart the simulation.

  • .pdb - Gromacs, NAMD
  • .inpcrd - Amber
  • .rst - Amber Restart
In [ ]:
!head ../files/shmt2.inpcrd

Trajectory¶

The trajectory is the result of the simulation. These files store the coordinates of every atom of the simulation for every output timestep of the simulation.

These files can be huge, so the minimum amount of information is usually stored (i.e. just coordinates) and the file is useless without the topology.

  • .trj, .trr - Gromacs full trajectory
  • .xtc - Gromacs compressed trajectory
  • .dcd - NAMD
  • .mdcrd, .nc - Amber
In [ ]:
!head ../files/shmt2.dcd

mdanalysis¶

https://code.google.com/p/mdanalysis/

"MDAnalysis is an object-oriented python toolkit to analyze molecular dynamics trajectories generated by CHARMM, Gromacs, NAMD, LAMMPS, or Amber."

In [ ]:
import MDAnalysis

universe = MDAnalysis.Universe('shmt2.prmtop', 'shmt2.dcd')

MDAnalysis starts with a topology and a trajectory.

Atom groups¶

In [ ]:
universe.atoms

You can select a specific group of atoms (very similar to ProDy) using atom selections.

In [ ]:
 universe.select_atoms("protein")

Selections can work directly on AtomGroups

In [ ]:
 universe.select_atoms("resname PRO")
In [ ]:
 universe.select_atoms("byres around 5 resid 370")
In [ ]:
prot = universe.select_atoms("protein")
prot.select_atoms("byres around 5 resid 370") #select whole residues within 5 of residue 100

Like ProDy, can iterate over atoms

In [ ]:
for a in prot.atoms[:3]:
    print(a)

Note that atoms retain information about residues (but generally not chains)

In [ ]:
print(a.resid,a.resname,a.resnum)

Residues¶

AtomGroups can also be traversed and viewed at a residue level

In [ ]:
prot.residues

Trajectories¶

In [ ]:
universe.trajectory

The coordinates of atoms are determined by the current position in the trajectory (trajectory.frame)

The coordinates of selections refer to whatever the current trajectory frame is

The current frame is set by iterating over the trajectory or indexing into it.

In [ ]:
for ts in universe.trajectory[:5]:
    print(ts.frame, universe.trajectory.frame, ts.time, prot.center_of_mass())

Analysis¶

A number of packages have been contributed to MDAnalysis to perform common tasks.

In [ ]:
import MDAnalysis.analysis

PACKAGE CONTENTS

  • align - aligning structures
  • contacts - native contact analysis
  • density - compute water densities
  • distances - for computing distances
  • gnm
  • hbonds - hydrogen bond analysis
  • helanal - analysis of helices
  • hole - for analyzing pores
  • leaflet
  • nuclinfo - analysis of nucleic acids
  • psa - path simularity
  • rms
  • waterdynamics - water analysis
  • x3dna - a different nucleic analysis

Alignment¶

In [ ]:
from MDAnalysis.analysis.align import *

Can align a single structure with alignto

Use AlignTraj to align and write out a full trajectory (trajectories are not kept in memory)

In [ ]:
universe.trajectory[0]
#if we align to ourselves, will fit to current frame
alignment = AlignTraj(universe, universe, select='protein',filename='rmsfit.dcd')
alignment.run()

MDAnalysis.rms¶

In [ ]:
from MDAnalysis.analysis.rms import *  #this pulls in an rmsd function

Root mean squared deviation (RMSD) $$\sqrt{\frac{\sum_i^n(x_i^a-x_i^b)^2+(y_i^a-y_i^b)^2+(z_i^a-z_i^b)^2}{n}}$$

In [ ]:
universe.trajectory[0] #sets the current frame to the start
refcoord = prot.positions # once stored, _coordinates_ do NOT change with trajectory
refcoord
In [ ]:
universe.trajectory[-1] #last frame
print(rmsd(refcoord,prot.positions))

Plot both the all-protein and calpha only (name CA) rmsd with the first frame.

In [ ]:
universe.trajectory[0]
protref = prot.positions
caref = prot.select_atoms('name CA').positions
In [ ]:
protrmsd = []
carmsd = []
for ts in universe.trajectory:
    protrmsd.append(rmsd(protref,prot.positions))
    carmsd.append(rmsd(caref,prot.select_atoms('name CA').positions))
In [ ]:
import matplotlib.pylab as plt
%matplotlib inline
In [ ]:
n = universe.trajectory.n_frames
plt.plot(range(n),protrmsd,range(n),carmsd)
plt.xlabel("Frame #")
plt.ylabel('RMSD')
plt.legend(['Protein','CA'],loc='lower right');

Activity¶

  • Compute RMSDs
    • between frame 0 and frame 40
    • between frame 40 and frame 80
    • between frame 0 and frame 80
  • Compute all RMSDs (put in a list)
    • to starting frame
    • to ending frame
  • Plot start/end RMSDs
  • Align protein to only resids 760-924, redo above
In [ ]:
import MDAnalysis
from MDAnalysis.analysis.rms import *  #this pulls in an rmsd function

universe = MDAnalysis.Universe('../files/shmt2.prmtop', '../files/shmt2.dcd')
prot = universe.select_atoms('protein')

startref = prot.positions
universe.trajectory[-1]
endref = prot.positions  

startrmsd = []
endrmsd = []
for ts in universe.trajectory:
    startrmsd.append(rmsd(startref,prot.positions))
    endrmsd.append(rmsd(endref,prot.positions))