Protein dynamics with ProDy¶
Original slides courtesy Ahmet Bakan
- Introduction to ProDy conventions
- Handling protein structure files
- Protein dynamics calculations
- ProDy design principles
ProDy stands for Protein Dynamics,
- an API that is very suitable for interactive usage
- comes with several command line applications
ProDy is designed for normal mode analysis, but also is
- a powerful tool for handling macromolecular structures
- useful for analysis of molecular dynamics trajectories
- useful for sequence conservation and coevolution analysis (Evol)
Import from ProDy¶
In interactive sessions, the following is safe (no name conflicts)
from prody import *
# checkUpdates() # no longer works!
When developing software, prefer importing specific functions/classes or prefixing
import prody as pd
from prody import parsePDB
ProDy API naming conventions¶
Function names¶
ProDy functions start with an action verb followed by one or more names or an abbreviation/extension, i.e. doSomething:
parseEXT(): parse a file in EXT format, e.g.parsePDB,parseDCDwriteEXT(): write a file in EXT format, e.g.writePDB,writeDCDfetchSTH(): download a file, e.g.fetchPDB,fetchMSAcalcSTH(): calculate something, e.g.calcRMSD,calcGyradius,calcANMshowSTH(): show a plotting of something, e.g.showCrossCorrelations,showProteinsaveSTH(): save a ProDy object instance to disk, e.g.saveAtomsloadSTH(): save a ProDy object instance to disk, e.g.loadAtoms
Class names¶
Class names start with upper case letters and are abbreviations or words in camel case style, e.g. AtomGroup, Selection, ANM, etc.
Class method names¶
Class method naming is similar to function naming:
Class.numSomething: return number of the thing, e.g.AtomGroup.numAtomsreturns number of atomsClass.getSomething: return an attribute, e.g.AtomGroup.getNamesreturns atom namesClass.setSomething: set/update an attribute, e.g.AtomGroup.setNamessets atom namesClass.buildMatrix: builds a matrix, e.g.PCA.buildCovariancecalculates covariance matrix
Usage example¶
ubi = parsePDB('1ubi')
ubi
TIP: to see a list of available functions, just type the action word (calc, show, get, etc.), and use TAB completion.
ubi.numAtoms()
calcGyradius(ubi)
Visualization¶
%matplotlib inline
showProtein(ubi)
Better visualization¶
import py3Dmol
showProtein(ubi)
How many atoms does PDB 3ERK have?¶
1023
2849
3016
4872
File handling¶
ProDy has a custom format for directly saving atom groups.
saveAtoms(ubi)
Can also read/write standard protein formats (PDB,PQR)
writePDB('ubi.pdb', ubi)
parsePDB(writePDB('ubi.pdb', ubi))
Structures and AtomGroups¶
Protein Data Bank (PDB) Structure files can be parsed using parsePDB. For a given PDB identifier, this function will download the file if needed.
ag = parsePDB('1vrt')
Structure data parsed from the file is stored in an AtomGroup instance:
type(ag)
Why AtomGroup?
- not
Molecule, because structures are usually made up from multiple molecules - not
Structure, because PDB format is sometimes used for storing small-molecules
AtomGroup made sense for handling bunch of atoms, and is used by some other packages too.
Some AtomGroup methods¶
Check number of atoms and models (coordinate sets)
ag.numAtoms()
ag.numCoordsets()
Getters¶
Use getSomething methods to access data parsed from the file
names = ag.getNames()
names
len(names)
print(names[0])
ag.getBetas()
Note that get is followed by a plural name, and method returns a numpy array
Setters¶
Use setSomething methods to set attributes, but don't forget to pass an array or list with length equal to number of atoms
zeros = [0] * len(ag) # same as ag.numAtoms()
ag.setBetas(zeros)
ag.getBetas()
Atom instances¶
You can get a handle for specific atoms by indexing
a0 = ag[0]
print(a0)
Note that getSomething and setSomething methods are available, but that thing is in singular form:
a0.getName()
a0.getBeta() # we had just set it to zero
Subset of atoms¶
Slicing an AtomGroup will return a handler called Selection for specified subset of atoms. Let's get every other atom:
eoa = ag[::2] # eoa: every other atom
print(eoa)
print(len(eoa))
print(len(ag))
get and set methods in plural form are also available for Selection instances
eoa.getNames()
Hierarchical Views¶
Macromolecules have a hierarchical structure:
- chain - a polypeptide or a nucleic acid chain
- residue - an amino acid, nucleotide, small molecule, or an ion
- atom - lowest level of hierarchy
- segment - used by simulation programs and may be composed of multiple chains
ag.numChains()
ag.numResidues()
showProtein(ag)
Iterations¶
for ch in ag.iterChains():
print('%s - %d residues' % (str(ch), ch.numResidues()))
for ch in ag.iterChains():
print(ch)
for res in ch:
print(' |-%s' % res)
break
print('...')
Also iterAtoms (default), iterBonds, iterCoordsets, iterFragments, iterResidues, iterSegments
Indexing¶
Index with chain identifier to get a chain instance:
chA = ag['A']
print(chA)
type(chA)
Index with a pair of chain identifier and residue number to get a handle for a residue:
chA_res10 = ag['A', 10]
print(chA_res10)
type(chA_res10)
Of course, plural forms of get and set methods and other methods are available:
chA_res10.getNames()
chA_res10.numAtoms()
Atom Selections¶
The atom selection engine is what makes ProDy a powerful tool. Selection grammar is similar to that of VMD but with added flexibility of Python. AtomGroup.select method accepts selection arguments and returns a Selection instance:
sel = ag.select('protein and name CA')
print(sel)
print('# of atoms: %d' % sel.numAtoms())
set(sel.getNames())
Same as using the keyword 'ca' or 'calpha'
ag.select('ca') == ag.select('calpha') == ag.select('protein and name CA')
Selecting by distance¶
You can make proximity based selections:
kinase = parsePDB('3erk')
bindingsite = kinase.select('within 5 of (hetero and not water)')
print('# of atoms: %d' % bindingsite.numAtoms())
print(set(bindingsite.getResnames()))
exwithin excludes the initial selection
noligand = kinase.select('exwithin 5 of (hetero and not water)')
print('# of atoms: %d' % noligand.numAtoms())
print(set(noligand.getResnames()))
How many non-water protein atoms are there within 5A of SB4?¶
67
70
73
78
Keyword arguments¶
Words used within the selection string that are not reserved keywords can be substituted with keyword arguments.
Select atoms that are close to a point in space:
import numpy as np
origin = np.zeros(3)
print(origin)
sel = ag.select('within 10 of origin', origin=origin)
sel
calcDistance(sel, origin)[:5]
ag.select('within 5 of center',center = calcCenter(ag))
Dot selection shorthand¶
Dot operator can also be used to make selections:
ag.calpha
ag.name_CA_and_resname_ALA
See full documentation of selection grammar at: https://prody.csb.pitt.edu/manual/reference/atomic/select.html
Coordinate Sets¶
AtomGroup can handle multiple models in a PDB file. Models are called coordinate sets.
ubi = parsePDB('2k39')
ubi.numCoordsets()
Active coordinate set (ACS) can be queried or changed using getACSIndex and setACSIndex methods:
ubi.getACSIndex()
Coordinates¶
Coordinates are numpy arrays
coords = ubi.getCoords()
coords.mean(axis=0)
ubi.setACSIndex(115)
ubi.getCoords().mean(axis=0)
Selections have their own active coordinate set indices:
ubi_ca = ubi.calpha
ubi_ca.getACSIndex()
ubi_ca.setACSIndex(0)
print(ubi_ca.getACSIndex())
print(ubi.getACSIndex())
Operations with Atoms¶
Copying atoms¶
Call copy method to make a copy of AtomGroup or Selection instance
ag.copy()
ca = ag.select('ca')
ca.copy()
AtomGroup addition¶
Lot's of customization has gone into ProDy classess handling atoms. Addition of AtomGoups (__add__) results in a new AtomGroup:
ag_copy = ag.copy()
moveAtoms(ag_copy, by=np.array([50, 50, 50]))
ag_copy['A'].setChid('C')
ag_copy['B'].setChid('D')
ag_new = ag + ag_copy
showProtein(ag_new)
AtomGroup membership¶
You can use in to see whether a Selection is in another one (enabled by implementing Selection.__contains__() method):
ag.calpha in ag.backbone
ag.backbone in ag.protein
ag.water in ag.protein
ag.water in ag
Selection operations¶
You can use bitwise operations on selections as follows:
chA = ag.chain_A
chB = ag.chain_B
print(chA & chB) # intersection
sel = chA | chB # union
sel.getSelstr()
~chA #not
Store data in AtomGroup¶
You can store arbitrary atomic data in AtomGroup class and make it persistent on disk:
atoms = parsePDB('1p38')
atoms.getResnums()
resnum_fract = atoms.getResnums() / 10.
resnum_fract
atoms.setData('resnumfract', resnum_fract)
atoms.getData('resnumfract')
Saving to files¶
saveAtoms(atoms)
atoms = loadAtoms('1p38.ag.npz')
atoms.getData('resnumfract')
PDB Access¶
Fetch PDB files¶
Can request multiple files at once. Will download to local directory.
fetchPDB('1p38', '1r39', '@!~#') # searches working directory, local resources, then downloads
fetchPDBviaHTTP('1p38')
fetchPDBviaFTP('1p38')
Comparing and aligning structures¶
p38 = parsePDB('1p38')
bound = parsePDB('1zz2')
showProtein(p38, bound)
Chain matching and RMSD¶
matchChains tries to match chains by sequence. Returns a list of all pairwise matches
apo_chA, bnd_chA, seqid, overlap = matchChains(p38, bound)[0]
print(bnd_chA)
print(seqid)
print(overlap)
Match is alpha carbon only
print(len(apo_chA),len(bnd_chA),len(p38),len(bound))
set(apo_chA.getNames())
calcRMSD(apo_chA, bnd_chA)
bnd_chA, transformation = superpose(bnd_chA, apo_chA)
calcRMSD(bnd_chA, apo_chA)
showProtein(p38, bound)
Building biomolecules¶
PDB files often contain monomer coordinates, when the structure of the multimer can be obtained using symmetry operations. Let’s use some ProDy function to build tetrameric form of KcsA potassium channel:
# parse the monomer structure and PDB file header information
# the header includes transformations for forming tetramer
monomer, header = parsePDB('1k4c', header=True)
monomer
showProtein(monomer, legend=True)
Building biomolecules¶
without_K = monomer.not_name_K
without_K
tetramer = buildBiomolecules(header, without_K)
tetramer
showProtein(tetramer)