Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Simulating solvated protein

  • For the simulations in this notebook, we will make use of the crystal structure of the Villin Headpiece subdomain 1YRF. This is relatively small protein (35 residues) consisting of three alpha helices. This is a prototypical fast folding protein, see Kubelka et al. (2006) and Chiu et al. (2005), and it is therefore a popular benchmark for protein folding molecular dynamics simulations, e.g. Robustelli et al. (2018) (force field accuracy) and Shaw et al. (2014) (computational performance)

  • Before running any of the cells below make sure you select GPU from the Runtime! We will need GPU to run simulations faster.

python```!pip install -q condacolab import condacolab condacolab.install()

!conda install -c conda-forge openmm mdtraj parmed !pip install py3dmol

import shutil
from sys import stdout
import matplotlib.pyplot as plt
import mdtraj
import numpy as np
import pandas

from openmm import *
from openmm.app import *
from openmm.unit import *
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 4
      2 from sys import stdout
      3 import matplotlib.pyplot as plt
----> 4 import mdtraj
      5 import numpy as np
      6 import pandas

ModuleNotFoundError: No module named 'mdtraj'

Download structure

  • First you need to obtain the structure file of 1YRF from PDB database.

  • We will use parmed to download the PDB and select the 35 residues of protein chain omitting any other extra molecules that typically come with PDB file.

import requests

# Download the PDB file for 1YRF
url = "https://files.rcsb.org/download/1YRF.pdb"
with open("1YRF.pdb", "w") as f:
    f.write(requests.get(url).text)


import parmed as pmd

pdb = pmd.load_file("1YRF.pdb")
pdb = pdb[':1-35']  # Select first 35 residues
pdb.write_pdb("1YRF_trimmed.pdb")

Prepare solvated system

# Load in the structure and add water molecules folloing a particular forcefield
pdb = PDBFile("1YRF_trimmed.pdb")
modeller = Modeller(pdb.topology, pdb.positions)
forcefield = ForceField("amber14-all.xml", "amber14/tip3pfb.xml")
modeller.addHydrogens(forcefield)
modeller.addSolvent(forcefield, model="tip3p", padding=1 * nanometer)

# Forcefield together with topology is used to create openmm system containing all the forces acting on atoms. 
system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME, constraints=HBonds)

Minimize and save input structure

temperature = 300 * kelvin
pressure = 1 * bar

integrator = LangevinIntegrator(temperature, 
                                1 / picosecond, 
                                2 * femtoseconds)

system.addForce(MonteCarloBarostat(pressure, 
                                   temperature))

simulation = Simulation(modeller.topology, system, integrator)
simulation.context.setPositions(modeller.positions)
simulation.minimizeEnergy(maxIterations=100)
positions = simulation.context.getState(getPositions=True).getPositions()

with open("init.pdb", "w") as f:
    PDBFile.writeFile(simulation.topology, positions, f)

Run a simulation starting from a minimized structure

simulation.reporters = []
simulation.reporters.append(DCDReporter("traj.dcd", 10))
simulation.reporters.append(
    StateDataReporter(stdout, 100, step=True, temperature=True, elapsedTime=True)
)
simulation.reporters.append(
    StateDataReporter(
        "scalars.csv",
        10,
        step=True,
        time=True,
        potentialEnergy=True,
        totalEnergy=True,
        temperature=True,
    )
)
simulation.step(30000)

Make an estimate of the computational cost (wall time) on your current hardware to run a sufficiently long simulation to observe spontaneously a protein folding event for this fast folder. The required simulation time is approximately 4μs4 \mathrm{\mu s}. Would such a calculations be feasible?

Analyzing simulation

df = pandas.read_csv("scalars.csv")
df.plot(kind="line", x="Time (ps)", y="Potential Energy (kJ/mole)")
traj = mdtraj.load("traj.dcd", top="init.pdb")
view = nglview.show_mdtraj(traj)
view
References
  1. Kubelka, J., Chiu, T. K., Davies, D. R., Eaton, W. A., & Hofrichter, J. (2006). Sub-microsecond Protein Folding. Journal of Molecular Biology, 359(3), 546–553. 10.1016/j.jmb.2006.03.034
  2. Chiu, T. K., Kubelka, J., Herbst-Irmer, R., Eaton, W. A., Hofrichter, J., & Davies, D. R. (2005). High-resolution x-ray crystal structures of the villin headpiece subdomain, an ultrafast folding protein. Proceedings of the National Academy of Sciences, 102(21), 7517–7522. 10.1073/pnas.0502495102
  3. Robustelli, P., Piana, S., & Shaw, D. E. (2018). Developing a molecular dynamics force field for both folded and disordered protein states. Proceedings of the National Academy of Sciences, 115(21). 10.1073/pnas.1800690115
  4. Shaw, D. E., Grossman, J. P., Bank, J. A., Batson, B., Butts, J. A., Chao, J. C., Deneroff, M. M., Dror, R. O., Even, A., Fenton, C. H., Forte, A., Gagliardo, J., Gill, G., Greskamp, B., Ho, C. R., Ierardi, D. J., Iserovich, L., Kuskin, J. S., Larson, R. H., … Young, C. (2014). Anton 2: Raising the Bar for Performance and Programmability in a Special-Purpose Molecular Dynamics Supercomputer. SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, 41–53. 10.1109/sc.2014.9