For the simulations in this notebook, we will make use of the crystal structure of the Villin Headpiece subdomain 1YRF. This is relatively small protein (35 residues) consisting of three alpha helices. This is a prototypical fast folding protein, see Kubelka et al. (2006) and Chiu et al. (2005), and it is therefore a popular benchmark for protein folding molecular dynamics simulations, e.g. Robustelli et al. (2018) (force field accuracy) and Shaw et al. (2014) (computational performance)
Before running any of the cells below make sure you select GPU from the Runtime! We will need GPU to run simulations faster.
python```!pip install -q condacolab import condacolab condacolab.install()
!conda install -c conda-forge openmm mdtraj parmed !pip install py3dmol
import shutil
from sys import stdout
import matplotlib.pyplot as plt
import mdtraj
import numpy as np
import pandas
from openmm import *
from openmm.app import *
from openmm.unit import *---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 4
2 from sys import stdout
3 import matplotlib.pyplot as plt
----> 4 import mdtraj
5 import numpy as np
6 import pandas
ModuleNotFoundError: No module named 'mdtraj'Download structure¶
First you need to obtain the structure file of 1YRF from PDB database.
We will use parmed to download the PDB and select the 35 residues of protein chain omitting any other extra molecules that typically come with PDB file.
import requests
# Download the PDB file for 1YRF
url = "https://files.rcsb.org/download/1YRF.pdb"
with open("1YRF.pdb", "w") as f:
f.write(requests.get(url).text)
import parmed as pmd
pdb = pmd.load_file("1YRF.pdb")
pdb = pdb[':1-35'] # Select first 35 residues
pdb.write_pdb("1YRF_trimmed.pdb")Prepare solvated system¶
# Load in the structure and add water molecules folloing a particular forcefield
pdb = PDBFile("1YRF_trimmed.pdb")
modeller = Modeller(pdb.topology, pdb.positions)
forcefield = ForceField("amber14-all.xml", "amber14/tip3pfb.xml")
modeller.addHydrogens(forcefield)
modeller.addSolvent(forcefield, model="tip3p", padding=1 * nanometer)
# Forcefield together with topology is used to create openmm system containing all the forces acting on atoms.
system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME, constraints=HBonds)Minimize and save input structure¶
temperature = 300 * kelvin
pressure = 1 * bar
integrator = LangevinIntegrator(temperature,
1 / picosecond,
2 * femtoseconds)
system.addForce(MonteCarloBarostat(pressure,
temperature))
simulation = Simulation(modeller.topology, system, integrator)
simulation.context.setPositions(modeller.positions)
simulation.minimizeEnergy(maxIterations=100)
positions = simulation.context.getState(getPositions=True).getPositions()
with open("init.pdb", "w") as f:
PDBFile.writeFile(simulation.topology, positions, f)Run a simulation starting from a minimized structure¶
simulation.reporters = []
simulation.reporters.append(DCDReporter("traj.dcd", 10))
simulation.reporters.append(
StateDataReporter(stdout, 100, step=True, temperature=True, elapsedTime=True)
)
simulation.reporters.append(
StateDataReporter(
"scalars.csv",
10,
step=True,
time=True,
potentialEnergy=True,
totalEnergy=True,
temperature=True,
)
)
simulation.step(30000)Make an estimate of the computational cost (wall time) on your current hardware to run a sufficiently long simulation to observe spontaneously a protein folding event for this fast folder. The required simulation time is approximately . Would such a calculations be feasible?
Analyzing simulation¶
df = pandas.read_csv("scalars.csv")
df.plot(kind="line", x="Time (ps)", y="Potential Energy (kJ/mole)")traj = mdtraj.load("traj.dcd", top="init.pdb")
view = nglview.show_mdtraj(traj)
view- Kubelka, J., Chiu, T. K., Davies, D. R., Eaton, W. A., & Hofrichter, J. (2006). Sub-microsecond Protein Folding. Journal of Molecular Biology, 359(3), 546–553. 10.1016/j.jmb.2006.03.034
- Chiu, T. K., Kubelka, J., Herbst-Irmer, R., Eaton, W. A., Hofrichter, J., & Davies, D. R. (2005). High-resolution x-ray crystal structures of the villin headpiece subdomain, an ultrafast folding protein. Proceedings of the National Academy of Sciences, 102(21), 7517–7522. 10.1073/pnas.0502495102
- Robustelli, P., Piana, S., & Shaw, D. E. (2018). Developing a molecular dynamics force field for both folded and disordered protein states. Proceedings of the National Academy of Sciences, 115(21). 10.1073/pnas.1800690115
- Shaw, D. E., Grossman, J. P., Bank, J. A., Batson, B., Butts, J. A., Chao, J. C., Deneroff, M. M., Dror, R. O., Even, A., Fenton, C. H., Forte, A., Gagliardo, J., Gill, G., Greskamp, B., Ho, C. R., Ierardi, D. J., Iserovich, L., Kuskin, J. S., Larson, R. H., … Young, C. (2014). Anton 2: Raising the Bar for Performance and Programmability in a Special-Purpose Molecular Dynamics Supercomputer. SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, 41–53. 10.1109/sc.2014.9