Simulating solvated protein#

  • For the simulations in this notebook, we will make use of the crystal structure of the Villin Headpiece subdomain 1YRF. This is relatively small protein (35 residues) consisting of three alpha helices. This is a prototypical fast folding protein, see 10.1016/j.jmb.2006.03.034 and 10.1073/pnas.0502495102, and it is therefore a popular benchmark for protein folding molecular dynamics simulations, e.g. 10.1073/pnas.1800690115 (force field accuracy) and 10.1109/SC.2014.9 (computational performance)

  • Before running any of the cells below make sure you select GPU from the Runtime! We will need GPU to run simulations faster.

!pip install -q condacolab
import condacolab
condacolab.install()
%%capture
!conda install -c conda-forge openmm mdtraj parmed 
!pip install py3dmol 
import shutil
from sys import stdout
import matplotlib.pyplot as plt
import mdtraj
import numpy as np
import pandas

from openmm import *
from openmm.app import *
from openmm.unit import *

Download structure#

  • First you need to obtain the structure file of 1YRF from PDB database.

  • We will use parmed to download the PDB and select the 35 residues of protein chain omitting any other extra molecules that typically come with PDB file.

import parmed

pdb = parmed.download_PDB('1YRF')

pdb = pdb[':1-35']

pdb.write_pdb('1yrf.pdb')

Prepare solvated system#

# Load in the structure and add water molecules folloing a particular forcefield
pdb = PDBFile("1yrf.pdb")
modeller = Modeller(pdb.topology, pdb.positions)
forcefield = ForceField("amber14-all.xml", "amber14/tip3pfb.xml")
modeller.addHydrogens(forcefield)
modeller.addSolvent(forcefield, model="tip3p", padding=1 * nanometer)

# Forcefield together with topology is used to create openmm system containing all the forces acting on atoms. 
system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME, constraints=HBonds)

Minimize and save input structure#

temperature = 300 * kelvin
pressure = 1 * bar

integrator = LangevinIntegrator(temperature, 
                                1 / picosecond, 
                                2 * femtoseconds)

system.addForce(MonteCarloBarostat(pressure, 
                                   temperature))

simulation = Simulation(modeller.topology, system, integrator)
simulation.context.setPositions(modeller.positions)
simulation.minimizeEnergy(maxIterations=100)
positions = simulation.context.getState(getPositions=True).getPositions()

with open("init.pdb", "w") as f:
    PDBFile.writeFile(simulation.topology, positions, f)

Run a simulation starting from a minimized structure#

simulation.reporters = []
simulation.reporters.append(DCDReporter("traj.dcd", 10))
simulation.reporters.append(
    StateDataReporter(stdout, 100, step=True, temperature=True, elapsedTime=True)
)
simulation.reporters.append(
    StateDataReporter(
        "scalars.csv",
        10,
        step=True,
        time=True,
        potentialEnergy=True,
        totalEnergy=True,
        temperature=True,
    )
)
simulation.step(30000)

Make an estimate of the computational cost (wall time) on your current hardware to run a sufficiently long simulation to observe spontaneously a protein folding event for this fast folder. The required simulation time is approximately \(4 \mathrm{\mu s}\). Would such a calculations be feasible?

Analyzing simulation#

df = pandas.read_csv("scalars.csv")
df.plot(kind="line", x="Time (ps)", y="Potential Energy (kJ/mole)")
traj = mdtraj.load("traj.dcd", top="init.pdb")
view = nglview.show_mdtraj(traj)
view