import numpy as np
import matplotlib.pyplot as plt
Warm-up: a 3-state weather chain¶
A simple Markov chain on three states — sunny, cloudy, rainy — illustrates how sampling from the chain converges to the stationary distribution.
state_space = ("sunny", "cloudy", "rainy")
# Transition matrix: T[i,j] = P(next = j | current = i)
T_weather = np.array([[0.6, 0.3, 0.1],
                      [0.3, 0.4, 0.3],
                      [0.2, 0.3, 0.5]])
# Simulate the chain
n_steps = 20000
states = [0]
for _ in range(n_steps):
    states.append(np.random.choice(3, p=T_weather[states[-1]]))
states = np.array(states)
# Plot convergence of empirical frequencies to stationary distribution
fig, ax = plt.subplots(figsize=(8, 4))
offsets = np.arange(1, n_steps, 5)
colors = ['gold', 'silver', 'steelblue']
for i, (label, c) in enumerate(zip(state_space, colors)):
    freq = [np.sum(states[:n] == i) / n for n in offsets]
    ax.plot(offsets, freq, label=label, color=c, lw=1.5)
ax.set_xlabel("Number of steps")
ax.set_ylabel("Empirical frequency")
ax.set_title("Convergence to stationary distribution")
ax.legend()
ax.grid(True, ls=':', alpha=0.5)
plt.tight_layout()
Stationary distribution from eigenvalues¶
The stationary distribution $\pi$ is the left eigenvector of $T$ with eigenvalue 1: $\pi T = \pi$.
The second-largest eigenvalue controls the relaxation time — how many steps it takes to forget the initial condition.
# Eigenvalue analysis of the weather transition matrix
eigenvalues, eigvecs_left = np.linalg.eig(T_weather.T)
# Sort by magnitude (largest first)
idx = np.argsort(-np.abs(eigenvalues))
eigenvalues = eigenvalues[idx]
eigvecs_left = eigvecs_left[:, idx]
# Stationary distribution = normalized left eigenvector for eigenvalue 1
pi = np.real(eigvecs_left[:, 0])
pi = pi / pi.sum()
print("Eigenvalues:", np.round(eigenvalues, 4))
print("Stationary distribution:", np.round(pi, 4))
print(f"Relaxation time: tau = {-1/np.log(np.abs(eigenvalues[1])):.1f} steps")
# Verify: compare with long-run simulation frequencies
sim_freq = [np.mean(states == i) for i in range(3)]
print(f"Simulation frequencies: {np.round(sim_freq, 4)}")
Eigenvalues: [1. 0.4 0.1]
Stationary distribution: [0.3889 0.3333 0.2778]
Relaxation time: tau = 1.1 steps
Simulation frequencies: [0.3929 0.3375 0.2696]
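As a sanity check, the stationary distribution can also be obtained without an eigensolver: repeatedly applying $T$ to any initial distribution converges to $\pi$ (power iteration), since every other eigenvalue has magnitude below 1. A minimal sketch, repeating the transition matrix from above so it runs standalone:

```python
import numpy as np

# Same weather transition matrix as above, repeated for self-containment.
T_weather = np.array([[0.6, 0.3, 0.1],
                      [0.3, 0.4, 0.3],
                      [0.2, 0.3, 0.5]])

# Power iteration: start from any distribution and apply T repeatedly.
v = np.array([1.0, 0.0, 0.0])  # start certain it is sunny
for _ in range(50):
    v = v @ T_weather

print("pi from power iteration:", np.round(v, 4))
# v is now (numerically) invariant under T: v @ T_weather == v
print("Stationary check:", np.allclose(v @ T_weather, v))
```

With $|\lambda_2| = 0.4$, the error shrinks by a factor 0.4 per step, so 50 iterations are far more than enough here.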
Chemical isomerization: A ⇌ B¶
A reversible reaction A → B with probability $p$ per step and B → A with probability $q$ is the simplest chemical Markov chain.
The transition matrix is:
$$T = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}$$
Detailed balance gives the equilibrium: $\pi_A\, p = \pi_B\, q$, so $\pi_A = q/(p+q)$ and $\pi_B = p/(p+q)$.
p = 0.3 # A -> B
q = 0.7 # B -> A
T_iso = np.array([[1-p, p],
                  [q, 1-q]])
# Simulate
n_steps = 2000
current = 0
history = np.zeros((n_steps, 2))
history[0, current] = 1
for step in range(1, n_steps):
    current = np.random.choice(2, p=T_iso[current])
    history[step, current] = 1
cum_prob = np.cumsum(history, axis=0) / np.arange(1, n_steps+1).reshape(-1, 1)
# Exact equilibrium
pi_A = q / (p + q)
pi_B = p / (p + q)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(cum_prob[:, 0], label='$P(A)$', color='steelblue', lw=1.5)
ax1.plot(cum_prob[:, 1], label='$P(B)$', color='coral', lw=1.5)
ax1.axhline(pi_A, color='steelblue', ls='--', alpha=0.5, label=rf'$\pi_A = {pi_A:.3f}$')
ax1.axhline(pi_B, color='coral', ls='--', alpha=0.5, label=rf'$\pi_B = {pi_B:.3f}$')
ax1.set(xlabel='Steps', ylabel='Cumulative frequency', title='Approach to equilibrium')
ax1.legend()
ax1.grid(True, ls=':', alpha=0.5)
# Equilibrium bar chart
sim_freq = np.sum(history, axis=0) / n_steps
ax2.bar(['A', 'B'], sim_freq, color=['steelblue', 'coral'], alpha=0.7, label='Simulation')
ax2.bar(['A', 'B'], [pi_A, pi_B], color='none', edgecolor='black', lw=2, label='Exact')
ax2.set(ylabel='Probability', title='Equilibrium distribution')
ax2.legend()
fig.tight_layout()

As $n \to \infty$, the fraction of time spent in each state converges to the theoretical equilibrium from detailed balance: $\pi_A = q/(p+q) = 0.7$, $\pi_B = p/(p+q) = 0.3$.
The relaxation eigenvalue is $\lambda_2 = 1 - (p + q)$, giving $\tau = -1/\ln|1 - p - q|$ steps; here $p + q = 1$, so $\lambda_2 = 0$ and the chain forgets its initial condition in a single step.
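The closed form for the second eigenvalue of a 2-state chain is easy to verify numerically. A quick check (the helper `two_state_eigs` is ours, not part of the text above):

```python
import numpy as np

def two_state_eigs(p, q):
    """Eigenvalues of the 2-state chain [[1-p, p], [q, 1-q]], largest first."""
    T = np.array([[1 - p, p], [q, 1 - q]])
    return np.sort(np.linalg.eigvals(T))[::-1]

# The closed form lambda_2 = 1 - (p + q) matches numerics for any rates.
for p, q in [(0.3, 0.7), (0.1, 0.2), (0.05, 0.4)]:
    lam = two_state_eigs(p, q)
    assert np.isclose(lam[1], 1 - (p + q))
    print(f"p={p}, q={q}: lambda_2 = {lam[1]:.3f}")
```

Note that $\lambda_2$ depends only on the sum $p + q$, which is why faster kinetics in either direction shorten the relaxation time.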
Ion channel gating¶
Ion channels in cell membranes switch stochastically between closed and open conformations. A classic model is a 3-state channel:
Closed → Open: the channel activates (rate $\alpha_1$)
Open → Inactivated: the channel inactivates (rate $\alpha_2$)
Reverse transitions have rates $\beta_1$ (Open → Closed) and $\beta_2$ (Inactivated → Open)
This is the basis of Hodgkin-Huxley-type models in neuroscience. Let’s simulate single-channel dynamics and see how the dwell-time distributions emerge.
# Ion channel: Closed (0) <-> Open (1) <-> Inactivated (2)
# Transition probabilities per time step (discretized rates)
a1, b1 = 0.08, 0.04 # Closed <-> Open
a2, b2 = 0.03, 0.01 # Open <-> Inactivated
T_channel = np.array([
    [1-a1, a1,      0   ],  # Closed
    [b1,   1-b1-a2, a2  ],  # Open
    [0,    b2,      1-b2],  # Inactivated
])
# Simulate single-channel trace
n_steps = 50000
state = 0
trace = np.zeros(n_steps, dtype=int)
for t in range(1, n_steps):
    state = np.random.choice(3, p=T_channel[state])
    trace[t] = state
# Extract dwell times in the Open state
is_open = (trace == 1).astype(int)
transitions = np.diff(is_open)
open_starts = np.where(transitions == 1)[0] + 1
open_ends = np.where(transitions == -1)[0] + 1
n_dwells = min(len(open_starts), len(open_ends))
dwell_times = open_ends[:n_dwells] - open_starts[:n_dwells]
# Stationary distribution
eigenvalues, eigvecs = np.linalg.eig(T_channel.T)
idx = np.argmax(np.abs(eigenvalues))
pi_ch = np.real(eigvecs[:, idx])
pi_ch /= pi_ch.sum()
# Plot
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
labels_ch = ['Closed', 'Open', 'Inactivated']
colors_ch = ['#4477AA', '#EE6677', '#CCBB44']
# Single-channel trace (first 2000 steps)
ax = axes[0]
window = slice(0, 2000)
ax.step(np.arange(2000), trace[window], where='post', lw=0.6, color='k')
ax.set_yticks([0, 1, 2])
ax.set_yticklabels(labels_ch)
ax.set(xlabel='Time step', title='Single-channel trace')
# Shade open state
ax.fill_between(np.arange(2000), 0, 2, where=(trace[window]==1),
                alpha=0.2, color=colors_ch[1], step='post')
# Dwell time distribution (Open state)
ax = axes[1]
ax.hist(dwell_times, bins=30, density=True, alpha=0.7, color=colors_ch[1], edgecolor='white')
# Expected: geometric distribution with rate (b1 + a2) per step
t_range = np.arange(1, max(dwell_times))
p_exit = b1 + a2
ax.plot(t_range, p_exit * (1-p_exit)**(t_range-1), 'k-', lw=2, label=f'Geometric($p={p_exit:.2f}$)')
ax.set(xlabel='Dwell time (steps)', ylabel='Density', title='Open-state dwell times')
ax.legend()
# Stationary distribution
ax = axes[2]
sim_freq = [np.mean(trace == i) for i in range(3)]
x = np.arange(3)
ax.bar(x - 0.15, sim_freq, 0.3, color=colors_ch, alpha=0.7, label='Simulation')
ax.bar(x + 0.15, pi_ch, 0.3, color='none', edgecolor='black', lw=2, label='Eigenvalue')
ax.set_xticks(x)
ax.set_xticklabels(labels_ch)
ax.set(ylabel='Probability', title='Stationary distribution')
ax.legend()
fig.tight_layout()
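The geometric fit in the middle panel implies a mean open dwell time of $1/(\beta_1 + \alpha_2) = 1/0.07 \approx 14.3$ steps. A minimal check by direct geometric sampling, independent of the trace above:

```python
import numpy as np

rng = np.random.default_rng(0)
b1, a2 = 0.04, 0.03   # the two rates out of the Open state, as above
p_exit = b1 + a2      # per-step probability of leaving the Open state

# Dwell times in Open are geometric with success probability p_exit,
# so the mean dwell is 1/p_exit. Sample directly and compare.
dwells = rng.geometric(p_exit, size=100_000)
print(f"Analytical mean dwell: {1/p_exit:.1f} steps")
print(f"Sampled mean dwell:    {dwells.mean():.1f} steps")
```

This is the standard single-channel result: the exit rate from a state is the sum of the rates of all transitions leaving it.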
Protein conformational dynamics¶
Many proteins fluctuate between distinct conformational states. A coarse-grained Markov State Model (MSM) captures the essential kinetics.
Key observables:
Relaxation timescales from eigenvalues: how long does it take to fold?
Flux at steady state: under non-equilibrium conditions, is there a net current through the cycle?
Let’s simulate a 3-state protein with an added rare pathway U → F (misfolding or direct folding).
# Protein MSM: U(0) <-> I(1) <-> F(2), plus rare U <-> F shortcut
T_protein = np.array([
    [0.90, 0.08, 0.02],  # Unfolded
    [0.05, 0.85, 0.10],  # Intermediate
    [0.01, 0.04, 0.95],  # Folded
])
states_label = ['Unfolded', 'Intermediate', 'Folded']
# Eigenvalue analysis
eigvals, eigvecs = np.linalg.eig(T_protein.T)
idx = np.argsort(-np.abs(eigvals))
eigvals = eigvals[idx].real
eigvecs = eigvecs[:, idx].real
pi_prot = eigvecs[:, 0] / eigvecs[:, 0].sum()
tau_slow = -1 / np.log(np.abs(eigvals[1]))
tau_fast = -1 / np.log(np.abs(eigvals[2]))
print(f"Stationary distribution: U={pi_prot[0]:.3f}, I={pi_prot[1]:.3f}, F={pi_prot[2]:.3f}")
print(f"Eigenvalues: {np.round(eigvals, 4)}")
print(f"Slow relaxation time: tau_1 = {tau_slow:.1f} steps (folding timescale)")
print(f"Fast relaxation time: tau_2 = {tau_fast:.1f} steps (U <-> I equilibration)")
Stationary distribution: U=0.181, I=0.249, F=0.570
Eigenvalues: [1. 0.9066 0.7934]
Slow relaxation time: tau_1 = 10.2 steps (folding timescale)
Fast relaxation time: tau_2 = 4.3 steps (U <-> I equilibration)
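Before turning to stochastic trajectories, the relaxation can be cross-checked by exact propagation, $P(t) = P(0)\,T^t$, which carries no Monte Carlo noise. A short sketch with the same matrix:

```python
import numpy as np
from numpy.linalg import matrix_power

# Same protein MSM as above, repeated for self-containment.
T_protein = np.array([[0.90, 0.08, 0.02],
                      [0.05, 0.85, 0.10],
                      [0.01, 0.04, 0.95]])
P0 = np.array([1.0, 0.0, 0.0])  # start fully Unfolded

# Propagate the distribution exactly: P(t) = P(0) T^t.
for t in [0, 5, 10, 50, 200]:
    Pt = P0 @ matrix_power(T_protein, t)
    print(f"t={t:4d}: U={Pt[0]:.3f}, I={Pt[1]:.3f}, F={Pt[2]:.3f}")
```

By $t \approx 200$ steps (many multiples of $\tau_1 \approx 10$) the populations match the stationary distribution printed above.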
# Simulate starting from Unfolded and watch relaxation
n_steps = 5000
n_traj = 500
# Run many independent trajectories starting from U
all_states = np.zeros((n_traj, n_steps), dtype=int)
for traj in range(n_traj):
    s = 0  # start unfolded
    all_states[traj, 0] = s
    for t in range(1, n_steps):
        s = np.random.choice(3, p=T_protein[s])
        all_states[traj, t] = s
# Compute time-dependent populations P(state, t) averaged over trajectories
P_t = np.zeros((n_steps, 3))
for i in range(3):
    P_t[:, i] = np.mean(all_states == i, axis=0)
fig, axes = plt.subplots(1, 2, figsize=(13, 5))
colors_prot = ['#CC6677', '#DDCC77', '#88CCEE']
# Left: relaxation of populations
ax = axes[0]
for i, (lbl, c) in enumerate(zip(states_label, colors_prot)):
    ax.plot(P_t[:, i], label=lbl, color=c, lw=2)
    ax.axhline(pi_prot[i], color=c, ls='--', alpha=0.4)
ax.axvline(tau_slow, color='gray', ls=':', alpha=0.6, label=f'$\\tau_{{slow}}={tau_slow:.0f}$')
ax.set(xlabel='Time step', ylabel=r'$P(\mathrm{state}, t)$',
       title='Relaxation from Unfolded state')
ax.legend(fontsize=9)
ax.grid(True, ls=':', alpha=0.4)
# Right: implied timescales (demonstrate lag-time dependence)
lag_times = [1, 2, 5, 10, 20, 50]
implied_ts = []
for lag in lag_times:
    # Build count matrix at this lag
    C = np.zeros((3, 3))
    for traj in range(n_traj):
        for t in range(0, n_steps - lag, lag):
            C[all_states[traj, t], all_states[traj, t+lag]] += 1
    # Row-normalize to get transition matrix at lag
    T_lag = C / C.sum(axis=1, keepdims=True)
    eigs = np.sort(np.abs(np.linalg.eigvals(T_lag)))[::-1]
    # Implied timescale = -lag / ln(lambda)
    its = [-lag / np.log(e) if 0 < e < 1 else np.nan for e in eigs[1:]]
    implied_ts.append(its)
implied_ts = np.array(implied_ts)
ax = axes[1]
ax.plot(lag_times, implied_ts[:, 0], 'o-', color=colors_prot[0], lw=2, label='$\\tau_1$ (slow)')
ax.plot(lag_times, implied_ts[:, 1], 's-', color=colors_prot[1], lw=2, label='$\\tau_2$ (fast)')
ax.axhline(tau_slow, color=colors_prot[0], ls='--', alpha=0.4)
ax.axhline(tau_fast, color=colors_prot[1], ls='--', alpha=0.4)
ax.set(xlabel='Lag time', ylabel='Implied timescale (steps)',
       title='Implied timescales vs lag time')
ax.legend()
ax.grid(True, ls=':', alpha=0.4)
fig.tight_layout()

Left panel: Starting from 100% Unfolded, the populations relax toward equilibrium on a timescale set by $\tau_1 \approx 10$ steps (the folding time).
Right panel: The implied timescale test is a standard validation for Markov State Models. If the model is Markovian at lag time $\tau$, the implied timescales plateau — they become independent of lag time. This is how researchers validate MSMs built from molecular dynamics simulations.
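A closely related validation is the Chapman–Kolmogorov test: for a Markovian model, the transition matrix estimated at lag $k$ should match the $k$-th power of the matrix estimated at lag 1, $T(k) \approx T(1)^k$. A sketch of this test on a single long trajectory of the protein MSM (the helper `estimate_T` is ours, not from the text above):

```python
import numpy as np

rng = np.random.default_rng(1)
T_protein = np.array([[0.90, 0.08, 0.02],
                      [0.05, 0.85, 0.10],
                      [0.01, 0.04, 0.95]])

# One long trajectory of the 3-state protein MSM.
n = 100_000
traj = np.zeros(n, dtype=int)
for t in range(1, n):
    traj[t] = rng.choice(3, p=T_protein[traj[t-1]])

def estimate_T(traj, lag):
    """Row-normalized transition-count matrix at the given lag."""
    C = np.zeros((3, 3))
    np.add.at(C, (traj[:-lag], traj[lag:]), 1)
    return C / C.sum(axis=1, keepdims=True)

# Chapman-Kolmogorov: T estimated at lag 10 vs the 10th power of T at lag 1.
T1, T10 = estimate_T(traj, 1), estimate_T(traj, 10)
err = np.abs(np.linalg.matrix_power(T1, 10) - T10).max()
print(f"max |T(1)^10 - T(10)| = {err:.4f}")
```

Because the underlying process here is exactly Markovian, the discrepancy is only sampling noise; for an MSM built on poorly chosen states, this error stays large at every lag.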
Problems¶
Problem 1: Varying forward and reverse rates¶
For the isomerization reaction A ⇌ B:
Systematically vary the rates $p$ and $q$ (e.g., on a grid with $p + q < 1$).
For each pair, compute the relaxation time $\tau = -1/\ln|1 - p - q|$ and verify it against the simulation.
Plot $\tau$ vs $p + q$ and explain the trend physically.
Problem 2: Ion channel pharmacology¶
Modify the ion channel model to simulate a drug that stabilizes the inactivated state (reduce $\beta_2$ from 0.01 to 0.001).
How does the stationary probability of the Open state change?
How does the mean open dwell time change?
Plot the single-channel trace and compare with the untreated case.
Problem 3: First passage time to folding¶
Using the protein MSM:
Run 1000 trajectories starting from the Unfolded state.
Record the first passage time to the Folded state for each trajectory.
Plot the distribution of first passage times. Is it exponential?
Compare the mean first passage time to $\tau_{slow}$ from eigenvalue analysis. Are they the same? Why or why not?