Random Walk and Limit Theorems

Random Walk in 1D

Consider a system with a binary outcome, where each trial results in either a “+” or “−” outcome with fixed probabilities

$$p_{+} + p_{-} = 1$$

A classic example is a one-dimensional random walk of $N$ steps, where a particle moves right ($+1$) or left ($-1$) at each step.

Other equivalent examples include:

  • tossing $N$ coins (heads or tails),

  • counting $N$ non-interacting molecules on the left vs. right side of a container.
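
To make the setup concrete, here is a minimal simulation sketch of such a walk; the step count `N`, the bias `p_plus`, and the random seed below are illustrative choices, not values used elsewhere in this section:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for reproducibility (illustrative choice)

N = 20                           # number of steps (illustrative)
p_plus = 0.5                     # probability of a '+' step

# One realization of the walk: a sequence of +1 / -1 steps (a single microstate)
steps = rng.choice([+1, -1], size=N, p=[p_plus, 1 - p_plus])
print("steps:        ", steps)

# Net displacement N_+ - N_- after N steps
print("displacement: ", steps.sum())
```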

Microstates

Each experiment generates a specific sequence of outcomes, for example

$$+1, -1, -1, +1, -1$$

or equivalently for coin tosses

$$\mathrm{H\,T\,T\,H\,T}$$

Each sequence represents a single microstate.
Since each step has two possible outcomes, the total number of microstates is

$$\Omega = 2^N$$

  • For an unbiased random walk ($p_{+} = p_{-} = 1/2$), all microstates are equally probable, each with probability $1/2^N$.

  • For a biased random walk ($p_{+} \neq p_{-}$), microstate probabilities depend on the number of $+$ and $-$ steps.
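
For example, with an illustrative bias $p_{+} = 0.6$, the five-step sequence shown above (two $+$ steps, three $-$ steps) occurs with probability $p_{+}^{2}\,p_{-}^{3} = 0.6^{2}\cdot 0.4^{3} \approx 0.023$.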

Macrostates

Often, we are not interested in the exact order of steps, but only in how many steps go to the right or left.

These counts, $N_{+}$ (steps to the right) and $N_{-}$ (steps to the left), define a macrostate:

$$N_{+} + N_{-} = N$$

The net displacement from the origin is

$$\Delta N = N_{+} - N_{-}$$

Many different microstates correspond to the same macrostate.
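
Explicitly, the number of microstates in a macrostate with $N_{+}$ plus-steps, and the resulting macrostate probability, are

$$\Omega(N_{+}) = \binom{N}{N_{+}} = \frac{N!}{N_{+}!\,N_{-}!}, \qquad P(N_{+} \mid N, p_{+}) = \binom{N}{N_{+}}\, p_{+}^{N_{+}}\, p_{-}^{N_{-}}$$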

For large $N$, this binomial distribution is well approximated by a normal distribution, so the displacement $x = N_{+} - N_{-} = \Delta N$ is also approximately normal:

$$x \;\approx\; \mathcal{N}\!\big(N(p_{+}-p_{-}),\,4Np_{+}p_{-}\big).$$

Equivalently, the standardized displacement converges in distribution to a standard Gaussian:

$$
z = \frac{x-N(p_{+}-p_{-})}{2\sqrt{Np_{+}p_{-}}} \;\xrightarrow[N\to\infty]{}\; \mathcal{N}(0,1), \qquad
P(x\mid N,p_{+}) \approx \frac{1}{2\sqrt{2\pi Np_{+}p_{-}}} \exp\!\left[-\frac{\big(x-N(p_{+}-p_{-})\big)^2}{8Np_{+}p_{-}}\right]
$$
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import comb

def P_x(N, x, p):
    """
    Binomial probability written in terms of
    x = N_+ - N_- with N fixed
    """
    # Only valid when (N + x) is even
    if (N + x) % 2 != 0:
        return 0.0

    k = (N + x) // 2  # number of '+' outcomes
    return comb(N, k) * (p**k) * ((1 - p)**(N - k))


# -------------------------------
# Parameters students can change
# -------------------------------
p = 0.5            # probability of '+'
N_values = [10, 50, 100, 200]

# -------------------------------
# Plot
# -------------------------------
plt.figure(figsize=(7, 5))

for N in N_values:
    x_vals = np.arange(-N, N + 1, 2)
    x_norm = x_vals / N

    P_vals = [P_x(N, x, p) for x in x_vals]
    plt.plot(x_norm, P_vals, marker="o", label=f"N = {N}")

plt.xlabel(r"$x/N$")
plt.ylabel(r"$P(x)$")
plt.title(fr"Binomial distribution in $x = N_+ - N_-$  (p = {p})")
plt.legend()
plt.tight_layout()
plt.show()
```

Log of Macrostate Probability, Entropy, and Fluctuations

A macrostate of a random walk with $N$ steps is specified by the number of “+” steps $N_{+}$ (with $N_{-} = N - N_{+}$).

The logarithm of the probability of observing a given macrostate is

$$\log P(N_{+} \mid N, p_{+}) = \log \frac{N!}{N_{+}!\,N_{-}!} + \log \left[ p_{+}^{N_{+}} \, p_{-}^{N_{-}} \right]$$
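
As a quick numerical sketch, this log-probability can be evaluated without overflow even for large $N$ by computing the factorials in log space with `scipy.special.gammaln`; the values of $N$, $N_{+}$, and $p_{+}$ below are illustrative:

```python
import numpy as np
from scipy.special import gammaln

def log_P(N_plus, N, p_plus):
    """log P(N_+ | N, p_+), with factorials evaluated in log space via gammaln."""
    N_minus = N - N_plus
    log_multiplicity = gammaln(N + 1) - gammaln(N_plus + 1) - gammaln(N_minus + 1)
    log_weight = N_plus * np.log(p_plus) + N_minus * np.log(1 - p_plus)
    return log_multiplicity + log_weight

# Illustrative values: N = 1000 steps, N_+ = 600, unbiased walk
print(log_P(N_plus=600, N=1000, p_plus=0.5))
```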

Frequencies and Probabilities

In simulations, we measure the fractions of steps

$$f_{\pm} = \frac{N_{\pm}}{N}$$

These fractions fluctuate in finite simulations but converge to the true probabilities in the long-time limit:

$$f_{\pm} \to p_{\pm}$$

Since $f_{+} + f_{-} = 1$ and $p_{+} + p_{-} = 1$, we introduce the shorthand notation

$$f \equiv f_{+}, \qquad p \equiv p_{+}$$

With this notation, the logarithm of the macrostate probability can be written as

$$\log P(f \mid N, p) = S(f) - E(f)$$

where

  • $S(f)$ is an entropy term, related to the number of microstates in a macrostate

  • $E(f)$ is an energy-like term encoding the bias in step probabilities

Energy as a Measure of Bias

The probability factor contributes an energy-like term

$$E = - \log \left[ p^{Nf} (1-p)^{N(1-f)} \right]$$

which simplifies to

$$E = -N \left[ f \log p + (1-f) \log (1-p) \right] = N \, \epsilon(f)$$

where $\epsilon(f) = E/N$ is the energy per step.

When $p = \tfrac{1}{2}$, there is no bias and the energy reduces to

$$E = N \log 2$$

Entropy as the Logarithm of the Number of Microstates

The entropy term arises from the combinatorial factor. Using Stirling’s approximation,

$$S(f) = \log \frac{N!}{N_{+}!\,N_{-}!} \approx N \log N - N_{+} \log N_{+} - N_{-} \log N_{-}$$

Rewriting in terms of the fraction $f = N_{+}/N$ gives

$$S(f) = N \left[ - f \log f - (1-f) \log (1-f) \right]$$

where $s(f) = S/N$ is the entropy per step.
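
A short sketch comparing this Stirling (per-step) entropy $s(f)$ with the exact value $\tfrac{1}{N}\log\binom{N}{fN}$; the chosen system sizes are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import gammaln

def s_exact(N_plus, N):
    """Exact entropy per step: (1/N) log[ N! / (N_+! N_-!) ]."""
    return (gammaln(N + 1) - gammaln(N_plus + 1) - gammaln(N - N_plus + 1)) / N

def s_stirling(f):
    """Stirling approximation: s(f) = -f log f - (1-f) log(1-f)."""
    return -f * np.log(f) - (1 - f) * np.log(1 - f)

f = np.linspace(0.01, 0.99, 200)
plt.plot(f, s_stirling(f), "k-", label="Stirling $s(f)$")

for N in [10, 50, 500]:                   # illustrative system sizes
    N_plus = np.arange(1, N)              # interior macrostates only
    plt.plot(N_plus / N, s_exact(N_plus, N), "o", markersize=3, label=f"exact, N = {N}")

plt.xlabel(r"$f$")
plt.ylabel("entropy per step")
plt.legend()
plt.show()
```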

Large Deviation Theory

  • For many systems, the probability of observing a macrostate characterized by a fraction $f$ has the form

$$\log P_N(f) \approx - N \big[ \epsilon(f) - s(f) \big] = -N I(f)$$

  • where $I(f)$ is a function that does not depend on $N$.

  • As the number of steps, molecules, or components $N$ increases, the probability distribution becomes sharply concentrated near the minimum of $I(f)$.

  • This function therefore controls both the shape and decay of the probability distribution in the large-$N$ limit. This general scaling behavior is known as Large Deviation Theory.

Example: Random Walk

  • For a random walk, the large deviation function measures how deviations of the empirical fractions $f_{\pm}$ from the true probabilities $p_{\pm}$ are suppressed as $N$ increases:

$$I(f) = f_{+} \log \frac{f_{+}}{p_{+}} + f_{-} \log \frac{f_{-}}{p_{-}}$$

  • As $N$ grows, fluctuations away from $f_{+} = p_{+}$ become exponentially unlikely.

```python
import numpy as np
import matplotlib.pyplot as plt

# Define parameters
p_plus = 0.7  # Biased probability
p_minus = 1 - p_plus  # Complementary probability

# Define range for f_+
f_plus_values = np.linspace(0.01, 0.99, 200)  # Avoid log(0) issues
f_minus_values = 1 - f_plus_values  # f_- = 1 - f_+

# Compute entropy component s(f_+)
s_values = -(f_plus_values * np.log(f_plus_values) + f_minus_values * np.log(f_minus_values))

# Compute energy component ε(f_+)
epsilon_values = - (f_plus_values * np.log(p_plus) + f_minus_values * np.log(p_minus))

# Compute large deviation rate function I(f_+)
I_values = f_plus_values * np.log(f_plus_values / p_plus) + f_minus_values * np.log(f_minus_values / p_minus)

# Compute probability P_N(f_+) using large deviation approximation
N = 50  # Arbitrary large N
P_x_values = np.exp(-N * I_values)  # Exponential suppression
P_x_values /= np.trapz(P_x_values, f_plus_values)  # Normalize for probability density

# Create subplots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# First subplot: Entropy and Energy Components
axes[0].plot(f_plus_values, s_values, label=r"$s(f)$ (Entropy)", color="blue")
axes[0].plot(f_plus_values, epsilon_values, label=r"$\epsilon(f_+)$ (Energy)", color="green")
axes[0].plot(f_plus_values, I_values, label=r"$I(f)$ (Rate Function)", color="red")
axes[0].axvline(p_plus, linestyle="--", color="black", label=r"$f_+ = p_+$")
axes[0].set_xlabel(r"$f_+$")
axes[0].set_ylabel("Value")
axes[0].set_title("Entropy, Energy, and Rate Function")
axes[0].legend()
axes[0].grid()

# Second subplot: Probability Distribution P_N(f_+)
axes[1].plot(f_plus_values, P_x_values, label=r"$P_N(f_+)$", color="purple")
axes[1].axvline(p_plus, linestyle="--", color="black", label=r"$f_+ = p_+$")
axes[1].set_xlabel(r"$f_+$")
axes[1].set_ylabel(r"$P_N(f_+)$")
axes[1].set_title("Probability Distribution")
axes[1].legend()
axes[1].grid()

plt.tight_layout()
plt.show()
```
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import comb

def plot_large_deviation(N, theta, color):
    """Plots the large deviation approximation for given N and bias theta."""
    f = np.linspace(0.01, 0.99, 200)  # Avoid log(0) issues

    # Compute the rate function I(f)
    I = f * np.log(f / theta) + (1 - f) * np.log((1 - f) / (1 - theta))

    # Compute normalized probability P_LDT(f) ∼ exp(-N I(f))
    p_ldt = np.exp(-N * I)
    p_ldt /= np.trapz(p_ldt, f)  # Normalize using trapezoidal rule for integration

    plt.plot(f, p_ldt, color=color, linestyle="-", linewidth=2, label=f"LDT Approx. (N={N})")

def plot_binomial(N, theta, color):
    """Plots the exact binomial distribution for given N and bias theta."""
    n = np.arange(N + 1)
    f = n / N  # Convert discrete counts to fractions

    # Compute binomial probability mass function
    prob = comb(N, n) * theta**n * (1 - theta)**(N - n)

    # Normalize probability for direct comparison with LDT curve
    prob /= np.trapz(prob, f)

    plt.plot(f, prob, 'o', color=color, markersize=5, label=f"Binomial (N={N})")

# Parameters
theta = 0.5  # Fair coin
Ns = [5, 10, 20, 50, 100]  # Different values of N
colors = plt.cm.viridis(np.linspace(0.2, 0.8, len(Ns)))  # Use colormap for better distinction

# Create the plot
plt.figure(figsize=(8, 6))
for i, N in enumerate(Ns):
    plot_large_deviation(N, theta, colors[i])
    plot_binomial(N, theta, colors[i])

# Labels and formatting
plt.xlabel(r"$f_+$", fontsize=14)
plt.ylabel(r"Probability Density", fontsize=14)
plt.title("Comparison of Binomial Distribution (Points) and Large Deviation Approximation (Lines)", fontsize=12)
plt.legend(loc="upper left", fontsize=10)
plt.grid()
plt.show()
```

Gaussian Nature of Fluctuations

  • For large systems, the probability of observing a fraction $f$ typically has the large-deviation form

$$P(f) \sim e^{-N I(f)}$$

  • where $I(f)$ is the large deviation function. The most probable value $f_{\text{min}}$ minimizes $I(f)$. To understand small fluctuations, we expand $I(f)$ around this minimum:

$$I(f) = I(f_{\text{min}}) + \frac{1}{2} I''(f_{\text{min}})\, (f - f_{\text{min}})^2 + \cdots$$

  • Keeping only the quadratic term, we see once again that for large $N$, fluctuations around the most probable value are Gaussian, with a width that shrinks as $1/\sqrt{N}$:

$$P(f) \approx \exp\left[ - \frac{N}{2} I''(f_{\text{min}})\, (f - f_{\text{min}})^2 \right]$$
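
A minimal sketch comparing the full large-deviation form $e^{-N I(f)}$ with its Gaussian (quadratic) approximation for the random walk; here $f$ denotes $f_{+}$, and the bias $p$ and step number $N$ are illustrative choices:

```python
import numpy as np
import matplotlib.pyplot as plt

p, N = 0.7, 100                      # illustrative bias and number of steps
f = np.linspace(0.01, 0.99, 400)

# Rate function of the random walk (in terms of f = f_+) and its curvature at f_min = p
I = f * np.log(f / p) + (1 - f) * np.log((1 - f) / (1 - p))
I_curv = 1.0 / (p * (1 - p))         # I''(p)

P_full = np.exp(-N * I)
P_gauss = np.exp(-0.5 * N * I_curv * (f - p) ** 2)

# Normalize both as probability densities so the shapes can be compared
P_full /= np.trapz(P_full, f)
P_gauss /= np.trapz(P_gauss, f)

plt.plot(f, P_full, label=r"full $e^{-N I(f)}$")
plt.plot(f, P_gauss, "--", label="Gaussian approximation")
plt.axvline(p, color="k", linestyle=":", label=r"$f_{\min} = p$")
plt.xlabel(r"$f_+$")
plt.ylabel("probability density")
plt.legend()
plt.show()
```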

Big Picture: LLN, CLT, and LDT for a 1D Random Walk

Consider a biased random walk with

  • $N$ steps

  • steps $s_i = \pm 1$

  • $P(s_i=+1)=p$, $P(s_i=-1)=1-p$

Define the total displacement

$$x = \sum_{i=1}^N s_i, \qquad \mu = \mathbb{E}[s_i] = p-(1-p) = 2p-1, \qquad \sigma^2 = \mathrm{Var}(s_i) = 4p(1-p).$$

| Theorem | What it describes | Scaling with $N$ | Statement for random walk | What it tells us physically |
|---|---|---|---|---|
| Law of Large Numbers (LLN) | Typical value | Fluctuations $\sim 1/\sqrt{N}$ | $\frac{x}{N} \to \mu$ | The average velocity converges to the drift |
| Central Limit Theorem (CLT) | Typical fluctuations | Width $\sim \sqrt{N}$ | $\frac{x-N\mu}{\sqrt{N}} \Rightarrow \mathcal{N}(0,\sigma^2)$ | Near the mean, the distribution becomes Gaussian |
| Large Deviation Theory (LDT) | Rare events | Probability $\sim e^{-N I(f)}$ | $P\!\left(\frac{x}{N}=f\right)\approx e^{-N I(f)}$ | Exponentially small probability of atypical drift |
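
As a quick numerical sketch checking the first two rows of the table; the number of walkers, steps, bias, and seed below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)              # fixed seed for reproducibility
p, N, walkers = 0.7, 1000, 10_000           # illustrative parameters

mu = 2 * p - 1                              # drift per step
sigma2 = 4 * p * (1 - p)                    # variance per step

steps = rng.choice([1, -1], size=(walkers, N), p=[p, 1 - p])
x = steps.sum(axis=1)                       # total displacement of each walker

# LLN: the empirical mean of x/N should be close to mu
print("mean of x/N:", (x / N).mean(), "   mu =", mu)

# CLT: the standardized displacement should have variance close to sigma^2
z = (x - N * mu) / np.sqrt(N)
print("var of (x - N*mu)/sqrt(N):", z.var(), "   sigma^2 =", sigma2)
```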

Hierarchy of Approximations

Conceptually:

  • LLN → tells us where the distribution concentrates

  • CLT → tells us its Gaussian shape near the peak

  • LDT → tells us how the tails decay far from the peak

Graphically:

  • LLN identifies the maximum of the distribution

  • CLT describes the quadratic expansion near the maximum

  • LDT provides the full rate function governing global shape

$$\boxed{\; P\!\left(\frac{x}{N}=f\right) \approx \exp[-N I(f)] \;}$$

Near the minimum of $I(f)$:

$$I(f) \approx \frac{(f-\mu)^2}{2\sigma^2} \quad \Longrightarrow \quad \text{CLT Gaussian}$$
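
As a consistency check with the random-walk rate function above, write $f = x/N$ so that $f_{+} = (1+f)/2$; differentiating $I$ twice and evaluating at the minimum $f = \mu$ (i.e. $f_{+} = p$) reproduces the CLT variance:

$$\left.\frac{d^2 I}{df^2}\right|_{f=\mu} = \frac{1}{4}\left(\frac{1}{f_{+}} + \frac{1}{1-f_{+}}\right)\Bigg|_{f_{+}=p} = \frac{1}{4p(1-p)} = \frac{1}{\sigma^{2}}.$$
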
Exercises
  • Add a Gaussian curve to the binomial distribution plot. Keep in mind when comparing a discrete and a continuous distribution: you are comparing $P(x_n)$ to $P(x)\,\Delta x$, where $\Delta x$ is the spacing between discrete points!

  • Plot the entropy as a function of the number of molecules on the left/right side of a container.

  • Plot the energy as a function of the bias to go left/right.

  • Plot the large deviation function as a function of the fraction $f_{+}$.