Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Numpy and Probabilities

Open In Colab
import numpy as np

# Given joint probability matrix p(x, y)
pxy = np.array([[0.0888395 , 0.04135306, 0.07692385, 0.04716456],
                [0.05191984, 0.04715928, 0.06539948, 0.04739613],
                [0.09325686, 0.05106682, 0.05221257, 0.05864018],
                [0.08283109, 0.11219392, 0.02191071, 0.06173214]])

Problem: Compute Full, Marginal, and Conditional Probabilities Given p(x,y) p(x, y)

Given the joint probability distribution p(x,y) p(x, y) , perform the following tasks:

  1. Check for Normalization:

    • Verify that the sum of all elements in p(x,y) p(x, y) equals 1 (i.e., x,yp(x,y)=1 \sum_{x,y} p(x, y) = 1 ).

  2. Compute the Marginal Probability p(x) p(x) :

    • Compute p(x) p(x) by summing over all values of y y :

      p(x)=yp(x,y)p(x) = \sum_{y} p(x, y)
  3. Compute a Specific Joint Probability:

    • Find the probability of x=1 x = 1 and y=2 y = 2 , i.e., p(x=1,y=2) p(x=1, y=2) .

  4. Compute the Conditional Probability p(xy=3) p(x | y=3) :

    • Use the definition of conditional probability:

      p(xy=3)=p(x,y=3)p(y=3)p(x | y=3) = \frac{p(x, y=3)}{p(y=3)}
    • Compute p(y=3) p(y=3) by summing over x x , then use it to compute p(xy=3) p(x | y=3) .

Exploration: How likely is it that (n) out of (N) molecules are in the left half?

When a wall separating an ideal gas is removed, each molecule can be found in the left or right half of the container. Assume each molecule is independently in the left half with probability p=12p=\tfrac12.

The probability that exactly (n) molecules out of (N) are in the left half is:

PN(n)=(Nn)(12)NP_N(n)=\binom{N}{n}\left(\frac12\right)^N

Implement PN(n)P_N(n) in NumPy

  • Create a Python function P_N(n, N) that returns the binomial probability values.

  • Use math.comb(N,n) for the binomial coefficient.

Plot the probability distribution

For each of these values of (N):

[N = 10,; 50,; 200,; 1000]

  • Create an array n = np.arange(N+1)

  • Compute P = P_N(n, N)

  • Plot PN(n)P_N(n) vs nn

Check normalization

  • Numerically verify that:

    n=0NPN(n)=1\sum_{n=0}^N P_N(n) = 1

Measure fluctuations

For each NN, compute and print:

  • The mean n\langle n\rangle

  • The standard deviation σ\sigma

  • The relative fluctuation σ/N\sigma/N

Problem: Understanding Covariance and Correlation Using NumPy

Consider two random variables, XX and YY, representing the heights (in cm) and weights (in kg) of a group of individuals. The relationship between these variables is often analyzed using covariance and correlation.

  1. Generate synthetic data:

    • Let XX (heights) be normally distributed with mean 170 cm and standard deviation 10 cm.

    • Let YY (weights) be generated as a linear function of XX with some added random noise:

      Y=0.5X+10+random noiseY = 0.5X + 10 + \text{random noise}

      where the random noise follows a normal distribution with mean 0 and standard deviation 5.

  2. Compute the sample covariance between XX and YY using NumPy.

  3. Compute the Pearson correlation coefficient using NumPy.

  4. Interpretation:

    • If the covariance is positive, what does it imply?

    • How does the correlation coefficient relate to the strength of the relationship?

Hints:

  • Use np.random.normal to generate data.

  • Compute covariance using np.cov(X, Y).

  • Compute correlation using np.corrcoef(X, Y).

Problem: Understanding Histograms, Normalization, and Bin Size in Statistical Mechanics

Scenario: Energy Distribution of Particles in a System

In statistical mechanics, the distribution of particle energies in a system at thermal equilibrium follows the Boltzmann distribution. To analyze this, we simulate a system of particles with energies distributed according to:

P(E)eE/kBTP(E) \propto e^{-E / k_B T}

where:

  • E E is the energy of a particle,

  • kB k_B is the Boltzmann constant (set to 1 for simplicity),

  • T T is the temperature (choose a reasonable value, e.g., T=300 T = 300 ).

Tasks:

  1. Generate Energy Data:

    • Simulate 10,000 particle energies following the Boltzmann distribution using an exponential distribution.

    • Use NumPy’s np.random.exponential(scale=kB*T, size=N) to generate energy values.

  2. Plot Histograms with Different Bin Sizes:

    • Create histograms with 20, 50, and 100 bins.

    • Compare how bin size affects the appearance of the histogram.

  3. Normalize the Histogram:

    • Convert the histogram into a probability density function (PDF) by ensuring the total area under the histogram sums to 1.

    • Use the density=True option in plt.hist().

  4. Overlay the Theoretical Boltzmann Distribution:

    • Plot the theoretical function P(E)=(1/kBT)eE/kBT P(E) = (1/k_B T) e^{-E / k_B T} on top of the histogram.

  5. Analyze the Results:

    • How does the choice of bin size affect the smoothness of the energy distribution?

    • Why is normalization important in probability distributions?

    • How does increasing the number of particles affect the histogram shape?

Hints:

  • Use np.random.exponential(scale=kB*T, size=N) to generate data.

  • Use plt.hist(energies, bins=n, density=True, alpha=0.5) to plot normalized histograms.

  • Overlay the theoretical Boltzmann distribution using plt.plot(E, P(E)).