Numpy and Probabilities#

Problem: Compute Full, Marginal, and Conditional Probabilities Given p(x,y)#

Given the joint probability distribution p(x,y), perform the following tasks:

  1. Check for Normalization:

    • Verify that the sum of all elements in p(x,y) equals 1 (i.e., x,yp(x,y)=1).

  2. Compute the Marginal Probability p(x):

    • Compute p(x) by summing over all values of y:

      p(x)=yp(x,y)
  3. Compute a Specific Joint Probability:

    • Find the probability of x=1 and y=2, i.e., p(x=1,y=2).

  4. Compute the Conditional Probability p(x|y=3):

    • Use the definition of conditional probability: $p(x|y=3)=p(x,y=3)p(y=3)$

    • Compute p(y=3) by summing over x, then use it to compute p(x|y=3).

import numpy as np

# Given joint probability matrix p(x, y)
pxy = np.array([[0.0888395 , 0.04135306, 0.07692385, 0.04716456],
                [0.05191984, 0.04715928, 0.06539948, 0.04739613],
                [0.09325686, 0.05106682, 0.05221257, 0.05864018],
                [0.08283109, 0.11219392, 0.02191071, 0.06173214]])

Problem: Understanding Covariance and Correlation Using NumPy#

Consider two random variables, X and Y, representing the heights (in cm) and weights (in kg) of a group of individuals. The relationship between these variables is often analyzed using covariance and correlation.

  1. Generate synthetic data:

    • Let X (heights) be normally distributed with mean 170 cm and standard deviation 10 cm.

    • Let Y (weights) be generated as a linear function of X with some added random noise:

      Y=0.5X+10+random noise

      where the random noise follows a normal distribution with mean 0 and standard deviation 5.

  2. Compute the sample covariance between X and Y using NumPy.

  3. Compute the Pearson correlation coefficient using NumPy.

  4. Interpretation:

    • If the covariance is positive, what does it imply?

    • How does the correlation coefficient relate to the strength of the relationship?

Hints:#

  • Use np.random.normal to generate data.

  • Compute covariance using np.cov(X, Y).

  • Compute correlation using np.corrcoef(X, Y).

Problem: Understanding Histograms, Normalization, and Bin Size in Statistical Mechanics#

Scenario: Energy Distribution of Particles in a System#

In statistical mechanics, the distribution of particle energies in a system at thermal equilibrium follows the Boltzmann distribution. To analyze this, we simulate a system of particles with energies distributed according to:

P(E)eE/kBT

where:

  • E is the energy of a particle,

  • kB is the Boltzmann constant (set to 1 for simplicity),

  • T is the temperature (choose a reasonable value, e.g., T=300).

Tasks:#

  1. Generate Energy Data:

    • Simulate 10,000 particle energies following the Boltzmann distribution using an exponential distribution.

    • Use NumPy’s np.random.exponential(scale=kB*T, size=N) to generate energy values.

  2. Plot Histograms with Different Bin Sizes:

    • Create histograms with 20, 50, and 100 bins.

    • Compare how bin size affects the appearance of the histogram.

  3. Normalize the Histogram:

    • Convert the histogram into a probability density function (PDF) by ensuring the total area under the histogram sums to 1.

    • Use the density=True option in plt.hist().

  4. Overlay the Theoretical Boltzmann Distribution:

    • Plot the theoretical function P(E)=(1/kBT)eE/kBT on top of the histogram.

  5. Analyze the Results:

    • How does the choice of bin size affect the smoothness of the energy distribution?

    • Why is normalization important in probability distributions?

    • How does increasing the number of particles affect the histogram shape?

Hints:#

  • Use np.random.exponential(scale=kB*T, size=N) to generate data.

  • Use plt.hist(energies, bins=n, density=True, alpha=0.5) to plot normalized histograms.

  • Overlay the theoretical Boltzmann distribution using plt.plot(E, P(E)).