Numpy and Probabilities#
Problem: Compute Full, Marginal, and Conditional Probabilities Given \( p(x, y) \)#
Given the joint probability distribution \( p(x, y) \), perform the following tasks:
Check for Normalization:
Verify that the sum of all elements in \( p(x, y) \) equals 1 (i.e., \( \sum_{x,y} p(x, y) = 1 \)).
Compute the Marginal Probability \( p(x) \):
Compute \( p(x) \) by summing over all values of \( y \):
\[ p(x) = \sum_{y} p(x, y) \]
Compute a Specific Joint Probability:
Find the probability of \( x = 1 \) and \( y = 2 \), i.e., \( p(x=1, y=2) \).
Compute the Conditional Probability \( p(x | y=3) \):
Use the definition of conditional probability: $\( p(x | y=3) = \frac{p(x, y=3)}{p(y=3)} \)$
Compute \( p(y=3) \) by summing over \( x \), then use it to compute \( p(x | y=3) \).
import numpy as np
# Given joint probability matrix p(x, y)
pxy = np.array([[0.0888395 , 0.04135306, 0.07692385, 0.04716456],
[0.05191984, 0.04715928, 0.06539948, 0.04739613],
[0.09325686, 0.05106682, 0.05221257, 0.05864018],
[0.08283109, 0.11219392, 0.02191071, 0.06173214]])
Problem: Understanding Covariance and Correlation Using NumPy#
Consider two random variables, \(X\) and \(Y\), representing the heights (in cm) and weights (in kg) of a group of individuals. The relationship between these variables is often analyzed using covariance and correlation.
Generate synthetic data:
Let \(X\) (heights) be normally distributed with mean 170 cm and standard deviation 10 cm.
Let \(Y\) (weights) be generated as a linear function of \(X\) with some added random noise:
\[ Y = 0.5X + 10 + \text{random noise} \]where the random noise follows a normal distribution with mean 0 and standard deviation 5.
Compute the sample covariance between \(X\) and \(Y\) using NumPy.
Compute the Pearson correlation coefficient using NumPy.
Interpretation:
If the covariance is positive, what does it imply?
How does the correlation coefficient relate to the strength of the relationship?
Hints:#
Use
np.random.normal
to generate data.Compute covariance using
np.cov(X, Y)
.Compute correlation using
np.corrcoef(X, Y)
.
Problem: Understanding Histograms, Normalization, and Bin Size in Statistical Mechanics#
Scenario: Energy Distribution of Particles in a System#
In statistical mechanics, the distribution of particle energies in a system at thermal equilibrium follows the Boltzmann distribution. To analyze this, we simulate a system of particles with energies distributed according to:
where:
\( E \) is the energy of a particle,
\( k_B \) is the Boltzmann constant (set to 1 for simplicity),
\( T \) is the temperature (choose a reasonable value, e.g., \( T = 300 \)).
Tasks:#
Generate Energy Data:
Simulate 10,000 particle energies following the Boltzmann distribution using an exponential distribution.
Use NumPy’s
np.random.exponential(scale=kB*T, size=N)
to generate energy values.
Plot Histograms with Different Bin Sizes:
Create histograms with 20, 50, and 100 bins.
Compare how bin size affects the appearance of the histogram.
Normalize the Histogram:
Convert the histogram into a probability density function (PDF) by ensuring the total area under the histogram sums to 1.
Use the
density=True
option inplt.hist()
.
Overlay the Theoretical Boltzmann Distribution:
Plot the theoretical function \( P(E) = (1/k_B T) e^{-E / k_B T} \) on top of the histogram.
Analyze the Results:
How does the choice of bin size affect the smoothness of the energy distribution?
Why is normalization important in probability distributions?
How does increasing the number of particles affect the histogram shape?
Hints:#
Use
np.random.exponential(scale=kB*T, size=N)
to generate data.Use
plt.hist(energies, bins=n, density=True, alpha=0.5)
to plot normalized histograms.Overlay the theoretical Boltzmann distribution using
plt.plot(E, P(E))
.