Math — Probability Distributions
A progressive study of four fundamental probability distributions implemented as Python classes — binomial, normal (Gaussian), Poisson, and exponential — with parameter estimation from data and PMF/PDF/CDF computation.
Learning Objectives
| # | Concept |
|---|---|
| 1 | Implement the binomial distribution: Bernoulli trials, and parameters $n$ and $p$ |
| 2 | Implement the normal (Gaussian) distribution: mean $\mu$, standard deviation $\sigma$ |
| 3 | Implement the Poisson distribution: rate parameter $\lambda$, counting processes |
| 4 | Implement the exponential distribution: rate parameter $\lambda$, waiting times |
| 5 | Estimate distribution parameters from data using the method of moments |
| 6 | Compute PMF (probability mass function) for discrete distributions |
| 7 | Compute PDF (probability density function) for continuous distributions |
| 8 | Compute CDF (cumulative distribution function) for all four distributions |
| 9 | Convert between z-scores and x-values on the normal curve |
Task-by-Task Reference
Each task below highlights the unique challenge it posed and the new technique introduced — techniques from earlier tasks are not repeated.
Task 0 — Binomial Distribution (binomial.py)
Challenge: Model the number of successes in $n$ independent Bernoulli trials, each with success probability $p$, implementing the binomial PMF from scratch using combinatorial formulas.
Approach: The constructor accepts either explicit $n$ and $p$ or estimates them from data. From data, compute the sample mean $\bar{x}$, then the variance $s^2$, then solve for $p = 1 - s^2/\bar{x}$ and $n = \operatorname{round}(\bar{x}/p)$. The PMF computes $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$, accumulating the binomial coefficient iteratively to avoid large intermediate factorials.
New techniques introduced:
| Technique | Purpose |
|---|---|
| Method of moments estimation | Estimate $n$ and $p$ from sample mean and variance |
| Iterative binomial coefficient | Compute $\binom{n}{k}$ without factorials via product |
| `round()` vs `int()` for parameter estimation | Round $n$ to nearest integer (not truncate) |
| `self.p = float(p)`, `self.n = int(n)` | Explicit type casting for distribution parameters |
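Key takeaway: The binomial distribution models "number of successes in $n$ trials." Parameters can be estimated from the sample mean $\bar{x}$ and variance $s^2$: $p = 1 - s^2/\bar{x}$, then $n = \operatorname{round}(\bar{x}/p)$.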
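The estimation and PMF steps for this task can be sketched as a small class. This is a minimal sketch, not the project's actual `binomial.py`: the constructor signature, the error messages, and the use of population variance (dividing by the number of samples) are all assumptions made for illustration.

```python
class Binomial:
    """Illustrative binomial distribution (hypothetical layout)."""

    def __init__(self, data=None, n=1, p=0.5):
        if data is None:
            if n <= 0:
                raise ValueError("n must be a positive value")
            if not 0 < p < 1:
                raise ValueError("p must be greater than 0 and less than 1")
            self.n = int(n)
            self.p = float(p)
        else:
            if len(data) < 2:
                raise ValueError("data must contain multiple values")
            mean = sum(data) / len(data)
            # Assumption: population variance (divide by len(data)).
            variance = sum((x - mean) ** 2 for x in data) / len(data)
            # Method of moments: mean = n*p and variance = n*p*(1 - p),
            # so variance / mean = 1 - p.
            p_est = 1 - variance / mean
            self.n = int(round(mean / p_est))  # round, do not truncate
            self.p = float(p_est)

    def pmf(self, k):
        """P(X = k) for k successes in n trials."""
        k = int(k)
        if k < 0 or k > self.n:
            return 0
        # Build C(n, k) as a running product; each step stays an integer
        # because the partial product is itself a binomial coefficient.
        coeff = 1
        for i in range(1, k + 1):
            coeff = coeff * (self.n - k + i) // i
        return coeff * self.p ** k * (1 - self.p) ** (self.n - k)


fitted = Binomial(data=[4, 6])      # mean 5, variance 1
explicit = Binomial(n=4, p=0.5)
print(fitted.n, fitted.p)
print(explicit.pmf(2))
```

With the two-point sample above, the moments give $p = 1 - 1/5 = 0.8$ and $n = \operatorname{round}(6.25) = 6$. Real data can produce a variance larger than the mean, in which case the estimated $p$ is negative and the binomial model is a poor fit; the sketch does not guard against that.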
Task 1 — Normal Distribution (normal.py)
Challenge: Model the bell-shaped Gaussian distribution and compute probabilities on it — implementing the PDF formula from scratch.
Approach: Store $\mu$ (`mean`) and $\sigma$ (`stddev`) as floats. Provide `z_score(x)` to convert x-values to z-scores, `x_value(z)` to convert back, `pdf(x)` for the density, and `cdf(x)` using a polynomial approximation of the error function. Class constants `e` and `pi` are hardcoded for precision control.
New techniques introduced:
| Technique | Purpose |
|---|---|
| `z = (x - mean) / stddev` | Standardize a value to a z-score (number of standard deviations from the mean) |
| `x = stddev * z + mean` | Reverse standardization: z-score back to raw value |
| Gaussian PDF formula | $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$ |
| CDF via error function approximation | Compute cumulative probability using polynomial approximation of erf |
| Class constants `e` and `pi` | Hardcoded mathematical constants shared by all methods |
Key takeaway: The normal distribution is defined by $\mu$ (center) and $\sigma$ (spread). Z-scores standardize any normal to $\mathcal{N}(0, 1)$. The CDF answers "what's the probability of being below x?"
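The z-score conversions, PDF, and CDF for this task can be sketched as follows. This is a minimal sketch rather than the project's `normal.py`: the constructor signature and the population standard deviation are assumptions, and the specific polynomial used for erf (the first five terms of its Maclaurin series) is one plausible choice, since the text above does not say which approximation was used.

```python
class Normal:
    """Illustrative normal distribution (hypothetical layout)."""

    # Hardcoded class constants, as described in the approach.
    e = 2.7182818285
    pi = 3.1415926536

    def __init__(self, data=None, mean=0.0, stddev=1.0):
        if data is None:
            if stddev <= 0:
                raise ValueError("stddev must be a positive value")
            self.mean = float(mean)
            self.stddev = float(stddev)
        else:
            if len(data) < 2:
                raise ValueError("data must contain multiple values")
            self.mean = float(sum(data) / len(data))
            # Assumption: population standard deviation.
            var = sum((x - self.mean) ** 2 for x in data) / len(data)
            self.stddev = float(var ** 0.5)

    def z_score(self, x):
        """Number of standard deviations between x and the mean."""
        return (x - self.mean) / self.stddev

    def x_value(self, z):
        """Inverse of z_score: raw value for a given z-score."""
        return self.stddev * z + self.mean

    def pdf(self, x):
        """Gaussian density at x."""
        z = self.z_score(x)
        norm = self.stddev * (2 * self.pi) ** 0.5
        return self.e ** (-0.5 * z ** 2) / norm

    def cdf(self, x):
        """P(X <= x) = 0.5 * (1 + erf((x - mean) / (stddev * sqrt(2))))."""
        t = (x - self.mean) / (self.stddev * 2 ** 0.5)
        # Assumed approximation: truncated Maclaurin series of erf.
        series = t - t ** 3 / 3 + t ** 5 / 10 - t ** 7 / 42 + t ** 9 / 216
        erf = 2 / self.pi ** 0.5 * series
        return 0.5 * (1 + erf)


heights = Normal(mean=70, stddev=10)
print(heights.z_score(90))   # 2.0
print(heights.x_value(2))    # 90.0
print(heights.cdf(70))       # 0.5
```

A truncated series like this is accurate near the mean but degrades several standard deviations out, so it suits a from-scratch exercise more than production use.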
Task 2 — Poisson Distribution (poisson.py)
Challenge: Model the number of events occurring in a fixed interval — implementing the Poisson PMF and CDF as a sum.
Approach: The rate parameter $\lambda$ (`lambtha`) is either given or estimated as the sample mean. The PMF computes $P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$ using iterative factorial accumulation. The CDF sums PMF values from $i = 0$ to $k$ using the same iterative factorial approach for efficiency.
New techniques introduced:
| Technique | Purpose |
|---|---|
| $\lambda = \bar{x}$ | Estimate Poisson rate as the arithmetic mean of the data |
| Iterative $k!$ accumulation | Compute factorial incrementally to avoid recomputation |
| CDF $= \sum_{i=0}^{k} P(X = i)$ | Cumulative probability is the sum of individual PMF values |
Key takeaway: The Poisson distribution models count data: "how many events in a fixed interval?" $\lambda$ is both the mean AND the variance. The PMF uses $e^{-\lambda}$ as the base probability of zero events.
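The Poisson PMF and the summed CDF can be sketched like this. It is an illustrative sketch, not the project's `poisson.py`: the constructor signature and error messages are assumptions, and `math.exp` stands in for whatever constant the original uses for $e$.

```python
import math


class Poisson:
    """Illustrative Poisson distribution (hypothetical layout)."""

    def __init__(self, data=None, lambtha=1.0):
        if data is None:
            if lambtha <= 0:
                raise ValueError("lambtha must be a positive value")
            self.lambtha = float(lambtha)
        else:
            if len(data) < 2:
                raise ValueError("data must contain multiple values")
            # Method of moments: the rate is the sample mean.
            self.lambtha = float(sum(data) / len(data))

    def pmf(self, k):
        """P(X = k) = exp(-lambtha) * lambtha**k / k!"""
        k = int(k)
        if k < 0:
            return 0
        factorial = 1
        for i in range(1, k + 1):
            factorial *= i
        return math.exp(-self.lambtha) * self.lambtha ** k / factorial

    def cdf(self, k):
        """P(X <= k), summing PMF terms without recomputing factorials."""
        k = int(k)
        if k < 0:
            return 0
        term = math.exp(-self.lambtha)  # P(X = 0)
        total = term
        for i in range(1, k + 1):
            # P(X = i) = P(X = i - 1) * lambtha / i
            term *= self.lambtha / i
            total += term
        return total


calls = Poisson(data=[1, 2, 3])
print(calls.lambtha)   # 2.0
print(calls.pmf(2), calls.cdf(2))
```

The CDF loop reuses the previous term instead of calling `pmf` for every `i`, which is the incremental-factorial idea from the table: each term differs from the last by a factor of $\lambda / i$.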
Task 3 — Exponential Distribution (exponential.py)
Challenge: Model the waiting time between events in a Poisson process, implementing the exponential PDF $f(x) = \lambda e^{-\lambda x}$ and CDF $F(x) = 1 - e^{-\lambda x}$.
Approach: The rate $\lambda$ is either given or estimated as $1/\bar{x}$ of the data (the reciprocal of the sample mean). The PDF computes $\lambda e^{-\lambda x}$ directly. The CDF uses $1 - e^{-\lambda x}$, a simple closed form, unlike the Poisson which requires summation.
New techniques introduced:
| Technique | Purpose |
|---|---|
| $\lambda = 1/\bar{x}$ | Estimate exponential rate as reciprocal of sample mean |
| Exponential PDF: $f(x) = \lambda e^{-\lambda x}$ | Memoryless continuous distribution for waiting times |
| Exponential CDF: $F(x) = 1 - e^{-\lambda x}$ | Closed-form cumulative probability; no summation needed |
Key takeaway: The exponential distribution is the continuous counterpart to the discrete Poisson. It models waiting times with the "memoryless" property: $P(X > s + t \mid X > s) = P(X > t)$. The rate $\lambda$ is the inverse of the expected waiting time.
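The closed-form PDF and CDF, along with a numerical check of the memoryless property, can be sketched as follows. As with the earlier tasks, this is a hedged sketch and not the project's `exponential.py`: the constructor signature and error messages are assumptions, and `math.exp` is used in place of a hardcoded constant.

```python
import math


class Exponential:
    """Illustrative exponential distribution (hypothetical layout)."""

    def __init__(self, data=None, lambtha=1.0):
        if data is None:
            if lambtha <= 0:
                raise ValueError("lambtha must be a positive value")
            self.lambtha = float(lambtha)
        else:
            if len(data) < 2:
                raise ValueError("data must contain multiple values")
            # Rate is the reciprocal of the sample mean.
            self.lambtha = float(len(data) / sum(data))

    def pdf(self, x):
        """Density of a waiting time of length x."""
        if x < 0:
            return 0
        return self.lambtha * math.exp(-self.lambtha * x)

    def cdf(self, x):
        """P(X <= x): probability the wait is at most x."""
        if x < 0:
            return 0
        return 1 - math.exp(-self.lambtha * x)


waits = Exponential(data=[1, 3])   # sample mean 2, so the rate is 0.5
s, t = 1.5, 2.0
# Memoryless property: P(X > s + t | X > s) equals P(X > t).
conditional = (1 - waits.cdf(s + t)) / (1 - waits.cdf(s))
print(waits.lambtha)
print(conditional, 1 - waits.cdf(t))
```

The two printed survival probabilities agree up to floating-point error, which is the memoryless property in numeric form: having already waited `s` units does not change the distribution of the remaining wait.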
Technique Inventory
| Task | New technique summarized | Category |
|---|---|---|
| 0 | Binomial PMF, method of moments for $n$ and $p$ | Discrete Distributions |
| 1 | Gaussian PDF/CDF, z-score standardization, erf approximation | Continuous Distributions |
| 2 | Poisson PMF/CDF, $\lambda = \bar{x}$ as rate, iterative factorial summation | Discrete Distributions |
| 3 | Exponential PDF/CDF, $\lambda = 1/\bar{x}$, memoryless property | Continuous Distributions |