Mathematics for Data Science
What is a Distribution?
Understanding how data is spread — the shape behind the numbers
Explanation
A distribution describes how values in a dataset are spread across possible outcomes.
Think of it as: "if I pick a random value from this dataset, how likely is it to be near X?"
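One way to make this question concrete is to count. The share of values that fall inside a small window around X approximates how likely a random pick is to land near X. The sketch below assumes a synthetic, roughly bell-shaped dataset drawn with NumPy's `default_rng`. The helper `share_near` and the window half-width `h` are hypothetical choices made for illustration; they are not part of any library.

```python
import numpy as np

def share_near(data: np.ndarray, x: float, h: float) -> float:
    """Fraction of values in `data` lying within [x - h, x + h].

    `share_near` is a hypothetical helper written for this example.
    """
    return float(np.mean(np.abs(data - x) <= h))

# Assumed synthetic data: bell-shaped, centred on 50 with spread 10
rng = np.random.default_rng(seed=42)
data = rng.normal(loc=50, scale=10, size=10_000)

near_center = share_near(data, x=50, h=2)  # window in the middle of the data
near_tail = share_near(data, x=75, h=2)    # window far out in the right tail

print(f"P(value near 50) ~ {near_center:.3f}")
print(f"P(value near 75) ~ {near_tail:.3f}")
```

The same window width gives a much larger share near the center than in the tail; that difference in "how likely" is exactly what the shape of a distribution describes.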
Key shapes:
- Symmetric (e.g. bell-shaped): the left and right halves mirror each other, so values spread equally on both sides of the center
- Right-skewed (positive skew): tail extends to the right; mean > median (e.g. income)
- Left-skewed (negative skew): tail extends to the left; mean < median (e.g. exam scores where most students score high)
- Uniform: all values equally likely (e.g. rolling a fair die)
- Bimodal: two peaks, typical of data pooled from two distinct groups (e.g. Old Faithful geyser eruption durations, which cluster around short and long eruptions)
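Each shape in the list can be simulated with NumPy's random generator, which helps build intuition before looking at real data. The parameters below (scales, the Beta(5, 2) stand-in for exam scores, the 50/50 mixture of two normals for the bimodal case) are arbitrary assumptions chosen only to make each shape visible; they are not fitted to real income or exam data. The `text_histogram` helper is hypothetical and prints bars with `#` so no plotting library is needed.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10_000

# One synthetic sample per shape; all parameters are illustrative assumptions.
samples = {
    "symmetric (normal)": rng.normal(loc=0, scale=1, size=n),
    "right-skewed (exponential)": rng.exponential(scale=2, size=n),
    "left-skewed (beta, a > b)": rng.beta(a=5, b=2, size=n),
    "uniform (fair die)": rng.integers(1, 7, size=n),  # faces 1..6
    "bimodal (two-group mixture)": np.concatenate([
        rng.normal(loc=-3, scale=1, size=n // 2),
        rng.normal(loc=3, scale=1, size=n // 2),
    ]),
}

def text_histogram(data: np.ndarray, bins=10, width: int = 40) -> str:
    """Render a histogram as rows of '#' (hypothetical helper).

    Each row starts with the bin's left edge; bar length is proportional
    to the bin count, with the tallest bin drawn `width` characters long.
    """
    counts, edges = np.histogram(data, bins=bins)
    scale = width / counts.max()
    rows = [
        f"{lo:8.2f} | {'#' * int(round(c * scale))}"
        for lo, c in zip(edges[:-1], counts)
    ]
    return "\n".join(rows)

for name, data in samples.items():
    # Half-integer edges give the die exactly one bar per face
    bins = np.arange(0.5, 7) if name.startswith("uniform") else 10
    print(name)
    print(text_histogram(data, bins=bins))
    print()
```

Reading the bars top to bottom, the exponential sample piles up at small values with a long tail, the beta sample does the mirror image, the die gives six bars of similar length, and the mixture shows two separate humps.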
Skewness rule of thumb:
- Mean > Median → right-skewed
- Mean < Median → left-skewed
- Mean ≈ Median → symmetric
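The rule of thumb translates directly into a comparison of two numbers. Real samples almost never have mean exactly equal to median, so the sketch below assumes a tolerance: the gap must exceed `tol` times the standard deviation before the data is called skewed. The function name `skew_by_rule_of_thumb` and the default `tol=0.05` are hypothetical choices, and the three datasets are synthetic stand-ins for income, exam scores, and bell-shaped data.

```python
import numpy as np

def skew_by_rule_of_thumb(data, tol: float = 0.05) -> str:
    """Label data by comparing mean and median (hypothetical helper).

    `tol` is an assumed threshold: |mean - median| must exceed
    tol * standard deviation before the data is called skewed, so that
    ordinary sampling noise is not mistaken for skew.
    """
    data = np.asarray(data, dtype=float)
    gap = np.mean(data) - np.median(data)
    threshold = tol * np.std(data)
    if gap > threshold:
        return "right-skewed"   # mean pulled toward a long right tail
    if gap < -threshold:
        return "left-skewed"    # mean pulled toward a long left tail
    return "symmetric"

rng = np.random.default_rng(seed=1)
n = 10_000
income_like = rng.exponential(scale=2, size=n)   # long right tail
exam_like = 100 * rng.beta(a=5, b=2, size=n)     # most scores high
bell = rng.normal(loc=0, scale=1, size=n)        # symmetric

for name, sample in [("income-like", income_like),
                     ("exam-like", exam_like),
                     ("bell", bell)]:
    print(f"{name}: mean={np.mean(sample):.2f}, "
          f"median={np.median(sample):.2f} -> {skew_by_rule_of_thumb(sample)}")
```

This is a heuristic, not a test: for discrete or multimodal data the mean and median can sit in the "wrong" order relative to the visible tail, so it is still worth looking at a histogram.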
Examples
Visualizing skew with a histogram
An exponential distribution is used here to mimic income data: most values are small, with a long tail of large ones.
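import matplotlib.pyplot as plt
import numpy as np
# Right-skewed: income-like data (exponential distribution, mean = scale = 2)
rng = np.random.default_rng(seed=0)  # fixed seed so the plot is reproducible
data = rng.exponential(scale=2, size=1000)
plt.hist(data, bins=50)
# Mark mean and median: in right-skewed data the mean sits right of the median
plt.axvline(np.mean(data), color='red', label='mean')
plt.axvline(np.median(data), color='black', linestyle='--', label='median')
plt.title('Right-skewed distribution')
plt.legend()
plt.show()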