Variance & Standard Deviation
How spread out is your data? The two core measures of dispersion
Explanation
Variance and standard deviation measure how spread out values are from the mean.
Variance: the average of the squared differences from the mean.
    data = [2, 4, 4, 4, 5, 5, 7, 9], mean = 5
    squared differences: (2-5)²=9, (4-5)²=1, (4-5)²=1, (4-5)²=1, (5-5)²=0, (5-5)²=0, (7-5)²=4, (9-5)²=16
    variance = (9+1+1+1+0+0+4+16) / 8 = 32 / 8 = 4.0
Standard Deviation: the square root of the variance (same units as the data).
    std = √4.0 = 2.0
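The hand calculation above can be reproduced in a few lines of plain Python. This is a minimal sketch using only the standard library; the helper name `population_variance` is made up for illustration and is not part of any library.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]

def population_variance(values):
    """Average of squared differences from the mean (divides by N)."""
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / len(values)

variance = population_variance(data)
std = math.sqrt(variance)  # back in the same units as the data

print(variance)  # 4.0
print(std)       # 2.0
```

Writing it out step by step mirrors the worked example: compute the mean, square each difference, average the squares, then take the square root.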
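Why this matters: A small std means values cluster tightly around the mean. A large std means they are spread far apart. This matters in ML because many algorithms are sensitive to feature scale, so features with very different scales are commonly standardized by subtracting the mean and dividing by the std.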
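To make the scaling point concrete, here is a small sketch of standardization (z-scores) with the standard library. The two feature lists and the `standardize` helper are invented for illustration; a real pipeline would more likely use a library scaler.

```python
import statistics

# Hypothetical features on very different scales
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
incomes = [20000.0, 40000.0, 60000.0, 80000.0, 100000.0]

def standardize(values):
    """Subtract the mean and divide by the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]

z_heights = standardize(heights_cm)
z_incomes = standardize(incomes)

print([round(z, 3) for z in z_heights])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
print([round(z, 3) for z in z_incomes])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

After standardizing, both features have mean 0 and std 1, so neither dominates simply because its raw numbers are larger.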
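Population vs Sample: Divide by N when the data is the entire population. Divide by (N-1) when the data is a sample used to estimate the population variance (Bessel's correction); dividing by N in that case tends to underestimate the spread, because the differences are measured from the sample's own mean.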
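The two divisors can be compared directly on the same data. This sketch computes both by hand; the variable names are illustrative, and the standard library's `statistics.pvariance` and `statistics.variance` implement the same two definitions.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
sum_squared_diffs = sum((x - mean) ** 2 for x in data)  # 32.0

population_var = sum_squared_diffs / n        # divide by N
sample_var = sum_squared_diffs / (n - 1)      # divide by N-1 (Bessel's correction)

print(population_var)        # 4.0
print(round(sample_var, 3))  # 4.571
```

The sample variance is always slightly larger than the population variance for the same data, and the gap shrinks as N grows.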
Examples
NumPy makes this easy
By default np.var and np.std divide by N (population). Passing ddof=1 divides by N-1, which gives the sample value.
import numpy as np
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(np.var(data)) # 4.0 (population)
print(np.std(data)) # 2.0 (population)
print(np.std(data, ddof=1)) # 2.138 (sample)