Variance & Standard Deviation

How spread out is your data? The two core measures of dispersion

Explanation

Variance and standard deviation measure how spread out values are from the mean.

Variance: the average of the squared differences from the mean.

data = [2, 4, 4, 4, 5, 5, 7, 9], mean = 5
squared differences: (2-5)² = 9, (4-5)² = 1, (4-5)² = 1, (4-5)² = 1, (5-5)² = 0, (5-5)² = 0, (7-5)² = 4, (9-5)² = 16
variance = (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8 = 32 / 8 = 4.0

Standard Deviation: the square root of the variance, expressed in the same units as the data.

std = √4.0 = 2.0
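The two definitions above can be sketched in plain Python. This is a minimal illustration of the population formulas (dividing by N), not a library implementation; the function names `population_variance` and `population_std` are made up for this example.

```python
import math

def population_variance(values):
    """Average of squared differences from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
print(population_std(data))       # 2.0
```

The results match the hand calculation: the squared differences sum to 32, and 32 / 8 = 4.0.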

Why this matters: A small std means values cluster tightly around the mean; a large std means they are spread far apart. This matters in ML because features on very different scales often need to be rescaled (for example, standardized to mean 0 and std 1) before training.
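One common way to put features on a comparable scale is to standardize them: subtract the mean and divide by the std. The sketch below assumes that approach and uses only the standard library; the helper `standardize` and both feature lists are hypothetical, and the helper assumes the std is non-zero.

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and population std 1 (z-scores).

    Assumes the values are not all identical (std must be non-zero).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std, divides by N
    return [(x - mean) / std for x in values]

# Two made-up features on very different scales.
heights_cm = [150, 160, 170, 180, 190]
incomes = [20000, 40000, 60000, 80000, 100000]

z_heights = standardize(heights_cm)
z_incomes = standardize(incomes)

print([round(z, 3) for z in z_heights])
print([round(z, 3) for z in z_incomes])
```

After standardizing, both features range from roughly -1.41 to 1.41, even though the raw values differ by several orders of magnitude.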

Population vs Sample: Divide by N when the data is the entire population. Divide by (N-1) when the data is a sample used to estimate the population variance (Bessel's correction).
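To see the difference between the two divisors, here is a small sketch using Python's standard `statistics` module, which provides separate functions for each case. The data is the same eight values used in the explanation.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)     # divides by N:     32 / 8
sample_var = statistics.variance(data)   # divides by N - 1: 32 / 7
sample_std = statistics.stdev(data)      # square root of the sample variance

print(pop_var)
print(round(sample_var, 3))  # 4.571
print(round(sample_std, 3))  # 2.138
```

The sample variance is always a little larger than the population variance for the same data, and the gap shrinks as N grows.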

Examples

NumPy computes both directly

By default np.var and np.std divide by N (population). Passing ddof=1 makes them divide by N-1 (sample).

import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(np.var(data))    # 4.0   (population)
print(np.std(data))    # 2.0   (population)
print(np.std(data, ddof=1))  # 2.138 (sample)
