AtomLearn
Mathematics for Data Science

Variance & Standard Deviation

How spread out is your data? The two core measures of dispersion


Explanation

Variance and standard deviation measure how spread out values are from the mean.

Variance: the average of the squared differences from the mean. For example:

```
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = 40 / 8 = 5

squared differences:
(2-5)² = 9   (4-5)² = 1   (4-5)² = 1   (4-5)² = 1
(5-5)² = 0   (5-5)² = 0   (7-5)² = 4   (9-5)² = 16

variance = (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8 = 32 / 8 = 4.0
```
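The same calculation can be written as a small pure-Python function. This is a minimal sketch. The function name `population_variance` is hypothetical, and it divides by N, matching the population definition used above.

```python
def population_variance(values):
    """Average squared deviation from the mean (divides by N)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n                         # 40 / 8 = 5.0 for the data above
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / n                  # 32 / 8 = 4.0

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
```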

Standard deviation: the square root of the variance, in the same units as the data:

```
std = √4.0 = 2.0
```
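Taking the square root undoes the squaring. As a result, the standard deviation is in the data's own units, while the variance is in squared units. The sketch below assumes, purely for illustration, that the values are lengths in metres and rescales them to centimetres. The standard deviation grows by a factor of 100, and the variance by 100² = 10,000.

```python
import math
import statistics

metres = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical units, for illustration
centimetres = [x * 100 for x in metres]

var_m = statistics.pvariance(metres)       # population variance, in m²
std_m = math.sqrt(var_m)                   # standard deviation, in m

var_cm = statistics.pvariance(centimetres) # population variance, in cm²
std_cm = math.sqrt(var_cm)                 # standard deviation, in cm

print(std_m, std_cm)    # 2.0 200.0
print(var_cm / var_m)   # 10000.0
```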

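Why this matters: a small standard deviation means values cluster tightly around the mean, while a large one means they are spread far from it. This is critical in ML. Features on very different scales usually need to be normalized, for example standardized by subtracting the mean and dividing by the standard deviation, so that large-scale features do not dominate distance- or gradient-based methods.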
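One common normalization is standardization (z-scores): subtract each feature's mean and divide by its standard deviation. Below is a minimal NumPy sketch. The two-column matrix `X` is made-up illustrative data in which the second feature is simply the first one multiplied by 1000.

```python
import numpy as np

# Rows are samples; columns are two features on very different scales (made-up data).
X = np.array([
    [2, 2000],
    [4, 4000],
    [4, 4000],
    [4, 4000],
    [5, 5000],
    [5, 5000],
    [7, 7000],
    [9, 9000],
], dtype=float)

mu = X.mean(axis=0)      # per-feature mean (5 and 5000 here)
sigma = X.std(axis=0)    # per-feature population std (2 and 2000 here)

Z = (X - mu) / sigma     # each column now has mean 0 and std 1

print(np.allclose(Z[:, 0], Z[:, 1]))  # True: the scale difference is gone
```

After standardization, both columns become identical, because they differed only in scale.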

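Population vs. sample: divide by N for the population variance and by N-1 for the sample variance. The N-1 divisor (Bessel's correction) makes the sample variance an unbiased estimate of the population variance.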
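Python's standard-library `statistics` module provides both versions. `pvariance`/`pstdev` divide by N, and `variance`/`stdev` divide by N-1. Here is a short sketch that compares them on the same data and writes out both divisions by hand as a check:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)        # sum of squared deviations = 32.0

print(statistics.pvariance(data), ss / n)        # population: 32 / 8 = 4
print(statistics.variance(data), ss / (n - 1))   # sample: 32 / 7, roughly 4.571
print(statistics.stdev(data))                    # sample std, roughly 2.138
```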

Examples

NumPy makes this easy

`ddof=1` divides by N-1, giving the sample standard deviation (the default, `ddof=0`, divides by N)

```python
import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(np.var(data))           # 4.0     (population)
print(np.std(data))           # 2.0     (population)
print(np.std(data, ddof=1))   # ≈ 2.138 (sample)
```
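NumPy's `ddof` ("delta degrees of freedom") argument sets the divisor to N - ddof. So `ddof=0` gives the population value and `ddof=1` gives the sample value. The quick sketch below checks this against the sum of squared deviations computed by hand:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)
n = data.size
ss = np.sum((data - data.mean()) ** 2)   # sum of squared deviations: 32.0

for ddof in (0, 1):
    by_hand = ss / (n - ddof)            # divide by N - ddof
    print(ddof, np.var(data, ddof=ddof), by_hand)
    assert np.isclose(np.var(data, ddof=ddof), by_hand)
```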
