Overfitting vs Underfitting

The two ways a model can fail — and how to diagnose each

Explanation

Overfitting: the model learns the training data too well, including its noise, and fails on new data.
- Train accuracy: high
- Test accuracy: much lower
- Cause: model too complex, too little data, too many features

Underfitting: the model is too simple to capture the underlying pattern.
- Train accuracy: low
- Test accuracy: also low
- Cause: model too simple, not enough features, insufficient training
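The two signatures above can be reproduced on synthetic data. The following is a minimal sketch, assuming scikit-learn is installed; the dataset (make_classification with label noise), the decision tree, and the helper train_and_score are illustrative choices, not part of the lesson.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 randomly reassigns the label of about 20% of
# samples, which is noise the model should NOT learn (illustrative setup)
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

def train_and_score(max_depth):
    """Fit a tree of the given depth and return (train_acc, test_acc)."""
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    model.fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_test, y_test)

# Unlimited depth: the tree can memorize every training point, noise included
overfit_train, overfit_test = train_and_score(max_depth=None)
# Depth 1: a single split, which is very little capacity for this data
underfit_train, underfit_test = train_and_score(max_depth=1)

print(f"deep tree    train={overfit_train:.2f} test={overfit_test:.2f}")
print(f"depth-1 tree train={underfit_train:.2f} test={underfit_test:.2f}")
```

The deep tree should show a large gap between train and test accuracy, while the depth-1 tree should score similarly on both. Exact numbers depend on the data and the random seed.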

The bias-variance tradeoff:

High bias = underfitting (the model makes strong assumptions and is too simple to fit the pattern)
High variance = overfitting (the model is too sensitive to the particular training sample, so it fits noise)
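Bias and variance can be estimated by refitting the same model on many noisy training sets drawn from one process. The sketch below assumes a sine curve as the true function, fixed training inputs, and polynomial fits of degree 1 and 9. All of these, and the function name bias_and_variance, are illustrative assumptions rather than anything prescribed by the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)      # fixed training inputs
x_grid = np.linspace(0.05, 0.95, 50)     # points where predictions are compared

def true_fn(x):
    """The underlying pattern the models try to recover (assumed for the demo)."""
    return np.sin(2 * np.pi * x)

def bias_and_variance(degree, n_datasets=200, noise=0.3):
    """Estimate squared bias and variance of a polynomial fit of the given degree."""
    preds = np.empty((n_datasets, x_grid.size))
    for i in range(n_datasets):
        # Each iteration uses fresh noise, i.e. a new training set
        y_train = true_fn(x_train) + rng.normal(0, noise, x_train.size)
        coeffs = np.polyfit(x_train, y_train, degree)
        preds[i] = np.polyval(coeffs, x_grid)
    mean_pred = preds.mean(axis=0)
    # Bias: how far the average prediction is from the truth
    bias_sq = float(np.mean((mean_pred - true_fn(x_grid)) ** 2))
    # Variance: how much predictions move from one training set to the next
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

bias_simple, var_simple = bias_and_variance(degree=1)
bias_complex, var_complex = bias_and_variance(degree=9)
print(f"degree 1: bias^2={bias_simple:.4f} variance={var_simple:.4f}")
print(f"degree 9: bias^2={bias_complex:.4f} variance={var_complex:.4f}")
```

The straight line should come out with the larger squared bias and the degree-9 polynomial with the larger variance, which is the tradeoff in miniature.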

How to fix overfitting:

Get more training data
Reduce model complexity
Regularization (adds penalty for complexity)
Dropout (neural networks)
Feature selection (remove irrelevant features)
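To make "adds penalty for complexity" concrete, here is a minimal sketch of L2 (ridge) regularization for linear regression, written with NumPy so the penalty term is visible. The data, the alpha values, and the helper names fit_ridge and mse are assumptions chosen for illustration; ridge is only one of several regularization methods.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 30, 20                 # few samples, many features
X = rng.normal(size=(n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:3] = [2.0, -1.0, 0.5]                  # only 3 features actually matter
y = X @ true_w + rng.normal(0, 1.0, n_samples)

def fit_ridge(X, y, alpha):
    """Minimize ||Xw - y||^2 + alpha * ||w||^2; alpha=0 is ordinary least squares."""
    penalty = alpha * np.eye(X.shape[1])
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def mse(w, X, y):
    """Mean squared error of the linear model w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

w_unregularized = fit_ridge(X, y, alpha=0.0)
w_regularized = fit_ridge(X, y, alpha=10.0)

# Fresh data from the same process, standing in for a test set
X_new = rng.normal(size=(1000, n_features))
y_new = X_new @ true_w + rng.normal(0, 1.0, 1000)

for name, w in [("alpha=0 ", w_unregularized), ("alpha=10", w_regularized)]:
    print(f"{name} |w|={np.linalg.norm(w):.2f} "
          f"train MSE={mse(w, X, y):.2f} new-data MSE={mse(w, X_new, y_new):.2f}")
```

The penalty always shrinks the weights and raises the training error a little. Whether it lowers the error on new data depends on the problem and on alpha, which is usually tuned on a validation set.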

How to fix underfitting:

Use a more complex model
Add more relevant features
Train longer (for neural networks)
Reduce regularization
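As one example of fixing underfitting by adding relevant features, the sketch below fits a straight line to a U-shaped pattern and then adds a squared feature. It assumes scikit-learn is installed; the synthetic data and the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)   # quadratic pattern plus noise

# A straight line cannot represent a U-shaped pattern, so this model underfits
linear = LinearRegression().fit(X, y)

# Same algorithm, but with an added x^2 feature
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

# Scores on the TRAINING data: a low score here is the underfitting signature
r2_linear = linear.score(X, y)
r2_quadratic = quadratic.score(X, y)
print(f"train R^2: linear={r2_linear:.2f} quadratic={r2_quadratic:.2f}")
```

The linear model scores poorly even on the data it was trained on, which points to underfitting rather than overfitting. Adding the one feature that matches the pattern removes most of the error.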

Examples

Diagnosing with learning curves

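learning_curve() computes training and validation scores as the training set grows; it does not plot anything itself, so the scores are plotted with matplotlib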

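from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data and model (an assumption, not part of the lesson)
X, y = load_digits(return_X_y=True)
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5))
plt.plot(sizes, train_scores.mean(axis=1), label="train")  # mean over the 5 CV folds
plt.plot(sizes, val_scores.mean(axis=1), label="validation")
plt.legend(); plt.show()

# Overfitting: train score much higher than val score
# Underfitting: both scores are low

# Rule of thumb (a heuristic; sensible thresholds depend on the problem):
# train_score - val_score > 0.1 → investigate overfitting
# val_score < 0.7 for a "should be easy" problem → underfitting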
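The rule of thumb can be wrapped in a small helper. This is a sketch only: the function name diagnose and its parameters are hypothetical, and the default thresholds are the heuristic values quoted above, not universal constants.

```python
def diagnose(train_score, val_score, gap_threshold=0.1, low_threshold=0.7):
    """Apply the rule of thumb to one pair of scores (both between 0 and 1)."""
    # A large train/validation gap is checked first: it signals overfitting
    if train_score - val_score > gap_threshold:
        return "investigate overfitting"
    # A small gap but a low validation score means both scores are low
    if val_score < low_threshold:
        return "investigate underfitting"
    return "no obvious problem"

print(diagnose(0.99, 0.80))
print(diagnose(0.62, 0.60))
print(diagnose(0.91, 0.88))
```

With learning_curve() output, the inputs would typically be the mean train and validation scores at the largest training size.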
