Overfitting vs Underfitting
The two ways a model can fail — and how to diagnose each
Explanation
Overfitting: the model learns the training data too well, including noise, and fails on new data.
- Train accuracy: high
- Test accuracy: much lower
- Cause: model too complex, too little data, too many features
Underfitting: the model is too simple to capture the underlying pattern.
- Train accuracy: low
- Test accuracy: also low
- Cause: model too simple, not enough features, insufficient training
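Both failure modes can be reproduced on a toy problem. The sketch below assumes scikit-learn is installed; the dataset (make_moons), the decision tree, and the two depth settings are illustrative choices rather than part of the lesson. A depth-1 tree is too simple for the curved class boundary, while an unrestricted tree can memorize the noisy training points.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy two-class toy data (an assumed example dataset).
X, y = make_moons(n_samples=1000, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def train_and_score(max_depth):
    """Fit a tree of the given depth and return (train accuracy, test accuracy)."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)
    return tree.score(X_train, y_train), tree.score(X_test, y_test)


underfit_train, underfit_test = train_and_score(max_depth=1)   # too simple
overfit_train, overfit_test = train_and_score(max_depth=None)  # unrestricted

print(f"depth=1:    train={underfit_train:.2f}  test={underfit_test:.2f}")
print(f"depth=None: train={overfit_train:.2f}  test={overfit_test:.2f}")
```

The unrestricted tree should show the overfitting signature (near-perfect training accuracy with a visible drop on the test split), while the depth-1 tree scores modestly on both.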
The bias-variance tradeoff:
- High bias = underfitting (model makes strong assumptions, too simple)
- High variance = overfitting (model is too sensitive to training data)
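The tradeoff is often written as a decomposition of the expected squared error. This is the standard textbook form for regression with squared loss, given here as background rather than taken from the lesson. It assumes $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, where $f$ is the true function and $\hat{f}$ is the model fitted on a randomly drawn training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2
```

Making a model more flexible usually lowers the bias term and raises the variance term, which is why neither extreme minimizes the total error.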
How to fix overfitting:
- Get more training data
- Reduce model complexity
- Regularization (adds penalty for complexity)
- Dropout (neural networks)
- Feature selection (remove irrelevant features)
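As one illustration of the regularization item, the sketch below compares an unpenalized linear model with a ridge (L2-penalized) model when there are almost as many features as training samples. Everything here is an assumption made for the example: the synthetic data, the choice of 35 features with only 3 informative ones, and alpha=10.0, which in practice would be tuned by cross-validation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n_train, n_test, n_features = 40, 1000, 35

# Only the first three features carry signal; the other 32 are pure noise.
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]


def make_data(n_samples):
    """Draw random features and a noisy linear target."""
    X = rng.normal(size=(n_samples, n_features))
    y = X @ true_coef + rng.normal(scale=2.0, size=n_samples)
    return X, y


X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)


def r2_gap(model):
    """Fit the model and return (train R^2, test R^2, train minus test)."""
    model.fit(X_train, y_train)
    train_r2 = model.score(X_train, y_train)
    test_r2 = model.score(X_test, y_test)
    return train_r2, test_r2, train_r2 - test_r2


plain = r2_gap(LinearRegression())  # no penalty
ridge = r2_gap(Ridge(alpha=10.0))   # L2 penalty on the coefficients

print("unregularized: train=%.2f test=%.2f gap=%.2f" % plain)
print("ridge:         train=%.2f test=%.2f gap=%.2f" % ridge)
```

The penalty costs a little training fit, and in exchange the gap between training and test scores should shrink. Dropping the 32 noise columns (feature selection) or drawing more training rows would attack the same gap from a different side.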
How to fix underfitting:
- Use a more complex model
- Add more relevant features
- Train longer (for neural networks)
- Reduce regularization
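A minimal sketch of the first two fixes, assuming scikit-learn and a synthetic regression problem invented for this example (so the scores are R^2 rather than accuracy). A straight line underfits a quadratic relationship; adding a squared feature gives the same linear model enough capacity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic data with a curved (quadratic) relationship plus a little noise.
x = rng.uniform(-3, 3, size=400)
y = x**2 + rng.normal(scale=0.5, size=400)
X = x.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A straight line cannot represent the curve: low R^2 on train and test.
linear = LinearRegression().fit(X_train, y_train)

# Adding a squared feature lets the linear model follow the curve.
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
quadratic.fit(X_train, y_train)

linear_scores = (linear.score(X_train, y_train), linear.score(X_test, y_test))
quadratic_scores = (
    quadratic.score(X_train, y_train),
    quadratic.score(X_test, y_test),
)
print("linear:    train R^2=%.2f  test R^2=%.2f" % linear_scores)
print("quadratic: train R^2=%.2f  test R^2=%.2f" % quadratic_scores)
```

Note the underfitting signature in the linear model: its training score is as poor as its test score, so more data alone would not help.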
Examples
Diagnosing with learning curves
learning_curve() computes training and validation scores as the training set size grows; plotting those scores shows which failure mode you have
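from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
# Illustrative data and model (assumed here, swap in your own)
X, y = load_digits(return_X_y=True)
sizes, train_scores, val_scores = learning_curve(DecisionTreeClassifier(random_state=0), X, y, cv=5)
# Average the per-fold scores at each training set size
plt.plot(sizes, train_scores.mean(axis=1), label="train")
plt.plot(sizes, val_scores.mean(axis=1), label="validation")
plt.legend()
plt.show()
# Overfitting: train score much higher than val score
# Underfitting: both scores are low
# Rule of thumb:
# train_score - val_score > 0.1 → investigate overfitting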
# val_score < 0.7 for a "should be easy" problem → underfitting
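The rule of thumb can be wrapped in a small helper. This is a hypothetical function written for illustration, not a scikit-learn API; the default thresholds are the 0.1 and 0.7 values from the comments above, and both should be adjusted to the problem at hand.

```python
def diagnose(train_score, val_score, max_gap=0.1, min_val_score=0.7):
    """Apply the two rule-of-thumb checks to a pair of scores in [0, 1]."""
    if train_score - val_score > max_gap:
        return "possible overfitting"
    if val_score < min_val_score:
        return "possible underfitting"
    return "no obvious problem"


print(diagnose(0.99, 0.80))  # large train/validation gap
print(diagnose(0.62, 0.60))  # both scores low
print(diagnose(0.90, 0.88))  # small gap, acceptable validation score
```

The gap is checked first, so a model with a high training score and a poor validation score is reported as overfitting rather than underfitting.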