AtomLearn

Overfitting vs Underfitting

The two ways a model can fail — and how to diagnose each

Explanation

Overfitting: the model learns the training data too well, noise included, so it fails on new data.

  • Train accuracy: high
  • Test accuracy: much lower
  • Causes: model too complex, too little data, too many features

Underfitting: the model is too simple to capture the underlying pattern.

  • Train accuracy: low
  • Test accuracy: also low
  • Causes: model too simple, not enough features, insufficient training
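
To make the two accuracy patterns concrete, here is a minimal sketch. The dataset (scikit-learn's bundled digits) and the choice of decision trees are assumptions made for illustration, not part of the lesson. A depth-1 tree stands in for a model that is too simple, and an unlimited-depth tree for one that is too complex.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed example data: scikit-learn's bundled handwritten digits (10 classes)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# max_depth=1 is deliberately too simple; max_depth=None lets the tree
# keep splitting until it fits the training set almost perfectly.
models = {
    "too simple (depth 1)": DecisionTreeClassifier(max_depth=1, random_state=0),
    "too complex (unlimited depth)": DecisionTreeClassifier(max_depth=None, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train={scores[name][0]:.2f}, test={scores[name][1]:.2f}")
```

The shallow tree should score low on both splits (underfitting), while the deep tree should score near-perfectly on training data and noticeably lower on test data (overfitting).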

The bias-variance tradeoff:

  • High bias = underfitting (model makes strong assumptions, too simple)
  • High variance = overfitting (model is too sensitive to training data)
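
One way to see bias and variance as numbers is to refit the same model on many noisy training sets and look at how the predictions behave. The sketch below assumes a sine-shaped ground truth, Gaussian noise, and polynomial models of degree 1 and 9. All of these are illustrative choices, not part of the lesson.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    # Assumed ground-truth pattern for this illustration
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0, 1, 30)
x_test = np.linspace(0.05, 0.95, 50)
noise_std = 0.3
n_repeats = 200

results = {}
for degree in (1, 9):
    preds = np.empty((n_repeats, x_test.size))
    for i in range(n_repeats):
        # Fresh noisy training set drawn from the same underlying pattern
        y_train = true_fn(x_train) + rng.normal(0, noise_std, x_train.size)
        preds[i] = Polynomial.fit(x_train, y_train, degree)(x_test)
    # Bias^2: how far the average prediction is from the truth
    bias_sq = np.mean((preds.mean(axis=0) - true_fn(x_test)) ** 2)
    # Variance: how much the predictions move between training sets
    variance = np.mean(preds.var(axis=0))
    results[degree] = (bias_sq, variance)
    print(f"degree {degree}: bias^2={bias_sq:.3f}, variance={variance:.3f}")
```

The straight line (degree 1) should show the larger bias, and the degree-9 polynomial the larger variance, matching the two bullets above.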

How to fix overfitting:

  • Get more training data
  • Reduce model complexity
  • Add regularization (a penalty on model complexity)
  • Use dropout (for neural networks)
  • Select features (remove irrelevant ones)
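
As a sketch of one of these fixes, the example below compares plain least squares with ridge regression (L2 regularization) on a synthetic problem that has many irrelevant features and few training rows. The data, the feature counts, and the hand-picked alpha value are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Assumed synthetic setup: 40 features, only the first one carries signal,
# and just 60 training rows, so plain least squares can fit the noise.
n_train, n_test, n_features = 60, 2000, 40
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = X_train[:, 0] + rng.normal(size=n_train)
y_test = X_test[:, 0] + rng.normal(size=n_test)

models = {
    "no regularization": LinearRegression(),
    # alpha is picked by hand for this sketch; in practice it would be tuned
    "ridge (alpha=40)": Ridge(alpha=40.0),
}

r2 = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r2[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train R^2={r2[name][0]:.2f}, test R^2={r2[name][1]:.2f}")
```

The penalty lowers the training score somewhat, but it should narrow the train/test gap and raise the test score, which is the trade that matters.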

How to fix underfitting:

  • Use a more complex model
  • Add more relevant features
  • Train longer (for neural networks)
  • Reduce regularization
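
A minimal sketch of the first two fixes, assuming synthetic data with a quadratic pattern: a straight-line model underfits, and adding a squared feature gives the same linear model enough capacity. The data and model choices are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Assumed synthetic data: a quadratic pattern plus a little noise
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A straight line cannot represent the curve, so it underfits
linear = LinearRegression().fit(X_train, y_train)

# Adding a squared feature gives the same linear model enough capacity
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
quadratic.fit(X_train, y_train)

for name, model in [("linear", linear), ("quadratic", quadratic)]:
    print(f"{name}: train R^2={model.score(X_train, y_train):.2f}, "
          f"test R^2={model.score(X_test, y_test):.2f}")
```

Both scores should be low for the linear model and high for the quadratic one. Train and test rise together, which is the signature of fixing underfitting rather than overfitting.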

Examples

Diagnosing with learning curves

learning_curve() computes training and validation scores as the training set grows; plotting the two curves shows which failure mode you are in.

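from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import numpy as np

X, y = load_digits(return_X_y=True)  # assumed example dataset and model
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)
plt.plot(sizes, train_scores.mean(axis=1), label="train")  # mean over CV folds
plt.plot(sizes, val_scores.mean(axis=1), label="validation")
plt.legend()
plt.show()

# Overfitting: train score much higher than val score
# Underfitting: both scores are low
# Rule of thumb:
# train_score - val_score > 0.1 → investigate overfitting
# val_score < 0.7 for a "should be easy" problem → underfitting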
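
The rule of thumb in the comments above can be wrapped in a small helper. This is only a sketch: the function name and return format are hypothetical, and the default thresholds are simply the 0.1 and 0.7 values quoted in the rule of thumb.

```python
def diagnose(train_score, val_score, gap_threshold=0.1, low_score_threshold=0.7):
    """Apply the rule-of-thumb thresholds to a pair of accuracy scores.

    The function name and return format are hypothetical; the default
    thresholds are the ones quoted in the rule of thumb.
    """
    findings = []
    if train_score - val_score > gap_threshold:
        findings.append("investigate overfitting")
    if val_score < low_score_threshold:
        findings.append("possible underfitting")
    return findings or ["no obvious problem"]

print(diagnose(0.99, 0.80))
print(diagnose(0.62, 0.60))
print(diagnose(0.92, 0.90))
```

The 0.7 floor only makes sense for a problem that should be easy. For harder tasks, pass a lower low_score_threshold.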
