# Machine Learning Diagnosis

## Diagnosing Bias vs. Variance

High bias means underfitting; high variance means overfitting.

Mnemonic: B => U (bias => underfit), V => O (variance => overfit)
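This rule of thumb can be turned into a tiny sketch. The thresholds below (`target_error`, `gap`) are illustrative assumptions, not values from the notes:

```python
def diagnose(train_error, cv_error, target_error=0.05, gap=0.05):
    # Heuristic: the thresholds are made-up illustrative values.
    if train_error > target_error:
        return "high bias (underfit)"      # training error itself is high
    if cv_error - train_error > gap:
        return "high variance (overfit)"   # large train/CV gap
    return "looks fine"

print(diagnose(0.20, 0.21))  # both errors high -> high bias (underfit)
print(diagnose(0.01, 0.15))  # large gap      -> high variance (overfit)
```

The key signal is not the CV error alone but the pair: high training error points to bias, a large gap between the two points to variance.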



## Regularization and Bias/Variance

Low lambda => overfit; high lambda => underfit.
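To see the shrinkage effect concretely, here is a minimal sketch (not from the notes) of one-feature ridge regression with no intercept, where the closed-form weight is w = Σxy / (Σx² + λ). With λ = 0 the model fits the data exactly; a large λ shrinks w toward zero, i.e. underfits:

```python
def ridge_fit_1d(xs, ys, lam):
    # Closed-form ridge solution for y ≈ w * x (no intercept term):
    #   w = sum(x*y) / (sum(x^2) + lambda)
    num = sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs) + lam
    return num / den

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # exact relationship y = 2x
print(ridge_fit_1d(xs, ys, 0.0))    # -> 2.0  (no regularization: exact fit)
print(ridge_fit_1d(xs, ys, 14.0))   # -> 1.0  (high lambda shrinks w: underfit)
```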



## Learning Curves

Condition 1: high bias. More training data is not likely to help.

Condition 2: high variance. More training data is likely to help.
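The shape of a learning curve can be sketched numerically. This toy example (all numbers made up, and the "model" is just a constant mean predictor for simplicity) shows the typical pattern: training error rises with the number of training examples while CV error falls:

```python
def mse(pred, ys):
    # Mean squared error of a single constant prediction.
    return sum((pred - y) ** 2 for y in ys) / len(ys)

def learning_curve(train_y, cv_y):
    # Train a constant (mean) predictor on the first i training labels
    # and record (i, training error, CV error) for each i.
    curve = []
    for i in range(1, len(train_y) + 1):
        subset = train_y[:i]
        pred = sum(subset) / i
        curve.append((i, mse(pred, subset), mse(pred, cv_y)))
    return curve

for i, tr, cv in learning_curve([1.0, 2.0, 3.0, 4.0], [2.0, 3.0]):
    print(i, round(tr, 3), round(cv, 3))
# training error grows with i; CV error shrinks toward it
```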

## Deciding What to Do Next Revisited

Solutions for high bias and high variance:


| Action | Result | Reason |
| --- | --- | --- |
| Get more training examples | fixes high variance | more data makes overfitting harder |
| Try a smaller set of features | fixes high variance | fewer parameters, less overfitting |
| Try additional features | fixes high bias | more parameters, less underfitting |
| Add polynomial features | fixes high bias | more parameters, less underfitting |
| Decrease lambda | fixes high bias | low lambda acts like more parameters |
| Increase lambda | fixes high variance | high lambda acts like fewer parameters |

## Improved Model Selection

Given a data set, instead split it into three pieces:

  1. Training set (60%) - m examples
  2. Cross validation (CV) set (20%) - m_cv examples
  3. Test set (20%) - m_test examples
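The three-way split above can be sketched in a few lines. Shuffling and the fixed seed are assumptions added for reproducibility; the 60/20/20 proportions follow the notes:

```python
import random

def split_dataset(data, seed=0):
    # 60% train / 20% CV / 20% test, after a seeded shuffle.
    data = list(data)
    random.Random(seed).shuffle(data)
    n = len(data)
    m = int(0.6 * n)        # training examples
    m_cv = int(0.2 * n)     # cross-validation examples
    return data[:m], data[m:m + m_cv], data[m + m_cv:]

train, cv, test = split_dataset(range(10))
print(len(train), len(cv), len(test))  # -> 6 2 2
```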

As before, we can calculate:

  • Training error
  • Cross validation error
  • Test error

Using the CV set to select the degree of polynomial d or the regularization parameter lambda:

  • The degree of the model increases as you move towards overfitting
  • Let's define the training and cross validation errors as before
  • Now plot:
    • x = degree of polynomial d
    • y = error for both training and cross validation (two lines)
      • The CV error and test set error will be very similar
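The selection step itself is just a minimum over CV errors. A sketch with made-up numbers: training error keeps falling as d grows, while CV error is U-shaped and bottoms out at d = 3:

```python
def pick_degree(errors):
    # errors: list of (d, train_error, cv_error) tuples.
    # Pick the degree with the lowest CV error -- minimizing the
    # *training* error would always favor the largest d.
    return min(errors, key=lambda t: t[2])[0]

# Illustrative (made-up) numbers for the two curves described above.
errors = [(1, 0.30, 0.32), (2, 0.15, 0.18), (3, 0.08, 0.10),
          (4, 0.05, 0.14), (5, 0.02, 0.25)]
print(pick_degree(errors))  # -> 3
```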


Author: Boqiang Hu. Comments and discussion are welcome.
Please credit the original source when reposting: Machine Learning Diagnose