# Machine Learning Diagnostics
## Diagnosing Bias vs. Variance

High bias means underfitting; high variance means overfitting.
Mnemonic: B => U (bias => underfit), V => O (variance => overfit).
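The rule above can be sketched as a tiny diagnostic helper. This is an illustrative sketch, not a standard function: the `target_err` threshold and the function name are hypothetical, and real diagnostics would look at the whole learning curve rather than two numbers.

```python
def diagnose(train_err, cv_err, target_err=0.1):
    """Crude bias/variance diagnostic (illustrative thresholds only).

    High bias (underfit): training error is high and the CV error is
    close to it. High variance (overfit): training error is low but the
    CV error is much higher.
    """
    if train_err > target_err and cv_err - train_err < target_err:
        return "high bias (underfit)"
    if train_err <= target_err and cv_err - train_err >= target_err:
        return "high variance (overfit)"
    return "ok"
```

For example, a model with training error 0.30 and CV error 0.32 is underfitting, while one with training error 0.02 and CV error 0.25 is overfitting.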
## Regularization and Bias/Variance

Low lambda => overfit; high lambda => underfit.
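This effect of lambda can be demonstrated with ridge-style regularized least squares on synthetic data (everything here is an illustrative setup I am assuming: a sine target, degree-8 polynomial features, and the closed-form solution `(X'X + lam*I)^-1 X'y`):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a smooth target plus noise, with enough polynomial
# features (degree 8) that an unregularized fit can overfit.
x = rng.uniform(-1, 1, 30)
y = np.sin(2 * x) + rng.normal(0, 0.1, 30)
x_cv = rng.uniform(-1, 1, 30)
y_cv = np.sin(2 * x_cv) + rng.normal(0, 0.1, 30)
X, X_cv = np.vander(x, 9), np.vander(x_cv, 9)

def ridge_fit(X, y, lam):
    # Closed-form regularized least squares: (X'X + lam*I) theta = X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

errs = {}
for lam in (0.0, 1e-3, 1e3):
    theta = ridge_fit(X, y, lam)
    errs[lam] = (np.mean((X @ theta - y) ** 2),        # training error
                 np.mean((X_cv @ theta - y_cv) ** 2))  # CV error
    print(f"lambda={lam:g}  train={errs[lam][0]:.3f}  cv={errs[lam][1]:.3f}")
```

With lambda near zero the training error is tiny (overfit regime); with a huge lambda the weights are squashed toward zero and even the training error becomes large (underfit regime).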
Condition 1: With high bias, more training data is not likely to help.
Condition 2: With high variance, more training data is likely to help.
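Condition 2 can be checked empirically with a learning-curve style experiment: for a high-variance (flexible) model, the gap between CV error and training error shrinks as the training set grows. The data generator and the degree-8 model below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(m):
    # Hypothetical data source: smooth signal plus noise
    x = rng.uniform(-1, 1, m)
    return x, np.sin(2 * x) + rng.normal(0, 0.1, m)

x_cv, y_cv = make_data(200)  # fixed cross-validation set

def avg_gap(m, trials=20, degree=8):
    """Average (CV error - training error) for a flexible degree-8
    polynomial trained on m examples; a large gap signals high variance."""
    gaps = []
    for _ in range(trials):
        x, y = make_data(m)
        coef = np.polyfit(x, y, degree)
        train = np.mean((np.polyval(coef, x) - y) ** 2)
        cv = np.mean((np.polyval(coef, x_cv) - y_cv) ** 2)
        gaps.append(cv - train)
    return float(np.mean(gaps))

print(avg_gap(15), avg_gap(200))  # the gap shrinks as m grows
```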
## Deciding What to Do Next (Revisited)

Solutions for high bias / high variance:
| Action | Fixes | Why |
|---|---|---|
| Get more training examples | high variance | more data makes overfitting harder |
| Try a smaller set of features | high variance | fewer parameters, less overfitting |
| Try additional features | high bias | more parameters, less underfitting |
| Try polynomial features | high bias | more parameters, less underfitting |
| Decrease lambda | high bias | weaker regularization frees up the parameters |
| Increase lambda | high variance | stronger regularization constrains the parameters |
## Improved Model Selection

Given a dataset, split it into three pieces instead of two:
- Training set (60%): m examples
- Cross validation (CV) set (20%): m_cv examples
- Test set (20%): m_test examples
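A minimal sketch of the 60/20/20 split, assuming a numpy-style dataset (the function name and seed handling are my own choices, not from the notes):

```python
import numpy as np

def split_60_20_20(X, y, seed=0):
    """Shuffle and split into train (60%), CV (20%), and test (20%) sets.
    The 60/20/20 ratios come from the notes above."""
    m = len(X)
    idx = np.random.default_rng(seed).permutation(m)
    m_train = int(0.6 * m)
    m_cv = int(0.2 * m)
    train = idx[:m_train]
    cv = idx[m_train:m_train + m_cv]
    test = idx[m_train + m_cv:]
    return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])
```

Shuffling before splitting matters: if the data is sorted (by label, time, etc.), a contiguous split gives unrepresentative CV and test sets.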
As before, we can calculate:
- Training error
- Cross validation error
- Test error
Using the CV set to choose the degree of polynomial d or the regularization parameter lambda:
- As the degree d increases, the model moves toward overfitting
- Define the training and cross validation error as before
- Now plot:
  - x-axis: degree of polynomial d
  - y-axis: error for both training and cross validation (two lines)
- CV error and test set error will be very similar
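The selection procedure above can be sketched end to end: fit one model per degree, then pick the degree with the lowest CV error. The cubic-plus-noise data here is an assumption chosen so that CV error should bottom out around d = 3 while training error keeps falling with d:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data whose true signal is a cubic.
x = rng.uniform(-1, 1, 60)
y = x ** 3 - x + rng.normal(0, 0.05, 60)
x_cv = rng.uniform(-1, 1, 60)
y_cv = x_cv ** 3 - x_cv + rng.normal(0, 0.05, 60)

train_err, cv_err = {}, {}
for d in range(1, 9):
    coef = np.polyfit(x, y, d)                              # fit degree-d model
    train_err[d] = np.mean((np.polyval(coef, x) - y) ** 2)
    cv_err[d] = np.mean((np.polyval(coef, x_cv) - y_cv) ** 2)

best_d = min(cv_err, key=cv_err.get)  # degree with the lowest CV error
print("chosen degree:", best_d)
```

Training error is non-increasing in d (each higher-degree model nests the lower one), which is exactly why the CV set, not the training set, must make the choice.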