Source From Here
Getting Started
Outline & Motivation (link)
Motivation
* ML/AI has become popular in recent years
* Amazing results: ML can analyze and predict disease on par with human experts
* AlphaGo/Deep Reinforcement Learning beat world champion at strategy game Go
* Self-driving cars -> will remove element of human error
* Google announced they are "machine learning first"
* ML is embedded into many different types of products in many industries
* Will open up a wide array of career opportunities
Outline
* Bias-variance trade-off
* Bootstrap
* Bagging (applying bootstrap to ML models)
* Random forest
* AdaBoost
Where to get the Code and Data (link)
* Github Location
All Data is the Same (link)
Plug-and-Play (link)
* A lot of people ask "Why is there so much math in ML?"
* Sorry to burst your bubble: Machine learning IS math
* NOT plug-and-play into Scikit-learn
* At your real job, you will probably plug-and-play all the time
* But to be a good data scientist, this would be in addition to learning how the algorithms work
* To understand why ensembles are good for plug-and-play, you need theory from this course.
Bias-Variance Trade-Off
Bias-Variance Key Terms (link)
Irreducible error
* Data-generating processes are noisy
* Noise is by definition random (not deterministic)
* Can't predict its values, only its statistics (like mean & variance)
Bias
* Bias refers to the delta between your average model and the true f(x)
* Some sources define "bias" as the square of this quantity; we won't: bias = E[f(x) - f_hat(x)]
Variance
* Has nothing to do with accuracy
* Variance just measures how "inconsistent" a predictor is, over different training sets
* Remember: goal is not to achieve lowest possible error
* Goal is to find true f(x)
* Being close to training points is only a proxy solution
Model complexity
* You might assume linear models are not complex because nonlinear models are more "expressive"
* Linear doesn't necessarily mean not complex
* Large-D linear model can be more complex than small-D nonlinear model (see the sketch after this list)
* "Complexity" not a universal measurement
Bias-Variance Trade-Off (link)
* In ML we strive to minimize error
* We've already seen the best we can do is the irreducible error
* We can achieve this when we know the true f(x)
* In this case the reducible part of the error is 0
* Goal is to make bias and variance as small as possible!
* Is it possible to achieve lower bias and lower variance at the same time?
* Trade-off occurs in the context of altering the complexity of the same model
* What if we combine models?
Bias-Variance Decomposition (link)
Expected error = bias^2 + variance + irreducible error
In symbols: E[(y - f_hat(x))^2] = (f(x) - E[f_hat(x)])^2 + E[(f_hat(x) - E[f_hat(x)])^2] + sigma^2, where sigma^2 is the variance of the noise.
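A small numpy simulation can verify this decomposition empirically (an illustrative sketch of my own, not code from the course; the sin(x) target, noise level and straight-line fit are all made-up choices): fit the same model on many independent training sets, estimate bias^2 and variance of the predictions at one test point, and compare with the measured expected squared error.
- import numpy as np
- 
- np.random.seed(0)
- f = lambda x: np.sin(x)              # toy "true" f(x)
- sigma = 0.3                          # noise std, so irreducible error = sigma^2
- x_test = 1.0
- preds = []
- for _ in range(2000):                # 2000 independent training sets
-     x = np.random.uniform(0, 3, 30)
-     y = f(x) + sigma * np.random.randn(30)
-     w = np.polyfit(x, y, 1)          # deliberately too simple: a straight line
-     preds.append(np.polyval(w, x_test))
- preds = np.array(preds)
- bias2 = (f(x_test) - preds.mean()) ** 2
- variance = preds.var()
- y_test = f(x_test) + sigma * np.random.randn(2000)    # fresh noisy targets at x_test
- expected_error = np.mean((y_test - preds) ** 2)
- print(bias2 + variance + sigma ** 2, "~", expected_error)   # the two sides roughly agree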
Polynomial Regression Demo (link)
Here we use simple sample code (bias_variance_demo.py) to show the bias-variance trade-off by fitting linear regression on polynomial features of different degrees. First, a few selected results for different degrees:
Below is the trend of "bias" and "variance" as the degree increases:
(as the degree goes up, the bias goes down and the variance goes up)
Finally, the optimal degree lies at the minimum of the testing-error curve:
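The full bias_variance_demo.py is not reproduced in this post; the following is a minimal sketch of the same experiment (my own code, with a made-up sin(x) target): the training and testing error of polynomial regression as the degree grows.
- import numpy as np
- from sklearn.pipeline import make_pipeline
- from sklearn.preprocessing import PolynomialFeatures
- from sklearn.linear_model import LinearRegression
- from sklearn.metrics import mean_squared_error
- 
- np.random.seed(1)
- X = np.random.uniform(0, 3, (100, 1))
- y = np.sin(X).ravel() + 0.3 * np.random.randn(100)
- Xtr, ytr, Xte, yte = X[:70], y[:70], X[70:], y[70:]
- for degree in (1, 3, 9, 15):
-     model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
-     model.fit(Xtr, ytr)
-     print(degree,
-           mean_squared_error(ytr, model.predict(Xtr)),    # training error keeps falling
-           mean_squared_error(yte, model.predict(Xte)))    # test error falls, then rises (overfitting)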
K-Nearest Neighbor and Decision Tree Demo (link)
This part uses sample code (knn_dt_demo.py) to demonstrate what "low bias & high variance" and "high bias & low variance" look like for decision trees and K-nearest neighbors. First, let's look at the regression task:
Then the classification task:
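knn_dt_demo.py is likewise not shown here; below is a minimal sketch of the same idea (my own code, regression version only; the classification version just swaps in KNeighborsClassifier and DecisionTreeClassifier on labeled data). Flexible settings (k=1, unlimited depth) give low bias and high variance, while rigid settings (large k, depth 1) give high bias and low variance.
- import numpy as np
- from sklearn.neighbors import KNeighborsRegressor
- from sklearn.tree import DecisionTreeRegressor
- 
- np.random.seed(2)
- X = np.random.uniform(0, 3, (200, 1))
- y = np.sin(2 * X).ravel() + 0.3 * np.random.randn(200)
- models = {"KNN k=1 (low bias, high variance)":    KNeighborsRegressor(n_neighbors=1),
-           "KNN k=50 (high bias, low variance)":   KNeighborsRegressor(n_neighbors=50),
-           "Deep tree (low bias, high variance)":  DecisionTreeRegressor(max_depth=None),
-           "Stump d=1 (high bias, low variance)":  DecisionTreeRegressor(max_depth=1)}
- for name, model in models.items():
-     model.fit(X, y)
-     print(name, "train R^2:", model.score(X, y))   # flexible models fit the training set almost perfectly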
Cross-Validation as a Method for Optimizing Model Complexity (link)
Cross-Validation
* Cross-validation can help us to optimize the bias-variance trade-off
* We've already looked at cross-validation as a way of choosing hyperparameters.
* Motivation: we didn't just want good training error, we wanted good generalization error too.
* In the polynomial regression example, we saw that the test error coincides with the sum of bias^2 and variance.
* So by optimizing test error, we optimize bias-variance as well
Sample code for K-fold cross-validation (assumes X, Y, an estimator `model` and the fold count K are already defined):
- import numpy as np
- 
- scores = []
- N = len(X)          # number of records in the dataset
- sz = N // K         # fold size; integer division so the slice indices stay integers
- for i in range(K):
-     Xvalid, Yvalid = X[i*sz:(i+1)*sz], Y[i*sz:(i+1)*sz]
-     Xtrain = np.concatenate((X[0:i*sz], X[(i+1)*sz:N]), axis=0)
-     Ytrain = np.concatenate((Y[0:i*sz], Y[(i+1)*sz:N]), axis=0)
-     model.fit(Xtrain, Ytrain)
-     scores.append(model.score(Xvalid, Yvalid))
For Scikit-learn (obtaining predictions by cross-validation; here clf is filled in with LogisticRegression only as an example estimator):
- from sklearn import datasets, linear_model, metrics
- from sklearn.model_selection import cross_val_predict
- iris = datasets.load_iris()
- clf = linear_model.LogisticRegression(max_iter=1000)   # any scikit-learn classifier works here
- predicted = cross_val_predict(clf, iris.data, iris.target, cv=10)
- metrics.accuracy_score(iris.target, predicted)
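To tie cross-validation back to optimizing model complexity, here is a short sketch (not from the original post; the sin(x) data is a made-up toy set) that uses cross_val_score to pick the polynomial degree with the best bias-variance trade-off:
- import numpy as np
- from sklearn.model_selection import cross_val_score
- from sklearn.pipeline import make_pipeline
- from sklearn.preprocessing import PolynomialFeatures
- from sklearn.linear_model import LinearRegression
- 
- np.random.seed(3)
- X = np.random.uniform(0, 3, (100, 1))
- y = np.sin(X).ravel() + 0.3 * np.random.randn(100)
- for degree in range(1, 10):
-     model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
-     print(degree, cross_val_score(model, X, y, cv=5).mean())   # pick the degree with the highest mean score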
Supplement
* ML In Action - Improving classification with the AdaBoost meta-algorithm
* Bias, Variance, and Overfitting – Machine Learning Overview part 4 of 4
* Scikit-learn - Selecting the best model in scikit-learn using cross-validation
* Intro2ML - Ch6. Model Evaluation and Improvement - Cross Validation