Tuesday, August 1, 2017

[Udemy] Ensemble Machine Learning in Python: Random Forest, AdaBoost - Part1

Source From Here 
Getting Started 

Outline & Motivation (link)
Motivation 
* ML/AI has become popular in recent years
* Amazing results: ML can analyze and predict disease on par with human experts
* AlphaGo/Deep Reinforcement Learning beat the world champion at the strategy game Go
* Self-driving cars -> will remove the element of human error
* Google announced they are "machine learning first"
* ML is embedded into many different types of products in many industries
* Will open up a wide array of career opportunities

Outline 
* Bias-variance trade-off
* Bootstrap
* Bagging (applying bootstrap to ML models)
* Random forest
* AdaBoost


Where to get the Code and Data (link)
GitHub Location 

All Data is the Same (link)

Plug-and-Play (link)
* A lot of people ask "Why is there so much math in ML?"
* Sorry to burst your bubble: Machine learning IS math
* NOT plug-and-play into Scikit-learn
* At your real job, you will probably plug-and-play all the time
* But to be a good data scientist, this would be in addition to learning how the algorithms work
* To understand why ensembles are good for plug-and-play, you need theory from this course.


Bias-Variance Trade-Off 

Bias-Variance Key Terms (link)

Irreducible error 
* Data-generating processes are noisy
* Noise is by definition random (not deterministic)
* Can't predict its values, only its statistics (like mean & variance)

Bias 
* Bias refers to the difference between your average model and the true f(x)
* Some sources call the square of this quantity the bias; we won't: bias = E[f(x) - f_hat(x)]


Variance 
* Has nothing to do with accuracy
* Variance just measures how "inconsistent" a predictor is, over different training sets
* Remember: goal is not to achieve lowest possible error
* Goal is to find true f(x)
* Being close to training points is only a proxy solution
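
The definitions of bias and variance above can be checked numerically. The sketch below is only an illustration under assumed settings: the true function sin(x), the noise level, the sample sizes and the straight-line model are all choices made here, not taken from the course. It retrains the same model on many fresh training sets, then measures bias as f(x) minus the average prediction and variance as the spread of the predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Assumed ground-truth function, for illustration only.
    return np.sin(x)

x_query = np.linspace(0, 2 * np.pi, 50)    # points where the models are evaluated
n_sets, n_points, noise_std = 200, 30, 0.3

preds = np.empty((n_sets, x_query.size))
for s in range(n_sets):
    # Draw a fresh noisy training set from the same data-generating process.
    x_train = rng.uniform(0, 2 * np.pi, n_points)
    y_train = f(x_train) + rng.normal(0, noise_std, n_points)
    # Fit a straight line (degree-1 polynomial) by least squares.
    coefs = np.polyfit(x_train, y_train, deg=1)
    preds[s] = np.polyval(coefs, x_query)

avg_model = preds.mean(axis=0)             # the "average model", E[f_hat(x)]
bias = f(x_query) - avg_model              # bias = E[f(x) - f_hat(x)]
variance = preds.var(axis=0)               # inconsistency across training sets

print("mean bias^2  :", np.mean(bias ** 2))
print("mean variance:", np.mean(variance))
```

A straight line cannot follow a sine wave, so in this setup the squared bias should dominate the variance; a more flexible model would shift the balance the other way.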

Model complexity 
* You might assume linear models are not complex because nonlinear models are more "expressive"
* Linear doesn't necessarily mean not complex
* A large-D linear model can be more complex than a small-D nonlinear model
* "Complexity" is not a universal measurement


Bias-Variance Trade-Off (link)
* In ML we strive to minimize error
* We've already seen the best we can do is the irreducible error
* We can achieve this when we know the true f(x)
* In this case the reducible part of the errors is 0
* Goal is to make bias and variance as small as possible!


* Is it possible to achieve lower bias and lower variance at the same time?
* Trade-off occurs in the context of altering the complexity of the same model
What if we combine models?
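
One hedged way to see why combining models is attractive: averaging several models leaves the average prediction (and so the bias) unchanged but shrinks the variance. The sketch below is an idealization, not the course's code. It assumes each model in the ensemble gets its own fresh training sample of a toy sin(x) problem, which real data rarely allows; the bootstrap and bagging topics in the outline are about approximating this with a single dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
x_query = np.linspace(1.0, 2 * np.pi - 1.0, 50)   # interior points only

def fit_predict(deg=5, n_points=40, noise_std=0.3):
    """Train one polynomial on a fresh noisy sample of sin(x) and predict."""
    x = rng.uniform(0, 2 * np.pi, n_points)
    y = np.sin(x) + rng.normal(0, noise_std, n_points)
    return np.polyval(np.polyfit(x, y, deg), x_query)

n_trials, n_models = 100, 10

# Predictions of a single model, repeated over many training sets.
single = np.array([fit_predict() for _ in range(n_trials)])

# Each ensemble prediction is the plain average of n_models independent fits.
ensemble = np.array([
    np.mean([fit_predict() for _ in range(n_models)], axis=0)
    for _ in range(n_trials)
])

print("single-model variance:", single.var(axis=0).mean())
print("ensemble variance    :", ensemble.var(axis=0).mean())
```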


Bias-Variance Decomposition (link)
Expected error = bias^2 + variance + irreducible error
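
Written out with the bias convention used in these notes, the decomposition reads as follows. This assumes the usual setup, which the notes do not spell out: y = f(x) + ε with zero-mean noise ε of variance σ² that is independent of the training set, and expectations taken over both training sets and noise.

```latex
\begin{aligned}
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  &= \underbrace{\big(\mathbb{E}[f(x) - \hat{f}(x)]\big)^2}_{\text{bias}^2}
   + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
   + \underbrace{\sigma^2}_{\text{irreducible error}}
\end{aligned}
```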


Polynomial Regression Demo (link)
This demo uses simple sample code (bias_variance_demo.py) to show the bias-variance trade-off by fitting linear regression on polynomial features of different degrees. First, a selection of results for a few degrees: 


Below is the trend of bias and variance as the degree increases: 
 (degree up -> bias down, variance up)

Finally, the optimal degree sits at the minimum of the test-error curve: 
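
The train/test behaviour described here can be approximated with a short NumPy sketch. It is not the course's bias_variance_demo.py; the toy target sin(x), the noise level, the sample sizes and the degree range are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n, noise_std=0.3):
    # Assumed toy problem: noisy samples of sin(x) on [0, 2*pi].
    x = rng.uniform(0, 2 * np.pi, n)
    return x, np.sin(x) + rng.normal(0, noise_std, n)

x_train, y_train = make_data(30)
x_test, y_test = make_data(500)

degrees = range(1, 11)
train_mse, test_mse = [], []
for deg in degrees:
    # Least-squares polynomial fit on the training set only.
    coefs = np.polyfit(x_train, y_train, deg)
    train_mse.append(np.mean((np.polyval(coefs, x_train) - y_train) ** 2))
    test_mse.append(np.mean((np.polyval(coefs, x_test) - y_test) ** 2))

# The "optimal" degree is the one at the bottom of the test-error curve.
best_degree = degrees[int(np.argmin(test_mse))]

for deg, tr, te in zip(degrees, train_mse, test_mse):
    print(f"degree {deg:2d}  train MSE {tr:.3f}  test MSE {te:.3f}")
print("degree with lowest test MSE:", best_degree)
```

Training error can only go down as the degree grows, so it cannot be used to pick the degree; the test error is what turns back up once variance takes over.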

K-Nearest Neighbor and Decision Tree Demo (link)
This part uses sample code (knn_dt_demo.py) to show what "low bias & high variance" and "high bias & low variance" look like for decision trees and K-Nearest Neighbor. First, the regression task: 


Next, the classification task: 
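
The classification contrast can be approximated with scikit-learn as sketched below. This is not the course's knn_dt_demo.py: the synthetic circle dataset, the 15% label noise and the specific hyperparameter values are assumptions chosen to make the two regimes visible.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)

def make_data(n):
    # Assumed toy problem: label 1 inside a circle around the origin,
    # with 15% of the labels flipped at random (the irreducible part).
    X = rng.uniform(-2, 2, size=(n, 2))
    y = (np.sum(X ** 2, axis=1) < 2.0).astype(int)
    flip = rng.random(n) < 0.15
    return X, np.where(flip, 1 - y, y)

X_train, y_train = make_data(300)
X_test, y_test = make_data(2000)

models = {
    "KNN, k=1 (flexible)": KNeighborsClassifier(n_neighbors=1),
    "KNN, k=101 (rigid)": KNeighborsClassifier(n_neighbors=101),
    "tree, unlimited depth (flexible)": DecisionTreeClassifier(random_state=0),
    "tree, max_depth=1 (rigid)": DecisionTreeClassifier(max_depth=1, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for each model.
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name:34s} train acc {scores[name][0]:.2f}  test acc {scores[name][1]:.2f}")
```

The flexible models memorize the training set, noise included, which is the low-bias, high-variance end; the rigid ones cannot even fit the training set, which is the high-bias, low-variance end.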

Cross-Validation as a Method for Optimizing Model Complexity (link)
Cross-Validation 
* Cross-validation can help us optimize the bias-variance trade-off
* We've already looked at cross-validation as a way of choosing hyperparameters.
* Motivation: we didn't just want good training error, we wanted good generalization error too.
* In the polynomial regression example, we saw that the test error tracks the sum of bias^2 + variance (plus the irreducible error).
* So by optimizing test error, we optimize the bias-variance trade-off as well

Sample code for K-fold cross-validation: 
  import numpy as np

  scores = []
  sz = N // K  # fold size; N is the number of records (integer division so slicing works)
  for i in range(K):
      # Fold i is held out for validation; all other rows are used for training.
      Xvalid, Yvalid = X[i*sz:(i+1)*sz], Y[i*sz:(i+1)*sz]
      Xtrain = np.concatenate((X[:i*sz], X[(i+1)*sz:N]), axis=0)
      Ytrain = np.concatenate((Y[:i*sz], Y[(i+1)*sz:N]), axis=0)
      model.fit(Xtrain, Ytrain)
      scores.append(model.score(Xvalid, Yvalid))
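For Scikit-learn (obtaining predictions by cross-validation): 
  from sklearn import datasets, metrics, svm
  from sklearn.model_selection import cross_val_predict
  iris = datasets.load_iris()
  clf = svm.SVC(kernel='linear', C=1)  # assumed estimator; the original snippet leaves clf undefined
  predicted = cross_val_predict(clf, iris.data, iris.target, cv=10)
  metrics.accuracy_score(iris.target, predicted)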
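
To tie cross-validation back to the polynomial regression example, the sketch below picks the polynomial degree by K-fold validation error. It is a minimal illustration under assumed settings (toy sin(x) data, K = 5, degrees 1 to 8), and the helper name cv_mse is made up here rather than taken from the course.

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 60, 5
x = rng.uniform(0, 2 * np.pi, N)             # random draws, so no extra shuffle
y = np.sin(x) + rng.normal(0, 0.3, N)        # assumed toy target with noise

folds = np.array_split(np.arange(N), K)      # K index blocks of near-equal size

def cv_mse(deg):
    """Mean validation MSE of a degree-`deg` polynomial over the K folds."""
    errors = []
    for valid_idx in folds:
        # Train on every row that is not in the current validation fold.
        train_idx = np.setdiff1d(np.arange(N), valid_idx)
        coefs = np.polyfit(x[train_idx], y[train_idx], deg)
        resid = np.polyval(coefs, x[valid_idx]) - y[valid_idx]
        errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))

cv_errors = {deg: cv_mse(deg) for deg in range(1, 9)}
best_degree = min(cv_errors, key=cv_errors.get)

print("CV MSE by degree:", {d: round(e, 3) for d, e in cv_errors.items()})
print("degree chosen by cross-validation:", best_degree)
```

With real datasets that are sorted (iris is ordered by class, for example), shuffling the rows before splitting into folds matters; here the inputs are already random.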


Supplement 
ML In Action - Improving classification with the AdaBoost meta-algorithm 
Bias, Variance, and Overfitting – Machine Learning Overview part 4 of 4 
Scikit-learn - Selecting the best model in scikit-learn using cross-validation 
Intro2ML - Ch6. Model Evaluation and Improvement - Cross Validation
