程式扎記

Source From Here
Preface

Agenda
* What is the K-nearest neighbors classification model?
* What are the four steps for model training and prediction in scikit-learn?
* How can I apply this pattern to other machine learning models?

Reviewing the iris dataset
* 150 observations
* 4 features (sepal length, sepal width, petal length, petal width)
* Response variable is the iris species
* Classification problem since response is categorical

How to use scikit-learn to train a model
Loading the Data

>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> type(iris)  # a Bunch, scikit-learn's dictionary-like container
>>> X = iris.data    # Store feature matrix in 'X'
>>> y = iris.target  # Store response vector in 'y'
>>> print(X.shape)
(150, 4)
>>> print(y.shape)
(150,)
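
To connect the loaded object back to the dataset review above, here is a minimal sketch that inspects the feature and species names and checks that the shapes line up. It only uses attributes of the Bunch returned by load_iris; the variable names n_observations and n_features are my own.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Names of the four feature columns in iris.data
print(iris.feature_names)

# Species names; the integers in iris.target index into this array
print(iris.target_names)

# scikit-learn expects a 2-D feature matrix and a 1-D response vector
# whose first dimensions agree (one row per observation)
n_observations, n_features = iris.data.shape
assert iris.target.shape == (n_observations,)
print(n_observations, n_features)
```

The response is stored as integers rather than strings, which is why the species names live in a separate array.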

scikit-learn 4-step modeling pattern
Step1: Import the class you plan to use

>>> from sklearn.neighbors import KNeighborsClassifier

Step2: "Instantiate" the "estimator" (here, KNeighborsClassifier)

# n_neighbors: Number of neighbors to use by default for kneighbors queries.
>>> knn = KNeighborsClassifier(n_neighbors=1)
>>> knn
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=1, p=2,
weights='uniform')
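
The printed representation differs between scikit-learn versions, so a more portable way to see the hyperparameters is get_params(). A small sketch, assuming nothing beyond the estimator instantiated above:

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)

# get_params() returns the hyperparameters as a plain dict
params = knn.get_params()
print(params["n_neighbors"])

# Any argument not passed explicitly keeps its default value
print(params["weights"])
```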

Step3: Fit the model with data (aka "Model training")
* Model is learning the relationship between X and y
* Occurs in-place

>>> knn.fit(X, y)
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=1, p=2,
weights='uniform')
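
"Occurs in-place" means fit() changes the knn object itself, so there is no need to assign the result to a new variable. The sketch below illustrates this; the names fitted, same_object and learned_classes are only for the example.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

knn = KNeighborsClassifier(n_neighbors=1)
fitted = knn.fit(X, y)

# fit() modifies the estimator in place and returns that same object
same_object = fitted is knn

# Attributes learned from the data end with an underscore
learned_classes = knn.classes_
print(same_object, learned_classes)
```

Because fit() returns the estimator, the REPL echoes its representation, which is the output shown above.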

Step4: Predict the response for a new observation
* New observations are called "out-of-sample" data
* Uses the information the model learned during training

>>> knn.predict([[3, 5, 4, 2]])  # predict() expects a 2-D array: one row per observation
array([2])

* Returns a NumPy array
* Can predict for multiple observations at once

>>> X_new = [[3,5,4,2], [5,4,3,2]]
>>> knn.predict(X_new)
array([2, 1])
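
The predictions are integer codes. One way to turn them back into species names, sketched here with the same X_new as above, is to index iris.target_names with the predicted array (predicted_names is a name I made up for the example):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)

X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]
predicted = knn.predict(X_new)

# target_names is a NumPy array, so an integer array can index it directly
predicted_names = iris.target_names[predicted]
print(predicted_names)
```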

Using a different value for K

>>> knn = KNeighborsClassifier(n_neighbors=5)
>>> knn.fit(X, y)
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=5, p=2,
weights='uniform')
>>> knn.predict(X_new)
array([1, 1])
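
To see why the prediction can change with K, it helps to look at the neighbors themselves. The sketch below uses the estimator's kneighbors() method to find the 5 nearest training points for each new observation and counts their labels. The vote counting with np.bincount is my own illustration of the majority vote under uniform weights, not scikit-learn's internal code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)

X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]

# For each new observation: distances to, and row indices of, its 5 nearest training points
distances, indices = knn.kneighbors(X_new)

# Look up the class label of every neighbor; shape is (2, 5)
neighbor_labels = y[indices]

# Count how many neighbors fall into each of the 3 classes
votes = np.array([np.bincount(row, minlength=3) for row in neighbor_labels])
print(votes)

# The class with the most votes wins
majority = votes.argmax(axis=1)
print(majority)
```

With n_neighbors=1 there is only a single vote, so the prediction simply copies the label of the closest training point.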

Using a different classification model
scikit-learn's consistent API makes it easy to swap in another model. The example below uses LogisticRegression instead:

>>> from sklearn.linear_model import LogisticRegression
>>> logreg = LogisticRegression()  # Instantiate the model
>>> logreg.fit(X, y)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False)
>>> logreg.predict(X_new)
array([2, 0])
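
Since every estimator exposes the same fit/predict pair, the four steps can be wrapped in a loop over models. This is only a sketch: the dict names are hypothetical, and max_iter=1000 is my own choice to avoid a convergence warning on recent scikit-learn versions, whose default solver differs from the one shown in the output above. The predicted values may therefore differ from the ones printed in this post.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target
X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]

# Step 2 for each model: instantiate the estimators
models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "logreg": LogisticRegression(max_iter=1000),  # max_iter is an assumption, see above
}

predictions = {}
for name, model in models.items():
    model.fit(X, y)                           # Step 3: fit the model with data
    predictions[name] = model.predict(X_new)  # Step 4: predict new observations
    print(name, predictions[name])
```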

Supplement
* Previous section - Getting started in scikit-learn with the famous iris dataset
* Next section - Comparing machine learning models in scikit-learn
* Supervised Learning - 1.6 Nearest Neighbors
* 1.1.11. Logistic regression
* In-depth introduction to machine learning in 15 hours of expert videos

Wednesday, December 14, 2016

[ Scikit-learn ] Training a machine learning model with scikit-learn
