Preface
Before we dive in, let’s start with a quick definition. In machine learning, a hyperparameter (sometimes called a tuning or training parameter) is any parameter whose value is set/chosen at the onset of the learning process, whereas other parameter values are computed during training.
In this blog, we will discuss some of the important hyperparameters involved in the following machine learning classifiers: K-Nearest Neighbors, Decision Trees and Random Forests, AdaBoost and Gradient Boosting, and Support Vector Machines. Specifically, I will focus on the hyperparameters that tend to have the greatest effect on the bias-variance tradeoff. Please note that the lists below are by no means exhaustive, and I encourage everyone to research each parameter further on their own.
This blog assumes a basic understanding of each classifier, so we will skip the theory overview and dive right into the tuning process using scikit-learn.
K-Nearest Neighbors (KNN)
In the KNN classifier (documentation), a data point is labeled based on its proximity to its neighbors. But how many neighbors should be considered in the classification?
n_neighbors (K)
Simply put, K is the number of neighbors that defines an unlabeled datapoint’s classification boundary.
(Figure: at K=3, the black data point is labeled red, since red outnumbers blue 2:1 among its neighbors.)
K takes an integer value (default = 5); the classifier calculates the distance from the unlabeled point to its neighbors, keeps the K nearest, and assigns the majority label among them. How distance is calculated is defined by the metric parameter mentioned below.
Notes.
Bias-Variance Tradeoff: in general, the smaller the K, the tighter the fit (of the model). Therefore, by decreasing K, you are decreasing bias and increasing variance, which leads to a more complex model.
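To make this tradeoff concrete, here is a minimal sketch (assuming X_train, X_test, y_train, and y_test already exist from your own train/test split) that compares training and test accuracy for a few values of K:

```python
# Illustrative sketch of the bias-variance effect of K.
# A very small K usually scores near-perfectly on the training set but worse
# on the test set (low bias, high variance); a larger K smooths the fit.
from sklearn.neighbors import KNeighborsClassifier

for k in (1, 5, 25):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"K={k:>2}  train acc={knn.score(X_train, y_train):.3f}  "
          f"test acc={knn.score(X_test, y_test):.3f}")
```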
Other Parameters for Consideration
* leaf_size
* weights
* metric
Note: we will skip the algorithm hyperparameter because it is preferred to set this parameter to “auto”, which lets sklearn choose the most appropriate algorithm based on the data passed to fit. I would recommend reading sklearn’s documentation for more information on the differences between the BallTree and KDTree algorithms.
Sample K-optimizing code block (assumes F1 is the preferred scoring metric):
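A minimal sketch of such a loop, assuming a binary target and an existing train/test split (X_train, X_test, y_train, y_test), might look like this:

```python
# Loop over candidate K values, score each model with F1 on the hold-out set,
# and keep the best K. Assumes a binary classification target.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

best_k, best_f1 = None, 0.0
for k in range(1, 31):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    f1 = f1_score(y_test, knn.predict(X_test))
    if f1 > best_f1:
        best_k, best_f1 = k, f1

print(f"Best K: {best_k} (F1 = {best_f1:.3f})")
```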
Decision Trees and Random Forests
When building a Decision Tree (documentation) and/or Random Forest (documentation), there are many important hyperparameters to be considered. Some of these are found in both classifiers while some are just specific to forests. Let’s take a look at the hyperparameters that are most likely to have the largest effect on bias and variance.
* n_estimators
* max_depth
* min_samples_split
* min_samples_leaf
Other Parameters for Consideration
* criterion
GridSearchCV Optimization for Random Forests
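A hedged sketch of what such a grid search could look like; the grid values are purely illustrative, and X_train/y_train are assumed to come from your own split:

```python
# Cross-validated grid search over the Random Forest hyperparameters above.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="f1",   # assumes a binary target; swap in your preferred metric
    cv=5,
    n_jobs=-1,
)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(grid.best_score_)
```

Keep in mind that the number of fits grows multiplicatively with each parameter added to the grid, which is exactly the computational-cost tradeoff discussed in the final section.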
AdaBoost and Gradient Boosting
By utilizing weak learners (e.g., single-split trees, aka “stumps”), boosting algorithms like AdaBoost (documentation) and Gradient Boosting (documentation) focus on what the model misclassifies. AdaBoost does this by up-weighting the misclassified data points so each new learner concentrates on getting them right, while Gradient Boosting fits each new learner to the residual errors of the ensemble so far.
Similar to Decision Trees and Random Forests, we will focus on the bias-variance tradeoff usual suspects.
* n_estimators
Other Important Parameters
* learning_rate
* base_estimator / loss
GridSearchCV Optimization for AdaBoost:
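One possible sketch, again with illustrative grid values and an assumed X_train/y_train:

```python
# Cross-validated grid search over the AdaBoost hyperparameters above.
# The default base estimator is a decision stump (a depth-1 tree).
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.1, 1.0],
}

grid = GridSearchCV(
    AdaBoostClassifier(random_state=42),
    param_grid,
    scoring="f1",  # assumes a binary target
    cv=5,
    n_jobs=-1,
)
grid.fit(X_train, y_train)

print(grid.best_params_, grid.best_score_)
```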
Support Vector Machines (SVM)
This classifier, which is formally defined by a separating hyperplane (let’s take a minute to appreciate how awesome the word hyperplane is), has many tuning parameters to consider, but we will only focus on three: C, Kernel, and Gamma.
* C
* gamma
A Very Important Hyperparameter
* kernel
GridSearchCV Optimization for SVM:
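One possible sketch, with illustrative grid values, assuming X_train/y_train from your own split (features should generally be scaled before fitting an SVM):

```python
# Cross-validated grid search over C, kernel, and gamma for an SVC.
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

param_grid = {
    "C": [0.1, 1, 10, 100],
    "kernel": ["linear", "rbf", "poly"],
    "gamma": ["scale", 0.01, 0.1, 1],  # ignored by the linear kernel
}

grid = GridSearchCV(
    SVC(),
    param_grid,
    scoring="f1",  # assumes a binary target
    cv=5,
    n_jobs=-1,
)
grid.fit(X_train, y_train)

print(grid.best_params_, grid.best_score_)
```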
Final Word
We have seen multiple ways to tune a model using sklearn, specifically GridSearchCV. However, it is very, very important to keep in mind the bias-variance tradeoff, as well as the tradeoff between computational cost and scoring metrics. Ideally, we want a model with low bias and low variance to limit overall error, but is it really worth the extra run-time, memory, etc. for only slight improvements? I will let you answer that.