* How do I use the pandas library to read data into Python?
* How do I use the seaborn library to visualize data?
* What is linear regression, and how does it work?
* What are some evaluation metrics for regression problems?
* How do I choose which features to include in my model?
Types of supervised learning
Reading data using pandas
Pandas: Popular Python library for data exploration, manipulation, and analysis. (Installation guide)
Primary object types:
* DataFrame: rows and columns (like a spreadsheet)
* Series: a single column
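Here is a minimal sketch of reading CSV data into a DataFrame and pulling out one column as a Series. So that it runs offline, it reads from an in-memory string through `io.StringIO` rather than a file path or URL. The column names (TV, Radio, Newspaper, Sales) are assumed for illustration, and the numbers are made-up placeholders, not a real dataset.

```python
import io

import pandas as pd

# Placeholder CSV text standing in for a real file path or URL (values are made up).
CSV_TEXT = """TV,Radio,Newspaper,Sales
100.0,20.0,30.0,12.0
50.0,10.0,40.0,8.0
200.0,35.0,10.0,20.0
"""

# read_csv accepts a file path, a URL, or any file-like object such as StringIO.
data = pd.read_csv(io.StringIO(CSV_TEXT))

# A DataFrame is a table of rows and columns; selecting one column yields a Series.
sales = data["Sales"]

print(type(data).__name__)   # DataFrame
print(type(sales).__name__)  # Series
print(data.head())           # first five rows (here, all three)
```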
What are the features?
What is the response?
What else do we know?
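Questions like these can be answered by inspecting the DataFrame directly. The sketch below uses a small hypothetical DataFrame built inline (column names assumed, values invented): `shape` gives the number of observations (rows) and columns, and `describe()` summarizes each numeric column.

```python
import pandas as pd

# Small hypothetical dataset; the column names are assumed for illustration.
data = pd.DataFrame({
    "TV": [100.0, 50.0, 200.0, 150.0],
    "Radio": [20.0, 10.0, 35.0, 25.0],
    "Newspaper": [30.0, 40.0, 10.0, 20.0],
    "Sales": [12.0, 8.0, 20.0, 16.0],
})

feature_cols = ["TV", "Radio", "Newspaper"]  # inputs to the model
response_col = "Sales"                       # value the model should predict

# shape is (number of rows, number of columns); each row is one observation.
n_rows, n_cols = data.shape
print(f"{n_rows} observations, {n_cols} columns")  # 4 observations, 4 columns
print("Features:", feature_cols, "Response:", response_col)

# count, mean, std, min, quartiles, and max for each numeric column
summary = data.describe()
print(summary)
```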
Visualizing data using seaborn
Seaborn: Python library for statistical data visualization built on top of matplotlib
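A common first look at a regression problem is one scatter plot per feature against the response. This sketch uses seaborn's `pairplot` with `kind="reg"`, which overlays a fitted regression line on each panel. The data and column names are hypothetical, and the `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Hypothetical data; the column names are assumed for illustration.
data = pd.DataFrame({
    "TV": [100.0, 50.0, 200.0, 150.0, 80.0, 250.0],
    "Radio": [20.0, 10.0, 35.0, 25.0, 5.0, 40.0],
    "Newspaper": [30.0, 40.0, 10.0, 20.0, 60.0, 15.0],
    "Sales": [12.0, 8.0, 20.0, 16.0, 9.0, 24.0],
})

# One panel per feature, each plotted against Sales; kind="reg" adds a
# fitted regression line with a confidence band to every scatter plot.
grid = sns.pairplot(
    data,
    x_vars=["TV", "Radio", "Newspaper"],
    y_vars=["Sales"],
    height=4,
    aspect=0.8,
    kind="reg",
)

# In a notebook the figure displays automatically; in a script, save it instead,
# e.g. grid.savefig("pairplot.png").
```

The result is a 1 x 3 grid of axes: one row for the single response and one column per feature.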
Preparing X and y using pandas
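scikit-learn expects the features as a 2-D array-like (observations by features) and the response as a 1-D array-like; a pandas DataFrame and Series satisfy both. A minimal sketch on hypothetical data with assumed column names:

```python
import pandas as pd

# Hypothetical data; the column names are assumed for illustration.
data = pd.DataFrame({
    "TV": [100.0, 50.0, 200.0, 150.0],
    "Radio": [20.0, 10.0, 35.0, 25.0],
    "Newspaper": [30.0, 40.0, 10.0, 20.0],
    "Sales": [12.0, 8.0, 20.0, 16.0],
})

# X: 2-D feature matrix (rows = observations, columns = features).
feature_cols = ["TV", "Radio", "Newspaper"]
X = data[feature_cols]

# y: 1-D response vector with one value per observation.
y = data["Sales"]

print(type(X).__name__, X.shape)  # DataFrame (4, 3)
print(type(y).__name__, y.shape)  # Series (4,)
```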
Splitting X and y into training and testing sets
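A sketch of the split using scikit-learn's `train_test_split` from `sklearn.model_selection`. With no `test_size` given, 25% of the rows go to the test set; `random_state` fixes the shuffle so the split is repeatable. The 20-row dataset is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 rows so the split sizes are easy to check.
data = pd.DataFrame({
    "TV": [float(10 * i) for i in range(20)],
    "Radio": [float(i % 7) for i in range(20)],
    "Newspaper": [float((3 * i) % 11) for i in range(20)],
})
data["Sales"] = 0.05 * data["TV"] + 0.2 * data["Radio"] + 3.0

X = data[["TV", "Radio", "Newspaper"]]
y = data["Sales"]

# Default split: 75% training rows, 25% testing rows, shuffled reproducibly.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

print(X_train.shape, X_test.shape)  # (15, 3) (5, 3)
```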
Linear regression in scikit-learn
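Linear regression models the response as an intercept plus a weighted sum of the features, y = β0 + β1·x1 + β2·x2 + ..., and scikit-learn's `LinearRegression` estimates the β values by ordinary least squares. This sketch fits it to hypothetical noise-free data generated from known coefficients, so the learned values can be checked.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, noise-free data generated from y = 3 + 0.05*x1 + 0.2*x2,
# so the fitted parameters can be checked against known values.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 2))
y = 3.0 + 0.05 * X[:, 0] + 0.2 * X[:, 1]

# Instantiate the estimator, then learn the intercept and coefficients with fit().
linreg = LinearRegression()
linreg.fit(X, y)

print(linreg.intercept_)  # approximately 3.0
print(linreg.coef_)       # approximately [0.05 0.2]

# predict() computes intercept_ + X @ coef_ for new rows.
print(linreg.predict([[10.0, 20.0]]))  # approximately [7.5]
```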
Interpreting model coefficients
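`coef_` holds one coefficient per feature, in the same order as the columns of X, so zipping it with the feature names gives a readable mapping. A sketch on hypothetical noise-free data (column names assumed) in which Newspaper truly has no effect:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data; the true coefficients are 0.05, 0.2 and 0.0.
rng = np.random.default_rng(0)
n = 60
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
data["Sales"] = 3.0 + 0.05 * data["TV"] + 0.2 * data["Radio"]

feature_cols = ["TV", "Radio", "Newspaper"]
linreg = LinearRegression().fit(data[feature_cols], data["Sales"])

# coef_ follows the column order of X, so zip pairs each name with its coefficient.
coefficients = dict(zip(feature_cols, linreg.coef_))
print("intercept", linreg.intercept_)
for name, value in coefficients.items():
    print(name, value)
```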
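How do we interpret the TV coefficient (0.0466)? Holding all other features fixed, a one-unit increase in TV is associated with a 0.0466-unit increase in Sales. (This is a statement of association, not causation.)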
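The interpretation can be checked numerically: in a fitted linear model, two inputs that differ by one unit in a single feature, with every other feature unchanged, receive predictions that differ by exactly that feature's coefficient. In this sketch the hypothetical data's first column is given a slope of 0.0466 to mirror the TV example; the second column and all values are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data: column 0 plays the role of TV with slope 0.0466,
# column 1 is a made-up second feature with slope 0.2.
rng = np.random.default_rng(42)
X = rng.uniform(0, 300, size=(100, 2))
y = 3.0 + 0.0466 * X[:, 0] + 0.2 * X[:, 1]

model = LinearRegression().fit(X, y)

base = np.array([[100.0, 20.0]])
bumped = base.copy()
bumped[0, 0] += 1.0  # one more unit of the first feature; the second is unchanged

# The change in the prediction equals the first coefficient.
diff = model.predict(bumped)[0] - model.predict(base)[0]
print(round(diff, 4))  # 0.0466
```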
Model evaluation metrics for regression
Evaluation metrics for classification problems, such as accuracy, are not useful for regression problems. Instead, we need evaluation metrics designed for comparing continuous values. Let's create some example numeric predictions and calculate three common evaluation metrics for regression problems: mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).
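A sketch that computes MAE, MSE, and RMSE by hand for a few hypothetical true and predicted values. The helper functions are written out for transparency; scikit-learn's `sklearn.metrics.mean_absolute_error` and `sklearn.metrics.mean_squared_error` compute the first two directly.

```python
import math

# Hypothetical true and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]


def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Mean of (actual - predicted) squared."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the same units as the response."""
    return math.sqrt(mean_squared_error(actual, predicted))


mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
print(mae, mse, round(rmse, 4))  # 0.75 0.875 0.9354
```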
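Comparing these metrics:
* MAE is the easiest to understand, because it is simply the average absolute error.
* MSE squares each error before averaging, so it penalizes large errors more heavily than MAE does.
* RMSE is the square root of MSE, so it keeps that penalty on large errors while being expressed in the same units as the response.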
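The effect of squaring is easy to see with two hypothetical sets of prediction errors that have the same MAE: one spreads the error evenly, the other concentrates it in a single large miss. MAE cannot tell them apart, while MSE and RMSE both rank the single large miss as worse.

```python
import math

# Two hypothetical sets of prediction errors with the same total absolute error.
error_sets = {
    "even": [2.0, 2.0, 2.0, 2.0],     # every prediction off by 2
    "one big": [0.0, 0.0, 0.0, 8.0],  # three perfect predictions, one off by 8
}

results = {}
for name, errors in error_sets.items():
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e ** 2 for e in errors) / len(errors)
    rmse = math.sqrt(mse)
    results[name] = (mae, mse, rmse)
    print(f"{name}: MAE={mae}, MSE={mse}, RMSE={rmse}")
# even: MAE=2.0, MSE=4.0, RMSE=2.0
# one big: MAE=2.0, MSE=16.0, RMSE=4.0
```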
Computing the RMSE for our Sales predictions
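A sketch of the full workflow on hypothetical data (column names assumed, values randomly generated with added noise): split, fit on the training set, predict the test set, and take the square root of `sklearn.metrics.mean_squared_error`. Because the data is synthetic, the printed RMSE is not meaningful beyond illustrating the steps.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data with random noise; the column names are assumed for illustration.
rng = np.random.default_rng(1)
n = 200
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
data["Sales"] = 3.0 + 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1.5, n)

X = data[["TV", "Radio", "Newspaper"]]
y = data["Sales"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Fit on the training rows only, then predict the held-out test rows.
linreg = LinearRegression().fit(X_train, y_train)
y_pred = linreg.predict(X_test)

# RMSE = sqrt(MSE) on the test set, in the same units as Sales.
rmse = np.sqrt(metrics.mean_squared_error(y_test, y_pred))
print(rmse)
```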
Does Newspaper "belong" in our model? In other words, does it improve the quality of our predictions? Let's remove it from the model and check the RMSE:
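One way to run this comparison is to wrap the split, fit, and RMSE steps in a helper and call it with and without the feature. `holdout_rmse` below is a hypothetical helper, and the data is made up so that Sales does not depend on Newspaper at all.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data in which Sales does not depend on Newspaper.
rng = np.random.default_rng(2)
n = 200
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
data["Sales"] = 3.0 + 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1.5, n)


def holdout_rmse(frame, feature_cols, response="Sales", random_state=1):
    """Hypothetical helper: fit on a training split and return the test-set RMSE."""
    X_train, X_test, y_train, y_test = train_test_split(
        frame[feature_cols], frame[response], random_state=random_state
    )
    model = LinearRegression().fit(X_train, y_train)
    return np.sqrt(metrics.mean_squared_error(y_test, model.predict(X_test)))


# The same random_state selects the same rows for each split,
# so only the feature set differs between the two runs.
rmse_all = holdout_rmse(data, ["TV", "Radio", "Newspaper"])
rmse_without_newspaper = holdout_rmse(data, ["TV", "Radio"])
print(rmse_all, rmse_without_newspaper)
```

Because Newspaper has no effect in this made-up data, the two values should be close; which one comes out lower on a single split can change with `random_state`.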
The RMSE decreased when we removed Newspaper from the model. (Error is something we want to minimize, so a lower RMSE is better.) Thus, Newspaper is unlikely to be useful for predicting Sales and should be removed from the model.