Instructor Notes
This is a placeholder file. Please add content here.
Introduction
Supervised methods - Regression
Instructor Note
Francis Anscombe was a statistician who constructed four small datasets, now known as Anscombe's quartet, that share nearly identical summary statistics (such as the means and standard deviations of x and y) but look very different when plotted. It is a good example of how summary statistics alone can mislead, and of why visualising data is important.
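If it helps to show this to learners in numbers before plotting, the sketch below computes the mean and sample standard deviation of x and y for each of the four datasets using only the Python standard library. The variable names (`x_123`, `quartet`, `summary`) are illustrative choices, not part of the lesson.

```python
from statistics import mean, stdev

# Anscombe's quartet. Datasets I-III share the same x values;
# dataset IV has its own.
x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Collect (mean x, std x, mean y, std y) for each dataset.
summary = {}
for name, (x, y) in quartet.items():
    summary[name] = (mean(x), stdev(x), mean(y), stdev(y))
    mx, sx, my, sy = summary[name]
    print(f"{name:>3}: mean x={mx:.2f}  std x={sx:.2f}  mean y={my:.2f}  std y={sy:.2f}")
```

The printed rows agree to about two decimal places, yet a scatter plot of each dataset shows four clearly different shapes.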
Instructor Note
The Mean Squared Error (MSE) is a common metric for the quality of a regression model. It is the average of the squared errors, where each error is the difference between a predicted value and the corresponding actual value. The Root Mean Squared Error (RMSE) is simply the square root of the MSE, and it measures how far off our predictions typically are, in the same units as our labels.
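A small worked example can make the definition concrete. The sketch below computes the MSE and RMSE by hand for four made-up predictions; the values in `actual` and `predicted` are invented purely for illustration.

```python
import math

# Made-up labels and predictions, for illustration only.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 4.0, 8.0]

# Error for each sample: predicted minus actual.
errors = [p - a for p, a in zip(predicted, actual)]

# MSE: the average of the squared errors.
mse = sum(e ** 2 for e in errors) / len(errors)

# RMSE: the square root of the MSE, back in the units of the labels.
rmse = math.sqrt(mse)

print(f"errors = {errors}")
print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```

Here the errors are 1, 0, -2 and 0, so the MSE is (1 + 0 + 4 + 0) / 4 = 1.25 and the RMSE is about 1.118.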
Why not Mean Absolute Error or Mean Squared Error?
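Compared with Mean Absolute Error, RMSE penalises large errors more heavily, because each error is squared before averaging. Compared with Mean Squared Error, taking the square root puts the value back in the same units as our data, which makes it easier to interpret. Squared error also has convenient mathematical properties: for example, it is differentiable everywhere, whereas absolute error is not differentiable at zero.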
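To illustrate the point about large errors, the sketch below compares two made-up sets of errors that have the same Mean Absolute Error but different RMSE. The helper functions `mae` and `rmse` and both error lists are written for this example and are not taken from the lesson.

```python
import math


def mae(errors):
    """Mean Absolute Error: the average of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root Mean Squared Error: square root of the average squared error."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


# Same total absolute error, spread evenly or concentrated in one sample.
even_errors = [2.0, 2.0, 2.0, 2.0]
one_large_error = [0.0, 0.0, 0.0, 8.0]

for label, errs in [("even", even_errors), ("one large", one_large_error)]:
    print(f"{label:>9}: MAE = {mae(errs):.1f}, RMSE = {rmse(errs):.1f}")
```

Both sets have an MAE of 2.0, but the RMSE rises from 2.0 to 4.0 when the error is concentrated in a single prediction.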
Instructor Note
PolynomialFeatures is a class in Scikit-Learn that generates polynomial features. We specify a degree, then call fit_transform on our data to create a new feature set that includes all polynomial combinations of the input features up to that degree. For example, with a single input feature, PolynomialFeatures(degree=2) will create a new feature set that includes a constant term, the original feature, and the square of the original feature.
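A minimal sketch of this transformation, assuming NumPy and Scikit-Learn are installed; the input values in `x` are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One input feature, three samples (Scikit-Learn expects a 2D array).
x = np.array([[2.0], [3.0], [4.0]])

# degree=2 adds a constant column and a squared column.
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)

# Columns are: constant term, original feature, original feature squared.
print(x_poly)
```

Each row of `x_poly` holds 1, x and x squared for one sample, so the sample with x = 3 becomes 1, 3, 9.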