Instructor Notes


Introduction


Supervised methods - Regression


Instructor Note

Francis Anscombe was a statistician who constructed four small datasets, now known as Anscombe's quartet, that share near-identical summary statistics, such as means, variances, and correlations, but which look very different when plotted. This is a great example of how summary statistics can be misleading, and why visualising data is important.
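The point can be demonstrated live in a few lines of NumPy. The values below are the published quartet data; a minimal sketch:

```python
import numpy as np

# Anscombe's quartet: four x/y datasets with near-identical summary statistics.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # x shared by datasets I-III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
ys = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV":  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}

for name, y in ys.items():
    x = x4 if name == "IV" else x123
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name}: mean(y)={np.mean(y):.2f}  var(y)={np.var(y, ddof=1):.2f}  corr={r:.2f}")
# Every dataset prints mean(y) ~ 7.50, var(y) ~ 4.13, corr ~ 0.82,
# yet their scatter plots look completely different.
```

Plotting the four datasets (e.g. with matplotlib) is the natural follow-up in class.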



Instructor Note

The Mean Squared Error (MSE) is a common metric used to measure the quality of a regression model. It is the average of the squared errors, where an error is the difference between a predicted value and the actual value. The Root Mean Squared Error (RMSE) is simply the square root of the MSE, and it measures how far off our predictions are from the actual values in the same units as our labels.

Why not Mean Absolute Error or Mean Squared Error?

Compared with the Mean Absolute Error, the RMSE penalises large errors more heavily because the errors are squared. Compared with the plain MSE, taking the square root returns the value to the same units as our labels, which makes it easier to interpret. It also has nice mathematical properties that make it easier to work with in some cases.
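All three metrics can be computed directly with NumPy; the labels and predictions below are made up purely for illustration:

```python
import numpy as np

# Hypothetical actual labels and model predictions, for illustration only.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_pred - y_true
mae = np.mean(np.abs(errors))   # Mean Absolute Error
mse = np.mean(errors ** 2)      # Mean Squared Error (in squared label units)
rmse = np.sqrt(mse)             # RMSE, back in the units of the labels

print(mae, mse, rmse)           # 0.75 0.875 0.935...
```

Note how the single largest error (1.5) dominates the MSE and RMSE far more than it does the MAE, which is exactly the "penalises large errors" behaviour described above.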



Instructor Note

PolynomialFeatures is a class in Scikit-Learn that generates polynomial features. We specify a degree, then call fit_transform on our data to create a new feature set containing all polynomial combinations up to that degree. For example, applied to a single feature, PolynomialFeatures(degree=2) produces a constant term, the original feature, and the square of the original feature.
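A quick sketch of what the transform produces for a single feature:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One feature, three sample values.
X = np.array([[2.0], [3.0], [4.0]])

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print(X_poly)
# Each row becomes [1, x, x^2], e.g. [2.] -> [1., 2., 4.]
```

With more than one input feature, the output also includes the cross terms (e.g. x1*x2 for degree=2), so the feature count grows quickly with degree.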



Supervised methods - Classification


Ensemble methods


Unsupervised methods - Clustering


Unsupervised methods - Dimensionality reduction


Neural Networks


Ethics and the Implications of Machine Learning


Find out more