Instructor Notes


Francis Anscombe was a statistician who constructed four small datasets, now known as Anscombe's quartet. They share nearly identical summary statistics (the same means and variances of x and y, the same correlation between x and y, and the same fitted linear regression line), yet they look very different when plotted. The quartet is a good example of how summary statistics alone can mislead, and of why visualising data is important.
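To make the point concrete, the sketch below computes the summary statistics for each of the four datasets using only Python's standard library. The values are the standard published figures for Anscombe's quartet. The helper pearson_r is our own illustrative function, written out by hand rather than taken from a library.

```python
from statistics import mean, stdev

# Anscombe's quartet: datasets I-III share the same x values.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    ss_x = sum((a - mx) ** 2 for a in xs)
    ss_y = sum((b - my) ** 2 for b in ys)
    return cov / (ss_x * ss_y) ** 0.5


# Collect (mean x, mean y, sd x, sd y, r) for each dataset.
summary = {
    name: (mean(xs), mean(ys), stdev(xs), stdev(ys), pearson_r(xs, ys))
    for name, (xs, ys) in quartet.items()
}

for name, (mx, my, sx, sy, r) in summary.items():
    print(f"{name:>3}: mean x={mx:.2f}  mean y={my:.2f}  "
          f"sd x={sx:.2f}  sd y={sy:.2f}  r={r:.3f}")
```

All four rows print almost the same numbers. Drawing each (xs, ys) pair as a scatter plot, for example with matplotlib, then reveals four very different shapes: a noisy linear trend, a clean curve, a straight line with one outlier, and a vertical cluster with one extreme point.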

Instructor Note

The Mean Squared Error (MSE) is a common metric for measuring the quality of a regression model. It is the average of the squared errors, where each error is the difference between a predicted value and the corresponding actual value. The Root Mean Squared Error (RMSE) is the square root of the MSE. It measures how far our predictions typically are from the actual values, expressed in the same units as our labels.
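A minimal hand-written sketch of both metrics follows, using hypothetical labels and predictions. The function names mse and rmse are our own, chosen for illustration.

```python
import math


def mse(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the MSE, so the result is in the same units as the labels."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical labels and predictions, purely for illustration.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

print(mse(y_true, y_pred))   # 0.875
print(rmse(y_true, y_pred))  # about 0.935
```

In practice you would normally use scikit-learn's sklearn.metrics.mean_squared_error, which computes the same MSE from the same two arrays, and take its square root for the RMSE.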

Why not Mean Absolute Error or Mean Squared Error?

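Compared with the Mean Absolute Error (MAE), RMSE penalises large errors more heavily, because each error is squared before averaging. Compared with the MSE, RMSE is in the same units as our data (we square and then take the root), which makes it easier to interpret; the two always rank models in the same order, because the square root is monotonic. Squared error also has convenient mathematical properties: for example, it is differentiable everywhere, unlike the absolute error, which makes it easier to work with in gradient-based optimisation.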
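A small sketch with made-up numbers illustrates how the two metrics treat large errors differently: both sets of hypothetical predictions below have the same MAE, but the one with a single large miss has twice the RMSE.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: errors are squared, so large ones dominate."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical data, purely for illustration.
y_true = [10.0, 10.0, 10.0, 10.0]
spread_out = [11.0, 9.0, 11.0, 9.0]      # four errors of size 1
one_big_miss = [10.0, 10.0, 10.0, 14.0]  # three perfect predictions, one error of 4

print(mae(y_true, spread_out), rmse(y_true, spread_out))      # 1.0 1.0
print(mae(y_true, one_big_miss), rmse(y_true, one_big_miss))  # 1.0 2.0
```

MAE cannot tell these two models apart, while RMSE flags the one that occasionally makes a large mistake.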

Instructor Note

PolynomialFeatures is a class in Scikit-Learn that generates polynomial features. We specify a degree, then “fit_transform” our data to create a new feature set containing all polynomial combinations of the input features up to that degree. For example, with a single input feature x, PolynomialFeatures(degree=2) produces a constant term (1), the original feature x, and its square x². With two input features a and b, the same degree would produce 1, a, b, a², ab and b², including the interaction term ab.
