When fitting a model to noisy data (which, in practice, is always the case), we assume that the data were generated from some “true” model: the model makes predictions at given values of the inputs, and noise drawn from a normal distribution with unknown variance is then added to each point.
Our task is to discover both this model and the width of the noise distribution. In doing so, we aim for a compromise between bias, where our model does not follow the right trend in the data (and so does not match well with the underlying truth), and variance, where our model fits the data points too closely, fitting the noise rather than capturing the true distribution. These two extremes are known as underfitting and overfitting, respectively.
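The generating assumption described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cubic coefficients, the noise width `true_sigma`, the input range and the number of points are all hypothetical choices, and NumPy is assumed to be available. The sketch generates noisy data from a known model, fits a model of the same form by least squares, and estimates the noise width from the residuals.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical "true" model: a cubic with 4 parameters (highest power first).
true_coeffs = np.array([1.0, -2.0, 0.5, 3.0])
true_sigma = 0.5  # width of the noise distribution (unknown in practice)

# Make predictions at the given inputs, then add normal noise to each point.
x = np.linspace(-2.0, 2.0, 200)
y = np.polyval(true_coeffs, x) + rng.normal(0.0, true_sigma, size=x.size)

# Fit a model of the same form (a cubic) by least squares.
fit_coeffs = np.polyfit(x, y, deg=3)
residuals = y - np.polyval(fit_coeffs, x)

# Estimate the noise width from the residuals, dividing by (n - p)
# to account for the p parameters that were fitted.
n, p = x.size, fit_coeffs.size
sigma_hat = np.sqrt(np.sum(residuals**2) / (n - p))

print("fitted coefficients:", fit_coeffs)
print("estimated noise width:", sigma_hat)
```

With this many points and the correct model form, both the coefficients and the noise width should come out close to the values used to generate the data. In a real problem neither is known in advance, which is what makes the choice of model form matter.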
Overfitting and Underfitting:
The number of parameters in a model determines how complex a fit it can produce: the more parameters, the more closely the model can follow the data. If our model has more parameters than the “true” one, we risk overfitting; if it has fewer parameters than the truth, we risk underfitting.
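To make the link between parameter count and fit concrete, here is a minimal sketch, assuming a hypothetical cubic as the true model, 9 noisy data points, and NumPy for the least-squares fits. It fits polynomials with 2, 4 and 9 parameters and records two numbers for each: the mismatch at the data points, and the error against the noise-free curve on a dense grid. The coefficients, noise width and seed are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Hypothetical true model: a cubic (4 parameters), highest power first.
true_coeffs = np.array([2.0, -1.0, -1.5, 0.5])
sigma = 0.2  # assumed noise width

# 9 noisy data points generated from the true model.
x = np.linspace(-1.0, 1.0, 9)
y = np.polyval(true_coeffs, x) + rng.normal(0.0, sigma, size=x.size)

# Dense grid for comparing each fit with the noise-free truth.
x_dense = np.linspace(-1.0, 1.0, 201)
truth_dense = np.polyval(true_coeffs, x_dense)

train_rms = {}  # RMS mismatch between fit and data points
truth_rms = {}  # RMS error between fit and the noise-free curve

for degree in (1, 3, 8):  # 2, 4 and 9 parameters respectively
    coeffs = np.polyfit(x, y, deg=degree)
    train_rms[degree] = np.sqrt(np.mean((y - np.polyval(coeffs, x)) ** 2))
    truth_rms[degree] = np.sqrt(
        np.mean((truth_dense - np.polyval(coeffs, x_dense)) ** 2)
    )
    print(f"degree {degree}: data mismatch = {train_rms[degree]:.3g}, "
          f"error vs truth = {truth_rms[degree]:.3g}")
```

The mismatch at the data points can only shrink as parameters are added, and with 9 parameters for 9 points it is zero up to rounding error. The error against the noise-free curve is printed alongside because it need not shrink in the same way: a model that reproduces the data exactly has also reproduced the noise.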
The illustration below shows how increasing the number of parameters in the model can result in overfitting. The 9 data points are generated by evaluating a cubic polynomial, which contains 4 parameters (the true model), and adding noise. We can see that by selecting candidate models containing more parameters than the truth, we can reduce, and even eliminate, any mismatch between the data points and our model. The mismatch vanishes when the number of parameters is the same as the number of data points (an 8th-order polynomial has 9 parameters):