What is underfitting? How to detect underfitting? Explain the way to solve the underfitting
- The underfitting means model has a low accuracy score on both training data and test data. An underfit model fails to significantly grasp the relationship between the input values and target variables.
- Underfitting happens when the algorithm used to build a prediction model is very simple and not able to learn the complex patterns from the training data. In that case, accuracy will be low on seen training data as well as unseen test data. Generally, It happens with Linear Algorithms.
- A underfit model makes incorrect assumptions about the dataset to make the target function easier to learn. If training data distribution is nonlinear and you apply the linear algorithm to build the prediction model, in that case, would not be able to learn the nonlinear relationship between the target value and features, and model accuracy will suffer.
- Underfit Model is very simple because of the assumptions made about the data, pays very little attention to the training data, and oversimplifies the model. The high Bias model always leads to a high error on training as well as on test data.
- For example, as shown in the figure below, the model is trained to classify between the circles and crosses. However, it is unable to do so properly due to the straight line, which fails to properly classify either of the two classes.
Detection of underfitting model:
The model may underfit the data, but it is necessary to know when it does so. The following steps are the checks that are used to determine if the model is underfitting or not.
1. Training and Validation Loss: During training and validation, it is important to check the loss that is generated by the model. If the model is underfitting, the loss for both training and validation will be significantly high.
2 Over Simplistic Prediction Graph: If a graph is plotted showing the data points and the fitted curve, and the curve is over-simplistic, then the model is suffering from underfitting. A more complex model is to be tried out. A lot of classes will be misclassified in the training set as well as the validation set. On data visualization, the graph would indicate that if there was a more complex model, more classes would have been correctly classified.
Fix for an underfitting model: If the model is underfitting, the developer can take the following steps to recover from the underfitting state:
1. Train Longer: Since underfitting means less model complexity, training longer can help in learning more complex patterns.
2. Train a more complex model: The main reason behind the model to underfit is using a model of lesser complexity than required for the data. Hence, the most obvious fix is to use a more complex model.
3. Obtain more features: If the data set lacks enough features to get a clear inference, then Feature Engineering or collecting more features will help fit the data better.
4. Decrease Regularization: Regularization is the process that helps Generalize the model by avoiding overfitting. However, if the model is learning less or underfitting, then it is better to decrease or completely remove Regularization techniques so that the model can learn better.
Comments
Post a Comment