R-Squared
What is R-Squared
R2 measures the proportion of the variance in a target that can be explained by the inputs (or features). It is often used to evaluate regression models and is also known as the coefficient of determination.
R2 = [ Variance(mean) - Variance(best-model) ] / Variance(mean)
or :
R2 = 1 - Variance(best-model) / Variance(mean)
There are a few steps to calculate the R2 of a regression model:
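- Fit a line to the data (the best model found by linear regression)
- Calculate Variance(mean), which is the sum of squared errors between the target and its mean (μ)
- Calculate Variance(best-model), which is the sum of squared errors between the target and the values predicted by the best model. Strictly speaking these are sums of squares, not variances, but dividing both by the number of data points leaves the ratio unchanged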
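The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration: the data points are made up, and the variable names (ss_mean, ss_fit) are my own, not from any reference.

```python
import numpy as np

# Made-up example data (hypothetical, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Step 1: fit a least-squares line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

# Step 2: sum of squared errors between the target and its mean
ss_mean = np.sum((y - y.mean()) ** 2)

# Step 3: sum of squared errors between the target and the fitted line
ss_fit = np.sum((y - y_pred) ** 2)

# Both forms of the formula give the same number
r2 = (ss_mean - ss_fit) / ss_mean
r2_alt = 1 - ss_fit / ss_mean
print(r2, r2_alt)
```

For a single-feature line like this one, the result should also match the squared correlation between x and y, which is a handy sanity check.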
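For a least-squares line with an intercept, evaluated on the data it was fitted to, R2 ranges from 0 to 1, because the best model cannot do worse than the mean. On new data, or for a model fitted some other way, R2 can be negative.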
Interpretation
R2 = 0 means the regression model performs no better than always predicting the mean.
R2 = 0.9 means 90% of the variance of target can be explained by the model. That’s pretty good!!
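To make these readings concrete, here is a small sketch in plain Python. The helper r_squared and all the numbers are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def r_squared(y_true, y_pred):
    """R2 = 1 - Variance(model) / Variance(mean), using sums of squared errors."""
    mean = sum(y_true) / len(y_true)
    ss_mean = sum((y - mean) ** 2 for y in y_true)
    ss_model = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_model / ss_mean

# Made-up targets with mean 5.0
y = [2.0, 4.0, 6.0, 8.0]

mean_only = [5.0] * len(y)        # always predict the mean
close_fit = [3.0, 4.0, 6.0, 7.0]  # small errors on two points
perfect = list(y)                 # predictions equal the targets

print(r_squared(y, mean_only))  # no better than the mean, so R2 is 0
print(r_squared(y, close_fit))  # about 0.9
print(r_squared(y, perfect))    # all variance explained, so R2 is 1
```

Here the sum of squared errors around the mean is 20 and the close fit leaves only 2 of it unexplained, which is where the 90% comes from.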
There are a few other points I read on different blogs that I don’t quite understand. To be clarified later:
- R2 works well with a single-variable regression model, but not with multi-variable regression. Is it true? link
- What is adjusted R2? link
- A high R2 does not always mean a good fit; likewise, a low R2 is not always bad. Is it true? Why?
Reference: R-squared clearly explained by StatQuest