What is R-Squared

R2 measures the proportion of the variance in a target that can be explained by the inputs (or features). It is most often used to evaluate regression models, and is also known as the coefficient of determination.

R2 = [ Variance(mean) - Variance(best-model) ] / Variance(mean)

or, equivalently:

R2 = 1 - Variance(best-model) / Variance(mean)

There are a few steps to calculate the R2 of a regression model:

  1. Fit a line to the data; the least-squares line is the best model of linear regression
  2. Calculate variance(mean), which is the sum of squared errors between the target and its mean (μ)
  3. Calculate variance(best-model), which is the sum of squared errors between the target and the values predicted by the best model
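
The three steps can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the toy data and the helper names `mean`, `fit_line` and `r_squared` are made up for this example, and the fit assumes a single feature.

```python
# Hypothetical toy data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Step 1: ordinary least-squares slope and intercept for one feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(ys, preds):
    y_bar = mean(ys)
    # Step 2: squared error of always predicting the mean.
    ss_mean = sum((y - y_bar) ** 2 for y in ys)
    # Step 3: squared error of the model's predictions.
    ss_model = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return (ss_mean - ss_model) / ss_mean


slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]
r2 = r_squared(ys, preds)
print(round(r2, 3))  # 0.6
```

Here the sums of squares are used directly instead of true variances. Dividing both by the number of samples would not change the result, because the factor cancels in the ratio.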

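For a least-squares fit scored on the data it was fit to, R2 ranges from 0 to 1, because Variance(best-model) cannot be larger than Variance(mean): the fitted line can always fall back to predicting the mean. On held-out data, or for a model fit without an intercept, R2 can be negative.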
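
The lower bound of 0 holds only when the model is at least as good as predicting the mean. A small sketch of what the formula gives otherwise, assuming made-up target values and a deliberately bad set of predictions (`r_squared` is a hypothetical helper, not a library function):

```python
def r_squared(ys, preds):
    y_bar = sum(ys) / len(ys)
    ss_mean = sum((y - y_bar) ** 2 for y in ys)
    ss_model = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return (ss_mean - ss_model) / ss_mean


# Hypothetical target values, invented for illustration only.
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Predicting the mean for every sample gives exactly 0.
mean_preds = [sum(ys) / len(ys)] * len(ys)
r2_mean = r_squared(ys, mean_preds)

# Predictions that are worse than the mean push the value below 0.
bad_preds = [5.0, 4.0, 2.0, 5.0, 4.0]
r2_bad = r_squared(ys, bad_preds)

print(r2_mean)     # 0.0
print(r2_bad < 0)  # True
```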

Interpretation

R2 = 0 means the regression model performs no better than always predicting the mean.

R2 = 0.9 means 90% of the variance of the target can be explained by the model. That’s pretty good!!

There are a few other points I read on different blogs that I don’t quite understand. To be clarified later:

  • R2 works well for single-variable regression models, but not for multi-variable regression. Is it true? link
  • What is adjusted R2? link
  • A high R2 does not always mean a good fit; likewise, a low R2 is not always bad. Is it true? Why?
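
As a starting point for the adjusted R2 question, the formula usually quoted is adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1), where n is the number of samples and p is the number of features. The sketch below only evaluates that formula; the function name `adjusted_r2` and the input values are assumptions chosen for illustration.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R2: shrinks R2 as more features are used for the same data."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features plus one")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Same R2 and sample count, different numbers of features (made-up values).
few_features = adjusted_r2(0.9, n_samples=100, n_features=5)
many_features = adjusted_r2(0.9, n_samples=100, n_features=50)

print(few_features > many_features)  # True
```

With the same plain R2, the version with more features gets the lower adjusted score, which seems to be the point of the adjustment: plain R2 on the training data does not go down when features are added to a least-squares model, even uninformative ones.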

Reference: R-squared clearly explained by StatQuest