Numeracy, Maths and Statistics – Academic Skills Kit


To calculate $R^2$ you need to find the sum of the squared residuals and the total sum of squares.

Start by finding the residuals, which are the vertical distances from the regression line to each data point. To work out the predicted $y$ value for a point, substitute its $x$ value into the regression line equation.

  • For the point $(2,2)$

\begin{align}
\hat{y}&=0.143+1.229x \\
&=0.143+(1.229\times2) \\
&=0.143+2.458 \\
&=2.601
\end{align}

The actual value for $y$ is $2$.

\begin{align}
\text{Residual}&=\text{actual } y \text{ value} - \text{predicted } y \text{ value} \\
r_1&=y_1-\hat{y}_1 \\
&=2-2.601 \\
&=-0.601
\end{align}

As you can see from the graph, the actual point is below the regression line, so it makes sense that the residual is negative.

  • For the point $(3,4)$

\begin{align}
\hat{y}&=0.143+1.229x \\
&=0.143+(1.229\times3) \\
&=0.143+3.687 \\
&=3.83
\end{align}

The actual value for $y$ is $4$.

\begin{align}
\text{Residual}&=\text{actual } y \text{ value} - \text{predicted } y \text{ value} \\
r_2&=y_2-\hat{y}_2 \\
&=4-3.83 \\
&=0.17
\end{align}

As you can see from the graph, the actual point is above the regression line, so it makes sense that the residual is positive.

  • For the point $(4,6)$

\begin{align}
\hat{y}&=0.143+1.229x \\
&=0.143+(1.229\times4) \\
&=0.143+4.916 \\
&=5.059
\end{align}

The actual value for $y$ is $6$.

\begin{align}
\text{Residual}&=\text{actual } y \text{ value} - \text{predicted } y \text{ value} \\
r_3&=y_3-\hat{y}_3 \\
&=6-5.059 \\
&=0.941
\end{align}

  • For the point $(6,7)$

\begin{align}
\hat{y}&=0.143+1.229x \\
&=0.143+(1.229\times6) \\
&=0.143+7.374 \\
&=7.517
\end{align}

The actual value for $y$ is $7$.

\begin{align}
\text{Residual}&=\text{actual } y \text{ value} - \text{predicted } y \text{ value} \\
r_4&=y_4-\hat{y}_4 \\
&=7-7.517 \\
&=-0.517
\end{align}

To find the sum of the squared residuals, square each of $r_1$ to $r_4$ and add them together.

\begin{align}
\sum(y_i-\hat{y}_i)^2&=\sum{r_i}^2 \\
&={r_1}^2+{r_2}^2+{r_3}^2+{r_4}^2 \\
&=(-0.601)^2+(0.17)^2+(0.941)^2+(-0.517)^2 \\
&=1.542871
\end{align}
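The residual calculations above can be checked with a short script. This is only a sketch: the names `INTERCEPT`, `SLOPE`, `points`, `predict`, `residuals` and `ss_res` are chosen here for illustration and do not come from the original page. The coefficients are the rounded values used in the worked example.

```python
# Rounded regression coefficients from the worked example (y-hat = 0.143 + 1.229x).
INTERCEPT = 0.143
SLOPE = 1.229

# The four data points (x, y) used in the worked example.
points = [(2, 2), (3, 4), (4, 6), (6, 7)]


def predict(x):
    """Predicted y value from the regression line."""
    return INTERCEPT + SLOPE * x


# Residual = actual y value - predicted y value, one per data point.
residuals = [y - predict(x) for x, y in points]

# Sum of the squared residuals.
ss_res = sum(r ** 2 for r in residuals)

print([round(r, 3) for r in residuals])  # [-0.601, 0.17, 0.941, -0.517]
print(round(ss_res, 6))                  # 1.542871
```

The values are rounded before printing because floating-point arithmetic can leave tiny errors in the last decimal places.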

To find the total sum of squares, $\sum(y_i-\bar{y})^2$, you first need to find the mean of the $y$ values.

\begin{align}
\bar{y}&=\frac{\sum{y}}{n} \\
&=\frac{2+4+6+7}{4} \\
&=\frac{19}{4} \\
&=4.75
\end{align}

Now we can calculate $\sum(y_i-\bar{y})^2$.

\begin{align}
\sum(y_i-\bar{y})^2&=(2-4.75)^2+(4-4.75)^2+(6-4.75)^2+(7-4.75)^2 \\
&=(-2.75)^2+(-0.75)^2+(1.25)^2+(2.25)^2 \\
&=14.75
\end{align}
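The mean and the total sum of squares can be reproduced in a few lines of code. This is a minimal sketch, and the variable names `y_values`, `y_mean` and `ss_tot` are illustrative only.

```python
# Actual y values of the four data points in the worked example.
y_values = [2, 4, 6, 7]

# Mean of the y values.
y_mean = sum(y_values) / len(y_values)

# Total sum of squares: squared distance of each y value from the mean.
ss_tot = sum((y - y_mean) ** 2 for y in y_values)

print(y_mean)  # 4.75
print(ss_tot)  # 14.75
```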


Finally, substitute both sums into the formula for $R^2$.

\begin{align}
R^2&=1-\frac{\text{sum of squared residuals (SSR)}}{\text{total sum of squares (SST)}} \\
&=1-\frac{\sum(y_i-\hat{y}_i)^2}{\sum(y_i-\bar{y})^2} \\
&=1-\frac{1.542871}{14.75} \\
&=1-0.105 \text{ (3 s.f.)} \\
&=0.895 \text{ (3 s.f.)}
\end{align}
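The whole calculation can be gathered into one small function. The sketch below assumes the regression line is already known. The function name `r_squared` and its parameters are hypothetical, chosen for this illustration, and the coefficients are the rounded values from the worked example.

```python
def r_squared(xs, ys, intercept, slope):
    """Return R^2 = 1 - SSR / SST for the line y = intercept + slope * x."""
    # Predicted y value for each x value.
    predictions = [intercept + slope * x for x in xs]
    # Sum of squared residuals (actual minus predicted).
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    # Total sum of squares (actual minus mean).
    y_mean = sum(ys) / len(ys)
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [2, 3, 4, 6]
ys = [2, 4, 6, 7]
r2 = r_squared(xs, ys, intercept=0.143, slope=1.229)
print(round(r2, 3))  # 0.895
```

Rounding only at the end keeps the result consistent with the hand calculation, which rounds to 3 significant figures in its last two steps.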

This means that the number of lectures per day accounts for $89.5$% of the variation in the hours people spend at university per day.

An odd property of $R^2$ is that it never decreases as more explanatory variables are added to the model. Thus, in the example above, if we added another variable measuring the mean height of lecturers, $R^2$ would be no lower and may well, by chance, be greater, even though this is unlikely to improve the model. To account for this, an adjusted version of the coefficient of determination is sometimes used. For more information, please see [

