Section 7.2 Interpreting Linear Regression (R2)

We've shown how to perform linear regression in Section 7.1. In this section, we look at what the regression is telling us, what information we can draw from this analysis, and what we can't draw.

Subsection 7.2.1 Utilizing Regression Results

Remark 7.2.1.

When modeling the relationship between two variables with a function (such as a linear function), the input (typically the \(x\) values) is called the explanatory variable and the output (typically the \(y\) values) is called the response variable. Recall Definition 1.1.6.

The idea is that one can measure the change in \(y\) as a response to changes in \(x\text{.}\)

Exploration 7.2.1. Possums Again.

Recall from Exploration 7.1.1 and Activity 7.1.7 that the “head length of possums in mm” (response variable \(y\)) as a linear function of the “skull width of possums in mm” (explanatory variable \(x\)) has regression line

\begin{equation*} y\approx 0.8158x+46.1954 \end{equation*}

with correlation coefficient \(R\approx 0.7108\) and \(R^2\approx 0.5053\text{.}\)

Run the following cell to re-compute these values:

Click here to learn more about this data set: https://www.openintro.org/data/index.php?data=possum

(a)

How would you describe the relationship between skull width and head length? (Strong, Moderate or Weak? Positive, Negative or None?)

(b)

What proportion of the variation in head length is explained by skull width? (See Remark 7.1.20)

(c)

For a possum with skull width 53mm, what is the predicted head length? (Is skull width the explanatory or response variable?)

(d)

For a possum with skull width 62mm, what is the predicted head length? (Is skull width the explanatory or response variable?)

(e)

For a possum with head length 92mm, what skull width predicts this head length? (Is head length the explanatory or response variable?)

(f)

Given that the slope \(\beta_1=\frac{\Delta y}{\Delta x}\text{,}\) what are the units of \(\beta_1\) in this problem?

(g)

If the skull width of a possum is increased by 1mm, what is the predicted change in head length?

Remark 7.2.2.

Since the slope of a line \(\beta_1\) is measured by the change in \(y\) over change in \(x\text{,}\) the units of the slope are:

\begin{equation*} \frac{\text{units of}\ y}{\text{units of}\ x} \end{equation*}

and each increase in \(x\) by one unit results in a change of \(y\) by \(\beta_1\) units.
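As a minimal sketch of this remark in Python, the snippet below uses the rounded possum regression line quoted in Exploration 7.2.1. The function names `predict_head_length` and `skull_width_for` are hypothetical, chosen only for illustration. It checks that raising \(x\) by one unit changes the prediction by \(\beta_1\text{,}\) and that the line can be solved for \(x\) to find the input that gives a chosen prediction.

```python
# Rounded coefficients of the possum regression line y ≈ 0.8158x + 46.1954
# (x = skull width in mm, y = head length in mm).
BETA_1 = 0.8158   # slope, in (mm of head length) per (mm of skull width)
BETA_0 = 46.1954  # intercept, in mm of head length


def predict_head_length(skull_width_mm):
    """Predicted head length (mm) for a given skull width (mm)."""
    return BETA_1 * skull_width_mm + BETA_0


def skull_width_for(head_length_mm):
    """Skull width (mm) whose predicted head length equals the given value."""
    return (head_length_mm - BETA_0) / BETA_1


# Increasing x by one unit changes the prediction by the slope.
change = predict_head_length(56) - predict_head_length(55)
print(f"Predicted change per extra mm of skull width: {change:.4f} mm")

# Solving the line for x recovers the input that gives a chosen prediction.
width = skull_width_for(predict_head_length(55))
print(f"Skull width that predicts that head length: {width:.1f} mm")
```

Because the units of the slope are mm of head length per mm of skull width, the first printed value is a length in mm of head length.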

Activity 7.2.2. Car Dealership.

A car salesman analyzes the cars he sold in the past year, recording each car's age and how much it sold for.

\begin{equation*} \begin{array}{c|c} \text{Car Age (in years)} \amp \text{Sell Price (in dollars)}\\ \hline 4 \amp 6300\\ 4 \amp 5800\\ 5 \amp 5700\\ 7 \amp 4500\\ 7 \amp 4200\\ 8 \amp 4100\\ 9 \amp 3100\\ 10 \amp 6300\\ 11 \amp 2500\\ 12 \amp 2200\\ \end{array} \end{equation*}
(a)

Letting “Car Age” be the explanatory variable and “Sell Price” the response variable, enter the data into the columns x_1, y_1:

(You may have to rescale the window to see the data clearly)

(b)

What is \(R\text{?}\) What does it tell you about the relationship between the variables? (Strong, Moderate or Weak? Positive, Negative or None?)

(c)

What proportion of the variation in sell price is explained by car age? (See Remark 7.1.20)

(d)

State what the regression line is, and what it measures in the context of this problem.

(e)

Explain the meaning of the slope of the regression line in the context of this problem.

(f)

What is the predicted price of a 6-year-old car?

(g)

What is the predicted age of a car that sells for $5000?
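For readers working outside the graphing tool, the following is a minimal Python sketch of the same least squares computation on the table above. It is not the activity's own tool; the helper name `least_squares` is hypothetical, and only the standard library is assumed.

```python
from math import sqrt

# Data from the table in Activity 7.2.2.
ages = [4, 4, 5, 7, 7, 8, 9, 10, 11, 12]      # car age in years (x)
prices = [6300, 5800, 5700, 4500, 4200,
          4100, 3100, 6300, 2500, 2200]       # sell price in dollars (y)


def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = least_squares(ages, prices)
print(f"price ≈ {slope:.1f} * age + {intercept:.1f}")
print(f"R = {r:.4f}, R^2 = {r * r:.4f}")
```

The slope is reported in dollars per year, matching the units rule in Remark 7.2.2.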

Activity 7.2.3. Medial Temporal Lobe.

In 2018, a study was done to show a relationship between sedentary behavior and thickness of the medial temporal lobe.

Run the following code to download the mtl.csv data set, which contains demographic and psychological information, physical activity, and MTL measurements for 35 participants:

Click here to learn more about this data set: https://www.openintro.org/data/index.php?data=mtl

(a)

Run the following code to plot the participants' total MTL thickness (in mm) versus their self-reported daily time sitting (in hours).

(b)

Run the following code to create a linear model for total, as a function of sitting and save it as mtlmod.

(c)

Run the following code to show the correlation for this model.

(d)

Run the following code to plot the scatterplot and the least squares line.

(e)

Run the following code to show a summary of mtlmod.

(f)

What is \(R\text{?}\) What does it tell you about the relationship between the variables? (Strong, Moderate or Weak? Positive, Negative or None?)

(g)

What proportion of the variation in MTL thickness is explained by hours sitting? (See Remark 7.1.20)

(h)

State what the regression line is, and what it measures in the context of this problem.

(i)

Explain the meaning of the slope of the regression line in the context of this problem.

(j)

What is the predicted total MTL thickness of someone sitting 12 hours a day?

(k)

What is the predicted time sitting a day for someone whose total MTL thickness is 2.5 mm?

Subsection 7.2.2 Pitfalls

Remark 7.2.3.

As mentioned in Remark 1.1.7, Correlation is Not Causation! Sometimes the variables one takes to be explanatory and response are actually reversed. At other times, two variables are strongly correlated even though neither causes the other.
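The second situation can be sketched with a small simulation. Everything here is hypothetical: `z` is an invented hidden variable, and `x` and `y` are each built from `z` plus independent noise, so neither appears in the formula for the other. The helper `correlation` is written out by hand so that only the standard library is assumed.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical hidden common cause z; x and y each depend on z only.
z = [random.uniform(0, 10) for _ in range(200)]
noise_x = [random.gauss(0, 1) for _ in z]
noise_y = [random.gauss(0, 1) for _ in z]
x = [2 * zi + ei for zi, ei in zip(z, noise_x)]
y = [3 * zi + ei for zi, ei in zip(z, noise_y)]

r_xy = correlation(x, y)                 # strong, driven entirely by z
r_noise = correlation(noise_x, noise_y)  # what is left once z is removed
print(f"correlation of x and y:        {r_xy:.3f}")
print(f"correlation of the noise only: {r_noise:.3f}")
```

In this setup the strong correlation between `x` and `y` comes entirely from their shared dependence on `z`, so changing `x` directly would not be expected to move `y`.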

Activity 7.2.4. Ice Cream and Sunglasses.

A store owner notices that when her sunglasses sales go up, so do her ice cream sales:

Depicted above is a plot showing, for each month, the number of sunglasses she sold and her ice cream sales in dollars, along with the regression analysis.

(a)

If she has a surplus of ice cream to sell, would it make sense to put sunglasses on sale to boost sunglasses sales? What about reversing the roles?

(b)

Why are these variables correlated?

Remark 7.2.4.

A linear function is generally defined for all possible values of \(x\text{,}\) but in the context of a particular problem, applying it to every value may not be sensible.

An XKCD Comic.

It's important to know for what values it's sensible to apply the linear model, and for what values it is not.
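One way to make this concrete, sketched here with the car data from Activity 7.2.2, is to report whether an input lies inside the range of observed \(x\) values before trusting the prediction. The function name `predict_price` and the choice to flag rather than refuse out-of-range inputs are assumptions made for illustration.

```python
# Data from the table in Activity 7.2.2.
ages = [4, 4, 5, 7, 7, 8, 9, 10, 11, 12]      # car age in years (x)
prices = [6300, 5800, 5700, 4500, 4200,
          4100, 3100, 6300, 2500, 2200]       # sell price in dollars (y)

# Least squares slope and intercept, computed from deviations about the means.
n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(prices) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, prices))
         / sum((x - mean_x) ** 2 for x in ages))
intercept = mean_y - slope * mean_x


def predict_price(age):
    """Return (predicted price, True if age is inside the observed range)."""
    inside = min(ages) <= age <= max(ages)
    return slope * age + intercept, inside


for age in (8, 25):
    price, inside = predict_price(age)
    label = "within observed ages" if inside else "extrapolation"
    print(f"age {age}: predicted price {price:.0f} dollars ({label})")
```

For a 25-year-old car, far outside the observed ages of 4 to 12 years, the line returns a negative price, which is a clear sign that the model is being applied where it is not sensible.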

Activity 7.2.5. Height and Age.

The height of a female child (in inches) of a given age (in years) is as follows:

(a)

According to this regression analysis, how tall will she be when she is 35?

(b)

Why isn't this sensible?

Remark 7.2.5.

As in other chapters, random chance sometimes delivers data that are correlated even though the underlying variables are not, particularly when the samples are small.
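The effect of sample size can be sketched with a short stand-alone simulation, separate from the activity's own code cell. The helper names `correlation` and `mean_abs_r`, the sample sizes, and the number of trials are all choices made for this illustration. The two variables are generated independently, so any correlation found is due to chance alone.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def mean_abs_r(n, trials=200):
    """Average |R| over many samples of n independent random (x, y) pairs."""
    total = 0.0
    for _ in range(trials):
        xs = [random.random() for _ in range(n)]
        ys = [random.random() for _ in range(n)]
        total += abs(correlation(xs, ys))
    return total / trials


random.seed(1)  # fixed seed so the run is repeatable
small = mean_abs_r(10)
large = mean_abs_r(1000)
print(f"average |R| with n = 10:   {small:.3f}")
print(f"average |R| with n = 1000: {large:.3f}")
```

Averaging the absolute value of \(R\) avoids positive and negative chance correlations cancelling each other out.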

Activity 7.2.6. Random Correlation.

In this activity, we'll generate totally random data and try to find correlations between them.

(a)

Run the following code to generate n=10 random X and Y values, plot them, find a regression line and print a correlation:

Run it a few times. What sort of correlation values do you get? What is the largest you achieved?

(b)

Change n=2 and run it again. What do you notice?