Section 7.2 Interpreting Linear Regression (R2)
Subsection 7.2.1 Utilizing Regression Results
Remark 7.2.1.
When modeling the relationship between two variables with a function (such as a linear function), the input (typically x) is the explanatory variable and the output (typically y) is the response variable.
The idea is that one can measure the change in the response variable that corresponds to a change in the explanatory variable.
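In R, this convention appears in the model formula passed to lm(): the response goes to the left of ~ and the explanatory variable goes to the right, as in the possum code below. Here is a minimal sketch on made-up numbers. The vectors x and y are hypothetical and are not data from this section.

```r
# Hypothetical data: x is the explanatory variable (input), y the response (output)
x = c(1, 2, 3, 4, 5)
y = c(2.1, 3.9, 6.2, 8.1, 9.8)
# Read "y ~ x" as "y modeled as a function of x"
mod = lm(y ~ x)
# Intercept b0 and slope b1 of the fitted line yhat = b0 + b1 * x
coef(mod)
```

For these made-up values, the fitted intercept is 0.14 and the slope is 1.96.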
Exploration 7.2.1. Possums Again.
Recall from Exploration 7.1.1 and Activity 7.1.7 that the "head length of possums in mm" (response variable
with correlation coefficient
Run the following cell to re-compute these values:
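Learn more about this data set: https://www.openintro.org/data/index.php?data=possum
# Load the possum data set
possum = read.csv("https://github.com/TienChih/tbil-stats/raw/main/data/possum.csv")
# Fit the least squares line predicting head length (head_l) from skull width (skull_w)
possummod=lm(head_l~skull_w, data=possum)
print(possummod)
# Correlation coefficient between skull width and head length
print(cor(possum$skull_w, possum$head_l))
# Scatterplot of the data with the regression line in red
plot(possum$skull_w, possum$head_l, pch=19)
abline(possummod, col="red")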
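Parts (c) and (d) below ask for predicted values. There are two ways to get them in R. You can plug the explanatory value into yhat = b0 + b1 * x using the coefficients from coef(), or you can let predict() do the substitution. The sketch below uses hypothetical numbers rather than the possum data. The same calls should work with possummod in place of mod, provided newdata uses the column name skull_w.

```r
# Hypothetical data for illustration only (not the possum measurements)
x = c(1, 2, 3, 4, 5)
y = c(2.1, 3.9, 6.2, 8.1, 9.8)
mod = lm(y ~ x)
b = coef(mod)                  # b[1] is the intercept, b[2] is the slope
# Predicted response at x = 3.5, by substituting into yhat = b0 + b1 * x
yhat_manual = unname(b[1] + b[2] * 3.5)
# The same prediction from predict(); newdata must use the predictor's name
yhat_predict = unname(predict(mod, newdata = data.frame(x = 3.5)))
yhat_manual
yhat_predict
```

Both approaches give 7 for these made-up values. predict() is convenient when you need many predictions at once.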
(a)
How would you describe the relationship between skull width and head length? (Strong, Moderate or Weak? Positive, Negative or None?)
(b)
What proportion of the variation in head length is explained by skull width? (See Remark 7.1.20)
(c)
For a possum with skull width 53mm, what is the predicted head length? (Is skull width the explanatory or response variable?)
(d)
For a possum with skull width 62mm, what is the predicted head length? (Is skull width the explanatory or response variable?)
(e)
For a possum with head length 92mm, what skull width predicts this head length? (Is head length the explanatory or response variable?)
(f)
Given that the slope
(g)
If the skull width of a possum is increased by 1mm, what is the predicted change in head length?
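Part (e) runs the line "backwards." You are given a predicted response and must find the explanatory value that produces it. You can do this by solving yhat = b0 + b1 * x for x, which gives x = (yhat - b0) / b1. The sketch below assumes a hypothetical fitted line with made-up coefficients b0 = 60 and b1 = 0.5. The helper names predict_y and solve_x are illustrative only.

```r
# Hypothetical fitted line yhat = b0 + b1 * x (made-up coefficients)
b0 = 60
b1 = 0.5
# Forward: predicted response for a given explanatory value
predict_y = function(x) b0 + b1 * x
# Inverse: the explanatory value whose prediction equals y,
# found by solving y = b0 + b1 * x for x (requires b1 != 0)
solve_x = function(y) (y - b0) / b1
predict_y(50)
solve_x(85)
```

Here predict_y(50) returns 85 and solve_x(85) returns 50. The two functions undo each other.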
Remark 7.2.2.
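Since the slope of a line is the change in y divided by the change in x, each increase in x by 1 unit corresponds to a change in the predicted value of y equal to the slope.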
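The remark above can be checked numerically. The sketch below uses made-up data and compares the predictions at x = 3 and x = 4. Their difference should equal the slope reported by coef(). The data are hypothetical.

```r
# Hypothetical data for illustration only
x = c(1, 2, 3, 4, 5)
y = c(2.1, 3.9, 6.2, 8.1, 9.8)
mod = lm(y ~ x)
slope = unname(coef(mod)[2])
# Predictions at x = 3 and x = 4 (an increase of 1 unit)
p = predict(mod, newdata = data.frame(x = c(3, 4)))
change = unname(p[2] - p[1])
change
slope
```

Both values are 1.96 here. Any other starting point for the 1-unit increase gives the same change, because the line has constant slope.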
Activity 7.2.2. Car Dealership.
A car salesman analyzes the cars he sold in the past year, recording how much he sold each one for and its age.
(a)
Letting "Car Age" be the explanatory variable and "Sell Price" the response variable, enter the data into the columns x_1 and y_1:
(You may have to rescale the window to see the data clearly.)
(b)
What is
(c)
What proportion of the variation in sell price is explained by car age? (See Remark 7.1.20)
(d)
State what the regression line is, and what it measures in the context of this problem.
(e)
Explain the meaning of the slope of the regression line in the context of this problem.
(f)
What is the predicted price of a 6-year-old car?
(g)
What is the predicted age of a car that sells for $5000?
Activity 7.2.3. Medial Temporal Lobe.
In 2018, a study examined the relationship between sedentary behavior and the thickness of the medial temporal lobe (MTL).
Run the following code to download the mtl.csv data set, which contains demographic and psychological information, physical activity levels, and MTL measurements for 35 participants:
Learn more about this data set: https://www.openintro.org/data/index.php?data=mtl
# Download the MTL data set
mtl = read.csv("https://github.com/TienChih/tbil-stats/raw/main/data/mtl.csv")
# List the names of the variables in the data set
names(mtl)
(a)
Run the following code to plot each participant's total MTL thickness (in mm) against their self-reported daily time spent sitting (in hours).
plot(mtl$sitting, mtl$total, pch=19)
(b)
Run the following code to create a linear model for total as a function of sitting, and save it as mtlmod.
mtlmod=lm(total~sitting, data=mtl)
(c)
Run the following code to compute the correlation coefficient between sitting and total.
cor(mtl$sitting, mtl$total)
(d)
Run the following code to plot the scatterplot together with the least squares line.
plot(mtl$sitting, mtl$total, pch=19)
abline(mtlmod, col="red")
(e)
Run the following code to show a summary of mtlmod.
summary(mtlmod)
(f)
What is
(g)
What proportion of the variation in MTL thickness is explained by hours sitting? (See Remark 7.1.20)
(h)
State what the regression line is, and what it measures in the context of this problem.
(i)
Explain the meaning of the slope of the regression line in the context of this problem.
(j)
What is the predicted total MTL thickness of someone who sits 12 hours a day?
(k)
What is the predicted time sitting a day for someone whose total MTL thickness is 2.5 mm?
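For a simple linear regression, the "Multiple R-squared" value printed by summary() equals the square of the correlation coefficient r. You can also extract it directly with summary(...)$r.squared. The sketch below checks this on made-up data, not the MTL measurements. The same extraction should work on mtlmod.

```r
# Hypothetical data for illustration only (not the MTL measurements)
x = c(1, 2, 3, 4, 5)
y = c(3, 5, 4, 6, 8)
mod = lm(y ~ x)
r = cor(x, y)
r^2                       # proportion of variation in y explained by x
summary(mod)$r.squared    # the "Multiple R-squared" shown by summary(mod)
```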
Subsection 7.2.2 Pitfalls
Remark 7.2.3.
As mentioned in Remark 1.1.7, correlation is not causation! Sometimes the variables one takes to be the explanatory and response variables are actually reversed. Other times, two variables can be strongly correlated without either one causing the other.
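One way two variables become correlated without either causing the other is through a third, lurking variable that drives both. The short simulation below illustrates this. In it, x and y are each generated only from a hypothetical variable z plus independent noise, so by construction neither affects the other. All names and parameter values are made up for illustration.

```r
set.seed(1)                      # make the random draws reproducible
n = 50
z = runif(n, 10, 35)             # hypothetical lurking variable
x = 5 * z + rnorm(n, 0, 10)      # x depends only on z (plus noise)
y = 40 * z + rnorm(n, 0, 80)     # y depends only on z (plus noise)
cor(x, y)                        # strongly positive, yet x does not cause y
```

Changing x in this simulation would have no effect on y. The correlation comes entirely from their shared dependence on z.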
Activity 7.2.4. Ice Cream and Sunglasses.
A store owner notices that when her sunglasses sales go up, so do her ice cream sales:
Depicted above is a plot showing, for several months, the number of sunglasses she sold and her ice cream sales (in dollars) in the same month, along with the regression analysis.
(a)
If she has a surplus of ice cream she needs to sell, would it make sense to put sunglasses on sale to boost sunglasses sales? What about the reverse?
(b)
Why are these variables correlated?
Remark 7.2.4.
A linear function is generally defined for all possible values of x, but a regression line is fit using data that covers only a limited range of x values. It's important to know for which values it's sensible to apply the linear model, and for which values it is not.
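The sketch below shows how extrapolation can go wrong. It assumes a made-up relationship, curve_y (a hypothetical helper), that levels off at 10. We observe it without noise only for x between 1 and 5. A straight line fits well inside that range, but its prediction at x = 30 is far above anything the curve can produce.

```r
# Made-up "true" relationship that levels off at 10 (hypothetical, no noise)
curve_y = function(x) 10 * (1 - exp(-x / 3))
x = 1:5                                   # we only observe x between 1 and 5
y = curve_y(x)
mod = lm(y ~ x)
max(abs(resid(mod)))                       # the line fits well inside the data range
pred30 = unname(predict(mod, newdata = data.frame(x = 30)))
pred30                                     # extrapolated prediction far outside the data
curve_y(30)                                # the made-up curve never exceeds 10
```

The linear model has no way of "knowing" that the trend levels off outside the observed range. R will still return a number for x = 30.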
Activity 7.2.5. Height and Age.
The height (in inches) of a female child at various ages (in years) is as follows:
(a)
According to the regression analysis, how tall will she be when she is 35?
(b)
Why isn't this sensible?
Remark 7.2.5.
As in other chapters, random chance can sometimes produce data that appear correlated even though the underlying variables are not, particularly when the sample is small.
Activity 7.2.6. Random Correlation.
In this activity, we'll generate totally random data and try to find correlations between them.
(a)
Run the following code to generate n=10 random X and Y values, plot them, find a regression line, and print the correlation. Run it a few times. What sort of values do you get, and what's the highest you achieved?
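n=10
# Draw n independent random values for X and for Y, uniform on [0, 10]
X=runif(n, 0, 10)
Y=runif(n, 0, 10)
# Fit the least squares line predicting Y from X
mod=lm(Y~X)
# Correlation between the two independent samples
print(cor(X, Y))
# Scatterplot with the fitted line
plot(X, Y, pch=19)
abline(mod)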
(b)
Change n=2 and run it again. What do you notice?
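To see how sample size affects chance correlations, you can repeat the random-data experiment many times and record r each time. In the sketch below, sim_cor is a hypothetical helper. The threshold of 0.5 and the 1000 repetitions are arbitrary choices for illustration. Because X and Y are generated independently, the true correlation is 0, so any sizable r is due to chance alone.

```r
# Repeat the random-data experiment `reps` times with sample size `size`,
# returning the correlation from each run
sim_cor = function(size, reps = 1000) {
  replicate(reps, cor(runif(size, 0, 10), runif(size, 0, 10)))
}
set.seed(2024)              # make the simulation reproducible
r10 = sim_cor(10)
r100 = sim_cor(100)
# Proportion of runs with |r| above 0.5 for each sample size
mean(abs(r10) > 0.5)
mean(abs(r100) > 0.5)
```

With small samples, a noticeable share of runs produce |r| above 0.5. With 100 points per run, such values essentially never occur.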