Section 7.3 Inference and Linear Regression (R3)
Exploration 7.3.1. Estimating Regression.
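Let $X$ and $Y$ be variables where $Y = 0.8X + 5 + \varepsilon$, with $X$ drawn uniformly from $[0, 10]$ and $\varepsilon$ normally distributed with mean 0 and standard deviation 3.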
(a)
What line probably best fits the relationship between $X$ and $Y$?
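(b)
Run the following code to generate $n = 10$ random $X$ and $Y$ values, plot them, and draw both the fitted regression line and the line from (a):
# number of observations
n=10
# X is uniform on [0, 10]; Y follows the line from (a) plus normal noise
X=runif(n, 0, 10)
Y=0.8*X+5+rnorm(n,0,3)
# fit the least-squares regression line
mod=lm(Y~X)
plot(X, Y, pch=19)
# solid black: fitted line; dashed blue: the line Y = 5 + 0.8X
abline(mod)
abline(5, 0.8, col="blue", lty=2)
Run it a few times. What do you notice?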
(c)
Change the code to use $n = 100$ and run it again. Now what do you notice?
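To make the comparison in (b) and (c) more systematic, here is a minimal R sketch (not part of the original exploration) that repeats the simulation 1000 times for $n = 10$ and for $n = 100$ and compares the fitted slopes. The seed, the number of repetitions, and the helper name simulate_slope are arbitrary, hypothetical choices for illustration.

```r
# Sketch: repeat the simulation from (b) many times and compare the spread
# of the fitted slopes for n = 10 and n = 100.
# The seed, the 1000 repetitions, and simulate_slope() are illustrative choices.
set.seed(731)

simulate_slope <- function(n) {
  X <- runif(n, 0, 10)               # X uniform on [0, 10]
  Y <- 0.8 * X + 5 + rnorm(n, 0, 3)  # true line plus normal noise (sd 3)
  unname(coef(lm(Y ~ X))[2])         # fitted slope b1
}

slopes10  <- replicate(1000, simulate_slope(10))
slopes100 <- replicate(1000, simulate_slope(100))

# Average fitted slope for each sample size (true slope is 0.8)
c(mean10 = mean(slopes10), mean100 = mean(slopes100))

# Spread of the fitted slopes for each sample size
c(sd10 = sd(slopes10), sd100 = sd(slopes100))
```

Both averages should land near the true slope 0.8, while the $n = 100$ slopes should vary much less than the $n = 10$ slopes; that spread is what the standard error of the slope measures.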
Subsection 7.3.1 Hypothesis Testing
Remark 7.3.1.
When it comes to linear regression, we focus on the following pair of hypotheses:
$H_0: \beta_1 = 0$, or the slope is zero. So changes in the explanatory variable do not result in an average change in the response variable.
$H_A: \beta_1 \neq 0$, or the slope is not zero. So changes in the explanatory variable do result in an average change in the response variable.
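This is a numerical hypothesis test similar to those we have done before. However, we are not simply given a list of sample slopes from which to compute a sample standard deviation. The computation here is tedious, so we use technology to perform it. If one wished to compute the standard error for the sample slope $b_1$ by hand, it is
$SE_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$,
where $s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$ is the residual standard error. The test statistic $T = \frac{b_1 - 0}{SE_{b_1}}$ follows a $t$-distribution with $n - 2$ degrees of freedom.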
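The standard error of the slope can be computed by hand as $SE_{b_1} = s / \sqrt{\sum (x_i - \bar{x})^2}$, where $s$ is the residual standard error $\sqrt{\sum (y_i - \hat{y}_i)^2 / (n - 2)}$. The R sketch below checks this against the value lm() reports. The data are hypothetical: simulated from the same model as Exploration 7.3.1, with an arbitrary sample size of $n = 30$.

```r
# Sketch: compute SE_{b1} by hand and compare it with lm().
# Data are simulated (hypothetical), using the model from Exploration 7.3.1.
set.seed(2024)
n <- 30
x <- runif(n, 0, 10)
y <- 0.8 * x + 5 + rnorm(n, 0, 3)

mod <- lm(y ~ x)

# Residual standard error: s = sqrt(SSE / (n - 2))
s <- sqrt(sum(residuals(mod)^2) / (n - 2))

# Standard error of the slope from the formula
se_hand <- s / sqrt(sum((x - mean(x))^2))

# Standard error reported in the Coefficients table (row "x", column "Std. Error")
se_lm <- coef(summary(mod))["x", "Std. Error"]

c(by_hand = se_hand, from_lm = se_lm)
```

The two printed values should agree up to rounding, which is why we can safely let technology do this computation.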
Exploration 7.3.2. Hypothesis Testing: Slope - Possums.
Run the following code to download possum.csv, as seen in Exploration 7.1.1, Activity 7.1.7, and Exploration 7.2.1, create a linear model for head_l~skull_w, and summarize it.
# download the possum data
possum = read.csv("https://github.com/TienChih/tbil-stats/raw/main/data/possum.csv")
# fit head length (head_l) as a linear function of skull width (skull_w)
possummod=lm(head_l~skull_w, data=possum)
# print estimates, standard errors, t values, and p-values
summary(possummod)
We're focused on the Coefficients: section of the output, in particular the second row.
The first row, starting with (Intercept), gives the point estimate, standard error, test statistic, and p-value for the intercept.
The second row, starting with skull_w, gives the point estimate, standard error, test statistic, and p-value for the slope.
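The Coefficients: table can also be pulled out of summary() as a matrix, which makes it easy to reference a single row or entry. The sketch below uses R's built-in cars data set (stopping distance dist against speed) as a stand-in, since it needs no download; for the possum model the analogous call would be coef(summary(possummod))["skull_w", ].

```r
# Sketch: extract the Coefficients table from summary() as a matrix.
# Uses R's built-in `cars` data set (speed, dist) purely for illustration.
carsmod <- lm(dist ~ speed, data = cars)
coef_table <- coef(summary(carsmod))

coef_table                      # rows: (Intercept), speed
coef_table["speed", ]           # estimate, std. error, t value, p-value for the slope
coef_table["(Intercept)", ]     # the same four quantities for the intercept
```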
(a)
The first entry of this row gives us the point estimate for the slope. How does this compare to the slope found by running:
possummod
We call this value $b_1$.
(b)
The second entry of this row gives us the standard error for the slope. Call this value $SE_{b_1}$. Compute $\frac{b_1}{SE_{b_1}}$.
How does this value compare to the third entry?
(c)
According to the summary, how many degrees of freedom are there?
(d)
Compute
Hint. Desmosxxxxxxxxxx
How does this value compare to the 4th entry?
(e)
Do we reject or fail to reject the null hypothesis that the slope $\beta_1 = 0$?
Remark 7.3.2.
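The fourth entry of the second row of Coefficients: is the p-value for the slope: the probability that, if the true slope were 0, we would obtain a sample slope as steep as or steeper (in either direction) than the one observed.
The fourth entry of the first row of Coefficients: is the p-value for the intercept: the probability that, if the true intercept were 0, we would obtain an intercept estimate as extreme as or more extreme than the one observed.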
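The t value and Pr(>|t|) entries can be reproduced from the estimate and standard error. For a two-sided test of a zero slope, the test statistic is $t = (b_1 - 0)/SE_{b_1}$ and the p-value is $2P(T > |t|)$ for $T$ with $n - 2$ degrees of freedom. The sketch below uses R's built-in cars data set as a stand-in for the possum data and computes both quantities with pt(), printing them next to the table entries.

```r
# Sketch: reproduce the "t value" and "Pr(>|t|)" entries of the slope row.
# Uses R's built-in `cars` data set (speed, dist) purely for illustration.
carsmod <- lm(dist ~ speed, data = cars)
tab <- coef(summary(carsmod))

b1    <- tab["speed", "Estimate"]
se_b1 <- tab["speed", "Std. Error"]
dof   <- df.residual(carsmod)         # n - 2 for simple linear regression

t_stat  <- (b1 - 0) / se_b1           # test statistic under H0: beta1 = 0
p_value <- 2 * pt(-abs(t_stat), dof)  # two-sided: as steep or steeper in either direction

c(t = t_stat, p = p_value)
tab["speed", c("t value", "Pr(>|t|)")]
```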
Subsection 7.3.2 Confidence Intervals
Remark 7.3.3.
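For either $\beta_0$ or $\beta_1$, a confidence interval has the form $b_i \pm t^*_{df} \cdot SE_{b_i}$,
where $b_i$ is the point estimate, $SE_{b_i}$ is its standard error, and $t^*_{df}$ is the critical value from a $t$-distribution with $df = n - 2$ degrees of freedom for the chosen confidence level.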
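The interval $b_1 \pm t^*_{df} \cdot SE_{b_1}$ can be built directly in R with qt() and checked against confint(). This sketch uses R's built-in cars data set and a 95% level; both are arbitrary choices for illustration, not part of the original exercise.

```r
# Sketch: build a confidence interval for the slope by hand and compare with confint().
# Uses R's built-in `cars` data set; the 95% level is an illustrative choice.
carsmod <- lm(dist ~ speed, data = cars)
tab <- coef(summary(carsmod))

level  <- 0.95
b1     <- tab["speed", "Estimate"]
se_b1  <- tab["speed", "Std. Error"]
t_star <- qt(1 - (1 - level) / 2, df = df.residual(carsmod))  # critical value t*

ci_hand <- c(lower = b1 - t_star * se_b1, upper = b1 + t_star * se_b1)

ci_hand
confint(carsmod, "speed", level = level)
```

Changing level to, say, 0.90 or 0.99 changes only t_star, so the interval narrows or widens around the same point estimate.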
Exploration 7.3.3. Confidence Intervals - Possums.
We continue from Exploration 7.3.2.
(a)
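Find the critical value $t^*_{df}$, using the degrees of freedom from Exploration 7.3.2.
(b)
Use the point estimate and standard error for $b_1$ to find a confidence interval for $\beta_1$.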
(c)
Explain what this confidence interval means within the context of the problem.
Subsection 7.3.3 Putting it together
Activity 7.3.4. Inference for SP500 companies.
Run the following code to download sp500.csv, a data set containing information on a sample of 50 S&P 500 companies, and show its variable names.
sp500 = read.csv("https://github.com/TienChih/tbil-stats/raw/main/data/sp500.csv")
names(sp500)
Learn more about this data set here: https://www.openintro.org/data/index.php?data=sp500
(a)
Run the following to create and summarize a linear model with debt, the debt in millions of dollars, as the explanatory variable and profit_margin, the percent of earnings that is profit, as the response variable:
sp500mod=lm(profit_margin~debt, data=sp500)
summary(sp500mod)
(b)
Run the following to plot the profit_margin of these companies against their debt and draw the regression line:
plot(sp500$debt, sp500$profit_margin)
abline(sp500mod, col="red")
(c)
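State what the slope $b_1$ is.
(d)
Interpret the p-value for the slope.
(e)
Do we reject the null hypothesis that $\beta_1 = 0$?
(f)
Find the critical value $t^*_{df}$.
(g)
Use the point estimate and standard error for $b_1$ to find a confidence interval for $\beta_1$.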
(h)
Explain what this confidence interval means within the context of the problem.
Activity 7.3.5. Inference for Nutrition and Starbucks.
Run the following code to download starbucks.csv, a data set containing nutritional information about 77 Starbucks menu items, and show its variable names.
starbucks = read.csv("https://github.com/TienChih/tbil-stats/raw/main/data/starbucks.csv")
names(starbucks)
Learn more about this data set here: https://www.openintro.org/data/index.php?data=starbucks
(a)
Run the following to create and summarize a linear model with protein, the protein content of an item in grams, as the explanatory variable and calories, the calories in each item (measured in, well, calories), as the response variable:
starbucksmod=lm(calories~protein, data=starbucks)
summary(starbucksmod)
(b)
Run the following to plot the calories of these items against their protein and draw the regression line:
plot(starbucks$protein, starbucks$calories)
abline(starbucksmod, col="red")
(c)
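State what the slope $b_1$ is.
(d)
Interpret the p-value for the slope.
(e)
Do we reject the null hypothesis that $\beta_1 = 0$?
(f)
Find the critical value $t^*_{df}$.
(g)
Use the point estimate and standard error for $b_1$ to find a confidence interval for $\beta_1$.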
(h)
Explain what this confidence interval means within the context of the problem.
(i)
Repeat this for any other pair of numerical variables.