Recall Definition 4.2.1. For a numerical variable in particular, a C% confidence interval is an interval so that the true mean of the variable, $\mu$, has a C% chance of lying within the interval.
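For a numerical sample of size $n$ with sample mean $\bar{x}$ and sample standard deviation $s$, we recall that the standard error is $\frac{s}{\sqrt{n}}$. Thus, we can compute a C% confidence interval via
$$\left(\bar{x} - t^* \cdot \frac{s}{\sqrt{n}},\ \bar{x} + t^* \cdot \frac{s}{\sqrt{n}}\right)$$
where $t^*$ is the value so that $P(-t^* < T < t^*) = C\%$ for a $t$-distribution with $n-1$ degrees of freedom.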
Figure 6.2.2. A C% confidence interval for $\mu$. There is a C% chance the population mean lies in this interval.
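The confidence interval computation above can be sketched in R. This is a minimal illustration, not the text's own worked example: the summary statistics below are hypothetical values chosen only to show the steps, and the variable names are ours. The critical value comes from R's built-in `qt()` quantile function for the $t$-distribution.

```r
# Hypothetical summary statistics, chosen only for illustration
n    <- 40      # sample size
xbar <- 72.5    # sample mean
s    <- 8.2     # sample standard deviation
conf <- 0.95    # confidence level C, written as a proportion

se     <- s / sqrt(n)                         # standard error
t_star <- qt(1 - (1 - conf) / 2, df = n - 1)  # cuts off the middle C% of the t-distribution

# Interval endpoints: sample mean plus or minus t_star standard errors
ci <- c(lower = xbar - t_star * se, upper = xbar + t_star * se)
print(ci)
```

Because the middle C% leaves (100 - C)/2 percent in each tail, `qt()` is asked for the quantile at `1 - (1 - conf) / 2` rather than at `conf` itself.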
Suppose we wanted to find a 95% confidence interval for the average credit card debt of a student. In a sample of 25 college students, the sample average debt was with sample standard deviation .
Suppose we wanted to take samples of $n=50$ houses to determine a confidence interval. Find $t^*$ so that for a $t$-distribution with 49 degrees of freedom, we have: .
In 2020, the average finishing time of a race was 95 minutes. The race organizer believes that in 2021, the average finishing time will be less. They sample 20 random runners from the 2021 race and their finishing times in minutes were:
Find the probability that, if the average finishing time had not changed, you could sample 20 runners and observe a sample mean finishing time of $\bar{x}$ or lower, by computing the corresponding left-tail probability for a $t$-distribution with the appropriate degrees of freedom.
Hypothesis testing for a numerical variable works much as it does for categorical variables, as seen in Section 4.3. There is a random numerical variable with unknown true mean $\mu$ that we want to say something about, and we gather data to reject or fail to reject a null hypothesis.
When doing numerical hypothesis testing, there are three types of alternative hypothesis:
$H_A: \mu \neq \mu_0$
$H_A: \mu > \mu_0$
$H_A: \mu < \mu_0$
corresponding to “the true mean is (not equal to/greater than/less than) $\mu_0$” for some value $\mu_0$.
In all of these cases, the null hypothesis will be $H_0: \mu = \mu_0$, that is, the true mean could be $\mu_0$.
Then as before, we're given a sample from which we can compute a sample mean $\bar{x}$, standard deviation $s$ and sample size $n$. We then compute a $p$-value for the alternative hypothesis. The $p$-value still represents: “The probability that, if we were to assume the null hypothesis, we could observe values as extreme as or more extreme than the sample.”
The way $p$-values are computed depends on the form of the alternative hypothesis:
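If $H_A$ is of the form $\mu > \mu_0$, then allowing $T$ to be the $t$-variable with mean $\mu_0$, standard deviation $\frac{s}{\sqrt{n}}$ and $n-1$ degrees of freedom, the $p$-value is
$$p\text{-value} = P(T \geq \bar{x})$$
Figure 6.2.4. $p$-value for $H_A: \mu > \mu_0$.
We do this by finding the corresponding $t$-score of $\bar{x}$, namely $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, and finding the probability that $T \geq t$ for the standard $t$-variable with $n-1$ degrees of freedom:
Figure 6.2.5. $p$-value for $H_A: \mu > \mu_0$ using $t$-scores.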
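If $H_A$ is of the form $\mu < \mu_0$, then allowing $T$ to be the $t$-variable with mean $\mu_0$, standard deviation $\frac{s}{\sqrt{n}}$ and $n-1$ degrees of freedom, the $p$-value is
$$p\text{-value} = P(T \leq \bar{x})$$
Figure 6.2.6. $p$-value for $H_A: \mu < \mu_0$.
We do this by finding the corresponding $t$-score of $\bar{x}$, namely $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, and finding the probability that $T \leq t$ for the standard $t$-variable with $n-1$ degrees of freedom:
Figure 6.2.7. $p$-value for $H_A: \mu < \mu_0$ using $t$-scores.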
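If $H_A$ is of the form $\mu \neq \mu_0$, then things are more complicated. We still let $T$ be the $t$-variable with mean $\mu_0$, standard deviation $\frac{s}{\sqrt{n}}$ and $n-1$ degrees of freedom. But in this case, extremal means at least as far to the left or the right of $\mu_0$ as $\bar{x}$ is. So, writing $d = |\bar{x} - \mu_0|$, we can compute the $p$-value via
$$p\text{-value} = P(T \leq \mu_0 - d) + P(T \geq \mu_0 + d)$$
Figure 6.2.8. $p$-value for $H_A: \mu \neq \mu_0$.
We do this by finding the corresponding $t$-score of $\bar{x}$, namely $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, and finding the probability that $T \leq -|t|$ or $T \geq |t|$ for the standard $t$-variable with $n-1$ degrees of freedom:
Figure 6.2.9. $p$-value for $H_A: \mu \neq \mu_0$ using $t$-scores.
It's also worth noting that the two tails are equal by symmetry: $P(T \leq -|t|) = P(T \geq |t|)$. So if you find one of these tails, you can double it to find the sum of both tails.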
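The three $p$-value computations can be sketched in R using `pt()`, the cumulative distribution function of the standard $t$-variable. This is only an illustrative sketch: the sample statistics and the null value below are hypothetical, and the variable names are ours rather than the text's.

```r
# Hypothetical values, chosen only for illustration
n    <- 30     # sample size
xbar <- 52.1   # sample mean
s    <- 6.0    # sample standard deviation
mu_0 <- 50     # mean assumed by the null hypothesis

se      <- s / sqrt(n)          # standard error
t_score <- (xbar - mu_0) / se   # t-score of the sample mean
df      <- n - 1                # degrees of freedom

p_greater   <- pt(t_score, df = df, lower.tail = FALSE)  # right tail: true mean greater than mu_0
p_less      <- pt(t_score, df = df)                      # left tail: true mean less than mu_0
p_two_sided <- 2 * pt(-abs(t_score), df = df)            # both tails: true mean not equal to mu_0

print(c(greater = p_greater, less = p_less, two.sided = p_two_sided))
```

The two-sided line relies on the symmetry of the $t$-distribution: it finds the left tail at `-abs(t_score)` and doubles it, so it gives the same answer whether the sample mean falls above or below `mu_0`.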
A researcher believes adults spend on average 2 hours and 20 minutes a day on social media. Their colleague disagrees. They survey 100 adults, and found a sample mean of 2 hours and 17 minutes, with standard deviation 23.5 minutes. Suppose we had a level of significance .
Our recurring restaurateur believes that the average amount spent by customers is over $12. She plans on polling 50 customers to test this. Suppose we had a level of significance .
As in Activity 4.3.10, we can use R to run a hypothesis test directly. The structure of the command is t.test(data, mu=mu_0, alternative=...), where alternative is set to "greater", "less", or "two.sided" depending on $H_A$.
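As a minimal sketch of such a call, the block below runs a one-sided test on a small made-up data set. The vector `sales` and the value of `mu_0` are hypothetical and are not data from the activities in this section; `t.test()` and its `mu` and `alternative` arguments are part of base R.

```r
# Hypothetical daily counts, made up for illustration (not data from the text)
sales <- c(58, 61, 49, 55, 63, 57, 60, 52, 59, 62)
mu_0  <- 55   # mean assumed by the null hypothesis

# Alternative hypothesis: the true mean is greater than mu_0
result <- t.test(sales, mu = mu_0, alternative = "greater")
print(result)

# Individual pieces of the result can be pulled out by name
result$statistic   # t-score of the sample mean
result$parameter   # degrees of freedom, n - 1
result$p.value     # p-value for the chosen alternative
```

Changing `alternative` to `"less"` or `"two.sided"` reruns the same test against the other forms of the alternative hypothesis.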
Suppose we wanted to know if the average number of chocolate donuts sold by a shop per day is more than 55 chocolate donuts ($H_A: \mu > 55$). We sample 12 random days and the number of chocolate donuts sold were
Suppose we wanted to know if the average number of donuts with filling sold by a shop per day is less than 100 donuts ($H_A: \mu < 100$). We sample 10 random days and the number of donuts with filling sold were
Suppose we wanted to know if the average number of sprinkled donuts sold per day was or wasn't 120 ($H_A: \mu \neq 120$). We sample 16 random days and the number of sprinkled donuts sold were
According to Business Insider, the average gas mileage of a car sold in America is 25 miles per gallon. One would hope that a hybrid car such as a Prius would get better gas mileage. In fact, it's plausible a hybrid car could get over 100 miles per gallon. Data is collected on 19 Prius drivers to see if Priuses get better than 100 mpg on average.
Run the following code to download the prius_mpg.csv data set which contains information about 19 Prius drivers, and display the variables: