Section 4.2 Introduction to Confidence Intervals (F2)

We saw in Section 4.1 that a point estimate can be a very accurate predictor of the parameter of interest. However, it rarely equals the parameter exactly, and there are always some samples for which it deviates significantly.

Although it is very unlikely that a point estimate exactly matches the parameter of interest, we can use the point estimate to find a range of plausible values for the parameter.

In this section we compute and interpret confidence intervals for population proportions.

Subsection 4.2.1 Defining the Confidence Interval

Exploration 4.2.1. Finding an Interval.

Recall from Theorem 4.1.4 that \(\hat{p}\) can be approximated by a normal variable with parameters \(\mu_{\hat{p}}=p, SE_{\hat{p}}=\sqrt{\frac{p(1-p)}{n}}.\)

Hint. Desmos

(a)

For the standard normal variable \(Z\) (Section 3.1), verify that \(P(-1.96\lt Z\lt 1.96)\approx 0.95\text{.}\)
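As an alternative to Desmos, a minimal Python sketch can check this probability using only the standard library. It assumes the standard identity \(\Phi(z)=\tfrac12\left(1+\operatorname{erf}(z/\sqrt2)\right)\) for the standard normal CDF; the function name standard_normal_cdf is our own.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """P(Z < z) for a standard normal Z, written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# P(-1.96 < Z < 1.96) = P(Z < 1.96) - P(Z < -1.96)
prob = standard_normal_cdf(1.96) - standard_normal_cdf(-1.96)
print(round(prob, 4))
```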

(b)

Use the \(z\)-score \(1.96\) from (a) and Definition 3.2.1 to find the corresponding \(x\) value for a normal variable with mean \(\mu_{\hat{p}}=p\) and standard deviation \(SE_{\hat{p}}=\sqrt{\frac{p(1-p)}{n}}\text{.}\)
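Assuming Definition 3.2.1 is the usual \(z\)-score \(z=\frac{x-\mu}{\sigma}\text{,}\) solving for \(x\) gives \(x=\mu+z\sigma\text{.}\) A short Python sketch of that step follows; the values \(p=0.5\) and \(n=100\) are hypothetical and chosen only so that \(SE_{\hat{p}}=0.05\) is easy to check by hand.

```python
from math import sqrt

def x_from_z(z, p, n):
    """Undo the z-score: x = mu + z * sigma, with mu = p and sigma = SE of p-hat."""
    se = sqrt(p * (1 - p) / n)
    return p + z * se

# Hypothetical values for illustration: p = 0.5, n = 100, so SE = 0.05.
upper = x_from_z(1.96, 0.5, 100)   # 0.5 + 1.96 * 0.05
lower = x_from_z(-1.96, 0.5, 100)  # 0.5 - 1.96 * 0.05
print(round(lower, 3), round(upper, 3))
```

So about 95% of the time, \(\hat{p}\) lands within \(1.96\,SE_{\hat{p}}\) of \(p\text{.}\)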

Definition 4.2.1.

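Given a parameter of interest, a C% confidence interval is an interval \([L, U]\text{,}\) computed from a random sample, such that there is a C% chance the parameter lies in this interval; that is, if we repeated the sampling many times, about C% of the intervals constructed this way would contain the parameter.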

Remark 4.2.2.

Suppose you had a sample of size \(n\) and a sample proportion of \(\hat{p}\text{.}\) Following Exploration 4.2.1, we have that

\begin{equation*} [\hat{p}-1.96SE_{\hat{p}}, \hat{p}+1.96SE_{\hat{p}}] \end{equation*}

is a 95% Confidence Interval for \(p\text{.}\) In other words, there is a 95% chance that \(p\) lies in this interval.

Example 4.2.4. Cord Cutting.

In a survey of 150 adult Americans, 33 said they were “cord-cutters”: people who consume television only through internet streaming. Find a 95% confidence interval for the proportion of American adults who are cord-cutters.

Solution.

We note that the sample size is \(n=150\) and the sample proportion is

\begin{equation*} \hat{p}=\frac{33}{150}=0.22 \end{equation*}

or 22% of the sample were cord-cutters. Since \(p\) is unknown, we estimate the standard error from Theorem 4.1.4 by substituting \(\hat{p}\) for \(p\text{:}\)

\begin{equation*} SE_{\hat{p}}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}=\sqrt{\frac{0.22\cdot0.78}{150}}\approx 0.03382306. \end{equation*}

So following Remark 4.2.2, we have that the lower bound for our Confidence Interval is

\begin{equation*} \hat{p}-1.96SE_{\hat{p}}\approx 0.22-1.96\cdot0.03382306\approx 0.1537 \end{equation*}

or 15.37%. Similarly, the upper bound for our Confidence Interval is

\begin{equation*} \hat{p}+1.96SE_{\hat{p}}\approx 0.22+1.96\cdot0.03382306\approx 0.2863 \end{equation*}

or 28.63%.

So the 95% confidence interval is \([0.1537, 0.2863]\text{,}\) that is, there is a 95% chance the population proportion lies in this interval. In other words: “There is a 95% chance that the proportion of American adults who are cord-cutters is between 15.37% and 28.63%.”

Note that this means there is a 5% chance that the proportion of American adults who are cord-cutters is higher than 28.63% or lower than 15.37%.
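The arithmetic of Example 4.2.4 can be reproduced with a short Python sketch of Remark 4.2.2. The function name proportion_ci_95 is our own; as in the example, \(\hat{p}\) stands in for \(p\) in the standard error.

```python
from math import sqrt

def proportion_ci_95(successes, n):
    """95% confidence interval for a proportion, following Remark 4.2.2."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # estimated SE: p-hat replaces unknown p
    margin = 1.96 * se
    return p_hat - margin, p_hat + margin

low, high = proportion_ci_95(33, 150)
print(f"[{low:.4f}, {high:.4f}]")  # [0.1537, 0.2863]
```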

Activity 4.2.2. Medium Rare Steak.

Suppose 23% of Americans prefer their steak medium-rare. A researcher, not knowing this, polls 50 Americans about their doneness preference.

(a)

If the researcher produces a 95% confidence interval, which of the following best describes this confidence interval?

95% of adult Americans lie in this interval.
There is a 23% chance that 95% of Americans prefer their steak medium-rare.
There is a 95% chance that 23% of Americans prefer their steak medium-rare.
There is a 95% chance that the population proportion lies in this interval.

(b)

If 10 respondents to the survey say they prefer medium-rare, use Theorem 4.1.4 to find \(\hat{p}\) and \(SE_{\hat{p}}\text{.}\)

(c)

Use Remark 4.2.2 to compute a 95% confidence interval for \(p\text{:}\) the proportion of Americans who prefer their steaks medium-rare. Is 23% in this interval?
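To check your arithmetic for (b) and (c), here is a minimal Python sketch. Following Example 4.2.4, it substitutes \(\hat{p}\) for the unknown \(p\) in the standard error.

```python
from math import sqrt

n = 50
successes = 10          # respondents who prefer medium-rare, from part (b)
p_hat = successes / n
se = sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"p_hat = {p_hat}, SE = {se:.4f}")
print(f"95% CI: [{low:.4f}, {high:.4f}]")
print("contains 0.23:", low <= 0.23 <= high)
```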

Hint. Desmos

(d)

Run the following code to conduct your own survey of 50 Americans, and see how many of them like their steak medium-rare.
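The interactive code cell referred to here is not reproduced in this text. As a hypothetical stand-in, the Python sketch below simulates one survey, assuming each of the 50 respondents independently prefers medium-rare with probability 0.23. The names simulate_survey, TRUE_P, and N are our own.

```python
import random

TRUE_P = 0.23  # population proportion given in the activity
N = 50         # number of Americans polled

def simulate_survey(n, p, rng):
    """Count how many of n simulated respondents prefer medium-rare.

    Each respondent independently says "medium-rare" with probability p.
    """
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random()  # use random.Random(1), say, for repeatable results
count = simulate_survey(N, TRUE_P, rng)
print(f"{count} of {N} respondents prefer medium-rare (p_hat = {count / N})")
```

Each run gives a different count, which is exactly why different surveys produce different confidence intervals.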

(e)

Use this new result to compute \(\hat{p}, SE_{\hat{p}}\) and a new 95% Confidence Interval. Does THIS new interval contain 23%?

(f)

State what this confidence interval means in the context of this problem using complete sentences.

(g)

Run the following code to sample 50 Americans 100 times, constructing a 95% confidence interval from each sample.

How many of these intervals do not contain 23%? Is this surprising?
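As with part (d), the original code cell is not shown here. A hypothetical Python stand-in repeats the survey 100 times, builds the interval from Remark 4.2.2 each time, and counts the intervals that miss 0.23. The names ci_95 and count_misses are our own.

```python
import random
from math import sqrt

TRUE_P = 0.23
N = 50
TRIALS = 100

def ci_95(successes, n):
    """95% interval from Remark 4.2.2, with p-hat in the standard error."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 1.96 * se, p_hat + 1.96 * se

def count_misses(trials=TRIALS, n=N, p=TRUE_P, seed=None):
    """Build `trials` intervals from fresh samples; count those not containing p."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        low, high = ci_95(successes, n)
        if not (low <= p <= high):
            misses += 1
    return misses

misses = count_misses()
print(f"{misses} of {TRIALS} intervals do not contain {TRUE_P}")
```

Since each interval has roughly a 95% chance of capturing \(p\text{,}\) expect somewhere around 5 misses out of 100, with the exact count varying from run to run.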

Subsection 4.2.2 Other Confidences

Exploration 4.2.3.

Recall from Example 4.2.4 that the 95% confidence interval for the proportion of American adults who are cord-cutters was \([0.1537, 0.2863]\text{.}\)

(a)

If you needed an interval that had a 99% chance to contain the population proportion, should it be wider or narrower?

(b)

If you only needed an interval that had a 90% chance to contain the population proportion, should it be wider or narrower?

Remark 4.2.5.

In general, to find a \(C\%\) confidence interval given \(\hat{p}\) and \(SE_{\hat{p}}\text{,}\) we compute

\begin{equation*} [\hat{p}-z^*SE_{\hat{p}}, \hat{p}+z^*SE_{\hat{p}}] \end{equation*}

where \(z^*\) is the standard normal value such that, for the standard normal random variable \(Z\text{,}\) \(P(-z^*\lt Z\lt z^*)=C\%\text{.}\)

One can check that the following \(z^*\) values correspond to the following commonly used confidence levels:

\begin{equation*} \begin{array}{|c|c|} \hline C\% \amp z^*\\ \hline 90\% \amp 1.645\\ 95\% \amp 1.96\\ 99\% \amp 2.576 \\ \hline \end{array} \end{equation*}
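The table can be reproduced in Python 3.8+ with the standard library's statistics.NormalDist. Because the middle area \(C\) leaves \((1-C)/2\) in each tail, \(z^*\) is the standard normal quantile at \((1+C)/2\text{.}\) The helper names z_star and proportion_ci are our own, a sketch of Remark 4.2.5 rather than code from the text.

```python
from math import sqrt
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* with P(-z* < Z < z*) = confidence for standard normal Z."""
    # Middle area `confidence` leaves (1 - confidence)/2 in each tail.
    return NormalDist().inv_cdf((1 + confidence) / 2)

def proportion_ci(p_hat, n, confidence):
    """C% confidence interval for a proportion, following Remark 4.2.5."""
    se = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star(confidence) * se
    return p_hat - margin, p_hat + margin

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}: z* = {z_star(c):.3f}")

# Cord-cutting data from Example 4.2.4 at 95% confidence
low, high = proportion_ci(0.22, 150, 0.95)
print(f"[{low:.4f}, {high:.4f}]")
```

The printed values match the table up to rounding, and the 95% interval agrees with Example 4.2.4 to four decimal places.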

Hint. Desmos

Activity 4.2.4. Product Reviews.

In a random survey of 250 customers who purchased a product, 33 of them gave the product a negative review.

(a)

Find \(\hat{p}\text{,}\) the sample proportion of customers who gave the product a negative review, as well as \(SE_{\hat{p}}\text{.}\)

(b)

Use Remark 4.2.5 to find 90%, 95%, and 99% confidence intervals for the proportion of all customers who thought negatively of the product. What do we notice as the confidence level increases?
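A minimal Python sketch for checking (a) and (b), using the \(z^*\) values from the table in Remark 4.2.5 and substituting \(\hat{p}\) for \(p\) in the standard error. It also prints each interval's width so the trend is easy to see.

```python
from math import sqrt

n, negatives = 250, 33
p_hat = negatives / n
se = sqrt(p_hat * (1 - p_hat) / n)

# z* values from the table in Remark 4.2.5
Z_STAR = {"90%": 1.645, "95%": 1.96, "99%": 2.576}

intervals = {}
for level, z in Z_STAR.items():
    intervals[level] = (p_hat - z * se, p_hat + z * se)
    low, high = intervals[level]
    print(f"{level}: [{low:.4f}, {high:.4f}]  width = {high - low:.4f}")
```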

(c)

State what this confidence interval means in the context of this problem using complete sentences.

(d)

Find a value \(z^*\) so that given the standard normal random variable \(Z\text{,}\) \(P(-z^*\lt Z\lt z^*)=0.85\text{.}\)
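Besides Desmos, one way to find this value is the inverse CDF in Python's statistics.NormalDist (Python 3.8+). An 85% middle area leaves \(0.075\) in each tail, so \(z^*\) is the \(0.925\) quantile.

```python
from statistics import NormalDist

confidence = 0.85
# Leave (1 - 0.85)/2 = 0.075 in each tail, so take the 0.925 quantile.
z_star = NormalDist().inv_cdf((1 + confidence) / 2)
print(round(z_star, 3))
```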

Hint. Desmos

(e)

Use \(z^*\) to find an 85% confidence interval for the proportion of all customers who thought negatively of the product. How does this interval compare to those you found in (b)?
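A self-contained Python sketch of (e) computes the 85% interval and, for comparison, the 90% interval from the table value \(z^*=1.645\text{.}\)

```python
from math import sqrt
from statistics import NormalDist

n, negatives = 250, 33
p_hat = negatives / n
se = sqrt(p_hat * (1 - p_hat) / n)

z_85 = NormalDist().inv_cdf(0.925)  # z* for 85% confidence
low_85, high_85 = p_hat - z_85 * se, p_hat + z_85 * se

# For comparison: the 90% interval from the table value z* = 1.645
low_90, high_90 = p_hat - 1.645 * se, p_hat + 1.645 * se

print(f"85%: [{low_85:.4f}, {high_85:.4f}]")
print(f"90%: [{low_90:.4f}, {high_90:.4f}]")
```

Because \(z^*\approx 1.44\) is smaller than 1.645, the 85% interval sits strictly inside the 90% interval.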

(f)

State what this confidence interval means in the context of this problem using complete sentences.

Activity 4.2.5. Branches of the US Military.

We consider what proportion of the military belongs to each branch of service.

Run the following code to download military.csv and display its variables.

This dataset contains demographic information on every member of the US armed forces, including gender, race, and rank.

Let \(p\) denote the proportion of military members who are in the Army.

(a)

Run the following to sample \(n=50\) random members of the military:

(b)

Run the following to see how many members of our sample are in the Army:

Let \(\hat{p}\) denote the proportion of the sample that is in the Army.

(c)

Use \(\hat{p}\) above and \(n=50\) to compute a 95% confidence interval for the proportion of military members in the Army.
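The code cells for this activity are not reproduced here, and the layout of military.csv is not described beyond its variables, so the Python sketch below is a hypothetical stand-in. It assumes the population is given as a plain list with one branch label per member and that Army members are labeled "Army". The function name sample_branch_ci is our own, and the population in the usage example is synthetic, not the real military data.

```python
import random
from math import sqrt

def sample_branch_ci(branches, n=50, target="Army", seed=None):
    """Sample n members without replacement and build a 95% CI for the
    proportion whose branch label equals `target`.

    `branches` is a hypothetical list with one branch label per member.
    """
    rng = random.Random(seed)
    sample = rng.sample(branches, n)
    p_hat = sum(1 for b in sample if b == target) / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Synthetic population for illustration only (not the real military data)
population = ["Army"] * 400 + ["Navy"] * 300 + ["Air Force"] * 300
p_hat, (low, high) = sample_branch_ci(population, seed=1)
print(f"p_hat = {p_hat:.2f}, 95% CI = [{low:.4f}, {high:.4f}]")
```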

Hint. Desmos

(d)

State what this confidence interval means in the context of this problem using complete sentences.

(e)

Run the following to see the actual proportion of military members in the Army:

Is this in your confidence interval?