
Section 2.3 Bayes Theorem (P3)

In Section 2.2 we saw that some events can change the likelihood of other events. Consider smoking and lung cancer: it is well understood that whether or not one smokes changes the probability that one contracts lung cancer. Thus, we should be able to predict lung cancer rates from smoking rates, and vice versa. We will make that formal here.

Run the following code to download the smallpox data set, and display the variables it records:

Subsection 2.3.1 Using Conditional Probability

Remark 2.3.1.

Reworking Definition 2.2.1 we get:

\begin{align*} P(A|B)\amp=\frac{P(A\ \text{and} \ B)}{P(B)}\\ P(A\ \text{and} \ B)\amp=P(A|B)P(B) \end{align*}

Activity 2.3.1. Conditional Probability and Coffee.

Consider a group of coffee drinkers and select one at random. Let \(C\) denote “selected person uses cream” and let \(S\) denote “selected person uses sugar”. Suppose that 70% of coffee drinkers use cream (\(P(C)\)); that out of coffee drinkers who use cream, 80% use sugar (\(P(S|C)\)); and that out of those who do not use cream, 40% use sugar (\(P(S|C^c)\)).

Suppose these percentages describe a group of 100 coffee drinkers.

(a)

How many coffee drinkers in this group use cream?

(b)

How many of the cream users use sugar?

(c)

How many of the cream users do not use sugar?

(d)

How many coffee drinkers in this group do not use cream?

(e)

How many of the non-cream users use sugar?

(f)

How many of the non-cream users do not use sugar?

(g)

Fill out the Venn Diagram of coffee drinkers:

Figure 2.3.2. \(C\) cream, \(S\) sugar.

(h)

How many people used sugar?

(i)

What is \(P(S)\text{?}\)

(j)

What is \(P(C|S)\text{?}\)

(k)

What is \(P(C^c|S^c)\text{?}\)

Activity 2.3.2. Simulating Coffee Drinkers.

Suppose that a coffee drinker has a 70% chance to use cream (\(P(C)\)); that coffee drinkers who use cream have an 80% chance to use sugar (\(P(S|C)\)); and that those who do not use cream have a 40% chance to use sugar (\(P(S|C^c)\)).

(a)

Run the following code to simulate 100 random coffee drinkers and whether or not they use cream.

(b)

Run the following code to simulate each coffee drinker's sugar preference, based on whether they use cream.

(c)

Run the following code to show the coffee drinkers' preferences for cream and sugar.
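The three steps of this activity can be sketched in Python. This is only an illustrative sketch using the standard library, not necessarily the code the activity itself runs; the function name `simulate_coffee_drinkers`, its parameters, and the seed are assumptions introduced here.

```python
import random
from collections import Counter


def simulate_coffee_drinkers(n, p_cream=0.7, p_sugar_given_cream=0.8,
                             p_sugar_given_no_cream=0.4, seed=None):
    """Return a list of (uses_cream, uses_sugar) pairs for n simulated drinkers."""
    rng = random.Random(seed)
    drinkers = []
    for _ in range(n):
        # Step (a): decide cream use with probability P(C).
        uses_cream = rng.random() < p_cream
        # Step (b): the chance of sugar depends on the cream outcome,
        # P(S|C) for cream users and P(S|C^c) otherwise.
        p_sugar = p_sugar_given_cream if uses_cream else p_sugar_given_no_cream
        uses_sugar = rng.random() < p_sugar
        drinkers.append((uses_cream, uses_sugar))
    return drinkers


# The seed is arbitrary; it only makes the run repeatable.
drinkers = simulate_coffee_drinkers(100, seed=1)

# Step (c): tabulate the four cream/sugar combinations.
counts = Counter(drinkers)
for cream in (True, False):
    for sugar in (True, False):
        print(f"cream={cream!s:5} sugar={sugar!s:5} count={counts[(cream, sugar)]}")
```

Because the drinkers are random, the four counts will usually differ somewhat from the exact counts in Activity 2.3.1, and they change with the seed.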

Remark 2.3.3.

Given events \(X\) and \(Y\text{,}\) note that each outcome in \(X\) either belongs to \(Y\text{,}\) or doesn't. Thus we have:

\begin{equation*} P(X)= P(X\ \text{and}\ Y)+P(X\ \text{and}\ Y^c) \end{equation*}

Figure 2.3.4. Every outcome in \(X\) is either in \(Y\) or isn't.

Subsection 2.3.2 Bayes Theorem

Remark 2.3.5.

Consider events \(A\) and \(B\text{.}\) From Remark 2.3.3 we have:

\begin{equation*} P(A) = P(A\ \text{and}\ B) + P(A\ \text{and}\ B^c). \end{equation*}

Then from Remark 2.3.1 we get

\begin{equation*} P(A)=P(A|B)P(B) + P(A|B^c)P(B^c). \end{equation*}

Then via Remark 2.1.8

\begin{equation*} P(A)=P(A|B)P(B) + P(A|B^c)(1-P(B)). \end{equation*}
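As a quick numerical check of the last formula, here is a minimal Python sketch. The function name `total_probability` and the input values are hypothetical, chosen only for illustration; they do not come from any activity in this section.

```python
def total_probability(p_a_given_b, p_a_given_not_b, p_b):
    """Compute P(A) = P(A|B)P(B) + P(A|B^c)(1 - P(B))."""
    if not 0.0 <= p_b <= 1.0:
        raise ValueError("p_b must be between 0 and 1")
    return p_a_given_b * p_b + p_a_given_not_b * (1.0 - p_b)


# Hypothetical inputs: P(A|B) = 0.9, P(A|B^c) = 0.2, P(B) = 0.5.
p_a = total_probability(p_a_given_b=0.9, p_a_given_not_b=0.2, p_b=0.5)
print(round(p_a, 4))  # 0.55
```

The result is a weighted average of the two conditional probabilities, with weights \(P(B)\) and \(1-P(B)\text{.}\)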

Activity 2.3.3. Conditional Probability and Coffee revisited.

Recall Activity 2.3.1. Let \(C\) denote “selected person uses cream” and let \(S\) denote “selected person uses sugar”. Recall that out of coffee drinkers who use cream, 80% use sugar (\(P(S|C)\)), and out of those who do not use cream, 40% use sugar (\(P(S|C^c)\)).

(a)

If 70% of coffee drinkers use cream (\(P(C)=0.7\)), use Remark 2.3.5 to find what proportion of people use sugar (\(P(S)\)). How does this solution compare to what you found in Activity 2.3.1?

(b)

If 20% of coffee drinkers use cream (\(P(C)=0.2\)), use Remark 2.3.5 to find what proportion of people use sugar (\(P(S)\)).

(c)

If 50% of coffee drinkers use cream (\(P(C)=0.5\)), use Remark 2.3.5 to find what proportion of people use sugar (\(P(S)\)).

(d)

If 0% of coffee drinkers use cream (\(P(C)=0\)), use Remark 2.3.5 to find what proportion of people use sugar (\(P(S)\)).

(e)

If 100% of coffee drinkers use cream (\(P(C)=1\)), use Remark 2.3.5 to find what proportion of people use sugar (\(P(S)\)).

(f)

In the Desmos interactive below, let X denote \(P(S)\) and \(Y\) denote \(P(C)\text{.}\) Note that X_GivenY \(=P(S|C)=0.8\) and X_GivenNotY \(=P(S|C^c)=0.4\) are already set. Set Y to each of 0, 0.2, 0.5, 0.7, and 1 in turn. What do the visuals tell you for each setting? How do the X values compare to what you found above?

Activity 2.3.4. Zombie Apocalypse.

Suppose that during the zombie apocalypse, an anti-zombification serum is developed. People who take the serum have a 10% chance of becoming zombies, but those who don't have an 85% chance of becoming zombies. Suppose that in a town, 60% of the residents had become zombies.

Let \(S\) denote the event “took the serum” and \(Z\) denote the event “became a zombie”.

(a)

“People who take the serum have a 10% chance of becoming zombies” corresponds to which of the following?

  • \(\displaystyle P(Z)\)

  • \(\displaystyle P(S)\)

  • \(\displaystyle P(Z|S)\)

  • \(\displaystyle P(Z\ \text{and}\ S)\)

(b)

“Those who don't have an 85% chance of becoming zombies” corresponds to which of the following?

  • \(\displaystyle P(Z|S^c)\)

  • \(\displaystyle P(S^c)\)

  • \(\displaystyle P(Z\ \text{and}\ S^c)\)

  • \(\displaystyle P(S|Z)\)

(c)

“60% of the residents had become zombies” corresponds to which of the following?

  • \(\displaystyle P(Z|S)\)

  • \(\displaystyle P(Z|S^c)\)

  • \(\displaystyle P(Z\ \text{or}\ S)\)

  • \(\displaystyle P(Z)\)

(d)

Use what you found above and Remark 2.3.5 to find \(P(S)\text{.}\)

(e)

Using the Desmos interactive below, letting X denote \(P(Z)\) and \(Y\) denote \(P(S)\text{,}\) set Y to the appropriate value to verify that X is 0.6.

Via Remark 2.3.1 and Remark 2.3.5:
\begin{align*} P(B|A)\amp=\frac{P(A\ \text{and}\ B)}{P(A)}\\ \amp=\frac{P(A|B)P(B)}{P(A)}\\ \amp=\frac{P(A|B)P(B)}{P(A|B)P(B) + P(A|B^c)(1-P(B))} \end{align*}
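The final expression above can be evaluated directly. The following is a minimal Python sketch; the function name `bayes_posterior` and the input values are hypothetical and serve only as an illustration.

```python
def bayes_posterior(p_a_given_b, p_a_given_not_b, p_b):
    """Compute P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)(1 - P(B))]."""
    numerator = p_a_given_b * p_b
    # The denominator is P(A), expanded over B and its complement.
    p_a = numerator + p_a_given_not_b * (1.0 - p_b)
    if p_a == 0:
        raise ValueError("P(A) is zero, so P(B|A) is undefined")
    return numerator / p_a


# Hypothetical inputs: P(A|B) = 0.9, P(A|B^c) = 0.2, P(B) = 0.5.
posterior = bayes_posterior(p_a_given_b=0.9, p_a_given_not_b=0.2, p_b=0.5)
print(round(posterior, 4))  # 0.8182
```

Note that the function needs only the two conditional probabilities of \(A\) and the single unconditional probability \(P(B)\text{;}\) \(P(A)\) never has to be supplied separately.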

Activity 2.3.5. Bayes Theorem and Smallpox.

We show how Bayes Theorem is applied.

Recall that there are 6224 patients in the smallpox data set.

(a)

Run the following code to show how many patients were “inoculated”.

What is P(inoculated)?

(b)

Run the following code to subset the patients who were “inoculated”, the patients who were “not inoculated” and show how many patients were not inoculated.

(c)

Run the following code to see how many inoculated patients died.

What is P(died|inoculated)?

(d)

Run the following code to see how many patients who were not inoculated died.

What is P(died|not inoculated)?

(f)

Run the following code to subset the patients who “died” and show how many patients this was.

(g)

Run the following code to show how many of the patients who died were inoculated.

Compute P(inoculated|died) directly. How does this compare to what you found in (e)? To P(inoculated)?
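The comparison between the Bayes Theorem route and the direct count can be sketched in Python. This is an illustrative sketch only: the function name `inoculation_summary` is an assumption, and the 20 records below are made-up toy values, not the smallpox data set.

```python
def inoculation_summary(records):
    """records: iterable of (inoculated, died) boolean pairs, one per patient."""
    records = list(records)
    n = len(records)
    n_inoc = sum(1 for inoc, _ in records if inoc)
    n_died = sum(1 for _, died in records if died)
    n_inoc_died = sum(1 for inoc, died in records if inoc and died)

    p_inoc = n_inoc / n
    p_died_given_inoc = n_inoc_died / n_inoc
    p_died_given_not = (n_died - n_inoc_died) / (n - n_inoc)

    # Route 1: Bayes Theorem, using only P(inoculated) and the two
    # conditional death rates.
    numerator = p_died_given_inoc * p_inoc
    via_bayes = numerator / (numerator + p_died_given_not * (1 - p_inoc))

    # Route 2: count directly among the patients who died.
    direct = n_inoc_died / n_died
    return via_bayes, direct


# Toy records (made up for illustration, NOT the smallpox data):
# 5 inoculated patients, 1 of whom died; 15 not inoculated, 6 of whom died.
toy_records = ([(True, True)] * 1 + [(True, False)] * 4
               + [(False, True)] * 6 + [(False, False)] * 9)

via_bayes, direct = inoculation_summary(toy_records)
print(via_bayes, direct)
```

Up to floating-point rounding, the two printed values agree, since both routes compute the same conditional probability from the same counts.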

Activity 2.3.6. Breast Cancer Simulation.

In Canada, 0.35% of women develop breast cancer (P(BC)=0.0035). Of the women who have breast cancer and get tested, 89% will test positive (P(positive|BC)=0.89), and of those who do not have breast cancer, 7% will (falsely) test positive regardless (P(positive|not BC)=0.07).

(a)

Use Theorem 2.3.6 to find the probability a woman has breast cancer if she tests positive for breast cancer (P(BC|positive)).

(b)

Run the following code to simulate 10,000 random women receiving a breast cancer test, and whether or not each of them has breast cancer (BC).

(c)

Run the following code to determine if each woman tests positive or negative.

(d)

Run the following code to combine cancer and testresult into a single dataframe testsubject.

(e)

Run the following code to subset testsubject to the women who tested positive and see how many that is.

(f)

Run the following code to see how many women who tested positive for breast cancer had breast cancer.

(g)

Find the proportion of women who had breast cancer out of those who tested positive. How does this value compare to what you found in (a)?

(h)

Run the following code to show a contingency table for testing positive and breast cancer.
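The steps of this simulation can be sketched in Python using only the standard library. This is a hedged sketch rather than the activity's own code: the function name `simulate_tests`, the constant names, and the seed are assumptions, while `testsubject` follows the name used in part (d).

```python
import random
from collections import Counter

P_BC = 0.0035              # P(BC)
P_POS_GIVEN_BC = 0.89      # P(positive|BC)
P_POS_GIVEN_NOT_BC = 0.07  # P(positive|not BC)


def simulate_tests(n, seed=None):
    """Return a list of (has_bc, tested_positive) pairs for n simulated women."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        # Step (b): does this woman have breast cancer?
        has_bc = rng.random() < P_BC
        # Step (c): the chance of a positive test depends on cancer status.
        p_positive = P_POS_GIVEN_BC if has_bc else P_POS_GIVEN_NOT_BC
        tested_positive = rng.random() < p_positive
        subjects.append((has_bc, tested_positive))
    return subjects


# Step (d): one record per woman. The seed is arbitrary.
testsubject = simulate_tests(10_000, seed=2024)

# Steps (e) and (f): count positives, and positives who have cancer.
table = Counter(testsubject)
n_positive = table[(True, True)] + table[(False, True)]
n_positive_with_bc = table[(True, True)]
print("tested positive:", n_positive)
print("tested positive and had BC:", n_positive_with_bc)

# Step (g): simulated estimate of P(BC|positive).
if n_positive > 0:
    print("proportion with BC among positives:", n_positive_with_bc / n_positive)

# Step (h): contingency table of cancer status against test result.
for has_bc in (True, False):
    for tested_positive in (True, False):
        print(f"BC={has_bc!s:5} positive={tested_positive!s:5} "
              f"count={table[(has_bc, tested_positive)]}")
```

Because breast cancer is rare in this model, only a few dozen of the 10,000 simulated women have it, so the proportion in step (g) can vary noticeably from one seed to another.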