Section 2.1 Introduction to Probability (P1)
We use probability to build tools to describe and understand apparent randomness. We often frame probability in terms of a random process giving rise to an outcome.
Rolling a die or flipping a coin is a seemingly random process and each gives rise to an outcome.
In this section, we introduce the language of probability: events, probability of events, and operations of events.
Run the following code to download the loans_full_schema data set and display the variables it records:
This data set represents thousands of loans made through Lending Club, a platform that allows individuals to lend to other individuals.
Subsection 2.1.1 Defining Probability
Exploration 2.1.1.
Before getting into more technical details, we work with a more familiar situation. Consider a standard fair six-sided die.
(a)
What is the probability of getting a 1 on a roll?
(b)
What is the probability of getting a 1 or 2 on a roll?
(c)
What is the probability of getting a 1, 2, 3, 4, 5, or 6 on a roll?
(d)
What is the probability of not rolling a 2?
Definition 2.1.1.
The probability of an outcome is the proportion of times the outcome would occur if we observed the random process an infinite number of times.
Activity 2.1.2. Rolling a Six.
(a)
Run the following code to show a plot of 10 die rolls, and the proportion of die rolls that are 6's after each roll. The red dashed line represents \(\frac{1}{6}\text{.}\)
(b)
Run it a few times. What do you notice about the different simulations?
(c)
Change n=100 and run it a few times. What do you notice about the different simulations?
(d)
Change n=1000 and run it a few times. What do you notice about the different simulations?
(e)
Change n=10000 and run it a few times. What do you notice about the different simulations?
(f)
As n increases, what can you say about the proportion of die rolls that come up a “6”?
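One way to simulate this experiment is sketched below in Python, using only the standard `random` module. The function name `running_proportion_of_sixes` and the `seed` parameter are our own choices for illustration, and the sketch prints numbers rather than drawing the plot described above.

```python
import random

def running_proportion_of_sixes(n, seed=None):
    """Roll a fair six-sided die n times.

    Returns a list whose k-th entry is the proportion of 6's
    among the first k rolls.
    """
    rng = random.Random(seed)
    sixes = 0
    proportions = []
    for roll_number in range(1, n + 1):
        if rng.randint(1, 6) == 6:  # randint includes both endpoints
            sixes += 1
        proportions.append(sixes / roll_number)
    return proportions

# Compare the final proportion with 1/6 for increasing n.
for n in (10, 100, 1000, 10000):
    final = running_proportion_of_sixes(n)[-1]
    print(f"n={n:>5}: proportion of 6's = {final:.4f}   (1/6 = {1/6:.4f})")
```

Running it several times without a seed gives different results each time, which is the variation the activity asks you to observe.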
Subsection 2.1.2 Events
Definition 2.1.2.
A collection of outcomes is called an event.
Given events \(A\text{,}\) \(B\text{,}\) we let \(A\ \text{or}\ B\) denote the event “either \(A\) occurs or \(B\) occurs or both.”
We let \(A\ \text{and}\ B\) denote the event “both \(A\) and \(B\) occur at the same time.”
We let \(A^c\text{,}\) pronounced “\(A\) complement,” denote the event “\(A\) does not occur.”
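These three operations correspond to union, intersection, and set difference. The following is a small Python sketch, assuming a single roll of a six-sided die as the experiment; the events chosen here are illustrative and are not the ones used in the activities.

```python
# Sample space for one roll of a six-sided die (illustrative example).
S = {1, 2, 3, 4, 5, 6}

A = {x for x in S if x % 2 == 0}  # event "the roll is even"
B = {x for x in S if x >= 5}      # event "the roll is at least 5"

A_or_B = A | B         # A occurs or B occurs or both (union)
A_and_B = A & B        # both A and B occur (intersection)
A_complement = S - A   # A does not occur (complement within S)

print(sorted(A_or_B))        # [2, 4, 5, 6]
print(sorted(A_and_B))       # [6]
print(sorted(A_complement))  # [1, 3, 5]
```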
Activity 2.1.3. Example Events.
Consider the set of numbers \(\{1,2,3,4,5,6,7,8,9,10\}\text{.}\) Suppose one picks one of these numbers.
(a)
Let \(A\) denote the event “the chosen number is greater than 5”. What outcomes belong to \(A\text{?}\)
(b)
Let \(B\) denote the event “the chosen number is prime”. What outcomes belong to \(B\text{?}\)
(c)
Consider the event \(A\ \text{and}\ B\text{.}\) Give a verbal description of this event.
(d)
What outcomes belong to \(A\ \text{and}\ B\text{?}\)
(e)
Consider the event \(A\ \text{or}\ B\text{.}\) Give a verbal description of this event.
(f)
What outcomes belong to \(A\ \text{or}\ B\text{?}\)
(g)
Consider the event \(B^c\text{.}\) Give a verbal description of this event.
(h)
What outcomes belong to \(B^c\text{?}\)
(i)
Consider the event \(A^c\ \text{and}\ B\text{.}\) Give a verbal description of this event.
(j)
What outcomes belong to \(A^c\ \text{and}\ B\text{?}\)
Remark 2.1.6.
Given an experiment, the set of all possible outcomes is called the sample space. For a finite sample space \(S\) where all outcomes are equally likely, the probability of an event \(A\) is the size of the event divided by the size of the sample space:
\begin{equation*} P(A)=\frac{|A|}{|S|}\text{.} \end{equation*}
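The counting rule in this remark can be sketched in Python as follows. The helper name `probability` is our own, and the sketch assumes a finite sample space with equally likely outcomes, as the remark does; `fractions.Fraction` keeps the result exact.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Size of the event divided by the size of the sample space.

    Assumes every outcome in sample_space is equally likely.
    """
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

# Illustrative example: one roll of a six-sided die.
S = {1, 2, 3, 4, 5, 6}
at_least_five = {5, 6}

print(probability(at_least_five, S))  # 1/3
```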
Activity 2.1.4. Extreme Scenarios.
(a)
If an event \(X\) has probability \(P(X)=0\text{,}\) what does that mean?
(b)
If an event \(Y\) has probability \(P(Y)=1\text{,}\) what does that mean?
(c)
Could an event \(Z\) have probability \(P(Z)\lt 0\) or \(P(Z)\gt 1\text{?}\) If so, give an example; if not, explain why not.
Activity 2.1.5. Example Probability.
Recall from Activity 2.1.3 the sample space \(S=\{1,2,3,4,5,6,7,8,9,10\}\text{,}\) the events \(A\)=“the chosen number is greater than 5” and \(B\)=“the chosen number is prime”.
(a)
What is \(P(A)\text{?}\)
(b)
What is \(P(B)\text{?}\)
(c)
What is \(P(A\ \text{and}\ B)\text{?}\)
(d)
What is \(P(A\ \text{or}\ B)\text{?}\)
(e)
Look at the probabilities you computed. What is the relationship between them?
(f)
Place the numbers 1 through 10 in the following Venn diagram, depending on whether they belong to both \(A\) and \(B\text{,}\) just \(A\text{,}\) just \(B\text{,}\) or neither.
(g)
Is \(P(A\ \text{or}\ B)=P(A)+P(B)\text{?}\) If they're not equal, how might we adjust the sum so that they are?
(h)
Compare \(P(A)\) to \(P(A^c)\) and \(P(B)\) to \(P(B^c)\text{.}\) What is the relationship there?
Remark 2.1.8.
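Given events \(A\text{,}\) \(B\text{,}\) we have that:
\begin{equation*} P(A\ \text{or}\ B)=P(A)+P(B)-P(A\ \text{and}\ B)\text{.} \end{equation*}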
This corresponds to the outcomes of \(A\) plus the outcomes of \(B\text{,}\) then removing the overlap.
Given event \(A\text{,}\) we have that \(P(A^c)=1-P(A)\text{.}\) This corresponds to all outcomes, removing the ones belonging to \(A\text{.}\)
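Both rules in this remark can be checked numerically. The sketch below does so in Python for one roll of a six-sided die, with equally likely outcomes; the events and the helper `P` are illustrative choices of ours, not part of the activities.

```python
from fractions import Fraction

S = set(range(1, 7))  # sample space: one roll of a six-sided die
A = {2, 4, 6}         # event "the roll is even"
B = {5, 6}            # event "the roll is at least 5"

def P(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

# Addition rule: add the two probabilities, then remove the overlap,
# which would otherwise be counted twice.
print(P(A | B), P(A) + P(B) - P(A & B))  # 2/3 2/3

# Complement rule: all outcomes, minus those belonging to A.
print(P(S - A), 1 - P(A))  # 1/2 1/2
```

Here the outcome 6 belongs to both events, so \(P(A)+P(B)\) alone would give \(\frac{5}{6}\text{,}\) which overstates \(P(A\ \text{or}\ B)\) by exactly \(P(A\ \text{and}\ B)=\frac{1}{6}\text{.}\)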
Activity 2.1.6. Application Type and Homeownership.
(a)
Run the following code to display the number of loans out of 10,000 that were for joint applications:
(b)
Run the following code to display the number of loans out of 10,000 that were for applicants with mortgages:
(c)
Run the following code to display the number of loans out of 10,000 that were for joint applicants with mortgages:
(d)
Suppose we pick a loan at random. Let \(J\) denote the event “joint application” and \(M\) denote the event “applicant(s) with a mortgage”. Find \(P(J)\text{,}\) \(P(M)\text{,}\) \(P(M\ \text{and}\ J)\text{,}\) and \(P(M\ \text{or}\ J)\text{.}\)
(e)
Run the following code to sample 1000 random loans from this list:
(f)
Run the following code to plot a contingency table for the above sample, comparing application type to homeownership:
(g)
What proportion of your sample were joint applications? Applications from applicants with mortgages? Both? At least one of the two? How do these proportions compare to what you computed in (d)?
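The step from counts to probabilities in this activity can be sketched in Python as below. The records are a tiny toy list invented purely for illustration and are NOT taken from loans_full_schema; the field names `application_type` and `homeownership` and the values `"joint"` and `"MORTGAGE"` are assumptions about how the data set is coded, and the helper names are our own.

```python
from fractions import Fraction

def proportion(records, predicate):
    """Fraction of records for which predicate(record) is true."""
    matches = sum(1 for r in records if predicate(r))
    return Fraction(matches, len(records))

# Toy records invented for illustration only (not real loan data).
toy_loans = [
    {"application_type": "joint", "homeownership": "MORTGAGE"},
    {"application_type": "individual", "homeownership": "MORTGAGE"},
    {"application_type": "individual", "homeownership": "RENT"},
    {"application_type": "joint", "homeownership": "RENT"},
    {"application_type": "individual", "homeownership": "OWN"},
]

def is_joint(r):
    return r["application_type"] == "joint"

def has_mortgage(r):
    return r["homeownership"] == "MORTGAGE"

p_j = proportion(toy_loans, is_joint)
p_m = proportion(toy_loans, has_mortgage)
p_both = proportion(toy_loans, lambda r: is_joint(r) and has_mortgage(r))
p_either = proportion(toy_loans, lambda r: is_joint(r) or has_mortgage(r))

print(p_j, p_m, p_both, p_either)  # 2/5 2/5 1/5 3/5
print(p_either == p_j + p_m - p_both)  # True
```

With the real data, the same four proportions would be computed over all 10,000 loans, or over your sample of 1000.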
Activity 2.1.7. Probability of your chosen variables.
(a)
Follow this link and identify two categorical variables, and a value for each, whose probability you wish to find: https://www.openintro.org/data/index.php?data=loans_full_schema
(b)
Modify the following code to display the number of loans out of 10,000 where the variable of your choice takes on the value of your choice:
(c)
Modify the following code to display the number of loans out of 10,000 where both variables of your choice take on the values of your choice:
(d)
Modify the following code to sample 1000 random loans from this list and display a contingency table for your two variables:
(e)
How do the results from your sample compare to the counts you computed earlier?
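Sampling records and tabulating two categorical variables can be sketched in Python with the standard library alone. The population below is randomly generated as a stand-in and is NOT the loans_full_schema data; the variable names and category labels are assumptions, and `contingency_table` is a helper of our own. Swap in your two chosen variables when working with the real data.

```python
import random
from collections import Counter

def contingency_table(records, row_var, col_var):
    """Count the records in each (row value, column value) cell."""
    return Counter((r[row_var], r[col_var]) for r in records)

# Synthetic stand-in population, generated at random for illustration only.
rng = random.Random(0)
population = [
    {
        "application_type": rng.choice(["individual", "joint"]),
        "homeownership": rng.choice(["MORTGAGE", "OWN", "RENT"]),
    }
    for _ in range(10000)
]

# Draw 1000 records without replacement and tabulate the two variables.
sample = rng.sample(population, 1000)
table = contingency_table(sample, "application_type", "homeownership")

# Print each cell's count and its proportion of the sample.
for (row_value, col_value), count in sorted(table.items()):
    print(f"{row_value:<12} {col_value:<10} {count:>4}  ({count / len(sample):.3f})")
```

Dividing each cell count by the sample size gives the sample proportion for that combination, which can then be compared with the corresponding proportion in the full list.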