Section A.2 Probability Theory Review

This is an extremely brief “review” of the limited probability theory we utilize in Chapter 5. It’s not even particularly fair to call this a review, since probability is not a prerequisite to this course. However, the limited amount we use is fairly straightforward and intuitive.
If a more thorough treatment is needed, there are good options available, depending on your goals. For someone looking to explore some elementary probability theory, the introductory statistics textbook “OpenIntro Statistics” (www.openintro.org/book/os/) by David Diez, Christopher Barr, and Mine Çetinkaya-Rundel does a good job presenting this material. It is also an excellent introductory statistics text with labs and data available. For a calculus-based, theory-heavy treatment of this material, I recommend “Probability: Lecture and Labs” (www.markhuberdatascience.org/probability-textbook).

Definition A.2.1.

In probability, an experiment is an occurrence with a measurable result. Each instance of an experiment is a trial. The possible results of each trial are called outcomes. The set of all possible outcomes for an experiment is the sample space for that experiment.

Definition A.2.2.

Given an experiment with sample space \(S\text{:}\)
  • An event \(A\) is a subset of \(S\text{.}\)
  • If each outcome in the sample space is equally likely, then the probability of \(A\text{,}\) denoted \(P(A)\text{,}\) is
    \begin{equation*} P(A)=\frac{|A|}{|S|}. \end{equation*}
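As a small illustration of this formula, the following R sketch enumerates the sample space for rolling two fair dice and counts the outcomes in the event “the dice sum to 7”. The two-dice experiment and the names `S`, `in_A`, and `p_A` are assumptions chosen for illustration; they are not taken from the text.

```r
# Sample space for rolling two fair six-sided dice: all 36 ordered pairs.
# (Hypothetical example experiment, chosen only to illustrate the formula.)
S <- expand.grid(first = 1:6, second = 1:6)

# Event A: the two dice sum to 7.
in_A <- S$first + S$second == 7

# P(A) = |A| / |S|, valid because every ordered pair is equally likely.
p_A <- sum(in_A) / nrow(S)
print(p_A)  # 6/36, i.e. 1/6
```

The counting formula only applies here because each of the 36 ordered pairs is equally likely; it would not apply if we took the 11 possible sums as the sample space.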

Remark A.2.3.

But what does it mean for an event \(A\) to have probability \(P(A)\text{?}\) It means that if I repeat the experiment over and over, the proportion of trials in which \(A\) occurs should approach \(P(A)\text{.}\)
So if I roll a fair die over and over, the proportion of rolls that give me a 6 should, over time, be close to \(\frac{1}{6}\text{.}\) For instance, if we roll a die 10000 times, we would expect about one sixth of the rolls to come up 6.
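A minimal R sketch of this experiment is given below. It is only one way to set up the simulation; the seed and the names `n_rolls`, `rolls`, and `prop_six` are arbitrary choices made for this illustration.

```r
# Simulate rolling a fair six-sided die 10000 times.
set.seed(1)  # arbitrary seed, used only so the run is reproducible
n_rolls <- 10000
rolls <- sample(1:6, size = n_rolls, replace = TRUE)

# Proportion of rolls that came up 6; this should be close to 1/6.
prop_six <- mean(rolls == 6)
print(prop_six)
print(1 / 6)
```

The simulated proportion will not equal \(\frac{1}{6}\) exactly, but it should get closer as `n_rolls` grows.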

Definition A.2.4.

A random variable is a function from the sample space to a set of values. For our purposes, this set of values will always be \(\mathbb{R}\text{.}\)
A probability distribution is, roughly speaking, a complete description of a random variable and the likelihood of each output. In the case of random variables with a finite number of possible outputs, a probability distribution table is a convenient way of presenting this information.

Remark A.2.5.

To check whether something is a valid probability distribution for a random variable \(X\text{,}\) we must have:
  • \(0\leq P(X=x)\leq1\) for every possible outcome \(x\) of \(X\text{.}\) This ensures each outcome is assigned a valid probability.
  • \(\sum P(X=x)=1\text{,}\) where the sum runs over all possible outcomes \(x\) of \(X\text{.}\) Together, the probabilities of all the outcomes must account for 100% of the trials.
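The two conditions above can be sketched as a small R check. The function name `is_valid_distribution` and the tolerance `tol` are hypothetical, introduced here only for illustration; the tolerance is there because sums of decimal fractions are rarely exactly 1 in floating-point arithmetic.

```r
# Check whether a vector of probabilities p describes a valid
# probability distribution for a finite random variable.
# (Hypothetical helper; name and tolerance are illustrative choices.)
is_valid_distribution <- function(p, tol = 1e-8) {
  each_in_range <- all(p >= 0 & p <= 1)     # 0 <= P(X = x) <= 1 for every x
  sums_to_one   <- abs(sum(p) - 1) < tol    # probabilities sum to 1
  each_in_range && sums_to_one
}

print(is_valid_distribution(c(0.2, 0.5, 0.3)))    # TRUE
print(is_valid_distribution(c(0.6, 0.6, -0.2)))   # FALSE: negative entry
print(is_valid_distribution(c(0.5, 0.4)))         # FALSE: sums to 0.9
```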

Example A.2.6. Poisoned apples.

Snow White has a basket of 10 apples, 3 of which are poisoned. She is going to pick 4 apples at random to eat for some reason. Let \(X\) denote the number of poisoned apples she eats.
The probability distribution for \(X\) would be:
\begin{equation*} \begin{array}{|c|cccc|} \hline x \amp 0 \amp 1 \amp 2 \amp 3 \\ \hline P(X=x) \amp \frac{{3\choose 0}{7\choose 4}}{ {10\choose 4} } \amp \frac{{3\choose 1}{7\choose 3}}{ {10\choose 4} } \amp \frac{{3\choose 2}{7\choose 2}}{ {10\choose 4} } \amp\frac{{3\choose 3}{7\choose 1}}{ {10\choose 4} }\\ \hline \end{array} \end{equation*}
equivalently:
\begin{equation*} \begin{array}{|c|cccc|} \hline x \amp 0 \amp 1 \amp 2 \amp 3 \\ \hline P(X=x) \amp \frac{35}{210}\amp \frac{105}{210} \amp \frac{63}{210} \amp \frac{7}{210}\\ \hline \end{array} \end{equation*}
or:
\begin{equation*} \begin{array}{|c|cccc|} \hline x \amp 0 \amp 1 \amp 2 \amp 3 \\ \hline P(X=x) \amp \approx 0.1667\amp 0.5 \amp 0.3 \amp \approx 0.0333\\ \hline \end{array} \end{equation*}
This can be seen by the following R simulation:
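The sketch below is one way such a simulation could look; it is not necessarily the code originally used here. It draws 4 apples without replacement many times and compares the simulated proportions with the exact probabilities from the table, using R's built-in `dhyper`. The seed, the number of trials, and the names `apples`, `poisoned_eaten`, `sim_dist`, and `exact_dist` are assumptions made for this illustration.

```r
set.seed(42)  # arbitrary seed, used only so the run is reproducible

# The basket: 3 poisoned apples and 7 safe ones.
apples <- c(rep("poisoned", 3), rep("safe", 7))

# Repeat the experiment: pick 4 apples without replacement and
# count how many of them are poisoned.
n_trials <- 10000
poisoned_eaten <- replicate(
  n_trials,
  sum(sample(apples, size = 4) == "poisoned")
)

# Simulated proportion of trials with x = 0, 1, 2, 3 poisoned apples.
sim_dist <- sapply(0:3, function(x) mean(poisoned_eaten == x))

# Exact probabilities: choose(3, x) * choose(7, 4 - x) / choose(10, 4).
exact_dist <- dhyper(0:3, m = 3, n = 7, k = 4)

comparison <- rbind(simulated = sim_dist, exact = exact_dist)
colnames(comparison) <- 0:3
print(round(comparison, 4))
```

The simulated row should be close to, but not exactly equal to, the exact row; the gap shrinks as `n_trials` increases.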

Definition A.2.7.

Given a finite random variable \(X\text{,}\) its expected value is the predicted average value of \(X\) over many trials of the experiment, and is computed as:
\begin{equation*} E(X)=\sum P(X=x)\cdot x. \end{equation*}
Note that the “expected value” may not be a value we actually expect to see; that is, it need not be one of the possible outcomes, only their average. We think of this as the outcomes of \(X\text{,}\) weighted by their likelihood, so the more likely outcomes contribute more than the less likely ones.

Example A.2.8.

Recall Example A.2.6. The expected number of poisoned apples eaten would be
\begin{equation*} E(X)=0\cdot\frac{35}{210}+1\cdot\frac{105}{210}+2\cdot\frac{63}{210}+3\cdot\frac{7}{210}=1.2. \end{equation*}
We can compute the mean of the previously simulated number of poisoned apples and visualize it:
Be sure to run the simulation in Example A.2.6 first!
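The following R sketch shows one way to carry this out; it is an illustration under stated assumptions rather than the original code. So that it runs on its own, it repeats a simulation of Example A.2.6 with hypothetical names (`apples`, `poisoned_eaten`) and an arbitrary seed, then compares the simulated mean with the exact expected value and marks the mean on a plot of the simulated distribution.

```r
set.seed(42)  # arbitrary seed, used only so the run is reproducible

# Re-create the simulated data: number of poisoned apples among 4
# drawn without replacement from 3 poisoned and 7 safe apples.
apples <- c(rep("poisoned", 3), rep("safe", 7))
poisoned_eaten <- replicate(10000, sum(sample(apples, size = 4) == "poisoned"))

# Exact expected value: sum of x * P(X = x) over x = 0, 1, 2, 3.
x <- 0:3
exact_mean <- sum(x * dhyper(x, m = 3, n = 7, k = 4))

# Mean of the simulated outcomes; this should be close to exact_mean.
sim_mean <- mean(poisoned_eaten)
print(c(exact = exact_mean, simulated = sim_mean))

# Visualize the simulated distribution and mark the simulated mean.
props <- sapply(x, function(k) mean(poisoned_eaten == k))
plot(x, props, type = "h", lwd = 8,
     xlab = "Poisoned apples eaten", ylab = "Proportion of trials",
     main = "Simulated distribution of X")
abline(v = sim_mean, col = "red", lty = 2)  # dashed line at the mean
```

The dashed line falls between the outcomes 1 and 2, which illustrates that the expected value need not be one of the possible outcomes.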