Chapter 2 Probability

A primary goal of statistics is to describe the real world based on limited observations. These observations may be influenced by random factors, such as measurement error or environmental conditions. This chapter introduces probability, which is designed to describe random events. Later, we will see that the theory of probability is so powerful that we intentionally introduce randomness into experiments and studies so we can make precise statements from data.

2.1 Probability Basics

In order to learn about probability, we must first develop a vocabulary that we can use to discuss various aspects of it.

Definitions

  • An experiment is a process that produces an observation.

  • An outcome is a possible observation.

  • The set of all possible outcomes is called the sample space.

  • An event is a subset of the sample space.

Example Roll a die and observe the number of dots on the top face. This is an experiment, with six possible outcomes. The sample space is the set \(S = \{1,2,3,4,5,6\}\). The event “roll higher than 3” is the set \(\{4,5,6\}\).

Example Stop a random person on the street and ask them what month they were born. This experiment has the twelve months of the year as possible outcomes. An example of an event \(E\) might be that they were born in a summer month, \(E = \{June, July, August\}\).

Example Suppose a traffic light stays red for 90 seconds each cycle. While driving you arrive at this light, and observe the amount of time that you are stopped until the light turns green. The sample space is the interval of real numbers \([0,90]\). The event “you didn’t have to stop” is the set \(\{0\}\).

Since events are, by their very definition, sets, it will be useful for us to review some basic set theory.

Definition 2.1 Let \(A\) and \(B\) be events in a sample space \(S\).

  1. \(A \cap B\) is the set of outcomes that are in both \(A\) and \(B\).
  2. \(A \cup B\) is the set of outcomes that are in either \(A\) or \(B\) (or both).
  3. \(\overline{A}\) is the set of outcomes that are not in \(A\) (but are in \(S\)).
  4. \(A \setminus B\) is the set of outcomes that are in \(A\) and not in \(B\).

Example Suppose that the sample space \(S\) consists of the positive integers. Let \(A\) be the set of all positive even numbers, and let \(B\) be the set of all prime numbers, so \(A = \{2, 4, 6, \ldots\}\) and \(B = \{2, 3, 5, 7, 11, \ldots\}\). Then,

  1. \(A \cap B = \{2\}\)
  2. \(A \cup B = \{2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 22, \ldots\}\)
  3. \(\overline{A}\) is the set of all positive odd integers.
  4. \(B \setminus A\) is the set of odd prime numbers.

Definition 2.2 Let \(S\) be a sample space. A valid probability assigns to each event \(E\) a number \(P(E)\) between 0 and 1 (inclusive), so \(0 \leq P(E) \leq 1\), satisfying the following axioms.

  1. The probability of the sample space is 1, \(P(S) = 1\).
  2. The probability of the empty set is 0, \(P(\emptyset) = 0\).
  3. Probabilities are monotonic: if \(A \subset B\), then \(P(A) \le P(B)\).
  4. Probabilities are countably additive: If \(A_1, A_2, \ldots\) are pairwise disjoint, then \[ P\bigl(\cup_{n=1}^\infty A_n\bigr) = \sum_{n= 1}^\infty P(A_n) \]

The four axioms above are not a minimal set of axioms. For example, countable additivity implies both \(P(\emptyset) = 0\) and monotonicity. In this book, we will not be concerned with carefully describing which subsets of \(S\) have an associated probability; we will assume that any event of interest does.

There are multiple possible interpretations of a probability. One interpretation, the one used in this book, is that if the probability of an event \(E\) is \(p\), and you repeat the experiment many times, then the proportion of times that the event occurs will eventually be close to \(p\). Another interpretation is that the probability of an event measures the degree of certainty that a person has about whether the event occurs.

To understand the difference, consider the following thought experiment. Suppose that I am about to toss a coin, and I ask you to estimate the probability that the coin will land on Heads. Not knowing any reason to think otherwise, you might say that you estimate it to be \(p = 0.5\). From the frequentist point of view, that would mean that you believe that if I repeat the experiment infinitely many times, then the proportion of times that it is Heads will converge to 0.5. From the certainty of belief point of view, you believe that each outcome (Heads/Tails) is equally likely.

Now, suppose that I flip the coin and look at it, but don’t tell you whether it is Heads or Tails. At this point, there is nothing random left to repeat, so under the first interpretation it wouldn’t make sense for you to say that the probability is \(p = 0.5\). However, your degree of certainty about the outcome hasn’t changed; it is still \(p = 0.5\). We bring up this example to illustrate the importance of randomness in the experiments that we study in this section.

For the remainder of this chapter, we will be calculating probabilities of events.

Probabilities obey some important rules, which are consequences of the axioms.

Theorem Let \(A\) and \(B\) be events in the sample space \(S\).

  1. If \(A\) and \(B\) are disjoint, then \(P(A \cup B) = P(A) + P(B)\).
  2. \(P(A) = 1 - P(\overline{A})\), where \(\overline{A} = S \setminus A\).
  3. \(P(A \setminus B) = P(A) - P(A \cap B)\).
  4. \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\).

Proof. We sketch the proof of these results. For 1, note that in Axiom 4 of probability we can set \(A_1 = A\), \(A_2 = B\), and \(A_3, A_4, \ldots\) all equal to the empty set. The result follows. For 3, we have that \(A = (A \cap B) \cup (A \setminus B)\), where \(A \cap B\) and \(A \setminus B\) are disjoint. We have \(P(A) = P(A \cap B) + P(A \setminus B)\), which gives the result. Item 2 is the special case of 3 with \(A\) replaced by \(S\) and \(B\) replaced by \(A\). To prove 4, we note that \(A \cup B = A \cup (B \setminus A)\), where \(A\) and \(B \setminus A\) are disjoint. Therefore, \(P(A \cup B) = P(A) + P(B \setminus A) = P(A) + P(B) - P(A \cap B)\) by 1 and 3.

One way to assign probabilities to events is empirically, by repeating an experiment many times and observing the proportion of times the event occurs. While this can only approximate the true probability, it is sometimes the only approach possible. For example, in the United States the probability of being born in October is noticeably higher than the probability of being born in January, and these values can only be estimated by observing actual patterns of human births.

Another method is to make an assumption that all outcomes are equally likely, usually because of some physical property of the experiment. For example, because (high quality) dice are close to perfect cubes, one believes that all six sides of a die are equally likely to occur. Using the additivity of disjoint events (rule 4 in the definition of probability),

\[ P(\{1\}) + P(\{2\}) + P(\{3\}) + P(\{4\}) + P(\{5\}) + P(\{6\}) = P(\{1,2,3,4,5,6\}) = 1 \]

Since all six probabilities are equal and sum to 1, the probability of each face occurring is \(1/6\). In this case, the probability of an event \(E\) can be computed by counting the number of elements in \(E\) and dividing by the number of elements in \(S\).

Example Suppose that two six-sided dice are rolled and the numbers appearing on the dice are observed. The sample space, \(S\), is given by

\[ \begin{pmatrix} (1,1), (1,2), (1,3), (1,4), (1,5), (1,6) \\ (2,1), (2,2), (2,3), (2,4), (2,5), (2,6) \\ (3,1), (3,2), (3,3), (3,4), (3,5), (3,6) \\ (4,1), (4,2), (4,3), (4,4), (4,5), (4,6) \\ (5,1), (5,2), (5,3), (5,4), (5,5), (5,6) \\ (6,1), (6,2), (6,3), (6,4), (6,5), (6,6) \end{pmatrix} \]

  1. By the symmetry of the dice, we expect all 36 possible outcomes to be equally likely. So the probability of each outcome is \(1/36\).

  2. The event “The sum of the dice is 6” is represented by

\[ E = \{(1,5), (2,4), (3,3), (4,2), (5,1)\} \]

  3. The probability that the sum of two dice is 6 is given by \[ P(E) = \frac{|E|}{|S|} = \frac{5}{36}, \] which can be obtained by simply counting the number of elements in each set above.

  4. Let \(F\) be the event “At least one of the dice is a 2.” This event is represented by

\[ F = \{(2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (1,2), (3,2), (4,2), (5,2), (6,2)\} \]

and the probability of \(F\) is \(P(F) = \frac{11}{36}\).

  5. \(E \cap F = \{(2,4), (4,2)\}\) and \(P(E \cap F) = \frac{2}{36}\).

  6. \(P(E \cup F) = P(E) + P(F) - P(E \cap F) = \frac{5}{36} + \frac{11}{36} - \frac{2}{36} = \frac{14}{36}\).

  7. \(P(\overline{E}) = 1 - P(E) = \frac{31}{36}\).
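These values can be double-checked in R by enumerating all 36 equally likely outcomes; expand.grid is one convenient way to do so:

```r
S <- expand.grid(die1 = 1:6, die2 = 1:6)  # all 36 equally likely outcomes
in_E <- S$die1 + S$die2 == 6              # event E: the sum is 6
in_F <- S$die1 == 2 | S$die2 == 2         # event F: at least one die is 2
mean(in_E)         # P(E) = 5/36
mean(in_F)         # P(F) = 11/36
mean(in_E & in_F)  # P(E and F) = 2/36
```

Because the 36 rows are equally likely, the mean of a logical vector is exactly the probability of the corresponding event.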

2.2 Conditional Probability and Independence

Sometimes when considering multiple events, we have information that one of the events has occurred. This new information requires us to reconsider the probability that the other event occurs. For example, suppose that you roll two dice and one of them falls off of the table where you cannot see it, while the other one shows a 4. We would want to update the probabilities associated with the sum of the two dice based on this information. The new probability that the sum of the dice is 2 would be 0, the new probability that the sum of the dice is 5 would be 1/6 because that is just the probability that the die that we cannot see is a “1,” and the new probability that the sum of the dice is 7 would also be 1/6 (which is the same as its original probability).

Formally, we have the following definition.

Definition 2.3 Let \(A\) and \(B\) be events in the sample space \(S\), with \(P(B) \not= 0\). The conditional probability of \(A\) given \(B\) is \[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

We read \(P(A|B)\) as “the probability of \(A\) given \(B\).” Be careful: in R, the \(|\) symbol denotes the “or” operator, but \(P(A|B)\) does not mean the probability of \(A\) or \(B\); that would be written \(P(A \cup B)\). It is important to keep these distinct. We also note that \(P(A|B)\) does not mean the probability of an event called \(A|B\); there is no event in the sample space \(S\) that corresponds to “\(A|B\).” Rather, \(P(A|B)\) is a fixed idiom meaning the probability of \(A\) given \(B\), that is, the probability that \(A\) occurs given that \(B\) occurs.

The general process of assuming that \(B\) occurs and making computations under that assumption is called conditioning on \(B\). Note that in order to condition on \(B\) in the definition of \(P(A|B)\), we must assume that \(P(B)\not= 0\), since otherwise we would get \(\frac{0}{0}\), which is undefined. This also makes intuitive sense: if we condition on an event of probability zero, then probabilities conditioned on that event cannot be meaningfully defined.

Example Two dice are rolled. What is the probability that both dice are 4, given that the sum of two dice is 8?

Solution: Let \(A\) be the event “both dice are 4” and \(B\) be the event “the sum is 8”. Then, \(P(A|B) = P(A \cap B)/P(B) = \frac{1/36}{5/36} = 1/5.\) Note that this is the hardest way to get an 8; the probability that one of the dice is 3 and the other is 5 is 2/5.

Note that the order is important. If instead, we had asked “What is the probability that the sum of two dice is 8, given that both dice are 4?” the answer would be 1. Formally, with the events defined as in the previous paragraph, \(P(B|A) = P(B \cap A)/P(A) = \frac{1/36}{1/36} = 1\).

Example Show that \(P((A \cap B)|B) = P(A|B)\).

Solution: \[\begin{align*} P((A \cap B)|B) &= P((A \cap B) \cap B)/P(B)\\ &= P(A\cap (B \cap B))/P(B)\\ &= P(A \cap B)/P(B)\\ &= P(A|B). \end{align*}\]

We included parentheses around \((A \cap B)\) above, but we did not need to. Remember, there is no event that is associated with \(B|B\), so the only possible interpretation of \(P(A \cap B | B)\) is \(P((A \cap B)|B)\).

Note that this makes intuitive sense; we assume that we know that \(B\) occurs. In this case, the probability that both \(A\) and \(B\) occur is just the probability that \(A\) occurs. A similar argument shows that \(P((A \cup B)|B) = 1\).

From the discussion above, we have seen that the probability of \(A\) given \(B\) can be larger than \(P(A)\), smaller than \(P(A)\), or equal to \(P(A)\). Of particular interest are pairs of events \(A\) and \(B\) such that knowledge that one of the events occurs does not impact the probability that the other event occurs.

Definition 2.4 Two events are said to be independent if knowledge that one event occurs does not give any probabilistic information as to whether the other event occurs. Formally, we say that \(A\) and \(B\) are independent if \(P(A \cap B) = P(A) P(B)\).

It is not immediately clear why the formal statement in the definition of independence implies the intuitive statement that “the knowledge that one event occurs does not give any probabilistic information as to whether the other event occurs.” To see that, we assume \(P(B) \not= 0\) and compute:

\[ P(A|B) = P(A \cap B)/P(B) = \frac{P(A)P(B)}{P(B)} = P(A) \]

A similar computation shows that if \(P(A) \not= 0\), then \(P(B|A) = P(B)\), which proves the following theorem.

Theorem 2.1 Let \(A\) and \(B\) be events with non-zero probability in the sample space \(S\). The following are equivalent.

  1. \(P(A \cap B) = P(A)P(B)\)
  2. \(P(A|B) = P(A)\)
  3. \(P(B|A) = P(B)\)

Events \(A\) and \(B\) are said to be dependent if they are not independent.

Example Two dice are rolled. Let \(A\) be the event “The first die is a 5”, let \(B\) be the event “The sum of the dice is 7”, and let \(C\) be the event “The sum of the dice is 8.” Show that \(A\) and \(B\) are independent, but \(A\) and \(C\) are dependent.

Note that \(P(B) = 6/36 = 1/6\). Now, \(P(B|A) = P(A \cap B)/P(A) = \frac{1/36}{1/6} = 1/6\). Therefore, \(A\) and \(B\) are independent. However, \(P(C) = 5/36\), which is not the same as \(P(C|A) = \frac{1/36}{1/6} = 1/6\). Therefore, \(A\) and \(C\) are not independent. We note here that \(B\) and \(C\) are also not independent. Indeed, \(P(B \cap C) = 0 \not= P(B) P(C)\).

Many times later in this book, independence will be an assumption that we make about events, rather than a property of events that we prove. It is important to develop an intuition about independence in order to determine whether such assumptions are reasonable. Consider the following scenarios, and determine whether the events indicated are most likely dependent or independent.

  1. A randomly selected day in the last 365 days is selected, and \(A\) is the event that the high temperature in St. Louis, Missouri on that day was greater than 90 degrees, while \(B\) is the event that the high temperature on that same day in Cape Town, South Africa was greater than 90 degrees.
  2. Two coins are flipped, and \(A\) is the event that the first coin lands on Heads, while \(B\) is the event that the second coin lands on Heads.
  3. Six patients are given a tuberculosis skin test, which requires a professional to estimate the size of the reaction to the tuberculin agent. Two professionals, Alice and Bob, are randomly chosen. Let \(A\) be the event that Alice estimates the size of the reaction in each of patients 1-5 to be larger than Bob does. Let \(B\) be the event that Alice estimates the size of the reaction in patient 6 to be larger than Bob does.

Discussion: in scenario 1, the events are dependent. If we know that the high temperature in St Louis was greater than 90 degrees, then the day was most likely a day in June, July, August or September, which gives us probabilistic information about whether the high temperature in Cape Town was greater than 90 degrees on that day. (In this case, it means that it was very unlikely, since that is winter in the southern hemisphere.)

In scenario 2, the events are independent, or at least approximately so. One could argue that knowing the first coin landed on Heads says something about the person tossing the coins, making Heads slightly more likely on the second toss. However, any such effect is so weak that, based on experience, it is reasonable to assume the events are independent.

In scenario 3, it may be inadvisable to assume that the events are independent. Of course, they may be. It could be that Alice and Bob are well-trained, and there is no bias in their measurements. However, it is also possible that there is something systematic about how they measure the reactions so that one of them usually measures it as larger than the other one does. Knowing \(A\) may be an indication that Alice does systematically measure reactions as larger than Bob does. (Of course, it would also be interesting to know which one was closer to the true value, but that is not what we are worried about at this point.) Later, we will develop tools that will allow us to make a more quantitative statement about this type of problem.

Example In a certain hotel near the US/Canada border, 70% of hotel guests are American and 30% are Canadian. It is known that 40% of Americans wear white socks, while 20% of Canadians wear white socks. Suppose you randomly select a person and observe that they are wearing white socks. What is the probability that the person is Canadian?

To solve this, let’s first set notation. Let \(A\) be the event that a randomly selected person is Canadian. Let \(B\) denote the event that a randomly selected person is wearing white socks. We are asked to find \(P(A|B)\). The problem gives us \(P(A) = 0.3\) and \(P(B|A) = 0.2\), but it does not directly give \(P(A|B)\). To finish the problem, we need a new technique, given in the following theorem.

Theorem 2.2 (Bayes’ Rule) Let \(A\) and \(B\) be events in the sample space \(S\), with \(P(A) \not= 0\) and \(0 < P(B) < 1\). \[ P(B|A) = \frac{P(A|B)P(B)}{P(A)} = \frac{P(A|B)P(B)}{P(A|B)P(B) + P(A|\overline{B})P(\overline{B})} \]

To finish this problem, then, we apply Bayes’ rule and see that \[\begin{align*} P(A|B) &= \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\overline{A})P(\overline{A})}\\ &= \frac{0.2 \times 0.3}{0.2 \times 0.3 + 0.4 \times 0.7} = 0.176 \end{align*}\]
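The arithmetic can be verified directly in R:

```r
p_canadian <- 0.3        # P(A)
p_socks_canadian <- 0.2  # P(B | A)
p_american <- 0.7        # P(A-bar)
p_socks_american <- 0.4  # P(B | A-bar)
p_socks_canadian * p_canadian /
  (p_socks_canadian * p_canadian + p_socks_american * p_american)
# approximately 0.176
```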

Bayes’ Rule will not come up much in this book, with the exception of Chapter 16, which is devoted to conditional probability and Bayesian statistics.

Example The name “Mary” was given to 7065 girls in 1880, and to 11475 girls in 1980. There were 97583 girls born in 1880, and 177907 girls born in 1980. Suppose that a randomly selected girl born in 1880 or 1980 is chosen. What is the probability that the girl’s name is “Mary”?

To solve this, let’s let \(A\) be the event that the randomly selected girl’s name is Mary. If we knew what year the girl was born in, then we would have a good idea what to do. We don’t, so we condition on the birth year. Let \(B\) be the event that the randomly selected girl was born in 1880. We need a new fact.

Theorem 2.3 (Law of Total Probability) Let \(A\) and \(B\) be events in the sample space \(S\), with \(0 < P(B) < 1\). \[ P(A) = P(A \cap B) + P(A \cap \overline{B}) = P(A|B)P(B) + P(A|\overline{B})P(\overline{B}) \]

Continuing with the example, we have \[\begin{align*} P(A) &= P(A|B)P(B) + P(A|\overline{B})P(\overline{B}) \\ &= \frac{7065}{97583} \cdot \frac{97583}{97583 + 177907} + \frac{11475}{177907} \cdot \frac{177907}{97583 + 177907}\\ &\approx 0.0673 \end{align*}\]
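The same computation in R:

```r
p_1880 <- 97583 / (97583 + 177907)  # P(B): born in 1880
p_mary_1880 <- 7065 / 97583         # P(A | B)
p_mary_1980 <- 11475 / 177907       # P(A | B-bar)
p_mary_1880 * p_1880 + p_mary_1980 * (1 - p_1880)
```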

2.3 Counting Arguments

Given a sample space \(S\) consisting of equally likely simple events, and an event \(E\), recall that \(P(E) = \frac{|E|}{|S|}\). For this reason, it can be useful to be able to carefully enumerate the elements in a set. While an interesting topic, this is not a point of emphasis of this book, as (1) we assume that students have seen some basic counting arguments in the past and (2) we emphasize simulations when computing probabilities gets challenging.

The interested reader who would like some review of elementary combinatorics can see Khan Academy for more examples of this type of reasoning at a similar level to this text. Another source with even more examples is the first chapter of Discrete and Combinatorial Mathematics, by Ralph Grimaldi. Note that the 4th edition is completely acceptable, and much cheaper than the 5th.

We recall for future reference two counting rules that we will use later in this text.

Proposition 2.1 (Rule of product) If there are \(m\) ways to do something, and for each of those \(m\) ways there are \(n\) ways to do another thing, then there are \(m \times n\) ways to do both things.

Proposition 2.2 (Combinations) The number of ways of choosing \(k\) distinct objects from a set of \(n\) is given by \[ {n \choose k} = \frac{n!}{k!(n - k)!} \]

The R command for computing \({10 \choose 3}\) is choose(10,3).

Example

A coin is tossed 10 times. Some possible outcomes are HHHHHHHHHH, HTHTHTHTHT, and HHTHTTHTTT. Since each toss has two possibilities, the rule of product says that there are \(2\cdot 2\cdot 2\cdot 2\cdot 2\cdot 2\cdot 2\cdot 2\cdot 2\cdot 2 = 2^{10} = 1024\) possible outcomes for the experiment. We expect each possible outcome to be equally likely, so the probability of any single outcome is 1/1024.

Let \(E\) be the event “We flipped exactly three heads”. This might happen as the sequence HHHTTTTTTT, or TTTHTHTTHT, or many other ways. What is \(P(E)\)? To compute the probability, we need to count the number of possible ways that three heads may appear. Since the three heads may appear in any 3 of the ten slots, the answer is

\[ |E| = {10 \choose 3} = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} = 120. \]

Then \(P(E) = 120/1024 \approx 0.117\).
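In R, the entire computation is one line:

```r
choose(10, 3) / 2^10  # 120/1024
```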

Example Suppose that in a class of 10 boys and 10 girls, 5 students are randomly chosen to present work at the board. a. What is the probability that all 5 students are boys? b. What is the probability that exactly 4 of the students are girls?

Let \(E\) be the event “all 5 students are boys.” The sample space consists of all ways of choosing 5 students from a class of 20, so \(|S| =\) choose(20,5) = 15504. The event \(E\) consists of all ways of choosing 5 boys from a group of 10, so \(|E| =\) choose(10, 5) = 252. Therefore, the probability is 252/15504 = .016.

Next, let \(A\) be the event “exactly 4 of the students are girls.” The sample space is still the same. The event \(A\) can be broken down into two tasks: choose the 4 girls and choose the 1 boy. By the multiplication principle, there are choose(10, 4) * choose(10,1) = 2100 ways of doing that. Therefore, the probability is 2100/15504 = .135.
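Both parts can be computed in R with choose:

```r
choose(10, 5) / choose(20, 5)                  # part a: all five are boys
choose(10, 4) * choose(10, 1) / choose(20, 5)  # part b: exactly four girls
```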

2.4 Simulations

One of the advantages of using R is that we can often estimate probabilities using simulations.

For the purposes of simulation, one of the most important ways of creating a vector is with the sample() command. Here are some facts about sample.

  1. sample takes up to 4 arguments, though only the first argument x is required.
  2. the parameter x is the vector of elements from which you are sampling.
  3. size is the number of samples you wish to take.
  4. replace determines whether you are sampling with replacement or not. Sampling without replacement means that sample will not pick the same value twice, and this is the default behavior. Pass replace = TRUE to sample if you wish to sample with replacement.
  5. prob is a vector of probabilities or weights associated with x. It should be a vector of nonnegative numbers of the same length as x. If the sum of prob is not 1, it will be normalized.
  6. If only x is supplied, then sample provides a random permutation of the values in x.

  7. To get a random number between 1 and 10 (inclusive), use
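One way to write this call:

```r
sample(x = 1:10, size = 1)
```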

## [1] 2

To get 3 distinct numbers between 1 and 10, use
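For example:

```r
sample(x = 1:10, size = 3, replace = FALSE)
```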

## [1] 2 9 8

Note: You don’t have to type replace = FALSE, because FALSE is the default value for whether sampling is done with replacement. You also don’t have to type x = ... or size = ... as long as these are the first and second arguments. However, it is sometimes clearer to explicitly name the arguments to complicated functions like sample. Use your best judgment, and include the parameter name if there is any doubt.

  8. To simulate 8 rolls of a six-sided die, use
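For example:

```r
sample(x = 1:6, size = 8, replace = TRUE)
```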
## [1] 5 6 4 2 4 1 6 6

Note that you can store these in variables:

and see what the sum of the dice would be:
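For example, with dice as the (arbitrary) variable name:

```r
dice <- sample(x = 1:6, size = 8, replace = TRUE)  # store the eight rolls
sum(dice)                                          # sum of the dice
```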

## [1] 24

Or, if you only care about the sum of the dice, you could even do
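For example:

```r
sum(sample(x = 1:6, size = 8, replace = TRUE))
```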

  9. Finally, if you want to sample from sets that aren’t just the first x integers, you can specify the vector from which to sample. The following takes a random sample of size 2 from the first 8 prime numbers, with repeats possible.
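For example:

```r
primes <- c(2, 3, 5, 7, 11, 13, 17, 19)  # the first 8 primes
sample(primes, size = 2, replace = TRUE)
```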
## [1] 17  7

2.4.1 Using replicate to simulate experiments

The function replicate is an example of an implicit loop in R. Suppose that expr is one or more R commands, the last of which returns a single value. The call

replicate(n, expr)

repeats the expression stored in expr n times, and stores the resulting values as a vector.

Example

Estimate the probability that the sum of two dice is 8.

The plan is to use sample to simulate rolling two dice. We will say that success is a sum of 8. Then, we use the replicate function to repeat the experiment many times, and take the mean number of successes to estimate the probability. Since the final R command is complicated, we will build it up piece by piece.

First, simulate a roll of two dice:
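A sketch, storing the roll in a variable (the name dice is one choice):

```r
dice <- sample(1:6, 2, replace = TRUE)
dice            # the two dice
sum(dice)       # their sum
sum(dice) == 8  # was this roll a success?
```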

## [1] 6 6
## [1] 12
## [1] FALSE

We now replicate the above code. You can use curly braces { and } to replicate more than one R command. I recommend placing each line of code on a new line inside of replicate, for readability, but it is also possible to separate each command with a semicolon. When we do this, only the result of the last command is saved in the vector via replicate. First, we test that our replication works by only doing a few trials:
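For example, with 20 trials:

```r
replicate(20, {
  dice <- sample(1:6, 2, replace = TRUE)
  sum(dice) == 8
})
```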

##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Finally, we want to compute the probability of success. R counts TRUE as 1 and FALSE as 0. If we take the average of a vector of TRUE/FALSE values, we get the number of TRUE divided by the size of the vector, which is exactly the proportion of times that success occurred. We replicate 10000 times for a more accurate estimate.
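Putting the pieces together:

```r
event <- replicate(10000, {
  dice <- sample(1:6, 2, replace = TRUE)
  sum(dice) == 8
})
mean(event)  # simulated estimate of the probability
5 / 36       # exact value, for comparison
```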

## [1] 0.1401
## [1] 0.1388889

We compare the simulated result to the true result, which is 5/36. Generally, the more samples you take using replicate(), the more accurate you can expect your simulation to be. It is often a good idea to repeat a simulation a couple of times to get an idea about how much variance there is in the results.

The workflow we will be using in the rest of this book is to first store the simulated data in a variable, and then compute the probabilities.
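For example, storing the simulated sums and then computing the probability:

```r
sim_data <- replicate(10000, {
  dice <- sample(1:6, 2, replace = TRUE)
  sum(dice)
})
mean(sim_data == 8)
```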

## [1] 0.1377

Example

Three dice are thrown. Estimate the probability that the largest value is a 4.
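First, simulate one roll of three dice and observe the largest value:

```r
max(sample(1:6, 3, replace = TRUE))
```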

## [1] 5

Note that in the above, we have simulated running an experiment where three dice are rolled and the largest value is observed. We can use this to estimate the probability that the largest value is a 4 in the following way:
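A sketch, storing the simulated maxima in a variable sim_data:

```r
sim_data <- replicate(10000, {
  dice <- sample(1:6, 3, replace = TRUE)
  max(dice)
})
mean(sim_data == 4)
```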

## [1] 0.1766

It is not always easy to simulate the experiment directly (see the next example), but when it is, it is often beneficial to do so. For example, if we wanted to estimate the probability that the largest die is a 6, we would not need to re-simulate; we could simply compute
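With sim_data holding the maxima simulated above:

```r
mean(sim_data == 6)  # reuses the stored maxima; no new simulation needed
```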

## [1] 0.4126

Example

A fair coin is repeatedly tossed. Estimate the probability that you observe Heads for the third time on the 10th toss.

As before, we build this up from scratch. Here is a sample of ten tosses of a coin:
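A sketch, storing the tosses in a vector (the name coin_toss is one choice):

```r
coin_toss <- sample(c("H", "T"), 10, replace = TRUE)
coin_toss
```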

##  [1] "T" "H" "H" "T" "T" "H" "H" "T" "T" "H"

In order for this to be a success, we need for there to be exactly 3 heads, so we count the number of heads:
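Assuming the tosses are stored in coin_toss, as above:

```r
sum(coin_toss == "H")  # number of heads in the ten tosses
```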

## [1] 5

And we check to see whether it is equal to 3
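In R:

```r
sum(coin_toss == "H") == 3
```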

## [1] FALSE

Next, we also need to make sure that we didn’t have 3 heads in the first nine tosses. So, we look only at the first nine tosses:
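Vector indexing keeps only the first nine tosses:

```r
coin_toss[1:9]
```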

## [1] "T" "H" "H" "T" "T" "H" "H" "T" "T"

and check that exactly two of the first nine tosses were heads (so that the head on toss ten would be the third):
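In R, counting the heads among the first nine tosses and comparing to 2:

```r
sum(coin_toss[1:9] == "H") == 2
```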

## [1] FALSE

Note that both of those have to be true in order for this to be a success:
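Combining the two conditions with the logical “and” operator &&:

```r
sum(coin_toss == "H") == 3
sum(coin_toss[1:9] == "H") == 2
sum(coin_toss == "H") == 3 && sum(coin_toss[1:9] == "H") == 2
```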

## [1] FALSE
## [1] FALSE
## [1] FALSE

We put this inside replicate
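The full simulation:

```r
sim_data <- replicate(10000, {
  coin_toss <- sample(c("H", "T"), 10, replace = TRUE)
  sum(coin_toss == "H") == 3 && sum(coin_toss[1:9] == "H") == 2
})
mean(sim_data)
```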

## [1] 0.0375

NOTE: I strongly recommend that you follow the workflow as presented above; namely,

  1. Write code that performs the experiment a single time.
  2. Place it inside sim_data <- replicate(10000, { HERE }).
  3. Compute probability using mean(sim_data).

It is much easier to trouble-shoot your code this way, as you can test each line of your simulation separately.

Example (The Birthday Problem)

Estimate the probability that out of 25 randomly selected people, no two will have the same birthday. Assume that all birthdays are equally likely and that no one was born on leap day.

In order to do this, we need to be able to determine whether all of the elements in a vector are unique. R has many, many functions that can be used with vectors. For most things that you want to do, there will be an R function that does it. In this case it is anyDuplicated(), which returns the location of the first duplicate if there are any, and zero otherwise.

The important thing to learn here isn’t necessarily this particular function, but rather the fact that most tasks are possible via some built in functionality.
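A sketch of the build-up (the variable names are one choice):

```r
birthdays <- sample(1:365, 25, replace = TRUE)  # 25 random birthdays
birthdays
anyDuplicated(birthdays)                        # 0 means all distinct
replicate(6, anyDuplicated(sample(1:365, 25, replace = TRUE)))
sim_data <- replicate(10000, {
  birthdays <- sample(1:365, 25, replace = TRUE)
  anyDuplicated(birthdays) == 0                 # success: no shared birthday
})
mean(sim_data)
```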

##  [1] 334 160 187 351 123 129 347   8 220 351 301 169 248 220 360 265  37 170 326
## [20] 338 314 202 184 108  11
## [1] 0
## [1] 17  0  0  0  0  0
## [1] 0.4378

Here, we have built the simulation up from the ground, first simulating the 25 birthdays, then determining whether any two birthdays are the same, and then replicating the experiment. Note the use of mean in the last line to compute the proportion of successes. “Success” here is the event “no two people have the same birthday”, and the probability of this event is approximately 0.43. Interestingly, this means it is actually quite likely (about 57%) that a group of 25 people will contain two with the same birthday.

Another approach is to use the idiom length(unique()) to count the number of distinct birthdays that are sampled, and see whether that is equal to 25, the total number of birthdays sampled.
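For example:

```r
birthdays <- sample(1:365, 25, replace = TRUE)
length(unique(birthdays)) == 25
sim_data <- replicate(10000, {
  birthdays <- sample(1:365, 25, replace = TRUE)
  length(unique(birthdays)) == 25
})
mean(sim_data)
```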

## [1] FALSE
## [1] 0.4265

Challenge

Modify the above code to take into account leap years. Is it reasonable to believe that birthdays throughout the year are equally likely? Later in this book, we will use birthday data to get a better approximation of the probability that out of 25 randomly selected people, no two will have the same birthday.

Example Three numbers are picked uniformly at random from the interval \((0,1)\). What is the probability that a triangle can be formed whose side lengths are the three numbers that you chose?

Solution: We need to be able to simulate picking three numbers at random from the interval \((0, 1)\). We will see later in the book that the way to do this is via runif(3, 0, 1), which returns 3 numbers randomly chosen between 0 and 1. We then need to check whether the sum of the two smaller numbers is larger than the largest number. We use the sort command to sort the three numbers into increasing order, as follows:
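A sketch:

```r
sim_data <- replicate(10000, {
  sides <- sort(runif(3, 0, 1))   # three uniform numbers, sorted ascending
  sides[1] + sides[2] > sides[3]  # triangle inequality for the largest side
})
mean(sim_data)
```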

## [1] 0.4976

Example According to Rick Wicklin, the proportions of M&M’s of various colors produced in the New Jersey M&M factory are as follows.

Color Proportion
Blue 25.0
Orange 25.0
Green 12.5
Yellow 12.5
Red 12.5
Brown 12.5

If you buy a bag from the New Jersey factory that contains 35 M&M’s, what is the probability that it will contain exactly 9 Blue and 5 Red M&M’s?

To do this, we can use the prob argument in the sample function, as follows.
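A sketch; the color names and weights follow the table above, and sample normalizes the weights automatically:

```r
colors <- c("Blue", "Orange", "Green", "Yellow", "Red", "Brown")
weights <- c(25, 25, 12.5, 12.5, 12.5, 12.5)
bag <- sample(colors, 35, replace = TRUE, prob = weights)
sum(bag == "Blue")  # Blue M&M's in one simulated bag
sim_data <- replicate(10000, {
  bag <- sample(colors, 35, replace = TRUE, prob = weights)
  sum(bag == "Blue") == 9 && sum(bag == "Red") == 5
})
mean(sim_data)
```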

## [1] 6
## [1] 0.0298

2.4.2 Simulating Conditional Probability

Simulating conditional probabilities can be a bit more challenging. In order to estimate \(P(A|B)\), we will estimate \(P(A \cap B)\) and \(P(B)\), and then divide the two estimates. This is not the most efficient way to estimate \(P(A|B)\), but it is easy to do with the tools that we have already developed.

Example

Two dice are rolled. Estimate the conditional probability that the sum of the dice is at least 10, given that at least one of the dice is a 6.

First, we estimate the probability that the sum of the dice is at least 10 and at least one of the dice is a 6.

Next, we estimate the probability that at least one of the dice is a 6.

Finally, we take the quotient.
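The three steps above might be carried out as follows (a sketch, not necessarily the book's original code; estimates vary from run to run):

```r
# Step 1: estimate P(A and B) -- sum at least 10 AND at least one 6
probAB <- mean(replicate(10000, {
  dice <- sample(1:6, 2, replace = TRUE)
  sum(dice) >= 10 && 6 %in% dice
}))

# Step 2: estimate P(B) -- at least one die is a 6
probB <- mean(replicate(10000, {
  dice <- sample(1:6, 2, replace = TRUE)
  6 %in% dice
}))

# Step 3: the quotient estimates P(A|B), which is 5/11 exactly
probAB / probB
```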

## [1] 0.4811954

Note that the correct answer is \(P(A\cap B)/P(B) = \frac{5/36}{11/36} = 5/11 \approx 0.4545\). This is not the ideal way to estimate conditional probabilities using simulation. Better would be to have a single repeated sample, from which we estimate both the probability of the intersection and the probability of the event. For simplicity, though, we will be satisfied with this estimate for the purposes of this book.
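The single-sample approach mentioned above might look like this (an illustrative sketch, not the book's code): simulate once, recording both events for each trial, and estimate the conditional probability from the same trials.

```r
# Each column records (A, B) for one roll of two dice
trials <- replicate(10000, {
  dice <- sample(1:6, 2, replace = TRUE)
  c(A = sum(dice) >= 10, B = 6 %in% dice)
})

# Among trials where B occurred, the proportion where A also occurred
sum(trials["A", ] & trials["B", ]) / sum(trials["B", ])
```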

2.5 Exercises

  1. Two dice are rolled.
    1. What is the probability that the sum of the numbers is exactly 10?
    2. What is the probability that the sum of the numbers is at least 10?
    3. What is the probability that the sum of the numbers is exactly 10, given that it is at least 10?
     
  2. A hat contains six slips of paper with the numbers 1 through 6 written on them. Two slips of paper are drawn from the hat (without replacement), and the sum of the numbers is computed.
    1. What is the probability that the sum of the numbers is exactly 10?
    2. What is the probability that the sum of the numbers is at least 10?
    3. What is the probability that the sum of the numbers is exactly 10, given that it is at least 10?
     
  3. When rolling two dice, what is the probability that one die is twice the other?

  4. Consider an experiment where you roll two dice, and subtract the smaller value from the larger value (getting 0 in case of a tie).
    1. What is the probability of getting 0?
    2. What is the probability of getting 4?
     
  5. Roll two dice, one white and one red. Consider these events:
    • \(A\): The sum is 7
    • \(B\): The white die is odd
    • \(C\): The red die has a larger number showing than the white
    • \(D\): The dice match (doubles)
    1. Which pair(s) of events are disjoint (events \(A\) and \(B\) are disjoint if \(A \cap B = \emptyset\))?
    2. Which pair(s) are independent?
    3. Which pair(s) are neither disjoint nor independent?
     
  6. Suppose you do an experiment where you select ten people at random and ask their birthdays. Here are three events:
    • \(A\) : all ten people were born in February
    • \(B\) : the first person was born in February
    • \(C\) : the second person was born in January
    1. Which pair(s) of these events are disjoint, if any?
    2. Which pair(s) of these events are independent, if any?
    3. What is \(P(B | A)\)?
     
  7. Suppose a die is tossed three times. Let \(A\) be the event “The first toss is a 5”. Let \(B\) be the event “The first toss is the largest number rolled” (the “largest” can be a tie). Determine, via simulation or otherwise, whether \(A\) and \(B\) are independent.

  8. Bob Ross was a painter with a PBS television show “The Joy of Painting” that ran for 11 years.
    1. 91% of Bob’s paintings contain a tree. 85% contain two or more trees.

      What is the probability that he painted a second tree, given that he painted a tree?

    2. 18% of Bob’s paintings contain a cabin. Given that he painted a cabin, there is a 35% chance the cabin is on a lake.

      What is the probability that a Bob Ross painting contains both a cabin and a lake?

    (Source: https://fivethirtyeight.com/features/a-statistical-analysis-of-the-work-of-bob-ross/)

     

  9. Rick Wicklin determined that in the Tennessee M&M’s factory, the proportion of M&M’s by color is: \[ \begin{array}{cccccc} Yellow & Red & Orange & Brown & Green & Blue \\ 0.14 & 0.13 & 0.20 & 0.12 & 0.20 & 0.21 \end{array} \]
    1. What is the probability that a randomly selected M&M is not green?
    2. What is the probability that a randomly selected M&M is red, orange, or yellow?
    3. Estimate the probability that a random selection of four M&M’s will contain a blue one.
    4. Estimate the probability that a random selection of six M&M’s will contain all six colors.
     
  10. With the distribution from problem 9, suppose you buy a bag of M&M’s with 30 pieces in it. Estimate the probability of obtaining at least 9 Blue M&M’s and at least 6 Orange M&M’s in the bag.

  11. Blood types O, A, B, and AB have the following distribution in the US: \[ \begin{array}{lcccc} \text{Type} & O & A & B & AB\\ \text{Probability} & 0.45 & 0.40 & 0.11 & 0.04 \end{array} \] What is the probability that two randomly selected people have the same blood type?

  12. Use simulation to estimate the probability that a 10 is obtained when two dice are rolled.

  13. Estimate the probability that exactly 3 Heads are obtained when 7 coins are tossed.

  14. Estimate the probability that the sum of five dice is between 15 and 20, inclusive.

  15. Suppose a die is tossed repeatedly, and the cumulative sum of all tosses seen is maintained. Estimate the probability that the cumulative sum ever is exactly 20. (Hint: the function cumsum computes the cumulative sums of a vector.)

  16. Rolling two dice
    1. Simulate rolling two dice and adding their values. Do 100000 simulations and make a bar chart showing how many of each outcome occurred.
    2. You can buy trick dice, which look (sort of) like normal dice. One die has numbers 5, 5, 5, 5, 5, 5. The other has numbers 2, 2, 2, 6, 6, 6.

      Simulate rolling the two trick dice and adding their values. Do 100000 simulations and make a bar chart showing how many of each outcome occurred.

    3. Sicherman dice also look like normal dice, but have unusual numbers.

      One die has numbers 1, 2, 2, 3, 3, 4. The other has numbers 1, 3, 4, 5, 6, 8.

      Simulate rolling the two Sicherman dice and adding their values. Do 100000 simulations and make a bar chart showing how many of each outcome occurred.

      How does your answer compare to part a?

     
  17. In a room of 200 people (including you), estimate the probability that at least one other person will be born on the same day as you.

  18. In a room of 100 people, estimate the probability that at least two people were not only born on the same day, but also during the same hour of the same day. (For example, both were born between 2 and 3 o’clock on the same day.)

  19. Assuming that there are no leapday babies and that all birthdays are equally likely, estimate the probability that at least three people have the same birthday in a group of 50 people. (Hint: try using table.)

  20. If 100 balls are randomly placed into 20 urns, estimate the probability that at least one of the urns is empty.

  21. A standard deck of cards has 52 cards, four each of 2,3,4,5,6,7,8,9,10,J,Q,K,A. In blackjack, a player gets two cards and adds their values. Cards count as their usual numbers, except Aces are 11 (or 1), while K, Q, J are all 10.
    1. “Blackjack” means getting an Ace and a value ten card. What is the probability of getting a blackjack?
    2. What is the probability of getting 19? (The probability that the sum of your cards is 19, counting an Ace as 11.)

    Use R to simulate dealing two cards, and compute these probabilities experimentally.

  22. Ultimate frisbee players are so poor they don’t own coins. So, team captains decide which team will play offense first by flipping frisbees before the start of the game. Rather than flip one frisbee and call a side, each team captain flips a frisbee, and one captain calls whether the two frisbees will land on the same side or on different sides. Presumably, they do this instead of just flipping one frisbee because a frisbee is not obviously a fair coin: the probability of one side seems likely to be different from the probability of the other side.
    1. Suppose you flip two fair coins. What is the probability they show different sides?
    2. Suppose two captains flip frisbees. Assume the probability that a frisbee lands convex side up is \(p\). Compute the probability (in terms of \(p\)) that the two frisbees match.
    3. Make a graph of the probability of a match in terms of \(p\).
    4. One Reddit user flipped a frisbee 800 times and found that in practice, the convex side lands up 45% of the time. When captains flip, what is the probability of “same”? What is the probability of “different”?
    5. What advice would you give to an ultimate frisbee team captain?
    6. Is the two frisbee flip better than a single frisbee flip for deciding the offense?
     
  23. Suppose you have two coins (or frisbees) that land with Heads facing up with probability \(p\), where \(0 < p < 1\). One coin is red and the other is white. You toss both coins. Find the probability that the red coin is Heads, given that the red coin and the white coin are different.

     
  24. Suppose that you have 10 boxes, numbered 0-9. Box \(i\) contains \(i\) red marbles and \(9 - i\) blue marbles. You perform the following experiment: pick a box at random, draw a marble, record its color, and return it to the box. Then draw a second and a third marble the same way, replacing the marble each time, so that all three marbles are drawn (with replacement) from the same box.
    1. If you draw three consecutive red marbles, what is the probability that the 4th marble will also be red?
    2. If you draw three consecutive red marbles, what is the probability that you chose the 9th box?