Simulating Probabilities

(Sections 6.1 - 6.2)

Probability and Expected Value

What is the Chance?

  • You pull three balls at random from an urn that contains 100 balls, 70 of which are red and 30 of which are black.
    • What is the chance that at least two of the balls are red?
  • Your favorite college basketball team is in the NCAA Tournament.
    • What is the chance that it will win the Championship?
  • You plan to live in Florida for the next ten years.
    • What is the chance that you will experience a Category 5 hurricane?

We are interested in finding the probability that a specified event will occur when a random process takes place.

Random Processes and Events

Random Process Event
selecting three balls at least two of them are red
March Madness your team wins it all
ten years in Florida category 5 storm hits at least once
ten people toss their hats at least one picks up her own hat

Definition of Probability

Probability of an Event

The long-run proportion of times that the event will occur if the underlying random process is repeated many, many times.

Monte Carlo Simulation

How to Estimate a Probability


Say you want the probability of being hit by a Cat-5 storm in the next ten years.

n <- a really big number
for (i in 1:n) {
  
  live in FL ten years
  
  record whether or not 
  a Cat-5 hit you
  (TRUE or FALSE)
}
probability <- 
  (number of TRUEs you recorded)/n
cat(
  "The chance of getting hit is ", 
  probability, ".", sep = ""
)

The bigger \(n\) is, the better your estimate of the probability should be.

This is Crazy!

  • It takes \(10n\) years to live in FL for ten years, \(n\) times!
  • And it’s expensive!
  • And you would need a time machine to repeat the process under the same conditions:
    • live in FL ten years
    • enter time machine, go back 10 years
    • live in FL another 10 years …

How do we get around this?

Monte Carlo Simulation


The solution is to make the computer model (simulate) the random process and keep track of the results.


Repetition will deliver results of each “trial” of the modeled random process.


This is called Monte Carlo simulation.

Example #1: Picking From a Box

A Box

Consider a box that holds ten tickets. Three of them are labeled with the letter “a”; the rest are labeled “b”.

Our random process:

We will draw one ticket from the box at random. (Each ticket is equally likely to be picked.)

Our probability question:

What is the probability of drawing an “a”-ticket?

Modeling the Box


tickets <- c(rep("a", 3), rep("b", 7))



tickets
 [1] "a" "a" "a" "b" "b" "b" "b" "b" "b" "b"

Modeling the Random Process


We want to model (simulate) the random process of picking a ticket from the box.

We need a good function for this.

Modeling the Process: the sample() Function


Try this:

sample(tickets, size = 1)


[1] "b"

Try it a few more times.

  • first argument is the vector to sample from
  • size says how many times to repeat the sampling

Repetition

To get an estimate of probability, we need to repeat the random process a number of times. (Let’s do 1000 times.)

You can do this with a loop:


results <- logical(length = 1000)
for ( i in 1:1000 ) {
  selection <- sample(tickets, size = 1)
  got_a <- selection == "a"
  results[i] <- got_a
}
results[1:10]
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

Estimate Probability of a-Ticket


Count how many times we got a-ticket:

sum(results)
[1] 315

Divide by the number of times we simulated:

sum(results) / 1000
[1] 0.315

The mean() will do this all at once:

mean(results)
[1] 0.315

Summary of Our Work

## set up the box:
tickets <- c(rep("a", 3), rep("b", 7))
## decide how many times to simulate:
reps <- 1000
## set up vector to store results:
results <- logical(length = reps)
## get many results with loop:
for (i in 1:reps) {
  ## pick from the box once:
  selection <- sample(tickets, size = 1) 
  ## figure out whether the event occurred:
  got_a <- selection == "a"
  ## store in results vector:
  results[i] <- got_a
}
## compute estimated probability of a:
mean(results)

Encapsulate Into a Function

ticket_sim <- function(reps) {
  tickets <- c(rep("a", 3), rep("b", 7))
  results <- logical(length = reps)
  for (i in 1:reps) {
    selection <- sample(tickets, size = 1)
    ## determine / store all in one line:
    results[i] <- selection == "a"
  }
  mean(results)
}

Try It Out With More Reps


ticket_sim(reps = 10000)
[1] 0.2957

Probability Estimation Template (Looping)

In general, when we use a loop to perform a Monte-Carlo simulation to estimate a probability, our code will follow this template:

results <- logical(length = reps_desired)
for (i in 1:reps_desired {
  ## some code to make the random process happen once
  ## more code to find out whether the event occurred
  results[i] <- whether_event_occurred
}
mean(results)

Repetition (no loop)


Actually, sample() can handle the repetition for you:


sims <- sample(tickets, size = 20, replace = TRUE)
sims
 [1] "a" "b" "a" "b" "b" "b" "b" "a" "b" "a" "b" "a" "b" "a" "b" "b" "a" "b" "a"
[20] "a"


replace says whether or not to replace the item you picked, after you sample.

No-Loop Version

ticket_sim_2 <- function(reps) {
  tickets <- c(rep("a", 3), rep("b", 7))
  ## can get all simulations at once:
  selections <- sample(tickets, size = reps, replace = TRUE)
  ## do all figuring in one big logical vector:
  got_a <- selections == "a"
  ## report estimated probability:
  mean(got_a)
}

Repeat MANY Times

How about repeating more times, say 100,000 times?

ticket_sim_2(reps = 100000)
[1] 0.2997


With more repetitions, we get a better estimate.

Law of Large Numbers

The Law of Large Numbers says:

The more times you repeat the underlying random process, the more accurate your estimate of the desired probability (or expected value) is liable to be.

Example #2: Tossing Hats

The Scenario


\(n\) people toss their hats into the air; each one randomly picks up a hat.


What is the chance that at least one person picks up his/her own hat?

Represent the People (and Their Hats)


Say you have ten people. Here they are:

people <- 1:10
hats <- 1:10

Simulate One Toss


Shuffle the hats randomly among the people:

hatsPickedUp <- sample(hats, size = 10, replace = FALSE)
people
 [1]  1  2  3  4  5  6  7  8  9 10
hatsPickedUp
 [1]  1  9 10  5  4  3  2  7  8  6

Determine Whether the Event Occurred

Did at least one person get her own hat?

people == hatsPickedUp
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
any(people == hatsPickedUp)
[1] TRUE

A Helper-Function

This function will help us understand one toss:

hatToss <- function(n) {
  ## set up people:
  people <- 1:n
  hats <- 1:n
  ## do one toss
  hatsPickedUp <- 
    sample(
      hats, size = n, 
      replace = FALSE
  )
  ## figure who gets own hat:
  response <- 
    ifelse(
      people == hatsPickedUp, 
      "WOO-HOO!", "meh"
    )
  ## we will learn about
  ## data framesin ch. 7:
  data.frame(
    people, 
    hatsPickedUp, 
    response
  )
}

hatToss(n = 10)
people hatsPickedUp response
1 8 meh
2 4 meh
3 6 meh
4 5 meh
5 7 meh
6 3 meh
7 2 meh
8 10 meh
9 9 WOO-HOO!
10 1 meh

Simulation Function


Parameters:

  • n is how many people are involved
  • reps is how many repetitions

Return: Estimate of probability

hatTossSim <- function(n, reps) {
  people <- 1:n
  hats <- 1:n
  results <- logical(reps)
  for (i in 1:reps) {
    hatsPickedUp <-
      sample(
        hats, size = n, 
        replace = FALSE)
    gotMatch <- 
      any(people == hatsPickedUp)
    results[i] <- gotMatch
  }
  mean(results)
}

Try It!


hatTossSim(n = 10, reps = 10000)
[1] 0.6286


Fun Fact:

As the number of people involved increases, the probability that at least one person gets her own hat converges to:

\[1- 1/\mathrm{e} \approx 0.63.\]