Further Simulation Examples

(Sections 6.5 - 6.7)

Will They Connect?

The Scenario: Anna and Raj

  • Anna and Raj make a date for coffee tomorrow at the local Coffee Shop.
  • Both of them forget the exact time they agreed to meet: they can only remember that it was to be sometime between 10 and 11am.
  • Each person, independently of the other, randomly picks a time between 10 and 11 to arrive.
  • Each person is willing to wait ten minutes for the other person to arrive.

Our Question

What is the chance that they meet?

This app will help you understand the situation.

Random Process

The random process is:

From 10 to 11am, watch for Anna and Raj enter/leave the coffeeshop.

Modeling involves getting Anna and Raj’s arrival times (in minutes after 10am):

Key Idea

Anna and Raj will connect if their arrival times are no more than 10 minutes apart.

That is, they connect if and only if

\[-10 < anna - raj < 10,\]

which is the same as

\[\vert\, anna - raj \,\vert < 10.\]

What Happens

Try this a few times:

Repetition

runif() can generate many values. Let’s use this fact to repeat eight times.

Encapsulate Simulations into a Function

meetupSim <- function(reps = 10000, table = FALSE, 
                      seed = NULL) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
  anna <- runif(reps, 0, 60)
  raj <- runif(reps, 0, 60)
  connect <- (abs(anna - raj) < 10)
  if ( table ) {
    cat("Here is a table of the results:\n\n")
    print(table(connect))
    cat("\n")
  }
  cat(
    "The proportion of times they met was ", 
    mean(connect), ".\n", sep = ""
    )
}

Try It Out

Vectorization vs. Looping

Two Ways to Repeat

  • to repeat the random process, we can always use a loop.
  • Sometimes, we are able to use vectorization to get all simulations at once.
  • Does it make a difference?

A Looping Meetup Simulation

meetupSim2 <- function(reps = 10000, table = FALSE, 
                       seed = NULL) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
   # set up vector to hold results
  connect <- logical(reps) 
  # loop to repeat random process
  for (i in 1:reps) { 
    # get one arrival time for anna
    anna <- runif(1, 0, 60)  
    # and a time for raj
    raj <- runif(1, 0, 60)
    they_meet <- abs(anna - raj) < 10
    # store in results vector:
    connect[i] <- they_meet
  }
  if ( table ) {
    cat("Here is a table of the results:\n\n")
    print(table(connect))
    cat("\n")
  }
  cat(
    "The proportion of times they met was ", 
    mean(connect), ".\n", sep = ""
  )
}

Comparison

system.time() records time to perform a task.

Note: If you ran this code in R Studio, it would be able to compute the “user” and “system” times, which add up to the total “elapsed” time”.

The Moral

Avoid looping if you can repeat with vectorization.

(But it doesn’t matter much, if your job is “small”.)

The Appeals Court Paradox

Scenario: An Appeals Court

  • Five judges on a court.
  • Majority vote decides the case.
  • Judges have different chances to make the correct decision:
    • Judge A has 95% chance to judge correctly;
    • Judge B has 94% chance to judge correctly;
    • Judge C has 90% chance to judge correctly;
    • Judge D has 90% chance to judge correctly;
    • Judge E has 80% chance to judge correctly.

What’s the chance they will vote the correct way?

Try this app to understand the situation.

Five Judges

Set up the scenario in R:

aProb <- 0.95
bProb <- 0.94
cProb <- 0.90
dProb <- 0.90
eProb <- 0.80

Now imagine that they hear a case.

How it goes for Judge A:

  • 0 means: Judge A makes the wrong decision;
  • 1 means: Judge A gets it right.

All Judges Hear the Case:

Correct Votes

Add to find the number of correct votes in each case:

Getting Case Right

Now we can find whether they got each case right or wrong:

Put It Together

Try it for yourself a few times, from scratch:

Encapuslate into a Function

courtSim <- function(reps,
                     probs = c(0.95, 0.94, 0.90, 0.90, 0.80),
                     seed = NULL) {
  
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # get the probabilities
  aProb <- probs[1]
  bProb <- probs[2]
  cProb <- probs[3]
  dProb <- probs[4]
  eProb <- probs[5]
  # simulate decisions of each judge:
  a <- sample(
    c(0, 1),
    size = reps, prob = c(1 - aProb, aProb),
    replace = TRUE
  ) # Judge A
  b <- sample(
    c(0, 1),
    size = reps, prob = c(1 - bProb, bProb),
    replace = TRUE
  ) # Judge B
  c <- sample(
    c(0, 1),
    size = reps, prob = c(1 - cProb, cProb),
    replace = TRUE
  ) # Judge C
  d <- sample(
    c(0, 1),
    size = reps, prob = c(1 - dProb, dProb),
    replace = TRUE
  ) # Judge D
  e <- sample(
    c(0, 1),
    size = reps, prob = c(1 - eProb, eProb),
    replace = TRUE
  ) # Judge E

  # count the number of correct votes in each case:
  correctVotes <- a + b + c + d + e
  # determine whether court decided correctly, in each case:
  courtCorrect <- (correctVotes >= 3)
  # report estimate of probability
  mean(courtCorrect)
}

Try It Out

Leaning on Judge E

Suppose the judges convince Judge E (the weakest one) to always vote the same as Judge A (the best one) does.

Will this increase the court’s chance of making the correct decision?

Simulation

courtSim2 <- function(reps = 10000,
                      probs = c(0.95, 0.94, 0.90, 0.90),
                      seed = NULL) {
  
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  
  # get the probabilities
  aProb <- probs[1]
  bProb <- probs[2]
  cProb <- probs[3]
  dProb <- probs[4]
  
  # simulate decisions:             
  a <- sample(
    c(0, 1),
    size = reps, prob = c(1 - aProb, aProb),
    replace = TRUE
  ) # Judge A
  b <- sample(
    c(0, 1),
    size = reps, prob = c(1 - bProb, bProb),
    replace = TRUE
  ) # Judge B
  c <- sample(
    c(0, 1),
    size = reps, prob = c(1 - cProb, cProb),
    replace = TRUE
  ) # Judge C
  d <- sample(
    c(0, 1),
    size = reps, prob = c(1 - dProb, dProb),
    replace = TRUE
  ) # Judge D
  
  # count the number of correct votes in each case:
  correctVotes <- 2*a + b + c + d
  
  # determine whether court decided correctly, in each case:
  courtCorrect <- (correctVotes >= 3)
  # report estimate of probability
  mean(courtCorrect)
}

Try It

Compare with using all five judges:

Interesting!

Kicking out the worst judge made the court not as good!

Does this make sense?

The Number Needed

The Scenario

Consider this chance process:

  • Pick a real number \(x_1\) between 0 and 1 at random.
  • Pick another real number \(x_2\) at random. Add it to \(x_1\).
  • If the sum \(x_1 + x_2\) exceeds 1, then stop.
  • Otherwise, pick a third real number \(x_3\), and add it to the sum of the previous two.
  • If the sum \(x_1+x_2+x_3\) exceeds 1, then stop.
  • Otherwise, pick a fourth real number …
  • Keep going until the sum of the numbers picked exceeds 1.

The Number Needed

Let \(X\) be the number of numbers you have to pick. Let’s estimate its expected value.

This app will help you understand the situation.

Modeling the Number Needed

Here is a function to simulate one “try” of \(X\):

numberNeeded <- function(verbose = FALSE) {
  # set up initial sum:
  my_sum <- 0
  # set up initial count of numbers selected:
  count <- 0
  # repeatedly select numbers until sum exceeds 1:
  while (my_sum < 1) {
    # select a random real number between 0 and 1:
    random_real_number <- runif(1)
    # add it to the current sum:
    my_sum <- my_sum + random_real_number
    # increase the count of numbers by 1:
    count <- count + 1
    # report (if user wants a report):
    if (verbose) {
      cat("Just now picked ", random_real_number, ".\n", sep = "")
      cat(count, " number(s) picked so far.\n", sep = "")
      cat("Their sum is ", my_sum, ".\n\n", sep = "")
    }
  }
  count
}

Try It Out a Few Times

Repeating the Process, and Estimating Expected Value

Try this:

Distribution of the Number Needed

This estimates the distribution of \(X\), the number needed.

Encapsulate the Work

Now we can make a good simulation function:

numberNeededSim <- function(reps = 1000, seed = NULL, 
                            report = FALSE) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
  needed <- numeric(reps)
  for (i in 1:reps) {
    how_many_needed <- numberNeeded()
    needed[i] <- how_many_needed
  }
  # a table of proportions (if the user wants one):
  if (report) {
    cat("Proportions table of simulated values:\n")
    print(prop.table(table(needed)))
  }
  # make sure to return the estimate of the expected value:
  mean(needed)
}

Try It Out

Fun Fact

It is known that the expected number of numbers needed is:

\[\mathrm{e} \approx 2.71828 \ldots\]