Further Simulation Examples

(Sections 6.5 - 6.7)

Will They Connect?

The Scenario: Anna and Raj

  • Anna and Raj make a date for coffee tomorrow at the local Coffee Shop.
  • Both of them forget the exact time they agreed to meet: they can only remember that it was to be sometime between 10 and 11am.
  • Each person, independently of the other, randomly picks a time between 10 and 11 to arrive.
  • Each person is willing to wait ten minutes for the other person to arrive.

Our Question

What is the chance that they meet?

Random Process

The random process is:

From 10 to 11am, watch for Anna and Raj enter/leave the coffeeshop.

Modeling involves getting Anna and Raj’s arrival times (in minutes after 10am):

anna <- runif(1, min = 0, max = 60)
raj <- runif(1, min = 0, max = 60)

Key Idea

Anna and Raj will connect if their arrival times are no more than 10 minutes apart.

That is, they connect if and only if

\[-10 < anna - raj < 10,\]

which is the same as

\[\vert\, anna - raj \,\vert < 10.\]

What Happens This Time

c(anna = anna, raj = raj)
    anna      raj 
12.58343 28.44067 
anna - raj
[1] -15.85724
abs(anna - raj)
[1] 15.85724

They do not connect.

Repetition

runif() can generate many values. Let’s use this fact to repeat eight times.

anna <- runif(8, min = 0, max = 60)
raj <- runif(8, min = 0, max = 60)

anna
[1] 12.583430 28.440668  8.108124 29.213319 35.928174 47.170376 56.500402
[8] 19.749677
raj
[1] 31.10838 43.97309 47.08549 40.77307 32.72790 53.82384 38.61780 25.83640

See When They Met

anna - raj
[1] -18.524952 -15.532418 -38.977363 -11.559753   3.200269  -6.653462  17.882606
[8]  -6.086727
abs(anna - raj)
[1] 18.524952 15.532418 38.977363 11.559753  3.200269  6.653462 17.882606
[8]  6.086727
abs(anna - raj) < 10
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE

In a Table

anna raj arrival_diff do_they_meet
12.583430 31.10838 18.524952 FALSE
28.440668 43.97309 15.532418 FALSE
8.108124 47.08549 38.977363 FALSE
29.213319 40.77307 11.559753 FALSE
35.928174 32.72790 3.200269 TRUE
47.170376 53.82384 6.653462 TRUE
56.500402 38.61780 17.882606 FALSE
19.749677 25.83640 6.086727 TRUE

Encapsulate in a Function

meetupSim <- function(reps = 10000, table = FALSE, 
                      seed = NULL) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  anna <- runif(reps, 0, 60)
  raj <- runif(reps, 0, 60)
  connect <- (abs(anna - raj) < 10)
  if ( table ) {
    cat("Here is a table of the results:\n\n")
    print(table(connect))
    cat("\n")
  }
  cat(
    "The proportion of times they met was ", 
    mean(connect), ".\n", sep = ""
    )
}

Try It Out

meetupSim(table = TRUE, seed = 3939)
Here is a table of the results:

connect
FALSE  TRUE 
 6955  3045 

The proportion of times they met was 0.3045.

Vectorization vs. Looping

Two Ways to Repeat

  • to repeat the random process, we can always use a loop.
  • Sometimes, we are able to use vectorization to get all simulations at once.
  • Does it make a difference?

A Looping Meetup Simulation

meetupSim2 <- function(reps = 10000, table = FALSE, 
                       seed = NULL) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
   # set up vector to hold results
  connect <- logical(reps) 
  # loop to repeat random process
  for ( i in 1:reps ) { 
    # get one arrival time for anna
    anna <- runif(1, 0, 60)  
    # and a time for raj
    raj <- runif(1, 0, 60)
    they_meet <- abs(anna - raj) < 10
    # store in results vector:
    connect[i] <- they_meet
  }
  if ( table ) {
    cat("Here is a table of the results:\n\n")
    print(table(connect))
    cat("\n")
  }
  cat(
    "The proportion of times they met was ", 
    mean(connect), ".\n", sep = ""
  )
}

Comparison

system.time() records time to perform a task.

system.time(meetupSim(reps = 10000, seed = 4040))
The proportion of times they met was 0.3079.
   user  system elapsed 
  0.003   0.000   0.003 
system.time(meetupSim2(reps = 10000, seed = 4040))
The proportion of times they met was 0.2993.
   user  system elapsed 
  0.012   0.000   0.012 

The Moral

Avoid looping if you can repeat with vectorization.

(But it doesn’t matter much, if your job is “small”.)

The Appeals Court Paradox

Scenario: An Appeals Court

  • Five judges on a court.
  • Majority vote decides the case.
  • Judges have different chances to make the correct decision:
    • Judge A has 95% chance to judge correctly;
    • Judge B has 94% chance to judge correctly;
    • Judge C has 90% chance to judge correctly;
    • Judge D has 90% chance to judge correctly;
    • Judge E has 80% chance to judge correctly.

What’s the chance they will vote the correct way?

Five Judges

Set up the scenario in R:

aProb <- 0.95
bProb <- 0.94
cProb <- 0.90
dProb <- 0.90
eProb <- 0.80

Seven Cases

Imagine (for now) that they they hear seven cases:

reps <- 7

How it goes for Judge A:

a <- sample(
  c(0, 1), size = reps, 
  prob = c(1 - aProb, aProb), 
  replace = TRUE
)

a
[1] 1 1 1 1 0 1 1
  • 0 means: Judge A makes the wrong decision;
  • 1 means: Judge A gets it right.

The Other Judges

b <- sample(
    c(0, 1),
    size = reps, prob = c(1 - bProb, bProb),
    replace = TRUE
  ) # Judge B
  c <- sample(
    c(0, 1),
    size = reps, prob = c(1 - cProb, cProb),
    replace = TRUE
  ) # Judge C
  d <- sample(
    c(0, 1),
    size = reps, prob = c(1 - dProb, dProb),
    replace = TRUE
  ) # Judge D
  e <- sample(
    c(0, 1),
    size = reps, prob = c(1 - eProb, eProb),
    replace = TRUE
  ) # Judge E

Correct Votes

Add to find the number of correct votes in each case:

correctVotes <- a + b + c + d + e

correctVotes
[1] 5 4 5 5 3 4 5

Getting Case Right

Now we can find whether they got each case right or wrong:

courtCorrect <- (correctVotes >= 3)
courtCorrect
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE

The Court’s Decisions

A B C D E sum correct
1 1 1 1 1 5 TRUE
1 1 1 1 0 4 TRUE
1 1 1 1 1 5 TRUE
1 1 1 1 1 5 TRUE
0 1 1 1 0 3 TRUE
1 1 0 1 1 4 TRUE
1 1 1 1 1 5 TRUE

Encapuslate into a Function

courtSim <- function(reps,
                     probs = c(0.95, 0.94, 0.90, 0.90, 0.80),
                     seed = NULL) {
  
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # get the probabilities
  aProb <- probs[1]
  bProb <- probs[2]
  cProb <- probs[3]
  dProb <- probs[4]
  eProb <- probs[5]
  # simulate decisions of each judge:
  a <- sample(
    c(0, 1),
    size = reps, prob = c(1 - aProb, aProb),
    replace = TRUE
  ) # Judge A
  b <- sample(
    c(0, 1),
    size = reps, prob = c(1 - bProb, bProb),
    replace = TRUE
  ) # Judge B
  c <- sample(
    c(0, 1),
    size = reps, prob = c(1 - cProb, cProb),
    replace = TRUE
  ) # Judge C
  d <- sample(
    c(0, 1),
    size = reps, prob = c(1 - dProb, dProb),
    replace = TRUE
  ) # Judge D
  e <- sample(
    c(0, 1),
    size = reps, prob = c(1 - eProb, eProb),
    replace = TRUE
  ) # Judge E

  # count the number of correct votes in each case:
  correctVotes <- a + b + c + d + e
  # determine whether court decided correctly, in each case:
  courtCorrect <- (correctVotes >= 3)
  # report estimate of probability
  mean(courtCorrect)
}

Try It Out

courtSim(reps = 100000, see = 2020)
[1] 0.99254

Leaning on Judge E

Suppose the judges convince Judge E (the weakest one) to always vote the same as Judge A (the best one) does.

Will this increase the court’s chance of making the correct decision?

Simulation

courtSim2 <- function(reps = 10000,
                      probs = c(0.95, 0.94, 0.90, 0.90),
                      seed = NULL) {
  
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  
  # get the probabilities
  aProb <- probs[1]
  bProb <- probs[2]
  cProb <- probs[3]
  dProb <- probs[4]
  
  # simulate decisions:             
  a <- sample(
    c(0, 1),
    size = reps, prob = c(1 - aProb, aProb),
    replace = TRUE
  ) # Judge A
  b <- sample(
    c(0, 1),
    size = reps, prob = c(1 - bProb, bProb),
    replace = TRUE
  ) # Judge B
  c <- sample(
    c(0, 1),
    size = reps, prob = c(1 - cProb, cProb),
    replace = TRUE
  ) # Judge C
  d <- sample(
    c(0, 1),
    size = reps, prob = c(1 - dProb, dProb),
    replace = TRUE
  ) # Judge D
  
  # count the number of correct votes in each case:
  correctVotes <- 2*a + b + c + d
  
  # determine whether court decided correctly, in each case:
  courtCorrect <- (correctVotes >= 3)
  # report estimate of probability
  mean(courtCorrect)
}

Try It

courtSim2(reps = 100000, seed = 2020)
[1] 0.98808

Interesting:

Kicking out the worst judge made the court not as good!

The Number Needed

The Scenario

Consider this chance process:

  • Pick a real number \(x_1\) between 0 and 1 at random.
  • Pick another real number \(x_2\) at random. Add it to \(x_1\).
  • If the sum \(x_1 + x_2\) exceeds 1, then stop.
  • Otherwise, pick a third real number \(x_3\), and add it to the sum of the previous two.
  • If the sum \(x_1+x_2+x_3\) exceeds 1, then stop.
  • Otherwise, pick a fourth real number …
  • Keep going until the sum of the numbers picked exceeds 1.

Let \(X\) be the number of numbers you have to pick.

Modeling the Number Needed

Here is a function to simulate one “try” of \(X\):

numberNeeded <- function(verbose = FALSE) {
  mySum <- 0
  count <- 0
  while (mySum < 1) {
    number <- runif(1)
    mySum <- mySum + number
    count <- count + 1
    if (verbose) {
      cat("Just now picked ", number, ".\n", sep = "")
      cat(count, " number(s) picked so far.\n", sep = "")
      cat("Their sum is ", mySum, ".\n\n", sep = "")
    }
  }
  count
}

Try It Out

numberNeeded(verbose = TRUE)

Try it a few times.

Repeating the Process

needed <- numeric(1000)
for (i in 1:1000) {
  needed[i] <- numberNeeded()
}
cat("The expected number needed is about ",
  mean(needed), ".\n",
  sep = ""
)
The expected number needed is about 2.697.
table(needed)
needed
  2   3   4   5   6   7 
528 303 123  37   8   1 

Distribution of the Number Needed

prop.table(table(needed))
needed
    2     3     4     5     6     7 
0.528 0.303 0.123 0.037 0.008 0.001 

This estimates the distribution of \(X\), the number needed.

Encapsulate and Generalize

First, revise the function numberNeeded() to handle any target number.

numberNeeded <- function(target) {
    mySum <- 0
    count <- 0
    while( mySum < target ) {
      number <- runif(1)
     mySum <- mySum + number
      count <- count + 1
    }
    count
  }

Encapsulate

Now we can make a good simulation fuction:

numberNeededSim <- function(target = 1, reps = 1000) {
  # set up vector for results
  needed <- numeric(reps)
  # loop to repeat random process
  for (i in 1:reps) {
    # store result each time through
    needed[i] <- numberNeeded(target)
  }
  mean(needed)
}

Generalize (add seed and report options)

numberNeededSim <- function(target = 1, reps = 1000,
                            seed = NULL, report = FALSE) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
  needed <- numeric(reps)
  for (i in 1:reps) {
    needed[i] <- numberNeeded(target)
  }
  if (report) {
    print(prop.table(table(needed)))
    cat("\n")
    cat("The expected number needed is about ",
      mean(needed), ".\n",
      sep = ""
    )
  }
  mean(needed)
}

Try It Out

numberNeededSim(
  target = 1, reps = 10000, 
  seed = 4848, report = TRUE
)
needed
     2      3      4      5      6      7      8 
0.4937 0.3350 0.1313 0.0317 0.0071 0.0010 0.0002 

The expected number needed is about 2.7273.
[1] 2.7273

It is known that when the target number is 1 the expected number of numbers needed is:

\[\mathrm{e} \approx 2.71828 \ldots\]