(Sections 6.1 - 6.2)
We are interested in finding the probability that a specified event will occur when a random process takes place.
Random Process | Event |
---|---|
selecting three balls | at least two of them are red |
March Madness | your team wins it all |
ten years in Florida | category 5 storm hits at least once |
ten people toss their hats | at least one picks up her own hat |
Probability of an Event
The long-run proportion of times that the event will occur if the underlying random process is repeated many, many times.
Say you want the probability of being hit by a Cat-5 storm in the next ten years.
The bigger \(n\) is, the better your estimate of the probability should be.
How do we get around this?
The solution is to make the computer model (simulate) the random process and keep track of the results.
Repetition will deliver results of each “trial” of the modeled random process.
This is called Monte Carlo simulation.
Consider a box that holds ten tickets. Three of them are labeled with the letter “a”; the rest are labeled “b”.
Our random process:
We will draw one ticket from the box at random. (Each ticket is equally likely to be picked.)
Our probability question:
What is the probability of drawing an “a”-ticket?
We want to model (simulate) the random process of picking a ticket from the box.
We need a good function for this.
sample()
FunctionTry this:
[1] "b"
Try it a few more times.
size
says how many times to repeat the samplingTo get an estimate of probability, we need to repeat the random process a number of times. (Let’s do 1000 times.)
Count how many times we got a-ticket:
## set up the box:
tickets <- c(rep("a", 3), rep("b", 7))
## decide how many times to simulate:
reps <- 1000
## set up vector to store results:
results <- logical(length = reps)
## get many results with loop:
for (i in 1:reps) {
## pick from the box once:
selection <- sample(tickets, size = 1)
## figure out whether the event occurred:
got_a <- selection == "a"
## store in results vector:
results[i] <- got_a
}
## compute estimated probability of a:
mean(results)
In general, when we use a loop to perform a Monte-Carlo simulation to estimate a probability, our code will follow this template:
results <- logical(length = reps_desired)
for (i in 1:reps_desired {
## some code to make the random process happen once
## more code to find out whether the event occurred
results[i] <- whether_event_occurred
}
mean(results)
Actually, sample()
can handle the repetition for you:
How about repeating more times, say 100,000 times?
The Law of Large Numbers says:
The more times you repeat the underlying random process, the more accurate your estimate of the desired probability (or expected value) is liable to be.
\(n\) people toss their hats into the air; each one randomly picks up a hat.
What is the chance that at least one person picks up his/her own hat?
Say you have ten people. Here they are:
Shuffle the hats randomly among the people:
Did at least one person get her own hat?
This function will help us understand one toss:
hatToss <- function(n) {
## set up people:
people <- 1:n
hats <- 1:n
## do one toss
hatsPickedUp <-
sample(
hats, size = n,
replace = FALSE
)
## figure who gets own hat:
response <-
ifelse(
people == hatsPickedUp,
"WOO-HOO!", "meh"
)
## we will learn about
## data framesin ch. 7:
data.frame(
people,
hatsPickedUp,
response
)
}
hatToss(n = 10)
people | hatsPickedUp | response |
---|---|---|
1 | 8 | meh |
2 | 4 | meh |
3 | 6 | meh |
4 | 5 | meh |
5 | 7 | meh |
6 | 3 | meh |
7 | 2 | meh |
8 | 10 | meh |
9 | 9 | WOO-HOO! |
10 | 1 | meh |
Parameters:
n
is how many people are involvedreps
is how many repetitionsReturn: Estimate of probability
Fun Fact:
As the number of people involved increases, the probability that at least one person gets her own hat converges to:
\[1- 1/\mathrm{e} \approx 0.63.\]