The box model is a simple (but quite general!) way to think about chance processes.
In the box model, we have a single box that contains tickets, each with a number written on it.
We can design a box by filling in the following sentence:
Box Model: Draw _________ times (with replacement) from the box ___________________.
Examples
We use the box model as an analogy. The key is to find the analogous box model for a chance process. Once we find the box model, our later calculations become simpler.
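As a minimal sketch of the analogy, assume a hypothetical chance process (not one of the course examples): tossing a fair coin 25 times and counting the heads. That is like drawing 25 times (with replacement) from the box [0, 1] and summing the draws. The names `box`, `n_draws`, and `draws` are illustrative choices.

```r
# hypothetical chance process: number of heads in 25 tosses of a fair coin
# Box Model: Draw 25 times (with replacement) from the box [0, 1].
box <- c(0, 1)   # 1 = heads, 0 = tails
n_draws <- 25

set.seed(1)      # arbitrary seed so the draws are reproducible
draws <- sample(box, n_draws, replace = TRUE)  # draw from box
sum(draws)       # the sum of the draws = the number of heads
```

Coding heads as 1 and tails as 0 is what turns "counting" into "summing the draws."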
Examples
The sum is random, so it varies; we cannot usually know it before completing the chance process.
But we can fill in the following sentence: The sum of the draws will be about _____, give or take _____ or so.
We call the first blank the expected value (of the sum).
We call the second blank the standard error (of the sum).
Each chance process results in a single (random) value. By analogy, this random value corresponds to the sum of the draws from the box model. Imagine repeating the chance process again, and again, and again,…, so that we have an infinite number of sums.
The average of this hypothetical infinite number of sums is the expected value (of the sum).
The SD of this hypothetical infinite number of sums is the standard error (of the sum).
These exactly match our concepts of average and SD, except these new concepts apply to a hypothetical, infinitely long list of sums generated by our chance process.
Just to be clear, we’ve got the following two sentences:
The box model allows us to easily calculate the expected value and the standard error.
expected value = (number of draws) x (average of box)
standard error = (square root of number of draws) x (SD of box)
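The two formulas above can be sketched as small R functions. The function names (`sd_of_box`, `ev_of_sum`, `se_of_sum`) and the eight-ticket box are hypothetical, chosen only so the arithmetic comes out round. The SD of the box divides by the number of tickets, not by one less.

```r
# SD of a box: divide by the number of tickets (n), not n - 1
sd_of_box <- function(box) {
  dev <- box - mean(box)
  sqrt(mean(dev^2))
}

# expected value = (number of draws) x (average of box)
ev_of_sum <- function(box, n_draws) {
  n_draws * mean(box)
}

# standard error = (square root of number of draws) x (SD of box)
se_of_sum <- function(box, n_draws) {
  sqrt(n_draws) * sd_of_box(box)
}

# hypothetical box: average 5, SD 2
box <- c(2, 4, 4, 4, 5, 5, 7, 9)
ev_of_sum(box, 100)  # 100 x 5 = 500
se_of_sum(box, 100)  # 10 x 2 = 20
```

For this box, the sum of 100 draws will be about 500, give or take 20 or so.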
The hardest part of working with the box model is calculating the SD of a box. But there’s a shortcut.
If the box has only zeros and ones, I call this a 0-1 box. The SD of a 0-1 box is the square root of [(fraction that are ones) x (fraction that are zeros)].
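To see that the shortcut agrees with the usual SD calculation, here is a small check on a hypothetical 0-1 box with three zeros and two ones. The variable names are illustrative.

```r
# hypothetical 0-1 box: three zeros and two ones
box <- c(0, 0, 0, 1, 1)

# shortcut: square root of [(fraction that are ones) x (fraction that are zeros)]
frac_ones <- mean(box == 1)
frac_zeros <- mean(box == 0)
sd_shortcut <- sqrt(frac_ones * frac_zeros)

# direct calculation, dividing by the number of tickets
dev <- box - mean(box)
sd_direct <- sqrt(mean(dev^2))

sd_shortcut
sd_direct
all.equal(sd_shortcut, sd_direct)  # the two calculations agree
```

Here the fractions are 0.4 and 0.6, so both calculations give the square root of 0.24, or about 0.49.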
Example (in-class exercise process 1)
Suppose we’re rolling 10 dice and counting the total dots shown across the 10 dice. That’s like drawing 10 times (with replacement) from a box with tickets numbered one through six and summing the draws.
expected value = (number of draws) x (average of box) = 10 x 3.5 = 35
standard error = (square root of number of draws) x (SD of box) = 3.16 x 1.71 = 5.40
Then, we can say that the sum of the draws will be about 35, give or take 5.40 or so.
We can easily check this in R.
# create box model
box <- 1:6
n_draws <- 10
# expected value
mean(box)*n_draws
## [1] 35
# standard error
# note: sd() divides by n - 1, but the SD of a box divides by n;
# with only 6 tickets the difference is large enough to matter,
# so don't use sd(box)
dev <- box - mean(box)
sd_of_box <- sqrt(mean(dev^2))
sqrt(n_draws)*sd_of_box
## [1] 5.400617
# for comparison only--too big
sqrt(n_draws)*sd(box)
## [1] 5.91608
# repeat chance process many times
n_simulations <- 10000 # large enough to approximate infinity
sums <- numeric(n_simulations)
for (i in 1:n_simulations) {
  draws <- sample(box, n_draws, replace = TRUE) # draw from box
  sums[i] <- sum(draws) # sum the draws and store
}
# summarize the large list of sums
sums[1:10] # are these about the expected value, give or take the standard error or so?
## [1] 38 22 36 35 25 31 25 46 32 34
mean(sums) # about the expected value
## [1] 34.9324
sd(sums) # about the standard error
## [1] 5.429934
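One further way to read "give or take" is to count how many of the simulated sums land within one standard error of the expected value. The sketch below reruns the dice simulation on its own, using `replicate()` in place of the loop; the seed and the name `frac_within` are arbitrary choices. Most of the sums, but not all, should fall in that range.

```r
set.seed(1234)  # arbitrary seed so the sketch is reproducible

# box model for the sum of 10 dice
box <- 1:6
n_draws <- 10
ev <- n_draws * mean(box)
se <- sqrt(n_draws) * sqrt(mean((box - mean(box))^2))

# repeat chance process many times
sums <- replicate(10000, sum(sample(box, n_draws, replace = TRUE)))

# fraction of sums within one standard error of the expected value
frac_within <- mean(abs(sums - ev) <= se)
frac_within
```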