The box model is a simple (but quite general!) way to think about chance processes.

In the box model, we have a single box that contains tickets, each with a number written on it. Working with the model takes three steps:

  1. Design the box.
    1. What numbers do you want on the tickets?
    2. How many of each number?
    3. How many times will you draw (usually with replacement)?
  2. Draw from the box the chosen number of times (usually with replacement) and record each draw.
  3. Add up (sum) all the draws. (We focus on the sum because it’s a general statistic, one we usually use to calculate others like the average. For now, let’s worry about the sum of the draws; we’ll turn to the average, percentage, etc., later.)
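
The three steps above can be sketched in R as follows. This is only a minimal illustration: the box contents, the number of draws, and the seed are arbitrary choices made for the sketch, not part of the recipe itself.

```r
# Step 1: design the box (illustrative choice: one ticket each of 1 through 6)
box <- c(1, 2, 3, 4, 5, 6)
n_draws <- 10  # illustrative number of draws

# Step 2: draw with replacement and record each draw
set.seed(1234)  # arbitrary seed, only so the sketch is reproducible
draws <- sample(box, size = n_draws, replace = TRUE)
draws

# Step 3: add up (sum) all the draws
sum_of_draws <- sum(draws)
sum_of_draws
```

Changing `box` or `n_draws` changes the chance process; the drawing and summing steps stay the same.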

We can design a box by filling in the following sentence:

Box Model: Draw _________ times (with replacement) from the box ___________________.

Examples

  1. Draw 10 times (with replacement) from the box with tickets numbered 1, 2, 3, 4, 5, and 6 and sum the draws.
  2. Draw 10 times (with replacement) from the box with tickets numbered 1, 1, 1, and 0 and sum the draws.
  3. Draw 57 times (with replacement) from the box with six tickets numbered 1 and 13 tickets numbered 0 and sum the draws.
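
As a rough sketch, the three example boxes could be written in R like this. The helper name `sum_of_draws()` and the object names `box_1`, `box_2`, and `box_3` are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: draw n_draws times (with replacement) and sum the draws
sum_of_draws <- function(box, n_draws) {
  sum(sample(box, size = n_draws, replace = TRUE))
}

# Example 1: tickets numbered 1 through 6
box_1 <- 1:6
# Example 2: three tickets numbered 1 and one ticket numbered 0
box_2 <- c(1, 1, 1, 0)
# Example 3: six tickets numbered 1 and 13 tickets numbered 0
box_3 <- c(rep(1, 6), rep(0, 13))

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
sum_of_draws(box_1, 10)
sum_of_draws(box_2, 10)
sum_of_draws(box_3, 57)
```

Each call returns one random sum, so rerunning without the seed gives different values.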

We use the box model as an analogy. The key is to find the analogous box model for a chance process. Once we find the box model, it simplifies our later calculations.

Examples

  1. Rolling six dice and counting the total dots shown is like drawing 6 times (with replacement) from a box with tickets numbered one through six and summing the draws.
  2. Rolling a die 25 times and counting the number of sixes shown across the 25 rolls is like drawing 25 times (with replacement) from a box with five tickets numbered zero and a single ticket numbered one and summing the draws.
  3. An exam has 100 multiple-choice questions with 4 choices each. Each correct answer adds 1 point. Each incorrect answer deducts 4 points. If one guesses at random on every question, then their score is like drawing 100 times (with replacement) from a box with one ticket numbered 1 and three tickets numbered -4 and summing the draws.
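
To see why the analogy in the second example holds, here is a small R sketch. It relabels each face of the die as a ticket (1 for a six, 0 otherwise) and checks that summing the tickets gives the same count as counting sixes directly. The object names and the seed are arbitrary choices for this illustration.

```r
set.seed(2024)  # arbitrary seed, only so the sketch is reproducible

# The chance process itself: roll a die 25 times and count the sixes
rolls <- sample(1:6, size = 25, replace = TRUE)
n_sixes <- sum(rolls == 6)

# The analogous box: one ticket per face, numbered 1 for a six and 0 otherwise
box <- as.integer(1:6 == 6)  # five tickets numbered 0, one ticket numbered 1

# Relabel each roll as its ticket and sum; this matches the count of sixes
tickets <- box[rolls]
sum(tickets) == n_sixes
```

Because every face is equally likely, drawing a face and then reading its ticket is the same as drawing directly from the box of six tickets.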

Properties of the Sum of the Draws

The sum is random, so it varies; we cannot usually know it before completing the chance process.

But we can fill in the following sentence: The sum of the draws will be about _____, give or take _____ or so.

We call the first blank the expected value (of the sum).

We call the second blank the standard error (of the sum).

Each chance process results in a single (random) value. By analogy, this random value corresponds to the sum of the draws from the box model. Imagine repeating the chance process again and again, without end, so that we have an infinite number of sums.

The average of this hypothetical infinite number of sums is the expected value (of the sum).

The SD of this hypothetical infinite number of sums is the standard error (of the sum).

These exactly match our concepts of average and SD, except these new concepts apply to a hypothetical, infinitely long list of sums generated by our chance process.

Just to be clear, we’ve got the following two sentences:

  1. The entries in a (fixed, observed) list of numbers are about [the average], give or take [the SD] or so.
  2. The sum of the draws from a chance process will be about [the expected value], give or take [the standard error] or so.

The box model allows us to easily calculate the expected value and the standard error.

expected value = (number of draws) x (average of box)

standard error = (square root of number of draws) x (SD of box)
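
The two formulas above can be wrapped in small R functions. This is a sketch under stated assumptions: the function names `box_sd()`, `expected_value_of_sum()`, and `standard_error_of_sum()` are hypothetical, and the SD of the box divides by the number of tickets (not by one less, as R's built-in `sd()` does).

```r
# SD of the box: root-mean-square deviation from the box average.
# Divides by the number of tickets, unlike sd(), which divides by n - 1.
box_sd <- function(box) {
  sqrt(mean((box - mean(box))^2))
}

# expected value = (number of draws) x (average of box)
expected_value_of_sum <- function(box, n_draws) {
  n_draws * mean(box)
}

# standard error = (square root of number of draws) x (SD of box)
standard_error_of_sum <- function(box, n_draws) {
  sqrt(n_draws) * box_sd(box)
}

# Usage: the exam box from the earlier example (guessing on 100 questions)
exam_box <- c(1, -4, -4, -4)
ev <- expected_value_of_sum(exam_box, 100)
se <- standard_error_of_sum(exam_box, 100)
ev
se
```

For the exam box, the average is -2.75 and the SD is about 2.165, so a guesser's score will be about -275, give or take 21.65 or so.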

A Shortcut

The hardest part of working with the box model is calculating the SD of a box. But there’s a shortcut.

If the box has only zeros and ones, I call this a 0-1 box. The SD of a 0-1 box is the square root of [(fraction that are ones) x (fraction that are zeros)].
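
As a quick check of the shortcut, the R sketch below computes the SD of a 0-1 box both ways, using the box with six tickets numbered 1 and 13 tickets numbered 0 from the earlier examples. The object names are arbitrary choices for this illustration.

```r
# A 0-1 box: six tickets numbered 1 and 13 tickets numbered 0
box <- c(rep(1, 6), rep(0, 13))

# Shortcut: square root of (fraction of ones) x (fraction of zeros)
frac_ones <- mean(box == 1)
frac_zeros <- mean(box == 0)
sd_shortcut <- sqrt(frac_ones * frac_zeros)

# Direct calculation: root-mean-square deviation from the box average
sd_direct <- sqrt(mean((box - mean(box))^2))

# The two agree (up to floating-point rounding)
all.equal(sd_shortcut, sd_direct)
```

The shortcut applies only when every ticket is a 0 or a 1; for any other box, use the direct calculation.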

Example (in-class exercise process 1)

Suppose we’re rolling 10 dice and counting the total dots shown across the 10 dice. That’s like drawing 10 times (with replacement) from a box with tickets numbered one through six and summing the draws.

expected value = (number of draws) x (average of box) = 10 x 3.5 = 35

standard error = (square root of number of draws) x (SD of box) = 3.16 x 1.71 = 5.40

Then, we can say that the sum of the draws will be about 35, give or take 5.40 or so.

We can easily check this in R.

# create box model
box <- 1:6
n_draws <- 10

# expected value
mean(box)*n_draws
## [1] 35
# standard error
# note: sd() divides by n - 1 rather than n; with only six tickets, the
# difference is large enough to matter here, so don't use sd(box)
dev <- box - mean(box)
sd_of_box <- sqrt(mean(dev^2))
sqrt(n_draws)*sd_of_box
## [1] 5.400617
# for comparison only--too big
sqrt(n_draws)*sd(box)
## [1] 5.91608
# repeat chance process many times
n_simulations <- 10000  # large enough to approximate infinity
sums <- numeric(n_simulations)
for (i in 1:n_simulations) {
  draws <- sample(box, n_draws, replace = TRUE)  # draw from box
  sums[i] <- sum(draws)  # sum the draws and store
}

# summarize the large list of sums
sums[1:10]  # are these expected value g.o.t. std. error or so?
##  [1] 38 22 36 35 25 31 25 46 32 34
mean(sums) # about the expected value
## [1] 34.9324
sd(sums)  # about the standard error
## [1] 5.429934

Creative Commons License
Carlisle Rainey