It was the first evening of the 2008 School of Information Theory, and Paul Cuff, then a grad student at Stanford, had just shared a probability puzzle about doors (not the Monty Hall problem). I don’t remember exactly how we started on puzzles, but I am pretty sure people had grown tired of my Ali G impersonation. At some point, I had shared a puzzle about dice that has a solution in the spirit of Paul’s puzzle, which may have been the inspiration.
We found a strategy that would work, but calculating the probability seemed a little tricky. When I asked Paul, he mentioned that the probability was about 1/3 but that he hadn’t gleaned any special insights from it. Thoughts about the puzzle were quickly replaced with the more common preoccupations of a final-year graduate student: finishing a dissertation and finding a job. The “about 1/3″ result entered my head as a folk theorem.
Recently, I posed the puzzle to Jim and Chrysteena, two friends in the San Francisco improv community. Jim designs puzzles and seemed dissatisfied when he evaluated the probability for the candidate solution purported to be the best. I quoted the folk theorem, but I was dissatisfied, as well, and I thought it was time to revisit the puzzle.
100 Doors Puzzle
There are 100 people and a room with 100 doors. Behind each door is the name of exactly one of the 100 people such that every name is behind exactly one door. The location of the names are unknown to the 100 people. The 100 people are then asked to play the following game:
- Each person is given a chance to go into the room and open 50 doors.
- After a person opens 50 doors, the room is reset to the state it was in before entry, and the person who entered may not communicate with any of the other people for the remainder of the game.
- The game ends in victory if and only if each of the 100 people opens the door with his or her name. In other words, if any one person fails, everyone loses.
If the location of the names behind the doors is chosen at random, find a strategy that maximizes the probability of victory.
It helps to look at cases with a smaller number of doors to build intuition. Let’s first consider two doors and two people, in which each person is allowed to open one door. Note that if both people open the same door, then they are guaranteed to lose since we require that they both find their names. Thus, the best strategy is for them to choose separate doors, at which point the probability of them winning is . Note that this matches the probability that one of them finds his or her name, so it is optimal.
Now let’s move to four doors, four people, in which each person is allowed to open two doors. After some thought, one strategy that would generalize to 100 doors is to arrange the names in alphabetical order and assign a number to each name based on where it is in the ordering, e.g. the first name in alphabetical order is assigned the number 1, and the last is assigned 100. Then, each person starts by opening the door with the number corresponding to his or her name. Upon opening the door, that person will see a new name, which has its own number associated with it in the alphabetical ordering, and will then proceed to open the door associated with this new number. The process repeats until all tries are exhausted.
If one follows this approach, the cases leading to victory are those in which the process of going through the subsequent doors will cycle through to each person’s name within 50 tries in the case of 100 doors, or within 2 tries in the case of 4 doors. For instance, with 4 doors, victory occurs in a case like the following:
Door 1 (Bob), Door 2 (Alice), Door 3 (Dennis), Door 4 (Cindy)
Bob would start at door 2, see Alice’s name and then go to door 1, where he would find his name. Following the approach for the others results in something similar. On the other hand, we fail if we have a case like this:
Door 1 (Bob), Door 2 (Cindy), Door 3 (Alice), Door 4 (Dennis)
Again, Bob would start at door 2, which would give him Cindy’s name and lead him to door 3, which would give him Alice’s name, by which point he has exhausted his two tries without finding his name. Since one person failed, everyone loses.
It turns out that for four doors, there are 10 cases among the 24 permutations that have these smaller cyles, resulting in a victory with probability , which is close to . For 100 doors, there’s the folk theorem.
Probability of Victory
One thing that has bothered me in the formulation of the puzzle is that the focus is to maximize the probability, but it’s unclear to me whether the candidate strategy above does this. If “about 1/3″ continues to hold as the probability of victory as the number of doors increases say even beyond 100, then at least there is a sense that this is a scaleable strategy if not the best since no strategy can exceed 1/2. On the other hand, if the probability decays to 0, it raises a new question to understand why that’s the case, if all strategies decay to a victory probability of 0 as the number of doors increase, or if there exists a strategy that is within some bound of .
We can start asking these questions if we have an expression for the probability of victory. Let’s consider doors, where represents the number of tries, and in the original problem. We have characterized the victory cases as those in which the permutation of doors can be represented as cycles of at most . To calculate the probability, we can use the cycle index of the symmetric group as
where denotes cycles of length .
It turns out the cycle index of the symmetric group can be expressed as the recurrence
Let’s define to highlight the dependence on the cycles. Thus, we can rewrite the above equation as
Let’s define the cumulative distribution function (CDF) as evaluated at and . In words, is the probability that all cycles are of length or less.
Thus, if we can find an expression for , we can evaluate it at to get our result. The results are contained in Propositions 1 and 2 below, where Proposition 2 is reminiscent of our folk theorem, which yields a probability of , which is slightly less than 1/3. Furthermore, Corollary 1.1 indicates that we decrease monotonically to this probability, so regardless of the number of doors, this strategy is within of , the upper bound on how well one could possibly do.
Proposition 1. .
Proof: The first equality is by definition, so we simply need to show the second equality. For , we can evaluate the probabilities as
Note that if , then , so we can apply the same result to get that
and substituting this into our expression for yields
Likewise, one can similarly derive
Starting with the base case that and the above inductive steps, one can conclude that
However, , so we have our result.
Corollary 1.1. .
Proof: From Proposition 1, we can write
. Note that we can rewrite the summation
The key is to observe that this is simply a Riemann sum approximating the integral
which converges as we let .