**Readings/Playings**

Read this article on Gamasutra by designer Tyler Sigman. I affectionately refer to it as the “Orc Nostril Hair” article, but it provides a pretty good primer for probabilities in games.

**This Week’s Topic**

Up until now, nearly everything we’ve talked about was deterministic, and last week we really went deep into transitive mechanics and took them about as far as I can go with them. But so far we’ve ignored a huge aspect of many games, the non-deterministic aspects: in other words, randomness. Understanding the nature of randomness is important for game designers because we make systems in order to craft certain player experiences, so we need to know how those systems work. If a system includes a random input, we need to understand the *nature* of that randomness and how to modify it to get the results that we want.

**Dice**

Let’s start with something simple: a die-roll. When most people think of dice, they are thinking of six-sided dice, also known as d6s. But most gamers have encountered plenty of other dice: four-sided (d4), eight-sided (d8), d12, d20… and if you’re *really* geeky, you might have a d30 or d100 lying around. In case you haven’t seen this terminology before, “d” with a number after it means a die with that many sides; if there is a number *before* the “d” that is the *number* of dice rolled – so for example, in *Monopoly* you roll 2d6.

Now, when I say “dice” here, that is shorthand. We have plenty of other random-number generators that aren’t globs of plastic but that still serve the same function of generating a random number from 1 to *n*. A standard coin can be thought of as a d2. I’ve seen two designs for a d7, one which looks like a die and the other which looks more like a seven-sided wooden pencil. A four-sided dreidel (also known as a *teetotum*) is equivalent to a d4. The spinner that comes with Chutes & Ladders that goes from 1 to 6 is equivalent to a d6. A random-number generator in a computer might create a random number from 1 to 19 if the designer wants it to, even though there’s no 19-sided die inside there (I’ll actually talk a bit more about numbers from computers *next* week). While all of these things look different, really they are all equivalent: you have an equal chance of choosing one of several outcomes.

Dice have some interesting properties that we need to be aware of. The first is that each side is equally likely to be rolled (I’m assuming you’re using a fair die and not a rigged one). So, if you want to know the *average value* of a roll (also known as the “expected value” by probability geeks), just add up all the sides and divide by the *number* of sides. The average roll for a standard d6 is 1+2+3+4+5+6 = 21, divided by the number of sides (6), which means the average is 21/6 = 3.5. This is a special case, because we assume all results are equally likely.

What if you have custom dice? For example, I’ve seen one game with a d6 with special labels: 1, 1, 1, 2, 2, 3, so it behaves sort of like this weird d3 where you’re more likely to get a 1 than a 2 and more likely to get 2 than 3. What’s the average roll for this die? 1+1+1+2+2+3 = 10, divided by 6, equals 5/3 or about 1.66. So if you’ve got that custom die and you want players to roll three of them and add the results, you know they’ll roll an average of about a total of 5, and you can balance the game on that assumption.

*Dice and Independence*

As I said before, we’re going on the assumption that each roll is equally likely. This is true no matter how many dice are rolled. Each die roll is what we call *independent*, meaning that previous rolls do not influence later rolls. If you roll dice enough times you definitely *will* see “streaks” of numbers, like a run of high or low numbers or something, and we’ll talk later about why that is, but it doesn’t mean the dice are “hot” or “cold”; if you roll a standard d6 and get two 6s in a row, the probability of rolling another 6 is… exactly 1/6. It is not more likely because the die is “running hot”. It is not less likely because “we already got two 6s, so we’re due for something else”. (Of course, if you roll it twenty times and get 6 on each time, the odds of getting a 6 on the twenty-first roll are actually pretty good… because it probably means you have a rigged die!) But assuming a fair die, each roll is equally likely, independent of the others. If it helps, assume that we’re swapping out dice each time, so if you roll a 6 twice, remove that “hot” die from play and replace with a new, “fresh” d6. For those of you that knew this already, I apologize, but I need to be clear about that before we move on.

*Making Dice More or Less Random*

Now, let’s talk about how you can get different numbers from different dice. If you’re only making a single roll or a small number of rolls, the game will feel more random if you use more sides of the dice. The more you roll a die, or the more dice you roll, the more things will tend towards the average. For example, rolling 1d6+4 (that is, rolling a standard 6-sided die and adding 4 to the result) generates a number between 5 and 10. Rolling 5d2 also generates a number between 5 and 10. But the single d6 roll has an equal chance of rolling a 5, or 8, or 10; the 5d2 roll will tend to create more rolls of 7 and 8 than any other results. Same range, and even the same average value (7.5 in both cases), but the nature of the randomness is different.

Wait a minute. Didn’t I just say that dice don’t run hot or cold? And now I’m saying if you roll a lot of them, they tend towards an average? What’s going on?

Let me back up. If you roll a *single* die, each roll is equally likely. That means if you roll a lot, over time, you’ll roll each side about as often as the others. The more you roll, the more you’ll tend towards the average, collectively. This is not because previous numbers “force” the die to roll what hasn’t been rolled before. It’s because a small streak of 6s (or 20s or whatever) ends up not being much influence when you roll another ten thousand times and get mostly average rolls… so you might get a bunch of high numbers now, but you might also get a bunch of low numbers later, and over time it all tends to go towards the mean. Not because the die is being influenced by previous rolls (seriously, the die is a piece of *plastic*, is doesn’t exactly have a brain to be thinking “gosh, I haven’t come up 2 in awhile”), but because that’s just what tends to happen in large sets of rolls. Your small streak in a large ocean of die-rolls will be mostly drowned out.

So, doing the math for a random die roll is pretty straightforward, at least in terms of finding the average roll. There are also ways to quantify “how random” something is, a way of saying that 1d6+4 is “more random” than 5d2 in that it gives a more even spread, mostly you do that by computing something called “standard deviation” and the larger that is the more random it is, but that takes more computation than I want to get into today (I’ll get into it later on). All I’m asking you to know is that in general, fewer dice rolled = more random. While I’m on the subject, more faces on a die is also more random since you have a wider spread.

*Computing Probability through Counting*

You might be wondering: how can we figure out the exact probability of getting a specific roll? This is actually pretty important in a lot of games, because if you’re making a die roll in the first place there is probably some kind of optimal result. And the answer is, we count two things. First, count the total number of ways to roll dice (no matter what the result). Then, count the number of ways to roll dice that get the result you actually want. Divide the first number by the second and you’ve got your probability; multiply by 100 if you want the percentage.

*Examples*

Here’s a very simple example. You want to roll 4 or more on 1d6. There are 6 total possible results (1, 2, 3, 4, 5, or 6). Of those, 3 of the results (4, 5, or 6) are a success. So your probability is 3 divided by 6, or 0.5, or 50%.

Here’s a slightly more complicated example. You want to roll an even number on 2d6. There are 36 total results (6 for each die, and since neither die is influenced by the other you multiply 6 results by 6 to get 36). The tricky thing with questions like this, is that it’s easy to double-count. For example, there are actually two ways to roll the number 3 on 2d6: 1+2 and 2+1. Those look the same, but the difference is which number appears on the first die, and which appears on the second die. If it helps, think of each die as having a different color, so maybe you have a red die and a blue die in this case. And then you can count the ways to roll an even number: 2 (1+1), 4 (1+3), 4 (2+2), 4 (3+1), 6 (1+5), 6 (2+4), 6 (3+3), 6 (4+2), 6 (5+1), 8 (2+6), 8 (3+5), 8 (4+4), 8 (5+3), 8 (6+2), 10 (4+6), 10 (5+5), 10 (6+4), 12 (6+6). It turns out there are exactly 18 ways to do this out of 36, also 0.5 or 50%. Perhaps unexpected, but kind of neat.

*Monte Carlo Simulations*

What if you have too many dice to count this way? For example, say you want to know the odds of getting a combined total of 15 or more on a roll of 8d6. There are a LOT of different individual results of eight dice, so just counting by hand takes too long. Even if we find some tricks to group different sets of rolls together, it still takes a really long time. In this case the easiest way to do it is to stop doing math and start using a computer, and there are two ways to do this.

The first way gets you an exact answer but takes a little bit of programming or scripting. You basically have the computer run through every possibility inside a for loop, evaluating and counting up all the iterations total and also all the iterations that are a success, and then have it spit out the answers at the end. Your code might look something like this:

int wincount=0, totalcount=0;

for (int i=1; i<=6; i++) {

for (int j=1; j<=6; j++) {

for (int k=1; k<=6; k++) {

… // insert more loops here

if (i+j+k+… >= 15) {

wincount++;

}

totalcount++;

}

}

}

float probability = wincount/totalcount;

If you don’t know programming but you just need a ballpark answer and not an exact one, you can simulate it in Excel by having it roll 8d6 a few thousand times and take the results. To roll 1d6 in Excel, use this formula:

=FLOOR(RAND()*6)+1

When you don’t know the answer so you just try it a lot, there’s a name for that: *Monte Carlo simulation*, and it’s a great thing to fall back on when you’re trying to do probability calculations and you find yourself in over your head. The great thing about this is, we don’t have to know the math behind why it works, and yet we know the answer will be “pretty good” because like we learned before, the more times you do something, the more it tends towards the average.

*Combining Independent Trials*

If you’re asking for several repeated but independent trials, so the result of one roll doesn’t affect other rolls, we have an extra trick we can use that makes things a little easier.

How do you tell the difference between something that’s dependent and something that’s independent? Basically, if you can isolate each individual die-roll (or series) as a separate event, then it is independent. For example, rolling a total of 15 on 8d6 is *not* something that can be split up into several independent rolls. Since you’re summing the dice together, what you get on one die affects the required results of the others, because it’s only all of them added together to give you a single result.

Here’s an example of independent rolls: say you have a dice game where you roll a series of d6s. On the first roll, you have to get 2 or higher to stay in the game. On the second roll you have to get 3 or higher. Third roll requires 4 or higher, fourth roll is 5 or higher, and fifth roll requires a 6. If you make all five rolls successfully, you win. Here, the rolls are independent. Yes, if you fail one roll it affects the outcome of the entire dice game, but each individual roll is not influenced by the others; for example, if you roll really well on your second roll, that doesn’t make you any more or less likely to succeed on future rolls. Because of this, we can consider the probability of each roll separately.

**When you have separate, independent probabilities and you want to know what is the probability that all of them will happen, you take each of the individual probabilities and multiply them together. **Another way to think about this: if you use the word “and” to describe several conditions (as in, “what is the probability that some random event happens

*and*that some other independent random event happens?”), figure out the individual probabilities and multiply.

No matter what you do, do not *ever *add independent probabilities together. This is a common mistake. To see why it doesn’t work, imagine a 50/50 coin flip, and you’re wondering what is the probability that you’ll get Heads twice in a row. The probability of each is 50%, so if you add those together you’d expect a 100% chance of getting Heads, but we know that’s not true, because you could get Tails twice. If instead you multiply, you get 50%*50% = 25%, which is the correct probability of getting Heads twice.

*Example*

Let’s go back to our d6 game where you have to roll higher than 2, then higher than 3, and so on up to 6. In this series of 5 rolls, what are the chances you make all of them?

As we said before, these are independent trials, so we just compute the odds of each roll and then multiply together. The first roll succeeds 5/6 times. The second roll, 4/6. The third, 3/6. The fourth, 2/6, and the fifth roll 1/6. Multiplying these together, we get about 1.5%… So, winning this game is pretty rare, so you’d want a pretty big jackpot if you were putting that in your game.

*Negation*

Here’s another useful trick: sometimes it’s hard to compute a probability directly, but you can figure out the chance that the thing *won’t* happen much easier.

Here’s an example: suppose we make another game where you roll 6d6, and you win if you roll *at least* one 6. What are your chances of winning?

There are a lot of things to compute here. You might roll a single 6, which means one of the dice is showing 6 and the others are all showing 1-5, and there are 6 different ways to choose which die is showing 6. And then you might roll two 6s, or three, or more, and each of those is a separate computation, and it gets out of hand pretty quickly.

However, there’s another way to look at this, by turning it around. You *lose* if *none* of the dice are showing 6. Here we have six independent trials, each of which has a probability of 5/6 (the die can roll anything except 6). Multiply those together and you get about 33%. So you have about a 1 in 3 chance of losing.

Turning it around again, that means a 67% (or 2 in 3) chance of winning.

The most obvious lesson here is that **if you take a probability and negate it, just subtract from 100%**. If the odds of winning are 67%, the odds of *not* winning are 100% *minus* 67%, or 33%. And vice versa. So if you can’t figure out one thing but it’s easy to figure out the opposite, figure out that opposite and then subtract from 100%.

*Combining Conditions Within a Single Independent Trial*

A little while ago, I said you should never add probabilities together when you’re doing independent trials. Are there any cases where you *can* add probabilities together? Yes, in one special situation.

**When you are trying to find the probability for several non-overlapping success criteria in a single trial, add them together.** For example, the probability of rolling a 4, 5 or 6 on 1d6 is equal to the probability of rolling 4 *plus* the probability of rolling 5 *plus* the probability of rolling 6. Another way of thinking about this: when you use the word “or” in your probability (as in, “what is the probability that you will get a certain result *or* a different result from a single random event?”), figure out the individual probabilities and add them together.

One important trait here is that you add up *all possible outcomes* for a game, the combined probabilities should add up to exactly 100%. If they don’t, you’ve done your math wrong, so this is a good reality check to make sure you didn’t miss anything. For example, if you were analyzing the probability of getting all of the hands in *Poker*, if you add them all up you should get exactly 100% (or at least really close – if you’re using a calculator you might get a very slight rounding error, but if you’re doing exact numbers by hand it should be exact). If you don’t, it means there are probably some hands that you haven’t considered, or that you got the probabilities of some hands wrong, so you need to go back and check your figures.

*Uneven Probabilities*

So far, we’ve been assuming that every single side on a die comes up equally often, because that’s how dice are supposed to work. But occasionally you end up with a situation where there are different outcomes that have *different* chances of coming up. For example, there’s this spinner in one of the *Nuclear War* card game expansions that modifies the result of a missile launch: most of the time it just does normal damage plus or minus a few, but occasionally it does double or triple damage, or blows up on the launchpad and damages you, or whatever. Unlike the spinner in *Chutes & Ladders* or *A Game of Life*, the *Nuclear War* spinner has results that are not equally probable. Some results have very large sections where the spinner can land so they happen more often, while other results are tiny slivers that you only land on rarely.

Now, at first glance this is sort of like that 1, 1, 1, 2, 2, 3 die we were talking about earlier, which was sort of like a weighted 1d3, so all we have to do is divide all these sections evenly, find the smallest unit that everything is a multiple of, and then make this into a d522 roll (or whatever) with multiple sides of the die showing the same thing for the more common results. And that’s one way to do it, and that would technically work, but there’s an easier way.

Let’s go back to our original single standard d6 roll. For a normal die, we said to add up all of the sides and then divide by the number of sides, but what are we *really* doing there? We could say this another way. For a 6-sided die, each side has *exactly* 1/6 chance of being rolled. So we multiply each side’s *result* by the *probability* of that result (1/6 for each side in this case), then add all of these together. Doing this, we get (1*1/6) + (2*1/6) + (3*1/6) + (4*1/6) + (5*1/6) + (6*1/6), which gives us the same result (3.5) as we got before. And really, that’s what we’re doing the whole time: multiplying each outcome by the probability of that outcome.

Can we do this with the *Nuclear War* spinner? Sure we can. And this will give us the average result if we add these all together. All we have to do is figure out the probability of each spin, and multiply by the result.

*Another Example*

This technique of computing expected value by multiplying each result by its individual probability also works if the results are equally probable but weighted differently, like if you’re rolling dice but you win more on some rolls than others. As an example, here’s a game you might be able to find in some casinos: you place a wager, and roll 2d6. If you roll the lowest three numbers (2, 3 or 4) or the highest four numbers (9, 10, 11 or 12), you win an amount equal to your wager. The extreme ends are special: if you roll 2 or 12, you win *double* your wager. If you roll anything else (5, 6, 7 or 8), you lose your wager. This is a pretty simple game. But what is the chance of winning?

We can start by figuring out how many times you win:

- There are 36 ways to roll 2d6, total. How many of these are winning rolls?
- There’s 1 way to roll two, and 1 way to roll twelve.
- There are 2 ways to roll three and eleven.
- There are 3 ways to roll four, and 3 more ways to roll ten.
- There are 4 ways to roll nine.
- Adding these all up, there are 16 winning rolls out of 36.

So, under normal conditions, you win 16 times out of 36… slightly less than 50%.

Ah, but two of those times, you win twice as much, so that’s like winning twice! So if you play this game 36 times with a wager of $1 each time, and roll each possible roll exactly once, you’ll win $18 total (you actually win 16 times, but two of those times it counts as two wins). Since you play 36 times and win $18, does that mean these are actually even odds?

Not so fast. If you count up the number of times you lose, there are 20 ways to lose, not 18. So if you play 36 times for $1 each, you’ll win a total of $18 from the times when you win… but you’ll lose $20 from the twenty times you lose! As a result, you come out very slightly behind: you lose $2 net, on average, for every 36 plays (you could also say that on average, you lose 1/18 of a dollar per play). You can see how easy it is to make one misstep and get the wrong probability here!

*Permutations*

So far, all of our die rolls assume that order doesn’t matter. Rolling a 2+4 is the same as rolling a 4+2. In most cases, we just manually count the number of different ways to do something, but sometimes that’s impractical and we’d like a math formula.

Here’s an example problem from a dice game called *Farkle*. You start each round by rolling 6d6. If you’re lucky enough to roll one of each result, 1-2-3-4-5-6 (a “straight”), you get a huge score bonus. What’s the probability that will happen? There are a lot of different ways to have one of each!

The answer is to look at it this way: one of the dice (and only one) has to be showing 1. How many ways are there to do that? Six – there are 6 dice, and any of them can show the 1. Choose that and put it aside. Now, one of the remaining dice has to show 2. There’s five ways to do that. Choose it and put it aside. Continuing along these lines, four remaining dice can show 3, three dice of the remaining ones after that can show 4, two of the remaining dice after that can show 5, and at the end you’re left with a single die that must show 6 (no choice involved in that last one). To figure out how many ways there are to roll a straight, we multiply all the different, independent choices: 6x5x4x3x2x1 = 720 – that seems like a lot of ways to roll a straight.

To get the probability of rolling a straight, we have to divide 720 by the number of ways to roll 6d6, total. How many ways can we do that? Each die can show 6 sides, so we multiply 6x6x6x6x6x6 = 46656 (a much larger number!). Dividing 720/46656 gives us a probability of about 1.5%. If you were designing this game, that’s good to know so you can design the scoring system accordingly. We can see why *Farkle* gives you such a high score bonus for rolling a straight; it only happens very rarely!

This result is interesting for another reason. It shows just how infrequently we actually roll *exactly* according to probability in the short term. Sure, if we rolled a few thousand dice, we would see about as many of each of the six numbers on our rolls. But rolling just six dice, we *almost never* roll *exactly* one of each! We can see from this, another reason why expecting dice to roll what hasn’t been rolled yet “because we haven’t rolled 6 in awhile so we’re about due” is a fool’s game.

**Dude, Your Random Number Generator Is Broken…**

This brings us to a common misunderstanding of probability: the assumption that everything is split evenly *in the short term*, which it isn’t. In a small series of die-rolls, we expect there to be some unevenness.

If you’ve ever worked on an online game with some kind of random-number generator before, you’ve probably heard this one: a player writes tech support to tell you that your random number generator is clearly broken and not random, and they know this because they just killed 4 monsters in a row and got 4 of the exact same drop, and those drops are only supposed to happen 10% of the time, so this should *almost never happen*, so *clearly* your die-roller is busted.

You do the math. 1/10 * 1/10 * 1/10 * 1/10 is 1 in 10,000, which is pretty infrequent. This is what the player is trying to tell you. Is there a problem?

It depends. How many players are on your server? Let’s say you’ve got a reasonably popular game, and you get 100,000 daily players. How many of those kill four monsters in a row? Maybe all of them, multiple times per day, but let’s be conservative and say that half of them are just there to trade stuff in the auction house or chat on the RP servers or whatever, so only half of them actually go out monster-hunting. What’s the chance this will happen to *someone*? On a scale like that, you’d expect it to happen several times a day, at least!

Incidentally, this is why it seems like every few weeks at least, *someone *wins the lottery, even though that someone is *never* you or anyone you know. If enough people play each week, the odds are you’ll have at least *one* obnoxiously lucky sap somewhere… but that if *you* play the lottery, you’ve got worse odds of winning than your odds of being hired at Infinity Ward.

**Cards and Dependence**

Now that we’ve talked about independent events like die-rolling, we have a lot of powerful tools to analyze the randomness of many games. Things get a little more complicated when we talk about drawing cards from a deck, because each card you draw influences what’s left in the deck. If you have a standard 52-card deck and draw, say, the 10 of Hearts, and you want to know the probability that the next card is also a heart, the odds have changed because you’ve already removed a heart from the deck. Each card that you remove changes the probability of the next card in the deck. Since each card draw is influenced by the card draws that came before, we call this *dependent* probability.

Note that when I say “cards” here I am talking about *any* game mechanic where you have a set of objects and you draw one of them without replacing, so in this case “deck of cards” is mechanically equivalent to a bag of tiles where you draw a tile and don’t replace it, or an urn where you’re drawing colored balls from (I’ve never actually seen a game that involves drawing balls from an urn, but probability professors seem to have a love of them for some reason).

*Properties of Dependence*

Just to be clear, with cards I’m assuming that you are drawing cards, looking at them, and removing them from the deck. Each of these is an important property.

If I had a deck with, say, six cards numbered 1 through 6, and I shuffled and drew a card and then reshuffled all six cards between card draws, that is equivalent to a d6 die roll; no result influences the future ones. It’s only if I draw cards and don’t replace them that pulling a 1 on my first roll makes it more likely I’ll draw 6 next time (and it will get more and more likely until I finally draw it, or until I reshuffle).

The fact that we are *looking at* the cards is also important. If I pull a card from the deck but don’t look at it, I have no additional information, so the probabilities haven’t really changed. This is something that may sound counterintuitive; how does just flipping a card over magically change the probabilities? But it does, because you can only calculate the probability of unknown stuff based on what you *do *know. So, for example, if you shuffle a standard deck, reveal 51 cards and none of them is the Queen of Clubs, you know with 100% certainty that this is what the missing card is. If instead you shuffle a standard deck, and take 51 cards away *without* revealing them, the probability that the last card is the Queen of Clubs is still 1/52. For each additional card you reveal, you get more information.

Calculating probabilities for dependent events follows the same principles as independent, except it’s a little trickier because the probabilities are changing whenever you reveal a card. So you have to do a lot of multiplying different things together rather than multiplying the same thing against itself if you want to repeat a challenge. Really, all this means is we have to put together everything we’ve done already, in combination.

*Example*

You shuffle a standard 52-card deck, and draw two cards. What’s the probability that you’ve drawn a pair? There are a few ways to compute this, but probably the easiest is to say this: what’s the probability that the first card you draw makes you totally ineligible to draw a pair? Zero, so the first card doesn’t really matter as long as the second card matches it. No matter what we draw for our first card, we’re still in the running to draw a pair, so we have a 100% chance that we can still get a pair after drawing the first card.

What’s the probability that the second card matches? There are 51 cards remaining in the deck, and 3 of them match (normally it’d be 4 out of 52, but you already removed a “matching” card on your first draw!) so the probability ends up being exactly 1/17. (So, the next time that guy sitting across the table from you in Texas Hold ‘Em says “wow, another pocket pair? Must be my lucky day” you know there’s a pretty good chance he’s bluffing.)

What if we add two jokers so it’s now a 54-card deck, and we still want to know the chance of drawing a pair? Occasionally your first card will be a joker, and there will only be *one* matching card in the rest of the deck, rather than 3. How do we figure this out? By splitting up the probabilities and then multiplying each possibility.

Your first card is either going to be a Joker, or Something Else. Probability of a Joker is 2/54, probability of Something Else is 52/54.

If the first card is a Joker (2/54), then the probability of a match on the second card is 1/53. Multiplying these together (we can do that since they’re separate events and we want *both *to happen), we have 1/1431 – less than a tenth of a percent.

If the first card is Something Else (52/54), the probability of a match on the second card is up to 3/53. Multiplying these together, we have 78/1431 (a little more than 5.5%).

What do we do with these two results? Since they do not overlap, and we want to know the probability of *either* of them, we add! 79/1431 (still around 5.5%) is the final answer.

If we really wanted to be careful, we could calculate the probability of all other possible results: drawing a Joker and not matching, or drawing Something Else and not matching, and adding those together with the probability of winning, and we should get exactly 100%. I won’t do the math here for you, but feel free to do it yourself to confirm.

**The Monty Hall Problem**

This brings us to a pretty famous problem that tends to really confuse people, called the Monty Hall problem. It’s called that because there used to be this game show called *Let’s Make a Deal*, with your host, *Monty Hall*. If you’ve never seen the show, it was sort of like this inverse *The Price Is Right*. In *The Price Is Right*, the host (used to be Bob Barker, now it’s… Drew Carey? Anyway…) is your friend. He *wants* to give away cash and fabulous prizes. He tries to give you every opportunity to win, as long as you’re good at guessing how much their sponsored items actually cost.

Monty Hall wasn’t like that. He was like Bob Barker’s evil twin. His goal was to make you look like an idiot on national television. If you were on the show, he was the enemy, you were playing a game against him, and the odds were stacked in his favor. Maybe I’m being overly harsh, but when your chance of being selected as a contestant seems directly proportional to whether you’re wearing a ridiculous-looking costume, I tend to draw these kinds of conclusions.

Anyway, one of the biggest memes from the show was that you’d be given a choice of three doors, and they would actually call them Door Number 1, Door Number 2 and Door Number 3. They’d give you a door of your choice… for free! Behind one door, you’re told, is a fabulous prize like a Brand New Car. Behind the other doors, there’s no prize at all, no nothing, those other two doors are worthless. Except the goal is to humiliate you, so they wouldn’t just have an empty door, they’d have something silly-looking back there like a goat, or a giant tube of toothpaste, or something… something that was clearly *not* a Brand New Car.

So, you’d pick your door, and Monty would get ready to reveal if you won or not… but wait, *before we do that*, let’s look at one of the *other* doors that you *didn’t* choose. Since Monty knows where the prize is, and there’s only one prize and *two* doors you didn’t choose, no matter what he can always reveal a door without a prize. Oh, you chose Door Number 3? Well, let’s reveal Door Number 1 to show you that there was no prize there. And now, being the generous guy he is, he gives you the chance to trade your Door Number 3 for whatever’s behind Door Number 2 instead. And here’s where we get into probability: does switching doors increase your chance of winning, or decrease it, or is it the same? What do you think?

The real answer is that switching *increases* your chance of winning from 1/3 to 2/3. This is counterintuitive. If you haven’t seen this problem before, you’re probably thinking: wait, just by revealing a door we’ve magically changed the odds? But as we saw with our card example earlier, that is *exactly* what revealed information does. Your odds of winning with your first pick are obviously 1/3, and I think everyone here would agree to that. When that new door is revealed, it doesn’t change the odds of your first pick at all – it’s still 1/3 –but that means the *other* door now has a 2/3 chance of being the right one.

Let’s look at it another way. You choose a door. Chance of winning: 1/3. I offer to swap you for *both* of the other doors, which is basically what Monty Hall is doing. Sure, he reveals one of them to not be a prize, but he can *always* do that, so that doesn’t really change anything. Of course you’d want to switch!

If you’re still wondering about this and need more convincing, clicking here will take you to a wonderful little Flash app that lets you explore this problem. You can actually play, starting with something like 10 doors and eventually working down your way to 3; there’s also a simulator where you can give it any number of doors from 3 to 50 and just play on your own, or to have it actually run a few thousand simulations and give you how many times you would have won if you stayed versus when you switched.

**Monty Hall, Redux**

Now, in practice on the actual show, Monty Hall knew this, because *he* was good at math even if his contestants weren’t. So here’s what he’d do to change the game a little. If you picked the door with the prize behind it, which *does happen* 1/3 of the time, he’d *always* offer you the chance to switch. After all, if you’ve got a car and then you give it away for a goat, you’re going to look pretty dumb, which is exactly what he wants, because that’s the kind of evil guy he is. But if you pick a door with *no prize* behind it, he’ll only offer you the chance to switch *about half* of those times, and the other half he’ll just show you your Brand New Goat and boot you off the stage. Let’s analyze this new game, where Monty can *choose* whether or not to give you the chance to switch.

Suppose he follows this algorithm: always let you switch if you picked the door with the car, otherwise he has a 50/50 chance of giving you your goat or giving you the chance to switch. *Now* what are your chances of winning?

1/3 of the time, you pick the prize right away and he offers you to switch.

Of the remaining 2/3 of the time (you pick wrong initially), half of the time he’ll offer to switch, half the time he won’t. Half of 2/3 is 1/3, so basically 1/3 of the time you get your goat and leave, 1/3 of the time you picked wrong and he offers the switch, and 1/3 of the time you picked *right *and he offers the switch.

If he offers an exchange, we already know that the 1/3 of the time when he gives you your goat and you leave didn’t happen. That is useful information, because it means our chance of winning has now changed. Of the 2/3 of the time where we’re given a choice, 1/3 means we guessed right, and the other 1/3 means we guessed wrong, so if we’re given a choice at all it means our probability of winning is now 50/50, and there’s no *mathematical *advantage to keeping or switching.

Like in *Poker*, this is no longer a game of math and now a game of psychology. Did Monty offer you a choice because he thinks you’re a sucker who doesn’t know that switching is the “right” choice, and that you’ll stubbornly hold onto the door you picked because psychologically it’s worse to have a car and then lose it? Or does he think you’re smart and that you’ll switch, and he’s offering you the chance because he knows you guessed right at the beginning and you’ll take the bait and fall into his trap? Or maybe he’s being uncharacteristically nice, and goading you into doing something in your own best interest, because he hasn’t given away a car in awhile and his producers are telling him the audience is getting bored and he’d better give away a big prize soon so their ratings don’t drop?

In this way, Monty manages to offer a choice (sometimes) while still keeping the overall probability of winning at 1/3. Remember, a third of the time you’ll just lose outright. A third of the time you’ll guess right initially, and 50% of that time you’ll win (1/3 x 1/2 = 1/6). And a third of the time, you’ll guess wrong initially but be given the choice to switch, and 50% of that time you’ll win (also 1/6). Add the two non-overlapping win states together and you get 1/3, so whether you switch or stay your overall odds are 1/3 throughout the whole game… no better than if you just guessed and he showed you the door, without any of this switching business at all! So the point of offering to switch doors is not done for the purpose of changing the odds, but simply because drawing out the decision makes for more exciting television viewing.

Incidentally, this is one of the same reasons *Poker* can be so interesting, is that most of the formats involve slowly revealing cards in between rounds of betting (like the Flop, Turn and River in *Texas Hold ‘Em*), because you start off with a certain probability of winning and that probability is changing in between each betting round as more cards are revealed.

**The Sibling Problem**

And that brings us to another famous problem that tends to throw people, the Siblings problem. This is about the only thing I’m writing about today that isn’t directly related to games (although I guess that just means I should challenge you to come up a game mechanic that uses this). It’s more a brain teaser, but a fun one, and in order to solve it you really have to be able to understand conditional probability like we’ve been talking about.

The question is this: I have a friend with two kids, and *at least one* of them is a girl. What is the probability that the other one is *also* a girl? Assume that in the normal human population, there’s a 50/50 chance of having a boy or a girl, and assume that this is universally true for any child (in reality some men actually do produce more X or Y sperm, so that would skew the odds a bit where if you know one of their kids is already a girl, that the odds are slightly higher they’ll have more girls, and then there are conditions like hermaphrodism, but for our purposes let’s ignore that and assume that each kid is an independent trial with an equal chance of being male or female).

Intuitively, since we’re dealing with a core 1/2 chance, we would expect the answer would be something like 1/2 or 1/4 or some other nice, round number that’s divisible by 2. The actual answer is *1/3*. Wait, what?

The trick here is that the information we were given narrows down the possibilities. Let’s say the parents are Sesame Street fans and so no matter what the sex, they name their kids A and B. Under normal conditions, there are four possibilities that are equally likely: A and B are both boys, A and B are both girls, A is boy and B is girl, or A is girl and B is boy. Since we know *at least one* of them is a girl, we can eliminate the possibility that A and B are both boys, so we have three (still equally likely) scenarios remaining. Since they’re equally likely and there are three of them, we know each one has a probability of 1/3. Only one of those three scenarios involves two girls, so the answer is 1/3.

**The Sibling Problem, Redux**

It gets weirder. Suppose instead I tell you my friend has two children, and one is a *girl who was born on a Tuesday*. Assume that under normal conditions, a child is equally likely to be born on any of the seven days of the week. What’s the probability the other child is also a girl? You’d think the answer would still be 1/3; what does Tuesday have to do with anything? But again, intuition fails. The actual answer is *13/27*, which isn’t just unintuitive, it’s plain old weird-looking. What’s going on *here*?

Tuesday actually changes the odds, again because we don’t know *which* child it was, or if *both* children were born on Tuesday. By the same logic as earlier, we count all valid combinations of children where at least one is a Tuesday girl. Again assuming the children are named A and B, the combinations are:

- A is a Tuesday girl, B is a boy (there are 7 possibilities here, one for each day of the week that B could be born on).
- B is a Tuesday girl, A is a boy (again, 7 possibilities).
- A is a Tuesday girl, B is a girl born on a
*different*day of the week (6 possibilities). - B is a Tuesday girl, A is a non-Tuesday girl (again, 6 possibilities).
- A and B are both girls born on Tuesday (1 possibility, but we have to take care not to double-count this).

Adding it up, there are 27 different, equally likely combinations of children and days with at least one Tuesday girl. Of those, 13 possibilities involve two girls. Again, this is totally counterintuitive, and apparently designed for no other reason than to make your brain hurt. If you’re still scratching your head, ludologist Jesper Juul has a nice explanation of this problem on his website.

**If You’re Working on a Game Now…**

If a game you’re designing has any randomness, this is a great excuse to analyze it. Choose a random element you want to analyze. For that element, first ask yourself what kind of probability you’re expecting to see, what makes sense to you in the context of the game. For example, if you’re making an RPG and looking at the probability that the player will hit a monster in combat, ask yourself what to-hit percentage feels right to you. Usually in console RPGs, misses by the player are very frustrating, so you wouldn’t usually want them to miss a lot… maybe 10% of the time or less? If you’re an RPG designer you probably know better than I, but you should have some basic idea of what feels right.

Then, ask yourself if this is something that’s *dependent* (like cards) or *independent* (like dice). Break down all possible results, and the probabilities of each. Make sure your probabilities sum to 100%. And lastly, of course, compare the actual numbers to the numbers you were expecting. Is this particular random die-roll or card-draw acting how you want it to, or do you see signs that you need to adjust the numbers? And of course, if you *do* find something to adjust, you can use these same calculations to figure out exactly how much to adjust it!

**Homework**

Your “homework” this week is meant to help you practice your probability skills. I have two dice games and a card game for you to analyze using probability, and then a weird mechanic from a game I once worked on that provides a chance to try out a Monte Carlo simulation.

*Game #1: Dragon Die*

This is a dice game that I invented with some co-workers one day (thanks Jeb Havens and Jesse King!) specifically to mess with people’s heads on probability. It’s a simple casino game called Dragon Die, and it’s a dice gambling contest between you and the House. You are given a standard 1d6, and you roll it. You’re trying to roll higher than the House. The House is given a non-standard 1d6 – it’s similar to yours, but instead of a 1 it has a Dragon on it (so the House die is Dragon-2-3-4-5-6). If the House rolls a Dragon, then the House automatically wins and you automatically lose. If you both roll the same number, it’s a push, and you both re-roll. Otherwise, the winner is whoever rolls highest.

Obviously, the odds are slightly against the player here, because the House has this Dragon advantage. But how much of an advantage is it? You’re going to calculate it. But first, before you do, exercise your intuition. Suppose I said this game was offered with a 2 to 1 payout. That is, if you win, you keep your bet and get *twice* your bet in winnings. So, if you bet $1 and win, you keep your $1 and get $2 extra, for a total of $3. If you lose, you just lose your standard bet. Would you play? That is, intuitively, do you think the odds are better or worse than 2 to 1? Said another way, for every 3 games you play, do you expect to win *more than once*, or *less than once*, or *exactly once*, on average?

Once you’ve used your intuition, do the math. There are only 36 possibilities for both dice, so you should have no problem counting them all up. If you’re not sure about this “2 to 1” business, think of it this way: suppose you played the game 36 times (wagering $1 each time). A win nets you $2 up, a loss causes you to lose $1, and a push is no change. Count up your total winnings and losses and figure out if you come out ahead or behind. And then ask yourself how close your intuition was. And then realize how evil I am.

And yes, if you’re wondering, the actual dice-roll mechanics here are something I’m intentionally obfuscating, but I’m sure you’ll all see through that once you sit down and look at it. Try and solve it yourself. I’ll post all answers here next week.

*Game #2: Chuck-a-Luck*

There is a gambling dice game called Chuck-a-Luck (also known as Birdcage, because sometimes instead of rolling dice they’re placed in a wire cage that somewhat resembles a *Bingo* cage). This is a simple game that works like this: place your bet (say, $1) on any number from 1 to 6. You then roll 3d6. For *each die* that your number shows up on, you get $1 in winnings (and you get to keep your original bet). If *no dice* show your number, the house takes your $1 and you get nothing. So, if you place on 1 and you roll triple 1s, you actually win $3.

Intuitively, it seems like this is an even-odds game. Each die is individually a 1/6 chance of winning, so adding all three should give you a 3/6 chance of winning. But of course, if you calculate that way you’re *adding* when these are separate die-rolls, and remember, you’re only allowed to add if you’re talking about separate win conditions from the *same* die. You need to be multiplying something.

When you count out all possible results (you’ll probably find it easier to do this in Excel than by hand since there are 216 results), it *still *looks at first like an even-odds game. But in reality, the odds of winning are actually slightly in favor of the House; how much? In particular, on average, how much money do you expect to lose each time you play this game? All you have to do is add up the gains and losses for all 216 results, then divide by 216, so this should be simple… but as you’ll see, there are a few traps you can fall into, which is why I’m telling you right now that if you think it’s even-odds, you’ve got it wrong.

*Game #3: 5-Card Stud Poker*

When you’ve warmed up with the previous two exercises, let’s try our hand at dependent probability by looking at a card game. In particular, let’s assume *Poker* with a 52-card deck. Let’s also assume a variant like *5-card Stud *where each player is dealt 5 cards, and that’s all they get. No ability to discard and draw, no common cards, just a straight-up you get 5 cards and that’s what you get.

A “Royal Flush” is the 10-J-Q-K-A in the same suit, and there are four suits, so there are four ways to get a Royal Flush. Calculate the probability that you’ll get one.

One thing I’ll warn you about here: remember that you can draw those five cards in any order. So you might draw an Ace first, or a Ten, or whatever. So the *actual* way you’ll be counting these, there are actually a lot more than 4 ways to get dealt a Royal Flush, if you consider the cards to be dealt sequentially!

*Game #4: IMF Lottery*

This fourth question is one that can’t easily be solved through the methods we’ve talked about today, but you can simulate it pretty easily, either with programming or with some fudging around in Excel. So this is a way to practice your Monte Carlo technique.

In a game I worked on that I’ve mentioned before called *Chron X*, there was this really interesting card called *IMF Lottery*. Here’s how it worked: you’d put it into play. At the end of your turn, the game would roll a percentile, and there was a 10% chance it would leave play, and a random player would gain 5 of each resource type *for every token* on the card. The card didn’t start with any tokens, but if it stayed around then at the start of each of your turns, it gained a token. So, there is a 10% chance you’ll put it into play, end your turn, and it’ll leave and no one gets anything. If that doesn’t happen (90% chance), then there is a further 10% chance (actually 9% at this point, since it’s 10% of 90%) that on the very next turn, it’ll leave play and someone will get 5 resources. If it leaves play on the turn after that (10% of the remaining 81%, so 8.1% chance) someone gets 10 resources, then the next turn it would be 15, then 20, and so on. The question is, what is the *expected value* of the number of total resources you’ll get from this card when it finally leaves play?

Normally, we’d approach this by finding the probability of each outcome, and multiplying by the outcome. So there is a 10% chance you get 0 (0.1*0 = 0). There’s a 9% chance you get 5 resources (that’s 9%*5 = 0.45 resources). There’s an 8.1% chance you get 10 resources (8.1%*10 = 0.81 resources total, expected value). And so on. And then we add all of these up.

Now, you can quickly see a problem: there is always going to be a chance that it will *not* leave play, so this could conceivably stay in play without leaving *forever*, for an infinite number of turns, so there’s no actual way to write out *every single possibility*. The techniques we learned today don’t give us a way to deal with infinite recursion, so we’ll have to fake it.

If you know enough programming or scripting to feel comfortable doing this, write a program to simulate this card. You should have a *while* loop that initializes a variable to zero, rolls a random number, and 10% of the time it exits the loop. Otherwise it adds 5 to the variable, and iterates. When it finally breaks out of the loop, have it increment the total number of trials by 1, and the total number of resources by whatever the variable ended up as. Then, re-initialize the variable and try again. Run this a few thousand times. At the end, divide total number of resources by total number of trials, and that’s your Monte Carlo expected value. Run the program a few times to see if the numbers you’re getting are about the same; if there’s still a lot of variation in your final numbers, increase the number of iterations in the outer loop until you start getting some consistency. And you can be pretty sure that whatever you come up with is going to be about right.

If you don’t know programming (or even if you do), this is an excuse to exercise your Excel skills. You can never have enough Excel skills as a game designer.

Here you’ll want to make a good use of the IF and RAND statements. RAND takes no values, it just returns a random decimal number between 0 and 1. Usually we combine it with FLOOR and some plusses or minuses to simulate a die roll, as I mentioned earlier. In this case, though, we just have a 10% check for the card leaving play, so we can just check if RAND is less then 0.1 and not mess with this other stuff.

IF takes in three values. In order: a condition that’s either true or false, and then a value to return if it’s true, and then a value to return if it’s false. So the following statement will return 5 ten percent of the time, and 0 the other ninety percent of the time:

=IF(RAND()<0.1,5,0)

There are a lot of ways to set this up, but if I were doing it, I’d use a formula like this for the cell that represents the first turn, let’s say this is cell A1:

=IF(RAND()<0.1,0,-1)

Here I’m using *negative one* as shorthand for “this card hasn’t left play and given out any resources yet.” So if the first turn ended and the card left play right away, A1 would be zero; otherwise it’s -1.

For the next cell, representing the second turn:

=IF(A1>-1, A1, IF(RAND()<0.1,5,-1))

So if the first turn ended and the card left play right away, A1 would be 0 (number of resources), and this cell would just copy that value. Otherwise A1 would be -1 (hasn’t left play yet), and this cell proceeds to roll randomly again: 10% of the time it returns 5 (for the 5 resources), the rest of the time it is still -1. Continuing this formula for additional cells simulates additional turns, and whatever cell is at the end gives you a final result (or -1 if it never left play after all of the turns you’re simulating).

Take this row of cells, which represents a single play of the card, and copy and paste for a few hundred (or a few thousand) rows. We might not be able to do an *infinite* test for Excel (there are only so many cells that fit in a spreadsheet), but we can at least cover the majority of cases. Then, have a single cell where you take the average of the results of all the turns (Excel helpfully provides the AVERAGE() function for this).

In Windows, at least, you can hit F9 to reroll all your random numbers. As before, do that a few times and see if the values you get are similar to each other. If there’s too much variety, double the number of trials and try again.

**Unsolved Problems**

If you happen to have a Ph.D. in Probability already and the problems above are too easy for you, here are two problems that I’ve wondered about for years, but I don’t have the math skills to solve them. If you happen to know how to do these, post as a comment; I’d love to know how.

*Unsolved #1: IMF Lottery*

The first unsolved problem is the previous homework. I can do a Monte Carlo simulation (either in C++ or Excel) pretty easily and be confident of the answer for how many resources you get, but I don’t actually know how to come up with a definitive, provable answer mathematically (since this is an infinite series). If you know how, post your math… after doing your own Monte Carlo simulation to verify the answer, of course.

*Unsolved #2: Streaks of Face Cards*

This problem, and again this is *way beyond* the scope of this blog post, is a problem I was posed by a fellow gamer over 10 years ago. They witnessed a curious thing while playing Blackjack in Vegas: out of an eight-deck shoe, they saw *ten* face cards in a row (a face card is 10, J, Q or K, so there are 16 of them in a standard 52-card deck, which means there are 128 of them in a 416-card shoe). What is the probability that there is *at least* one run of ten *or more* face cards, somewhere, in an eight-deck shoe? Assume a random, fair shuffle. (Or, if you prefer, what are the odds that there are *no* runs of ten or more face cards, *anywhere* in the sequence?)

You can simplify this. There’s a string of 416 bits. Each bit is 0 or 1. There are 128 ones and 288 zeros scattered randomly throughout. How many ways are there to randomly interleave 128 ones and 288 zeros, and how many of those ways involve at least one clump of ten or more 1s?

Every time I’ve sat down to solve this problem, it seems like it should be really easy and obvious at first, but then once I get into the details it suddenly falls apart and becomes impossible. So before you spout out a solution, really sit down to think about it and examine it, work out the actual numbers yourself, because every person I’ve ever talked to about this (and this includes a few grad students in the field) has had that same “it’s obvious… no, wait, it’s not” reaction. This is a case where I just don’t have a technique for counting all of the numbers. I could certainly brute-force it with a computer algorithm, but it’s the mathematical technique that I’d find more interesting to know.

July 28, 2010 at 1:18 pm |

Unsolved #1:

Let the expected number be called X and consider the first turn: .1 chance of nothing and .9 chance of adding 5 to the result of future turns.

X = .1*0 + .9*(5+X)

Solving for X gives 45.

July 28, 2010 at 1:58 pm |

Interesting! Here I was expecting some kind of summation series, but that’s very elegant. Still trying to wrap my head around why this works…

July 28, 2010 at 2:46 pm

If you want to be rigorous, here is how you can get there from the summation:

X = .1*0 + .9*.1*5 + .9*.9*.1*(5+5) + …

= sigma(i = 0 to inf, .1*.9^i*i*5)

(pull out first term)

= .1*0 + sigma(i = 1 to inf, .1*.9^i*i*5)

(now replace dummy variable i with i+1)

= .1*0 + sigma(i+1 = 1 to inf, .1*.9^(i+1)*(i+1)*5)

= .1*0 + sigma(i = 0 to inf, .9(.1*.9^i*i*5+5*.1*.9^i))

(factor and separate stuff from inside to sigma)

= .1*0 + .9(sigma(i = 0 to inf, .1*.9^i*i*5)+sigma(i = 0 to inf, 5*.1*.9^i))

(apply geometric series formula for .9^i)

= .1*0 + .9(sigma(i = 0 to inf, .1*.9^i*i*5)+5)

(notice sigma matches the second line above)

= .1*0 + .9(X+5)

This is really just jumping through math hoops to factor out a .9 and the +5 from all the terms after the first, before recognizing an X within. It’s easier conceptually to award yourself 5 tokens immediately as you reach the second turn, and recognize that the rest of the turns must give the same expected result as the beginning.

July 28, 2010 at 2:58 pm

Btw, what you are looking at is called a geometric distribution

http://en.wikipedia.org/wiki/Geometric_distribution

July 28, 2010 at 10:52 pm |

Unsolved #2 (simplified):

This solution uses a computer, but uses more technique than brute-force (which would take longer than the heat death of the universe to complete).

Let f(x, y, z) be the number of ways that x ones and y zeroes can be arranged such that there is at least one clump of 10 and ends with at least z ones, where 0<=z0. Otherwise, it has f(x, y-1, 0) arrangements, corresponding to its remaining composition if you chop off the last zero.

For set B, if you chop off the last one, there are some different possible leftovers (each having x-1 ones and y zeroes):

C) still has a clump of 10 within and ends with at least z-1 ones (=f(x-1,y,max(0,z-1)))

D) has 9 ones at the end (=number of ways to arrange the y zeroes among the x+y-10 remaining positions = x+y-10 choose y)

E) is in both C and D (=f(x-1,y,9))

Every case in B must fall into C or D (or both). So you can add their sizes and subtract back out the ones counted twice to get B = C+D-E = f(x-1,y,max(0,z-1)))+(x+y-10 C y)-f(x-1,y,9)

Now you have a recursive formula for f(x,y,z) where the recursive calculations always decrement x or y. You can start off calculating f(0,0,?), use that to calculate f(0,1,?) and f(1,0,?), use those to calculate a higher layer etc, and build up to f(128,288,0) for the answer. (This method is called “dynamic programming” for very arbitrary reasons.) Then divide by the number of random arrangements which is 416 choose 128, or 416!/288!/128!.

This is where the computer comes in to calculate these 128x288x10 values. Final printout:

With 128 ones and 288 zeroes,

236779178404896208120691622689403367619067743771989852408498548238005455393419623904153415254378072341018260 ways with clumps of 10 out of

138594834613418627496609864757437921511978213696076182671037070009261613273662470703706238316410624145827894755 ways

= 0.001708427 chance

or 1 out of 585.3337

July 28, 2010 at 10:57 pm |

Part of the above post got chopped out, fixed here:

Unsolved #2 (simplified):

This solution uses a computer, but uses more technique than brute-force (which would take longer than the heat death of the universe to complete).

Let f(x, y, z) be the number of ways that x ones and y zeroes can be arranged such that there is at least one clump of 10 and ends with at least z ones, where z is between 0 and 9 (inclusive).

To calculate f, break up its possible ways into 2 exclusive sets:

A) arrangements that end with a zero

B) arrangements that end with a one

Set A is empty if z is positive. Otherwise, it has f(x, y-1, 0) arrangements, corresponding to its remaining composition if you chop off the last zero.

For set B, if you chop off the last one, there are some different possible leftovers (each having x-1 ones and y zeroes):

C) still has a clump of 10 within and ends with at least z-1 ones (=f(x-1,y,max(0,z-1)))

D) has 9 ones at the end (=number of ways to arrange the y zeroes among the x+y-10 remaining positions = x+y-10 choose y)

E) is in both C and D (=f(x-1,y,9))

Every case in B must fall into C or D (or both). So you can add their sizes and subtract back out the ones counted twice to get B = C+D-E = f(x-1,y,max(0,z-1)))+(x+y-10 C y)-f(x-1,y,9)

Now you have a recursive formula for f(x,y,z) where the recursive calculations always decrement x or y. You can start off calculating f(0,0,?), use that to calculate f(0,1,?) and f(1,0,?), use those to calculate a higher layer etc, and build up to f(128,288,0) for the answer. (This method is called “dynamic programming” for very arbitrary reasons.) Then divide by the number of random arrangements which is 416 choose 128, or 416!/288!/128!.

This is where the computer comes in to calculate these 128x288x10 values. Final printout:

With 128 ones and 288 zeroes,

236779178404896208120691622689403367619067743771989852408498548238005455393419623904153415254378072341018260 ways with clumps of 10 out of

138594834613418627496609864757437921511978213696076182671037070009261613273662470703706238316410624145827894755 ways

= 0.001708427 chance

or 1 out of 585.3337

July 29, 2010 at 2:26 pm |

Code for this solution: http://gist.github.com/498838

July 29, 2010 at 1:01 am |

For unsolved #2, the probability of getting 10 face cards in a row is exactly 4430882041456670836580799613118307965105941504317415971177325346259781015867526425506271173047828/2593544617665399666070872329313186230262721234289066637634396887797267773369635096324303347565025539. That’s approximately 1 in 585 (to within 1 part per million). I’ll try to write up my method tomorrow morning.

July 29, 2010 at 1:05 pm |

The writeup was a bit long for a comment, so I posted it at http://blog.haleret.com/complex-probability-calculations . My approach is significantly different from anon234’s (which I’m still trying to understand), but we both came up with exactly the same answer, which is promising.

July 29, 2010 at 1:38 pm

The really interesting thing to me is that both you and anon234 required the use of a computer to solve, because the math is just too tedious to do by hand. In the past (you know, before computers) when presented with a problem like this, mathematicians would just invent a new branch of math or a new technique or something to solve it, so you’d end up with some kind of new math operation like the Choose or Permutation operator or a transform or something. These days it’s just “use the computer”… of course, we still get the right answer, so maybe that’s better 🙂

July 29, 2010 at 2:30 pm |

Well, it’s not really ‘just “use the computer”‘. As anon234 pointed out, the problem was big enough that a basic brute-force solution would be impossible no matter how much computing power you threw at it. The amount of computation required for either of our methods is about the same. If there were no computers available, it would be feasible to throw either of our approaches to an army of graduate students to calculate by hand, which is exactly how these sorts of things used to be done. A single person working alone could finish it in about a year.

All the computer changed was the cost of performing basic arithmetic accurately. There are plenty of problems out there that still require some amount of ingenuity to solve, either because no-one yet knows how to reduce it to basic arithmetic or (as in this case) the amount of basic arithmetic required is so astronomically huge that it can’t be tackled with computers.

An interesting thing to note is that the fraction I gave is fully reduced; there is no more compact way to state its exact value. I can’t think of any process that would produce a number like that in which I would feel confident in my ability to execute without error.

July 29, 2010 at 3:58 pm |

You are of course correct, I spoke a bit flippantly by saying “just use the computer” — you do still need a reasonable algorithm.

What I meant to say is what you seem to be hinting at: that certain problems just take too long to do by hand with arithmetic. It used to be that something like “find the probability of a Royal Flush” would fall into this category (since there are a huge amount of shuffles, too many to count individually) until we invented certain techniques of counting that allowed us to reduce the work required to a few multiplications. Or think of the work you save by doing something like a binomial distribution by formula, rather than by hand-counting.

And so I was trying to say, I wonder if similar formulas exist for this kind of “minimum string length” problem that just haven’t been discovered yet (or they have and I just never heard about them)… or if, by the time someone got around to solving it, we could do it algorithmically with computer so these other mathematical reductions never happened. (Or maybe it’s impossible to reduce, and so this is the kind of problem that literally couldn’t be solved in the 18th century without an army of slave labor. I don’t know. This is why I thought it was worth asking.)

Still, it’s amazing how fast I could get a solution to this problem just by asking. It’s things like this that make me love the internet (and teaching) 🙂

July 31, 2010 at 2:08 pm |

[…] just wrapped up Level 4 (or week 4, for you non-gamers out there) of the Game Balance Concepts course that I’ve been […]

August 9, 2010 at 7:28 am |

If you want more probability games so you found the best source here:

3rd grade Probability Games online

to the main site for more math subjects: Math Games

September 21, 2010 at 5:19 pm |

‘(I’ve never actually seen a game that involves drawing balls from an urn, but probability professors seem to have a love of them for some reason).’

Some people in Mexico used urns to play Loteria instead of a deck of cards. I always liked the urn better 😛

May 19, 2011 at 2:02 pm |

[…] Some good resources: -1- Game Balance Concepts […]

September 27, 2012 at 9:31 pm |

Your Sibling Problem is incorrect.

“The question is this: I have a friend with two kids, and at least one of them is a girl. What is the probability that the other one is also a girl?”

You answered 1/3 chance. But it’s just a 50% chance. Why? Because you’re counting one scenario twice. (A = girl, B = boy, and A = boy, B = girl) These are the same scenarios because we already know at least one of them is a girl. So it doesn’t matter which one is named A or B, does it?

So we are left with two choices. The second child is a girl, or the second child is a boy. 50% chance.

September 27, 2012 at 10:42 pm |

Actually it does matter, because A=girl/B=boy and A=boy/B=girl are two different scenarios.

Give them names. My friend has two kids, Pat and Jessie. Without any givens, there are four equally likely scenarios: both are girls, both are boys, Pat is a girl and Jessie is a boy, or Pat is a boy and Jessie is a girl. (If you say these last two are the same, I think both Pat and Jessie would disagree 😉

The given information (at least one is a girl) eliminates the boy/boy possibility so we are left with three equally likely scenarios: all of the above minus boy/boy. So it’s 1/3.