Archive for July, 2010

Level 4: Probability and Randomness

July 28, 2010

Readings/Playings

Read this article on Gamasutra by designer Tyler Sigman. I affectionately refer to it as the “Orc Nostril Hair” article, but it provides a pretty good primer for probabilities in games.

This Week’s Topic

Up until now, nearly everything we’ve talked about was deterministic, and last week we really went deep into transitive mechanics and took them about as far as I can go with them. But so far we’ve ignored a huge aspect of many games, the non-deterministic aspects: in other words, randomness. Understanding the nature of randomness is important for game designers because we make systems in order to craft certain player experiences, so we need to know how those systems work. If a system includes a random input, we need to understand the nature of that randomness and how to modify it to get the results that we want.

Dice

Let’s start with something simple: a die-roll. When most people think of dice, they are thinking of six-sided dice, also known as d6s. But most gamers have encountered plenty of other dice: four-sided (d4), eight-sided (d8), d12, d20… and if you’re really geeky, you might have a d30 or d100 lying around. In case you haven’t seen this terminology before, “d” with a number after it means a die with that many sides; if there is a number before the “d”, that is the number of dice rolled – so for example, in Monopoly you roll 2d6.

Now, when I say “dice” here, that is shorthand. We have plenty of other random-number generators that aren’t globs of plastic but that still serve the same function of generating a random number from 1 to n. A standard coin can be thought of as a d2. I’ve seen two designs for a d7, one which looks like a die and the other which looks more like a seven-sided wooden pencil. A four-sided dreidel (also known as a teetotum) is equivalent to a d4. The spinner that comes with Chutes & Ladders that goes from 1 to 6 is equivalent to a d6. A random-number generator in a computer might create a random number from 1 to 19 if the designer wants it to, even though there’s no 19-sided die inside there (I’ll actually talk a bit more about numbers from computers next week). While all of these things look different, really they are all equivalent: you have an equal chance of choosing one of several outcomes.

Dice have some interesting properties that we need to be aware of. The first is that each side is equally likely to be rolled (I’m assuming you’re using a fair die and not a rigged one). So, if you want to know the average value of a roll (also known as the “expected value” by probability geeks), just add up all the sides and divide by the number of sides. The average roll for a standard d6 is 1+2+3+4+5+6 = 21, divided by the number of sides (6), which means the average is 21/6 = 3.5. Keep in mind that this add-and-divide shortcut is a special case: it only works because all results are equally likely.

What if you have custom dice? For example, I’ve seen one game with a d6 with special labels: 1, 1, 1, 2, 2, 3, so it behaves sort of like this weird d3 where you’re more likely to get a 1 than a 2 and more likely to get a 2 than a 3. What’s the average roll for this die? 1+1+1+2+2+3 = 10, divided by 6, equals 5/3 or about 1.67. So if you’ve got that custom die and you want players to roll three of them and add the results, you know they’ll roll an average total of 5 (three dice at 5/3 each), and you can balance the game on that assumption.

Dice and Independence

As I said before, we’re going on the assumption that each side is equally likely to be rolled. This is true no matter how many dice are rolled. Each die roll is what we call independent, meaning that previous rolls do not influence later rolls. If you roll dice enough times you definitely will see “streaks” of numbers, like a run of high or low numbers or something, and we’ll talk later about why that is, but it doesn’t mean the dice are “hot” or “cold”; if you roll a standard d6 and get two 6s in a row, the probability of rolling another 6 is… exactly 1/6. It is not more likely because the die is “running hot”. It is not less likely because “we already got two 6s, so we’re due for something else”. (Of course, if you roll it twenty times and get a 6 each time, the odds of getting a 6 on the twenty-first roll are actually pretty good… because it probably means you have a rigged die!) But assuming a fair die, each roll is equally likely, independent of the others. If it helps, assume that we’re swapping out dice each time, so if you roll a 6 twice, remove that “hot” die from play and replace it with a new, “fresh” d6. For those of you who knew this already, I apologize, but I need to be clear about that before we move on.

Making Dice More or Less Random

Now, let’s talk about how you can get different numbers from different dice. If you’re only making a single roll or a small number of rolls, the game will feel more random if you use dice with more sides. The more you roll a die, or the more dice you roll, the more things will tend towards the average. For example, rolling 1d6+4 (that is, rolling a standard 6-sided die and adding 4 to the result) generates a number between 5 and 10. Rolling 5d2 also generates a number between 5 and 10. But the single d6 roll has an equal chance of rolling a 5, or 8, or 10; the 5d2 roll will tend to create more 7s and 8s than any other results. Same range, and even the same average value (7.5 in both cases), but the nature of the randomness is different.
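
If you want to see exactly where all those 7s and 8s come from, you can make the computer count the outcomes for you. Here’s a quick sketch in the same C-style code I use later in this post (the trick of reading each coin-flip die off one bit of a counter is just my own shortcut, not anything standard):

// Enumerate all 32 equally likely outcomes of 5d2 and count the totals.
int counts[11] = {0};                  // totals run from 5 to 10
for (int outcome = 0; outcome < 32; outcome++) {
    int total = 5;                     // each die shows at least 1
    for (int die = 0; die < 5; die++)
        total += (outcome >> die) & 1; // add 1 more if this die shows a 2
    counts[total]++;
}
// Result: 5 and 10 happen 1 way each, 6 and 9 happen 5 ways each, and
// 7 and 8 happen 10 ways each (out of 32). Compare 1d6+4, where every
// total from 5 to 10 happens exactly 1 way in 6.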

Wait a minute. Didn’t I just say that dice don’t run hot or cold? And now I’m saying if you roll a lot of them, they tend towards an average? What’s going on?

Let me back up. If you roll a single die, each roll is equally likely. That means if you roll a lot, over time, you’ll roll each side about as often as the others. The more you roll, the more you’ll tend towards the average, collectively. This is not because previous numbers “force” the die to roll what hasn’t been rolled before. It’s because a small streak of 6s (or 20s or whatever) ends up not being much influence when you roll another ten thousand times and get mostly average rolls… so you might get a bunch of high numbers now, but you might also get a bunch of low numbers later, and over time it all tends to go towards the mean. Not because the die is being influenced by previous rolls (seriously, the die is a piece of plastic, it doesn’t exactly have a brain to be thinking “gosh, I haven’t come up 2 in a while”), but because that’s just what tends to happen in large sets of rolls. Your small streak in a large ocean of die-rolls will be mostly drowned out.

So, doing the math for a random die roll is pretty straightforward, at least in terms of finding the average roll. There are also ways to quantify “how random” something is – a way of saying that 1d6+4 is “more random” than 5d2, in that it gives a more even spread. Mostly you do that by computing something called “standard deviation”: the larger it is, the more random the roll. But that takes more computation than I want to get into today (I’ll get into it later on). All I’m asking you to know is that, in general, fewer dice rolled = more random. While I’m on the subject, more faces on a die is also more random, since you have a wider spread.

Computing Probability through Counting

You might be wondering: how can we figure out the exact probability of getting a specific roll? This is actually pretty important in a lot of games, because if you’re making a die roll in the first place there is probably some kind of optimal result. And the answer is, we count two things. First, count the total number of ways to roll dice (no matter what the result). Then, count the number of ways to roll dice that get the result you actually want. Divide the second number by the first and you’ve got your probability; multiply by 100 if you want the percentage.

Examples

Here’s a very simple example. You want to roll 4 or more on 1d6. There are 6 total possible results (1, 2, 3, 4, 5, or 6). Of those, 3 of the results (4, 5, or 6) are a success. So your probability is 3 divided by 6, or 0.5, or 50%.

Here’s a slightly more complicated example. You want to roll an even number on 2d6. There are 36 total results (6 for each die, and since neither die is influenced by the other, you multiply 6 results by 6 to get 36). The tricky thing with questions like this is that it’s easy to miscount. For example, there are actually two ways to roll the number 3 on 2d6: 1+2 and 2+1. Those look the same, but the difference is which number appears on the first die and which appears on the second die. If it helps, think of each die as having a different color, so maybe you have a red die and a blue die in this case. And then you can count the ways to roll an even number: 2 (1+1), 4 (1+3), 4 (2+2), 4 (3+1), 6 (1+5), 6 (2+4), 6 (3+3), 6 (4+2), 6 (5+1), 8 (2+6), 8 (3+5), 8 (4+4), 8 (5+3), 8 (6+2), 10 (4+6), 10 (5+5), 10 (6+4), 12 (6+6). It turns out there are exactly 18 ways to do this out of 36, also 0.5 or 50%. Perhaps unexpected, but kind of neat.
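
Here’s that same count as a quick code fragment, using the red die / blue die idea (a sketch of my own, in C++):

// Loop over the red die and the blue die separately, counting the
// combinations where the sum is even.
int evens = 0, total = 0;
for (int red = 1; red <= 6; red++) {
    for (int blue = 1; blue <= 6; blue++) {
        total++;
        if ((red + blue) % 2 == 0) evens++;
    }
}
// evens comes out to 18 and total to 36, so the probability is 18/36 = 50%.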

Monte Carlo Simulations

What if you have too many dice to count this way? For example, say you want to know the odds of getting a combined total of 15 or more on a roll of 8d6. There are a LOT of different individual results of eight dice, so just counting by hand takes too long. Even if we find some tricks to group different sets of rolls together, it still takes a really long time. In this case the easiest way to do it is to stop doing math and start using a computer, and there are two ways to do this.

The first way gets you an exact answer but takes a little bit of programming or scripting. You basically have the computer run through every possibility inside a set of nested loops, counting up the total number of iterations and also the iterations that are a success, and then have it spit out the answers at the end. Your code might look something like this:

int wincount=0, totalcount=0;
for (int i=1; i<=6; i++) {
  for (int j=1; j<=6; j++) {
    for (int k=1; k<=6; k++) {
      … // insert more nested loops here, one for each of the 8 dice
      if (i+j+k+… >= 15) { // add up all eight loop variables
        wincount++;
      }
      totalcount++;
    }
  }
}
// cast before dividing: an int divided by an int would round down to 0
float probability = (float)wincount/totalcount;
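
Since the “…” above still needs to be filled in with more nested loops before this will compile, here’s one complete, runnable version of the same idea (a C++ sketch; using a recursive function instead of eight nested loops is just my own preference):

#include <iostream>

long wincount = 0, totalcount = 0;

// Recursively walk through all 6^8 possible results of 8d6.
void roll(int diceLeft, int sumSoFar) {
    if (diceLeft == 0) {
        totalcount++;
        if (sumSoFar >= 15) wincount++;
        return;
    }
    for (int face = 1; face <= 6; face++)
        roll(diceLeft - 1, sumSoFar + face);
}

int main() {
    roll(8, 0);
    std::cout << (double)wincount / totalcount << std::endl;
    return 0;
}

You should get a number very close to 1, which makes sense: the average total on 8d6 is 28, so rolling at least 15 is the normal case, not the exception.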

If you don’t know programming but you just need a ballpark answer and not an exact one, you can simulate it in Excel by having it roll 8d6 a few thousand times and take the results. To roll 1d6 in Excel, use this formula:

=FLOOR(RAND()*6,1)+1

When you don’t know the answer so you just try it a lot, there’s a name for that: Monte Carlo simulation, and it’s a great thing to fall back on when you’re trying to do probability calculations and you find yourself in over your head. The great thing about this is, we don’t have to know the math behind why it works, and yet we know the answer will be “pretty good” because, as we learned before, the more times you do something, the more it tends towards the average.
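
If you’d rather do the brute-force version in code than in Excel, here’s a minimal Monte Carlo sketch in C++ (the 100,000-trial count is arbitrary, and rand() is a crude random number generator, but both are fine for a ballpark answer like this):

#include <cstdlib>
#include <ctime>
#include <iostream>

int main() {
    std::srand((unsigned)std::time(0));    // seed the random number generator
    const int TRIALS = 100000;
    int wins = 0;
    for (int t = 0; t < TRIALS; t++) {
        int sum = 0;
        for (int die = 0; die < 8; die++)
            sum += std::rand() % 6 + 1;    // roll one d6
        if (sum >= 15) wins++;
    }
    std::cout << (double)wins / TRIALS << std::endl;   // estimated probability
    return 0;
}

Run it a few times; the answers will wobble a little, and they’ll wobble less as you add more trials.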

Combining Independent Trials

If you’re asking for several repeated but independent trials, so the result of one roll doesn’t affect other rolls, we have an extra trick we can use that makes things a little easier.

How do you tell the difference between something that’s dependent and something that’s independent? Basically, if you can isolate each individual die-roll (or series) as a separate event, then it is independent. For example, rolling a total of 15 on 8d6 is not something that can be split up into several independent rolls. Since you’re summing the dice together, what you get on one die affects the required results of the others, because it’s only all of them added together to give you a single result.

Here’s an example of independent rolls: say you have a dice game where you roll a series of d6s. On the first roll, you have to get 2 or higher to stay in the game. On the second roll you have to get 3 or higher. Third roll requires 4 or higher, fourth roll is 5 or higher, and fifth roll requires a 6. If you make all five rolls successfully, you win. Here, the rolls are independent. Yes, if you fail one roll it affects the outcome of the entire dice game, but each individual roll is not influenced by the others; for example, if you roll really well on your second roll, that doesn’t make you any more or less likely to succeed on future rolls. Because of this, we can consider the probability of each roll separately.

When you have separate, independent probabilities and you want to know what is the probability that all of them will happen, you take each of the individual probabilities and multiply them together. Another way to think about this: if you use the word “and” to describe several conditions (as in, “what is the probability that some random event happens and that some other independent random event happens?”), figure out the individual probabilities and multiply.

No matter what you do, do not ever add independent probabilities together. This is a common mistake. To see why it doesn’t work, imagine a 50/50 coin flip, and you’re wondering what is the probability that you’ll get Heads twice in a row. The probability of each is 50%, so if you add those together you’d expect a 100% chance of getting Heads, but we know that’s not true, because you could get Tails twice. If instead you multiply, you get 50%*50% = 25%, which is the correct probability of getting Heads twice.

Example

Let’s go back to our d6 game where you have to roll 2 or higher, then 3 or higher, and so on up to needing a 6 on the fifth roll. In this series of 5 rolls, what are the chances you make all of them?

As we said before, these are independent trials, so we just compute the odds of each roll and then multiply them together. The first roll succeeds 5/6 of the time. The second roll, 4/6. The third, 3/6. The fourth, 2/6, and the fifth roll, 1/6. Multiplying these together, we get 5/6 x 4/6 x 3/6 x 2/6 x 1/6 = 120/7776, or about 1.5%. Winning this game is pretty rare, so you’d want a pretty big jackpot if you were putting it in your game.

Negation

Here’s another useful trick: sometimes it’s hard to compute a probability directly, but it’s much easier to figure out the chance that the thing won’t happen.

Here’s an example: suppose we make another game where you roll 6d6, and you win if you roll at least one 6. What are your chances of winning?

There are a lot of things to compute here. You might roll a single 6, which means one of the dice is showing 6 and the others are all showing 1-5, and there are 6 different ways to choose which die is showing 6. And then you might roll two 6s, or three, or more, and each of those is a separate computation, and it gets out of hand pretty quickly.

However, there’s another way to look at this, by turning it around. You lose if none of the dice show a 6. Here we have six independent trials, each of which has a probability of 5/6 (the die can roll anything except 6). Multiply those together and you get (5/6)^6, which is about 33%. So you have about a 1 in 3 chance of losing.

Turning it around again, that means a 67% (or 2 in 3) chance of winning.

The most obvious lesson here is that to negate a probability, you just subtract it from 100%. If the odds of winning are 67%, the odds of not winning are 100% minus 67%, or 33%. And vice versa. So if you can’t figure out one thing but it’s easy to figure out the opposite, figure out that opposite and then subtract from 100%.

Combining Conditions Within a Single Independent Trial

A little while ago, I said you should never add probabilities together when you’re doing independent trials. Are there any cases where you can add probabilities together? Yes, in one special situation.

When you are trying to find the probability for several non-overlapping success criteria in a single trial, add them together. For example, the probability of rolling a 4, 5 or 6 on 1d6 is equal to the probability of rolling 4 plus the probability of rolling 5 plus the probability of rolling 6. Another way of thinking about this: when you use the word “or” in your probability (as in, “what is the probability that you will get a certain result or a different result from a single random event?”), figure out the individual probabilities and add them together.

One important property here is that when you add up the probabilities of all possible outcomes of a game, the combined probabilities should add up to exactly 100%. If they don’t, you’ve done your math wrong, so this is a good reality check to make sure you didn’t miss anything. For example, if you were analyzing the probability of getting each of the hands in Poker, and you add them all up, you should get exactly 100% (or at least really close – if you’re using a calculator you might get a very slight rounding error, but if you’re doing exact numbers by hand it should be exact). If you don’t, it means there are probably some hands that you haven’t considered, or that you got the probabilities of some hands wrong, so you need to go back and check your figures.

Uneven Probabilities

So far, we’ve been assuming that every single side on a die comes up equally often, because that’s how dice are supposed to work. But occasionally you end up with a situation where there are different outcomes that have different chances of coming up. For example, there’s this spinner in one of the Nuclear War card game expansions that modifies the result of a missile launch: most of the time it just does normal damage plus or minus a few, but occasionally it does double or triple damage, or blows up on the launchpad and damages you, or whatever. Unlike the spinner in Chutes & Ladders or The Game of Life, the Nuclear War spinner has results that are not equally probable. Some results have very large sections where the spinner can land so they happen more often, while other results are tiny slivers that you only land on rarely.

Now, at first glance this is sort of like that 1, 1, 1, 2, 2, 3 die we were talking about earlier, which was sort of like a weighted 1d3, so all we have to do is divide all these sections evenly, find the smallest unit that everything is a multiple of, and then make this into a d522 roll (or whatever) with multiple sides of the die showing the same thing for the more common results. And that’s one way to do it, and that would technically work, but there’s an easier way.

Let’s go back to our original single standard d6 roll. For a normal die, we said to add up all of the sides and then divide by the number of sides, but what are we really doing there? We could say this another way. For a 6-sided die, each side has exactly 1/6 chance of being rolled. So we multiply each side’s result by the probability of that result (1/6 for each side in this case), then add all of these together. Doing this, we get (1*1/6) + (2*1/6) + (3*1/6) + (4*1/6) + (5*1/6) + (6*1/6), which gives us the same result (3.5) as we got before. And really, that’s what we’re doing the whole time: multiplying each outcome by the probability of that outcome.

Can we do this with the Nuclear War spinner? Sure we can. All we have to do is figure out the probability of each result, multiply each result by that probability, and add them all together, and that will give us the average result of a spin.
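
As a sketch of what that looks like in code (the spinner sections and probabilities below are completely made up for the sake of example; I don’t have the actual Nuclear War spinner in front of me):

// Hypothetical spinner: each result is paired with the fraction of the
// spinner's area it covers. The chances must add up to 1.0.
float results[] = { 0, 1, 2, 3 };               // damage multipliers, say
float chances[] = { 0.05, 0.75, 0.15, 0.05 };   // how likely each one is
float average = 0;
for (int i = 0; i < 4; i++)
    average += results[i] * chances[i];         // weight each result
// average now holds the expected result of a single spin.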

Another Example

This technique of computing expected value by multiplying each result by its individual probability also works if the results are equally probable but weighted differently, like if you’re rolling dice but you win more on some rolls than others. As an example, here’s a game you might be able to find in some casinos: you place a wager, and roll 2d6. If you roll the lowest three numbers (2, 3 or 4) or the highest four numbers (9, 10, 11 or 12), you win an amount equal to your wager. The extreme ends are special: if you roll 2 or 12, you win double your wager. If you roll anything else (5, 6, 7 or 8), you lose your wager. This is a pretty simple game. But what is the chance of winning?

We can start by figuring out how many times you win:

  • There are 36 ways to roll 2d6, total. How many of these are winning rolls?
  • There’s 1 way to roll two, and 1 way to roll twelve.
  • There are 2 ways to roll three, and 2 ways to roll eleven.
  • There are 3 ways to roll four, and 3 more ways to roll ten.
  • There are 4 ways to roll nine.
  • Adding these all up, there are 16 winning rolls out of 36.

So, under normal conditions, you win 16 times out of 36… about 44%, a bit less than half the time.

Ah, but two of those times, you win twice as much, so that’s like winning twice! So if you play this game 36 times with a wager of $1 each time, and roll each possible roll exactly once, you’ll win $18 total (you actually win 16 times, but two of those times it counts as two wins). Since you play 36 times and win $18, does that mean these are actually even odds?

Not so fast. If you count up the number of times you lose, there are 20 ways to lose, not 18. So if you play 36 times for $1 each, you’ll win a total of $18 from the times when you win… but you’ll lose $20 from the twenty times you lose! As a result, you come out very slightly behind: you lose $2 net, on average, for every 36 plays (you could also say that on average, you lose 1/18 of a dollar per play). You can see how easy it is to make one misstep and get the wrong probability here!
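
Here’s the same count as a quick code check, in case you’d like to verify it (a sketch in the style of the earlier loop example):

// Play each of the 36 possible rolls exactly once, betting $1 each time,
// and track the net dollars won or lost.
int net = 0;
for (int first = 1; first <= 6; first++) {
    for (int second = 1; second <= 6; second++) {
        int total = first + second;
        if (total == 2 || total == 12) net += 2;       // double win
        else if (total <= 4 || total >= 9) net += 1;   // normal win
        else net -= 1;                                 // 5 through 8: loss
    }
}
// net comes out to -2: down $2 per 36 plays, or 1/18 of a dollar per play.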

Permutations

So far, all of our die rolls assume that order doesn’t matter. Rolling a 2+4 is the same as rolling a 4+2. In most cases, we just manually count the number of different ways to do something, but sometimes that’s impractical and we’d like a math formula.

Here’s an example problem from a dice game called Farkle. You start each round by rolling 6d6. If you’re lucky enough to roll one of each result, 1-2-3-4-5-6 (a “straight”), you get a huge score bonus. What’s the probability that will happen? There are a lot of different ways to have one of each!

The answer is to look at it this way: one of the dice (and only one) has to be showing 1. How many ways are there to do that? Six – there are 6 dice, and any of them can show the 1. Choose that and put it aside. Now, one of the remaining dice has to show 2. There’s five ways to do that. Choose it and put it aside. Continuing along these lines, four remaining dice can show 3, three dice of the remaining ones after that can show 4, two of the remaining dice after that can show 5, and at the end you’re left with a single die that must show 6 (no choice involved in that last one). To figure out how many ways there are to roll a straight, we multiply all the different, independent choices: 6x5x4x3x2x1 = 720 – that seems like a lot of ways to roll a straight.

To get the probability of rolling a straight, we have to divide 720 by the number of ways to roll 6d6, total. How many ways can we do that? Each die can show 6 sides, so we multiply 6x6x6x6x6x6 = 46656 (a much larger number!). Dividing 720/46656 gives us a probability of about 1.5%. If you were designing this game, that’s good to know so you can design the scoring system accordingly. We can see why Farkle gives you such a high score bonus for rolling a straight; it only happens very rarely!

This result is interesting for another reason. It shows just how infrequently we actually roll exactly according to probability in the short term. Sure, if we rolled a few thousand dice, we would see about as many of each of the six numbers in our rolls. But rolling just six dice, we almost never roll exactly one of each! We can see from this another reason why expecting dice to roll what hasn’t been rolled yet (“we haven’t rolled a 6 in a while, so we’re about due”) is a fool’s game.

Dude, Your Random Number Generator Is Broken…

This brings us to a common misunderstanding of probability: the assumption that everything is split evenly in the short term, which it isn’t. In a small series of die-rolls, we expect there to be some unevenness.

If you’ve ever worked on an online game with some kind of random-number generator before, you’ve probably heard this one: a player writes tech support to tell you that your random number generator is clearly broken and not random, and they know this because they just killed 4 monsters in a row and got 4 of the exact same drop, and those drops are only supposed to happen 10% of the time, so this should almost never happen, so clearly your die-roller is busted.

You do the math. 1/10 * 1/10 * 1/10 * 1/10 is 1 in 10,000, which is pretty infrequent. This is what the player is trying to tell you. Is there a problem?

It depends. How many players are on your server? Let’s say you’ve got a reasonably popular game, and you get 100,000 daily players. How many of those kill four monsters in a row? Maybe all of them, multiple times per day, but let’s be conservative and say that half of them are just there to trade stuff in the auction house or chat on the RP servers or whatever, so only half of them actually go out monster-hunting. What’s the chance this will happen to someone? On a scale like that, you’d expect it to happen several times a day, at least!
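
To put some rough numbers on it: suppose each of those 50,000 monster-hunters kills just 20 monsters in a day. Each player then has 17 overlapping windows of four consecutive kills, which makes 850,000 chances across the whole server for a 1-in-10,000 event, so you’d expect it to happen around 85 times every single day. Those are back-of-the-envelope numbers (maybe your players kill more monsters than that, maybe fewer), but the point stands: at that scale, “almost never” happens all the time.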

Incidentally, this is why it seems like every few weeks, at least, someone wins the lottery, even though that someone is never you or anyone you know. If enough people play each week, the odds are you’ll have at least one obnoxiously lucky sap somewhere… but if you play the lottery yourself, you’ve got worse odds of winning than your odds of being hired at Infinity Ward.

Cards and Dependence

Now that we’ve talked about independent events like die-rolling, we have a lot of powerful tools to analyze the randomness of many games. Things get a little more complicated when we talk about drawing cards from a deck, because each card you draw influences what’s left in the deck. If you have a standard 52-card deck and draw, say, the 10 of Hearts, and you want to know the probability that the next card is also a heart, the odds have changed because you’ve already removed a heart from the deck. Each card that you remove changes the probability of the next card in the deck. Since each card draw is influenced by the card draws that came before, we call this dependent probability.

Note that when I say “cards” here, I am talking about any game mechanic where you have a set of objects and you draw one of them without replacing it, so in this case “deck of cards” is mechanically equivalent to a bag of tiles where you draw a tile and don’t replace it, or an urn where you’re drawing colored balls (I’ve never actually seen a game that involves drawing balls from an urn, but probability professors seem to have a love of them for some reason).

Properties of Dependence

Just to be clear, with cards I’m assuming that you are drawing cards, looking at them, and removing them from the deck. Each of these is an important property.

If I had a deck with, say, six cards numbered 1 through 6, and I shuffled and drew a card and then reshuffled all six cards between draws, that would be equivalent to a d6 die roll; no result influences the future ones. It’s only if I draw cards and don’t replace them that pulling a 1 on my first draw makes it more likely I’ll draw a 6 next time (and it will get more and more likely until I finally draw it, or until I reshuffle).

The fact that we are looking at the cards is also important. If I pull a card from the deck but don’t look at it, I have no additional information, so the probabilities haven’t really changed. This is something that may sound counterintuitive; how does just flipping a card over magically change the probabilities? But it does, because you can only calculate the probability of unknown stuff based on what you do know. So, for example, if you shuffle a standard deck, reveal 51 cards and none of them is the Queen of Clubs, you know with 100% certainty that this is what the missing card is. If instead you shuffle a standard deck, and take 51 cards away without revealing them, the probability that the last card is the Queen of Clubs is still 1/52. For each additional card you reveal, you get more information.

Calculating probabilities for dependent events follows the same principles as independent ones, except it’s a little trickier because the probabilities change whenever you reveal a card. So rather than multiplying the same number by itself over and over, you end up multiplying a lot of different numbers together. Really, all this means is we have to put together everything we’ve done already, in combination.

Example

You shuffle a standard 52-card deck, and draw two cards. What’s the probability that you’ve drawn a pair? There are a few ways to compute this, but probably the easiest is to say this: what’s the probability that the first card you draw makes you totally ineligible to draw a pair? Zero, so the first card doesn’t really matter as long as the second card matches it. No matter what we draw for our first card, we’re still in the running to draw a pair, so we have a 100% chance that we can still get a pair after drawing the first card.

What’s the probability that the second card matches? There are 51 cards remaining in the deck, and 3 of them match (normally it’d be 4 out of 52, but you already removed a “matching” card on your first draw!) so the probability ends up being exactly 1/17. (So, the next time that guy sitting across the table from you in Texas Hold ‘Em says “wow, another pocket pair? Must be my lucky day” you know there’s a pretty good chance he’s bluffing.)

What if we add two jokers so it’s now a 54-card deck, and we still want to know the chance of drawing a pair? Occasionally your first card will be a joker, and then there will only be one matching card in the rest of the deck rather than 3. How do we figure this out? By splitting up the possibilities into separate cases and computing the probability of each.

Your first card is either going to be a Joker, or Something Else. Probability of a Joker is 2/54, probability of Something Else is 52/54.

If the first card is a Joker (2/54), then the probability of a match on the second card is 1/53. Multiplying these together (we can do that since we want both things to happen, and the 1/53 already accounts for the first card being gone from the deck), we have 1/1431 – less than a tenth of a percent.

If the first card is Something Else (52/54), the probability of a match on the second card is up to 3/53. Multiplying these together, we have 78/1431 (a little under 5.5%).

What do we do with these two results? Since they do not overlap, and we want to know the probability of either of them, we add! 79/1431 (still around 5.5%) is the final answer.

If we really wanted to be careful, we could calculate the probability of all other possible results: drawing a Joker and not matching, or drawing Something Else and not matching, and adding those together with the probability of winning, and we should get exactly 100%. I won’t do the math here for you, but feel free to do it yourself to confirm.
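
If you’d rather check by brute force than by algebra, here’s a quick C++ sketch that counts every ordered two-card draw (the card-numbering scheme is just my own convenience: cards 0 through 51 are the standard deck, with rank = card/4, and cards 52 and 53 are the jokers, which conveniently also share a “rank” under the same division):

int matches = 0, total = 0;
for (int first = 0; first < 54; first++) {
    for (int second = 0; second < 54; second++) {
        if (first == second) continue;   // can't draw the same card twice
        total++;
        if (first / 4 == second / 4) matches++;   // same rank = a pair
    }
}
// matches/total works out to 158/2862, which reduces to exactly 79/1431.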

The Monty Hall Problem

This brings us to a pretty famous problem that tends to really confuse people, called the Monty Hall problem. It’s called that because there used to be this game show called Let’s Make a Deal, with your host, Monty Hall. If you’ve never seen the show, it was sort of like the inverse of The Price Is Right. In The Price Is Right, the host (used to be Bob Barker, now it’s… Drew Carey? Anyway…) is your friend. He wants to give away cash and fabulous prizes. He tries to give you every opportunity to win, as long as you’re good at guessing how much the sponsored items actually cost.

Monty Hall wasn’t like that. He was like Bob Barker’s evil twin. His goal was to make you look like an idiot on national television. If you were on the show, he was the enemy, you were playing a game against him, and the odds were stacked in his favor. Maybe I’m being overly harsh, but when your chance of being selected as a contestant seems directly proportional to whether you’re wearing a ridiculous-looking costume, I tend to draw these kinds of conclusions.

Anyway, one of the biggest memes from the show was that you’d be given a choice of three doors, and they would actually call them Door Number 1, Door Number 2 and Door Number 3. They’d give you a door of your choice… for free! Behind one door, you’re told, is a fabulous prize like a Brand New Car. Behind the other doors, there’s no prize at all, no nothing, those other two doors are worthless. Except the goal is to humiliate you, so they wouldn’t just have an empty door, they’d have something silly-looking back there like a goat, or a giant tube of toothpaste, or something… something that was clearly not a Brand New Car.

So, you’d pick your door, and Monty would get ready to reveal if you won or not… but wait, before we do that, let’s look at one of the other doors that you didn’t choose. Since Monty knows where the prize is, and there’s only one prize and two doors you didn’t choose, no matter what he can always reveal a door without a prize. Oh, you chose Door Number 3? Well, let’s reveal Door Number 1 to show you that there was no prize there. And now, being the generous guy he is, he gives you the chance to trade your Door Number 3 for whatever’s behind Door Number 2 instead. And here’s where we get into probability: does switching doors increase your chance of winning, or decrease it, or is it the same? What do you think?

The real answer is that switching increases your chance of winning from 1/3 to 2/3. This is counterintuitive. If you haven’t seen this problem before, you’re probably thinking: wait, just by revealing a door we’ve magically changed the odds? But as we saw with our card example earlier, that is exactly what revealed information does. Your odds of winning with your first pick are obviously 1/3, and I think everyone here would agree to that. When that new door is revealed, it doesn’t change the odds of your first pick at all – it’s still 1/3 – but that means the other door now has a 2/3 chance of being the right one.

Let’s look at it another way. You choose a door. Chance of winning: 1/3. I offer to swap you for both of the other doors, which is basically what Monty Hall is doing. Sure, he reveals one of them to not be a prize, but he can always do that, so that doesn’t really change anything. Of course you’d want to switch!

If you’re still wondering about this and need more convincing, clicking here will take you to a wonderful little Flash app that lets you explore this problem. You can actually play, starting with something like 10 doors and eventually working your way down to 3; there’s also a simulator where you can give it any number of doors from 3 to 50 and either play on your own, or have it run a few thousand simulations and show you how many times you would have won if you stayed versus if you switched.
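
And if you’d rather roll your own than trust a Flash app, the simulation only takes a few lines; here’s a minimal C++ sketch. The one observation doing all the work: since Monty always opens a no-prize door you didn’t pick, staying wins exactly when your first pick was right, and switching wins exactly when it was wrong.

#include <cstdlib>
#include <ctime>
#include <iostream>

int main() {
    std::srand((unsigned)std::time(0));
    const int GAMES = 100000;
    int stayWins = 0, switchWins = 0;
    for (int g = 0; g < GAMES; g++) {
        int prize = std::rand() % 3;   // the door hiding the car
        int pick  = std::rand() % 3;   // the door you chose
        if (pick == prize) stayWins++; // staying would have won
        else switchWins++;             // switching would have won
    }
    std::cout << "stay: " << (double)stayWins / GAMES
              << ", switch: " << (double)switchWins / GAMES << std::endl;
    return 0;   // expect roughly 1/3 for staying and 2/3 for switching
}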

Monty Hall, Redux

Now, in practice on the actual show, Monty Hall knew this, because he was good at math even if his contestants weren’t. So here’s what he’d do to change the game a little. If you picked the door with the prize behind it, which does happen 1/3 of the time, he’d always offer you the chance to switch. After all, if you’ve got a car and then you give it away for a goat, you’re going to look pretty dumb, which is exactly what he wants, because that’s the kind of evil guy he is. But if you pick a door with no prize behind it, he’ll only offer you the chance to switch about half of those times, and the other half he’ll just show you your Brand New Goat and boot you off the stage. Let’s analyze this new game, where Monty can choose whether or not to give you the chance to switch.

Suppose he follows this algorithm: always let you switch if you picked the door with the car, otherwise he has a 50/50 chance of giving you your goat or giving you the chance to switch. Now what are your chances of winning?

1/3 of the time, you pick the prize right away and he offers you to switch.

Of the remaining 2/3 of the time (you pick wrong initially), half of the time he’ll offer to switch, half the time he won’t. Half of 2/3 is 1/3, so basically 1/3 of the time you get your goat and leave, 1/3 of the time you picked wrong and he offers the switch, and 1/3 of the time you picked right and he offers the switch.

If he offers an exchange, we already know that the 1/3 of the time when he gives you your goat and you leave didn’t happen. That is useful information, because it means our chance of winning has now changed. Of the 2/3 of the time where we’re given a choice, 1/3 means we guessed right, and the other 1/3 means we guessed wrong, so if we’re given a choice at all it means our probability of winning is now 50/50, and there’s no mathematical advantage to keeping or switching.

Like in Poker, this is no longer a game of math but a game of psychology. Did Monty offer you a choice because he thinks you’re a sucker who doesn’t know that switching is the “right” choice, and that you’ll stubbornly hold onto the door you picked because psychologically it’s worse to have a car and then lose it? Or does he think you’re smart and that you’ll switch, and he’s offering you the chance because he knows you guessed right at the beginning and you’ll take the bait and fall into his trap? Or maybe he’s being uncharacteristically nice, and goading you into doing something in your own best interest, because he hasn’t given away a car in a while and his producers are telling him the audience is getting bored and he’d better give away a big prize soon so their ratings don’t drop?

In this way, Monty manages to offer a choice (sometimes) while still keeping the overall probability of winning at 1/3. Remember, a third of the time you’ll just lose outright. A third of the time you’ll guess right initially, and 50% of that time you’ll win (1/3 x 1/2 = 1/6). And a third of the time, you’ll guess wrong initially but be given the choice to switch, and 50% of that time you’ll win (also 1/6). Add the two non-overlapping win states together and you get 1/3, so whether you switch or stay, your overall odds are 1/3 throughout the whole game… no better than if you just guessed and he showed you the door, without any of this switching business at all! So the offer to switch doors isn’t there to change the odds; it’s there because drawing out the decision makes for more exciting television viewing.

Incidentally, this is one of the reasons Poker can be so interesting: most of the formats involve slowly revealing cards in between rounds of betting (like the Flop, Turn and River in Texas Hold ‘Em), so you start off with a certain probability of winning, and that probability changes with each betting round as more cards are revealed.

The Sibling Problem

And that brings us to another famous problem that tends to throw people, the Sibling problem. This is about the only thing I’m writing about today that isn’t directly related to games (although I guess that just means I should challenge you to come up with a game mechanic that uses this). It’s more of a brain teaser, but a fun one, and in order to solve it you really have to be able to understand the kind of conditional probability we’ve been talking about.

The question is this: I have a friend with two kids, and at least one of them is a girl. What is the probability that the other one is also a girl? Assume that in the normal human population, there’s a 50/50 chance of having a boy or a girl, and assume that this is universally true for any child. (In reality, some men do produce more X or Y sperm, which would skew the odds a bit: if you know one of their kids is already a girl, the odds are slightly higher that they’ll have more girls. And then there are conditions like hermaphroditism. But for our purposes, let’s ignore all that and assume that each kid is an independent trial with an equal chance of being male or female.)

Intuitively, since we’re dealing with a core 1/2 chance, we would expect the answer to be something like 1/2 or 1/4, or some other nice, round number with a power of 2 in the denominator. The actual answer is 1/3. Wait, what?

The trick here is that the information we were given narrows down the possibilities. Let’s say the parents are Sesame Street fans and so no matter what the sex, they name their kids A and B. Under normal conditions, there are four possibilities that are equally likely: A and B are both boys, A and B are both girls, A is boy and B is girl, or A is girl and B is boy. Since we know at least one of them is a girl, we can eliminate the possibility that A and B are both boys, so we have three (still equally likely) scenarios remaining. Since they’re equally likely and there are three of them, we know each one has a probability of 1/3. Only one of those three scenarios involves two girls, so the answer is 1/3.

The Sibling Problem, Redux

It gets weirder. Suppose instead I tell you my friend has two children, and one is a girl who was born on a Tuesday. Assume that under normal conditions, a child is equally likely to be born on any of the seven days of the week. What’s the probability the other child is also a girl? You’d think the answer would still be 1/3; what does Tuesday have to do with anything? But again, intuition fails. The actual answer is 13/27, which isn’t just unintuitive, it’s plain old weird-looking. What’s going on here?

Tuesday actually changes the odds, again because we don’t know which child it was, or if both children were born on Tuesday. By the same logic as earlier, we count all valid combinations of children where at least one is a Tuesday girl. Again assuming the children are named A and B, the combinations are:

  • A is a Tuesday girl, B is a boy (there are 7 possibilities here, one for each day of the week that B could be born on).
  • B is a Tuesday girl, A is a boy (again, 7 possibilities).
  • A is a Tuesday girl, B is a girl born on a different day of the week (6 possibilities).
  • B is a Tuesday girl, A is a non-Tuesday girl (again, 6 possibilities).
  • A and B are both girls born on Tuesday (1 possibility, but we have to take care not to double-count this).

Adding it up, there are 27 different, equally likely combinations of children and days with at least one Tuesday girl. Of those, 13 possibilities involve two girls. Again, this is totally counterintuitive, and apparently designed for no other reason than to make your brain hurt. If you’re still scratching your head, ludologist Jesper Juul has a nice explanation of this problem on his website.
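
If your brain still hurts, you can at least verify the count by brute force. Here’s a quick C++ sketch that enumerates all 196 equally likely (sex, weekday) combinations for two children (using day 2 to stand for Tuesday, which is arbitrary):

int tuesdayGirlCombos = 0, bothGirlCombos = 0;
for (int sexA = 0; sexA < 2; sexA++)        // 0 = boy, 1 = girl
for (int dayA = 0; dayA < 7; dayA++)
for (int sexB = 0; sexB < 2; sexB++)
for (int dayB = 0; dayB < 7; dayB++) {
    bool hasTuesdayGirl = (sexA == 1 && dayA == 2) ||
                          (sexB == 1 && dayB == 2);
    if (!hasTuesdayGirl) continue;          // doesn't match what we know
    tuesdayGirlCombos++;
    if (sexA == 1 && sexB == 1) bothGirlCombos++;
}
// tuesdayGirlCombos ends up as 27 and bothGirlCombos as 13: that's 13/27.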

If You’re Working on a Game Now…

If a game you’re designing has any randomness, this is a great excuse to analyze it. Choose a random element you want to analyze. For that element, first ask yourself what kind of probability you’re expecting to see, what makes sense to you in the context of the game. For example, if you’re making an RPG and looking at the probability that the player will hit a monster in combat, ask yourself what to-hit percentage feels right to you. Usually in console RPGs, misses by the player are very frustrating, so you wouldn’t usually want them to miss a lot… maybe 10% of the time or less? If you’re an RPG designer you probably know better than I, but you should have some basic idea of what feels right.

Then, ask yourself if this is something that’s dependent (like cards) or independent (like dice). Break down all possible results, and the probabilities of each. Make sure your probabilities sum to 100%. And lastly, of course, compare the actual numbers to the numbers you were expecting. Is this particular random die-roll or card-draw acting how you want it to, or do you see signs that you need to adjust the numbers? And of course, if you do find something to adjust, you can use these same calculations to figure out exactly how much to adjust it!

Homework

Your “homework” this week is meant to help you practice your probability skills. I have two dice games and a card game for you to analyze using probability, and then a weird mechanic from a game I once worked on that provides a chance to try out a Monte Carlo simulation.

Game #1: Dragon Die

This is a dice game that I invented with some co-workers one day (thanks Jeb Havens and Jesse King!) specifically to mess with people’s heads on probability. It’s a simple casino game called Dragon Die, and it’s a dice gambling contest between you and the House. You are given a standard 1d6, and you roll it. You’re trying to roll higher than the House. The House is given a non-standard 1d6 – it’s similar to yours, but instead of a 1 it has a Dragon on it (so the House die is Dragon-2-3-4-5-6). If the House rolls a Dragon, then the House automatically wins and you automatically lose. If you both roll the same number, it’s a push, and you both re-roll. Otherwise, the winner is whoever rolls highest.

Obviously, the odds are slightly against the player here, because the House has this Dragon advantage. But how much of an advantage is it? You’re going to calculate it. But first, before you do, exercise your intuition. Suppose I said this game was offered with a 2 to 1 payout. That is, if you win, you keep your bet and get twice your bet in winnings. So, if you bet $1 and win, you keep your $1 and get $2 extra, for a total of $3. If you lose, you just lose your standard bet. Would you play? That is, intuitively, do you think the odds are better or worse than 2 to 1? Said another way, for every 3 games you play, do you expect to win more than once, or less than once, or exactly once, on average?

Once you’ve used your intuition, do the math. There are only 36 possibilities for both dice, so you should have no problem counting them all up. If you’re not sure about this “2 to 1” business, think of it this way: suppose you played the game 36 times (wagering $1 each time). A win puts you $2 up, a loss puts you $1 down, and a push is no change. Count up your total winnings and losses and figure out if you come out ahead or behind. And then ask yourself how close your intuition was. And then realize how evil I am.

And yes, if you’re wondering, the actual dice-roll mechanics here are something I’m intentionally obfuscating, but I’m sure you’ll all see through that once you sit down and look at it. Try and solve it yourself. I’ll post all answers here next week.

Game #2: Chuck-a-Luck

There is a gambling dice game called Chuck-a-Luck (also known as Birdcage, because sometimes instead of rolling dice they’re placed in a wire cage that somewhat resembles a Bingo cage). This is a simple game that works like this: place your bet (say, $1) on any number from 1 to 6. You then roll 3d6. For each die that your number shows up on, you get $1 in winnings (and you get to keep your original bet). If no dice show your number, the house takes your $1 and you get nothing. So, if you place on 1 and you roll triple 1s, you actually win $3.

Intuitively, it seems like this is an even-odds game. Each die is individually a 1/6 chance of winning, so adding all three gives you a 3/6 chance of winning. But of course, if you calculate it that way, you’re adding the probabilities of three separate die-rolls, and remember, you’re only allowed to add when you’re talking about separate win conditions within a single roll. You need to be multiplying something.

When you count out all possible results (you’ll probably find it easier to do this in Excel than by hand since there are 216 results), it still looks at first like an even-odds game. But in reality, the odds of winning are actually slightly in favor of the House; how much? In particular, on average, how much money do you expect to lose each time you play this game? All you have to do is add up the gains and losses for all 216 results, then divide by 216, so this should be simple… but as you’ll see, there are a few traps you can fall into, which is why I’m telling you right now that if you think it’s even-odds, you’ve got it wrong.

Game #3: 5-Card Stud Poker

When you’ve warmed up with the previous two exercises, let’s try our hand at dependent probability by looking at a card game. In particular, let’s assume Poker with a 52-card deck. Let’s also assume a variant like 5-card Stud where each player is dealt 5 cards, and that’s all they get. No ability to discard and draw, no common cards, just a straight-up you get 5 cards and that’s what you get.

A “Royal Flush” is the 10-J-Q-K-A in the same suit, and there are four suits, so there are four ways to get a Royal Flush. Calculate the probability that you’ll get one.

One thing I’ll warn you about here: remember that you can draw those five cards in any order. So you might draw an Ace first, or a Ten, or whatever. So the way you’ll actually be counting these, there are a lot more than 4 ways to get dealt a Royal Flush, if you consider the cards to be dealt sequentially!

Game #4: IMF Lottery

This fourth question is one that can’t easily be solved through the methods we’ve talked about today, but you can simulate it pretty easily, either with programming or with some fudging around in Excel. So this is a way to practice your Monte Carlo technique.

In a game I worked on that I’ve mentioned before called Chron X, there was this really interesting card called IMF Lottery. Here’s how it worked: you’d put it into play. At the end of your turn, the game would roll a percentile, and there was a 10% chance it would leave play, and a random player would gain 5 of each resource type for every token on the card. The card didn’t start with any tokens, but if it stayed around then at the start of each of your turns, it gained a token. So, there is a 10% chance you’ll put it into play, end your turn, and it’ll leave and no one gets anything. If that doesn’t happen (90% chance), then there is a further 10% chance (actually 9% at this point, since it’s 10% of 90%) that on the very next turn, it’ll leave play and someone will get 5 resources. If it leaves play on the turn after that (10% of the remaining 81%, so 8.1% chance) someone gets 10 resources, then the next turn it would be 15, then 20, and so on. The question is, what is the expected value of the number of total resources you’ll get from this card when it finally leaves play?

Normally, we’d approach this by finding the probability of each outcome, and multiplying by the outcome. So there is a 10% chance you get 0 (0.1*0 = 0). There’s a 9% chance you get 5 resources (that’s 9%*5 = 0.45 resources). There’s an 8.1% chance you get 10 resources (8.1%*10 = 0.81 resources total, expected value). And so on. And then we add all of these up.

Now, you can quickly see a problem: there is always going to be a chance that the card will not leave play, so it could conceivably stay in play forever, for an infinite number of turns, and there’s no actual way to write out every single possibility. The techniques we learned today don’t give us a way to deal with infinite recursion, so we’ll have to fake it.

If you know enough programming or scripting to feel comfortable doing this, write a program to simulate this card. You should have a while loop that initializes a variable to zero, rolls a random number, and 10% of the time it exits the loop. Otherwise it adds 5 to the variable, and iterates. When it finally breaks out of the loop, have it increment the total number of trials by 1, and the total number of resources by whatever the variable ended up as. Then, re-initialize the variable and try again. Run this a few thousand times. At the end, divide total number of resources by total number of trials, and that’s your Monte Carlo expected value. Run the program a few times to see if the numbers you’re getting are about the same; if there’s still a lot of variation in your final numbers, increase the number of iterations in the outer loop until you start getting some consistency. And you can be pretty sure that whatever you come up with is going to be about right.
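
For reference, here’s a minimal C++ sketch of the loop described above (the trial count is arbitrary; crank it up if your results vary too much between runs):

#include <cstdlib>
#include <ctime>
#include <iostream>

int main() {
    std::srand((unsigned)std::time(0));
    const long TRIALS = 100000;
    double totalResources = 0;
    for (long t = 0; t < TRIALS; t++) {
        int resources = 0;
        while (std::rand() % 100 >= 10)    // 90% of the time the card stays...
            resources += 5;                // ...and is worth 5 more next turn
        totalResources += resources;       // the other 10%: card leaves play
    }
    std::cout << totalResources / TRIALS << std::endl;   // Monte Carlo answer
    return 0;
}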

If you don’t know programming (or even if you do), this is an excuse to exercise your Excel skills. You can never have enough Excel skills as a game designer.

Here you’ll want to make good use of the IF and RAND functions. RAND takes no values; it just returns a random decimal number between 0 and 1. Usually we combine it with FLOOR and some plusses or minuses to simulate a die roll, as I mentioned earlier. In this case, though, we just have a 10% check for the card leaving play, so we can just check if RAND is less than 0.1 and not mess with that other stuff.

IF takes in three values. In order: a condition that’s either true or false, and then a value to return if it’s true, and then a value to return if it’s false. So the following statement will return 5 ten percent of the time, and 0 the other ninety percent of the time:

=IF(RAND()<0.1,5,0)

There are a lot of ways to set this up, but if I were doing it, I’d use a formula like this for the cell that represents the first turn, let’s say this is cell A1:

=IF(RAND()<0.1,0,-1)

Here I’m using negative one as shorthand for “this card hasn’t left play and given out any resources yet.” So if the first turn ended and the card left play right away, A1 would be zero; otherwise it’s -1.

For the next cell, representing the second turn:

=IF(A1>-1, A1, IF(RAND()<0.1,5,-1))

So if the first turn ended and the card left play right away, A1 would be 0 (number of resources), and this cell would just copy that value. Otherwise A1 would be -1 (hasn’t left play yet), and this cell proceeds to roll randomly again: 10% of the time it returns 5 (for the 5 resources), and the rest of the time it is still -1. Continuing this formula for additional cells simulates additional turns, with one caveat: the payout has to grow by 5 each turn the card stays in play, so the cell for the third turn would return 10 instead of 5, the fourth would return 15, and so on. Whatever cell is at the end gives you a final result (or -1 if the card never left play after all of the turns you’re simulating).

Take this row of cells, which represents a single play of the card, and copy and paste it for a few hundred (or a few thousand) rows. We might not be able to do an infinite test in Excel (there are only so many cells that fit in a spreadsheet), but we can at least cover the majority of cases. Then, have a single cell where you take the average of the final result of every row (Excel helpfully provides the AVERAGE() function for this).

In Windows, at least, you can hit F9 to reroll all your random numbers. As before, do that a few times and see if the values you get are similar to each other. If there’s too much variety, double the number of trials and try again.

Unsolved Problems

If you happen to have a Ph.D. in Probability already and the problems above are too easy for you, here are two problems that I’ve wondered about for years, but I don’t have the math skills to solve them. If you happen to know how to do these, post as a comment; I’d love to know how.

Unsolved #1: IMF Lottery

The first unsolved problem is the previous homework. I can do a Monte Carlo simulation (either in C++ or Excel) pretty easily and be confident of the answer for how many resources you get, but I don’t actually know how to come up with a definitive, provable answer mathematically (since this is an infinite series). If you know how, post your math… after doing your own Monte Carlo simulation to verify the answer, of course.

Unsolved #2: Streaks of Face Cards

This problem (and again, this is way beyond the scope of this blog post) is one a fellow gamer posed to me over 10 years ago. They witnessed a curious thing while playing Blackjack in Vegas: out of an eight-deck shoe, they saw ten face cards in a row (a face card is 10, J, Q or K, so there are 16 of them in a standard 52-card deck, which means there are 128 of them in a 416-card shoe). What is the probability that there is at least one run of ten or more face cards, somewhere, in an eight-deck shoe? Assume a random, fair shuffle. (Or, if you prefer, what are the odds that there are no runs of ten or more face cards, anywhere in the sequence?)

You can simplify this. There’s a string of 416 bits. Each bit is 0 or 1. There are 128 ones and 288 zeros scattered randomly throughout. How many ways are there to randomly interleave 128 ones and 288 zeros, and how many of those ways involve at least one clump of ten or more 1s?

Every time I’ve sat down to solve this problem, it seems like it should be really easy and obvious at first, but then once I get into the details it suddenly falls apart and becomes impossible. So before you spout out a solution, really sit down to think about it and examine it, work out the actual numbers yourself, because every person I’ve ever talked to about this (and this includes a few grad students in the field) has had that same “it’s obvious… no, wait, it’s not” reaction. This is a case where I just don’t have a technique for counting all of the numbers. I could certainly brute-force it with a computer algorithm, but it’s the mathematical technique that I’d find more interesting to know.


Level 3: Transitive Mechanics and Cost Curves

July 21, 2010

Readings/Playings

None for this week, other than this post. This is a pretty long post, though, so it should be enough. As with last week, you’ll be doing a bit of outside research to compensate.

This Week’s Topic

This week is one of the most exciting for me, because we really get to dive deep into nuts-and-bolts game balance in a very tangible way. We’ll be talking about something that I’ve been doing for the past ten years, although until now I’ve never really written down this process or tried to communicate it to anyone. I’m going to talk about how to balance transitive mechanics within games.

As a reminder, an intransitive mechanic is like Rock-Paper-Scissors: everything is better than something else, and there is no single “best” move. In transitive games, some things are just flat-out better than others in terms of their in-game effects, and we balance that by giving them different costs, so that the better things cost more and the weaker things cost less. How do we know how much to cost things? That is a big problem, and that is what we’ll be discussing this week.

Examples of Transitive Mechanics

Just to contextualize this, what kinds of games do we see that have transitive mechanics? The answer is, most of them. Here are some examples:

  • RPGs often have currency costs to upgrade your equipment and buy consumable items. Leveling is also transitive: a higher-level character is better than a lower-level character in nearly every RPG I can think of.
  • Shooters with progression mechanics, like BioShock and Borderlands, do something similar. In BioShock, for example, there are costs to using vending machines to get consumable items, and you also spend ADAM to buy new special abilities; higher-level abilities are just better (e.g. doing more damage) than their lower-level counterparts, but they cost more to buy.
  • Professional sports in the real world do this with monetary costs: a player who is better at the game commands a higher salary.
  • Sim games (like The Sims and Sim City) have costs for the various objects you can buy, and often these are transitive. A really good bed in The Sims costs more than a cheap bed, but it also performs its function of restoring your needs much more effectively.
  • Retro arcade games generally have a transitive scoring mechanism. The more dangerous or difficult an enemy, the more points you get for defeating it.
  • Turn-based and real-time strategy games may have a combination of transitive and intransitive mechanics. Some unit types might be strong or weak against others inherently (in an intransitive fashion), like the typical “footmen beat archers, archers beat fliers, fliers beat footmen” comparison. However, you also often see a class of units that all behave similarly, but with stronger and more expensive versions of weaker ones… such as light infantry vs. heavy infantry.
  • Tower Defense games are often intransitive in that certain tower types are strong against certain kinds of attacks, like splash damage is strong against enemies that come clustered together (intransitive), but in most of these games the individual towers are upgradeable to stronger versions of themselves and the stronger versions cost more (transitive).
  • Collectible-card games are another example where there may be intransitive mechanics (and there are almost always intransitive elements to the metagame, thank goodness – that is, a better deck isn’t just “more expensive”), but the individual cards themselves generally have some kind of cost and they are all balanced according to that cost, so that more expensive cards are more useful or powerful.

You might notice something in common with most of these examples: in nearly all cases, there is some kind of resource that is used to buy stuff: Gil in Final Fantasy, Mana in Magic: the Gathering, ADAM in BioShock. Last week we talked about relating everything to a single resource in order to balance different game objects against each other, and as you might expect, this is an extension of that concept.

However, another thing we said last week is that a central resource should be the win or loss condition for the game, and we see that is no longer the case here (the loss condition for an RPG is usually running out of Hit Points, not running out of Gold Pieces). In games that deal with costs, it is common to make the central resource something artificially created for that purpose (some kind of “currency” in the game) rather than a win or loss condition, because everything has a monetary cost.

Costs and Benefits

With all that said, let’s assume we have a game with some kind of currency-like resource, and we want to balance two things where one might be better than the other but it costs more. I’ll start with a simple statement: in a transitive mechanic, everything has a set of costs and a set of benefits, and all in-game effects can be put in terms of one or the other.

When we think of costs we’re usually thinking in terms of resource costs, like a sword that costs 250 Gold. But when I use this term, I’m defining it more loosely to be any kind of drawback or limitation. So it does include resource costs, because that is a setback in the game. But for example, if the sword is only half as effective against demons, that is part of a cost as well because it’s less powerful in some situations. If the sword can only be equipped by certain character classes, that’s a limitation (you can’t just buy one for everyone in your party). If the sword disintegrates after 50 encounters, or if it does 10% of damage dealt back to the person wielding it, or if it prevents the wielder from using magic spells… I would call all of these things “costs” because they are drawbacks or limitations to using the object that we’re trying to balance.

If costs are everything bad, then benefits are everything good. Maybe it does a lot of damage. Maybe it lets you use a neat special ability. Maybe it offers some combination of increases to your various stats.

Some things are a combination of the two. What if the sword does 2x damage against dragons? This is clearly a benefit (it’s better than normal damage sometimes), but it’s also a limitation on that benefit (it doesn’t do double damage all the time, only in specific situations). Or maybe a sword prevents you from casting Level 1 spells (obviously a cost), but if most swords in the game prevent you from casting all spells, this is a less limiting limitation that provides a kind of net benefit. How do you know whether to call something a “cost” or a “benefit”? For our purposes, it doesn’t matter: a negative benefit is the same as a cost, and vice versa, and our goal is to equalize everything. We want the costs and benefits to be equal, numerically. Whether you add to one side or subtract from the other, the end result is the same.

Personally, I find it easiest to keep all numbers positive and not negative, so if something would be a “negative cost” I’ll call it a benefit. That way I only have to add numbers and never subtract them. Adding is easier. But if you want to classify things differently, go ahead; the math works out the same anyway.

So, this is the theory. Add up the costs for an object. Add up the benefits. The goal is to get those two numbers to be equal. If the costs are less than the benefits, it’s too good: add more costs or remove some benefits. If the costs are greater than the benefits, it’s too weak; remove costs or add benefits. You might be wondering how we would relate two totally different things (like a Gold cost and the number of Attack Points you get from equipping a sword). We will get to that in a moment. But first, there’s one additional concept I want to introduce.

Overpowered vs. Underpowered vs. Overcosted vs. Undercosted

Let’s assume for now that we can somehow relate everything back to a single resource cost so they can be directly compared. And let’s say that we have something that provides too many benefits for its costs. How do we know whether to reduce the benefits, increase the costs, or both?

In most cases, we can do either one. It is up to the designer what is more important: having the object stay at its current cost, or having it retain its current benefits. Sometimes it’s more important that you have an object within a specific cost range because you know that’s what the player can afford when they arrive in the town that sells it. Sometimes you just have this really cool effect that you want to introduce to the game, and you don’t want to mess with it. Figure out what you want to stay the same… and then change the other thing.

Sometimes, usually when you’re operating at the extreme edges of a system, you don’t get a choice. For example, if you have an object that’s already free, you just can’t reduce the cost anymore, so it is possible you’ve found an effect that is just too weak at any cost. Reducing the cost further is impossible, so you have no choice: you must increase the benefits. We have a special term for this: we say the object is underpowered, meaning that it is specifically the level of benefits (not the cost) that must be adjusted.

Likewise, some objects are just too powerful to exist in the game at any cost. If an object has an automatic “I win / you lose” effect, it would have to have such a high cost that it would be essentially unobtainable. In such cases we say it is overpowered, that is, that the level of benefits must be reduced (and that a simple cost increase is not enough to solve the problem).

Occasionally you may also run into some really unique effects that can’t easily be added to, removed, or modified; the benefits are a package deal, and the only thing you can really do is adjust the cost. In this case, we might call the object undercosted if it is too cheap, or overcosted if it is too expensive.

I define these terms because it is sometimes important to make the distinction between something that is undercosted and something that’s overpowered. In both cases the object is too good, but the remedy is different.

There is a more general term for an object that is simply too good (although the cost or benefits could be adjusted): we say it is above the curve. Likewise, an object that is too weak is below the curve. What do curves have to do with anything? We’ll see as we talk about our next topic.

Cost Curves

Let’s return to the earlier question of how to relate things as different as Gold, Attack Points, Magic Points, or any other kinds of stats or abilities we attach to an object. How do we compare them directly? The answer is to put everything in terms of the resource cost. For example, if we know that each point of extra Attack provides a linear benefit and that +1 Attack is worth 25 Gold, then it’s not hard to say that a sword that gives +10 Attack should cost 250 Gold. For more complicated objects, add up all the costs (after putting them in terms of Gold), add up all the benefits (again, converting them to their equivalent in Gold), and compare. How do you know how much each resource is worth? That is what we call a cost curve.

Yes, this means you have to take every possible effect in the game, whether it be a cost or a benefit, and find the relative values of all of these things. Yes, it is a lot of work up front. On the bright side, once you have this information about your game, creating new content that is balanced is pretty easy: just put everything into your formula and you can pretty much guarantee that if the numbers add up, it’s balanced.

Creating a Cost Curve

The first step (and the reason it’s called a “cost curve” and not a “cost table” or “cost chart” or “cost double-entry accounting ledger”) is to figure out the relationship between increasing resource costs and increasing benefits. After that, you need to figure out how all game effects (positive and negative) relate to your central resource cost. Neither of these is usually obvious.

Defining the relationship between costs and benefits

The two might scale linearly: +1 cost means +1 benefit. This relationship is pretty rare.

Costs might be on an increasing curve, where each additional benefit costs more than the last, so incremental gains get more and more expensive as you get more powerful. You see this a lot in RPGs, for example. The amount of currency you receive from exploration or combat encounters increases over time, so if you’re now getting more than twice as much Gold per encounter as you were earlier, a new set of armor that costs twice as much as your old one will actually take less time to earn. Additionally, the designer might want an increasing curve for other design reasons, such as creating more interesting choices: if all stat gains cost the same amount, it’s usually an obvious decision to dump all of your gold into increasing your one or two most important stats while ignoring the rest; but if each additional point in a stat costs progressively more, players might consider exploring other options. Either way, you might see an increasing curve (such as a triangular or exponential curve), where something twice as good costs considerably more than twice as much.

Some games have costs on a decreasing curve instead. For example, in some turn-based strategy games, hoarding resources has an opportunity cost. In the short term, everyone else is buying stuff and advancing their positions, and if you don’t make purchases to keep up with them, you could fall hopelessly behind. This can be particularly true in games where purchases are limited: wait too long to buy your favorite building in Puerto Rico and someone else might buy it first; or, wait too long to build new settlements in Settlers of Catan and you may find that other people have built in the best locations. In cases like this, if the designer wants resource-hoarding to be a viable strategy, they must account for this opportunity cost by making something that costs twice as much be more than twice as good.

Some games have custom curves that don’t follow a simple, single formula or relationship. For example, in Magic: the Gathering, your primary resource is Mana, and you are generally limited to playing one Mana-generating card per turn. If a third of your deck is cards that generate Mana, you’ll get (on average) one Mana-generating card every three card draws. Since your opening hand is 7 cards and you typically draw one card per turn, this means a player would typically gain one Mana per turn for the first four turns, and then one Mana every three turns thereafter. Thus, we might expect to see a shift in the cost curve at or around five Mana, where suddenly each additional point of Mana is worth a lot more, which would explain why some of the more expensive cards have crazy-huge gameplay effects.

In some games, any kind of cost curve will be potentially balanced, but different kinds of curves have different effects. For example, in a typical Collectible Card Game, players are gaining new resources at a constant rate throughout the game. If a game has an increasing cost curve where higher costs give progressively smaller gains, it puts a lot of focus on the early game: cheap cards are almost as good as the more expensive ones, so bringing out a lot of forces early on provides an advantage over waiting until later to bring out only slightly better stuff. If instead you feature a decreasing cost curve where the cheap stuff is really weak and the expensive stuff is really powerful, this instead puts emphasis on the late game, where the really huge things dominate. You might have a custom curve that has sudden jumps or changes at certain thresholds, to guide the play of the game into definite early-game, mid-game and late-game phases. None of these are necessarily “right” or “wrong” in a universal sense. It all depends on your design goals, in particular your desired game length, number of turns, and overall flow of the gameplay.
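
To make these shapes concrete, here’s a small C++ sketch comparing the total cost of n points of benefit under each kind of curve. Every formula and constant here is an arbitrary illustration, not a number from any real game.

    #include <cmath>
    #include <cstdio>

    // Total cost to buy n points of benefit under different curve shapes.
    int linearTotal(int n)        { return n; }                   // each point costs 1
    int triangularTotal(int n)    { return n * (n + 1) / 2; }     // the nth point costs n
    int doublingTotal(int n)      { return (1 << n) - 1; }        // the nth point costs 2^(n-1)
    double decreasingTotal(int n) { return 10.0 * std::sqrt(n); } // each point is cheaper than the
                                                                  // last: twice the money buys more
                                                                  // than twice the benefit

    int main() {
        printf("points  linear  triangular  doubling  decreasing\n");
        for (int n = 1; n <= 5; ++n)
            printf("%6d %7d %11d %9d %11.1f\n", n, linearTotal(n),
                   triangularTotal(n), doublingTotal(n), decreasingTotal(n));
        return 0;
    }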

At any rate, this is one of your most important tasks when balancing transitive systems: figuring out the exact nature of the cost curve, as a numeric relationship between costs and benefits.

Defining basic costs and benefits

The next step in creating a cost curve is to make a complete list of all costs and benefits in your game. Then, starting with common ones that are used a lot, identify those objects that only do one thing and nothing else. From there, try to figure out how much that one thing costs. (If you were unsure about the exact mathematical nature of your cost curve, something like this will probably help you figure that out.)

Once you’ve figured out how much some of the basic costs and benefits are worth, start combining them. Maybe you know how much it costs to have a spell that grants a damage bonus, and also how much it costs to have a spell that grants a defense bonus. What about a spell that gives both bonuses at the same time? In some games, the cost for a combined effect is more than their separate costs, since you get multiple bonuses for a single action. In other games, the combined cost is less than the separate costs, since both bonuses are not always useful in combination or might be situational. In other games, the combined cost is exactly the sum of the separate costs. For your game, get a feel for how different effects combine and how that influences their relative costs. Once you know how to cost most of the basic effects in your game and how to combine them, this gives you a lot of power. From there, continue identifying how much new things cost, one at a time.

At some point you will start also identifying non-resource costs (drawbacks and limitations) to determine how much they cost. Approach these the same way: isolate one or more objects where you know the numeric costs and benefits of everything except one thing, and then use basic arithmetic (or algebra, if you prefer) to figure out the missing number.

Another thing you’ll eventually need to examine are benefits or costs that have limitations stacked on them. If a benefit only works half of the time because of a coin-flip whenever you try to use it, is that really half of the cost compared to if it worked all the time, or is it more or less than half? If a benefit requires you to meet conditions that have additional opportunity costs (“you can only use this ability if you have no Rogues in your party”), what is that tradeoff worth in terms of how much it offsets the benefit?

An Example: Cost Curves in Action

To see how this works in practice, I’m going to use some analysis to derive part of the cost curve for Magic 2011, the recently released set for Magic: the Gathering. The reason I’m choosing this game is that CCGs are among the most complicated games to balance in these terms – a typical base or expansion set may have hundreds of cards that need to be individually balanced – so if we can analyze Magic then we can use this for just about anything else. Note that by necessity, we’re going into spoiler territory here, so if you haven’t seen the set and are waiting for the official release, consider this your spoiler warning.

For convenience, we’ll examine Creature cards specifically, because they are the type of card that is the most easily standardized and directly compared: all Creatures have a Mana cost (this is the game’s primary resource), Power and Toughness, and usually some kind of special ability. Other card types tend to only have special, unique effects that are not easily compared.

For those of you who have never played Magic before, that is fine for our purposes. As you’ll see, you won’t need to understand much of the rules in order to go through this analysis. For example, if I tell you that the Flying ability gives a benefit equivalent to 1 mana, you don’t need to know (or care) what Flying is or what it does; all you need to know is that if you add Flying to a creature, the mana cost should increase by 1. If you see any jargon that you don’t recognize, assume you don’t need to know it. For those few parts of the game you do need to know, I’ll explain as we go.

Let us start by figuring out the basic cost curve. To do this, we first examine the most basic creatures: those with no special abilities at all, just a Mana cost, Power and Toughness. Of the 116 creatures in the set, 11 of them fall into this category (I’ll ignore artifact creatures for now, since those have extra metagame considerations).

Before I go on, one thing you should understand about Mana costs is that there are five colors of Mana: White (W), Green (G), Red (R), Black (B), and Blue (U). There’s also a sixth “type” called colorless, which can be paid with Mana of any color. Thus, something with a cost of “G4” means five Mana total: one must be Green, and the other four can be anything (Green or otherwise). We would expect that colored Mana has a higher cost than colorless, since it is more restrictive.

Here are the creatures with no special abilities:

  • W, 2/1 (that is, a cost of one White mana, power of 2, toughness of 1)
  • W4, 3/5
  • W1, 2/2
  • U4, 2/5
  • U1, 1/3
  • B2, 3/2
  • B3, 4/2
  • R3, 3/3
  • R1, 2/1
  • G1, 2/2
  • G4, 5/4

Looking at the smallest creatures, we immediately run into a problem with three creatures (I’m leaving the names off, since names aren’t relevant when it comes to balance):

  • W, 2/1
  • R1, 2/1
  • G1, 2/2

Apparently, all colors are not created equal: you can get a 2/1 creature for either W (one mana) or R1 (two mana), so an equivalent creature is cheaper in White than Red. Likewise, R1 gets you a 2/1 creature, but the equivalent-cost G1 gets you a 2/2, so you get more creature for Green than Red. This complicates our analysis, since we can’t use different colors interchangeably. Or rather, we could, but only if we assume that the game designers made some balance mistakes. (Such is the difficulty of deriving the cost curve of an existing game: if the balance isn’t perfect, and it’s never perfect, your math may be slightly off unless you make some allowances.) Either way, it means we can’t assume every creature is balanced on the same curve.

In reality, I would guess the designers did this on purpose to give some colors an advantage with creatures, to compensate for them having fewer capabilities in other areas. Green, for example, is a color that’s notorious for having really big creatures and not much else, so it’s only fair to give it a price break since it’s so single-minded. Red and Blue have lots of cool spell toys, so their creatures might be reasonably made weaker as a result.

Still, we can see some patterns here just by staying within colors:

  • W, 2/1
  • W1, 2/2
  • B2, 3/2
  • B3, 4/2

Comparing the White creatures, adding 1 colorless is equivalent to adding +1 Toughness. Comparing the Black creatures, adding 1 colorless mana is equivalent to adding +1 Power. We might guess, then, that 1 colorless (cost) = 1 Power (benefit) = 1 Toughness (benefit).

We can also examine similar creatures across colors to take a guess:

  • W, 2/1
  • R1, 2/1
  • W4, 3/5
  • U4, 2/5

From these comparisons, we might guess that Red and Blue seem to have an inherent -1 Power or -1 Toughness “cost” compared to White, Black and Green.

Is the cost curve linear, +1 benefit for each additional point of mana? It seems to be up to a point, but there appears to be a jump around 4 or 5 mana:

  • W, 2/1 (3 power/toughness for W)
  • W4, 3/5 (5 additional power/toughness for 4 additional colorless mana)
  • G1, 2/2 (4 power/toughness for G1)
  • G4, 5/4 (5 additional power/toughness for 3 additional colorless mana)

As predicted earlier, there may be an additional cost bump at 5 mana, since getting your fifth mana on the table is harder than the first four. Green seems to get a larger bonus than White.

From all of this work, we can take our first guess at a cost curve. Since we have a definite linear relationship between colorless mana and increased power/toughness, we will choose colorless mana to be our primary resource, with each point of colorless representing a numeric cost of 1. We know that each point of power and toughness provides a corresponding benefit of 1.

Our most basic card, W for 2/1, shows a total of 3 benefits (2 power, 1 toughness). We might infer that W must have a cost of 3. Or, using some knowledge of the game, we might instead guess that W has a cost of 2, and that all cards have an automatic cost of 1 just for existing – the card takes up a slot in your hand and your deck, so it should at least do something useful, even if its mana cost is zero, to justify its existence.

Our cost curve, so far, looks like this:

  • Cost of 0 provides a benefit of 1.
  • Increased total mana cost provides a linear benefit, up to 4 mana.
  • The fifth point of mana provides a double benefit (triple for Green), presumably to compensate for the difficulty in getting that fifth mana on the table.

Our costs are:

  • Baseline cost = 1 (start with this, just for existing)
  • Each colorless mana = 1
  • Each colored mana = 2
  • Total mana cost of 5 or more = +1 (or +2 for Green creatures)

Our benefits are:

  • +1 Power or +1 Toughness = 1
  • Being a Red or Blue creature = 1 (apparently this is some kind of metagame privilege).

We don’t have quite enough data to know if this is accurate. There may be other valid sets of cost and benefit numbers that would also fit our observations. But if these are accurate, we could already design some new cards.

How much would a 4/3 Blue creature cost? The benefit is 1 (Blue) + 4 (Power) + 3 (Toughness) = 8. Our baseline cost is 1, our first colored mana (U) is 2, and if we add four colorless mana that costs an extra 4… but that also makes for a total mana cost of 5, which would give an extra +1 to the cost for a total of 8. So we would expect the cost to be U4.

What would a 4/1 Green creature cost? The benefit is 5 (4 Power + 1 Toughness). A mana cost of G2 provides a cost of 5 (1 as a baseline, 2 for the colored G mana, and 2 for the colorless mana).

What if I proposed this card: W3 for a 1/4 creature. Is that balanced? We can add it up: the cost is 1 (baseline) + 2 (W) + 3 (colorless) = 6. The benefit is 1 (power) + 4 (toughness) = 5. So this creature is exactly 1 below the curve, and could be balanced by either dropping the cost to W2 or increasing it to 2/4 or 1/5.
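
To see all of this bookkeeping in one place, here’s a C++ sketch that encodes our first-guess curve for vanilla creatures. The struct and function names are my own invention, not anything official, and we’ll be refining some of these numbers shortly; running it reproduces the three examples above.

    #include <cstdio>

    // First-guess cost curve for vanilla creatures (no special abilities).
    struct Creature {
        int colored;     // colored mana symbols in the cost
        int colorless;   // colorless mana in the cost
        int power, toughness;
        bool redOrBlue;  // Red/Blue creatures get a +1 benefit
        bool green;      // Green gets the bigger 5-mana bump
    };

    int cost(const Creature& c) {
        int total = 1               // baseline: 1 just for existing
                  + 2 * c.colored   // each colored mana = 2
                  + c.colorless;    // each colorless mana = 1
        if (c.colored + c.colorless >= 5)
            total += c.green ? 2 : 1;  // bump at 5+ total mana (+2 for Green)
        return total;
    }

    int benefit(const Creature& c) {
        return c.power + c.toughness + (c.redOrBlue ? 1 : 0);
    }

    int main() {
        Creature u4 = {1, 4, 4, 3, true,  false};  // U4 for a 4/3
        Creature g2 = {1, 2, 4, 1, false, true};   // G2 for a 4/1
        Creature w3 = {1, 3, 1, 4, false, false};  // the proposed W3 for a 1/4
        printf("U4 4/3: cost %d, benefit %d\n", cost(u4), benefit(u4));  // 8, 8
        printf("G2 4/1: cost %d, benefit %d\n", cost(g2), benefit(g2));  // 5, 5
        printf("W3 1/4: cost %d, benefit %d\n", cost(w3), benefit(w3));  // 6, 5
        return 0;
    }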

So you can see how a small amount of information lets us do a lot, but also how we are limited: we don’t know what happens when we have several colored mana, we don’t know what happens when we go above 5 (or below 1) total mana, and we don’t know how to cost any special abilities. We could take a random guess based on our intuition of the game, but first let’s take a look at some more creatures. In particular, there are 18 creature cards in this set that only have standard special abilities on them:

  • W3, 3/2, Flying
  • WW3, 5/5, Flying, First Strike, Lifelink, Protection from Demons and Dragons
  • WW2, 2/3, Flying, First Strike
  • WW3, 4/4, Flying, Vigilance
  • W1, 2/1, Flying
  • WW, 2/2, First Strike, Protection from Black
  • W2, 2/2, Flying
  • U3, 2/4, Flying
  • BB, 2/2, First Strike, Protection from White
  • B2, 2/2, Swampwalk
  • B1, 2/1, Lifelink
  • R3, 3/2, Haste
  • GG5, 7/7, Trample
  • GG, 3/2, Trample
  • G3, 2/4, Reach
  • GG3, 3/5, Deathtouch
  • G, 0/3, Defender, Reach
  • GG4, 6/4, Trample

How do we proceed here? The easiest targets are those with only a single ability, like all the White cards with just Flying. It’s pretty clear from looking at all of those that Flying gives the same benefit as +1 power or +1 toughness, which in our math is a benefit of 1.

We can also make some direct comparisons to the earlier list of creatures without abilities to derive benefits of several special abilities:

  • B2, 3/2
  • B2, 2/2, Swampwalk
  • R3, 3/3
  • R3, 3/2, Haste

Swampwalk and Haste (whatever those are) also have a benefit of 1. And we can guess from the B1, 2/1, Lifelink card and our existing math that Lifelink is also a benefit of 1.

We run into something curious when we examine some red and blue creatures at 4 mana. Compare the following:

  • W3, 3/2, Flying
  • W4, 3/5 (an extra +1 cost but +2 benefit, due to crossing the 5-mana threshold)
  • U3, 2/4, Flying (identical total cost to the W3 but +1 benefit… in Blue?)
  • R3, 3/3 (identical total cost and benefit to the W3, but Red?)

It appears that perhaps Red and Blue get their high-cost-mana bonus at a threshold of 4 mana rather than 5. Additionally, Flying may be cheaper for Blue than it is for White… but given that it would seem to have a cost of zero here, we might instead guess that the U3 creature is slightly above the curve.

We find another strange comparison in Green:

  • G3, 2/4, Reach (cost of 6, benefit of 6+Reach?)
  • G4, 5/4 (cost of 8, benefit of 9?)

At first glance, both of these would appear to be above the curve by 1. Alternatively, since the extra bonus seems to be consistent, this may have been intentional. We might guess that Green gets a high-cost bonus not just at 5 total mana, but also at 4 total mana, assuming that Reach (like the other abilities we’ve seen) has a benefit of 1. (In reality, if you know the game, Reach gives part of the bonus of Flying but not the other part, so it should probably give about half the benefit of Flying. Unfortunately, Magic does not offer half-mana costs in standard play, so the poor G3 is probably destined to be either slightly above or below the curve.)

Let’s assume, for the sake of argument, that the benefit of Reach is 1 (or that the original designers intended this to be the benefit and balanced the cards accordingly, at least). Then we can examine this card to learn about the Defender special ability:

  • G, 0/3, Defender, Reach

The cost is 1 (baseline) + 2 (G mana) = 3. The benefit is 3 (toughness) + 1 (Reach) + ? (Defender). From this, it would appear Defender would have to have a benefit of negative 1 for the card to be balanced. What’s going on?

If you’ve played Magic, this makes sense. Defender may sound like a special ability, but it’s actually a limitation: it means the card is not allowed to attack. We could therefore consider it as an additional cost of 1 (rather than a benefit of -1) and the math works out.

We’ve learned a lot, but there are still some things out of our immediate grasp right now. We’d love to know what happens when you have a second colored mana (does it also have a +2 cost like the first one?), and we’d also like to know what happens when you get up to 6 or 7 total mana (are there additional “high cost” bonus adjustments?). While we have plenty of cards with two colored mana in their cost, and a couple of high-cost Green creatures, all of these also have at least one other special ability that we haven’t costed yet. We can’t derive the costs and benefits for something when there are multiple unknown values; even if we figured out the right total level of benefits for our GG4 creature, for example, we wouldn’t know how much of that benefit was due to the second Green mana cost, how much came from being 6 mana total, and how much came from its Trample ability. Does this mean we’re stuck? Thankfully, we have a few ways to proceed.

One trick is to find two cards that are the same, except for one thing. Those cards may have several things we don’t know, but if we can isolate just a single difference then we can learn something. For example, look at these two cards:

  • GG4, 6/4, Trample
  • GG5, 7/7, Trample

We don’t know the cost of GG4 or GG5, and we don’t know the benefit of Trample, but we can see that adding one colorless mana that takes us from 6 to 7 gives us a power+toughness benefit of 4. A total cost of 7 must be pretty hard to get to!

We can also examine these two cards that have the same mana cost:

  • WW3, 5/5, Flying, First Strike, Lifelink, Protection from Demons and Dragons
  • WW3, 4/4, Flying, Vigilance

From here we might guess that Vigilance is worth +1 power, +1 toughness, First Strike, Lifelink, and the Protection ability, making Vigilance a really freaking awesome special ability that has a benefit of at least 4. Or, if we know the game and realize Vigilance just isn’t that great, we can see that the 5/5 creature is significantly above the curve relative to the 4/4.

We still don’t know how much two colored mana costs, so let’s use another trick: making an educated guess, then trying it out through trial and error. As an example, let’s take this creature:

  • GG, 3/2, Trample

We know the power and toughness benefits are 5, and since most other single-word abilities (Flying, Haste, Swampwalk, Lifelink) have a benefit of 1, we might guess that Trample also has a benefit of 1, giving a total benefit of 6. If that’s true, we know that the cost is 1 (baseline) + 2 (first G), so the second G must cost 3. Intuitively, this might make sense: having two colored mana places more restrictions on your deck than just having one.

We can look at this another way, comparing two similar creatures:

  • G1, 2/2
  • GG, 3/2, Trample

The cost difference between G1 and GG is the difference between a cost of 1 (colorless) and the cost of the second G. The benefit difference is 1 (for the extra power) + 1 (for Trample, we guess). This means the second G has a cost of 2 more than a colorless mana, which is a cost of 3.

We’re still not sure, though. Maybe the GG creature is above the curve, or maybe Green has yet another creature bonus we haven’t encountered yet. Let’s look at the double-colored-mana White creatures to see if the pattern holds:

  • WW, 2/2, First Strike, Protection from Black
  • WW2, 2/3, Flying, First Strike
  • WW3, 4/4, Flying, Vigilance

Assuming that Protection from Black, First Strike, and Vigilance each have a +1 benefit (similar to other special abilities), most of these seem on the curve. WW is an expected cost of 6; 2/2, First Strike, Protection from Black seems like a benefit of 6. WW3 is a cost of 10 (remember the +1 for being a total of five mana); 4/4, Flying, Vigilance is also probably 10.

The math doesn’t work as well with WW2 (cost of 8); the benefits of 2/3, Flying and First Strike only add up to 7. So, this card might be under the curve by 1.

Having confirmed that the second colored mana is probably a cost of +3, we can head back to Green to figure out this Trample ability. GG, 3/2, Trample indeed gives us a benefit of 1 for Trample, as we guessed earlier.

Now that we know Trample and the second colored mana, we can examine our GG4 and GG5 creatures again to figure out exactly what’s going on at the level of six or seven mana, total. Let’s first look at GG4, 6/4, Trample. This has a total benefit of 11. The parts we know of the cost are: 1 (baseline) + 2 (first G) + 3 (second G) + 4 (colorless) + 1 (above 4 mana) + 1 (above 5 mana) = 12, so not only does the sixth mana apparently have no extra benefit but we’re already below the curve. (Either that, or Trample is worth more when you have a really high power/toughness, as we haven’t considered combinations of abilities yet.)

Let’s compare to GG5, 7/7, Trample. This has a benefit of 15. Known costs are 1 (baseline) + 2 (first G) + 3 (second G) + 5 (colorless) + 1 (above 4 mana) + 1 (above 5 mana) = 13, so going from five to seven mana total has an apparent additional benefit of +2. We might then guess that the benefit is +1 for 6 mana and another +1 for 7 mana, and that the GG4 is just a little below the curve.

Lastly, there’s the Deathtouch ability, which we can figure out from the creature that is GG3, 3/5, Deathtouch. The cost is 1 (baseline) + 2 (first G) + 3 (second G) + 3 (colorless) + 1 (above 4 mana) + 1 (above 5 mana) = 11. The benefit is 8 (power and toughness) + Deathtouch, which implies Deathtouch has a benefit of 3. This seems high when all of the other abilities are only costed at 1, but if you’ve played Magic you know that Deathtouch really is a powerful ability, so perhaps the high number makes sense in this case.

From here, there are an awful lot of things we can do to make new creatures. Just by going through this analysis, we’ve already identified several creatures that seem above or below the curve. (Granted, this is an oversimplification. Some cards are legacy from earlier sets and may not be balanced along the current curve. And every card has keywords which don’t do anything on their own, but some other cards affect them, so there is a metagame benefit to having certain keywords. For example, if a card is a Goblin, and there’s a card that gives all Goblins a combat bonus, that’s something that makes the Goblin keyword useful… so in some decks that card might be worth using even if it is otherwise below the curve. But keep in mind that this means some cards may be underpowered normally but overpowered in the right deck, which is where metagame balance comes into play. We’re concerning ourselves here only with transitive balance, not metagame balance, although we must understand that the two do affect each other.)

From this point, we can examine the vast majority of other cards in the set, because nearly all of them are just a combination of cost, power, toughness, maybe some basic special abilities we’ve identified already, and maybe one other custom special ability. Since we know all of these things except the custom abilities, we can look at almost any card to evaluate the benefit of its ability (or at least, the benefit assigned to it by the original designer). While we may not know which cards with these custom abilities are above or below the curve, we can at least get a feel for what kinds of abilities are marginally useful versus those that are really useful. We can also put numbers to them, and compare the values of each ability to see if they feel right.

Name That Cost!

Let’s take an example: W1, 2/2, and it gains +1 power and +1 toughness whenever you gain life. How much is that ability worth? Well, the cost is 4, the power/toughness benefit is 4, so that means this ability is free – either it’s nearly worthless, or the card is above the curve. Since there’s no intrinsic way to gain life in the game without using cards that specifically allow it, and since gaining life tends to be a weak effect on its own (since it doesn’t bring you closer to winning), we might guess this is a pretty minor effect, and perhaps the card was specifically designed to be slightly above the curve in order to give a metagame advantage to the otherwise underpowered mechanic of life-gaining.

Here’s another: W4, 2/3, when it enters play you gain 3 life. Cost is 8; power/toughness benefit is 5. That means the life-gain benefit is apparently worth 3 (+1 benefit per point of life gained).

Another: UU1, 2/2, when it enters play return target creature to its owner’s hand. The cost here is 7; known benefits are 5 (4 for power/toughness, 1 for being Blue), so the return effect has a benefit of 2.

And another: U1, 1/1, tap to force target enemy creature to attack this turn if able. Cost is 4, known benefit is 3 (again, 2 for power/toughness, 1 for Blue), so the special ability is costed as a relatively minor benefit of 1.

Here’s one with a drawback: U2, 2/3, Flying, can only block creatures with Flying. Benefit is 5 (power/toughness) + 1 (Blue) + 1 (Flying) = 7. Mana cost is 1 (baseline) + 2 (U) + 2 (colorless) = 5, suggesting that the blocking limitation is a +2 cost. Intuitively, that seems wrong when Defender (the complete inability to attack) is only a +1 cost, suggesting that this card is probably a little above the curve.

Another drawback: B4, 4/5, enters play tapped. Benefit is 9. Mana cost is 1 (baseline) + 2 (B) + 4 (colorless) + 1 (above 5 mana) = 8, so the additional drawback must have a cost of 1.

Here’s a powerful ability: BB1, 1/1, tap to destroy target tapped creature. Mana cost is 7. Power/toughness benefit is 2, so the special ability appears to have a benefit of 5. That seems extremely high; on the other hand, it is a very powerful ability that combos well with a lot of other cards, so it might be justified. Or we might argue it’s strong (maybe a benefit of 3 or 4) but not quite that good, or maybe that it’s even stronger (benefit of 6 or 7) based on seeing it in play and comparing to other strong abilities we identify in the set, but this at least gives us a number for comparison.
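
Notice that every one of these examples is the same arithmetic: total up the cost, subtract the benefits you already know, and whatever is left over is the implied value of the mystery ability. As a trivial C++ sketch (the function name is made up):

    #include <cstdio>

    // Whatever is left after subtracting the known benefits from the total
    // cost is the implied benefit of the one unknown ability.
    int impliedAbilityBenefit(int totalCost, int knownBenefits) {
        return totalCost - knownBenefits;
    }

    int main() {
        // BB1, 1/1, "tap to destroy target tapped creature":
        // cost = 1 (baseline) + 2 (first B) + 3 (second B) + 1 (colorless) = 7
        // known benefits = 1 power + 1 toughness = 2
        printf("implied ability benefit: %d\n", impliedAbilityBenefit(7, 2));  // 5
        return 0;
    }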

So, you can see here that the vast majority of cards can be analyzed this way, and we could use this technique to get a pretty good feel for the cost curve of what is otherwise a pretty complicated game. Not all of the cards fit on the curve, but if you play the game for a while you’ll have an intuitive sense of which cards are balanced and which feel too good or too weak. By using those “feels balanced” creatures as your baseline, you could then propose a cost curve and set of numeric costs and benefits, and then verify that those creatures are in fact on the curve (and that anything you’ve identified as intuitively too strong or too weak is correctly shown by your math as above or below the curve). Using what you do know, you can then take pretty good guesses at what you don’t know, to identify other cards (those you don’t have an opinion on yet) as being potentially too good or too weak.

In fact, even if you’re a player and not a game designer, you can use this technique to help you identify which cards you’re likely to see at the tournament/competitive level.

Rules of Thumb

How do you know if your numbers are right? A lot of it comes down to figuring out what works for your particular game, through a combination of your designer intuition and playtesting. Still, I can offer a couple of basic pieces of advice.

First, a limited or restricted benefit is never a cost, and its benefit is always at least a little bit greater than zero. If you have a sword that does extra damage to Snakes, and there are only a few Snakes in the game in isolated locations, that is a very small benefit but it is certainly not a drawback.

Second, if you give the player a choice between two benefits, the cost of the choice must be at least the cost of the more expensive of the two benefits. Worst case, the player takes the better (more expensive) benefit every time, so it should be costed at least as much as what the player will choose. In general, if you give players a choice, try to make those choices give about the same benefit; if it is a choice between two equally good things, that choice is a lot more interesting than choosing between an obviously strong and an obviously weak effect.

Lastly, sometimes you have to take a guess, and you’re not in a position to playtest thoroughly. Maybe you don’t have a big playtest budget. Maybe your publisher is holding a gun to your head, telling you to ship now. Whatever the case, you’ve got something that might be a little above or a little below the curve, and you might have to err on one side or the other. If you’re in this situation, it’s better to make an object too weak than to make it too strong. If it’s too weak, the worst thing that happens is no one uses it, but all of the other objects in the game can still be viable – this isn’t optimal, but it’s not game-breaking. However, if one object is way too strong, it will always get used, effectively preventing everything else that’s actually on the curve from being used since the “balanced” objects are too weak by comparison. A sufficiently underpowered object is ruined on its own; a sufficiently overpowered object ruins the balance of the entire game.

Cost curves for new games

So far, we’ve looked at how to derive a cost curve for an existing game, a sort of design “reverse engineering” to figure out how the game is balanced. This is not necessarily an easy task, as it can be quite tedious at times, but it is at least relatively straightforward.

If you’re making a new game, creating a cost curve is much harder. Since the game doesn’t exist yet, you haven’t played it in its final form, which means you don’t have much intuition for what the curve is or what kinds of effects are really powerful or really weak. This means you have to plan on doing a lot of heavy playtesting for balance purposes after the core mechanics are fairly solidified, and you need to make sure the project is scheduled accordingly.

Another thing that makes it harder to create a cost curve for a new game is that you have the absolute freedom to balance the numbers however you want. With an existing game you have to keep all the numbers in line with everything that you’ve already released, so you don’t have many degrees of freedom; you might have a few options on how to structure your cost curve, but only a handful of options will actually make any sense in the context of everything you’ve already done. With a new game, however, there are no constraints; you may have thousands of valid ways to design your cost curve – far more than you’ll have time to playtest. When making a new game, you’ll need to grit your teeth, do the math where you can, take your best initial guess… and then get something into your playtesters’ hands as early as you can, so you have as much time as possible to learn about how to balance the systems in your game.

There’s another nasty problem when designing cost curves for new games: changes to the cost curve are expensive in terms of design time. As an example, let’s say you’re making a 200-card set for a CCG, one of the new mechanics you’re introducing is the ability to draw extra cards, and 20 cards in the set use this mechanic in some way or other. Suppose you decide that drawing an extra card is a benefit of 2 at the beginning, but after some playtesting it becomes clear that it should actually be a benefit of 3. You now have to change all twenty cards that use that mechanic. Keep in mind that you will get the math wrong, because no one ever gets game balance right on the first try, and you can see how repeated changes to the cost curve mean redoing the entire set several times over. If you have infinite time to playtest, you can just make these changes meticulously, one at a time, until your balance is perfect. In the real world, however, this is an unsolved problem. The most balanced CCG that I’ve ever worked on was a game where the cost curve was generated after three sets had already been released; it was the newer sets, released after we derived the cost curve, that were really good in terms of balance (and they were also efficient in terms of development time, because the basic “using the math and nothing else” cards didn’t even need playtesting). Since then, I’ve tried to develop new games with a cost curve in mind, and I still don’t have a good answer for how to do this in any kind of reasonable way.

There’s one other unsolved problem, which I call the “escalation of power” problem, that is specific to persistent games that build on themselves over time – CCGs, MMOs, sequels where you can import previous characters, Facebook games, expansion sets for strategy games, and so on. Anything where your game has new stuff added to it over time, rather than just being a single standalone product. The problem is, in any given set, you are simply not going to be perfect. Every single object in your game will not be perfectly balanced along the curve. Some will be a little above, others will be a little below. While your goal is to get everything as close to the cost curve as possible, you have to accept right now that a few things will be a little better than they’re supposed to… even if the difference is just a microscopic rounding error.

Over time, with a sufficiently large and skilled player base, the things that give an edge (no matter how slight that edge is) will rise to the top and become more common in use. Players adapt to an environment where the best-of-the-best is what is seen in competitive play, and they come to treat that as the “standard” cost curve.

Knowing this, the game designer faces a problem. If you use the “old” cost curve and produce a new set of objects that is (miraculously) perfectly balanced, no one will use it, because none of it is as good as the best (above-the-curve) stuff from previous sets. In order to make your new set viable, you have to create a new cost curve that’s balanced with respect to the best objects and strategies in previous sets. This means, over time, the power level of the cost curve increases. It might increase quickly or slowly depending on how good a balancing job you do, but you will see some non-zero level of “power inflation” over time.

Now, this isn’t necessarily a bad thing, in the sense that it basically forces players to keep buying new stuff from you to stay current: eventually their old strategies, the ones that used to be dominant, will fall behind the power curve and they’ll need to get the new stuff just to remain competitive. And if players keep buying from us on a regular basis, that’s a good thing. However, there’s a thin line here, because when players perceive that we are purposefully increasing the power level of the game just to force them to buy new stuff, that gives them an opportunity to exit our game and find something else to do. We’re essentially giving an ultimatum, “buy or leave,” and doing that is dangerous because a lot of players will choose the “leave” option. So, the escalation-of-power problem is not an excuse for lazy design; while we know the cost curve will increase over time, we want that to be a slow and gradual process so that older players don’t feel overwhelmed, and of course we want the new stuff we offer them to be compelling in its own right (because it’s fun to play with, not just because it’s uber-powerful).

If You’re Working On a Game Now…

If you are designing a game right now, and that game has any transitive mechanics that involve a single resource cost, see if you can derive the cost curve. You probably didn’t need me to tell you that, but I’m saying it anyway, so nyeeeah.

Keep in mind that your game already has a cost curve, whether you are aware of it or not. Think of this as an opportunity to learn more about the balance of your game.

Homework

I’ll give you three choices for your “homework” this week. In each case, there are two purposes here. First, you will get to practice the skill of deriving a cost curve for an existing game. Second, you’ll get practice applying that curve to identify objects (whether those be cards, weapons, or whatever) that are too strong or too weak compared to the others.

Option 1: More Magic 2011

If you were intrigued by the analysis presented here on this blog, continue it. Find a spoiler list for Magic 2011 online (you shouldn’t have to look that hard), and starting with the math we’ve identified here, build as much of the rest of the cost curve as you can. As you do this, identify the cards that you think are above or below the curve. For your reference, here’s the math we have currently (note that you may decide to change some of this as you evaluate other cards):

  • Mana cost: 1 (baseline); 1 for each colorless mana; 2 for the first colored mana, and 3 for the second colored mana.
  • High cost bonus: +1 cost if the card requires 4 or more mana (Red, Blue and Green creatures only); +1 cost if the card requires 5 or more mana (White, Black, and Green creatures only – yes, Green gets both bonuses); and an additional +1 cost for each total mana required above 5.
  • Special costs: +1 cost for the Defender special ability.
  • Benefits: 1 per point of power and toughness. 1 for Red and Blue creatures.
  • Special benefits: +1 benefit for Flying, First Strike, Trample, Lifelink, Haste, Swampwalk, Reach, Vigilance, Protection from White, Protection from Black. +3 benefit for Deathtouch.

You may also find some interesting reading in Mark Rosewater’s archive of design articles for Magic, although finding the relevant general design stuff in the sea of articles on the minutiae of specific cards and sets can be a challenge (and it’s a big archive!).

Option 2: D&D

If CCGs aren’t your thing, maybe you like RPGs. Take a look at whatever edition of the Dungeons & Dragons Player’s Handbook you’ve got lying around, and flip to the section that gives a list of equipment, particularly all the basic weapons in the game, along with their Gold costs. Here you’ll have to do some light probability that we haven’t talked about yet, to figure out the average damage of each weapon (hint: if you roll an n-sided die, the average value of that die is (n+1)/2, and yes, that means the “average” may be a fraction; if you’re rolling multiple dice, compute the average for each individual die and then add them all together). Then, relate the average weapon damage to the Gold cost, and try to figure out the cost curve for weapons.
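
That averaging hint turns into a one-line function if you want to check your arithmetic; here’s a C++ sketch. The example dice are common D&D weapon dice, but check your own edition for actual weapon stats and costs.

    #include <cstdio>

    // Average of rolling 'count' dice with 'sides' sides each: each die
    // averages (sides + 1) / 2, and the averages of independent dice add.
    double averageRoll(int count, int sides) {
        return count * (sides + 1) / 2.0;
    }

    int main() {
        printf("1d8 averages %.1f\n", averageRoll(1, 8));  // 4.5
        printf("2d6 averages %.1f\n", averageRoll(2, 6));  // 7.0
        // Dividing a weapon's Gold cost by its average damage gives you
        // cost per point of damage, a starting point for the cost curve.
        return 0;
    }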

Note that depending on the edition, some weapons may have “special abilities” like longer range, or doing extra damage against certain enemy types. Remember to only try to figure out the math for something when you know all but one of the costs or benefits, so start with the simple melee weapons and once you’ve got a basic cost curve, then try to derive the more complicated ones.

If you find that this doesn’t take you very long and you want an additional challenge, do the cost curve for armors in the game as well, and see if you can find a relation between damage and AC.

Option 3: Halo 3

If neither of the other options appeals to you, take a look at the FPS genre, in particular Halo 3. This is a little different because there isn’t really an economic system in the game, so there’s no single resource used to purchase anything. However, there is a variety of weapons, and each weapon has a lot of stats: effective range, damage, fire rate, and occasionally a special ability such as area-effect damage or dual-wield capability.

For this exercise, use damage per second (“dps”) as your primary resource. You’ll have to find a FAQ (or experiment by playing the game, or carefully analyze gameplay videos on YouTube) to determine the dps for each weapon; to compute it, multiply the damage per shot by the fire rate (in shots per second).

Relate everything else to dps to try to figure out the tradeoffs between dps and accuracy, range, and each special ability. (For some things like “accuracy” that can’t be easily quantified, you may have to fudge things a bit by just making up some numbers).

Then, analyze. Which weapons feel above or below the curve based on your cost curve? How much dps would you add or remove from each weapon to balance it? And of course, is this consistent with your intuition (either from playing the game, or reading comments in player forums)?

Level 2: Numeric Relationships

July 14, 2010

Course Announcements

As promised last week, signups for the paid course are now closed. If you are just finding this blog now, I apologize, but you wouldn’t want to start two weeks behind anyway. If you’re coming late to the party, the best advice I can give you is to start reading this blog from the beginning and catch up as you’re able.

Readings/Playings

None for this week, other than this post… but you will be doing a bit of reading later for your “homework” to compensate.

This Week’s Topic

This week, I’m going to talk about the different kinds of numbers you see in games and how to classify them. This is going to be important later, because you can’t really know how to balance a game or how to choose the right numbers unless you first know what kinds of numbers you’re dealing with. Sometimes, a balance change is as simple as replacing one kind of number with another, so understanding what kinds of numbers there are and getting an intuition for how they work is something we need to cover before anything else.

In particular, we’re going to be examining relationships between numbers. Numbers in games don’t exist in a vacuum. They only have meaning in relation to each other. For example, suppose I tell you that the main character in a game does 5 damage when he attacks. That tells you nothing unless you know how much damage enemies can take before they keel over dead. Now you have two numbers, Damage and Hit Points, and each one only has meaning in relation to the other.

Or, suppose I tell you that a sword costs 250 Gold. That has no meaning, until I tell you that the player routinely finds bags with thousands of Gold lying around the countryside, and then you know the sword is cheap. Or, I tell you that the player gets at most 1 Gold from winning each combat, and then it’s really expensive. Even within a game, the relative value of something can change; maybe 250 Gold is a lot at the start of the game but it’s pocket change at the end. In World of Warcraft, 1 Gold used to be a tidy sum, but today it takes tens or hundreds to buy the really epic loot.

With all that said, what kinds of ways can numbers be related to each other?

Identity and Linear Relationships

Probably the simplest type of relationship, which math geeks would call an identity relationship, is where two values change in exactly the same way: add +1 to one value, and it’s equivalent to adding +1 to the other. For game balance purposes, you can treat the two values as identical.

You would think that in such a case you might just use a single value, but there are some cases where it makes sense to have two different values that just happen to have a one-to-one conversion. As an example, Ultima III: Exodus has Food, something each character needs to avoid starving to death in a dungeon. You never get Food as an item drop; you can only buy it from food vendors in towns. Food decreases over time and has no other value (it cannot be sold or exchanged for anything else); its only purpose is to act as a continual slow drain on your resources. Each character also has Gold, which they find while adventuring. Unlike Food, Gold doesn’t degrade over time, and it is versatile (you can use it to bribe guards, buy hints, purchase weapons or armor… or purchase Food). While these are clearly two separate values that serve very different purposes within the game, each unit of Food costs 1 Gold (10 Food costs 10 Gold, 1000 Food costs 1000 Gold, and so on). Food and Gold have an identity relationship… although it is one-way in this case, since you can convert Gold to Food but not vice versa.

A more general case of an identity relationship is the linear relationship, where the conversion rate between two values is a constant. If a healing spell always costs 5 MP and heals exactly 50 HP, then there is a 1-to-10 linear relationship between MP and HP. If you can spend 100 Gold to gain +1 Dexterity, there’s a 100-to-1 linear relationship between Gold and Dexterity. And so on.
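In code, a linear relationship is just a constant conversion rate. Here’s a quick sketch using the two example rates above (my own illustration, not taken from any particular game):

```python
# Linear relationships as constant conversion rates.
MP_TO_HP = 50 / 5       # healing spell: 5 MP heals 50 HP, a 1-to-10 rate
GOLD_TO_DEX = 1 / 100   # 100 Gold buys +1 Dexterity, a 100-to-1 rate

def hp_healed(mp_spent):
    return mp_spent * MP_TO_HP

def dex_gained(gold_spent):
    return gold_spent * GOLD_TO_DEX

print(hp_healed(15))    # 150.0 HP from 15 MP
print(dex_gained(300))  # 3.0 Dexterity from 300 Gold
```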

Note that we are so far ignoring cases where a relationship is partly random (maybe that healing spell heals somewhere between 25 and 75 HP, randomly chosen each time). Randomness is something we’ll get into in a few weeks, so we’re conveniently leaving that out of the picture for now.

Exponential and Triangular Relationships

Sometimes, a linear relationship doesn’t work for your game. You may have a relationship where there are either increasing or diminishing returns.

For example, suppose a player can pay resources to gain additional actions in a turn-based strategy game. One extra action might be a small boost, but three or four extra actions might be like taking a whole extra turn, which can feel like far more than 3 or 4 times as powerful as a single action. This is increasing returns: each extra action is more valuable than the last. You would therefore want the cost of each extra action to increase as you buy more of them.

Or, maybe you have a game where players have incentive to spend all of their in-game money every turn to keep pace with their opponents, and hoarding cash has a real opportunity cost (that is, they miss out on opportunities they would have had if they’d spent it instead). In this case, buying a lot of something all at once is actually not as good as buying one at a time, so it makes sense to give players a discount for “buying in bulk” as it were. Here we have a decreasing return, where each extra item purchased is not as useful as the last.

In such cases, you need a numeric relationship whose rate of exchange changes depending on how much you exchange at once. The simplest way to do this is an exponential relationship: each time you add to one value, multiply the other. A common example is doubling: for each +1 you give to one value, double the other. This gives you a relationship where the first, second, third, fourth, and fifth of something cost 1, 2, 4, 8, and 16, respectively. As you can see, the numbers get really big, really fast when you do this.
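To see just how fast, here’s a small sketch of the doubling scheme (my own illustration):

```python
# Doubling: the Nth copy of something costs twice as much as the (N-1)th.
def nth_cost(n, base=1):
    return base * 2 ** (n - 1)   # 1, 2, 4, 8, 16, ...

total = 0
for n in range(1, 6):
    total += nth_cost(n)
    print(f"item {n}: costs {nth_cost(n)}, running total {total}")
# The fifth item alone costs 16; all five together cost 31.
```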

Because the numbers get prohibitively large very quickly, you have to be careful when using exponential relationships. For example, nearly every card in any Collectible Card Game that I’ve played that has the word “double” on it somewhere (as in, one card doubles some value on another card) ends up being too powerful. I know offhand of one exception, and that was an all-or-nothing gamble where it doubled your attack strength but then made you lose at the end of the turn if you hadn’t won already! The lesson here is to be very, very careful when using exponentials.

What if you want something that increases, but not as fast as an exponential? A common pattern in game design is the triangular relationship. If you’re unfamiliar with the term, you have probably at least seen this series:

1, 3, 6, 10, 15, 21, 28, …

That is the classic triangular pattern (so called because several ways to visualize it involve triangles). In our earlier example, maybe the first extra action costs 1 resource; the next costs 2 (for a running total of 3), the next costs 3 (for a total of 6), and so on.

An interesting thing to notice about triangular numbers is what happens when you look at the difference between each successive pair of numbers. The difference between the first two numbers (1 and 3) is 2. The difference between the next two numbers (3 and 6) is 3. The next difference (between 6 and 10) is 4. So the successive differences grow linearly: 2, 3, 4, 5… (and if you count the first number itself as a step up from zero, the differences are exactly the sequence 1, 2, 3, 4…).

Triangular numbers usually make a pretty good first guess for increasing costs. What if you want a decreasing cost, where something starts out expensive and gets cheaper? In that case, figure out how much the first one should cost, then make each one after that cost 1 less. For example, suppose you decide the first Widget should cost 7 Gold. Then try making the second cost 6 Gold (for a running total of 13), the third 5 Gold (total of 18), and so on.

Note that in this case you will eventually reach a point where each successive item costs zero (or even a negative amount), which gets kind of ridiculous. This is actually a pretty common thing in game balance: if you have a math formula, the balance will break at the mathematical extremes. The design solution is to set hard limits on the formula so that you never reach those extremes. In our Widget example above, maybe the players are simply prevented from buying more than 3 or 4 Widgets at a time.
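Here’s a sketch of both curves, using the numbers from the examples above; the hard cap of 4 Widgets is my own assumption for illustration:

```python
# Increasing (triangular) total cost: 1, 3, 6, 10, 15, ...
def triangular_total(n):
    return n * (n + 1) // 2

# Decreasing per-item cost with a hard limit so the price never hits zero.
def widget_total(n, first_cost=7, max_items=4):
    if n > max_items:
        raise ValueError("hard limit: no more than 4 Widgets at a time")
    return sum(first_cost - i for i in range(n))   # 7 + 6 + 5 + ...

print([triangular_total(n) for n in range(1, 6)])  # [1, 3, 6, 10, 15]
print(widget_total(3))                             # 18 (7 + 6 + 5 Gold)
```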

Other Numeric Relationships

While linear and triangular relationships are among the most common in games, they are not the only ones available. In fact, there are an infinite number of potential numeric relationships. If none of the typical relationships work for your game, come up with your own custom relationship!

Maybe you have certain cost peaks, where certain thresholds cost more than others because those have in-game significance. For example, if everything in your game has 5 hit points, there is actually a huge difference between doing 4 or 5 damage, so that 5th point of damage will probably cost a lot more than you would otherwise expect. You might have oscillations, where several specific quantities are particularly cheap (or expensive). You can create any ratio between two values that you want… but do so with some understanding of what effect it will have on play!

Relationships Within Systems

Individual values in a game usually exist within larger systems. By analyzing all of the different numbers and relationships between them in a game’s systems, we can gain a lot of insight into how the game is balanced.

Let us take a simple example: the first Dragon Warrior game for the NES. In the game’s combat system, you have four main stats: Hit Points (HP), Magic Points (MP), Attack and Defense. This is a game of attrition; you are exploring game areas, and every few steps you get attacked by an enemy. You lose if your HP is ever reduced to zero.

How are all of these numbers related? Random encounters are related to HP: each encounter reduces HP (you can also say it the other way: by walking around getting into fights, you can essentially convert HP into encounters). This is an inverse relationship, as more encounters means less HP.

There’s a direct relationship between HP and Defense: the more defense you have, the less damage you take, which means your HP lasts longer. Effectively, increasing your Defense is equivalent to giving yourself a pile of extra HP.

Perhaps surprisingly, we see the same relationship between HP and Attack. The higher your Attack stat, the faster you can defeat an enemy. If you defeat an enemy faster, it has less opportunity to damage you, so you take less damage. Thus, you can survive more fights with a higher Attack.

MP is an interesting case, because you can use it for a lot of things. There are healing spells that directly convert MP into HP. There are attack spells that do damage (hopefully more than you’d do with a standard attack); like a higher Attack stat, these finish combats earlier, which means they preserve your HP. There are buff/debuff spells that likewise reduce the damage you take in a combat. There are teleport spells that take you across long distances, so that you don’t have to get in fights along the way, so these again act to preserve your HP. So even though MP is versatile, virtually all of the uses for it involve converting it (directly or indirectly) into HP.

If you draw this all out on paper, you’ll see that everything (Attack, Defense, MP, Monster Encounters) is linked directly to HP. Since HP is the loss condition for the game, the designers put it in the middle of everything! This is a common technique, making a single resource central to all of the others, and it is best to make that central resource either the win or loss condition for the game.

Now, there’s one additional wrinkle here: the combat system interacts with two other systems in the game through the monster encounters. After you defeat a monster, you get two things: Gold and Experience (XP). These interact with the economic and leveling systems in the game, respectively.

Let’s examine the leveling system first. Collect enough XP and you’ll level up, which increases all of your stats (HP, MP, Attack and Defense). As you can see, this creates a feedback loop: defeating enemies causes you to gain a level, which increases your stats, which lets you defeat more enemies. And in fact, this would be a positive feedback loop that would cause the player to gain high levels of power very fast, if there weren’t some kind of counteracting force in the game. That counteraction comes in the form of an increasing XP-to-Level relationship, so it takes progressively more and more XP to gain a level. Another counteracting force is that of player time; while the player could maximize their level by just staying in the early areas of the game beating on the weakest enemies, the gain is so slow that they are incentivized to take some risks so they can level a little faster.

Examining the economic system, Gold is used for a few things. Its primary use is to buy equipment which permanently increases the player’s Attack or Defense, thus effectively converting Gold into extra permanent HP. Gold can also be used to buy consumable items, most of which mimic the effects of certain spells, thus you can (on a limited basis, since you only have a few inventory slots) convert Gold to temporary MP. Here we see another feedback loop: defeating monsters earns Gold, which the player uses to increase their stats, which lets them defeat even more monsters. In this case, what prevents this from being a positive feedback loop is that it’s limited by progression: you have a limited selection of equipment to buy, and the more expensive stuff requires that you travel to areas that you are just not strong enough to reach at the start of the game. And of course, once you buy the most expensive equipment in the game, extra Gold doesn’t do you much good.

Another loop linked to the economic system is that of progression itself. Many areas in the game are behind locked doors, and in order to open them you need to use your Gold to purchase magic keys. You defeat monsters, get Gold, use it to purchase keys, and use those keys to open new areas which have stronger monsters (which then let you get even more Gold and XP). Of course, this loop is itself limited by the player’s stats; unlocking a new area with monsters that are too strong to handle does not help the player much.

How would a designer balance things within all these systems? By relating everything back to the central value of HP, and then comparing.

For example, say you have a healing spell and a damage spell, and you want to know which is better. Calculate the amount of HP that the player would no longer lose as a result of using the damage spell and ending the combat earlier, and compare that to the amount of HP actually restored by the healing spell. Or, say you want to know which is better, a particular sword or a particular piece of armor. Again, figure out how much extra HP each would save you.
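Here’s a sketch of what that comparison might look like in code. All the numbers here (enemy stats, spell effects) are invented for illustration; in practice you would pull them from the actual game data.

```python
# How much HP does a damage spell effectively save? Ending a fight early
# means eating fewer rounds of enemy attacks.
ENEMY_DAMAGE_PER_ROUND = 12   # hypothetical

def hp_saved_by_damage_spell(spell_damage, enemy_hp, normal_attack):
    rounds_normal = -(-enemy_hp // normal_attack)                    # ceiling
    rounds_with_spell = -(-(enemy_hp - spell_damage) // normal_attack)
    return (rounds_normal - rounds_with_spell) * ENEMY_DAMAGE_PER_ROUND

damage_spell_value = hp_saved_by_damage_spell(spell_damage=30,
                                              enemy_hp=60, normal_attack=15)
healing_spell_value = 25   # a heal restores HP directly

# Both values are now on the same scale (HP), so they can be compared.
print(damage_spell_value, healing_spell_value)   # 24 vs. 25
```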

Now, this does not mean that everything in the game must be exactly equal to be balanced. For example, you may want spells that are learned later in the game to be more cost-effective, so that the player has reason to use them. You may also want the more expensive equipment to be less cost-effective, in order to make the player really work for it. However, at any given time in the game, you probably want the choices made available at that time to be at least somewhat balanced with each other. For example, if the player reaches a new town with several new pieces of equipment, you would expect those to be roughly equivalent in terms of their HP-to-cost ratios.

Another Example

You might wonder: if this kind of analysis works for a stat-driven game like an RPG, is it useful for any other kind of game? The answer is yes. Let’s examine an action title, the original Super Mario Bros. (popularized by its NES and arcade versions).

What kinds of resources do we have in Mario? There are lives, coins, and time (from a countdown timer). There’s actually a numeric score. And then there are objects within the game — coin blocks, enemies, and so on — which can sometimes work for or against you depending on the situation. Let us proceed to analyze the relationships.

  • Coins: there is a 100-to-1 relationship between Coins and Lives, since collecting 100 coins awards an extra life. There is a 1-to-200 relationship between Coins and Score, since collecting a coin gives 200 points. There is a relationship between Coin Blocks and Coins, in that each block gives you some number of coins.
  • Time: there is a 1-to-50 relationship between Time and Score, since each unit of time remaining when you finish a level awards 50 points as a time bonus. There is also an inverse relationship between Time and Lives, since running out of time costs you a life.
  • Enemies: there is a relationship between Enemies and Score, since killing enemies gives you from 100 to 1000 points (depending on the enemy). There is an inverse relationship between Enemies and Lives, since sometimes an enemy will cost you a life. (In a few select levels there is potentially a positive relationship between Enemies and Lives, as stomping enough enemies in a combo will give extra lives, but that is a special case.)
  • Lives: there is a strange relationship between Lives and everything else, because losing a life resets the Coins, Time and Enemies on a level. Note that since Coins give you extra Lives, and losing a Life resets Coins, any level with more than 100 Coins would provide a positive feedback loop where you could die intentionally, collect more than 100 Coins, and repeat to gain infinite lives. The original Super Mario Bros. did not have any levels like this, but Super Mario Bros. 3 did.
  • Lives and Score: there is no direct link between Lives and Score. However, losing a Life resets a bunch of things that give scoring opportunities, so indirectly you can convert a Life to Score. Interestingly, this does not work the other way around; unlike other arcade games of the era, you cannot earn extra Lives by reaching a sufficiently high Score.

Looking at these relationships, we see that Score is actually the central resource in Super Mario Bros. since everything is tied to Score. This makes sense in the context of early arcade games, since the win condition is not “beat the game,” but rather, “get the highest score.”

How would you balance these resources against one another? There are a few ways. You could figure out how many enemies the player kills and their relative risks (that is, which enemies are harder to kill, and which are more likely to kill you), then compare that with how many coins they find in a typical level, and how much time they typically have left when they complete it. Then, you can either change the amount of score granted for each of these things (making a global change throughout the game), or you can vary the number of coins and enemies, the amount of time, or the length of a level (making a local change within individual levels). Any of these techniques can be used to adjust the player’s expected total score, and also how much each of these activities (collecting coins, stomping enemies, finishing with time to spare) contributes to the final score.

When you’re designing a game, note that you can change your resources around, and even eliminate a resource or change the central resource to something else. The Mario series survived this quite well; the games that followed the original eliminated Score entirely and related everything to Lives instead.

Interactions Between Relationships

When you form chains or loops of resources and relationships between them, the relationships stack with each other. They can either combine to become more intense, or they can cancel each other out (completely or partially).

We just saw one example of this in the Mario games, with Lives and Coins. If you have a level that contains 200 Coins, then the 100 Coins to 1 Life relationship combines with 1 Life to 200 Coins in that level, to create a doubling effect where you convert 1 Life to 2 Lives in a single iteration.

Here’s another example, from the PS2 game Baldur’s Gate: Dark Alliance. In this action-RPG, you get XP from defeating enemies, which in turn causes you to level up. The XP-to-Level relationship is triangular: going from Level 1 to Level 2 requires 1000 XP, Level 2 to Level 3 costs 2000 XP, rising to Level 4 costs 3000 XP, and so on.

Each time you level up, you get a number of upgrade points to spend on special abilities. These also follow a triangular progression: at Level 2 you get 1 upgrade point; at Level 3 you get 2 points; the next level gives you 3 points, then the next gives you 4 points, and so on.

However, these relationships chain together, since XP gives you Levels and Levels give you Upgrade Points. Since XP is the actual resource the player is earning, it is the XP-to-Points ratio we care about, and the two triangular relationships cancel each other out to form a linear relationship of 1000 XP to 1 Upgrade Point. While the awarding of these upgrade points is staggered based on levels, on average you are earning them at a constant rate per XP.
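You can verify the cancellation with a few lines of code (a quick sanity check of the math above, not anything pulled from the game’s data files):

```python
# XP needed to reach a level, and points accumulated by that level, are both
# triangular progressions; their ratio is a constant 1000 XP per point.
def total_xp_to_reach(level):
    return sum(1000 * n for n in range(1, level))   # 1000, 3000, 6000, ...

def total_points_at(level):
    return sum(n - 1 for n in range(2, level + 1))  # 1, 3, 6, ...

for level in (2, 5, 10, 20):
    xp, pts = total_xp_to_reach(level), total_points_at(level)
    print(f"level {level}: {xp} XP, {pts} points, ratio {xp / pts:.0f}")
```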

How does Time fit into this (as in, the amount of time the player spends on the game)? If the player were fighting the same enemies over and over for the same XP rewards, there would be a triangular increase in the amount of time it takes to earn a level (and a constant amount of time to earn each Upgrade Point, on average). However, as with most RPGs, there is a system of increasing XP rewards as the player fights stronger monsters. This increasing XP curve doesn’t increase as fast as the triangular progression of level-ups, which means that it doesn’t completely cancel out the triangular effect, but it does partly reduce it — in other words, you level up slightly faster in the early game and slower in the late game, but the play time between level gains doesn’t increase as fast as a triangular relationship.

Note, however, the way this interacts with Upgrade Points. Since the XP-to-Point ratio is linear, and the player gets an increasing amount of XP per unit time, they are actually getting an increasing rate of Upgrade Point gain!

This kind of system has some interesting effects. By changing the rate of XP gain (that is, exactly how fast the XP rewards increase for defeating enemies) you can change both the rate of leveling up and the rate of Upgrade Point gains. If the XP rewards increase faster than the triangular rate of the levels themselves, the player will actually level up faster as the game progresses. If the XP rewards increase more slowly than the rate of level ups, the player will level faster in the early game and slower in the late game (which is usually what you want, as it gives the player frequent rewards early on and starts spacing them out once they’ve committed to continued play). If the XP rewards increase at exactly the same rate, the player will level up at a more or less constant rate.

Suppose you decide to have the player gain levels faster in the early game and slower in the late game, but you never want them to go longer than an hour between levels. How would you balance the XP system? Simple: figure out what level they will be at in the late game, scale the XP gains to take about an hour per level up at that point, and then work your way backwards from there.
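Here’s what that back-solving might look like as a sketch. Everything in it is a made-up assumption (a level cap of 20, a triangular XP curve, rewards that scale with the square root of level), purely to show the shape of the method:

```python
MAX_LEVEL = 20
MINUTES_AT_CAP = 60.0            # design goal: last level-up takes ~1 hour

def xp_cost(level):              # triangular: level n -> n+1 costs n * 1000
    return level * 1000

# Earn rate required at the cap so the final level-up takes about an hour:
rate_at_cap = xp_cost(MAX_LEVEL - 1) / MINUTES_AT_CAP

# Hypothetical reward curve: earn rate grows with level, but more slowly
# than the triangular cost curve, so early levels come faster.
def xp_per_minute(level):
    return rate_at_cap * (level / (MAX_LEVEL - 1)) ** 0.5

for level in (1, 5, 10, 19):
    minutes = xp_cost(level) / xp_per_minute(level)
    print(f"level {level} -> {level + 1}: about {minutes:.0f} minutes")
```

With these particular numbers, the first level-up takes around 15 minutes, and the time per level climbs smoothly to the one-hour ceiling at the cap.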

Note another useful property this leveling system has: it provides a negative feedback loop that keeps the player in a narrow range of levels during each point in the game. Consider two situations:

  • Over-leveling: The player has done a lot of level-grinding and is now too powerful for the enemies in their current region. For one thing, they’ll be able to defeat the nearby enemies faster, so they don’t have to stick around too long. For another, the XP gains aren’t that good if their level is already high; they are unlikely to gain much in the way of additional levels by defeating weaker enemies. The maximum level a player can reach is effectively limited by the XP-reward curve.
  • Under-leveling: Suppose instead the opposite case, where the player has progressed quickly through the game and is now at a lower level than the enemies in the current region. In this case, the XP gains will be relatively high (compared to the player’s level), and the player will only need to defeat a few enemies to level up quickly.

In either case, the game’s system pushes the player’s level towards a narrow range between the two extremes. It is much easier to balance a combat system to provide an appropriate level of challenge when you know what level the player will be at every step of the way!

How Relationships Interact

How do you know how two numeric relationships will stack together? Here’s a quick-reference guide:

  • Two linear relationships that combine: multiply them together. If you can turn 1 of Resource A into 2 of Resource B, and 1 Resource B into 5 of Resource C, then there is a 1-to-10 conversion between A and C (2×5). (See the sketch after this list.)
  • Linear relationship combines with an increasing (triangular or exponential) relationship: the increasing relationship just gets multiplied by a bigger number, but the nature of the curve stays the same.
  • Linear relationship counteracts an increasing relationship: if the linear conversion is large, it may dominate early on, but eventually the increasing relationship will outpace it. Exactly where the two curves meet and the game shifts from one to the other depends on the exact numbers, and tweaking these can provide an interesting strategic shift for the players.
  • Two increasing relationships combine: you end up with an increasing relationship that’s even faster than either of the two individually.
  • Two increasing relationships counteract one another: depends on the exact relationships. In general, an exponential relationship will dominate a triangular one (how fast this happens depends on the exact numbers used). Two identical relationships (such as two pure triangulars) will cancel out to form a linear or identity relationship.
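Here’s a toy illustration of the first two rules (my own sketch, with arbitrary rates):

```python
# Rule 1: chaining two linear conversions multiplies their rates.
A_TO_B = 2   # 1 A -> 2 B
B_TO_C = 5   # 1 B -> 5 C

def a_to_c(amount_a):
    return amount_a * A_TO_B * B_TO_C   # net 1-to-10 conversion

# Rule 2: a linear conversion applied to a triangular progression just
# scales it; the successive differences still grow linearly, so the
# curve keeps its shape.
def triangular(n):
    return n * (n + 1) // 2

print(a_to_c(3))                                    # 30
print([100 * triangular(n) for n in range(1, 6)])   # [100, 300, 600, 1000, 1500]
```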

If You’re Working On a Game Now…

Are you designing your own game right now? Try this: make a list of every resource or number in your game on a piece of paper. Put a box around each, and spread the boxes out. Then, draw arrows between each set of boxes that has a direct relationship in your game, and label the arrow with the kind of relationship (linear, triangular, exponential, etc.).

Use this diagram to identify a few areas of interest in the balance of your game:

  • Do you see any loops where a resource can be converted to something else, then maybe something else, and then back to the original? If you get back more of the original than you started with by doing this, you may have just identified a positive feedback loop in your game.
  • Do you see a central resource that everything else seems tied to? If so, is that central resource either the win or loss condition, or does it seem kind of arbitrary? If you don’t see one, does it make sense to create a new central resource, perhaps by adding new relationships between resources?

You can then use this diagram to predict changes to gameplay. If you change the nature of a relationship, you might be able to make a pretty good guess at what other relationships will also change as a result, and what effect that might have on the game’s systems overall.

If your game is a single-player game with some kind of progression system, “Time” (as in, the amount of time the player spends actually playing the game) should be one of your resources, and you can use your diagram to see if the rewards and power gains the player gets from playing are expected to increase, decrease, or remain constant over time.

Homework

Here’s your game balance challenge for this week. First, choose any single-player game that you’ve played and are familiar with, that has progression mechanics. Examples of games with progression are action-adventure games (Zelda), action-RPGs (Diablo), RPGs (Final Fantasy), or MMORPGs (World of Warcraft). I’ll recommend that you choose something relatively simple, such as an NES-era game or earlier. You’re going to analyze the numbers in this game, and as you’ve seen from the earlier examples here, even simple games can have pretty involved systems.

In these games, there is some kind of progression where the player gains new abilities and/or improves their stats over time. As the player progresses, enemies get stronger; again this could just mean they have higher stats, or they might also gain new abilities that require better strategy and tactics to defeat.

Start by asking yourself this question: overall, what was the difficulty curve of the game like? Did it start off easy and get slowly, progressively harder? Or, did you notice one or more of these undesirable patterns:

  • A series of levels that seemed to go by very slowly, because the player was underpowered at the time and did not gain enough power fast enough to compensate, so you had to grind for a long time in one spot.
  • A sudden spike in difficulty with one dungeon that had much more challenging enemies than those that came immediately before or after.
  • A dungeon that was much easier than was probably intended, allowing you to blast through it quickly since you were much more powerful than the inhabitants by the time you actually reached it.
  • The hardest point in the game was not at the end, but somewhere in the middle. Perhaps you got a certain weapon, ally, or special ability that was really powerful, and made you effectively unbeatable from that point on until the end of the game.

So far, all you’re doing is using your memory and intuition, and it probably takes you all of a few seconds to remember the standout moments of epic win and horrible grind in your chosen game. It’s useful to build intuition, but it is even better to make your intuition stronger by backing it up with math. So, once you’ve written down your intuitive guesses at the points where the game becomes unbalanced, let’s start analyzing.

First, seek a strategy guide or FAQ that gives all of the numbers for the game. A web search may turn up surprisingly detailed walkthroughs that show you every number and every resource in the game, and exactly how they are all related.

Next, make a list on paper of all of the resources in the game. Using the FAQ as your guide, also show all relationships between the resources (draw arrows between them, and label the arrows with the relationship type). From this diagram, you may be able to identify exactly what happened.

For example, maybe you seemed to level up a lot in one particular dungeon, gaining a lot of power in a short time. In such a case, you might start by looking at the leveling system: perhaps there is a certain range of levels where the XP requirements to gain a level are much lower than the rest of the progression curve. You might also look at the combat reward system: maybe you just gain a lot more XP than expected from the enemies in that dungeon.

As another example, maybe the game felt too easy after you found a really powerful weapon. In this case you’d look at the combat system: look at how much damage you do versus how much enemies can take, as separate curves throughout the game, and identify the sudden spike in power when you get that weapon. You may be able to graphically see the relationship of your power level versus that of the enemies over time.

Lastly, if you do identify unbalanced areas of the game from this perspective, you should be able to use your numbers and curves to immediately suggest a change. Not only will you know exactly which resource needs to be changed, but also by how much.

This exercise will probably take you a few hours, as researching a game and analyzing the numbers is not a trivial task (even for a simple game). However, after doing this, you will be much more comfortable with identifying resources and relationships in games, and also being able to use your understanding of a game’s systems to improve the balance of those systems.

Level 1: Intro to Game Balance

July 7, 2010

Class Announcements

I have to admit I was a little surprised to see that people were still signing up for the paid course after it started, but I suppose it’s common enough for people to join a class early in the term. However, to be fair to those who signed up well in advance, I’ll be closing signups this Sunday (July 10) at midnight EDT. So, if you haven’t signed up and still want in, make sure to click the Paypal link before then!

Readings/Playings

If you haven’t already, you should watch the intro video for this course first, before reading on. You may need to create an account on that website, but registration for the intro video is free.

This Week’s Topic

This week is probably going to start a bit slow for those of you who are experienced game designers (or those who are hoping to dive deep into the details). Instead, I want to use this week mostly to get everyone into the mindset of a game designer presented with a balance task, and I want to lay out some basic vocabulary terms so we can communicate about game balance properly.

You can think of this week like a tutorial level. The difficulty and pacing of this course will ramp up in the following weeks.

What is Game Balance?

I would start by asking the question “what is game balance?” but I answered it in the teaser video already. While perhaps an oversimplification, we can say that game balance is mostly about figuring out what numbers to use in a game.

This immediately brings up the question: what if a game doesn’t have any numbers or math involved? The playground game of Tag has no numbers, for example. Does that mean that the concept of “game balance” is meaningless when applied to Tag?

The answer is that Tag does in fact have numbers: how fast and how long each player can run, how close the players are to each other, the dimensions of the play area, how long someone is “it.” We don’t really track any of these stats because Tag isn’t a professional sport… but if it was a professional sport, you’d better believe there would be trading cards and websites with all kinds of numbers on them!

So, every game does in fact have numbers (even if they are hidden or implicit), and the purpose of those numbers is to describe the game state.

How do you tell if a game is balanced?

Knowing if a game is balanced is not always trivial. Chess, for example, is not entirely balanced: it has been observed that there is a slight advantage to going first. However, it hasn’t been definitively proven whether this imbalance is mechanical (that is, there is a bona fide tactical/strategic advantage to the first move) or psychological (players assume there is a first-move advantage, so they trick themselves into playing worse when they go second). Interestingly, this first-move advantage disappears at lower skill levels; it is only observed at championship tournaments. Keep in mind that this is a game that has been played, in some form, for thousands of years. And we still don’t know exactly how unbalanced it is!

In the case of Chess, a greater degree of player skill makes the game unbalanced. In some cases, it works the other way around, where skilled players can correct an inherent imbalance through clever play. For example, in Settlers of Catan, much of the game revolves around trading resources with other players. If a single player has a slight gameplay advantage due to an improved starting position, the other players can agree to simply not trade with that player for a time (or only offer unfair trades at the expense of that player) until such time as the starting positions equalize. This would not happen in casual games, as the players would be unable to recognize a slight early-game advantage; at the tournament level, however, players would be more likely to spot an inherent imbalance in the game, and act accordingly.

In short, game balance is not an easy or obvious task. (But you probably could have figured that out, given that I’m going to talk for ten straight weeks on the subject!)

Towards a critical vocabulary

Just like last summer, we need to define a few key terms that we’ll use as we talk about different kinds of balance.

Determinism

For our purposes, I define a “deterministic” game as one where if you start with a given game state and perform a particular action, it will always produce the same resulting new game state.

Chess and Go and Checkers are all deterministic. You never have a situation where you move a piece, but due to an unexpected combat die roll the piece gets lost somewhere along the way, or something. (Unless you’re playing a nondeterministic variant, anyway.)

Candyland and Chutes & Ladders are not deterministic. Each has a random mechanism for moving players forward, so you never know quite how far you’ll move next turn.

Poker is not deterministic, either. You might play several hands where you appear to have the same game state (your hand and all face-up cards on the table are the same), but the actual results of the hand may be different because you never know what the opponents’ cards are.

Rock-Paper-Scissors is not deterministic, in the sense that any given throw (like Rock) will sometimes win, sometimes lose, and sometimes draw, depending on what the opponent does.

Note that there are deterministic elements to all of these games. For example, once you have rolled your die in Chutes & Ladders, called the hand in Poker, or made your throw in Rock-Paper-Scissors, resolving the turn is done by the (deterministic) rules of the game. If you throw Rock and your opponent throws Paper, the result is always the same.

Non-determinism

The opposite of a deterministic game is a non-deterministic game. The easiest way to illustrate the difference is by comparing the arcade classic Pac-Man with its sequel Ms. Pac-Man.

The original Pac-Man is entirely deterministic. The ghosts follow an AI that is purely dependent on the current game state. As a result, following a pre-defined sequence of controller inputs on a given level will always produce the exact same results, every time. Because of this deterministic property, some players were able to figure out patterns of movements; the game changed from one of chasing and being chased to one of memorizing and executing patterns.

This ended up being a problem: arcade games required that players play for 3 minutes or less, on average, in order to remain profitable. Pattern players could play for hours. In Ms. Pac-Man, an element of non-determinism was added: sometimes the ghosts would choose their direction randomly. As a result, Ms. Pac-Man returned the focus of gameplay from pattern execution to quick thinking and reaction, and (at the championship levels, at least) the two games play quite differently.

Now, this is not to say that a non-deterministic game is always “better.” Remember, Chess and Go are deterministic games that have been played for thousands of years; as game designers today, we count ourselves lucky if our games are played a mere two or three years from the release date. So my point is not that one method is superior to the other, but rather that analyzing game balance is done differently for deterministic versus non-deterministic games.

Deterministic games can theoretically undergo some kind of brute-force analysis, where you look at all the possible moves and determine the best one. The number of moves to consider may be so large (as with the game Go) that a brute-force solution is impractical, but in at least some cases (typically early-game and end-game positions) you can do a bit of number-crunching to figure out optimal moves.

Non-deterministic games don’t work that way. They require you to use probability to figure out the odds of winning for each move, with the understanding that any given playthrough might give a different actual result.

Solvability

This leads to a discussion of whether a game is solvable. When we say a game is solvable, in general, we mean that the game has a single, knowable “best” action to take at any given point in play, and it is possible for players to know what that move is. In general, we find solvability to be an undesirable trait in a game. If the player knows the best move, they aren’t making any interesting decisions; every decision is obvious.

That said, there are lots of kinds of solvability, and some kinds are not as bad as others.

Trivial solvability

Normally, when we say a game is solvable in a bad way, we mean that it is trivially solvable: a game simple enough that the human mind can completely solve it in real time. Tic-Tac-Toe is a common example of this; young children who haven’t solved the game yet find it endlessly fascinating, but at some point they figure out all of the permutations, solve the game, and no longer find it interesting.

We can still talk about the balance of trivially solvable games. For example, given optimal play on both sides, we know that Tic-Tac-Toe is a draw, so we could say in this sense that the game is balanced.

However, we could also say that if you look at all possible games of Tic-Tac-Toe that could be played, you’ll find that there are more ways for X to win than O, so you could say it is unbalanced because there is a first-player advantage (although that advantage can be negated through optimal play by both players). These are the kinds of balance considerations for a trivially solvable game.
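Both of those claims are small enough to check by brute force. Here’s a sketch (my own, not from any game’s code) that runs plain minimax to confirm the draw, and enumerates every possible game to show that X has more winning lines of play than O:

```python
from functools import lru_cache

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    w = winner(board)
    if w:
        return 1 if w == 'X' else -1       # +1 = X win, -1 = O win
    if '.' not in board:
        return 0                           # full board: draw
    nxt = 'O' if player == 'X' else 'X'
    values = [minimax(board[:i] + player + board[i+1:], nxt)
              for i, c in enumerate(board) if c == '.']
    return max(values) if player == 'X' else min(values)

def count_games(board='.' * 9, player='X', tally=None):
    tally = tally if tally is not None else {'X': 0, 'O': 0, 'draw': 0}
    w = winner(board)
    if w:
        tally[w] += 1
    elif '.' not in board:
        tally['draw'] += 1
    else:
        nxt = 'O' if player == 'X' else 'X'
        for i, c in enumerate(board):
            if c == '.':
                count_games(board[:i] + player + board[i+1:], nxt, tally)
    return tally

print(minimax('.' * 9, 'X'))   # 0: optimal play from the start is a draw
print(count_games())           # X wins far more of the possible games than O
```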

Theoretical complete solvability

There are games like Chess and Go which are theoretically solvable, but in reality there are so many permutations that the human mind (and even computers) can’t realistically solve the entire game. Here is a case where games are solvable but still interesting, because their complexity is beyond our capacity to solve them.

It is hard to tell if games like this are balanced, because we don’t actually know the solution and don’t have the means to actually solve it. We must rely on our game designer intuition, the (sometimes conflicting) opinions of expert players, or tournament stats across many championship-level games, to merely get a good guess as to whether the game is balanced. (Another impractical way to balance these games is to sit around and wait for computers to become powerful enough to solve them within our lifetimes, knowing that this may or may not happen.)

Solving non-deterministic games

You might think that only deterministic games can be solved. After all, non-deterministic games have random or unknown elements, so “optimal” play does not guarantee a win (or even a draw). However, I would say that non-deterministic games can still be “solved,” it’s just that the “solution” looks a lot different: a solution in this case is a set of actions that maximize your probability of winning.

The card game Poker provides an interesting example of this. You have some information about what is in your hand, and what is showing on the table. Given this information, it is possible to compute the exact odds of winning with your hand, and in fact championship players are capable of doing this in real-time. Because of this, all bets you make are either optimal, or they aren’t. For example, if you compute you have a 50/50 chance of winning a $300 pot, and you are being asked to pay $10 to stay in, that is clearly an optimal move for you; if you lost $10 half of the time and won $300 the other half, you would come out ahead. In this case, the “solution” is to make the bet.
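That reasoning is just an expected-value calculation. Here’s a sketch using the made-up numbers from this example (and glossing over real Poker details, like money you’ve already committed to the pot):

```python
def expected_value(win_probability, pot, cost_to_call):
    # Win the pot with probability p; forfeit the call otherwise.
    return win_probability * pot - (1 - win_probability) * cost_to_call

print(expected_value(0.5, 300, 10))   # +145.0 per hand: calling is optimal
```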

You might wonder, if Poker is solvable, what stops it from becoming a boring grind of players computing odds with a calculator and then betting or not based on the numbers? From a game balance perspective, such a situation is dangerous: not only do players know what the best move is (so there are only obvious decisions), but sometimes optimal play will end in a loss, effectively punishing a player for their great skill at odds computation! In games like this, you need some kind of mechanism to get around the problem of solvability-leading-to-player-frustration.

The way Poker does this, and the reason it’s so interesting, is that players may choose to play suboptimally in order to bluff. Your opponents’ behavior may influence your decisions: if the guy sitting across from you is betting aggressively, is it because he has a great hand and knows something you don’t know? Or is he just bad at math? Or is he good at math, and betting high with a hand that can’t really win, but he’s trying to trick you into thinking his hand is better than it really is? This human factor is not solvable, but the solvable aspects of the game are used to inform players, which is why at the highest levels Poker is a game of psychology, not math. It is these psychological elements that prevent Poker from turning into a game of pure luck when played by skilled individuals.

Solving intransitive games

Intransitive games are a fancy way of saying “games like Rock-Paper-Scissors.” Since the outcome depends on a simultaneous choice between you and your opponent, there does not appear to be an optimal move, and therefore there is no way to solve it. But in fact, the game is solvable… it’s just that the solution looks a bit different from other kinds of games.

The solution to Rock-Paper-Scissors is a ratio of 1:1:1, meaning that over time you should throw each of the three about equally often. If you threw one type more than the others (say, for example, you favored Paper), your opponent could throw the thing that beats your preferred throw (Scissors) more often, which would let them win slightly more than average. So in general, the “solution” to RPS is to throw each symbol with equal frequency in the long term.

Suppose we made a rules change: every win with Rock counts as two wins instead of one. Then we would have a different solution where the ratios would be uneven. There are mathematical ways to figure out exactly what this new ratio would be, and we will talk about how to do that later in this course. You might find this useful, for example, if you’re making a real-time strategy game with some units that are strong against other unit types (in an intransitive way), but you want certain units to be more rare and special in gameplay than others. So, you might change the relative capabilities to make certain units more cost-efficient or more powerful overall, which in turn would change the relative frequencies of each unit type appearing (given optimal play).
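Just as a teaser, here’s one way such a ratio can be computed, sketched in code (my own illustration of the method, assuming numpy is available). For the “Rock wins count double” variant, find the mix of throws that makes your opponent’s three options all equally good, so that whatever they do, they can’t exploit you:

```python
import numpy as np

# Row player's expected payoffs against an opponent mix (r, p, s),
# with Rock wins worth 2 instead of 1:
#   payoff(Rock)     = -p + 2s
#   payoff(Paper)    =  r - s
#   payoff(Scissors) = -2r + p
# Find (r, p, s) making all three equal, with r + p + s = 1.
A = np.array([
    [-1, -1,  3],   # payoff(Rock) - payoff(Paper) = 0
    [ 3, -1, -1],   # payoff(Paper) - payoff(Scissors) = 0
    [ 1,  1,  1],   # probabilities sum to 1
], dtype=float)
b = np.array([0.0, 0.0, 1.0])

print(np.linalg.solve(A, b))   # [0.25, 0.5, 0.25]
```

Counterintuitively, making Rock worth more means this mix throws Rock less and Paper more: since everyone wants to throw the now-more-valuable Rock, the throw that beats Rock gains value.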

Perfect information

A related concept to solvability is that of information availability. In a game with perfect or complete information, all players know all elements of the game state at all times. Chess and Go are obvious examples.

You might be able to see, then, that any deterministic game with perfect information is, at least in theory, completely solvable.

Other games have varying degrees of incomplete information, meaning that each player does not know the entire game state. Card games like Hearts or Poker work this way; in these games, each player has privileged information where they know some things the opponents don’t, and in fact part of the game is trying to figure out the information that the other players know. With Hearts in particular, the sum of player information is the game state; if players combined their information, the game would have perfect information.

Yet other games have information that is concealed from all of the players. An example of this is the card game Rummy. In this game, all players know what is in the discard pile (common information), each player knows what is in his or her own hand but no one else’s hand (privileged information), and no player knows what cards remain in the draw deck or what order those cards are placed in (hidden information).

Trading-card games like Magic: the Gathering offer additional layers of privileged information, because players have some privileged information about the possibility space of the game. In particular, each player knows the contents of cards in their own deck, but not their opponent’s, although neither player knows the exact order of cards in their own draw pile. Even more interesting, there are some cards that can give you some limited information on all of these things (such as cards that let you peek at your opponent’s hand or deck), and part of the challenge of deck construction is deciding how important it is to gain information versus how important it is to actually attack or defend.

Symmetry

Another concept that impacts game balance is whether a game is symmetric or asymmetric. Symmetric games are those where all players have exactly the same starting position and the same rules. Chess is almost symmetric, except for that pesky little detail about White going first.

Could you make Chess symmetric with a rules change? Yes: for example, if both players wrote down their moves simultaneously, then revealed and resolved the moves at the same time, the game would be completely symmetric (and in fact there are variants along these lines). Note that in this case, symmetry requires added complexity; you need extra rules to handle cases where two pieces move into or through the same square, or when one piece enters a square just as another piece exits the square.

In one respect, you could say that perfectly symmetric games are automatically balanced. At the very least, you know that no player is at an advantage or disadvantage from the beginning, since they have the exact same starting positions. However, symmetry alone does not guarantee that the game objects or strategies within the game are balanced; there may still be certain pieces that are much more powerful than others, or certain strategies that are clearly optimal, and symmetry doesn’t change that. Perfect symmetry is therefore not an “easy way out” for designers to make a balanced game.

The Metagame

The term metagame literally means “the game surrounding the game,” and generally refers to the things players do when they’re not actively playing the game, but that still affect their chances of winning the next game. Trading card games like Magic: the Gathering are a clear example of this: in between games, players construct a deck, and the contents of that deck affect their ability to win. Another example is championship-level Poker or even world-tournament Rock-Paper-Scissors, where players analyze the common behaviors and strategies of their opponents. Professional sports have all kinds of things going on in between games: scouting, drafting, trading, training, and so on.

For games that have a strong metagame, balance of the metagame is an important consideration. Even if the game itself is balanced, a metagame imbalance can destroy the balance of the game. Professional sports are a great example. Here is a positive feedback loop that is inherent in any professional sport: teams that win more games get more money, and more money lets them attract better players, which further increases their chance of winning more games. (With apologies to anyone who lives in New York, this is the reason everyone else hates the Yankees.)

Other sports have metagame mechanics in place to control this positive feedback. American Football includes the following:

  • Drafts. When a pool of new players becomes available to be picked up by teams, the weakest team from the previous season gets to choose first. Thus, the weakest teams pick up the strongest new players each year.
  • Salary caps. If there is a limit to how much players can make, it prevents a single team from being able to throw infinite money at the problem. Even weaker teams are able to match the max salary for a few of their players.
  • Player limits. There are a finite number of players allowed on any team; a good team can’t just have an infinite supply of talent.

These metagame mechanics are not arbitrary or accidental. They were put in place on purpose, by people who know something about game balance, and they are part of the reason why, on any given Sunday, the weakest team in the NFL might be able to beat the strongest.

From this, you might think that fixing the metagame is a great way to balance the game. Trading card games offer two examples of where this tactic fails.

First, let’s go back to the early days of Magic: the Gathering. Some cards were rarer than others, and some of those rare cards ended up being flat-out better than their more common counterparts; Richard Garfield clearly thought that rarity itself was a way to balance the game. (In his defense, this was not an unreasonable assumption at the time. He had no way of knowing that some people would spend thousands of dollars on cards just to get a full set of rares, nor that players would largely ignore the rules for “ante,” which served as an additional balancing factor.) Today, trading card game designers are more aware of this problem; while one does occasionally see games where “more rare = more powerful,” players are (thankfully) less willing to put up with those kinds of shenanigans.

Second, TCGs have a problem that video games don’t have: once a set of cards is released, it is too late to fix it with a “patch” if some kind of gross imbalance is discovered. In drastic cases they can restrict or outright ban a card, or issue some kind of errata, but in most cases this is not practical; the designers are stuck. Occasionally you might see a designer that tries to balance an overpowered card in a previous set by creating a “counter-card” in the next set. This is a metagame solution: if all the competitive decks use Card X, then a new Card Y that punishes the opponent for playing Card X gives players a new metagame option… but if Card Y does nothing else, it is only useful in the context of the metagame. This essentially turns the metagame into Rock (dominant deck) – Paper (deck with counter-card) – Scissors (everything else). This may be preferable to a metagame with only one dominant strategy, but it’s not much better, and it mostly shifts the focus from the actual play of the game to the metagame: you may as well just show your opponent your deck and determine a winner that way.

This is admittedly an extreme example, and there are other ways to work around an imbalance like this. The counter-card might have other useful effects. The game overall might be designed such that player choices during the game contribute greatly to the outcome, where the deck is more of an influence on your play style than a fixed strategy. Still, some games have gone so far as to print a card that says “When your opponent plays [specific named card], [something really bad happens to them]” with no other effect, so I thought this was worth bringing up.

Game balance versus metagame balance

In professional sports, metagame fixes make the game more balanced. In TCGs, metagame fixes feel like a hack. Why the difference?

The reason is that in sports, the imbalance exists in the metagame to begin with, so a metagame fix for this imbalance is appropriate. In TCGs, the imbalance is either part of the game mechanics or individual game objects (i.e. specific cards); the metagame imbalances that result from this are a symptom and not the root cause. As a result, a metagame fix for a TCG is a response to a symptom, while the initial problem continues unchecked.

The lesson here is that a game balance problem in one part of a game can propagate to and manifest in other areas, so the problems you see during playtesting are not always the exact things that need to be fixed. When you identify an imbalance, before slapping a fix on it, ask yourself why this imbalance is really happening, what is actually causing it… and then, what is causing that, and what is causing that, and so on as deep as you can go.

Game Balance Made Easy, For Lazy People

I’m going to try to leave you each week with some things you can do right now to improve the balance of a game you’re working on, and then some “homework” that you can do to improve your skills. Since we just talked about vocabulary (symmetry, determinism, solvability, perfect information, and the metagame) this week, there’s not a lot to do, so instead I’m going to start by saying what not to do.

If you’re having trouble balancing a game, the easiest way out is to get your players to do the balancing for you. One way to do this is with auction mechanics. There is nothing wrong with auctions as a game mechanic, mind you – they are often very compelling and exciting – but they can be used as a crutch to cover up a game imbalance, and you need to be careful of that.

Let me give an example of how this works. Suppose you’re a designer at Blizzard working on Warcraft IV, and you have an Orcs-vs-Humans two-player game that you want to balance, but you think the Orcs are a little more powerful than the Humans (though not by much). You decide the best way to balance this is to reduce the starting resources of the Orcs; if the Humans start with, say, 100 Gold… maybe the Orcs start with a little less. How much less? Well, that’s what game balance is all about, but you have no idea how much less.

Here’s a solution: at the start of the game, have the players bid their starting Gold for the right to play the Orcs. Whoever bids the most plays the Orcs and loses their bid, starting with 100 Gold minus whatever they paid; the other player starts with the full 100 Gold and plays the weaker Humans. Eventually, players will reach a consensus and start bidding about the same amount of Gold, and this will make things balanced. I say this is lazy design because there is a correct answer here, but instead of taking the trouble to figure it out, you shift that burden to the players and make them balance the game for you.

Note that this can actually be a great tool in playtesting. Feel free to add an auction in a case like this, let your testers come to a consensus of how much something is worth, then just cost it accordingly in the final version (without including the auction).

Here’s another way to get players to balance your game for you: in a multiplayer free-for-all game, include mechanics that let the players easily gang up on the leader. That way, if one player finds a game imbalance, the other players can cooperate to bring them down. Of course, this brings other gameplay problems with it. Players may “sandbag” (play suboptimally on purpose) in order to avoid attracting too much attention. Players who do well (even without rules exploits) may feel like the other players are punishing them for being good. Kill-the-leader mechanics serve as a strong negative feedback loop, and negative feedback has other consequences: the game tends to take longer, early-game skill matters less than late-game skill, and some players may feel that the outcome is decided more by their ability to avoid notice than by their actual game skill. Again, there is nothing inherently wrong with giving players the ability to form alliances against each other… but doing it for the sole purpose of letting players deal with your poor design and balancing skills should not be the first and only solution.

Okay, is there anything you can do right now to improve the balance of a game you’re working on? I would say, examine your game to see if you are using your players as a game balance crutch (through auctions, kill-the-leader mechanics, or similar). Try removing that crutch and seeing what happens. You might find out that these mechanics are covering up game imbalances that will become more apparent when they’re removed. When you find the actual imbalances that used to be obscured, you can fix them and make the game stronger. (You can always add your auctions or kill-the-leader mechanics back in later, if they are important to the gameplay.)

Homework

I’ll go out on a limb and guess that if you’re reading this, you are probably playing at least one game in your spare time. If you work in the game industry as a designer, you may be playing a game at your day job for research. Maybe you have occasion to watch other people play, either while playtesting your own game, or on television (such as watching a game show or a professional sports match).

As you play (or watch) these games this week, don’t just play/watch for fun. Instead, think about the actions in the game and ask yourself if you think the game is balanced or not. Why do you think that? If you feel it’s not, where are the imbalances? What are the root causes of those imbalances, and how would you change them if you wanted to fix them? Write down your thoughts if it helps.

The purpose of this is not to actually improve the game you’re examining, but to give you some practice in thinking critically about game balance. It’s emotionally easier to find problems in other people’s games than your own (even if the actual process is the same), so start by looking at the balance or imbalance in other people’s games first.