
Level 8: Metrics and Statistics

August 25, 2010

Readings/Playings

See “additional resources” at the end of this blog post for a number of supplemental readings.

This Week

One of the reasons I love game balance is that different aspects of balance touch all these other areas of game development. When we were talking about pseudorandom numbers, that’s an area where you get dangerously close to programming. Last week we saw how the visual design of a level can be used as a game reward or to express progression to the player, which is game design but just this side of art. This week, we walk right up to the line where game design intersects business.

This week I’m covering two topics: statistics and metrics. For anyone who isn’t familiar with what these mean, ‘metrics’ just means measurements: you’re actually measuring or tracking something about your game. Leaderboards and high score lists are probably the best-known metrics because they are exposed to the players, but we can also use a lot of metrics behind the scenes to help design our games better. Once we collect these measurements, they don’t do anything on their own until we actually look at them and analyze them to learn something. ‘Statistics’ is just one set of tools we can use to get useful information from our metrics. Even though we collect metrics first and then use statistics to analyze them, I’m going to talk about statistics first, because it’s useful to know how your tools work before you decide what data to capture.

Statistics

People who have never done statistics before think of it as an exact science. It’s math, math is pure, and therefore you should be able to get all of the right answers all the time. In reality, it’s a lot messier, and you’ll see that game designers (and statisticians) disagree about the core principles of statistics even more than they disagree about the core principles of systems design, if such a thing is possible.

What is statistics, and how is it different from probability?

In probability, you’re given a set of random things, and told exactly how random they are and what the nature of that randomness is, and your goal is to try to predict what the data will look like when you set those random things in motion. Statistics is kind of the opposite: here you’re given the data up front, and you’re trying to figure out the nature of the randomness that caused that data.

Probability and statistics share one important thing in common: neither one is guaranteed. Probability can tell you there’s a 1/6 chance of rolling a given number on 1d6, but it does not tell you what the actual number will be when you roll the die for real. Likewise, statistics can tell you from a bunch of die rolls that there is probably a uniform distribution, and that you’re 95% sure, but there’s a 5% chance that you’re wrong. That chance never goes to zero.

Statistical Tools

This isn’t a graduate-level course in statistical analysis, so I’ll simply say that there are a lot more tools out there that fall outside the scope of this course. What I’m going to put down here is the bare minimum I think every game designer should know in order to usefully analyze the metrics in their games.

Mean: when someone asks for the “average” of something, they’re probably talking about the mean average (there are two other kinds of average that I know of, and probably a few more that I don’t). To get the mean of a bunch of values, you add them all up and then divide by the number of values. This is sort of like “expected value” in probability, except that you’re computing it from real-world die-rolls rather than a theoretically balanced set of die-rolls; you can think of the mean as a Monte Carlo calculation of expected value that uses real-world playtest data instead of a computer simulation. Calculating the mean is incredibly useful, because it gives you a ballpark expected value for something in your game.

Median: this is another kind of average. To calculate it, take all your values and sort them from smallest to largest, then pick the one in the center. So, if you have five values, the third one is the median. (If you have an even number of values, so that there are two in the middle rather than one, you take the mean of those two, in case you’re curious.) On its own, the median isn’t all that useful, but comparing it with the mean tells you a lot about whether your values are weighted to one side or basically symmetric. For example, in the US, the median household income is a lot lower than the mean, which basically means we’ve got a lot of people making a little, and a few people making ridiculously huge incomes that push up the mean. In a classroom, if the median is lower than the mean, it means most of the students are struggling and one or two brainiacs are wrecking the curve (although more often it’s the other way around, where most students are clustered around 75 or 80 and then you’ve got some lazy kid who’s getting a zero, which pulls down the mean a lot). If you’re making a game with a scoreboard of some kind and you see a median that’s a lot lower than the mean, it probably means you’ve got a small minority of players who are just obscenely good at the game and getting massive scores, while everyone else, the mere mortals, are closer to the median.

Standard deviation: this is just geeky enough to make you sound like you’re good at math if you use it in normal conversation. You calculate it by taking each of your data points, subtracting the mean from each one, squaring the results (that is, multiplying each result by itself), adding all of those squares together, dividing by the total number of data points, and then taking the square root of the whole thing. For reasons that you don’t really need to know, going through this process gives you a number that represents how spread out the data is. For data that’s roughly bell-shaped, about two-thirds of it falls within a single standard deviation of the mean, and nearly all of it falls within two standard deviations, so how big your SD is only means anything relative to how big your mean is. A mean of 50, SD of 25 looks a lot more spread out than a mean of 5000, SD of 25. A relatively large SD means your data is all over the place, while a really small SD means your data is all clustered together.
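To make the arithmetic concrete, here’s a minimal Python sketch that computes all three statistics exactly as described above; the scores are made-up example data.

    # Mean, median, and standard deviation from scratch.
    scores = [12, 15, 9, 22, 15, 18, 11]   # made-up playtest scores

    # Mean: add everything up, divide by the number of values.
    mean = sum(scores) / len(scores)

    # Median: sort, then take the middle value (averaging the two middle
    # values if there is an even number of them).
    s = sorted(scores)
    n = len(s)
    median = s[n // 2] if n % 2 == 1 else (s[n // 2 - 1] + s[n // 2]) / 2

    # Standard deviation: square each deviation from the mean, average
    # the squares, then take the square root.
    std_dev = (sum((x - mean) ** 2 for x in scores) / n) ** 0.5

    print(mean, median, std_dev)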

Examples

To give you an example, let’s consider two random variables: 2d6, and 1d11+1. Like we talked about in the week on probability, both of these give you a number from 2 to 12, but they have a very different nature: the 2d6 clusters around the center, while the 1d11+1 is spread out evenly among all outcomes. Now, statistics doesn’t have anything to say about this, but let’s just assume that I happen to roll the 2d6 thirty-six times and get each of the 36 possible combinations exactly once, and I roll the 1d11+1 eleven times and get each result exactly once… which is wildly unlikely, but it does allow us to use statistical tools to analyze probability.

The mean of both of these is 7, which means if you’re trying to balance either of these numbers in your game, you can use 7 as the expected value. What about the range? The median is also 7 for both, which means you’re just as likely to be above or below the mean; that makes sense, because both of these are symmetric. However, you’ll see the standard deviations are quite different: for 2d6, the SD is about 2.4, meaning that most of the time you’ll get a result between 5 and 9; for 1d11+1, the SD is about 3.2, so you’ll get about as many rolls in the 4 to 10 range here as you did in the 5 to 9 range for 2d6. Which doesn’t actually sound like that big a deal, until you start rolling.
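If you want to check these numbers yourself, here’s a quick sketch using Python’s statistics module on the two idealized data sets described above (one of each possible result):

    import statistics
    from itertools import product

    # 2d6: all 36 equally likely combinations, one of each.
    rolls_2d6 = [a + b for a, b in product(range(1, 7), repeat=2)]
    # 1d11+1: all 11 equally likely results, one of each.
    rolls_1d11 = [n + 1 for n in range(1, 12)]

    for name, rolls in [("2d6", rolls_2d6), ("1d11+1", rolls_1d11)]:
        print(name,
              statistics.mean(rolls),      # 7 for both
              statistics.median(rolls),    # also 7 for both
              statistics.pstdev(rolls))    # ~2.42 for 2d6, ~3.16 for 1d11+1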

As a different example, maybe you’re looking at the time it takes playtesters to get through the first tutorial level in a video game you’re designing. Your target is that it should take about 5 minutes. You measure the mean at 5 minutes, median at 6 minutes, standard deviation at 2 minutes. What does that tell us? Most people take between 3 and 7 minutes, which might be good or bad depending on just how much of the level is under player control; in a lot of games the tutorial is meant to be a pretty standardized, linear experience, so this would actually be a pretty huge range. The other cause for concern is the high median, which suggests most people actually take longer than 5 minutes; a few people who get through the level really fast are bringing down the mean. This is good news in that you know you don’t have anyone taking four hours to complete it (otherwise the mean would be a lot higher than the median instead!), but it’s potentially bad news in that some players might have found an unintentional shortcut or exploit, or else they’re just skipping through all your intro dialogue, which is going to get them stuck and frustrated in level 2… or it could be something else entirely.

This suggests another lesson: statistics can tell us that something is happening, but it can’t tell us why, and sometimes there are multiple explanations for the why. This is one area where statistics is often misused or flat-out abused: by finding one logical explanation for the numbers and ignoring that there could be other explanations as well. In this case, we have no way of knowing why the mean is shorter than the median, or what that implies for the game design… but we could spend some time thinking about all the possible answers, and then we could collect more data that would help us differentiate between them. For example, if one fear is that players are skipping through the intro dialogue, we could actually measure the time spent reading dialogue in addition to the total level time. We’ll come back to this concept of metrics design later today.

There’s also a third lesson here: I didn’t tell you how many playtesters it took to get this data! The more tests you have, the more accurate your final analysis will be. If you only had three tests, these numbers are pretty meaningless if you’re trying to predict general trends. If there were a few thousand tests, that’s a lot better. (How many tests are required to make sure your analysis is good enough? Depends what “good enough” means to you. The more you have, the more sure you can be, but it’s never actually 100% no matter how many tests you do. People who do this for a living have “confidence intervals” where they’ll tell you a range of values and then say something like they’re 95% sure that the actual mean in reality is within such-and-such a range. This is a lot more detail than most of us need for our day-to-day design work.)
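As a rough illustration of those confidence intervals, here’s a sketch of the common normal-approximation formula for a 95% confidence interval around a measured mean. This is a simplification that assumes a reasonably large sample, not a rigorous treatment, and the play times are made up:

    import math
    import statistics

    def confidence_interval_95(samples):
        """Approximate 95% confidence interval for the true mean."""
        mean = statistics.mean(samples)
        sd = statistics.stdev(samples)           # sample standard deviation
        margin = 1.96 * sd / math.sqrt(len(samples))
        return (mean - margin, mean + margin)

    # Hypothetical tutorial completion times, in minutes:
    times = [4.5, 6.0, 5.5, 7.0, 3.0, 6.5, 5.0, 4.0, 6.0, 5.5]
    print(confidence_interval_95(times))
    # The margin shrinks as the number of playtests grows, which is why
    # more tests make you more (but never 100%) sure.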

Outliers

When you have a set of data with some small number of points that are way above or below the mean, the name for those is outliers (pronounced like the words “out” and “liars”). Since these tend to throw off your mean a lot more than the median, if you see the mean and median differing by a lot it’s probably because of an outlier.

When you’re doing a statistical analysis, you might wonder what to do with the outliers. Do you include them? Do you ignore them? Do you put them in their own special group? As with most things, it depends.

If you’re just looking for normal, usual play patterns, it is generally better to discard the outliers because, by definition, those are not arising from normal play. If you’re looking for edge cases, then you want to leave them in and pay close attention; for example, if you’re trying to analyze the scores people get so you know how to display them on the leaderboards, realize that your top-score list is going to be dominated by outliers at the top.

In either case, if you have any outliers, it is usually worth investigating further to figure out what happened. Going back to our earlier example of level play times, if most players take 5 to 7 minutes to complete your tutorial but you notice a small minority of players who get through in 1 or 2 minutes, that suggests those players may have found some kind of shortcut or exploit, and you want to figure out what happened. If most players take 5 to 7 minutes and you have one player who took 30 minutes, that is probably because the player put the game on pause or had to walk away for a while, or they were just having so much fun playing around in the sandbox that they didn’t care about advancing to the next level, and you can probably ignore that if it’s just one person. But if it’s three or four people (still in the vast minority) who did that, you might investigate further, because there might be some small number of people who are running into problems… or players who find one aspect of your tutorial really fun, which is good to know as you’re designing the other levels.
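One simple, common way to flag outliers automatically is to look for points more than a couple of standard deviations from the mean. A sketch (the two-SD cutoff is a convention, not a magic number, and the times are made up):

    import statistics

    def split_outliers(values, num_sds=2.0):
        """Split data into (normal, outliers) by distance from the mean."""
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)
        normal = [v for v in values if abs(v - mean) <= num_sds * sd]
        outliers = [v for v in values if abs(v - mean) > num_sds * sd]
        return normal, outliers

    # Hypothetical level times in minutes, including a 1-minute speedrunner
    # and a 30-minute walkaway:
    times = [5, 6, 7, 5, 6, 5, 7, 6, 1, 30]
    normal, weird = split_outliers(times)
    print(normal, weird)
    # Only the 30 gets flagged here: it inflates the SD enough to mask the 1.
    # That masking effect is one reason to still eyeball the raw data, and
    # to investigate outliers by hand rather than silently deleting them.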

Population samples

Here’s another way statistics can go horribly wrong: it all comes down to what and who you’re sampling.

I already mentioned one frequent problem, which is not having a large enough sample. The more data points you have, the better. I’ll give you an example: back when I played Magic: the Gathering regularly, I once put together a tournament deck for a friend, for a tournament that I couldn’t play in but they could. To tell if I had the right ratio of land to spells, I shuffled and dealt an opening hand and played a few mock turns to see if I was drawing enough. I’d do this a bunch of times, going through most of the deck, then take some land out or put some in depending on how many times I had too much or too little, then reshuffle and do it again. At the time I figured this was a pretty good quick-and-dirty way to figure out how much land I needed. But as it happened, the land was very evenly distributed rather than clustered, so most of the time it seemed like I was doing okay by the end… and I never actually stopped to count. After the tournament, which my friend lost badly, they reported that they were consistently not drawing enough land, and when we actually went through the deck and counted, there were only 16 lands in a deck of 60 cards! I took a lot of flak from my friend for that, and rightly so. The real problem was that I was trying to analyze the amount of land through statistical methods, but my sample size was way too small to draw any meaningful conclusions.

Here’s another example: suppose you’re making a game aimed at the casual market. You have everyone on the development team play through the game to get some baseline data on how long it takes to play through each level and how challenging each level is. Problem: the people playing the game are probably not casual gamers, so this is not a representative sample of your target market. I’m sure this has actually happened somewhere, at some point.

A more recent example: in True Crime: Hong Kong, publisher Activision allegedly demanded that the developers change the main character from female to male, because their focus group said they preferred a male protagonist. The problem: the focus group was made up entirely of males, or else the questions were worded in a biased way by whoever set the study up, as a deliberate attempt to further an agenda rather than to actually find out the real-world truth. Activision denies all of this, of course, but that hasn’t stopped it from being the subject of many industry conversations… not just about the role of women in games, but about the use of focus groups and statistics in game design. You also see things like this happening in the rest of the world, particularly in governmental politics, where a lot of people have their own personal agendas and are willing to warp a study and use statistics as a way of proving their point.

Basically, when you’re collecting playtest data, you want to do your best to recruit playtesters who are as similar as possible to your target market, and you want to have as many playtests as possible so that the random noise gets filtered out. Your analysis is only as good as your data!

Even if you use statistics “honestly,” there are still problems every game designer runs into, depending on the type of game.

  • For video games, you are at the mercy of your programmers, and there’s nothing you can do about that. The programmers are the ones who need to spend time coding the metrics you ask for. Programming time is always limited, so at some point you’ll have to make the call between having your programming team implement metrics collection… or having them implement, you know, the actual game mechanics you’ve designed. And that’s if the decision isn’t made for you by your producer or your publisher. This is easier in some companies than others, but in some places “metrics” falls into the same category as audio, and localization, and playtesting: tasks that are pushed off towards the end of the development cycle until it’s too late to do anything useful.
  • For tabletop games, you are at the mercy of your playtesters. The more data points you collect, the better, of course. But in reality, a video game company can release an early beta and get hundreds or thousands of plays, while you might realistically be able to do a fraction of that with in-person tabletop tests. With a smaller sample, your playtest data is a lot more suspect.
  • For any kind of game, you need to be very clear ahead of time what it is you need measured, and in what level of detail. If you run a few hundred playtests and only find out afterwards that you need to actually collect certain data from the game state that you weren’t collecting before, you’ll have to do those tests over again. The only thing to do about this is to recognize that just like design itself, playtesting with metrics is an iterative process, and you need to build that into your schedule.
  • Also for any kind of game, you need to remember that it’s very easy to mess things up accidentally and get the wrong answer, just like probability. Unlike probability, there aren’t as many sanity checks to make the wrong numbers look wrong, since by definition you don’t always know exactly what you’re looking for or what you expect the answer to be. So you need to proceed with caution, and use every method you can find of independently verifying your numbers. It also helps if you try to envision in advance what the likely outcomes of your analysis might be, and what they’ll look like.

Correlation and causality

Finally, one of the most common errors with statistics happens when you notice some kind of correlation between two things. “Correlation” just means that when one thing goes up, another thing tends to go up (a positive correlation) or down (a negative correlation) at the same time. Recognizing correlations is useful, but a lot of times people assume that just because two things are correlated, one causes the other, and that is something you cannot tell from statistics alone.

Let’s take an example. Say you notice when playing Puerto Rico that there’s a strong positive correlation between winning, and buying the Factory building; say, out of 100 games, in 95 of them the winner bought a Factory. The natural assumption is that the Factory must be overpowered, and that it’s causing you to win. But you can’t draw this conclusion by default, without additional information. Here are some other equally valid conclusions, based only on this data:

  • Maybe it’s the other way around, that winning causes the player to buy a Factory. That sounds odd, but maybe the idea is that a Factory helps the player who is already winning, so it’s not that the Factory is causing the win, it’s that being strongly in the lead causes the player to buy a Factory for some reason.
  • Or, it could be that something else is causing a player both to win and to buy a Factory. Maybe some early-game purchase sets the player up for buying the Factory, and that early-game purchase also helps the player to win, so the Factory is just a symptom and not the root cause.
  • Or, the two could actually be uncorrelated, and your sample size just isn’t large enough for the Law of Large Numbers to really kick in. We actually see this all the time in popular culture, where two things that obviously have no relation are found to be correlated anyway, like the Redskins football game predicting the next Presidential election in the US, or an octopus that predicts the World Cup winner, or a groundhog seeing its shadow supposedly predicting the remaining length of Winter. As we learned when looking at probability, if you take a lot of random things you’ll be able to see patterns; one thing is that you can expect to see unlikely-looking streaks, but another is that if you take a bunch of sets of data, some of them will probably be randomly correlated. If you don’t believe me, try rolling two separate dice a few times and then computing the correlation between those numbers; I bet it’s not zero!
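You can try that dice experiment from the last bullet in a few lines of Python. Here’s a Pearson correlation computed by hand, assuming nothing beyond the standard library:

    import random

    def pearson(xs, ys):
        """Pearson correlation coefficient between two equal-length lists."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    # Two dice that have nothing to do with each other:
    die_a = [random.randint(1, 6) for _ in range(20)]
    die_b = [random.randint(1, 6) for _ in range(20)]
    print(pearson(die_a, die_b))  # almost never exactly zero at this size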

Statistics in Excel

Here’s the good news: while there are a lot of math formulas here, you don’t actually need to know any of them. Excel will do this for you; it has all of these formulas built in. Here are a few useful ones:

  • AVERAGE: given a range of cells, this calculates the mean. You could also take the SUM of the cells and then divide by the number of cells, but AVERAGE is easier.
  • MEDIAN: given a range of cells, this calculates the median, as you might guess.
  • STDEV: given a range of cells, this gives you the standard deviation.
  • CORREL: you give this two ranges of cells, not one, and it gives you the correlation between the two sets of data. For example, you could have one column with a list of final game scores, and another column with a list of scores at the end of the first turn, to see if early-game performance is any kind of indicator of the final game result (if so, this might suggest a positive feedback loop in the game somewhere). The number Excel gives you from the CORREL function ranges from -1 (perfect negative correlation) through 0 (uncorrelated) to +1 (perfect positive correlation).
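For example, assuming your final scores happen to live in cells A2 through A101 and your first-turn scores in B2 through B101, the formulas would look like this:

    =AVERAGE(A2:A101)          mean of the final scores
    =MEDIAN(A2:A101)           median of the final scores
    =STDEV(A2:A101)            standard deviation of the final scores
    =CORREL(A2:A101,B2:B101)   correlation of first-turn vs. final scores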

Is there any good news?

At this point I’ve spent so much time talking about how statistics are misused, that you might be wondering if they’re actually useful for anything. And the answer is, yes. If you have a question that can’t be answered with intuition alone, and it can’t be answered just through the math of your cost or progression curves, statistics let you draw useful conclusions… if you ask the right questions, and if you collect the right data.

Here’s an example of a time when statistics really helped a game I was working on. I worked for a company that made an online game, and we found that our online population was falling and people weren’t playing as many games, because we hadn’t released an update in a while. (That part was expected. With no updates, I’ve found that an online game loses about half of its core population every 6 months or so; at least, that was my experience.)

But what we didn’t expect was that one of our programmers got bored one day and made a trivia bot: just a little script that would log into our server with its own player account, send out a trivia question every couple of minutes, and then parse the incoming public chat to see if anyone said the right answer. And it was popular, as goofy and stupid and simple as it was, because it was such a short, immediate, casual experience.

Now, the big question is: what happened to the player population, and what happened to the actual, real game that players were supposed to be playing (you know, the one where they would log in to the chat room to find someone to challenge, before they got distracted by the trivia bot)?

Some players loved the trivia bot. It gave them something to do in between games. Others hated the trivia bot; they claimed that it was harder to find a game, because everyone who was logged in was too busy answering dumb trivia questions to actually play a real game. Who was right? Intuition failed, because everyone’s intuition was different. Listening to the players failed, because the vocal minority of the player base was polarized, and there was no way to poll those who weren’t in the vocal minority. Math failed, because the trivia bot wasn’t part of the game, let alone part of the cost curve. Could we answer this with statistics? We sure could, and we did!

This was simple enough that it didn’t even require much analysis. Measure the total number of logins per day. Measure the total number of actual games played. Since our server already tracked every player login, logout and game start, we had this data; all we had to do was some very simple analysis, tracking how these things changed over time. As expected, the numbers had all been falling gradually since the last real release, but the trivia bot caused a noticeable increase in both total logins and number of games played. It turned out that players were logging in to play with the trivia bot, but as long as they were there, they were also playing games with each other! That was a conclusion that would have been impossible to reach in any kind of definitive way without analysis of the hard data. And it taught us something really important about online games: more players online, interacting with each other, is better… even if they’re interacting in nonstandard ways.

Metrics

Here’s a common pattern in artistic and creative fields, particularly ones like archaeology or art preservation or psychology or medicine that require a certain amount of intuition even though there is still a “right answer” or “best way” to do things. The progression goes something like this:

  1. Practitioners see their field as a “soft science”; they don’t know a whole lot about best principles or practices. They do learn how things work, eventually, but it’s mostly through trial and error.
  2. Someone creates a technology that seems to solve a lot of these problems algorithmically. Practitioners rejoice. Finally, we’re a hard science! No more guesswork! Most younger practitioners abandon the “old ways” and embrace “science” as a way to solve all their field’s problems. The old guard, meanwhile, sees it as a threat to how they’ve always done things, and eyes it skeptically.
  3. The limitations of the technology become apparent after much use. Practitioners realize that there is still a mysterious, touchy-feely element to what they do, and that while some day the tech might answer everything, that day is a lot farther off than it first appeared. Widespread disillusionment occurs as people no longer want to trust their instincts because theoretically technology can do it better, but people don’t want to trust the current technology because it doesn’t work that great yet. The young turks acknowledge that this wasn’t the panacea they thought; the old guard acknowledge that it’s still a lot more useful than they assumed at first. Everyone kisses and makes up.
  4. Eventually, people settle into a pattern where they learn what parts can be done by computer algorithms, and what parts need an actual creative human thinking, and the field becomes stronger as the best parts of each get combined. But learning which parts go best with humans and which parts are best left to computers is a learning process that takes a while.

Currently, game design seems to be just starting Step 2. We’re hearing more and more people anecdotally telling us how metrics and statistical analysis saved their company. We hear about MMOs that are able to solve their game balance problems by looking at player patterns, before the players themselves learn enough to exploit them. We hear of Zynga changing a font color from red to pink and generating significantly more click-throughs from players trying out other games. We have entire companies that have sprung up solely to help game developers capture and analyze their metrics. The industry is falling in love with metrics, and I’ll go on record predicting that at least one company that relies entirely on metrics-driven design will fail, badly, by the time this whole thing shakes out, because they will be looking so hard at the numbers that they’ll forget that there are actually human players out there who are trying to have fun in a way that can’t really be measured directly. Or maybe not. I’ve been wrong before.

At any rate, right now there seem to be three schools of thought on the use of metrics:

  • The Zynga model: design almost exclusively by metrics. Love it or hate it, 60 million monthly active unique players laugh at your feeble intuition-based design.
  • Rebellion against the Zynga model: metrics are easy to misunderstand, easy to manipulate, and are therefore dangerous and do more harm than good. If you measure player activity and find out that more players use the login screen than any other in-game action, that doesn’t mean you should add more login screens to your game out of some preconceived notion that if a player does it, it’s fun. If you design using metrics, you push yourself into designing the kinds of games that can be designed solely by metrics, which pushes you away from a lot of really interesting video game genres.
  • The moderate road: metrics have their uses, they help you tune your game to find local “peaks” of joy. They help you take a good game and make it just a little bit better, by helping you explore the nearby design space. However, intuition also has its uses; sometimes you need to take broad leaps in unexplored territory to find the global “peaks,” and metrics alone will not get you there, because sometimes you have to make a game a little worse in one way before it gets a lot better in another, and metrics won’t ever let you do that.

Think about it for a bit and decide where you stand, personally, as a designer. What about the people you work with on a team (if you work with others on a team)?

How much to measure?

Suppose you want to take some metrics in your game so you can go back and do statistical analysis to improve your game balance. What metrics do you actually take – that is, what exactly do you measure?

There are two schools of thought that I’ve seen. One is to record anything and everything you can think of, log it all, mine it later. The idea is that you’d rather collect too much information and not use it, than to not collect a piece of critical info and then have to re-do all your tests.

Another school of thought is that “record everything” is fine in theory, but in practice you either end up with an overwhelming amount of extraneous information, where finding anything useful is like hunting for a needle in a haystack, or potentially worse, you mine the heck out of this data mountain to the point where you’re finding all kinds of correlations and relationships that don’t actually exist. By this way of thinking, you should instead figure out ahead of time what you’re going to need for your next playtest, measure that and only that, and that way you don’t get confused by looking at the wrong stuff in the wrong way later on.

Again, think about where you stand on the issue.

Personally, I think a lot depends on what resources you have. If it’s you and a few friends making a small commercial game in Flash, you probably don’t have time to do much in the way of intensive data mining, so you’re better off just figuring out the useful information you need ahead of time, and adding more metrics later if a new question occurs to you that requires data you aren’t tracking yet. If you’re at a large company with an army of actuarial statisticians who have nothing better to do than find data correlations all day, then sure, go nuts with data collection, and you’ll probably find all kinds of interesting things you’d never have thought of otherwise.

What specific things do you measure?

That’s all fine and good, but whether you say “just get what we need” or “collect everything we can,” neither of those is an actual design. At some point you need to specify what, exactly, you need to measure.

Like game design itself, metrics is a second-order problem. Most of the things that you want to know about your game, you can’t actually measure directly, so instead you have to figure out some kind of thing that you can measure that correlates strongly with what you’re actually trying to learn.

Example: measuring fun

Let’s take an example. In a single-player Flash game, you might want to know if the game is fun or not, but there’s no way to measure fun directly. What correlates with fun that you can measure? One thing might be whether players continue to play for a long time, or whether they spend enough time playing to finish the game and unlock all the achievements, or whether they come back for multiple sessions (especially if they replay even after they’ve “won”), and these are all things you can measure. Now, keep in mind this isn’t a perfect correlation; players might be coming back to your game for some other reason, like if you’ve put in a crop-withering mechanic that punishes them if they don’t return. But at least we can assume that if a player keeps playing, there’s probably some reason, and that is useful information. More to the point, if lots of players stop playing your game at a certain point and don’t come back, that tells us that point in the game is probably not enjoyable and may be driving players away. (Or, if the point where they stopped playing was the end, maybe they found it incredibly enjoyable, but they beat the game and now they’re done, and you didn’t give them a reason to continue playing after that. So it all depends on when.)
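As a sketch of how you might find that drop-off point, suppose you log the furthest level each player reached before they stopped playing; the data here is hypothetical:

    from collections import Counter

    # Hypothetical log: furthest level reached by each player who quit.
    last_level_reached = [3, 7, 3, 12, 3, 5, 3, 8, 3, 12, 4, 3, 12]

    counts = Counter(last_level_reached)
    total = len(last_level_reached)
    for level in sorted(counts):
        pct = 100 * counts[level] / total
        print(f"level {level}: {counts[level]} players quit here ({pct:.0f}%)")
    # A big spike anywhere other than the final level (here, level 3) is a
    # strong hint that something there is driving players away.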

Player usage patterns are a big deal, because whether people play, how often they play, and how long they play are (hopefully) correlated with how much they like the game. For games that require players to come back on a regular basis (like your typical Facebook game), the two buzzwords you hear a lot are Monthly Active Uniques and Daily Active Uniques (MAU and DAU). The “Active” part is important because it makes sure you don’t inflate your numbers by counting a bunch of old, dormant accounts belonging to people who stopped playing. The “Unique” part is also important: one obsessive guy who checks FarmVille ten times a day doesn’t count as ten users. Now, normally you’d think Monthly and Daily should be equivalent, that you could just multiply Daily by 30 or so to get Monthly, but in reality the two will differ based on how quickly your players burn out (that is, how much overlap there is between different days’ sets of users). So if you divide MAU by DAU, that tells you something about how many of your players are new and how many are repeat customers.

For example, suppose you have a really sticky game with a small player base: you only have 100 players, but those players all log in at least once per day. Here your MAU is going to be 100, and your average DAU is also going to be 100, so your MAU/DAU is 1. Now suppose instead that you have a game that people play once and never again, but your marketing is good, so you get 100 new players every day who never come back. Here your average DAU is still going to be 100, but your MAU is around 3000, so your MAU/DAU is about 30. So that’s the range: MAU/DAU goes from 1 (for a game where every player is extremely loyal) up to 28, 30 or 31, depending on the month (representing a game where no one ever plays more than once).
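Here’s a sketch of that computation from a raw login log; the log format is made up for illustration, but any system that records a date and a player ID per login will do:

    from datetime import date

    # Hypothetical login log: (calendar date, player id), one row per login.
    logins = [
        (date(2010, 8, 1), "alice"), (date(2010, 8, 1), "bob"),
        (date(2010, 8, 1), "alice"),   # same player twice in one day
        (date(2010, 8, 2), "alice"), (date(2010, 8, 2), "carol"),
        # ... imagine a full month of these
    ]

    # MAU: unique players over the whole month.
    mau = len({player for _, player in logins})

    # DAU: unique players per day, averaged over the days in the log.
    days = {d for d, _ in logins}
    daily_uniques = [len({p for d, p in logins if d == day}) for day in days]
    avg_dau = sum(daily_uniques) / len(daily_uniques)

    print(mau / avg_dau)   # 1 = everyone logs in daily; ~30 = no one returns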

A word of warning: a lot of metrics, like the ones Facebook provides, might use different ways of computing these numbers so that one set of numbers isn’t comparable to another. For example, I saw one website that listed the “worst” MAU/DAU ratio in the top 100 applications as 33-point-something, which should be flatly impossible, so clearly the numbers somewhere are being messed with (maybe they took the Dailies from a different range of dates than the Monthlies or something). And then some people compute this as a %, meaning on average, what percentage of your player pool logs in on a given day, which should range from a minimum of about 3.33% (1/30 of your monthly players logging in each day) to 100% (all of your monthly players log in every single day). This is computed by taking DAU/MAU (instead of MAU/DAU) and multiplying by 100 to get a percentage. So if you see any numbers like this from analytics websites, make sure you’re clear on how they’re computing the numbers so you’re not comparing apples to oranges.

Why is it important to know this number? For one thing, if a lot of your players keep coming back, it probably means you’ve got a good game. For another, it means you’re more likely to make money on the game, because you’ve got the same people stopping by every day… sort of like how if you operate a brick-and-mortar storefront, an individual who just drops in to window-shop may not buy anything, but if that same individual comes in and is “just looking” every single day, they’re probably going to buy something from you eventually.

Another metric that’s used a lot, particularly on Flash game portals, is to ask the players themselves to rate the game (often with a 5-star rating system). In theory, we would hope that higher ratings mean a better game. In theory, we’d also expect that a game with high player ratings would have a good MAU/DAU ratio, that is, that the two would be correlated. I don’t know of any actual studies that have checked this (though I’d be interested to see the results), but if I had to guess, I’d assume there is some correlation but not a lot. Users who give ratings are not a representative sample; for one thing, they tend to have strong opinions, or else they wouldn’t bother rating (seriously, I always had to wonder about those opinion polls that would say something like 2% of respondents had no opinion… who calls up a paid opinion poll phone line just to say they have no opinion?), so while actual quality probably falls along a bell curve, you tend to get more 5-star and 1-star ratings than 3-star, which is not what you’d expect if everyone rated the game fairly. There’s also the question of whether player opinion is more or less meaningful than actual play patterns: if a player logs into a game every day for months on end but rates it 1 out of 5 stars, what does that mean? Or if a player admits they haven’t even played the game, but still gives it 4 out of 5 stars based on… I don’t know… its reputation or something? Also, players tend not to rate a game while they’re actively playing, only (usually) after they’re done, which probably skews the ratings a bit (depending on why they stopped playing). So it’s probably better to pay attention to usage patterns than player reporting, especially if that reporting isn’t done from within the game in a way you can track.

Now, I’ve been talking about video games, in fact most of this is specific to online games. The equivalent in tabletop games is a little fuzzier, but as the designer you basically want to be watching people’s facial expressions and posture to see where in the game they’re engaged and where they’re bored or frustrated. You can track how these correlate to certain game events or board positions. Again, you can try to rely on interviews with players, but that’s dangerous because player memory of these things is not good (and even if it is, not every playtester will be completely honest with you). For video games that are not online, you can still capture metrics based on player usage patterns, but actually uploading them anywhere is something you want to be very clear to your players about, because of privacy concerns.

Another example: measuring difficulty

Difficulty, like fun, is basically impossible to measure directly, but what you can measure is progression, and failure to progress. Measures of progression will look different depending on your game.

For a game that presents skill-based challenges, like a retro arcade game, you can measure things like how long it takes the player to clear each level, how many times they lose a life on each level, and importantly, where and how they lose each life. Collecting this information makes it really easy to see where your hardest points are, and whether there are any unintentional spikes in your difficulty curve. I understand that Valve does this for their FPS games, and that they actually have a visualizer tool that will not only display all of this information, but plot it overlaid on a map of the level, so you can see where player deaths are clustered. Interestingly, starting with Half-Life 2: Episode Two, they have live reporting and uploading from players to their servers, and they have displayed their metrics on a public page (which probably helps with the aforementioned privacy concerns, because players can see for themselves exactly what is being uploaded and how it’s being used).
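The underlying data collection can be as simple as logging an (x, y) position for every player death and bucketing the results into a grid. A rough sketch of the aggregation step, with arbitrary coordinates and grid size:

    from collections import Counter

    # Hypothetical death log for one level: (x, y) world coordinates.
    deaths = [(103, 47), (105, 44), (102, 46), (310, 88), (104, 45)]

    CELL = 10  # bucket into 10x10 cells so nearby deaths group together

    clusters = Counter((x // CELL, y // CELL) for x, y in deaths)
    for cell, count in clusters.most_common():
        print(f"cell {cell}: {count} deaths")
    # The top cells are your candidate difficulty spikes; overlay them on
    # the level map to see what's actually killing people there.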

Yet another example: measuring game balance

What if instead you want to know if your game is fair and balanced? That’s not something you can measure directly either. However, you can track just about any number attached to any player, action or object in the game, and this can tell you a lot about both normal play patterns, and also the relative balance of strategies, objects, and anything else.

For example, suppose you have a strategy game where each player can take one of four different actions each turn, and you have a way of numerically tracking each player’s standing. You could record each turn, what action each player takes, and how it affects their respective standing in the game.

Or, suppose you have a CCG where players build their own decks, or a fighting game where each player chooses a fighter, or an RTS where players choose a faction, or an MMO or tabletop RPG where players choose a race/class combination. Two things you can track here are which choices seem to be the most and least popular, and which choices have the highest correlation with actually winning. Note that these are not always the same thing; sometimes the big, flashy, cool-looking thing that everyone likes because it’s impressive and easy to use is still easily defeated by a sufficiently skilled player using a less well-known strategy. Sometimes, dominant strategies take months or even years to emerge through tens of thousands of games played; the Necropotence card in Magic: the Gathering saw almost no play for six months or so after release, because it had a really complicated and obscure set of effects… but once some top players figured it out and people started experimenting with it, they found it to be one of the most powerful cards ever made. So popularity and correlation with winning are two distinct, useful metrics here, as the sketch below shows.
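Since popularity and win rate are different numbers, it helps to compute them side by side. A sketch, assuming each row of a hypothetical match log records the faction a player picked and whether that player won:

    from collections import defaultdict

    # Hypothetical match log: (faction chosen, won?) for each player-game.
    games = [("orcs", True), ("orcs", False), ("elves", True),
             ("orcs", True), ("elves", True), ("humans", False),
             ("orcs", False), ("elves", True)]

    picks = defaultdict(int)
    wins = defaultdict(int)
    for faction, won in games:
        picks[faction] += 1
        wins[faction] += won   # True counts as 1, False as 0

    for faction in picks:
        print(f"{faction}: picked {picks[faction]} times, "
              f"win rate {wins[faction] / picks[faction]:.0%}")
    # The same computation measures seating advantage: replace the faction
    # with each player's seat position and look at win rate per seat.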

If a particular game object sees a lot more use than you expected, that can certainly signal a potential game balance issue. It may also mean that this one thing is just a lot more compelling to your target audience for whatever reason – for example, in a high fantasy game, you might be surprised to find more players creating Elves than Humans, regardless of balance issues… or maybe you wouldn’t be that surprised. In some games, popularity can be a sign that a certain play style is really fun compared to the others, and you can sometimes carry that over into other characters or classes or cards or what have you, to make the game more fun overall.

If a game object sees less use than expected, again that can mean it’s underpowered or overcosted. It might also mean that it’s just not very fun to use, even if it’s effective. Or it might mean it is too complicated to use, it has a high learning curve relative to the rest of the game, and so players aren’t experimenting with it right away (which can be really dangerous if you’re relying on playtesters to actually, you know, playtest, if they leave some of your things alone and don’t play with them).

Metrics have other applications besides game objects. For example, one really useful area is measuring starting asymmetries, a common one being the first-player advantage (or disadvantage): collect a bunch of data on seating arrangements versus end results. This happens a lot with professional games and sports; for example, I believe statisticians have calculated the home-field advantage in American football to be worth about 2.3 points, and depending on where you play, the first-move advantage in Go is valued at 6.5 or 7.5 points (the half point is there to prevent tied games). Statistics from Settlers of Catan tournaments have shown a very slight advantage to playing second in a four-player game, on the order of a few hundredths of a percent; normally we could discard that as random variation, but the sheer number of games that have been played gives the numbers some weight.

One last example: measuring money

If you’re actually trying to make money by selling your game, in whole or part, then at the end of the day this is one of your most important considerations. For some people it’s the most important consideration: they’d rather have a game that makes lots of money but isn’t fun or interesting at all, than a game that’s brilliant and innovative and fun and wonderful but is a “sleeper hit” which is just a nice way of saying it bombed in the market but didn’t deserve to. Other game designers would rather make the game fun first, so one thing for each of you to consider is, personally, which side of the fence you’re on… because if you don’t know that about yourself, someone else is going to make the call for you some day.

At any rate, money is something that just about every commercial game should care about in some capacity, so it’s something that’s worth tracking. Those sales tell you something related to how good a job you did with the game design, along with a ton of other factors like market conditions, marketing success, viral spread, and so on.

With traditional games sold online or through retail, the sales pattern is a pretty standard curve: big release-day sales that fall off over time along an exponentially decreasing curve, until sales get small enough that it’s not worth selling anymore. With online games you don’t have to worry about inventory or shelf space, so you can hold on a lot longer, which is where this whole “long tail” thing came from; the idea is that the curve looks like it has a tail on the right-hand side. In this case the things to watch for are sudden spikes: when they happen and what caused them, because they don’t usually happen on their own.

Unfortunately, that means sales metrics for traditional sales models aren’t all that useful to game designers. We see a single curve that combines lots of variables, and we only get the feedback after the game is released. If it’s one game in a series it’s more useful because we can see how the sales changed from game to game and what game mechanics changed, so if the game took a major step in a new direction and that drastically increased or reduced sales, that gives you some information there.

If instead your game is online, such as an MMO, or a game in a Flash portal or on Facebook, the pattern can be a bit different: sales start slow (higher if you do some marketing up front), then if the game is good it ramps up over time as word-of-mouth spreads, so it’s basically the same curve but stretched out a lot longer. The wonderful thing about this kind of release schedule is that you can manage the sales curve in real-time: make a change to your game today, measure the difference in sales for the rest of the week, and keep modifying as you go. Since you have regular incremental releases that each have an effect on sales, you’re getting constant feedback on the effects that minor changes have on the money your game brings in. However, remember that your game doesn’t operate in a vacuum; there are often other outside factors that will affect your sales. For example, I bet if there’s a major natural disaster that’s making international headlines, that most Facebook games will see a temporary drop in usage because people are busy watching the news instead. So if a game company made a minor game change the day before the Gulf oil spill and they noticed a sudden decrease in usage from that geographical area, the designers might mistakenly think their game change was a really bad one if they weren’t paying attention to the real world.

Ideally, you’d like to eliminate these outside factors, so you know what you’re measuring. One way of doing this, which works in some special cases, is to roll out two separate versions of your game simultaneously to different groups of players, and then compare the two groups (this kind of controlled comparison is usually called an A/B test). One important thing is that you need to select the players randomly (and not, say, give one version to the earliest accounts created on your system and the other to the most recent adopters). Of course, if the actual gameplay differs between the two groups, that’s hard to do without some players getting angry about it, especially if one of the two groups ends up with an unbalanced design that can be exploited. So it’s better to do this with things that don’t affect balance: banner ads, informational popup dialog text, splash screens, the color or appearance of the artwork in your game, and other things like that. Or, if you do this with gameplay, do it in a way that is honest and up front with the players; I could imagine assigning players randomly to a faction (like World of Warcraft’s Alliance/Horde split, except randomly chosen when an account is created) and having the warring factions as part of the backstory of the game, so it would make sense that each faction has some things that are a little bit different. I don’t know of any game that’s actually done this, but it would be interesting to see in action.
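The key implementation detail of that kind of split test is a random-but-stable assignment: a player should see the same version every time they log in. One common trick is to hash the account ID; here’s a sketch with hypothetical version names:

    import hashlib

    def assign_variant(account_id: str) -> str:
        """Deterministically assign an account to version "A" or "B".

        Hashing the ID (instead of using signup order) keeps the split
        effectively random with respect to account age."""
        digest = hashlib.sha256(account_id.encode()).digest()
        return "A" if digest[0] % 2 == 0 else "B"

    print(assign_variant("player_12345"))   # same answer on every call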

For games where players can either play for free or pay – this includes shareware, microtransactions, subscriptions, and most other kinds of payment models for online games – you can look at not just how many users you have, or how much money you’re getting total, but also where that money is coming from on a per-user basis. This is very powerful, but there are also a lot of variables to consider.

First, what counts as a “player”? If some players have multiple accounts (with or without your permission) or if old accounts stay around while dormant, the choice of whether to count these things will change your calculations. Typically companies are interested in looking at revenue from unique, active users, because dormant accounts tend to not be spending money, and a single player with several accounts should really be thought of as one entity (even if they’re spending money on each account).

Second, there’s a difference between players who are playing for free and have absolutely no intention of paying for your game ever, versus players who spend regularly. Consider a game where you make a huge amount of money from a tiny minority of players; this suggests you have a great game that attracts and retains free players really well, and that once players can be convinced to spend any money at all they’ll spend a lot, but it also says that you have trouble with “conversion” – that is, convincing players to take that leap and spend their first dollar with you. In this case, you’d want to think of ways to give players incentive to spend just a little bit. Now consider a different game, where most people that play spend something but that something is a really small amount. That’s a different problem, suggesting that your payment process itself is driving away players, or at least that it’s giving your players less incentive to spend more, like you’re hitting a spending ceiling somewhere. You might be getting the same total cash across your user base in both of these scenarios, but the solutions are different.

Typically, the difference between them is shown with two buzzwords, ARPU (Average Revenue Per User) and ARPPU (Average Revenue Per Paying User). I wish we called them players rather than users, but it wasn’t my call. At any rate, in the first example with a minority of players paying a lot when most people play for free, ARPPU will be really high; in the second case, ARPPU will be really low, even if ARPU is the same for both games.
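In code, the distinction between the two is just a matter of which denominator you use. A sketch with made-up revenue numbers:

    # Hypothetical: this month's spend per active player (0 = free player).
    revenue_per_player = [0, 0, 0, 0, 0, 0, 0, 0, 50, 0, 0, 2]

    total = sum(revenue_per_player)
    payers = [r for r in revenue_per_player if r > 0]

    arpu = total / len(revenue_per_player)  # averaged over ALL active players
    arppu = total / len(payers)             # averaged over paying players only

    print(f"ARPU = {arpu:.2f}, ARPPU = {arppu:.2f}")   # 4.33 vs. 26.00
    # High ARPPU with few payers suggests a conversion problem; low ARPPU
    # with many payers suggests a spending ceiling.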

Of course, total number of players is also a consideration, not just the average. If your ARPU and ARPPU are both great but you’ve got a player base of a few thousand when you should have a few million, then that’s probably more of a marketing problem than a game design problem. It depends on what’s happening to your player base over time, and where you are in the “tail” of your sales curve. So these three things, sales, ARPU and ARPPU, can give you a lot of information about whether your problem is with acquisition (that is, getting people to try your game the first time), conversion (getting them to pay you money the first time), or retention (getting players to keep coming back for more). And when you overlap these with the changes you make in your game and the updates you offer, a lot of times you can get some really useful correlations between certain game mechanics and increased sales.

Another interesting metric to look at is the graph of time-vs-money for the average user. How much do people give you on the day they start their account? What about the day after that, and the day after that? Do you see a large wad of cash up front and then nothing else? A decreasing curve, where players try for free for a while, then spend a lot, then spend incrementally smaller amounts until they hit zero? An increasing curve, where players spend a little, then a bit more, then a bit more, until a sudden flameout where they drop your game entirely? Regular small payments on a traditional “long tail” model? What does this tell you about the value you’re delivering to players in your early game → mid-game → late game → elder game progression?

While you’re looking at revenue, don’t forget to take your costs into account. There are two kinds of costs: up-front development, and ongoing costs. The up-front costs are things like development of new features, including both the “good” ones that increase revenue and also the “bad” ones that you try out and then discard; keep in mind that your ratio of good-to-bad features will not be perfect, so you have to count some portion of the bad ideas as part of the cost in developing the good ones (this is a type of “sunk cost” like we discussed in Week 6 when we talked about situational balance). Ongoing costs are things like bandwidth and server costs and customer support, which tend to scale with the number of players. Since a business usually wants to maximize its profits (that is, the money it takes in minus the money it spends) and not its revenue (which is just the money it takes in), you’ll want to factor these in if you’re trying to optimize your development resources.

A word of warning (gosh, I seem to be giving a lot of warnings this week): statistics are great at analyzing the past, but they’re a lot trickier if you try to use them to predict the future. For example, a really hot game that just launched might have what initially looks like an exponentially-increasing curve. It’s tempting to assume, especially if it’s a really tight fit with an exponential function, that the trend will continue. But common sense tells us this can’t continue indefinitely: the human population is finite, so if your exponential growth is faster than human population growth it has to level off eventually. Business growth curves are usually not exponential, but instead what is called “S-shaped” where it starts as an exponentially increasing curve and eventually transitions to a logarithmically (that is, slowly) increasing curve, and then eventually levels off or starts decreasing. A lot of investors get really burned when they mistake an S curve for an exponential increase, as we saw (more or less) with the dot-com crash about 10 years ago. Illegal pyramid schemes also tend to go through this kind of growth curve, with the exception that once they reach the peak of the “S” there’s usually a very sudden crash.
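If you want to see why the early part of an S curve fools people, compare a logistic function against a pure exponential; a small sketch with arbitrary parameters:

    import math

    def exponential(t):
        return math.exp(0.5 * t)

    def logistic(t, ceiling=1000.0):
        # S-shaped: looks exponential early, then levels off at the ceiling.
        return ceiling / (1 + (ceiling - 1) * math.exp(-0.5 * t))

    for t in range(0, 30, 5):
        print(t, round(exponential(t)), round(logistic(t)))
    # The two curves track each other closely at first, then diverge sharply
    # once the logistic curve starts running out of new players.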

A Note on Ethics

This is the second time this summer, when talking about game balance, that I’ve brought up an issue of professional ethics. It’s weird how this comes up in discussions of applied mathematics, isn’t it? Anyway…

The ethical consideration here is that a lot of these metrics look at player behavior, but they don’t actually look at the value added to (or removed from) the players’ lives. Some games, particularly those on Facebook which have evolved to make some of the most efficient use of metrics of any games ever made, have also been accused (by some people) of being blatantly manipulative, exploiting known flaws in human psychology to keep their players playing (and paying) against their will. Now, this sounds silly when taken to the extreme, because we think of games as something inherently voluntary, so the idea of a game “holding us prisoner” seems strange. On the other hand, any game you’ve played for an extended period of time is a game you are emotionally invested in, and that emotional investment does have cash value. If it seems silly to you that I’d say a game “makes” you spend money, consider this: suppose I found all of your saved games and put them in one place. Maybe some are on console memory cards or hard disks. Maybe some are on your PC hard drive. For online games, your “saved game” is on some company’s server somewhere. And then suppose I threatened to destroy all of them… but not to worry, I’d replace the hardware. So you get free replacements of your hard drive and console memory cards, a fresh account on every online game you subscribe to, and so on. And then suppose I asked you: how much would you pay me not to do that? I bet when you think about it, the answer is more than zero, and the reason is that those saved games have value to you! And more to the point, if one of these games threatened to delete all your saves unless you bought some extra downloadable content, you would at least consider it… not because you wanted to gain the content, but because you wanted not to lose your saves.

To be fair, all games involve some kind of psychological manipulation, just like movies and books and all other media (there’s that whole thing about suspending our disbelief, for example). And most people don’t really have a problem with this; they still see the game experience itself as a net value-add to their life, by letting them live more in the hours they spend playing than they would have lived had they done other activities.

But just like difficulty curves, the difference between value added and value taken away is not constant; it differs from person to person. This is why we have MMOs that enhance the lives of millions of subscribers, while also causing horrendous outcomes in the lives of a small minority who lose their marriage and family to their game obsession, or who play for so long without attending to basic bodily needs that they keel over and die at the keyboard.

So there is a question of how far we can push our players to give us money, or just to play our game at all, before we cross an ethical line… especially in the case where our game design is being driven primarily by money-based metrics. As before, I invite you to think about where you stand on this, because if you don’t know, the decision will be made for you by someone else who does.

If You’re Working on a Game Now…

If you’re working on a game now, my suggestion (as you might guess) is to ask yourself what game design questions could best be answered through metrics:

  • What aspects of your design (especially relating to game balance) do you not know the answers to, at this point in time? Make a list.
  • Of those open questions, which ones could be solved through playtesting, taking metrics, and analyzing them?
  • Choose one question from the remaining list that is, in your opinion, the most vital to your gameplay. Figure out what metrics you want to use, and how you will use statistics to draw conclusions. What are the different things you might see? What would they mean? Make sure you know how you’ll interpret the data in advance.
  • If you’re doing a video game, make sure the game has some way of logging the information you want. If it’s a board game, run some playtests and start measuring!
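
For the video-game case, the logging doesn’t have to be fancy. Here’s a minimal sketch (the file name and the event fields are hypothetical; a real system would batch these up and upload them somewhere):

    import json, time

    LOG_PATH = "playtest_metrics.log"   # hypothetical local log file

    def log_event(event_type, **fields):
        record = {"t": time.time(), "event": event_type, **fields}
        with open(LOG_PATH, "a") as f:
            f.write(json.dumps(record) + "\n")   # one JSON record per line

    # The kinds of events you might care about:
    log_event("player_death", level="2-3", cause="fall_damage")
    log_event("level_complete", level="2-3", seconds=417, deaths=4)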

Homework

This is going to be mostly a thought experiment, more than practical experience, because I couldn’t think of any way to force you to actually collect metrics on a game that isn’t yours.

Choose your favorite genre of game. Maybe an FPS, or RTS, CCG, tabletop RPG, Euro board game, or whatever. Now choose what you consider to be an archetypal example of such a game, one that you’re familiar with and preferably that you own.

Pretend that you’ve been given the rights to do a remake of this game (not a sequel); that is, your intention is to keep the core mechanics basically the same, while possibly making some minor changes for the purpose of game balance. Think of it as a “version 2.0” of the original. You might have some areas where you already suspect, from your designer’s instinct, that the game is unbalanced… but let’s assume you want to actually prove it.

Come up with a metrics plan. Assume that you have a ready supply of playtesters, or else existing play data from the initial release, and it’s just a matter of asking for the data and then analyzing it. Generate a list:

  • What game balance questions would you want answers to, that could be answered with statistical analysis?
  • What metrics would you use for each question? (It’s okay if there is some overlap here, where several questions use some of the same metrics.)
  • What analysis would you perform on your metrics to get the answers to each question? That is, what would you do to the data (such as taking means, medians and standard deviations, or looking for correlations)? If your questions are “yes” or “no,” what would a “yes” or “no” answer look like once you analyzed the data?
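
If you’re unsure what that analysis looks like in practice, here’s a sketch using Python’s standard statistics module (all the data is made up): means, medians and standard deviations are one-liners, and a correlation is only a few lines more.

    import statistics

    game_lengths = [34, 29, 41, 38, 30, 44, 36]   # turns per playtest (made up)
    p1_scores    = [12, 9, 15, 14, 10, 16, 13]    # first player's score (made up)

    print(statistics.mean(game_lengths))    # is the game as long as intended?
    print(statistics.median(game_lengths))  # robust to one or two outlier games
    print(statistics.stdev(game_lengths))   # how consistent is game length?

    # Pearson correlation by hand: do longer games favor the first player?
    def correlation(xs, ys):
        mx, my = statistics.mean(xs), statistics.mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    print(correlation(game_lengths, p1_scores))   # near +1 for this fake data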

Additional Resources

Here are a few links, in case you didn’t get enough reading this week. Much of what I wrote was influenced by these:

http://chrishecker.com/Achievements_Considered_Harmful%3F

and

http://chrishecker.com/Metrics_Fetishism

Game designer Chris Hecker gave a wonderful GDC talk this year called “Achievements Considered Harmful” which talks about a different kind of metric – the Achievements we use to measure and reward player performance within a game – and why this might or might not be such a good idea. In the second article, he talks about what he calls “Metrics Fetishism,” basically going into the dangers of relying too much on metrics and not enough on common sense.

http://www.gamasutra.com/view/news/29916/GDC_Europe_Playfishs_Valadares_on_Intuition_Versus_Metrics_Make_Your_Own_Decisions.php

This is a Gamasutra article quoting Playfish studio director Jeferson Valadares at GDC Europe, suggesting when to use metrics and when to use your actual game design skills.

http://www.lostgarden.com/2009/08/flash-love-letter-2009-part-2.html

Game designer Dan Cook writes on the many benefits of metrics when developing a Flash game.

http://www.gamasutra.com/features/20070124/sigman_01.shtml

Written by the same guy who did the “Orc Nostril Hair” probability article, this time giving a basic primer on statistics rather than probability.

Level 7: Advancement, Progression and Pacing

August 18, 2010

Readings/Playings

None this week (other than this blog post).

Answers to Last Week’s Question

If you want to check your answer from last week:

Well, I must confess I don’t know for sure if this is the right answer or not. In theory I could write a program to do a brute-force solution with all twelve towers – if each tower is either Swarm, Boost or Nothing (simply don’t build a tower in that location), then it’s “only” 3^12 possibilities – but I don’t have the time to do that at this moment. If someone finds a better solution, feel free to post here!

By playing around by hand in a spreadsheet, the best I came up with was the top and bottom rows both consisting of four Swarm towers, with the center row holding four Boost towers, giving a damage/cost ratio of 1.21.

The two Boost towers in the center columns each give +50% damage to the six Swarm towers surrounding them, providing a damage bonus of 1440 each, while the two Boost towers in the outer columns each support only four Swarm towers, for a damage bonus of 960 each. On average, then, each Boost tower provides 1200 damage for a cost of 500, a damage/cost ratio of 2.4.

Each Swarm tower provides 480 damage (x8 = 3840 damage, total). Each tower costs 640, for a damage/cost ratio of 0.75 for each one. While this is much less efficient than the Boost towers, the Swarm towers are still worth having; deleting any of them makes the surrounding Boost towers less effective, so in combination the Swarm towers are still more cost-efficient than having nothing at all.

However, the Boost towers are still much more cost-effective than Swarm towers (and if you look at the other tower types, Boost towers are the most cost-effective tower in the game, hands-down, when you assume many fully-upgraded towers surrounding them). The only thing that prevents Boost towers from being the dominant strategy at the top levels of play, I think, is that you don’t have enough cash to make full use of them. A typical game that lasts 40 levels might only give you a few thousand dollars or so, which is just not enough to build a killer array of fully-upgraded towers. Or, maybe there’s an opportunity for you to find new dominant strategies that have so far gone undiscovered…
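
Incidentally, the brute-force check I mentioned above really is only a few lines of code. Here’s a sketch, assuming the model implied by the numbers above (Swarm: 480 base damage, 640 cost; Boost: no damage of its own, 500 cost, +50% to each adjacent Swarm tower, counting diagonals); whether it confirms my hand-built layout or finds something better, I’ll leave for you to run:

    from itertools import product

    ROWS, COLS = 3, 4

    def neighbors(i):
        r, c = divmod(i, COLS)
        return [nr * COLS + nc
                for nr in (r - 1, r, r + 1) for nc in (c - 1, c, c + 1)
                if (nr, nc) != (r, c) and 0 <= nr < ROWS and 0 <= nc < COLS]

    NEIGH = [neighbors(i) for i in range(ROWS * COLS)]

    best_ratio, best_layout = -1.0, None
    for layout in product("SBN", repeat=ROWS * COLS):  # Swarm/Boost/Nothing: 3^12 = 531,441
        cost = sum(640 if t == "S" else 500 if t == "B" else 0 for t in layout)
        if cost == 0:
            continue
        damage = sum(480 * (1 + 0.5 * sum(layout[n] == "B" for n in NEIGH[i]))
                     for i, t in enumerate(layout) if t == "S")
        if damage / cost > best_ratio:
            best_ratio, best_layout = damage / cost, layout

    print(best_ratio, best_layout)   # takes a few seconds to enumerate everything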

This Week

In the syllabus, this week is listed as “advancement, progression and pacing for single-player games” but I’ve changed my mind. A lot of games feature some kind of advancement and pacing, even multiplayer games. There are multiplayer co-op games, like the tabletop RPG Dungeons & Dragons, the console action-RPG Baldur’s Gate: Dark Alliance, or the PC game Left 4 Dead. Even within multiplayer competitive games, some have the players progressing and getting more powerful during play: players get more lands and cast more powerful spells as a game of Magic: the Gathering progresses, while players field more powerful units in the late game of Starcraft. Then there are MMOs like World of Warcraft that clearly have progression built in as a core mechanic of the game, even on PvP servers. So in addition to single-player experiences like your typical Final Fantasy game, we’ll be talking about these other things too: basically, how do you balance progression mechanics?

Wait, What’s Balance Again?

First, it’s worth a reminder of what “balance” even means in this context. As I said in the intro to this course, in terms of progression, there are three things to consider:

  1. Is the difficulty level appropriate for the audience, or is the game overall too hard or too easy?
  2. As the player progresses through the game, we expect it to get harder, to compensate for the player’s increasing skill. Does the difficulty increase at a good rate? Does it get too hard too fast (which leads to frustration), or does it get harder too slowly (leading to boredom while the player waits for the game to get challenging again)?
  3. If your avatar increases in power, whether that be from finding new game objects like better weapons or tools or other toys, gaining new special abilities, or just getting a raw boost in stats like Hit Points or Damage, are you gaining these at a good rate relative to the increase in enemy power? Or do you gain too much power too fast (making the rest of the game trivial after a certain point), or do you gain power too slowly (requiring a lot of mindless grinding to compensate, which artificially lengthens the game at the cost of forcing the player to re-play content that they’ve already mastered)?

We will consider each of these in turn.

Flow Theory

If you’re not familiar with the concept of “flow” then read up here from last summer’s course. Basically, this says that if the game is too hard for your level of skill, you get frustrated; if it’s too easy, you get bored; but if you’re challenged at the peak of your ability, you find the game engaging and usually more fun. One of our goals as game designers is to provide that suitable level of challenge to our players.

There are two problems here. First, not every player comes to the game with the same skill level, so what’s too easy for some players is too hard for others. How do you give all players the same experience, but have it be balanced for all of them?

Second, as a player progresses through the game, they get better at it, so even if the game’s challenge level remains constant it will actually get easier for the player.

How do we solve these problems? Well, that’s most of what this week is about.

Why Progression Mechanics?

Before moving on, though, it’s worth asking what the purpose is behind progression mechanics to begin with. If we’re going to dedicate a full tenth of this course to progression through a game, progression mechanics should be a useful design tool that’s worth talking about. What is it useful for?

Ending the game

In most cases, the purpose of progression is to bring the game to an end. For shorter games especially, the idea is that progression makes sure the game ends in a reasonable time frame. So whether you’re making a game that’s meant to last 3 minutes (like an early-80s arcade game) or 30-60 minutes (like a family board game) or 3 to 6 hours (like a strategic wargame) or 30 to 300 hours (like a console RPG), the idea is that some games have a desired game length, and if you know what that length is, forced progression keeps it moving along to guarantee that the game will actually end within the desired time range. We’ll talk more about optimal game length later in this post.

Reward and training for the elder game

In a few specialized cases, the game has no end (MMOs, Sims, tabletop RPGs, or progression-based Facebook games), so progression is used as a reward structure and a training simulator in the early game rather than a way to end the game. This has an obvious problem which can be seen with just about all of these games: at some point, more progression just isn’t meaningful. The player has seen all the content in the game that they need to, they’ve reached the level cap, they’ve unlocked all of their special abilities in their skill tree, they’ve maxed their stats, or whatever. In just about all cases, when the player reaches this point, they have to find something else to do, and there is a sharp transition into what’s sometimes called the “elder game” where the objective changes from progression to something else. For players who are used to progression as a goal, since that’s what the game has been training them for, this transition can be jarring. The people who enjoy the early-game progression may not enjoy the elder game activities as much since they’re so different (and likewise, some people who would love the elder game never reach it because they don’t have the patience to go through the progression treadmill).

What happens in the elder game?

In Sim games and FarmVille, the elder game is artistic expression: making your farm pretty or interesting for your friends to look at, or setting up custom stories or skits with your sims.

In MMOs, the elder game is high-level raids that require careful coordination between a large group, or PvP areas where you’re fighting against other human players one-on-one or in teams, or exploring social aspects of the game like taking on a coordination or leadership role within a Guild.

In tabletop RPGs, the elder game is usually finding an elegant way to retire your characters and end the story in a way that’s sufficiently satisfying, which is interesting because in these games the “elder game” is actually a quest to end the game!

What happens with games that end?

In games where progression does end the game, there is also a problem: generally, if you’re gaining power throughout the game and this serves as a reward to the player, the game ends right when you’re reaching the peak of your power. This means you don’t really get to enjoy being on top of the world for very long. If you’re losing power throughout the game, which can happen in games like Chess, then at the end you just feel like you’ve been ground into the dirt for the entire experience, which isn’t much better.

Peter Molyneux has pointed out this flaw when talking about the upcoming Fable 3, where he insists you’ll reach the peak of your power early on, succeed in ruling the world, and then have to spend the rest of the game making good on the promises you made to get there… which is a great tagline, but really all he’s saying is that he’s taking the standard Console RPG progression model, shortening it, and adding an elder game, which means that Fable 3 will either live or die on its ability to deliver a solid elder-game experience that still appeals to the same kinds of players who enjoyed reaching that point in the first place. Now, I’m not saying it can’t be done, but he’s got his work cut out for him. In the interview I saw, it sounded like he was treating this like a simple fix to an age-old problem, but as we can see here it’s really just replacing one difficult design problem with another. I look forward to seeing if he solves it… because if he does, that will have major applications for MMOs and FarmVille and everything with an elder game in between.

Two Types of Progression

Progression tends to work differently in PvP games compared to PvE games. In PvP (this includes multi-player PvP like “deathmatch” and also single-player games played against AI opponents), you’re trying to win against another player, human or AI, so the meaning of your progression is relative to the progression of your opponents. In PvE games (this includes both single-player games and multi-player co-op), you are progressing through the game to try to overcome a challenge and reach some kind of end state, so for most of these games your progress is seen in absolute terms. That is really the core distinction I’d like to make: games where the focus is on relative power between parties, versus games where the focus is on absolute power with respect to the game’s core systems. I’m just using “PvP” and “PvE” as shorthand here, and if I slip up and refer to PvP as “multi-player” and PvE as “single-player,” that is just because those are the most common design patterns.

Challenge Levels in PvE

When you’re progressing through a bunch of challenges within a game, how do you track the level of challenge that the player is feeling, so you know if it’s increasing too quickly or too slowly, and whether the total challenge level is just right?

This is actually a tricky question to answer, because the “difficulty” felt by the player is not made up of just one thing here; it’s actually a combination of four things, but the player experiences it only as a single “am I being challenged?” feeling. Trying to measure the player’s perception of how challenged they are is like a car dashboard that took the fuel level, current speed, and engine RPMs and multiplied them all together into a single “happiness” rating, leaving you only that one number to figure out what was causing it to go up or down.

The four components of perceived difficulty

First of all, there’s the level of the player’s skill at the game. The more skilled the player is at the game, the easier the challenges will seem, regardless of anything else.

Second, there’s the player’s power level in the game. Even if the player isn’t very good at the game, doubling their Hit Points will still keep them alive longer, increasing their Attack stat will let them kill things more effectively, giving them a Hook Shot lets them reach new places they couldn’t before, and so on.

Third and fourth, there’s the flip side of both of these, which are how the game creates challenges for the player. The game can create skill-based challenges which require the player to gain a greater amount of skill in the game, for example by introducing new enemies with better AI that make them harder to hit. Or it can provide power-based challenges, by increasing the hit points or attack power or other stats of the enemies in the game (or just adding more enemies in an area) without actually making the enemies any more skilled.

Skill and power are interchangeable

You can substitute skill and power, to an extent, either on the player side or the challenge side. We do this all the time on the challenge side, adding extra hit points or resource generation or otherwise just using the same AI but inflating the numbers, and expecting that the player will need to either get better stats themselves or show a higher level of skill in order to compensate. Or a player who finds a game too easy can challenge themselves by not finding all of the power-ups in a game, giving themselves less power and relying on their high level of skill to make up for it (I’m sure at least some of you have tried beating the original Zelda with just the wooden sword, to see if it could be done). Creating a stronger AI to challenge the player is a lot harder and more expensive, so very few games do that (although the results tend to be spectacular when they do – I’m thinking of Gunstar Heroes as the prototypical example).

At any rate, we can think of the challenge level as the sum of the player’s skill and power, subtracted from the game’s skill challenges and power challenges. This difference gives us the player’s perceived level of difficulty. So, when any one of these things changes, the player will feel the game get harder or easier. Written mathematically, we have this equation:

PerceivedDifficulty = (SkillChallenge + PowerChallenge) – (PlayerSkill + PlayerPower)
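
In code form, just to make the relationship explicit (the units here are arbitrary “difficulty points,” not anything you can measure directly):

    # The equation above as a function; units are arbitrary difficulty points.
    def perceived_difficulty(skill_challenge, power_challenge, player_skill, player_power):
        return (skill_challenge + power_challenge) - (player_skill + player_power)

    # Hold everything constant and raise only player skill: the same content
    # feels easier, which is the point of the next example.
    print(perceived_difficulty(12, 8, 10, 5))   # 5
    print(perceived_difficulty(12, 8, 14, 5))   # 1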

Example: perceived challenge decreases naturally

How do we use this information? Let’s take the player’s skill, which generally increases over time. That’s significant, because it means that if everything else is equal (that is, if the player’s power level and the overall challenge in the game stay the same), over time the player will feel like the game is getting easier, and eventually it’ll be too easy. To keep the player’s attention once they get better, every game must get harder in some way. (Or at least, every game where the player’s skill can increase. There are some games with no skill component at all, and those are exempted here.)

Changing player skill

Now, you might think the player skill curve is not under our control. After all, players come to our game with different pre-existing skill levels, and they learn at different rates. However, as designers we actually do have some control over this, based on our mechanics:

  • If we design deep mechanics that interact in a lot of ways with multiple layers of strategy, so that mastering the basic game just opens up new ways to look at the game at a more abstract meta-level, the player’s skill curve will be increasing for a long time, probably with certain well-defined jumps when the player finally masters some new way of thinking, like when a Chess player first starts to learn book openings, or when they start understanding the tradeoffs of tempo versus board control versus total pieces on the board.
  • If our game is more shallow, or has a large luck component, we will expect to see a short increase in skill as the player masters what little they can, and then a skill plateau. There are plenty of valid design reasons to do this intentionally. One common example is educational games, where part of the core vision is that you want the player to learn a new skill from the game, and then you want them to stop playing so they can go on to learn other things. Or this might simply be the tradeoff for making your game accessible: “A minute to learn, a minute to master.”
  • You can also control how quickly the player learns, based on the number of tutorials and practice areas you provide. One common design pattern, popularized by Valve, is to give the player some new weapon or tool or toy in a safe area where they can just play around with it, then introduce them immediately to a relatively easy area where they are given a series of simple challenges that let them use their new toy and learn all the cool things it can do, and then you give them a harder challenge where they have to integrate the new toy into their existing play style and combine it with other toys. By designing your levels to teach the player specific skills in certain areas, you can ramp the player up more quickly so they can increase their skill faster.
  • What if you don’t want the player to increase in skill quickly, because you want the game to last longer? If you want the player to learn more slowly, you can instead use “skill gating” as I’ve heard it called. That is, you don’t necessarily teach the player how to play your game, or hold their hand through it. Instead, you simply offer a set of progressively harder challenges, so you are at least guaranteed that if a player completes one challenge, they are ready for the next: each challenge is essentially a signpost that says “you must be at least THIS GOOD to pass.”

Measuring the components of perceived challenge

Player skill is hard to measure mathematically on its own, because as I said earlier, it is combined with player power in any game that includes both. For now, I can say that the best way to get a handle on this is to use playtesting and metrics, for example looking at how often players die or are otherwise set back, where these failures happen, how long it takes players to get through a level the first time they encounter it, and so on. We’ll talk more about this next week.
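
As a sketch of what that looks like, assuming events parsed from a log like the earlier example (in practice you’d read and json.loads each line of your metrics file), the aggregation itself is trivial:

    from collections import Counter

    # Hypothetical parsed event log (made-up events, for illustration).
    events = [
        {"event": "player_death", "level": "2-3"},
        {"event": "player_death", "level": "2-3"},
        {"event": "player_death", "level": "4-1"},
        {"event": "level_complete", "level": "2-3", "seconds": 417},
    ]

    deaths = Counter(e["level"] for e in events if e["event"] == "player_death")
    print(deaths.most_common())   # which levels set players back the most?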

Player power and power-based challenges are much easier to balance mathematically: just compare the player power curve with the game’s opposition power curve. You have complete control over both of these; you control when the player is gaining power, and also when their enemies are presenting a larger amount of power to counter them. What do you want these curves to look like? Part of it depends on what you expect the skill curve to be, since you can use power as a compensatory mechanism in either direction. As a general guideline, the most common pattern I’ve seen looks something like this: within a single area like an individual dungeon or level, you start with a sudden jump in difficulty since the player is entering a new space after mastering the old one. Over time, the player’s power increases, either through level-ups or item drops, until they reach the end of the level where there may be another sudden difficulty jump in the form of a boss, and then after that typically another sudden jump in player power when they get loot from the boss or reach a new area that lets them upgrade their character.

Some dungeons split themselves into several parts, with an easier part at the beginning, then a mid-boss, then a harder part, and then a final boss, but really you can just think of this as the same pattern repeated several times without a change of graphical scenery. String a bunch of these together and that’s the power progression in your game: the difficulty jumps when the player enters a new area, stays constant awhile, spikes at the end for the boss, then drops back down; meanwhile the player’s power makes sudden jumps at the end of each area, with incremental gains along the way as they find new stuff or level up.

That said, this is not the only pattern of power progression, not even necessarily the best for your game! These will vary based on genre and intended audience. For Space Invaders, over the course of a single game, the game’s power challenges, player skill and player power are all constant; the only thing that increases is the game’s skill challenge (making the aliens start faster and lower to the ground in each successive wave) until eventually they present a hard enough challenge to overwhelm the player.
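
If it helps to see the common pattern above as numbers, here’s a sketch that generates both curves (every value is invented; “perceived” here is just challenge minus power, ignoring the skill terms from the earlier equation). Challenge minus power drifts downward within an area as the player picks up drops and levels, spikes at each boss, and resets upward when a new, harder area begins:

    # Per area: challenge jumps, holds flat, spikes for the boss; player power
    # climbs in small steps, then jumps after each boss. All numbers invented.
    def challenge_curve(areas=3, steps=5):
        curve, base = [], 10
        for _ in range(areas):
            curve += [base] * steps + [base + 5]   # flat stretch, then boss spike
            base += 8                              # each new area starts harder
        return curve

    def power_curve(areas=3, steps=5):
        curve, power = [], 8
        for _ in range(areas):
            for _ in range(steps + 1):
                curve.append(power)
                power += 1                         # incremental drops and level-ups
            power += 3                             # boss loot / new-area upgrade
        return curve

    for c, p in zip(challenge_curve(), power_curve()):
        print(f"challenge {c:2d}  power {p:2d}  perceived {c - p:+d}")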

Rewards in PvE

In PvE games especially, progression is strongly related to what is sometimes called the “reward schedule” or “risk/reward cycle.” The idea is that you don’t just want the player to progress, you want them to feel like they are being rewarded for playing well. In a sense, you can think of progression as a reward itself: as the player continues in the game and demonstrates mastery, the ability to progress through the game shows the player they are doing well and reinforces that they’re a good player. One corollary here is that you do need to make sure the player notices you’re rewarding them (in practice, this is usually not much of a problem). Another corollary is that timing is important when handing out rewards:

  • Giving too few rewards, or spacing them out for too long so that the player goes for long stretches without feeling any sense of progression, is usually a bad thing. The player is demoralized and may start to feel like if they aren’t making progress, they’re playing the game wrong (even if they’re really doing fine).
  • Ironically, giving too many rewards can also be hazardous. One of the things we’ve learned from psychology is that happiness comes from experiencing some kind of gain or improvement, so many little gains produce a lot more happiness than one big gain, even if they add up to the same thing. Giving too many big rewards in a small space of time diminishes their impact.
  • Another thing we know from psychology is that a random reward schedule is more powerful than a fixed schedule. This does not mean that the rewards themselves should be arbitrary; they should be linked to the player’s progress through the game, and they should happen as a direct result of what the player did, so that the player feels a sense of accomplishment. It is far more powerful to reward the player because of their deliberate action in the game, than to reward them for something they didn’t know about and weren’t even trying for.

I’ll give a few examples:

  • Have you ever started a new game on Facebook and been immediately given some kind of trophy or “achievement unlocked” bonus just for logging in the first time? I think this is a mistake a lot of Facebook games make: they give a reward that seems arbitrary, and it waters down the player’s real achievements later. It gives the impression that the game is too easy. Now, for some games, you may want them to seem easy if they are aimed at an extremely casual audience, but the danger is diluting the genuine feelings of accomplishment the player gets later.
  • “Hidden achievements” in Xbox 360 games, or their equivalents on other platforms. If achievements are a reward for skill, how is the player to know what the achievement is if it’s hidden? Even more obnoxious, a lot of these achievements are for things that aren’t really under player control and that seem kind of arbitrary, like “do exactly 123 damage in a single attack” where damage is computed randomly. What exactly is the player supposed to feel rewarded for here?
  • A positive example would be random loot drops in a typical action game or action-RPG. While these are random, and occasionally the player gets a really cool item, this is still tied to the deliberate player action of defeating enemies, so the player is rewarded but on a random schedule. (Note that you can play around with “randomness” here, for example by having your game track the time between rare loot drops, and having it deliberately give a big reward if the player hasn’t seen one in awhile. Some games split the difference, with random drops and also fixed drops from specific quests/bosses, so that the player is at least getting some great items every now and then.)
  • Another common example: the player is treated to a cut-scene once they reach a certain point in a dungeon. Now, at first you might say this isn’t random – it happens at exactly the same point in the game, every time, because the level designer scripted that event to happen at exactly that place! And on multiple playthroughs you’d be correct… but the first time a player experiences the game, they don’t know where these rewards are, so from the player’s perspective it is not something that can be predicted; it may as well be random.

Now I’d like to talk about three kinds of rewards that all relate to progression: increasing player power, level transitions, and story progression.

Rewarding the player with increased power

Progression through getting a new toy/object/capability that actually increases player options is another special milestone. Like we said before, you want these spaced out, though a lot of times I see the player get all the cool toys in the first third or half of the game and then spend the rest of the game finding new and interesting ways to use them. This can be perfectly valid design; if the most fun toy in your game is only discovered 2/3rds of the way through, that’s a lot of time the player doesn’t get to have fun – Valve made this example famous through the Gravity Gun in Half-Life 2: as the story goes, they initially had you get this gun near the end of the game, but players had so much fun with it that they restructured their levels to give it to the player much earlier. Still, if you give the player access to everything early on, you need to use other kinds of rewards to keep them engaged through the longer final parts of the game where they don’t find any new toys. How can you do this? Here’s a few ways:

  • If your mechanics have a lot of depth, you can just present unique combinations of things to the player to keep them challenged and engaged. (This is really hard to do in practice.)
  • Use other rewards more liberally after you shut off the new toys: more story, more stat increases, more frequent boss fights or level transitions. You can also offer upgrades to their toys, although it’s debatable whether you can think of an “upgrade” as just another way of saying “new toy.”
  • Or you can, you know, make your game shorter. In this day and age, thankfully, there’s no shame in this. Portal and Braid are both well-known for two things: being really great games, and being short. At the big-budget AAA level, Batman: Arkham Asylum was one of the best games of last year (both in critical reception and sales), even though I hear it only lasts about ten hours or so.

Rewarding the player with level transitions

Progression through level transitions – that is, progression to a new area – is a special kind of reward, because it makes the player feel like they’re moving ahead (and they are!). You want these spaced out a bit so the player isn’t so overwhelmed by changes that they feel like the whole game is always moving ahead without them; a rule of thumb is to offer new levels or areas on a slightly increasing curve, where each level takes a little bit longer than the last. This makes the player feel like they are moving ahead more rapidly at the start of the game when they haven’t become as emotionally invested in the outcome; a player can tolerate slightly longer stretches between transitions near the end of the game, especially if they are being led up to a huge plot point. Strangely, a lot of this can be done with just the visual design of the level, which is admittedly crossing from game design into the territory of game art: for example, if you have a really long dungeon the players are traversing, you can add things to make it feel like each region of the dungeon is different, maybe having the color or texture of the walls change as the player gets deeper inside, to give the player a sense that they are moving forward.

Rewarding the player with story progression

Progression through plot advancement is interesting to analyze, because in so many ways the story is separate from the gameplay: in most games, knowing the characters’ motivations or their feelings towards each other has absolutely no meaning when you’re dealing with things like combat mechanics. And yet, in many games (originally this was restricted to RPGs, but we’re seeing story integrated into all kinds of games these days), story progression is one of the rewards built into the reward cycle.

Additionally, the story itself has a “difficulty” of sorts (we call it “dramatic tension”), so another thing to consider in story-based games is whether the dramatic tension of the story overlaps well with the overall difficulty of the game. Many games do not: the story climax is at the end, but the hardest part of the game is in the middle somewhere, before you find an uber-powerful weapon that makes the rest of the game easy. In general, you want rising tension in your story while the difficulty curve is increasing, dramatic climaxes (climaxen? climaces?) at the hardest part, and so on; this makes the story feel more integrated with the mechanics, all thanks to game balance and math. It’s really strange to write that you get a better story by using math, but there you are. (I guess another way of doing this would be to force the story writers to put their drama in other places to match the game’s difficulty curve, but in practice I think it’s easier to change a few numbers than to change the story.)

Combining the types of rewards into a single reward schedule

Note that a reward is a reward, so you don’t just want to space each category of rewards out individually, but also to interleave them. In other words, you don’t want too many overlaps, where you have a level transition, plot advancement, and a power level increase all at once.

Level transitions are fixed, so you tend to see the power rewards sprinkled throughout the levels as rewards between transitions. Strangely, in practice, a lot of plot advancement tends to happen at the same time as level transitions, which might be a missed opportunity. Some games take the chance to add some backstory in the middle of levels, in areas that are otherwise uninteresting… although then the danger is that the player is getting a reward arbitrarily when they feel like they weren’t doing anything except walking around and exploring. A common design pattern I see in this case is to split the difference by scripting the plot advancement so it immediately follows a fight of some kind. Even if it’s a relatively easy fight, if it’s one that’s scripted, the reward of revealing some additional story immediately after can make the player feel like they earned it.

Challenge Levels in PvP

If PvE games are all about progression and rewards, PvP games are about gains and losses relative to your opponents. Either directly or indirectly, the goal is to gain enough power to win the game, and there is some kind of tug-of-war between the players as each is trying to get there first. I’ll remind you that when I’m saying “power” in the context of progression, I’m talking about the sum of all aspects of the player’s position in the game, so this includes having more pieces and cards put into play, more resources, better board position, taking more turns or actions, or really anything that affects the player’s standing (other than the player’s skill level at playing the game). The victory condition for the game is sometimes to reach a certain level of power directly; sometimes it is indirect, where the actual condition is something abstract like Victory Points, and it is the player’s power in the game that merely enables them to score those Victory Points. And in some cases the players don’t gain power, they lose power, and the object of the game is to get the opponent(s) to run out first. In any case, gaining power relative to your opponents is usually an important player goal.

Tracking player power as the game progresses (that is, seeing how power changes over time in a real-time game, or how it changes each turn in a turn-based game) can follow a lot of different patterns in PvP games. In PvE you almost always see an increase in absolute player power level over time (even if their power level relative to the challenges around them may increase or decrease, depending on the game). In PvP there are more options to play with, since everything is relative to the opponents and not compared with some absolute “you must be THIS GOOD to win the game” yardstick.

Positive-sum, negative-sum, and zero-sum games

This seems as good a time as any to talk about an important distinction in power-based progression that we borrow from the field of Game Theory: whether the game is zero-sum, positive-sum, or negative-sum. If you haven’t heard these terms before:

  • Positive-sum means that the overall power in the game increases over time. Settlers of Catan is an example of a positive-sum game: with each roll of the dice, resources are generated for the players, and all players can gain power simultaneously without any of their opponents losing power. Monopoly is another example of a positive-sum game, because on average every trip around the board will give the player $200 (and that money comes from the bank, not from other players). While there are a few spaces that remove wealth from the game and are therefore negative-sum (Income Tax, Luxury Tax, a few of the Chance and Community Chest cards, unmortgaging properties, and sometimes Jail), on average these losses add up to less than $200, so on average more wealth is created than removed over time. Some players use house rules that give jackpots on Free Parking or landing exactly on Go, which make the game even more positive-sum. While you can lose lots of money to other players by landing on their properties, that activity itself is zero-sum (one player is losing money, another player is gaining the exact same amount). This helps explain why Monopoly feels to most people like it takes forever: it’s a positive-sum game so the average wealth of players is increasing over time, but the object of the game is to bankrupt your opponents which can only be done through zero-sum methods. And the house rules most people play with just increase the positive-sum nature of the game, making the problem worse!
  • Zero-sum means that the sum of all power in the game is a constant, and can neither be created nor destroyed by players. In other words, the only way for me to gain power is to take it from another player, and I gain exactly as much as they lose. Poker is an example of a zero-sum game, because the only way to win money is to take it from other players, and you win exactly as much as the total that everyone else loses. (If you play in a casino or online where the House takes a percentage of each pot, it actually becomes a negative-sum game for the players.)
  • Negative-sum means that over time, players actually lose more power than they gain; player actions remove power from the game without replacing it. Chess is a good example of a negative-sum game; generally over time, your force is getting smaller. Capturing your opponent’s pieces does not give those pieces to you, it removes them from the board. Chess has no zero-sum elements, where capturing an enemy piece gives that piece to you (although the related game Shogi does work this way, and has extremely different play dynamics as a result). Chess does have one positive-sum element, pawn promotion, but that generally happens rarely and only in the end game, and serves the important purpose of adding a positive feedback loop to bring the game to a close… something I’ll talk about in just a second.
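
One way to make this concrete: if you log every player’s power over time (more on constructing a power formula below), the sign of the total change on each turn tells you which kind of turn it was. A trivial sketch:

    # The sign of the total power change across all players classifies a turn.
    def classify(power_before, power_after):
        delta = sum(power_after) - sum(power_before)
        if delta > 0:
            return "positive-sum"
        if delta < 0:
            return "negative-sum"
        return "zero-sum"

    print(classify([10, 10], [12, 11]))  # positive-sum: resources were generated
    print(classify([10, 10], [13, 7]))   # zero-sum: a pure transfer between players
    print(classify([10, 10], [9, 6]))    # negative-sum: pieces left the board

Real games mix all three from turn to turn, as the Monopoly example shows, so it’s the overall trend you care about.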

An interesting property here is that changes in player power, whether zero-sum, positive-sum, or negative-sum, are the primary rewards in a PvP game. The player feels rewarded because they have gained power relative to their opponents, so they feel like they have a better chance of winning after making a particularly good move.

Positive and negative feedback loops

Another thing I should mention here is how positive and negative feedback loops fit in with this, because you can have either kind of feedback loop with a zero-sum, positive-sum or negative-sum game, but they work differently. In case you’re not familiar with these terms, “positive feedback loop” means that receiving a power reward makes it more likely that you’ll receive more, in other words it rewards you for doing well and punishes you for doing poorly; “negative feedback loop” is the opposite, where receiving a power reward makes it less likely you’ll receive more, so it punishes you for doing well and rewards you for doing poorly. I went into a fair amount of detail about these in last summer’s course, so I won’t repeat that here.

One interesting property of feedback loops is how they affect the player’s power curve. With negative feedback, the power curve of one player usually depends on their opponent’s power: they will increase more when behind, and decrease more when ahead, so a single player’s power curve can look very different depending on how they’re doing relative to their opponents, and this will look different from game to game.

With positive feedback, you tend to have a curve that gets more sharply increasing or decreasing over time, with larger swings in the endgame; unlike negative feedback, a positive feedback curve doesn’t always take the opponent’s standings into account… it can just reward a player’s absolute power.

Now, these aren’t hard-and-fast rules… a negative feedback loop can be absolute, which basically forces everyone to slow down around the time they reach the end game; and a positive feedback loop can be relative, where you gain power when you’re in the lead. However, if we understand the game design purpose that is served by feedback loops, we’ll see why positive feedback is usually independent of the opponents, while negative feedback is usually dependent.

The purpose of feedback loops in game design

The primary purpose of positive feedback is to get the game to end quickly. Once a winner has effectively been decided because one player is too far ahead, you don’t want to drag things out, because that wastes everyone’s time. Because of this, you want all players on an accelerating curve in the end game. It doesn’t really matter who is ahead; the purpose is to get the game to end, and as long as everyone gets more power, it will end faster.

By contrast, the primary purpose of negative feedback is to let players who are behind catch up, so that no one ever feels like they are in a position where they can’t possibly win. If everyone is slowed down in exactly the same fashion in the endgame, that doesn’t fulfill this purpose; someone who was behind at the beginning can still be behind at the end, and even though the gap appears to close, they are slowed down as much as anyone else. In order to truly allow those who are behind to catch up, the game has to be able to tell the difference between someone who is behind and someone who is ahead.

Power curves

So, what does a player’s power curve look like in a PvP game? Here are a few ways you might see a player’s power gain (or loss) over time:

  • In a typical positive-sum game, each player is gaining power over time in some way. That might be on an increasing, linear, or decreasing curve.
  • In a positive-sum game with positive feedback, the players are gaining power over time and the more power they gain, the more they have, so it’s an increasing curve (such as a triangular or exponential gain in power over time) for each player. If you subtract one player’s curve from another (which shows you who is ahead, who is behind, and how often the lead is changing), usually what happens is one player gets an early lead and then keeps riding the curve to victory, unless they make a mistake along the way. Such a game is usually not that interesting for the players who are not in the lead.
  • In a positive-sum game with negative feedback, the players are still on an increasing curve, but that curve is altered by the position of the other players, reducing the gains for the leader and increasing them further for whoever’s behind, so if you look at all the players’ power curves simultaneously you’ll see a sort of tangled braid where the players are constantly overtaking one another. Subtracting one player’s power from another over time, you’ll see that the players’ relative power swings back and forth, which is pretty much how any negative feedback should work.
  • In a typical zero-sum game, players take power from each other, and the sum of all player power is a constant. In a two-player game, that means you could derive either player’s power curve just by looking at the other one.
  • In a zero-sum game with positive feedback, the game may end quickly as one player takes an early advantage and presses it to gain even more of an advantage, taking all of the power from their opponents quickly. Usually games that fall into this category also have some kind of early-game negative feedback built in to prevent the game from coming to an end too early, unless the game is very short.
  • In a zero-sum game with negative feedback, we tend to see swings of power that pull the leader back to the center. This keeps the players close, but also makes it very hard for a single player to actually win; if the negative feedback is too strong, you can easily end in a stalemate where neither player can win, which tends to be unsatisfying. A typical design pattern for zero-sum games in particular is to have some strong negative feedback mechanisms in the early game that taper off towards the end, while positive feedback increases towards the end of the game. This can end up as a pretty exciting game of back-and-forth where each player spends some time in the lead before one final, spectacular, irreversible triumph that brings the game to a close.
  • In a typical negative-sum game, the idea is generally not for a player to acquire enough power to win, but rather for a player to lose the least power relative to their opponents. In negative-sum games, players are usually eliminated when they lose most or all of their power, and the object is either to be the last one eliminated, or to be in the best standing when the first opponent is eliminated. A player’s power curve might be increasing, decreasing or constant, sort of an inverse of the positive-sum game, and pretty much everything else looks like a positive-sum game turned upside down.
  • In a negative-sum game with positive feedback, players who are losing will lose even faster. The more power a player has left over, the slower they’ll tend to lose it, but once they start that slide into oblivion it happens more and more rapidly.
  • In a negative-sum game with negative feedback, players who are losing will lose slower, and players who have more power tend to lose it faster, so again you’ll see this “braid” shape where the players will chase each other downward until they start crashing.

Applied power curves

Now, maybe you can visualize what a power curve looks like in theory, showing how a player’s power goes up or down over time… but how do you actually make one for a real game?

The easiest way to construct a power curve is through playtest data. The raw numbers can easily allow you to chart something like this. The hardest part is coming up with some kind of numeric formula for “power” in the game: how well a player is actually doing in absolute terms. This is easier with some games than others. In a miniatures game like HeroClix or Warhammer 40K, each figurine you control is worth a certain number of points, so it is not hard to add your points together on any given turn to get at least a rough idea of where each player stands. In a real-time strategy game like Starcraft, adding a player’s current resources along with the resource costs of all of their units and structures would also give a reasonable approximation of their power over time. For a game like Chess where you have to balance a player’s remaining pieces, board position and tempo, this is a bit trickier. But once you have a “power formula” you can simply track it for all players over time through repeated playtests to see what kinds of patterns emerge.
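
For the Starcraft-style case, the power formula might literally be this simple (the unit costs and the snapshot format here are invented for illustration):

    # A toy RTS "power formula": current resources plus the resource cost of
    # everything built so far. Unit names and costs are hypothetical.
    UNIT_COSTS = {"worker": 50, "soldier": 100, "barracks": 150}

    def power(snapshot):
        return snapshot["resources"] + sum(
            UNIT_COSTS[unit] * count for unit, count in snapshot["units"].items())

    # Snapshots taken once per minute during a playtest:
    replay = [
        {"resources": 200, "units": {"worker": 6}},
        {"resources": 150, "units": {"worker": 9, "barracks": 1}},
        {"resources": 300, "units": {"worker": 12, "barracks": 1, "soldier": 4}},
    ]
    print([power(s) for s in replay])   # [500, 750, 1450] -- one player's power curve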

Ending the game

One of the really important things to notice as you do this is the amount of time it takes to reach a certain point. You want to scale the game so that it ends in about as much time as you want it to.

The most obvious way to do that is by hard-limiting time or turns, which guarantees a specific game length (“this game ends after 4 turns”); sometimes this is necessary and even compelling, but a lot of the time it’s just a lazy design solution that says “we didn’t playtest this enough to know how long the game would take if you actually played to a satisfying conclusion.”

An alternative is to balance your progression mechanics to cause the game to end within your desired range. You can do this by changing the nature of how positive or negative sum your game is (that is, the base rate of combined player power gain or loss), or by adding, removing, strengthening or weakening your feedback loops. This part is pretty straightforward, if you collect all of the numbers that you need to analyze it. For example, if you take an existing positive feedback loop and make the effect stronger, the game will probably end earlier, so that is one way to shorten the game.
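
You can even get a rough feel for this before changing anything, by simulating. Here’s a sketch of a two-player zero-sum tug-of-war where a single parameter controls the strength of a positive feedback loop; cranking it up shortens the average game (the model and all its numbers are invented, so the point is the technique, not the specific values):

    import random

    def game_length(feedback, start=50):
        a = start                                  # player A's share of 100 total power
        for turn in range(1, 1001):
            edge = feedback * (a - 50) / 50        # positive feedback: leader swings more
            a += 5 if random.random() < 0.5 + edge else -5
            a = max(0, min(100, a))
            if a in (0, 100):                      # one player holds all the power
                return turn
        return 1000                                # safety cap

    for fb in (0.0, 0.1, 0.3):
        lengths = [game_length(fb) for _ in range(2000)]
        print(fb, sum(lengths) / len(lengths))     # stronger feedback, shorter games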

Game phases

I should note that some PvP games have well-defined transitions between different game phases. The most common pattern here is a three-phase structure where you have an early game, a mid-game and an endgame, as made famous by Chess, which has entire books devoted to just a single phase of the game. If you become aware of these transitions (or if you design them into the game explicitly), you don’t just want to pay attention to the player power curve throughout the game, but also to how it changes in each phase, and to the relative length of each phase.

For example, a common finding in game prototypes is that the endgame isn’t very interesting and is mostly a matter of just going through the motions to reach the conclusion that you already arrived at in mid-game. To fix this, you might think of adding new mechanics that come into play in the endgame to make it more interesting. Or, you might try to find ways to either extend the mid-game or shorten the endgame by adjusting your feedback loops and the positive, negative or zero-sum nature of your game during different phases.

Another common game design problem is a game that’s great once the players ramp up in mid-game, but the early game feels like it starts too slowly. One way to fix this is to add a temporary positive-sum nature to the early game in order to get the players gaining power and into the mid-game quickly.

In some games, the game is explicitly broken into phases as part of the core design. One example is the board game Shear Panic, where the scoring track is divided into four regions, and each region changes the rules for scoring which gives the game a very different feel in each of the game’s four phases. In this game, you transition between phases based on the number of turns taken in the game, so the length of each phase is dependent on how many turns each player has had. In a game like this, you could easily change the length of time spent in each phase by just increasing the length of that phase, so it lasts more turns.

Other games have less sharp transitions between different phases, and those may not be immediately obvious or explicitly designed. Chess is one example I’ve already mentioned. Another is Netrunner, an asymmetric CCG where one player (the Corporation) is trying to put cards in play and then spend actions to score the points on those cards, and the other player (the Runner) is trying to steal the points before they’re scored. After the game had been released, players at the tournament level realized that most games followed three distinct phases: the early game, when the Runner is relatively safe from harm and can try to steal as much as possible; then the mid-game, when the Corporation sets up its defenses and temporarily makes it prohibitively expensive for the Runner to steal anything; and finally the endgame, where the Runner puts together enough resources to break through the Corporation’s defenses and steal the remaining points needed for the win. Looked at this way, the Corporation is trying to enter the second phase as early in the game as possible and stretch it out for as long as possible, while the Runner is trying to stay in the first phase as long as it can; if it doesn’t win early, the Runner then tries to push from the second phase into the endgame before the Corporation has scored enough to win.

How do you balance the progression mechanics in something like this? One thing you can do, as was done with Netrunner, is to put the progression of the game under partial control of the players, so that it is the players collectively trying to push the game forward or hold it back. That creates an interesting meta-level of strategic tension.

Another thing you can do is include some mechanics that actually have some method of detecting what phase you’re in or, at the very least, that tend to work a lot better in some phases than others. Netrunner does this as well; for example, the Runner has some really expensive attack cards that aren’t so useful early on when it doesn’t have a lot of resources, but that greatly help it end the game in the final phase. In this way, as players use new strategies in each phase, the game takes on a very different feel and offers new dynamics as it progresses. And then, of course, you can use some of these mechanics specifically to adjust the length of each phase in order to make the game progress at the rate you desire. In Netrunner, the Corporation has some cheap defenses it can throw up quickly to try to transition to the mid-game early, and it also has more expensive defenses it can use to set a high bar that the Runner has to meet in order to transition to the endgame. By adjusting both the relative and absolute lengths of each phase in the game, you can make sure that the game takes about as long as you want it to, and also that it is broken up into phases that last a good amount of time relative to each other.

Ideal game length

All of this assumes you know how long the game (and each phase of the game) should take, but how do you know that? Part of it depends on target audience: young kids need short games to fit their attention span. Busy working adults want games that can be played in short, bite-sized chunks of time. Otherwise, I think it depends mainly on the level and depth of skill: more luck-based or casual games tend to be shorter, while deeper strategic games can be a bit longer. Another thing to consider is at what point a player is far enough ahead that they’ve essentially won: you want this point to happen just about the time when the game actually ends so it doesn’t drag on.

For games that never end, like MMOs or Facebook games, you can think of the elder game as a final infinite-length “phase” of the game, and you’ll want to change the length of the progression portion of your game so that the transition happens at about the time you want it to. How long that is depends on how much you want to support the progression game versus the elder game. For example, if your progression game is very different from your elder game and you see a lot of “churn” (that is, a lot of players that leave the game) when they hit the elder game, and you’re using a subscription-based model where you want players to keep their accounts for as long as possible, you’ll probably want to do two things: work on softening the transition to elder game so you lose fewer people, and also find ways of extending the early game (such as issuing expansion sets that raise the level cap, or letting players create multiple characters with different race/class combinations so they can play through the progression game multiple times).

Another interesting case is story-based RPGs, where the story often outlasts the mechanics of the game. We see this all the time with console RPGs, where it says “100 hours of gameplay” right on the box. On the surface that sounds like the game is delivering more value, but in reality, if you’re just repeating the same tired old mechanics and mindlessly grinding for 95 of those hours, all the game is really doing is wasting your time. Ideally you want the player to feel like they’re progressing through both the mechanics and the story at any given time; you don’t want the gameplay to drag on any more than you want filler plot that makes the story feel like it’s dragging. These kinds of games are challenging to design because you want to tune the length of the game to match both story and gameplay, and often that means either lengthening the story or adding more gameplay, both of which tend to be expensive in development. (You can also shorten the story or remove depth from the gameplay, but when you’ve got a really brilliant plot or really inspired mechanics it can be hard to rip that out of the game just to save a few bucks; also, in this specific case there’s often a consumer expectation that the game is pretty long to give it that “epic” feel, so the tendency is to just keep adding on one side or the other.)

Flow Theory, Revisited

With all that said, let’s come back to flow. At the start of this blog post, I said there were two problems here that needed to be solved. One is that the player skill is increasing throughout the game, which tends to shift them from being in the flow to being bored. This is mostly a problem for longer PvE games, where the player has enough time and experience in the game to genuinely get better.

The solution, as we’ve seen when we talked about PvE games, is to have the game compensate by increasing its difficulty through play in order to make the game seem more challenging – this is the essence of what designers mean when they talk about a game’s “pacing.” For PvP games, in most cases we want the better player to win, so this isn’t seen as much of a problem; however, for games where we want the less-skilled player to have a chance and the highly-skilled player to still be challenged, we can implement negative feedback loops and randomness to give an extra edge to the player who is behind.

There was another problem with flow that I mentioned, which is that you can design your game at one level of difficulty, but players come to your game with a range of initial skill levels, so what’s easy for one player is hard for another.

With PvE games, as you might guess, the de facto standard is to implement a series of difficulty levels, with higher levels granting the AI power-based bonuses or giving the player fewer power-based bonuses, because that is relatively cheap and easy to design and implement. However, I have two cautions here:

  1. If you keep using the same playtesters they will become experts at the game, and thus unable to accurately judge the true difficulty of “easy mode”; easy should mean easy, and it’s better to err on the side of making it too easy than to make it challenging enough that some players feel like they just can’t play at all. The best approach is to use a steady stream of fresh playtesters throughout the playtest phase of development (these are sometimes referred to as “Kleenex testers” because you use them once and then throw them away). If you don’t have access to that many testers, at least reserve a few of them for the very end of development, when you’re tuning the difficulty level of “easy.”
  2. Take care to set player expectations up front about higher difficulties, especially if the AI actually cheats. If the game pretends on the surface to be a fair opponent that just gets harder because it is more skilled, and then players find out that it’s actually peeking at information that’s supposed to be hidden, it can be frustrating. If you’re clear that the AI is cheating and the player chooses that difficulty level anyway, there are fewer hurt feelings: the player is expecting an unfair challenge, and the whole point is to beat that challenge anyway. Sometimes this is as simple as choosing a creative name for your highest difficulty level, like “Insane.”

There are, of course, other ways to deal with differing player skill levels. Higher difficulty levels can actually increase the skill challenge of the game instead of the power challenge. Giving enemies a higher degree of AI, as I said before, is expensive but can be really impressive if pulled off correctly. A cheaper way to do this in some games is simply to modify the design of your levels by blocking off easier alternate paths, forcing the player to go through a harder path to get to the same end location when they’re playing at higher difficulty.

Then there’s Dynamic Difficulty Adjustment (DDA), which is a specialized type of negative feedback loop where the game tries to figure out how the player is doing and then adjusts the difficulty on the fly. You have to be very careful with this, as with all negative feedback loops, because it does punish the player for doing well and some players will not appreciate that if it isn’t set up as an expectation ahead of time.
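To make this concrete, here’s a minimal sketch of what a DDA controller might look like, assuming a game that scores the player’s performance after each encounter. Every name and tuning value here is invented for illustration; this is a sketch of the technique, not any shipped game’s implementation.

```
# Hypothetical DDA controller: a negative feedback loop on player performance.
# All names and tuning constants are invented for illustration.

class DDAController:
    def __init__(self, target_performance=0.6):
        self.target = target_performance  # how well the player "should" be doing
        self.difficulty = 1.0             # multiplier applied to enemy stats

    def record_encounter(self, player_won, health_remaining):
        # Crude performance score in [0, 1]: winning with lots of health = doing well.
        score = health_remaining if player_won else 0.0
        # Negative feedback: above-target performance nudges difficulty up,
        # below-target performance nudges it down. Small steps keep it subtle.
        self.difficulty += 0.1 * (score - self.target)
        self.difficulty = max(0.5, min(2.0, self.difficulty))

    def scaled_enemy_damage(self, base_damage):
        return base_damage * self.difficulty
```

Note that the adjustment only happens between encounters and in small steps; the bigger and more frequent the correction, the more likely players are to notice they’re being punished for doing well.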

Another way to do this is to split the difference, by offering dynamic difficulty changes under player control. Like DDA, the game tries to figure out how the player is doing… but then it gives the player the option of changing the difficulty level manually. One example is the game flOw, where the player can move to the next, more challenging level or drop back to the previous, easier level at just about any time, based on how confident they are in their skills. Another example is God of War (and probably some other games as well): if you die enough times on a level, it offers you the chance to drop the difficulty on the reload screen (which some players might find patronizing, but on the other hand it also gives the player no excuse if they die again anyway). Sid Meier’s Pirates goes in the other direction, giving the player the chance to increase the difficulty when they come into port after a successful mission, and it even gives the player an incentive to do so: a higher percentage of the booty on future missions if they succeed.

The equivalent in PvP games is a handicapping system, where one player can start with more power or earn more power over the course of the game, to compensate for their lower level of skill. In most cases this should be voluntary, though; players entering a PvP contest typically expect the game to be fair by default.

Case Studies

With all of that said, let’s look at a few examples to see how we can use this to analyze games in practice.

Space Invaders (and other retro-arcade games)

This game presents the same skill-based challenge to you, wave after wave, increasing the challenge by making the aliens move and shoot faster and start lower. The player has absolutely no way to gain power in the game; you start with three lives and that’s all you get, and there are no powerups. On the other hand, you also don’t really lose power in the game, in a sense: whether you have one life remaining or all three, your offensive and defensive capabilities are the same. The player’s goal is not to win, but to survive as long as possible before the increasing challenge curve overwhelms them. Interestingly, the challenge curve also changes over the course of a single wave; early on there are a lot of aliens and they move slowly, so it’s very easy to hit a target. Later on you have fewer targets and they move faster, which makes the game more challenging, and of course if they ever reach the ground you lose all of your lives, which makes this a real threat. Then the next wave starts and it’s a little harder than the last one, but the start of each wave is still easier than the end of the one before it. (You’d think there would also be a tradeoff in that fewer aliens would have less firepower to shoot at you, but in the actual game I believe the overall rate of fire was roughly constant; only how spread out it was changed, so this didn’t actually make much difference as each round progressed.)

Chess (and other squad-based wargames)

If I had to put Chess in a genre, I’d call it a squad-based wargame… which is kind of odd, since we normally think of Chess as representing an entire army and not a squad. I mean this in the sense that you start with a set force and generally do not receive reinforcements, nor do you have any mechanics for resources, production, supply or logistics, which you tend to see in more detailed army-level games.

Here, we are dealing with a negative-sum game that actually has a mild positive-feedback loop built into it: if you’re ahead in pieces, trades tend to be beneficial to you (other things being equal), and once you reach the endgame certain positions let you basically take an automatic win if you’re far enough ahead. This can be pretty demoralizing for the player who is losing, especially if you have two players with extremely unequal skill at the game, because they will tend to start losing early and just keep losing more and more as the game goes on.

The only reason this works is that between two equally-skilled players, there does tend to be a bit of back-and-forth as players trade off pieces for board control or tempo, so a player who appears to be losing has a number of opportunities to turn that around before the endgame. Between well-matched opponents you will tend to see a variable rate of decrease as they trade pieces, based on how well they are playing, and if they play about equally well, we’ll see a game where the conclusion is uncertain until the endgame (and even then, if the players are really well-matched, we’ll see a draw).

Settlers of Catan (and other progression-based board games)

Here is a game where progression is gaining power, so it is positive-sum. There are only very limited cases where players can actually lose their progress; mostly, when you build something, that gain is permanent. Catan contains a pretty powerful positive feedback loop, in that building more settlements and cities gives you more resources which lets you build even more, and building is the primary victory condition. At first you’d think this means that the first player to get an early-game advantage automatically wins, and if players couldn’t trade with each other that would almost certainly be the case. The ability to trade freely with other players balances this aspect of the game, as trading can be mutually beneficial to both players involved in the trade; if players who are behind trade fairly with each other and refuse to trade with those in the lead at all (or only at exorbitant rates of exchange), they can catch up fairly quickly. If I were to criticize this game at all, it would be that the early game doesn’t see a lot of rapid progression because the players aren’t generating that many resources yet – and in fact, other games in the series fix this by giving players extra resources in the early game.

Mario Kart (and other racing games)

Racing games are an interesting case, because players are always progressing towards the goal of the finish line. Most racing video games include a strong negative feedback loop that keeps everyone feeling like they still have a chance, up to the end – usually through some kind of “rubber-banding” technique that causes the computer-controlled cars to speed up or slow down based on how the players are doing. Games like Mario Kart take this a step further, offering pickups that are weighted so that if you’re behind, you’re likely to get something that lets you catch up, while if you’re ahead you’ll get something less valuable, making it harder for you to fight for first. On the one hand, this provides an interesting tension: players in the lead know that they just have to keep the lead for a little bit longer, while players who are behind realize that time is running out and they have to close the gap quickly. On the other hand, the way most racing games do this feels artificial to a lot of players, because it feels like a player’s standing in the race is always being modified by factors outside their control. Since the game’s length is essentially capped by the number of laps, the players are trying to exchange positions before the race ends, so you get an interesting progression curve where players are all moving towards the end at about the same rate.

Notice that this is actually the same progression pattern as Catan: both games are positive-sum with negative feedback. And yet, it feels very different as a player. I think this is mostly because in Catan, the negative feedback is under player control, while in Mario Kart a lot of it is under computer control.

Interestingly, stock car racing in real life follows this same pattern. Auto racing has a negative feedback loop too, but it feels a lot more fair: the person in the lead is running into a bunch of air resistance, so they’re burning extra fuel to maintain their high speed, which means they need more pit stops; meanwhile, the people drafting behind them are much more fuel-efficient and can take over the lead later. This isn’t arbitrary; it’s a mechanic that affects all players equally, and it’s up to each driver how much of a risk they want to take by breaking away from the pack. So again, this is something that feels more fair because the negative feedback is under player control.

Final Fantasy (and other computer/console RPGs)

In these games, the player is mostly progressing through the game by increasing their power level more than their skill level. Older games on consoles like the NES tended to be even more based in stats and less in skill than today’s games (i.e. they required a lot more grinding than players will put up with today). Most of these games made today do give the player more abilities as they progress through the experience levels, giving them more options and letting them increase their tactical/strategic skills. Progression and rewards also come from plot advancement and reaching new areas. Usually these games are paced on a slightly increasing curve, where each area takes a little more time than the last. As we discussed in an earlier week, there’s usually a positive feedback loop in that winning enough combats lets you level up, which in turn makes it easier for you to win even more combats. This is counteracted by negative feedback: your enemies also get stronger, and you need more and more victories to level up again if you stay in the same area too long. The net result is that the actual gains are close to linear.

World of Warcraft (and other MMORPGs)

Massively multiplayer online games have a highly similar progression to CRPGs, except they then transition at the end to this elder game state, and at that point the concept of “progression” loses a lot of meaning. So our analysis looks much the same as it does with the more traditional computer RPGs, up until that point.

Nethack (and other roguelikes)

There are the so-called “rogue-like” games, which are a weird fusion of the leveling-up and stat-based progression of an RPG with the mercilessness of a retro arcade game. A single successful playthrough in Nethack looks similar to that of an RPG, with the player gaining power to meet the increasing level of challenge in the game, but actually reaching the level of skill needed to complete the game takes much, much longer. If you’ve never played these games, one thing you should know is that most of them have absolutely no problem killing you dead if you make the slightest mistake. And when I say “dead,” I mean they will literally delete your save file, permanently, and then you have to start over from scratch with a new character. So, like an arcade game, the player’s goal is to stay alive as long as possible and progress as far as possible, so progress is both a reward and a measure of skill. While there is a win condition, a lot of players simply never make it that far; keep in mind that taking a character all the way from the start to the end of the game may take dozens of hours, similar to a modern RPG, but with a ton of hard resets that collectively raise the player’s skill as they die (and then learn how not to die that way in the future).

Therefore, Nethack looks like an increasing player power and power/skill challenge over the course of a single playthrough… but over a player’s lifetime, you see these repeated increases punctuated by total restarts, with a slowly increasing player skill curve over time that lets the player survive for longer in each successive game.

FarmVille (and other social media games)

If roguelikes are the harsh sadists of the game design world, then the cute fluffy bunnies would be progression-based “social media” games like FarmVille. These are positive-sum games where you basically click to progress; it’s nearly impossible to lose any progress at all, and you’re always gaining something. More player skill simply means you progress at a faster rate. Eventually, you transition to the elder game, but in most of the games I’ve played this is a more subtle transition than with MMOs. FarmVille doesn’t have a level cap that I know of (if it does have one, it’s so ridiculously high that most people will never see it); it’ll happily let you keep earning experience and leveling up… although after a certain point you don’t really get any interesting rewards for doing so. After a while, the reward loop starts rewarding you less and less frequently, and you finish earning all the ribbons or trophies or achievements or whatever. It’s not that you can’t progress any further, but that the game doesn’t really reward you for progression as much, so at some point the player decides that further progression just isn’t worth it to them, and they either stop playing or start playing in a different way. If they start playing differently, that’s where the elder game comes in. Interestingly, the player’s actions in the elder game still do cause progression.

If You’re Working On a Game Now…

If you’re working on a game right now, and that game has progression mechanics, I want you to ask yourself some design questions about the nature of that progression:

  • What is the desired play length of this game? Why? Really challenge yourself here – could you justify a reason to make it twice as long or half as long? If your publisher (or whoever) demanded that you do so anyway, what else would need to change about the nature of the game in order to compensate?
  • Does the game actually play that long? How do you know?
  • If the game is split up into phases, areas, locations or whatever, how long are they? Do they tend to get longer over time, or are there some that are a lot longer (or shorter) than those that come immediately before or after? Is this intentional? Is it justifiable?
  • Is your game positive-sum, negative-sum, or zero-sum? Do any phases of the game have positive or negative feedback loops? How do these affect total play time?

Homework

Back in week 2, your “homework” was to analyze a progression mechanic in a game. In particular, you analyzed the progression of player power relative to the power level of the game’s challenges, over time, in order to identify weak spots where the player would either go through an area too quickly because they’re too powerful by the time they get there, or get stuck grinding because they’re behind the curve and have to catch up.

It’s time to revisit that analysis with what we now know about pacing. This week, I’d like you to analyze the reward structure in the game. Consider all kinds of rewards: power gains, level progression, plot advancement, and anything else you can identify that applies to your game:

  • Which of these is truly random (such as randomized loot drops), and which of them only seem random to the player on their first playthrough (they’re placed in a specific location by the level designer, but the player has no way of knowing ahead of time how quickly they’ll find these things)… and are there any rewards that happen on a fixed schedule that’s known to the player?
  • How often do rewards happen? Do they happen more frequently at the beginning? Are there any places where the player goes a long time with relatively few rewards? Do certain kinds of rewards seem to happen more frequently than others, or at certain times?

Now, look back again at your original predictions, where you felt that the game either went too quickly, or more likely seemed to drag on forever, at certain points (based on your gut reaction and memory). Do these points coincide with more or fewer rewards in the game? Now ask yourself if the problem could have been solved by just adding a new power-up at a certain place, instead of fixing the leveling or progression curve.

Level 6: Situational Balance

August 11, 2010

Readings/Playings

None this week (other than this blog post).

Answers to Last Week’s Question

If you want to check your answer from last week:

Analyzing Card Shuffles

For a 3-card deck, there are six distinct shuffling results, all equally likely. If the cards are A, B, and C, then these are: ABC, ACB, BAC, BCA, CAB, CBA. Thus, for a truly random shuffler, we would expect six outcomes (or a multiple of six), with each of these results being equally likely.

Analyzing Algorithm #1:

First you choose one of three cards in the bottom slot (A, B, or C). Then you choose one of the two remaining cards to go in the middle (if you already chose A for the bottom, then you would choose between B and C). Finally, the remaining card is put on top (no choice involved). These are separate, (pseudo)random, independent trials, so to count them we multiply: 3x2x1 = 6. If you actually go through the steps to enumerate all six possibilities, you’ll find they correspond to the six outcomes above. This algorithm is correct, and in fact is one of the two “standard” ways to shuffle a deck of cards. (The other algorithm is to generate a pseudorandom number for each card, then put the cards in order of their numbers. This second method is the easiest way to randomly order a list in Excel, using RAND(), RANK() and VLOOKUP().)
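For the curious, here’s roughly what those two standard approaches look like as code. This is a Python sketch of the logic, not the original Excel formulas:

```
import random

def shuffle_by_drawing(deck):
    # Algorithm #1: repeatedly pull a random card from the remaining pile.
    pile = list(deck)
    result = []
    while pile:
        result.append(pile.pop(random.randrange(len(pile))))
    return result

def shuffle_by_sort_key(deck):
    # The second method: give every card a random number, then sort by it.
    return sorted(deck, key=lambda _: random.random())
```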

Analyzing Algorithm #2:

First of all, if a single shuffle is truly random, then repeating it 50 times is not going to make it any more random, so this is just a waste of computing resources. And if the shuffle isn’t random, then repeating may or may not make it any better than before, and you’d do better to fix the underlying algorithm rather than covering it up.

What about the inner loop? First we choose one of the three cards to go on bottom, then one of the three to go in the middle, and then one of the three to go on top. As before these are separate independent trials, so we multiply 3x3x3 = 27.

Immediately we know there must be a problem, since 6 does not divide evenly into 27. Therefore, without having to go any further at all, we know that some shuffles must be more likely than others. So it would be perfectly valid to stop here and declare this algorithm “buggy.” If you’re sufficiently determined, you could actually trace through this algorithm all 27 times to figure out all outcomes, and show which shuffles are more or less likely and by how much. A competitive player, upon learning the algorithm, might actually run such a simulation for a larger deck in order to gain a slight competitive advantage.
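If you are that determined, the enumeration only takes a few lines. Here’s a sketch that assumes the inner loop is the classic “swap each position with a randomly chosen position” mistake; the exact algorithm from last week’s exercise may have differed in detail, but any variant with 27 equally-likely choice sequences will show the same kind of skew:

```
from collections import Counter
from itertools import product

counts = Counter()
for choices in product(range(3), repeat=3):  # all 27 equally-likely sequences
    deck = list("ABC")
    for i, j in enumerate(choices):
        deck[i], deck[j] = deck[j], deck[i]  # swap slot i with random slot j
    counts["".join(deck)] += 1

print(counts)  # 27 outcomes spread unevenly across the 6 possible orderings
```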

This Week

This is a special week. We spent two weeks near the start of the course talking about balancing transitive games, and then two more weeks talking about probability. This week we’re going to tie the two together, and put a nice big bow on it.

This week is about situational balancing. What is situational balancing? What I mean is that sometimes, we have things that are transitive, sort of, but their value changes over time or depends on the situation.

One example is area-effect damage. You would expect that something that does 500 damage to multiple enemies at once is more valuable than something that does 500 damage to just a single target, other things being equal. But how much more valuable is it? Well, it depends. If you’re only fighting a single enemy, one-on-one, it isn’t any more valuable. If you’re fighting fifty enemies all clustered together in a swarming mass, it’s 50x more valuable. Maybe at some points in your game you have swarms of 50 enemies, and other times you’re only fighting a single lone boss. How do you balance something like that?

Or, consider an effect that depends on what your opponent does. For example, there’s a card in Magic: the Gathering called Karma, that does 1 damage to your opponent each turn for each of their Swamps in play. Against a player who has 24 Swamps in their deck, this single card can probably kill them very dead, very fast, all on its own. Against a player with no Swamps at all, the card is totally worthless. (Well, it’s worthless unless you have other cards in your deck that can turn their lands into Swamps, in which case the value of Karma is dependent on your ability to combine it with other card effects that you may or may not draw.) In either case, the card’s ability to do damage changes from turn to turn and game to game.

Or, think of healing effects in most games, which are completely worthless if you’re fully healed already, but which can make the difference between winning and losing if you’re fighting something that’s almost dead, and you’re almost dead, and you need to squeeze one more action out of the deal to kill it before it kills you.

In each of these cases, finding the right cost on your cost curve depends on the situation within the game, which is why I call it situational balancing. So it might be balanced, or underpowered or overpowered, all depending on the context. How do we balance something that has to have a fixed cost, even though it has a benefit that changes? The short answer is, we use probability to figure out the expected value of the thing, which is why I’ve spent two weeks building up to all of this. The long answer is… it’s complicated, which is why I’m devoting an entire tenth of this course to the subject.

Playtesting: The Ultimate Solution?

There are actually a lot of different methods of situational balancing. Unfortunately, since the answer to “how valuable is it?” is always “it depends!”, the best way to approach this is thorough playtesting to figure out where various situations land on your cost curve. But as before, we don’t always have unlimited playtest budgets in the Real World, and even if we do have unlimited budgets we still have to start somewhere, so we at least need to make our best guess, and there are a few ways to do that. So “playtest, playtest, playtest” is good advice, and a simple answer, but not a complete answer.

A Simple Example: d20

Let’s start with a very simple situation. This is actually something I was asked once on a job interview (and yes, I got the job), so I know it must be useful for something.

What follows is a very, very oversimplified version of the d20 combat system, which was used in D&D 3.0 and up. Here’s how it works: each character has two stats, their Base Attack Bonus (or “BAB,” which defaults to 0) and their Armor Class (or “AC,” which defaults to 10). Each round, each character gets to make one attack against one opponent. To attack, they roll 1d20, add their BAB, and compare the total to the target’s AC. If the attacker’s total is greater or equal, they hit and do damage; otherwise, they miss and nothing further happens. So, by default with no bonuses, you should be hitting about 55% of the time.

Here’s the question: are BAB and AC balanced? That is, if I gave you an extra +1 to your attack, is that equivalent to +1 AC? Or is one of those more powerful than the other? If I were interviewing you for a job right now, what would you say? Think about it for a moment before reading on.

What’s the central resource?

Here’s my solution (yours may vary). First, I realized that I didn’t know how much damage you did, or how many hit points you had (that is, how many times could you survive being hit, and how many times would you have to hit something else to kill it). But assuming these are equal (or equivalent), it doesn’t actually matter. Whether you have to hit an enemy once or 5 times or 10 times to kill it, as long as you are equally vulnerable, you’re going to hit the enemy a certain percentage of the time. And they’re going to hit you a certain percentage of the time. What it comes down to is this: you want your hit percentage to be higher than theirs. Hit percentage is the central resource that everything has to be balanced against.

If the enemy and I both have a 5% chance of hitting each other, on average we’ll both hit each other very infrequently. If we both have a 95% chance of hitting each other, we’ll hit each other just about every turn. But either way, we’ll exchange blows at the same relative rate, so there’s no real advantage to one or the other.

Using the central resource to derive balance

So, are AC and BAB balanced? +1 BAB gives me +5% to my chance to hit, and +1 AC gives me -5% to my opponent’s chance to hit, so if I’m fighting against a single opponent on my own, one on one, the two are indeed equivalent. Either way, our relative hit percentages are changed by exactly the same amount. (One exception is if either hit percentage goes above 100% or below 0%, at which point extra plusses do nothing for you. This is probably why the default is +0 BAB, 10 AC, so that it would take a lot of bonuses and be exceedingly unlikely to ever reach that point. Let’s ignore this extreme for the time being.)

What if I’m not fighting one on one? What if my character is alone, and there are four enemies surrounding me? Now I only get to attack once for every four times the opponents attack, so +1 AC is much more powerful here, because I’m making a roll that involves my AC four times as often as I make a roll involving my BAB.

Or what if it’s the other way around, and I’m in a party of four adventurers ganging up on a lone giant? Here, assuming the giant can only attack one of us at a time, +1 BAB is more powerful because each of us is attacking every round, but only one of us is actually getting attacked.

In practice, in most D&D games, GMs are fond of putting their adventuring party in situations where they’re outnumbered; it feels more epic that way. (This is from my experience, at least.) This means that in everyday use, AC is more powerful than BAB; the two stats are not equivalent on the cost curve, even though the game’s math makes them look like they should be.

Now, as I said, this is an oversimplification; it does not reflect on the actual balance of D&D at all. But we can see something interesting even from this very simple system: the value of attacking is higher if you outnumber the opponent, and the value of defending is higher if you’re outnumbered. And if we attached numerical cost and benefit values to hit percentage, we could even calculate how much more powerful these values are, as a function of how much you outnumber or are outnumbered.
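Here’s a quick sketch of that calculation, using expected hits exchanged per round as the measuring stick (this is my framing of the problem, not anything official from d20):

```
def hit_chance(attack_bonus, armor_class):
    # 1d20 + bonus >= AC hits; clamp for the extreme cases mentioned above.
    return max(0.0, min(1.0, (21 + attack_bonus - armor_class) / 20))

def net_exchange(my_bab, my_ac, enemies, enemy_bab=0, enemy_ac=10):
    # Expected (hits I land) - (hits I take) per round, if I attack one foe
    # per round and every foe attacks me.
    return hit_chance(my_bab, enemy_ac) - enemies * hit_chance(enemy_bab, my_ac)

# One-on-one, +1 BAB and +1 AC are interchangeable:
print(net_exchange(1, 10, enemies=1))  # 0.60 - 0.55 = +0.05
print(net_exchange(0, 11, enemies=1))  # 0.55 - 0.50 = +0.05
# Outnumbered 4-to-1, the AC bonus gets "rolled against" four times as often:
print(net_exchange(1, 10, enemies=4))  # 0.60 - 4*0.55 = -1.60
print(net_exchange(0, 11, enemies=4))  # 0.55 - 4*0.50 = -1.45
```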

Implications for Game Design

If you’re designing a game where you know what the player will encounter ahead of time – say, an FPS or RPG with hand-designed levels – then you can use your knowledge of the upcoming challenges to balance your stats. In our simplified d20 system, for example, if you know that the player is mostly fighting combats where they’re outnumbered, you can change AC on your cost curve to be more valuable and thus more costly.

Another thing you can do, if you wanted AC and BAB to be equivalent and balanced with each other, is to change the mix of encounters in your game so that the player is outnumbered about half the time, and they outnumber the enemy about half the time. Aside from making your stats more balanced, this also adds some replay value to your game: going through the game with high BAB is going to give a very different experience than going through the game with a high AC; in each case, some encounters are going to be a lot harder than others, giving the player a different perspective and different level of challenge in each encounter.

The Cost of Switching

What if D&D worked in such a way that you could freely convert AC to BAB at the start of a combat, and vice versa? Now all of a sudden they are more or less equivalent to each other, and suddenly a +1 bonus to either one is much more powerful and versatile relative to any other bonuses in the rest of the game.

Okay, maybe you can’t actually do that in D&D, but there are plenty of games where you can swap out one situational thing for another. First-Person Shooters are a common example, where you might be carrying several weapons at a time: maybe a rocket launcher against big slow targets or clusters of enemies, a sniper rifle to use at a distance against single targets, and a knife for close-quarters combat. Each of these weapons is situationally useful some of the time, but as long as you can switch from one to another with minimal delay, it’s the sum of weapon capabilities that matters rather than individual weapon limitations.

That said, suppose we made the cost of switching weapons higher, maybe a 10-second delay to put one weapon in your pack and take out another (which when you think about it, seems a lot more realistic – I mean, seriously, if you’re carrying ten heavy firearms around with you and can switch without delay, where exactly are you carrying them all?). Now all of a sudden the limitations of individual weapons play a much greater role, and a single general-purpose weapon may end up becoming more powerful than a smorgasbord of situational weapons. But if instead you have instant real-time weapon change, a pile of weapons where each is the perfect tool for a single situation is much better than a single jack-of-all-trades, master-of-none weapon.

What’s the lesson here? We can mess with the situational balance of a game simply by modifying the cost of switching between different tools, weapons, stat distributions, or overall strategies.

That’s fine as a general theory, but how do the actual numbers work? Let’s see…

Example: Inability to Switch

Let’s take one extreme case, where you can’t switch strategies at all. An example might be an RPG where you’re only allowed to carry one weapon and equip one armor at a time, and whenever you acquire a new one it automatically gets rid of the old. Here, the calculation is pretty straightforward, because this is your only option, so we have to look at it across all situations. It’s a lot like an expected value calculation.

So, you ask: in what situations does this object have a greater or lesser value, and by how much? How often do you encounter those situations? Multiply and add it all together.

Here’s a simple, contrived example to illustrate the math: suppose you have a sword that does double damage against Dragons. Suppose 10% of the meaningful combats in your game are against Dragons. Let’s assume that in this game, damage has a linear relationship to your cost curve, so doubling the damage of something makes it exactly twice as good.

So, 90% of the time the sword is normal, 10% of the time it’s twice as good. 90%*1.0 + 10%*2.0 = 110% of the cost. So in this case, “double damage against dragons” is a +10% modifier to the base cost.

Here’s another example: you have a sword that is 1.5x as powerful as the other swords in its class, but it only does half damage against Trolls. And let’s further assume that half damage is actually a huge liability; it takes away your primary way to do damage, so you have to rely on other sources that are less efficient, and it greatly increases the chance you’re going to get yourself very killed if you run into a troll at a bad time. So in this case, let’s say that “half damage” actually makes the sword a net negative. But let’s also say that trolls are pretty rare, maybe only 5% of the encounters in the game are against trolls.

So if a typical sword at this level has a benefit of 100 (according to your existing cost curve), a 1.5x powerful sword would have a benefit of 150, and maybe a sword that doesn’t work actually has a cost of 250, because it’s just that deadly to get caught with your sword down, so to speak. The math says: 95%*150 + 5%*(-250) = 130. So this sword has a benefit of 130, or 30% more than a typical sword.
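Both of those swords are priced with the same expected-value pattern, which is short enough to express directly (using the made-up percentages and benefit numbers from above):

```
def situational_benefit(situations):
    # situations: list of (probability, benefit in that situation)
    return sum(p * benefit for p, benefit in situations)

# Double damage vs. Dragons (10% of fights), on a base benefit of 100:
print(situational_benefit([(0.90, 100), (0.10, 200)]))   # 110.0
# 1.5x sword that's a serious liability vs. Trolls (5% of fights):
print(situational_benefit([(0.95, 150), (0.05, -250)]))  # 130.0
```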

Again, you can see that there are actually a lot of ways you can change this, a lot of design “knobs” you can turn to mess with the balance here. You can obviously change the cost and benefit of an object, maybe adjusting the damage in those special situations to make it better or worse when you have those rare cases where it matters, or adjusting the base abilities to cover every other situation, just as you normally would with transitive mechanics. But with situational balance, you can also change the frequency of situations, say by increasing the number of trolls or dragons the player encounters – either in the entire game, or just in the area surrounding the place where they’d get that special sword. (After all, if the player is only expected to use this sword that loses to trolls in one region in the game that has no trolls, even if the rest of the game is covered in trolls, it’s not really much of a drawback, is it?)

Another Example: No-Cost Switching

Now let’s take the other extreme, where you can carry as many situational objects as you want and use or switch between them freely. In this case, the limitations don’t matter nearly as much as the strengths of each object, because there is no opportunity cost to gaining a new capability. In this case, we look at the benefits of all of the player’s objects collected so far, and figure out what this new object will add that can’t be done better by something else. Multiply the extra benefit by the percentage of time that the benefit is used, and there is your additional benefit from the new object. So it is a similar calculation, except in most cases we ignore the bad parts, because you can just switch away from those.

In practice, it’s usually not that simple. In a lot of games, the player may be able to use suboptimal strategies if they haven’t acquired exactly the right thing for this one situation (in fact, it’s probably better for most games to be designed that way). Also, the player may pick up new objects in a different order on each playthrough. End result: you don’t actually know how often something will be used, because it might be used more or less often depending on what other tools the player has already acquired, and also how likely they are to use this new toy in situations where it’s not perfect (they haven’t got the perfect toy for that situation yet) but it’s at least better than their other toys.

Let’s take an example. Maybe you have a variety of swords, each of which does major extra damage against a specific type of monster, a slight bump in damage against a second type of monster, and is completely ineffective against a third type of monster. Suppose there are ten of these swords, and ten monster types in your game, and the monster types are all about as powerful and encountered about as frequently. It doesn’t take a mathematician to guess that these swords should all cost the same.

However, we run into a problem. Playing through this game, we would quickly realize that they do not actually give equal value to the player at any given point in time.

For example, say I’ve purchased a sword that does double damage against Dragons and 1.5x damage against Trolls. Now there’s a sword out there that does double damage against Trolls, but that sword is no longer quite as useful to me as it used to be; I’m now going from a 1.5x multiplier to 2x, not 1x to 2x, so there’s less of a gain there. If I fully optimize, I can probably buy about half the swords in the game and have at least some kind of improved multiplier against most or all monsters, and from that point, extra swords have diminishing returns.
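Here’s a sketch of the marginal-value calculation the player is implicitly doing in that situation; the monster names and multipliers are the hypothetical ones from this example:

```
def marginal_value(owned, new_sword, encounter_rates):
    # owned / new_sword: dicts of monster -> damage multiplier (default 1.0).
    # A new sword is only worth the improvement over what you already have.
    gain = 0.0
    for monster, rate in encounter_rates.items():
        current = owned.get(monster, 1.0)
        offered = new_sword.get(monster, 1.0)
        gain += rate * max(0.0, offered - current)
    return gain

rates = {"Dragon": 0.1, "Troll": 0.1}     # ten monster types, 10% each
my_sword = {"Dragon": 2.0, "Troll": 1.5}  # what I already own
troll_sword = {"Troll": 2.0, "Dragon": 1.5}

print(marginal_value({}, troll_sword, rates))        # 0.15 with no swords yet
print(marginal_value(my_sword, troll_sword, rates))  # 0.05 once I own the first
```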

How do you balance a system like that? There are a few methods for this that I could think of, and probably a few more I couldn’t. It all depends on what’s right for your game.

  • Give a discount: One way is to actually change costs on the fly. Work it into your narrative that the more swords you buy from this merchant, the more he discounts additional swords because you’re such a good customer (you could even give the player a “customer loyalty card” in the game and have the merchant put stamps on it; some players love that kind of thing).
  • Let the player decide: Or, you could balance everything assuming the player has nothing, which means that yes, there will be a law of diminishing returns here, and it’s up to the player to decide how many is enough; consider that part of the strategy of the game.
  • Let the increasing money curve do the “discount” work for you: Maybe if the player is getting progressively more money over time, keeping the costs constant will itself be a “diminishing” cost to compensate, since each sword takes the player less time to earn enough to buy it. Tricky!
  • Discount swords found later in the game: Or, you can spread out the locations in the game where the player gets these swords, so that you know they’ll probably buy certain ones earlier in the game and other ones later. You can then cost them differently because you know that when the player finds certain swords, they’ll already have access to other ones, and you can reduce the costs of the newer ones accordingly.

Obviously, for games where you can switch between objects but there’s some cost to switching (a time cost, a money cost, or whatever), you’ll use a method that lies somewhere between the “can’t change at all” and “can change freely and instantly” extreme scenarios.

The Cost of Versatility

Now we’ve touched on this concept of versatility from a player perspective: the player buying multiple items in the game that make their character more versatile and able to handle more situations. What about when the objects themselves are versatile? This happens a lot in real-time and turn-based strategy games, for example, where individual units may have several functions. So, maybe archers are really strong against fliers and really weak against footmen (a common RTS design pattern), but maybe you want to make a new unit type that is strong against both fliers and footmen, just not as strong against fliers as archers are. So maybe an archer can take down a flier and take next to no damage, while this new unit would go down to about half HP in combat with a flier (it would win, but at a cost). This new unit isn’t as good against fliers, but it is good for other things, so it’s more versatile.

Taking another example, in a first-person shooter, knives and swords are usually the best weapons when you’re standing next to an opponent, while sniper rifles are great from a distance, but a machine gun is moderately useful at most ranges (but not quite as good as anything else). So you’ll never get caught with a totally ineffective weapon if you’ve got a machine gun, but you’ll also never have the perfect weapon for the job if you’re operating at far or close range a lot.

How much are these kinds of versatility worth?

Here’s the key: versatility has value in direct proportion to uncertainty. If you know ahead of time you’re playing on a small map with tight corridors and lots of twists and turns, knives are going to be a lot more useful than sniper rifles. On a map with large, open spaces, it’s the other way around. If you have a single map with some tight spaces and some open areas, a versatile weapon that can serve both roles (even if only mediocre in each) is much more valuable.

Suppose instead you have a random map, so there’s a 50/50 chance of getting either a map optimized for close quarters or a map optimized for distance attacks. Now what’s the best strategy? Taking the versatile weapon that’s mildly useful in each case but not as good as the best weapon means you’ll win against people who guessed poorly and lose against people who guessed well. There is no best strategy here; it’s a random guess. This kind of choice is actually not very interesting: the players must choose blindly ahead of time, and then most of the game comes down to who guessed right. Unless they’re given a mechanism for changing weapons during play in order to adjust to the map, or they can take multiple weapons with them, or something – ah, so we come back to the fact that versatility comes in two flavors:

  • The ability of an individual game object to be useful in multiple situations
  • The ability of the player to swap out one game object for another.

The more easily a player can exchange game objects, the less valuable versatility in a single object becomes.

Shadow Costs

Now, before we move on to some in-depth examples, I want to write a little bit about different kinds of costs that a game object can have. Strictly speaking, I should have talked about this when we were originally discussing cost curves, but in practice these seem to come up more often in situational balancing than in other areas, so I’m bringing it up now.

Broadly speaking, we can split the cost of an object into two categories: the resource cost, and Everything Else. If you remember, when doing cost curves I generally said that any kind of drawback or limitation is also a cost, and that’s what I’m talking about here. Economists call these shadow costs; that is, they are costs hidden behind the dollar cost. If you buy a cheap clock radio for $10, there is an additional cost in time (and transportation) to go out and buy the thing. If it doesn’t go off one morning when you really need it to, because the UI is poorly designed and you set it for PM instead of AM, then missing an appointment because of that poor design costs you additional time and money. If it then breaks in a few months because of its cheap components and you have to replace or return it, that is an extra time cost, and so on… so it looks like it costs $10, but the actual cost is more, because it has these shadow costs that a better-quality clock radio might not have.

In games, there are two kinds of shadow costs that seem to come up a lot in situational balance: sunk costs and opportunity costs. Let me explain each.

Sunk Costs

By sunk costs, I’m talking about some kind of setup cost that has to be paid first, before you gain access to the thing you want to buy in the first place. One place you commonly see these is in tech trees in RTSs, MMOs and RPGs. For example, in an RTS, in order to build certain kinds of units, you first typically need to build a structure that supports them. The structure may not do anything practical or useful for you, other than allowing you to build a special kind of unit. As an example, each Dragoon unit in StarCraft costs 125 minerals and 50 gas (that is its listed cost), but you had to build a Cybernetics Core to build Dragoons and that cost 200 minerals, and that cost is in addition to each Dragoon. Oh, and by the way, you can’t build a Cybernetics Core without also building a Gateway for 150 minerals, so that’s part of the cost as well. So if you build all these structures, use them for nothing else, and then create a single Dragoon, that one guy costs you a total of 475 minerals and 50 gas, which is a pretty huge cost compared to the listed cost of the unit itself!

Of course, if you build ten Dragoons, then the cost of each is reduced to 160 minerals and 50 gas, a lot closer to the listed cost, because you only have to pay the build cost for those buildings once (well, in most cases anyway). And if you get additional benefits from those buildings, like letting you build other kinds of units or structures or upgrades that you take advantage of, then effectively part of the cost of those buildings goes to other things, so you can consider it to not even be part of the Dragoon’s cost.

But still, you can see that if you have to pay some kind of cost just for the privilege of paying an additional cost, you need to be careful to factor that into your analysis. When the cost may be “amortized” (spread out) over multiple purchases, the original sunk cost has to be balanced based on its expected value: how many Dragoons do you expect to build in typical play? When costing Dragoons, you need to factor in the up-front costs as well.
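The amortization itself is a one-line function; here it is with the StarCraft numbers from above (mineral costs only, since the 50 gas per Dragoon is unaffected):

```
def effective_mineral_cost(num_dragoons, unit_cost=125, prereqs=200 + 150):
    # prereqs: Cybernetics Core (200) + Gateway (150), paid once.
    return unit_cost + prereqs / num_dragoons

print(effective_mineral_cost(1))   # 475.0 minerals for a lone Dragoon
print(effective_mineral_cost(10))  # 160.0 minerals each across ten
```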

You can also look at this the other way, if you’re costing the prerequisite (such as those structures you had to build in order to buy Dragoon units): not just “what does this do for me now” but also “what kinds of options does this enable in the future”? You tend to see this a lot in tech trees. For example, in some RPGs or MMOs with tech trees, you might see some special abilities you can purchase on level-up that aren’t particularly useful on their own, maybe they’re even completely worthless… but they’re prerequisites for some really powerful abilities you can get later. This can lead to interesting kinds of short-term/long-term decisions, where you could take a powerful ability now, or a less powerful ability now to get a really powerful ability later.

You can see sunk costs in other kinds of games, too. I’ve seen some RPGs where the player has a choice between paying for consumable or reusable items. The consumables are much cheaper, of course, but you only get to use them once. So for example, you can either buy a Potion for 50 Gold, or a Potion Making Machine for 500 Gold, and in that case you’d buy the machine if you expect to create more than ten Potions. Or you pay 10 Gold for a one-way ticket on a ferry, or 50 Gold for a lifetime pass, and you have to ask yourself whether you expect to ride the ferry more than five times. Or you consider purchasing a Shop Discount Card which gives 10% off all future purchases, but costs 1000 Gold up front, so you have to consider whether you’ll spend enough at that shop to make the discount card pay for itself (come to think of it, the choice to buy one of those discount cards at the real-world GameStop down the street requires a similar calculation). These kinds of choices aren’t always that interesting, because you’re basically asking the player to estimate how many times they’ll use something… but without telling them how much longer the game is or how many times they can expect to use the reusable thing, so it’s a kind of blind decision. Still, as designers, we know the answer, and we can do our own expected-value calculation and balance accordingly. If we do it right, our players will trust that the cost is in line with the value by the time they have to make the buy-or-not decision in our games.
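The break-even arithmetic for all three of those purchases is the same division; a trivial sketch, using the made-up prices above:

```
def break_even(up_front_cost, savings_per_use):
    # How many uses (or how much spending) before the up-front cost pays off.
    return up_front_cost / savings_per_use

print(break_even(500, 50))     # Potion Machine: worth it past 10 potions
print(break_even(50, 10))      # lifetime ferry pass: worth it past 5 rides
print(break_even(1000, 0.10))  # 10% discount card: pays off after 10,000 Gold spent
```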

Opportunity Costs

The second type of hidden cost, which I’m calling an opportunity cost here, is the cost of giving up something else, reducing your versatility. An example, also from games with tech trees, might be if you reach a point where you have to choose one branch of the tech tree, and if you take a certain feat or learn a certain tech or whatever, it prevents you from learning something else. If you learn Fire magic, you’re immediately locked out of all the Ice spells, and vice versa. This happens in questing systems, too: if you don’t blow up Megaton, you don’t get the Tenpenny Tower quest. These can even happen in tabletop games: one CCG I worked on had mostly neutral cards, but a few that were “good guy” cards and a few that were “bad guy” cards, and if you played any “good guy” cards it prevented you from playing “bad guy” cards for the rest of the game (and vice versa), so any given deck basically had to only use good or bad but not both. Basically, any situation where taking an action in the game prevents you from taking certain other actions later on, is an opportunity cost.

In this case, your action has a special kind of shadow cost: in addition to the cost of taking the action right now, you also pay a cost later in decreased versatility (not just resources). It adds a constraint to the player. How much is that constraint worth as a cost? Well, that’s up to you to figure out for your particular game situation. But remember that it’s not zero, and be sure to factor this into your cost curve analysis.

Versatility Example

How do the numbers for versatility actually work in practice? That depends on the nature of the versatility and the cost and difficulty of switching.

Here’s a contrived example: you’re going into a PvP arena, and you know that your next opponent either has an Ice attack or a Fire attack, but never both and never neither – always one or the other. You can buy a Protection From Ice enchantment which gives you protection from Ice attacks, or a Protection From Fire enchantment which gives you protection from Fire attacks (or both, if you want to be sure, although that’s kind of expensive). Let’s say both enchantments cost 10 Gold each.

Now, suppose we offer a new item, Protection From Elements, which gives you both enchantments as a package deal. How much should it cost? (“It depends!”) Okay, what does it depend on?

If you’ve been paying attention, you know the answer: it depends on how much you know about your next opponent up front, and it depends on the cost of switching from one to the other if you change your mind later.

If you know ahead of time that they will be, say, a Fire attack, then the package should cost the same as Protection from Fire: 10 Gold. The “versatility” here offers no added value, because you already know the optimal choice.

If you have no way of knowing your next opponent’s attack type until it’s too late to do anything about it, and you can’t switch protections once you enter the arena, then Protection From Elements should cost 20 Gold, the same as buying both. Here, versatility offers you exactly the same added value as just buying both things individually. There’s no in-game difference between buying them separately or together.

Ah, but what if you have the option to buy one before the combat, and then if the combat starts and you realize you guessed wrong, you can immediately call a time-out and buy the other one? In this case, you would normally spend 10 Gold right away, and there’s a 50% chance you’ll guess right and only have to spend 10 Gold, and a 50% chance you’ll guess wrong and have to spend an additional 10 Gold (or 20 Gold total) to buy the other one. The expected value here is (50%*10) + (50%*20) = 15 Gold, so that is what the combined package should cost in this case.

What if the game is partly predictable? Say you may have some idea of whether your opponent will use Fire or Ice attacks, but you’re not completely sure. Then the optimal cost for the package will be somewhere between the extremes, depending on exactly how sure you are.
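Here’s a sketch covering that whole continuum, parameterized by how likely the player is to guess right up front, and assuming they can buy the other enchantment mid-fight on a wrong guess (the time-out scenario above):

```
def package_value(p_guess_right, single_cost=10):
    # Expected spend: buy one enchantment now, buy the other only if wrong.
    return p_guess_right * single_cost + (1 - p_guess_right) * 2 * single_cost

print(package_value(1.0))  # 10.0 Gold: perfect information, the bundle adds nothing
print(package_value(0.5))  # 15.0 Gold: the blind coin-flip case from the text
print(package_value(0.8))  # 12.0 Gold: partial information about the opponent
# With no mid-fight switching allowed at all, the package is simply worth
# both enchantments: 20 Gold.
```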

Okay, so that last one sounds kind of strange as a design. What might be a situation in a real game where you have some idea but not a complete idea of what your opponent is bringing against you? As one example, in an RTS, I might see some parts of the army my opponent is fielding against me, so that gives me a partial (but not complete) sense of what I’m up against, and I can choose what units to build accordingly. Here, a unit that is versatile offers some value (my opponent might have some tricks up their sleeve that I don’t know about yet) but not complete value (I do know SOME of what the opponent has, so there’s also value in building troops that are strong against their existing army).

Case Studies in Situational Balance

So, with all of that said, let’s look at some common case studies.

Single-target versus area-effect (AoE)

For things that do damage to multiple targets instead of just one at a time, other things being equal, how much of a benefit is that splash damage?

The answer, generally: take the expected number of things you’ll hit, and multiply. So, if enemies come in clusters of 1 to 3 in the game, evenly distributed, then on average you’ll hit 2 enemies per attack, doing twice the damage you would normally, so splash damage is twice the benefit.
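As a quick sketch, this is just an expected-value computation over cluster sizes. The cluster distribution here is hypothetical playtest data matching the example above:

    # A quick sketch: the AoE benefit multiplier is the expected number of
    # enemies caught in one splash radius (hypothetical cluster data).
    cluster_odds = {1: 1/3, 2: 1/3, 3: 1/3}  # cluster size -> probability
    expected_hits = sum(size * p for size, p in cluster_odds.items())
    print(expected_hits)  # 2.0, so splash damage is worth about 2x single-target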

A word of warning: “other things being equal” is really tricky here, because generally other things aren’t equal in this case. For example, in most games, enemies don’t lose offensive capability until they’re completely defeated, so doing partial damage isn’t as important as doing lethal damage. In this case, spreading out the damage slowly and evenly can be less efficient than using single-target high-power shots to selectively take out one enemy at a time, since the latter reduces the offensive power of the enemy force with each shot, while the area-effect attack doesn’t do that for a while. Also, if the enemies you’re shooting at have varying amounts of HP, an area-effect attack might kill some of them off but not all, reducing a cluster of enemies to a few corpses and a smaller group (or a lone enemy), which then reduces the total damage output of your subsequent AoE attacks – that is, AoE actually makes itself weaker over time as it starts working! So this is something you have to be careful of as well: look at typical encounters, how often enemies will be clustered together, and how long they’ll stay that way throughout the encounter.

Attacks that are strong (or weak) against a specific enemy type

We did an example of this before, with dragons and trolls. Multiply the extra benefit (or liability) as if it were always in effect during all encounters, by the expected percentage of the time it actually will matter (that is, how often do you encounter the relevant type of enemy).

The trick here, as we saw in that earlier example, is you have to be very careful of what the extra benefit or liability is really worth, because something like “double damage” or “half damage” is rarely double or half the actual value.

Metagame objects that you can choose to use or ignore

Sometimes you have an object that’s sometimes useful and sometimes not, but it’s at the player’s discretion whether to use it – so if the situation doesn’t call for it, they simply don’t spend the resources.

Examples are situational weapons in an FPS that can be carried as an “alternate” weapon, specialized units in an RTS that can be built when needed and ignored otherwise, or situational cards in a CCG that can be “sideboarded” against relevant opponents. Note that in these cases, they are objects that depend on things outside of player control: what random map you’re playing on, what units your opponent is building, what cards are in your opponent’s deck.

In these cases, it’s tempting to cost them according to the likelihood that they will be useful. For example, if I have a card that does 10 damage against a player who is playing Red in Magic, and I know that most decks are 2 or 3 colors so maybe 40-50% of the time I’ll play against Red in open play, then we would cost this the same as a card that did 4 or 5 damage against everyone. If the player must choose to use it or not before play begins, with no knowledge of whether the opponent is playing Red or not, this would be a good method.

But in some of these cases, you do know what your opponent is doing. In a Magic tournament, after the first game of a best-of-3 match, you are allowed to swap some cards into your deck from a “Sideboard”. You could put this 10-damage-to-Red card aside, not play with it in your first game, and then bring it out in subsequent games only if your opponent is playing Red. Played this way, you are virtually assured that the card will work 100% of the time; the only cost to you is a discretionary card slot in your sideboard, which is a metagame cost. As we learned a few weeks ago, trying to cost something in the game based on the metagame is really tricky. So the best we can say is that it should cost a little less to compensate for the metagame cost, but it probably shouldn’t be half off like it would be if the player always had to use it… unless we want to really encourage its use as a sideboard card by intentionally undercosting it.

Likewise with a specialized RTS unit, assuming it costs you nothing to earn the capability of building it. If it’s useless most of the time, you lose nothing by simply not exercising your option to build it. But when it is useful, you will build it, and you’ll know that it is useful in that case. So again, it should be costed with the assumption that whatever situation it’s built for, actually happens 100% of the time. (If you must pay extra for the versatility of being able to build the situational unit in the first place, that cost is what you’d want to adjust based on a realistic percentage of the time that such a situation is encountered in real play.)

With an alternate weapon in an FPS, a lot depends on exactly how the game is structured. If the weapons are all free (no “resource cost”) but you can only select one main and one alternate, then you need to make sure the alternates are balanced against each other: each one should be useful in equally likely situations, or at least the situational benefit multiplied by the expected probability of receiving that benefit should come out the same across all weapons. So you might have a weapon that’s the most powerful in the game but only in a really rare situation, versus a weapon that’s mediocre but usable just about anywhere, and you can call those balanced if the numbers come out right.

Metagame “combos”

Now, we just talked about situations where the player has no control. But what if they do have control… that is, if something isn’t particularly useful on its own, but it forms a powerful combo with something else? An example would be dual-wielding in an FPS, “support” character classes in an MMO or multiplayer FPS, situational cards that you build your deck around in a CCG, support towers in a Tower Defense game that only improve the towers next to them, and so on. These are situational in a different way: they reward the player for playing the metagame in a certain way.

To understand how to balance these, we first have to return to the concept of opportunity costs from earlier. In this case, we have a metagame opportunity cost: you have to take some other action in the game completely apart from the thing we’re trying to balance, in order to make that thing useful. There are a few ways we could go about balancing things like this, depending on the situation.

One is to take the combo in aggregate and balance that, then try to divide that up among the individual components based on how useful they are outside of the combo. For example, Magic had two cards, Lich and Mirror Universe:

  • Lich reduced you to zero life points, but added additional rules that effectively turned your cards into your life – this card on its own was incredibly risky, because if it ever left play you would still have zero life, and thus lose the game immediately! Even without that risk, it was of questionable value, because it basically just helped you out if you were losing, and cards that are the most useful when you’re losing mean that you’re playing to lose, which isn’t generally a winning strategy.
  • Mirror Universe was a card that would swap life totals with your opponent – not as risky as Lich since you’re in control of when to use it, but still only useful when you’re losing and not particularly easy to use effectively.
  • But combined… the two cards, if uncountered, immediately win you the game by reducing your life to zero and then swapping totals with your opponent: an instant win!

How do you cost this? Now, this is a pretty extreme example, where two cards are individually all but useless, don’t really work that well in any other context, but combined with each other are all-powerful. The best answer for a situation like this might be to err on the side of making their combined cost equal to a similarly powerful game-winning effect, perhaps marked down slightly because it requires a two-card combination (which is harder to draw than just playing a single card). How do you split the cost between them – should one be cheap and the other expensive, or should they both be about the same? Weight them according to their relative usefulness. Lich does provide some other benefits (like drawing cards as an effect when you gain life) but with a pretty nasty drawback. Mirror Universe has no drawback, and a kind of psychological benefit that your opponent might hold off attacking you because they don’t want to almost kill you, then have you use it and plink them to death. These are hard to balance against one another directly, but looking at what actually happened in the game, their costs are comparable.

How about a slightly less extreme example? A support character class in an MMO can offer lots of healing and attribute bonuses that help the rest of their team. On their own they do have some non-zero value (they can always attack an enemy directly if they have to, if they can heal and buff themselves they might even be reasonably good at it, and in any case they’re still a warm body that can distract enemies by giving them something else to shoot at). But their true value shows up in a group, where they can take the best members of a group and make them better. How do you balance something like this?

Let’s take a simple example. Suppose your support character has a special ability that increases a single ally’s attack value by 10%, and that they can only have one of these active at a time, and this is part of their tech tree; you want to find the expected benefit of that ability so you can come up with an appropriate cost. To figure out what that’s worth, we might assume a group of adventurers of similar level, and find the character class in that group with the highest attack value, and find our expected attack value for that character class. In a group, this “attack buff” support ability would be worth about 10% of that value. Obviously it would be less useful if questing solo, or with a group that doesn’t have any good attackers, so you’d have to figure the percentage of time that you can expect this kind of support character to be traveling with a group where this attack boost is useful, and factor that into your numbers. In this case, the opportunity cost for including an attacker in your party is pretty low (most groups will have at least one of those anyway), so this support ability is almost always going to be operating at its highest level of effectiveness, and you can balance it accordingly.

What do the Lich/Mirror Universe example and the support class example have in common? When dealing with situational effects that players have control over, a rule of thumb is to figure out the opportunity cost the player pays to set up that situation, and factor that in as a “cost” to counteract the added situational benefit. Beyond that, the cost should be computed under best-case conditions, not some kind of “average” case: if the players are in control of whether they use each part of the combo, we can assume they’re going to use it under optimal conditions.

Multi-class characters

As long as we’re on the subject of character classes, how about “multi-class” characters that are found in many tabletop RPGs? The common pattern is that you gain versatility, in that you have access to the unique specialties of several character types… but in exchange for that, you tend to be lower level and less powerful in all of those types than if you were dedicated to a single class. How much less powerful do you have to be so that multi-classing feels like a viable choice (not too weak), but not one that’s so overpowered that there’s no reason to single-class?

This is a versatility problem. The player typically doesn’t know what kinds of situations their character will be in ahead of time, so they’re trying to prepare for a little of everything. After all, if they knew exactly what to expect, they would pick the most effective single character class and ignore the others! However, they probably do have some basic idea of what they’re going to encounter, or at least what capabilities their party is going to need that are not yet accounted for, so a Level 5 Fighter/Thief is probably not as good as a Level 10 Fighter or Level 10 Thief. Since the player must choose ahead of time what they want and they typically can’t change their class in the middle of a quest, they are more constrained, so you’ll probably do well with a starting guess of making a single class 1.5x as powerful as multi-class and then adjusting downward from there as needed. That is, a Level 10 single-class is usually about as powerful as a Level 7 or 8 dual-class.

Either-or choices from a single game object

Sometimes you have a single object that can do one thing or another, player’s choice, but not both (the object is typically either used up or irreversibly converted as part of the choice). Maybe you have a card in a CCG that can bring a creature into play or make an existing one bigger. Or you have a lump of metal in an RPG that can be fashioned into a great suit of armor or a powerful weapon. Or you’re given a choice to upgrade one of your weapons in an FPS. In these kinds of cases, assuming the player knows the value of the things they’ll get (but they can only choose one), the actual benefit is probably going to be more than either option individually, but less than all choices combined, depending on the situation. What does it depend on? This is a versatility problem, so it depends on the raw benefit of each choice, the cost/difficulty of changing their strategy in mid-game, and the foreknowledge of the player regarding the challenges that are coming up later.

The Difference Between PvE and PvP

Designing PvE games (“Player versus Environment,” where it’s one or more players cooperating against the computer, the system, the AI or whatever) is different than PvP games (“Player versus Player,” where players are in direct conflict with each other) when it comes to situational balance.

PvE games are much easier. As the game’s designer, you’re designing the environment, you’re designing the levels, you’re designing the AI. You already know what is “typical” or “expected” in terms of player encounters. Even in games with procedurally-generated content where you don’t know exactly what the player will encounter, you know the algorithms that generate it (you designed them, after all) so you can figure out the expected probability that the content generator will spit out certain kinds of encounters, and within what range.

Because of this, you can do expected-value calculations for PvE games pretty easily to come up with at least a good initial guess for your costs and benefits when you’re dealing with the situational parts of your game.

PvP is a little trickier, because players can vary their strategies. “Expected value” doesn’t really have meaning when you don’t know what to expect from your opponent. In these cases, playtesting and metrics are the best methods we have for determining typical use, and that’s something we’ll discuss in more detail over the next couple of weeks.

If You’re Working on a Game Now…

Choose one object in your game that’s been giving you trouble, something that seems like it’s always either too good or too weak, and which has some kind of conditional or situational nature to it. (Since situational effects are some of the trickiest to balance, if something has been giving you trouble, it’s probably in that category anyway.)

First, do a thorough search for any shadow costs you may have. What opportunities or versatility do you have to give up in order to gain this object’s capabilities? What other things do you have to acquire first before you even have the option of acquiring this object? Ask yourself what those additional costs are worth, and whether they are factored into the object’s resource cost.

Next, consider the versatility of the object itself. Is it something that’s useful in a wide variety of situations, or only rarely? How much control does the player have over their situation – that is, if the object is only useful in certain situations, can the player do anything to make those situations more likely, thus increasing the object’s expected value?

How easy is it for the player to change their mind? (This is the versatility of the player versus the versatility of the object; a more versatile player reduces the value of object-based versatility.) If the player takes this object but then wants to replace it with something else, or use other objects or strategies when they need to, is that even possible… and if so, is it easy, or is there a noticeable cost to doing so? How much of a liability is it if the player is stuck in a situation where the object isn’t useful? Now, consider how the versatility of the game’s systems and the versatility of the individual objects should affect their benefits and costs.

See if looking at that object in a new way has helped to explain why it felt too weak or too powerful. Does this give you more insight into other objects as well, or the game’s systems overall?

Homework

For your “homework”, we’re going to look at Desktop Tower Defense 1.5, which was one of the games that popularized the genre of tower defense games. (I’ll suggest you don’t actually play it unless you absolutely have to, because it is obnoxiously addicting and you can lose a lot of otherwise productive time just playing around with the thing.)

DTD 1.5 is a great game for analysis of situational game balance, because nearly everything in the game is situational! You buy a tower and place it down on the map somewhere, and when enemies come into range the tower shoots at them. Buying or upgrading towers costs money, and you get money from killing the enemies with your towers. Since you have a limited amount of money at any time in the game, your goal is to maximize the total damage output of your towers per dollar spent, so from the player’s perspective this is an efficiency problem.

The situational nature of DTD

So, on the surface, all you have to do is figure out how much damage a single tower will do, divide by cost, and take the tower with the best damage-to-cost ratio. Simple, right?

Except that actually figuring out how much damage your towers do is completely situational! Each tower has a range; how long enemies stay within that range getting shot at depends entirely on where you’ve placed your towers. If you just place a tower in the middle of a bunch of open space, the enemies will walk right by it and not be in danger for long; if you build a huge maze that routes everyone back and forth in range of the tower in question, its total damage will be a lot higher.

Furthermore, most towers can only shoot one enemy at a time, so if a cluster of enemies walks by, its total damage per enemy is a lot smaller (one enemy gets shot, the others don’t). Other towers do area-effect or “splash” damage, which is great on clusters but pretty inefficient against individual enemies, particularly those that are spaced out because they move fast. One of the tower types doesn’t do much damage at all, but slows down enemies that it shoots, which keeps them in range of other towers for longer, so the benefit depends on what else is out there shooting. Some towers only work against certain types of enemies, or don’t work against certain enemy types, so there are some waves where some of your towers are totally useless to you, even if they have a higher-than-normal damage output at other times. And then there’s one tower that does absolutely nothing on its own, but boosts the damage output of all adjacent towers… so this has a variable cost-to-benefit ratio depending on what other towers you place around it. Even more interesting, placing towers in a giant block (to maximize the effectiveness of this boost tower) has a hidden cost itself, in that it’s slightly less efficient in terms of usage of space on the board: there’s a big obstacle that the enemies get to walk around, rather than marching through a longer maze. So, trying to balance a game like this is really tough, because everything depends on everything else!

Your mission, should you choose to accept it…

Since this is a surprisingly deep game to analyze, I’m going to constrain this to one very small part of the game. In particular, I want you to consider two towers: the Swarm tower (which only works against flying enemies but does a lot of damage to them) and the Boost tower (that’s the one that increases the damage of the towers around it). Now, the prime spot to put these is right in the center of the map, in this little 4×3 rectangular block. Let’s assume you’ve decided to dedicate that twelve-tower area to only Swarm and Boost towers, in order to totally destroy the flying enemies that come your way. Assuming that you’re trying to minimize cost and maximize damage, what’s the optimal placement of these towers?

To give you some numbers, a fully-upgraded Swarm tower does a base of 480 damage per hit, and costs $640 in the game. A fully-upgraded Boost tower costs $500 and does no damage, but improves all adjacent towers (either touching at a side or a corner) by +50%, so in practical terms a Boost tower does 240 damage for each adjacent Swarm tower. Note that two Boost towers adjacent to each other do absolutely nothing for each other – they increase each other’s damage of zero by +50%, which is still zero.

Assume all towers will be fully upgraded; the most expensive versions of each tower have the most efficient damage-to-cost ratios.

The most certain way to solve this, if you know any scripting or programming, is to write a brute-force program that runs through all 3^12 possibilities (no tower, Swarm tower or Boost tower in each of the twelve slots). For each slot, count a damage of 480 if a Swarm tower, 240*(number of adjacent Swarm towers) for a Boost tower, or 0 for an empty slot; for cost, count 640 per Swarm tower, 500 for each Boost tower, and 0 for an empty slot. Add up the total damage and cost for each scenario, and keep track of the best damage-to-cost ratio (that is, divide total damage by total cost, and try to get that as high as possible).
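If Python happens to be your language of choice, the brute force might look something like this minimal sketch (using the numbers above, with ‘.’ marking an empty slot):

    # A minimal brute-force sketch: Swarm = 480 damage / $640,
    # Boost = +240 damage per adjacent Swarm / $500, '.' = empty slot.
    from itertools import product

    ROWS, COLS = 3, 4  # the twelve-slot block

    def neighbors(r, c):
        # All slots touching (r, c) at a side or corner.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0) and 0 <= r+dr < ROWS and 0 <= c+dc < COLS:
                    yield r+dr, c+dc

    best = None
    for layout in product('SB.', repeat=ROWS * COLS):  # 3^12 possibilities
        grid = [layout[r*COLS:(r+1)*COLS] for r in range(ROWS)]
        damage = cost = 0
        for r in range(ROWS):
            for c in range(COLS):
                if grid[r][c] == 'S':
                    damage, cost = damage + 480, cost + 640
                elif grid[r][c] == 'B':
                    damage += 240 * sum(grid[nr][nc] == 'S'
                                        for nr, nc in neighbors(r, c))
                    cost += 500
        if cost and (best is None or damage / cost > best[0]):
            best = (damage / cost, grid)

    print(best)  # best damage-to-cost ratio, and the layout that produced it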

If you don’t have the time or skills to write a brute-force program, an alternative is to create an Excel spreadsheet that calculates the damage and cost for a single scenario. Create a 4×3 block of cells that are either “B” (boost tower), “S” (swarm tower), or blank.

Below that block, create a second block of cells to compute the individual costs of each cell. The formula might be something like:

=IF(B2="S",640,IF(B2="B",500,0))

Lastly, create a third block of cells to compute the damage of each cell:

=IF(B2="S",480,IF(B2="B",IF(A1="S",240,0)+IF(A2="S",240,0)+IF(A3="S",240,0)+IF(B1="S",240,0)+IF(B3="S",240,0)+IF(C1="S",240,0)+IF(C2="S",240,0)+IF(C3="S",240,0),0))

Then take the sum of all the damage cells, and divide by the sum of all the cost cells. Display that in a cell of its own. From there, all you need to do is play around with the original cells, changing them by hand from S to B and back again to try to optimize that one final damage-to-cost value.

The final deliverable

Once you’ve determined what you think is the optimal damage-to-cost configuration of Swarm and Boost towers, figure out the actual cost and benefit from the Swarm towers only, and the cost and benefit contributed by the Boost towers. Assuming optimal play, and assuming only this one very limited situation, which one is more powerful – that is, on a dollars-for-damage basis, which of the two types of tower (Swarm or Boost) contributes more to your victory for each dollar spent?

That’s all you have to do, but if you want more, you can then take it to any level of analysis you want – as I said, this game is full of situational things to balance. Flying enemies only come every seventh round, so if you want to compute the actual damage efficiency of our Swarm/Boost complex, you’d have to divide by 7. Then, compare with other types of towers and figure out if some combination of ground towers (for the 6 out of 7 non-flying levels) and the anti-flying towers should give you better overall results than using towers that can attack both ground and air. And then, of course, you can test out your theories in the game itself, if you have the time. I look forward to seeing some of your names in the all-time high score list.

Level 5: Probability and Randomness Gone Horribly Wrong

August 4, 2010

Readings/Playings

None this week (other than this blog post).

Answers to Last Week’s Questions

If you want to check your answers from last week:

Dragon Die

First off, note that the so-called “Dragon Die” is really just a “1d6+1” in disguise. If you think of the Dragon as a “7” since it always wins, the faces are 2-3-4-5-6-7, so really this is just asking how a +1 bonus to a 1d6 die roll affects your chance of rolling higher. It turns out the answer is: a lot more than most people think!

If you write out all 36 possibilities of 2-7 versus 1-6, you find there are 21 ways to lose to the House (7-1, 7-2, 7-3, 7-4, 7-5, 7-6, 6-1, 6-2, 6-3, 6-4, 6-5, 5-1, 5-2, 5-3, 5-4, 4-1, 4-2, 4-3, 3-1, 3-2, 2-1), 5 ways to draw (6-6, 5-5, 4-4, 3-3, 2-2), and 10 ways to win (5-6, 4-5, 4-6, 3-4, 3-5, 3-6, 2-3, 2-4, 2-5, 2-6). We ignore draws, since a draw is just a re-roll, which we would keep repeating until we got a win or loss event, so only 31 results end in some kind of resolution. Of those, 21 are a loss and 10 are a win, so this game only gives a 10/31 chance of winning. In other words, you win slightly less than 1 time in 3.
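If you’d rather not write out all 36 cases by hand, a few lines of Python will confirm the count (treating the Dragon as a 7, with the House rolling the Dragon Die):

    # A quick check of the Dragon Die odds: House rolls 2-7, player rolls 1-6.
    wins = losses = draws = 0
    for house in range(2, 8):
        for player in range(1, 7):
            if player > house:
                wins += 1
            elif player < house:
                losses += 1
            else:
                draws += 1
    print(wins, losses, draws)     # 10 21 5
    print(wins / (wins + losses))  # 0.3225..., i.e. 10/31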

Chuck-a-luck

Since there are 216 ways to get a result on 3d6, it is easier to do this in Excel than by hand. If you count it up, though, you will find that if you choose the number 1 to pay out, there are 75 ways to roll a single win (1-X-X where “X” is any of the five losing results, so there are 25 ways to do this; then X-1-X and X-X-1 for wins with the other two dice, counting up to 75 total). There are likewise 15 ways to roll a double win (1-1-X, 1-X-1, X-1-1, five each of three different ways = 15), and only one way to roll a triple win (1-1-1). Since all numbers are equally likely from 1 to 6, the odds are the same no matter which of the six numbers you choose to pay out.

To get expected value, we multiply each of the 216 results by its likelihood; since all 216 die roll results are equally likely, we simply add all results together and divide by 216 to get the expected win or loss percentage.

(75 ways to win * $1 winnings) + (15 ways for double win * $2) + (1 way for triple win * $3) = $108 in winnings. Since we play 216 times and 108 is half of 216, at first this appears to be even-odds.

Not so fast! We still must count up all the results when we lose, and we lose more than 108 times. Out of the 216 ways to roll the dice, there are only 91 ways to win, but 125 ways to lose (the reason for the difference is that while doubles and triples are more valuable when they come up on your number, there are a lot more ways to roll doubles and triples on a number that isn’t yours). Each of those 125 losses nets you a $1 loss.

Adding everything up, we get an expected value of negative $17 out of 216 plays, or an expected loss of about 7.9 cents each time you play this game with a $1 bet. That may not sound like much (7.9 cents is literally pocket change), but keep in mind this is per dollar, so the House advantage of this game is actually 7.9%, one of the worst odds in the entire casino!
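Here’s a short sketch that enumerates all 216 rolls and confirms that result, assuming a $1 bet on the number 1:

    # A quick check of Chuck-a-luck's expected value with a $1 bet on 1.
    from itertools import product

    net = 0
    for roll in product(range(1, 7), repeat=3):
        matches = roll.count(1)
        net += matches if matches else -1  # $1 per matching die, else lose $1
    print(net, net / 216)  # -17 total, about -0.079 per play: a 7.9% house edge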

Royal Flush

In this game, you’re drawing 5 cards from a 52-card deck, sequentially. The cards must be 10-J-Q-K-A but they can be in any order. The first card can be any of those 5 cards of any suit, so there are 20 cards total on your first draw that make you potentially eligible for a royal flush (20/52). For the second card, it must match the suit of the first card you drew, so there are only 4 cards out of the remaining 51 in the deck that you can draw (4/51). For the third card, there are only 3 cards out of the remaining 50 that you need for that royal flush (3/50). The fourth card has 2 cards out of 49, and the final card must be 1 card out of 48. Multiplying all of these together, we get 480 / 311,875,200, or 1 in 649,740. If you want a decimal, this divides to 0.0000015 (or if you multiply by 100 to get a percentage, 0.00015%, a little more than one ten-thousandth of one percent). For most of us, this means seeing a “natural” Royal Flush from 5 cards is a once in a lifetime event, if that.
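If you want to double-check the arithmetic without worrying about floating-point rounding, Python’s fractions module keeps everything exact:

    # A quick check of the Royal Flush odds, using exact fractions.
    from fractions import Fraction

    p = (Fraction(20, 52) * Fraction(4, 51) * Fraction(3, 50)
         * Fraction(2, 49) * Fraction(1, 48))
    print(p)         # 1/649740
    print(float(p))  # about 0.0000015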

IMF Lottery

If you check the comments from last week, several folks found solutions for this that did not require a Monte Carlo simulation. The answer is 45 resources, i.e. the card will stay in play for an average of 10 turns. Since it has a 10% chance of leaving play each turn, its expected lifespan is 1/0.10 = 10 turns, which ends up being rather intuitive… but, as we’ve seen from probability, most things are not intuitive. So the fact that this one problem ends up with an answer that’s intuitive, is itself counterintuitive.

This Week’s Topic

Last week, I took a brief time-out from the rest of the class where we’ve been talking about how to balance a game, to draw a baseline for how much probability theory I think every designer needs to know, just so we can compute some basic odds. This week I’m going to blow that all up by showing you two places where the true odds can go horribly wrong: human psychology, and computers.

Human Psychology

When I say psychology, this is something I touched on briefly last week: most people are really terrible at having an intuition for true odds. So even if we actually make the random elements of our games perfectly fair, which as we’ll see is not always trivial, an awful lot of players will perceive the game as being unfair. Therefore, as game designers, we must be careful to understand not just true probability, but also how players perceive the probability in our games and how that differs from reality, so that we can take that into account when designing the play we want them to experience.

Computers

Most computers are deterministic machines; they are just ones and zeros, following deterministic algorithms to convert some ones and zeros into other ones and zeros. Yet somehow, we must get a nondeterministic value (a “random number”) from a deterministic system. This is done through some mathematical sleight-of-hand to produce what we call pseudorandom numbers: numbers that sorta kinda look random, even though in reality they aren’t. Understanding the difference between random and pseudorandom has important implications for video game designers, and even board game designers if they ever plan to make a computer version of their game (which happens often with “hit” board games), or if they plan to include electronic components in their board games that have any kind of randomness.

But First… Luck Versus Skill

Before we get into psychology and computers, there’s this implicit assumption that we’ve mostly ignored for the past week, that’s worth discussing and challenging. The assumption is that adding some randomness to a game can be good, but too much randomness is a bad thing… and maybe we have some sense that all games fall along some kind of continuum between two extremes of “100% skill-based” (like Chess or Go) and “100% luck-based” (like Chutes & Ladders or Candyland).

If these are the only games we look at, we might go so far as to think of a corresponding split between casual and hardcore: the more luck in a game, the more casual the audience; the more a game’s outcome relies on skill, the more we think of the game as hardcore.

This is not always the case, however. For example, Tic-Tac-Toe has no randomness at all, but we don’t normally think of it as a game that requires a lot of skill. Meanwhile, each hand of Poker is highly random, but we still think of it as a game where skill dominates. And yet the game of Blackjack is also random, but aside from counting cards, we see it more as a game of chance than Poker.

Then we get into physical contests like professional sports. On the one hand, we see these as games of skill. Yet enthusiasts track all kinds of statistics on players and games; we talk about percentage chances of a player making a goal or missing a shot; sports gamblers make cash bets on the outcomes of games – as if these were not games of skill but games of chance.

What’s going on here? There are a few explanations.

Poker vs. Blackjack

Why the difference in how we perceive Poker and Blackjack? The difference is in when the player makes their bet, and what kind of influence the player’s choices have on the outcome of the game.

In Poker, a successful player computes the odds to come up with a probability calculation that they have the winning hand, and they factor that into their bet along with their perceived reactions of their opponents. As more cards are revealed, the player adjusts their strategy. The player’s understanding of the odds and their ability to react to changes has a direct relation to their performance in the game.

In Blackjack, by contrast, you place a bet at the beginning of the hand before you know what cards you’re dealt, and you generally don’t have the option to “raise” or “fold” as you see more cards revealed.

Blackjack does have some skill, but it’s a very different kind of skill than Poker. Knowing when to hit or stand or split or double down based on your total, the dealer’s showing card, and (if you’re counting cards) the remaining percentage of high cards in the deck… these things are skill in the same sense as skill at Pac-Man. You are memorizing and following a deterministic pattern, but you are not making any particularly interesting decisions. You simply place your bet according to an algorithm, and expect that over time you do as well as you can, given the shuffle of the cards. It’s the same reason we don’t think of the casino as having a lot of “skill” at Craps or Roulette, just because it wins more than it loses.

Professional Sports

What about the sports problem, where a clearly skill-based game seems like it involves random die-rolls? The reason for the paradox is that it all depends on your frame of reference. If you are a spectator, by definition you have no control over the outcome of a game; as far as you’re concerned the outcome is a random event. If you are actually a player on a sports team, the game is won or lost partly by your level of skill. This is why on the one hand sports athletes get paid based on how much they win (it’s not a gamble for them), but sports fans can still gamble on the random (from their perspective) outcome of the game.

Action Games

Randomness works a little differently in action-based video games (like typical First-Person Shooter games), where the players are using skill in their movement and aiming to shoot their opponents and avoid getting shot. We think of these as skill-based games, and in fact these games seem largely incompatible with randomness. There is enough chaos in the system without random die-rolls: if I’m shooting at a moving target, I’ll either hit or miss, based on a number of factors that are difficult to completely control. Now, suppose the designer thought they’d be clever and added a small random perturbation to bullet fire from a cheap pistol, to make it less accurate. You line up someone in your sights, pull the trigger… and miss, because the game decided to randomly make the bullet fly too far to the left. How would the player react to that?

Well, they might not notice. The game is going so fast; you’re running, they’re running, you squeeze off a few shots, you miss, and you figure you must have just not been as accurate as you thought (or they dodged well). But if they’re standing still and you sneak up behind them, certain you have the perfect shot, and you still miss, you’ll feel like the game just robbed you; that’s not fun and it doesn’t make the game more interesting, it just makes you feel like you’re being punished arbitrarily for being a good shot.

Does that mean luck plays no role in action games? I think you can increase the luck factor to even the playing field, but you have to be very careful about how you do it. Here’s a common example of how a lot of FPSs increase the amount of luck in the game: headshots. The idea here is that if you shoot someone in the head rather than the rest of the body, it’s an instant kill, or something like that.

Now, you might be thinking… wait, isn’t that a skill thing? You’re being rewarded for accuracy, hitting a small target and getting bonus damage because you’re just that good… right? In some cases that is true, depending on the game. But in a lot of games (especially older ones) that kind of accuracy just isn’t possible in most situations: you’re moving, they’re moving, the head has a tiny hit box, and the gun doesn’t let you zoom in enough at a distance to be sure you aren’t off by one or two pixels in any direction. From a distance, at least, a headshot isn’t something that most players can plan on. Sometimes it’ll happen anyway, just by accident, if you’re shooting in the right general direction… so sometimes, through no fault of the player, they’ll get a headshot. This evens the playing field slightly. Without headshots, if it takes many shots to score a frag, the more skilled players will almost always win, because they are better at dodging, circle-strafing, and any number of other techniques that allow them to outmaneuver and outplay a weaker player. With headshots, the weaker player will occasionally get an automatic kill by accident, making it more likely that the weaker player will see some successes, which is usually what you want as a designer.

Shifting Between Luck and Skill

As we just saw, with action games, adding more of a luck-factor to the game is tricky but still possible, by creating these unlikely (but still possible to perform by accident) events.

With slower, more strategic games, adding luck (or skill) is more straightforward. To increase the level of luck in the game:

  • Change some player decisions into random events.
  • Reduce the number of random events in the game (this way the Law of Large Numbers doesn’t kick in as much, thus the randomness is less likely to be evenly distributed).
  • Increase the impact that random events have on the game state.
  • Increase the range of randomness, such as changing a 1d6 roll to a 1d20 roll.

If you want to increase the level of skill in the game, do the reverse of any or all of the above.

What is the “best” mix of luck and skill for any given game? That depends mostly on your target audience. Very young children may not be able to handle lots of choices and risk/reward or short/long term tradeoffs, but they can handle throwing a die or spinning a spinner and following directions. Competitive adults and core gamers often prefer games that skew more to the skill side of things so they can exercise their dominance, but (except in extreme cases) not so much that they feel like they can’t ever win against a stronger opponent. Casual gamers may see the game as a social experience more than a chance to exercise the strategic parts of their brains, so they actually prefer to think less and make fewer decisions so they can devote more of their limited brainpower to chatting with their friends. There is no single right answer here for all games; recognize that a certain mix of luck and skill is best for your individual game, and part of your job as a game designer is to listen to your game to find out where it needs to be. Sometimes that means adding more randomness, sometimes it means removing it, and sometimes it means keeping it there but changing the nature of the randomness. The tools we gained from last week should give you enough skills to be able to assess the nature of randomness in your game, at least in part, and make appropriate changes.

And Now… Human Psychology

As we saw last week, human intuition is generally terrible when it comes to odds estimation. You might have noticed this with Dragon Die or Chuck-a-Luck; intuitively, both games seem to give better odds of winning than they actually do. Many people also have a flawed understanding of how probability works, as we saw last week with the gambler’s fallacy (expecting that previous independent events like die-rolls have the power to influence future ones). Let’s dig into these errors of thought, and their implications to gamers and game designers.

Selection Bias

When asked to do an intuitive odds estimation, where does our intuition come from? The first heuristic most people use is to check their memory recall: how easy is it to recall different events? The easier it is to remember many examples, the more we assume that event is likely or probable. This usually gives pretty good results; if you’re rolling a weighted die a few hundred times and seem to remember the number 4 coming up more often, you’ll probably have a decent intuition for how often it actually comes up. As you might guess, this kind of intuition will fail whenever it’s easier to recall a rare event than a common one.

Why would it be easier to recall rare events than common events? For one thing, rare events that are sufficiently powerful tend to stick in our minds (I bet you can remember exactly where you were when the planes struck the towers). For another, media coverage means we see rare events reported far more often than common ones. For example, many people are more afraid of dying in a plane crash than dying in an auto accident, even though an auto fatality is far more likely. There are a few reasons for this, but one is that any time a plane crashes anywhere, it’s international news; car crashes, by contrast, are so common that they’re rarely reported… so it is much easier to remember a lot of plane crashes than a lot of car crashes. Another example is the lottery: winners are highly publicized while the millions of losers are not, leading us to assume that winning is more probable than it actually is.

What does any of this have to do with games? For one thing, we tend to remember our epic wins much more easily than our humiliating losses (another trick our brains play on us just to make life more bearable). People tend to assume they’re above average at most things, so absent actual hard statistics, players will tend to overestimate their own win percentage and skill. This is dangerous in games where players can set their difficulty level or choose their opponents. In general, we want a player to succeed a certain percentage of the time, and we tune the difficulty of our games accordingly; if a player chooses a difficulty that’s too hard for them, they’ll struggle more and be more likely to give up in frustration. By being aware of this tendency, we can try (for example) to force players into a good match for their actual skills – through automated matchmaking, dynamic difficulty adjustment, or other tricks.

Self-Serving Bias

There’s a certain point where an event is unlikely but still possible, where players will assume it is much more likely than it actually is. In Sid Meier’s GDC keynote this year, he placed this from experience at somewhere around 3:1 or 4:1… that is, if the player had a 75 to 80% chance of winning or greater, and they did win exactly that percentage of the time, it would feel wrong to them, like they were losing more than they should. His playtesters expected to win nearly all of the time, I’d guess around 95% of the time, if the screen displayed a 75% or 80% chance.

Players also have a self-serving bias, which probably ties into what I said before about how everyone thinks they’re above average. So while players are not okay with losing a quarter of the time when they have a 75% chance to win, they are perfectly okay with winning only a quarter of the time when they are at a 1:3 disadvantage.

Attribution Bias

In general, players are much more likely to accept a random reward than a random setback or punishment. And interestingly, they interpret these random events very differently.

With a random reward, players have a tendency to internalize the event, to believe that they earned the reward through superior decision-making and strategy in play. Sure, maybe it was a lucky die roll, but they were the ones who chose to make the choices that led to the die roll, and their calculated risk paid off, so clearly this was a good decision on their part.

With a random setback, players tend to externalize the event; they blame the dice or cards, they say they were just unlucky. If it happens too much, they might go so far as to say that they don’t like the game because it’s unfair. If they’re emotionally invested enough in the game, such as a high-stakes gambling game, they might even accuse other players of cheating! With video games the logic and random-number generation is hidden, so we see some even stranger player behavior. Some players will actually believe that the AI is peeking at the game data or altering the numbers behind their back and cheating on purpose, because after all it’s the computer so it can theoretically do that.

Basically, people handle losing very differently from winning, in games and in life.

Anchoring

Another way people get odds wrong is a phenomenon called anchoring. The idea is, whatever the first number is that people see, they latch onto it and overvalue it. So for example, if you go to a casino and look at any random slot machine, probably the biggest, most attention-grabbing thing on there is the number of coins you can win with a jackpot. People look at that number, concentrate on it, and it gives them the idea that their chance of winning is much bigger than it actually is.

Sid Meier mentioned a curious aspect of this during his keynote. Playtesters – the same ones that were perfectly happy losing a third of the time when they had a 2:1 advantage, just like they were supposed to – would feel the game was unfair if they lost a third of the time when they had a 20:10 advantage. Why? Because the first number they see is that 20, which seems like a big number, so they feel like they have a lot of power there… and it feels a lot bigger than 10, so they feel like they should have an overwhelming advantage. (Naturally, if they have a 10:20 disadvantage, they are perfectly happy to accept one win in three.)

It also means that a player who has, say, a small amount of base damage plus a bunch of bonuses may anchor on that small base number and underestimate how much damage they actually deal.

The Gambler’s Fallacy

Now we return to the gambler’s fallacy, which is that people expect random numbers to look random. Long streaks make people nervous and make them question whether the numbers are actually random.

One statistic I found in the literature is that if you ask a person to generate a “random” list of coin flips from their head, the list tends to not be very random. Specifically, if a person’s previous item was Heads, they have a 60% chance of picking Tails for the next flip, and vice versa (this assumes you’re simply asking them to say “Heads” or “Tails” from their head when instructed to come up with a “random” result, not that they’re actually flipping a real coin). In a string of merely 10 coin flips, it is actually pretty likely you’ll get 4 Heads or 4 Tails in a row (any given run of four consecutive flips has a 1 in 8 chance of coming up all the same, and a 10-flip string contains seven such runs), but if you ask someone to give you ten “random” numbers that are either 0 or 1 from their head, they will probably not give you even 3 in a row.
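If that “pretty likely” claim sounds hard to believe, here’s a rough Monte Carlo sketch that estimates how often a run of 4 or more identical results shows up in just 10 fair coin flips:

    # A rough Monte Carlo sketch: chance of a run of 4+ identical results
    # somewhere in a string of 10 fair coin flips.
    import random

    def has_run(flips, length=4):
        run = 1
        for prev, cur in zip(flips, flips[1:]):
            run = run + 1 if cur == prev else 1
            if run >= length:
                return True
        return False

    trials = 100_000
    hits = sum(has_run([random.randint(0, 1) for _ in range(10)])
               for _ in range(trials))
    print(hits / trials)  # roughly 0.46, i.e. nearly half of all 10-flip strings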

This can lead players astray in less obvious ways. Here’s an example, also from Sid’s keynote. Remember how players feel like a 3:1 advantage means they’ll almost always win, but they’re okay with losing a 2:1 contest about a third of the time? It turns out that if they lose two 2:1 contests in a row, this will feel wrong to a lot of people; they don’t expect unlikely events to happen multiple times in a row, even though by the laws of probability they should.

Here’s another example, to show you why as game designers we need to be keenly aware of this. Suppose you design a video game that involves a series of fair coin flips as part of its core mechanics (maybe you use this to determine who goes first each turn, or something). Probability tells you that 1 out of every 32 plays, the first six coin flips will be exactly the same result. If a player sees this as their first introduction to your game, they may perceive this event as so unlikely that the “random” number generator in the game must be busted somehow. If the coins come up in their favor, they won’t complain… but when regression to the mean kicks in and they start losing half the time like they’re supposed to, they’ll start to feel like the game is cheating, and it will take them a while to un-learn their (incorrect) first impression that they are more likely to win a flip. Worse, if the player sees six losses in a row, right from the beginning, you can bet that player is probably going to think your game is unfair. To see how much of a problem this can potentially become, suppose your game is a modest hit that sells 3.2 million units. In that case, one hundred thousand players are going to experience a 6-long streak as their first experience in their first game. That’s a lot of players who are going to think your game is unfair!

The Gambler’s Fallacy is something we can exploit as gamers. People assume that long streaks do not appear random, so when trying to “play randomly” they will actually change values more often than not. Against a non-championship opponent, you can win more than half the time at Rock-Paper-Scissors by knowing this. Insist on playing to best 3 of 5, or 4 of 7, or something. Since you know your opponent is unlikely to repeat their last throw, on subsequent rounds you should throw whatever would have lost to your opponent’s last throw, because your opponent probably won’t do the same thing twice, so you probably won’t lose (the worst you can do is draw).
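For fun, here’s that strategy as a tiny sketch; the BEATS dictionary records what each throw beats:

    # A playful sketch of the anti-Gambler's-Fallacy strategy described above.
    BEATS = {'rock': 'scissors', 'paper': 'rock', 'scissors': 'paper'}

    def next_throw(opponent_last_throw):
        # Throw whatever would have LOST to their last throw. If the opponent
        # switches (as the fallacy predicts they will), we win or draw.
        return BEATS[opponent_last_throw]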

The Hot-Hand Fallacy

There’s a variant of the Gambler’s Fallacy that mostly applies to sports and other action games. The Hot-Hand Fallacy is so called because in the sport of Basketball, fans started getting this idea that if a player made two or three baskets in a row, they were “running hot” and more likely to score additional baskets and not miss. (We even see this in sports video games like NBA Jam, where becoming “on fire” is an actual mechanic that gives the player a speed and accuracy advantage… and some cool effects, like making the basket explode in a nuclear fireball.)

When probability theorists looked at this, their first reaction was that each shot is an independent event, like rolling dice, so there’s no reason why previous baskets should influence future ones at all. They expected that a player would be exactly as likely to make a basket, regardless of what happened in the player’s previous attempts.

Not so fast, said Basketball fans. Who says they’re completely independent events? Psychology plays a role in sports performance. Maybe the player has more confidence after making a few successful shots, and that causes them to play better. Maybe the fans cheering them on gives them a little extra mental energy. Maybe the previous baskets are a sign that the player is hyper-focused on the game and in a really solid flow state, making it more likely they’ll continue to perform well. Who knows?

Fair enough, said the probability theorists, so they looked at actual statistics from a bunch of games to see if previous baskets carried any predictive value for future performance.

As it turned out, both the theorists and sports fans were wrong. If a player made several baskets in a row, it slightly increased their chance of missing next time – the longer the streak, the greater the chance of a miss (relative to what would be expected by random chance). Why? I don’t think we know for sure, but presumably there is some kind of negative psychological effect. Maybe the player got tired. Maybe the other team felt that player was more of a threat, and played a more aggressive defense when that player had the ball. Maybe the crowd’s cheering broke the player’s flow state, or maybe the player gets overconfident and starts taking more unnecessary risks.

Whatever the case, this is something that works against us when players can build up a win streak in our games – especially if we tie that to social rewards, such as achievements, trophies, or leaderboards that give special attention to players with long streaks. Why is this dangerous? Because at best, even if each game is truly an independent random event, we know that a win streak is anomalous. If a player’s performance overall falls on some kind of probability curve (usually a bell curve) and they happen to achieve uncharacteristically high performance in a single game or play session or whatever, odds are their next game will fall lower on the curve. The player is probably erroneously thinking their skills have greatly improved; when they start losing again, they’ll feel frustrated, because they know they can do better. Thus, when the streak inevitably comes to an end, the whole thing is tainted in the player’s mind. It’s as if the designer has deliberately built a system that automatically punishes the player after every reward.

Houston, We Have a Problem…

To sum up, here are the problems we face as designers, when players encounter our random systems:

  • Selection bias: improbable but memorable events are perceived as more likely than they actually are.
  • Self-serving bias: an “unlikely loss” is interpreted as a “nearly impossible loss” when the odds are in the player’s favor. However, an “unlikely win” is still correctly interpreted as an “unlikely but possible win” when the odds are against the player.
  • Attribution bias: a positive random result is assumed to be due to player skill; a negative random result is assumed to be bad luck (or worse, cheating).
  • Anchoring: players overvalue the first or biggest number seen.
  • Gambler’s fallacy: assuming that a string of identical results reduces the chance that the string will continue.
  • Hot-hand fallacy: assuming that a string of identical results increases the chance that the string will continue.

The lesson here is that if you expose the actual probabilities of the game to your players, and your game produces fair, random numbers, players will complain because according to them and their flawed understanding of probability, the game feels wrong.

As designers, what do we do about this? We can complain to each other about how all our stupid players are bad at math. But is there anything we can do to take advantage of this knowledge, that will let us make better games?

When Designers Turn Evil

One way to react to this knowledge is to exploit it in order to extract large sums of money from people. Game designers who turn to the Dark Side of the Force tend to go into the gambling industry, marketing and advertising, or political strategy. (I say this with apologies to any honest designers who happen to work in these industries.)

Gambling

Lotteries and casinos regularly take advantage of selection bias by publicizing their winners, making it seem like winning is more likely than it really is. Another thing they can do, if they’re more dishonest, is rig their machines to give a close-but-not-quite result more often than would be predicted by random chance, such as having two Bars and then a blank come up on a slot machine, or having four out of five cards for a royal flush come up in a video poker game. These give players a false impression that they’re closer to winning more often than they actually are, increasing their excitement and anticipation of hitting a jackpot and making it more likely they’ll continue to play.

Marketing and Advertising

Marketers use the principle of anchoring all the time to change our expectations of price. For example, your local grocery store probably puts big discount stickers all over the place to call your attention to the lower prices on select items, and our brains will assume that the other items around them are also less expensive by comparison… even if they’re actually not.

Another example of anchoring is a car dealership that might put two nearly identical models next to each other, one with a really big sticker price and one with a smaller (but still big) price. Shoppers see the first big price and anchor to that, then they see the smaller price and feel like by comparison they’re getting a really good deal… even though they’re actually getting ripped off.

Political Strategy

There are a ton of tricks politicians can use to win votes. A common one these days is to play up people’s fears of vastly unlikely but well-publicized events like terrorism or hurricanes, and make campaign promises to protect you and keep your family safe. Odds are, they’ll appear to deliver, because the events are so unlikely that they probably won’t happen again anyway.

Scam Artists

The really evil game designers use their knowledge of psychology to do things that are highly effective, and highly illegal. One scam I’ve heard of involves writing to a large number of people, offering your “investment advice” and telling them to watch a certain penny stock between one day and the next. Half of the letters predict the stock will go up; the other half say it’ll go down. Then, whatever actually happens, you take the half where you “guessed” right and make another prediction. Repeat that four or five times, and eventually you get down to a handful of people for whom you’ve predicted every single movement correctly. Those people figure there’s no way that could just be random chance, you must have a working system, so they give you tons of money, and you skip town and move to Fiji or something.

What About Good Game Designers?

All of that is fine and good for those of us who want to be scam artists (or for those of us who don’t want to get scammed). But what about those of us who still want to, you know, make games for entertainment?

We must remember that we’re crafting a player experience. If we want that experience to be a positive one, we need to take into account that our players will not intuitively understand the probabilities expressed in our game, and modify our designs accordingly.

Skewing the Odds

One way to do this is to tell our players one thing, and actually do something else. If we tell the player they have a 75% chance of winning, under the hood we can actually roll it as if it were a 95% chance. If the player gets a failure, we can make the next failure less likely, cumulatively; this makes long streaks of losses unlikely, and super-long streaks impossible.
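Here’s a minimal Python sketch of that idea (the class name and the specific numbers are mine, purely for illustration): the UI reports one probability, the roll quietly uses a friendlier one, and every consecutive failure adds a “pity” bonus so that losing streaks cut themselves short.

```python
import random

class SkewedRoll:
    """Tell the player one probability, but roll against another."""

    def __init__(self, stated=0.75, actual=0.95, pity_bonus=0.10):
        self.stated = stated          # the chance the UI displays
        self.actual = actual          # the chance we really roll against
        self.pity_bonus = pity_bonus  # extra chance per consecutive failure
        self.failures = 0

    def roll(self):
        # Each failure makes the next failure less likely, so long losing
        # streaks are rare and sufficiently long ones can't happen at all.
        chance = min(1.0, self.actual + self.failures * self.pity_bonus)
        if random.random() < chance:
            self.failures = 0
            return True
        self.failures += 1
        return False
```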

Random Events

We can use random events with great care, especially those with major game-changing effects, and especially those that are not in the player’s favor. In general, we can avoid hosing the player to a great degree from a single random event; otherwise, the player may think that they did something wrong (and unsuccessfully try to figure out what), or they may feel annoyed that their strategy was torn down by a single bad die-roll and not want to play anymore. Being very clear about why the bad event happened (and what could be done to prevent it in future plays) helps to keep the player feeling in control.

Countering the Hot Hand

To counter the hot-hand problem (where a streak of wins makes it more likely that a player will screw up), one thing to do is to downplay the importance of “streaks” in our games, so that the players don’t notice when they’re on a streak in the first place (and therefore won’t notice, or feel as bad, when the streak ends).

If we do include a streak mechanism, one thing we can do is embed it in a positive feedback loop, giving the player a gameplay advantage to counteract the greater chance of a miss after a string of hits. For example, in Modern Warfare 2, players get certain bonuses for continuing a kill streak (killing multiple enemies without being killed themselves), including better weapons, air support, and even a nuclear strike. With each bonus, the streak becomes more likely to continue, because the player is now more powerful.

Summary

In short, we know that players have a flawed understanding of probability. If we understand the nature of these flaws, we can change our game’s behavior to conform to player expectations. This will make the game feel more fun and more fair to the players. This was, essentially, one of the big takeaways from Sid Meier’s GDC keynote this year.

A Question of Professional Ethics

Now, this doesn’t sit well with everyone. Post-GDC, there were at least a few people that said: wait, isn’t that dishonest? As game designers, we teach our players through the game’s designed systems. If we bend the rules of probability in our games to reinforce players’ flawed understanding of how probability works, are we not doing a disservice to our players? Are we not taking something that we already know is wrong and teaching it to players as if it were right?

One objection might be that if players aren’t having fun (regardless of whether that comes from our poor design, or their poor understanding of math), then the designer must design in favor of the player – especially if they are beholden to a developer and publisher that expect to see big sales. But is this necessarily valid? Poker, for example, is incredibly popular and profitable… even though it will mercilessly punish any player that dares harbor any flaws in how they think about probability.

To this day, I think it is still an open question, and something each individual designer must decide for themselves, based on their personal values and the specific games they are designing. Take a moment to think about these questions yourself:

  • Are the probabilities in our games something that we can frame as a question of professional ethics?
  • How do we balance the importance of giving the player enjoyment (especially when they are paying for it), versus giving them an accurate representation of reality?
  • What’s more important: how the game actually works, or how players perceive it as working?

One More Solution

I can offer one other solution that works in some specific situations, and that’s to expose not just the stated probabilities to the player, but the actual results as well. For example, if you ask players to estimate their win percentage in a game that doesn’t track it, they’ll probably estimate higher than the real value. If their wins, losses and win percentage are displayed every time they go to start a game, they have a much more accurate view of their actual skill.

What if you have a random number generator? See if this sounds familiar to you: you’re playing a game of Tetris and you become convinced at some point that the game is out to get you. You never seem to get one of those long, straight pieces until just after you need it, right? My favorite version of Tetris to this day, the arcade version, had a neat solution to this: in a single-player game, your game took up the left half of the screen, and it used the right half to keep track of how many of each type of brick fell down. So if you felt like you were getting screwed, you could look over there and see if the game was really screwing you, or if it was just your imagination. And if you kept an eye on those over time, you’d see that yes, occasionally you might get a little more of one piece than another on average over the course of a single level, but over time it would balance out, and most of the time you’d get about as much of each brick as any other. The game was fair, and it could prove it with cold, hard facts displayed to the player in real time.
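The bookkeeping for this sort of thing is trivial. Here’s a minimal Python sketch (assuming a uniformly random piece generator, which not every Tetris implementation actually uses) of the kind of tally you’d display next to the playfield:

```python
import random
from collections import Counter

PIECES = "IJLOSTZ"   # the seven tetromino types
counts = Counter()

def next_piece():
    """Deal a piece uniformly at random and tally it for on-screen display."""
    piece = random.choice(PIECES)
    counts[piece] += 1
    return piece

for _ in range(700):
    next_piece()

# Over time the tallies even out; showing them lets players verify fairness.
print({piece: counts[piece] for piece in PIECES})
```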

How could this work in other games? In a Poker video game against AI opponents, you could let the player know after each hand if they actually held the winning hand or not, and keep ongoing track of their percentage of winning hands, so that they know the deck shuffling is fair. (This might be more controversial with human opponents, as it gives you some knowledge of your opponents’ bluffing patterns.) If you’re making a version of the board game RISK that has simulated die rolls in it, have the game keep track of how frequently each number or combination is rolled, and let the player access those statistics at any time. And so on.

These kinds of things are surprisingly reassuring to a player who can never know for sure if the randomness inside the computer is fair or not.

When Randomness Isn’t Random

This brings us to the distinction of numbers that are random versus numbers that are pseudorandom. Pseudorandom literally means “fake random” and in this case it means it’s not actually entirely random, it just looks that way.

Now, most of the things we use for randomness, even in physical games, are not perfectly random. The numbered balls in a Bingo hopper might be slightly more or less likely to come up because the paint gives them slightly different weights. Six-sided dice with recessed dots may be very slightly weighted toward one side or another, since matter is actually missing from the dotted faces, so the center of gravity is a little off. Also, a lot of 6-sided dice have curved edges, and if those curves are slightly different, the die will be slightly more likely to keep rolling when it hits certain faces, and thus a little more or less likely to land on certain numbers. 20-sided dice can come out slightly oblong rather than perfectly symmetrical (due to how they’re manufactured), making some numbers slightly more or less likely to come up than others. And all of this is without considering dice that are deliberately loaded, or throwers who have practiced rolling what they want!

What about cards? All kinds of studies have shown that the way we typically shuffle a deck of cards is not random. In fact, the shuffle itself can be exploited: give a standard 52-card deck exactly eight perfect riffle shuffles (so-called “out-shuffles”) and it returns to its original order, which card magicians use to make it look like they’re shuffling a deck when they’re actually stacking it. Even an honest shuffle isn’t perfectly random; if you think about it, depending on how you shuffle, either the top or the bottom card will probably stay in the same position after a single riffle-shuffle, so you have to shuffle several times before the deck is sufficiently randomized. Even without stacking the deck deliberately, the point is that not all orderings of the deck are equally likely.

In Las Vegas, they have to be very careful about these things, which is why you’ll notice that casino dice look a lot different from those white, black-dotted d6s you have at home. A casino’s gambling license is worth billions of dollars from its honest, long-term House advantage alone, and a single slightly unfair die can cost them that license (or cost them money to sufficiently skilled players who know how to throw dice unfairly), so they want to be very sure that their dice are as close to perfectly random as possible.

Shuffling cards is another thing casinos have to be careful of. If the cards are shuffled manually, you can run into a few problems with randomness (aside from complaints of carpal tunnel from your dealers). A dealer who doesn’t shuffle enough to sufficiently randomize the deck – because they’re trying to deal more hands so they get lazy with the shuffling – can be exploited by a player who knows there’s a better-than-average chance of some cards following others in sequence. And that’s not even counting dealers who collude with players to do this intentionally.

Casinos have largely moved to automated shufflers to deal with these problems, but those have problems of their own; for example, mechanical shuffles are potentially less random than human shuffles, so a careful gambler can use a hidden camera to analyze the machine and figure out which cards are more likely to clump together from a fresh deck, giving themselves an advantage. These days, some of the latest automated shufflers don’t riffle-shuffle, they actually stack the deck according to a randomized computer algorithm, but as we’ll see shortly, even those algorithms can have problems, and those problems can cost the casinos a lot of money if they’re not careful.

The point here is that even the events in physical games that we think of as “random” aren’t always as random as we give them credit for. There’s not necessarily a lot we can do about this, mind you, at least not without going to great expense for super-high-quality game components. Paying through the nose for precision-machined dice is a bit much for most of us who just want a casual game of Catan, so as gamers we accept that our games aren’t always completely random and fair… they’re close enough, and they affect all players equally, so we let it slide.

Pseudorandom Numbers

Computers have similar problems because, as I mentioned in the introduction, a computer doesn’t have any randomness inside it. It’s all ones and zeros, high and low voltages running through wires or stored electromagnetically in memory or on a disk somewhere; it’s completely deterministic. And unless you’re willing to buy special hardware that measures a genuinely random physical phenomenon (say, a Geiger counter tracking the direction of emitted radiation), which most of us aren’t, you’re stuck with this problem: you have to use a deterministic machine to play a non-deterministic game of chance.

We do this through a little bit of mathematics that I won’t cover here (you can do a Google search for pseudorandom number algorithms on your own if you care). All you need to know is that there are some math functions that behave very erratically, without an apparent pattern, and so you just take one of the results from that function and call it a random number.

How do you know which result to take from the function? Well, you determine that randomly. Just kidding; as we said, you can’t really do that. So instead what you do is you have to tell the computer which one to take, and then it’ll start with that one, and then next time you need a random number it’ll take the next one in sequence, then the next, and so on. But because we told it where to start, this is no longer actually random, even though it might look that way to a casual player. The number you tell the computer to start with is called a random number seed, and once you give the computer one seed, it just starts picking random numbers with its formula sequentially from there. So you only have to seed it once. But… this is important… if you give it the same seed you’ll get exactly the same results. Remember, it’s deterministic!

Usually we get around this by picking a random number seed that’s hard for a player to intentionally replicate, like the number of milliseconds that have elapsed since midnight or something. You have to choose carefully though. If, for example, you pick a random number seed that’s just the fractional milliseconds in the system clock (from 0 to 999), then really your game only has 1000 ways of “shuffling”, which is enough that over repeated play a player might see two games that seem suspiciously identical. If your game is meant to be played competitively, a sufficiently determined player could study your game and reach a point where they could predict which random numbers happen and when, and use that to gain an unfair advantage. So we have to be careful when creating these systems.
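In Python, for example, the whole dance looks something like this (milliseconds since midnight is just the illustrative seed choice mentioned above):

```python
import random
from datetime import datetime

# Seed from the number of milliseconds elapsed since midnight.
now = datetime.now()
seed = ((now.hour * 60 + now.minute) * 60 + now.second) * 1000 \
       + now.microsecond // 1000

random.seed(seed)
print([random.randint(1, 6) for _ in range(5)])  # five "random" die rolls

# Determinism in action: the same seed replays the exact same rolls.
random.seed(seed)
print([random.randint(1, 6) for _ in range(5)])  # identical to the above
```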

Pseudorandomness in Online Games: Keeping Clients in Sync

You have to be extra careful with random numbers in an online game, if your players’ machines are generating their own random numbers. I’ve worked on games at two companies that were architected that way, for better or worse. What could happen was this: you would of course have one player (or the server) generate a random number seed, and that seed would be used for both players in a head-to-head game. Then, when either player needed a random number, both machines would have to generate that number so that their random number seed was kept in sync. Occasionally due to a bug, one player might generate a random number and forget to inform their opponent, and now their random number generators are out of sync. The game continues for a few turns until suddenly, one player takes an action that requires a random element, their machine rolls a success, the opponent’s machine (with a different random number) rolls a failure, the two clients compare checksums and fail because they now have a different game state, and both players become convinced the other guy is trying to cheat. Oops. (A better way to do this for PC games is to put all the game logic on the server and use thin clients; for networked handheld console or phone games where there is no server and it’s direct-connect, designate one player’s device to handle all these things and broadcast the game state to the other players.)
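Here’s a toy Python sketch of that failure mode (the seed value is made up): two machines seeded identically stay in lockstep only as long as every single draw happens on both.

```python
import random

SHARED_SEED = 12345  # hypothetical seed agreed on when the game starts

host = random.Random(SHARED_SEED)
guest = random.Random(SHARED_SEED)

# As long as both machines draw once per random event, they agree:
for _ in range(3):
    print(host.randint(1, 20), guest.randint(1, 20))  # always equal pairs

# Bug: the host rolls for something and forgets to tell the guest...
host.randint(1, 20)

# ...and from here on the two streams are permanently offset:
print(host.randint(1, 20), guest.randint(1, 20))  # almost certainly differ
```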

Pseudorandomness in Single-Player Games: Saving and Loading

You also have to be careful with pseudorandom numbers in a single-player game, because of the potential for exploits. This is a largely unsolved problem in game design. You can’t really win here, but you can at least pick your poison.

Save Anywhere

Suppose you have a game where the player can save anywhere, any time. Many games do this because it is convenient for the player. However, nothing stops the player from saving just before they have to make a big random roll, maybe something where they’re highly unlikely to succeed but there’s a big payoff if they do, and reloading from the save until they succeed. If you re-generate your random number seed each time they reload, they will eventually succeed: even with only a 5% chance per attempt, the odds of at least one success in n tries are 1 - 0.95^n, which passes 50% after just 14 reloads. At that point they’re not really playing the game that you designed… but on the other hand, they’re using the systems you designed, so they’re not really cheating either. Your carefully balanced probabilities suddenly become unbalanced when a player can keep rerolling until they win.

Save Anywhere, Saved Seed

Okay, so you say, let’s fix that: what if we save the random number seed in the saved game file? Then, if you try to save and reload, you’ll get the same result every time! First, that doesn’t eliminate the problem, it just makes it a little harder; the player just has to find one other random thing to do, like maybe drinking a potion that restores a random number of HP, or maybe choosing their combat actions in a different order or something, and keep trying until they find a combination of actions that works. Second, you’ve now created a new problem: after the player saves, they know exactly what the enemy AI will do on every turn, because once you start with the same random number seed the game now becomes fully deterministic! Sometimes this foreknowledge of exactly how an enemy will act in advance is even more powerful than being able to indefinitely reroll.

Save Points

So you say, okay, let’s limit where the player can save, so that they’ll have to go through some nontrivial amount of gameplay between saves. Now they can theoretically exploit the save system, but in reality they have to redo too much to be able to fully optimize every last action. And then we run into a problem with the opposite type of player: while this mechanism quells the cheating, the honest players now complain that your save system won’t let them walk away when they want, that the game holds them hostage between save points.

Quicksave

Maybe you think to try a save system where the player can save any time, but it erases their save when they start, so they can’t do the old save/reload trick. This seems to work… until the power goes out just as an honest player reaches the final boss, and now they have to start the entire game over from scratch. And they hire a hit man to kill you in your sleep because you deserve it, you evil, evil designer.

Save Anywhere, Limited Times

You give the player the ability to save anywhere, but limit the total number of saves. The original Tomb Raider did this, for example. This allows some exploits, but at least not on every last die-roll. Is this a good compromise?

Oh, by the way, I hope you gave the player a map and told them exactly how far apart they can save on average, and drew some BIG ARROWS on the map pointing to places where the big battles are going to happen, so a player doesn’t have to replay large sections of the map just because they didn’t know ahead of time where the best locations were to save. And then your players will complain that the game is too easy because it gives them all this information about where the challenge is.

Pick Your Poison

As I said, finding the perfect save system is one of those general unsolved problems in game design, just from the perspective of what kind of system is the most fun and enjoyable for the player, and that’s just for deterministic games! When you add pseudorandom numbers, the problem gets much thornier, so this is something you should be thinking about as a designer while designing the load/save system… because if you don’t, it’ll be left to some programmer to figure out, God help you, and it’ll probably be based on whatever’s easiest to code rather than what’s best for the game or the player.

When Pseudorandom Numbers Fail

Even if you choose a good random number seed, and even if you ignore player exploits and bugs, there are other ways that randomness can go wrong if you choose poor algorithms. For example, suppose you have a deck of cards and want to shuffle them. Here’s a naïve algorithm that most budding game programmers have envisioned at some point:

  1. Start with an unshuffled deck.
  2. Generate a pseudorandom number that corresponds to a card in the deck (so if the deck is 52 cards, generate a whole number between 1 and 52… or 0 and 51, depending on what language you’re using). Call this number A.
  3. Generate a second pseudorandom number, the same way. Call this number B.
  4. Swap the cards in positions A and B in the deck.
  5. Repeat steps 2-4 lots and lots of times.

The problem here is, first off, that it takes an obnoxiously long time to get anything resembling a random shuffle. Second, because the card positions start out fixed and you’re swapping random pairs one at a time, no matter how many times you repeat this there is always a slightly greater chance of finding each card in its original position than in any other single position. Think of it this way: if a card is ever swapped, it moves to a uniformly random position, and further swaps just move it from one random position to another, so any card that has been swapped at all is equally likely to end up anywhere (this is good). However, there is always some nonzero chance that a card is never swapped, in which case it stays exactly where it started, and that extra probability gets stacked on top of its baseline chance of ending up in its original position. Performing more swaps lowers the chance that a card is never touched, but it can never bring that chance all the way down to zero. Ironically, this means the single most likely output of this algorithm is a “shuffle” where every card sits exactly where it started!

So you can see that even if the pseudorandom numbers generated for this shuffle are perfectly random (or close enough), the shuffle itself isn’t.
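If you want to see the bias for yourself, here’s a quick Monte Carlo sketch in Python (the deck size and swap count are arbitrary choices of mine): with a 5-card deck and only 3 swaps the effect is glaring, and adding more swaps shrinks it without ever eliminating it.

```python
import random

def naive_shuffle(deck, swaps):
    """The 'swap two random positions over and over' shuffle from above."""
    for _ in range(swaps):
        a = random.randrange(len(deck))
        b = random.randrange(len(deck))
        deck[a], deck[b] = deck[b], deck[a]
    return deck

trials = 100_000
stayed = 0
for _ in range(trials):
    deck = naive_shuffle(list(range(5)), swaps=3)
    if deck[0] == 0:   # did the first card end up where it started?
        stayed += 1

# A fair shuffle would leave card 0 on top 1/5 = 20% of the time;
# this one does so noticeably more often.
print(stayed / trials)
```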

Are Your Pseudorandom Numbers Pseudorandom Enough?

There is also, of course, the question of whether your pseudorandom number generator itself actually produces numbers that are reasonably random, or whether some numbers come up more or less often than others, whether from rounding error or just a poor algorithm. A simple test is to use your generator to produce a few thousand pairs of random coordinates, plot them on a 2D graph, and look for noticeable patterns (like points falling into a lattice, or conspicuous clusters or empty regions). You can expect to see some clustering, of course, because as we learned, that’s how random numbers work; but if you repeat the experiment a few times, the clusters should show up in different places each time. This is a way of using Monte Carlo simulation as a quick visual test of whether your pseudorandom numbers are actually random. (There are more rigorous mathematical ways to measure the randomness of your generator, but those require actual math; this is an easier, quick-and-dirty “good enough” test for game design purposes.)
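With matplotlib, the whole test is only a few lines. Here I’m using Python’s built-in generator as a stand-in for whichever generator you actually want to vet:

```python
import random
import matplotlib.pyplot as plt

# A few thousand (x, y) pairs from the generator under test.
xs = [random.random() for _ in range(5000)]
ys = [random.random() for _ in range(5000)]

# A healthy generator fills the square with unstructured noise; lattice
# lines, stripes, or persistent empty regions are warning signs.
plt.scatter(xs, ys, s=1)
plt.title("PRNG scatter test")
plt.show()
```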

Most of the time this won’t be an issue. Most programming languages and game libraries come with their own built-in pseudorandom number generation functions, and those all use established algorithms that are known to work, and that’s what most programmers use. But if your programmer, for some reason, feels the need to implement a custom pseudorandom number generation function, this is something you will want to test carefully!

Homework

In past weeks, I’ve given you something you can do right now to improve the balance of a game you’re working on, and also a task you can do later for practice. This week I’m going to reverse the order. Get some practice first, then apply it to your project once you’re comfortable with the idea.

For this week’s “homework” I’m going to go over two algorithms for shuffling cards. I have seen some variant of both of these used in actual working code in shipped games (I won’t say which games, to protect the innocent). In both cases, we saw the perennial complaints from the player base that the deck shuffler was broken, and that there were certain “hotspots” where, if you placed a card in that position in your deck, it was more likely to get shuffled to the top and show up in your opening hand. What I want you to do is think through both of these algorithms logically and figure out whether they work. I’ll give you a hint: one works and one doesn’t. I’ll actually give you the source code, but I’ll also explain each algorithm in plain English in case you don’t know how to program. Keep in mind that this might look like a programming problem, but really it’s a probability question: in how many different ways can each algorithm shuffle the deck, and do those ways line up evenly with the possible orderings of the cards?

Algorithm #1

The first algorithm looks like this:

  1. Start with an unshuffled deck.
  2. Choose a random card from all available cards (so if the deck has 60 cards, choose a pseudorandom number between 1 and 60). Take the card in that position, and swap with card #60. Essentially, this means choose one card randomly to put on the bottom of the “shuffled” deck, then lock it there in place.
  3. Now, take a random card from the remaining cards (between 1 and 59), and swap that card with position #59, putting it on top of the previous one.
  4. Then take another random card from the remaining ones (between 1 and 58), swap that with position #58, and so on.
  5. Keep repeating this until eventually you get down to position #1, which swaps with itself (so it does nothing), and then we’re done.

This is clearly different from how humans normally shuffle a deck, but remember, the purpose here isn’t to emulate a human shuffle; it’s to get a random shuffle, that is, a random ordering of cards.
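Rendered as code, Algorithm #1 looks something like this. This is my Python sketch of the steps above (zero-indexed, so “position #60” becomes the last index):

```python
import random

def shuffle_algorithm_1(deck):
    """Swap a random card into the last open position, lock it in place,
    and repeat with one fewer card each time until the deck is placed."""
    for bottom in range(len(deck) - 1, 0, -1):
        pick = random.randint(0, bottom)   # choose among the remaining cards
        deck[pick], deck[bottom] = deck[bottom], deck[pick]
    return deck
```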

Algorithm #2

The second algorithm is similar, but with two minor changes.

  1. Start with an unshuffled deck.
  2. Choose a random card from all available cards (in a 60 card deck, that means a card from 1 to 60). Swap with position #60, putting it on the bottom of the deck.
  3. Choose a random card from all available cards, including those that have been chosen already (so, choose another random card from 1 to 60). Swap with position #59.
  4. Choose another random card from 1 to 60, and swap with position #58.
  5. Keep repeating this until eventually you choose a random number from 1 to 60, swap that card with position #1, and you’re done.
  6. Oh yeah – one last thing. Repeat this entire process (steps 2-5) fifty times. That’ll make it more random.
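And here is Algorithm #2 rendered the same way, again as my Python sketch of the steps above:

```python
import random

def shuffle_algorithm_2(deck, repeats=50):
    """Walk down the deck as before, but make every pick from the full
    deck, and then repeat the entire pass fifty times."""
    for _ in range(repeats):
        for bottom in range(len(deck) - 1, -1, -1):
            pick = random.randrange(len(deck))  # always pick from all cards
            deck[pick], deck[bottom] = deck[bottom], deck[pick]
    return deck
```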

Hints

How do you approach this, when there are too many different shuffles in a 60-card deck to count? The answer is that you start much simpler. Assume a deck with only three cards in it (assume these cards are all different; call them A, B and C if you want).

First, figure out how many ways there are to order a 3-card deck. There are mathematical tricks for doing this that we haven’t discussed, but you should be able to do this just by trial and error.

Next, look at both algorithms and figure out how many different ways each one can produce a shuffle. In the first case, you’re choosing from 3 cards, then choosing from 2, then choosing from 1. In the second case, you’re choosing from 3, then 3, then 3 again. List all the unique orderings the deck can actually end up in, then compare that against all the different ways each algorithm can reach them (if you’d rather let the computer do the counting, see the sketch just below). You’ll find that one of the algorithms produces a random shuffle (well, as random as your pseudorandom number generator is, anyway) while the other favors certain shuffles over others. And if it works that way for a 3-card deck, assume it behaves similarly (random or not) for larger decks. If you want to go through the math with larger decks, be my guest, but you shouldn’t have to.
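Here’s that counting idea as a brute-force Python sketch (mine, not part of the original homework): it enumerates every possible sequence of picks for a 3-card deck and tallies the final orderings. A single pass of Algorithm #2 is enough to see the pattern; the fifty repeats only make the numbers bigger.

```python
from itertools import product
from collections import Counter

def tally_shuffles(n, choice_ranges):
    """Run the swap-down-the-deck shuffle once for every possible sequence
    of picks, and count how often each final ordering appears."""
    tally = Counter()
    for picks in product(*(range(r) for r in choice_ranges)):
        deck = list(range(n))
        bottom = n - 1
        for pick in picks:
            deck[pick], deck[bottom] = deck[bottom], deck[pick]
            bottom -= 1
        tally[tuple(deck)] += 1
    return tally

# Algorithm #1 picks from 3 cards, then 2, then 1; Algorithm #2 picks
# from all 3 cards every time. Compare the two tallies for yourself.
print(tally_shuffles(3, [3, 2, 1]))
print(tally_shuffles(3, [3, 3, 3]))
```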

If You’re Working on a Game Now…

Video Games

Once you’ve done that, if you’re working on a game that involves computers, take a look at the pseudorandom numbers and how your program uses them. In particular, if you use a series of pseudorandom numbers for something like deck shuffling, make sure you’re doing it in a way that’s actually random; and if you’re using a nonstandard pseudorandom number generator, verify it by graphing a bunch of pairs of random coordinates and looking for undesirable patterns. Lastly, examine how your random number seed is stored in the game state. If it’s a multiplayer game, is the seed stored separately on each client, or on a single server? If it’s a single-player game, does your save system let a player save, attempt a high-risk random roll, and keep reloading from the save until it succeeds?

All Games, Digital or Non-Digital

Another thing to do, whether you’re designing a board game or video game, is to take a look at the random mechanics in your game (if there are any) and ask yourself some questions:

  • Is the game dominated more by skill or luck, or is it an even mix?
  • Is the level of skill and luck in the game appropriate to the game’s target audience, or should the game lean a bit more to one side or the other?
  • What kinds of probability fallacies are your players likely to observe when they play? Can you design your game differently to change the player perception of how fair and how random your game is? Should you?