Level 10: Final Boss

September 8, 2010

This Week

Welcome to the final week of the season. This week I didn’t know ahead of time what to do, so I intentionally left it as an unknown on the syllabus as a catch-all for anything interesting that might have come up over the summer. As it turns out, there are four main topics I wanted to cover today, making this the longest post of the series, so if you have limited time I suggest bookmarking this and coming back later. First I’d like to talk a bit about economic systems in games, and how to balance a system where the players are the ones in control of it through manual wealth generation and trading. Then, I’ll talk about some common multiplayer game balance problems that just didn’t fit anywhere else in the previous nine weeks. Third, I’ll get a bit technical and share a few tips and tricks in Excel. Lastly, I’ll return to last summer and this whole concept of “fun,” and how the topic of game balance fits into the bigger picture of game design, because for all of the depth we’ve gone into, game balance still feels like a pretty narrow topic sometimes.

Economic Systems

What is an economic system?

First, we use the word “economy” a lot, even in everyday life, so I should define it to be clear. In games, I’ll use the word “economy” to describe any in-game resource. That’s a pretty broad term, as you could take it to mean that the pieces in Chess are a “piece economy” – and I’d argue that yes, they could be thought of that way, it’s just not a particularly interesting economy because the resources can’t really be created, destroyed or transferred in any meaningful way. Most economies that we think of as such, though, have one or more of these mechanics:

  • Resource generation, where players craft or receive resources over time
  • Resource destruction, where players either burn resources for some use in the game, or convert one type of resource to another.
  • Resource trading, where players can transfer resources among themselves, usually involving some kind of negotiation or haggling.
  • Limited zero-sum resources, so that one player generating a resource for themselves reduces the available pool of resources for everyone else.

Still, any of those elements individually might be missing and we’d still think of it as an economy.

Like the creation of level design tools, tabletop RPGs, or metrics, creating an economic system in your game is a third-order design activity, which can make it pretty challenging. You’re not just creating a system that your players experience. You’re creating a system that influences player behavior, but then the players themselves are creating another social system within your economic system, and it is the combination of the two that the players actually experience. For example, in Settlers of Catan, players are regularly trading resources between them, but the relative prices of each resource are always fluctuating based on what each individual player needs at any given time (and usually, all the players need different things, with different levels of desperation and different ability to pay higher prices). The good news is that with economies, at least, a lot of human behavior can be predicted. The other good news is that in-game economies have a lot of little “design knobs” for us designers to change to modify the game experience, so we have a lot of options. I’ll be going over those options today.

Supply and Demand

First, a brief lesson from Economics 101 that we need to be aware of is the law of supply and demand, which some of you have probably heard of. We assume the simplest case possible: an economy with one resource, a really large population of people who produce the resource and want to sell it, and another really large population of people who consume the resource and want to buy it. We’ll also assume for our purposes that any single unit of the resource is identical to any other, so consumers don’t have to worry about choosing between different “brands” or anything.

The sellers each have a minimum price at which they’re willing to part with their goods. Maybe some of them have lower production costs or lower costs of living than others, so they can accept a lower price and still stay in business. Maybe others have a more expensive storefront, or they’re just greedy, so they demand a higher minimum price. At any rate, we can draw a supply curve on a graph that says that for any given price (on the x-axis), a certain number or percentage of sellers are willing to sell at that price (on the y-axis). So, maybe at $1, only two sellers in the world can part with their goods at that price, but at $5 maybe there are ten sellers, and at $20 you’ve got a thousand sellers, and eventually if you go up to $100 every single seller would be willing to sell at that price. Basically, the only thing you need to know about supply curves is that as the price increases, the supply increases; if ten people would sell their good at $5, then at $5.01 you know that at least those ten sellers would still accept (if they sold at $5 then they would clearly sell at $5.01), and you might have some more sellers that finally break down and say, okay, for that extra penny we’re in.

Now, on the other side, it works the same but in reverse. The consumers all have a maximum price that they’re willing (or able) to pay, for whatever reason. And we can draw a demand curve on the same graph that shows for any given price, how many people are willing to buy at that price. And unlike the supply curve, the demand curve is always decreasing; if ten people would buy a good at $5, then at $5.01 you might keep all ten people if you’re lucky, or some of them might drop out and say that’s too rich for their blood, but you certainly aren’t going to find anyone who wouldn’t buy at a lower price but would buy at a more expensive price.

Now of course, in the real world, these assumptions aren’t always true. More teenagers would rather buy $50 shoes than $20 shoes, because a higher price carries social cred. And some sellers might not be willing to sell at exorbitantly high prices because they’d consider that unethical, and they’d rather sell for less (or go out of business) than bleed their customers dry. But for our purposes, we can assume that most of the time in our games, supply curves will increase and demand curves will decrease as the price gets more expensive. And here’s the cool part: wherever the two curves cross will generally turn out to be the actual market price that the players all somehow collectively agree to. Even if the players don’t know the curves, the market price will go there as if by magic. It won’t happen instantly if the players have incomplete information, but it does happen pretty fast, because players who sell at below the current market price will start seeing other people selling at higher prices (because they have to) and say, hey, if they can sell for more than I should be able to also! And likewise, if a consumer pays a lot for something and then sees the guy sitting next to them who paid half what they did for the same resource, they’re going to demand to pay a whole lot less next time.
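
If you want to see how that equilibrium emerges, here’s a minimal sketch in Python (all the numbers are invented purely for illustration): it counts how many sellers would sell and how many buyers would buy at each candidate price, and reports the first price where supply catches up to demand, roughly where the two curves cross.

```python
# Each seller has a minimum acceptable price; each buyer has a maximum willingness to pay.
# These lists are made up for illustration only.
seller_minimums = [1, 3, 4, 6, 8, 10, 12, 15]
buyer_maximums  = [2, 5, 7, 9, 9, 11, 14, 20]

def supply_at(price):
    """Number of sellers willing to sell at this price (non-decreasing in price)."""
    return sum(1 for s in seller_minimums if s <= price)

def demand_at(price):
    """Number of buyers willing to buy at this price (non-increasing in price)."""
    return sum(1 for b in buyer_maximums if b >= price)

# Scan candidate prices and report where supply first meets or exceeds demand --
# roughly the market-clearing price where the two curves cross.
for price in range(1, 21):
    s, d = supply_at(price), demand_at(price)
    print(f"price {price:2d}: supply {s}, demand {d}")
    if s >= d:
        print(f"approximate market price: {price}")
        break
```

Of course, in a real game nobody actually knows these lists; the interesting part is that the players collectively behave as if they did.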

Now, this can be interesting in online games that have resource markets. If you play an online game where players can sell or trade in-game items for in-game money, see if either the developer or a fansite maintains a historical list of selling prices (not unlike the price history of a stock ticker). If so, you’ll notice that the prices change slightly over time. So you might wonder what the deal is: why do prices fluctuate? And the answer is that the supply and demand are changing slightly over time. Supply changes constantly as players craft items and put them up for sale, and demand changes too, because at any given time a different set of players is going to be online shopping for any given item. You can see this with other games that have any kind of resource buying and selling. And because the player population isn’t infinite, these things aren’t perfectly efficient, so you get unequal amounts of each item being produced and consumed over time.

Now, that points us to another interesting thing about economies: the fewer the players, the more we’ll tend to see prices fluctuate, because a single player controls more and more of the production or consumption. This is why the prices you’ll see for one Clay in the Catan games change a lot from game to game (or even within a single game) relative to the price of some piece of epic loot in World of Warcraft.

Now, this isn’t really something you can control directly as the game designer, but at least you can predict it. It also means if you’re designing a trading board game for 3 to 6 players, you can expect to see more drastic price fluctuations with fewer players, and you might decide to add some extra rules at lower player counts to account for that if a stable market is important to the functioning of your game.
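
Here’s a rough Monte Carlo sketch of that claim, assuming (purely for illustration) that every trader’s willingness to buy or sell is drawn from the same made-up distribution: with only a few traders the clearing price swings around a lot from one “market day” to the next, while with hundreds of traders it settles down.

```python
# Rough sketch of "smaller markets fluctuate more"; all numbers are invented.
import random
import statistics

def clearing_price(num_traders, rng):
    """Approximate market price for one 'market day' with the given trader count."""
    sellers = sorted(rng.uniform(1, 10) for _ in range(num_traders))                 # min sell prices
    buyers = sorted((rng.uniform(1, 10) for _ in range(num_traders)), reverse=True)  # max buy prices
    price = None
    # Pair the cheapest seller with the most eager buyer, then the next pair, and so on;
    # the last pair that can still agree on a trade sets the approximate market price.
    for s, b in zip(sellers, buyers):
        if s <= b:
            price = (s + b) / 2
        else:
            break
    return price  # None if no pair could agree at all (rare; skipped below)

rng = random.Random(42)
for num_traders in (3, 30, 300):
    days = (clearing_price(num_traders, rng) for _ in range(1000))
    prices = [p for p in days if p is not None]
    print(f"{num_traders:4d} traders: mean price {statistics.mean(prices):.2f}, "
          f"std dev {statistics.stdev(prices):.2f}")
```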

Multiple resources

Things get more interesting when we have multiple goods, because the demand curves can affect one another. For example, suppose you have two resources, but one can be substituted for the other – maybe one gives you +50 health, and the other gives you +5 mana that you can spend on a healing spell for the same +50 health. Those are two different items with similar uses, so if one is really expensive and the other is really cheap, you can just buy the cheap one. Even if the two aren’t perfect substitutes, players may be willing to accept an imperfect substitute if its price is sufficiently lower than the market value of the thing they actually want, and the price difference between what people will pay for the good and what they’ll pay for the substitute tells you how efficient that substitute is (that is, how “perfectly” it substitutes for the original).

On the flip side, you can also have multiple goods where the demand for one increases demand for the other, because they work better if you buy them all as a set (these are complements, sort of the opposite of substitutes). For example, in games where collecting a complete set of matching gear gives your character a stat bonus, or where you can turn in one of each resource for a bonus, a greater demand for one resource pulls up the demand for all of the others… and once a player has some of the resources in the set, their demand for the others will increase even more because they’re already part of the way there.

By creating resources that are meant to be perfect or imperfect substitutes, or several resources that naturally “go together” with each other, you can change the demand (and therefore the market price) of each of them.
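
As a toy illustration of substitutes, here’s a hedged sketch (the items, healing numbers, and the 80% efficiency figure are all invented) of a buyer choosing between a good and its imperfect substitute based purely on healing per gold:

```python
# Toy model of an imperfect substitute, using invented numbers.
# The buyer wants healing. Item A heals 50 directly; item B is a substitute that,
# through an extra step (casting a spell), effectively heals 40, i.e. 80% as "perfect".
SUBSTITUTE_EFFICIENCY = 0.8

def best_buy(price_a, price_b):
    """Pick whichever item gives more healing per gold spent."""
    value_a = 50 / price_a
    value_b = (50 * SUBSTITUTE_EFFICIENCY) / price_b
    return "A" if value_a >= value_b else "B"

# B only wins if it is priced below 80% of A's price; that threshold is one way to read
# "the price difference tells you how perfect the substitute is."
print(best_buy(price_a=10, price_b=7))   # B: the discount more than covers the inefficiency
print(best_buy(price_a=10, price_b=9))   # A: B isn't cheap enough to justify the lost efficiency
```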

Marginal pricing

As we discussed a long time ago with numeric systems, sometimes demand is a function of how much of a good you already have. If you have none of a particular resource, the first one might be a big deal for you, but if you have a thousand of that resource then one more isn’t as meaningful to you, so demand may actually be on a decreasing curve based on how many of the thing you already have. Or, if the game lets you use large quantities of a resource more efficiently for larger bonuses, collecting one unit of that resource might increase your demand for more of it. The same is true on the supply side, where producing lots of a given resource might be more or less expensive per unit than producing smaller amounts. You can add these kinds of mechanics in order to influence the price; for example, if you give increasing returns for each additional good a player possesses, you’ll tend to see the game quickly organize into players going for monopolies of individual goods, since once a player holds the majority of a good they’re going to want to buy up the rest. Conversely, if players get decreasing returns when they collect a lot of one good, then adding a decreasing per-unit production cost might make a lot of sense if you want the price of that good to stay a little more stable.
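
A quick sketch of the decreasing-returns case, with invented numbers: each additional unit a player already owns knocks down what they’ll pay for the next one.

```python
# Sketch of marginal pricing: the more of a resource a player already holds, the less
# the next unit is worth to them, so their willingness to pay falls as they stockpile.
# base_value and decay are invented numbers.
def marginal_value(units_owned, base_value=100, decay=0.8):
    """Willingness to pay for the next unit, given how many are already owned."""
    return base_value * (decay ** units_owned)

for owned in (0, 1, 5, 10):
    print(f"owns {owned:2d} -> will pay up to {marginal_value(owned):6.2f} for one more")

# Flip the decay above 1.0 (increasing returns) and the opposite happens: each unit makes
# the next one MORE valuable, which is what pushes players toward cornering a single good.
```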

Scarcity

I probably don’t need to tell you this, but if the total goods are limited, that increases demand. You see this exploited all the time in marketing, when a company wants you to believe that they only have limited quantities of something, so that you’ll buy now (even at a higher price) because you don’t want to miss your chance. So you can really change the feeling of a game just by changing whether a given resource is limited or infinite.

As an example, consider a first-person-view shooting video game where you have limited ammunition. First, imagine it is strictly limited: you get what you find, but that’s it. A game like that feels more like a survival-horror game, where the player only uses their ammo cautiously, because they never know when they’ll find extra or when they’ll run out. Compare to a game where you have enemies that respawn in each area, random item drops, and stores where you can sell the random drops and buy as much extra ammo as you need. In a game like that, a player is going to be a lot more willing to experiment with different weapons, because they know they’ll get all of their ammo back when they reach the next ammo shop, which makes the game feel more like a typical FPS. Now compare that with a game where you have completely unlimited ammo so it’s not even a resource or an economy anymore, where you can expect the player to be shooting more or less constantly, like some of the more recent action-oriented FPSs. None of these methods is “right” or “wrong” but they all give very different player experiences, so my point is just that you increase demand for a good (and decrease the desire to actually consume it now because you might need it later) the more limited it is.

If the resources of your game drive players towards a victory condition, making the resource limited is a great way to control game length. For example, in most RTS games, the board has a limited number of places where players can mine a limited amount of resources, which they then use to create units and structures on the map. Since the core resources that are required to produce units are themselves limited, eventually players will run out, and once it runs out the players will be unable to produce any more units, giving the game a natural “time limit” of sorts. By adjusting the amounts of resources on the map, you can control when this happens; if the players drain the board dry of resources in the first 5 minutes, you’re going to have pretty short games… but if it takes an hour to deplete even the starting resources near your start location, then the players will probably come to a resolution through military force before resource depletion forces the issue, and the fact that they’re limited at all is just there to avoid an infinite stalemate, essentially to place an upper limit on game length.
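
Here’s the back-of-envelope version of that calculation, with invented numbers for a hypothetical RTS map; tuning the resource amounts directly moves the point where the map runs dry.

```python
# Back-of-envelope estimate (all numbers invented) of how map resources cap game length.
resource_per_node = 1500   # minerals in each mining node
nodes_per_player = 4       # nodes within easy reach of each start location
harvest_rate = 60          # minerals a player typically gathers per minute
avg_unit_cost = 100        # minerals per combat unit

total_minerals = resource_per_node * nodes_per_player
minutes_until_dry = total_minerals / harvest_rate
max_units_ever = total_minerals / avg_unit_cost

print(f"Each player's nearby resources run out after ~{minutes_until_dry:.0f} minutes")
print(f"...and can produce at most ~{max_units_ever:.0f} units over the whole game")
# Tune resource_per_node (or the node count) to push this soft time limit earlier or later,
# relative to when military force usually decides the game.
```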

With multiplayer games in a closed economy, you also want to be very careful with strictly limited goods, because there is sometimes the possibility that a single player will collect all of a good, essentially preventing anyone else from using it, and you should decide as a designer if that should be possible, if it’s desirable, and if not what you can do to prevent it. For example, if resources do no good until a player actually uses them (and using them puts them back in the public supply), then this is probably not going to be a problem, because the player who gets a monopoly on the good has incentive to spend them, which in turn removes the monopoly.

Open and closed economies

In systems, we say a system is “open” if it can be influenced by things from outside the system itself, and it is “closed” if the system is completely self-contained. Economies are systems, and an open economy has different design considerations than a closed economy.

Most game economies are closed systems; you can generate or spend money within the game, but that’s it, and in fact some people get very uncomfortable if you try to change it to an open system: next time you play Monopoly, try offering one of your opponents a real-world cash dollar in exchange for 500 of their Monopoly dollars as a trade, and see what happens — at least one other player will probably become very upset!

Closed systems are a lot easier to manage from a design standpoint, because we have complete control as designers over the system, we know how the system works, and we can predict how changes in the system will affect the game. Open economies are a lot harder, because we don’t necessarily have control over the system anymore.

A simple example of an open economy in a game is Poker, when additional player buy-ins are allowed. If players can bring as much money as they want to the table, a sufficiently rich player could have an unfair advantage; if skill is equal, they could just keep buying more chips until the luck in the game turns their way. To solve this balance problem, additional buy-ins are usually restricted or disallowed in tournament play.

Another place where this can be a problem is CCGs, where a player who spends more money can buy more cards and build a larger collection. Ideally, for the game to be balanced, we would want larger collections to give players more options but not more power, which is why I think rarity shouldn’t be a factor in the cost curve of such a game, at least if you want to maximize your player base. If more money always wins, you set up an in-game economy that essentially has a minimum spend required to be competitive; and since the real world has its own supply and demand curves, the higher that required buy-in is, the fewer people will be willing to pay it (and thus the smaller your player base).

There are other games where you can buy in-game stuff with real-world cash; this is a typical pattern for free-to-play MMOs and Facebook games, and developers have to be careful with exactly what the player can and can’t buy; if the player can purchase an advantage over their opponents, especially in games that are competitive by nature, that can make the game very unbalanced very quickly. (It’s less of an issue in games like FarmVille where there isn’t really much competition anyway.)

Some designers intentionally unbalance their game in this way, assuming that if they sell a gameplay advantage, players will pay huge amounts. To be fair, some players do, and if the game’s core mechanics aren’t very compelling then this might be the only thing you can fall back on to make money; but if you set out to do this from the start, I would call it lazy design. A better method is to create a game that’s actually worth playing on its own, and then offer to trade money for time (so, maybe you get your next unlock after another couple of hours of gameplay, and the gameplay is fun enough that you can do that without feeling like the game is arbitrarily forcing you to grind… but if you want to skip ahead by paying a couple bucks, we’ll let you do that). In this way, money doesn’t give any automatic gameplay advantage; it just speeds up the progression that’s already there.

One final example of an open economy is in any game, most commonly MMOs, where players can trade or “gift” resources within the game, because in any of those cases you can be sure a secondary economy will emerge where players will exchange real-world money for virtual stuff. Just google “World of Warcraft Gold” and you’ll probably find a few hundred websites where you can either purchase Gold for real-world cash, or sell Gold to them and get paid in cash. There are a few options you can consider if you’re designing an online game like this with any kind of trade mechanic:

  • You could just say that trading items for cash is against your Terms of Service, and that any player found to have done so will have their account terminated. This is mostly a problem because it’s a huge support headache: you get all kinds of players complaining to you that their account was banned, and just sending them a form email with the TOS still takes time. In some cases, like Diablo where there isn’t really an in-game trading mechanism and instead the players just drop stuff on the ground and then go pick it up, it can also be really hard to track this. And if it is easy to track (because trades are centralized somewhere), then if you really don’t want people buying in-game goods for cash, you should ask yourself why the trading system you designed and built allows it in the first place.
  • You could say that an open economy is okay, but you don’t support it, so if someone takes your money and doesn’t give you the goods, it’s the player’s problem and not the developer’s. Unfortunately, it is still the developer’s problem, because you will receive all kinds of customer support emails from players claiming they were scammed, and whether you fix it or not you still have to deal with the email volume. If you don’t fix it, then you have to accept you’re going to lose some customers and generate some community badwill. If you do fix it, then accept that players will think of you as a “safety net” which actually makes them more likely to get scammed, since they’ll trust other people by assuming that if the other person isn’t honest, they’ll just send an email to support to get it fixed. Trying to enforce sanctions against scammers is an unwinnable game of whack-a-mole.
  • You can formalize trading within your game, including the ability to accept cash payments. The good news for this is that players have no excuses; my understanding is that when Sony Online did this for some of their games, the huge win for them was something like a 40% reduction in customer support costs, which can be significant for a large game. The bad news is that you will want to contact a lawyer on this, to make sure you don’t accidentally run afoul of any national banking laws since you are now storing players’ money.

You’ll also want to consider whether players are allowed to sell their entire character, password and all. For Facebook games this is less of an issue because a Facebook account links to all the games and it’s not so easy for a player to give that away. For an MMO where each player has an individual account on your server that isn’t linked to anything else, this is something that will happen, so you need to decide how to deal with that. (On the bright side, selling a whole character doesn’t unbalance the game.)

In any case, you again want to make sure that whatever players can trade in game does not unbalance the game if a single player uses cash to buy lots of in-game stuff. One common pattern to avoid this is to place restrictions on items, for example maybe you can purchase a really cool suit of armor but you have to be at least Level 25 to wear it.

Inflation

Now, remember from before that the demand curve is based on each player’s maximum willingness to pay for some resource. Normally we’d like to think of the demand curve as this fixed thing, maybe fluctuating slightly if a different set of players or situations happen to be online, but over time it should balance out. But there are a few situations that can permanently shift the demand curve in one direction or another, and the most important for our purpose is when each player’s maximum willingness to pay increases.

Why would you change the amount you’re willing to pay? Mostly, because you have more purchasing power. If your income doubled overnight and Starbucks raised the price of its coffee from $5 to $6, you’d probably still be willing to pay the new price if you liked their coffee before, because now you can afford it.

How does this work in games? Consider a game with a positive-sum economy: that is, it is possible for me to generate wealth and goods without someone else losing them. The cash economy in the board game Monopoly is like this, as we’ve discussed before; so is the commodity economy in Catan, as is the gold economy in most MMOs. This means that over time, players get richer. With more total money in the economy (and especially, more total money per player on average), we see what is called inflation: the demand curve shifts to the right as more people are willing to pay higher prices, which then increases the market price of each good to compensate.

In Catan, this doesn’t affect the balance of the game; by the time you’re in the late game and willing to trade vast quantities of stuff for what you need, you’re at the point where you’re so close to winning that no one else is willing to trade with you anyway. In Monopoly the main problem, as I mentioned earlier, is that the economy is positive-sum but the object of the game is to bankrupt your opponents; here we see that one possible solution to this is to change the victory condition to “be the first player to get $2500” or something like that. In MMOs, inflation isn’t a problem for the existing players, because after all they are willing to pay more; however, it is a major problem for new players, who enter the game to find that they’re earning one gold piece for every five hours of play, and anything worth having in the game costs millions of gold, and they can never really catch up because even once they start earning more money, inflation will just continue. So if you’re running an MMO where players can enter and exit the game freely, inflation is a big long-term problem you need to think about. There are two ways to fix this: reduce the positive-sum nature of the economy, or add negative-sum elements to counteract the positive-sum ones.

Negative-sum elements are sometimes called “money sinks”: some kind of mechanism that permanently removes money from the player economy. The trick is balancing the two so that on average they cancel out; a good way to know is to actually take metrics on the total sum of money in the game and the average money per player, and track those over time to see if they’re increasing or decreasing (there’s a small sketch of this kind of tracking after the list below). Money sinks take many forms:

  • Any money paid to NPC shopkeepers for anything, especially if that something is a consumable item that the player uses and then it’s gone for good.
  • Any money players have to pay as maintenance and upkeep; for example, having to pay gold to repair your weapon and armor periodically.
  • Losing some of your money (or items or stats or other things that cost money to replace) when you die in the game.
  • Offering limited quantities of high-status items, especially if those items are purely cosmetic in nature and not something that gives a gameplay advantage, which can remove large amounts of cash from the economy when a few players buy them.
  • While I don’t know of any games that do this, it works in real life: have an “adventurer’s tax” that all players have to pay as a periodic percentage of their wealth. This not only gives an incentive to spend, but it also penalizes the players who are most at fault for the inflation. Another alternative would be to actually redistribute the wealth, so instead of just removing money from the economy, you could transfer some money from the richest players and distribute it among the poorest; that on its own would be zero-sum and wouldn’t necessarily fix the inflation problem, but it would at least give the newer players a chance to catch up and increase their wealth over time.
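
Here’s the tracking sketch promised above: a minimal logger (names and numbers invented) that records every gold faucet and sink, so you can watch total and average wealth per player over time and see whether your sinks are keeping pace.

```python
# Minimal sketch of the metrics suggested above; all event names and amounts are invented.
# Log every gold "faucet" (quest rewards, monster drops) and every "sink" (repairs,
# NPC purchases, taxes), then watch total and per-player gold over time for inflation.
from collections import defaultdict

class EconomyTracker:
    def __init__(self):
        self.gold_per_player = defaultdict(float)
        self.faucet_total = 0.0   # gold created from nothing
        self.sink_total = 0.0     # gold permanently destroyed

    def faucet(self, player, amount):   # e.g. quest reward, monster drop
        self.gold_per_player[player] += amount
        self.faucet_total += amount

    def sink(self, player, amount):     # e.g. repair bill, NPC shop purchase
        self.gold_per_player[player] -= amount
        self.sink_total += amount

    def snapshot(self, day):
        total = sum(self.gold_per_player.values())
        avg = total / max(len(self.gold_per_player), 1)
        print(f"day {day}: total gold {total:.0f}, avg per player {avg:.0f}, "
              f"net created so far {self.faucet_total - self.sink_total:.0f}")

tracker = EconomyTracker()
tracker.faucet("alice", 500)
tracker.faucet("bob", 300)
tracker.sink("alice", 100)   # repair costs
tracker.snapshot(day=1)
# If "net created" keeps climbing faster than the player base grows, prices will inflate.
```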

Reducing the positive-sum nature of the economy is a bit harder, because players are used to going out there, killing monsters and getting treasure drops. If you make the monsters limited (so they don’t respawn), the world will become depopulated of monsters very quickly. If you give no rewards, players will wonder why they’re bothering to kill monsters at all. In theory you could do something like this:

  • Monsters drop treasure but not gold, and players can’t sell or trade the treasure that’s dropped; so it might make their current equipment a little better, but that’s about it.
  • Players receive gold from completing quests, but only the first time each quest, so the gold they have at any given point in the game is limited. Players can’t trade gold between themselves.
  • Players can use the gold to buy special items in shops, so essentially it is like the players have a choice of what special advantages to buy for their character.

One other, final solution here is the occasional server reset, when everyone loses everything and has to start over. This doesn’t solve the inflation problem – a new player coming in at the end of a cycle has no chance of catching up – but it does at least mean that if they wait for everything to reset they’ll have as good a chance as anyone else after the reset.

Trading

Some games use trading and bartering mechanics extensively within their economies. Trading can be a very interesting mechanic if players have a reason to trade; usually that reason is that you have multiple goods, and each good is more valuable to some players than others. In Settlers of Catan, sheep are nearly useless to you if you want to build cities, but they’re great for settlements. In Monopoly, a single color property isn’t that valuable, but it becomes much more powerful if you own the other matching ones. In World of Warcraft, a piece of gear that can’t be equipped by your character class isn’t very useful to you, no matter how big the stat bonuses are for someone else. By giving each player an assortment of things that are better for someone else than for them, you give players a reason to trade resources.

Trading mechanics usually serve as a negative-feedback loop, especially within a closed economy. Players are generally more willing to offer favorable trades to those who are behind, while they expect to get a better deal from someone who is ahead (or else they won’t trade at all).

There are a lot of ways to include trading in your game; it isn’t as simple as just saying “players can trade”… but this is a good thing, because it gives you a lot of design control over the player experience. Here are a few options to consider:

  • Can players make deals for future actions as part of the trade (“I’ll give you X now for Y now and Z later”)? If so, are future deals binding, or can players renege?
    • Disallowing future deals makes trades simpler; with future considerations, players can “buy on credit” so to speak, which tends to complicate trades. On the other hand, it also gives the players a lot more power to strike interesting deals.
    • If future deals are non-binding, players will tend to be a lot more cautious and paranoid about making them. Think about whether you want players to be inherently mistrustful and suspicious of each other, or whether you want to give players every incentive to find ways of cooperating.
  • Can players only trade certain resources but not others? For example, in Catan you can trade resource cards but not victory points or progress cards; in Monopoly you can trade anything except developed properties; in some other games you can trade anything and everything.
    • Resources that are tradable are of course a lot more fluid than those that aren’t. Some resources may be so powerful (like Victory Points) that no one in their right mind would want to trade them, so simply making them untradeable stops the players from ruining the game by making bad trades.
  • Can players only trade at certain times? In Catan you can only trade with the active player before they build; in Bohnanza there is a trading phase as part of each player’s turn; in Monopoly you can trade with anyone at any time.
    • If players can trade at any time, consider if they can trade at “instant speed” in response to a game event, because sometimes the ability to react to game events can become unbalanced. For example, in Monopoly you could theoretically avoid Income Tax by trading all of your stuff to another player, then taking it all back after landing on the space, and you could offer the other player a lesser amount (say, 5%) in exchange for the service of providing a tax shelter. In short, be very clear about exactly when players can and can’t trade.
    • If trading events in the game are infrequent (say, you can only trade every few turns or something), expect trading phases to take longer, as players have had more time to amass tradable resources so they will probably have a lot of deals to make.
    • If this is a problem, consider adding a timer where players only have so much time to make deals within the trading phase.
  • Does the game require all trades to be even (e.g. one card for one card) or are uneven trades allowed (Catan can have uneven numbers of cards, but at least one per side; other games might allow a complete “gift” of a trade)?
    • Requiring even trades places restrictions on what players can do and will reduce the number of trades made, but it will also cause trading to move along faster because there’s less room for haggling, and there’s also less opportunity for one weak player to make a bad trade that hands the game to someone else.
    • I could even imagine a game where uneven trades are enforced: if you trade at all, someone must get the shaft.
  • Are trades limited in quantity, or unlimited? A specific problem here is the potential for the “kingmaker” problem, where one player realizes they can’t win, but the top two players are in a close game, and the losing player can choose to “gift” all of their stuff to one of the top two players to allow one of them to win. Sometimes social pressure prevents people from doing something like this, but you want to be very careful in tournament situations and other “official” games where the economic incentive of prizes might trump good sportsmanship (I actually played in a tournament game once where top prize was $20, second prize was $10, and I was in a position to decide who got what, so I auctioned off my pieces to the highest bidder.)
  • Are trades direct, or indirect? Usually a trade just happens, I give you X and you give me Y, but it’s also possible to have some kind of “trade tax” where maybe 10% of a gift or trade is removed and given to the bank for example, to limit trades. This seems strange – why offer trading as a mechanic at all, if you’re then going to disincentivize it? But in some games trading may be so powerful (if two players form a trading coalition for their mutual benefit, allowing them both to pull ahead of all other players, for example) to the point where you might need to apply some restrictions just to prevent trades from dominating the rest of the game.
  • Is there a way for any player to force a trade with another player against their will? Trades usually require both players to agree on the terms, but you can include mechanisms that allow one player to force a trade on another under certain conditions. For instance, in a set collection game, you might allow a player to force-trade a more-valuable single item of theirs for a less-valuable one from an opponent, once per game.

Auctions

Auction mechanics are a special case of trading, where one player auctions off their stuff to the other players, or where the “bank” creates an item out of thin air and it is auctioned to the highest-bidding player. Auctions often serve as a self-balancing mechanic in that the players are ultimately deciding how much something is worth, so if you don’t know what to charge for something you can put it up for auction and let the players decide. (Relying on this alone is lazy design, though; auctions work best when the actual value is variable, different between players, and situational, so that figuring out how much something is worth changes from game to game and the players are actually making interesting choices each time. With an effect that is always worth the same amount, the “auction” is meaningless once players figure out its value; they’ll just bid what it’s worth and be done with it.)

Auctions are a very pure form of “willingness to pay” because each player has to decide what they’re actually willing to pay so that they can make an appropriate bid. A lot of times there are meta-considerations: not just “how much do I want this for myself” but also “how much do I not want one of my opponents to get it because it would give them too much power” or even “I don’t want this, but I want an opponent to pay more for it, so I’ll bid up the price and take the chance that I won’t get stuck with it at the end.”

An interesting point with auctions: you might expect that if the auction goes to the highest bidder, the item up for auction sells for the highest willingness to pay among all of the bidders – that’s certainly what the person auctioning the item wants, to get the highest price. But in reality, the actual auction price usually lands somewhere between the highest and second-highest willingness to pay, and in fact it’s usually closer to the second-highest, although that depends on the auction type: sometimes you end up selling for much lower.

Just as there are many kinds of trading, there are also many kinds of auctions. Here are a few examples:

  • Open auction. This is the type most people think of when they think of auctions, where any player can call a higher bid at any time, and when no other bids happen someone says “going once, going twice, sold.” If everyone refuses to bid beyond their own maximum willingness to pay, the person with the highest willingness will purchase the item for one unit more than the second-highest willingness, making this auction inefficient (in the sense that you’d ideally want the item to go for the highest price), but as we’ll see that is a problem with most auctions.
  • Fixed price auction. In turn order, each player is offered the option to purchase the item or decline. It goes around until someone accepts, or everyone declines. This gives an advantage to the first player, who gets the option to buy (or not) before anyone else – and if it’s offered at less than the first player’s willingness to pay, they get to keep the extra in their own pocket, so how efficient this auction is depends on how well the fixed price is chosen.
  • Circle auction. In turn order, each player can either make a bid (higher than the previous one) or pass. It goes around once, with the final player deciding whether to bid one unit higher than the current highest bid, or let the other player take it. This gives an advantage to the last player, since it is a fixed-price auction for them and they don’t have to worry about being outbid, so they may be able to offer less than their top willingness to pay.
  • Silent auction. Here, everyone secretly and simultaneously chooses their bid, all reveal at once, and highest bid wins. You need to include some mechanism of resolving ties, since sometimes two or more players will choose the same highest bid. This can often have some intransitive qualities to it, as players are not only trying to figure out their own maximum willingness to pay, but also other players’ willingness. If the item for auction is more valuable for you than the other players, you may bid lower than your maximum willingness to pay because you expect other players’ bids to be lower, so you expect to bid low and still win.
  • Dutch auction. These are rare in the States as they require some kind of special equipment. You have an item that starts at a high asking price, and some kind of timer counts the price down at a fixed rate (say, dropping by $1 per second). The first player to accept at the current price wins. In theory this means that as soon as the price hits the top player’s maximum willingness to pay, they should accept, but there may be some interesting tension if they’re willing to wait (and possibly lose out on the item) in an attempt to get a better price. If players can “read” each others’ faces in real time to try to figure out who is interested and who isn’t, there may be some bluffing involved here.

Even once you decide on an auction format, there are a number of ways to auction items:

  • The most common is that there’s a single item up for auction at a time; the top bidder receives the item, and no one else gets anything.
  • Sometimes an entire set of items are auctioned off at the same time in draft form: top bid simply gets first pick, then second-highest bidder, and so on. The lowest bidder gets the one thing no one else wanted… or sometimes they get nothing at all, if you want to give players some incentive to bid higher. In other words, even if a player doesn’t particularly want any given item, they may be willing to pay a small amount in order to avoid getting stuck with nothing. Conversely, if you want to give players an incentive to save their money by bidding zero, giving the last-place bidder a “free” item is a good way to do that – but of course if multiple players bid zero, you’ll need some way of breaking the tie.
  • If it’s important to have negative feedback on auction wins so that a single player shouldn’t win too many auctions in a row, giving a bonus to everyone who didn’t win (or even just the bottom bidder) for winning the next auction is a way to do that.
  • Some auctions are what are called negative auctions because they work in reverse: instead of something good happening to the highest bidder, something bad happens to the lowest bidder. In this case players are bidding for the right to not have something bad happen to them. This can be combined with other auctions: if the top bidder takes something from the bottom bidder, that gives players an incentive to bid high even if they don’t want anything. The auction game Fist of Dragonstones had a really interesting variant of this, where the top bidder takes something from the second highest bidder, meaning that if you bid for the auction at all you wanted to be sure you won and didn’t come in second place! On the other hand, if only one person bids for it, then everyone else is in second place, and the bidder can choose to take from any of their opponents, so sometimes it can be dangerous to not bid as well.

Even once you decide who gets what, there are several ways to define who pays their bid:

  • The most common for a single-item auction is that the top bidder pays their bid, and all other players spend nothing.
  • If the top two players pay their bid (but the top bidder gets the item and the second-highest bidder gets nothing), making low or medium bids suddenly becomes very dangerous, turning the auction into an intransitive mechanic where you either want to bid higher than anyone else, or low enough that you don’t lose anything. This is most common in silent auctions, where players are never sure of exactly what their opponents are bidding. If you do something like this with an open auction things can get out of hand very quickly, as each of the top two bidders is better off paying the marginal cost of outbidding their opponent than losing their current stake – for example, if you auction off a dollar bill in this way and (say) the top bid is 99 cents and the next highest is 98 cents, the second-highest bidder has an incentive to bid a dollar (which lets them break even rather than lose 98 cents)… which then gives incentive to the 99-cent bidder to paradoxically bid $1.01 (because in such a situation they’re only losing one cent rather than 99 cents), and if both players follow this logic they could be outbidding each other indefinitely!
  • The top bidder can win the auction but only pays an amount equal to the second-highest bid. Game theory tells us, through some math I won’t repeat here, that in a silent auction with these rules the best strategy is to bid your maximum willingness to pay (there’s a sketch of this rule, alongside the pay-your-own-bid rule, after this list).
  • In some cases, particularly when every player gets something and highest bid just chooses first, you may want to have all players pay their bid. If only the top player gets anything, that makes it dangerous to bid if you’re not hoping to win – although in a series of such auctions, a player may choose to either go “all in” on one or two auctions to guarantee winning them, or else they may spread them out and try to take a lot of things for cheap when no one else bids.
  • If only the top and bottom bidders pay, players may again have incentive to bid higher than normal, because they’d be happy to win but even happier if they don’t have to lose anything. You may want to force players to bid a certain minimum, so there is always at least something at stake to lose (otherwise a player could bid zero and pay no penalty)… although if zero bids are possible, that makes low bids appear safer as there’s always the chance someone will protect you from losing your bid by bidding zero themselves.
  • If everyone pays their bid except the lowest bidder, that actually gives players an incentive to bid really high or really low.
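
Here’s the sketch referenced above: a sealed-bid (silent) auction resolved two ways, “first price” (the winner pays their own bid) and “second price” (the winner pays the runner-up’s bid, the rule under which bidding your true maximum willingness to pay is the best strategy). The names and bids are invented.

```python
# Sketch of resolving a silent (sealed-bid) auction under two payment rules.
import random

def resolve(bids, rule="first_price", rng=random):
    """bids: dict of player -> bid. Returns (winner, amount_paid)."""
    high = max(bids.values())
    tied = [p for p, b in bids.items() if b == high]
    winner = rng.choice(tied)                      # break ties randomly
    others = [b for p, b in bids.items() if p != winner]
    second = max(others) if others else 0
    paid = high if rule == "first_price" else second
    return winner, paid

bids = {"Ann": 12, "Ben": 9, "Cal": 9}
print(resolve(bids, "first_price"))    # Ann wins and pays 12 (her own bid)
print(resolve(bids, "second_price"))   # Ann wins but pays only 9 (the next-highest bid)
```

Note how the second-price rule lands the sale right at the second-highest willingness to pay, which is roughly where most auction formats end up in practice anyway.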

Even if you know who has to pay what bid, you have to decide what happens to the money from the auction:

  • Usually it’s just paid “to the bank” – that is, it’s removed from the economy, leading to deflation.
  • It could be paid to some kind of holding block that collects auction money, and is then redistributed to one or more players later in the game when some condition is met.
  • The winning bid may also be paid to one or more other players, making the auction partially or completely zero-sum, in a number of ways. Maybe the bid is divided and evenly split among all other players. The board game Lascaux has an interesting bid mechanic: each player pays a chip to stay in the auction, in turn order, and on their turn a player can choose instead to drop out of the auction and take all of the chips the players have collectively paid so far. The auction continues with the remaining players, thus it is up to the player if it’s a good enough time to drop out (and gain enough auction chips to win more auctions later) or if it’s worth staying in for one more go-round (hoping everyone else will stay in, thus increasing your take when you drop out), or even continuing to stay in with the hopes of winning the auction.

Lastly, no matter what kind of auction there is, you have to decide what happens if no one bids:

  • It could be that one of the players gets the item for free (or at minimal cost). If that player is known to everyone in advance, it gives the other players an incentive to bid just to prevent someone else from getting an item for free; when players know the default outcome is one of their opponents getting a bonus, that’s often enough to open the bidding. If players don’t know who it is (say, a random player gets the item for free) then players may be more likely to not bid, as they have just as good a chance as anyone else.
  • As an alternative, the auction could have additional incentives added, and then repeated. If one resource is being auctioned off and no one wants it, a second resource could be added, and then the set auctioned… and then if no one wants that, add a third resource, and so on until someone finally thinks it’s worth it.
  • Or, what usually happens is the item is thrown out, no one gets it, and the game continues from there as if the auction never happened.

Needless to say, there are a lot of considerations when setting up an in-game economy! Like most things, there are no right or wrong answers here, but hopefully I’ve at least given you a few different options to consider, and the implications of those.

Solving common problems in multiplayer

This next section didn’t seem to fit anywhere else in the course, so I’m mentioning it here; if it seems out of place, that’s why. In multiplayer free-for-all games where there can only be one winner, there are a few problems that come up pretty frequently. Depending on the game they can be considered either balancing mechanisms or imbalances, but they usually aren’t much fun, so you want to be very careful of them.

Turtling

One problem, especially in war games or other games where players attack each other directly, is that if you get in a fight with another player – even if you win – it still weakens both of you relative to everyone else. The wise player reacts to this by doing their best to not get in any fights, instead building up their defenses to make them a less tempting target, and then when all the other players get into fights with each other they swoop in and mop up the pieces when everyone else is in a weakened state. The problem here is that the system is essentially rewarding the players for not interacting with each other, and if the interaction is the fun part of the game then you can hopefully see where this is something that needs fixing.

The game balance problem is that attacking – you know, actually playing the game – is not the optimal strategy. The most direct solution is to reward or incentivize aggression. A simple example is the board game RISK, where attackers and defenders both lose armies, so in isolation you’d want to just not attack; the game goes to great lengths to counter turtling by giving incentives to attack: controlling more territories means you get more armies next turn if you hold onto them, the same goes for continent bonuses, and let’s not forget the cards you can turn in for armies – but you only earn a card on a turn where you successfully take at least one territory.

Another solution is to force the issue by making it essentially impossible to not attack. As an example, Plague and Pestilence and Family Business are both light card games where you draw 1 then play 1. A few cards are defensive in nature, but most cards hurt opponents, and you must play one each turn (choosing a target opponent, even), so before too long you’re going to be forced to attack someone else – it’s simply not possible to avoid making enemies.

Kill the leader and Sandbagging

One common problem in games where players can directly attack each other, especially when it’s very clear who is in the lead, is that everyone by default will gang up on the leader. On the one hand, this can serve as a useful negative feedback loop to your game, making sure no one gets too much ahead. On the other hand, players tend to overshoot (so the leader isn’t just kept in check, they’re totally destroyed), and it ends up feeling like a punishment to be doing well.

As a response to this problem, a new dynamic emerges, which I’ve seen called sandbagging. The idea is that if it’s dangerous to be the leader, then you want to be in second place. If a player is doing well enough that they’re in danger of taking the lead, they will intentionally play suboptimally in order to not make themselves a target. As with turtling, the problem here is that players aren’t really playing the game you designed, they’re working around it.

The good news is that a lot of things have to happen in combination for this to be a problem, and you can break the chain of events anywhere to fix it.

  • Players need a mechanism to join forces and “gang up” on a single player; if you make it difficult or impossible for players to form coalitions or to coordinate strategies, attacking the leader is impossible. In a foot race, players can’t really “attack” each other, so you don’t see any kill-the-leader strategies in marathons. In an FPS multiplayer deathmatch, players can attack each other, but the action is moving so fast that it’s hard for players to work together (or really, to do anything other than shoot at whoever’s nearby).
  • Or, even if players can coordinate, they need to be able to figure out who the leader is. If your game uses hidden scoring or if the end goal can be reached a lot of different ways so that it’s unclear who is closest, players won’t know who to go after. Lots of Eurogames have players keep their Victory Points secret for this reason.
  • Or, even if players can coordinate and they know who to attack, they don’t need to if the game already has built-in opportunities for players to catch up. Some Eurogames have just a few defined times in the game where players score points, with each successive scoring opportunity worth more than the last, so in the middle of a round it’s not always clear who’s in the lead… and players know that even the person who got the most points in the first scoring round has only a minor advantage at best going into the final scoring round.
  • Or, even if players can coordinate and they know who to attack, the game’s systems can make this an unfavorable strategy or it can offer other strategies. For example, in RISK it is certainly arguable that having everyone attack the leader is a good strategy in some ways… but on the other hand, the game also gives you an incentive to attack weaker players, because if you eliminate a player from the game you get their cards, which gives you a big army bonus.
  • Or, since kill-the-leader is a negative feedback loop, the “textbook solution” is to add a compensating positive feedback loop that helps the leader to defend against attacks. If you want a dynamic where the game starts equal but eventually turns into one-against-many, this might be the way to go.

If you choose to remove the negative feedback of kill-the-leader, one thing to be aware of is that if you were relying on this negative feedback to keep the game balanced, it might now be an unbalanced positive feedback loop that naturally helps the leader, so consider adding another form of negative feedback to compensate for the removal of this one.

Kingmaking

A related problem is when one player is too far behind to win, but they are in a position to decide which of two other people wins. Sometimes this happens directly – in a game with trading and negotiation, the player who’s behind might just make favorable trades to one of the leading players in order to hand them the game. Sometimes it happens indirectly, where the player who’s behind has to make one of two moves as part of the game, and it is clear to everyone that if they make one move then one player wins, and another move causes another player to win.

This is undesirable because it’s anticlimactic: the winner didn’t actually win because of superior skill, but because one of the losing players liked them better. Now, in a game with heavy diplomacy (like the board game Diplomacy) this might be tolerable; after all, the game is all about convincing other people to do what you want. But in most games it makes the win feel undeserved, to the winner and everyone else, so the game designer generally wants to avoid this situation.

As with kill-the-leader, there are a lot of things that have to happen for kingmaking to be a problem, and you can eliminate any of them:

  • The players have to know their standing. If no player knows who is winning, who is losing, and what actions will cause one player to win over another, then players have no incentive to help out a specific opponent.
  • The player in last place has to know that they can’t win, and that all they can do is help someone else to win. If every player believes they have a chance to win, there’s no reason to give away the game to someone else.
  • Or, you can reduce or eliminate ways for players to affect each other. If the person in last place has no mechanism to help anyone else, then kingmaking is impossible.

Player elimination

A lot of two-player games are all about eliminating your opponent’s forces, so it makes sense that multi-player games follow this pattern as well. The problem is that when one player is eliminated and everyone else is still playing, that losing player has to sit and wait for the game to end, and sitting around not playing the game is not very fun.

With games of very short length, this is not a problem. If the entire game lasts two minutes and you’re eliminated with 60 seconds left to go, who cares? Sit around and wait for the next game to start. Likewise, if player elimination doesn’t happen until late in the game, this is not usually a problem. If players in a two-hour game start dropping around the 1-hour-50-minute mark, relatively speaking it won’t feel like a long time to wait until the game ends and the next one can begin. It’s when players can be eliminated early and then have to sit around and wait forever that you run into problems.

There are a few mechanics that can deal with this:

  • You can change the nature of your player elimination, perhaps disincentivizing players to eliminate their opponents, so that the only time a player will actually do this is when they feel they’re strong enough to eliminate everyone and win the game. The board game Twilight Imperium makes it exceedingly dangerous to attack your opponents because a war can leave you exposed, thus players tend to not attack until they feel confident that they can come out ahead, which doesn’t necessarily happen until late game.
  • You can also change the victory condition, removing elimination entirely; if the goal is to earn 10 Victory Points, instead of eliminating your opponents, then players can be so busy collecting VP that they aren’t as concerned with eliminating the opposition. The card game Illuminati has a mechanism for players to be eliminated, but the victory condition is to collect enough cards (not to eliminate your opponents), so players are not eliminated all that often.
  • One interesting solution is to force the game to end when the first player is eliminated; thus, instead of the victory being decided as last player standing, victory is the player in best standing (by some criteria) when the first player drops out. If players can help each other, this creates some tense alliances as one player nears elimination; the player in the lead wants that player eliminated, while everyone else actually wants to help that losing player stay in the game! The card game Hearts works this way, for example. The video game Gauntlet IV (for Sega Genesis) also did something like this in its multiplayer battle mode, where as soon as one player was eliminated a 60-second countdown timer started, and the round would end even if several players were still alive.
  • You can also give the eliminated players something to do in the game after they’re gone. Perhaps there are some NPCs in the game that are normally moved according to certain rules or algorithms, but you can give control of those to the eliminated players (my game group added this as a house rule in the board game Wiz-War, where an eliminated player would take control of all monsters on the board). Cosmic Encounter included rules for a seventh player beyond the six that the game normally supports, by adding “kibitzing” mechanics where the seventh player can wander around, look at people’s hands, and give them information… and they have a secret goal to try to get a specific other player to win, so while they are giving away information they also may be lying. In Mafia/Werewolf and other variants, eliminated players can watch the drama unfold, so even though they can’t interact the game is fun to observe, so most players don’t mind taking on the “spectator” role.

Excel

Every game designer really needs to learn Excel at some point. Some of you probably already use it regularly, but if you don’t, you should learn your way around it, so consider this a brief introduction to how Excel works and how to use it in game design, with a few tricks from my own experience thrown in. For those of you who are already Excel experts, I beg your patience, and hope I can show you at least one or two little features that you didn’t know before. Note: I’m assuming Excel 2003 for PC; the exact key combinations I list below may vary for you if you’re using a different version or platform.

Excel is a spreadsheet program, which means absolutely nothing to you if you aren’t a financial analyst, so an easier way of thinking about Excel is that it’s a program that lets you store data in a list or a grid. At its most basic, you can use it to keep things like a grocery list or to-do list in a column, and if you want to include a separate column to keep track of whether it’s done or not, then that’s a perfectly valid use (I’ve worked with plenty of spreadsheets that are nothing more than that, for example a list of art or sound assets in a video game and a list of their current status). Data in Excel is stored in a series of rows and columns, where each row has a number, each column has a letter, and a single entry is in a given row and column. Any single entry location is called a cell (as in “cell phone” or “terrorist cell”), and is referred to by its column letter and row number (like “A1” or “B19”). You can navigate between cells with arrow keys or by clicking with the mouse.

Entering data into cells

In general, each cell can hold one of three things: a number, written text, or a computed formula. Numbers are pretty simple, just type in a number in the formula bar at the top and then hit Enter, or click the little green checkmark if you prefer. Text is also simple, just type the text you want in the same way. What if you want to include text that looks like a number or formula, but you want Excel to treat it as text? Start the entry with a single apostrophe (‘) and then you can type anything you want, and Excel will get the message that you want it to treat that cell as text.

For a formula, start the entry with an equal sign (=) and follow with whatever you want computed. Most of the time you just want simple arithmetic, which you can do with the +, -, * and / characters. For example, typing =1+2 and hitting Enter will display 3 in the cell. You can also reference other cells: =A1*2 will take the contents of cell A1, multiply by 2, and display the result in whatever cell you typed the formula into. And the really awesome part about this is that if you change the value in A1, any formulas that reference it will change automatically, which is the main thing Excel does that saves you so much time. In fact, even if you insert new rows that change the actual name of the cell you’re referencing, Excel will change your formulas to update what they’re referencing.

Adding comments

Suppose you want to leave a note to yourself about something in one of the cells. One way to do this is just to put the note as text in a neighboring cell, although as you’ll see, getting a lot of text to display in one of those tiny cells isn’t trivial, and there are times when that’s not practical. For those cases you can instead use the Comment feature (right-click the cell and choose Insert Comment), which attaches a note to the cell and shows up as a little red triangle in the corner of the cell. Mousing over the cell reveals the comment.

Moving data around

Cut, copy and paste work pretty much as you’d expect them to. You can even click and drag, or hold Shift while moving around with the arrow keys, to select a rectangular block of cells… or click on one of the row or column headings to select everything in that row or column… or click on the corner between the row and column headings (or hit Ctrl-A) to select everything. By holding Ctrl down and clicking on individual cells, you can select several cells that aren’t even next to each other.

Now, if you paste a cell containing a formula a whole bunch of times, a funny thing happens: you’ll notice that any cells that are referenced in the formula keep changing. For example, if you’ve got a formula in cell B1 that references A1, and you copy B1 and paste into D5, you’ll notice the new formula references C5 instead. That’s because by default, all of these cell references are relative in position to the original. So when you reference A1 in your formula in B1, Excel isn’t actually thinking “the cell named A1”… it’s thinking “the cell just to the left of me in the same row.” So when you copy and paste the formula somewhere else, it’ll start referencing the cell just to the left in the same row. As you might guess in this example, if you paste this into a cell in column A (where there’s nothing to the left because you’re already all the way on the left), you’ll see an error: #REF!, which means you’re referencing a cell that doesn’t exist.

If you want to force Excel to treat a reference as absolute, so that it references a specific cell no matter where you copy or paste to, there are two ways to do it. First is to use a dollar sign ($) before the column letter or row number or both in the formula, which tells Excel to treat either the row or column (or both) as an absolute position. For our earlier example, if you wanted every copy-and-paste to look at A1, you could use $A$1 instead.

Why do you need to type the dollar sign twice in this example? Because you can treat the row as a relative reference while the column is absolute, or vice versa: $A1 locks the column but lets the row shift when you copy the formula, A$1 locks the row but lets the column shift, and $A$1 locks both. There are times when you might want to do this which I’m sure you will discover as you use Excel, if you haven’t already.

There’s another way to reference a specific, named cell in an absolute way, which is mostly useful if you’re using several cells in a bunch of complicated formulas and it’s hard to keep straight in your head which cell is which value when you’re writing the formulas. You can give any cell a name; by default the name is just the letter and number of the cell, and it’s displayed in the Name Box in the top left part of the Excel window. To change it, just click on that name and then type in whatever you want. Then you can reference that name anywhere else in the spreadsheet and it’ll be an absolute reference to that named cell.

Sorting

Sometimes you’ll want to sort the data in a worksheet. A common use of Excel is to keep track of a bunch of objects, one per row, and each attribute is listed in a separate column. For example, maybe on a large game project you’ll have an Excel file that lists all of the enemies in a game, with the name in one column, hit points in another, damage in another, and so on. And maybe you want to sort by name just so you have an easy-to-lookup master list, or maybe you want to sort by hit points to see the largest or smallest values, or whatever. This is pretty easy to do. First, select all the cells you want to sort. Go to the Data menu, and choose Sort. Next, tell it which column to sort by, and whether to sort ascending or descending. If you’ve got two entries in that column that are the same, you can give it a second column as a tiebreaker, and a third column as a second tiebreaker if you want (otherwise it’ll just preserve the existing order when it sorts like that). There’s also an option for ignoring the header row, so if you have a header with column descriptions at the very top and you don’t want that sorted… well, you can just not select it when sorting, of course, but sometimes it’s easier to just select the whole spreadsheet and click the button to ignore the header row. If you accidentally screw up when sorting, don’t panic – just hit Undo.

Sometimes you realize you need to insert a few rows or columns somewhere, in between others. The nice thing about this is that Excel updates all absolute and relative references just the way you’d want it to, so you should never have to change a value or formula or anything just because you inserted a row. To insert, right-click on the row or column heading, and “insert row” or “insert column” is one of the menu choices. You can also insert them from the Insert menu.

If you need to remove a row or column it works similarly, right-click on the row or column heading and select “delete.” You might think you could just hit the Delete key on the keyboard too, but that works differently: it just clears the values of the cells but doesn’t actually shift everything else up or left.

Using your data

Sometimes you’ve got a formula you want copied and pasted into a lot of cells all at once. My checkbook, for example, has formulas on each row to compute my current balance after adding or subtracting the current transaction from the previous one, and I want that computation on every line. All I had to do was write the formula once… but if I had to manually copy then paste into each individual cell, I’d cry. Luckily, there’s an easy way to do this: Fill. Just select the one cell you want to propagate, and a whole bunch of other cells below or to the right of it, then hit Ctrl+D to take the top value and propagate it down to all the others (Fill Down), or Ctrl+R to propagate the leftmost value to the right. If you want to fill down and right, you can just select your cell and a rectangular block of cells below and to the right of it, then hit Ctrl+D and Ctrl+R in any order. You can also Fill Up or Fill Left if you want, but those don’t have hotkeys; you’ll have to select those from the Edit menu under Fill. As with copying and pasting, Fill respects absolute and relative references to other cells in your formulas.
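To make that checkbook example concrete (this is just a made-up layout, not my actual file): suppose column B holds deposits, column C holds withdrawals, and column D holds the running balance. Then cell D3 might contain the formula =D2+B3-C3, and that single formula is the one that gets Filled down the rest of column D; since the references are relative, each copy automatically looks at its own row and the row above it.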

There’s a related command to Fill, which is useful in a situation like the following example: suppose you’re making a list of game objects and you want to assign each one a unique ID number, and that number is going to start at 1 and then count upwards from there, and let’s say you have 200 game objects. So in one column, you want to place the numbers 1 through 200, each number in its own cell. Entering each number manually is tedious and error-prone. You could use a formula: say, put the number 1 in the first cell (let’s say it’s cell A2), then in A3 put the formula =A2+1 (which computes to 2), then Fill Down that formula to create the numbers all the way down to 200. That will work at first, but as soon as you start reordering or sorting rows, all of these cells referencing each other can get out of whack, and whether it happens to keep working or not, it’ll be a big mess. And besides, you don’t really want a formula in those cells, you want a number.

You could create the 200 numbers by formula on a scratch area somewhere, then copy, then Paste Special (under the Edit menu), and select Values, which just takes the computed values and pastes them in as numbers without copying the formulas. And then you just delete the formulas that you don’t need anymore. That would work, and Paste Special / Values is an awesome tool for a lot of things, but it’s overkill here.

Here’s a neat little trick: take two or three adjacent cells in a column and put the numbers 1, 2, 3 in them. Now, select those cells, and you’ll notice there’s a little black square dot (the fill handle) in the lower right corner of the selection. Click on that, and drag down a couple hundred rows. When you release the mouse button, Excel takes its best guess at what you were doing, and fills it all in. For something simple like counting up by 1, Excel can figure that out, and it’ll do it for you. For something more complicated you probably won’t get the result you’re looking for, but you can at least have fun trying and seeing what Excel thinks you’re thinking.

Functions

Excel comes with a lot of built-in functions that you can use in your formulas. Functions are always written in capital letters, followed by an open-parenthesis, then any parameters the function might take in (this varies by function), then a close-parenthesis. If there are several parameters, they are separated by commas (,). You can embed functions inside other ones, so one of the parameters of a function might actually be the result of another function; Excel is perfectly okay with that.

Probably the single function I use more than any other is SUM(), which takes any number of parameters and adds them together. So if you wanted to sum all of the cells from A5 to A8, you could say =A5+A6+A7+A8, or you could say =SUM(A5,A6,A7,A8), or you could say =SUM(A5:A8). The last one is the most useful; use a colon between two cells to tell Excel that you want the range of all cells in between those. You can even do this with a rectangular block of cells by giving the top-left and bottom-right corners: =SUM(A5:C8) will add up all twelve cells in that 3×4 block.

The second most useful function for me is IF, which takes in three parameters. The first is a condition that’s evaluated to either a true or false value. The second parameter is evaluated and returned if the condition is true. The third parameter is evaluated and returned if the condition is false. The third parameter is optional; if you leave it out and the condition is false, Excel displays FALSE in the cell (if you’d rather it appear blank, pass an empty pair of quotes, "", as the third parameter). For example, you could say: =IF(A1>0,1,5) which means that if A1 is greater than zero, this cell’s value is 1, otherwise it’s 5. One of the common things I use with IF is the function ISBLANK() which takes a cell, and returns true if the cell is blank, or false if it isn’t. So you can use this, for example, if you’re using one column as a checklist and you want to set a column to a certain value if something hasn’t been checked. If you’re making a checklist and want to know how many items have (or haven’t) been checked off, by the way, there’s also the function COUNTBLANK() which takes a range of cells as its one parameter, and returns the number of cells that are blank.
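As a made-up illustration of how these fit together: if column B is the “done” column of a checklist, then a formula like =IF(ISBLANK(B2),"TODO","") in a neighboring cell flags each unfinished row, and =COUNTBLANK(B2:B100) tells you how many rows in that range are still unfinished.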

For random mechanics, look back in the week where we talked about pseudorandomness to see my favorite three functions for that: RAND() which takes no parameters at all and returns a pseudorandom number from 0 to 1 (it might possibly be zero, but never one). Changing any cell or pressing F9 causes Excel to reroll all randoms. FLOOR() and CEILING() will take a number and round it down or up to the nearest whole number value, or you can use ROUND() which will round it normally.

FLOOR() and CEILING() both require a second parameter, the multiple to round to; for most cases you want this to be 1, since you want it rounding to the nearest whole number, but if you want it to round up or down to the nearest 5, or the nearest 0.1, or whatever, then use that as your second parameter instead. Just to be confusing, ROUND() also takes a second parameter, but it works a little differently. For ROUND(), if the second parameter is zero (which you normally want) then it will round to the nearest whole number. If the second parameter is 1, it rounds to the nearest tenth; if the second parameter is 2, it rounds to the nearest hundredth; if the second parameter is 3, it rounds to the nearest thousandth; and so on – in other words, the second parameter for ROUND() is the number of digits after the decimal point to keep.
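Putting RAND() and FLOOR() together, one way (certainly not the only way) to get a quick die roll in a cell is =FLOOR(RAND()*6,1)+1, which gives a whole number from 1 to 6 and rerolls whenever the sheet recalculates.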

RANK() and VLOOKUP(), I already mentioned back in Week 6; they’re useful for when you need to take a list and shuffle it randomly.

Multiple worksheets

By default, a new Excel file has three worksheet tabs, shown at the bottom left. You can rename these to something more interesting than “Sheet1” by just double-clicking the tab, typing a name, and hitting Enter. You can also reorder them by clicking and dragging, if you want a different sheet to be on the left or in the middle. You can add new worksheets or delete them by right-clicking on the worksheet tab, or from the Insert menu. Ctrl+PgUp and Ctrl+PgDn provide a convenient way to switch between tabs without clicking down there all the time, if you find yourself going back and forth between two tabs a lot.

The reason to create multiple worksheets is mostly for organizational purposes; it’s easier sometimes if you’ve got a bunch of different but related systems to put each one in its own worksheet rather than having to scroll around all over the place to find what you’re looking for on a single worksheet.

You can actually reference cells in other worksheets in a formula, if you want. The easiest way to do it is to type in the formula until you get to the place where you’d type in the cell name, then use your mouse to click on the other worksheet, then actually click on the cell or cells you want to reference. One thing to point out here is that it’s easy to do this by accident: you’re entering something in a cell and don’t realize you haven’t finished, and then you click on another cell to see what’s there, and instead it starts adding that cell to your formula. If that happens, or you otherwise feel lost halfway through entering a formula, just hit the red X button to the left of the formula bar (or press Esc) and it’ll throw away any typing you did just now.

Graphing

One last thing that I find really useful is the ability to create a graph, which is great for visualizing how your game objects’ numbers relate to each other. Select two or more rows or columns (each one becomes a separate curve), then go to the Insert menu, then Chart. Select XY(Scatter) and then the subtype that shows curvy lines. From there, go through the wizard to select whatever options you want, then click Finish and you’ll have your chart.

One thing you’ll often want to do with graphs is to add a trendline; right-click on any single data point on the graph, and select Add Trendline. You’ll have to tell it whether the trendline should be linear, or exponential, or polynomial, or what. On the Options tab of the trendline wizard, you can also have it display the equation on the chart so you can actually see what the best-fit curve is, and also display the R-squared value (which is just a measure of how close the fitted curve is to the actual data; R-squared of 1 means the curve is a perfect fit, R-squared of 0 means it may as well be random… although in practice, even random data will have an R-squared value of more than zero, sometimes significantly more). If you’re trying to fit a curve, as happens a lot when analyzing metrics, you’ll probably want to add these right away.

Another thing you should know is that by default, the charts Excel makes are… umm… really ugly. Just about everything you can imagine to make the display better, you can do: adding vertical and not just horizontal lines on the graph, changing the background and foreground colors of everything, adding labels on the X and Y axes, changing the ranges of the axes and labeling them… it’s all there somewhere. Every element of the graph is clickable and selectable on its own, and generally if you want to change something, just right-click on it and select Format, or else double-click it. Just be aware that each individual element – the gridlines, the graphed lines, the legend, the background, the axes – is treated separately, so if you don’t see a display option it probably just means you have the wrong thing selected. Play around with the formatting options and you’ll see what I mean.

Making things look pretty

Lastly, there are a few things you can do to make your worksheets look a little bit nicer, even without the graphs, even if it’s just cells. Aside from making things look more professional, it also makes it look more like you know what you’re doing 🙂

The most obvious thing you can do is mess with the color scheme. You can change the text color and background color of any cell; the buttons are on a toolbar in the upper right (at least they are on my machine; maybe they aren’t for you, if not just add the Formatting toolbar and it’s all on there). You can also make a cell Bolded or Italicized, left or right justified, all the other things you’re used to doing in Word. Personally, I find it useful to use background color to differentiate between cells that are just text headings (no color), cells where the user is supposed to change values around to see what effect they have on the rest of the game (yellow), and cells that are computed values or formulas that should not be changed (gray), and then Bolding anything really important.

You also have a huge range of possible ways to display numbers and text. If you select a single cell, a block of cells, an entire row or column, or even the whole worksheet, then right-click (or go to the Format menu) and select Format Cells, you’ll have a ton of options at your disposal. The very first tab lets you say if this is text or a number, and what kind. For example, you can display a number as currency (with or without a currency symbol like a dollar sign or something else), or a decimal (to any number of places).

On the Alignment tab are three important features:

  • Orientation lets you display the text at an angle, even sideways, which can make your column headings readable if you want the columns themselves to be narrow. (Speaking of which – you can adjust the widths of columns and heights of rows just by clicking and dragging between two rows or columns, or right-clicking and selecting Column Width or Row Height).
  • Word Wrap does exactly what you think it does, so that the text is actually readable. Excel will gleefully expand the row height to fit all of the text, so if the column is narrow and you’ve got a paragraph in there, it’ll probably be part of a word per line and the whole mess will be unreadable, so you’ll want to adjust column width before doing that.
  • Then there’s a curious little option called Merge Cells, which lets you convert Excel’s pure grid form into something else. To use it, select multiple cells, then Format Cells and then click the Merge Cells option and click OK. You’ll see that all the cells you selected are now a single giant uber-cell. I usually use this for cosmetic reasons, like if you’ve got a list of game objects and each column is some attribute, and you’ve got a ton of attributes but you want to group them together… say, you have some offensive attributes and some defensive ones, or whatever. You could create a second header row above the individual column headers, merge the cells over each group, and have a single cell that says (for example) “defensive attributes”. Excel actually has a way to do this automatically in certain circumstances, called Pivot Tables, but that’s a pretty advanced thing that I’ll leave to you to learn on your own through Google if you reach a point where you need it.

One thing you’ll find sometimes is that you have a set of computed cells, and while you need them to be around because you’re referencing them, you don’t actually need to look at the cells themselves – they’re all just intermediate values. One way to take care of this is to stick them in their own scratch worksheet, but an easier way is to stick them in their own row or column and then Hide the row or column. To do that, just right-click on the row or column, and there’s a Hide option. Select that and the row or column will disappear, but you’ll see a thick line between the previous and next columns, a little visual signal to you that something else is still in there that you just can’t see. To display it again, select the rows or columns on either side, right-click and select Unhide.

If you want to draw a square around certain blocks of data in order to group them together visually, another button on the Formatting toolbar lets you select a border. Just select a rectangle of cells, then click on that and select the border that looks like a square, and it’ll put edges around it. The only thing I’ll warn you is that when you copy and paste cells with borders, those are copied and pasted too under normal conditions, so don’t add borders until you’re done messing around (or if you have to, just remove the borders, move the cells around, then add them back in… or use Paste Special to only paste formulas and not formatting).

Another thing that I sometimes find useful, particularly when using a spreadsheet to balance game objects, is conditional formatting. First select one or more cells, rows or columns, then go to the Format menu and select Conditional Formatting. You first give it a condition which is either true or false. If it’s true, you can give it a format: using a different font or text color, font effects like bold or italic, adding borders or background colors, that sort of thing. If the condition isn’t true, then the formatting isn’t changed. When in the conditional formatting dialog, there’s an “Add” button at the bottom where you can add up to two other conditions, each with its own individual formatting. These are not cumulative; the first condition is always evaluated first (and its formatting is used if the condition is satisfied). If not, then it’ll try the second condition, and if not that it’ll try the third condition. As an example of how I use this, if I’m making a game with a cost curve, I might have a single column that adds up the numeric benefits minus costs of each game object. Since I want the benefits and costs to be equivalent, this should be zero for a balanced object (according to my cost curve), positive if it’s overpowered and negative if it’s underpowered. In that column, I might use conditional formatting to turn the background of a cell a bright green color if benefits minus costs is greater than zero, or red if it’s less than zero, so I can immediately get a visual status of how many objects are still not balanced right.
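As a made-up example of what that balance column might look like: if column B holds each object’s cost and columns C through F hold its numeric benefits (already converted into cost-curve terms), the balance column could contain =SUM(C2:F2)-B2, with one condition of “Cell Value Is greater than 0” formatted with a green background and a second condition of “Cell Value Is less than 0” formatted with red.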

Lastly, in a lot of basic spreadsheets you just want to display a single thing, and you’ve got a header row along the top and a header column on the left, and the entire rest of the worksheet is data. Sometimes that data doesn’t fit on a single page, but as you scroll down you forget which column is which. Suppose you want the top row or two to stay put, always displayed no matter how far you scroll down, so you can always see the headings. To do that, select the first row below your header row, then go to the Window menu and select Freeze Panes. You’ll see a little line appear just about where you’d selected, and you’ll see it stay even if you scroll down. To undo this, for example if you selected the wrong row by accident, go to the Window menu and select Unfreeze Panes (and then try again). If you want instead to keep the left columns in place, select the column just to the right, then Freeze Panes again. If you want to keep the leftmost columns and topmost rows in place, select a single cell and Freeze Panes, and everything above or to the left of that cell is locked in place now.

About that whole “Fun” thing…

At this point we’ve covered just about every topic I can think of that relates to game balance, so I want to take some time to reflect on where balance fits in to the larger field of game design. Admittedly, the ultimate goal of game design depends on the specific game, but in what I’d say is the majority of cases, the goal of the game designer is to create a fun experience for the players. How does balance fit in with this?

When I was a younger designer, I wanted to believe the two were synonymous. A fun game is a balanced game, and a balanced game is a fun game. I’m not too proud to say that I was very wrong about this. I encountered two games in particular that were fun in spite of being unbalanced, and these counterexamples changed my mind.

The first was a card game that I learned as Landlord (although it has many other names, some more vulgar than others); the best known is probably a variant called The Great Dalmuti. This is a deliberately unbalanced game. Each player sits in a different position, with the positions forming a definite progression from best to worst. Players in the best position give their worst cards to those in the worst position at the start of the round, and the worst-position players give their best cards to those in the best position, so the odds are strongly in favor of the people at the top. At the end of each hand, players reorder themselves based on how they did in the round, so the top player takes top seat next round. This is a natural positive feedback loop: the people at the top have so many advantages that they’re likely to stay there, while the people at the bottom have so many disadvantages that they’re likely to stay there as well. As I learned it, the game never ends, you just keep playing hand after hand until you’re tired of it. In college my friends and I would sometimes play this for hours at a time, so we were clearly having a good time in spite of the game being obviously unbalanced. What’s going on here?

I think there are two reasons here. One is that as soon as you learn the rules, it is immediately obvious to you that the game is not fair, and in fact that the unfairness is the whole point. It’s not that fairness and balance are always desirable, it’s that when players expect a fair and balanced game and then get one that isn’t, the game doesn’t meet their expectations. Since this game sets the expectation of unfairness up front, by choosing to play at all you have already decided that you are willing to explore an unbalanced system.

Another reason why this game doesn’t fail is that it has a strong roleplaying dynamic, which sounds strange because this isn’t an RPG… but at the same time, players in different seats do have different levels of power, so some aspect of roleplaying happens naturally in most groups. The players at the top are having fun because “it’s good to be king.” The players at the bottom are also having fun because there’s a thrill of fighting against the odds, striking a blow for the Little Guy in an unfair system, and every now and then one of the players at the bottom ends up doing really well and suddenly toppling the throne, and that’s exciting (or one of the guys on top crashes and falls to the bottom, offering schadenfreude for the rest of the players). For me it’s equally exciting to dig my way out from the bottom, slowly and patiently, over many hands, and eventually reaching the top (sort of like a metaphor for hard work and retirement, I guess). Since the game replicates a system that we recognize in everyday life where we see the “haves” and “have-nots,” being able to play in and explore this system from the magic circle of a game has a strong appeal.

The second game I played that convinced me that there’s more to life than balance is the (grammatically-incorrectly titled) board game Betrayal at House on the Hill. This game is highly unbalanced. Each time you play you get a random scenario which has a different set of victory conditions, but most of them strongly favor some players over others, and most don’t scale very well with the number of players (that is, most scenarios are much easier or much harder to win depending on whether there are 3 players or 6, so the game is often decided as a function of how many players there are and which random scenario you get). The game has a strong random element that makes it likely one or more players will have a very strong advantage or disadvantage, and in most scenarios it’s even possible to have early player elimination. Not that it has to do with balance, but the first edition of this game also has a ton of printing errors, making it seem like it wasn’t playtested nearly enough. (In fact, I understand that it was playtested extensively, but the playtesters were having such a fun time playing that they didn’t bother to notice or report the errors they encountered.)

In spite of the imbalances, the randomness and the printing errors, the game itself is pretty fun if you play in the right group. The reason is that no matter what happens, in nearly every game, some kind of crazy thing happened that’s fun to talk about after the fact. The game is very good at creating a story of the experience, and the stories are interesting. Partly this has to do with all of the flavor text in the game, on the cards and in the scenarios… but that just sets the haunted-house environment to put the players in the right frame of mind. Mostly it’s that because of the random nature of the game, somewhere along the line you’ll probably see something that feels highly unlikely, like one player finding a whole bunch of useful items all at once, or a player rolling uncharacteristically well or poorly on dice at a key point, or drawing just the right card you need at just the right time, or a player finding out the hard way what that new mysterious token on the board does. And so, players are willing to overlook the flaws because the core gameplay is about working together as a team to explore an unfamiliar and dangerous place, then having one of your kind betray the others and shifting to a one-against-many situation, and winning or losing as a coordinated team. And as a general game structure, that turns out to be unique enough to be interesting.

Now, player expectation is another thing that is a huge factor in Betrayal. I’ve seen some players that didn’t know anything about the game and were just told, “oh, this is a fun game” and they couldn’t get over the fact that there were so many problems with it. When I introduce people to the game, I always say up front that the game is not remotely balanced, because it helps people to enjoy the experience more. And incidentally, I do think it would be a better game if it were more balanced, but my point is that it is possible for a game design to succeed without it.

So, at the end of the day, I think that what game balance does is that it makes your game fair. In games where players expect a fair contest, balance is very important; for example, one of the reasons a lot of players hate the “rubber-banding” negative feedback in racing games, where an AI-controlled car suddenly gets an impossible burst of speed when it’s too far behind you, is that it feels unfair because real-life racing doesn’t work that way. But in a game like The Great Dalmuti which is patently unfair, players expect it to be unbalanced so they accept it easily. This is also why completely unbalanced, overpowered cards in a Trading Card Game (especially if they’re rare) are seen as a bad thing, but in a single-player card-battle game using the same mechanics they can be a lot of fun: for a head-to-head tabletop card game players expect the game to provide a fair match, so they want the cards to be balanced; in the case of the single-player game, the core of the game is about character growth and progression, so getting more powerful cards as the game progresses is part of the expectation.

Just like everything in game design, it’s all about understanding the design goals, what it is you want the player to experience. But if you want them to experience a fair game, which is at least true in most games, then that is the function of balance. In fact, the only games I can think of where you don’t want balance are those where the core gameplay is specifically built around playing with the concept of fairness and unfairness.

If You’re Working on a Game Now…

Well, you’re probably already using Excel in that case, so there’s not much I can have you do to exercise those skills that you’re not already doing.

If your game has a free-for-all multiplayer structure, ask yourself if any of the problems mentioned in this post (turtling, kill-the-leader, sandbagging, kingmaking, early elimination) might be present, and then decide what (if anything) to do about them.

If your game has an economic system, analyze it. Can players trade? Are there auctions? What effect would it have on the game if you added or removed these? Are there alternatives to the way your economic system works now that you hadn’t considered?

Homework

Since it’s the end of the course, there are two ways to approach this. One is to say, no “homework” at all, because hey, the course is over! But that would be lazy design on my part.

Instead, let me set you a longer challenge that brings together everything we’ve talked about here. Make a game, and then apply all the lessons of game balance that you can. Spend a month on it, maybe more if it ends up being interesting, and then you’ll have something you can add to your game design portfolio. It’s up to you whether you want to do this, of course.

Some suggestions:

  • Design a set of cards for a trading-card game. I usually tell students to stay away from projects like this, because TCGs have a huge amount of content, so let’s keep it limited here:
    • Design an expansion set to an existing TCG. First, use your knowledge of the existing game to derive a cost curve. Then, put that into Excel, and use it to create and balance a new set. Create one or two new mechanics that you need to figure out a cost for on the curve, and playtest on your own to figure out how much the new mechanic is actually worth. Make a small set, maybe 50 to 80 cards.
  • Or, if you’ve got a bit more time and want to do a little bit of systems design work as well as balance:
    • Make the game self-contained, and 100 cards or less. Consider games like Dominion or Roma or Ascension which behave like TCGs but require no collecting, and have players build their deck or hand during play instead.
    • As soon as you’re done with the core mechanics, make a cost curve for the game. Put the curve into an Excel spreadsheet and use it to create and balance the individual cards.
    • Playtest the game with friends and challenge them to “break” the game by finding exploits and optimal strategies. Adjust your cost curve (and cards) accordingly, and repeat the process.
  • Or, find a turn-based or real-time strategy game on computer that includes some kind of mod tools. Work on the balance:
    • First, play the game a bit and use your intuition to analyze the balance. Are certain units or strategies or objects too good or too weak? Look around for online message boards to see if other players feel the same, or if you were just using different strategies. Once you’ve identified one or more imbalances, analyze the game mathematically using every tool at your disposal, to figure out exactly what numbers need to change, and by how much. Mod the game, and playtest to see if the problem is fixed.
    • For a more intense project, use the mod tools to wipe out an entire part of the gameplay, and start over designing a new one from scratch. For example, maybe you can design a brand-new set of technology upgrades for Civilization, or a new set of units for Starcraft. Use the existing art if you want, but change the nature of the gameplay. Then, work on balancing your new system.
  • If you like RTS games but prefer something on tabletop, instead design a miniatures game. Most miniatures games are expensive (you have to buy and paint a lot of miniatures, after all) so challenge yourself to keep it cheap. Use piles of cardboard squares that you assemble and cut out on your own. With such cheap components, you could even add economic and production elements of the RTS genre if you’d like.
    • First, look at the mechanics of some existing miniatures games, which tend to be fairly complicated. Where can you simplify the combat mechanics, just to keep your workload manageable? Try to reduce the game down to a simple set of movement and attack mechanics with perhaps a small handful of special abilities. (For a smaller challenge you can, of course, just create an expansion set to an existing miniatures game that you already play.)
    • As with other projects, create a cost curve for all attributes and abilities of the playing pieces, and use Excel to create and balance a set of unit types. Keep this small, maybe 5 to 10 different units; you’ll find it difficult enough to balance them even with just that few. Consider adding some intransitive relationships between the units, to make sure that no single strategy is strictly better than another. If you end up really liking the game, you can make another set of units of the same size for a new “faction” and try to balance the second set with the first set.
    • Print out a set of cheap components, and playtest and iterate on your design.
  • Or, if you prefer tabletop RPGs, analyze the combat (or conflict resolution) system of your favorite game to find imbalances, and propose rules changes to fix it. For a longer project, design your own original combat system, either for an existing RPG as a replacement, or for an original RPG set in an original game world. As a challenge to yourself and to keep the scope of this under control, set a page limit: a maximum of ten pages of rules descriptions, and it should all fit on a one-page summary. Playtest the system with your regular tabletop RPG group if you have one (if you don’t have one, you might consider selecting a different project instead).

References

I found the following two blog posts useful to reference when writing about auctions and multiplayer mechanics, respectively:

http://jergames.blogspot.com/2006/10/learn-to-love-board-games-again100.html#auctions

and

http://pulsiphergamedesign.blogspot.com/2007/11/design-problems-to-watch-for-in-multi.html

In Closing…

I’d just like to say that putting this information together over this summer has been an amazing experience for me, and I hope you have enjoyed going on this journey with me.

You might be wondering if I’m going to do something like this again in the future. You can bet the answer to that will be yes, although of course I don’t know exactly what that will be. Expect to see an announcement here when I’m setting up the courses for Summer 2011, if you want to take part.

Enjoy,

– Ian Schreiber

Level 9: Intransitive Mechanics

September 1, 2010

Readings/Playings

See “additional resources” at the end of this blog post for further reading.

This Week

Welcome back! Today we’re going to learn about how to balance intransitive mechanics. As a reminder, “intransitive” is just a geeky way of saying “games like Rock-Paper-Scissors” – that is, games where there is no single dominant strategy, because everything can be beaten by something else.

We see intransitive mechanics in games all the time. In fighting games, a typical pattern is that normal attacks are defeated by blocks, blocks are defeated by throws, and throws are defeated by attacks. In real-time strategy games, a typical pattern is that you have fliers that can destroy infantry, infantry that works well against archers, and archers are great at bringing down fliers. Turn-based strategy games often have some units that work well against others, an example pattern being that heavy tanks lose to anti-tank infantry which loses to normal infantry which lose to heavy tanks. First-person shooters sometimes have an intransitive relationship between different weapons or vehicles, like rocket launchers being good against tanks (since they’re slow and easy to hit) which are good against light vehicles (which are destroyed by the tank’s fast rate of fire once they get in range) which in turn are good against rocket launchers (since they can dodge and weave around the slow incoming rockets). MMOs and tabletop RPGs often have some character classes that are particularly good at fighting against other classes, as well. So you can see that intransitive mechanics are in all kinds of places.

Some of these relationships might not be immediately obvious. For example, consider a game where one kind of unit has long-range attacks, which is defeated by a short-range attacker who can turn invisible; this in turn is defeated by a medium-range attacker with radar that reveals invisible units; and the medium-range attacker is of course weak against the long-range attacker.

Sometimes it’s purely mathematical; in Magic: the Gathering, a 1/3 creature will lose in combat to a 3/2 creature, which loses to a 2/1 First Strike creature, which in turn loses to the original 1/3 creature. Within the metagame of a CCG you often have three or four dominant decks, each one designed to beat one or more of the other ones. These kinds of things aren’t even necessarily designed with the intention of being intransitive, but that is what ends up happening.

Solutions to intransitive mechanics

Today we’re going to get our hands pretty dirty with some of the mathiest math we’ve done so far, borrowing when needed from the tools of algebra, linear algebra, and game theory. In the process we’ll learn how to solve intransitive mechanics, so that we can learn more about how these work within our game and what we can expect from player behavior at the expert level.

What does a “solution” look like here? It can’t be a cost curve, because each choice wins sometimes and loses sometimes. Instead it’s a ratio of how often you choose each available option, and how often you expect your opponent to choose each of their options. For example, building an army of 30% archers, 50% infantry, 20% fliers (or 3:5:2) might be a solution to an intransitive game featuring those units, under certain conditions.

As a game designer, you might desire certain game objects to be used more or less frequently than others, and by changing the relative costs and availability of each object you can change the optimal mix of objects that players will use in play. By designing your game specifically to have one or more optimal strategies of your choosing, you will know ahead of time how the game is likely to develop. For example, you might want certain things to only happen rarely during normal play but be spectacular when they do, and if you understand how your costs affect relative frequencies, you can design a game to be like that intentionally. (Or, if it seems like in playtesting, your players are using one thing a lot more than another, this kind of analysis may be able to shed light on why that is.)

Who Cares?

It may be worth asking, if all intransitive mechanics are just glorified versions of Rock-Paper-Scissors, what’s the appeal? Few people play Rock-Paper-Scissors for fun, so why should they enjoy a game that just uses the same mechanics and dresses them differently?

For one thing, an intransitive game is at least more interesting than one with a single dominant strategy (“Rock-Rock-Rock”) because you will see more variety in play. For another, an intransitive mechanic embedded in a larger game may still allow players to change or modify their strategies in mid-game. Players may make certain choices in light of what they observe other players doing now (in real-time), particularly in action-based games where you must react to your opponent’s reaction to your reaction to their action in the space of a few milliseconds.

In games with bluffing mechanics, players may make choices based on what they’ve observed other players doing in the past, trying to use that to infer their future moves, which is particularly interesting in games with hidden information (like Poker). So, hopefully you can see that just because a game has an intransitive mechanic, that does not mean it’s as dull as Rock-Paper-Scissors.

Additionally, intransitive mechanics serve as a kind of “emergency brake” on runaway dominant strategies. Even if you don’t know exactly what the best strategy in your game is, if all strategies have an intransitive relationship, you can at least know that there will not be a single dominant strategy that invalidates all of the others, because it will be weak against at least one other counter-strategy. Even if the game itself is unbalanced, intransitive mechanics allow for a metagame correction – not an ideal thing to rely on exclusively (such a thing would be very lazy design), but better to have a safety net than not if you’re releasing a game where major game balance changes can’t be easily made after the fact.

So, if I’ve managed to convince you that intransitive mechanics are worth including for at least some kinds of games, get ready and let’s learn how to solve them!

Solving the basic RPS game

Let’s start by solving the basic game of Rock-Paper-Scissors to see how this works. Since each throw is theoretically as good as any other, we would expect the ratio to be 1:1:1, meaning you choose each throw equally often. And that is what we’ll find, but it’s important to understand how to get there so that we can solve more complex problems.

First, let’s look at the outcomes. Let’s call our opponent’s throws r, p and s, and our throws R, P and S (we get the capital letters because we’re awesome). Since winning and losing are equal and opposite (that is, one win + one loss balances out) and draws are right in the middle, let’s call a win +1 point, a loss -1 point, and a draw 0 points. The math here would actually work for any point values really, but these numbers make it easiest. We now construct a table of results:

         r     p     s
R        0    -1    +1
P       +1     0    -1
S       -1    +1     0

Of course, this is from our perspective – for example, if we throw (R)ock and opponent throws (s)cissors, we win, for a net +1 to our score. Our opponent’s table would be the reverse.

Let’s re-frame this a little bit, by calling r, p and s probabilities that the opponent will make each respective throw. For example, suppose you know ahead of time that your opponent is using a strategy of r=0.5, p=s=0.25 (that is, they throw 2 rock for every paper or scissors). What’s the best counter-strategy?

To answer that question, we can construct a set of three equations that tells you your payoffs for each throw:

  • Payoff for R = 0r + (-1)p + 1s = s-p
  • Payoff for P = 1r + 0p + (-1)s = r-s
  • Payoff for S = (-1)r + 1p + 0s = p-r

So based on the probabilities, you can calculate the payoffs. In the case of our rock-heavy opponent, the payoffs are R=0, P=0.25, S=-0.25. Since P has the best payoff of all three throws, assuming the opponent doesn’t vary their strategy at all, our best counter-strategy is to throw Paper every time, and we expect that we will gain 0.25 per throw – that is, out of every four throws, we’ll win one more game than we lose. In fact, we’ll find that if our opponent merely throws rock the tiniest, slightest bit more often than the others, the net payoff for P will be better than the others, and our best strategy is still to throw Paper 100% of the time, until our opponent modifies their strategy. This is significant; it tells us that an intransitive mechanic is very fragile, and that even a slight imbalance on the player’s part can lead to a completely dominant strategy on the part of the opponent.
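This next bit isn’t part of the original math, but if you’d rather let a computer grind through the payoff arithmetic (which becomes handy once the tables get bigger than 3×3), a quick sketch in Python, assuming you have numpy installed, might look like this:

    import numpy as np

    # Rows are our throws (R, P, S); columns are the opponent's throws (r, p, s).
    payoffs = np.array([
        [ 0, -1,  1],   # R draws against r, loses to p, beats s
        [ 1,  0, -1],   # P beats r, draws against p, loses to s
        [-1,  1,  0],   # S loses to r, beats p, draws against s
    ])

    opponent = np.array([0.5, 0.25, 0.25])  # the rock-heavy opponent from above

    # Expected payoff of each of our throws against that mix:
    print(dict(zip("RPS", (payoffs @ opponent).tolist())))  # R: 0.0, P: 0.25, S: -0.25

That matches the hand calculation: against this particular opponent, Paper is the only throw with a positive expected payoff.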

Of course, against a human opponent who notices we’re always throwing P, their counter-strategy would be to throw a greater proportion of s, which then forces us to throw some R, which then causes them to throw p, which makes us throw S, which makes them throw r, and around and around we go. If we’re both constantly adjusting our strategies to counter each other, do we ever reach any point where both of us are doing the best we can? Over time, do we tend towards a stable state of some kind?

Some Math Theorems

Before answering that question, there are a couple of things I’m going to ask you to trust me on; people smarter than me have actually proved these mathematically, but this isn’t a course in math proofs so I’m handwaving over that part of things. I hope you’ll forgive me for that.

First is that if the game mechanics are symmetric (that is, both players have exactly the same set of options and they work the same way), the solution will end up being the same for both players; the opponent’s probability of choosing Rock is the same as our probability.

Second is that each payoff must be the same as the other payoffs; that is, R = P = S; if any strategy is worth choosing at all, it will provide the same payoff as all other valid strategies, because if the payoff were instead any less than the others it would no longer be worth choosing (you’d just take something else with a higher payoff), and if it were any higher than the others you’d choose it exclusively and ignore the others. Thus, all potential moves that are worth taking have the same payoff.

Lastly, in symmetric zero-sum games specifically, the payoff for everything must be zero (because the payoffs are going to be the same for both players due to symmetry, and the only way for the payoffs to sum to zero and still be equal is if they’re both zero).

To summarize:

  • All payoffs that are worth taking at all, give an equal payoff to each other.
  • Symmetric zero-sum games have all payoffs equal to zero.
  • Symmetric games have the same solution for all players.

Finishing the RPS Solution

Let’s go back to our equations. Rock-Paper-Scissors is a symmetric zero-sum game, so:

  • R = P = S = 0.

Since the opponent must select exactly one throw, we also know the probabilities of their throw add up to 100%:

  • r + p + s = 1

From here we can solve the system of equations by substitution:

  • R = 0 = s-p, therefore p=s
  • P = 0 = r-s, therefore r=s
  • S = 0 = p-r, therefore p=r
  • r+p+s = r+r+r = 1, therefore r=1/3
  • Since r=p=s, p=1/3, s=1/3

So our solution is that the opponent should throw r, p and s each with probabilities of 1/3. This suggests that against a completely random opponent it doesn’t matter what we choose, our odds of winning are the same no matter what. Of course, the opponent knows this too, so if we choose an unbalanced strategy they can alter their throw ratio to beat us; our best strategy is also to choose each throw with 1/3 probability.

Note that in actual play, this does not mean that the best strategy is to actually play randomly (say, by rolling a die secretly before each throw)! As I’ve said before, when humans try to play randomly, they tend to not do a very good job of it, so in the real world the best strategy is still to play each throw about as often as any other, but at the same time which throw you choose depends on your ability to detect and exploit patterns in your opponent’s play, while at the same time masking any apparent patterns in your own play. So our solution of 1:1:1 does not say which throw you must choose at any given time (that is in fact where the skill of the game comes in), but just that over time we expect the optimal strategy to be a 1:1:1 ratio (because any deviation from that hands your opponent a strategy that wins more often over you until you readjust your strategy back to 1:1:1).

Solving RPS with Unequal Scoring

The previous example is all fine and good for Rock-Paper-Scissors, but how can we apply this to something a little more interesting? As our next step, let’s change the scoring mechanism. For example, in fighting games there’s a common intransitive system that attacks beat throws, throws beat blocks, and blocks beat attacks, but each of these does a different amount of damage, so they tend to have different results in the sense that each choice puts a different amount at risk. How does Rock-Paper-Scissors change when we mess with the costs?

Here’s an example. Suppose I make a new rule: every win using Rock counts double. You could just as easily frame it like this: in a fighting game, attacks do normal damage, blocks do the same amount of damage as an attack (let’s say that a successful block allows for a counterattack), and throws do twice as much damage as an attack or block. But let’s just say “every win with Rock counts double” for simplicity here. How does that affect our probabilities?

Again we start with a payoff table:

        r    p    s
   R    0   -1   +2
   P   +1    0   -1
   S   -2   +1    0

We then use this to construct our three payoff equations:

  • R = 2s-p
  • P = r-s
  • S = p-2r

Again, the game is zero-sum and symmetric, and both us and our opponent must choose exactly one throw, so we still have:

  • R = P = S = 0
  • r+p+s = 1

Again we solve:

  • R = 0 = 2s-p, therefore 2s = p
  • P = 0 = r-s, therefore r = s
  • S = 0 = p-2r, therefore 2r = p
  • r+p+s = r+2r+r = 1, therefore r=1/4
  • r=s, therefore s=1/4
  • 2r=p, therefore p=1/2

So here we get a surprising result: if we double the wins for Rock, the end result is that Paper gets chosen half of the time, while Rock and Scissors each get chosen a quarter of the time! This is an answer you’d be unlikely to come up with on your own without doing the math, but in retrospect it makes sense: since Scissors is such a risky play, players are less likely to choose it. If you know your opponent is not likely to play Scissors, Paper is more likely to either draw or win, so it is actually Paper (and not Rock) that is played more frequently.

So if you had a fighting game where a successful throw does twice as much damage as a successful attack or a successful block (with attacks and blocks doing the same damage as each other), you’d actually expect to see twice as many attack attempts as throws or blocks!
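
Reusing the hypothetical solve_symmetric_zero_sum() sketch from the previous section, the doubled Rock payoffs produce that same 1/4 : 1/2 : 1/4 answer:

    rps_double_rock = [[ 0, -1, +2],   # Rock (wins count double)
                       [+1,  0, -1],   # Paper
                       [-2, +1,  0]]   # Scissors
    print(solve_symmetric_zero_sum(rps_double_rock))   # ~[0.25, 0.5, 0.25]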

Solving RPS with Incomplete Wins

Suppose we factor resource costs into this. Fighting games typically don’t have a “cost” associated with performing a move (other than time, perhaps), but RTS games usually have actual resource costs to produce units.

Let’s take a simple RTS game where you have knights that beat archers, archers that beat fliers, and fliers that beat knights. Let’s say further that if you send one type of unit against the same type, they kill each other mutually so there is no net gain or loss on either side, but that it’s a little different with winners. Let’s say that when knights attack archers, they win, but they still lose 20% of their health to the initial arrow volley before they close the distance. Let’s say that archers beat fliers, but lose 40% of their health to counterattacks in the process. But against knights, fliers take no damage at all, because the knights can’t do anything other than stand there and take it (their swords don’t work too well against enemies a hundred feet above them, dropping rocks down from above). Finally, let’s say that knights cost 50 gold, archers cost 75, and fliers cost 100. Now how does this work?

We start with the payoff table:

        k                       a                       f
   K    50-50 = 0               (-50*0.2)+75 = +65      -50
   A    -75+(0.2*50) = -65      75-75 = 0               (-75*0.4)+100 = +70
   F    +50                     -100+(75*0.4) = -70     100-100 = 0

To explain: if we both take the same unit it ends up being zero, that’s just common sense, but really what’s going on is that we’re both paying the same amount and both lose the unit. So we both actually have a net loss, but relative to each other it’s still zero-sum (for example, with Knight vs Knight, we gain +50 Gold relative to the opponent by defeating their Knight, but also lose -50 Gold because our own Knight dies as well, and adding those results together we end up with a net gain of zero).

What about when our Knight meets an enemy Archer? We kill their Archer, which is worth a 75-gold advantage, but they also reduced our Knight’s HP by 20%, so you could say we lost 20% of our Knight cost of 50, which means we lost an equivalent of 10 gold in the process. So the actual outcome is we’re up by 65 gold.

When our Knight meets an enemy Flier, we lose the Knight so we’re down 50 gold. It didn’t hurt the opponent at all. Where does the Flier cost of 100 come in? In this case it doesn’t, really – the opponent still has a Flier after the exchange, so they still have 100 gold worth of Flier in play, they’ve lost nothing… at least, not yet!

So in the case of different costs or incomplete victories, the hard part is just altering your payoff table. From there, the process is the same:

  • K = 0k + 65a + (-50)f = 65a-50f
  • A = (-65)k + 0a + 70f = 70f-65k
  • F = 50k + (-70)a + 0f = 50k-70a
  • K = A = F = 0
  • k+a+f = 1

Solving, we find:

  • K = 0 = 65a-50f, therefore 65a = 50f
  • A = 0 = 70f-65k, therefore 70f = 65k, therefore f = (13/14)k
  • F = 0 = 50k-70a, therefore 50k = 70a, therefore a = (10/14)k
  • k+a+f = k + (10/14)k + (13/14)k = (37/14)k = 1, therefore k = 14/37
  • f = (13/14)k = (13/14)(14/37), therefore f = 13/37
  • a = (10/14)k = (10/14)(14/37), therefore a = 10/37

In this case you’d actually see a pretty even mix of units, with knights being a little more common and archers a little less. If you wanted fliers to be more rare you could play around with their costs, or allow knights to do a little bit of damage to them, or something.
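
The same solve_symmetric_zero_sum() sketch from earlier works here too; the only new wrinkle is building the payoff entries out of the unit costs and damage percentages rather than typing them in directly.

    kn, ar, fl = 50, 75, 100   # gold costs for knight, archer, flier
    rts = [
        [0,             ar - 0.2*kn,   -kn         ],   # Knight  vs k, a, f
        [-ar + 0.2*kn,  0,             fl - 0.4*ar ],   # Archer
        [kn,            -fl + 0.4*ar,  0           ],   # Flier
    ]
    print(solve_symmetric_zero_sum(rts))   # ~[0.378, 0.270, 0.351], i.e. 14/37, 10/37, 13/37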

Solving RPS with Asymmetric Scoring

So far we’ve assumed a game that’s symmetric: we both have the exact same set of throws, and we both win or lose the same amount according to the same set of rules. But not all intransitive games are perfectly symmetric. For example, suppose I made a Rock-Paper-Scissors variant where each round, I flip up a new card that alters the win rewards. This round, my card says that my opponent gets two points for a win with Rock, but I don’t (I would just score normally). How does this change things?

It actually complicates the situation a great deal, because now both players must figure out the probabilities of their opponents’ throws, and those probabilities may not be the same anymore! Let’s say that Player A has the double-Rock-win bonus, and Player B does not. What’s the optimal strategy for both players? And how much of an advantage does this give to Player A, if any? Let’s find out by constructing two payoff tables.

Player A’s payoff table looks like this:

         rB    pB    sB
   RA     0    -1    +2
   PA    +1     0    -1
   SA    -1    +1     0

Player B’s payoff table looks like this:

         rA    pA    sA
   RB     0    -1    +1
   PB    +1     0    -1
   SB    -2    +1     0

Here we can assume that RA=PA=SA and RB=PB=SB, and also that rA+pA+sA = rB+pB+sB = 1. However, we cannot assume that RA=PA=SA=RB=PB=SB=0, because we don’t actually know that the payoffs for players A and B are equal; in fact, intuition tells us they probably aren’t! We now have this intimidating set of equations:

  • RA = 2sB – pB
  • PA = rB – sB
  • SA = pB – rB
  • RB = sA – pA
  • PB = rA – sA
  • SB = pA – 2rA
  • RA = PA = SA
  • RB = PB = SB
  • rA + pA + sA = 1
  • rB + pB + sB = 1

We could do this the hard way through substitution, but an easier way is to use matrices. Here’s how it works: we rewrite the payoff tables as matrices. Here’s the first one:

   [ RA     0    -1    +2 ]
   [ PA    +1     0    -1 ]
   [ SA    -1    +1     0 ]

Here, the left column represents the left side of the first three equations above, the second column holds the coefficients of rB, the third column pB, and the fourth column sB. Two changes for clarity: first, let’s move the leftmost column to the right instead, which will make it easier to work with; and second, since RA=PA=SA, let’s just replace them all with a single variable X, which represents the net payoff for Player A:

   [  0    -1    +2    X ]
   [ +1     0    -1    X ]
   [ -1    +1     0    X ]

This is just a shorthand way of writing down these three equations, omitting the variable names but keeping them all lined up in the same order so that each column represents a different variable:

   0rB - 1pB + 2sB = X
   1rB + 0pB - 1sB = X
   -1rB + 1pB + 0sB = X

Algebra tells us we can multiply everything in an equation by a constant and it’s still true, which means we can multiply any row of the matrix by any value and it’s still valid (as long as we multiply all four entries in the row by the same amount). Algebra also tells us that we can add both sides of two equations together and the result is still true, meaning we can add two rows together, entry by entry, and use the result as a new row (or replace an existing row with it). And we can rearrange the rows, because all of them are still true no matter what order we put them in. What we want to do here is put this matrix in what’s called triangular form: everything under the diagonal is zero, and the diagonal entries themselves (marked here with an asterisk) have to be non-zero:

   [ *    ?    ?    ? ]
   [ 0    *    ?    ? ]
   [ 0    0    *    ? ]

So, first we reorder the rows by moving the bottom row up to the top (and pushing the other two rows down):

   [ -1    +1     0    X ]
   [  0    -1    +2    X ]
   [ +1     0    -1    X ]

To eliminate the +1 in the bottom row, we add the top and bottom rows together and replace the bottom row with that:

      -1    +1     0      X
   +  +1     0    -1      X
   =   0    +1    -1    2*X

Our matrix is now:

   [ -1    +1     0      X ]
   [  0    -1    +2      X ]
   [  0    +1    -1    2*X ]

Now we want to eliminate the +1 on the bottom row, so we add the middle and bottom rows together and replace the bottom row with the result:

   [ -1    +1     0      X ]
   [  0    -1    +2      X ]
   [  0     0    +1    3*X ]

Now we can write these in the standard equation forms and solve, going from the bottom up, using substitution:

  • +1(sB) = 3*X, therefore sB = 3*X
  • -1(pB) +2(sB) = X, therefore -1(pB)+2(3*X) = X, therefore pB = 5*X
  • -1(rB) + 1(pB) = X, therefore rB = 4*X

At this point we don’t really need to know what X is, but we do know that the ratio for Player B is 3 Scissors to 5 Paper to 4 Rock. Since sB+pB+rB = 1, this means:

rB = 4/12         pB = 5/12        sB = 3/12

We can use the same technique with the second set of equations to figure out the optimal ratio for Player A. Again, the payoff table is:

         rA    pA    sA
   RB     0    -1    +1
   PB    +1     0    -1
   SB    -2    +1     0

This becomes the following matrix:

   [  0    -1    +1    RB ]
   [ +1     0    -1    PB ]
   [ -2    +1     0    SB ]

Again we reorganize, and since RB=PB=SB, let’s call these all a new variable Y (we don’t use X to avoid confusion with the previous X; remember that the payoff for one player may be different from the other here). Let’s swap the bottom and top this time, along with replacing the payoffs by Y:

   [ -2    +1     0    Y ]
   [ +1     0    -1    Y ]
   [  0    -1    +1    Y ]

To eliminate the +1 in the center row, we multiply the center row by 2, add it to the top row, and replace the center row with the result (we could instead multiply the top row by 1/2, but I find it easier to multiply by whole numbers than fractions):

      -2    +1     0      Y
   +  +2     0    -2    Y*2    (the center row multiplied by 2)
   =   0    +1    -2    Y*3

Our matrix is now:

   [ -2    +1     0      Y ]
   [  0    +1    -2    Y*3 ]
   [  0    -1    +1      Y ]

Adding the second and third rows together to eliminate the -1 in the bottom row, we get:

   [ -2    +1     0      Y ]
   [  0    +1    -2    Y*3 ]
   [  0     0    -1    Y*4 ]

Again working backwards and substituting:

  • sA = -Y*4
  • pA – 2sA = Y*3, therefore pA = -Y*5
  • -2rA + pA = Y, therefore -2rA = 6Y, therefore rA = -Y*3

Now, it might seem kind of strange that we get a bunch of negative numbers here when we got positive ones before. This is just a side effect of the fact that the average payoff for Player A turns out to be positive while Player B’s is negative (we’ll confirm that in a moment); either way it all factors out, because we only care about the relative ratio of Rock to Paper to Scissors. For Player A, this is 3 Rock to 5 Paper to 4 Scissors:

rA = 3/12         pA = 5/12        sA = 4/12

This is slightly different from Player B’s optimal mix:

rB = 4/12         pB = 5/12        sB = 3/12

Now, we can use this to figure out the actual advantage for Player A. We could do this through actually making a 12×12 chart and doing all 144 combinations and counting them up using probability, or we could do a Monte Carlo simulation, or we could just plug these values into our existing equations. For me that last one is the easiest, because we already have a couple of equations from earlier that directly relate these together:

sA = -Y*4, therefore Y = -1/12

rB = X*4, therefore X = +1/12

We know that RA=PA=SA and RB=PB=SB, so this means the payoff for Player A is +1/12 and for Player B it’s -1/12. This makes a lot of sense and acts as a sanity check: since this is still a zero-sum game, we know that the payoff for A must be the negative of the payoff for B. In a symmetric game both would have to be zero, but this is not symmetric. That said, the advantage turns out to be surprisingly small: only about one extra win out of every 12 games when both players play optimally!
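
In the asymmetric case the payoff is no longer zero, so the earlier sketch needs one extra unknown. Here is a version (again my own, not from the article) that solves for the opponent’s mix and the payoff value together; running it on both payoff tables reproduces the numbers above.

    import numpy as np

    def solve_mix(payoff):
        """Given the row player's payoff table, return (opponent's mix, row player's value)."""
        payoff = np.asarray(payoff, dtype=float)
        n = payoff.shape[0]
        # Unknowns: the opponent's n probabilities, plus v (the common payoff value).
        a = np.zeros((n + 1, n + 1))
        a[:n, :n] = payoff
        a[:n, n] = -1.0        # each row: (payoffs dotted with the mix) - v = 0
        a[n, :n] = 1.0         # last row: probabilities sum to 1
        b = np.zeros(n + 1)
        b[n] = 1.0
        solution = np.linalg.solve(a, b)
        return solution[:n], solution[n]

    payoff_A = [[0, -1, +2], [+1, 0, -1], [-1, +1, 0]]   # rows RA/PA/SA vs columns rB/pB/sB
    payoff_B = [[0, -1, +1], [+1, 0, -1], [-2, +1, 0]]   # rows RB/PB/SB vs columns rA/pA/sA
    print(solve_mix(payoff_A))   # B's mix ~[0.333, 0.417, 0.25],  value X ~ +0.083 (= 1/12)
    print(solve_mix(payoff_B))   # A's mix ~[0.25, 0.417, 0.333],  value Y ~ -0.083 (= -1/12)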

Solving Extended RPS

So far all of the relationships we’ve analyzed have had only three choices. Can we use the same technique with more? Yes, it just means we do the same thing but more of it.

Let’s analyze the game Rock-Paper-Scissors-Lizard-Spock. In this game, Rock beats Scissors and Lizard; Paper beats Rock and Spock; Scissors beats Paper and Lizard; Lizard beats Paper and Spock; and Spock beats Rock and Scissors. Our payoff table is (with ‘k’ for Spock since there’s already an ‘s’ for Scissors, and ‘z’ for Lizard so it doesn’t look like the number one):

        r    p    s    z    k
   R    0   -1   +1   +1   -1
   P   +1    0   -1   -1   +1
   S   -1   +1    0   +1   -1
   Z   -1   +1   -1    0   +1
   K   +1   -1   +1   -1    0

We also know r+p+s+z+k=1, and R=P=S=Z=K=0. We could solve this by hand as well, but there’s another way to do this using Excel which makes things slightly easier sometimes.

First, you would enter the above matrix in a 5×5 grid of cells somewhere. You’d also need to add a 5×1 column of all 1s (or any non-zero number, really) to represent the variable X (the payoff) to the right of your 5×5 grid. Then, select a new blank 5×1 column (just click and drag), and enter this formula in the formula bar:

=MMULT(MINVERSE(A1:E5),F1:F5)

For the MINVERSE parameter, put the top left and lower right cells of your 5×5 grid (I use A1:E5 because the grid is in the extreme top left corner of the worksheet). For the final parameter (I use F1:F5 here), give the column of all 1s. Finally, and this is important, press Ctrl+Shift+Enter when you’re done typing in the formula (not just Enter). This propagates the formula to all five cells that you’ve highlighted and treats them as a unified array, which is necessary.

One warning is that this method does not always work; in particular, if there are no solutions or infinite solutions, it will give you #NUM! as the result instead of an actual number. In fact, if you enter in the payoff table above, it will give you this error; by setting one of the entries to something very slightly different (say, changing one of the +1s to +0.999999), you will generate a unique solution that is only off by a tiny fraction, so round it to the nearest few decimal places for the “real” answer. Another warning is that anyone who actually knows a lot about math will wince when you do this, because it’s kind of cheating and you’re really not supposed to solve a matrix like that.

Excel gives us a solution of 0.2 for each of the five variables, meaning that it is equally likely that the opponent will choose any of the five throws. We can then verify that yes, in fact, R=P=S=Z=K=0, so it doesn’t matter which throw we choose; any will do just as well as any other if the opponent plays randomly with equal chances of each throw.
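
If you have something like Python handy instead of (or in addition to) Excel, a cleaner trick is to append the “probabilities sum to 1” row and solve by least squares. This is a sketch of that alternative (my own, not the article’s Excel method), and it sidesteps the singular-matrix problem without fudging any entries.

    import numpy as np

    rpsls = np.array([
        [ 0, -1, +1, +1, -1],   # Rock     vs r, p, s, z, k
        [+1,  0, -1, -1, +1],   # Paper
        [-1, +1,  0, +1, -1],   # Scissors
        [-1, +1, -1,  0, +1],   # Lizard
        [+1, -1, +1, -1,  0],   # Spock
    ], dtype=float)
    a = np.vstack([rpsls, np.ones(5)])      # payoff rows = 0, plus "sum to 1"
    b = np.append(np.zeros(5), 1.0)
    mix, *_ = np.linalg.lstsq(a, b, rcond=None)
    print(mix)   # ~[0.2, 0.2, 0.2, 0.2, 0.2]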

Solving Extended RPS with Unequal Relationships

Not all intransitive mechanics are equally balanced. In some cases, even without weighted costs, some throws are just better than other throws. For example, let’s consider the unbalanced game of Rock-Paper-Scissors-Dynamite. The idea is that with this fourth throw, Dynamite beats Rock (by explosion), and Scissors beats Dynamite (by cutting the wick). People will argue which should win in a contest between Paper and Dynamite, but for our purposes let’s say Dynamite beats Paper. In theory this makes Dynamite and Scissors seem like really good choices, because they both beat two of the three other throws. It also makes Rock and Paper seem like poor choices, because they both lose to two of the other three throws. What does the actual math say?

Our payoff table looks like this:

        r    p    s    d
   R    0   -1   +1   -1
   P   +1    0   -1   -1
   S   -1   +1    0   +1
   D   +1   +1   -1    0

Before we go any further, we run into a problem: if you look closely, you’ll see that Dynamite is better than or equal to Paper in every situation. That is, every entry in the P row is either equal to or less than the corresponding entry in the D row (and likewise, every entry in the p column is worse than or equal to the corresponding entry in the d column). Both Paper and Dynamite lose to Scissors, both beat Rock, but against each other Dynamite wins. In other words, there is no logical reason to ever take Paper, because whenever you’d consider it, you would take Dynamite instead! In game theory terms, we say that Paper is dominated by Dynamite. If we tried to solve this matrix mathematically like we did earlier, we would end up with some very strange answers and we’d quickly find it was unsolvable (or that the answers made no sense, like a probability for r, p, s or d that was less than zero or greater than one). The reason it wouldn’t work is that at some point we would make the assumption that R=P=S=D, but in this case that isn’t true: the payoff for Paper must be less than the payoff for Dynamite, so it is an invalid assumption. To fix this, before proceeding, we must systematically eliminate all choices that are dominated. In other words, remove Paper as a choice.

The new payoff table becomes:

        r    s    d
   R    0   +1   -1
   S   -1    0   +1
   D   +1   -1    0

We check again to see if, after the first round of eliminations, any other strategies are now dominated (sometimes a row or column isn’t dominated by another until you cross out some other dominated choices, so you do have to perform this procedure repeatedly until there’s nothing left to eliminate). Again, to check for dominated strategies, you must compare every pair of rows to see if one dominates another, and then every pair of columns in the same way. Yes, this means a lot of comparisons if you give each player ten or twelve choices!

In this case eliminating Paper was all that was necessary, and in fact we’re back to the same exact payoff table as with the original Rock-Paper-Scissors, but with Paper being “renamed” to Dynamite. And now you know, mathematically, why it never made sense to add Dynamite as a fourth throw.
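
The crossing-out procedure is mechanical enough that you can automate it. Here is a rough sketch (mine, not from the article) of iterated elimination of weakly dominated rows and columns for a zero-sum payoff table like this one; feeding it the Rock-Paper-Scissors-Dynamite table drops Paper and leaves the reduced table shown above.

    import numpy as np

    def eliminate_dominated(payoff, row_names, col_names):
        """Repeatedly remove weakly dominated rows (our choices) and columns (opponent's)."""
        payoff = np.asarray(payoff, dtype=float)
        rows, cols = list(row_names), list(col_names)
        changed = True
        while changed:
            changed = False
            # Row i is dominated if some other row is >= it everywhere and > it somewhere.
            for i in range(len(rows)):
                if any(j != i and np.all(payoff[j] >= payoff[i]) and np.any(payoff[j] > payoff[i])
                       for j in range(len(rows))):
                    payoff = np.delete(payoff, i, axis=0)
                    del rows[i]
                    changed = True
                    break
            if changed:
                continue
            # Column i is dominated if some other column is <= it everywhere and < it somewhere
            # (the opponent wants to minimize our payoff).
            for i in range(len(cols)):
                if any(j != i and np.all(payoff[:, j] <= payoff[:, i]) and np.any(payoff[:, j] < payoff[:, i])
                       for j in range(len(cols))):
                    payoff = np.delete(payoff, i, axis=1)
                    del cols[i]
                    changed = True
                    break
        return payoff, rows, cols

    rpsd = [[ 0, -1, +1, -1],
            [+1,  0, -1, -1],
            [-1, +1,  0, +1],
            [+1, +1, -1,  0]]
    print(eliminate_dominated(rpsd, "RPSD", "rpsd"))   # Paper (row P and column p) drops out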

Another Unequal Relationship

What if instead we created a new throw that wasn’t weakly dominated, but that worked a little differently than normal? For example, something that was equivalent to Scissors except it worked in reverse order, beating Rock but losing to Paper? Let’s say… Construction Vehicle (C), which bulldozes (wins against) Rock, is given a citation by (loses against) Paper, and draws with Scissors because neither of the two can really interact much. Now our payoff table looks like this:

        r    p    s    c
   R    0   -1   +1   -1
   P   +1    0   -1   +1
   S   -1   +1    0    0
   C   +1   -1    0    0

Here, no single throw is strictly better than any other, so we start solving. We know r+p+s+c=1, and the payoffs R=P=S=C=0. Our matrix becomes:

   [  0    -1    +1    -1    0 ]
   [ +1     0    -1    +1    0 ]
   [ -1    +1     0     0    0 ]
   [ +1    -1     0     0    0 ]

Rearranging the rows to get non-zeros along the diagonal, we get this by reversing the order from top to bottom:

   [ +1    -1     0     0    0 ]
   [ -1    +1     0     0    0 ]
   [ +1     0    -1    +1    0 ]
   [  0    -1    +1    -1    0 ]

Zeroing out the first column below the diagonal (we replace the second row with the sum of the first two rows, and replace the third row with the first row minus the third row), we get:

   [ +1    -1     0     0    0 ]
   [  0     0     0     0    0 ]
   [  0    -1    +1    -1    0 ]
   [  0    -1    +1    -1    0 ]

Curious! The second row is all zeros (which gives us absolutely no useful information, as it’s just telling us that zero equals zero), and the bottom two rows are exactly the same as one another (which means the last row is redundant and again tells us nothing extra). We are left with only two rows of useful information. In other words, we have two equations (three if you count r+p+s+c=1) and four unknowns.

What this means is that there is actually more than one valid solution here, potentially an infinite number of solutions. We figure out the solutions by hand:

  • r-p=0, therefore r=p
  • -p+s-c=0, therefore c=s-p

Substituting into r+p+s+c=1, we get:

  • p+p+s+(s-p)=1, therefore p+2s=1, therefore p=1-2s (and therefore, r=1-2s).

Substituting back into c=s-p, we get c=s-1+2s, therefore c=3s-1.

We have thus managed to put all three other variables in terms of s:

  • p=1-2s
  • r=1-2s
  • c=3s-1

So it would seem at first that there are in fact an infinite number of solutions: choose any value for s, then that will give you the corresponding values for p, r, and c. But we can narrow down the ranges even further.

How? By remembering that all of these variables are probabilities, meaning they must all be in the range of 0 (if they never happen) to 1 (if they always happen). Probabilities can never be less than zero or greater than 1. This lets us limit the range of s. For one thing, we know it must be between 0 and 1.

From the equation c=3s-1, we know that s must be at least 1/3 (otherwise c would be negative) and s can be at most 2/3 (otherwise c would be greater than 100%). Looking instead at p and r (both equal to 1-2s), we know s can be at most 1/2. Combining the two ranges, s must be between 1/3 and 1/2. This is interesting: it shows us that no matter what, Scissors is an indispensable part of all ideal strategies, being used somewhere between a third and half of the time.

At the lower boundary condition (s=1/3), we find that p=1/3, r=1/3, c=0, which is a valid strategy. At the upper boundary (s=1/2), we find p=0, r=0, c=1/2. And we could also opt for any strategy in between, say s=2/5, p=1/5, r=1/5, c=1/5.
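
If you’d rather check this than trust the algebra, here is a quick sketch (mine, assuming numpy) that plugs a few values of s from that range back into the payoff table; every throw’s expected payoff comes out to zero each time.

    import numpy as np

    rpsc = np.array([[ 0, -1, +1, -1],     # Rock      vs r, p, s, c
                     [+1,  0, -1, +1],     # Paper
                     [-1, +1,  0,  0],     # Scissors
                     [+1, -1,  0,  0]],    # Construction Vehicle
                    dtype=float)
    for s in (1/3, 0.4, 0.5):
        mix = np.array([1 - 2*s, 1 - 2*s, s, 3*s - 1])   # r, p, s, c from the formulas above
        print(s, rpsc @ mix)   # each row payoff is ~0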

Are any of these strategies “better” than the others, such that a single one would win more than the others? That unfortunately requires a bit more game theory than I wanted to get into today, but I can tell you the answer is “it depends” based on certain assumptions about how rational your opponents are, whether the players are capable of making occasional mistakes when implementing their strategy, and how much the players know about how their opponents play, among other things. For our purposes, we can say that any of these is as good as any other, although I’m sure professional game theorists could philosophically argue the case for certain values over others.

Also, for our purposes, we could say that Construction Vehicle is probably not a good addition to the core game of Rock-Paper-Scissors, as it allows one winning strategy where the throw of C can be completely ignored, and another winning strategy where both P and R are ignored, making us wonder why we’re wasting development resources on implementing two or three throws that may never even see play once the players are sufficiently skilled!

Solving the Game of Malkav

So far we’ve systematically done away with each of our basic assumptions: that a game has a symmetric payoff, that it’s zero-sum, that there are exactly three choices. There’s one other thing that we haven’t covered in the two-player case, and that’s what happens if the players have a different selection of choices – not just an asymmetric payoff, but an asymmetric game. If we rely on there being exactly as many throws for one player as the other, what happens when one player has, say, six different throws when their opponent has only five? It would seem such a problem would be unsolvable for a unique solution (there are six unknowns and only five equations, right?) but in fact it turns out we can use a more powerful technique to solve such a game uniquely, in some cases.

Let us consider a card called “Game of Malkav” from an obscure CCG that most of you have probably never heard of. It works like this: all players secretly and simultaneously choose a number. The player who played this card chooses between 1 and 6, while all other players choose between 1 and 5. Each player gains as much life as the number they choose… unless another player chose a number exactly one less, in which case they lose that much life instead. So for example if you choose 5, you gain 5 life, unless any other player chose 4. If anyone else chose 4, you lose 5 life… and they gain 4, unless someone else also chose 3, and so on. This can get pretty complicated with more players, so let’s simply consider the two-player case. Let’s also make the simplifying assumption that the game is zero-sum, and that you gaining 1 life is equivalent in value to your opponent losing 1 life (I realize this is not necessarily valid, and this will vary based on relative life totals, but at least it’s a starting point for understanding what this card is actually worth).

We might wonder, what is the expected payoff of playing this card, overall? Does the additional option of playing 6, when your opponent can only play up to 5, actually give you an edge? What is the best strategy, and what is the expected end result? In short, is the card worth playing… and if so, when you play it, how do you decide what to choose?

As usual, we start with a payoff table. Let’s call the choices P1-P6 (for the Player who played the card), and O1-O5 (for the Opponent):

          O1    O2    O3    O4    O5
   P1      0    +3    -2    -3    -4
   P2     -3     0    +5    -2    -3
   P3     +2    -5     0    +7    -2
   P4     +3    +2    -7     0    +9
   P5     +4    +3    +2    -9     0
   P6     +5    +4    +3    +2   -11

We could try to solve this, and there do not appear to be any dominated picks for either player, but we will quickly find that the numbers get very hairy very fast… and also that it ends up being unsolvable, for reasons that you will find if you try. Basically, with 6 equations and 5 unknowns, there is redundancy… except in this case, no rows cancel, and instead you end up with at least two equations that contradict each other. So there must actually be some dominated strategies here… it’s just that they aren’t immediately obvious, because there is a set of rows or columns that is collectively dominated by another set, which is much harder to find just by looking. How do we find them?

We start by finding the best move for each player, if they knew what the opponent was doing ahead of time. For example, if the opponent knows we will throw P1, their best move is O5 (giving them a net +4 and us a net -4). But then we continue by reacting to their reaction: if the player knows the opponent will choose O5, their best move is P4. But against P4, the best move is O3. Against O3, the best move is P2. Against P2, there are two equally good moves: O1 and O5, so we consider both options:

  • Against O5, the best response is P4, as before (and we continue around in the intransitive sequence O5->P4->O3->P2->O5 indefinitely).
  • Against O1, the best response is P6. Against P6, the best response is O5, which again brings us into the intransitive sequence O5->P4->O3->P2->O1->P6->O5.

What if we start at a different place, say by initially throwing P3? Then the opponent’s best counter is O2, our best answer to that is P6, which then leads us into the O5->P4->O3->P2->O1->P6->O5 loop. If we start with P5, best response is O4, which gets the response P3, which we just covered. What if we start with O1, O2, O3, O4, O5, P2, P4 or P6? All of them are already accounted for in earlier sequences, so there’s nothing more to analyze.

Thus, we see that no matter what we start with, eventually after repeated play only a small subset of moves actually end up being part of the intransitive nature of this game because they form two intransitive loops (O5/P4/O3/P2, and O5/P4/O3/P2/O1/P6). Looking at these sequences, the only choices ever used by either player are O1, O3, O5 and P2, P4, P6. Any other choice ends up being strictly inferior: for example, at any point where it is advantageous to play P6 (that is, you are expecting a positive payoff), there is no reason you would prefer P5 instead (even if you expect your opponent to play O5, your best response is not P5, but rather P4).
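
Here is a rough sketch (my own, not from the article) of that best-response walk: follow the chain of best responses from every possible opening throw, and record which choices keep getting used once the walk settles into a loop. Ties (like the two equally good counters to P2) are broken arbitrarily here, so it only traces one of the two loops, but the surviving set of choices comes out the same.

    import numpy as np

    malkav = np.array([       # rows P1..P6, columns O1..O5, payoffs to the card's player
        [ 0, +3, -2, -3,  -4],
        [-3,  0, +5, -2,  -3],
        [+2, -5,  0, +7,  -2],
        [+3, +2, -7,  0,  +9],
        [+4, +3, +2, -9,   0],
        [+5, +4, +3, +2, -11],
    ])

    used_p, used_o = set(), set()
    for start in range(6):                    # open with each of P1..P6 in turn
        p = start
        for step in range(40):
            o = int(np.argmin(malkav[p]))     # opponent's best response to our throw
            p = int(np.argmax(malkav[:, o]))  # our best response to theirs
            if step >= 20:                    # once the walk has settled into its loop,
                used_o.add(o)                 # record which choices keep coming up
                used_p.add(p)
    print(sorted(i + 1 for i in used_p))      # [2, 4, 6] -> only P2, P4, P6 survive
    print(sorted(i + 1 for i in used_o))      # [1, 3, 5] -> only O1, O3, O5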

By using this technique to find intransitive loops, you can often reduce a larger number of choices to a smaller set of viable ones… or at worst, you can prove that all of the larger set are in fact viable. Occasionally you will find a game (Prisoner’s Dilemma being a famous example, if you’ve heard of that) where there are one or more cells in the table that neither player would want to move away from on their own, so that after repeated play we expect all players to be drawn to those cells; game theorists call these Nash equilibria, after the mathematician who first wrote about them. Not that you need to care.

So in this case, we can reduce the table to the set of meaningful values:

          O1    O3    O5
   P2     -3    +5    -3
   P4     +3    -7    +9
   P6     +5    +3   -11

From there we solve, being aware that this is not symmetric. Therefore, we know that O1=O3=O5 and P2=P4=P6, but we do not know if they are all equal to zero or if one is the negative of the other. (Presumably, P2 is positive and O1 is negative, since we would expect the person playing this card to have an advantage, but we’ll see.)

We construct a matrix, using X to stand for the Payoff for P2, P4 and P6:

   [ -3    +5     -3    X ]
   [ +3    -7     +9    X ]
   [ +5    +3    -11    X ]

This can be reduced to triangular form and then solved, the same as earlier problems. Feel free to try it yourself! I give the answer below.

Now, solving that matrix gets you the probabilities O1, O3 and O5, but in order to learn the probabilities of choosing P2, P4 and P6 you have to flip the matrix across the diagonal so that the Os are all on the left and the Ps are on the top (this is called a transpose). In this case we’d also need to make all the numbers negative, since such a matrix is from Player O’s perspective and therefore has the opposite payoffs:

   [ +3    -3     -5    Y ]
   [ -5    +7     -3    Y ]
   [ +3    -9    +11    Y ]

This, too, can be solved normally. If you’re curious, the final answers are roughly:

P2:P4:P6 = 49% : 37% : 14%

O1:O3:O5 = 35% : 41% : 24%

Expected payoff to Player P (shown as “X” above): 0.31, and the payoff for Player O (shown as “Y”) is the negative of X: -0.31.

In other words, in the two-player case of this game, when both players play optimally, the player who initiated this card gets ahead by an average of less than one-third of a life point – so while we confirm that playing the card and having the extra option of choosing 6 is in fact an advantage, it turns out to be a pretty small one. On the other hand, the possibility of sudden large swings may make it worthwhile in actual play (or maybe not), depending on the deck you’re playing. And of course, the game gets much more complicated in multi-player situations that we haven’t considered here.
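
If you’d rather not grind through the triangular form by hand, the hypothetical solve_mix() sketch from the asymmetric section gets you the same numbers directly, once from the reduced table and once from its negated transpose.

    import numpy as np   # and solve_mix() as defined in the earlier sketch

    reduced = [[-3, +5,  -3],     # rows P2/P4/P6 vs columns O1/O3/O5
               [+3, -7,  +9],
               [+5, +3, -11]]
    print(solve_mix(reduced))                   # O1,O3,O5 ~ [0.345, 0.414, 0.241], X ~ +0.31
    print(solve_mix((-np.array(reduced)).T))    # P2,P4,P6 ~ [0.494, 0.368, 0.138], Y ~ -0.31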

Solving Three-Player RPS

So far we’ve covered just about every possible case for a two-player game, and you can combine the different methods as needed for just about any application, for any kind of two-player game. Can we extend this kind of analysis to multiple players? After all, a lot of these games involve more than just a single head-to-head, they may involve teams or free-for-all environments.

Teams are straightforward, if there’s only two teams: just treat each team as a single “player” for analysis purposes. Free-for-all is a little harder because you have to manage multiple opponents, and as we’ll see the complexity tends to explode with each successive player. Three-player games are obnoxious but still quite possible to solve; four-player games are probably the upper limit of what I’d ever attempt by hand using any of the methods I’ve mentioned today. If you have a six-player free-for-all intransitive game where each player has a different set of options and a massive payoff matrix that gives payoffs to each player for each combination… well, let’s just say it can be done, probably requiring the aid of a computer and a professional game theorist, but at this point you wouldn’t want to. One thing that game theorists have learned is that the more complex the game, the longer it tends to take human players in a lab to converge on the optimal strategies… which means for a highly complex game, playtesting will give you a better idea of how the game actually plays “in the field” than doing the math to prove optimal solutions, because the players probably won’t find the optimal solutions anyway. Thus, for a complicated system like that, you’re better off playtesting… or more likely, you’re better off simplifying your mechanics!

Let’s take a simple multi-player case: three-player Rock-Paper-Scissors. We define the rules like this: if all players make the same throw, or if all players each choose different throws, we call it a draw. If two players make the same throw and the third player chooses a different one (“odd man out”), then whoever throws the winning throw gets a point from each loser. So if two players throw Rock and the third throws Scissors, each of the Rock players gets +1 point and the unfortunate Scissors player loses 2 points. Or if it’s reversed, one player throwing Rock while two throw Scissors, the one Rock player gets +2 points while the other two players lose 1 point each. (The idea behind these numbers is to keep the game zero-sum, for simplicity, but you could use this method to solve for any other scoring mechanism.)

Of course, we know because of symmetry that the answer to this is 1:1:1, just like the two-player version. So let’s throw in the same wrinkle as before: wins with Rock count double (which also means, since this is zero-sum, that losses with Scissors count double). In the two-player case we found the solution of Rock=Scissors=1/4, Paper=1/2. Does this change at all in the three-player case, since there are now two opponents which make it even more dangerous to throw Scissors (and possibly even more profitable to throw Rock)?

The trick we need to use here to make this solvable is to look at the problem from a single player’s perspective, and treat all the opponents collectively as a single opponent. In this case, we end up with a payoff table that looks like this:

        rr    rp    rs    pp    ps    ss
   R     0    -1    +2    -2     0    +4
   P    +2    +1     0     0    -1    -2
   S    -4     0    -2    +2    +1     0

You might say: wait a minute, there are only three payoff equations here but six unknowns (an r, a p and an s for each of the two opponents), which means this isn’t uniquely solvable. But the good news is that this game is symmetric, so both opponents should end up with the same probabilities, and we actually can solve it; the two opponents’ probabilities are taken together and multiplied (recall that we multiply probabilities when we need two independent things to happen at the same time). One thing to be careful of: there are actually nine possibilities for the opponents, not six, but some of them are duplicated. The actual table is like this:

        rr    rp    pr    rs    sr    pp    ps    sp    ss
   R     0    -1    -1    +2    +2    -2     0     0    +4
   P    +2    +1    +1     0     0     0    -1    -1    -2
   S    -4     0     0    -2    -2    +2    +1    +1     0

All this means is that when using the original matrix and writing it out in longhand form, we have to remember to multiply rp, rs and ps by 2 each, since there are two ways to get each of them (rp and pr, for example). Note that I haven’t mentioned which of the two opponents is which; as I said earlier, it doesn’t matter because this game is symmetric, so the probability of any player throwing Rock or Scissors is the same as that of the other players.

This payoff table doesn’t present so well in matrix form since we’re dealing with two variables rather than one. One way to do this would be to actually split this into three mini-matrices, one for each of the first opponent’s choices, and then comparing each of those to the second opponent’s choice… then solving each matrix individually, and combining the three solutions into one at the end. That’s a lot of work, so let’s try to solve it algebraically instead, writing it out in longhand form and seeing if we can isolate anything by combining like terms:

  • Payoff for R = -2rp+4rs-2pp+4ss = 0
  • Payoff for P = 2rr+2rp-2sp-2ss = 0
  • Payoff for S = -4rr-4rs+2pp+2sp = 0
  • r+s+p=1 (as usual)

The “=0” at the end is because we know this game is symmetric and zero sum.

Where do you start with something like this? A useful starting place is usually to use r+s+p=1 to eliminate one of the variables by putting it in terms of the other, then substituting into the three Payoff equations above. Eliminating Rock (r=1-s-p) and substituting, after multiplying everything out and combining terms, we get:

  • -2ps - 2p + 4s = 0
  • -2p - 4s + 2 = 0
  • -2pp - 2ps + 8p + 4s - 4 = 0

We could grind away at the first or last equation, which involve squared terms and products of p and s, and eventually we’d need the Quadratic Formula (you know, “minus b, plus or minus the square root of b squared minus 4ac, all divided by 2a”). This would yield two possible solutions, although in most cases you’ll find you can eliminate one as it strays outside the bounds of 0 to 1 (which r, p and s must all lie within, as they are all probabilities).

However, the middle equation above makes our lives much easier, as we can solve for p or s in terms of the other:

  • p = 1-2s

Substituting that into the other two equations gives the same result in both cases (one is just the other multiplied by -1), which lets us know we’re probably on the right track since the equations don’t contradict each other:

  • 4ss + 6s - 2 = 0

Here we do have to use the dreaded Quadratic Formula. Plugging in, we find s = (-6 +/- sqrt(36+32))/8, which works out to s = (sqrt(17) - 3)/4 ≈ 28% or s = (-sqrt(17) - 3)/4 ≈ -178%. Are both of these valid solutions? To find out, we evaluate p=1-2s and any other equation with r.

For s ≈ 28%, we find p ≈ 44% and r ≈ 28%, so that is a valid solution. The other root is negative, which is invalid (a probability cannot be below zero), leaving us with only a single valid solution: roughly r:p:s = 28% : 44% : 28%.

It turns out that having multiple players does have an effect on the “rock wins count double” problem, but it might not be the result we expected; with three players, the mix (about 28% Rock, 44% Paper, 28% Scissors) is actually a little closer to 1:1:1 than the 25%/50%/25% split we got with two players! Perhaps it’s because the likelihood of drawing with one player choosing Rock, one choosing Paper and one choosing Scissors makes Scissors less risky than it would be in a two-player game, because even if one opponent chooses Rock, the other might choose Paper and turn your double-loss into a draw.
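
As a sanity check on that mix (this is my own sketch, not part of the original writeup), plugging r = s = (sqrt(17) - 3)/4 and p = 1 - 2s back into the three payoff formulas above gives zero for every throw, which is exactly what the solution requires.

    import math

    s = (math.sqrt(17) - 3) / 4      # ~0.2808
    p = 1 - 2 * s                    # ~0.4384
    r = s
    print(-2*r*p + 4*r*s - 2*p*p + 4*s*s)   # payoff for Rock     ~0
    print( 2*r*r + 2*r*p - 2*p*s - 2*s*s)   # payoff for Paper    ~0
    print(-4*r*r - 4*r*s + 2*p*p + 2*p*s)   # payoff for Scissors ~0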

Summary

This week we looked at how to evaluate intransitive mechanics using math. It’s probably the most complicated thing we’ve done, as it brings together the cost curves of transitive mechanics, probability, and statistics, which is why I’m doing it at the very end of the course, only after covering those! To solve these, you go through this process:

  • Make a payoff table.
  • Eliminate all dominated choices from both players (by comparing all combinations of rows and columns and seeing if any pair contains one row or column that is strictly better or equal to another). Keep doing that until all remaining choices are viable.
  • Find all intransitive “loops” through finding the best opposing response to each player’s initial choice.
  • Calculate the payoffs of each choice for one of the players, setting the payoffs equal to the same variable X. In a zero-sum game, X for one player will be the negative of the X for the other player. In a symmetric game, X is zero, so just set all the payoffs to zero instead.
  • Add one more equation, that the probabilities of all choices sum to 1.
  • Using algebraic substitution, triangular-form matrices, Excel, or any other means you have at your disposal, solve for as many variables as you can. If you manage to learn the value of X, it tells you the expected gain (or loss) for that player. Summing all players’ X values tells you if the game is zero-sum (X1+X2+…=0), positive-sum (>0) or negative-sum (<0), and by how much overall.
  • If you can find a unique value for each choice that is between 0 and 1, those are the optimal probabilities with which you should choose each throw. For asymmetric games, you’ll need to do this individually for each player. This is your solution.
  • For games with more than two players each making a simultaneous choice, choose one player’s payoffs as your point of reference, and treat all other players as a single combined opponent. The math gets much harder for each player you add over two. After all, with two players all equations are strictly linear; with three players you have to solve quadratic equations, with four players there are cubic equations, with five players you see quartic equations, and so on.

I should also point out that the field of game theory is huge, and it covers a wide variety of other games we haven’t covered here. In particular, it’s also possible to analyze games where players choose sequentially rather than simultaneously, and also games where players are able to negotiate ahead of time, making pleas or threats, coordinating their movements or so on (as might be found in positive-sum games where two players can trade or otherwise cooperate to get ahead of their opponents). These are beyond the scope of this course, but if you’re interested, I’ll give a couple of references at the end.

If You’re Working on a Game Now…

Think about your game and whether it features any intransitive mechanics. If not, ask yourself if there are any opportunities or reasons to take some transitive mechanics and convert them to intransitive (for example, if you’re working on an RPG, maybe instead of just having a sequence of weapons where each is strictly better than the previous, perhaps there’s an opportunity at one point in the game to offer the player a choice of several weapons that are all equally good overall, but each is better than another in different situations).

If you do have any intransitive mechanics in your game, find the one that is the most prominent, and analyze it as we did today. Of the choices you offer the player, are any of them dominant or dominated choices? What is the expected ratio of how frequently the player should choose each of the available options, assuming optimal play? Is it what you expected? Is it what you want?

Homework

For practice, feel free to start by doing the math by hand to confirm all of the problems I’ve solved here today, to get the hang of it. When you’re comfortable, here’s a game derived from a mini-game I once saw in one of the Suikoden series of RPGs (I forget which one). The actual game there used 13 cards, but for simplicity I’m going to use a 5-card deck for this problem. Here are the rules:

  • Players: 2
  • Setup: Each player takes five cards, numbered 1 through 5. A third stack of cards numbered 1 through 5 is shuffled and placed face-down as a draw pile.
  • Progression of play: At the beginning of each round, a card from the draw pile is flipped face up; the round is worth a number of points equal to the face value of that card. Both players then choose one of their own cards, and play simultaneously. Whoever played the higher card gets the points for that round; in case of a tie, no one gets the points. Both players set aside the cards they chose to play in that round; those cards may not be used again.
  • Resolution: The game ends after all five rounds have been played, or after one player reaches 8 points. Whoever has the most points wins.

It is easy to see that there is no single dominant strategy. If the opponent plays completely randomly (20% chance of playing each card), you come out far ahead by simply playing the number in your hand that matches the points each round is worth (so play your 3 if a 3 is flipped, play your 4 on a 4, etc.). You can demonstrate this in Excel by shuffling the opponent’s hand so that they are playing randomly, and comparing that strategy to the “point matching” strategy I’ve described here, and you’ll quickly find that point matching wins the vast majority of the time. (You could also compute the odds exhaustively for this, as there are only 120 ways to rearrange 5 cards, if you wanted.)

Does that mean that “matching points” is the dominant strategy? Certainly not. If I know my opponent is playing this strategy, I can trounce them by playing one higher than matching on all cards, and playing my 1 card on the 5-point round. I’ll lose 5 points, but I’ll capture the other 10 points for the win. Does the “one higher” strategy dominate? No, playing “two higher” will beat “one higher”… and “three higher” will beat “two higher”, “four higher” will beat “three higher”, and “matching points” beats “four higher” – an intransitive relationship. Essentially, the goal of this game is to guess what your opponent will play, and then play one higher than that (or if you think your opponent is playing their 5, play your 1 on that).

Since each strategy is just as good as any other if choosing between those five, you might think that means you can do no worse than choosing one of those strategies at random… except that as we saw, if you play randomly, “matching points” beats you! So it is probably true that the optimal strategy is not 1:1:1:1:1, but rather some other ratio. Figure out what it is.

If you’re not sure where to start, think of it this way: for any given play there are only five strategies: matching, one higher, two higher, three higher, or four higher. Figure out the payoff table for following each strategy across all five cards. You may shift strategies from round to round, like rock-paper-scissors, but with no other information on the first round you only have five choices, and each of those choices may help or hurt you depending on what your opponent does. Therefore, for the first play at least, you would start with this payoff table (after all, for the first round there are only five strategies you can follow, since you only have five cards each):

          matching    match+1    match+2    match+3    match+4
   M           0         -5         +3         +9        +13
   M+1        +5          0         -7         -1         +3
   M+2        -3         +7          0         -9        -10
   M+3        -9         +1         +9          0        -11
   M+4       -13         -3        +10        +11          0

References

Here are a pair of references that I found helpful when putting together today’s presentation.

“Game Architecture and Design” (Rollings & Morris), Chapters 3 and 5. This is where I first heard of the idea of using systems of equations to solve intransitive games. I’ve tried to take things a little farther today than the authors did in this book, but of course that means the book is a bit simpler and probably more accessible than what I’ve done here. And there is, you know, the whole rest of the book dealing with all sorts of other topics. I’m still in the middle of reading it so I can’t give it a definite stamp of personal approval at this time, but neither can I say anything bad about it, so take a look and decide for yourself.

“Game Theory: a Critical Text” (Heap & Varoufakis). I found this to be a useful and fairly accessible introduction to Game Theory. My one warning would be that in the interest of brevity, the authors tend to define acronyms and then use them liberally in the remainder of the text. This makes it difficult to skip ahead, as you’re likely to skip over a few key definitions, and then run into sentences which have more unrecognizable acronyms than actual words!

Level 8: Metrics and Statistics

August 25, 2010

Readings/Playings

See “additional resources” at the end of this blog post for a number of supplemental readings.

This Week

One of the reasons I love game balance is that different aspects of balance touch all these other areas of game development. When we were talking about pseudorandom numbers, that’s an area where you get dangerously close to programming. Last week we saw how the visual design of a level can be used as a game reward or to express progression to the player, which is game design but just this side of art. This week, we walk right up to the line where game design intersects business.

This week I’m covering two topics: statistics, and metrics. For anyone who isn’t familiar with what these mean, ‘metrics’ just means measurements, so it means you’re actually measuring or tracking something about your game; leaderboards and high score lists are probably the best-known metrics because they are exposed to the players, but we also can use a lot of metrics behind the scenes to help design our games better. Once we collect a lot of metrics, once we take these measurements, they don’t do anything on their own until we actually look at them and analyze them to learn something. ‘Statistics’ is just one set of tools we can use to get useful information from our metrics. Even though we collect metrics first and then use statistics to analyze them, I’m actually going to talk about statistics first because it’s useful to know how your tools work before you decide what data to capture.

Statistics

People who have never done statistics before think of it as an exact science. It’s math, math is pure, and therefore you should be able to get all of the right answers all the time. In reality, it’s a lot messier, and you’ll see that game designers (and statisticians) disagree about the core principles of statistics even more than they disagree about the core principles of systems design, if such a thing is possible.

What is statistics, and how is it different from probability?

In probability, you’re given a set of random things, and told exactly how random they are and what the nature of that randomness is, and your goal is to try to predict what the data will look like when you set those random things in motion. Statistics is kind of the opposite: here you’re given the data up front, and you’re trying to figure out the nature of the randomness that caused that data.

Probability and statistics share one important thing in common: neither one is guaranteed. Probability can tell you there’s a 1/6 chance of rolling a given number on 1d6, but it does not tell you what the actual number will be when you roll the die for real. Likewise, statistics can tell you from a bunch of die rolls that there is probably a uniform distribution, and that you’re 95% sure, but there’s a 5% chance that you’re wrong. That chance never goes to zero.

Statistical Tools

This isn’t a graduate-level course in statistical analysis, so all I’ll say is that there are a lot more tools than this that are outside the scope of this course. What I’m going to put down here is the bare minimum I think every game designer should know to be useful when analyzing metrics in their games.

Mean: when someone asks for the “average” of something, they’re probably talking about the mean average (there are two other kinds of average that I know of, and probably a few more that I don’t). To get the mean of a bunch of values, you add them all up and then divide by the number of values. This is sort of like “expected value” in probability, except that you’re computing it based on real-world die-rolls and not a theoretically balanced set of die-rolls. Calculating the mean is incredibly useful; it tells you what the ballpark expected value is of something in your game. You can think of the mean as a Monte Carlo calculation of expected value, except you’re using real-world playtest data rather than a computer simulation.

Median: this is another kind of average. To calculate it, take all your values and sort them from smallest to largest, then pick the one in the center. So, if you have five values, the third one is the median. (If you have an even number of values so that there are two in the middle rather than one, you’re supposed to take the mean of those, in case you’re curious.) On its own, the median isn’t all that useful, but it tells you a lot when you compare it with the mean, about whether your values are all weighted to one side, or if they’re basically symmetric. For example, in the US, the median household income is a lot lower than the mean, which basically means we’ve got a lot of people making a little, and a few people making these ridiculously huge incomes that push up the mean. In a classroom, if the median is lower than the mean, it means most of the students are struggling and one or two brainiacs are wrecking the curve (although more often it’s the other way around, where most students are clustered around 75 or 80 and then you’ve got some lazy kid who’s getting a zero which pulls down the mean a lot). If you’re making a game with a scoreboard of some kind and you see a median that’s a lot lower than the mean, it probably means you’ve got a small minority of players that are just obscenely good at the game and getting these massive scores, while everyone else who is just a mere mortal is closer to the median.

Standard deviation: this is just geeky enough to make you sound like you’re good at math if you use it in normal conversation. You calculate it by taking each of your data points, subtracting it from the mean, squaring the result (that is, multiply the result by itself), add all of those squares together, divide by the total number of data points, then take the square root of the whole thing. For reasons that you don’t really need to know, going through this process gives you a number that represents how spread out the data is. Basically, about two-thirds of your data is within a single standard deviation from the mean, and nearly all of your data is within two standard deviations, so how big your SD is ends up being relative to how big your mean is. A mean of 50, SD of 25 looks a lot more spread out than a mean of 5000, SD of 25. A relatively large SD means your data is all over the place, while a really small SD means your data is all clustered together.
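
All three of these are a few lines in just about any environment; here is a minimal sketch in Python (the standard library’s statistics module does the same job), with the sample numbers below being made up purely for illustration.

    import math

    def mean(values):
        return sum(values) / len(values)

    def median(values):
        values = sorted(values)
        mid = len(values) // 2
        return values[mid] if len(values) % 2 else (values[mid - 1] + values[mid]) / 2

    def std_dev(values):
        m = mean(values)
        return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

    times = [4, 5, 3, 6, 7, 5, 6, 4]   # hypothetical level-completion times, in minutes
    print(mean(times), median(times), std_dev(times))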

Examples

To give you an example, let’s consider two random variables: 2d6, and 1d11+1. Like we talked about in the week on probability, both of these will give you a number from 2 to 12. But they have a very different nature; the 2d6 clusters around the center, while the 1d11+1 is spread out among all outcomes evenly. Now, no real set of rolls is guaranteed to come out this cleanly, but let’s just assume that I happen to roll the 2d6 thirty-six times and get one of each result, and I roll the 1d11+1 eleven times and get one of each result… which is wildly unlikely, but it does allow us to use statistical tools to analyze probability.

The mean of both of these is 7, which means if you’re trying to balance either of these numbers in your game, you can use 7 as the expected value. What about the range? The median is also 7 for both, which means you’re just as likely to be above or below the mean, which makes sense because both of these are symmetric. However, you’ll see the standard deviations are a lot different: for 2d6, the SD is about two-and-a-half, meaning that most of the time you’ll get a result in between 5 and 9; for 1d11+1, the SD is a little over three, so you’ll get about as many rolls in the 4 to 10 range here as you did in the 5 to 9 range for 2d6. Which doesn’t actually sound like that big a deal, until you start rolling.
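
If you want to check those numbers yourself, here’s a quick sketch using Python’s built-in statistics module on the two idealized data sets described above (one of each possible outcome). The pstdev function is the divide-by-n standard deviation, matching the description earlier:

    import statistics

    # One of each possible 2d6 result: 36 "rolls" in their exact theoretical proportions.
    rolls_2d6 = [a + b for a in range(1, 7) for b in range(1, 7)]
    # One of each possible 1d11+1 result: the numbers 2 through 12, once each.
    rolls_1d11_plus_1 = list(range(2, 13))

    for name, rolls in [("2d6", rolls_2d6), ("1d11+1", rolls_1d11_plus_1)]:
        print(name,
              statistics.mean(rolls),               # 7 for both
              statistics.median(rolls),             # also 7 for both
              round(statistics.pstdev(rolls), 2))   # about 2.42 vs. about 3.16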

As a different example, maybe you’re looking at the time it takes playtesters to get through your first tutorial level in a video game you’re designing. Your target is that it should take about 5 minutes. You measure the mean at 5 minutes, median at 6 minutes, standard deviation at 2 minutes. What does that tell us? Most people take between 3 and 7 minutes, which might be good or bad depending on just how much of the level is under player control, but in a lot of games the tutorial is meant to be a pretty standardized, linear experience, so this would actually feel like a pretty huge range. The other cause for concern is the high median, which suggests that most people actually take longer than 5 minutes; you just have a few people who get through the level really fast, and they bring down the mean. This is good news in that you know no one is taking four hours to complete it or whatever (otherwise the mean would be a lot higher than the median instead!), but it’s potentially bad news in that some players might have found an unintentional shortcut or exploit, or else they’re just skipping through all your intro dialogue, which is going to get them stuck and frustrated in level 2.

This suggests another lesson: statistics can tell us that something is happening, but it can’t tell us why, and sometimes there are multiple explanations for the why. This is one area where statistics is often misused or flat out abused, by finding one logical explanation for the numbers and ignoring that there could be other explanations as well. In this case, we have no way of knowing why the mean is lower than the median, or what that implies for the game design… but we could spend some time thinking about all the possible answers, and then we could collect more data that would help us differentiate between them. For example, if one fear is that players are skipping through the intro dialogue, we could actually measure the time spent reading dialogues in addition to the total level time. We’ll come back to this concept of metrics design later today.

There’s also a third lesson here: I didn’t tell you how many playtesters it took to get this data! The more tests you have, the more accurate your final analysis will be. If you only have three tests, these numbers are pretty meaningless if you’re trying to predict general trends. If there are a few thousand tests, that’s a lot better. (How many tests are required to make sure your analysis is good enough? Depends what “good enough” means to you. The more you have, the more sure you can be, but it’s never actually 100% no matter how many tests you do. People who do this for a living use “confidence intervals,” where they’ll give you a range of values and then say something like they’re 95% sure that the actual mean in reality falls within such-and-such a range. This is a lot more detail than most of us need for our day-to-day design work.)

Outliers

When you have a set of data with some small number of points that are way above or below the mean, the name for those is outliers (pronounced like the words “out” and “liars”). Since these tend to throw off your mean a lot more than the median, if you see the mean and median differing by a lot it’s probably because of an outlier.

When you’re doing a statistical analysis, you might wonder what to do with the outliers. Do you include them? Do you ignore them? Do you put them in their own special group? As with most things, it depends.

If you’re just looking for normal, usual play patterns, it is generally better to discard the outliers because by definition, those are not arising from normal play. If you’re looking for edge cases then you want to leave them in and pay close attention; for example, if you’re trying to analyze the scores people get so you know how to display them on the leaderboards, realize that your top-score list is going to be dominated by outliers at the top.

In either case, if you have any outliers, it is usually worth investigating further to figure out what happened. Going back to our earlier example of level play times, if most players take 5 to 7 minutes to complete your tutorial but you notice a small minority of players that get through in 1 or 2 minutes, that suggests those players may have found some kind of shortcut or exploit, and you want to figure out what happened. If most players take 5 to 7 minutes and you have one player that took 30 minutes, that is probably because the player put it on pause or had to walk away for awhile, or they were just having so much fun playing around in the sandbox that they didn’t care about advancing to the next level or whatever, and you can probably ignore that if it’s just one person. But if it’s three or four people (still in the vast minority) who did that, you might investigate further, because there might be some small number of people who are running into problems… or, players who find one aspect of your tutorial really fun, which is good to know as you’re designing the other levels.

Population samples

Here’s another way statistics can go horribly wrong: it all comes down to what and who you’re sampling.

I already mentioned one frequent problem, which is not having a large enough sample. The more data points you have, the better. I’ll give you an example: back when I played Magic: the Gathering regularly, this one time I put together a tournament deck for a friend, for a tournament that I couldn’t play in but they could. To tell if I had the right ratio of land to spells, I shuffled and dealt an opening hand and played a few mock turns to see if I was drawing enough land. I’d do this a bunch of times going through most of the deck, then I’d take some land out or put some in depending on how many times I had too much or too little, and then I’d reshuffle and do it again. At the time I figured this was a pretty good quick-and-dirty way to figure out how much land I needed. But what I failed to notice was that the land happened to be very evenly distributed rather than clustered, so most of the time it seemed like I was doing okay by the end… but I never actually stopped to count. After the tournament, which my friend lost badly, they reported to me that they were consistently not drawing enough land, and when we actually went through the deck and counted, there were only 16 lands in a deck of 60 cards! I took a lot of flak from my friend for that, and rightly so. The real problem here was that I was trying to analyze the number of lands through statistical methods, but my sample size was way too small to draw any meaningful conclusions.

Here’s another example: suppose you’re making a game aimed at the casual market. You have everyone on the development team play through the game to get some baseline data on how long it takes to play through each level and how challenging each level is. Problem: the people playing the game are probably not casual gamers, so this is not really a representative sample of your target market. I’m sure this has happened somewhere before.

A more recent example: in True Crime: Hong Kong, publisher Activision allegedly demanded that the developers change the main character from female to male, because their focus group said they preferred a male protagonist. The problem: the focus group was made up entirely of males, or else the questions were inherently biased by the person setting them up, as a deliberate attempt to further an agenda rather than to actually find out the real-world truth. Activision denies all of this, of course, but that hasn’t stopped it from being the subject of many industry conversations… not just about the role of women in games, but about the use of focus groups and statistics in game design. You also see things like this happening in the rest of the world, particularly in governmental politics, where a lot of people have their own personal agenda and they’re willing to warp a study and use statistics as a way of proving their point.

Basically, when you’re collecting playtest data, you want to do your best to recruit playtesters who are as similar as possible to your target market, and you want to have as many playtests as possible so that the random noise gets filtered out. Your analysis is only as good as your data!

Even if you use statistics “honestly,” there are still problems every game designer runs into, depending on the type of game.

  • For video games, you are at the mercy of your programmers, and there’s nothing you can do about that. The programmers are the ones who need to spend time coding the metrics you ask for. Programming time is always limited, so at some point you’ll have to make the call between having your programming team implement metrics collection… or having them implement, you know, the actual game mechanics you’ve designed. And that’s if the decision isn’t made for you by your producer or your publisher. This is easier in some companies than others, but in some places “metrics” falls into the same category as audio, and localization, and playtesting: tasks that are pushed off towards the end of the development cycle until it’s too late to do anything useful.
  • For tabletop games, you are at the mercy of your playtesters. The more data points you collect, the better, of course. But in reality, a video game company can release an early beta and get hundreds or thousands of plays, while you might realistically be able to do a fraction of that with in-person tabletop tests. With a smaller sample, your playtest data is a lot more suspect.
  • For any kind of game, you need to be very clear ahead of time what it is you need measured, and in what level of detail. If you run a few hundred playtests and only find out afterwards that you need to actually collect certain data from the game state that you weren’t collecting before, you’ll have to do those tests over again. The only thing to do about this is to recognize that just like design itself, playtesting with metrics is an iterative process, and you need to build that into your schedule.
  • Also for any kind of game, you need to remember that it’s very easy to mess things up accidentally and get the wrong answer, just like probability. Unlike probability, there aren’t as many sanity checks to make the wrong numbers look wrong, since by definition you don’t always know exactly what you’re looking for or what you expect the answer to be. So you need to proceed with caution, and use every method you can find of independently verifying your numbers. It also helps if you try to envision in advance what the likely outcomes of your analysis might be, and what they’ll look like.

Correlation and causality

Finally, one of the most common errors with statistics is when you notice some kind of correlation between two things. “Correlation” just means that when one thing goes up, another thing always seems to go up (which is a positive correlation) or down (a negative correlation) at the same time. Recognizing correlations is useful, but a lot of times people assume that just because two things are correlated, that one causes the other, and that is something that you cannot tell from statistics alone.

Let’s take an example. Say you notice when playing Puerto Rico that there’s a strong positive correlation between winning, and buying the Factory building; say, out of 100 games, in 95 of them the winner bought a Factory. The natural assumption is that the Factory must be overpowered, and that it’s causing you to win. But you can’t draw this conclusion by default, without additional information. Here are some other equally valid conclusions, based only on this data:

  • Maybe it’s the other way around, that winning causes the player to buy a Factory. That sounds odd, but maybe the idea is that a Factory helps the player who is already winning, so it’s not that the Factory is causing the win, it’s that being strongly in the lead causes the player to buy a Factory for some reason.
  • Or, it could be that something else is causing a player both to win and to buy a Factory. Maybe some early-game purchase sets the player up for buying the Factory, and that early-game purchase also helps the player to win, so the Factory is just a symptom and not the root cause.
  • Or, the two could actually be uncorrelated, and your sample size just isn’t large enough for the Law of Large Numbers to really kick in. We actually see this all the time in popular culture, where two things that obviously have no relation are found to be correlated anyway, like the Redskins football game predicting the next Presidential election in the US, or an octopus that predicts the World Cup winner, or a groundhog seeing its shadow supposedly predicting the remaining length of Winter. As we learned when looking at probability, if you take a lot of random things you’ll be able to see patterns; one thing is that you can expect to see unlikely-looking streaks, but another is that if you take a bunch of sets of data, some of them will probably be randomly correlated. If you don’t believe me, try rolling two separate dice a few times and then computing the correlation between those numbers; I bet it’s not zero!
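
If you don’t feel like digging out physical dice, here’s a little sketch that runs the same experiment in Python (3.10 or newer, for statistics.correlation): it “rolls” two completely unrelated dice ten times each and prints the sample correlation, which is almost never exactly zero.

    import random
    import statistics

    # "Roll" two completely unrelated dice ten times each, five separate trials.
    for _ in range(5):
        die_a = [random.randint(1, 6) for _ in range(10)]
        die_b = [random.randint(1, 6) for _ in range(10)]
        print(round(statistics.correlation(die_a, die_b), 2))
    # The dice have nothing to do with each other, yet with only ten rolls the
    # sample correlation is rarely exactly zero, and is sometimes surprisingly large.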

Statistics in Excel

Here’s the good news: while there are a lot of math formulas here, you don’t actually need to know any of them. Excel will do this for you; it already has all of these formulas built in. Here are a few useful ones:

  • AVERAGE: given a range of cells, this calculates the mean. You could also take the SUM of the cells and then divide by the number of cells, but AVERAGE is easier.
  • MEDIAN: given a range of cells, this calculates the median, as you might guess.
  • STDEV: given a range of cells, this gives you the standard deviation. (Technically it computes the “sample” standard deviation, which divides by one less than the number of data points rather than the full count; with any reasonable number of playtests the difference is negligible. STDEVP gives the divide-by-n version described earlier.)
  • CORREL: you give this two ranges of cells, not one, and it gives you the correlation between the two sets of data. For example, you could have one column with a list of final game scores, and another column with a list of scores at the end of the first turn, to see if early-game performance is any kind of indicator of the final game result (if so, this might suggest a positive feedback loop in the game somewhere). The number Excel gives you from the CORREL function ranges from -1 (perfect negative correlation) through 0 (uncorrelated) to +1 (perfect positive correlation). (There’s a quick sketch of this same analysis right after this list.)
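
As a quick sketch of that last one, suppose you’ve logged each playtest’s score at the end of turn one next to its final score. In Excel this is just =CORREL over the two columns; the Python below (3.10 or newer) does the same calculation on some made-up numbers:

    import statistics

    # Hypothetical logs: each game's score after turn one, and its final score.
    first_turn_scores = [3, 5, 2, 6, 4, 5, 1, 6]
    final_scores = [42, 55, 30, 61, 47, 52, 25, 66]

    # In Excel this would be =CORREL(A2:A9, B2:B9).
    r = statistics.correlation(first_turn_scores, final_scores)
    print(round(r, 2))  # close to +1 here, which would hint at a positive feedback loop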

Is there any good news?

At this point I’ve spent so much time talking about how statistics are misused, that you might be wondering if they’re actually useful for anything. And the answer is, yes. If you have a question that can’t be answered with intuition alone, and it can’t be answered just through the math of your cost or progression curves, statistics let you draw useful conclusions… if you ask the right questions, and if you collect the right data.

Here’s an example of a time when statistics really helped a game I was working on. I worked for a company that made this online game, and we found that our online population was falling and people weren’t playing as many games, because we hadn’t released an update in a while. (That part was expected. In my experience, an online game with no updates loses about half of its core population every 6 months or so.)

But what we didn’t expect was that one of our programmers got bored one day and made a trivia bot: just a little script that would log into our server with its own player account, send a trivia question every couple of minutes, and then parse the incoming public chat to see if anyone said the right answer. And it was popular, as goofy and stupid and simple as it was, because it was such a short, immediate, casual experience.

Now, the big question is: what happened to the player population, and what happened to the actual, real game that players were supposed to be playing (you know, the one where they would log in to the chat room to find someone to challenge, before they got distracted by the trivia bot)?

Some players loved the trivia bot. It gave them something to do in between games. Others hated the trivia bot; they claimed that it was harder to find a game, because everyone who was logged in was too busy answering dumb trivia questions to actually play a real game. Who was right? Intuition failed, because everyone’s intuition was different. Listening to the players failed, because the vocal minority of the player base was polarized, and there was no way to poll those who weren’t in the vocal minority. Math failed, because the trivia bot wasn’t part of the game, let alone part of the cost curve. Could we answer this with statistics? We sure could, and we did!

This was simple enough that it didn’t even require much analysis. Measure the total number of logins per day. Measure the total number of actual games played. Since our server already tracked every player login, logout and game start, we had this data; all we had to do was some very simple analysis, tracking how these things changed over time. As expected, the numbers had all been falling gradually since the last real release, but the trivia bot actually caused a noticeable increase in both total logins and number of games played. It turned out that players were logging in to play with the trivia bot, but as long as they were there, they were also playing games with each other! That was a conclusion that would have been impossible to reach in any kind of definitive way without analysis of the hard data. And it taught us something really important about online games: more players online, interacting with each other, is better… even if they’re interacting in nonstandard ways.

Metrics

Here’s a common pattern in artistic and creative fields, particularly ones like archaeology or art preservation or psychology or medicine that require a certain amount of intuition but where there is still a “right answer” or “best way” to do things. The progression goes something like this:

  1. Practitioners see their field as a “soft science”; they don’t know a whole lot about best principles or practices. They do learn how things work, eventually, but it’s mostly through trial and error.
  2. Someone creates a technology that seems to solve a lot of these problems algorithmically. Practitioners rejoice. Finally, we’re a hard science! No more guesswork! Most younger practitioners abandon the “old ways” and embrace “science” as a way to solve all their field’s problems. The old guard, meanwhile, sees it as a threat to how they’ve always done things, and eyes it skeptically.
  3. The limitations of the technology become apparent after much use. Practitioners realize that there is still a mysterious, touchy-feely element to what they do, and that while some day the tech might answer everything, that day is a lot farther off than it first appeared. Widespread disillusionment occurs as people no longer want to trust their instincts because theoretically technology can do it better, but people don’t want to trust the current technology because it doesn’t work that great yet. The young turks acknowledge that this wasn’t the panacea they thought; the old guard acknowledge that it’s still a lot more useful than they assumed at first. Everyone kisses and makes up.
  4. Eventually, people settle into a pattern where they learn what parts can be done by computer algorithms, and what parts need an actual creative human thinking, and the field becomes stronger as the best parts of each get combined. But learning which parts go best with humans and which parts are best left to computers is a learning process that takes awhile.

Currently, game design seems to be just starting Step 2. We’re hearing more and more people anecdotally saying why metrics and statistical analysis saved their company. We hear about MMOs that are able to solve their game balance problems by looking at player patterns, before the players themselves learn enough to exploit them. We hear of Zynga changing the font color from red to pink which generates exponentially more click-throughs from players to try out other games. We have entire companies that have sprung up solely to help game developers capture and analyze their metrics. The industry is falling in love with metrics, and I’ll go on record predicting that at least one company that relies entirely on metrics-driven design will fail, badly, by the time this whole thing shakes out, because they will be looking so hard at the numbers that they’ll forget that there are actually human players out there who are trying to have fun in a way that can’t really be measured directly. Or maybe not. I’ve been wrong before.

At any rate, right now there seem to be three schools of thought on the use of metrics:

  • The Zynga model: design almost exclusively by metrics. Love it or hate it, 60 Million monthly active unique players laugh at your feeble intuition-based design.
  • Rebellion against the Zynga model: metrics are easy to misunderstand, easy to manipulate, and are therefore dangerous and do more harm than good. If you measure player activity and find out that more players use the login screen than any other in-game action, that doesn’t mean you should add more login screens to your game out of some preconceived notion that if a player does it, it’s fun. If you design using metrics, you push yourself into designing the kinds of games that can be designed solely by metrics, which pushes you away from a lot of really interesting video game genres.
  • The moderate road: metrics have their uses, they help you tune your game to find local “peaks” of joy. They help you take a good game and make it just a little bit better, by helping you explore the nearby design space. However, intuition also has its uses; sometimes you need to take broad leaps in unexplored territory to find the global “peaks,” and metrics alone will not get you there, because sometimes you have to make a game a little worse in one way before it gets a lot better in another, and metrics won’t ever let you do that.

Think about it for a bit and decide where you stand, personally, as a designer. What about the people you work with on a team (if you work with others on a team)?

How much to measure?

Suppose you want to take some metrics in your game so you can go back and do statistical analysis to improve your game balance. What metrics do you actually take – that is, what exactly do you measure?

There are two schools of thought that I’ve seen. One is to record anything and everything you can think of, log it all, mine it later. The idea is that you’d rather collect too much information and not use it, than to not collect a piece of critical info and then have to re-do all your tests.

Another school of thought is that “record everything” is fine in theory, but in practice you either end up with an overwhelming haystack of extraneous information in which you’re supposed to find the needle of something useful, or potentially worse, you mine the heck out of this data mountain to the point where you’re finding all kinds of correlations and relationships that don’t actually exist. By this way of thinking, you should instead figure out ahead of time what you’re going to need for your next playtest, measure that and only that, and that way you don’t get confused later on by looking at the wrong stuff in the wrong way.

Again, think about where you stand on the issue.

Personally, I think a lot depends on what resources you have. If it’s you and a few friends making a small commercial game in Flash, you probably don’t have time to do much in the way of intensive data mining, so you’re better off just figuring out the useful information you need ahead of time, and adding more metrics later if a new question occurs to you that requires some data you aren’t tracking yet. If you’re at a large company with an army of actuarial statisticians with nothing better to do than find data correlations all day, then sure, go nuts with data collection and you’ll probably find all kinds of interesting things you’d never have thought of otherwise.

What specific things do you measure?

That’s all fine and good, but whether you say “just get what we need” or “collect everything we can,” neither of those is an actual design. At some point you need to specify what, exactly, you need to measure.

Like game design itself, metrics is a second-order problem. Most of the things that you want to know about your game, you can’t actually measure directly, so instead you have to figure out some kind of thing that you can measure that correlates strongly with what you’re actually trying to learn.

Example: measuring fun

Let’s take an example. In a single-player Flash game, you might want to know if the game is fun or not, but there’s no way to measure fun. What correlates with fun, that you can measure? One thing might be if players continue to play for a long time, or if they spend enough time playing to finish the game and unlock all the achievements, or if they come back to play multiple sessions (especially if they replay even after they’ve “won”), and these are all things you can measure. Now, keep in mind this isn’t a perfect correlation; players might be coming back to your game for some other reason, like if you’ve put in a crop-withering mechanic that punishes them if they don’t return, or something. But at least we can assume that if a player keeps playing, there’s probably at least some reason, and that is useful information. More to the point, if lots of players stop playing your game at a certain point and don’t come back, that tells us that point in the game is probably not enjoyable and may be driving players away. (Or if the point where they stopped playing was the end, maybe they found it incredibly enjoyable but they beat the game and now they’re done, and you didn’t give a reason to continue playing after that. So it all depends on when.)
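
To make that last point concrete, here’s a tiny sketch of the kind of analysis you might run, assuming you’ve logged the furthest point each player reached before they stopped playing. The level numbers and counts here are invented:

    from collections import Counter

    # Hypothetical log: the furthest level each player reached before quitting.
    last_level_reached = [3, 7, 3, 3, 12, 3, 5, 3, 12, 7, 3, 12]

    drop_offs = Counter(last_level_reached)
    for level, players in sorted(drop_offs.items()):
        print("level", level, ":", players, "players stopped here")
    # A spike at one level (level 3 here) suggests something at that point is
    # driving players away; unless it's the final level, in which case they
    # may simply have finished the game and had no reason to keep going.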

Player usage patterns are a big deal, because whether people play, how often they play, and how long they play are (hopefully) correlated with how much they like the game. For games that require players to come back on a regular basis (like your typical Facebook game), the two buzzwords you hear a lot are Monthly Active Uniques and Daily Active Uniques (MAU and DAU). The “Active” part of that is important, because it makes sure you don’t overinflate your numbers by counting a bunch of old, dormant accounts belonging to people who stopped playing. The “Unique” part is also important, since one obsessive guy who checks FarmVille ten times a day still only counts as one user, not ten. Now, normally you’d think Monthly and Daily should be equivalent (just multiply Daily by 30 or so to get Monthly), but in reality the two will be different based on how quickly your players burn out (that is, how much overlap there is between different sets of daily users). So if you divide MAU/DAU, that tells you something about how many of your players are new and how many are repeat customers.

For example, suppose you have a really sticky game with a small player base, so you only have 100 players, but those players all log in at least once per day. Here your MAU is going to be 100, and your average DAU is also going to be 100, so your MAU/DAU is 1. Now, suppose instead that you have a game that people play once and never again, but your marketing is good, so you get 100 new players every day. Here your average DAU is still going to be 100, but your MAU is around 3000, so your MAU/DAU is about 30 in this case. So that’s the range: MAU/DAU goes from 1 (for a game where every player is extremely loyal) up to 28, 30 or 31 depending on the month (representing a game where no one ever plays more than once).
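
Here’s a minimal sketch of how you might compute these numbers from a raw login log, assuming each record is just a (date, player id) pair; the log itself is made up:

    # Hypothetical login log for one month: (day, player_id) pairs.
    logins = [
        ("2010-09-01", "alice"), ("2010-09-01", "bob"),
        ("2010-09-02", "alice"), ("2010-09-02", "carol"),
        ("2010-09-03", "alice"),
    ]

    unique_players_this_month = {player for day, player in logins}
    players_by_day = {}
    for day, player in logins:
        players_by_day.setdefault(day, set()).add(player)

    mau = len(unique_players_this_month)
    # Average daily uniques, over the days that had any logins at all.
    average_dau = sum(len(p) for p in players_by_day.values()) / len(players_by_day)

    print(mau, average_dau, round(mau / average_dau, 2))
    # A ratio near 1 means a small, loyal player base that shows up every day;
    # a ratio near 30 means a steady stream of one-time visitors who never return.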

A word of warning: a lot of metrics, like the ones Facebook provides, might use different ways of computing these numbers so that one set of numbers isn’t comparable to another. For example, I saw one website that listed the “worst” MAU/DAU ratio in the top 100 applications as 33-point-something, which should be flatly impossible, so clearly the numbers somewhere are being messed with (maybe they took the Dailies from a different range of dates than the Monthlies or something). And then some people compute this as a %, meaning on average, what percentage of your player pool logs in on a given day, which should range from a minimum of about 3.33% (1/30 of your monthly players logging in each day) to 100% (all of your monthly players log in every single day). This is computed by taking DAU/MAU (instead of MAU/DAU) and multiplying by 100 to get a percentage. So if you see any numbers like this from analytics websites, make sure you’re clear on how they’re computing the numbers so you’re not comparing apples to oranges.

Why is it important to know this number? For one thing, if a lot of your players keep coming back, it probably means you’ve got a good game. For another, it means you’re more likely to make money on the game, because you’ve got the same people stopping by every day… sort of like how if you operate a brick-and-mortar storefront, an individual who just drops in to window-shop may not buy anything, but if that same individual comes in and is “just looking” every single day, they’re probably going to buy something from you eventually.

Another metric that’s used a lot, particularly on Flash game portals, is to go ahead and ask the players themselves to rate the game (often in the form of a 5-star rating system). In theory, we would hope that higher ratings mean a better game. In theory, we’d also expect that a game with high player ratings would also have a good MAU/DAU ratio, that is, that the two would be correlated. I don’t know of any actual studies that have checked this, though I’d be interested to see the results, but if I had to guess I’d assume that there is some correlation but not a lot. Users that give ratings are not a representative sample; for one thing, they tend to have strong opinions or else they wouldn’t bother rating (seriously, I always had to wonder about those opinion polls that would say something like 2% of poll respondents said they had no opinion… like, who calls up a paid opinion poll phone line just to say they have no opinion?), so while actual quality probably falls along a bell curve you tend to have more 5-star and 1-star ratings than 3-star, which is not what you’d expect if everyone rated the game fairly. Also, there’s the question of whether player opinion is more or less meaningful than actual play patterns; if a player logs into a game every day for months on end but rates it 1 out of 5 stars, what does that mean? Or if a player admits they haven’t even played the game, but they’re still giving it 4 out of 5 stars based on… I don’t know… its reputation or something? Also, players tend to not rate a game while they’re actively playing, only (usually) after they’re done, which probably skews the ratings a bit (depending on why they stopped playing). So it’s probably better to pay attention to usage patterns than player reporting, especially if that reporting isn’t done during the game from within the game in a way that you can track.

Now, I’ve been talking about video games, in fact most of this is specific to online games. The equivalent in tabletop games is a little fuzzier, but as the designer you basically want to be watching people’s facial expressions and posture to see where in the game they’re engaged and where they’re bored or frustrated. You can track how these correlate to certain game events or board positions. Again, you can try to rely on interviews with players, but that’s dangerous because player memory of these things is not good (and even if it is, not every playtester will be completely honest with you). For video games that are not online, you can still capture metrics based on player usage patterns, but actually uploading them anywhere is something you want to be very clear to your players about, because of privacy concerns.

Another example: measuring difficulty

Player difficulty, like fun, is another thing that’s basically impossible to measure directly, but what you can measure is progression, and failure to progress. Measures of progression are going to be different depending on your game.

For a game that presents skill-based challenges like a retro arcade game, you can measure things like how long it takes the player to clear each level, how many times they lose a life on each level, and importantly, where and how they lose a life. Collecting this information makes it really easy to see where your hardest points are, and if there are any unintentional spikes in your difficulty curve. I understand that Valve does this for their FPS games, and that they actually have a visualizer tool that will not only display all of this information, but actually plot it overlaid on a map of the level, so you can see where player deaths are clustered. Interestingly, starting with Half-Life 2 Episode 2 they actually have live reporting and uploading from players to their servers, and they have displayed their metrics on a public page (which probably helps with the aforementioned privacy concerns, because players can see for themselves exactly what is being uploaded and how it’s being used).

Yet another example: measuring game balance

What if instead you want to know if your game is fair and balanced? That’s not something you can measure directly either. However, you can track just about any number attached to any player, action or object in the game, and this can tell you a lot about both normal play patterns, and also the relative balance of strategies, objects, and anything else.

For example, suppose you have a strategy game where each player can take one of four different actions each turn, and you have a way of numerically tracking each player’s standing. You could record each turn, what action each player takes, and how it affects their respective standing in the game.

Or, suppose you have a CCG where players build their own decks, or a fighting game where each player chooses a fighter, or an RTS where players choose a faction, or an MMO or tabletop RPG where players choose a race/class combination. Two things you can track here are which choices seem to be the most and least popular, and also which choices seem to have the highest correlation with actually winning. Note that this is not always the same thing; sometimes the big, flashy, cool-looking thing that everyone likes because it’s impressive and easy to use is still easily defeated by a sufficiently skilled player who uses a less well-known strategy. Sometimes, dominant strategies take months or even years to emerge through tens of thousands of games played; the Necropotence card in Magic: the Gathering had such a complicated and obscure set of effects that it saw almost no play for six months or so after release… but once top players started experimenting with it, they found it to be one of the most powerful cards ever made. So, both popularity and correlation with winning are useful metrics here.
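
Here’s a sketch of what tracking both of those looks like in practice, assuming each record lists the character (or deck, faction, class, whatever) a player chose and whether they won. The names and results are invented:

    from collections import defaultdict

    # Hypothetical records: (choice, won?) for each player in each logged game.
    results = [
        ("Elf Ranger", True), ("Elf Ranger", False), ("Elf Ranger", True),
        ("Human Knight", False), ("Human Knight", False),
        ("Goblin Tinker", True),
    ]

    picks = defaultdict(int)
    wins = defaultdict(int)
    for choice, won in results:
        picks[choice] += 1
        wins[choice] += won

    total = len(results)
    for choice in picks:
        pick_rate = picks[choice] / total
        win_rate = wins[choice] / picks[choice]
        print(f"{choice}: picked {pick_rate:.0%} of the time, wins {win_rate:.0%}")
    # The interesting cases are the popular-but-losing choices (fun but weak?)
    # and the unpopular-but-winning ones (strong but unappealing, or undiscovered?).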

If a particular game object sees a lot more use than you expected, that can certainly signal a potential game balance issue. It may also mean that this one thing is just a lot more compelling to your target audience for whatever reason – for example, in a high fantasy game, you might be surprised to find more players creating Elves than Humans, regardless of balance issues… or maybe you wouldn’t be that surprised. Popularity can be a sign in some games that a certain play style is really fun compared to the others, and you can sometimes migrate that into other characters or classes or cards or what have you in order to make the game overall more fun.

If a game object sees less use than expected, again that can mean it’s underpowered or overcosted. It might also mean that it’s just not very fun to use, even if it’s effective. Or it might mean it is too complicated to use, with a high learning curve relative to the rest of the game, and so players aren’t experimenting with it right away (which can be really dangerous if you’re relying on playtesters to actually, you know, playtest: if they leave some parts of your game alone, you get no data on those parts).

Metrics have other applications besides game objects. For example, one really useful area is in measuring beginning asymmetries, a common one being the first-player advantage (or disadvantage). Collect a bunch of data on seating arrangements versus end results. This happens a lot with professional games and sports; for example, I think statisticians have calculated the home-field advantage in American Football to be about 2.3 points, and depending on where you play the first-move advantage in Go is 6.5 or 7.5 points (in this latter case, the half point is used to prevent tie games). Statistics from Settlers of Catan tournaments have shown a very slight advantage to playing second in a four-player game, on the order of a few hundredths of a percent; normally we could discard that as random variation, but the sheer number of games that have been played gives the numbers some weight.

One last example: measuring money

If you’re actually trying to make money by selling your game, in whole or part, then at the end of the day this is one of your most important considerations. For some people it’s the most important consideration: they’d rather have a game that makes lots of money but isn’t fun or interesting at all, than a game that’s brilliant and innovative and fun and wonderful but is a “sleeper hit” which is just a nice way of saying it bombed in the market but didn’t deserve to. Other game designers would rather make the game fun first, so one thing for each of you to consider is, personally, which side of the fence you’re on… because if you don’t know that about yourself, someone else is going to make the call for you some day.

At any rate, money is something that just about every commercial game should care about in some capacity, so it’s something that’s worth tracking. Those sales tell you something related to how good a job you did with the game design, along with a ton of other factors like market conditions, marketing success, viral spread, and so on.

With traditional games sold online or through retail, the sales pattern is a pretty standard curve: big release-day sales that fall off over time on an exponentially decreasing curve, until they get to the point where the sales are small enough that it’s not worth it to sell anymore. With online games you don’t have to worry about inventory or shelf space so you can keep selling a bit longer, which is where this whole “long tail” thing came from, because I guess the idea is that this curve looks like it has a tail on the right-hand side. In this case the thing to watch for is sudden spikes: when they happen and what caused them, because they don’t usually happen on their own.

Unfortunately, that means sales metrics for traditional sales models aren’t all that useful to game designers. We see a single curve that combines lots of variables, and we only get the feedback after the game is released. If it’s one game in a series it’s more useful because we can see how the sales changed from game to game and what game mechanics changed, so if the game took a major step in a new direction and that drastically increased or reduced sales, that gives you some information there.

If instead your game is online, such as an MMO, or a game in a Flash portal or on Facebook, the pattern can be a bit different: sales start slow (higher if you do some marketing up front), then if the game is good it ramps up over time as word-of-mouth spreads, so it’s basically the same curve but stretched out a lot longer. The wonderful thing about this kind of release schedule is that you can manage the sales curve in real-time: make a change to your game today, measure the difference in sales for the rest of the week, and keep modifying as you go. Since you have regular incremental releases that each have an effect on sales, you’re getting constant feedback on the effects that minor changes have on the money your game brings in. However, remember that your game doesn’t operate in a vacuum; there are often other outside factors that will affect your sales. For example, I bet if there’s a major natural disaster that’s making international headlines, that most Facebook games will see a temporary drop in usage because people are busy watching the news instead. So if a game company made a minor game change the day before the Gulf oil spill and they noticed a sudden decrease in usage from that geographical area, the designers might mistakenly think their game change was a really bad one if they weren’t paying attention to the real world.

Ideally, you’d like to control for these outside factors, so you know exactly what you’re measuring. One way of doing this, which works in some special cases, is to actually have two separate versions of your game that you roll out simultaneously to different groups of players, and then compare the two groups (this is the idea behind what’s usually called A/B testing). One important thing about this is that you do need to select the players randomly (and not, say, give one version to the earliest accounts created on your system and the other version to the most recent adopters). Of course, if the actual gameplay itself is different between the two groups, that’s hard to do without some players getting angry about it, especially if one of the two groups ends up with an unbalanced design that can be exploited. So it’s better to do this with things that don’t affect balance: banner ads, informational popup dialog text, splash screens, the color or appearance of the artwork in your game, and other things like that. Or, if you do this with gameplay, do it in a way that is honest and up front with the players; I could imagine assigning players randomly to a faction (like World of Warcraft’s Alliance/Horde split, except randomly chosen when an account is created) and having the warring factions as part of the backstory of the game, so it would make sense that each faction would have some things that are a little bit different. I don’t know of any game that’s actually done this, but it would be interesting to see in action.

For games where players can either play for free or pay – this includes shareware, microtransactions, subscriptions, and most other kinds of payment models for online games – you can look at not just how many users you have, or how much money you’re getting total, but also where that money is coming from on a per-user basis. This is very powerful, but there are also a lot of variables to consider.

First, what counts as a “player”? If some players have multiple accounts (with or without your permission) or if old accounts stay around while dormant, the choice of whether to count these things will change your calculations. Typically companies are interested in looking at revenue from unique, active users, because dormant accounts tend to not be spending money, and a single player with several accounts should really be thought of as one entity (even if they’re spending money on each account).

Second, there’s a difference between players who are playing for free and have absolutely no intention of paying for your game ever, versus players who spend regularly. Consider a game where you make a huge amount of money from a tiny minority of players; this suggests you have a great game that attracts and retains free players really well, and that once players can be convinced to spend any money at all they’ll spend a lot, but it also says that you have trouble with “conversion” – that is, convincing players to take that leap and spend their first dollar with you. In this case, you’d want to think of ways to give players incentive to spend just a little bit. Now consider a different game, where most people that play spend something but that something is a really small amount. That’s a different problem, suggesting that your payment process itself is driving away players, or at least that it’s giving your players less incentive to spend more, like you’re hitting a spending ceiling somewhere. You might be getting the same total cash across your user base in both of these scenarios, but the solutions are different.

Typically, the difference between them is shown with two buzzwords, ARPU (Average Revenue Per User) and ARPPU (Average Revenue Per Paying User). I wish we called them players rather than users, but it wasn’t my call. At any rate, in the first example with a minority of players paying a lot when most people play for free, ARPPU will be really high; in the second case, ARPPU will be really low, even if ARPU is the same for both games.
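
In code the distinction is only a couple of lines; here’s a sketch assuming you have each active player’s total spending for the month (the numbers are made up):

    # Hypothetical spending per active player this month, in dollars.
    revenue_per_player = [0, 0, 0, 0, 0, 0, 0, 0, 2, 150]

    players = len(revenue_per_player)
    paying = [r for r in revenue_per_player if r > 0]

    arpu = sum(revenue_per_player) / players                 # averaged over everyone
    arppu = sum(paying) / len(paying) if paying else 0       # averaged over payers only
    conversion = len(paying) / players

    print(arpu, arppu, conversion)
    # Here ARPU is $15.20 but ARPPU is $76.00 with only 20% of players paying:
    # the few who pay, pay a lot, which points at a conversion problem rather
    # than a problem with the paying players themselves.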

Of course, total number of players is also a consideration, not just the average. If your ARPU and ARPPU are both great but you’ve got a player base of a few thousand when you should have a few million, then that’s probably more of a marketing problem than a game design problem. It depends on what’s happening to your player base over time, and where you are in the “tail” of your sales curve. So these three things, sales, ARPU and ARPPU, can give you a lot of information about whether your problem is with acquisition (that is, getting people to try your game the first time), conversion (getting them to pay you money the first time), or retention (getting players to keep coming back for more). And when you overlap these with the changes you make in your game and the updates you offer, a lot of times you can get some really useful correlations between certain game mechanics and increased sales.

Another interesting metric to look at is the graph of time-vs-money for the average user. How much do people give you on the day they start their account? What about the day after that, and the day after that? Do you see a large wad of cash up front and then nothing else? A decreasing curve where players try for free for a while, then spend a lot, then spend incrementally smaller amounts until they hit zero? An increasing curve where players spend a little, then a bit more, then a bit more, until a sudden flameout where they drop your game entirely? Regular small payments on a traditional “long tail” model? What does this tell you about the value you’re delivering to players in your early game → mid-game → late game → elder game progression?
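
One way to get that curve is to bucket every payment by how old the player’s account was when the payment was made. Here’s a sketch assuming you’ve logged (days since signup, amount) for each purchase; the data is hypothetical:

    from collections import defaultdict

    # Hypothetical purchases: (days since the account was created, dollars spent).
    purchases = [(0, 5), (0, 10), (1, 5), (3, 20), (7, 5), (30, 5), (30, 50)]

    revenue_by_account_age = defaultdict(float)
    for age_in_days, amount in purchases:
        revenue_by_account_age[age_in_days] += amount

    for age in sorted(revenue_by_account_age):
        print("day", age, "->", revenue_by_account_age[age])
    # The shape of this curve (front-loaded, ramping up, a long tail of small
    # repeat payments...) tells you when in a player's lifetime the game is
    # actually delivering value that people are willing to pay for.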

While you’re looking at revenue, don’t forget to take your costs into account. There are two kinds of costs: up-front development, and ongoing costs. The up-front costs are things like development of new features, including both the “good” ones that increase revenue and also the “bad” ones that you try out and then discard; keep in mind that your ratio of good-to-bad features will not be perfect, so you have to count some portion of the bad ideas as part of the cost in developing the good ones (this is a type of “sunk cost” like we discussed in Week 6 when we talked about situational balance). Ongoing costs are things like bandwidth and server costs and customer support, which tend to scale with the number of players. Since a business usually wants to maximize its profits (that is, the money it takes in minus the money it spends) and not its revenue (which is just the money it takes in), you’ll want to factor these in if you’re trying to optimize your development resources.

A word of warning (gosh, I seem to be giving a lot of warnings this week): statistics are great at analyzing the past, but they’re a lot trickier if you try to use them to predict the future. For example, a really hot game that just launched might have what initially looks like an exponentially-increasing curve. It’s tempting to assume, especially if it’s a really tight fit with an exponential function, that the trend will continue. But common sense tells us this can’t continue indefinitely: the human population is finite, so if your exponential growth is faster than human population growth it has to level off eventually. Business growth curves are usually not exponential, but instead what is called “S-shaped” where it starts as an exponentially increasing curve and eventually transitions to a logarithmically (that is, slowly) increasing curve, and then eventually levels off or starts decreasing. A lot of investors get really burned when they mistake an S curve for an exponential increase, as we saw (more or less) with the dot-com crash about 10 years ago. Illegal pyramid schemes also tend to go through this kind of growth curve, with the exception that once they reach the peak of the “S” there’s usually a very sudden crash.

A Note on Ethics

This is the second time this Summer when talking about game balance that I’ve brought up an issue of professional ethics. It’s weird how this comes up in discussions of applied mathematics, isn’t it? Anyway…

The ethical consideration here is that a lot of these metrics look at player behavior but they don’t actually look at the value added (or removed) from the players’ lives. Some games, particularly those on Facebook which have evolved to make some of the most efficient use of metrics of any games ever made, have also been accused (by some people) of being blatantly manipulative, exploiting known flaws in human psychology to keep their players playing (and giving money) against their will. Now, this sounds silly when taken to the extreme, because we think of games as something inherently voluntary, so the idea of a game “holding us prisoner” seems strange. On the other hand, any game you’ve played for an extended period of time is a game you are emotionally invested in, and that emotional investment does have cash value. If it seems silly to you that I’d say a game “makes” you spend money, consider this: suppose I found all of your saved games and put them in one place. Maybe some of these are on console memory cards or hard disks. Maybe some of them are on your PC hard drive. For online games, your “saved game” is on some company’s server somewhere. And then suppose I threatened to destroy all of them… but not to worry, I’d replace the hardware. So you get free replacements of your hard drive and console memory cards, a fresh account on every online game you subscribe to, and so on. And then suppose I asked you, how much would you pay me to not do that. And I bet when you think about it, the answer is more than zero, and the reason is that those saved games have value to you! And more to the point, if one of these games threatened to delete all your saves unless you bought some extra downloadable content, you would at least consider it… not because you wanted to gain the content, but because you wanted to not lose your save.

To be fair, all games involve some kind of psychological manipulation, just like movies and books and all other media (there’s that whole thing about suspending our disbelief, for example). And most people don’t really have a problem with this; they still see the game experience itself as a net value-add to their life, by letting them live more in the hours they spend playing than they would have lived had they done other activities.

But just like difficulty curves, the difference between value added and taken away is not constant; it’s different from person to person. This is why we have things like MMOs that enhance the lives of millions of subscribers, while also causing horrendous bad events in the lives of a small minority that lose their marriage and family to their game obsession, or that play for so long without attending to basic bodily needs that they keel over and die at the keyboard.

So there is a question of how far we can push our players to give us money, or just to play our game at all, before we cross an ethical line… especially in the case where our game design is being driven primarily by money-based metrics. As before, I invite you to think about where you stand on this, because if you don’t know, the decision will be made for you by someone else who does.

If You’re Working on a Game Now…

If you’re working on a game now, as you might guess, my suggestion is to ask yourself what game design questions could best be answered through metrics:

  • What aspects of your design (especially relating to game balance) do you not know the answers to, at this point in time? Make a list.
  • Of those open questions, which ones could be solved through playtesting, taking metrics, and analyzing them?
  • Choose one question from the remaining list that is, in your opinion, the most vital to your gameplay. Figure out what metrics you want to use, and how you will use statistics to draw conclusions. What are the different things you might see? What would they mean? Make sure you know how you’ll interpret the data in advance.
  • If you’re doing a video game, make sure the game has some way of logging the information you want. If it’s a board game, run some playtests and start measuring!

Homework

This is going to be mostly a thought experiment, more than practical experience, because I couldn’t think of any way to force you to actually collect metrics on a game that isn’t yours.

Choose your favorite genre of game. Maybe an FPS, or RTS, CCG, tabletop RPG, Euro board game, or whatever. Now choose what you consider to be an archetypal example of such a game, one that you’re familiar with and preferably that you own.

Pretend that you were given the rights to do a remake of this game (not a sequel), that is, your intention was to keep the core mechanics basically the same but just to possibly make some minor changes for the purpose of game balance. Think of it as a “version 2.0” of the original. You might have some areas where you already suspect, from your designer’s instinct, that the game is unbalanced… but let’s assume you want to actually prove it.

Come up with a metrics plan. Assume that you have a ready supply of playtesters, or else existing play data from the initial release, and it’s just a matter of asking for the data and then analyzing it. Generate a list:

  • What game balance questions would you want answers to, that could be answered with statistical analysis?
  • What metrics would you use for each question? (It’s okay if there is some overlap here, where several questions use some of the same metrics.)
  • What analysis would you perform on your metrics to get the answers to each question? That is, what would you do to the data (such as taking means, medians and standard deviations, or looking for correlations)? If your questions are “yes” or “no,” what would a “yes” or “no” answer look like once you analyzed the data?

Additional Resources

Here are a few links, in case you didn’t get enough reading this week. Much of what I wrote was influenced by these:

http://chrishecker.com/Achievements_Considered_Harmful%3F

and

http://chrishecker.com/Metrics_Fetishism

Game designer Chris Hecker gave a wonderful GDC talk this year called “Achievements Considered Harmful” which talks about a different kind of metric – the Achievements we use to measure and reward player performance within a game – and why this might or might not be such a good idea. In the second article, he talks about what he calls “Metrics Fetishism,” basically going into the dangers of relying too much on metrics and not enough on common sense.

http://www.gamasutra.com/view/news/29916/GDC_Europe_Playfishs_Valadares_on_Intuition_Versus_Metrics_Make_Your_Own_Decisions.php

This is a Gamasutra article quoting Playfish studio director Jeferson Valadares at GDC Europe, suggesting when to use metrics and when to use your actual game design skills.

http://www.lostgarden.com/2009/08/flash-love-letter-2009-part-2.html

Game designer Dan Cook writes on the many benefits of metrics when developing a Flash game.

http://www.gamasutra.com/features/20070124/sigman_01.shtml

Written by the same guy who did the “Orc Nostril Hair” probability article, this time giving a basic primer on statistics rather than probability.

Level 7: Advancement, Progression and Pacing

August 18, 2010

Readings/Playings

None this week (other than this blog post).

Answers to Last Week’s Question

If you want to check your answer from last week:

Well, I must confess I don’t know for sure if this is the right answer or not. In theory I could write a program to do a brute-force solution with all twelve towers – if each tower is either Swarm, Boost or Nothing (simply don’t build a tower in that location), then it’s “only” 3^12 possibilities – but I don’t have the time to do that at this moment. If someone finds a better solution, feel free to post here!

By playing around by hand in a spreadsheet, the best I came up with was the top and bottom rows both consisting of four Swarm towers, with the center row holding four Boost towers, giving a damage/cost ratio of 1.21.

The two Boost towers in the center give +50% damage to six Swarm towers surrounding them, thus providing a damage bonus of 1440 damage each, while the two Boost towers on the side support four Swarm towers for a damage bonus of 960 each. On average, then, Boost towers provide 1200 damage for a cost of 500, or a damage/cost ratio of 2.4.

Each Swarm tower provides 480 damage (x8 = 3840 damage, total). Each tower costs 640, for a damage/cost ratio of 0.75 for each one. While this is much less efficient than the Boost towers, the Swarm towers are still worth having; deleting any of them makes the surrounding Boost towers less effective, so in combination the Swarm towers are still more cost-efficient than having nothing at all.

However, the Boost towers are still much more cost-effective than Swarm towers (and if you look at the other tower types, Boost towers are the most cost-effective tower in the game, hands-down, when you assume many fully-upgraded towers surrounding them). The only thing that prevents Boost towers from being the dominant strategy at the top levels of play, I think, is that you don’t have enough cash to make full use of them. A typical game that lasts 40 levels might only give you a few thousand dollars or so, which is just not enough to build a killer array of fully-upgraded towers. Or, maybe there’s an opportunity for you to find new dominant strategies that have so far gone undiscovered…
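
For anyone who wants to try the brute-force approach I mentioned above, here is a rough sketch of what that program might look like. Fair warning: I'm filling in assumptions here, since the exact rules live in last week's post. I'm assuming a 3x4 grid of tower locations, 8-directional adjacency, Swarm towers dealing 480 damage for a cost of 640, and Boost towers costing 500, dealing no damage themselves, and adding 50% of each adjacent Swarm tower's base damage. If the actual rules differ, adjust accordingly.

```python
# Brute-force search over all 3^12 layouts of Swarm / Boost / Nothing.
from itertools import product

ROWS, COLS = 3, 4
SWARM_DMG, SWARM_COST, BOOST_COST = 480, 640, 500

def neighbors(r, c):
    """All cells adjacent (including diagonals) to (r, c) within the grid."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0) and 0 <= r + dr < ROWS and 0 <= c + dc < COLS:
                yield (r + dr, c + dc)

best_ratio, best_layout = 0.0, None
for layout in product("SBN", repeat=ROWS * COLS):      # Swarm, Boost, or Nothing
    grid = {(i // COLS, i % COLS): t for i, t in enumerate(layout)}
    cost = sum(SWARM_COST if t == "S" else BOOST_COST if t == "B" else 0 for t in layout)
    if cost == 0:
        continue
    damage = sum(SWARM_DMG for t in layout if t == "S")
    for (r, c), t in grid.items():
        if t == "B":   # each Boost adds 50% of each adjacent Swarm tower's base damage
            damage += sum(SWARM_DMG * 0.5 for n in neighbors(r, c) if grid[n] == "S")
    ratio = damage / cost
    if ratio > best_ratio:
        best_ratio, best_layout = ratio, layout

print(best_ratio, best_layout)
```

Whatever layout this turns up under those assumptions, compare its ratio against the 1.21 figure from the hand-built layout above.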

This Week

In the syllabus, this week is listed as "advancement, progression and pacing for single-player games," but I've changed my mind. A lot of games feature some kind of advancement and pacing, even multiplayer games. There are multiplayer co-op games, like the tabletop RPG Dungeons & Dragons, the console action-RPG Baldur's Gate: Dark Alliance, or the PC game Left 4 Dead. Even within multiplayer competitive games, some of them have the players progressing and getting more powerful during play: players get more lands and cast more powerful spells as a game of Magic: the Gathering progresses, while players field more powerful units in the late game of Starcraft. Then there are MMOs like World of Warcraft that clearly have progression built in as a core mechanic of the game, even on PvP servers. So in addition to single-player experiences like your typical Final Fantasy game, we'll be talking about these other things too: basically, how do you balance progression mechanics?

Wait, What’s Balance Again?

First, it’s worth a reminder of what “balance” even means in this context. As I said in the intro to this course, in terms of progression, there are three things to consider:

  1. Is the difficulty level appropriate for the audience, or is the game overall too hard or too easy?
  2. As the player progresses through the game, we expect the game to get harder to compensate for the player's increasing skill; does the difficulty increase at a good rate, or does it get too hard too fast (which leads to frustration), or does it get harder too slowly (leading to boredom while the player waits for the game to become challenging again)?
  3. If your avatar increases in power, whether that be from finding new game objects like better weapons or tools or other toys, gaining new special abilities, or just getting a raw boost in stats like Hit Points or Damage, are you gaining these at a good rate relative to the increase in enemy power? Or do you gain too much power too fast (making the rest of the game trivial after a certain point), or do you gain power too slowly (requiring a lot of mindless grinding to compensate, which artificially lengthens the game at the cost of forcing the player to re-play content that they’ve already mastered)?

We will consider each of these in turn.

Flow Theory

If you’re not familiar with the concept of “flow” then read up here from last summer’s course. Basically, this says that if the game is too hard for your level of skill you get frustrated, if it’s too easy you get bored, but if you’re challenged at the peak of your ability then you find the game engaging and usually more fun, and one of our goals as game designers is to provide a suitable level of challenge to our players.

There’s two problems here. First, not every player comes to the game with the same skill level, so what’s too easy for some players is too hard for others. How do you give all players the same experience but have it be balanced for all of them?

Second, as a player progresses through the game, they get better at it, so even if the game’s challenge level remains constant it will actually get easier for the player.

How do we solve these problems? Well, that’s most of what this week is about.

Why Progression Mechanics?

Before moving on, though, it's worth asking what the purpose is behind progression mechanics to begin with. If we're going to dedicate a full tenth of this course to progression through a game, progression mechanics should be a useful design tool worth talking about. What are they useful for?

Ending the game

In most cases, the purpose of progression is to bring the game to an end. For shorter games especially, the idea is that progression makes sure the game ends in a reasonable time frame. So whether you’re making a game that’s meant to last 3 minutes (like an early-80s arcade game) or 30-60 minutes (like a family board game) or 3 to 6 hours (like a strategic wargame) or 30 to 300 hours (like a console RPG), the idea is that some games have a desired game length, and if you know what that length is, forced progression keeps it moving along to guarantee that the game will actually end within the desired time range. We’ll talk more about optimal game length later in this post.

Reward and training for the elder game

In a few specialized cases, the game has no end (MMOs, Sims, tabletop RPGs, or progression-based Facebook games), so progression is used as a reward structure and a training simulator in the early game rather than a way to end the game. This has an obvious problem which can be seen with just about all of these games: at some point, more progression just isn’t meaningful. The player has seen all the content in the game that they need to, they’ve reached the level cap, they’ve unlocked all of their special abilities in their skill tree, they’ve maxed their stats, or whatever. In just about all cases, when the player reaches this point, they have to find something else to do, and there is a sharp transition into what’s sometimes called the “elder game” where the objective changes from progression to something else. For players who are used to progression as a goal, since that’s what the game has been training them for, this transition can be jarring. The people who enjoy the early-game progression may not enjoy the elder game activities as much since they’re so different (and likewise, some people who would love the elder game never reach it because they don’t have the patience to go through the progression treadmill).

What happens in the elder game?

In Sim games and FarmVille, the elder game is artistic expression: making your farm pretty or interesting for your friends to look at, or setting up custom stories or skits with your sims.

In MMOs, the elder game is high-level raids that require careful coordination between a large group, or PvP areas where you’re fighting against other human players one-on-one or in teams, or exploring social aspects of the game like taking on a coordination or leadership role within a Guild.

In tabletop RPGs, the elder game is usually finding an elegant way to retire your characters and end the story in a way that’s sufficiently satisfying, which is interesting because in these games the “elder game” is actually a quest to end the game!

What happens with games that end?

In games where progression does end the game, there is also a problem: generally, if you’re gaining power throughout the game and this serves as a reward to the player, the game ends right when you’re reaching the peak of your power. This means you don’t really get to enjoy being on top of the world for very long. If you’re losing power throughout the game, which can happen in games like Chess, then at the end you just feel like you’ve been ground into the dirt for the entire experience, which isn’t much better.

Peter Molyneux has pointed out this flaw when talking about the upcoming Fable 3, where he insists you’ll reach the peak of your power early on, succeed in ruling the world, and then have to spend the rest of the game making good on the promises you made to get there… which is a great tagline, but really all he’s saying is that he’s taking the standard Console RPG progression model, shortening it, and adding an elder game, which means that Fable 3 will either live or die on its ability to deliver a solid elder-game experience that still appeals to the same kinds of players who enjoyed reaching that point in the first place. Now, I’m not saying it can’t be done, but he’s got his work cut out for him. In the interview I saw, it sounded like he was treating this like a simple fix to an age-old problem, but as we can see here it’s really just replacing one difficult design problem with another. I look forward to seeing if he solves it… because if he does, that will have major applications for MMOs and FarmVille and everything with an elder game in between.

Two Types of Progression

Progression tends to work differently in PvP games compared to PvE games. In PvP (this includes multi-player PvP like "deathmatch" and also single-player games played against AI opponents), you're trying to win against another player, human or AI, so the meaning of your progression is relative to the progression of your opponents. In PvE games (this includes both single-player games and multi-player co-op), you are progressing through the game to try to overcome a challenge and reach some kind of end state, so for most of these games your progress is seen in absolute terms. So that is really the core distinction I'd like to make: games where the focus is on relative power between players, versus games where the focus is on absolute power with respect to the game's core systems. I'm just using "PvP" and "PvE" as shorthand here, and if I slip up and refer to PvP as "multi-player" and PvE as "single-player," that is just because those are the most common design patterns.

Challenge Levels in PvE

When you’re progressing through a bunch of challenges within a game, how do you track the level of challenge that the player is feeling, so you know if it’s increasing too quickly or too slowly, and whether the total challenge level is just right?

This is actually a tricky question to answer, because the "difficulty" felt by the player is not made up of just one thing here; it's actually a combination of four things, but the player experiences it only as a single "am I being challenged?" feeling. If we're trying to measure the player's perception of how challenged they are, it's as if the dashboard of your car took the fuel level, current speed, and engine RPM, multiplied them all together to get a single "happiness" rating, and you only had that one number to look at to try to figure out what was causing it to go up or down.

The four components of perceived difficulty

First of all, there’s the level of the player’s skill at the game. The more skilled the player is at the game, the easier the challenges will seem, regardless of anything else.

Second, there’s the player’s power level in the game. Even if the player isn’t very good at the game, doubling their Hit Points will still keep them alive longer, increasing their Attack stat will let them kill things more effectively, giving them a Hook Shot lets them reach new places they couldn’t before, and so on.

Third and fourth, there’s the flip side of both of these, which are how the game creates challenges for the player. The game can create skill-based challenges which require the player to gain a greater amount of skill in the game, for example by introducing new enemies with better AI that make them harder to hit. Or it can provide power-based challenges, by increasing the hit points or attack power or other stats of the enemies in the game (or just adding more enemies in an area) without actually making the enemies any more skilled.

Skill and power are interchangeable

You can substitute skill and power, to an extent, either on the player side or the challenge side. We do this all the time on the challenge side, adding extra hit points or resource generation or otherwise just using the same AI but inflating the numbers, and expecting that the player will need to either get better stats themselves or show a higher level of skill in order to compensate. Or a player who finds a game too easy can challenge themselves by not finding all of the power-ups in a game, giving themselves less power and relying on their high level of skill to make up for it (I’m sure at least some of you have tried beating the original Zelda with just the wooden sword, to see if it could be done). Creating a stronger AI to challenge the player is a lot harder and more expensive, so very few games do that (although the results tend to be spectacular when they do – I’m thinking of Gunstar Heroes as the prototypical example).

At any rate, we can think of the challenge level as the sum of the player’s skill and power, subtracted from the game’s skill challenges and power challenges. This difference gives us the player’s perceived level of difficulty. So, when any one of these things changes, the player will feel the game get harder or easier. Written mathematically, we have this equation:

PerceivedDifficulty = (SkillChallenge + PowerChallenge) – (PlayerSkill + PlayerPower)

Example: perceived challenge decreases naturally

How do we use this information? Let's take the player's skill, which generally increases over time. That's significant, because it means that if everything else is equal (that is, if the player's power level and the overall challenge in the game stay the same), over time the player will feel like the game is getting easier, and eventually it'll be too easy. To keep the player's attention once they get better, every game must get harder in some way. (Or at least, every game where the player's skill can increase. There are some games with no skill component at all, and those are exempted here.)

Changing player skill

Now, you might think the player skill curve is not under our control. After all, players come to our game with different pre-existing skill levels, and they learn at different rates. However, as designers we actually do have some control over this, based on our mechanics:

  • If we design deep mechanics that interact in a lot of ways with multiple layers of strategy, so that mastering the basic game just opens up new ways to look at the game at a more abstract meta-level, the player’s skill curve will be increasing for a long time, probably with certain well-defined jumps when the player finally masters some new way of thinking, like when a Chess player first starts to learn book openings, or when they start understanding the tradeoffs of tempo versus board control versus total pieces on the board.
  • If our game is more shallow, or has a large luck component, we will expect to see a short increase in skill as the player masters what little they can, and then a skill plateau. There are plenty of valid design reasons to do this intentionally. One common example is educational games, where part of the core vision is that you want the player to learn a new skill from the game, and then you want them to stop playing so they can go on to learn other things. Or this might simply be the tradeoff for making your game accessible: “A minute to learn, a minute to master.”
  • You can also control how quickly the player learns, based on the number of tutorials and practice areas you provide. One common design pattern, popularized by Valve, is to give the player some new weapon or tool or toy in a safe area where they can just play around with it, then introduce them immediately to a relatively easy area where they are given a series of simple challenges that let them use their new toy and learn all the cool things it can do, and then you give them a harder challenge where they have to integrate the new toy into their existing play style and combine it with other toys. By designing your levels to teach the player specific skills in certain areas, you can ramp the player up more quickly so they can increase their skill faster.
  • What if you don’t want the player to increase in skill quickly, because you want the game to last longer? If you want the player to learn more slowly, you can instead use “skill gating” as I’ve heard it called. That is, you don’t necessarily teach the player how to play your game, or hold their hand through it. Instead, you simply offer a set of progressively harder challenges, so you are at least guaranteed that if a player completes one challenge, they are ready for the next: each challenge is essentially a signpost that says “you must be at least THIS GOOD to pass.”

Measuring the components of perceived challenge

Player skill is hard to measure mathematically on its own, because as I said earlier, it is combined with player power in any game that includes both. For now, I can say that the best way to get a handle on this is to use playtesting and metrics, for example looking at how often players die or are otherwise set back, where these failures happen, how long it takes players to get through a level the first time they encounter it, and so on. We’ll talk more about this next week.

Player power and power-based challenges are much easier to balance mathematically: just compare the player power curve with the game’s opposition power curve. You have complete control over both of these; you control when the player is gaining power, and also when their enemies are presenting a larger amount of power to counter them. What do you want these curves to look like? Part of it depends on what you expect the skill curve to be, since you can use power as a compensatory mechanism in either direction. As a general guideline, the most common pattern I’ve seen looks something like this: within a single area like an individual dungeon or level, you start with a sudden jump in difficulty since the player is entering a new space after mastering the old one. Over time, the player’s power increases, either through level-ups or item drops, until they reach the end of the level where there may be another sudden difficulty jump in the form of a boss, and then after that typically another sudden jump in player power when they get loot from the boss or reach a new area that lets them upgrade their character.

Some dungeons split themselves into several parts, with an easier part at the beginning, then a mid-boss, then a harder part, and then a final boss, but really you can just think of this as the same pattern repeated several times without a change of graphical scenery. String a bunch of these together and that's the power progression in your game: the difficulty jumps initially in a new area, stays constant for a while, has a sudden spike at the end for the boss, then drops back down as the next area begins; meanwhile the player's power has sudden jumps at the end of an area, with incremental gains along the way as they find new stuff or level up.
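
If it helps to see that shape in numbers rather than words, here's a toy sketch that generates the two curves just described. Every value in it is invented; the only point is the sawtooth pattern.

```python
# A toy illustration of the pattern described above (all numbers invented):
# each area opens with a difficulty jump, holds steady, spikes for the boss,
# and the player's power jumps right after the boss loot.
AREAS, STEPS_PER_AREA = 3, 5
difficulty, power = [], []
d, p = 10, 10
for area in range(AREAS):
    d += 4                                   # difficulty jump on entering a new area
    for step in range(STEPS_PER_AREA):
        boss_spike = 6 if step == STEPS_PER_AREA - 1 else 0
        difficulty.append(d + boss_spike)
        power.append(p)
        p += 1                               # incremental gains (drops, level-ups)
    p += 3                                   # big jump from boss loot / new upgrades

print("difficulty:", difficulty)
print("power:     ", power)
```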

That said, this is not the only pattern of power progression, not even necessarily the best for your game! These will vary based on genre and intended audience. For Space Invaders, over the course of a single game, the game’s power challenges, player skill and player power are all constant; the only thing that increases is the game’s skill challenge (making the aliens start faster and lower to the ground in each successive wave) until eventually they present a hard enough challenge to overwhelm the player.

Rewards in PvE

In PvE games especially, progression is strongly related to what is sometimes called the “reward schedule” or “risk/reward cycle.” The idea is that you don’t just want the player to progress, you want them to feel like they are being rewarded for playing well. In a sense, you can think of progression as a reward itself: as the player continues in the game and demonstrates mastery, the ability to progress through the game shows the player they are doing well and reinforces that they’re a good player. One corollary here is that you do need to make sure the player notices you’re rewarding them (in practice, this is usually not much of a problem). Another corollary is that timing is important when handing out rewards:

  • Giving too few rewards, or spacing them out for too long so that the player goes for long stretches without feeling any sense of progression, is usually a bad thing. The player is demoralized and may start to feel like if they aren’t making progress, they’re playing the game wrong (even if they’re really doing fine).
  • Ironically, giving too many rewards can also be hazardous. One of the things we’ve learned from psychology is that happiness comes from experiencing some kind of gain or improvement, so many little gains produce a lot more happiness than one big gain, even if they add up to the same thing. Giving too many big rewards in a small space of time diminishes their impact.
  • Another thing we know from psychology is that a random reward schedule is more powerful than a fixed schedule. This does not mean that the rewards themselves should be arbitrary; they should be linked to the player’s progress through the game, and they should happen as a direct result of what the player did, so that the player feels a sense of accomplishment. It is far more powerful to reward the player because of their deliberate action in the game, than to reward them for something they didn’t know about and weren’t even trying for.

I’ll give a few examples:

  • Have you ever started a new game on Facebook and been immediately given some kind of trophy or “achievement unlocked” bonus just for logging in the first time? I think this is a mistake a lot of Facebook games make: they give a reward that seems arbitrary, and it actually waters down the player’s actual achievements later. It gives the impression that the game is too easy. Now, for some games, you may want them to seem easy if they are aimed at an extremely casual audience, but the danger is reducing the actual, genuine feelings of accomplishment the player gets later.
  • “Hidden achievements” in Xbox 360 games, or their equivalents on other platforms. If achievements are a reward for skill, how is the player to know what the achievement is if it’s hidden? Even more obnoxious, a lot of these achievements are for things that aren’t really under player control and that seem kind of arbitrary, like “do exactly 123 damage in a single attack” where damage is computed randomly. What exactly is the player supposed to feel rewarded for here?
  • A positive example would be random loot drops in a typical action game or action-RPG. While these are random, and occasionally the player gets a really cool item, this is still tied to the deliberate player action of defeating enemies, so the player is rewarded but on a random schedule. (Note that you can play around with "randomness" here, for example by having your game track the time between rare loot drops, and having it deliberately give a big reward if the player hasn't seen one in a while; a minimal sketch of this follows this list. Some games split the difference, with random drops and also fixed drops from specific quests/bosses, so that the player is at least getting some great items every now and then.)
  • Another common example: the player is treated to a cut-scene once they reach a certain point in a dungeon. Now, at first you might say this isn’t random – it happens at exactly the same point in the game, every time, because the level designer scripted that event to happen at exactly that place! And on multiple playthroughs you’d be correct… but the first time a player experiences the game, they don’t know where these rewards are, so from the player’s perspective it is not something that can be predicted; it may as well be random.
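
Here's that sketch: a minimal "bad-luck protection" loot roll. The 5% base chance and the 40-kill ceiling are numbers I made up for illustration, not values from any particular game.

```python
# A rare-drop roll whose chance ramps up the longer the player goes without one.
import random

BASE_CHANCE = 0.05   # invented base drop chance
PITY_LIMIT = 40      # guarantee a rare drop at least this often (also invented)

kills_since_rare = 0

def on_enemy_defeated():
    """Return True if this kill should drop a rare item."""
    global kills_since_rare
    kills_since_rare += 1
    # Chance climbs toward 100% as the dry streak approaches PITY_LIMIT kills.
    chance = BASE_CHANCE + (kills_since_rare / PITY_LIMIT) * (1 - BASE_CHANCE)
    if random.random() < chance:
        kills_since_rare = 0
        return True
    return False
```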

Now I’d like to talk about three kinds of rewards that all relate to progression: increasing player power, level transitions, and story progression.

Rewarding the player with increased power

Progression through getting a new toy/object/capability that actually increases player options is another special milestone. Like we said before, you want these spaced out, though a lot of times I see the player get all the cool toys in the first third or half of the game and then spend the rest of the game finding new and interesting ways to use them. This can be perfectly valid design; if the most fun toy in your game is only discovered 2/3rds of the way through, that’s a lot of time the player doesn’t get to have fun – Valve made this example famous through the Gravity Gun in Half-Life 2: as the story goes, they initially had you get this gun near the end of the game, but players had so much fun with it that they restructured their levels to give it to the player much earlier. Still, if you give the player access to everything early on, you need to use other kinds of rewards to keep them engaged through the longer final parts of the game where they don’t find any new toys. How can you do this? Here’s a few ways:

  • If your mechanics have a lot of depth, you can just present unique combinations of things to the player to keep them challenged and engaged. (This is really hard to do in practice.)
  • Use other rewards more liberally after you shut off the new toys: more story, more stat increases, more frequent boss fights or level transitions. You can also offer upgrades to their toys, although it’s debatable whether you can think of an “upgrade” as just another way of saying “new toy.”
  • Or you can, you know, make your game shorter. In this day and age, thankfully, there’s no shame in this. Portal and Braid are both well-known for two things: being really great games, and being short. At the big-budget AAA level, Batman: Arkham Asylum was one of the best games of last year (both in critical reception and sales), even though I hear it only lasts about ten hours or so.

Rewarding the player with level transitions

Progression through level transitions – that is, progression to a new area – is a special kind of reward, because it makes the player feel like they’re moving ahead (and they are!). You want these spaced out a bit so the player isn’t so overwhelmed by changes that they feel like the whole game is always moving ahead without them; a rule of thumb is to offer new levels or areas on a slightly increasing curve, where each level takes a little bit longer than the last. This makes the player feel like they are moving ahead more rapidly at the start of the game when they haven’t become as emotionally invested in the outcome; a player can tolerate slightly longer stretches between transitions near the end of the game, especially if they are being led up to a huge plot point. Strangely, a lot of this can be done with just the visual design of the level, which is admittedly crossing from game design into the territory of game art: for example, if you have a really long dungeon the players are traversing, you can add things to make it feel like each region of the dungeon is different, maybe having the color or texture of the walls change as the player gets deeper inside, to give the player a sense that they are moving forward.

Rewarding the player with story progression

Progression through plot advancement is interesting to analyze, because in so many ways the story is separate from the gameplay: in most games, knowing the characters’ motivations or their feelings towards each other has absolutely no meaning when you’re dealing with things like combat mechanics. And yet, in many games (originally this was restricted to RPGs, but we’re seeing story integrated into all kinds of games these days), story progression is one of the rewards built into the reward cycle.

Additionally, the story itself has a “difficulty” of sorts (we call it “dramatic tension”), so another thing to consider in story-based games is whether the dramatic tension of the story overlaps well with the overall difficulty of the game. Many games do not: the story climax is at the end, but the hardest part of the game is in the middle somewhere, before you find an uber-powerful weapon that makes the rest of the game easy. In general, you want rising tension in your story while the difficulty curve is increasing, dramatic climaxes (climaxen? climaces?) at the hardest part, and so on; this makes the story feel more integrated with the mechanics, all thanks to game balance and math. It’s really strange to write that you get a better story by using math, but there you are. (I guess another way of doing this would be to force the story writers to put their drama in other places to match the game’s difficulty curve, but in practice I think it’s easier to change a few numbers than to change the story.)

Combining the types of rewards into a single reward schedule

Note that a reward is a reward, so you don't just want to space out each category of rewards individually, but also interleave them. In other words, you don't want too many overlaps, where a level transition, plot advancement, and a power increase all happen at once.

Level transitions are fixed, so you tend to see the power rewards sprinkled throughout the levels as rewards between transitions. Strangely, in practice, a lot of plot advancement tends to happen at the same time as level transitions, which might be a missed opportunity. Some games take the chance to add some backstory in the middle of levels, in areas that are otherwise uninteresting… although then the danger is that the player is getting a reward arbitrarily when they feel like they weren’t doing anything except walking around and exploring. A common design pattern I see in this case is to split the difference by scripting the plot advancement so it immediately follows a fight of some kind. Even if it’s a relatively easy fight, if it’s one that’s scripted, the reward of revealing some additional story immediately after can make the player feel like they earned it.

Challenge Levels in PvP

If PvE games are all about progression and rewards, PvP games are about gains and losses relative to your opponents. Either directly or indirectly, the goal is to gain enough power to win the game, and there is some kind of tug-of-war between the players as each is trying to get there first. I’ll remind you that when I’m saying “power” in the context of progression, I’m talking about the sum of all aspects of the player’s position in the game, so this includes having more pieces and cards put into play, more resources, better board position, taking more turns or actions, or really anything that affects the player’s standing (other than the player’s skill level at playing the game). The victory condition for the game is sometimes to reach a certain level of power directly; sometimes it is indirect, where the actual condition is something abstract like Victory Points, and it is the player’s power in the game that merely enables them to score those Victory Points. And in some cases the players don’t gain power, they lose power, and the object of the game is to get the opponent(s) to run out first. In any case, gaining power relative to your opponents is usually an important player goal.

Tracking player power as the game progresses (that is, seeing how power changes over time in a real-time game, or how it changes each turn in a turn-based game) can follow a lot of different patterns in PvP games. In PvE you almost always see an increase in absolute player power level over time (even if their power level relative to the challenges around them may increase or decrease, depending on the game). In PvP there are more options to play with, since everything is relative to the opponents and not compared with some absolute “you must be THIS GOOD to win the game” yardstick.

Positive-sum, negative-sum, and zero-sum games

This seems as good a time as any to talk about an important distinction in power-based progression that we borrow from the field of Game Theory: whether the game is zero-sum, positive-sum, or negative-sum. If you haven’t heard these terms before:

  • Positive-sum means that the overall power in the game increases over time. Settlers of Catan is an example of a positive-sum game: with each roll of the dice, resources are generated for the players, and all players can gain power simultaneously without any of their opponents losing power. Monopoly is another example of a positive-sum game, because on average every trip around the board will give the player $200 (and that money comes from the bank, not from other players). While there are a few spaces that remove wealth from the game and are therefore negative-sum (Income Tax, Luxury Tax, a few of the Chance and Community Chest cards, unmortgaging properties, and sometimes Jail), on average these losses add up to less than $200, so on average more wealth is created than removed over time. Some players use house rules that give jackpots on Free Parking or landing exactly on Go, which make the game even more positive-sum. While you can lose lots of money to other players by landing on their properties, that activity itself is zero-sum (one player is losing money, another player is gaining the exact same amount). This helps explain why Monopoly feels to most people like it takes forever: it’s a positive-sum game so the average wealth of players is increasing over time, but the object of the game is to bankrupt your opponents which can only be done through zero-sum methods. And the house rules most people play with just increase the positive-sum nature of the game, making the problem worse!
  • Zero-sum means that the sum of all power in the game is a constant, and can neither be created nor destroyed by players. In other words, the only way for me to gain power is to take it from another player, and I gain exactly as much as they lose. Poker is an example of a zero-sum game, because the only way to win money is to take it from other players, and you win exactly as much as the total that everyone else loses. (If you play in a casino or online where the House takes a percentage of each pot, it actually becomes a negative-sum game for the players.)
  • Negative-sum means that over time, players actually lose more power than they gain; player actions remove power from the game without replacing it. Chess is a good example of a negative-sum game; generally over time, your force is getting smaller. Capturing your opponent’s pieces does not give those pieces to you, it removes them from the board. Chess has no zero-sum elements, where capturing an enemy piece gives that piece to you (although the related game Shogi does work this way, and has extremely different play dynamics as a result). Chess does have one positive-sum element, pawn promotion, but that generally happens rarely and only in the end game, and serves the important purpose of adding a positive feedback loop to bring the game to a close… something I’ll talk about in just a second.

An interesting property here is that changes in player power, whether zero-sum, positive-sum, or negative-sum, are the primary rewards in a PvP game. The player feels rewarded because they have gained power relative to their opponents, so they feel like they have a better chance of winning after making a particularly good move.

Positive and negative feedback loops

Another thing I should mention here is how positive and negative feedback loops fit in with this, because you can have either kind of feedback loop with a zero-sum, positive-sum or negative-sum game, but they work differently. In case you’re not familiar with these terms, “positive feedback loop” means that receiving a power reward makes it more likely that you’ll receive more, in other words it rewards you for doing well and punishes you for doing poorly; “negative feedback loop” is the opposite, where receiving a power reward makes it less likely you’ll receive more, so it punishes you for doing well and rewards you for doing poorly. I went into a fair amount of detail about these in last summer’s course, so I won’t repeat that here.

One interesting property of feedback loops is how they affect the player’s power curve. With negative feedback, the power curve of one player usually depends on their opponent’s power: they will increase more when behind, and decrease more when ahead, so a single player’s power curve can look very different depending on how they’re doing relative to their opponents, and this will look different from game to game.

With positive feedback, you tend to have a curve that gets more sharply increasing or decreasing over time, with larger swings in the endgame; unlike negative feedback, a positive feedback curve doesn’t always take the opponent’s standings into account… it can just reward a player’s absolute power.

Now, these aren’t hard-and-fast rules… a negative feedback loop can be absolute, which basically forces everyone to slow down around the time they reach the end game; and a positive feedback loop can be relative, where you gain power when you’re in the lead. However, if we understand the game design purpose that is served by feedback loops, we’ll see why positive feedback is usually independent of the opponents, while negative feedback is usually dependent.

The purpose of feedback loops in game design

The primary purpose of positive feedback is to get the game to end quickly. Once a player is far enough ahead that the winner is effectively decided, you don't want to drag things out, because that wastes everyone's time. Because of this, you want all players on an accelerating curve in the end game. It doesn't really matter who is ahead; the purpose is to get the game to end, and as long as everyone gets more power, it will end faster.

By contrast, the primary purpose of negative feedback is to let players who are behind catch up, so that no one ever feels like they are in a position where they can’t possibly win. If everyone is slowed down in exactly the same fashion in the endgame, that doesn’t fulfill this purpose; someone who was behind at the beginning can still be behind at the end, and even though the gap appears to close, they are slowed down as much as anyone else. In order to truly allow those who are behind to catch up, the game has to be able to tell the difference between someone who is behind and someone who is ahead.

Power curves

So, what does a player’s power curve look like in a PvP game? Here are a few ways you might see a player’s power gain (or loss) over time:

  • In a typical positive-sum game, each player is gaining power over time in some way. That might be on an increasing, linear, or decreasing curve.
  • In a positive-sum game with positive feedback, the players are gaining power over time and the more power they gain, the more they have, so it’s an increasing curve (such as a triangular or exponential gain in power over time) for each player. If you subtract one player’s curve from another (which shows you who is ahead, who is behind, and how often the lead is changing), usually what happens is one player gets an early lead and then keeps riding the curve to victory, unless they make a mistake along the way. Such a game is usually not that interesting for the players who are not in the lead.
  • In a positive-sum game with negative feedback, the players are still on an increasing curve, but that curve is altered by the position of the other players, reducing the gains for the leader and increasing them further for whoever’s behind, so if you look at all the players’ power curves simultaneously you’ll see a sort of tangled braid where the players are constantly overtaking one another. Subtracting one player’s power from another over time, you’ll see that the players’ relative power swings back and forth, which is pretty much how any negative feedback should work.
  • In a typical zero-sum game, players take power from each other, and the sum of all player power is a constant. In a two-player game, that means you could derive either player’s power curve just by looking at the other one.
  • In a zero-sum game with positive feedback, the game may end quickly as one player takes an early advantage and presses it to gain even more of an advantage, taking all of the power from their opponents quickly. Usually games that fall into this category also have some kind of early-game negative feedback built in to prevent the game from coming to an end too early, unless the game is very short.
  • In a zero-sum game with negative feedback, we tend to see swings of power that pull the leader back to the center. This keeps the players close, but also makes it very hard for a single player to actually win; if the negative feedback is too strong, you can easily end in a stalemate where neither player can win, which tends to be unsatisfying. A typical design pattern for zero-sum games in particular is to have some strong negative feedback mechanisms in the early game that taper off towards the end, while positive feedback increases towards the end of the game. This can end up as a pretty exciting game of back-and-forth where each player spends some time in the lead before one final, spectacular, irreversible triumph that brings the game to a close.
  • In a typical negative-sum game, the idea is generally not for a player to acquire enough power to win, but rather for a player to lose the least power relative to their opponents. In negative-sum games, players are usually eliminated when they lose most or all of their power, and the object is either to be the last one eliminated, or to be in the best standing when the first opponent is eliminated. A player’s power curve might be increasing, decreasing or constant, sort of an inverse of the positive-sum game, and pretty much everything else looks like a positive-sum game turned upside down.
  • In a negative-sum game with positive feedback, players who are losing will lose even faster. The more power a player has left over, the slower they’ll tend to lose it, but once they start that slide into oblivion it happens more and more rapidly.
  • In a negative-sum game with negative feedback, players who are losing will lose slower, and players who have more power tend to lose it faster, so again you’ll see this “braid” shape where the players will chase each other downward until they start crashing.
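
To make the "runaway leader" and "braid" shapes from the list above concrete, here is a toy two-player simulation. Every number in it is invented; the point is only how the feedback term changes the shape of the curves.

```python
# A toy two-player, positive-sum simulation (all numbers invented). With
# positive feedback, the early leader tends to run away; with negative
# feedback, the two curves braid around each other as the lead changes hands.
import random

def simulate(feedback, turns=20):
    power = [10.0, 10.0]
    history = []
    for _ in range(turns):
        for i in (0, 1):
            gain = 2 + random.uniform(-1, 1)             # base positive-sum income
            if feedback == "positive":
                gain += 0.1 * power[i]                    # the rich get richer
            elif feedback == "negative":
                gain += 0.1 * (power[1 - i] - power[i])   # the trailing player catches up
            power[i] += gain
        history.append(tuple(round(p, 1) for p in power))
    return history

print(simulate("positive"))
print(simulate("negative"))
```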

Applied power curves

Now, maybe you can visualize what a power curve looks like in theory, showing how a player’s power goes up or down over time… but how do you actually make one for a real game?

The easiest way to construct a power curve is through playtest data. The raw numbers can easily allow you to chart something like this. The hardest part is coming up with some kind of numeric formula for “power” in the game: how well a player is actually doing in absolute terms. This is easier with some games than others. In a miniatures game like HeroClix or Warhammer 40K, each figurine you control is worth a certain number of points, so it is not hard to add your points together on any given turn to get at least a rough idea of where each player stands. In a real-time strategy game like Starcraft, adding a player’s current resources along with the resource costs of all of their units and structures would also give a reasonable approximation of their power over time. For a game like Chess where you have to balance a player’s remaining pieces, board position and tempo, this is a bit trickier. But once you have a “power formula” you can simply track it for all players over time through repeated playtests to see what kinds of patterns emerge.
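
As a concrete example of a "power formula," here is what the RTS-style version described above might look like. The unit names, costs, and snapshots are all made up; the only idea being carried over is "current resources plus the resource cost of everything in play, sampled once per turn."

```python
# A sketch of a power formula for an RTS-style game (hypothetical values).
UNIT_COSTS = {"worker": 50, "soldier": 100, "barracks": 150}   # invented costs

def power(snapshot):
    """snapshot = {'resources': int, 'units': {unit_name: count}}"""
    return snapshot["resources"] + sum(
        UNIT_COSTS[name] * count for name, count in snapshot["units"].items())

# One playtest, one snapshot per turn for a single player:
playtest = [
    {"resources": 50, "units": {"worker": 4}},
    {"resources": 20, "units": {"worker": 6, "barracks": 1}},
    {"resources": 80, "units": {"worker": 6, "soldier": 3, "barracks": 1}},
]
print([power(s) for s in playtest])   # the player's power curve, turn by turn
```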

Ending the game

One of the really important things to notice as you do this is the amount of time it takes to reach a certain point. You want to scale the game so that it ends in about as much time as you want it to.

The most obvious way to do that is by hard-limiting time or turns, which guarantees a specific game length ("this game ends after 4 turns"); sometimes this is necessary and even compelling, but a lot of times it's just a lazy design solution that says "we didn't playtest this enough to know how long the game would take if you actually played to a satisfying conclusion."

An alternative is to balance your progression mechanics to cause the game to end within your desired range. You can do this by changing the nature of how positive or negative sum your game is (that is, the base rate of combined player power gain or loss), or by adding, removing, strengthening or weakening your feedback loops. This part is pretty straightforward, if you collect all of the numbers that you need to analyze it. For example, if you take an existing positive feedback loop and make the effect stronger, the game will probably end earlier, so that is one way to shorten the game.
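
If you want to sanity-check that last claim numerically, here's a toy simulation: two players race to an arbitrary win threshold, and we compare the average game length under a weak versus a strong positive feedback term. All values are invented.

```python
# Measuring how a stronger positive feedback loop shortens the game (toy numbers).
import random

def game_length(feedback_strength, win_at=100, max_turns=200):
    power = [10.0, 10.0]
    for turn in range(1, max_turns + 1):
        for i in (0, 1):
            power[i] += 2 + random.uniform(0, 2) + feedback_strength * power[i]
            if power[i] >= win_at:
                return turn
    return max_turns

def average_length(strength, trials=1000):
    return sum(game_length(strength) for _ in range(trials)) / trials

print("weak positive feedback:  ", average_length(0.02))
print("strong positive feedback:", average_length(0.10))
```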

Game phases

I should note that some PvP games have well-defined transitions between different game phases. The most common pattern here is a three-phase structure where you have an early game, a mid-game and an endgame, as made famous by Chess, which has entire books devoted to just a single phase of the game. If you become aware of these transitions (or if you design them into the game explicitly), you don't just want to pay attention to the player power curve throughout the game, but also how it changes in each phase, and the relative length of each phase.

For example, a common finding in game prototypes is that the endgame isn’t very interesting and is mostly a matter of just going through the motions to reach the conclusion that you already arrived at in mid-game. To fix this, you might think of adding new mechanics that come into play in the endgame to make it more interesting. Or, you might try to find ways to either extend the mid-game or shorten the endgame by adjusting your feedback loops and the positive, negative or zero-sum nature of your game during different phases.

Another common game design problem is a game that’s great once the players ramp up in mid-game, but the early game feels like it starts too slowly. One way to fix this is to add a temporary positive-sum nature to the early game in order to get the players gaining power and into the mid-game quickly.

In some games, the game is explicitly broken into phases as part of the core design. One example is the board game Shear Panic, where the scoring track is divided into four regions, and each region changes the rules for scoring, which gives the game a very different feel in each of the game's four phases. In this game, you transition between phases based on the number of turns taken in the game, so the length of each phase is dependent on how many turns each player has had. In a game like this, you could easily change the time spent in each phase by adjusting how many turns that phase lasts.

Other games have less sharp transitions between different phases, and those may not be immediately obvious or explicitly designed. Chess is one example I've already mentioned. Another is Netrunner, an asymmetric CCG where one player (the Corporation) is trying to put cards in play and then spend actions to score the points on those cards, and the other player (the Runner) is trying to steal the points before they're scored. After the game had been released, players at the tournament level realized that most games followed three distinct phases: the early game, when the Runner is relatively safe from harm and can try to steal as much as possible; then the mid-game, when the Corporation sets up its defenses and temporarily makes it prohibitively expensive for the Runner to steal anything; and finally the endgame, when the Runner puts together enough resources to break through the Corporation's defenses to steal the remaining points needed for the win. Looked at in this way, the Corporation is trying to enter the second phase as early in the game as possible and stretch it out for as long as possible, while the Runner is trying to stay in the first phase as long as it can and then, if it doesn't win early, to transition from the second phase into the endgame before the Corporation has scored enough to win.

How do you balance the progression mechanics in something like this? One thing you can do, as was done with Netrunner, is to put the progression of the game under partial control of the players, so that it is the players collectively trying to push the game forward or hold it back. That creates an interesting meta-level of strategic tension.

Another thing you can do is include some mechanics that actually have some method of detecting what phase you’re in, or at the very least, that tend to work a lot better in some phases than others. Netrunner does this as well; for example, the Runner has some really expensive attack cards that aren’t so useful early on when they don’t have a lot of resources, but that help it greatly to end the game in the final phase. In this way, as players use new strategies in each phase, it tends to give the game a very different feel and offers new dynamics as the game progresses. And then, of course, you can use some of these mechanics specifically to adjust the length of each phase in order to make the game progress at the rate you desire. In Netrunner, the Corporation has some cheap defenses it can throw up quickly to try to transition to the mid-game quickly, and it also has more expensive defenses it can use to put up a high bar that the Runner has to meet in order to transition to the endgame. By adjusting both the relative and absolute lengths of each phase in the game, you can make sure that the game takes about as long as you want it to, and also that it is broken up into phases that last a good amount of time relative to each other.

Ideal game length

All of this assumes you know how long the game (and each phase of the game) should take, but how do you know that? Part of it depends on target audience: young kids need short games to fit their attention span. Busy working adults want games that can be played in short, bite-sized chunks of time. Otherwise, I think it depends mainly on the level and depth of skill: more luck-based or casual games tend to be shorter, while deeper strategic games can be a bit longer. Another thing to consider is at what point a player is far enough ahead that they’ve essentially won: you want this point to happen just about the time when the game actually ends so it doesn’t drag on.

For games that never end, like MMOs or Facebook games, you can think of the elder game as a final infinite-length “phase” of the game, and you’ll want to change the length of the progression portion of your game so that the transition happens at about the time you want it to. How long that is depends on how much you want to support the progression game versus the elder game. For example, if your progression game is very different from your elder game and you see a lot of “churn” (that is, a lot of players that leave the game) when they hit the elder game, and you’re using a subscription-based model where you want players to keep their accounts for as long as possible, you’ll probably want to do two things: work on softening the transition to elder game so you lose fewer people, and also find ways of extending the early game (such as issuing expansion sets that raise the level cap, or letting players create multiple characters with different race/class combinations so they can play through the progression game multiple times).

Another interesting case is story-based RPGs, where the story often outlasts the mechanics of the game. We see this all the time with console RPGs, where it says “100 hours of gameplay” right on the box. And on the surface that sounds like the game is delivering more value, but in reality if you’re just repeating the same tired old mechanics and mindlessly grinding for 95 of those hours, all the game is really doing is wasting your time. Ideally you want the player to feel like they’re progressing through learning new mechanics and progressing through the story at any given time; you don’t want the gameplay to drag on any more than you want filler plot that makes the story feel like it’s dragging on. These kinds of games are challenging to design because you want to tune the length of the game to match both story and gameplay, and often that either means lengthening the story or adding more gameplay, both of which tend to be expensive in development. (You can also shorten the story or remove depth from the gameplay, but when you’ve got a really brilliant plot or really inspired mechanics it can be hard to rip that out of the game just to save a few bucks; also, in this specific case there’s often a consumer expectation that the game is pretty long to give it that “epic” feel, so the tendency is to just keep adding on one side or the other.)

Flow Theory, Revisited

With all that said, let's come back to flow. At the start of this blog post, I said there were two problems here that needed to be solved. One is that the player's skill increases throughout the game, which tends to shift them from being in flow to being bored. This is mostly a problem for longer PvE games, where the player has enough time and experience in the game to genuinely get better.

The solution, as we’ve seen when we talked about PvE games, is to have the game compensate by increasing its difficulty through play in order to make the game seem more challenging – this is the essence of what designers mean when they talk about a game’s “pacing.” For PvP games, in most cases we want the better player to win, so this isn’t seen as much of a problem; however, for games where we want the less-skilled player to have a chance and the highly-skilled player to still be challenged, we can implement negative feedback loops and randomness to give an extra edge to the player who is behind.

There was another problem with flow that I mentioned, which is that you can design your game at one level of difficulty, but players come to your game with a range of initial skill levels, and what's easy for one player is hard for another.

With PvE games, as you might guess, the de facto standard is to implement a series of difficulty levels, with higher levels granting the AI power-based bonuses or giving the player fewer power-based bonuses, because that is relatively cheap and easy to design and implement. However, I have two cautions here:

  1. If you keep using the same playtesters, they will become experts at the game and thus unable to accurately judge the true difficulty of "easy mode"; easy should mean easy, and it's better to err on the side of making it too easy than to make it challenging enough that some players feel like they just can't play at all. The best approach is to use a steady stream of fresh playtesters throughout the playtest phase of development (these are sometimes referred to as "Kleenex testers" because you use them once and then throw them away). If you don't have access to that many testers, at least reserve a few of them for the very end of development, when you're tuning the difficulty level of "easy."
  2. Take care to set player expectations up front about higher difficulties, especially if the AI actually cheats. If the game pretends on the surface to be a fair opponent that just gets harder because it is more skilled, and then players find out that it’s actually peeking at information that’s supposed to be hidden, it can be frustrating. If you’re clear that the AI is cheating and the player chooses that difficulty level anyway, there are fewer hurt feelings: the player is expecting an unfair challenge and the whole point is to beat that challenge anyway. Sometimes this is as simple as choosing a creative name for your highest difficulty level, like “Insane.”

There are, of course, other ways to deal with differing player skill levels. Higher difficulty levels can actually increase the skill challenge of the game instead of the power challenge. Giving enemies more sophisticated AI, as I said before, is expensive but can be really impressive if pulled off correctly. A cheaper way to do this in some games is simply to modify the design of your levels by blocking off easier alternate paths, forcing the player to go through a harder path to get to the same end location when they’re playing at higher difficulty.

Then there’s Dynamic Difficulty Adjustment (DDA), which is a specialized type of negative feedback loop where the game tries to figure out how the player is doing and then adjusts the difficulty on the fly. You have to be very careful with this, as with all negative feedback loops, because it does punish the player for doing well and some players will not appreciate that if it isn’t set up as an expectation ahead of time.
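To make that concrete, here’s a minimal sketch (the signals and numbers are purely hypothetical, not from any shipped game) of what a DDA rule might look like, assuming the game tracks recent deaths and recent flawless wins as its “how is the player doing” signal; real systems track far more than this, but the negative-feedback shape is the same:

```python
def adjust_difficulty(difficulty, recent_deaths, recent_flawless_wins):
    """One possible DDA rule: ease off after repeated deaths, ramp up after
    repeated effortless wins, and never move more than one step at a time."""
    if recent_deaths >= 3:
        difficulty -= 1          # negative feedback: the player is struggling
    elif recent_flawless_wins >= 3:
        difficulty += 1          # negative feedback in the other direction
    return max(1, min(10, difficulty))   # clamp to the supported range

print(adjust_difficulty(5, recent_deaths=3, recent_flawless_wins=0))  # -> 4
```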

Another way to do this is to split the difference, by offering dynamic difficulty changes under player control. Like DDA, the game tries to figure out how the player is doing… but then it gives the player the option of changing the difficulty level manually. One example of this is the game flOw, where the player can go to the next more challenging level or the previous easier level at just about any time, based on how confident they are in their skills. Another example is God of War (and probably some other games as well): if you die enough times on a level, it offers you the chance to drop the difficulty on the reload screen (which some players might find patronizing, but on the other hand it also gives the player no excuse if they die again anyway). Sid Meier’s Pirates gives the player the chance to increase the difficulty when they come into port after a successful mission, and actually gives them an incentive: a higher percentage of the booty on future missions if they succeed.

The equivalent in PvP games is a handicapping system, where one player can start with more power or earn more power over the course of the game, to compensate for their lower level of skill. In most cases this should be voluntary, though; players entering a PvP contest typically expect the game to be fair by default.

Case Studies

With all of that said, let’s look at a few examples to see how we can use this to analyze games in practice.

Space Invaders (and other retro-arcade games)

This game presents the same skill-based challenge to you, wave after wave, increasing the required skill by making the aliens move and shoot faster and start lower. The player has absolutely no way to gain power in the game; you start with three lives and that’s all you get, and there are no powerups. On the other hand, you also don’t really lose power in the game, in a sense: whether you have one life remaining or all three, your offensive and defensive capabilities are the same. The player’s goal is not to win, but to survive as long as possible before the increasing challenge curve overwhelms them. Interestingly, the challenge curve does change over the course of a wave; early on there are a lot of aliens and they move slowly, so it’s very easy to hit a target. Later on you have fewer targets and they move faster, which makes the game more challenging, and of course if they ever reach the ground you lose all of your lives, which makes this a real threat. Then the next wave starts, a little harder than the last one, but with the difficulty dropping back down initially. (You’d think that there would also be a tradeoff in that fewer aliens would have less firepower to shoot at you, but in the actual game I think the overall rate of fire was constant, just spread out differently, so this didn’t actually change much as each round progressed.)

Chess (and other squad-based wargames)

If I had to put Chess in a genre, yes, I’d call it a squad-based wargame… which is kind of odd since we normally think of this as an entire army and not a squad. I mean this in the sense that you start with a set force and generally do not receive reinforcements, nor do you have any mechanics for resources, production, supply or logistics, which you tend to see in more detailed army-level games.

Here, we are dealing with a negative-sum game that actually has a mild positive-feedback loop built into it: if you’re ahead in pieces, trades tend to be beneficial to you (other things being equal), and once you reach the endgame certain positions let you basically take an automatic win if you’re far enough ahead. This can be pretty demoralizing for the player who is losing, especially if you have two players with extremely unequal skill at the game, because they will tend to start losing early and just keep losing more and more as the game goes on.

The only reason this works is that between two equally-skilled players, there does tend to be a bit of back-and-forth as players trade off pieces for board control or tempo, so a player that appears to be losing has a number of opportunities to turn that around later before the endgame. Between well-matched opponents you will tend to see a variable rate of decrease as they trade pieces, based on how well they are playing, and if they play about equally well we’ll see a game where the conclusion is uncertain until the endgame (and even then, if the players are really well-matched, we’ll see a stalemate).

Settlers of Catan (and other progression-based board games)

Here is a game where progression is gaining power, so it is positive-sum. There are only very limited cases where players can actually lose their progress; mostly, when you build something, that gain is permanent. Catan contains a pretty powerful positive feedback loop, in that building more settlements and cities gives you more resources which lets you build even more, and building is the primary victory condition. At first you’d think this means that the first player to get an early-game advantage automatically wins, and if players couldn’t trade with each other that would almost certainly be the case. The ability to trade freely with other players balances this aspect of the game, as trading can be mutually beneficial to both players involved in the trade; if players who are behind trade fairly with each other and refuse to trade with those in the lead at all (or only at exorbitant rates of exchange), they can catch up fairly quickly. If I were to criticize this game at all, it would be that the early game doesn’t see a lot of rapid progression because the players aren’t generating that many resources yet – and in fact, other games in the series fix this by giving players extra resources in the early game.

Mario Kart (and other racing games)

Racing games are an interesting case, because players are always progressing towards the goal of the finish line. Most racing video games include a strong negative feedback loop that keeps everyone feeling like they still have a chance, up to the end – usually through some kind of “rubber-banding” technique that causes the computer-controlled cars to speed up or slow down based on how the players are doing. Games like Mario Kart take this a step further, offering pickups that are weighted so that if you’re behind, you’re likely to get something that lets you catch up, while if you’re ahead you’ll get something less valuable, making it harder for you to fight for first. On the one hand, this provides an interesting tension: players in the lead know that they just have to keep the lead for a little bit longer, while players who are behind realize that time is running out and they have to close the gap quickly. On the other hand, the way most racing games do this feels artificial to a lot of players, because it feels like a player’s standing in the race is always being modified by factors outside their control. Since the game’s length is essentially capped by the number of laps, the players are trying to exchange positions before the race ends, so you get an interesting progression curve where players are all moving towards the end at about the same rate.

Notice that this is actually the same progression pattern as Catan: both games are positive-sum with negative feedback. And yet, it feels very different as a player. I think this is mostly because in Catan, the negative feedback is under player control, while in Mario Kart a lot of it is under computer control.

Interestingly, you see the same pattern in stock car racing in real life. Auto racing has a negative feedback loop too, but it feels a lot more fair: the person in the lead is running into a bunch of air resistance so they’re burning extra fuel to maintain their high speed, which means they need more pit stops; meanwhile, the people drafting behind them are much more fuel-efficient and can take over the lead later. This isn’t arbitrary, it’s a mechanic that affects all players equally, and it’s up to each driver how much of a risk they want to take by breaking away from the pack. So again, this is something that feels more fair because the negative feedback is under player control.

Final Fantasy (and other computer/console RPGs)

In these games, the player is mostly progressing through the game by increasing their power level more than their skill level. Older games on consoles like the NES tended to be even more based on stats and less on skill than today’s games (i.e. they required a lot more grinding than players will put up with today). Most of these games made today do give the player more abilities as they progress through the experience levels, giving them more options and letting them increase their tactical/strategic skills. Progression and rewards also come from plot advancement and reaching new areas. Usually these games are paced on a slightly increasing curve, where each area takes a little more time than the last. As we discussed in an earlier week, there’s usually a positive feedback loop in that winning enough combats lets you level up, which in turn makes it easier for you to win even more combats. That is counteracted by the negative feedback loops that your enemies also get stronger, and that you need more and more victories to level up again if you stay in the same area too long, which means the actual gains are close to linear.

World of Warcraft (and other MMORPGs)

Massively multiplayer online games have a highly similar progression to CRPGs, except they then transition at the end to this elder game state, and at that point the concept of “progression” loses a lot of meaning. So our analysis looks much the same as it does with the more traditional computer RPGs, up until that point.

Nethack (and other roguelikes)

Then there are the so-called “rogue-like” games, which are a weird fusion of the leveling-up and stat-based progression of an RPG with the mercilessness of a retro arcade game. A single successful playthrough in Nethack looks similar to that of an RPG, with the player gaining power to meet the increasing level of challenge in the game, but actually reaching the level of skill to complete the game takes much, much longer. If you’ve never played these games, one thing you should know is that most of them have absolutely no problem killing you dead if you make the slightest mistake. And when I say “dead,” I mean they will literally delete your save file, permanently, and then you have to start over from scratch with a new character. So, like an arcade game, the player’s goal is to stay alive as long as possible and progress as far as possible, so progress is both a reward and a measure of skill. While there is a win condition, a lot of players simply never make it that far; keep in mind that taking a character all the way from the start to the end of the game may take dozens of hours, similar to a modern RPG, but with a ton of hard resets that collectively raise the player’s skill as they die (and then learn how not to die that way in the future).

Therefore, Nethack looks like an increasing player power and power/skill challenge over the course of a single playthrough… but over a player’s lifetime, you see these repeated increases punctuated by total restarts, with a slowly increasing player skill curve over time that lets the player survive for longer in each successive game.

FarmVille (and other social media games)

If roguelikes are the harsh sadists of the game design world, then the cute fluffy bunnies of the game world would be progression-based “social media” games like FarmVille. These are positive-sum games where you basically click to progress, it’s nearly impossible to lose any progress at all, and you’re always gaining something. More player skill simply means you progress at a faster rate. Eventually, you transition to the elder game, but in most of the games I’ve played this is a more subtle transition than with MMOs. FarmVille doesn’t have a level cap that I know of (if it does have one, it’s so ridiculously high that most people will never see it); it’ll happily let you keep earning experience and leveling up… although after a certain point you don’t really get any interesting rewards for doing so. After a while, the reward loop starts rewarding you less and less frequently, and you finish earning all the ribbons or trophies or achievements or whatever. It’s not that you can’t progress any further, but that the game doesn’t really reward you for progression as much, so at some point the player decides that further progression just isn’t worth it to them, and they either stop playing or they start playing in a different way. If they start playing differently, that’s where the elder game comes in. Interestingly, the player’s actions in the elder game still do cause progression.

If You’re Working On a Game Now…

If you’re working on a game right now, and that game has progression mechanics, I want you to ask yourself some design questions about the nature of that progression:

  • What is the desired play length of this game? Why? Really challenge yourself here – could you justify a reason to make it twice as long or half as long? If your publisher (or whoever) demanded that you do so anyway, what else would need to change about the nature of the game in order to compensate?
  • Does the game actually play that long? How do you know?
  • If the game is split up into phases, areas, locations or whatever, how long are they? Do they tend to get longer over time, or are there some that are a lot longer (or shorter) than those that come immediately before or after? Is this intentional? Is it justifiable?
  • Is your game positive-sum, negative-sum, or zero-sum? Do any phases of the game have positive or negative feedback loops? How do these affect total play time?

Homework

Back in week 2, your “homework” was to analyze a progression mechanic in a game. In particular, you analyzed the progression of player power relative to the power level of the game’s challenges, over time, in order to identify weak spots where the player would either go through an area too quickly because they’re too powerful by the time they get there, or they’re stuck grinding because they’re behind the curve and have to catch up.

It’s time to revisit that analysis with what we now know about pacing. This week, I’d like you to analyze the reward structure in the game. Consider all kinds of rewards: power-based rewards, level progression, plot advancement, and anything else you can identify that applies to your game:

  • Which of these is truly random (such as randomized loot drops), and which of them only seem random to the player on their first playthrough (they’re placed in a specific location by the level designer, but the player has no way of knowing ahead of time how quickly they’ll find these things)… and are there any rewards that happen on a fixed schedule that’s known to the player?
  • How often do rewards happen? Do they happen more frequently at the beginning? Are there any places where the player goes a long time with relatively few rewards? Do certain kinds of rewards seem to happen more frequently than others, or at certain times?

Now, look back again at your original predictions, where you felt that the game either went too quickly, or more likely seemed to drag on forever, at certain points (based on your gut reaction and memory). Do these points coincide with more or fewer rewards in the game? Now ask yourself if the problem could have been solved by just adding a new power-up at a certain place, instead of fixing the leveling or progression curve.

Level 6: Situational Balance

August 11, 2010

Readings/Playings

None this week (other than this blog post).

Answers to Last Week’s Question

If you want to check your answer from last week:

Analyzing Card Shuffles

For a 3-card deck, there are six distinct shuffling results, all equally likely. If the cards are A, B, and C, then these are: ABC, ACB, BAC, BCA, CAB, CBA. Thus, for a truly random shuffler, we would expect six outcomes (or a multiple of six), with each of these results being equally likely.

Analyzing Algorithm #1:

First you choose one of three cards in the bottom slot (A, B, or C). Then you choose one of the two remaining cards to go in the middle (if you already chose A for the bottom, then you would choose between B and C). Finally, the remaining card is put on top (no choice involved). These are separate, (pseudo)random, independent trials, so to count them we multiply: 3x2x1 = 6. If you actually go through the steps to enumerate all six possibilities, you’ll find they correspond to the six outcomes above. This algorithm is correct, and in fact is one of the two “standard” ways to shuffle a deck of cards. (The other algorithm is to generate a pseudorandom number for each card, then put the cards in order of their numbers. This second method is the easiest way to randomly order a list in Excel, using RAND(), RANK() and VLOOKUP().)
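If it helps to see these in code, here is a minimal Python sketch of both standard methods (the function names are mine, not part of the original exercise):

```python
import random

def shuffle_in_place(deck):
    """Algorithm #1: fill each slot by picking one of the remaining cards at random
    (the Fisher-Yates shuffle). Every ordering of the deck is equally likely."""
    deck = list(deck)
    for i in range(len(deck) - 1, 0, -1):
        j = random.randint(0, i)              # pick among the cards not yet placed
        deck[i], deck[j] = deck[j], deck[i]
    return deck

def shuffle_by_random_key(deck):
    """The second method: give each card a random number, then sort by that number
    (the code equivalent of the RAND/RANK/VLOOKUP approach in Excel)."""
    return [card for _, card in sorted((random.random(), card) for card in deck)]

print(shuffle_in_place("ABC"))
print(shuffle_by_random_key("ABC"))
```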

Analyzing Algorithm #2:

First of all, if a single shuffle is truly random, then repeating it 50 times is not going to make it any more random, so this is just a waste of computing resources. And if the shuffle isn’t random, then repeating may or may not make it any better than before, and you’d do better to fix the underlying algorithm rather than covering it up.

What about the inner loop? First we choose one of the three cards to go on bottom, then one of the three to go in the middle, and then one of the three to go on top. As before these are separate independent trials, so we multiply 3x3x3 = 27.

Immediately we know there must be a problem, since 6 does not divide evenly into 27. Therefore, without having to go any further at all, we know that some shuffles must be more likely than others. So it would be perfectly valid to stop here and declare this algorithm “buggy.” If you’re sufficiently determined, you could actually trace through this algorithm all 27 times to figure out all outcomes, and show which shuffles are more or less likely and by how much. A competitive player, upon learning the algorithm, might actually run such a simulation for a larger deck in order to gain a slight competitive advantage.
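If you are that determined, the tracing is easy to automate. The sketch below assumes one common reading of the flawed algorithm (swap each slot with a uniformly random slot, which also gives 3×3×3 = 27 equally likely execution paths); if last week’s Algorithm #2 differed in its details the counts will differ, but the counting approach is the same:

```python
from collections import Counter
from itertools import product

def buggy_shuffle(deck, choices):
    """One reading of the flawed inner loop (an assumption): for each slot in turn,
    swap it with a randomly chosen slot. 'choices' is the sequence of random picks."""
    deck = list(deck)
    for i, j in enumerate(choices):
        deck[i], deck[j] = deck[j], deck[i]
    return "".join(deck)

# Enumerate all 27 equally likely execution paths and count which orderings result.
counts = Counter(buggy_shuffle("ABC", path) for path in product(range(3), repeat=3))
for ordering in sorted(counts):
    print(ordering, counts[ordering], "out of 27")
```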

This Week

This is a special week. We spent two weeks near the start of the course talking about balancing transitive games, and then two more weeks talking about probability. This week we’re going to tie the two together, and put a nice big bow on it.

This week is about situational balancing. What is situational balancing? What I mean is that sometimes, we have things that are transitive, sort of, but their value changes over time or depends on the situation.

One example is area-effect damage. You would expect something that does 500 damage to multiple enemies at once to be more valuable than something that does 500 damage to just a single target, other things being equal. But how much more valuable is it? Well, it depends. If you’re only fighting a single enemy, one-on-one, it isn’t any more valuable. If you’re fighting fifty enemies all clustered together in a swarming mass, it’s 50x more valuable. Maybe at some points in your game you have swarms of 50 enemies, and other times you’re only fighting a single lone boss. How do you balance something like that?

Or, consider an effect that depends on what your opponent does. For example, there’s a card in Magic: the Gathering called Karma, that does 1 damage to your opponent each turn for each of their Swamps in play. Against a player who has 24 Swamps in their deck, this single card can probably kill them very dead, very fast, all on its own. Against a player with no Swamps at all, the card is totally worthless. (Well, it’s worthless unless you have other cards in your deck that can turn their lands into Swamps, in which case the value of Karma is dependent on your ability to combine it with other card effects that you may or may not draw.) In either case, the card’s ability to do damage changes from turn to turn and game to game.

Or, think of healing effects in most games, which are completely worthless if you’re fully healed already, but which can make the difference between winning and losing if you’re fighting something that’s almost dead, and you’re almost dead, and you need to squeeze one more action out of the deal to kill it before it kills you.

In each of these cases, finding the right cost on your cost curve depends on the situation within the game, which is why I call it situational balancing. So it might be balanced, or underpowered or overpowered, all depending on the context. How do we balance something that has to have a fixed cost, even though it has a benefit that changes? The short answer is, we use probability to figure out the expected value of the thing, which is why I’ve spent two weeks building up to all of this. The long answer is… it’s complicated, which is why I’m devoting an entire tenth of this course to the subject.

Playtesting: The Ultimate Solution?

There are actually a lot of different methods of situational balancing. Unfortunately, since the answer to “how valuable is it?” is always “it depends!” the best way to approach this is thorough playtesting to figure out where various situations land on your cost curve. But as before, we don’t always have unlimited playtest budgets in the Real World, and even if we do have unlimited budgets we still have to start somewhere, so we at least need to make our best guess, and there are a few ways to do that. So “playtest, playtest, playtest” is good advice, and a simple answer, but not a complete answer.

A Simple Example: d20

Let’s start with a very simple situation. This is actually something I was asked once on a job interview (and yes, I got the job), so I know it must be useful for something.

What follows is a very, very oversimplified version of the d20 combat system, which was used in D&D 3.0 and up. Here’s how it works: each character has two stats, their Base Attack Bonus (or “BAB,” which defaults to 0) and their Armor Class (or “AC,” which defaults to 10). Each round, each character gets to make one attack against one opponent. To attack, they roll 1d20, add their BAB, and compare the total to the target’s AC. If the attacker’s total is greater or equal, they hit and do damage; otherwise, they miss and nothing further happens. So, by default with no bonuses, you should be hitting about 55% of the time.

Here’s the question: are BAB and AC balanced? That is, if I gave you an extra +1 to your attack, is that equivalent to +1 AC? Or is one of those more powerful than the other? If I were interviewing you for a job right now, what would you say? Think about it for a moment before reading on.

What’s the central resource?

Here’s my solution (yours may vary). First, I realized that I didn’t know how much damage you did, or how many hit points you had (that is, how many times could you survive being hit, and how many times would you have to hit something else to kill it). But assuming these are equal (or equivalent), it doesn’t actually matter. Whether you have to hit an enemy once or 5 times or 10 times to kill it, as long as you are equally vulnerable, you’re going to hit the enemy a certain percentage of the time. And they’re going to hit you a certain percentage of the time. What it comes down to is this: you want your hit percentage to be higher than theirs. Hit percentage is the central resource that everything has to be balanced against.

If both the enemy and I have a 5% chance of hitting each other, on average we’ll both hit each other very infrequently. If we both have a 95% chance of hitting each other, we’ll hit each other just about every turn. But either way, we’ll exchange blows about as often as not, so there’s no real advantage to one or the other.

Using the central resource to derive balance

So, are AC and BAB balanced? +1 BAB gives me +5% to my chance to hit, and +1 AC gives me -5% to my opponent’s chance to hit, so if I’m fighting against a single opponent on my own, one on one, the two are indeed equivalent. Either way, our relative hit percentages are changed by exactly the same amount. (One exception is if either hit percentage goes above 100% or below 0%, at which point extra plusses do nothing for you. This is probably why the default is +0 BAB, 10 AC, so that it would take a lot of bonuses and be exceedingly unlikely to ever reach that point. Let’s ignore this extreme for the time being.)

What if I’m not fighting one on one? What if my character is alone, and there are four enemies surrounding me? Now I only get to attack once for every four times the opponents attack, so +1 AC is much more powerful here, because I’m making a roll that involves my AC four times as often as I make a roll involving my BAB.

Or what if it’s the other way around, and I’m in a party of four adventurers ganging up on a lone giant? Here, assuming the giant can only attack one of us at a time, +1 BAB is more powerful because each of us is attacking every round, but only one of us is actually getting attacked.

In practice, in most D&D games, GMs are fond of putting their adventuring party in situations where they’re outnumbered; it feels more epic that way. (This is from my experience, at least.) This means that in everyday use, AC is more powerful than BAB; the two stats are not equivalent on the cost curve, even though the game treats them as if they were.

Now, as I said, this is an oversimplification; it does not reflect on the actual balance of D&D at all. But we can see something interesting even from this very simple system: the value of attacking is higher if you outnumber the opponent, and the value of defending is higher if you’re outnumbered. And if we attached numerical cost and benefit values to hit percentage, we could even calculate how much more powerful these values are, as a function of how much you outnumber or are outnumbered.
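Here is a rough sketch of that calculation in Python, under the same simplifying assumptions (one attack per character per round, everyone otherwise equivalent, and “net expected hits per round” standing in for the value of hit percentage):

```python
def hit_chance(bab, ac):
    """Chance that 1d20 + BAB meets or beats AC, clamped to [0, 1]."""
    return min(max((21 + bab - ac) / 20, 0.0), 1.0)

def net_hits_per_round(my_bab, my_ac, enemy_bab, enemy_ac, num_enemies):
    """Expected hits I land minus expected hits I take, per round.
    Assumes I attack one enemy per round and every enemy attacks me."""
    return hit_chance(my_bab, enemy_ac) - num_enemies * hit_chance(enemy_bab, my_ac)

for n in (1, 2, 4):
    base     = net_hits_per_round(0, 10, 0, 10, n)
    plus_bab = net_hits_per_round(1, 10, 0, 10, n) - base   # value of +1 BAB
    plus_ac  = net_hits_per_round(0, 11, 0, 10, n) - base   # value of +1 AC
    print(f"outnumbered {n} to 1: +1 BAB adds {plus_bab:+.2f}, +1 AC adds {plus_ac:+.2f}")
```

Against one opponent the two bonuses come out identical (+0.05 each); against four opponents the AC bonus is worth four times as much, which is exactly the argument above in numbers.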

Implications for Game Design

If you’re designing a game where you know what the player will encounter ahead of time – say, an FPS or RPG with hand-designed levels – then you can use your knowledge of the upcoming challenges to balance your stats. In our simplified d20 system, for example, if you know that the player is mostly fighting combats where they’re outnumbered, you can change AC on your cost curve to be more valuable and thus more costly.

Another thing you can do, if you wanted AC and BAB to be equivalent and balanced with each other, is to change the mix of encounters in your game so that the player is outnumbered about half the time, and they outnumber the enemy about half the time. Aside from making your stats more balanced, this also adds some replay value to your game: going through the game with high BAB is going to give a very different experience than going through the game with a high AC; in each case, some encounters are going to be a lot harder than others, giving the player a different perspective and different level of challenge in each encounter.

The Cost of Switching

What if D&D worked in such a way that you could freely convert AC to BAB at the start of a combat, and vice versa? Now all of a sudden they are more or less equivalent to each other, and suddenly a +1 bonus to either one is much more powerful and versatile relative to any other bonuses in the rest of the game.

Okay, maybe you can’t actually do that in D&D, but there are plenty of games where you can swap out one situational thing for another. First-Person Shooters are a common example, where you might be carrying several weapons at a time: maybe a rocket launcher against big slow targets or clusters of enemies, a sniper rifle to use at a distance against single targets, and a knife for close-quarters combat. Each of these weapons is situationally useful some of the time, but as long as you can switch from one to another with minimal delay, it’s the sum of weapon capabilities that matters rather than individual weapon limitations.

That said, suppose we made the cost of switching weapons higher, maybe a 10-second delay to put one weapon in your pack and take out another (which when you think about it, seems a lot more realistic – I mean, seriously, if you’re carrying ten heavy firearms around with you and can switch without delay, where exactly are you carrying them all?). Now all of a sudden the limitations of individual weapons play a much greater role, and a single general-purpose weapon may end up becoming more powerful than a smorgasbord of situational weapons. But if instead you have instant real-time weapon change, a pile of weapons where each is the perfect tool for a single situation is much better than a single jack-of-all-trades, master-of-none weapon.

What’s the lesson here? We can mess with the situational balance of a game simply by modifying the cost of switching between different tools, weapons, stat distributions, or overall strategies.

That’s fine as a general theory, but how do the actual numbers work? Let’s see…

Example: Inability to Switch

Let’s take one extreme case, where you can’t switch strategies at all. An example might be an RPG where you’re only allowed to carry one weapon and equip one armor at a time, and whenever you acquire a new one it automatically gets rid of the old. Here, the calculation is pretty straightforward, because this is your only option, so we have to look at it across all situations. It’s a lot like an expected value calculation.

So, you ask: in what situations does this object have a greater or lesser value, and by how much? How often do you encounter those situations? Multiply and add it all together.

Here’s a simple, contrived example to illustrate the math: suppose you have a sword that does double damage against Dragons. Suppose 10% of the meaningful combats in your game are against Dragons. Let’s assume that in this game, damage has a linear relationship to your cost curve, so doubling the damage of something makes it exactly twice as good.

So, 90% of the time the sword is normal, 10% of the time it’s twice as good. 90%*1.0 + 10%*2.0 = 110% of the cost. So in this case, “double damage against dragons” is a +10% modifier to the base cost.

Here’s another example: you have a sword that is 1.5x as powerful as the other swords in its class, but it only does half damage against Trolls. And let’s further assume that half damage is actually a huge liability; it takes away your primary way to do damage, so you have to rely on other sources that are less efficient, and it greatly increases the chance you’re going to get yourself very killed if you run into a troll at a bad time. So in this case, let’s say that “half damage” actually makes the sword a net negative. But let’s also say that trolls are pretty rare, maybe only 5% of the encounters in the game are against trolls.

So if a typical sword at this level has a benefit of 100 (according to your existing cost curve), a 1.5x powerful sword would have a benefit of 150, and maybe a sword that doesn’t work actually has a cost of 250, because it’s just that deadly to get caught with your sword down, so to speak. The math says: 95%*150 + 5%*(-250) = 130. So this sword has a benefit of 130, or 30% more than a typical sword.
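Since this is just an expected-value calculation, it is trivial to put in code. Here is a minimal sketch using the two contrived sword examples above (same made-up numbers, not from any real game):

```python
def situational_value(cases):
    """Expected benefit of an object you can't swap out:
    sum of (probability of situation) * (benefit in that situation)."""
    return sum(p * benefit for p, benefit in cases)

# Sword that does double damage against Dragons (10% of fights), normal otherwise:
print(situational_value([(0.90, 1.0), (0.10, 2.0)]))   # 1.10 -> +10% over the base cost

# 1.5x sword that's a net liability (-250 vs. a 100-benefit baseline) against Trolls (5%):
print(situational_value([(0.95, 150), (0.05, -250)]))  # 130 -> 30% better than a typical sword
```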

Again, you can see that there are actually a lot of ways you can change this, a lot of design “knobs” you can turn to mess with the balance here. You can obviously change the cost and benefit of an object, maybe adjusting the damage in those special situations to make it better or worse when you have those rare cases where it matters, or adjusting the base abilities to cover every other situation, just as you normally would with transitive mechanics. But with situational balance, you can also change the frequency of situations, say by increasing the number of trolls or dragons the player encounters – either in the entire game, or just in the area surrounding the place where they’d get that special sword. (After all, if the player is only expected to use this sword that loses to trolls in one region in the game that has no trolls, even if the rest of the game is covered in trolls, it’s not really much of a drawback, is it?)

Another Example: No-Cost Switching

Now let’s take the other extreme, where you can carry as many situational objects as you want and use or switch between them freely. In this case, the limitations don’t matter nearly as much as the strengths of each object, because there is no opportunity cost to gaining a new capability. In this case, we look at the benefits of all of the player’s objects collected so far, and figure out what this new object will add that can’t be done better by something else. Multiply the extra benefit by the percentage of time that the benefit is used, and there is your additional benefit from the new object. So it is a similar calculation, except in most cases we ignore the bad parts, because you can just switch away from those.

In practice, it’s usually not that simple. In a lot of games, the player may be able to use suboptimal strategies if they haven’t acquired exactly the right thing for this one situation (in fact, it’s probably better for most games to be designed that way). Also, the player may pick up new objects in a different order on each playthrough. End result: you don’t actually know how often something will be used, because it might be used more or less often depending on what other tools the player has already acquired, and also how likely they are to use this new toy in situations where it’s not perfect (they haven’t got the perfect toy for that situation yet) but it’s at least better than their other toys.

Let’s take an example. Maybe you have a variety of swords, each of which does major extra damage against a specific type of monster, a slight bump in damage against a second type of monster, and is completely ineffective against a third type of monster. Suppose there are ten of these swords, and ten monster types in your game, and the monster types are all about as powerful and encountered about as frequently. It doesn’t take a mathematician to guess that these swords should all cost the same.

However, we run into a problem. Playing through this game, we would quickly realize that they do not actually give equal value to the player at any given point in time.

For example, say I’ve purchased a sword that does double damage against Dragons and 1.5x damage against Trolls. Now there’s a sword out there that does double damage against Trolls, but that sword is no longer quite as useful to me as it used to be; I’m now going from a 1.5x multiplier to 2x, not 1x to 2x, so there’s less of a gain there. If I fully optimize, I can probably buy about half the swords in the game and have at least some kind of improved multiplier against most or all monsters, and from that point, extra swords have diminishing returns.
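One way to see the diminishing returns is to compute the marginal value of each new sword: for each monster type, only the improvement over the best multiplier you already own counts. Here is a rough sketch, assuming ten equally common monster types and made-up sword stats:

```python
def marginal_value(owned_swords, new_sword, encounter_rates):
    """Extra benefit a new sword adds on top of what you already own.
    Swords are dicts of monster type -> damage multiplier (1.0 if unlisted)."""
    gain = 0.0
    for monster, rate in encounter_rates.items():
        best_owned = max((s.get(monster, 1.0) for s in owned_swords), default=1.0)
        gain += rate * max(new_sword.get(monster, 1.0) - best_owned, 0.0)
    return gain

monsters = ["Dragon", "Troll", "Orc", "Ghost", "Slime",
            "Golem", "Harpy", "Wraith", "Basilisk", "Imp"]
rates = {m: 0.1 for m in monsters}                  # ten types, equally common

dragon_sword = {"Dragon": 2.0, "Troll": 1.5}        # double vs. Dragons, 1.5x vs. Trolls
troll_sword  = {"Troll": 2.0, "Orc": 1.5}           # double vs. Trolls, 1.5x vs. Orcs

print(marginal_value([], troll_sword, rates))             # value as your first sword
print(marginal_value([dragon_sword], troll_sword, rates)) # smaller: Trolls only go 1.5x -> 2x
```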

How do you balance a system like that? There are a few methods for this that I could think of, and probably a few more I couldn’t. It all depends on what’s right for your game.

  • Give a discount: One way is to actually change costs on the fly. Work it into your narrative that the more swords you buy from this merchant, the more he discounts additional swords because you’re such a good customer (you could even give the player a “customer loyalty card” in the game and have the merchant put stamps on it; some players love that kind of thing).
  • Let the player decide: Or, you could balance everything assuming the player has nothing, which means that yes, there will be a law of diminishing returns here, and it’s up to the player to decide how many is enough; consider that part of the strategy of the game.
  • Let the increasing money curve do the “discount” work for you: Maybe if the player is getting progressively more money over time, keeping the costs constant will itself be a “diminishing” cost to compensate, since each sword takes the player less time to earn enough to buy it. Tricky!
  • Discount swords found later in the game: Or, you can spread out the locations in the game where the player gets these swords, so that you know they’ll probably buy certain ones earlier in the game and other ones later. You can then cost them differently because you know that when the player finds certain swords, they’ll already have access to other ones, and you can reduce the costs of the newer ones accordingly.

Obviously, for games where you can switch between objects but there’s some cost to switching (a time cost, a money cost, or whatever), you’ll use a method that lies somewhere between the “can’t change at all” and “can change freely and instantly” extreme scenarios.

The Cost of Versatility

Now we’ve touched on this concept of versatility from a player perspective, if they are buying multiple items in the game that make their character more versatile and able to handle more situations. What about when the objects themselves are versatile? This happens a lot in real-time and turn-based strategy games, for example, where individual units may have several functions. So, maybe archers are really strong against fliers and really weak against footmen (a common RTS design pattern), but maybe you want to make a new unit type that is strong against both fliers and footmen, just not as strong against fliers as archers are. So maybe an archer can take down a flier and take next to no damage, but this new unit would go down to about half HP in combat with a flier (it would win, but at a cost). This new unit isn’t as good against fliers, but it is good for other things, so it’s more versatile.

Taking another example, in a first-person shooter, knives and swords are usually the best weapons when you’re standing next to an opponent, while sniper rifles are great from a distance, but a machine gun is moderately useful at most ranges (but not quite as good as anything else). So you’ll never get caught with a totally ineffective weapon if you’ve got a machine gun, but you’ll also never have the perfect weapon for the job if you’re operating at far or close range a lot.

How much are these kinds of versatility worth?

Here’s the key: versatility has value in direct proportion to uncertainty. If you know ahead of time you’re playing on a small map with tight corridors and lots of twists and turns, knives are going to be a lot more useful than sniper rifles. On a map with large, open space, it’s the other way around. If you have a single map with some tight spaces and some open areas, a versatile weapon that can serve both roles (even if only mediocre) is much more valuable.

Suppose instead you have a random map, so there’s a 50/50 chance of getting either a map optimized for close quarters or a map optimized for distance attacks. Now what’s the best strategy? Taking the versatile weapon that’s mildly useful in each case but not as good as the best weapon means you’ll win against people who guessed poorly and lose against people who guessed well. There is no best strategy here; it’s a random guess. This kind of choice is actually not very interesting: the players must choose blindly ahead of time, and then most of the game comes down to who guessed right. Unless they’re given a mechanism for changing weapons during play in order to adjust to the map, or they can take multiple weapons with them, or something – ah, so we come back to the fact that versatility comes in two flavors:

  • The ability of an individual game object to be useful in multiple situations
  • The ability of the player to swap out one game object for another.

The more easily a player can exchange game objects, the less valuable versatility in a single object becomes.

Shadow Costs

Now, before we move on with some in-depth examples, I want to write a little bit about different kinds of costs that a game object can have. Strictly speaking, I should have talked about this when we were originally talking about cost curves, but in practice these seem to come up more often in situational balancing than in other areas, so I’m bringing it up now.

Broadly speaking, we can split the cost of an object into two categories: the resource cost, and Everything Else. If you remember when doing cost curves, I generally said that any kind of drawback or limitation is also a cost, so that’s what I’m talking about here. Economists call these shadow costs, that is, costs that are hidden behind the dollar cost. If you buy a cheap clock radio for $10, there is an additional cost in time (and transportation) to go out and buy the thing. If it doesn’t go off one morning when you really need it to, because the UI is poorly designed and you set it for PM instead of AM, then missing an appointment because of that poor design costs you additional time and money. If it then breaks in a few months because of its cheap components and you have to replace or return it, that’s another time cost, and so on. So it looks like it costs $10, but the actual cost is more, because it has these shadow costs that a better-quality clock radio might not have.

In games, there are two kinds of shadow costs that seem to come up a lot in situational balance: sunk costs and opportunity costs. Let me explain each.

Sunk Costs

By sunk costs, I’m talking about some kind of setup cost that has to be paid first, before you gain access to the thing you want to buy in the first place. One place you commonly see these is in tech trees in RTSs, MMOs and RPGs. For example, in an RTS, in order to build certain kinds of units, you first typically need to build a structure that supports them. The structure may not do anything practical or useful for you, other than allowing you to build a special kind of unit. As an example, each Dragoon unit in StarCraft costs 125 minerals and 50 gas (that is its listed cost), but you had to build a Cybernetics Core to build Dragoons and that cost 200 minerals, and that cost is in addition to each Dragoon. Oh, and by the way, you can’t build a Cybernetics Core without also building a Gateway for 150 minerals, so that’s part of the cost as well. So if you build all these structures, use them for nothing else, and then create a single Dragoon, that one guy costs you a total of 475 minerals and 50 gas, which is a pretty huge cost compared to the listed cost of the unit itself!

Of course, if you build ten Dragoons, then the cost drops to 160 minerals and 50 gas each, a lot closer to the listed cost, because you only have to pay the build cost for those buildings once (well, in most cases anyway). And if you get additional benefits from those buildings, like them letting you build other kinds of units or structures or upgrades that you take advantage of, then effectively part of the cost of those buildings goes to other things, so you can consider it to not even be part of the Dragoon’s cost.

But still, you can see that if you have to pay some kind of cost just for the privilege of paying an additional cost, you need to be careful to factor that into your analysis. When the cost may be “amortized” (spread out) over multiple purchases, the original sunk cost has to be balanced based on its expected value: how many Dragoons do you expect to build in typical play? When costing Dragoons, you need to factor in the up-front costs as well.
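Here is a quick sketch of that amortization using the StarCraft numbers above (ignoring the gas cost, which isn’t affected by the prerequisite buildings):

```python
def amortized_cost(listed_cost, sunk_cost, expected_count):
    """Effective per-unit cost once the one-time prerequisite cost is
    spread over the number of units you expect to build."""
    return listed_cost + sunk_cost / expected_count

GATEWAY, CYBER_CORE = 150, 200      # one-time prerequisite structures (minerals)
DRAGOON_MINERALS = 125              # listed per-unit cost (plus 50 gas, untouched here)

for n in (1, 5, 10, 20):
    cost = amortized_cost(DRAGOON_MINERALS, GATEWAY + CYBER_CORE, n)
    print(n, "Dragoons ->", cost, "minerals each")   # 1 -> 475, 10 -> 160, and so on
```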

You can also look at this the other way, if you’re costing the prerequisite (such as those structures you had to build in order to buy Dragoon units): not just “what does this do for me now” but also “what kinds of options does this enable in the future”? You tend to see this a lot in tech trees. For example, in some RPGs or MMOs with tech trees, you might see some special abilities you can purchase on level-up that aren’t particularly useful on their own, maybe they’re even completely worthless… but they’re prerequisites for some really powerful abilities you can get later. This can lead to interesting kinds of short-term/long-term decisions, where you could take a powerful ability now, or a less powerful ability now to get a really powerful ability later.

You can see sunk costs in other kinds of games, too. I’ve seen some RPGs where the player has a choice between paying for consumable or reusable items. The consumables are much cheaper of course, but you only get to use them once. So for example, you can either buy a Potion for 50 Gold, or a Potion Making Machine for 500 Gold, and in that case you’d buy the machine if you expect to create more than ten Potions. Or you pay for a one-way ticket on a ferry for 10 Gold, or buy a lifetime pass for 50 Gold, and you have to ask yourself whether you expect to ride the ferry more than five times. Or you consider purchasing a Shop Discount Card which gives 10% off all future purchases, but it costs you 1000 Gold to purchase the discount in the first place, so you have to consider whether you’ll spend enough at that shop to make the discount card pay for itself (come to think of it, the choice to buy one of those discount cards at the real-world GameStop down the street requires a similar calculation). These kinds of choices aren’t always that interesting, because you’re basically asking the player to estimate how many times they’ll use something… but without telling them how much longer the game is or how many times they can expect to use the reusable thing, so it’s a kind of blind decision. Still, as designers, we know the answer, and we can do our own expected-value calculation and balance accordingly. If we do it right, our players will trust that the cost is relative to the value by the time they have to make the buy-or-not decision in our games.
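The designer-side version of that expected-value check is just a break-even calculation. A minimal sketch, using the example numbers above:

```python
def break_even_uses(one_time_cost, per_use_cost):
    """Number of uses at which the reusable option starts paying for itself."""
    return one_time_cost / per_use_cost

print(break_even_uses(500, 50))   # Potion Making Machine vs. 50-Gold Potions -> 10 uses
print(break_even_uses(50, 10))    # lifetime ferry pass vs. 10-Gold tickets   -> 5 rides

def discount_break_even(card_cost, discount):
    """Total future spending needed before a percentage-off card pays for itself."""
    return card_cost / discount

print(discount_break_even(1000, 0.10))  # 10% Shop Discount Card -> 10,000 Gold of purchases
```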

Opportunity Costs

The second type of hidden cost, which I’m calling an opportunity cost here, is the cost of giving up something else, reducing your versatility. An example, also from games with tech trees, might be if you reach a point where you have to choose one branch of the tech tree, and if you take a certain feat or learn a certain tech or whatever, it prevents you from learning something else. If you learn Fire magic, you’re immediately locked out of all the Ice spells, and vice versa. This happens in questing systems, too: if you don’t blow up Megaton, you don’t get the Tenpenny Tower quest. These can even happen in tabletop games: one CCG I worked on had mostly neutral cards, but a few that were “good guy” cards and a few that were “bad guy” cards, and if you played any “good guy” cards it prevented you from playing “bad guy” cards for the rest of the game (and vice versa), so any given deck basically had to only use good or bad but not both. Basically, any situation where taking an action in the game prevents you from taking certain other actions later on, is an opportunity cost.

In this case, your action has a special kind of shadow cost: in addition to the cost of taking the action right now, you also pay a cost later in decreased versatility (not just resources). It adds a constraint to the player. How much is that constraint worth as a cost? Well, that’s up to you to figure out for your particular game situation. But remember that it’s not zero, and be sure to factor this into your cost curve analysis.

Versatility Example

How do the numbers for versatility actually work in practice? That depends on the nature of the versatility and the cost and difficulty of switching.

Here’s a contrived example: you’re going into a PvP arena, and you know that your next opponent either has an Ice attack or a Fire attack, but never both and never neither – always one or the other. You can buy a Protection From Ice enchantment which gives you protection from Ice attacks, or a Protection From Fire enchantment which gives you protection from Fire attacks (or both, if you want to be sure, although that’s kind of expensive). Let’s say both enchantments cost 10 Gold each.

Now, suppose we offer a new item, Protection From Elements, which gives you both enchantments as a package deal. How much should it cost? (“It depends!”) Okay, what does it depend on?

If you’ve been paying attention, you know the answer: it depends on how much you know about your next opponent up front, and it depends on the cost of switching from one to the other if you change your mind later.

If you know ahead of time that they will be, say, a Fire attack, then the package should cost the same as Protection from Fire: 10 Gold. The “versatility” here offers no added value, because you already know the optimal choice.

If you have no way of knowing your next opponent’s attack type until it’s too late to do anything about it, and you can’t switch protections once you enter the arena, then Protection From Elements should cost 20 Gold, the same as buying both. Here, versatility offers you exactly the same added value as just buying both things individually. There’s no in-game difference between buying them separately or together.

Ah, but what if you have the option to buy one before the combat, and then if the combat starts and you realize you guessed wrong, you can immediately call a time-out and buy the other one? In this case, you would normally spend 10 Gold right away, and there’s a 50% chance you’ll guess right and only have to spend 10 Gold, and a 50% chance you’ll guess wrong and have to spend an additional 10 Gold (or 20 Gold total) to buy the other one. The expected value here is (50%*10) + (50%*20) = 15 Gold, so that is what the combined package should cost in this case.

What if the game is partly predictable? Say you may have some idea of whether your opponent will use Fire or Ice attacks, but you’re not completely sure. Then the optimal cost for the package will be somewhere between the extremes, depending on exactly how sure you are.
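Here is a small sketch of that sliding scale, assuming the “buy one up front, then buy the other mid-fight if you guessed wrong” rules from the example above (10 Gold per single enchantment). With no information it reproduces the 15 Gold answer, and with perfect information it drops to 10 Gold:

```python
def fair_package_price(single_price, p_fire):
    """Fair price for the combined enchantment when switching mid-fight is allowed.
    p_fire is your estimate that the opponent uses Fire attacks; you buy the more
    likely protection first and the other one only if you guessed wrong."""
    p_wrong = 1 - max(p_fire, 1 - p_fire)          # chance your first guess misses
    return single_price + p_wrong * single_price   # expected total spend without the package

for p in (0.5, 0.75, 0.9, 1.0):
    print(f"P(Fire) = {p:.2f} -> package worth about {fair_package_price(10, p):.1f} Gold")
```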

Okay, so that last one sounds kind of strange as a design. What might be a situation in a real game where you have some idea but not a complete idea of what your opponent is bringing against you? As one example, in an RTS, I might see some parts of the army my opponent is fielding against me, so that gives me a partial (but not complete) sense of what I’m up against, and I can choose what units to build accordingly. Here, a unit that is versatile offers some value (my opponent might have some tricks up their sleeve that I don’t know about yet) but not complete value (I do know SOME of what the opponent has, so there’s also value in building troops that are strong against their existing army).

Case Studies in Situational Balance

So, with all of that said, let’s look at some common case studies.

Single-target versus area-effect (AoE)

For things that do damage to multiple targets instead of just one at a time, other things being equal, how much of a benefit is that splash damage?

The answer is generally, take the expected number of things you’ll hit, and multiply. So, if enemies come in clusters from 1 to 3 in the game, evenly distributed, then on average you’ll hit 2 enemies per attack, doing twice the damage you would normally, so splash damage is twice the benefit.
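In code, that is just another expected-value computation over the cluster sizes you expect the player to see (subject to the caveats in the next paragraph):

```python
def aoe_benefit_multiplier(cluster_size_probs):
    """Expected number of targets hit by an area attack, i.e. how many times
    more benefit it gives than a single-target attack, other things being equal."""
    return sum(size * p for size, p in cluster_size_probs.items())

# Enemies come in groups of 1 to 3, evenly distributed:
print(aoe_benefit_multiplier({1: 1/3, 2: 1/3, 3: 1/3}))   # about 2.0 -> twice the benefit
```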

A word of warning: “other things being equal” is really tricky here, because generally other things aren’t equal in this case. For example, in most games, enemies don’t lose offensive capability until they’re completely defeated, so just doing partial damage isn’t as important as doing lethal damage. In this case, spreading out the damage slowly and evenly can be less efficient than using single-target high-power shots to selectively take out one enemy at a time, since the latter reduces the offensive power of the enemy force with each shot, while the area-effect attack doesn’t do that for awhile. Also, if the enemies you’re shooting at have varying amounts of HP, an area-effect attack might kill some of them off but not all, reducing a cluster of enemies to a few corpses and a smaller group (or lone enemy), which then reduces the total damage output of your subsequent AoE attacks – that is, AoE actually makes itself weaker over time as it starts working! So this is something you have to be careful of as well: looking at typical encounters, how often enemies will be clustered together, and also how long they’ll stay that way throughout the encounter.

Attacks that are strong (or weak) against a specific enemy type

We did an example of this before, with dragons and trolls. Multiply the extra benefit (or liability) as if it were always in effect during all encounters, by the expected percentage of the time it actually will matter (that is, how often do you encounter the relevant type of enemy).

The trick here, as we saw in that earlier example, is you have to be very careful of what the extra benefit or liability is really worth, because something like “double damage” or “half damage” is rarely double or half the actual value.

Metagame objects that you can choose to use or ignore

Sometimes you have an object that’s sometimes useful and sometimes not, but it’s at the player’s discretion whether to use it – so if the situation doesn’t call for it, they simply don’t spend the resources.

Examples are situational weapons in an FPS that can be carried as an “alternate” weapon, specialized units in an RTS that can be built when needed and ignored otherwise, or situational cards in a CCG that can be “sideboarded” against relevant opponents. Note that in these cases, they are objects that depend on things outside of player control: what random map you’re playing on, what units your opponent is building, what cards are in your opponent’s deck.

In these cases, it’s tempting to cost them according to the likelihood that they will be useful. For example, if I have a card that does 10 damage against a player who is playing Red in Magic, and I know that most decks are 2 or 3 colors so maybe 40-50% of the time I’ll play against Red in open play, then we would cost this the same as a card that did 4 or 5 damage against everyone. If the player must choose to use it or not before play begins, with no knowledge of whether the opponent is playing Red or not, this would be a good method.

But in some of these cases, you do know what your opponent is doing. In a Magic tournament, matches are played best-of-3, and after the first game you are allowed to swap some cards into your deck from a “Sideboard”. You could put this 10-damage-to-Red card aside, not play with it your first game, and then bring it out in subsequent games only if your opponent is playing Red. Played this way, you are virtually assured that the card will work 100% of the time; the only cost to you is a discretionary card slot in your sideboard, which is a metagame cost. As we learned a few weeks ago, trying to cost something in the game based on the metagame is really tricky. So the best we can say is that it should cost a little less to compensate for the metagame cost, but it probably shouldn’t be half off like it would be if the player always had to use it… unless we want to really encourage its use as a sideboard card by intentionally undercosting it.

Likewise with a specialized RTS unit, assuming it costs you nothing to earn the capability of building it. If it’s useless most of the time, you lose nothing by simply not exercising your option to build it. But when it is useful, you will build it, and you’ll know that it is useful in that case. So again, it should be costed with the assumption that whatever situation it’s built for, actually happens 100% of the time. (If you must pay extra for the versatility of being able to build the situational unit in the first place, that cost is what you’d want to adjust based on a realistic percentage of the time that such a situation is encountered in real play.)

With an alternate weapon in an FPS, a lot depends on exactly how the game is structured. If the weapons are all free (no “resource cost”) but you can only select one main and one alternate, then you need to make sure the alternates are balanced against each other: either each one is useful in equally likely situations, or at least the situational benefit multiplied by the expected probability of receiving that benefit is the same across all weapons. (So you might have a weapon that’s the most powerful in the game but only in a really rare situation, versus a weapon that’s mediocre but usable just about anywhere, and you can call those balanced if the numbers come out right.)

Metagame “combos”

Now, we just talked about situations where the player has no control. But what if they do have control… that is, if something isn’t particularly useful on its own, but it forms a powerful combo with something else? An example would be dual-wielding in an FPS, “support” character classes in an MMO or multiplayer FPS, situational cards that you build your deck around in a CCG, support towers in a Tower Defense game that only improve the towers next to them, and so on. These are situational in a different way: they reward the player for playing the metagame in a certain way.

To understand how to balance these, we first have to return to the concept of opportunity costs from earlier. In this case, we have a metagame opportunity cost: you have to take some other action in the game completely apart from the thing we’re trying to balance, in order to make that thing useful. There are a few ways we could go about balancing things like this, depending on the situation.

One is to take the combo in aggregate and balance that, then try to divide that up among the individual components based on how useful they are outside of the combo. For example, Magic had two cards, Lich and Mirror Universe:

  • Lich reduced you to zero life points, but added additional rules that effectively turned your cards into your life – this card on its own was incredibly risky, because if it ever left play you would still have zero life, and thus lose the game immediately! Even without that risk, it was of questionable value, because it basically just helped you out if you were losing, and cards that are the most useful when you’re losing mean that you’re playing to lose, which isn’t generally a winning strategy.
  • Mirror Universe was a card that would swap life totals with your opponent – not as risky as Lich since you’re in control of when to use it, but still only useful when you’re losing and not particularly easy to use effectively.
  • But combined… the two cards, if uncountered, immediately win you the game by reducing your life to zero and then swapping totals with your opponent: an instant win!

How do you cost this? Now, this is a pretty extreme example, where two cards are individually all but useless, don’t really work that well in any other context, but combined with each other are all-powerful. The best answer for a situation like this might be to err on the side of making their combined cost equal to a similarly powerful game-winning effect, perhaps marked down slightly because it requires a two-card combination (which is harder to draw than just playing a single card). How do you split the cost between them – should one be cheap and the other expensive, or should they both be about the same? Weight them according to their relative usefulness. Lich does provide some other benefits (like drawing cards as an effect when you gain life) but with a pretty nasty drawback. Mirror Universe has no drawback, and a kind of psychological benefit that your opponent might hold off attacking you because they don’t want to almost kill you, then have you use it and plink them to death. These are hard to balance against one another directly, but looking at what actually happened in the game, their costs are comparable.

How about a slightly less extreme example? A support character class in an MMO can offer lots of healing and attribute bonuses that help the rest of their team. On their own they do have some non-zero value (they can always attack an enemy directly if they have to, if they can heal and buff themselves they might even be reasonably good at it, and in any case they’re still a warm body that can distract enemies by giving them something else to shoot at). But their true value shows up in a group, where they can take the best members of a group and make them better. How do you balance something like this?

Let’s take a simple example. Suppose your support character has a special ability that increases a single ally’s attack value by 10%, and that they can only have one of these active at a time, and this is part of their tech tree; you want to find the expected benefit of that ability so you can come up with an appropriate cost. To figure out what that’s worth, we might assume a group of adventurers of similar level, and find the character class in that group with the highest attack value, and find our expected attack value for that character class. In a group, this “attack buff” support ability would be worth about 10% of that value. Obviously it would be less useful if questing solo, or with a group that doesn’t have any good attackers, so you’d have to figure the percentage of time that you can expect this kind of support character to be traveling with a group where this attack boost is useful, and factor that into your numbers. In this case, the opportunity cost for including an attacker in your party is pretty low (most groups will have at least one of those anyway), so this support ability is almost always going to be operating at its highest level of effectiveness, and you can balance it accordingly.
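
To make that concrete, here is a minimal sketch of the expected-benefit calculation in Python; the class names, attack values, and party-composition probabilities are all invented for illustration, not taken from any real game:

# Expected value of a "+10% attack to one ally" support ability (all numbers assumed).
attack_values = {"warrior": 100, "rogue": 80, "mage": 60}
p_group_with_attacker = 0.85   # assumed: chance the support character travels with a strong attacker
p_otherwise = 0.15             # soloing, or a party with no good attacker
buff_multiplier = 0.10

best_attack = max(attack_values.values())   # the buff goes on the strongest attacker available
own_attack = 40                             # fallback: the support character buffing themselves

expected_benefit = (p_group_with_attacker * buff_multiplier * best_attack
                    + p_otherwise * buff_multiplier * own_attack)
print(expected_benefit)   # 9.1 extra damage per attack, on average -- cost the ability accordingly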

What do the Lich/Mirror Universe example and the support class example have in common? When dealing with situational effects that players have control over, a rule of thumb is to figure out the opportunity cost of setting up that situation, and factor that in as a “cost” to counteract the added situational benefit. Beyond that, the cost should be computed under best-case situations, not some kind of “average” case: if the players are in control of whether they use each part of the combo, we can assume they’re going to use it under optimal conditions.

Multi-class characters

As long as we’re on the subject of character classes, how about “multi-class” characters that are found in many tabletop RPGs? The common pattern is that you gain versatility, in that you have access to the unique specialties of several character types… but in exchange for that, you tend to be lower level and less powerful in all of those types than if you were dedicated to a single class. How much less powerful do you have to be so that multi-classing feels like a viable choice (not too weak), but not one that’s so overpowered that there’s no reason to single-class?

This is a versatility problem. The player typically doesn’t know what kinds of situations their character will be in ahead of time, so they’re trying to prepare for a little of everything. After all, if they knew exactly what to expect, they would pick and choose the most effective single character class and ignore the other! However, they do probably have some basic idea of what they’re going to encounter, or at least what capabilities their party is going to need that are not yet accounted for, so a Level 5 Fighter/Thief is probably not as good as a Level 10 Fighter or Level 10 Thief. Since the player must choose ahead of time what they want and they typically can’t change their class in the middle of a quest, they are more constrained, so you’ll probably do well with a starting guess of making a single class 1.5x as powerful as multi-class and then adjusting downward from there as needed. That is, a Level 10 single-class is usually about as powerful as a Level 7 or 8 dual-class.

Either-or choices from a single game object

Sometimes you have a single object that can do one thing or another, player’s choice, but not both (the object is typically either used up or irreversibly converted as part of the choice). Maybe you have a card in a CCG that can bring a creature into play or make an existing one bigger. Or you have a lump of metal in an RPG that can be fashioned into a great suit of armor or a powerful weapon. Or you’re given a choice to upgrade one of your weapons in an FPS. In these kinds of cases, assuming the player knows the value of the things they’ll get (but they can only choose one), the actual benefit is probably going to be more than either option individually, but less than all choices combined, depending on the situation. What does it depend on? This is a versatility problem, so it depends on the raw benefit of each choice, the cost/difficulty of changing their strategy in mid-game, and the foreknowledge of the player regarding the challenges that are coming up later.

The Difference Between PvE and PvP

Designing PvE games (“Player versus Environment,” where it’s one or more players cooperating against the computer, the system, the AI or whatever) is different than PvP games (“Player versus Player,” where players are in direct conflict with each other) when it comes to situational balance.

PvE games are much easier. As the game’s designer, you’re designing the environment, you’re designing the levels, you’re designing the AI. You already know what is “typical” or “expected” in terms of player encounters. Even in games with procedurally-generated content where you don’t know exactly what the player will encounter, you know the algorithms that generate it (you designed them, after all) so you can figure out the expected probability that the content generator will spit out certain kinds of encounters, and within what range.

Because of this, you can do expected-value calculations for PvE games pretty easily to come up with at least a good initial guess for your costs and benefits when you’re dealing with the situational parts of your game.

PvP is a little trickier, because players can vary their strategies. “Expected value” doesn’t really have meaning when you don’t know what to expect from your opponent. In these cases, playtesting and metrics are the best methods we have for determining typical use, and that’s something we’ll discuss in more detail over the next couple of weeks.

If You’re Working on a Game Now…

Choose one object in your game that’s been giving you trouble, something that seems like it’s always either too good or too weak, and which has some kind of conditional or situational nature to it. (Since situational effects are some of the trickiest to balance, if something has been giving you trouble, it’s probably in that category anyway.)

First, do a thorough search for any shadow costs you may have. What opportunities or versatility do you have to give up in order to gain this object’s capabilities? What other things do you have to acquire first before you even have the option of acquiring this object? Ask yourself what those additional costs are worth, and whether they are factored in to the object’s resource cost.

Next, consider the versatility of the object itself. Is it something that’s useful in a wide variety of situations, or only rarely? How much control does the player have over their situation – that is, if the object is only useful in certain situations, can the player do anything to make those situations more likely, thus increasing the object’s expected value?

How easy is it for the player to change their mind (the versatility of the player versus versatility of the object, since a more versatile player reduces the value of object-based versatility) – if the player takes this object but then wants to replace it with something else, or use other objects or strategies when they need to, is that even possible… and if so, is it easy, or is there a noticeable cost to doing so? How much of a liability is it if the player is stuck in a situation where the object isn’t useful? Now, consider how the versatility of the game’s systems and the versatility of the individual objects should affect their benefits and costs.

See if looking at that object in a new way has helped to explain why it felt too weak or too powerful. Does this give you more insight into other objects as well, or the game’s systems overall?

Homework

For your “homework”, we’re going to look at Desktop Tower Defense 1.5, which was one of the games that popularized the genre of tower defense games. (I’ll suggest you don’t actually play it unless you absolutely have to, because it is obnoxiously addicting and you can lose a lot of otherwise productive time just playing around with the thing.)

DTD 1.5 is a great game for analysis of situational game balance, because nearly everything in the game is situational! You buy a tower and place it down on the map somewhere, and when enemies come into range the tower shoots at them. Buying or upgrading towers costs money, and you get money from killing the enemies with your towers. Since you have a limited amount of money at any time in the game, your goal is to maximize the total damage output of your towers per dollar spent, so from the player’s perspective this is an efficiency problem.

The situational nature of DTD

So, on the surface, all you have to do is figure out how much damage a single tower will do, divide by cost, and take the tower with the best damage-to-cost ratio. Simple, right?

Except that actually figuring out how much damage your towers do is completely situational! Each tower has a range; how long enemies stay within that range getting shot at depends entirely on where you’ve placed your towers. If you just place a tower in the middle of a bunch of open space, the enemies will walk right by it and not be in danger for long; if you build a huge maze that routes everyone back and forth in range of the tower in question, its total damage will be a lot higher.

Furthermore, most towers can only shoot one enemy at a time, so if a cluster of enemies walks by, the tower’s damage per enemy is a lot smaller (one enemy gets shot, the others don’t). Other towers do area-effect or “splash” damage, which is great on clusters but pretty inefficient against individual enemies, particularly those that are spaced out because they move fast. One of the tower types doesn’t do much damage at all, but slows down enemies that it shoots, which keeps them in range of other towers for longer, so the benefit depends on what else is out there shooting.

Some towers only work against certain types of enemies, or don’t work against certain enemy types, so there are some waves where some of your towers are totally useless to you even if they have a higher-than-normal damage output at other times. And then there’s one tower that does absolutely nothing on its own, but boosts the damage output of all adjacent towers… so this has a variable cost-to-benefit ratio depending on what other towers you place around it. Even more interesting, placing towers in a giant block (to maximize the effectiveness of this boost tower) has a hidden cost itself, in that it’s slightly less efficient in terms of usage of space on the board, since there’s this big obstacle that the enemies get to walk around rather than just having them march through a longer maze. So, trying to balance a game like this is really tough, because everything depends on everything else!

Your mission, should you choose to accept it…

Since this is a surprisingly deep game to analyze, I’m going to constrain this to one very small part of the game. In particular, I want you to consider two towers: the Swarm tower (which only works against flying enemies but does a lot of damage to them) and the Boost tower (that’s the one that increases the damage of the towers around it). Now, the prime spot to put these is right in the center of the map, in this little 4×3 rectangular block. Let’s assume you’ve decided to dedicate that twelve-tower area to only Swarm and Boost towers, in order to totally destroy the flying enemies that come your way. Assuming that you’re trying to minimize cost and maximize damage, what’s the optimal placement of these towers?

To give you some numbers, a fully-upgraded Swarm tower does a base of 480 damage per hit, and costs $640 in the game. A fully-upgraded Boost tower costs $500 and does no damage, but improves all adjacent towers (either touching at a side or a corner) by +50%, so in practical terms a Boost tower does 240 damage for each adjacent Swarm tower. Note that two Boost towers adjacent to each other do absolutely nothing for each other – they increase each other’s damage of zero by +50%, which is still zero.

Assume all towers will be fully upgraded; the most expensive versions of each tower have the most efficient damage-to-cost ratios.

The most certain way to solve this, if you know any scripting or programming, is to write a brute-force program that runs through all 3^12 possibilities (no tower, Swarm tower or Boost tower in each of the twelve slots). For each slot, count a damage of 480 if a Swarm tower, 240*(number of adjacent Swarm towers) for a Boost tower, or 0 for an empty slot; for cost, count 640 per Swarm tower, 500 for each Boost tower, and 0 for an empty slot. Add up the total damage and cost for each scenario, and keep track of the best damage-to-cost ratio (that is, divide total damage by total cost, and try to get that as high as possible).
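
If you want a starting point, here is a minimal sketch of that brute-force search in Python; the way I’ve laid out the grid and the adjacency check are my own assumptions, so verify them against the actual in-game map before trusting the output:

from itertools import product

ROWS, COLS = 3, 4                  # the 4x3 center block, treated as 3 rows by 4 columns
SWARM_DMG, SWARM_COST = 480, 640
BOOST_DMG, BOOST_COST = 240, 500   # a Boost tower adds 240 damage per adjacent Swarm tower

def neighbors(r, c):
    # all surrounding cells (sides and corners) that fall inside the block
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < ROWS and 0 <= c + dc < COLS]

best_ratio, best_grid = 0, None
for layout in product(" SB", repeat=ROWS * COLS):   # all 3^12 = 531,441 possibilities
    grid = [layout[r * COLS:(r + 1) * COLS] for r in range(ROWS)]
    damage = cost = 0
    for r in range(ROWS):
        for c in range(COLS):
            if grid[r][c] == "S":
                damage += SWARM_DMG
                cost += SWARM_COST
            elif grid[r][c] == "B":
                cost += BOOST_COST
                damage += BOOST_DMG * sum(grid[nr][nc] == "S" for nr, nc in neighbors(r, c))
    if cost > 0 and damage / cost > best_ratio:
        best_ratio, best_grid = damage / cost, grid

print(best_ratio)
for row in best_grid:
    print("".join(row))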

If you don’t have the time or skills to write a brute-force program, an alternative is to create an Excel spreadsheet that calculates the damage and cost for a single scenario. Create a 4×3 block of cells that are either “B” (boost tower), “S” (swarm tower), or blank.

Below that block, create a second block of cells to compute the individual costs of each cell. The formula might be something like:

=IF(B2="S",640,IF(B2="B",500,0))

Lastly, create a third block of cells to compute the damage of each cell:

=IF(B2="S",480,IF(B2="B",IF(A1="S",240,0)+IF(A2="S",240,0)+IF(A3="S",240,0)+IF(B1="S",240,0)+IF(B3="S",240,0)+IF(C1="S",240,0)+IF(C2="S",240,0)+IF(C3="S",240,0),0))

Then take the sum of all the damage cells, and divide by the sum of all the cost cells. Display that in a cell of its own. From there, all you need to do is play around with the original cells, changing them by hand from S to B and back again to try to optimize that one final damage-to-cost value.

The final deliverable

Once you’ve determined what you think is the optimal damage-to-cost configuration of Swarm and Boost towers, figure out the actual cost and benefit from the Swarm towers only, and the cost and benefit contributed by the Boost towers. Assuming optimal play, and assuming only this one very limited situation, which one is more powerful – that is, on a dollars-for-damage basis, which of the two types of tower (Swarm or Boost) contributes more to your victory for each dollar spent?

That’s all you have to do, but if you want more, you can then take it to any level of analysis you want – as I said, this game is full of situational things to balance. Flying enemies only come every seventh round, so if you want to compute the actual damage efficiency of our Swarm/Boost complex, you’d have to divide by 7. Then, compare with other types of towers and figure out if some combination of ground towers (for the 6 out of 7 non-flying levels) and the anti-flying towers should give you better overall results than using towers that can attack both ground and air. And then, of course, you can test out your theories in the game itself, if you have the time. I look forward to seeing some of your names in the all-time high score list.

Level 5: Probability and Randomness Gone Horribly Wrong

August 4, 2010

Readings/Playings

None this week (other than this blog post).

Answers to Last Week’s Questions

If you want to check your answers from last week:

Dragon Die

First off, note that the so-called “Dragon Die” is really just a “1d6+1” in disguise. If you think of the Dragon as a “7” since it always wins, the faces are 2-3-4-5-6-7, so really this is just asking how a +1 bonus to a 1d6 die roll affects your chance of rolling higher. It turns out the answer is: a lot more than most people think!

If you write out all 36 possibilities of 2-7 versus 1-6, you find there are 21 ways to lose to the House (7-1, 7-2, 7-3, 7-4, 7-5, 7-6, 6-1, 6-2, 6-3, 6-4, 6-5, 5-1, 5-2, 5-3, 5-4, 4-1, 4-2, 4-3, 3-1, 3-2, 2-1), 5 ways to draw (6-6, 5-5, 4-4, 3-3, 2-2), and 10 ways to win (5-6, 4-5, 4-6, 3-4, 3-5, 3-6, 2-3, 2-4, 2-5, 2-6). We ignore draws, since a draw is just a re-roll, which we would keep repeating until we got a win or loss event, so only 31 results end in some kind of resolution. Of those, 21 are a loss and 10 are a win, so this game only gives a 10/31 chance of winning. In other words, you win slightly less than 1 time in 3.
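
If you would rather not write out all 36 cases by hand, a quick enumeration in Python confirms the count:

from fractions import Fraction

wins = losses = draws = 0
for house in range(2, 8):        # Dragon Die faces: 2-6 plus the Dragon, treated as a 7
    for player in range(1, 7):   # an ordinary d6
        if player > house:
            wins += 1
        elif player < house:
            losses += 1
        else:
            draws += 1

print(wins, losses, draws)             # 10 21 5
print(Fraction(wins, wins + losses))   # 10/31, ignoring draws since they are re-rolled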

Chuck-a-luck

Since there are 216 ways to get a result on 3d6, it is easier to do this in Excel than by hand. If you count it up, though, you will find that if you choose the number 1 to pay out, there are 75 ways to roll a single win (1-X-X where “X” is any of the five losing results, so there are 25 ways to do this; then X-1-X and X-X-1 for wins with the other two dice, counting up to 75 total). There are likewise 15 ways to roll a double win (1-1-X, 1-X-1, X-1-1, five each of three different ways = 15), and only one way to roll a triple win (1-1-1). Since all numbers are equally likely from 1 to 6, the odds are the same no matter which of the six numbers you choose to pay out.

To get expected value, we multiply each of the 216 results by its likelihood; since all 216 die roll results are equally likely, we simply add all results together and divide by 216 to get the expected win or loss percentage.

(75 ways to win * $1 winnings) + (15 ways for double win * $2) + (1 way for triple win * $3) = $108 in winnings. Since we play 216 times and 108 is half of 216, at first this appears to be even-odds.

Not so fast! We still must count up all the results when we lose, and we lose more than 108 times. Out of the 216 ways to roll the dice, there are only 91 ways to win, but 125 ways to lose (the reason for the difference is that while doubles and triples are more valuable when they come up on your number, there are a lot more ways to roll doubles and triples on a number that isn’t yours). Each of those 125 losses nets you a $1 loss.

Adding everything up, we get an expected value of negative $17 out of 216 plays, or an expected loss of about 7.9 cents each time you play this game with a $1 bet. That may not sound like much (7.9 cents is literally pocket change), but keep in mind this is per dollar, so the House advantage of this game is actually 7.9%, one of the worst odds in the entire casino!
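
Here is a minimal Python sketch that enumerates all 216 rolls and confirms that expected value:

from itertools import product
from fractions import Fraction

total = 0
for roll in product(range(1, 7), repeat=3):   # all 216 equally likely results of 3d6
    matches = roll.count(1)                   # betting on "1"; any number gives the same odds
    total += matches if matches > 0 else -1   # win $1 per matching die, or lose the $1 bet

print(total, Fraction(total, 216))            # -17 and -17/216, about -7.9 cents per $1 bet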

Royal Flush

In this game, you’re drawing 5 cards from a 52-card deck, sequentially. The cards must be 10-J-Q-K-A but they can be in any order. The first card can be any of those 5 cards of any suit, so there are 20 cards total on your first draw that make you potentially eligible for a royal flush (20/52). For the second card, it must match the suit of the first card you drew, so there are only 4 cards out of the remaining 51 in the deck that you can draw (4/51). For the third card, there are only 3 cards out of the remaining 50 that you need for that royal flush (3/50). The fourth card has 2 cards out of 49, and the final card must be 1 card out of 48. Multiplying all of these together, we get 480 / 311,875,200, or 1 in 649,740. If you want a decimal, this divides to 0.0000015 (or if you multiply by 100 to get a percentage, 0.00015%, about one and a half ten-thousandths of one percent). For most of us, this means seeing a “natural” Royal Flush from 5 cards is a once in a lifetime event, if that.
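
If you want to double-check that multiplication, here is a one-line verification in Python:

from fractions import Fraction

p = Fraction(20, 52) * Fraction(4, 51) * Fraction(3, 50) * Fraction(2, 49) * Fraction(1, 48)
print(p, float(p))   # 1/649740, roughly 0.0000015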

IMF Lottery

If you check the comments from last week, several folks found solutions for this that did not require a Monte Carlo simulation. The answer is 45 resources, i.e. the card will stay in play for an average of 10 turns. Since it has a 10% chance of leaving play each turn, this actually ends up being rather intuitive… but, as we’ve seen from probability, most things are not intuitive. So the fact that this one problem ends up with an answer that’s intuitive is itself counterintuitive.

This Week’s Topic

Last week, I took a brief time-out from the rest of the class where we’ve been talking about how to balance a game, to draw a baseline for how much probability theory I think every designer needs to know, just so we can compute some basic odds. This week I’m going to blow that all up by showing you two places where the true odds can go horribly wrong: human psychology, and computers.

Human Psychology

When I say psychology, this is something I touched on briefly last week: most people are really terrible at having an intuition for true odds. So even if we actually make the random elements of our games perfectly fair, which as we’ll see is not always trivial, an awful lot of players will perceive the game as being unfair. Therefore, as game designers, we must be careful to understand not just true probability, but also how players perceive the probability in our games and how that differs from reality, so that we can take that into account when designing the play we want them to experience.

Computers

Most computers are deterministic machines; they are just ones and zeros, following deterministic algorithms to convert some ones and zeros into other ones and zeros. Yet somehow, we must get a nondeterministic value (a “random number”) from a deterministic system. This is done through some mathematical sleight-of-hand to produce what we call pseudorandom numbers: numbers that sorta kinda look random, even though in reality they aren’t. Understanding the difference between random and pseudorandom has important implications for video game designers, and even board game designers if they ever plan to make a computer version of their game (which happens often with “hit” board games), or if they plan to include electronic components in their board games that have any kind of randomness.

But First… Luck Versus Skill

Before we get into psychology and computers, there’s this implicit assumption that we’ve mostly ignored for the past week, that’s worth discussing and challenging. The assumption is that adding some randomness to a game can be good, but too much randomness is a bad thing… and maybe we have some sense that all games fall along some kind of continuum between two extremes of “100% skill-based” (like Chess or Go) and “100% luck-based” (like Chutes & Ladders or Candyland).

If these are the only games we look at, we might go so far as to think of a corresponding split between casual and hardcore: the more luck in a game, the more casual the audience; the more a game’s outcome relies on skill, the more we think of the game as hardcore.

This is not always the case, however. For example, Tic-Tac-Toe has no randomness at all, but we don’t normally think of it as a game that requires a lot of skill. Meanwhile, each hand of Poker is highly random, but we still think of it as a game where skill dominates. And yet, the game of Blackjack is also random, but aside from counting cards, we see that more as a game of chance than Poker.

Then we get into physical contests like professional sports. On the one hand, we see these as games of skill. Yet, enthusiasts track all kinds of statistics on players and games, we talk about percentage chances of a player making a goal or missing a shot or whatever, sports gamblers make cash bets on the outcomes of games, as if these were not games of skill but games of chance.

What’s going on here? There are a few explanations.

Poker vs. Blackjack

Why the difference in how we perceive Poker and Blackjack? The difference is in when the player makes their bet, and what kind of influence the player’s choices have on the outcome of the game.

In Poker, a successful player computes the odds to come up with a probability calculation that they have the winning hand, and they factor that into their bet along with their perceived reactions of their opponents. As more cards are revealed, the player adjusts their strategy. The player’s understanding of the odds and their ability to react to changes has a direct relation to their performance in the game.

In Blackjack, by contrast, you place a bet at the beginning of the hand before you know what cards you’re dealt, and you generally don’t have the option to “raise” or “fold” as you see more cards revealed.

Blackjack does have some skill, but it’s a very different kind of skill than Poker. Knowing when to hit or stand or split or double down based on your total, the dealer’s showing card, and (if you’re counting cards) the remaining percentage of high cards in the deck… these things are skill in the same sense as skill at Pac-Man. You are memorizing and following a deterministic pattern, but you are not making any particularly interesting decisions. You simply place your bet according to an algorithm, and expect that over time you do as well as you can, given the shuffle of the cards. It’s the same reason we don’t think of the casino as having a lot of “skill” at Craps or Roulette, just because it wins more than it loses.

Professional Sports

What about the sports problem, where a clearly skill-based game seems like it involves random die-rolls? The reason for the paradox is that it all depends on your frame of reference. If you are a spectator, by definition you have no control over the outcome of a game; as far as you’re concerned the outcome is a random event. If you are actually a player on a sports team, the game is won or lost partly by your level of skill. This is why on the one hand sports athletes get paid based on how much they win (it’s not a gamble for them), but sports fans can still gamble on the random (from their perspective) outcome of the game.

Action Games

Randomness works a little differently in action-based video games (like typical First-Person Shooter games), where the players are using skill in their movement and aiming to shoot their opponents and avoid getting shot. We think of these as skill-based games, and in fact these games seem largely incompatible with randomness. There is enough chaos in the system without random die-rolls: if I’m shooting at a moving target, I’ll either hit or miss, based on a number of factors that are difficult to have complete control of. Now, suppose instead the designer thought they’d be clever and add a small perturbation on bullet fire from a cheap pistol, to make it less accurate. You line up someone in your sights, pull the trigger… and miss, because it decided to randomly make the bullet fly too far to the left. How would the player react to that?

Well, they might not notice. The game is going so fast, you’re running, they’re running, you squeeze off a few shots, you miss, you figure you must have just not been as accurate as you thought (or they dodged well). But if they’re standing still, you sneak up behind them, you’re sure you have the perfect shot, and you still miss, then you’ll feel like the game just robbed you; that’s not fun and it doesn’t make the game more interesting, it just makes you feel like you’re being punished arbitrarily for being a good shot.

Does that mean luck plays no role in action games? I think you can increase the luck factor to even the playing field, but you have to be very careful about how you do it. Here’s a common example of how a lot of FPSs increase the amount of luck in the game: headshots. The idea here is that if you shoot someone in the head rather than the rest of the body, it’s an instant kill, or something like that.

Now, you might be thinking… wait, isn’t that a skill thing? You’re being rewarded for accuracy, hitting a small target and getting bonus damage because you’re just that good… right? In some cases that is true, depending on the game, but in a lot of games (especially older ones) that kind of accuracy just really isn’t possible in most situations; you’re moving, they’re moving, the head has a tiny hit box, the gun doesn’t let you zoom in enough at a distance to be really sure if you aren’t off by one or two pixels in any direction… so from a distance, at least, a head shot isn’t something that most players can plan on. Sometimes it’ll happen anyway, just by accident if you’re shooting in the right general direction… so sometimes, through no fault of the player, they’ll get a headshot. This evens the playing field slightly; without headshots, if it takes many shots to score a frag, the more skilled players will almost always win because they are better at dodging, circle-strafing, and any other number of techniques that allow them to outmaneuver and outplay a weaker player. With headshots, the weaker player will occasionally just get an automatic kill by accident, so it makes it more likely that the weaker player will see some successes, which is usually what you want as a designer.

Shifting Between Luck and Skill

As we just saw, with action games, adding more of a luck-factor to the game is tricky but still possible, by creating these unlikely (but still possible to perform by accident) events.

With slower, more strategic games, adding luck (or skill) is more straightforward. To increase the level of luck in the game:

  • Change some player decisions into random events.
  • Reduce the number of random events in the game (this way the Law of Large Numbers doesn’t kick in as much, thus the randomness is less likely to be evenly distributed).
  • Increase the impact that random events have on the game state.
  • Increase the range of randomness, such as changing a 1d6 roll to a 1d20 roll.

If you want to increase the level of skill in the game, do the reverse of any or all of the above.

What is the “best” mix of luck and skill for any given game? That depends mostly on your target audience. Very young children may not be able to handle lots of choices and risk/reward or short/long term tradeoffs, but they can handle throwing a die or spinning a spinner and following directions. Competitive adults and core gamers often prefer games that skew more to the skill side of things so they can exercise their dominance, but (except in extreme cases) not so much that they feel like they can’t ever win against a stronger opponent. Casual gamers may see the game as a social experience more than a chance to exercise the strategic parts of their brains, so they actually prefer to think less and make fewer decisions so they can devote more of their limited brainpower to chatting with their friends. There is no single right answer here for all games; recognize that a certain mix of luck and skill is best for your individual game, and part of your job as a game designer is to listen to your game to find out where it needs to be. Sometimes that means adding more randomness, sometimes it means removing it, and sometimes it means keeping it there but changing the nature of the randomness. The tools we gained from last week should give you enough skills to be able to assess the nature of randomness in your game, at least in part, and make appropriate changes.

And Now… Human Psychology

As we saw last week, human intuition is generally terrible when it comes to odds estimation. You might have noticed this with Dragon Die or Chuck-a-Luck; intuitively, both games seem to give better odds of winning than they actually do. Many people also have a flawed understanding of how probability works, as we saw last week with the gambler’s fallacy (expecting that previous independent events like die-rolls have the power to influence future ones). Let’s dig into these errors of thought, and their implications to gamers and game designers.

Selection Bias

When asked to do an intuitive odds estimation, where does our intuition come from? The first heuristic most people use is to check their memory recall: how easy is it to recall different events? The easier it is to remember many examples, the more we assume that event is likely or probable. This usually gives pretty good results; if you’re rolling a weighted die a few hundred times and seem to remember the number 4 coming up more often, you’ll probably have a decent intuition for how often it actually comes up. As you might guess, this kind of intuition will fail whenever it’s easier to recall a rare event than a common one.

Why would it be easier to recall rare events than common events? For one thing, rare events that are sufficiently powerful tend to stick in our minds (I bet you can remember exactly where you were when the planes struck the towers). We also hear about some rare events more often than common ones because of media coverage. For example, many people are more afraid of dying in a plane crash than dying in an auto accident, even though an auto fatality is far more likely. There are a few reasons for this, but one is that any time a plane crashes anywhere, it’s international news; car crashes, by contrast, are so common that they’re not reported… so it is much easier to remember a lot of plane crashes than a lot of car crashes. Another example is the lottery; lotto winners are highly publicized while the millions of losers are not, leading us to assume that winning is more probable than it actually is.

What does any of this have to do with games? For one thing, we tend to remember our epic wins much more easily than we remember our humiliating losses (another trick our brains play on us just to make life more bearable). People tend to assume they’re above average at most things, so absent actual hard statistics, players will tend to overestimate their own win percentage and skill. This is dangerous in games where players can set their difficulty level or choose their opponents. In general, we want a player to succeed a certain percentage of the time, and tune the difficulty of our games accordingly; if a player chooses a difficulty that’s too hard for them, they’ll struggle a bit more and be more likely to give up in frustration. By being aware of this tendency, we can try (for example) to force players into a good match for their actual skills – through automated matchmaking, dynamic difficulty adjustment, or other tricks.

Self-Serving Bias

There’s a certain point where a loss is unlikely but still possible, and players will act as though it were nearly impossible. In Sid Meier’s GDC keynote this year, he placed this from experience at somewhere around 3:1 or 4:1… that is, if the player had a 75 to 80% chance of winning or greater and they won exactly that percentage of the time, it would feel wrong to them, like they were losing more than they should. His playtesters expected to win nearly all of the time (I’d guess around 95% of the time) if the screen displayed a 75% or 80% chance.

Players also have a self-serving bias, which probably ties into what I said before about how everyone thinks they’re above average. So while players are not okay with losing a quarter of the time when they have a 75% chance to win, they are perfectly okay with winning only a quarter of the time when they are at a 1:3 disadvantage.

Attribution Bias

In general, players are much more likely to accept a random reward than a random setback or punishment. And interestingly, they interpret these random events very differently.

With a random reward, players have a tendency to internalize the event, to believe that they earned the reward through superior decision-making and strategy in play. Sure, maybe it was a lucky die roll, but they were the ones who chose to make the choices that led to the die roll, and their calculated risk paid off, so clearly this was a good decision on their part.

With a random setback, players tend to externalize the event; they blame the dice or cards, they say they were just unlucky. If it happens too much, they might go so far as to say that they don’t like the game because it’s unfair. If they’re emotionally invested enough in the game, such as a high-stakes gambling game, they might even accuse other players of cheating! With video games the logic and random-number generation is hidden, so we see some even stranger player behavior. Some players will actually believe that the AI is peeking at the game data or altering the numbers behind their back and cheating on purpose, because after all it’s the computer so it can theoretically do that.

Basically, people handle losing very differently from winning, in games and in life.

Anchoring

Another way people get odds wrong is this phenomenon called anchoring. The idea is, whatever the first number is that people see, they latch onto that and overvalue it. So for example, if you go to a casino and look at any random slot machine, probably the biggest, most attention-grabbing thing on there is the number of coins you can win with a jackpot. People look at that number, concentrate on it, and it gives them the idea that their chance of winning is much bigger than it actually is.

Sid Meier mentioned a curious aspect of that during his keynote. Playtesters – the same ones that were perfectly happy losing a third of the time when they had a 2:1 advantage, just like they were supposed to – would feel the game was unfair if they lost a third of the time when they had a 20:10 advantage. Why? Because the first number they see is that 20, which seems like a big number so they feel like they’ve got a lot of power there… and it feels a lot bigger than 10, so they feel like they should have this overwhelming advantage. (Naturally, if they have a 10:20 disadvantage, they are perfectly happy to accept one win in three.)

It also means that a player who has, say, a small amount of base damage plus a bunch of bonuses on top may underestimate how much total damage they actually do, because they anchor on that small base number.

The Gambler’s Fallacy

Now we return to the gambler’s fallacy, which is that people expect random numbers to look random. Long streaks make people nervous and make them question whether the numbers are actually random.

One statistic I found from the literature is that if you ask a person to generate a “random” list of coin flips from their head, they tend to not be very random. Specifically, if a person’s previous item was Heads, they have a 60% chance of picking Tails for the next flip, and vice versa (this is assuming you’re simply asking them to say “Heads” or “Tails” from their head when they are instructed to come up with a “random” result, not when they’re actually flipping a real coin). In a string of merely 10 coin flips, it is actually pretty likely you’ll get 4 Heads or 4 Tails in a row (any given flip has a 1 in 8 chance of being followed by three more identical results, and a string of 10 flips gives you seven chances at that), but if you ask someone to give you ten “random” numbers that are either 0 or 1 from their head, they will probably not give you even 3 in a row.
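
To see how much that 60% switching tendency suppresses streaks, here is a minimal simulation sketch in Python (the 60% figure is the one quoted above; everything else is just for illustration):

import random

def longest_run(flips):
    best = run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def true_random(n):
    return [random.randint(0, 1) for _ in range(n)]

def human_ish(n, p_switch=0.6):
    # a "random" human who switches from their previous answer 60% of the time
    flips = [random.randint(0, 1)]
    for _ in range(n - 1):
        flips.append(1 - flips[-1] if random.random() < p_switch else flips[-1])
    return flips

trials = 10000
print(sum(longest_run(true_random(10)) >= 4 for _ in range(trials)) / trials)   # roughly 0.46
print(sum(longest_run(human_ish(10)) >= 4 for _ in range(trials)) / trials)     # noticeably lower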

This can lead players astray in less obvious ways. Here’s an example, also from Sid’s keynote. Remember how players feel like a 3:1 advantage means they’ll almost always win, but they’re okay with losing a 2:1 contest about a third of the time? It turns out that if they lose two 2:1 contests in a row, this will feel wrong to a lot of people; they don’t expect unlikely events to happen multiple times in a row, even though by the laws of probability they should.

Here’s another example, to show you why as game designers we need to be keenly aware of this. Suppose you design a video game that involves a series of fair coin flips as part of its core mechanics (maybe you use this to determine who goes first each turn, or something). Probability tells you that 1 out of every 32 plays, the first six coin flips will be exactly the same result. If a player sees this as their first introduction to your game, they may perceive this event as so unlikely that the “random” number generator in the game must be busted somehow. If the coins come up in their favor, they won’t complain… but when regression to the mean kicks in and they start losing half the time like they’re supposed to, they’ll start to feel like the game is cheating, and it will take them awhile to un-learn their (incorrect) first impression that they are more likely to win a flip. Worse, if the player sees six losses in a row, right from the beginning, you can bet that player is probably going to think your game is unfair. To see how much of a problem this can potentially become, suppose your game is a modest hit that sells 3.2 Million units. In that case, one hundred thousand players are going to experience a 6-long streak as their first experience on their first game. That’s a lot of players that are going to think your game is unfair!

The Gambler’s Fallacy is something we can exploit as gamers. People assume that long streaks do not appear random, so when trying to “play randomly” they will actually change values more often than not. Against a non-championship opponent, you can win more than half the time at Rock-Paper-Scissors by knowing this. Insist on playing to best 3 of 5, or 4 of 7, or something. Since you know your opponent is unlikely to repeat their last throw, on subsequent rounds you should throw whatever would have lost to your opponent’s last throw. Your opponent probably won’t do the same thing twice, and if they do switch, you will either win or draw – so over many rounds, the odds tilt in your favor.

The Hot-Hand Fallacy

There’s a variant of the Gambler’s Fallacy that mostly applies to sports and other action games. The Hot-Hand fallacy is so called because in the sport of Basketball fans started getting this idea that if a player made two or three baskets in a row, they were “running hot” and more likely to score additional baskets and not miss. (We even see this in sports games like NBA Jam, where becoming “on fire” is actually a mechanic that gives the player a speed and accuracy advantage… and some cool effects like making the basket explode in a nuclear fireball.)

When probability theorists looked at this, their first reaction was that each shot is an independent event, like rolling dice, so there’s no reason why previous baskets should influence future ones at all. They expected that a player would be exactly as likely to make a basket, regardless of what happened in that player’s previous attempts.

Not so fast, said Basketball fans. Who says they’re completely independent events? Psychology plays a role in sports performance. Maybe the player has more confidence after making a few successful shots, and that causes them to play better. Maybe the fans cheering them on gives them a little extra mental energy. Maybe the previous baskets are a sign that the player is hyper-focused on the game and in a really solid flow state, making it more likely they’ll continue to perform well. Who knows?

Fair enough, said the probability theorists, so they looked at actual statistics from a bunch of games to see if previous baskets carried any predictive value for future performance.

As it turned out, both the theorists and sports fans were wrong. If a player made several baskets in a row, it slightly increased their chance of missing next time – the longer the streak, the greater the chance of a miss (relative to what would be expected by random chance). Why? I don’t think we know for sure, but presumably there is some kind of negative psychological effect. Maybe the player got tired. Maybe the other team felt that player was more of a threat, and played a more aggressive defense when that player had the ball. Maybe the crowd’s cheering broke the player’s flow state, or maybe the player gets overconfident and starts taking more unnecessary risks.

Whatever the case, this is something that works against us when players can build up a win streak in our games – especially if we tie that to social rewards, such as achievements, trophies, or leaderboards that give special attention to players with long streaks. Why is this dangerous? Because at best, even if each game is truly an independent random event, we know that a win streak is anomalous. If a player’s performance overall falls on some kind of probability curve (usually a bell curve) and they happen to achieve uncharacteristically high performance in a single game or play session or whatever, odds are their next game will fall lower on the curve. The player is probably erroneously thinking their skills have greatly improved; when they start losing again, they’ll feel frustrated, because they know they can do better. Thus, when the streak inevitably comes to an end, the whole thing is tainted in the player’s mind. It’s as if the designer has deliberately built a system that automatically punishes the player after every reward.

Houston, We Have a Problem…

To sum up, here are the problems we face as designers, when players encounter our random systems:

  • Selection bias: improbable but memorable events are perceived as more likely than they actually are.
  • Self-serving bias: an “unlikely loss” is interpreted as a “nearly impossible loss” when the odds are in the player’s favor. However, an “unlikely win” is still correctly interpreted as an “unlikely but possible win” when the odds are against the player.
  • Attribution bias: a positive random result is assumed to be due to player skill; a negative random result is assumed to be bad luck (or worse, cheating).
  • Anchoring: players overvalue the first or biggest number seen.
  • Gambler’s fallacy: assuming that a string of identical results reduces the chance that the string will continue.
  • Hot-hand fallacy: assuming that a string of identical results increases the chance that the string will continue.

The lesson here is that if you expose the actual probabilities of the game to your players, and your game produces fair, random numbers, players will complain because according to them and their flawed understanding of probability, the game feels wrong.

As designers, what do we do about this? We can complain to each other about how all our stupid players are bad at math. But is there anything we can do to take advantage of this knowledge, that will let us make better games?

When Designers Turn Evil

One way to react to this knowledge is to exploit it in order to extract large sums of money from people. Game designers who turn to the Dark Side of the Force tend to go into the gambling industry, marketing and advertising, or political strategy. (I say this with apologies to any honest designers who happen to work in these industries.)

Gambling

Lotteries and casinos regularly take advantage of selection bias by publicizing their winners, making it seem to people like winning is more likely than it really is. Another thing they can do if they’re more dishonest is to rig their machines to give a close but not quite result more often than would be predicted by random chance, such as having two Bars and then a blank come up on a slot machine, or having four out of five cards for a royal flush come up in a video poker game. These give the players a false impression that they’re closer to winning more often than they actually are, increasing their excitement and anticipation of hitting a jackpot and making it more likely they’ll continue to play.

Marketing and Advertising

Marketers use the principle of anchoring all the time to change our expectations of price. For example, your local grocery store probably puts big discount stickers all over the place to call your attention to the lower prices on select items, and our brains will assume that the other items around them are also less expensive by comparison… even if they’re actually not.

Another example of anchoring is a car dealership that might put two nearly identical models next to each other, one with a really big sticker price and one with a smaller (but still big) price. Shoppers see the first big price and anchor to that, then they see the smaller price and feel like by comparison they’re getting a really good deal… even though they’re actually getting ripped off.

Political Strategy

There are a ton of tricks politicians can use to win votes. A common one these days is to play up people’s fears of vastly unlikely but well-publicized events like terrorism or hurricanes, and make campaign promises that they’ll protect you and keep your family safe. Odds are, they’re right, because the events are so unlikely that they probably won’t happen again anyway.

Scam Artists

The really evil game designers use their knowledge of psychology to do things that are highly effective, and highly illegal. One scam I’ve heard involves writing a large number of people offering your “investment advice” and telling them to watch a certain penny stock between one day and the next. Half of the letters predict the stock will go up, the other half say it’ll go down. Then, whatever actually happens, you take that half where you guessed right, and make another prediction. Repeat that four or five times, and eventually you get down to a handful of people for whom you’ve predicted every single thing right. And those people figure there’s no way that could just be random chance, you must have a working system, and they give you tons of money, and you skip town and move to Fiji or something.

What About Good Game Designers?

All of that is fine and good for those of us who want to be scam artists (or for those of us who don’t want to get scammed). But what about those of us who still want to, you know, make games for entertainment?

We must remember that we’re crafting a player experience. If we want that experience to be a positive one, we need to take into account that our players will not intuitively understand the probabilities expressed in our game, and modify our designs accordingly.

Skewing the Odds

One way to do this is to tell our players one thing, and actually do something else. If we tell the player they have a 75% chance of winning, under the hood we can actually roll it as if it were a 95% chance. If the player gets a failure, we can make the next failure less likely, cumulatively; this makes long streaks of losses unlikely, and super-long streaks impossible.
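
Here is a minimal sketch of what that might look like under the hood; the specific numbers (75% displayed, 95% actually rolled, and a 10%-per-failure mercy bonus) are illustrative assumptions, not a recommendation:

import random

class SkewedCheck:
    # Shows the player one probability, rolls against a friendlier one,
    # and gets friendlier still after each consecutive failure.
    def __init__(self, shown=0.75, actual=0.95, mercy_per_failure=0.10):
        self.shown = shown                 # what the UI displays
        self.actual = actual               # what we really roll against
        self.mercy_per_failure = mercy_per_failure
        self.consecutive_failures = 0

    def roll(self):
        chance = min(1.0, self.actual + self.consecutive_failures * self.mercy_per_failure)
        success = random.random() < chance
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1
        return success

check = SkewedCheck()
print(sum(check.roll() for _ in range(10000)) / 10000)   # well above the displayed 0.75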

Random Events

We can use random events with great care, especially those with major game-changing effects, and especially those that are not in the player’s favor. In general, we can avoid hosing the player to a great degree from a single random event; otherwise, the player may think that they did something wrong (and unsuccessfully try to figure out what), or they may feel annoyed that their strategy was torn down by a single bad die-roll and not want to play anymore. Being very clear about why the bad event happened (and what could be done to prevent it in future plays) helps to keep the player feeling in control.

Countering the Hot Hand

To counter the hot-hand problem (where a streak of wins makes it more likely that a player will screw up), one thing to do is to downplay the importance of “streaks” in our games, so that the players don’t notice when they’re on a streak in the first place (and therefore won’t notice, or feel as bad, when the streak ends).

If we do include a streak mechanism, one thing we can do is embed them in a positive feedback loop, giving the player a gameplay advantage to counteract the greater chance of a miss after a string of hits. For example, in Modern Warfare 2, players get certain bonuses if they continue a kill streak (killing multiple enemies without being killed themselves), including better weapons, air support, and even a nuclear strike. With each bonus, it’s more likely their streak will continue because they are now more powerful.

Summary

In short, we know that players have a flawed understanding of probability. If we understand the nature of these flaws, we can change our game’s behavior to conform to player expectations. This will make the game feel more fun and more fair to the players. This was, essentially, one of the big takeaways from Sid Meier’s GDC keynote this year.

A Question of Professional Ethics

Now, this doesn’t sit well with everyone. Post-GDC, there were at least a few people that said: wait, isn’t that dishonest? As game designers, we teach our players through the game’s designed systems. If we bend the rules of probability in our games to reinforce players’ flawed understanding of how probability works, are we not doing a disservice to our players? Are we not taking something that we already know is wrong and teaching it to players as if it were right?

One objection might be that if players aren’t having fun (regardless of whether that comes from our poor design, or their poor understanding of math), then the designer must design in favor of the player – especially if they are beholden to a developer and publisher that expect to see big sales. But is this necessarily valid? Poker, for example, is incredibly popular and profitable… even though it will mercilessly punish any player that dares harbor any flaws in how they think about probability.

To this day, I think it is still an open question, and something each individual designer must decide for themselves, based on their personal values and the specific games they are designing. Take a moment to think about these questions yourself:

  • Are the probabilities in our games something that we can frame as a question of professional ethics?
  • How do we balance the importance of giving the player enjoyment (especially when they are paying for it), versus giving them an accurate representation of reality?
  • What’s more important: how the game actually works, or how players perceive it as working?

One More Solution

I can offer one other solution that works in some specific situations, and that’s to expose not just the stated probabilities to the player, but the actual results as well. For example, if you ask players to estimate their win percentage of a game when it’s not tracked by the game, they’ll probably estimate higher than the real value. If their wins, losses and percentage are displayed to them every time they go to start a game, they have a much more accurate view of their actual skill.

What if you have a random number generator? See if this sounds familiar to you: you’re playing a game of Tetris and you become convinced at some point that the game is out to get you. You never seem to get one of those long, straight pieces until just after you need it, right? My favorite version of Tetris to this day, the arcade version, had a neat solution to this: in a single-player game, your game took up the left half of the screen, and it used the right half to keep track of how many of each type of brick fell down. So if you felt like you were getting screwed, you could look over there and see if the game was really screwing you, or if it was just your imagination. And if you kept an eye on those over time, you’d see that yes, occasionally you might get a little more of one piece than another on average over the course of a single level, but over time it would balance out, and most of the time you’d get about as much of each brick as any other. The game was fair, and it could prove it with cold, hard facts displayed to the player in real time.

How could this work in other games? In a Poker video game against AI opponents, you could let the player know after each hand if they actually held the winning hand or not, and keep ongoing track of their percentage of winning hands, so that they know the deck shuffling is fair. (This might be more controversial with human opponents, as it gives you some knowledge of your opponents’ bluffing patterns.) If you’re making a version of the board game RISK that has simulated die rolls in it, have the game keep track of how frequently each number or combination is rolled, and let the player access those statistics at any time. And so on.

These kinds of things are surprisingly reassuring to a player who can never know for sure if the randomness inside the computer is fair or not.

When Randomness Isn’t Random

This brings us to the distinction of numbers that are random versus numbers that are pseudorandom. Pseudorandom literally means “fake random” and in this case it means it’s not actually entirely random, it just looks that way.

Now, most of the things we use even in physical games for randomness are not perfectly random. Balls with numbers painted on them in a hopper that’s used for Bingo might be slightly more or less likely to come up because the paint gives them slightly different weights. 6-sided dice with recessed dots may be very slightly weighted towards one side or another since they’ve actually got matter that’s missing from them, so their center of gravity is a little off. Also, a lot of 6-sided dice have curved edges, and if those curves are slightly different then the die will be slightly more likely to keep rolling when it hits certain faces, and thus a little more or less likely to land on certain numbers. 20-sided dice can be slightly oblong rather than perfectly round (due to how they’re manufactured), making it slightly less likely to roll the numbers on the edges. All this is without considering dice that are deliberately loaded, or throwers who have practiced rolling the numbers they want!

What about cards? All kinds of studies have shown that a typical human shuffle of a deck of cards is not random, and in fact if you shuffle a certain way for a specific number of times (I forget the exact number but it’s not very large), the deck will return almost exactly to its original state; card magicians use this to make it look like they’re shuffling a deck when they’re actually stacking it. Even an honest shuffle isn’t perfectly random; if you think about it, for example, depending on how you shuffle either the top or bottom card will probably stay in the same position after a single riffle-shuffle, so you have to shuffle a certain number of times before the deck is sufficiently randomized. Even without stacking the deck deliberately, the point is that not all orderings of the deck are equally likely to come out of a typical shuffle.

In Las Vegas, they have to be very careful about these things, which is why you’ll notice that casino dice are a lot different from those white-colored black-dotted d6s you have at home. One slightly unfair die can cost the casino its gambling license (or cost it money to sufficiently skilled players who know how to throw dice unfairly), and that license is worth billions of dollars just from the casino’s honest, long-term House advantage, which is why they want to be very sure that their dice are as close to perfectly random as possible.

Shuffling cards is another thing casinos have to be careful of. If the cards are shuffled manually, you can run into a few problems with randomness (aside from complaints of carpal tunnel from your dealers). A dealer who doesn’t shuffle enough to sufficiently randomize the deck – because they’re trying to deal more hands so they get lazy with the shuffling – can be exploited by a player who knows there’s a better-than-average chance of some cards following others in sequence. And that’s not even counting dealers who collude with players to do this intentionally.

Casinos have largely moved to automated shufflers to deal with these problems, but those have problems of their own; for example, mechanical shuffles are potentially less random than human shuffles, so a careful gambler can use a hidden camera to analyze the machine and figure out which cards are more likely to clump together from a fresh deck, giving themselves an advantage. These days, some of the latest automated shufflers don’t riffle-shuffle, they actually stack the deck according to a randomized computer algorithm, but as we’ll see shortly, even those algorithms can have problems, and those problems can cost the casinos a lot of money if they’re not careful.

The point here is, even the events in physical games that we think of as “random,” aren’t always as random as we give them credit for. There’s not necessarily a lot we can do about this, mind you, at least not without going to great expense to get super-high-quality game components. Paying through the nose for precision-machined dice is a bit much for most of us that just want a casual game of Catan, so as gamers we have to accept that our games aren’t always completely random and fair… but they’re close enough, and the imperfections affect all players equally, so we disregard them.

Pseudorandom Numbers

Computers have similar problems because, as I mentioned in the introduction, a computer doesn’t have any kind of randomness inside it. It’s all ones and zeros, high and low voltage going through wires or being stored electromagnetically in memory or on a disk somewhere; it’s completely deterministic. And unless you’re willing to get special hardware that measures some varying physical phenomenon (like a Geiger counter tracking radioactive decay), which most of us aren’t, your computer is stuck with the problem of using a deterministic machine to play a non-deterministic game of chance.

We do this through a little bit of mathematics that I won’t cover here (you can do a Google search for pseudorandom number algorithms on your own if you care). All you need to know is that there are some math functions that behave very erratically, without an apparent pattern, and so you just take one of the results from that function and call it a random number.

How do you know which result to take from the function? Well, you determine that randomly. Just kidding; as we said, you can’t really do that. So instead what you do is you have to tell the computer which one to take, and then it’ll start with that one, and then next time you need a random number it’ll take the next one in sequence, then the next, and so on. But because we told it where to start, this is no longer actually random, even though it might look that way to a casual player. The number you tell the computer to start with is called a random number seed, and once you give the computer one seed, it just starts picking random numbers with its formula sequentially from there. So you only have to seed it once. But… this is important… if you give it the same seed you’ll get exactly the same results. Remember, it’s deterministic!

Usually we get around this by picking a random number seed that’s hard for a player to intentionally replicate, like the number of milliseconds that have elapsed since midnight or something. You have to choose carefully though. If, for example, you pick a random number seed that’s just the fractional milliseconds in the system clock (from 0 to 999), then really your game only has 1000 ways of “shuffling”, which is enough that over repeated play a player might see two games that seem suspiciously identical. If your game is meant to be played competitively, a sufficiently determined player could study your game and reach a point where they could predict which random numbers happen and when, and use that to gain an unfair advantage. So we have to be careful when creating these systems.
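To make the seeding idea concrete, here’s a tiny C++ sketch using the standard srand()/rand() calls (a real game would probably use a better generator, but the principle is the same):

#include <cstdio>
#include <cstdlib>
#include <ctime>

int main() {
    // Seed once, with something hard for a player to replicate, like the current time.
    srand((unsigned)time(NULL));
    for (int i = 0; i < 3; i++) {
        printf("%d\n", rand() % 6 + 1);   // three pseudorandom "d6" rolls
    }
    // If we had called srand(12345) instead, we would get the exact same
    // three rolls on every single run: same seed, same sequence.
    return 0;
}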

Pseudorandomness in Online Games: Keeping Clients in Sync

You have to be extra careful with random numbers in an online game, if your players’ machines are generating their own random numbers. I’ve worked on games at two companies that were architected that way, for better or worse. What could happen was this: you would of course have one player (or the server) generate a random number seed, and that seed would be used for both players in a head-to-head game. Then, when either player needed a random number, both machines would have to generate that number so that their random number seed was kept in sync. Occasionally due to a bug, one player might generate a random number and forget to inform their opponent, and now their random number generators are out of sync. The game continues for a few turns until suddenly, one player takes an action that requires a random element, their machine rolls a success, the opponent’s machine (with a different random number) rolls a failure, the two clients compare checksums and fail because they now have a different game state, and both players become convinced the other guy is trying to cheat. Oops. (A better way to do this for PC games is to put all the game logic on the server and use thin clients; for networked handheld console or phone games where there is no server and it’s direct-connect, designate one player’s device to handle all these things and broadcast the game state to the other players.)
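As a very rough sketch of the idea (the generator below is a made-up example, not any particular engine’s RNG): both clients construct the same generator from the shared seed, and then every random event must be drawn from it on both machines, in the same order.

#include <cstdint>

// Minimal deterministic generator (a simple linear congruential generator).
// Both clients seed it identically at the start of the match.
struct SyncedRng {
    uint32_t state;
    explicit SyncedRng(uint32_t seed) : state(seed) {}
    uint32_t next() {
        state = state * 1664525u + 1013904223u;   // classic LCG step
        return state;
    }
    int roll(int sides) { return (int)(next() % sides) + 1; }  // modulo bias ignored for this sketch
};
// If client A calls roll() for some event and client B doesn't, the two states
// drift apart and every later roll disagrees, which is exactly the desync bug
// described above.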

Pseudorandomness in Single-Player Games: Saving and Loading

You also have to be careful with pseudorandom numbers in a single-player game, because of the potential for exploits. This is a largely unsolved problem in game design. You can’t really win here, but you can at least pick your poison.

Save Anywhere

Suppose you have a game where the player can save anywhere, any time. Many games do this, because it is convenient for the player. However, nothing stops the player from saving just before they have to make a big random roll, maybe something where they’re highly unlikely to succeed but there’s a big payoff if they do, and keep reloading from save until they succeed. If you re-generate your random number seed each time they reload from save, they will eventually succeed, and they’re not really playing the game that you designed at that point… but on the other hand, they’re using the systems you designed, so they’re not really cheating either. Your carefully balanced probabilities suddenly become unbalanced when a player can keep rerolling until they win.

Save Anywhere, Saved Seed

Okay, so you say, let’s fix that: what if we save the random number seed in the saved game file? Then, if you try to save and reload, you’ll get the same result every time! First, that doesn’t eliminate the problem, it just makes it a little harder; the player just has to find one other random thing to do, like maybe drinking a potion that restores a random number of HP, or maybe choosing their combat actions in a different order or something, and keep trying until they find a combination of actions that works. Second, you’ve now created a new problem: after the player saves, they know exactly what the enemy AI will do on every turn, because once you start with the same random number seed the game now becomes fully deterministic! Sometimes this foreknowledge of exactly how an enemy will act in advance is even more powerful than being able to indefinitely reroll.

Save Points

So you say, okay, let’s limit where the player can save, so that they’ll have to go through some nontrivial amount of gameplay between saves. Now they can theoretically exploit the save system, but in reality they have to redo too much to be able to fully optimize every last action. And then we run into a problem with the opposite type of player: while this mechanism quells the cheating, the honest players now complain that your save system won’t let them walk away when they want, that the game holds them hostage between save points.

Quicksave

Maybe you think to try a save system where the player can save any time, but it erases their save as soon as they load it, so they can’t do the old save/reload trick. This seems to work… until the power goes out just as an honest player reaches the final boss, and now they have to start the entire game over from scratch. And they hire a hit man to kill you in your sleep because you deserve it, you evil, evil designer.

Save Anywhere, Limited Times

You give the player the ability to save anywhere, but limit the total number of saves. The original Tomb Raider did this, for example. This allows some exploits, but at least not on every last die-roll. Is this a good compromise?

Oh, by the way, I hope you gave the player a map and told them exactly how far apart they can save on average, and drew some BIG ARROWS on the map pointing to places where the big battles are going to happen, so a player doesn’t have to replay large sections of the map just because they didn’t know ahead of time where the best locations were to save. And then your players will complain that the game is too easy because it gives them all this information about where the challenge is.

Pick Your Poison

As I said, finding the perfect save system is one of those general unsolved problems in game design, just from the perspective of what kind of system is the most fun and enjoyable for the player, and that’s just for deterministic games! When you add pseudorandom numbers, you can see how the problem can get much thornier, so this is something you should be thinking about as a designer while designing the load/save system… because if you don’t, then it’ll be left to some programmer to figure out, God help you, and it’ll probably be based on whatever’s easiest to code and not what’s best for the game or the player.

When Pseudorandom Numbers Fail

Even if you choose a good random number seed, and even if you ignore player exploits and bugs, there are other ways that randomness can go wrong if you choose poor algorithms. For example, suppose you have a deck of cards and want to shuffle them. Here’s a naïve algorithm that most budding game programmers have envisioned at some point:

  1. Start with an unshuffled deck.
  2. Generate a pseudorandom number that corresponds to a card in the deck (so if the deck is 52 cards, generate a whole number between 1 and 52… or 0 and 51, depending on what language you’re using). Call this number A.
  3. Generate a second pseudorandom number, the same way. Call this number B.
  4. Swap the cards in positions A and B in the deck.
  5. Repeat steps 2-4 lots and lots of times.

The problem here is, first off, it takes an obnoxiously long time to get anything resembling a random shuffle. Second, because the card positions start out fixed, and you’re swapping random pairs one at a time, no matter how many times you repeat this there is always a slightly greater chance that you’ll find each card in its original position in the deck, than anywhere else. Think of it this way: if a card is ever swapped, it’ll swap to a random position, so all positions are equally likely for any card that has been swapped at all. Swapping multiple times makes no difference; you’re going from one random place to another, so you still end up equally likely to be in any position. So any card that’s swapped is equally likely to end up anywhere, with equal frequency, as a baseline (this is good). However, there is some non-zero chance that a card will not be swapped, in which case it will remain in its original position, so it is that much more likely to stay where it is. The more times you perform swaps, the lower the chance that a card will remain in its original position, but no matter how much you swap you can never get that probability all the way down to zero. Ironically, this means that the most likely shuffle from this algorithm is to see all cards in exactly the same position they started!

So you can see that even if the pseudorandom numbers generated for this shuffle are perfectly random (or close enough), the shuffle itself isn’t.
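For concreteness, here’s roughly what that naïve swap shuffle looks like as a C++ sketch (my own rendering of the steps above, not code from any particular game; it assumes rand() has been seeded):

#include <cstdlib>   // for rand()

void NaiveShuffle(int deck[], int size, int numSwaps) {
    for (int n = 0; n < numSwaps; n++) {
        int a = rand() % size;    // pick two random positions...
        int b = rand() % size;
        int temp = deck[a];       // ...and swap whatever cards are there
        deck[a] = deck[b];
        deck[b] = temp;
    }
    // No matter how large numSwaps is, each card has some chance of never being
    // touched at all, so it is always slightly more likely to remain where it started.
}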

Are Your Pseudorandom Numbers Pseudorandom Enough?

There is also, of course, the question of whether your pseudorandom number generator function itself actually produces numbers that are pretty random, or if there are actually some numbers that are more or less likely than others, either due to rounding error or just a poor algorithm. A simple test for this is to use your generator to generate a few thousand pairs of random coordinates on a 2d graph, and plot that graph to see if there are any noticeable patterns (like the numbers showing up in a lattice pattern, or with noticeable clusters of results or empty spaces). You can expect to see some clustering, of course, because as we learned that’s how random numbers work. But if you repeat the experiment a few times you should see clusters in different areas. This is a way of using Monte Carlo simulation to do a quick visual test of whether your pseudorandom numbers are actually random. (There are other more mathematical ways to calculate the exact level of randomness from your generator, but that requires actual math; this is an easier, quick-and-dirty “good enough” test for game design purposes.)
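Here’s one way you might generate those coordinate pairs, as a quick C++ sketch: dump a few thousand pairs to a CSV file, then plot them as a scatter graph in Excel or whatever you have handy. Swap rand() for whatever generator you actually want to test.

#include <cstdio>
#include <cstdlib>
#include <ctime>

int main() {
    srand((unsigned)time(NULL));
    FILE* f = fopen("rng_test.csv", "w");
    if (!f) return 1;
    for (int i = 0; i < 5000; i++) {
        // One (x, y) pair per line; plot these and look for lattices,
        // clusters that never move between runs, or empty bands.
        fprintf(f, "%f,%f\n", rand() / (float)RAND_MAX, rand() / (float)RAND_MAX);
    }
    fclose(f);
    return 0;
}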

Most of the time this won’t be an issue. Most programming languages and game libraries come with their own built-in pseudorandom number generation functions, and those all use established algorithms that are known to work, and that’s what most programmers use. But if your programmer, for some reason, feels the need to implement a custom pseudorandom number generation function, this is something you will want to test carefully!

Homework

In past weeks, I’ve given you something you can do right now to improve the balance of a game you’re working on, and also a task you can do later for practice. This week I’m going to reverse the order. Get some practice first, then apply it to your project once you’re comfortable with the idea.

For this week’s “homework” I’m going to go over two algorithms for shuffling cards. I have seen some variant of both of these used in actual working code in shipped games before (I won’t say which games, to protect the innocent). In both cases, we saw the perennial complaints from the player base that the deck shuffler was broken, and that there were certain “hotspots” where if you placed a card in that position in your deck, it was more likely to get shuffled to the top and show up in your opening hand. What I want you to do is think about both of these algorithms logically, and then figure out if they work. I’ll give you a hint: one works and one doesn’t. I’ll actually give you the source code, but I’ll also explain both for you in case you don’t know how to program. Keep in mind that this might look like a programming problem, but really it’s a probability calculation: can you count the different ways each shuffling algorithm can shuffle the deck, and do those ways line up evenly with the different permutations of cards in a deck?

Algorithm #1

The first algorithm looks like this:

  1. Start with an unshuffled deck.
  2. Choose a random card from all available cards (so if the deck has 60 cards, choose a pseudorandom number between 1 and 60). Take the card in that position, and swap with card #60. Essentially, this means choose one card randomly to put on the bottom of the “shuffled” deck, then lock it there in place.
  3. Now, take a random card from the remaining cards (between 1 and 59), and swap that card with position #59, putting it on top of the previous one.
  4. Then take another random card from the remaining ones (between 1 and 58), swap that with position #58, and so on.
  5. Keep repeating this until eventually you get down to position #1, which swaps with itself (so it does nothing), and then we’re done.

This is clearly different from how humans normally shuffle a deck, but remember, the purpose here isn’t to emulate a human shuffle; it’s to get a random shuffle, that is, a random ordering of cards.
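The original source code isn’t reproduced here, so what follows is my own C++ rendering of the steps above (using a 0-indexed array, and assuming rand() has been seeded):

#include <cstdlib>   // for rand()

void ShuffleAlgorithm1(int deck[], int size) {
    // Work from the bottom of the deck (the last array position) upwards.
    for (int pos = size - 1; pos >= 1; pos--) {
        int pick = rand() % (pos + 1);   // choose from the cards not yet locked in place
        int temp = deck[pick];           // swap the chosen card into the current position...
        deck[pick] = deck[pos];
        deck[pos] = temp;                // ...and lock it there
    }
}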

Algorithm #2

The second algorithm is similar, but with two minor changes (a code sketch follows the list).

  1. Start with an unshuffled deck.
  2. Choose a random card from all available cards (in a 60 card deck, that means a card from 1 to 60). Swap with position #60, putting it on the bottom of the deck.
  3. Choose a random card from all available cards, including those that have been chosen already (so, choose another random card from 1 to 60). Swap with position #59.
  4. Choose another random card from 1 to 60, and swap with position #58.
  5. Keep repeating this until eventually you choose a random number from 1 to 60, swap that card with position #1, and you’re done.
  6. Oh yeah – one last thing. Repeat this entire process (steps 2-5) fifty times. That’ll make it more random.
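And here is my C++ rendering of Algorithm #2 (again, a sketch of the steps, not the original source):

#include <cstdlib>   // for rand()

void ShuffleAlgorithm2(int deck[], int size) {
    for (int repeat = 0; repeat < 50; repeat++) {        // "that'll make it more random"
        for (int pos = size - 1; pos >= 0; pos--) {
            int pick = rand() % size;    // note: chosen from ALL positions, every time
            int temp = deck[pick];
            deck[pick] = deck[pos];
            deck[pos] = temp;
        }
    }
}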

Hints

How do you approach this, when there are too many different shuffles in a 60-card deck to count? The answer is that you start much simpler. Assume a deck with only three cards in it (assume these cards are all different; call them A, B and C if you want).

First, figure out how many ways there are to order a 3-card deck. There are mathematical tricks for doing this that we haven’t discussed, but you should be able to do this just by trial and error.

Next, look at both algorithms and figure out how many different ways there are for each algorithm to produce a shuffle. So in the first case, you’re choosing from 3 cards, then choosing from 2, then choosing from 1. In the second case you’re choosing from 3, then choosing from 3, then choosing from 3. Take the list of actual possible orderings of the deck, that is, all the different unique ways the deck can end up… then compare that to all the different ways the deck can be shuffled by each algorithm. You’ll find that one of them produces a random shuffle (well, as random as your pseudorandom number generator is, anyway) and one actually favors certain shuffles over others. And if it works that way for a 3-card deck, assume it’s similarly random (or not) for larger decks. If you want to go through the math with larger decks, be my guest, but you shouldn’t have to.

If You’re Working on a Game Now…

Video Games

Once you’ve done that, if you’re working on a game that involves computers, take a look at the pseudorandom numbers and how your program uses them. In particular, if you use a series of pseudorandom numbers to do something like deck shuffling, make sure you’re doing it in a way that’s actually random; and if you’re using a nonstandard way to generate pseudorandom numbers, test it by graphing a bunch of pairs of random coordinates and checking for undesirable patterns. Lastly, examine how your random number seed is stored in the game state; if it’s a multiplayer game, is it stored separately in different clients or on a single server? If it’s a single-player game, does your save-game system work in such a way that a player can save, attempt a high-risk random roll, and keep reloading from save until it succeeds?

All Games, Digital or Non-Digital

Another thing to do, whether you’re designing a board game or video game, is to take a look at the random mechanics in your game (if there are any) and ask yourself some questions:

  • Is the game dominated more by skill or luck, or is it an even mix?
  • Is the level of skill and luck in the game appropriate to the game’s target audience, or should the game lean a bit more to one side or the other?
  • What kinds of probability fallacies are your players likely to observe when they play? Can you design your game differently to change the player perception of how fair and how random your game is? Should you?

Level 4: Probability and Randomness

July 28, 2010

Readings/Playings

Read this article on Gamasutra by designer Tyler Sigman. I affectionately refer to it as the “Orc Nostril Hair” article, but it provides a pretty good primer for probabilities in games.

This Week’s Topic

Up until now, nearly everything we’ve talked about was deterministic, and last week we really went deep into transitive mechanics and took them about as far as I can go with them. But so far we’ve ignored a huge aspect of many games, the non-deterministic aspects: in other words, randomness. Understanding the nature of randomness is important for game designers because we make systems in order to craft certain player experiences, so we need to know how those systems work. If a system includes a random input, we need to understand the nature of that randomness and how to modify it to get the results that we want.

Dice

Let’s start with something simple: a die-roll. When most people think of dice, they are thinking of six-sided dice, also known as d6s. But most gamers have encountered plenty of other dice: four-sided (d4), eight-sided (d8), d12, d20… and if you’re really geeky, you might have a d30 or d100 lying around. In case you haven’t seen this terminology before, “d” with a number after it means a die with that many sides; if there is a number before the “d” that is the number of dice rolled – so for example, in Monopoly you roll 2d6.

Now, when I say “dice” here, that is shorthand. We have plenty of other random-number generators that aren’t globs of plastic but that still serve the same function of generating a random number from 1 to n. A standard coin can be thought of as a d2. I’ve seen two designs for a d7, one which looks like a die and the other which looks more like a seven-sided wooden pencil. A four-sided dreidel (also known as a teetotum) is equivalent to a d4. The spinner that comes with Chutes & Ladders that goes from 1 to 6 is equivalent to a d6. A random-number generator in a computer might create a random number from 1 to 19 if the designer wants it to, even though there’s no 19-sided die inside there (I’ll actually talk a bit more about numbers from computers next week). While all of these things look different, really they are all equivalent: you have an equal chance of choosing one of several outcomes.

Dice have some interesting properties that we need to be aware of. The first is that each side is equally likely to be rolled (I’m assuming you’re using a fair die and not a rigged one). So, if you want to know the average value of a roll (also known as the “expected value” by probability geeks), just add up all the sides and divide by the number of sides. The average roll for a standard d6 is 1+2+3+4+5+6 = 21, divided by the number of sides (6), which means the average is 21/6 = 3.5. This is a special case, because we assume all results are equally likely.

What if you have custom dice? For example, I’ve seen one game with a d6 with special labels: 1, 1, 1, 2, 2, 3, so it behaves sort of like this weird d3 where you’re more likely to get a 1 than a 2 and more likely to get 2 than 3. What’s the average roll for this die? 1+1+1+2+2+3 = 10, divided by 6, equals 5/3 or about 1.66. So if you’ve got that custom die and you want players to roll three of them and add the results, you know they’ll roll an average total of 5 (three times 5/3), and you can balance the game on that assumption.

Dice and Independence

As I said before, we’re going on the assumption that each roll is equally likely. This is true no matter how many dice are rolled. Each die roll is what we call independent, meaning that previous rolls do not influence later rolls. If you roll dice enough times you definitely will see “streaks” of numbers, like a run of high or low numbers or something, and we’ll talk later about why that is, but it doesn’t mean the dice are “hot” or “cold”; if you roll a standard d6 and get two 6s in a row, the probability of rolling another 6 is… exactly 1/6. It is not more likely because the die is “running hot”. It is not less likely because “we already got two 6s, so we’re due for something else”. (Of course, if you roll it twenty times and get 6 on each time, the odds of getting a 6 on the twenty-first roll are actually pretty good… because it probably means you have a rigged die!) But assuming a fair die, each roll is equally likely, independent of the others. If it helps, assume that we’re swapping out dice each time, so if you roll a 6 twice, remove that “hot” die from play and replace with a new, “fresh” d6. For those of you that knew this already, I apologize, but I need to be clear about that before we move on.

Making Dice More or Less Random

Now, let’s talk about how you can get different numbers from different dice. If you’re only making a single roll or a small number of rolls, the game will feel more random if you use more sides of the dice. The more you roll a die, or the more dice you roll, the more things will tend towards the average. For example, rolling 1d6+4 (that is, rolling a standard 6-sided die and adding 4 to the result) generates a number between 5 and 10. Rolling 5d2 also generates a number between 5 and 10. But the single d6 roll has an equal chance of rolling a 5, or 8, or 10; the 5d2 roll will tend to create more rolls of 7 and 8 than any other results. Same range, and even the same average value (7.5 in both cases), but the nature of the randomness is different.

Wait a minute. Didn’t I just say that dice don’t run hot or cold? And now I’m saying if you roll a lot of them, they tend towards an average? What’s going on?

Let me back up. If you roll a single die, each roll is equally likely. That means if you roll a lot, over time, you’ll roll each side about as often as the others. The more you roll, the more you’ll tend towards the average, collectively. This is not because previous numbers “force” the die to roll what hasn’t been rolled before. It’s because a small streak of 6s (or 20s or whatever) ends up not having much influence when you roll another ten thousand times and get mostly average rolls… so you might get a bunch of high numbers now, but you might also get a bunch of low numbers later, and over time it all tends to go towards the mean. Not because the die is being influenced by previous rolls (seriously, the die is a piece of plastic, it doesn’t exactly have a brain to be thinking “gosh, I haven’t come up 2 in a while”), but because that’s just what tends to happen in large sets of rolls. Your small streak in a large ocean of die-rolls will be mostly drowned out.

So, doing the math for a random die roll is pretty straightforward, at least in terms of finding the average roll. There are also ways to quantify “how random” something is, a way of saying that 1d6+4 is “more random” than 5d2 in that it gives a more even spread; mostly you do that by computing something called “standard deviation” (the larger it is, the more random the result), but that takes more computation than I want to get into today (I’ll get into it later on). All I’m asking you to know is that in general, fewer dice rolled = more random. While I’m on the subject, more faces on a die is also more random since you have a wider spread.
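If you want to see that 1d6+4 versus 5d2 difference from earlier concretely, you can enumerate every possible roll of each and tally how often each total comes up; here’s a quick C++ sketch:

#include <cstdio>

int main() {
    int flat[11] = {0}, curved[11] = {0};            // indexed by total; only 5..10 are used

    for (int d = 1; d <= 6; d++) flat[d + 4]++;      // 1d6+4: each total from 5 to 10 exactly once

    for (int a = 1; a <= 2; a++)                     // 5d2: enumerate all 2^5 = 32 possible rolls
      for (int b = 1; b <= 2; b++)
        for (int c = 1; c <= 2; c++)
          for (int d = 1; d <= 2; d++)
            for (int e = 1; e <= 2; e++)
              curved[a + b + c + d + e]++;

    for (int total = 5; total <= 10; total++)        // same range, very different shapes
        printf("%d: 1d6+4 -> %d of 6, 5d2 -> %d of 32\n", total, flat[total], curved[total]);
    return 0;
}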

Computing Probability through Counting

You might be wondering: how can we figure out the exact probability of getting a specific roll? This is actually pretty important in a lot of games, because if you’re making a die roll in the first place there is probably some kind of optimal result. And the answer is, we count two things. First, count the total number of ways to roll dice (no matter what the result). Then, count the number of ways to roll dice that get the result you actually want. Divide the second number by the first and you’ve got your probability; multiply by 100 if you want the percentage.

Examples

Here’s a very simple example. You want to roll 4 or more on 1d6. There are 6 total possible results (1, 2, 3, 4, 5, or 6). Of those, 3 of the results (4, 5, or 6) are a success. So your probability is 3 divided by 6, or 0.5, or 50%.

Here’s a slightly more complicated example. You want to roll an even number on 2d6. There are 36 total results (6 for each die, and since neither die is influenced by the other you multiply 6 results by 6 to get 36). The tricky thing with questions like this, is that it’s easy to double-count. For example, there are actually two ways to roll the number 3 on 2d6: 1+2 and 2+1. Those look the same, but the difference is which number appears on the first die, and which appears on the second die. If it helps, think of each die as having a different color, so maybe you have a red die and a blue die in this case. And then you can count the ways to roll an even number: 2 (1+1), 4 (1+3), 4 (2+2), 4 (3+1), 6 (1+5), 6 (2+4), 6 (3+3), 6 (4+2), 6 (5+1), 8 (2+6), 8 (3+5), 8 (4+4), 8 (5+3), 8 (6+2), 10 (4+6), 10 (5+5), 10 (6+4), 12 (6+6). It turns out there are exactly 18 ways to do this out of 36, also 0.5 or 50%. Perhaps unexpected, but kind of neat.

Monte Carlo Simulations

What if you have too many dice to count this way? For example, say you want to know the odds of getting a combined total of 15 or more on a roll of 8d6. There are a LOT of different individual results of eight dice, so just counting by hand takes too long. Even if we find some tricks to group different sets of rolls together, it still takes a really long time. In this case the easiest way to do it is to stop doing math and start using a computer, and there are two ways to do this.

The first way gets you an exact answer but takes a little bit of programming or scripting. You basically have the computer run through every possibility inside a for loop, evaluating and counting up all the iterations total and also all the iterations that are a success, and then have it spit out the answers at the end. Your code might look something like this:

int wincount = 0, totalcount = 0;

for (int i=1; i<=6; i++) {
  for (int j=1; j<=6; j++) {
    for (int k=1; k<=6; k++) {
      … // insert more nested loops here, one per die, for 8 loops total
      if (i+j+k+… >= 15) { // sum of all eight loop variables
        wincount++;
      }
      totalcount++;
    }
  }
}

float probability = (float)wincount / totalcount; // cast to float so integer division doesn’t truncate the answer to zero

If you don’t know programming but you just need a ballpark answer and not an exact one, you can simulate it in Excel by having it roll 8d6 a few thousand times and take the results. To roll 1d6 in Excel, use this formula:

=FLOOR(RAND()*6,1)+1

When you don’t know the answer so you just try it a lot, there’s a name for that: Monte Carlo simulation, and it’s a great thing to fall back on when you’re trying to do probability calculations and you find yourself in over your head. The great thing about this is, we don’t have to know the math behind why it works, and yet we know the answer will be “pretty good” because like we learned before, the more times you do something, the more it tends towards the average.
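If you’d rather write code than use Excel, the same Monte Carlo simulation only takes a few lines. Here’s a C++ sketch; the 100,000 trials is an arbitrary choice that just means “a lot”:

#include <cstdio>
#include <cstdlib>
#include <ctime>

int main() {
    srand((unsigned)time(NULL));
    const int trials = 100000;
    int wins = 0;
    for (int t = 0; t < trials; t++) {
        int total = 0;
        for (int die = 0; die < 8; die++) {
            total += rand() % 6 + 1;      // roll one d6 and add it to the running total
        }
        if (total >= 15) wins++;
    }
    printf("Estimated probability of 15+ on 8d6: %f\n", wins / (float)trials);
    return 0;
}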

Combining Independent Trials

If you’re asking for several repeated but independent trials, so the result of one roll doesn’t affect other rolls, we have an extra trick we can use that makes things a little easier.

How do you tell the difference between something that’s dependent and something that’s independent? Basically, if you can isolate each individual die-roll (or series) as a separate event, then it is independent. For example, rolling a total of 15 on 8d6 is not something that can be split up into several independent rolls. Since you’re summing the dice together, what you get on one die affects the required results of the others, because it’s only all of them added together to give you a single result.

Here’s an example of independent rolls: say you have a dice game where you roll a series of d6s. On the first roll, you have to get 2 or higher to stay in the game. On the second roll you have to get 3 or higher. Third roll requires 4 or higher, fourth roll is 5 or higher, and fifth roll requires a 6. If you make all five rolls successfully, you win. Here, the rolls are independent. Yes, if you fail one roll it affects the outcome of the entire dice game, but each individual roll is not influenced by the others; for example, if you roll really well on your second roll, that doesn’t make you any more or less likely to succeed on future rolls. Because of this, we can consider the probability of each roll separately.

When you have separate, independent probabilities and you want to know what is the probability that all of them will happen, you take each of the individual probabilities and multiply them together. Another way to think about this: if you use the word “and” to describe several conditions (as in, “what is the probability that some random event happens and that some other independent random event happens?”), figure out the individual probabilities and multiply.

No matter what you do, do not ever add independent probabilities together. This is a common mistake. To see why it doesn’t work, imagine a 50/50 coin flip, and you’re wondering what is the probability that you’ll get Heads twice in a row. The probability of each is 50%, so if you add those together you’d expect a 100% chance of getting Heads, but we know that’s not true, because you could get Tails twice. If instead you multiply, you get 50%*50% = 25%, which is the correct probability of getting Heads twice.

Example

Let’s go back to our d6 game where you have to roll higher than 2, then higher than 3, and so on up to 6. In this series of 5 rolls, what are the chances you make all of them?

As we said before, these are independent trials, so we just compute the odds of each roll and then multiply together. The first roll succeeds 5/6 times. The second roll, 4/6. The third, 3/6. The fourth, 2/6, and the fifth roll 1/6. Multiplying these together, we get 120/7776, or about 1.5%… So, winning this game is pretty rare, so you’d want a pretty big jackpot if you were putting that in your game.

Negation

Here’s another useful trick: sometimes it’s hard to compute a probability directly, but it’s much easier to figure out the chance that the thing won’t happen.

Here’s an example: suppose we make another game where you roll 6d6, and you win if you roll at least one 6. What are your chances of winning?

There are a lot of things to compute here. You might roll a single 6, which means one of the dice is showing 6 and the others are all showing 1-5, and there are 6 different ways to choose which die is showing 6. And then you might roll two 6s, or three, or more, and each of those is a separate computation, and it gets out of hand pretty quickly.

However, there’s another way to look at this, by turning it around. You lose if none of the dice are showing 6. Here we have six independent trials, each of which has a probability of 5/6 (the die can roll anything except 6). Multiply those together (5/6 multiplied by itself six times) and you get about 33%. So you have about a 1 in 3 chance of losing.

Turning it around again, that means a 67% (or 2 in 3) chance of winning.

The most obvious lesson here is that to negate a probability, you just subtract it from 100%. If the odds of winning are 67%, the odds of not winning are 100% minus 67%, or 33%. And vice versa. So if you can’t figure out one thing but it’s easy to figure out the opposite, figure out that opposite and then subtract from 100%.

Combining Conditions Within a Single Independent Trial

A little while ago, I said you should never add probabilities together when you’re doing independent trials. Are there any cases where you can add probabilities together? Yes, in one special situation.

When you are trying to find the probability for several non-overlapping success criteria in a single trial, add them together. For example, the probability of rolling a 4, 5 or 6 on 1d6 is equal to the probability of rolling 4 plus the probability of rolling 5 plus the probability of rolling 6. Another way of thinking about this: when you use the word “or” in your probability (as in, “what is the probability that you will get a certain result or a different result from a single random event?”), figure out the individual probabilities and add them together.

One important trait here is that when you add up all possible outcomes for a game, the combined probabilities should add up to exactly 100%. If they don’t, you’ve done your math wrong, so this is a good reality check to make sure you didn’t miss anything. For example, if you were analyzing the probability of getting all of the hands in Poker, if you add them all up you should get exactly 100% (or at least really close – if you’re using a calculator you might get a very slight rounding error, but if you’re doing exact numbers by hand it should be exact). If you don’t, it means there are probably some hands that you haven’t considered, or that you got the probabilities of some hands wrong, so you need to go back and check your figures.

Uneven Probabilities

So far, we’ve been assuming that every single side on a die comes up equally often, because that’s how dice are supposed to work. But occasionally you end up with a situation where there are different outcomes that have different chances of coming up. For example, there’s this spinner in one of the Nuclear War card game expansions that modifies the result of a missile launch: most of the time it just does normal damage plus or minus a few, but occasionally it does double or triple damage, or blows up on the launchpad and damages you, or whatever. Unlike the spinner in Chutes & Ladders or A Game of Life, the Nuclear War spinner has results that are not equally probable. Some results have very large sections where the spinner can land so they happen more often, while other results are tiny slivers that you only land on rarely.

Now, at first glance this is sort of like that 1, 1, 1, 2, 2, 3 die we were talking about earlier, which was sort of like a weighted 1d3, so all we have to do is divide all these sections evenly, find the smallest unit that everything is a multiple of, and then make this into a d522 roll (or whatever) with multiple sides of the die showing the same thing for the more common results. And that’s one way to do it, and that would technically work, but there’s an easier way.

Let’s go back to our original single standard d6 roll. For a normal die, we said to add up all of the sides and then divide by the number of sides, but what are we really doing there? We could say this another way. For a 6-sided die, each side has exactly 1/6 chance of being rolled. So we multiply each side’s result by the probability of that result (1/6 for each side in this case), then add all of these together. Doing this, we get (1*1/6) + (2*1/6) + (3*1/6) + (4*1/6) + (5*1/6) + (6*1/6), which gives us the same result (3.5) as we got before. And really, that’s what we’re doing the whole time: multiplying each outcome by the probability of that outcome.

Can we do this with the Nuclear War spinner? Sure we can. All we have to do is figure out the probability of each result, multiply each result by its probability, and add them all together to get the average result of a spin.
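I don’t have the actual Nuclear War spinner values handy, so the numbers in this C++ sketch are made up, but it shows the shape of the calculation: multiply each outcome by its probability and sum.

#include <cstdio>

int main() {
    // Hypothetical spinner: each outcome paired with its probability.
    // The probabilities must add up to exactly 1.0 (that is, 100%).
    const int numOutcomes = 4;
    float results[numOutcomes]       = { 10.0f, 20.0f, 0.0f, 30.0f };
    float probabilities[numOutcomes] = { 0.5f,  0.3f,  0.1f, 0.1f };

    float expectedValue = 0.0f;
    for (int i = 0; i < numOutcomes; i++) {
        expectedValue += results[i] * probabilities[i];   // outcome times its probability
    }
    printf("Average spin result: %f\n", expectedValue);
    return 0;
}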

Another Example

This technique of computing expected value by multiplying each result by its individual probability also works if the results are equally probable but weighted differently, like if you’re rolling dice but you win more on some rolls than others. As an example, here’s a game you might be able to find in some casinos: you place a wager, and roll 2d6. If you roll the lowest three numbers (2, 3 or 4) or the highest four numbers (9, 10, 11 or 12), you win an amount equal to your wager. The extreme ends are special: if you roll 2 or 12, you win double your wager. If you roll anything else (5, 6, 7 or 8), you lose your wager. This is a pretty simple game. But what is the chance of winning?

We can start by figuring out how many times you win:

  • There are 36 ways to roll 2d6, total. How many of these are winning rolls?
  • There’s 1 way to roll two, and 1 way to roll twelve.
  • There are 2 ways to roll three, and 2 more ways to roll eleven.
  • There are 3 ways to roll four, and 3 more ways to roll ten.
  • There are 4 ways to roll nine.
  • Adding these all up, there are 16 winning rolls out of 36.

So, under normal conditions, you win 16 times out of 36… slightly less than 50%.

Ah, but two of those times, you win twice as much, so that’s like winning twice! So if you play this game 36 times with a wager of $1 each time, and roll each possible roll exactly once, you’ll win $18 total (you actually win 16 times, but two of those times it counts as two wins). Since you play 36 times and win $18, does that mean these are actually even odds?

Not so fast. If you count up the number of times you lose, there are 20 ways to lose, not 18. So if you play 36 times for $1 each, you’ll win a total of $18 from the times when you win… but you’ll lose $20 from the twenty times you lose! As a result, you come out very slightly behind: you lose $2 net, on average, for every 36 plays (you could also say that on average, you lose 1/18 of a dollar per play). You can see how easy it is to make one misstep and get the wrong probability here!
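If you want to double-check that arithmetic, it’s easy to have a computer enumerate all 36 equally likely rolls and add up the payouts on a $1 wager; here’s a quick C++ sketch:

#include <cstdio>

int main() {
    int net = 0;   // net winnings, in dollars, over all 36 equally likely rolls at $1 each
    for (int a = 1; a <= 6; a++) {
        for (int b = 1; b <= 6; b++) {
            int total = a + b;
            if (total == 2 || total == 12)      net += 2;   // extreme ends pay double
            else if (total <= 4 || total >= 9)  net += 1;   // normal win
            else                                net -= 1;   // 5, 6, 7 or 8: lose the wager
        }
    }
    printf("Net over 36 plays: %d\n", net);   // prints -2, matching the result above
    return 0;
}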

Permutations

So far, all of our die rolls assume that order doesn’t matter. Rolling a 2+4 is the same as rolling a 4+2. In most cases, we just manually count the number of different ways to do something, but sometimes that’s impractical and we’d like a math formula.

Here’s an example problem from a dice game called Farkle. You start each round by rolling 6d6. If you’re lucky enough to roll one of each result, 1-2-3-4-5-6 (a “straight”), you get a huge score bonus. What’s the probability that will happen? There are a lot of different ways to have one of each!

The answer is to look at it this way: one of the dice (and only one) has to be showing 1. How many ways are there to do that? Six – there are 6 dice, and any of them can show the 1. Choose that and put it aside. Now, one of the remaining dice has to show 2. There’s five ways to do that. Choose it and put it aside. Continuing along these lines, four remaining dice can show 3, three dice of the remaining ones after that can show 4, two of the remaining dice after that can show 5, and at the end you’re left with a single die that must show 6 (no choice involved in that last one). To figure out how many ways there are to roll a straight, we multiply all the different, independent choices: 6x5x4x3x2x1 = 720 – that seems like a lot of ways to roll a straight.

To get the probability of rolling a straight, we have to divide 720 by the number of ways to roll 6d6, total. How many ways can we do that? Each die can show 6 sides, so we multiply 6x6x6x6x6x6 = 46656 (a much larger number!). Dividing 720/46656 gives us a probability of about 1.5%. If you were designing this game, that’s good to know so you can design the scoring system accordingly. We can see why Farkle gives you such a high score bonus for rolling a straight; it only happens very rarely!

This result is interesting for another reason. It shows just how infrequently we actually roll exactly according to probability in the short term. Sure, if we rolled a few thousand dice, we would see about as many of each of the six numbers on our rolls. But rolling just six dice, we almost never roll exactly one of each! We can see from this, another reason why expecting dice to roll what hasn’t been rolled yet “because we haven’t rolled 6 in awhile so we’re about due” is a fool’s game.

Dude, Your Random Number Generator Is Broken…

This brings us to a common misunderstanding of probability: the assumption that everything is split evenly in the short term, which it isn’t. In a small series of die-rolls, we expect there to be some unevenness.

If you’ve ever worked on an online game with some kind of random-number generator before, you’ve probably heard this one: a player writes tech support to tell you that your random number generator is clearly broken and not random, and they know this because they just killed 4 monsters in a row and got 4 of the exact same drop, and those drops are only supposed to happen 10% of the time, so this should almost never happen, so clearly your die-roller is busted.

You do the math. 1/10 * 1/10 * 1/10 * 1/10 is 1 in 10,000, which is pretty infrequent. This is what the player is trying to tell you. Is there a problem?

It depends. How many players are on your server? Let’s say you’ve got a reasonably popular game, and you get 100,000 daily players. How many of those kill four monsters in a row? Maybe all of them, multiple times per day, but let’s be conservative and say that half of them are just there to trade stuff in the auction house or chat on the RP servers or whatever, so only half of them actually go out monster-hunting. What’s the chance this will happen to someone? Even if each of those 50,000 hunters only gets one four-kill chain per day, that’s 50,000 chances at a 1-in-10,000 event, or about five players a day seeing it. On a scale like that, you’d expect it to happen several times a day, at least!

Incidentally, this is why it seems like every few weeks at least, someone wins the lottery, even though that someone is never you or anyone you know. If enough people play each week, the odds are you’ll have at least one obnoxiously lucky sap somewhere… but if you play the lottery yourself, you’ve got worse odds of winning than your odds of being hired at Infinity Ward.

Cards and Dependence

Now that we’ve talked about independent events like die-rolling, we have a lot of powerful tools to analyze the randomness of many games. Things get a little more complicated when we talk about drawing cards from a deck, because each card you draw influences what’s left in the deck. If you have a standard 52-card deck and draw, say, the 10 of Hearts, and you want to know the probability that the next card is also a heart, the odds have changed because you’ve already removed a heart from the deck. Each card that you remove changes the probability of the next card in the deck. Since each card draw is influenced by the card draws that came before, we call this dependent probability.

Note that when I say “cards” here I am talking about any game mechanic where you have a set of objects and you draw one of them without replacing, so in this case “deck of cards” is mechanically equivalent to a bag of tiles where you draw a tile and don’t replace it, or an urn where you’re drawing colored balls from (I’ve never actually seen a game that involves drawing balls from an urn, but probability professors seem to have a love of them for some reason).

Properties of Dependence

Just to be clear, with cards I’m assuming that you are drawing cards, looking at them, and removing them from the deck. Each of these is an important property.

If I had a deck with, say, six cards numbered 1 through 6, and I shuffled and drew a card and then reshuffled all six cards between card draws, that is equivalent to a d6 die roll; no result influences the future ones. It’s only if I draw cards and don’t replace them that pulling a 1 on my first draw makes it more likely I’ll draw a 6 next time (and it will get more and more likely until I finally draw it, or until I reshuffle).

The fact that we are looking at the cards is also important. If I pull a card from the deck but don’t look at it, I have no additional information, so the probabilities haven’t really changed. This is something that may sound counterintuitive; how does just flipping a card over magically change the probabilities? But it does, because you can only calculate the probability of unknown stuff based on what you do know. So, for example, if you shuffle a standard deck, reveal 51 cards and none of them is the Queen of Clubs, you know with 100% certainty that this is what the missing card is. If instead you shuffle a standard deck, and take 51 cards away without revealing them, the probability that the last card is the Queen of Clubs is still 1/52. For each additional card you reveal, you get more information.

Calculating probabilities for dependent events follows the same principles as independent, except it’s a little trickier because the probabilities are changing whenever you reveal a card. So you have to do a lot of multiplying different things together rather than multiplying the same thing against itself if you want to repeat a challenge. Really, all this means is we have to put together everything we’ve done already, in combination.

Example

You shuffle a standard 52-card deck, and draw two cards. What’s the probability that you’ve drawn a pair? There are a few ways to compute this, but probably the easiest is to say this: what’s the probability that the first card you draw makes you totally ineligible to draw a pair? Zero, so the first card doesn’t really matter as long as the second card matches it. No matter what we draw for our first card, we’re still in the running to draw a pair, so we have a 100% chance that we can still get a pair after drawing the first card.

What’s the probability that the second card matches? There are 51 cards remaining in the deck, and 3 of them match (normally it’d be 4 out of 52, but you already removed a “matching” card on your first draw!) so the probability ends up being exactly 1/17. (So, the next time that guy sitting across the table from you in Texas Hold ‘Em says “wow, another pocket pair? Must be my lucky day” you know there’s a pretty good chance he’s bluffing.)

What if we add two jokers so it’s now a 54-card deck, and we still want to know the chance of drawing a pair? Occasionally your first card will be a joker, and there will only be one matching card in the rest of the deck, rather than 3. How do we figure this out? By splitting up the probabilities and then multiplying each possibility.

Your first card is either going to be a Joker, or Something Else. Probability of a Joker is 2/54, probability of Something Else is 52/54.

If the first card is a Joker (2/54), then the probability of a match on the second card is 1/53. Multiplying these together (we can do that since they’re separate events and we want both to happen), we have 1/1431 – less than a tenth of a percent.

If the first card is Something Else (52/54), the probability of a match on the second card is up to 3/53. Multiplying these together, we have 78/1431 (a little more than 5.5%).

What do we do with these two results? Since they do not overlap, and we want to know the probability of either of them, we add! 79/1431 (still around 5.5%) is the final answer.

If we really wanted to be careful, we could calculate the probability of all other possible results: drawing a Joker and not matching, or drawing Something Else and not matching, and adding those together with the probability of winning, and we should get exactly 100%. I won’t do the math here for you, but feel free to do it yourself to confirm.

The Monty Hall Problem

This brings us to a pretty famous problem that tends to really confuse people, called the Monty Hall problem. It’s called that because there used to be this game show called Let’s Make a Deal, with your host, Monty Hall. If you’ve never seen the show, it was sort of like the inverse of The Price Is Right. In The Price Is Right, the host (used to be Bob Barker, now it’s… Drew Carey? Anyway…) is your friend. He wants to give away cash and fabulous prizes. He tries to give you every opportunity to win, as long as you’re good at guessing how much their sponsored items actually cost.

Monty Hall wasn’t like that. He was like Bob Barker’s evil twin. His goal was to make you look like an idiot on national television. If you were on the show, he was the enemy, you were playing a game against him, and the odds were stacked in his favor. Maybe I’m being overly harsh, but when your chance of being selected as a contestant seems directly proportional to whether you’re wearing a ridiculous-looking costume, I tend to draw these kinds of conclusions.

Anyway, one of the biggest memes from the show was that you’d be given a choice of three doors, and they would actually call them Door Number 1, Door Number 2 and Door Number 3. They’d give you a door of your choice… for free! Behind one door, you’re told, is a fabulous prize like a Brand New Car. Behind the other doors, there’s no prize at all, no nothing, those other two doors are worthless. Except the goal is to humiliate you, so they wouldn’t just have an empty door, they’d have something silly-looking back there like a goat, or a giant tube of toothpaste, or something… something that was clearly not a Brand New Car.

So, you’d pick your door, and Monty would get ready to reveal if you won or not… but wait, before we do that, let’s look at one of the other doors that you didn’t choose. Since Monty knows where the prize is, and there’s only one prize and two doors you didn’t choose, no matter what he can always reveal a door without a prize. Oh, you chose Door Number 3? Well, let’s reveal Door Number 1 to show you that there was no prize there. And now, being the generous guy he is, he gives you the chance to trade your Door Number 3 for whatever’s behind Door Number 2 instead. And here’s where we get into probability: does switching doors increase your chance of winning, or decrease it, or is it the same? What do you think?

The real answer is that switching increases your chance of winning from 1/3 to 2/3. This is counterintuitive. If you haven't seen this problem before, you're probably thinking: wait, just by revealing a door we've magically changed the odds? But as we saw with our card example earlier, that is exactly what revealed information does. Your odds of winning with your first pick are obviously 1/3, and I think everyone here would agree to that. When that new door is revealed, it doesn't change the odds of your first pick at all – it's still 1/3 – but that means the other door now has a 2/3 chance of being the right one.

Let’s look at it another way. You choose a door. Chance of winning: 1/3. I offer to swap you for both of the other doors, which is basically what Monty Hall is doing. Sure, he reveals one of them to not be a prize, but he can always do that, so that doesn’t really change anything. Of course you’d want to switch!

If you're still wondering about this and need more convincing, clicking here will take you to a wonderful little Flash app that lets you explore this problem. You can actually play, starting with something like 10 doors and eventually working your way down to 3; there's also a simulator where you can give it any number of doors from 3 to 50 and just play on your own, or have it run a few thousand simulations and tell you how many times you would have won if you stayed versus if you switched.
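
And if Flash isn't handy, a quick Monte Carlo simulation makes the same point. Here's a short Python sketch (my own illustration, not the app linked above) that plays the three-door game many times and compares staying with switching:

import random

def play_monty(switch):
    prize = random.randrange(3)
    pick = random.randrange(3)
    # Monty opens a door that is neither your pick nor the prize.
    opened = random.choice([d for d in range(3) if d != pick and d != prize])
    if switch:
        # Take the one remaining door that is neither your pick nor the opened one.
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == prize

trials = 100000
print("stay:  ", sum(play_monty(False) for _ in range(trials)) / trials)  # about 1/3
print("switch:", sum(play_monty(True) for _ in range(trials)) / trials)   # about 2/3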

Monty Hall, Redux

Now, in practice on the actual show, Monty Hall knew this, because he was good at math even if his contestants weren’t. So here’s what he’d do to change the game a little. If you picked the door with the prize behind it, which does happen 1/3 of the time, he’d always offer you the chance to switch. After all, if you’ve got a car and then you give it away for a goat, you’re going to look pretty dumb, which is exactly what he wants, because that’s the kind of evil guy he is. But if you pick a door with no prize behind it, he’ll only offer you the chance to switch about half of those times, and the other half he’ll just show you your Brand New Goat and boot you off the stage. Let’s analyze this new game, where Monty can choose whether or not to give you the chance to switch.

Suppose he follows this algorithm: always let you switch if you picked the door with the car, otherwise he has a 50/50 chance of giving you your goat or giving you the chance to switch. Now what are your chances of winning?

1/3 of the time, you pick the prize right away and he offers you to switch.

Of the remaining 2/3 of the time (you pick wrong initially), half of the time he’ll offer to switch, half the time he won’t. Half of 2/3 is 1/3, so basically 1/3 of the time you get your goat and leave, 1/3 of the time you picked wrong and he offers the switch, and 1/3 of the time you picked right and he offers the switch.

If he offers an exchange, we already know that the 1/3 of the time when he gives you your goat and you leave didn’t happen. That is useful information, because it means our chance of winning has now changed. Of the 2/3 of the time where we’re given a choice, 1/3 means we guessed right, and the other 1/3 means we guessed wrong, so if we’re given a choice at all it means our probability of winning is now 50/50, and there’s no mathematical advantage to keeping or switching.

Like in Poker, this is no longer a game of math but a game of psychology. Did Monty offer you a choice because he thinks you're a sucker who doesn't know that switching is the "right" choice, and that you'll stubbornly hold onto the door you picked because psychologically it's worse to have a car and then lose it? Or does he think you're smart and that you'll switch, and he's offering you the chance because he knows you guessed right at the beginning and you'll take the bait and fall into his trap? Or maybe he's being uncharacteristically nice, and goading you into doing something in your own best interest, because he hasn't given away a car in a while and his producers are telling him the audience is getting bored and he'd better give away a big prize soon so their ratings don't drop?

In this way, Monty manages to offer a choice (sometimes) while still keeping the overall probability of winning at 1/3. Remember, a third of the time you'll just lose outright. A third of the time you'll guess right initially, and 50% of that time you'll win (1/3 x 1/2 = 1/6). And a third of the time, you'll guess wrong initially but be given the choice to switch, and 50% of that time you'll win (also 1/6). Add the two non-overlapping win states together and you get 1/3, so whether you switch or stay your overall odds are 1/3 throughout the whole game… no better than if you just guessed and he showed you the door, without any of this switching business at all! So the offer to switch doors isn't there to change the odds; it's there because drawing out the decision makes for more exciting television viewing.
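
You can check this version with a simulation, too. Here's a hedged Python sketch of the algorithm described above (always offer the switch if the contestant picked the car, otherwise offer it half the time); it should show both that the overall win rate stays at about 1/3 whether you stay or switch, and that given an offer, either choice wins about half the time:

import random

def play_redux(always_switch, trials=200000):
    wins = 0
    offers = 0
    for _ in range(trials):
        picked_car = random.randrange(3) == 0           # 1/3: your first pick is the car
        offered = picked_car or random.random() < 0.5   # wrong picks only get an offer half the time
        if not offered:
            continue                                    # shown your goat, you lose outright
        offers += 1
        won = (not picked_car) if always_switch else picked_car
        wins += won
    return wins / trials, wins / offers

print(play_redux(False))  # always stay:   about 1/3 overall, about 1/2 given an offer
print(play_redux(True))   # always switch: about 1/3 overall, about 1/2 given an offer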

Incidentally, this is one of the reasons Poker can be so interesting: most of the formats involve slowly revealing cards in between rounds of betting (like the Flop, Turn and River in Texas Hold 'Em), so you start off with a certain probability of winning, and that probability changes between each betting round as more cards are revealed.

The Sibling Problem

And that brings us to another famous problem that tends to throw people, the Siblings problem. This is about the only thing I'm writing about today that isn't directly related to games (although I guess that just means I should challenge you to come up with a game mechanic that uses this). It's more of a brain teaser, but a fun one, and in order to solve it you really have to be able to understand conditional probability like we've been talking about.

The question is this: I have a friend with two kids, and at least one of them is a girl. What is the probability that the other one is also a girl? Assume that in the normal human population, there's a 50/50 chance of having a boy or a girl, and assume that this is universally true for any child. (In reality some men do produce more X or Y sperm, which would skew the odds so that if you know one of their kids is already a girl, the odds are slightly higher that they'll have more girls; and then there are conditions like hermaphroditism; but for our purposes let's ignore all that and assume each kid is an independent trial with an equal chance of being male or female.)

Intuitively, since we're dealing with a core 1/2 chance, we would expect the answer to be something like 1/2 or 1/4 or some other nice, round number that's divisible by 2. The actual answer is 1/3. Wait, what?

The trick here is that the information we were given narrows down the possibilities. Let’s say the parents are Sesame Street fans and so no matter what the sex, they name their kids A and B. Under normal conditions, there are four possibilities that are equally likely: A and B are both boys, A and B are both girls, A is boy and B is girl, or A is girl and B is boy. Since we know at least one of them is a girl, we can eliminate the possibility that A and B are both boys, so we have three (still equally likely) scenarios remaining. Since they’re equally likely and there are three of them, we know each one has a probability of 1/3. Only one of those three scenarios involves two girls, so the answer is 1/3.
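
If you don't trust the counting argument, here's a tiny Python sketch (again, just an illustration) that lists the four equally likely families and filters them the same way:

from itertools import product

families = list(product("BG", repeat=2))               # (sex of child A, sex of child B)
at_least_one_girl = [f for f in families if "G" in f]  # the information we were given
both_girls = [f for f in at_least_one_girl if f == ("G", "G")]
print(len(both_girls), "/", len(at_least_one_girl))    # 1 / 3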

The Sibling Problem, Redux

It gets weirder. Suppose instead I tell you my friend has two children, and one is a girl who was born on a Tuesday. Assume that under normal conditions, a child is equally likely to be born on any of the seven days of the week. What’s the probability the other child is also a girl? You’d think the answer would still be 1/3; what does Tuesday have to do with anything? But again, intuition fails. The actual answer is 13/27, which isn’t just unintuitive, it’s plain old weird-looking. What’s going on here?

Tuesday actually changes the odds, again because we don’t know which child it was, or if both children were born on Tuesday. By the same logic as earlier, we count all valid combinations of children where at least one is a Tuesday girl. Again assuming the children are named A and B, the combinations are:

  • A is a Tuesday girl, B is a boy (there are 7 possibilities here, one for each day of the week that B could be born on).
  • B is a Tuesday girl, A is a boy (again, 7 possibilities).
  • A is a Tuesday girl, B is a girl born on a different day of the week (6 possibilities).
  • B is a Tuesday girl, A is a non-Tuesday girl (again, 6 possibilities).
  • A and B are both girls born on Tuesday (1 possibility, but we have to take care not to double-count this).

Adding it up, there are 27 different, equally likely combinations of children and days with at least one Tuesday girl. Of those, 13 possibilities involve two girls. Again, this is totally counterintuitive, and apparently designed for no other reason than to make your brain hurt. If you’re still scratching your head, ludologist Jesper Juul has a nice explanation of this problem on his website.
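
Here's the same brute-force counting as a short Python sketch (an illustration of the argument above, with day 2 standing in for Tuesday); it enumerates every equally likely (sex, birth day) combination for the two children and should come out to 13/27:

from itertools import product

child = [(sex, day) for sex in "BG" for day in range(7)]            # 14 equally likely kinds of child
families = [f for f in product(child, repeat=2) if ("G", 2) in f]   # at least one Tuesday girl
both_girls = [f for f in families if f[0][0] == "G" and f[1][0] == "G"]
print(len(both_girls), "/", len(families))                          # 13 / 27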

If You’re Working on a Game Now…

If a game you’re designing has any randomness, this is a great excuse to analyze it. Choose a random element you want to analyze. For that element, first ask yourself what kind of probability you’re expecting to see, what makes sense to you in the context of the game. For example, if you’re making an RPG and looking at the probability that the player will hit a monster in combat, ask yourself what to-hit percentage feels right to you. Usually in console RPGs, misses by the player are very frustrating, so you wouldn’t usually want them to miss a lot… maybe 10% of the time or less? If you’re an RPG designer you probably know better than I, but you should have some basic idea of what feels right.

Then, ask yourself if this is something that’s dependent (like cards) or independent (like dice). Break down all possible results, and the probabilities of each. Make sure your probabilities sum to 100%. And lastly, of course, compare the actual numbers to the numbers you were expecting. Is this particular random die-roll or card-draw acting how you want it to, or do you see signs that you need to adjust the numbers? And of course, if you do find something to adjust, you can use these same calculations to figure out exactly how much to adjust it!

Homework

Your “homework” this week is meant to help you practice your probability skills. I have two dice games and a card game for you to analyze using probability, and then a weird mechanic from a game I once worked on that provides a chance to try out a Monte Carlo simulation.

Game #1: Dragon Die

This is a dice game that I invented with some co-workers one day (thanks Jeb Havens and Jesse King!) specifically to mess with people’s heads on probability. It’s a simple casino game called Dragon Die, and it’s a dice gambling contest between you and the House. You are given a standard 1d6, and you roll it. You’re trying to roll higher than the House. The House is given a non-standard 1d6 – it’s similar to yours, but instead of a 1 it has a Dragon on it (so the House die is Dragon-2-3-4-5-6). If the House rolls a Dragon, then the House automatically wins and you automatically lose. If you both roll the same number, it’s a push, and you both re-roll. Otherwise, the winner is whoever rolls highest.

Obviously, the odds are slightly against the player here, because the House has this Dragon advantage. But how much of an advantage is it? You’re going to calculate it. But first, before you do, exercise your intuition. Suppose I said this game was offered with a 2 to 1 payout. That is, if you win, you keep your bet and get twice your bet in winnings. So, if you bet $1 and win, you keep your $1 and get $2 extra, for a total of $3. If you lose, you just lose your standard bet. Would you play? That is, intuitively, do you think the odds are better or worse than 2 to 1? Said another way, for every 3 games you play, do you expect to win more than once, or less than once, or exactly once, on average?

Once you’ve used your intuition, do the math. There are only 36 possibilities for both dice, so you should have no problem counting them all up. If you’re not sure about this “2 to 1” business, think of it this way: suppose you played the game 36 times (wagering $1 each time). A win nets you $2 up, a loss causes you to lose $1, and a push is no change. Count up your total winnings and losses and figure out if you come out ahead or behind. And then ask yourself how close your intuition was. And then realize how evil I am.

And yes, if you’re wondering, the actual dice-roll mechanics here are something I’m intentionally obfuscating, but I’m sure you’ll all see through that once you sit down and look at it. Try and solve it yourself. I’ll post all answers here next week.

Game #2: Chuck-a-Luck

There is a gambling dice game called Chuck-a-Luck (also known as Birdcage, because sometimes instead of rolling dice they’re placed in a wire cage that somewhat resembles a Bingo cage). This is a simple game that works like this: place your bet (say, $1) on any number from 1 to 6. You then roll 3d6. For each die that your number shows up on, you get $1 in winnings (and you get to keep your original bet). If no dice show your number, the house takes your $1 and you get nothing. So, if you place on 1 and you roll triple 1s, you actually win $3.

Intuitively, it seems like this is an even-odds game. Each die individually has a 1/6 chance of winning, so adding all three should give you a 3/6 chance of winning. But of course, if you calculate it that way you're adding probabilities across three separate die-rolls, and remember, you're only allowed to add when you're talking about separate win conditions on the same die. You need to be multiplying something.

When you count out all possible results (you’ll probably find it easier to do this in Excel than by hand since there are 216 results), it still looks at first like an even-odds game. But in reality, the odds of winning are actually slightly in favor of the House; how much? In particular, on average, how much money do you expect to lose each time you play this game? All you have to do is add up the gains and losses for all 216 results, then divide by 216, so this should be simple… but as you’ll see, there are a few traps you can fall into, which is why I’m telling you right now that if you think it’s even-odds, you’ve got it wrong.

Game #3: 5-Card Stud Poker

When you’ve warmed up with the previous two exercises, let’s try our hand at dependent probability by looking at a card game. In particular, let’s assume Poker with a 52-card deck. Let’s also assume a variant like 5-card Stud where each player is dealt 5 cards, and that’s all they get. No ability to discard and draw, no common cards, just a straight-up you get 5 cards and that’s what you get.

A “Royal Flush” is the 10-J-Q-K-A in the same suit, and there are four suits, so there are four ways to get a Royal Flush. Calculate the probability that you’ll get one.

One thing I’ll warn you about here: remember that you can draw those five cards in any order. So you might draw an Ace first, or a Ten, or whatever. So the actual way you’ll be counting these, there are actually a lot more than 4 ways to get dealt a Royal Flush, if you consider the cards to be dealt sequentially!

Game #4: IMF Lottery

This fourth question is one that can’t easily be solved through the methods we’ve talked about today, but you can simulate it pretty easily, either with programming or with some fudging around in Excel. So this is a way to practice your Monte Carlo technique.

In a game I worked on that I've mentioned before called Chron X, there was this really interesting card called IMF Lottery. Here's how it worked: you'd put it into play. At the end of your turn, the game would roll a percentile, and there was a 10% chance it would leave play; when it did, a random player would gain 5 of each resource type for every token on the card. The card didn't start with any tokens, but if it stayed around then at the start of each of your turns, it gained a token. So, there is a 10% chance you'll put it into play, end your turn, and it'll leave and no one gets anything. If that doesn't happen (90% chance), then there is a further 10% chance (actually 9% at this point, since it's 10% of 90%) that on the very next turn, it'll leave play and someone will get 5 resources. If it leaves play on the turn after that (10% of the remaining 81%, so 8.1% chance) someone gets 10 resources, then the next turn it would be 15, then 20, and so on. The question is: what is the expected value of the total resources you'll get from this card when it finally leaves play?

Normally, we’d approach this by finding the probability of each outcome, and multiplying by the outcome. So there is a 10% chance you get 0 (0.1*0 = 0). There’s a 9% chance you get 5 resources (that’s 9%*5 = 0.45 resources). There’s an 8.1% chance you get 10 resources (8.1%*10 = 0.81 resources total, expected value). And so on. And then we add all of these up.

Now, you can quickly see a problem: there is always going to be a chance that it will not leave play, so this could conceivably stay in play without leaving forever, for an infinite number of turns, so there’s no actual way to write out every single possibility. The techniques we learned today don’t give us a way to deal with infinite recursion, so we’ll have to fake it.

If you know enough programming or scripting to feel comfortable doing this, write a program to simulate this card. Initialize a variable to zero, then enter a loop: roll a random number, and 10% of the time break out of the loop; otherwise add 5 to the variable and repeat. When it finally breaks out of the loop, increment the total number of trials by 1, and add whatever the variable ended up as to the total number of resources. Then, re-initialize the variable and try again. Run this a few thousand times. At the end, divide the total number of resources by the total number of trials, and that's your Monte Carlo expected value. Run the program a few times to see if the numbers you're getting are about the same; if there's still a lot of variation in your final numbers, increase the number of iterations in the outer loop until you start getting some consistency. And you can be pretty sure that whatever you come up with is going to be about right.
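
For illustration, here's what that loop might look like in Python (a sketch of the procedure just described, not the only way to do it):

import random

def play_imf_lottery():
    resources = 0
    while True:
        if random.random() < 0.1:   # 10% chance the card leaves play at the end of this turn
            return resources        # payout: 5 resources per token accumulated so far
        resources += 5              # the card survived, so it gains another token

trials = 100000
total = sum(play_imf_lottery() for _ in range(trials))
print(total / trials)               # Monte Carlo estimate of the expected payout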

If you don’t know programming (or even if you do), this is an excuse to exercise your Excel skills. You can never have enough Excel skills as a game designer.

Here you'll want to make good use of the IF and RAND functions. RAND takes no values; it just returns a random decimal number between 0 and 1. Usually we combine it with FLOOR and some plusses or minuses to simulate a die roll, as I mentioned earlier. In this case, though, we just have a 10% check for the card leaving play, so we can simply check whether RAND is less than 0.1 and not mess with any of that other stuff.

IF takes three values, in order: a condition that's either true or false, then a value to return if the condition is true, then a value to return if it's false. So the following statement will return 5 ten percent of the time, and 0 the other ninety percent of the time:

=IF(RAND()<0.1,5,0)

There are a lot of ways to set this up, but if I were doing it, I’d use a formula like this for the cell that represents the first turn, let’s say this is cell A1:

=IF(RAND()<0.1,0,-1)

Here I’m using negative one as shorthand for “this card hasn’t left play and given out any resources yet.” So if the first turn ended and the card left play right away, A1 would be zero; otherwise it’s -1.

For the next cell, representing the second turn:

=IF(A1>-1, A1, IF(RAND()<0.1,5,-1))

So if the first turn ended and the card left play right away, A1 would be 0 (number of resources), and this cell would just copy that value. Otherwise A1 would be -1 (hasn’t left play yet), and this cell proceeds to roll randomly again: 10% of the time it returns 5 (for the 5 resources), the rest of the time it is still -1. Continuing this formula for additional cells simulates additional turns, and whatever cell is at the end gives you a final result (or -1 if it never left play after all of the turns you’re simulating).

Take this row of cells, which represents a single play of the card, and copy and paste it for a few hundred (or a few thousand) rows. We might not be able to do an infinite test in Excel (there are only so many cells that fit in a spreadsheet), but we can at least cover the majority of cases. Then, have a single cell where you take the average of the final results of all the trials (Excel helpfully provides the AVERAGE() function for this).

In Windows, at least, you can hit F9 to reroll all your random numbers. As before, do that a few times and see if the values you get are similar to each other. If there’s too much variety, double the number of trials and try again.

Unsolved Problems

If you happen to have a Ph.D. in Probability already and the problems above are too easy for you, here are two problems that I’ve wondered about for years, but I don’t have the math skills to solve them. If you happen to know how to do these, post as a comment; I’d love to know how.

Unsolved #1: IMF Lottery

The first unsolved problem is the previous homework. I can do a Monte Carlo simulation (either in C++ or Excel) pretty easily and be confident of the answer for how many resources you get, but I don’t actually know how to come up with a definitive, provable answer mathematically (since this is an infinite series). If you know how, post your math… after doing your own Monte Carlo simulation to verify the answer, of course.

Unsolved #2: Streaks of Face Cards

This problem, which again is way beyond the scope of this blog post, is one that a fellow gamer posed to me over 10 years ago. They witnessed a curious thing while playing Blackjack in Vegas: out of an eight-deck shoe, they saw ten face cards in a row (a face card is 10, J, Q or K, so there are 16 of them in a standard 52-card deck, which means there are 128 of them in a 416-card shoe). What is the probability that there is at least one run of ten or more face cards, somewhere, in an eight-deck shoe? Assume a random, fair shuffle. (Or, if you prefer, what are the odds that there are no runs of ten or more face cards, anywhere in the sequence?)

You can simplify this. There’s a string of 416 bits. Each bit is 0 or 1. There are 128 ones and 288 zeros scattered randomly throughout. How many ways are there to randomly interleave 128 ones and 288 zeros, and how many of those ways involve at least one clump of ten or more 1s?

Every time I’ve sat down to solve this problem, it seems like it should be really easy and obvious at first, but then once I get into the details it suddenly falls apart and becomes impossible. So before you spout out a solution, really sit down to think about it and examine it, work out the actual numbers yourself, because every person I’ve ever talked to about this (and this includes a few grad students in the field) has had that same “it’s obvious… no, wait, it’s not” reaction. This is a case where I just don’t have a technique for counting all of the numbers. I could certainly brute-force it with a computer algorithm, but it’s the mathematical technique that I’d find more interesting to know.
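
For what it's worth, the brute-force estimate is easy even if the closed-form math isn't. Here's a hedged Monte Carlo sketch in Python that shuffles the 416-bit string and checks for a run of ten or more 1s; it only estimates the probability, it doesn't prove anything:

import random

def has_run(shoe, length=10):
    streak = 0
    for card in shoe:
        streak = streak + 1 if card else 0   # card is 1 for a face card, 0 otherwise
        if streak >= length:
            return True
    return False

shoe = [1] * 128 + [0] * 288   # 128 face cards and 288 other cards, as in the bit-string version
trials = 100000
hits = 0
for _ in range(trials):
    random.shuffle(shoe)
    hits += has_run(shoe)
print(hits / trials)           # estimated probability of at least one run of ten or more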

Level 3: Transitive Mechanics and Cost Curves

July 21, 2010

Readings/Playings

None for this week, other than this post. This is a pretty long post, though, so it should be enough. As with last week, you’ll be doing a bit of outside research to compensate.

This Week’s Topic

This week is one of the most exciting for me, because we really get to dive deep into nuts-and-bolts game balance in a very tangible way. We’ll be talking about something that I’ve been doing for the past ten years, although until now I’ve never really written down this process or tried to communicate it to anyone. I’m going to talk about how to balance transitive mechanics within games.

As a reminder, intransitive is like Rock-Paper-Scissors, where everything is better than something else and there is no single “best” move. In transitive games, some things are just flat out better than others in terms of their in-game effects, and we balance that by giving them different costs, so that the better things cost more and the weaker things cost less in the game. How do we know how much to cost things? That is a big problem, and that is what we’ll be discussing this week.

Examples of Transitive Mechanics

Just to contextualize this, what kinds of games do we see that have transitive mechanics? The answer is, most of them. Here are some examples:

  • RPGs often have currency costs to upgrade your equipment and buy consumable items. Leveling is also transitive: a higher-level character is better than a lower-level character in nearly every RPG I can think of.
  • Shooters with progression mechanics like BioShock and Borderlands include similar mechanics. In BioShock, for example, there are costs to using vending machines to get consumable items, and you also spend ADAM to buy new special abilities; higher-level abilities are just better (e.g. doing more damage) than their lower-level counterparts, but they cost more to buy.
  • Professional sports in the real world do this with monetary costs: a player who is better at the game commands a higher salary.
  • Sim games (like The Sims and Sim City) have costs for the various objects you can buy, and often these are transitive. A really good bed in The Sims costs more than a cheap bed, but it also performs its function of restoring your needs much more effectively.
  • Retro arcade games generally have a transitive scoring mechanism. The more dangerous or difficult an enemy, the more points you get for defeating it.
  • Turn-based and real-time strategy games may have a combination of transitive and intransitive mechanics. Some unit types might be strong or weak against others inherently (in an intransitive fashion), like the typical “footmen beat archers, archers beat fliers, fliers beat footmen” comparison. However, you also often see a class of units that all behave similarly, but with stronger and more expensive versions of weaker ones… such as light infantry vs. heavy infantry.
  • Tower Defense games are often intransitive in that certain tower types are strong against certain kinds of attacks, like splash damage is strong against enemies that come clustered together (intransitive), but in most of these games the individual towers are upgradeable to stronger versions of themselves and the stronger versions cost more (transitive).
  • Collectible-card games are another example where there may be intransitive mechanics (and there are almost always intransitive elements to the metagame, thank goodness – that is, a better deck isn’t just “more expensive”), but the individual cards themselves generally have some kind of cost and they are all balanced according to that cost, so that more expensive cards are more useful or powerful.

You might notice something in common with most of these examples: in nearly all cases, there is some kind of resource that is used to buy stuff: Gil in Final Fantasy, Mana in Magic: the Gathering, ADAM in BioShock. Last week we talked about relating everything to a single resource in order to balance different game objects against each other, and as you might expect, this is an extension of that concept.

However, another thing we said last week is that a central resource should be the win or loss condition for the game, and we see that is no longer the case here (the loss condition for an RPG is usually running out of Hit Points, not running out of Gold Pieces). In games that deal with costs, it is common to make the central resource something artificially created for that purpose (some kind of “currency” in the game) rather than a win or loss condition, because everything has a monetary cost.

Costs and Benefits

With all that said, let’s assume we have a game with some kind of currency-like resource, and we want to balance two things where one might be better than the other but it costs more. I’ll start with a simple statement: in a transitive mechanic, everything has a set of costs and a set of benefits, and all in-game effects can be put in terms of one or the other.

When we think of costs we’re usually thinking in terms of resource costs, like a sword that costs 250 Gold. But when I use this term, I’m defining it more loosely to be any kind of drawback or limitation. So it does include resource costs, because that is a setback in the game. But for example, if the sword is only half as effective against demons, that is part of a cost as well because it’s less powerful in some situations. If the sword can only be equipped by certain character classes, that’s a limitation (you can’t just buy one for everyone in your party). If the sword disintegrates after 50 encounters, or if it does 10% of damage dealt back to the person wielding it, or if it prevents the wielder from using magic spells… I would call all of these things “costs” because they are drawbacks or limitations to using the object that we’re trying to balance.

If costs are everything bad, then benefits are everything good. Maybe it does a lot of damage. Maybe it lets you use a neat special ability. Maybe it offers some combination of increases to your various stats.

Some things are a combination of the two. What if the sword does 2x damage against dragons? This is clearly a benefit (it’s better than normal damage sometimes), but it’s also a limitation on that benefit (it doesn’t do double damage all the time, only in specific situations). Or maybe a sword prevents you from casting Level 1 spells (obviously a cost), but if most swords in the game prevent you from casting all spells, this is a less limiting limitation that provides a kind of net benefit. How do you know whether to call something a “cost” or a “benefit”? For our purposes, it doesn’t matter: a negative benefit is the same as a cost, and vice versa, and our goal is to equalize everything. We want the costs and benefits to be equal, numerically. Whether you add to one side or subtract from the other, the end result is the same.

Personally, I find it easiest to keep all numbers positive and not negative, so if something would be a “negative cost” I’ll call it a benefit. That way I only have to add numbers and never subtract them. Adding is easier. But if you want to classify things differently, go ahead; the math works out the same anyway.

So, this is the theory. Add up the costs for an object. Add up the benefits. The goal is to get those two numbers to be equal. If the costs are less than the benefits, it’s too good: add more costs or remove some benefits. If the costs are greater than the benefits, it’s too weak; remove costs or add benefits. You might be wondering how we would relate two totally different things (like a Gold cost and the number of Attack Points you get from equipping a sword). We will get to that in a moment. But first, there’s one additional concept I want to introduce.

Overpowered vs. Underpowered vs. Overcosted vs. Undercosted

Let’s assume for now that we can somehow relate everything back to a single resource cost so they can be directly compared. And let’s say that we have something that provides too many benefits for its costs. How do we know whether to reduce the benefits, increase the costs, or both?

In most cases, we can do either one. It is up to the designer what is more important: having the object stay at its current cost, or having it retain its current benefits. Sometimes it’s more important that you have an object within a specific cost range because you know that’s what the player can afford when they arrive in the town that sells it. Sometimes you just have this really cool effect that you want to introduce to the game, and you don’t want to mess with it. Figure out what you want to stay the same… and then change the other thing.

Sometimes, usually when you’re operating at the extreme edges of a system, you don’t get a choice. For example, if you have an object that’s already free, you just can’t reduce the cost anymore, so it is possible you’ve found an effect that is just too weak at any cost. Reducing the cost further is impossible, so you have no choice: you must increase the benefits. We have a special term for this: we say the object is underpowered, meaning that it is specifically the level of benefits (not the cost) that must be adjusted.

Likewise, some objects are just too powerful to exist in the game at any cost. If an object has an automatic “I win / you lose” effect, it would have to have such a high cost that it would be essentially unobtainable. In such cases we say it is overpowered, that is, that the level of benefits must be reduced (and that a simple cost increase is not enough to solve the problem).

Occasionally you may also run into some really unique effects that can’t easily be added to, removed, or modified; the benefits are a package deal, and the only thing you can really do is adjust the cost. In this case, we might call the object undercosted if it is too cheap, or overcosted if it is too expensive.

I define these terms because it is sometimes important to make the distinction between something that is undercosted and something that’s overpowered. In both cases the object is too good, but the remedy is different.

There is a more general term for an object that is simply too good (although the cost or benefits could be adjusted): we say it is above the curve. Likewise, an object that is too weak is below the curve. What do curves have to do with anything? We’ll see as we talk about our next topic.

Cost Curves

Let’s return to the earlier question of how to relate things as different as Gold, Attack Points, Magic Points, or any other kinds of stats or abilities we attach to an object. How do we compare them directly? The answer is to put everything in terms of the resource cost. For example, if we know that each point of extra Attack provides a linear benefit and that +1 Attack is worth 25 Gold, then it’s not hard to say that a sword that gives +10 Attack should cost 250 Gold. For more complicated objects, add up all the costs (after putting them in terms of Gold), add up all the benefits (again, converting them to their equivalent in Gold), and compare. How do you know how much each resource is worth? That is what we call a cost curve.

Yes, this means you have to take every possible effect in the game, whether it be a cost or a benefit, and find the relative values of all of these things. Yes, it is a lot of work up front. On the bright side, once you have this information about your game, creating new content that is balanced is pretty easy: just put everything into your formula and you can pretty much guarantee that if the numbers add up, it’s balanced.
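
As a toy illustration of what "the formula" can look like in practice, it can be as simple as totaling benefits and costs in Gold and checking the difference. The 25 Gold per point of Attack comes from the example above; the Defense value, the drawback, and the item itself are made up for the sake of the sketch:

# Hypothetical values for illustration only.
BENEFIT_VALUE = {"attack": 25, "defense": 20}
DRAWBACK_VALUE = {"breaks_after_50_fights": 100}   # drawbacks count as costs, in Gold

def curve_check(gold_price, benefits, drawbacks=()):
    total_benefit = sum(BENEFIT_VALUE[name] * amount for name, amount in benefits.items())
    total_cost = gold_price + sum(DRAWBACK_VALUE[name] for name in drawbacks)
    return total_benefit - total_cost   # 0 = on the curve, positive = too good, negative = too weak

print(curve_check(250, {"attack": 10}))                                          # 0: on the curve
print(curve_check(250, {"attack": 10}, drawbacks=("breaks_after_50_fights",)))   # -100: below the curve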

Creating a Cost Curve

The first step (and the reason it's called a "cost curve" and not a "cost table" or "cost chart" or "cost double-entry accounting ledger") is to figure out the relationship between increasing resource costs and increasing benefits. After that, you need to figure out how all game effects (positive and negative) relate to your central resource cost. Neither of these is usually obvious.

Defining the relationship between costs and benefits

The two might scale linearly: +1 cost means +1 benefit. This relationship is pretty rare.

Costs might be on an increasing curve, where each additional benefit costs more than the last, so incremental gains get more and more expensive as you get more powerful. You see this a lot in RPGs, for example: the amount of currency you receive from exploration or combat encounters increases over time, so if you're getting more than twice as much Gold per encounter as you used to, then even if a new set of armor costs twice as much as your old one, it will actually take you less time to earn the gold to upgrade. Additionally, the designer might want an increasing cost curve for other design reasons, such as creating more interesting choices. For example, if all stat gains cost the same amount, it's usually an obvious decision to dump all of your gold into increasing your one or two most important stats while ignoring the rest; but if each additional point in a stat costs progressively more, players might consider exploring other options. Either way, you might see an increasing curve (such as a triangular or exponential curve), where something twice as good actually costs considerably more than twice as much.

Some games have costs on a decreasing curve instead. For example, in some turn-based strategy games, hoarding resources has an opportunity cost. In the short term, everyone else is buying stuff and advancing their positions, and if you don’t make purchases to keep up with them, you could fall hopelessly behind. This can be particularly true in games where purchases are limited: wait too long to buy your favorite building in Puerto Rico and someone else might buy it first; or, wait too long to build new settlements in Settlers of Catan and you may find that other people have built in the best locations. In cases like this, if the designer wants resource-hoarding to be a viable strategy, they must account for this opportunity cost by making something that costs twice as much be more than twice as good.

Some games have custom curves that don't follow a simple, single formula or relationship. For example, in Magic: the Gathering, your primary resource is Mana, and you are generally limited to playing one Mana-generating card per turn. If a third of your deck is cards that generate Mana, you'll get (on average) one Mana-generating card every three card draws. Since your opening hand is 7 cards and you typically draw one card per turn, this means a player would typically gain one Mana per turn for the first four turns, and then one Mana every three turns thereafter. Thus, we might expect to see a shift in the cost curve at or around five Mana, where suddenly each additional point of Mana is worth a lot more, which would explain why some of the more expensive cards have crazy-huge gameplay effects.
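
To see roughly where that break point falls, here's a small Python sketch under the same simplifying assumptions as the paragraph above (a deck where exactly a third of the cards generate Mana, a 7-card opening hand, one draw per turn, at most one Mana card played per turn, and nothing else going on; the 60-card deck size is my assumption for the example):

import random

def average_mana_by_turn(turns=10, deck_size=60, mana_cards=20, trials=20000):
    totals = [0] * turns
    for _ in range(trials):
        deck = [1] * mana_cards + [0] * (deck_size - mana_cards)
        random.shuffle(deck)
        in_hand = sum(deck[:7])       # Mana cards in the opening hand
        deck = deck[7:]
        in_play = 0
        for t in range(turns):
            in_hand += deck.pop(0)    # draw one card per turn
            if in_hand > 0:           # play at most one Mana card per turn
                in_hand -= 1
                in_play += 1
            totals[t] += in_play
    return [round(total / trials, 2) for total in totals]

print(average_mana_by_turn())   # Mana in play at the end of each turn, averaged over many games

The averages should climb by roughly one per turn for the first few turns and then slow down, which is the shift described above.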

In some games, any kind of cost curve will be potentially balanced, but different kinds of curves have different effects. For example, in a typical Collectible Card Game, players are gaining new resources at a constant rate throughout the game. If a game has an increasing cost curve where higher costs give progressively smaller gains, it puts a lot of focus on the early game: cheap cards are almost as good as the more expensive ones, so bringing out a lot of forces early on provides an advantage over waiting until later to bring out only slightly better stuff. If instead you feature a decreasing cost curve where the cheap stuff is really weak and the expensive stuff is really powerful, this instead puts emphasis on the late game, where the really huge things dominate. You might have a custom curve that has sudden jumps or changes at certain thresholds, to guide the play of the game into definite early-game, mid-game and late-game phases. None of these are necessarily “right” or “wrong” in a universal sense. It all depends on your design goals, in particular your desired game length, number of turns, and overall flow of the gameplay.

At any rate, this is one of your most important tasks when balancing transitive systems: figuring out the exact nature of the cost curve, as a numeric relationship between costs and benefits.

Defining basic costs and benefits

The next step in creating a cost curve is to make a complete list of all costs and benefits in your game. Then, starting with common ones that are used a lot, identify those objects that only do one thing and nothing else. From there, try to figure out how much that one thing costs. (If you were unsure about the exact mathematical nature of your cost curve, something like this will probably help you figure that out.)

Once you’ve figured out how much some of the basic costs and benefits are worth, start combining them. Maybe you know how much it costs to have a spell that grants a damage bonus, and also how much it costs to have a spell that grants a defense bonus. What about a spell that gives both bonuses at the same time? In some games, the cost for a combined effect is more than their separate costs, since you get multiple bonuses for a single action. In other games, the combined cost is less than the separate costs, since both bonuses are not always useful in combination or might be situational. In other games, the combined cost is exactly the sum of the separate costs. For your game, get a feel for how different effects combine and how that influences their relative costs. Once you know how to cost most of the basic effects in your game and how to combine them, this gives you a lot of power. From there, continue identifying how much new things cost, one at a time.

At some point you will start also identifying non-resource costs (drawbacks and limitations) to determine how much they cost. Approach these the same way: isolate one or more objects where you know the numeric costs and benefits of everything except one thing, and then use basic arithmetic (or algebra, if you prefer) to figure out the missing number.

Another thing you’ll eventually need to examine are benefits or costs that have limitations stacked on them. If a benefit only works half of the time because of a coin-flip whenever you try to use it, is that really half of the cost compared to if it worked all the time, or is it more or less than half? If a benefit requires you to meet conditions that have additional opportunity costs (“you can only use this ability if you have no Rogues in your party”), what is that tradeoff worth in terms of how much it offsets the benefit?

An Example: Cost Curves in Action

To see how this works in practice, I'm going to use some analysis to derive part of the cost curve for Magic 2011, the recently released set for Magic: the Gathering. The reason I'm choosing this game is that CCGs are among the most complicated games to balance in these terms – a typical base or expansion set may have hundreds of cards that need to be individually balanced – so if we can analyze Magic then we can use this for just about anything else. Note that by necessity, we're going into spoiler territory here, so if you haven't seen the set and are waiting for the official release, consider this your spoiler warning.

For convenience, we’ll examine Creature cards specifically, because they are the type of card that is the most easily standardized and directly compared: all Creatures have a Mana cost (this is the game’s primary resource), Power and Toughness, and usually some kind of special ability. Other card types tend to only have special, unique effects that are not easily compared.

For those of you who have never played Magic before, that is fine for our purposes. As you’ll see, you won’t need to understand much of the rules in order to go through this analysis. For example, if I tell you that the Flying ability gives a benefit equivalent to 1 mana, you don’t need to know (or care) what Flying is or what it does; all you need to know is that if you add Flying to a creature, the mana cost should increase by 1. If you see any jargon that you don’t recognize, assume you don’t need to know it. For those few parts of the game you do need to know, I’ll explain as we go.

Let us start by figuring out the basic cost curve. To do this, we first examine the most basic creatures: those with no special abilities at all, just a Mana cost, Power and Toughness. Of the 116 creatures in the set, 11 of them fall into this category (I’ll ignore artifact creatures for now, since those have extra metagame considerations).

Before I go on, one thing you should understand about Mana costs is that there are five colors of Mana: White (W), Green (G), Red (R), Black (B), and Blue (U). There's actually a sixth "type" called colorless, which can be paid with mana of any color you want. Thus, something with a cost of "G4" means five mana, one of which must be Green, and the other four can be anything (Green or otherwise). We would expect that colored Mana has a higher cost than colorless, since it is more restrictive.

Here are the creatures with no special abilities:

  • W, 2/1 (that is, a cost of one White mana, power of 2, toughness of 1)
  • W4, 3/5
  • W1, 2/2
  • U4, 2/5
  • U1, 1/3
  • B2, 3/2
  • B3, 4/2
  • R3, 3/3
  • R1, 2/1
  • G1, 2/2
  • G4, 5/4

Looking at the smallest creatures, we immediately run into a problem with three creatures (I’m leaving the names off, since names aren’t relevant when it comes to balance):

  • W, 2/1
  • R1, 2/1
  • G1, 2/2

Apparently, all colors are not created equal: you can get a 2/1 creature for either W (one mana) or R1 (two mana), so an equivalent creature is cheaper in White than Red. Likewise, R1 gets you a 2/1 creature, but the equivalent-cost G1 gets you a 2/2, so you get more creature for Green than Red. This complicates our analysis, since we can’t use different colors interchangeably. Or rather, we could, but only if we assume that the game designers made some balance mistakes. (Such is the difficulty of deriving the cost curve of an existing game: if the balance isn’t perfect, and it’s never perfect, your math may be slightly off unless you make some allowances.) Either way, it means we can’t assume every creature is balanced on the same curve.

In reality, I would guess the designers did this on purpose to give some colors an advantage with creatures, to compensate for them having fewer capabilities in other areas. Green, for example, is a color that’s notorious for having really big creatures and not much else, so it’s only fair to give it a price break since it’s so single-minded. Red and Blue have lots of cool spell toys, so their creatures might be reasonably made weaker as a result.

Still, we can see some patterns here just by staying within colors:

  • W, 2/1
  • W1, 2/2
  • B2, 3/2
  • B3, 4/2

Comparing the White creatures, adding 1 colorless is equivalent to adding +1 Toughness. Comparing the Black creatures, adding 1 colorless mana is equivalent to adding +1 Power. We might guess, then, that 1 colorless (cost) = 1 Power (benefit) = 1 Toughness (benefit).

We can also examine similar creatures across colors to take a guess:

  • W, 2/1
  • R1, 2/1
  • W4, 3/5
  • U4, 2/5

From these comparisons, we might guess that Red and Blue seem to have an inherent -1 Power or -1 Toughness “cost” compared to White, Black and Green.

Is the cost curve linear, +1 benefit for each additional colored mana? It seems to be up to a point, but there appears to be a jump around 4 or 5 mana:

  • W, 2/1 (3 power/toughness for W)
  • W4, 3/5 (5 additional power/toughness for 4 additional colorless mana)
  • G1, 2/2 (4 power/toughness for G1)
  • G4, 5/4 (5 additional power/toughness for 3 additional colorless mana)

As predicted earlier, there may be an additional cost bump at 5 mana, since getting your fifth mana on the table is harder than the first four. Green seems to get a larger bonus than White.

From all of this work, we can take our first guess at a cost curve. Since we have a definite linear relationship between colorless mana and increased power/toughness, we will choose colorless mana to be our primary resource, with each point of colorless representing a numeric cost of 1. We know that each point of power and toughness provides a corresponding benefit of 1.

Our most basic card, W for 2/1, shows a total of 3 benefits (2 power, 1 toughness). We might infer that W must have a cost of 3. Or, using some knowledge of the game, we might instead guess that W has a cost of 2, and that all cards have an automatic cost of 1 just for existing – the card takes up a slot in your hand and your deck, so it should at least do something useful, even if its mana cost is zero, to justify its existence.

Our cost curve, so far, looks like this:

  • Cost of 0 provides a benefit of 1.
  • Increased total mana cost provides a linear benefit, up to 4 mana.
  • The fifth point of mana provides a double benefit (triple for Green), presumably to compensate for the difficulty in getting that fifth mana on the table.

Our costs are:

  • Baseline cost = 1 (start with this, just for existing)
  • Each colorless mana = 1
  • Each colored mana = 2
  • Total mana cost of 5 or more = +1 (or +2 for Green creatures)

Our benefits are:

  • +1 Power or +1 Toughness = 1
  • Being a Red or Blue creature = 1 (apparently this is some kind of metagame privilege).

We don’t have quite enough data to know if this is accurate. There may be other valid sets of cost and benefit numbers that would also fit our observations. But if these are accurate, we could already design some new cards.

How much would a 4/3 Blue creature cost? The benefit is 1 (Blue) + 4 (Power) + 3 (Toughness) = 8. Our baseline cost is 1, our first colored mana (U) is 2, and if we add four colorless mana that costs an extra 4… but that also makes for a total mana cost of 5, which would give an extra +1 to the cost for a total of 8. So we would expect the cost to be U4.

What would a 4/1 Green creature cost? The benefit is 5 (4 Power + 1 Toughness). A mana cost of G2 provides a cost of 5 (1 as a baseline, 2 for the colored G mana, and 2 for the colorless mana).

What if I proposed this card: W3 for a 1/4 creature. Is that balanced? We can add it up: the cost is 1 (baseline) + 2 (W) + 3 (colorless) = 6. The benefit is 1 (power) + 4 (toughness) = 5. So this creature is exactly 1 below the curve, and could be balanced by either dropping the cost to W2 or increasing it to 2/4 or 1/5.
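
Here's a small Python sketch of the curve as we've derived it so far, with the three example creatures above run through it. It only covers plain creatures with a single colored mana in their cost, in the range we've actually examined, and it bakes in our guesses rather than anything official:

def curve_check(colored, colorless, power, toughness, color):
    # Costs, per our guesses: baseline 1, each colorless mana 1, each colored mana 2,
    # and +1 (+2 for Green) once the total mana cost reaches 5.
    cost = 1 + colorless + 2 * colored
    if colored + colorless >= 5:
        cost += 2 if color == "G" else 1
    # Benefits: 1 per point of Power or Toughness, +1 for being Red or Blue.
    benefit = power + toughness + (1 if color in ("R", "U") else 0)
    return benefit - cost   # 0 = on the curve, negative = below it

print(curve_check(1, 4, 4, 3, "U"))   # the U4 4/3:  0, on the curve
print(curve_check(1, 2, 4, 1, "G"))   # the G2 4/1:  0, on the curve
print(curve_check(1, 3, 1, 4, "W"))   # the W3 1/4: -1, one below the curve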

So you can see how a small amount of information lets us do a lot, but also how we are limited: we don’t know what happens when we have several colored mana, we don’t know what happens when we go above 5 (or below 1) total mana, and we don’t know how to cost any special abilities. We could take a random guess based on our intuition of the game, but first let’s take a look at some more creatures. In particular, there are 18 creature cards in this set that only have standard special abilities on them:

  • W3, 3/2, Flying
  • WW3, 5/5, Flying, First Strike, Lifelink, Protection from Demons and Dragons
  • WW2, 2/3, Flying, First Strike
  • WW3, 4/4, Flying, Vigilance
  • W1, 2/1, Flying
  • WW, 2/2, First Strike, Protection from Black
  • W2, 2/2, Flying
  • U3, 2/4, Flying
  • BB, 2/2, First Strike, Protection from White
  • B2, 2/2, Swampwalk
  • B1, 2/1, Lifelink
  • R3, 3/2, Haste
  • GG5, 7/7, Trample
  • GG, 3/2, Trample
  • G3, 2/4, Reach
  • GG3, 3/5, Deathtouch
  • G, 0/3, Defender, Reach
  • GG4, 6/4, Trample

How do we proceed here? The easiest targets are those with only a single ability, like all the White cards with just Flying. It’s pretty clear from looking at all of those that Flying has the same benefit of +1 power or +1 toughness, which in our math has a benefit of 1.

We can also make some direct comparisons to the earlier list of creatures without abilities to derive benefits of several special abilities:

  • B2, 3/2
  • B2, 2/2, Swampwalk
  • R3, 3/3
  • R3, 3/2, Haste

Swampwalk and Haste (whatever those are) also have a benefit of 1. And we can guess from the B1, 2/1, Lifelink card and our existing math that Lifelink is also a benefit of 1.

We run into something curious when we examine some red and blue creatures at 4 mana. Compare the following:

  • W3, 3/2, Flying
  • W4, 3/5 (an extra +1 cost but +2 benefit, due to crossing the 5-mana threshold)
  • U3, 2/4 Flying (identical total cost to the W3 but +1 benefit… in Blue?)
  • R3, 3/3 (identical total cost and benefit to the W3, but Red?)

It appears that perhaps Red and Blue get their high-cost-mana bonus at a threshold of 4 mana rather than 5. Additionally, Flying may be cheaper for Blue than it is for White… but given that it would seem to have a cost of zero here, we might instead guess that the U3 creature is slightly above the curve.

We find another strange comparison in Green:

  • G3, 2/4, Reach (cost of 6, benefit of 6+Reach?)
  • G4, 5/4 (cost of 8, benefit of 9?)

At first glance, both of these would appear to be above the curve by 1. Alternatively, since the extra bonus seems to be consistent, this may have been intentional. We might guess that Green gets a high-cost bonus not just at 5 total mana, but also at 4 total mana, assuming that Reach (like the other abilities we’ve seen) has a benefit of 1. (In reality, if you know the game, Reach gives part of the bonus of Flying but not the other part, so it should probably give about half the benefit of Flying. Unfortunately, Magic does not offer half-mana costs in standard play, so the poor G3 is probably destined to be either slightly above or below the curve.)

Let’s assume, for the sake of argument, that the benefit of Reach is 1 (or that the original designers intended this to be the benefit and balanced the cards accordingly, at least). Then we can examine this card to learn about the Defender special ability:

  • G, 0/3, Defender, Reach

The cost is 1 (baseline) + 2 (G mana) = 3. The benefit is 3 (toughness) + 1 (Reach) + ? (Defender). From this, it would appear Defender would have to have a benefit of negative 1 for the card to be balanced. What’s going on?

If you’ve played Magic, this makes sense. Defender may sound like a special ability, but it’s actually a limitation: it means the card is not allowed to attack. We could therefore consider it as an additional cost of 1 (rather than a benefit of -1) and the math works out.

We’ve learned a lot, but there are still some things out of our immediate grasp right now. We’d love to know what happens when you have a second colored mana (does it also have a +2 cost like the first one?), and we’d also like to know what happens when you get up to 6 or 7 total mana (are there additional “high cost” bonus adjustments?). While we have plenty of cards with two colored mana in their cost, and a couple of high-cost Green creatures, all of these also have at least one other special ability that we haven’t costed yet. We can’t derive the costs and benefits for something when there are multiple unknown values; even if we figured out the right total level of benefits for our GG4 creature, for example, we wouldn’t know how much of that benefit was due to the second Green mana cost, how much came from being 6 mana total, and how much came from its Trample ability. Does this mean we’re stuck? Thankfully, we have a few ways to proceed.

One trick is to find two cards that are the same, except for one thing. Those cards may have several things we don’t know, but if we can isolate just a single difference then we can learn something. For example, look at these two cards:

  • GG4, 6/4, Trample
  • GG5, 7/7, Trample

We don’t know the cost of GG4 or GG5, and we don’t know the benefit of Trample, but we can see that adding one colorless mana that takes us from 6 to 7 gives us a power+toughness benefit of 4. A total cost of 7 must be pretty hard to get to!

We can also examine these two cards that have the same mana cost:

  • WW3, 5/5, Flying, First Strike, Lifelink, Protection from Demons and Dragons
  • WW3, 4/4, Flying, Vigilance

From here we might guess that Vigilance alone is worth as much as +1 power, +1 toughness, First Strike, Lifelink, and the Protection ability combined, making Vigilance a really freaking awesome special ability that has a benefit of at least 4. Or, if we know the game and realize Vigilance just isn’t that great, we can see that the 5/5 creature is significantly above the curve relative to the 4/4.

We still don’t know how much two colored mana costs, so let’s use another trick: making an educated guess, then trying it out through trial and error. As an example, let’s take this creature:

  • GG, 3/2, Trample

We know the power and toughness benefits are 5, and since most other single-word abilities (Flying, Haste, Swampwalk, Lifelink) have a benefit of 1, we might guess that Trample also has a benefit of 1, giving a total benefit of 6. If that’s true, we know that the cost is 1 (baseline) + 2 (first G), so the second G must cost 3. Intuitively, this might make sense: having two colored mana places more restrictions on your deck than just having one.

We can look at this another way, comparing two similar creatures:

  • G1, 2/2
  • GG, 3/2, Trample

The cost difference between G1 and GG is the difference between a cost of 1 (colorless) and the cost of the second G. The benefit difference is 1 (for the extra power) + 1 (for Trample, we guess). This means the second G has a cost of 2 more than a colorless mana, which is a cost of 3.

We’re still not sure, though. Maybe the GG creature is above the curve, or maybe Green has yet another creature bonus we haven’t encountered yet. Let’s look at the double-colored-mana White creatures to see if the pattern holds:

  • WW, 2/2, First Strike, Protection from Black
  • WW2, 2/3, Flying, First Strike
  • WW3, 4/4, Flying, Vigilance

Assuming that Protection from Black, First Strike, and Vigilance each have a +1 benefit (similar to other special abilities), most of these seem on the curve. WW is an expected cost of 6; 2/2, First Strike, Protection from Black seems like a benefit of 6. WW3 is a cost of 10 (remember the +1 for being a total of five mana); 4/4, Flying, Vigilance is also probably 10.

The math doesn’t work as well with WW2 (cost of 8); the benefits of 2/3, Flying and First Strike only add up to 7. So, this card might be under the curve by 1.

Having confirmed that the second colored mana is probably a cost of +3, we can head back to Green to figure out this Trample ability. GG, 3/2, Trample indeed gives us a benefit of 1 for Trample, as we guessed earlier.

Now that we know Trample and the second colored mana, we can examine our GG4 and GG5 creatures again to figure out exactly what’s going on at the level of six or seven mana, total. Let’s first look at GG4, 6/4, Trample. This has a total benefit of 11. The parts we know of the cost are: 1 (baseline) + 2 (first G) + 3 (second G) + 4 (colorless) + 1 (above 4 mana) + 1 (above 5 mana) = 12, so not only does the sixth mana apparently have no extra benefit but we’re already below the curve. (Either that, or Trample is worth more when you have a really high power/toughness, as we haven’t considered combinations of abilities yet.)

Let’s compare to GG5, 7/7, Trample. This has a benefit of 15. Known costs are 1 (baseline) + 2 (first G) + 3 (second G) + 5 (colorless) + 1 (above 4 mana) + 1 (above 5 mana) = 13, so going from five total mana to seven apparently allows an additional +2 of benefit. We might then guess that the high-cost bonus is another +1 at 6 mana and +1 more at 7 mana, and that the GG4 is just a little below the curve.

Lastly, there’s this Deathtouch ability, which we can now figure out from the creature that is GG3, 3/5, Deathtouch. The cost is 1 (baseline) + 2 (first G) + 3 (second G) + 3 (colorless) + 1 (above 4 mana) + 1 (above 5 mana) = 11. Benefit is 8 (power and toughness) + Deathtouch, which implies Deathtouch has a benefit of 3. This seems high when all of the other abilities are only costed at 1, but if you’ve played Magic you know that Deathtouch really is a powerful ability, so perhaps the high number makes sense in this case.

From here, there are an awful lot of things we can do to make new creatures. Just by going through this analysis, we’ve already identified several creatures that seem above or below the curve. (Granted, this is an oversimplification. Some cards are legacy from earlier sets and may not be balanced along the current curve. And every card has keywords which don’t do anything on their own, but some other cards affect them, so there is a metagame benefit to having certain keywords. For example, if a card is a Goblin, and there’s a card that gives all Goblins a combat bonus, that’s something that makes the Goblin keyword useful… so in some decks that card might be worth using even if it is otherwise below the curve. But keep in mind that this means some cards may be underpowered normally but overpowered in the right deck, which is where metagame balance comes into play. We’re concerning ourselves here only with transitive balance, not metagame balance, although we must understand that the two do affect each other.)

From this point, we can examine the vast majority of other cards in the set, because nearly all of them are just a combination of cost, power, toughness, maybe some basic special abilities we’ve identified already, and maybe one other custom special ability. Since we know all of these things except the custom abilities, we can look at almost any card to evaluate the benefit of its ability (or at least, the benefit assigned to it by the original designer). While we may not know which cards with these custom abilities are above or below the curve, we can at least get a feel for what kinds of abilities are marginally useful versus those that are really useful. We can also put numbers to them, and compare the values of each ability to see if they feel right.

Name That Cost!

Let’s take an example: W1, 2/2, and it gains +1 power and +1 toughness whenever you gain life. How much is that ability worth? Well, the cost is 4 and the power/toughness benefit is 4, so that means this ability is free – either it’s nearly worthless, or the card is above the curve. There’s no intrinsic way to gain life in the game without playing cards that specifically allow it, and gaining life tends to be a weak effect on its own (it doesn’t bring you closer to winning), so we might guess this is a pretty minor effect, and perhaps the card was specifically designed to be slightly above the curve in order to give a metagame advantage to the otherwise underpowered mechanic of life-gaining.

Here’s another: W4, 2/3, when it enters play you gain 3 life. Cost is 8; power/toughness benefit is 5. That means the life-gain effect is apparently worth a benefit of 3 (1 per point of life gained).

Another: UU1, 2/2, when it enters play return target creature to its owner’s hand. The cost here is 7; known benefits are 5 (4 for power/toughness, 1 for being Blue), so the return effect has a benefit of 2.

And another: U1, 1/1, tap to force target enemy creature to attack this turn if able. Cost is 4, known benefit is 3 (again, 2 for power/toughness, 1 for Blue), so the special ability is costed as a relatively minor benefit of 1.

Here’s one with a drawback: U2, 2/3, Flying, can only block creatures with Flying. Benefit is 5 (power/toughness) + 1 (blue) + 1 (Flying) = 7. Mana cost is 1 (baseline) + 2 (U) + 2 (colorless) = 5, suggesting that the blocking limitation is a +2 cost. Intuitively, that seems wrong when Defender (the complete inability to attack) is only a +1 cost, suggesting that this card is probably a little above the curve.

Another drawback: B4, 4/5, enters play tapped. Benefit is 9. Mana cost is 1 (baseline) + 2 (B) + 4 (colorless) + 1 (above 5 mana) = 8, so the additional drawback must have a cost of 1.

Here’s a powerful ability: BB1, 1/1, tap to destroy target tapped creature. Mana cost is 7. Power/toughness benefit is 2, so the special ability appears to be worth 5. That seems extremely high; on the other hand, it is a very powerful ability and it combos well with a lot of other cards, so it might be justified. Or we might argue it’s strong (maybe a benefit of 3 or 4) but not quite that good, or that it’s even stronger (benefit of 6 or 7), based on seeing it in play and comparing it to other strong abilities we identify in the set; either way, this at least gives us a number for comparison.
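
If you find yourself doing this for a whole set, the bookkeeping is easy to script (or drop into a spreadsheet). Here’s a minimal sketch in Python; the function names and structure are mine, it only knows the costs and benefits we’ve derived so far, and all it does is subtract the known benefits from the mana budget to expose whatever the special ability must be worth.

    # Sketch of the cost-curve bookkeeping from this analysis (names are mine).
    def mana_budget(color, colored, colorless):
        """Cost side: baseline, mana paid, and the high-cost bonuses."""
        total = colored + colorless
        budget = 1 + colorless
        budget += 2 if colored >= 1 else 0   # first colored mana
        budget += 3 if colored >= 2 else 0   # second colored mana
        if color in ("R", "U", "G") and total >= 4:
            budget += 1
        if color in ("W", "B", "G") and total >= 5:
            budget += 1
        budget += max(0, total - 5)          # +1 per total mana above 5
        return budget

    def implied_ability(color, colored, colorless, power, toughness, known_abilities=0):
        """Whatever benefit is left over must belong to the card's special ability."""
        benefit = power + toughness + known_abilities
        benefit += 1 if color in ("R", "U") else 0   # flat bonus for Red/Blue creatures
        return mana_budget(color, colored, colorless) - benefit

    # UU1, 2/2, "return target creature to its owner's hand":
    print(implied_ability("U", 2, 1, 2, 2))   # -> 2
    # BB1, 1/1, "tap to destroy target tapped creature":
    print(implied_ability("B", 2, 1, 1, 1))   # -> 5
    # W4, 2/3, "when it enters play you gain 3 life":
    print(implied_ability("W", 1, 4, 2, 3))   # -> 3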

So, you can see here that the vast majority of cards can be analyzed this way, and we could use this technique to get a pretty good feel for the cost curve of what is otherwise a pretty complicated game. Not all of the cards fit on the curve, but if you play the game for a while you’ll have an intuitive sense of which cards are balanced and which feel too good or too weak. By using those “feels balanced” creatures as your baseline, you could then propose a cost curve and set of numeric costs and benefits, and then verify that those creatures are in fact on the curve (and that anything you’ve identified as intuitively too strong or too weak is correctly shown by your math as above or below the curve). Using what you do know, you can then take pretty good guesses at what you don’t know, to identify other cards (those you don’t have an opinion on yet) as being potentially too good or too weak.

In fact, even if you’re a player and not a game designer, you can use this technique to help you identify which cards you’re likely to see at the tournament/competitive level.

Rules of Thumb

How do you know if your numbers are right? A lot of it comes down to figuring out what works for your particular game, through a combination of your designer intuition and playtesting. Still, I can offer a couple of basic pieces of advice.

First, a limited or restricted benefit is never a cost; its value is always at least a little bit greater than zero. If you have a sword that does extra damage to Snakes, and there are only a few Snakes in the game in isolated locations, that is a very small benefit but it is certainly not a drawback.

Second, if you give the player a choice between two benefits, the cost of the choice must be at least the cost of the more expensive of the two benefits. Worst case, the player takes the better (more expensive) benefit every time, so it should be costed at least as much as what the player will choose. In general, if you give players a choice, try to make those choices give about the same benefit; if it is a choice between two equally good things, that choice is a lot more interesting than choosing between an obviously strong and an obviously weak effect.

Lastly, sometimes you have to take a guess, and you’re not in a position to playtest thoroughly. Maybe you don’t have a big playtest budget. Maybe your publisher is holding a gun to your head, telling you to ship now. Whatever the case, you’ve got something that might be a little above or a little below the curve, and you might have to err on one side or the other. If you’re in this situation, it’s better to make an object too weak than to make it too strong. If it’s too weak, the worst thing that happens is no one uses it, but all of the other objects in the game can still be viable – this isn’t optimal, but it’s not game-breaking. However, if one object is way too strong, it will always get used, effectively preventing everything else that’s actually on the curve from being used since the “balanced” objects are too weak by comparison. A sufficiently underpowered object is ruined on its own; a sufficiently overpowered object ruins the balance of the entire game.

Cost curves for new games

So far, we’ve looked at how to derive a cost curve for an existing game, a sort of design “reverse engineering” to figure out how the game is balanced. This is not necessarily an easy task, as it can be quite tedious at times, but it is at least relatively straightforward.

If you’re making a new game, creating a cost curve is much harder. Since the game doesn’t exist yet, you haven’t played it in its final form, which means you don’t have as much intuition for what the curve is or what kinds of effects are really powerful or really weak. This means you have to plan on doing a lot of heavy playtesting for balance purposes, after the core mechanics are fairly solidified, and you need to make sure the project is scheduled accordingly.

Another thing that makes it harder to create a cost curve for a new game is that you have the absolute freedom to balance the numbers however you want. With an existing game you have to keep all the numbers in line with everything that you’ve already released, so you don’t have many degrees of freedom; you might have a few options on how to structure your cost curve, but only a handful of options will actually make any sense in the context of everything you’ve already done. With a new game, however, there are no constraints; you may have thousands of valid ways to design your cost curve – far more than you’ll have time to playtest. When making a new game, you’ll need to grit your teeth, do the math where you can, take your best initial guess… and then get something into your playtesters’ hands as early as you can, so you have as much time as possible to learn about how to balance the systems in your game.

There’s another nasty problem when designing cost curves for new games: changes to the cost curve are expensive in terms of design time. As an example, let’s say you’re making a 200-card set for a CCG, and one of the new mechanics you’re introducing is the ability to draw extra cards, and 20 cards in the set use this mechanic in some way or other. Suppose you decide that drawing an extra card is a benefit of 2 at the beginning, but after some playtesting it becomes clear that it should actually be a benefit of 3. You now have to change all twenty cards that use that mechanic. Keep in mind that you will get the math wrong, because no one ever gets game balance right on the first try, and you can see where multiple continuing changes to the cost curve mean redoing the entire set several times over. If you have infinite time to playtest, you can just make these changes meticulously and one at a time until your balance is perfect. In the real world, however, this is an unsolved problem. The most balanced CCG that I’ve ever worked on was a game where the cost curve was generated after three sets had already been released; it was the newer sets released after we derived the cost curve that were really good in terms of balance (and they were also efficient in terms of development time because the basic “using the math and nothing else” cards didn’t even need playtesting). Since then, I’ve tried to develop new games with a cost curve in mind, and I still don’t have a good answer for how to do this in any kind of reasonable way.

There’s one other unsolved problem, which I call the “escalation of power” problem, that is specific to persistent games that build on themselves over time – CCGs, MMOs, sequels where you can import previous characters, Facebook games, expansion sets for strategy games, and so on. Anything where your game has new stuff added to it over time, rather than just being a single standalone product. The problem is, in any given set, you are simply not going to be perfect. Every single object in your game will not be perfectly balanced along the curve. Some will be a little above, others will be a little below. While your goal is to get everything as close to the cost curve as possible, you have to accept right now that a few things will be a little better than they’re supposed to… even if the difference is just a microscopic rounding error.

Over time, with a sufficiently large and skilled player base, the things that give an edge (no matter how slight that edge) will rise to the top and become more common in use. Players adapt to an environment where the best-of-the-best is what shows up in competitive play, and they come to treat that as the “standard” cost curve.

Knowing this, the game designer faces a problem. If you use the “old” cost curve and produce a new set of objects that is (miraculously) perfectly balanced, no one will use it, because none of it is as good as the best (above-the-curve) stuff from previous sets. In order to make your new set viable, you have to create a new cost curve that’s balanced with respect to the best objects and strategies in previous sets. This means, over time, the power level of the cost curve increases. It might increase quickly or slowly depending on how good a balancing job you do, but you will see some non-zero level of “power inflation” over time.

Now, this isn’t necessarily a bad thing, in the sense that it basically forces players to keep buying new stuff from you to stay current: eventually their old strategies, the ones that used to be dominant, will fall behind the power curve and they’ll need to get the new stuff just to remain competitive. And if players keep buying from us on a regular basis, that’s a good thing. However, there’s a thin line here, because when players perceive that we are purposefully increasing the power level of the game just to force them to buy new stuff, that gives them an opportunity to exit our game and find something else to do. We’re essentially giving an ultimatum, “buy or leave,” and doing that is dangerous because a lot of players will choose the “leave” option. So, the escalation-of-power problem is not an excuse for lazy design; while we know the cost curve will increase over time, we want that to be a slow and gradual process so that older players don’t feel overwhelmed, and of course we want the new stuff we offer them to be compelling in its own right (because it’s fun to play with, not just because it’s uber-powerful).

If You’re Working On a Game Now…

If you are designing a game right now, and that game has any transitive mechanics that involve a single resource cost, see if you can derive the cost curve. You probably didn’t need me to tell you that, but I’m saying it anyway, so nyeeeah.

Keep in mind that your game already has a cost curve, whether you are aware of it or not. Think of this as an opportunity to learn more about the balance of your game.

Homework

I’ll give you three choices for your “homework” this week. In each case, there are two purposes here. First, you will get to practice the skill of deriving a cost curve for an existing game. Second, you’ll get practice applying that curve to identify objects (whether those be cards, weapons, or whatever) that are too strong or too weak compared to the others.

Option 1: More Magic 2011

If you were intrigued by the analysis presented here on this blog, continue it. Find a spoiler list for Magic 2011 online (you shouldn’t have to look that hard), and starting with the math we’ve identified here, build as much of the rest of the cost curve as you can. As you do this, identify the cards that you think are above or below the curve. For your reference, here’s the math we have currently (note that you may decide to change some of this as you evaluate other cards):

  • Mana cost: 1 (baseline); 1 for each colorless mana; 2 for the first colored mana, and 3 for the second colored mana.
  • High cost bonus: +1 cost if the card requires 4 or more mana (Red, Blue and Green creatures only); +1 cost if the card requires 5 or more mana (White, Black, and Green creatures only – yes, Green gets both bonuses); and an additional +1 cost for each total mana required above 5.
  • Special costs: +1 cost for the Defender special ability.
  • Benefits: 1 per point of power and toughness. 1 for Red and Blue creatures.
  • Special benefits: +1 benefit for Flying, First Strike, Trample, Lifelink, Haste, Swampwalk, Reach, Vigilance, Protection from White, Protection from Black. +3 benefit for Deathtouch.

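One suggestion before you start: transcribe that bullet list into something you can tweak as your estimates change, whether that’s a spreadsheet or a few lines of code. Here’s one possible sketch in Python (the names are mine, not anything official); it only handles creatures with the keywords listed above, which is enough to get going.

    # The bullet list above, transcribed into tweakable data. Adjust these values
    # as your own estimates change while you work through the spoiler list.
    ABILITY = {"Flying": 1, "First Strike": 1, "Trample": 1, "Lifelink": 1, "Haste": 1,
               "Swampwalk": 1, "Reach": 1, "Vigilance": 1, "Protection": 1, "Deathtouch": 3}
    COLOR_BENEFIT = {"R": 1, "U": 1}     # flat benefit for Red/Blue creatures
    BONUS_AT_4 = {"R", "U", "G"}         # +1 cost if the card requires 4+ total mana
    BONUS_AT_5 = {"W", "B", "G"}         # +1 cost if the card requires 5+ total mana

    def curve_delta(color, colored, colorless, power, toughness, abilities=()):
        """Benefit minus cost; positive means above the curve, negative means below."""
        total = colored + colorless
        cost = 1 + colorless + (2 if colored >= 1 else 0) + (3 if colored >= 2 else 0)
        cost += (1 if color in BONUS_AT_4 and total >= 4 else 0)
        cost += (1 if color in BONUS_AT_5 and total >= 5 else 0)
        cost += max(0, total - 5)
        benefit = power + toughness + COLOR_BENEFIT.get(color, 0)
        benefit += sum(ABILITY[a] for a in abilities)
        return benefit - cost

    # A few cards from the analysis, as a sanity check:
    print(curve_delta("W", 1, 3, 3, 2, ["Flying"]))    # on the curve: 0
    print(curve_delta("G", 2, 4, 6, 4, ["Trample"]))   # a little below the curve: -2
    print(curve_delta("U", 1, 3, 2, 4, ["Flying"]))    # slightly above the curve: +1
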
You may also find some interesting reading in Mark Rosewater’s archive of design articles for Magic, although finding the relevant general design stuff in the sea of articles on the minutiae of specific cards and sets can be a challenge (and it’s a big archive!).

Option 2: D&D

If CCGs aren’t your thing, maybe you like RPGs. Take a look at whatever Dungeons & Dragons Players Handbook edition you’ve got lying around, and flip to the section that gives a list of equipment, particularly all the basic weapons in the game, along with their Gold costs. Here you’ll have to do some light probability that we haven’t talked about yet, to figure out the average damage of each weapon (hint: if you roll an n-sided die, the average value of that die is (n+1)/2, and yes that means the “average” may be a fraction; if you’re rolling multiple dice, compute the average for each individual die and then add them all together). Then, relate the average weapon damage to the Gold cost, and try to figure out the cost curve for weapons.
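
If you’d rather not grind out the dice averages by hand, the hint above is only a couple of lines of code. Here’s a sketch in Python; the weapon list is a placeholder for illustration, so swap in the actual dice and Gold costs from your own edition.

    # Average damage for simple "XdY" dice notation: a dY averages (Y + 1) / 2.
    def average_damage(dice):
        count, sides = dice.lower().split("d")
        return (int(count) if count else 1) * (int(sides) + 1) / 2

    # Placeholder weapon list (name, damage dice, gold cost); check your own
    # Players Handbook for the real numbers in your edition.
    weapons = [("Dagger", "1d4", 2), ("Longsword", "1d8", 15), ("Greatsword", "2d6", 50)]
    for name, dice, gold in weapons:
        avg = average_damage(dice)
        print(f"{name}: {avg} average damage, {gold} gp, {gold / avg:.1f} gp per point")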

Note that depending on the edition, some weapons may have “special abilities” like longer range, or doing extra damage against certain enemy types. Remember to only try to figure out the math for something when you know all but one of the costs or benefits, so start with the simple melee weapons and once you’ve got a basic cost curve, then try to derive the more complicated ones.

If you find that this doesn’t take you very long and you want an additional challenge, do the cost curve for armors in the game as well, and see if you can find a relation between damage and AC.

Option 3: Halo 3

If neither of the other options appeals to you, take a look at the FPS genre, in particular Halo 3. This is a little different because there isn’t really an economic system in the game, so there’s no single resource used to purchase anything. However, there is a variety of weapons, and each weapon has a lot of stats: effective range, damage, fire rate, and occasionally a special ability such as area-effect damage or dual-wield capability.

For this exercise, use damage per second (“dps”) as your primary resource. You’ll have to find a FAQ (or experiment by playing the game or carefully analyzing gameplay videos on YouTube) to determine the dps for each weapon; to compute dps, multiply the damage per shot by the fire rate (in shots per second).
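
As a quick sketch of that calculation (the weapon stats below are made up for illustration, not actual Halo 3 numbers):

    # dps = damage per shot * shots per second. These stats are invented;
    # pull real numbers from a FAQ or your own testing.
    weapons = {
        "Assault Rifle": {"damage": 8,  "fire_rate": 10.0},
        "Battle Rifle":  {"damage": 18, "fire_rate": 2.4},
        "Sniper Rifle":  {"damage": 80, "fire_rate": 0.5},
    }
    for name, w in weapons.items():
        print(f'{name}: {w["damage"] * w["fire_rate"]:.1f} dps')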

Relate everything else to dps to try to figure out the tradeoffs between dps and accuracy, range, and each special ability. (For some things like “accuracy” that can’t be easily quantified, you may have to fudge things a bit by just making up some numbers).

Then, analyze. Which weapons feel above or below the curve based on your cost curve? How much dps would you add or remove from each weapon to balance it? And of course, is this consistent with your intuition (either from playing the game, or reading comments in player forums)?

Level 2: Numeric Relationships

July 14, 2010

Course Announcements

As promised last week, signups for the paid course are now closed. If you are just finding this blog now, I apologize, but you wouldn’t want to start two weeks behind anyway. If you’re coming late to the party, the best advice I can give you is to start reading this blog from the beginning and catch up when you can.

Readings/Playings

None for this week, other than this post… but you will be doing a bit of reading later for your “homework” to compensate.

This Week’s Topic

This week, I’m going to talk about the different kinds of numbers you see in games and how to classify them. This is going to be important later, because you can’t really know how to balance a game or how to choose the right numbers unless you first know what kinds of numbers you’re dealing with. Sometimes, a balance change is as simple as replacing one kind of number with another, so understanding what kinds of numbers there are and getting an intuition for how they work is something we need to cover before anything else.

In particular, we’re going to be examining relationships between numbers. Numbers in games don’t exist in a vacuum. They only have meaning in relation to each other. For example, suppose I tell you that the main character in a game does 5 damage when he attacks. That tells you nothing unless you know how much damage enemies can take before they keel over dead. Now you have two numbers, Damage and Hit Points, and each one only has meaning in relation to the other.

Or, suppose I tell you that a sword costs 250 Gold. That has no meaning, until I tell you that the player routinely finds bags with thousands of Gold lying around the countryside, and then you know the sword is cheap. Or, I tell you that the player only gets 1 Gold at most from winning each combat, and then it’s really expensive. Even within a game, the relative value of something can change; maybe 250 Gold is a lot at the start of the game but it’s pocket change at the end. In World of Warcraft, 1 Gold used to be a tidy sum, but today it takes tens or hundreds to buy the really epic loot.

With all that said, what kinds of ways can numbers be related to each other?

Identity and Linear Relationships

Probably the simplest type of relationship, which math geeks would call an identity relationship, is where two values change in exactly the same way. Add +1 to one value, it’s equivalent to adding +1 to the other. For game balance purposes, you can treat the two values as identical.

You would think that in such a case, you might just make a single value, but there are some cases where it makes sense to have two different values that just happen to have a one-to-one conversion. As an example, Ultima III: Exodus has Food, something that each character needs so as not to starve to death in a dungeon. You never get Food as an item drop; you can only buy it from food vendors in towns. Food decreases over time, and has no other value (it cannot be sold or exchanged for anything else); its only purpose is to act as a continual slow drain on your resources. Each character also has Gold, something they find while adventuring. Unlike Food, Gold doesn’t degrade over time, and it is versatile (you can use it to bribe guards, buy hints, purchase weapons or armor… or purchase Food). While these are clearly two separate values that serve very different purposes within the game, each unit of Food costs 1 Gold (10 Food costs 10 Gold, 1000 Food costs 1000 Gold, and so on). Food and Gold have an identity relationship… although it is one-way in this case, since you can convert Gold to Food but not vice versa.

A more general case of an identity relationship is the linear relationship, where the conversion rate between two values is a constant. If a healing spell always costs 5 MP and heals exactly 50 HP, then there is a 1-to-10 linear relationship between MP and HP. If you can spend 100 Gold to gain +1 Dexterity, there’s a 100-to-1 linear relationship between Gold and Dexterity. And so on.

Note that we are so far ignoring cases where a relationship is partly random (maybe that healing spell heals somewhere between 25 and 75 HP, randomly chosen each time). Randomness is something we’ll get into in a few weeks, so we’re conveniently leaving that out of the picture for now.

Exponential and Triangular Relationships

Sometimes, a linear relationship doesn’t work for your game. You may have a relationship where there are either increasing or diminishing returns.

For example, suppose a player can pay resources to gain additional actions in a turn-based strategy game. One extra action might be a small boost, but three or four extra actions might be like taking a whole extra turn, which can feel like a lot more than 3 or 4 times as powerful as a single action. This would be increasing returns: each extra action is more valuable than the last. You would therefore want the cost of each extra action to increase as you buy more of them.

Or, maybe you have a game where players have incentive to spend all of their in-game money every turn to keep pace with their opponents, and hoarding cash has a real opportunity cost (that is, they miss out on opportunities they would have had if they’d spent it instead). In this case, buying a lot of something all at once is actually not as good as buying one at a time, so it makes sense to give players a discount for “buying in bulk” as it were. Here we have a decreasing return, where each extra item purchased is not as useful as the last.

In such cases, you need a numeric relationship that increases or decreases its rate of exchange as you exchange more or less at a time. The simplest way to do this is an exponential relationship: when you add to one value, multiply the other one. An example is doubling: for each +1 you give to one value, double the other one. This gives you a relationship where buying 1, 2, 3, 4 or 5 of something costs 1, 2, 4, 8 or 16, respectively. As you can see, the numbers get really big, really fast when you do this.

Because the numbers get prohibitively large very quickly, you have to be careful when using exponential relationships. For example, nearly every card in any Collectible Card Game that I’ve played that has the word “double” on it somewhere (as in, one card doubles some value on another card) ends up being too powerful. I know offhand of one exception, and that was an all-or-nothing gamble where it doubled your attack strength but then made you lose at the end of the turn if you hadn’t won already! The lesson here is to be very, very careful when using exponentials.

What if you want something that increases, but not as fast as an exponential? A common pattern in game design is the triangular relationship. If you’re unfamiliar with the term, you have probably at least seen this series:

1, 3, 6, 10, 15, 21, 28, …

That is the classic triangular pattern (so called because several ways to visualize it involve triangles). In our earlier example, maybe the first extra action costs 1 resource; the next costs 2 (for a running total of 3), the next costs 3 (for a total of 6), and so on.

An interesting thing to notice about triangular numbers is what happens when you look at the difference between each successive pair of numbers. The difference between the first two numbers (1 and 3) is 2. The difference between the next two numbers (3 and 6) is 3. The next difference (between 6 and 10) is 4. So the successive differences increase linearly: 2, 3, 4, 5, and so on.

Triangular numbers usually make a pretty good first guess for increasing costs. What if you want a decreasing cost, where something starts out expensive and gets cheaper? In that case, figure out how much the first one should cost, then make each one after that cost 1 less. For example, suppose you decide the first Widget should cost 7 Gold. Then try making the second cost 6 Gold (for a total of 13), the third costs 5 Gold (total of 18), and so on.

Note that in this case, you will eventually reach a point where each successive item costs zero (or even a negative amount), which gets kind of ridiculous. This is actually a pretty common thing in game balance: push a math formula to its extremes and the balance will break down. The design solution is to set hard limits on the formula, so that you never reach those extremes. In our Widget example above, maybe the players are simply prevented from buying more than 3 or 4 Widgets at a time.
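
If you want to get a feel for how differently these curves grow, here is a little sketch comparing cumulative costs under linear, triangular, and exponential (doubling) pricing, plus the Widget example with its hard limit; the specific numbers are just the ones used above.

    # Cumulative cost of buying N items under different pricing curves.
    def cumulative(prices):
        total, out = 0, []
        for p in prices:
            total += p
            out.append(total)
        return out

    n = 6
    print("linear:     ", cumulative([3] * n))                    # 3, 6, 9, 12, 15, 18
    print("triangular: ", cumulative(range(1, n + 1)))            # 1, 3, 6, 10, 15, 21
    print("exponential:", cumulative(2 ** i for i in range(n)))   # 1, 3, 7, 15, 31, 63
    # The Widget example: first costs 7, each one after costs 1 less, and the
    # shop simply refuses to sell more than 4 at a time (the hard limit).
    print("widgets:    ", cumulative(7 - i for i in range(4)))    # 7, 13, 18, 22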

Other Numeric Relationships

While linear and triangular relationships are among the most common in games, they are not the only ones available. In fact, there are an infinite number of potential numeric relationships. If none of the typical relationships work for your game, come up with your own custom relationship!

Maybe you have certain cost peaks, where certain thresholds cost more than others because those have in-game significance. For example, if everything in your game has 5 hit points, there is actually a huge difference between doing 4 or 5 damage, so that 5th point of damage will probably cost a lot more than you would otherwise expect. You might have oscillations, where several specific quantities are particularly cheap (or expensive). You can create any ratio between two values that you want… but do so with some understanding of what effect it will have on play!

Relationships Within Systems

Individual values in a game usually exist within larger systems. By analyzing all of the different numbers and relationships between them in a game’s systems, we can gain a lot of insight into how the game is balanced.

Let us take a simple example: the first Dragon Warrior game for the NES. In the game’s combat system, you have four main stats: Hit Points (HP), Magic Points (MP), Attack and Defense. This is a game of attrition; you are exploring game areas, and every few steps you get attacked by an enemy. You lose if your HP is ever reduced to zero.

How are all of these numbers related? Random encounters are related to HP: each encounter reduces HP (you can also say it the other way: by walking around getting into fights, you can essentially convert HP into encounters). This is an inverse relationship, as more encounters means less HP.

There’s a direct relationship between HP and Defense: the more defense you have, the less damage you take, which means your HP lasts longer. Effectively, increasing your Defense is equivalent to giving yourself a pile of extra HP.

Perhaps less obviously, we see the same relationship between HP and Attack. The higher your attack stat, the faster you can defeat an enemy. If you defeat an enemy faster, it has less opportunity to damage you, so you take less damage. Thus, you can survive more fights with higher Attack.

MP is an interesting case, because you can use it for a lot of things. There are healing spells that directly convert MP into HP. There are attack spells that do damage (hopefully more than you’d do with a standard attack); like a higher Attack stat, these finish combats earlier, which means they preserve your HP. There are buff/debuff spells that likewise reduce the damage you take in a combat. There are teleport spells that take you across long distances, so that you don’t have to get in fights along the way, so these again act to preserve your HP. So even though MP is versatile, virtually all of the uses for it involve converting it (directly or indirectly) into HP.

If you draw this all out on paper, you’ll see that everything (Attack, Defense, MP, Monster Encounters) is linked directly to HP. Since running out of HP is the game’s loss condition, the designers put the HP stat in the middle of everything! This is a common technique, making a single resource central to all of the others, and it is best to make this central resource either the win or loss condition for the game.

Now, there’s one additional wrinkle here: the combat system interacts with two other systems in the game through the monster encounters. After you defeat a monster, you get two things: Gold and Experience (XP). These interact with the economic and leveling systems in the game, respectively.

Let’s examine the leveling system first. Collect enough XP and you’ll level up, which increases all of your stats (HP, MP, Attack and Defense). As you can see, this creates a feedback loop: defeating enemies causes you to gain a level, which increases your stats, which lets you defeat more enemies. And in fact, this would be a positive feedback loop that would cause the player to gain high levels of power very fast, if there weren’t some kind of counteracting force in the game. That counteraction comes in the form of an increasing XP-to-Level relationship, so it takes progressively more and more XP to gain a level. Another counteracting force is that of player time; while the player could maximize their level by just staying in the early areas of the game beating on the weakest enemies, the gain is so slow that they are incentivized to take some risks so they can level a little faster.

Examining the economic system, Gold is used for a few things. Its primary use is to buy equipment which permanently increases the player’s Attack or Defense, thus effectively converting Gold into extra permanent HP. Gold can also be used to buy consumable items, most of which mimic the effects of certain spells, thus you can (on a limited basis, since you only have a few inventory slots) convert Gold to temporary MP. Here we see another feedback loop: defeating monsters earns Gold, which the player uses to increase their stats, which lets them defeat even more monsters. In this case, what prevents this from being a positive feedback loop is that it’s limited by progression: you have a limited selection of equipment to buy, and the more expensive stuff requires that you travel to areas that you are just not strong enough to reach at the start of the game. And of course, once you buy the most expensive equipment in the game, extra Gold doesn’t do you much good.

Another loop that is linked to the economic system is that of progression itself. Many areas in the game are behind locked doors, and in order to open them you need to use your Gold to purchase magic keys. You defeat monsters, get Gold, use it to purchase Keys, and use those keys to open new areas which have stronger monsters (which then let you get even more Gold/XP). Of course, this loop is itself limited by the player’s stats; unlocking a new area with monsters that are too strong to handle does not help the player much.

How would a designer balance things within all these systems? By relating everything back to the central value of HP, and then comparing.

For example, say you have a healing spell and a damage spell, and you want to know which is better. Calculate the amount of HP that the player would no longer lose as a result of using the damage spell and ending the combat earlier, and compare that to the amount of HP actually restored by the healing spell. Or, say you want to know which is better, a particular sword or a particular piece of armor. Again, figure out how much extra HP each would save you.
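
Here’s a rough sketch of that comparison with invented numbers: a healing spell that restores 50 HP versus a damage spell that shortens the fight against an enemy of a given HP and damage-per-round. Both end up expressed in HP, so they can be compared directly.

    # Both spells expressed in the common currency of HP (all numbers invented).
    def hp_saved_by_damage_spell(spell_damage, normal_attack, enemy_hp, enemy_damage):
        """HP you avoid losing because the fight ends that many rounds sooner."""
        rounds_normally = -(-enemy_hp // normal_attack)                            # ceiling division
        rounds_with_spell = 1 + -(-max(0, enemy_hp - spell_damage) // normal_attack)
        return (rounds_normally - rounds_with_spell) * enemy_damage

    healing_spell_value = 50   # the heal restores 50 HP outright
    damage_spell_value = hp_saved_by_damage_spell(spell_damage=40, normal_attack=10,
                                                  enemy_hp=60, enemy_damage=8)
    print(healing_spell_value, "vs", damage_spell_value)   # 50 vs 24 in this example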

Now, this does not mean that everything in the game must be exactly equal to be balanced. For example, you may want spells that are learned later in the game to be more cost-effective, so that the player has reason to use them. You may also want the more expensive equipment to be less cost-effective, in order to make the player really work for it. However, at any given time in the game, you probably want the choices made available at that time to be at least somewhat balanced with each other. For example, if the player reaches a new town with several new pieces of equipment, you would expect those to be roughly equivalent in terms of their HP-to-cost ratios.

Another Example

You might wonder, if this kind of analysis works for a stat-driven game like an RPG, is it useful for any other kind of game? The answer is yes. Let’s examine an action title, the original Super Mario Bros. (made popular by its NES and arcade versions).

What kinds of resources do we have in Mario? There are lives, coins, and time (from a countdown timer). There’s actually a numeric score. And then there are objects within the game — coin blocks, enemies, and so on — which can sometimes work for or against you depending on the situation. Let us proceed to analyze the relationships.

  • Coins: there is a 100-to-1 relationship between Coins and Lives, since collecting 100 coins awards an extra life. There is a 1-to-200 relationship between Coins and Score, since collecting a coin gives 200 points. There is a relationship between Coin Blocks and Coins, in that each block gives you some number of coins.
  • Time: there is a relationship between Time and Score, since any time remaining at the end of a level is converted into bonus points. There is also an inverse relationship between Time and Lives, since running out of time costs you a life.
  • Enemies: there is a relationship between Enemies and Score, since killing enemies gives you from 100 to 1000 points (depending on the enemy). There is an inverse relationship between Enemies and Lives, since sometimes an enemy will cost you a life. (In a few select levels there is potentially a positive relationship between Enemies and Lives, as stomping enough enemies in a combo will give extra lives, but that is a special case.)
  • Lives: there is this strange relationship between Lives and everything else, because losing a life resets the Coins, Time and Enemies on a level. Note that since Coins give you extra Lives, and losing a Life resets Coins, any level with more than 100 Coins would provide a positive feedback loop where you could die intentionally, get more than 100 Coins, and repeat to gain infinite lives. The original Super Mario Bros. did not have any levels like this, but Super Mario Bros. 3 did.
  • Lives and Score: there is no direct link between Lives and Score. However, losing a Life resets a bunch of things that give scoring opportunities, so indirectly you can convert a Life to Score. Interestingly, this does not happen the other way around; unlike other arcade games of the time, you cannot earn extra Lives by getting a sufficiently high Score.

Looking at these relationships, we see that Score is actually the central resource in Super Mario Bros. since everything is tied to Score. This makes sense in the context of early arcade games, since the win condition is not “beat the game,” but rather, “get the highest score.”

How would you balance these resources with one another? There are a few ways. You can figure out how many enemies you kill and their relative risks (that is, which enemies are harder to kill and which are more likely to kill you). Compare that with how many coins you find in a typical level, and how much time you typically complete the level with. Then, you can either change the amount of score granted to the player from each of these things (making a global change throughout the game), or you can vary the number of coins and enemies, the amount of time, or the length of a level (making a local change within individual levels). Any of these techniques could be used to adjust a player’s expected total score, and also how much each of these activities (coin collecting, enemy stomping, time completion) contributes to the final score.

When you’re designing a game, note that you can change your resources around, and even eliminate a resource or change the central resource to something else. The Mario series survived this quite well; the games that followed the original eliminated Score entirely, and everything was later related to Lives.

Interactions Between Relationships

When you form chains or loops of resources and relationships between them, the relationships stack with each other. They can either combine to become more intense, or they can cancel each other out (completely or partially).

We just saw one example of this in the Mario games, with Lives and Coins. If you have a level that contains 200 Coins, then the 100 Coins to 1 Life relationship combines with 1 Life to 200 Coins in that level, to create a doubling effect where you convert 1 Life to 2 Lives in a single iteration.

Here’s another example, from the PS2 game Baldur’s Gate: Dark Alliance. In this action-RPG, you get XP from defeating enemies, which in turn causes you to level up. The XP-to-Level relationship is triangular: going from Level 1 to Level 2 requires 1000 XP, Level 2 to Level 3 costs 2000 XP, rising to Level 4 costs 3000 XP, and so on.

Each time you level up, you get a number of upgrade points to spend on special abilities. These also follow a triangular progression: at Level 2 you get 1 upgrade point; at Level 3 you get 2 points; the next level gives you 3 points, then the next gives you 4 points, and so on.

However, these relationships chain together, since XP gives you Levels and Levels give you Upgrade Points. Since XP is the actual resource the player is earning, it is the XP-to-Points ratio we care about, and the two triangular relationships actually cancel with each other to form a linear relationship of 1000 XP to 1 Upgrade Point. While the awarding of these upgrade points is staggered based on levels, on average you are earning them at a constant XP rate.
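
You can check that cancellation with a few lines of code; this just accumulates the XP required and the points awarded, level by level, using the numbers described above.

    # Triangular XP costs and triangular point awards cancel to a constant ratio.
    total_xp, total_points = 0, 0
    for level in range(2, 11):
        total_xp += (level - 1) * 1000   # 1000 XP for level 2, 2000 for level 3, ...
        total_points += level - 1        # 1 point at level 2, 2 at level 3, ...
        print(f"Level {level}: {total_xp} XP, {total_points} points, "
              f"{total_xp / total_points:.0f} XP per point")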

How does Time fit into this (as in, the amount of time the player spends on the game)? If the player were fighting the same enemies over and over for the same XP rewards, there would be a triangular increase in the amount of time it takes to earn a level (and a constant amount of time to earn each Upgrade Point, on average). However, as with most RPGs, there is a system of increasing XP rewards as the player fights stronger monsters. This increasing XP curve doesn’t increase as fast as the triangular progression of level-ups, which means that it doesn’t completely cancel out the triangular effect, but it does partly reduce it — in other words, you level up slightly faster in the early game and slower in the late game, but the play time between level gains doesn’t increase as fast as a triangular relationship.

Note, however, the way this interacts with Upgrade Points. Since the XP-to-Point ratio is linear, and the player gets an increasing amount of XP per unit time, they are actually getting an increasing rate of Upgrade Point gain!

This kind of system has some interesting effects. By changing the rate of XP gain (that is, exactly how fast the XP rewards increase for defeating enemies) you can change both the rate of leveling up and the rate of Upgrade Point gains. If the XP rewards increase faster than the triangular rate of the levels themselves, the player will actually level up faster as the game progresses. If the XP rewards increase more slowly than the rate of level ups, the player will level faster in the early game and slower in the late game (which is usually what you want, as it gives the player frequent rewards early on and starts spacing them out once they’ve committed to continued play). If the XP rewards increase at exactly the same rate, the player will level up at a more or less constant rate.

Suppose you decide to have the player gain levels faster in the early game and slower in the late game, but you never want them to go longer than an hour between levels. How would you balance the XP system? Simple: figure out what level they will be at in the late game, scale the XP gains to take about an hour per level up at that point, and then work your way backwards from there.
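
Here’s a sketch of that back-calculation with invented numbers: pick a level cap, decide how long each level-up should take (quick early on, about an hour at the cap), and read off the XP-per-minute you need to tune the enemy rewards toward in each stretch of the game.

    # Working backwards from "the last level-up takes about an hour" (invented numbers).
    LEVEL_CAP = 20

    def xp_to_level(level):
        return (level - 1) * 1000        # assumed triangular-style requirement

    for level in range(2, LEVEL_CAP + 1, 3):
        target_minutes = 10 + 50 * level / LEVEL_CAP   # quick early levels, ~1 hour at the cap
        xp_per_minute = xp_to_level(level) / target_minutes
        print(f"Level {level}: {xp_to_level(level)} XP over ~{target_minutes:.0f} min "
              f"-> tune enemy rewards to about {xp_per_minute:.0f} XP/min")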

Note another useful property this leveling system has: it provides a negative feedback loop that keeps the player in a narrow range of levels during each point in the game. Consider two situations:

  • Over-leveling: The player has done a lot of level-grinding and is now too powerful for the enemies in their current region. For one thing, they’ll be able to defeat the nearby enemies faster, so they don’t have to stick around too long. For another, the XP gains aren’t that good if their level is already high; they are unlikely to gain much in the way of additional levels by defeating weaker enemies. The maximum level a player can reach is effectively limited by the XP-reward curve.
  • Under-leveling: Suppose instead the opposite case, where the player has progressed quickly through the game and is now at a lower level than the enemies in the current region. In this case, the XP gains will be relatively high (compared to the player’s level), and the player will only need to defeat a few enemies to level up quickly.

In either case, the game’s system pushes the player’s level towards a narrow range in the middle of the extremes. It is much easier to balance a combat system to provide an appropriate level of challenge, when you know what level the player will be at during every step of the way!

How Relationships Interact

How do you know how two numeric relationships will stack together? Here’s a quick-reference guide:

  • Two linear relationships that combine: multiply them together. If you can turn 1 of Resource A into 2 Resource B, and 1 Resource B into 5 Resource C, then there is a 1-to-10 conversion between A and C (2×5).
  • Linear relationship combines with an increasing (triangular or exponential) relationship: the increasing relationship just gets multiplied by a bigger number, but the nature of the curve stays the same.
  • Linear relationship counteracts an increasing relationship: if the linear conversion is large, it may dominate early on, but eventually the increasing relationship will outpace it. Exactly where the two curves meet and the game shifts from one to the other depends on the exact numbers, and tweaking these can provide an interesting strategic shift for the players.
  • Two increasing relationships combine: you end up with an increasing relationship that’s even faster than either of the two individually.
  • Two increasing relationships counteract one another: depends on the exact relationships. In general, an exponential relationship will dominate a triangular one (how fast this happens depends on the exact numbers used). Two identical relationships (such as two pure triangulars) will cancel out to form a linear or identity relationship.

If You’re Working On a Game Now…

Are you designing your own game right now? Try this: make a list of every resource or number in your game on a piece of paper. Put a box around each, and spread the boxes out. Then, draw arrows between each set of boxes that has a direct relationship in your game, and label the arrow with the kind of relationship (linear, triangular, exponential, etc.).
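
If you’d rather do this digitally than on paper, the diagram is just a directed graph with a conversion rate on each arrow. Here’s a minimal sketch (the resources and rates below are invented): multiply the rates around any loop, and a result greater than 1 is the signature of a positive feedback loop.

    # The resource diagram as a directed graph (all resources and rates invented).
    edges = {
        "Coins": {"Lives": 1 / 100},     # 100 Coins -> 1 Life
        "Lives": {"Coins": 200},         # replaying this level yields 200 Coins
        "Gold":  {"Keys": 1 / 50},
    }

    def loop_gain(cycle):
        """Multiply conversion rates around a cycle like ['Coins', 'Lives', 'Coins']."""
        gain = 1.0
        for a, b in zip(cycle, cycle[1:]):
            gain *= edges[a][b]
        return gain

    g = loop_gain(["Coins", "Lives", "Coins"])
    print(f"loop gain = {g}:", "positive feedback!" if g > 1 else "stable")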

Use this diagram to identify a few areas of interest in the balance of your game:

  • Do you see any loops where a resource can be converted to something else, then maybe something else, and then back to the original? If you get back more of the original than you started with by doing this, you may have just identified a positive feedback loop in your game.
  • Do you see a central resource that everything else seems tied to? If so, is that central resource either the win or loss condition, or does it seem kind of arbitrary? If not, does it make sense to create a new central resource, perhaps by adding new relationships between resources?

You can then use this diagram to predict changes to gameplay. If you change the nature of a relationship, you might be able to make a pretty good guess at what other relationships will also change as a result, and what effect that might have on the game’s systems overall.

If your game is a single-player game with some kind of progression system, “Time” (as in, the amount of time the player spends actually playing the game) should be one of your resources, and you can use your diagram to see if the rewards and power gains the player gets from playing are expected to increase, decrease, or remain constant over time.

Homework

Here’s your game balance challenge for this week. First, choose any single-player game that you’ve played and are familiar with, that has progression mechanics. Examples of games with progression are action-adventure games (Zelda), action-RPGs (Diablo), RPGs (Final Fantasy), or MMORPGs (World of Warcraft). I’ll recommend that you choose something relatively simple, such as an NES-era game or earlier. You’re going to analyze the numbers in this game, and as you’ve seen from the earlier examples here, even simple games can have pretty involved systems.

In these games, there is some kind of progression where the player gains new abilities and/or improves their stats over time. As the player progresses, enemies get stronger; again this could just mean they have higher stats, or they might also gain new abilities that require better strategy and tactics to defeat.

Start by asking yourself this question: overall, what was the difficulty curve of the game like? Did it start off easy and get slowly, progressively harder? Or, did you notice one or more of these undesirable patterns:

  • A series of levels that seemed to go by very slowly, because the player was underpowered at the time and did not gain enough power fast enough to compensate, so you had to grind for a long time in one spot.
  • A sudden spike in difficulty with one dungeon that had much more challenging enemies than those that came immediately before or after.
  • A dungeon that was much easier than was probably intended, allowing you to blast through it quickly since you were much more powerful than the inhabitants by the time you actually reached it.
  • The hardest point in the game was not at the end, but somewhere in the middle. Perhaps you got a certain weapon, ally, or special ability that was really powerful, and made you effectively unbeatable from that point on until the end of the game.

So far, all you’re doing is using your memory and intuition, and it probably takes you all of a few seconds to remember the standout moments of epic win and horrible grind in your chosen game. It’s useful to build intuition, but it is even better to make your intuition stronger by backing it up with math. So, once you’ve written down your intuitive guesses at the points where the game becomes unbalanced, let’s start analyzing.

First, seek a strategy guide or FAQ that gives all of the numbers for the game. A web search may turn up surprisingly detailed walkthroughs that show you every number and every resource in the game, and exactly how they are all related.

Next, make a list on paper of all of the resources in the game. Using the FAQ as your guide, also show all relationships between the resources (draw arrows between them, and label the arrows with the relationship type). From this diagram, you may be able to identify exactly what happened.

For example, maybe you seemed to level up a lot in one particular dungeon, gaining a lot of power in a short time. In such a case, you might start by looking at the leveling system: perhaps there is a certain range of levels where the XP requirements to gain a level are much lower than the rest of the progression curve. You might also look at the combat reward system: maybe you just gain a lot more XP than expected from the enemies in that dungeon.

As another example, maybe the game felt too easy after you found a really powerful weapon. In this case you’d look at the combat system: look at how much damage you do versus how much enemies can take, as separate curves throughout the game, and identify the sudden spike in power when you get that weapon. You may be able to graphically see the relationship of your power level versus that of the enemies over time.
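
As a rough sketch of that kind of check (all of these numbers are invented), tabulate your damage per attack against typical enemy HP area by area, and look for a sudden dip in the hits-to-kill column when the overpowered weapon arrives.

    import math

    # (area, player damage per attack, typical enemy HP) -- all invented numbers
    progression = [("Cave", 6, 20), ("Forest", 8, 35),
                   ("Swamp", 22, 50),        # the overpowered weapon shows up here
                   ("Mountain", 24, 90), ("Castle", 26, 140)]
    for area, damage, enemy_hp in progression:
        print(f"{area}: {math.ceil(enemy_hp / damage)} hits to defeat a typical enemy")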

Lastly, if you do identify unbalanced areas of the game from this perspective, you should be able to use your numbers and curves to immediately suggest a change. Not only will you know exactly which resource needs to be changed, but also by how much.

This exercise will probably take you a few hours, as researching a game and analyzing the numbers is not a trivial task (even for a simple game). However, after doing this, you will be much more comfortable with identifying resources and relationships in games, and also being able to use your understanding of a game’s systems to improve the balance of those systems.

Level 1: Intro to Game Balance

July 7, 2010

Class Announcements

I have to admit I was a little surprised to see that people were still signing up for the paid course after it started, but I suppose it’s common enough for people to join a class early in the term. However, to be fair to those who signed up well in advance, I’ll be closing signups this Sunday (July 10) at midnight EDT. So, if you haven’t signed up and still want in, make sure to click the Paypal link before then!

Readings/Playings

If you haven’t already, you should watch the intro video for this course first, before reading on. You may need to create an account on that website, but registration for the intro video is free.

This Week’s Topic

This week is probably going to start a bit slow for those of you who are experienced game designers (or those who are hoping to dive deep into the details). Instead, I want to use this week mostly to get everyone into the mindset of a game designer presented with a balance task, and I want to lay out some basic vocabulary terms so we can communicate about game balance properly.

You can think of this week like a tutorial level. The difficulty and pacing of this course will ramp up in the following weeks.

What is Game Balance?

I would start by asking the question “what is game balance?” but I answered it in the teaser video already. While perhaps an oversimplification, we can say that game balance is mostly about figuring out what numbers to use in a game.

This immediately brings up the question: what if a game doesn’t have any numbers or math involved? The playground game of Tag has no numbers, for example. Does that mean that the concept of “game balance” is meaningless when applied to Tag?

The answer is that Tag does in fact have numbers: how fast and how long each player can run, how close the players are to each other, the dimensions of the play area, how long someone is “it.” We don’t really track any of these stats because Tag isn’t a professional sport… but if it was a professional sport, you’d better believe there would be trading cards and websites with all kinds of numbers on them!

So, every game does in fact have numbers (even if they are hidden or implicit), and the purpose of those numbers is to describe the game state.

How do you tell if a game is balanced?

Knowing if a game is balanced is not always trivial. Chess, for example, is not entirely balanced: it has been observed that there is a slight advantage to going first. However, it hasn’t been definitively proven whether this imbalance is mechanical (that is, there is a bona fide tactical/strategic advantage to the first move) or psychological (players assume there is a first-move advantage, so they trick themselves into playing worse when they go second). Interestingly, this first-move advantage disappears at lower skill levels; it is only observed at championship tournaments. Keep in mind that this is a game that has been played, in some form, for thousands of years. And we still don’t know exactly how unbalanced it is!

In the case of Chess, a greater degree of player skill makes the game unbalanced. In some cases, it works the other way around, where skilled players can correct an inherent imbalance through clever play. For example, in Settlers of Catan, much of the game revolves around trading resources with other players. If a single player has a slight gameplay advantage due to an improved starting position, the other players can agree to simply not trade with that player for a time (or only offer unfair trades at the expense of that player) until such time as the starting positions equalize. This would not happen in casual games, as the players would be unable to recognize a slight early-game advantage; at the tournament level, however, players would be more likely to spot an inherent imbalance in the game, and act accordingly.

In short, game balance is not an easy or obvious task. (But you probably could have figured that out, given that I’m going to talk for ten straight weeks on the subject!)

Towards a critical vocabulary

Just like last summer, we need to define a few key terms that we’ll use as we talk about different kinds of balance.

Determinism

For our purposes, I define a “deterministic” game as one where if you start with a given game state and perform a particular action, it will always produce the same resulting new game state.

Chess and Go and Checkers are all deterministic. You never have a situation where you move a piece, but due to an unexpected combat die roll the piece gets lost somewhere along the way, or something. (Unless you’re playing a nondeterministic variant, anyway.)

Candyland and Chutes & Ladders are not deterministic. Each has a random mechanism for moving players forward, so you never know quite how far you’ll move next turn.

Poker is not deterministic, either. You might play several hands where you appear to have the same game state (your hand and all face-up cards on the table are the same), but the actual results of the hand may be different because you never know what the opponents’ cards are.

Rock-Paper-Scissors is not deterministic, in the sense that any given throw (like Rock) will sometimes win, sometimes lose, and sometimes draw, depending on what the opponent does.

Note that there are deterministic elements to all of these games. For example, once you have rolled your die in Chutes & Ladders, called the hand in Poker, or made your throw in Rock-Paper-Scissors, resolving the turn is done by the (deterministic) rules of the game. If you throw Rock and your opponent throws Paper, the result is always the same.

Non-determinism

The opposite of a deterministic game is a non-deterministic game. The easiest way to illustrate the difference is by comparing the arcade classic Pac-Man with its sequel Ms. Pac-Man.

The original Pac-Man is entirely deterministic. The ghosts follow an AI that is purely dependent on the current game state. As a result, following a pre-defined sequence of controller inputs on a given level will always produce the exact same results, every time. Because of this deterministic property, some players were able to figure out patterns of movements; the game changed from one of chasing and being chased to one of memorizing and executing patterns.

This ended up being a problem: arcade games required that players play for 3 minutes or less, on average, in order to remain profitable. Pattern players could play for hours. In Ms. Pac-Man, an element of non-determinism was added: sometimes the ghosts would choose their direction randomly. As a result, Ms. Pac-Man returned the focus of gameplay from pattern execution to quick thinking and reaction, and (at the championship levels, at least) the two games play quite differently.

Now, this is not to say that a non-deterministic game is always “better.” Remember, Chess and Go are deterministic games that have been played for thousands of years; as game designers today, we count ourselves lucky if our games are played a mere two or three years from the release date. So my point is not that one method is superior to the other, but rather that analyzing game balance is done differently for deterministic versus non-deterministic games.

Deterministic games can theoretically undergo some kind of brute-force analysis, where you look at all the possible moves and determine the best one. The number of moves to consider may be so large (as with the game Go) that a brute-force solve is impossible, but in at least some cases (typically early-game and end-game positions) you can do a bit of number-crunching to figure out optimal moves.

Non-deterministic games don’t work that way. They require you to use probability to figure out the odds of winning for each move, with the understanding that any given playthrough might give a different actual result.

Solvability

This leads to a discussion of whether a game is solvable. When we say a game is solvable, we generally mean that the game has a single, knowable “best” action to take at any given point in play, and that it is possible for players to figure out what that move is. We usually find solvability to be an undesirable trait in a game: if the player knows the best move, they aren’t making any interesting decisions; every decision is obvious.

That said, there are lots of kinds of solvability, and some kinds are not as bad as others.

Trivial solvability

Normally, when we say a game is solvable in a bad way, we mean that it is trivially solvable: the human mind can completely solve the game in real time. Tic-Tac-Toe is a common example of this; young children who haven’t solved the game yet find it endlessly fascinating, but at some point they figure out all of the permutations, solve the game, and no longer find it interesting.

We can still talk about the balance of trivially solvable games. For example, given optimal play on both sides, we know that Tic-Tac-Toe is a draw, so we could say in this sense that the game is balanced.

However, we could also say that if you look at all possible games of Tic-Tac-Toe that could be played, you’ll find that there are more ways for X to win than O, so you could say it is unbalanced because there is a first-player advantage (although that advantage can be negated through optimal play by both players). These are the kinds of balance considerations for a trivially solvable game.
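
If you would rather not take my word for either of those claims, Tic-Tac-Toe is small enough to brute-force on any modern computer. Here is a minimal Python sketch (my own illustration, not part of the course materials): the minimax search confirms that optimal play on both sides ends in a draw, and the full enumeration of legal move sequences confirms that far more of the possible games end in a win for X than for O.

    from functools import lru_cache

    LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def winner(board):
        # Return "X" or "O" if someone has three in a row, else None.
        for a, b, c in LINES:
            if board[a] != "." and board[a] == board[b] == board[c]:
                return board[a]
        return None

    @lru_cache(maxsize=None)
    def minimax(board, player):
        """Value of the position for X with optimal play: +1 X win, 0 draw, -1 O win."""
        w = winner(board)
        if w == "X": return 1
        if w == "O": return -1
        if "." not in board: return 0
        values = []
        for i, cell in enumerate(board):
            if cell == ".":
                values.append(minimax(board[:i] + player + board[i+1:],
                                      "O" if player == "X" else "X"))
        return max(values) if player == "X" else min(values)

    def count_games(board, player, tally):
        """Enumerate every distinct sequence of legal moves and tally the outcomes."""
        w = winner(board)
        if w:
            tally[w] += 1
            return
        if "." not in board:
            tally["draw"] += 1
            return
        for i, cell in enumerate(board):
            if cell == ".":
                count_games(board[:i] + player + board[i+1:],
                            "O" if player == "X" else "X", tally)

    empty = "." * 9
    print("Value with optimal play on both sides:", minimax(empty, "X"))   # 0 means a draw
    tally = {"X": 0, "O": 0, "draw": 0}
    count_games(empty, "X", tally)
    print("All possible games:", tally)   # X wins in far more of them than O does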

Theoretical complete solvability

There are games like Chess and Go which are theoretically solvable, but in reality there are so many permutations that the human mind (and even computers) can’t realistically solve the entire game. Here is a case where games are solvable but still interesting, because their complexity is beyond our capacity to solve them.

It is hard to tell if games like this are balanced, because we don’t actually know the solution and don’t have the means to compute it. We must rely on our game designer intuition, the (sometimes conflicting) opinions of expert players, or tournament stats across many championship-level games, merely to get a good guess as to whether the game is balanced. (Another impractical way to balance these games is to sit around and wait for computers to become powerful enough to solve them within our lifetimes, knowing that this may or may not happen.)

Solving non-deterministic games

You might think that only deterministic games can be solved. After all, non-deterministic games have random or unknown elements, so “optimal” play does not guarantee a win (or even a draw). However, I would say that non-deterministic games can still be “solved,” it’s just that the “solution” looks a lot different: a solution in this case is a set of actions that maximize your probability of winning.

The card game Poker provides an interesting example of this. You have some information about what is in your hand, and what is showing on the table. Given this information, it is possible to compute the exact odds of winning with your hand, and in fact championship players are capable of doing this in real-time. Because of this, all bets you make are either optimal, or they aren’t. For example, if you compute you have a 50/50 chance of winning a $300 pot, and you are being asked to pay $10 to stay in, that is clearly an optimal move for you; if you lost $10 half of the time and won $300 the other half, you would come out ahead. In this case, the “solution” is to make the bet.
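
Here is that arithmetic spelled out as a tiny Python helper (a hypothetical function of my own, not part of any real Poker tool). A positive expected value means the call is the “correct” play in the long run, ignoring any later rounds of betting.

    def call_ev(win_probability, pot, cost_to_call):
        """Expected value of calling: win the pot with some probability, pay the call otherwise."""
        return win_probability * pot - (1 - win_probability) * cost_to_call

    # The example from above: a 50% chance at a $300 pot, with $10 to stay in.
    print(call_ev(0.5, 300, 10))   # 145.0, comfortably positive, so the call is the optimal move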

You might wonder, if Poker is solvable, what stops it from becoming a boring grind of players computing odds with a calculator and then betting or not based on the numbers? From a game balance perspective, such a situation is dangerous: not only do players know what the best move is (so there are only obvious decisions), but sometimes optimal play will end in a loss, effectively punishing a player for their great skill at odds computation! In games like this, you need some kind of mechanism to get around the problem of solvability-leading-to-player-frustration.

The way Poker does this, and the reason it’s so interesting, is that players may choose to play suboptimally in order to bluff. Your opponents’ behavior may influence your decisions: if the guy sitting across from you is betting aggressively, is it because he has a great hand and knows something you don’t know? Or is he just bad at math? Or is he good at math, and betting high with a hand that can’t really win, but he’s trying to trick you into thinking his hand is better than it really is? This human factor is not solvable, but the solvable aspects of the game are used to inform players, which is why at the highest levels Poker is a game of psychology, not math. It is these psychological elements that prevent Poker from turning into a game of pure luck when played by skilled individuals.

Solving intransitive games

“Intransitive game” is a fancy way of saying “a game like Rock-Paper-Scissors.” Since the outcome depends on a simultaneous choice between you and your opponent, there does not appear to be a single optimal move, and therefore no way to solve the game. But in fact, the game is solvable… it’s just that the solution looks a bit different from the solutions to other kinds of games.

The solution to Rock-Paper-Scissors is a ratio of 1:1:1, meaning that over time you should throw each of the three about equally often. If you favored one throw over the others (say you threw Paper more often than the rest), your opponent could respond by throwing the thing that beats your preferred throw (Scissors) more often, which would let them win slightly more than average. So in general, the “solution” to RPS is to throw each symbol with equal frequency in the long term.

Suppose we made a rules change: every win with Rock counts as two wins instead of one. Then we would have a different solution where the ratios would be uneven. There are mathematical ways to figure out exactly what this new ratio would be, and we will talk about how to do that later in this course. You might find this useful, for example, if you’re making a real-time strategy game with some units that are strong against other unit types (in an intransitive way), but you want certain units to be more rare and special in gameplay than others. So, you might change the relative capabilities to make certain units more cost-efficient or more powerful overall, which in turn would change the relative frequencies of each unit type appearing (given optimal play).
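
As a small preview of the kind of math I mean (we’ll do this properly later in the course), here is one way to compute the new ratio in Python. It leans on two assumptions I should state up front: the variant is a symmetric zero-sum game, so the optimal mix should leave the opponent with an expected payoff of zero no matter which throw they pick, and that optimal mix still uses all three throws. Treat the numbers it prints as my own illustration rather than the course’s official answer.

    import numpy as np

    # Row player's payoffs, rows and columns in the order Rock, Paper, Scissors.
    # A win with Rock counts double (+2 / -2); every other result is the usual +1 / -1 / 0.
    A = np.array([
        [ 0, -1,  2],   # Rock     vs R, P, S
        [ 1,  0, -1],   # Paper    vs R, P, S
        [-2,  1,  0],   # Scissors vs R, P, S
    ])

    # If we throw with probabilities x = (r, p, s), the opponent's expected payoff
    # against each of their pure throws is -(A.T @ x). At the optimal mix all three
    # of those are zero, so solve A.T @ x = 0 together with r + p + s = 1.
    M = np.vstack([A.T, np.ones(3)])
    b = np.array([0.0, 0.0, 0.0, 1.0])
    x, *_ = np.linalg.lstsq(M, b, rcond=None)

    print(dict(zip(["Rock", "Paper", "Scissors"], np.round(x, 3))))
    # Plain RPS gives 1/3 each; doubling Rock's wins shifts the mix away from equal.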

Perfect information

A related concept to solvability is that of information availability. In a game with perfect or complete information, all players know all elements of the game state at all times. Chess and Go are obvious examples.

You might be able to see, then, that any deterministic game with perfect information is, at least in theory, completely solvable.

Other games have varying degrees of incomplete information, meaning that each player does not know the entire game state. Card games like Hearts or Poker work this way; in these games, each player has privileged information where they know some things the opponents don’t, and in fact part of the game is trying to figure out the information that the other players know. With Hearts in particular, the sum of player information is the game state; if players combined their information, the game would have perfect information.

Yet other games have information that is concealed from all of the players. An example of this is the card game Rummy. In this game, all players know what is in the discard pile (common information), each player knows what is in his or her own hand but no one else’s hand (privileged information), and no player knows what cards remain in the draw deck or what order those cards are placed in (hidden information).

Trading-card games like Magic: the Gathering offer additional layers of privileged information, because players have some privileged information about the possibility space of the game. In particular, each player knows the contents of cards in their own deck, but not their opponent’s, although neither player knows the exact order of cards in their own draw pile. Even more interesting, there are some cards that can give you some limited information on all of these things (such as cards that let you peek at your opponent’s hand or deck), and part of the challenge of deck construction is deciding how important it is to gain information versus how important it is to actually attack or defend.

Symmetry

Another concept that impacts game balance is whether a game is symmetric or asymmetric. Symmetric games are those where all players have exactly the same starting position and the same rules. Chess is almost symmetric, except for that pesky little detail about White going first.

Could you make Chess symmetric with a rules change? Yes: for example, if both players wrote down their moves simultaneously, then revealed and resolved the moves at the same time, the game would be completely symmetric (and in fact there are variants along these lines). Note that in this case, symmetry requires added complexity; you need extra rules to handle cases where two pieces move into or through the same square, or when one piece enters a square just as another piece exits the square.

In one respect, you could say that perfectly symmetric games are automatically balanced. At the very least, you know that no player is at an advantage or disadvantage from the beginning, since they have the exact same starting positions. However, symmetry alone does not guarantee that the game objects or strategies within the game are balanced; there may still be certain pieces that are much more powerful than others, or certain strategies that are clearly optimal, and symmetry doesn’t change that. Perfect symmetry is therefore not an “easy way out” for designers to make a balanced game.

The Metagame

The term metagame literally means “the game surrounding the game” and generally refers to the things players do when they’re not actively playing the game, but that still affect their chances of winning their next game. Trading card games like Magic: the Gathering are a clear example of this: in between games, players construct a deck, and the contents of that deck affect their ability to win. Another example would be championship-level Poker or even world-tournament Rock-Paper-Scissors, where players analyze the common behaviors and strategies of their opponents between matches. Professional sports have all kinds of things going on in between games: scouting, drafting, trading, training, and so on.

For games that have a strong metagame, balance of the metagame is an important consideration. Even if the game itself is balanced, a metagame imbalance can destroy the balance of the game. Professional sports are a great example. Here is a positive feedback loop that is inherent in any professional sport: teams that win more games get more money, and more money lets them attract better players, which further increases their chance of winning. (With apologies to anyone who lives in New York, this is the reason everyone else hates the Yankees.)

Other sports have metagame mechanics in place to control this positive feedback. American Football includes the following:

  • Drafts. When new players enter the league, the weakest team from the previous season gets to choose first. Thus, the weakest teams pick up the strongest new players each year.
  • Salary caps. A cap on how much each team can spend on player salaries prevents a single team from throwing effectively infinite money at the problem; even weaker teams can afford to pay competitive salaries to a few of their star players.
  • Player limits. There are a finite number of players allowed on any team; a good team can’t just have an infinite supply of talent.

These metagame mechanics are not arbitrary or accidental. They were put in place on purpose, by people who know something about game balance, and they’re part of the reason why, on any given Sunday, the weakest team in the NFL might be able to beat the strongest team.

From this, you might think that fixing the metagame is a great way to balance the game. Trading card games offer two examples of where this tactic fails.

First, let’s go back to the early days of Magic: the Gathering. Some cards were rarer than others, and some of those rare cards ended up being flat-out better than their more common counterparts. Richard Garfield clearly thought that rarity itself was a way to balance the game. (In his defense, this was not an unreasonable assumption at the time. He had no way of knowing that some people would spend thousands of dollars on cards just to get a full set of rares, nor did he know that people would largely ignore the rules for “ante” which served as an additional balancing factor.) Today, trading card game designers are more aware of this problem; while one does occasionally see games where “more rare = more powerful,” players are (thankfully) less willing to put up with those kinds of shenanigans.

Second, TCGs have a problem that video games don’t have: once a set of cards is released, it is too late to fix it with a “patch” if some kind of gross imbalance is discovered. In drastic cases the publisher can restrict or outright ban a card, or issue some kind of errata, but in most cases this is not practical; the designers are stuck. Occasionally you might see a designer who tries to balance an overpowered card in a previous set by creating a “counter-card” in the next set. This is a metagame solution: if all the competitive decks use Card X, then a new Card Y that punishes the opponent for playing Card X gives players a new metagame option… but if Card Y does nothing else, it is only useful in the context of the metagame. This essentially turns the metagame into Rock (dominant deck) – Paper (deck with counter-card) – Scissors (everything else). This may be preferable to a metagame with only one dominant strategy, but it’s not much better, and it mostly shifts the focus from the actual play of the game to the metagame: you may as well just show your opponent your deck and determine a winner that way.

This is admittedly an extreme example, and there are other ways to work around an imbalance like this. The counter-card might have other useful effects. The game overall might be designed such that player choices during the game contribute greatly to the outcome, where the deck is more of an influence on your play style than a fixed strategy. Still, some games have gone so far as to print a card that says “When your opponent plays [specific named card], [something really bad happens to them]” with no other effect, so I thought this was worth bringing up.

Game balance versus metagame balance

In professional sports, metagame fixes make the game more balanced. In TCGs, metagame fixes feel like a hack. Why the difference?

The reason is that in sports, the imbalance exists in the metagame to begin with, so a metagame fix for this imbalance is appropriate. In TCGs, the imbalance is either part of the game mechanics or individual game objects (i.e. specific cards); the metagame imbalances that result from this are a symptom and not the root cause. As a result, a metagame fix for a TCG is a response to a symptom, while the initial problem continues unchecked.

The lesson here is that a game balance problem in one part of a game can propagate to and manifest in other areas, so the problems you see during playtesting are not always the exact things that need to be fixed. When you identify an imbalance, before slapping a fix on it, ask yourself why this imbalance is really happening, what is actually causing it… and then, what is causing that, and what is causing that, and so on as deep as you can go.

Game Balance Made Easy, For Lazy People

I’m going to try to leave you each week with some things you can do right now to improve the balance of a game you’re working on, and then some “homework” that you can do to improve your skills. Since we just talked about vocabulary (symmetry, determinism, solvability, perfect information, and the metagame) this week, there’s not a lot to do, so instead I’m going to start by saying what not to do.

If you’re having trouble balancing a game, the easiest (and laziest) way out is to get your players to do the balancing for you. One way to do this is with auction mechanics. There is nothing wrong with auctions as a game mechanic, mind you – they are often very compelling and exciting – but they can be used as a crutch to cover up a game imbalance, and you need to be careful of that.

Let me give an example of how this works. Suppose you’re a designer at Blizzard working on Warcraft IV, and you have an Orcs-vs-Humans two-player game that you want to balance, but you think the Orcs are a little more powerful than the Humans (though not by much). You decide the best way to balance this is to reduce the starting resources of the Orcs; if the Humans start with, say, 100 Gold… maybe the Orcs start with a little less. How much less? Well, that’s what game balance is all about, but you have no idea how much less.

Here’s a solution: make players bid some of their starting Gold for the right to play the Orcs at the start of the game. Whoever bids the most wins the auction and plays the Orcs, but gives up their bid (so they start with 100 Gold minus whatever they bid); the other player starts with the full 100 Gold and plays the weaker Humans. Eventually, players will reach a consensus and start bidding about the same amount of Gold, and this will make things balanced. I say this is lazy design because there is a correct answer here, but instead of taking the trouble to figure it out, you instead shift that burden to the players and make them balance the game for you.

Note that this can actually be a great tool in playtesting. Feel free to add an auction in a case like this, let your testers come to a consensus of how much something is worth, then just cost it accordingly in the final version (without including the auction).
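
Along the same lines, once you have real playtest data, a few lines of code can turn it into a number. This is just a sketch with made-up data and a hypothetical helper function: suppose you ran batches of Orcs-vs-Humans playtests at several different Orc starting-Gold values; a simple interpolation finds where the Orc win rate crosses 50%, which is a reasonable first guess at the “fair” handicap.

    # All of this data is invented for illustration; substitute your own playtest records.
    playtests = {
        # orc_starting_gold: (orc_wins, games_played)
        60: (14, 40),
        70: (17, 40),
        80: (21, 40),
        90: (26, 40),
        100: (30, 40),
    }

    points = sorted((gold, wins / games) for gold, (wins, games) in playtests.items())

    def fair_gold(points, target=0.5):
        """Linearly interpolate the Orc starting Gold where the observed win rate hits `target`."""
        for (g0, r0), (g1, r1) in zip(points, points[1:]):
            if r1 == r0:                      # flat segment, nothing to interpolate
                continue
            if min(r0, r1) <= target <= max(r0, r1):
                t = (target - r0) / (r1 - r0)
                return g0 + t * (g1 - g0)
        return None                            # the target win rate was never crossed in the tested range

    print("Estimated fair Orc starting Gold:", fair_gold(points))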

Here’s another way to get players to balance your game for you: in a multiplayer free-for-all game, include mechanics that let the players easily gang up on the leader. That way, if one player finds a game imbalance, the other players can cooperate to bring them down. Of course, this brings other gameplay problems with it. Players may “sandbag” (play suboptimally on purpose) in order to not attract too much attention. Players who do well (even without rules exploits) may feel like the other players are punishing them for being good players. Kill-the-leader mechanics serve as a strong negative feedback loop, and negative feedback has other consequences: the game tends to take longer, early-game skill is not as much of a factor as late-game skill, and some players may feel that the outcome of the game is decided more by their ability to avoid being noticed than by their actual game skill. Again, there is nothing inherently wrong with giving players the ability to form alliances against each other… but doing it for the sole purpose of letting players deal with your poor design and balancing skills should not be the first and only solution.

Okay, is there anything you can do right now to improve the balance of a game you’re working on? I would say, examine your game to see if you are using your players as a game balance crutch (through auctions, kill-the-leader mechanics, or similar). Try removing that crutch and seeing what happens. You might find out that these mechanics are covering up game imbalances that will become more apparent when they’re removed. When you find the actual imbalances that used to be obscured, you can fix them and make the game stronger. (You can always add your auctions or kill-the-leader mechanics back in later, if they are important to the gameplay.)

Homework

I’ll go out on a limb and guess that if you’re reading this, you are probably playing at least one game in your spare time. If you work in the game industry as a designer, you may be playing a game at your day job for research. Maybe you have occasion to watch other people play, either while playtesting your own game, or on television (such as watching a game show or a professional sports match).

As you play (or watch) these games this week, don’t just play/watch for fun. Instead, think about the actions in the game and ask yourself if you think the game is balanced or not. Why do you think that? If you feel it’s not, where are the imbalances? What are the root causes of those imbalances, and how would you change them if you wanted to fix them? Write down your thoughts if it helps.

The purpose of this is not to actually improve the game you’re examining, but to give you some practice in thinking critically about game balance. It’s emotionally easier to find problems in other people’s games than your own (even if the actual process is the same), so start by looking at the balance or imbalance in other people’s games first.