Imaginary Cogs

On the operation of massively multiplayer online games.
  • Patching the Game (Part II)

    Posted on April 3rd, 2009 Bryant

    Part I of the series is here. In this part, I’ll get more technical.

    I like having a checklist for the process I’m about to describe. It’s good to have whoever is executing each step checking off their work. It feels dull because it is dull, but it keeps fallible human beings from forgetting the one boring step they’ve done a hundred times before. It also instills a sense of responsibility. Either paper or electronic is fine, as long as the results are archived and each step plus the overall patch is associated with a specific person each time.

    Once the patch is approved, it’ll need to be moved to the data center. As Joe notes in the post I linked in Part I, that can be a surprisingly long process. That’s a problem even if you aren’t doing continuous deployment, because there will come a time when you need to get a fix out super-quickly. The easy answer here is that patches shouldn’t be monolithic. Data files should be segmented such that you can push out a change in one without having to update the whole wad. The speed of the uplink to the data center is definitely something you should be thinking about as a tech ops guy, though. Find out how big patches could be, figure out how long it’ll take to get them uploaded in the worst case, and make sure people know about that.
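    The worst-case upload math is simple enough to keep in a script. This is a minimal sketch, not anything we ran at Turbine: the function name, the example patch sizes, and the 80% efficiency factor (a guess at protocol overhead and a shared link) are all assumptions. The one hard number is the T1 line rate of 1.544 Mbit/s.

```python
def upload_hours(patch_bytes: int, uplink_bits_per_sec: float,
                 efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move a patch over the office uplink.

    efficiency is an assumed fudge factor for protocol overhead and
    whatever else is using the link; measure your own.
    """
    if uplink_bits_per_sec <= 0:
        raise ValueError("uplink speed must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = (patch_bytes * 8) / (uplink_bits_per_sec * efficiency)
    return seconds / 3600.0


if __name__ == "__main__":
    T1_BITS_PER_SEC = 1.544e6  # nominal T1 line rate
    # Hypothetical patch sizes, for illustration only.
    for label, size in (("500 MB hotfix", 500 * 10**6),
                        ("4 GB content update", 4 * 10**9)):
        hours = upload_hours(size, T1_BITS_PER_SEC)
        print(f"{label}: about {hours:.1f} hours over a T1")
```

    Run it against your real patch sizes and your real uplink, then put the answer somewhere the producers will see it.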

    A backup plan for file uploads should be in place, in case something unexpected happens. Back at Turbine, we had a USB drive sitting around so we could copy patches to it and drive them to the data center. That’s really cheap — you can buy one at Best Buy and keep it around in case of emergency. It doesn’t work as well if you have multiple data centers, but the bandwidth between data centers is probably better than the bandwidth to the office.

    It may come in handy to be able to do Quality of Service on your office network, as well. At a game company, you need to expect that people will be playing games during work hours. This is a valid use of time, since it’s important to know what the competition is like. Still, it’s good to be able to throttle that usage if you’re trying to get the damned patch up as quick as possible to minimize server downtime. Or if the patch took a couple days extra to get through testing, but you’ve already made the mistake of announcing a patch date… yeah.

    If your office is physically close to the data center, cost out a T1 line directly there. Then compare the yearly cost of the T1 to the cost of six hours of downtime. Also, if you have a direct connection into the data center, you can avoid some security concerns at the cost of some different ones.

    Right. The files are now at the data center. You have, say, a couple hundred servers that need new files. The minimum functionality for an automated push is as follows:

    • Must be able to push a directory structure and the files within it to a specified location on an arbitrary number of servers.
    • Must be able to verify file integrity after the push.
    • Must be able to run pre-push and post-push scripts. (This sort of takes care of the second requirement.)
    • Must report on success or failure.
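
    Those four requirements can be sketched as a small driver loop. This is an illustration under stated assumptions, not a real push tool: PushResult, push_to_all, and the four step callables are hypothetical names, and how each step actually reaches a server (rsync, ssh, an agent) is left to whatever you plug in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PushResult:
    """One line of the success/failure report."""
    server: str
    ok: bool
    detail: str


# A step takes a server name and raises an exception on failure.
Step = Callable[[str], None]


def push_to_all(servers: Iterable[str], pre_push: Step, copy_files: Step,
                verify: Step, post_push: Step) -> List[PushResult]:
    """Run pre-push, copy, verify, post-push on each server, in order.

    A failure on one server stops the remaining steps for that server
    only; the other servers are still attempted and everything is
    reported at the end.
    """
    results = []
    for server in servers:
        try:
            for step in (pre_push, copy_files, verify, post_push):
                step(server)
        except Exception as exc:  # report any failure, don't abort the run
            results.append(PushResult(server, False,
                                      f"{type(exc).__name__}: {exc}"))
        else:
            results.append(PushResult(server, True, "ok"))
    return results
```

    With a couple hundred servers you would want to run these in parallel; the sequential loop just keeps the sketch short.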

    That’ll get you most of the way to where you need to go. The files should be pushed to a staging location on each server — best practice is to push to a directory whose name incorporates the version number. Something like /opt/my-mmo/patches/2009-03-22-v23456/ is good. Once everything’s pushed out and confirmed and it’s time to make the patch happen, you can run another command and automatically move the files from there into their final destination, or relink the data file directory to the new directory, or whatever. Sadly, right now, “whatever” probably includes taking the servers down. Make sure that the players have gotten that communication first; IMHO it’s better to delay a bit if someone missed sending out game alerts and forum posts. If your push infrastructure can do the pre-push and post-push scripts, you can treat this step as just another push, which is handy.
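
    The "relink the data file directory" option can be sketched like this. It assumes a POSIX filesystem and a layout where the game reads its files through a symlink; the link name "current" and both function names are my inventions for the example.

```python
import os
from pathlib import Path


def staging_dir(root: Path, date: str, build: int) -> Path:
    """Versioned staging directory, e.g. <root>/2009-03-22-v23456."""
    return root / f"{date}-v{build}"


def activate(root: Path, version_dir: Path, link_name: str = "current") -> Path:
    """Point <root>/<link_name> at version_dir by swapping a symlink.

    The new link is created under a temporary name and then renamed
    over the old one, so there is never a moment with no link at all.
    """
    if not version_dir.is_dir():
        raise FileNotFoundError(version_dir)
    link = root / link_name
    tmp = root / f".{link_name}.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()  # clean up after an interrupted earlier run
    os.symlink(version_dir.name, tmp)  # relative target inside root
    os.replace(tmp, link)  # rename replaces the old symlink itself
    return link
```

    Rolling back is the same call pointed at the previous version's directory, which is one reason to leave old staging directories in place for a while.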

    This is often a time to do additional maintenance; e.g., taking full backups can happen during this downtime. You should absolutely do whatever’s necessary to ensure that you can roll back the patch, but you also want to keep downtime to a minimum.

    Somewhere in here, perhaps in parallel, any data files or executables destined for the client need to be moved to the patch server. “Patch server” is a bit of a handwave. I think the right way to do this is to have one server or cluster responsible for telling the client what to download, and a separate set of servers to handle the downloads proper. That’ll scale better because functionality is separated.
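
    The "tell the client what to download" half can be very small, which is part of why it separates cleanly from the download servers. A rough sketch, assuming both sides describe their files as a path-to-checksum mapping; the function name and the example.com URL are hypothetical.

```python
from typing import Dict, List


def files_to_download(client_manifest: Dict[str, str],
                      current_manifest: Dict[str, str],
                      download_base_url: str) -> List[str]:
    """Return download URLs for files the client lacks or has stale.

    Both manifests map a relative file path to a checksum string. The
    URLs point at the separate download tier (plain HTTP servers or a
    CDN); this function never serves file contents itself.
    """
    base = download_base_url.rstrip("/")
    return [
        f"{base}/{path}"
        for path, digest in sorted(current_manifest.items())
        if client_manifest.get(path) != digest
    ]


if __name__ == "__main__":
    current = {"bin/client.exe": "bbb", "data/world.dat": "aaa"}
    client = {"bin/client.exe": "bbb", "data/world.dat": "old"}
    for url in files_to_download(client, current,
                                 "http://downloads.example.com/v23456"):
        print(url)
```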

    If you use HTTP as the transport protocol for your client patches, you have a lot of flexibility as to where you host those patches. Patch volumes will be really high; most of your active customers will download the patches within a few hours after they go live. At Turbine, we found out that it would take multiple gigabit network drops to handle patch traffic, which is way more than you need for day-to-day operations. You want the flexibility to deliver patches as Amazon S3 objects, or via a CDN like Akamai if you’re way rich. Using Amazon gives you BitTorrent functionality for free, which might save you some bandwidth costs. I wouldn’t expect to save a lot that way, for reasons of human nature.

    Client patches can theoretically be pre-staged using the same basic approach used with server files: download early, move files into place as needed. If you’re really studly, your client/server communication protocol is architected with reverse compatibility in mind. Linden Lab does this for Second Life; you can usually access new versions of the server with old clients. Let people update on their schedule, not yours. That also makes rollbacks easier, unless it’s the game client or data files which need to be rolled back. Client patching architecture should be designed to allow for those rollbacks as well.

    Pushing files to patch servers might use the same infrastructure as pushing server and data files around. Akamai will pull files from a server inside your datacenter, as will most CDNs, so that’s easy. Pushing files to Amazon S3 would require a different process. Fortunately the Amazon API is not very hard to work with. Note that you still want that consistency check at the end of the push. You can do this by downloading the files from Amazon and comparing them with the ones you pushed up there.
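
    The download-and-compare check boils down to checksumming two directory trees and diffing the results. Here is a minimal sketch of that comparison; it deliberately leaves out the download itself, and the function names and the choice of SHA-256 are assumptions rather than anything a particular provider requires.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large data files don't fill memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def diff_manifests(expected: Dict[str, str],
                   actual: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that are missing, unexpected, or have different contents."""
    shared = expected.keys() & actual.keys()
    return {
        "missing": sorted(expected.keys() - actual.keys()),
        "unexpected": sorted(actual.keys() - expected.keys()),
        "corrupt": sorted(p for p in shared if expected[p] != actual[p]),
    }
```

    Build one manifest from the directory you pushed and one from the copy you pulled back down; if all three lists come back empty, the push is good.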

    Once everything’s in place, if you’ve taken the servers down, you run one more consistency check to make sure the files in place are the ones you want. Then you bring the servers back up. They should come back up in a locked state, whether that’s a per-shard configuration or a knob you turn on the central authentication server. (Fail-safe technique: insist that servers come up locked by default, and don’t open to customers until someone types an admin command.)
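
    The fail-safe is easy to express in code: the lock is the default state, and the only way out is an explicit, attributed action. A toy sketch; the Role values, the ShardGate class, and its method names are made up for illustration and don't describe any particular game's login server.

```python
from enum import Enum


class Role(Enum):
    PLAYER = "player"
    QA = "qa"
    TECH_OPS = "tech_ops"


# Accounts allowed in while the shard is still locked.
STAFF_ROLES = frozenset({Role.QA, Role.TECH_OPS})


class ShardGate:
    """Login gate that always starts locked after a restart."""

    def __init__(self) -> None:
        self._locked = True  # locked by default; nothing configures this away
        self.opened_by = None

    @property
    def locked(self) -> bool:
        return self._locked

    def unlock(self, operator: str) -> None:
        """Open the shard to customers; record who gave the command."""
        if not operator:
            raise ValueError("an operator name is required to unlock")
        self._locked = False
        self.opened_by = operator

    def may_login(self, role: Role) -> bool:
        """Staff can always log in; players only once the gate is open."""
        return (not self._locked) or role in STAFF_ROLES
```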

    Tech ops does the first login. If that sniff test goes well, QA gets a pass at it. This will include client patching, which is a second check on the validity of those files. Assuming all this goes well, the floodgates open and you’re done. Assuming no rollbacks are needed.

    After you’re done, you or your designate sits in the war room watching metrics and hanging out with everyone else in the war room. The war room is a good topic for another post; it’s a way to have everyone on alert and to have easy access to decision-makers if decisions need to be made. It’s usually quiet. Sometime in the evening the war room captain says you’re really done, and this time you can go home.

    Part III of this series will be a discussion of patch downtime, and MMO downtime in general.

  • Patching the Game (Part I)

    Posted on April 2nd, 2009 Bryant

    Chris asked about patching the game in comments, which dovetails nicely with this post. I have a nit to pick with the theory of continuous deployment, but that’ll wait a post or two.

    Joe’s outline of release management focuses mostly on the engineering and QA side of the house, which makes sense. The Flying Lab process is very similar to the Turbine process as far as that goes. I’m going to get into the tech ops aspects of patching in the next post, but in this one I want to cover some business process and definitions. Oh, and one side note: patch, hotfix, content update, content push, whatever you want to call it. If you’re modifying the game by making server or client changes, it’s a patch from the operational perspective.

    Roughly speaking, you can divide a patch into four potential parts. Not all patches will necessarily need each of these parts. Depending on your server and client design, you may have to change all of these concurrently, but optimally they’re independent.

    Part one is server data, which could come in any number of forms. Your servers might use binary data files. They might use some sort of flat text file — I bet there’s someone out there doing world data in XML. I know of at least one game that kept all the data in a relational database. It all boils down to the data which defines the world.

    I suppose that in theory, and perhaps in practice, game data could be compiled into the server executable itself. This is suboptimal because it removes the theoretical ability to reload game data on the fly without a game server restart. Even if your data files are separate, you may not be able to do a reload on the fly, but at least separation should make it easier to rework the code to do the right thing later on. There will be more on this topic at a later date.

    Part two is the server executable itself. This doesn’t change as often; maybe just when the game introduces new systems or new mechanics. Yay for simplicity. I am pretending that there aren’t multiple pieces of software which make up your game shard, which is probably untrue, but the principle is the same regardless.

    Parts three and four split the same way, but apply to the client: client data files and client executables. Any given game may or may not use the same patching mechanism for these two pieces. The distribution method is likely to be the same, but it’s convenient to be able to handle data files without client restarts for the same reason you want to be able to update game data without a server restart.

    I prefer to be involved with the release process rather than just pushing out code as it’s thrown over the wall. My job is to keep the servers running happily; at the very least, the more I know about what’s happening, the better I can react to problems. One methodology that I’ve used in the past in games: have a release meeting before the patch hits QA. Break down each change in the patch, and rate each one for importance — how much do we need this change? — and risk. Then when the patch comes out of QA, go back and do the same breakdown. QA will often have information which changes the risk factor, and sometimes that means you don’t want to make a specific change after all. Sometimes the tech ops idea of risk is different than engineering’s idea of risk, for perfectly valid reasons. The second meeting either says “yep, push it!” or “no, don’t push it.” If it’s a no, generally that means you decided to drop some changes and do another QA round.

    Meetings like that include QA, engineering, whoever owns the continued success of the game (i.e., a producer or executive producer), community relations, and customer support. You can fold the rest of the go/no-go meeting process into this meeting as well. There’s a checklist: do we have release notes for players? Is the proposed date of the push a bad one for some reason? Etc.

    I haven’t mentioned the public test server, but that should happen either as part of the QA process or as a separate step in the process. I tend to think that you benefit from treating public test servers as production, which may mean that your first patch meeting in the cycle also formally approves the patch going to public test. You might have quickie meetings during the course of the QA cycle to push out new builds to test as well.

    Tomorrow: nuts and bolts.

  • Recap

    Posted on March 29th, 2009 Bryant

    Hey, that’s a week. Neat. Thoughts and questions for people who’ve found their way here:

    Anything in particular you want to see? I have pending requests for another post about datacenters, something on customer service, and a piece on planning for usage spikes. If there’s anything in particular you want me to talk about, let me know.

    For that matter, if there’s a general category of stuff which is more interesting, let me know that, too.

    I fiddled around with the look of the blog a bit over the course of the week. Comment links are now at the bottom of each post instead of the top. I don’t imagine anyone really cares, but if you want those links at the top as well as the bottom, I could do that.

    There is a LiveJournal feed, which I should put in the sidebar. (Edit: there, that was fun. Custom WordPress widgets are easy, as it turns out.) There is also now a LiveJournal feed containing just excerpts, since the fairly large posts do chew up a bunch of room: imgnry_cgs_shrt. Not the most memorable name in the world but there’s a length limit on LiveJournal syndication names.

    I’m away on business Monday, so see you again probably on Tuesday. Thanks for coming by.

  • Three Rings Metrics

    Posted on March 27th, 2009 Bryant

    Daniel James of Three Rings (Puzzle Pirates, Whirled) made a great post with his slides from his GDC presentation. Attention alert: lots of real numbers! It’s like catnip for MMO geeks.

    From a tech ops perspective, I paid lots of attention to those graphs. Page 7 is awesome. That is exactly the sort of data which should be on a graph in your network monitoring software; ideally it should be on a page with other graphs showing machine load, network load, and so on. Everything should be on the same timeline, for easy comparisons. It’s my job to tell people when we’re going to need to order new hardware; a tech ops manager should have a deep understanding of how player load affects hardware load. Hm, let’s have an example of graphing:

    Cacti graphs showing network traffic and CPU utilization.


    That’s cacti, which is my favorite open source tool for this purpose right now, although it has its limitations and flaws. This particular pair of graphs shows network traffic on top and CPU utilization for one CPU of the server below; not surprisingly, CPU utilization rises along with network traffic. Data collection for CPU utilization and network traffic is built into cacti, and it’s easy to add collection for pretty much any piece of data that can be expressed as a numeric value.

    That sort of trend visualization also helps catch problem areas before they get bad. Does the ratio of concurrent players to memory used change abruptly when you hit a specific number of concurrent users? If so, talk to the engineers. It might be fixable. And if it isn’t, well, the projections for profitability might have just changed, in which case you’d better be talking to the financial guys. Making sure the company is making money is absolutely part of the responsibility of anyone in technical operations; some day perhaps I’ll rant about the self-defeating geek tendency to sneer at the business side of the house.
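
    You can eyeball that ratio on a graph, but it's also cheap to check in code. A rough sketch, assuming you can export (concurrent players, memory used) samples from your monitoring; the function name, the 25% tolerance, and the sample numbers are all invented for the example.

```python
from typing import Iterable, List, Tuple


def find_ratio_jumps(samples: Iterable[Tuple[int, float]],
                     tolerance: float = 0.25) -> List[int]:
    """Return player counts where memory-per-player changes abruptly.

    samples are (concurrent_players, memory_mb) pairs. Each sample's
    MB-per-player ratio is compared with the previous sample's (in
    order of player count); a relative change above tolerance is
    flagged.
    """
    flagged = []
    prev_ratio = None
    for players, memory_mb in sorted(samples):
        if players <= 0:
            continue  # no meaningful ratio with nobody online
        ratio = memory_mb / players
        if prev_ratio and abs(ratio - prev_ratio) / prev_ratio > tolerance:
            flagged.append(players)
        prev_ratio = ratio
    return flagged


if __name__ == "__main__":
    # Made-up numbers: memory per player jumps somewhere past 3,000 users.
    data = [(1000, 2000.0), (2000, 4000.0), (3000, 6100.0),
            (4000, 12000.0), (5000, 15000.0)]
    print(find_ratio_jumps(data))
```

    Real servers have a fixed memory baseline, so the ratio drifts at low load; treat a flag as a reason to go look at the graph, not as an answer.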

    Page 8, more of the same. The observant will notice one of the little quirks of gaming operations: peak times are afternoon to evening, and the peak days are the weekends. The Saturday peak is broader, because people can play during the day more on weekends. You might assume that browser-based games like Whirled would see more play from work, but nope, I guess not.

    I wonder what those little dips on 3/17, 3/18, and 3/20 are? I don’t think Whirled is a sharded game, so that can’t be a single shard crashing. Welp, I’ll never know, but that’s a great example of the sorts of things graphs show. If those were because of crashes, you’d know without needing graphs to tell you because your pager would go off, but if it’s something else you’d want to investigate. Could be a bug in your data collection, for that matter, but that’s bad too.

    Less tech ops, but still interesting: the material on player acquisition is excellent. Read this if you want to know how to figure out the economics of a game. If I were Daniel James, I would also have breakdowns telling me how those retention cohorts broke down based on play time and perhaps styles of play. What kinds of players stick around? Very important question. I believe strongly in the integration of billing metrics and operational metrics. That work is something that technical operations can drive if need be; all the data sources are within your control. It’s worth spending the time to whip up a prototype dashboard and pitch it to your CFO.

    Then there’s a chunk of advice on building an in-world economy that relates to the real world. Heh: it’s MMO as platform again. Whirled is built on that concept, as I understand it. That dovetails nicely with his discussion of billing. When he says “Don’t build, but use a provider,” he is absolutely correct.

    I love this slideshow. In the blog post surrounding it, he talks about how he feels it’s OK to give away the numbers. There are dangers in sharing subscriber numbers and concurrencies, particularly if you’re competing in the big traditional space, but I like seeing people taking the risk. There is plenty of room in the MMO space for more players and plain old numbers are not going to be the secret sauce that makes you rich. How you get those numbers is a different story. So thanks to Daniel for this.

  • What Do I Do? Colocation

    Posted on March 26th, 2009 Bryant

    I’m never sure how mystifying my job is to the average person. I do know that even technophiles don’t always really know what technical operations does beyond “they’re the guys who keep the servers running,” and I like talking about my job, so I figured I’d expand a bit on the brief blurb and talk about what a typical tech ops team does from time to time.

    I’m going to try to use the term “technical operations” for my stuff, in the interests of distinguishing it from operations in general. When a business guy talks about operations, he’s probably talking about the whole gamut of running a game (or a web site, whatever). This includes my immediate bailiwick, but it also includes stuff like customer support, possibly community management, and in some cases even coders maintaining the game. It’s sort of a fuzzier distinction in online gaming; back in the wonderful world of web sites, there’s not a ton of distinction between development pre-launch and development post-launch. Gaming tends to think of those two phases as very different beasts, for mostly good reasons. Although I think some of that is carryover from offline games. I digress! Chalk that up for a later post.

    So okay. My primary job is to keep servers running happily. The bedrock of this is the physical installation of servers in the data center. This post is going to be about how you host your servers.

    Figure any MMO of any notable size will have… let’s say over 100 servers. This is conservative; World of Warcraft has a lot more than that. There’ll also be big exceptions. I think Puzzle Pirates is a significant MMO and given that it’s a 2D environment, it might be pretty small in terms of server footprint. Um, eight worlds — yeah, I wouldn’t be surprised if they were under 100. But figure we’re generally talking in the hundreds.

    You don’t want to worry about the physical aspect of hosting that many servers, especially if you’re a gaming company, because then that’s really not your area of expertise. My typical evaluation of a hosting facility includes questions about how many distinct power grids the facility can access; if, say, Somerville has a power outage I’d like it if the facility could get power from somewhere else. I want to know how long the facility can go without power at all, and how often those backup generators are tested. I want to know how redundant the air conditioning systems are. I want to know how many staff are on site overnight. I want to know about a million things about their network connectivity to the rest of the world. This is all both expensive and hard to build, and why buy that sort of headache? There are companies who will do it for you, and it will be more cost effective, because they’re doing it on a larger scale.

    If I’m starting from the ground up, step one is choosing the right hosting facility. Call it colocation if you like. Some people spell that collocation, which is not incorrect but which drives me nuts. (Sorry, Mike.) You start out with the evaluation… well, no. You start out by figuring what’s important to you. As with everything, you need to make the money vs. convenience vs. quality tradeoffs. A tier 1 provider like AT&T or MCI can be really good, but you’re going to pay more than you would for a second tier provider, and that’s not always a wise choice.

    My full RFP (request for proposal) document is thousands of words of questions. I won’t reproduce the whole thing here. Suffice it to say that this choice is one of the most important ones you’re going to make. You do not want the pain of changing data centers once you’ve launched. Even once you’ve launched beta. It’s good to get this one right.

    There’s also a fair amount of ongoing work that goes into maintaining the relationship, because the bill for hosting is one of your biggest monthly costs. Every month, you have to go over the bill and make sure you’re getting charged for all the right things. I have worked with a lot of colocation facilities and even the best of them screw up billing from time to time.

    It’s also smart to basically keep in touch with your facility. You need to figure out who the right person is — probably your Technical Account Manager, maybe someone else. I’ve had relationships where the right guy to talk to was my sales guy, because he loved working with a gaming company and he was engaged enough to look at our bills himself every month to make sure they were right. You want to talk to someone at least once a month, in any case, for a bunch of reasons.

    First off, if they’ve got concerns, it’s an avenue for them to express them informally. Maybe you’re using more power than you’re paying for. Maybe your cage is a mess, in which case shame on you and why didn’t you already know about it? But you never know. Maybe there’s a new customer that’s about to scoop up a ton of space in your data center and you won’t have expansion room available.

    If you’re talking to your key people regularly, they’re going to keep you in mind when things like that last happen. Often enough you can’t do anything about it; it’s still good to know.

    Oh, and if your hosting provider has some sort of game-oriented group, latch onto it! AT&T has an absolutely great Gaming Core Team; when Turbine hooked up with them, our already good service got even better.

    Like any relationship with any vendor, you’re going to get more out of it the more you put into it. You don’t stop worrying once you sign the contract.

  • … And We’re Gonna Try

    Posted on March 25th, 2009 Bryant

    Speaking of streaming games, OnLive wants to implement that cloud gaming solution. 720p resolution at 60 FPS — hey, that’s really similar to what Dyack was saying would be possible, huh? I think some people are going to be disappointed, since there’s not much OnLive can do about intermediary network problems, but we’ll see.

    This leaves the question of cost. I’m wondering about the cloud computing resources OnLive plans on using. Barring substantial rewrites, the cloud would need to have either high end video cards or something capable of a really good emulation, right? Maybe some custom hardware to provide banks of nVidia/ATI processors? You wouldn’t run this on a standard cloud, because a standard cloud doesn’t provide really good DirectX capacity.

    I can’t really speculate honestly because they don’t talk about their pricing model at all. If they charge a buck an hour, then they’re making enough per year ($8,760) to pay for a single desktop-class computer outright. Assume hosting is another couple hundred bucks a month? I don’t know, because I don’t know what their hardware is. Add on headcount for ops, headcount for everything else, a percentage for the game publishers. I don’t think that really adds up well even if you amortize it out over three years. I’m also simplifying, because the majority of computer equivalents won’t be in use 24/7.

    $20/month for an all-you-can-eat subscription? There are 720 hours in a month. Say we’re targeting an average of $2/hour for hours played; at that rate, $20 only covers 10 hours a month. The average usage for this ought to be higher than that. You’re selling convenience, after all.
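
    For anyone who wants to poke at the numbers, the back-of-the-envelope math from the last two paragraphs looks like this. The function names and the utilization parameter are mine; the prices are just the guesses above, since OnLive hasn't published a pricing model.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760


def yearly_revenue(rate_per_hour: float, utilization: float = 1.0) -> float:
    """Revenue from one machine billed by the hour.

    utilization is the fraction of hours the machine is actually in
    use; 1.0 is the unrealistic 24/7 case.
    """
    return rate_per_hour * HOURS_PER_YEAR * utilization


def effective_hourly_rate(monthly_fee: float, hours_played: float) -> float:
    """What a flat subscription works out to per hour actually played."""
    if hours_played <= 0:
        raise ValueError("hours_played must be positive")
    return monthly_fee / hours_played


if __name__ == "__main__":
    print(yearly_revenue(1.0))              # a buck an hour, around the clock
    print(yearly_revenue(1.0, 0.25))        # same rate, in use 6 hours a day
    print(effective_hourly_rate(20.0, 10))  # $20/month at 10 hours played
    print(effective_hourly_rate(20.0, 40))  # a heavier player pays less per hour
```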

    $10 for 24 hours in realtime with one game? That’s not going to fly with consumers.

    Whoops, I speculated after all. It’s an intriguing question. Steve Perlman has a decent track record, so I can’t assume this is just a publicity bubble. He does seem to be the kind of guy who’ll spend as long as it takes to polish a product. We may be waiting longer than Winter 2009 to see this sucker.

    Um. More technical stuff later this week. People keep talking about interesting stuff at GDC!

  • Well, Sorta

    Posted on March 24th, 2009 Bryant

    Greg Costikyan wrote a takedown of Denis Dyack’s editorial on cloud computing and gaming. I think Costikyan’s sort of right, but the semantic errors don’t totally invalidate what Dyack’s trying to say. Even if he’s saying it poorly.

    I disagree with Costikyan’s definition of cloud computing. He’s basically defining it by example as Amazon’s cloud computing offering, which allows random people to power up remote compute services in a scalable fashion. I agree that right now, there’s not much value in Amazon-style offerings for game companies. (Note: future post on this, because we ought to be thinking about that particular question for a bunch of reasons.) On the other hand, that’s not what Dyack is talking about and his definition is both broader and more accurate.

    He’s talking about Google Docs — or hey, Gmail — in the gaming context. Google Docs is absolutely an example of cloud computing. It happens to be the case that the company providing the service owns the computers on which the service runs, but from our perspective, the documents and the software live out there in the cloud.

    From a business-speak perspective, Costikyan is talking about IaaS: Infrastructure as a Service. Cloud computing includes IaaS, but it also includes SaaS, or Software as a Service. Google Docs is Software as a Service; it’s a full featured program that mostly runs on servers, with a relatively lightweight client. Dyack’s talking about SaaS.

    And yeah, MMOs are in fact specialized versions of SaaS. I’ve been using that line when I interview at non-gaming companies. It makes people more comfortable when I accurately categorize the last six years of my career as working on SaaS, which I find both pleasing and amusing.

    On point two, yep. Linear entertainment is not a commodity. That was a cute way for Dyack to say it’s easy to pirate linear entertainment. But Dyack is right about piracy, even if his terminology was sloppy again.

    Point three, however, is where Dyack is wrong, and it’s for exactly the reasons Costikyan outlines. The user’s already spent money on the desktop CPU. It’s less profitable for gaming companies to pay for CPUs to do work that users can already do. Not too complex.

    I guess you could argue that freeing yourself from the risk of piracy is worth a certain investment in servers. I’m not sure how high that value really is. The books I’d want to read… probably Popcap, right? Casual browser games are SaaS. Popcap’s games migrate from browsers into standalone games as a matter of course, so that must be a profitable business decision, assuming the Popcap guys aren’t dumb.

    Finally, it’s worth noting that Dyack slipped a casual “Imagine if technology allowed us simply to broadcast a video signal (games) at 60fps at 720p through a server” in there. Yeah, I can imagine that. It’s not all that close, if you assume that you don’t want network lag to affect your gameplay. And you don’t. You also want to make sure college dorms can all play your game at once without problems. Etc.

  • Databases

    Posted on March 23rd, 2009 Bryant

    Relational databases: please no!

    OK. It is completely obvious that any MMO is going to need a way to store data. I understand that the instinctive reaction is to use a relational database, because that’s what relational databases are for. However, I beg of you as the guy who needs to keep the things running fast and smooth, think twice.

    Yes, you get the ease of writing code against a mature system. You also get slower response times and more fiddly parts. If you haven’t hired a really good DBA to work on schema design, you are going to find out all about the ways in which relational databases can be slow under load.

    Relational databases are really good at transactional integrity. It’s probably worth thinking about whether or not that’s a key feature. Most games don’t implement it very well right now. If the game crashes five seconds after you kill a monster, do you really feel confident that the loot will be in your bags when you log back in? I don’t.

    Besides, you can get transactional integrity out of non-relational databases too. I’m going to cite a bunch of systems with weaker integrity later on, but they’re weaker because they’re big distributed systems spanning multiple datacenters. Games typically do not have that problem.

    Relational databases do not scale cheaply. They can scale well, particularly if you bite the bullet and pay for Microsoft SQL Server or Oracle or something. They don’t scale cheaply.

    It’s abominably easy to write bad relational database code. The problem here is that the failure state is not obvious. Bad SQL code reveals problems under load when there’s a lock on one table because transaction A is in process, which blocks transaction B, which happens to be the transaction responsible for showing your player what she’s got in her mailbox. She now waits 30 seconds for the mailbox to load and bitches on the forums. These load problems happen to be one of the things that’s hard to test programmatically. Experienced SQL programmers won’t make those mistakes, but inexperienced SQL programmers may not realize how easy they are to make.

    Relational databases are not maximally fast. They can’t be, because one of the big concerns is the above-mentioned transactional integrity, and that takes time. If you’ve got some dataset that’s primarily read-only and isn’t too horrendously large, use an in-memory data store. For the tricky bit, give that data store the same ease of updates that a SQL database has; don’t make us restart the game server to change one item.
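
    Here is roughly what I mean by an in-memory store you can update without a restart. It's a sketch under assumptions: the ItemStore class, the JSON file format, and the item fields are all invented for the example, and a real game server would have its own data format and threading model.

```python
import json
import threading
from pathlib import Path
from typing import Any, Dict, Optional


class ItemStore:
    """Read-mostly item data held in memory, reloadable on demand."""

    def __init__(self, path: Path) -> None:
        self._path = Path(path)
        self._lock = threading.Lock()
        self._items: Dict[str, Any] = {}
        self.reload()

    def reload(self) -> int:
        """Re-read the data file and swap it in; return the item count.

        The file is fully parsed before the swap, so a broken file
        raises an error and leaves the old data serving requests.
        """
        new_items = json.loads(self._path.read_text(encoding="utf-8"))
        if not isinstance(new_items, dict):
            raise ValueError("item file must contain a JSON object")
        with self._lock:
            self._items = new_items
        return len(new_items)

    def get(self, item_id: str) -> Optional[Any]:
        """Look up one item by id; None if it doesn't exist."""
        with self._lock:
            return self._items.get(item_id)
```

    Hook reload() up to an admin command and changing one item is a file push plus one command, with no server restart.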

    It is significant that the big players in the Web space all use non-relational databases heavily. I suspect this is in part the legacy of search — back at AltaVista, we’d have been somewhat horrified at the idea of using Oracle to serve up search queries. The smart people who did the original work at AltaVista, Google, and Yahoo wrote their own data stores, optimized for returning search results quickly.

    This attitude stuck. I know a possibly apocryphal story about Yahoo, which claims that user data was just stored in a standard filesystem. The code was supposedly hacked such that niceties like file update times weren’t stored anywhere. Speed was all. Whether or not that’s true, Yahoo’s still using speed-oriented database code (warning: PDF). So is Google. Amazon does the same sort of thing.

    Now, OK, we’re not gonna write our own serious database systems. CouchDB and HBase probably aren’t there yet. I still really wish more people would ask if they need a relational DB. If you really do need one, make sure you’ve hired someone who’s worked with large systems before and remember that 95% of Web work is smaller than anything you’ll be doing.

  • New Blizzard Add-On Policy

    Posted on March 23rd, 2009 Bryant

    Blizzard has decided that they don’t want anyone charging for writing in-game addons. This isn’t too surprising. In broad strokes, you can go two ways when it comes to your game: you can try and hold onto all the potential profits yourself, or you can open up the ecosystem to others. Either direction has pros and cons.

    In this case, if we’re looking for specific addons which may have prompted the action, we gotta start with Carbonite. Carbonite is basically a quest guide with a million other features baked in; it makes it easier to level. Carbonite has two features which distinguish it from older attempts at commercial addons.

    One, it’s aimed at a profitable market. Quest guides and gold-making guides are real business these days, and the companies behind them get bought for real money. Nobody’s successfully selling how-to-raid guides for money –

    Quick digression. Raiding is a group experience; WoW leveling is not. WoW raiders are fairly likely to have at least semi-clued friends. It’s very easy for a solo WoW player to lack such friends; thus, leveling guides have a bigger market. Interesting unanswered question: will Blizzard’s efforts to make raiding more casual result in a bigger market for commercial raiding guides? Digression ends.

    – so yeah, RDX didn’t make enough money to keep the developer working on it.

    Two, Carbonite went for a free/premium model, with the free version showing ads in-game. I suspect that’s a bigger reason for the change than one might think. One addon showing advertisements is no huge deal, albeit annoying. When a majority of your addons are doing it, the user experience is negatively affected.

    Carbonite's in-game advertising. Not subtle.

    However, if ads were all Blizzard was worried about, the policy would be different. Blizzard clearly wants to control the monetary space around their game, and why shouldn’t they? They created the platform; they should get to profit from it in the manner they choose.

    The best example of the opposite approach is Linden Lab and Second Life. The Lindens go all in with an explicit definition of their product as a platform, which is accurate. They want to sell a basic service that third parties can build on, and their basic service is pretty well tuned for that purpose.

    That approach does work. For a traditional Diku-style MMO, however, you’d open yourself up to worries about RMT; once you open the door to micropayments, people start getting agitated.

    I don’t actually think that’s an attitude likely to last. I’m old enough to remember when people thought advertisements on the Web were an abomination. Heck, I’m old enough to remember when people thought the Internet should never be used for commercial purposes. We pay for tickets to sporting events, and we don’t freak out when the ticket has an advertisement on it. We pay a monthly fee for cable service, but premium channels still have advertisements.

    I think by making this change Blizzard’s actually opened a few doors. Intelligent, eloquent people are making voluble arguments against the new restriction, mostly the one about donations. A couple of popular addons are going to go away, and everyone’s going to know it’s because Blizzard said you can’t charge money or ask for donations as part of the addon. If you liked QuestHelper or Outfitter, there’s a decent chance you’ll be biased towards those arguments.

    So while it’s probably not a reasonable transition for WoW, what if the next game Blizzard publishes comes with an iPhone-style App Store? Blizzard would get a few nice effects there. First, they’d take, say, 20% of the revenue stream. I pulled that out of my hat; if I were doing ops for Blizzard I’d run the numbers and be smarter. I don’t know if Blizzard logs the addons a player uses, but if I were in charge over there they would, so let’s assume they do. You could make a pretty good stab at the size of the stream.

    Second, they’d have a lot more control over the addons available, but no more control than they wanted to have. Again, cf. Apple. There are a million low-class fart apps for the iPhone; it doesn’t reflect on Apple’s quality. On the other hand, if Blizzard wanted to screen out crap, they could.

    Third, the classic problem of distributing addons could be solved. A lot of WoW players rely on addons and feel like they can’t play well without them. On a big patch day, old addons break. Addon sites tend to die under the load of millions of players trying to get addons at once. This is, like it or not, part of the WoW experience. Making it better is a relatively small win, but it’s a win.

    The traditional arguments against Blizzard control of addons are workload and responsibility. Curse shows 3,727 addons. WoW Interface shows 2,122 standalone addons plus 459 in the Featured Projects section; they do sort out obsolete addons. This is not a crushing workload. It’s probably one person.

    Responsibility is a bigger problem. It’s not so much responsibility to the players — they’ll understand that addon quality isn’t certified. The problem is the need to present a sane relationship to your developers. The key word I kept sneaking in up above: “platform.” WoW’s UI API has been fairly stable, but it’s also always been very clearly and aggressively prone to change. Running it as a platform doesn’t mean you can’t change it. It does mean you have to manage the community better.

    In particular, it would be nice if addons didn’t potentially break every time there’s an update to the game. This is a bigger workload than screening submissions.

    Still. QuestHelper is about to be cancelled. It has been downloaded, from Curse alone, over 20 million times. There have been around 100 updates, so let’s divide that 20 million by 100, assuming that every user has downloaded every update. That works out to 200,000 people who have downloaded QuestHelper from one site. Maybe it costs two bucks in the hypothetical store. That’s $400,000 in sales, and 20% to Blizzard is $80,000 over the course of the last two years. That’s not pure profit, of course.

    I’m cheating, because at a brief glance QuestHelper is the most downloaded addon on Curse. I’m also cheating because on the one hand, I’m being conservative and assuming that each user downloaded the addon 100 times; on the other hand, I’m assuming each download would have been a sale. Who knows? If I were Blizzard I’d have better numbers and be able to do better math. Perhaps the revenue share should be 30%. Maybe it doesn’t make sense at all.
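
    The back-of-the-envelope math above, as a small Python sketch so the assumptions are easy to swap out. The `platform_cut_cents` function and its parameter names are made up for illustration; the inputs are the post's own guesses (20 million downloads, 100 updates, a two-dollar price, every user a buyer), not real sales data. Money is kept in integer cents to avoid floating-point noise.

```python
def platform_cut_cents(total_downloads, updates, price_cents, share_percent):
    """Estimate the platform's cut, assuming every user downloaded every
    update and every user would have bought the addon exactly once."""
    users = total_downloads // updates
    gross_cents = users * price_cents
    return users, gross_cents * share_percent // 100


users, cut = platform_cut_cents(20_000_000, 100, 200, 20)
print(users, cut // 100)      # 200000 users, 80000 dollars at a 20% share

_, cut_30 = platform_cut_cents(20_000_000, 100, 200, 30)
print(cut_30 // 100)          # 120000 dollars at a 30% share
```

    The two cheats pull in opposite directions: fewer updates per user means more distinct users, while a conversion rate below 100% means fewer sales.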

    I sort of doubt that you can entirely turn addons into a profit center. But they aren’t supposed to be a profit center — they’re a tool to make the game easier to play and more attractive. If you can make them stickier, you enhance the game, and letting people have a monetary stake in the success of your game is a marketing win.

  • Welcome!

    Posted on March 19th, 2009 Bryant

    I’ve been thinking about writing a Massively Multiplayer Online RPG blog for, oh, years or so now. I’ve never quite felt comfortable starting one while I was working for Turbine or Vivox. Turbine, because I didn’t want to risk slipping into talking too much about what I was actually doing, and Vivox, because I wanted to be comfortable expressing opinions without making our customers potentially angry. In both cases I don’t think the waters would have been that hard to navigate, but better safe than sorry, right?

    Also, I have a massive fear of being outed on the forums. We have community managers to take the heat when downtime runs long. It’s easier when customers think of us ops guys as a sort of faceless amoeba which cannot reasonably bear blame for anything.

    So what changed? Well, first off, I’m unemployed. This means I have a certain amount of spare time and some of the previous worries have gone away. Obviously, I’ll still steer clear of anything covered by NDAs, for both practical and moral reasons, but it’s helpful knowing I’m not in any way likely to be seen as the voice of anyone but myself.

    Second… you know, people do this. Scott Jennings blogs. Anthony Castoro blogs. Half of 38 Studios blogs. Eric Heimberg and Sandra Powers blog together — well, maybe that’s a bad example, I dunno if they’re ever going to be foolhardy enough to work in MMOs again. But you get the point; it’s OK to have personal opinions.

    Third, there still aren’t many, if any, people blogging about MMO operations. Fertile ground! And hey, I have an ego on me: I think I can say useful and relevant things.

    So there’s a topic for you. Massively Multiplayer Online Operations. I’m primarily interested in the gentle discipline of running the datacenters and the myriad details that surround that task, because that’s what I’ve done for the last fifteen years of my life. (Not always in gaming.) I take code and content from the developers, or the release managers, or QA, and I ensure that it winds up on the servers I chose, bought, and installed. After it goes live, I lie awake at night worrying about whether or not it’ll crash. If it does, my team and I bring the servers back up, gather data, and do what we can to help developers make sure it doesn’t happen again.

    I also worry about a lot of other things, though. Any ops guy who thinks of the above as the sum total of his job isn’t any good. I do due diligence on other companies to help choose good vendors and good partners. I care about billing, business development, and customer service (a lot). I hopefully help developers write server code that makes sense in our datacenter environment.

    Maybe I just like having a peek into everything. But man, it makes my life easier when I do, so I’ll talk a bit about all that stuff.

    I do not know much about game design, other than as a player. I have strong opinions there. They aren’t really informed, though, other than that I don’t tend to think that the devs are incompetent boobs who’re out to get players whenever possible. The evidence against that is too strong. Anyways, I won’t geek much about game design except where it overlaps with operations, which is here and there.

    I was going to write a big fancy statement of intent, but come on. I’m a blogger. I’m going to write about the aspects of operating MMOs that interest me.

    More about me: here. More about the job: future posts. Onward.