Hey, that’s a week. Neat. Thoughts and questions for people who’ve found their way here:

Anything in particular you want to see? I have pending requests for another post about datacenters, something on customer service, and a piece on planning for usage spikes. If there’s anything in particular you want me to talk about, let me know.

For that matter, if there’s a general category of stuff which is more interesting, let me know that, too.

I fiddled around with the look of the blog a bit over the course of the week. Comment links are now at the bottom of each post instead of the top. I don’t imagine anyone really cares, but if you want those links at the top as well as the bottom, I could do that.

There is a Livejournal feed, which I should put in the sidebar. There is also now a Livejournal feed containing just excerpts, since the fairly large posts do chew up a bunch of room: imgnry_cgs_shrt. Not the most memorable name in the world but there’s a length limit on Livejournal syndication names.

I’m away on business Monday, so see you again probably on Tuesday. Thanks for coming by.

Daniel James of Three Rings (Puzzle Pirates, Whirled) made a great post with his slides from his GDC presentation. Attention alert: lots of real numbers! It’s like catnip for MMO geeks.

From a tech ops perspective, I paid lots of attention to those graphs. Page 7 is awesome. That is exactly the sort of data which should be on a graph in your network monitoring software; ideally it should be on a page with other graphs showing machine load, network load, and so on. Everything should be on the same timeline, for easy comparisons. It’s my job to tell people when we’re going to need to order new hardware; a tech ops manager should have a deep understanding of how player load affects hardware load. Hm, let’s have an example of graphing:

Cacti graphs showing network traffic and CPU utilization.

That’s cacti, which is my favorite open source tool for this purpose right now, although it has its limitations and flaws. This particular pair of graphs shows network traffic on top and CPU utilization for one CPU of the server below; not surprisingly, CPU utilization rises along with network traffic. Data collection for CPU utilization and network traffic is built into cacti, and it’s easy to add collection for pretty much any piece of data that can be expressed as a numeric value.
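To make that concrete, here’s roughly what one of those custom collection scripts can look like. Everything specific in it is my invention for illustration: the stats URL, the JSON field names, even the assumption that your game server exposes stats over HTTP at all.

```python
#!/usr/bin/env python
# Minimal sketch of a custom cacti data input script. The stats URL and
# JSON fields are hypothetical; swap in however your servers actually
# expose their numbers.
import json
import urllib.request

STATS_URL = "http://gameserver01.example.com:8080/stats"  # hypothetical

def main():
    with urllib.request.urlopen(STATS_URL, timeout=5) as resp:
        stats = json.load(resp)
    # cacti's data input methods expect space-delimited "field:value"
    # pairs on stdout.
    print("players:%d zones:%d" % (stats["concurrent_players"],
                                   stats["loaded_zones"]))

if __name__ == "__main__":
    main()
```

Point a data input method at that and cacti will graph concurrent players right next to the built-in CPU and network graphs, on the same timeline.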

That sort of trend visualization also helps catch problem areas before they get bad. Does the ratio of concurrent players to memory used change abruptly when you hit a specific number of concurrent users? If so, talk to the engineers. It might be fixable. And if it isn’t, well, the projections for profitability might have just changed, in which case you’d better be talking to the financial guys. Making sure the company is making money is absolutely part of the responsibility of anyone in technical operations; some day perhaps I’ll rant about the self-defeating geek tendency to sneer at the business side of the house.
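If you want to automate that kind of eyeballing, the check can be tiny. A toy sketch; the sample numbers and the 25% threshold are made up:

```python
# Toy sanity check for the players-vs-memory ratio. Sample data and
# threshold are invented; real data would come out of your monitoring
# system's history.
samples = [  # (concurrent players, memory used in GB), oldest first
    (1000, 8.0), (2000, 16.1), (3000, 24.3), (4000, 42.0),
]

baseline = samples[0][1] / samples[0][0]   # GB per player at low load
for players, mem_gb in samples[1:]:
    ratio = mem_gb / players
    if ratio > baseline * 1.25:            # 25% worse than baseline
        print("ratio jumped at %d players: %.4f GB/player "
              "(baseline %.4f) -- go talk to the engineers"
              % (players, ratio, baseline))
```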

Page 8, more of the same. The observant will notice one of the little quirks of gaming operations: peak times are afternoon to evening, and the peak days are the weekends. The Saturday peak is broader, because people can play during the day more on weekends. You might assume that browser-based games like Whirled would see more play from work, but nope, I guess not.

I wonder what those little dips on 3/17, 3/18, and 3/20 are. I don’t think Whirled is a sharded game, so that can’t be a single shard crashing. Welp, I’ll never know, but it’s a great example of the sorts of things graphs show. If those dips were caused by crashes, you’d know without needing graphs to tell you, because your pager would go off; if it’s something else, you’d want to investigate. It could be a bug in your data collection, for that matter, but that’s bad too.

Less tech ops, but still interesting: the material on player acquisition is excellent. Read this if you want to know how to figure out the economics of a game. If I were Daniel James, I would also have breakdowns telling me how those retention cohorts broke down based on play time and perhaps styles of play. What kinds of players stick around? Very important question. I believe strongly in the integration of billing metrics and operational metrics. That work is something that technical operations can drive if need be; all the data sources are within your control. It’s worth spending the time to whip up a prototype dashboard and pitch it to your CFO.
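A prototype like that doesn’t have to be fancy. Here’s a sketch of the sort of thing I mean: join daily peak concurrency against daily revenue and see what the ratio does. The file names, columns, and CSV format are all assumptions.

```python
# Prototype of a combined billing/ops timeline: join daily peak
# concurrency with daily revenue on date. File names and columns are
# stand-ins for whatever your billing and monitoring systems export.
import csv

def load(path, value_col):
    out = {}
    with open(path) as f:
        for row in csv.DictReader(f):
            out[row["date"]] = float(row[value_col])
    return out

concurrency = load("ops_peak_concurrency.csv", "peak_players")
revenue = load("billing_daily_revenue.csv", "usd")

print("date        peak  revenue  $/peak-player")
for date in sorted(set(concurrency) & set(revenue)):
    peak, usd = concurrency[date], revenue[date]
    print("%s %6.0f %8.2f %14.4f" % (date, peak, usd, usd / peak))
```

Once that ratio is on a graph next to your hardware graphs, the CFO conversation gets a lot easier.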

Then there’s a chunk of advice on building an in-world economy that relates to the real world. Heh: it’s MMO as platform again. Whirled is built on that concept, as I understand it. That dovetails nicely with his discussion of billing. When he says “Don’t build, but use a provider,” he is absolutely correct.

I love this slideshow. In the blog post surrounding it, he talks about how he feels it’s OK to give away the numbers. There are dangers in sharing subscriber numbers and concurrencies, particularly if you’re competing in the big traditional space, but I like seeing people taking the risk. There is plenty of room in the MMO space for more players and plain old numbers are not going to be the secret sauce that makes you rich. How you get those numbers is a different story. So thanks to Daniel for this.

I’m never sure how mystifying my job is to the average person. I do know that even technophiles don’t always really know what technical operations does beyond “they’re the guys who keep the servers running,” and I like talking about my job, so I figured I’d expand a bit on the brief blurb and talk about what a typical tech ops team does from time to time.

I’m going to try to use the term “technical operations” for my stuff, in the interests of distinguishing it from operations in general. When a business guy talks about operations, he’s probably talking about the whole gamut of running a game (or a web site, whatever). This includes my immediate bailiwick, but it also includes stuff like customer support, possibly community management, and in some cases even coders maintaining the game. It’s sort of a fuzzier distinction in online gaming; back in the wonderful world of web sites, there’s not a ton of distinction between development pre-launch and development post-launch. Gaming tends to think of those two phases as very different beasts, for mostly good reasons. Although I think some of that is carryover from offline games. I digress! Chalk that up for a later post.

So okay. My primary job is to keep servers running happily. The bedrock of this is the physical installation of servers in the data center. This post is going to be about how you host your servers.

Figure any MMO of any notable size will have… let’s say over 100 servers. This is conservative; World of Warcraft has a lot more than that. There’ll also be big exceptions. I think Puzzle Pirates is a significant MMO and given that it’s a 2D environment, it might be pretty small in terms of server footprint. Um, eight worlds — yeah, I wouldn’t be surprised if they were under 100. But figure we’re generally talking in the hundreds.

You don’t want to worry about the physical aspect of hosting that many servers, especially if you’re a gaming company, because that’s really not your area of expertise. My typical evaluation of a hosting facility includes questions about how many distinct power grids the facility can access; if, say, Somerville has a power outage, I’d like the facility to be able to get power from somewhere else. I want to know how long the facility can run without power at all, and how often the backup generators are tested. I want to know how redundant the air conditioning systems are. I want to know how many staff are on site overnight. I want to know about a million things about their network connectivity to the rest of the world. All of this is expensive and hard to build, so why buy that sort of headache? There are companies who will do it for you, and it will be more cost effective, because they’re doing it on a larger scale.

If I’m starting from the ground up, step one is choosing the right hosting facility. Call it colocation if you like. Some people spell that collocation, which is not incorrect but which drives me nuts. (Sorry, Mike.) You start out with the evaluation… well, no. You start out by figuring out what’s important to you. As with everything, you need to make the money vs. convenience vs. quality tradeoffs. A tier 1 provider like AT&T or MCI can be really good, but you’re going to pay more than you would for a second tier provider, and paying the premium isn’t always the wise choice.

My full RFP (request for proposal) document is thousands of words of questions. I won’t reproduce the whole thing here. Suffice it to say that this choice is one of the most important ones you’re going to make. You do not want the pain of changing data centers once you’ve launched. Even once you’ve launched beta. It’s good to get this one right.

There’s also a fair amount of ongoing work that goes into maintaining the relationship, because the bill for hosting is one of your biggest monthly costs. Every month, you have to go over the bill and make sure you’re getting charged for all the right things. I have worked with a lot of colocation facilities and even the best of them screw up billing from time to time.

It’s also smart to basically keep in touch with your facility. You need to figure out who the right person is — probably your Technical Account Manager, maybe someone else. I’ve had relationships where the right guy to talk to was my sales guy, because he loved working with a gaming company and he was engaged enough to look at our bills himself every month to make sure they were right. You want to talk to someone at least once a month, in any case, for a bunch of reasons.

First off, if they’ve got concerns, it’s an avenue for them to express them informally. Maybe you’re using more power than you’re paying for. Maybe your cage is a mess, in which case shame on you and why didn’t you already know about it? But you never know. Maybe there’s a new customer that’s about to scoop up a ton of space in your data center and you won’t have expansion room available.

If you’re talking to your key people regularly, they’re going to keep you in mind when things like that last example happen. Often enough you can’t do anything about it; it’s still good to know.

Oh, and if your hosting provider has some sort of game-oriented group, latch onto it! AT&T has an absolutely great Gaming Core Team; when Turbine hooked up with them, our already good service got even better.

Like any relationship with any vendor, you’re going to get more out of it the more you put into it. You don’t stop worrying once you sign the contract.

Speaking of streaming games, OnLive wants to implement that cloud gaming solution. 720p resolution at 60 FPS — hey, that’s really similar to what Dyack was saying would be possible, huh? I think some people are going to be disappointed, since there’s not much OnLive can do about intermediary network problems, but we’ll see.

This leaves the question of cost. I’m wondering about the cloud computing resources OnLive plans on using. Barring substantial rewrites, the cloud would need either high-end video cards or something capable of really good emulation, right? Maybe some custom hardware to provide banks of nVidia/ATI processors? You wouldn’t run this on a standard cloud, because a standard cloud doesn’t provide really good DirectX capacity.

I can’t really speculate honestly because they don’t talk about their pricing model at all. If they charge a buck an hour, a machine rented around the clock brings in $8,760 a year, enough to pay for a single desktop-class computer outright. Assume hosting is another couple hundred bucks a month? I don’t know, because I don’t know what their hardware is. Add on headcount for ops, headcount for everything else, a percentage for the game publishers. I don’t think that really adds up well even if you amortize it out over three years. I’m also simplifying, because the majority of those computer-equivalents won’t be in use 24/7.

$20/month for an all-you-can-eat subscription? There are 720 hours in a month. Say we’re targeting an average of $2/hour for hours played; at $20/month, that target only holds if the average subscriber plays 10 hours or less, and average usage ought to run higher than that. You’re selling convenience, after all.
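Here’s that back-of-envelope math written out, using the guesses from above:

```python
# Back-of-envelope on the subscription model. Every input is a guess,
# mostly the same guesses as in the prose above.
sub_price = 20.0     # $/month, all you can eat
target_rate = 2.0    # $/hour we'd like to earn per hour played

print("break-even at %.0f hours/month" % (sub_price / target_rate))  # 10

for hours_played in (5, 10, 20, 40):
    print("%3d hours played -> effective $%.2f/hour"
          % (hours_played, sub_price / hours_played))
```

The better the service, the more hours people play, and the further the effective rate sinks below target.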

$10 for 24 hours in realtime with one game? That’s not going to fly with consumers.

Whoops, I speculated after all. It’s an intriguing question. Steve Perlman has a decent track record, so I can’t assume this is just a publicity bubble. He does seem to be the kind of guy who’ll spend as long as it takes to polish a product. We may be waiting longer than Winter 2009 to see this sucker.

Greg Costikyan wrote a takedown of Denis Dyack’s editorial on cloud computing and gaming. I think Costikyan’s sort of right, but the semantic errors don’t totally invalidate what Dyack’s trying to say. Even if he’s saying it poorly.

I disagree with Costikyan’s definition of cloud computing. He’s basically defining it by example as Amazon’s cloud computing offering, which allows random people to power up remote compute services in a scalable fashion. I agree that right now, there’s not much value in Amazon-style offerings for game companies. (Note: future post on this, because we ought to be thinking about that particular question for a bunch of reasons.) On the other hand, that’s not what Dyack is talking about and his definition is both broader and more accurate.

He’s talking about Google Docs — or hey, Gmail — in the gaming context. Google Docs is absolutely an example of cloud computing. It happens to be the case that the company providing the service owns the computers on which the service runs, but from our perspective, the documents and the software live out there in the cloud.

From a business-speak perspective, Costikyan is talking about IaaS: Infrastructure as a Service. Cloud computing includes IaaS, but it also includes SaaS, or Software as a Service. Google Docs is Software as a Service; it’s a full featured program that mostly runs on servers, with a relatively lightweight client. Dyack’s talking about SaaS.

And yeah, MMOs are in fact specialized versions of SaaS. I’ve been using that line when I interview at non-gaming companies. It makes people more comfortable when I accurately categorize the last six years of my career as working on SaaS, which I find both pleasing and amusing.

On point two, yep: linear entertainment is not a commodity. Calling it one was just Dyack’s cute way of saying it’s easy to pirate linear entertainment. And he’s right about that, even if his terminology was sloppy again.

Point three, however, is where Dyack is wrong, and it’s for exactly the reasons Costikyan outlines. The user’s already spent money on the desktop CPU. It’s less profitable for gaming companies to pay for CPUs to do work that users can already do. Not too complex.

I guess you could argue that freeing yourself from the risk of piracy is worth a certain investment in servers. I’m not sure how high that value really is. The books I’d want to read… probably Popcap’s, right? Casual browser games are SaaS. Popcap’s games migrate from browsers into standalone games as a matter of course, so that must be a profitable business decision, assuming the Popcap guys aren’t dumb.

Finally, it’s worth noting that Dyack slipped a casual “Imagine if technology allowed us simply to broadcast a video signal (games) at 60fps at 720p through a server” in there. Yeah, I can imagine that. It’s not all that close, if you assume that you don’t want network lag to affect your gameplay. And you don’t. You also want to make sure college dorms can all play your game at once without problems. Etc.

Relational databases: please no!

OK. It is completely obvious that any MMO is going to need a way to store data. I understand that the instinctive reaction is to use a relational database, because that’s what relational databases are for. However, I beg of you, as the guy who needs to keep these things running fast and smooth: think twice.

Yes, you get the ease of writing code against a mature system. You also get slower response times and more fiddly parts. If you haven’t hired a really good DBA to work on schema design, you are going to find out all about the ways in which relational databases can be slow under load.

Relational databases are really good at transactional integrity. It’s probably worth thinking about whether or not that’s a key feature. Most games don’t implement it very well right now. If the game crashes five seconds after you kill a monster, do you really feel confident that the loot will be in your bags when you log back in? I don’t.

Besides, you can get transactional integrity out of non-relational databases too. I’m going to cite a bunch of systems with weaker integrity later on, but they’re weaker because they’re big distributed systems spanning multiple datacenters. Games typically do not have that problem.

Relational databases do not scale cheaply. They can scale well, particularly if you bite the bullet and pay for Microsoft SQL or Oracle or something. They don’t scale cheaply.

It’s abominably easy to write bad relational database code. The problem here is that the failure state is not obvious. Bad SQL reveals its problems under load: there’s a lock on one table because transaction A is in process, which blocks transaction B, which happens to be the transaction responsible for showing your player what she’s got in her mailbox. She now waits 30 seconds for the mailbox to load and bitches on the forums. These load problems happen to be one of the things that’s hard to test programmatically. Experienced SQL programmers won’t make those mistakes, but inexperienced SQL programmers may not realize how easy they are to make.
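Here’s a toy version of that failure mode, using sqlite3 as a stand-in for a real relational database. The mailbox schema is invented, and in sqlite the contention is over the whole database rather than a single table, but the shape of the problem is the same:

```python
# Toy demonstration of write-lock contention. Schema and timings are
# invented for illustration.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "game.db")

setup = sqlite3.connect(path)
setup.execute("CREATE TABLE mailbox (player TEXT, item TEXT)")
setup.execute("INSERT INTO mailbox VALUES ('alice', 'sword')")
setup.commit()
setup.close()

# Transaction A: a write that holds its lock while "slow" application
# logic runs. isolation_level=None gives explicit transaction control.
conn_a = sqlite3.connect(path, isolation_level=None)
conn_a.execute("BEGIN IMMEDIATE")  # take the write lock now
conn_a.execute("UPDATE mailbox SET item = 'shield' WHERE player = 'alice'")
# ... imagine 30 seconds of application logic here, lock still held ...

# Transaction B: the delivery your player is staring at a spinner for.
conn_b = sqlite3.connect(path, timeout=1)  # give up after 1 second
try:
    conn_b.execute("INSERT INTO mailbox VALUES ('alice', 'potion')")
except sqlite3.OperationalError as err:
    print("blocked:", err)  # "database is locked"

conn_a.execute("COMMIT")  # only now can transaction B proceed
```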

Relational databases are not maximally fast. They can’t be, because one of the big concerns is the above-mentioned transactional integrity, and that takes time. If you’ve got some dataset that’s primarily read-only and isn’t too horrendously large, use an in-memory data store.
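For the read-mostly case, the “data store” can be almost embarrassingly simple. A sketch, with a made-up file name and schema:

```python
# Read-mostly data (item templates, zone definitions) can live in a
# plain dict loaded at startup. File name and format are assumptions.
import json

class ItemTemplates:
    """Load once at boot; every lookup after that is a dict hit."""

    def __init__(self, path="item_templates.json"):
        with open(path) as f:
            self._by_id = {item["id"]: item for item in json.load(f)}

    def get(self, item_id):
        return self._by_id.get(item_id)

# templates = ItemTemplates()
# sword = templates.get(1234)   # no network hop, no query planner
```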

It is significant that the big players in the Web space all use non-relational databases heavily. I suspect this is in part the legacy of search — back at AltaVista, we’d have been somewhat horrified at the idea of using Oracle to serve up search queries. The smart people who did the original work at AltaVista, Google, and Yahoo wrote their own data stores, optimized for returning search results quickly.

This attitude stuck. I know a possibly apocryphal story about Yahoo which claims that user data was just stored in a standard filesystem. The code was supposedly hacked such that niceties like file update times weren’t stored anywhere. Speed was all. Whether or not that’s true, Yahoo’s still using speed-oriented database code (warning: PDF). So is Google. Amazon does the same sort of thing.

Now, OK, we’re not gonna write our own serious database systems. CouchDB and HBase probably aren’t there yet. I still really wish more people would ask if they need a relational DB. If you really do need one, make sure you’ve hired someone who’s worked with large systems before and remember that 95% of Web work is smaller than anything you’ll be doing.

Blizzard has decided that they don’t want anyone making money from writing in-game addons. This isn’t too surprising. In broad strokes, you can go two ways when it comes to your game: you can try and hold onto all the potential profits yourself, or you can open up the ecosystem to others. Either direction has pros and cons.

In this case, if we’re looking for specific addons which may have prompted the action, we gotta start with Carbonite. Carbonite is basically a quest guide with a million other features baked in; it makes it easier to level. Carbonite has two features which distinguish it from older attempts at commercial addons.

One, it’s aimed at a profitable market. Quest guides and gold-making guides are real business these days, and the companies behind them get bought for real money. Nobody’s successfully selling how-to-raid guides for money —

Quick digression. Raiding is a group experience; WoW leveling is not. WoW raiders are fairly likely to have at least semi-clued friends. It’s very easy for a solo WoW player to lack such friends; thus, leveling guides have a bigger market. Interesting unanswered question: will Blizzard’s efforts to make raiding more casual result in a bigger market for commercial raiding guides? Digression ends.

— so yeah, RDX didn’t make enough money to keep the developer working on it.

Two, Carbonite went for a free/premium model, with the free version showing ads in-game. I suspect that’s a bigger reason for the policy change than one might think. One addon showing advertisements is no huge deal, albeit annoying. When a majority of your addons are doing it, the user experience suffers.

Carbonite's in-game advertising. Not subtle.

However, if ads were all Blizzard was worried about, the policy would be different. Blizzard clearly wants to control the monetary space around their game, and why shouldn’t they? They created the platform; they should get to profit from it in the manner they choose.

The best example of the opposite approach is Linden Labs and Second Life. The Lindens go all in with an explicit definition of their product as a platform, which is accurate. They want to sell a basic service that third parties can build on, and their basic service is pretty well tuned for that purpose.

That approach does work. For a traditional Diku-style MMO, however, you’d open yourself up to worries about RMT (real-money trading); once you open the door to micropayments, people start getting agitated.

I don’t actually think that’s an attitude likely to last. I’m old enough to remember when people thought advertisements on the Web were an abomination. Heck, I’m old enough to remember when people thought the Internet should never be used for commercial purposes. We pay for tickets to sporting events, and we don’t freak out when the ticket has an advertisement on it. We pay a monthly fee for cable service, but premium channels still have advertisements.

I think by making this change Blizzard’s actually opened a few doors. Intelligent, eloquent people are making voluble arguments against the new restriction, mostly the one about donations. A couple of popular addons are going to go away, and everyone’s going to know it’s because Blizzard said you can’t charge money or ask for donations. If you liked QuestHelper or Outfitter, there’s a decent chance you’ll be biased towards those arguments.

So while it’s probably not a reasonable transition for WoW, what if the next game Blizzard publishes comes with an iPhone-style App Store? Blizzard would get a few nice effects there. First, they’d take, say, 20% of the revenue stream. I pulled that number out of my hat; if I were doing ops for Blizzard I’d run the numbers and be smarter. I don’t know if Blizzard logs the addons a player uses, but if I were in charge over there they would, so let’s assume they do. You could make a pretty good stab at the size of the stream.

Second, they’d have a lot more control over the addons available, but no more control than they wanted to have. Again, cf. Apple. There are a million low-class fart apps for the iPhone; it doesn’t reflect on Apple’s quality. On the other hand, if Blizzard wanted to screen out crap, they could.

Third, the classic problem of distributing addons could be solved. A lot of WoW players rely on addons and feel like they can’t play well without them. On a big patch day, old addons break. Addon sites tend to die under the load of millions of players trying to get addons at once. This is, like it or not, part of the WoW experience. Making it better is a relatively small win, but it’s a win.

The traditional arguments against Blizzard control of addons are workload and responsibility. Curse shows 3,727 addons. WoW Interface shows 2,122 standalone addons plus 459 in the Featured Projects section; they do sort out obsolete addons. This is not a crushing workload. It’s probably one person.

Responsibility is a bigger problem. It’s not so much responsibility to the players — they’ll understand that addon quality isn’t certified. The problem is the need to present a sane relationship to your developers. The key word I kept sneaking in up above: “platform.” WoW’s UI API has been fairly stable, but it’s also always been very clearly and aggressively prone to change. Running it as a platform doesn’t mean you can’t change it. It does mean you have to manage the community better.

In particular, it would be nice if addons didn’t potentially break every time there’s an update to the game. This is a bigger workload than screening submissions.

Still. QuestHelper is about to be cancelled. It has been downloaded, from Curse alone, over 20 million times. There have been around 100 updates, so let’s divide that 20 million by 100, assuming that every user has downloaded every update. 200,000 people have downloaded QuestHelper from one site. Maybe it costs two bucks in the hypothetical store. 20% to Blizzard is $80,000 over the course of the last two years. That’s not pure profit, of course.

I’m cheating, because at a brief glance QuestHelper is the most downloaded addon on Curse. I’m also cheating because on the one hand, I’m being conservative and assuming that each user downloaded the addon 100 times; on the other hand, I’m assuming each download would have been a sale. Who knows? If I were Blizzard I’d have better numbers and be able to do better math. Perhaps the revenue share should be 30%. Maybe it doesn’t make sense at all.
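If you like, here’s the same arithmetic parameterized, so you can see how far the guesses swing the answer:

```python
# The QuestHelper math from above, parameterized. Every input is a
# guess; the point is to see how much the guesses move the answer.
def store_revenue(downloads, updates, price, conversion, blizzard_cut):
    users = downloads / updates   # assume everyone grabbed every update
    sales = users * conversion    # fraction of users who'd actually pay
    return sales * price * blizzard_cut

# The conservative guess from the prose: 20M downloads, 100 updates,
# $2 price, every user buys, Blizzard takes 20%.
print(store_revenue(20_000_000, 100, 2.00, 1.0, 0.20))   # 80000.0

# Softer conversion, bigger cut: still real money, or not, depending.
print(store_revenue(20_000_000, 100, 2.00, 0.25, 0.30))  # 30000.0
```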

I sort of doubt that you can entirely turn addons into a profit center. But they aren’t supposed to be a profit center — they’re a tool to make the game easier to play and more attractive. If you can make them stickier, you enhance the game, and letting people have a monetary stake in the success of your game is a marketing win.

I’ve been thinking about writing a Massively Multiplayer Online RPG blog for, oh, years or so now. I’ve never quite felt comfortable starting one while I was working for Turbine or Vivox. Turbine, because I didn’t want to risk slipping into talking too much about what I was actually doing, and Vivox, because I wanted to be comfortable expressing opinions without making our customers potentially angry. In both cases I don’t think the waters would have been that hard to navigate, but better safe than sorry, right?

Also, I have a massive fear of being outed on the forums. We have community managers to take the heat when downtime runs long. It’s easier when customers think of us ops guys as a sort of faceless amoeba which cannot reasonably bear blame for anything.

So what changed? Well, first off, I’m unemployed. This means I have a certain amount of spare time and some of the previous worries have gone away. Obviously, I’ll still steer clear of anything covered by NDAs, for both practical and moral reasons, but it’s helpful knowing I’m not in any way likely to be seen as the voice of anyone but myself.

Second… you know, people do this. Scott Jennings blogs. Anthony Castoro blogs. Half of 38 Studios blogs. Eric Heimberg and Sandra Powers blog together — well, maybe that’s a bad example, I dunno if they’re ever going to be foolhardy enough to work in MMOs again. But you get the point; it’s OK to have personal opinions.

Third, there still aren’t many, if any, people blogging about MMO operations. Fertile ground! And hey, I have an ego on me: I think I can say useful and relevant things.

So there’s a topic for you. Massively Multiplayer Online Operations. I’m primarily interested in the gentle discipline of running the datacenters and all the myriad details that surround that task, because that’s what I’ve done for the last fifteen years of my life. (Not always in gaming.) I take code and content from the developers, or the release managers, or QA, and I ensure that it winds up on the servers I chose, bought, and installed. After it goes live, I lie awake at night worrying about whether or not it’ll crash. If it does, my team and I bring the servers back up, gather data, and do what we can to help developers make sure it doesn’t happen again.

I also worry about a lot of other things, though. Any ops guy who thinks of the above as the sum total of his job isn’t any good. I do due diligence on other companies to help choose good vendors and good partners. I care about billing, business development, customer service (a lot). I hopefully help developers write server code that makes sense in our datacenter environment.

Maybe I just like having a peek into everything. But man, it makes my life easier when I do, so I’ll talk a bit about all that stuff.

I do not know much about game design, other than as a player. I have strong opinions there, but they aren’t really informed, beyond the fact that I don’t tend to think the devs are incompetent boobs who’re out to get players whenever possible. The evidence against that is too strong. Anyways, I won’t geek much about game design except where it overlaps with operations, which is here and there.

I was going to write a big fancy statement of intent, but come on. I’m a blogger. I’m going to write about the aspects of operating MMOs that interest me.

More about me: here. More about the job: future posts. Onward.