I’m never sure how mystifying my job is to the average person. I do know that even technophiles don’t always know what technical operations does beyond “they’re the guys who keep the servers running,” and I like talking about my job, so I figured I’d expand a bit on the brief blurb and talk from time to time about what a typical tech ops team does.
I’m going to try to use the term “technical operations” for my stuff, in the interests of distinguishing it from operations in general. When a business guy talks about operations, he’s probably talking about the whole gamut of running a game (or a web site, whatever). This includes my immediate bailiwick, but it also includes stuff like customer support, possibly community management, and in some cases even coders maintaining the game. It’s sort of a fuzzier distinction in online gaming; back in the wonderful world of web sites, there’s not a ton of distinction between development pre-launch and development post-launch. Gaming tends to think of those two phases as very different beasts, for mostly good reasons. Although I think some of that is carryover from offline games. I digress! Chalk that up for a later post.
So okay. My primary job is to keep servers running happily. The bedrock of this is the physical installation of servers in the data center. This post is going to be about how you host your servers.
Figure any MMO of any notable size will have… let’s say over 100 servers. This is conservative; World of Warcraft has a lot more than that. There’ll also be big exceptions. I think Puzzle Pirates is a significant MMO and given that it’s a 2D environment, it might be pretty small in terms of server footprint. Um, eight worlds — yeah, I wouldn’t be surprised if they were under 100. But figure we’re generally talking in the hundreds.
You don’t want to worry about the physical aspect of hosting that many servers, especially if you’re a gaming company, because that’s really not your area of expertise. My typical evaluation of a hosting facility includes questions about how many distinct power grids the facility can access; if, say, Somerville has a power outage, I’d like the facility to be able to get power from somewhere else. I want to know how long the facility can go without power at all, and how often those backup generators are tested. I want to know how redundant the air conditioning systems are. I want to know how many staff are on site overnight. I want to know a million things about their network connectivity to the rest of the world. This is all both expensive and hard to build, and why buy that sort of headache? There are companies who will do it for you, and it will be more cost effective, because they’re doing it on a larger scale.
If I’m starting from the ground up, step one is choosing the right hosting facility. Call it colocation if you like. Some people spell that collocation, which is not incorrect but which drives me nuts. (Sorry, Mike.) You start out with the evaluation… well, no. You start out by figuring out what’s important to you. As with everything, you need to make the money vs. convenience vs. quality tradeoffs. A tier 1 provider like AT&T or MCI can be really good, but you’re going to pay more than you would for a second tier provider, and paying that premium is not always a wise choice.
My full RFP (request for proposal) document is thousands of words of questions. I won’t reproduce the whole thing here. Suffice it to say that this choice is one of the most important ones you’re going to make. You do not want the pain of changing data centers once you’ve launched. Even once you’ve launched beta. It’s good to get this one right.
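To make the tradeoff a little more concrete, here is a minimal sketch of how RFP answers could be boiled down to a comparable score per facility. The criteria names come from the questions above, but the weights, the 0 to 5 scale, and every name in the code (`WEIGHTS`, `Facility`, `quality`, `rank`) are assumptions made up for illustration, not my actual RFP.

```python
from dataclasses import dataclass

# Hypothetical criteria based on the evaluation questions above.
# The weights are illustrative assumptions; pick your own.
WEIGHTS = {
    "power_grids": 3,
    "generator_runtime": 3,
    "generator_testing": 2,
    "cooling_redundancy": 2,
    "overnight_staff": 1,
    "network_connectivity": 4,
}


@dataclass
class Facility:
    name: str
    monthly_cost: float  # quoted monthly price
    scores: dict         # criterion -> 0 (bad) to 5 (great), from RFP answers


def quality(facility):
    """Weighted average of the 0-5 scores; unanswered criteria count as 0."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(w * facility.scores.get(c, 0) for c, w in WEIGHTS.items())
    return weighted / total_weight


def rank(facilities):
    """Sort by quality, best first. Cost stays separate so the
    money vs. quality tradeoff remains a human decision."""
    return sorted(facilities, key=quality, reverse=True)


if __name__ == "__main__":
    candidates = [
        Facility("Tier 1 provider", 40000.0, {c: 5 for c in WEIGHTS}),
        Facility("Second tier provider", 25000.0, {c: 4 for c in WEIGHTS}),
    ]
    for f in rank(candidates):
        print(f"{f.name}: quality {quality(f):.2f}, cost {f.monthly_cost:.0f}/month")
```

The score deliberately leaves price out. A number like this is only good for narrowing the field; the final call between a pricier top scorer and a cheaper runner-up still depends on what matters to you.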
There’s also a fair amount of ongoing work that goes into maintaining the relationship, because the bill for hosting is one of your biggest monthly costs. Every month, you have to go over the bill and make sure you’re getting charged for all the right things. I have worked with a lot of colocation facilities and even the best of them screw up billing from time to time.
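The monthly bill check is mechanical enough to sketch in code. This is a rough illustration, assuming you keep your contracted prices as a simple mapping and can get the invoice as a list of line items; the function name `check_invoice`, the data shapes, and the item names are all hypothetical.

```python
def check_invoice(contract, invoice):
    """Compare invoice line items against contracted monthly prices.

    contract: dict mapping item name -> agreed monthly price
    invoice:  list of (item name, amount charged) tuples
    Returns a list of human-readable problems; empty means the bill matches.
    """
    # Sum repeated line items so split charges are compared as a total.
    billed = {}
    for item, amount in invoice:
        billed[item] = billed.get(item, 0) + amount

    problems = []
    for item, amount in billed.items():
        if item not in contract:
            problems.append(f"unexpected charge: {item} ({amount:.2f})")
        elif abs(amount - contract[item]) > 0.005:
            problems.append(
                f"{item}: billed {amount:.2f}, contract says {contract[item]:.2f}"
            )
    # A contracted item missing from the bill may be in your favor,
    # but it is still worth asking about before it shows up as a back-charge.
    for item in contract:
        if item not in billed:
            problems.append(f"missing from invoice: {item}")
    return problems


if __name__ == "__main__":
    contract = {"cage": 5000.0, "power": 1200.0, "bandwidth": 3000.0}
    invoice = [("cage", 5000.0), ("power", 1500.0), ("remote hands", 250.0)]
    for problem in check_invoice(contract, invoice):
        print(problem)
```

A real bill will have usage-based items like bandwidth overages that need their own rules, so treat this as a first pass that flags things to look at by hand.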
It’s also smart to basically keep in touch with your facility. You need to figure out who the right person is — probably your Technical Account Manager, maybe someone else. I’ve had relationships where the right guy to talk to was my sales guy, because he loved working with a gaming company and he was engaged enough to look at our bills himself every month to make sure they were right. You want to talk to someone at least once a month, in any case, for a bunch of reasons.
First off, if they’ve got concerns, it’s an avenue for them to express them informally. Maybe you’re using more power than you’re paying for. Maybe your cage is a mess, in which case shame on you and why didn’t you already know about it? But you never know. Maybe there’s a new customer that’s about to scoop up a ton of space in your data center and you won’t have expansion room available.
If you’re talking to your key people regularly, they’re going to keep you in mind when things like that last one happen. Often enough you can’t do anything about it; it’s still good to know.
Oh, and if your hosting provider has some sort of game-oriented group, latch onto it! AT&T has an absolutely great Gaming Core Team; when Turbine hooked up with them, our already good service got even better.
Like any relationship with any vendor, you’re going to get more out of it the more you put into it. You don’t stop worrying once you sign the contract.