Lydia Leong has a great post about the question of speedy provisioning. As she says, the exciting bit about getting new hardware in place isn’t the OS and software installation. Even if you’re not virtualized, you can install a new server unattended in a couple of hours. You want to be able to do that even if you never expect to grow, because you need to be able to rebuild servers quickly if they die. This isn’t hard to manage. In a pinch, people will sell you solutions and you can get a consultant in, but it’s easier to just plan ahead.
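To make that concrete, here’s a minimal sketch of what an unattended rebuild can look like, assuming PXE boot plus IPMI-capable hardware; the hostname, credentials, and tooling are placeholders, not a description of any particular shop. The point is that a dead server turns into a two-command problem.

```python
#!/usr/bin/env python3
"""Minimal sketch: force a box to network-boot, then power-cycle it.
The PXE server hands out the installer plus a kickstart/preseed file,
so the OS install runs with nobody at the console."""
import subprocess

def rebuild(bmc_host: str, user: str, password: str) -> None:
    base = ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-P", password]
    # Boot from the network on the next power cycle...
    subprocess.run(base + ["chassis", "bootdev", "pxe"], check=True)
    # ...and cycle it now. From here the install is unattended.
    subprocess.run(base + ["chassis", "power", "cycle"], check=True)

if __name__ == "__main__":
    # Placeholder BMC address and credentials.
    rebuild("db07-bmc.example.com", "admin", "changeme")
```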

She hits on the internal aspect: getting someone to sign off on the new servers. In our industry, if we’re talking about buying more capacity on short notice, we’re probably talking about launch, which means this problem isn’t so bad for us. But you’ve got to get the ducks lined up in advance. You don’t want to shock your CEO with an order on the third day of launch; she’s worrying about other stuff. Better to get the plan in writing well in advance, along with executive buy-in. Then you can tell the appropriate people you need ten more shards, get the documents signed, and get your vendor moving.

I think that’s a bit trickier than Lydia says, but I also think she’s talking about onesies and twosies. Buying one server is easy, as she notes. Buying a hundred servers for a serious expansion is going to take a bit longer, because Dell and HP and IBM hate keeping too much backstock around; they’re going to have to build those servers for you.

You can alleviate this, of course. The first tactic is to let them know it’s coming. None of those companies is going to increase inventory just on the chance that you might buy; unless you’re Blizzard, you’re too small. However, you can and should get some commitments around response time. You can also, and I think this is more important, find out what’s going to ship the fastest. There’s no reason not to take that as an input to your hardware decision matrix: if all else is equal, go with the servers that generally have the largest inventory. Or ask questions about factories: can your vendor literally build 1U servers faster than blades?
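If you want to be systematic about it, the decision matrix can literally be a few lines of arithmetic. The vendors, scores, and weights below are all invented for illustration; the only point is that lead time sits in the matrix next to price and performance rather than being an afterthought.

```python
# Toy weighted decision matrix; all numbers are made up.
weights = {"price": 0.4, "performance": 0.3, "lead_time": 0.3}

# Scores are 0-10, higher is better. "lead_time" rates how fast the
# vendor can ship a hundred units, not how fast one demo box shows up.
vendors = {
    "vendor_a_1u":    {"price": 8, "performance": 7, "lead_time": 9},
    "vendor_a_blade": {"price": 6, "performance": 8, "lead_time": 5},
    "vendor_b_1u":    {"price": 7, "performance": 7, "lead_time": 7},
}

for name, scores in vendors.items():
    total = sum(weights[k] * scores[k] for k in weights)
    print(f"{name}: {total:.1f}")
```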

Also, make sure the vendor order process is just as quick. As with all vendors, you want your hardware salespeople to be on call during the two weeks around launch. Midnight calls are very unlikely; weekend calls are more probable.

Finally, figure out how you’re going to rack and stack a hundred servers quickly. It could be your vendor’s professional services, or it could be some local contractor. Even if your internal staff racked the rest of the servers, it’s better not to ask them to spend that time in the machine room during launch, because you’re going to be doing a lot of other things.

None of this is really all that hard; it’s just a great example of one of the many rows you need to have your ducks in. It’s not difficult, it merely takes planning ahead.

CCP Yokai, the Technical Director over in EVE-land, just posted a dev blog about their new rack setup. This is pretty rare insight for any operation, so it’s definitely worth reading. You don’t get the nitty-gritty details, but you do get a good overview.

Their whole setup fits in 12 cabinets. That apparently covers their single server, their test server, and ancillary services. If you don’t know, EVE is a single-shard setup, which is really technically impressive: they crowd all 50-60K concurrent players into one world. That’s one big reason why network connectivity is so important to them, and Yokai mentions it a few times in the blog. That’s a very high-quality network he’s got set up, probably because most of those 64 servers may need to talk to any of the others. Compare that to an infrastructure where 10 servers make up a shard. I can’t know for sure, but it sure seems like the single-shard design has to be ready for far more interconnections.
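For a sense of scale, count the server pairs that might ever need to talk to each other. The cluster sizes are the ones from the post; the rest is just n(n-1)/2.

```python
# How many distinct server-to-server conversations are possible?
def pairs(n: int) -> int:
    return n * (n - 1) // 2

print(pairs(64))  # 2016 possible pairs inside EVE's single big cluster
print(pairs(10))  # 45 possible pairs inside a typical 10-server shard
```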

He’s using blades, and the blades have a lot of RAM. IBM makes a really solid blade, by the way. The HS21 is, I think, one generation old; they’re currently selling the HS22 in that price/performance spot, but once you’ve bought a bunch of blades you don’t upgrade unless you need to. The interesting thing to me is the amount of RAM they’ve got in each blade: 32GB is a fair bit. I don’t want to speculate too much, but CCP has never been shy about smart ways to use the fastest possible resources, and RAM is fast. See also the big 2-terabyte SSD SAN (storage area network) he mentions.

Lots of blades means lots of heat, so I am not surprised that they need a self-contained cooling system. I should talk some about the blade vs. 1U server question at some point: blades do take up less physical space, but once power and cooling enter into it, the practical savings may not be as big as you’d expect. On the other hand, as noted, CCP needs the fast interconnects, and blades do help there.
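Here’s a rough back-of-the-envelope on that, assuming an IBM BladeCenter H chassis (9U, fourteen blade bays) and some invented power numbers, since power and cooling, not rack units, are usually what cap a rack in practice.

```python
# Rack density sketch: blades vs. 1U servers. Power figures are guesses.
RACK_U = 42
CHASSIS_U, BLADES_PER_CHASSIS = 9, 14
WATTS_PER_BLADE = 300          # assumed
WATTS_PER_1U = 350             # assumed
RACK_POWER_BUDGET_W = 10_000   # assumed per-rack feed

blades_by_space = (RACK_U // CHASSIS_U) * BLADES_PER_CHASSIS  # 56
ones_by_space = RACK_U                                        # 42
blades_by_power = RACK_POWER_BUDGET_W // WATTS_PER_BLADE      # 33
ones_by_power = RACK_POWER_BUDGET_W // WATTS_PER_1U           # 28

print(min(blades_by_space, blades_by_power))  # blades you can actually run
print(min(ones_by_space, ones_by_power))      # 1U servers you can actually run
```

With those made-up assumptions the blade rack only runs a handful more machines than the 1U rack, which is why the “less space” argument needs the caveat.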

Don’t miss the comment thread, either. The devs are again being very open about some of their choices, which is awfully nice of them.