Relational databases: please no!
OK. It is completely obvious that any MMO is going to need a way to store data. I understand that the instinctive reaction is to use a relational database, because that’s what relational databases are for. However, I beg of you as the guy who needs to keep the things running fast and smooth, think twice.
Yes, you get the ease of writing code against a mature system. You also get slower response times and more fiddly parts. If you haven’t hired a really good DBA to work on schema design, you are going to find out all about the ways in which relational databases can be slow under load.
Relational databases are really good at transactional integrity. It’s probably worth thinking about whether or not that’s a key feature. Most games don’t implement it very well right now. If the game crashes five seconds after you kill a monster, do you really feel confident that the loot will be in your bags when you log back in? I don’t.
Besides, you can get transactional integrity out of non-relational databases too. I’m going to cite a bunch of systems with weaker integrity later on, but they’re weaker because they’re big distributed systems spanning multiple datacenters. Games typically do not have that problem.
Relational databases do not scale cheaply. They can scale well, particularly if you bite the bullet and pay for Microsoft SQL or Oracle or something. They don’t scale cheaply.
It’s abominably easy to write bad relational database code. The problem here is that the failure state is not obvious. Bad SQL code reveals problems under load when there’s a lock on one table because transaction A is in process, which blocks transaction B, which happens to be the transaction responsible for showing your player what she’s got in her mailbox. She now waits 30 seconds for the mailbox to load and bitches on the forums. These load problems happen to be one of the things that’s hard to test programatically. Experienced SQL programmers won’t make those mistakes, but inexperienced SQL programmers may not realize how easy they are to make.
Relational databases are not maximally fast. They can’t be, because one of the big concerns is the above-mentioned transactional integrity, and that takes time. If you’ve got some dataset that’s primarily read-only and isn’t too horrendously large, use an in-memory data store. For the tr
It is significant that the big players in the Web space all use non-relational databases heavily. I suspect this is in part the legacy of search — back at AltaVista, we’d have been somewhat horrified at the idea of using Oracle to serve up search queries. The smart people who did the original work at AltaVista, Google, and Yahoo wrote their own data stores, optimized for returning search results quickly.
This attitude stuck. I know a possible apocryphal story about Yahoo, which claims that user data was just stored in a standard filesystem. The code was supposedly hacked such that niceties like file update times weren’t stored anywhere. Speed was all. Whether or not that’s true, Yahoo’s still using speed-oriented database code (warning: PDF). So is Google. Amazon does the same sort of thing.
Now, OK, we’re not gonna write our own serious database systems. CouchDB and HBase probably aren’t there yet. I still really wish more people would ask if they need a relational DB. If you really do need one, make sure you’ve hired someone who’s worked with large systems before and remember that 95% of Web work is smaller than anything you’ll be doing.