> But we are talking about performance... Having something in a single table that is denormalized is always going to be faster than having an elegant data model with "Everything In It's Right Place"
Unless you specify the workload, that's anywhere between completely true and exactly incorrect. Do you have big values you're always interested in and a couple of tiny ids? That's probably going to be faster in one table.
Are you querying only the metadata most of the time and the big value is multiple KB, almost never accessed? You're just killing your readahead and multiple levels of caches for no reason. "always going to be faster" is always incorrect ;)
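The metadata-vs-big-value trade-off above can be sketched in a few lines. This is a minimal illustration using sqlite3 and an invented schema (`items` / `item_payloads` are hypothetical names, not from the thread): the split layout keeps the hot metadata rows narrow, so the rarely-read multi-KB payload never rides along with every scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Denormalized: the multi-KB payload lives in every row, so even a
# metadata-only scan drags it through readahead and the page cache.
cur.execute("CREATE TABLE items_wide (id INTEGER PRIMARY KEY, name TEXT, payload BLOB)")

# Split: metadata stays narrow; the payload is fetched only on demand.
cur.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE item_payloads (item_id INTEGER PRIMARY KEY REFERENCES items(id), payload BLOB)")

big = b"x" * 4096  # a ~4 KB value that is almost never accessed
for i in range(100):
    cur.execute("INSERT INTO items_wide VALUES (?, ?, ?)", (i, f"item{i}", big))
    cur.execute("INSERT INTO items VALUES (?, ?)", (i, f"item{i}"))
    cur.execute("INSERT INTO item_payloads VALUES (?, ?)", (i, big))

# The common query touches only narrow rows in the split layout...
names = [r[0] for r in cur.execute("SELECT name FROM items ORDER BY id")]

# ...and the rare query pays for the blob only when it actually asks.
payload, = cur.execute(
    "SELECT payload FROM item_payloads WHERE item_id = ?", (42,)
).fetchone()
```

Whether the split wins depends entirely on how often that second query runs, which is the whole point being argued here.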
A single indexed lookup against one table will be faster than the same lookup plus a JOIN, let alone several. That said, it really depends on your load, and if you're not dealing with hundreds of thousands of simultaneous users, and haven't over-normalized your data, you can get by with a lot of options. And a good caching layer for mostly-read scenarios will likely get you further anyway.
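For concreteness, here are the two shapes of that lookup side by side, again as a small sqlite3 sketch with made-up tables (`users`, `plans`, `users_flat` are illustrative, not anything from the thread). Both return the same answer; the denormalized table just gets there in one indexed probe instead of two.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Normalized: users reference their plan by id.
cur.execute("CREATE TABLE plans (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, plan_id INTEGER REFERENCES plans(id))")

# Denormalized: the plan name is copied into every user row.
cur.execute("CREATE TABLE users_flat (id INTEGER PRIMARY KEY, email TEXT, plan_name TEXT)")

cur.execute("INSERT INTO plans VALUES (1, 'pro')")
cur.execute("INSERT INTO users VALUES (7, 'a@example.com', 1)")
cur.execute("INSERT INTO users_flat VALUES (7, 'a@example.com', 'pro')")

# One indexed lookup...
flat = cur.execute(
    "SELECT email, plan_name FROM users_flat WHERE id = 7"
).fetchone()

# ...versus a lookup plus a join for the same answer.
joined = cur.execute(
    "SELECT u.email, p.name FROM users u JOIN plans p ON p.id = u.plan_id WHERE u.id = 7"
).fetchone()
```

The flat version also has to rewrite every copied `plan_name` when a plan is renamed, which is the update cost you're trading the read speed for.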
That said, use a system that's a good match for your data model(s)... if your data can fit in a handful of collections but may have varying object shapes/structures for semi-related data, a document store may work better. Need massive scale? C* (Cassandra) may be your best bet. There are use cases that are great fits for just about every database server that's been made; some similar options may be a better fit, but it all depends.
Personally, I'm hoping to see built-in replication options for PostgreSQL in the next few releases; then it will probably be my go-to option for most stuff. RethinkDB is pretty damned nice too, and you should be able to scale to many millions of users.
Once you hit certain scales, you usually have to implement a number of solutions... be they sharding, caching or queuing to deal with higher load. It depends on where your bottlenecks are.
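Of those scaling moves, sharding is the one that's easy to show in a few lines. Here's a minimal sketch of stable hash-based shard routing (the shard count and key names are assumptions for illustration, not from the thread): the same key always hashes to the same shard, so each node only holds its slice of the data.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(key: str) -> int:
    """Route a key to a shard with a stable hash, so the same
    user always lands on the same database node."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS
```

Note that a plain modulo scheme like this reshuffles most keys when `NUM_SHARDS` changes, which is why systems that expect to grow usually reach for consistent hashing instead.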