So much to disagree with here ... > Locks limit access to the table, so on a hig...

spotman · on May 24, 2016

>> Locks limit access to the table, so on a high throughput use case it may limit our performance.

> Then use a proper database that implements MVCC.

InnoDB does implement MVCC. MVCC is not a silver bullet.

>> Do not use transactions, which introduce locks. Instead, use applicative transactions.

> Or just use a database that handle transactions more efficiently.

Easy to say, hard to implement at this scale. If you do a lot of writes and reads concurrently to a hot dataset, it's really quite hard to beat this architecture. This is why its such a popular and battle tested solution for many extremely high scale applications with workloads like this. Not to mention extremely well understood.

>> Do not normalize.

> Bullshit. Normalize as much as is practical and denormalize as necessary. It's much easier to denormalize and it greatly simplifies any transaction logic to deal with a normalized model.

But we are talking about performance... Having something in a single table that is denormalized is always going to be faster than having an elegant data model with "Everything In It's Right Place"

>> Fields only exist to be indexed. If a field is not needed for an index, store it in one blob/text field (such as JSON or XML).

> This is terrible advice.

So facebook/friendfeed, uber, dropbox, and many more or wrong then. Ok.

This is really all best practice for running something like this.

Of course it flies in the face of best practice for running a smaller system. Is there tradeoffs? Absolutely! Would it be smart to do this if the need for this scale is not obvious? Probably not.

You end up having more logic in your application and coordination layers, but this is all pretty good advice for people at this scale, and certainly not bad at all.

koolba · on May 24, 2016

> Of course it flies in the face of best practice for running a smaller system. Is there tradeoffs? Absolutely! Would it be smart to do this if the need for this scale is not obvious? Probably not

From the article:

> The routes table is of the order of magnitude of 100,000,000 records, 10GB of storage. > The sites table is of the order of magnitude of 100,000,000 records, 200GB of storage

That's tiny. Both of those easily fit in memory on modern hardware. This isn't cough web scale, this is peanuts.

The savings from having a simpler system that operates both transactional and the lack of disparate CASE/IF logic would win over this monstrosity of a design.

For a counterpoint where this type of model makes more sense check out Ubers data model[1]. Similar setup but more applicable use case and (without having any inside intel on it) I'd wager is justified.

[1]: https://eng.uber.com/schemaless-part-one/

spotman · on May 24, 2016

So your argument has shape shifted from "This is terrible advice" to "this is terrible advice unless your at uber scale".

Sounds like we are in agreement then - at high enough scale - this is solid advice.

koolba · on May 24, 2016

> So your argument has shape shifted from "This is terrible advice" to "this is terrible advice unless your at uber scale".

No my argument is this particular design is both unjustified for the use case and poorly thought out/implemented. The uuid as varchar(50) is a dead giveaway of amateur status.

jdiscar · on May 25, 2016

Are you saying a NoSQL solution is better for this use case? Because that's what this article is asking. Sure, they could do things better (can't we always?), but that's not the point they're trying to make.

Like you said, 10GB of data isn't very much, it really doesn't matter if you go with NoSQL or SQL. But SQL will probably give you more flexibility and will be easier to manage until you get really, really big.

manigandham · on May 25, 2016

Actually yes, a single key-value database could handle all of their load with better functionality. Make that 2 nodes and you have HA and redundancy.

Problem solved.

jbergens · on May 25, 2016

I think you loose flexibility with sql. If the usage changes slightly and you need to index one or two more fields you would have to do an alter table and read through 200 GB of data deserializing all blobs to put the values into the new field. A good nosql would just add a secondary index. You might also have a hard time doing map-reduce on sql. It is often built into nosql systems.

johanneskanybal · on May 25, 2016

Not sure how you manage to miss the point of the article. I'll break it down for you:

* A lot of traditional sql solutions have scaling issues.

* A lot of nosql solutions for these issues have become popular in recent years. Their main use case is web scale (simplification)

* A ton of actual use cases fall between those two chairs i.e they would have scaling issues with traditional sql systems but don't really have billions of transactions to process.

The article presents some valid real world approaches for this third case, some of it might be case dependant but overall pretty good advice.

tremon · on May 25, 2016

they would have scaling issues with traditional sql systems but don't really have billions of transactions to process.

If you have scaling issues with traditional SQL systems, the first thing you should do is understand your problem, not cargo-cult to a different paradigm.

More specifically:

* You do not get to complain about SQL transaction speed if you use denormalized tables.

* You do not get to complain about SQL transaction speed if your primary keys are varchar(50).

* You do not get to complain about SQL transaction speed if you store all non-identifying data in a single json blob.

calpaterson · on May 25, 2016

> Having something in a single table that is denormalized is always going to be faster than having an elegant data model with "Everything In It's Right Place"

This one folk wisdom that is untrue. There are significant speed disadvantages relating to large blobs of data the database doesn't understand. Serialisation time makes returning large JSON/XML objects expensive when you only need a small part. Overwriting a whole object to increment a counter is an unnecessary source of IO. Duplicating JSON keys in every record bloats the size of your working set, making it more difficult to fit into memory (or the fast part of your SAN).

99% of denormalisation out there is unnecessary and has inferior performance. The best route to performance with row store SQL databases (any database?) is two fold: 1) get an expert on your database to help you write your code and 2) get an expert on your hardware to help you choose wisely. Denormalisation is typically a way to take a performance problem and make it worse while introducing data corruption and race conditions.

jeffffff · on May 26, 2016

the reason for putting everything in json in one column is because alter table on a large database can take days. the only sql database i'm familiar with that doesn't have this problem is tokudb.

> Serialisation time makes returning large JSON/XML objects expensive when you only need a small part.

the expensive part of reads is finding the row on disk. once you've found the row the cost of reading part of the row vs the whole row is negligible. amount of data sent over the network doesn't matter either in 99% of cases. these days network bandwidth is orders of magnitude greater than random IO bandwidth on a spinning disk and still greater than random IO bandwidth on a SSD assuming you're using 10GbE.

> Overwriting a whole object to increment a counter is an unnecessary source of IO.

there is no way to write 4 bytes to a hard disk. disk io is performed in multiples of 4096 bytes, so it doesn't matter whether you just update the counter or update the whole blob. only incrementing the counter may allow you to write less data to the write ahead log, so you may save some IO there, but most databases put the whole row in the log anyway so it doesn't matter.

> Duplicating JSON keys in every record bloats the size of your working set, making it more difficult to fit into memory (or the fast part of your SAN).

this is definitely true, it's better to use an extensible but compact format like protobuf or thrift if you have a significant amount of data. or you could use a better database and not have to worry about the cost of adding columns.

calpaterson · on May 26, 2016

> alter table on a large database can take days

There's a lot to migrations in these various implementations but in short: every variant of alter table is an "online" operation (meaning it doesn't hold an exclusive lock) in InnoDB as of MySQL 5.6 and it's possible with various well-supported third party tools before that. For Postgres: most migrations can be done online and those that do hold a lock are typically constant time.

Admittedly migration has been a big problem in the past but that hasn't been true for years now.

> the expensive part of reads is finding the row on disk

Hopefully most SQL queries in an optimised setup are not finding a row on disk! The difference between reading the whole row from on-disk heap storage and reading the interesting part of it from an index in memory is in fact considerable: 1000x or more - and obviously far worse if you have to scan through any of the heap storage.

> amount of data sent over the network doesn't matter either in 99% of cases

It actually matters hugely in the case of JSON blob storage because it all has to be doubly deserialised on the other end - first from the db's wire protocol and then from JSON. There are many apps out there for which JSON serialisation is a bottleneck (for the CPU, not the network) - that's why there are so many "fast JSON" libraries.

Good point - you could mitigate this by using something quicker. I haven't seen anywhere do that - the ability to read the database with other tools is normally useful

> most databases put the whole row in the log anyway so it doesn't matter

re: this whole topic - I'm not proposing switching a JSON blob with a table for which every JSON field was reified into a column. I'm comparing to a normalised design with narrow tables (most tables have fewer than 5 columns). The other stuff about serialisation applies.

viraptor · on May 25, 2016

> But we are talking about performance... Having something in a single table that is denormalized is always going to be faster than having an elegant data model with "Everything In It's Right Place"

Unless you specify the workload, that's anywhere between completely true and exactly incorrect. Do you have big values you're always interested in and a couple of tiny ids? That's probably going to be faster in one table.

Are you querying only the metadata most of the time and the big value is multiple KB, almost never accessed? You're just killing your readahead and multiple levels of caches for no reason. "always going to be faster" is always incorrect ;)

tracker1 · on May 25, 2016

Querying a single table with an indexed key will be faster as a single lookup than doing a JOIN, let alone several. That said, it really depends on your load, and if you're not dealing with hundreds of thousands of simultaneous users, and haven't over-normalized your data, you can get by with a lot of options. And a good caching layer for mostly-read scenarios will likely get you further anyway.

That said, use a system that's a good match for your data model(s)... if you're data can fit in a handful of collections, but may have varying object shapes/structure for semi-related data a document store may work better. Need massive scale, C* may be your best bet. There are use cases that are great fits for just about every database server that's been made. Some similar options may be a better fit, but it all depends.

Personally, I'm hoping to see in the box replication options for PostgreSQL in the next few releases, then it will probably be my go to option for most stuff. RethinkDB is pretty damned nice too, and you should be able to scale to many millions of users.

Once you hit certain scales, you usually have to implement a number of solutions... be they sharding, caching or queuing to deal with higher load. It depends on where your bottlenecks are.

Jweb_Guru · on May 25, 2016

> Querying a single table with an indexed key will be faster as a single lookup than doing a JOIN, let alone several.

Not always, particularly when you're returning multiple keys at once. Some comparisons with column stores would be instructive.

wvenable · on May 24, 2016

To be fair, this article is about using MySQL as a NoSQL storage and so all of this advice is geared towards that use-case. I'd kill for so much traffic that any of this would be necessary as opposed to any RDBMS best-practices.

I do agree that UUIDs should be stored differently -- the use of varchar rather than a fixed length type for a primary key will hurt performance.

agentgt · on May 24, 2016

We use varchar for UUID (on postgres) which surprisingly hasn't been that terribly performance wise. And yes we do use varchar(36) although on postgres it doesn't really matter because I think almost all varchars are text.

I would love to switch to native UUID someday though.

narfz · on May 24, 2016

we did the same and then switched to the native UUID type. it eliminates the need for a unique index and we saw a drop in storage space by 1/2. it's totally worth converting UUID to the uuid field.

ALTER TABLE my_table ALTER COLUMN my_uuid TYPE uuid USING uuid::uuid;

olavgg · on May 24, 2016

And you want as much of the index as possible in RAM. When the index is 10 times larger than necessary and do not fit in your RAM, you get a very expensive performance penalty!

StevePerkins · on May 25, 2016

Same, but we use raw binary columns (i.e. "BINARY(16)" for MySQL and "BYTEA(16)" for PostgreSQL).

Conversion between the database-native UUID type and the application-layer UUID type is often so flaky (particularly with Java and JPA).

elmigranto · on May 25, 2016

All the `CHAR`s in Postgres are `TEXT`, from the docs:

Tip: There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs and slower sorting. In most situations text or character varying should be used instead.

jdiscar · on May 25, 2016

Yeah, VARCHAR and TEXT are backed by the same datatype, but VARCHAR(N) gives you a limit check (that can be a pain to change if you ever need to.) There shouldn't be any noticable performance difference. For UUIDs, the native type is way better.

calpaterson · on May 24, 2016

This article is pretty terrible but just in point of fact: it looks like they are using MySQL's InnoDB backend - which does support transactions and MVCC. If they're even talking about avoiding transactions for speed purposes (no matter how stupidly) they must be talking about Inno because in MyISAM BEGIN and COMMIT are no-ops.

nerfhammer · on May 24, 2016

The article says "Note that a transaction is using a DB-level lock that prevents concurrent writes—and sometimes reads—from the affected tables."

In innodb locks are row-level; myisam supports table-level locks though that's not and shouldn't be confused with a transaction; I don't know what a "database-level" lock is supposed to mean, are they really saying they're locking all tables to do a write? It doesn't sound like this author understands what a transaction is.

justizin · on May 24, 2016

> If they're even talking about avoiding transactions for speed purposes (no matter how stupidly)

As many have pointed out, so much criticism of this article ignores that it is comparing to other key-value stores which are not transactional. Many of what this would compete with are AP, with Consistency not guaranteed.

It really sounds like they should be talking about MySQL cluster, which is protocol compatible but a completely separate implementation and essentially a key-value store with RDBMS attributes atop it. It supports many-master mode like mongo and other distributed systems, which is fairly mandatory for replacing them. It's hard to argue you can replace HDFS with anything that's not distributed, and if you didn't, why wouldn't you just use .. the actual FS? The author may not really understand that HDFS is optimized for storing large-ish files.

justinhj · on May 25, 2016

There is nothing terrible or stupid about avoiding joins/transactions for speed. The technique of using blobs of data in MySQL rows in this article is perfectly valid and widespread at this point. As long as you understand the trade offs.

calpaterson · on May 25, 2016

In principle that is fine so long as you know what you're doing. In practice most people denormalising "for speed" aren't knowledgeable about the database implementations they are optimising for and that is very obviously the case here.

jdiscar · on May 25, 2016

The point of this article is showing how MySQL could be used to get a lot of what a NoSQL solution provides. NoSQL certainly has a place, but a lot of people don't really understand what that is and simply use NoSQL because it's popular, which cuts them off from a lot of useful features a SQL solution could provide them. That said, you're right more care could have been put in the details of the article, but a lot of the points could be correct for their situation.

For example, 'Do not normalize.'

This was in the context of a read heavy table that competes with NoSQL. In that context, I think this is accurate. We noticed a big difference after denormalizing when we went from millions of rows to billions of rows.

The general advice of SQL solutions being as useful as NoSQL to a certain scale is good. I don't think the individual examples are horrible, but they aren't universal advice to achieve NoSQL performance.

tagrun · on May 25, 2016

>> `site_id` varchar(50) NOT NULL,

> Why varchar(50)? UUIDs are 16-bytes.

Why do you think it's a UUID?

> The best way to store them would be the binary bytes (which is how postgres stores them).

Is it actually better than a pair of BIGINTs?

wtetzner · on May 25, 2016

>Why do you think it's a UUID?

Because of this:

>Also notice that we are not using serial keys; instead, we are using varchar(50), which stores client-generated GUID values

tagrun · on May 25, 2016

GUID and UUID refer to different things in general. Also, depending on the context, they can be longer than 128-bits. Oracle Coherence API defines 256-bit UUIDs for example (which is clearly not RFC 4122 or Microsoft GUID, but it still is an identifying number which can statistically be called unique, which a UUID is).

As long as it meets the statistical properties for collision, I don't see any problems with making a 50-bytes GUID. The essential thing is the statistical property, not the number of bits.

mdellabitta · on May 25, 2016

Do you think it's more likely that they invented a unique id scheme that requires 50 bytes, or that they're using an overly wide datatype for ordinary UUIDs?

tagrun · on May 25, 2016

And I have to pick one of those two options?

Like I mentioned, there are different UUID/GUID schemes using more than 128 bits publicly available already. You don't have to invent one. They may simply be using one of those with additional bytes reserved for future algorithmic changes (using first few bytes as a header to specify the algorithm).

mdellabitta · on May 26, 2016

But is it likely that they are? I would imagine no.

viraptor · on May 25, 2016

> If it's hex without dashes, it'll be varchar(32)

Or just char(32) no need to note the length if it's always the same.

return0 · on May 25, 2016

This is not terrible advice in general. It is also not good advice in general. To make things scale, you obviously have to "break the norms". They give an insight how their specific case works. It's "watch, learn and pick what fits you" material.