*I’m calling these six conceptualizations “memory models”, even though that term...

mmphosis · on Jan 1, 2017

"Data models" makes me think of relational models.

"Memory models" made me remember the 8086, which I would like to forget, so I think "memory models" works to describe these six conceptualizations.

kragen · on Jan 2, 2017

Maybe the desire to call them "data models" is an argument that the relational model doesn't really belong in this essay, because it really is much more of a data model (abstractly describing an ontology of mathematical objects) than a memory model (describing how to map that ontology onto bytes in memory).

I will have to think more about your comment.

chubot · on Jan 2, 2017

Yes I agree about the relational model. It is higher level than the rest -- one of the key ideas in Codd's paper was to abstract data away from concrete storage. In contrast, the records, parallel arrays, and to some degree the object graph model are pretty closely tied to concrete storage. C/C++/Go all explicitly specify the memory layout and allow the programmer to control it by design.
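To make the point about records being tied to concrete storage more tangible, here is a small sketch using Python's `ctypes`, which follows C struct layout rules. The struct and field names are hypothetical, and the sizes and offsets assume a mainstream ABI (e.g. x86-64 or ARM64) where a 32-bit integer is 4-byte aligned. Only the field order differs between the two records, yet the layouts differ.

```python
import ctypes

# Hypothetical record: a small field first forces padding before the int.
class Loose(ctypes.Structure):
    _fields_ = [
        ("flag", ctypes.c_uint8),   # offset 0, followed by 3 bytes of padding
        ("count", ctypes.c_int32),  # must start on a 4-byte boundary
        ("kind", ctypes.c_uint8),   # offset 8, followed by 3 bytes of trailing padding
    ]

# Same fields, reordered by the programmer to waste less space.
class Tight(ctypes.Structure):
    _fields_ = [
        ("count", ctypes.c_int32),  # offset 0
        ("flag", ctypes.c_uint8),   # offset 4
        ("kind", ctypes.c_uint8),   # offset 5, followed by 2 bytes of trailing padding
    ]

for cls in (Loose, Tight):
    offsets = {name: getattr(cls, name).offset for name, _ in cls._fields_}
    print(cls.__name__, ctypes.sizeof(cls), offsets)
```

Under those assumptions this reports 12 bytes for `Loose` and 8 for `Tight`: the same logical record, two physical layouts, chosen by the programmer.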

And as mentioned, I think the relational model and file system are interesting but orthogonal topics.

I do think this pattern of taking the memory model/data model and "externalizing" it into a DSL is interesting (JSON, protobufs and many other schemes, ASDL). That makes it clear that persistence is an orthogonal concern.
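As a rough illustration of "externalizing" an in-memory model, the sketch below round-trips a record through JSON using only the standard library. The `Point` class is hypothetical, and this covers only the serialization half: schema languages like protobuf or ASDL also generate the in-memory types from the external description, which this sketch does not attempt.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Point:
    # Hypothetical in-memory record.
    x: int
    y: int
    label: str

p = Point(3, 4, "home")

# Externalize: the JSON text says nothing about how Python lays out the object.
text = json.dumps(asdict(p), sort_keys=True)
print(text)  # {"label": "home", "x": 3, "y": 4}

# Rehydrate: the same logical record, rebuilt from the external form.
q = Point(**json.loads(text))
print(q == p)  # True
```

Whether `text` is then written to a file, sent over a socket, or discarded is a separate decision, which is the sense in which persistence is orthogonal.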

One thing I've been thinking about, and which your article helped me home in on, is that scripting languages almost all use the object graph model, but that model is inefficient on modern computers. Pointers are huge (8 bytes each on a 64-bit machine) and they lead to scattered data.

For example in Python:

    Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
    [GCC 4.8.4] on linux

    >>> import sys
    >>> sys.getsizeof({})
    288
    >>> sys.getsizeof([])
    64
    >>> sys.getsizeof(())
    48
    >>> sys.getsizeof(set())
    224
    >>> sys.getsizeof('')
    49
    >>> sys.getsizeof(b'')
    33
    >>> sys.getsizeof(99)
    28

This seems like a pretty enormous amount of overhead... there is more "metadata" than there is data!
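One way to see how much of that overhead comes from the object graph itself is to store the same integers as a list of Python `int` objects versus a packed `array.array`. This is a rough sketch: `sys.getsizeof` reports shallow sizes, so the list's total is approximated by adding its pointer storage to the size of every element object. It measures only size, not the cost of chasing scattered pointers.

```python
import sys
from array import array

N = 1_000_000
values = list(range(N))

# Object graph: the list stores one pointer per element (8 bytes on a 64-bit
# build), and each pointer refers to a separately allocated int object.
graph_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Packed storage: array('q') keeps raw 64-bit integers back to back,
# with no per-element header and no pointer to follow.
packed = array("q", values)
packed_bytes = sys.getsizeof(packed)

print(f"list of ints: {graph_bytes:>12,} bytes")
print(f"array('q'):   {packed_bytes:>12,} bytes")
print(f"ratio:        {graph_bytes / packed_bytes:12.1f}x")
```

On a 64-bit CPython the list version comes out several times larger, since every element pays for both a pointer and a full object header.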

Another point: Someone else mentioned R and pandas. I've been meaning to write a blog post called "R is the only language without the ORM problem". There's no mismatch, because R's data model is the same as SQL's -- tables with homogeneous columns (in the logical sense). It's meant for "measurements and observations" rather than "business data", but I don't see any fundamental reason why these are different. The differences are more about R's implementation quirks than the logical model.

So that is another argument that persistence is a separate concern. R has non-persistent tables, but SQL has persistent tables.
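Python's bundled `sqlite3` module gives a small demonstration of that separation from the SQL side: the same relational table and query can live entirely in memory (`":memory:"`) or in a file on disk. The table name and observations below are made up for illustration.

```python
import os
import sqlite3
import tempfile

ROWS = [("a", 20.5), ("a", 22.0), ("b", 18.0)]  # hypothetical observations
QUERY = "SELECT site, AVG(temp) FROM obs GROUP BY site ORDER BY site"

def load(conn):
    """Create the table and insert rows; the schema is identical in both cases."""
    conn.execute("CREATE TABLE obs (site TEXT, temp REAL)")
    conn.executemany("INSERT INTO obs VALUES (?, ?)", ROWS)
    conn.commit()

# Non-persistent: the table vanishes when this connection is closed.
mem = sqlite3.connect(":memory:")
load(mem)
in_memory = mem.execute(QUERY).fetchall()
mem.close()

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "obs.db")
    disk = sqlite3.connect(path)
    load(disk)
    disk.close()
    # Persistent: a fresh connection still sees the table.
    reopened = sqlite3.connect(path)
    on_disk = reopened.execute(QUERY).fetchall()
    reopened.close()

print(in_memory)             # [('a', 21.25), ('b', 18.0)]
print(in_memory == on_disk)  # True
```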

Another example is Redis. Redis is persistent (although it didn't start off that way), but it doesn't use the relational model. I haven't used it much, but as far as I know it has dictionaries, sets, and lists. So it looks like a database server but has a different model.
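Python's standard-library `shelve` module is not Redis and is only an analogy, but it makes a similar point on a small scale: a persistent store whose model is keys mapped to lists, sets, and dictionaries rather than relational tables. The keys and values below are hypothetical.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "store")  # the dbm backend may add its own file extensions

    # Write some non-relational values: a list, a set, and a dict.
    with shelve.open(path) as db:
        db["queue"] = ["job1", "job2"]
        db["tags"] = {"fast", "simple"}
        db["user:1"] = {"name": "ada", "visits": 3}

    # Reopen: the values survived, with no tables or schema involved.
    with shelve.open(path) as db:
        restored = {key: db[key] for key in db}

print(sorted(restored))                        # ['queue', 'tags', 'user:1']
print(restored["tags"] == {"fast", "simple"})  # True
```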

So I think these concerns should be represented in the taxonomy:

- logical vs physical model (logical is what the user sees; physical is concrete storage). You can have an SQL database that is row-oriented or column-oriented. And I noticed that the Jai programming language has this structure-of-arrays vs array-of-structures duality built in.

- Persistence -- each model can be dealt with in-memory or on disk. I didn't know that COBOL dealt with records on disk, which is interesting. A B-tree is a data structure with pointers, but it's designed to be serialized.
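The logical vs physical distinction in the first bullet can be sketched in a few lines: the same logical table stored row-wise (an array of structures) and column-wise (a structure of arrays), answering the same query. The field names and data are hypothetical, and plain Python containers only approximate the two layouts; `array.array` is used so each column at least stores its values contiguously.

```python
from array import array
from collections import namedtuple

Reading = namedtuple("Reading", ["sensor_id", "temp"])

# Array of structures (row-oriented): one record object per row.
aos = [Reading(1, 20.5), Reading(2, 18.0), Reading(1, 22.0)]

# Structure of arrays (column-oriented): one homogeneous, packed column per field.
soa = {
    "sensor_id": array("i", [r.sensor_id for r in aos]),
    "temp": array("d", [r.temp for r in aos]),
}

def mean_temp_aos(rows, sensor):
    # Visits one record object per row.
    temps = [r.temp for r in rows if r.sensor_id == sensor]
    return sum(temps) / len(temps)

def mean_temp_soa(cols, sensor):
    # Walks two packed columns in parallel instead of per-row objects.
    temps = [t for s, t in zip(cols["sensor_id"], cols["temp"]) if s == sensor]
    return sum(temps) / len(temps)

print(mean_temp_aos(aos, 1), mean_temp_soa(soa, 1))  # 21.25 21.25
```

The query is the same at the logical level; only the physical organization changes, which is the same choice a row-oriented vs column-oriented SQL engine makes internally.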

Thanks again for the great article!