CouchDB 2.1.0 (couchdb.org)
101 points by tropshop on Aug 7, 2017 | 23 comments



I was looking really hard at using CouchDB 2.0 for an offline-first progressive web app. In my opinion, the biggest asset of CouchDB is actually PouchDB [1], which is really great and nicely designed.

However, the showstopper I hit was that CouchDB still does not support permissions on a per-user basis (even though PouchDB explicitly encourages this pattern [2]).

What this means is that if you have a typical system with users and logins who can only access their own data, and you require some sort of aggregated view where all users' data is visible behind some sort of permission (an admin user, etc.), you are forced to create a new db for each user in Couch and then run replication for each of those users to a separate master.

Think, for example, of wanting to store images in CouchDB and then have some sort of global feed (like Instagram).

So, systems with many users (tens of thousands -> millions) would be stuck replicating an equal number of Couch databases constantly. Each replication in 2.0 was an Erlang process, and each database is a separate file. Apparently some of the performance issues were fixed in 2.1, but it still requires a physical file per db and continuous replication.

Some promising solutions, like envoy [3] from Cloudant, which is based on Mango [4], have been worked on, but they don't seem to have much support in the core Couch community and feel abandoned.

Another problem is that you can't really throttle or manage PouchDB <-> CouchDB replication very well. E.g., if you have a large-ish database that's out of sync, it's just going to try to download itself as soon as you sync, which isn't very mobile-friendly, or "PWA"-friendly, IMO.

Net result for me was that I just stuck with my Postgres DB, Rails API, and a bunch of Redux/React code to accomplish what I needed.

Still a fan of Couch/Pouch but wouldn't recommend it if your system requires any kind of serious per-user or role-based permissions or aggregated views.

[1] https://pouchdb.com/

[2] https://pouchdb.com/2015/04/05/filtered-replication.html

[3] https://github.com/cloudant-labs/envoy

[4] https://blog.couchdb.org/2016/08/03/feature-mango-query/


The main feature which was added in 2.1 was the replication scheduler, which makes it possible to run millions of replications concurrently. That makes the database-per-user model much more feasible.
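
For example, each scheduler-managed replication is just a document in the _replicator database, something like this (the database names here are placeholders):

    {
      "_id": "userdb-alice-to-aggregate",
      "source": "http://localhost:5984/userdb-alice",
      "target": "http://localhost:5984/aggregate",
      "continuous": true
    }

As I understand it, the scheduler then cycles through a bounded set of running jobs instead of keeping one Erlang process alive per replication, which is what makes very large numbers of these documents workable.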

As for throttling, the CouchDB replicator gracefully handles HTTP 429, but that wouldn't help with PouchDB-managed replications. I can't really speak on PouchDB, but maybe it's possible to limit the replication on the client side, since that's where it's managed?
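
If it is, a rough client-side throttle might look like this (batch_size and batches_limit are documented PouchDB replication options; the values and URLs here are guesses):

    var PouchDB = require('pouchdb');

    var local = new PouchDB('app');
    var remote = new PouchDB('https://couch.example.com/app');

    var rep = PouchDB.replicate(remote, local, {
      live: true,
      retry: true,
      batch_size: 50,    // fewer documents per HTTP request
      batches_limit: 2   // cap how many batches are buffered in memory
    });

    // rep.cancel() stops the replication entirely, e.g. on a metered
    // connection; a new replication later resumes from a checkpoint.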


If you are looking for something like CouchDB that only syncs the partial subsets of the data you request (rather than the whole thing), check out https://github.com/amark/gun (disclosure: I built it). However, Postgres is probably the best database choice out there right now, so you made a good decision.


> So, systems with many users (tens of thousands -> millions) would be stuck replicating an equal number of couch databases constantly

I also ran into this when using `continuous: true` to create persistent replications, which suffers from this 1-to-1 resource problem. I now use a single listener on `_db_updates` and fire one-off replications to the aggregate db. I'm also keeping an eye on spiegel [1].
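
Roughly like this (Node.js; the userdb- prefix, the credentials, and the aggregate target are specific to my setup):

    var http = require('http');

    var couch = { host: 'localhost', port: 5984, auth: 'admin:secret' };

    // Follow the global feed of database-level events.
    http.get(Object.assign({ path: '/_db_updates?feed=continuous&heartbeat=10000' }, couch), function (res) {
      var buf = '';
      res.on('data', function (chunk) {
        buf += chunk;
        var nl;
        while ((nl = buf.indexOf('\n')) !== -1) {
          var line = buf.slice(0, nl).trim();
          buf = buf.slice(nl + 1);
          if (!line) continue; // heartbeat newline
          var event = JSON.parse(line);
          if (event.type === 'updated' && event.db_name.indexOf('userdb-') === 0) {
            replicateOnce(event.db_name);
          }
        }
      });
    });

    // Fire a one-off (non-continuous) replication into the aggregate db.
    function replicateOnce(source) {
      var req = http.request(Object.assign({
        method: 'POST',
        path: '/_replicate',
        headers: { 'Content-Type': 'application/json' }
      }, couch));
      req.end(JSON.stringify({ source: source, target: 'aggregate' }));
    }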

I'm curious how you handle offline support, sync conflicts, and multiple devices. I started with CouchDB 1.x and have been through enough growing pains to wonder how things would have been had I stuck with SQL, but I think the future is only brighter. CouchDB 2.1 looks great, and I hope the entire Erlang ecosystem continues to see growth and adoption.

[1] https://github.com/redgeoff/spiegel


I know it's a common trope that Couch/Pouch folks seem to think SQL databases don't work for offline replication and that you need their "batteries-included" solution. However, it's not the only solution: every app in the app store that works offline has had to deal with this, and most of those use SQL databases.

In truth, what Couch does is force you to handle conflict events, which gives you a convenient platform-level place to put some of your automatic conflict/merge code. A lot of merge code lives in the UI, however, and Couch does not help with that at all.

Detecting conflicts is a subtle thing, but it's not necessarily the one-size-fits-all problem Couch would have us believe.

It also gives you versioning for free (which is pretty easy to implement in SQL).

However, at the end of the day, every developer must make application-level decisions about how to handle merge conflicts. Couch/Pouch does not obviate this; it just forces you to deal with it and lets you think about your application a little differently. Once you have to deal with merges and conflicts in your application code, it's more a matter of designing your application to have fewer conflicts (through better merging, vector clocks, pessimistic locking, etc.).
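
To make that concrete, "dealing with it" in Pouch boils down to something like this, where simply discarding the losing revisions is one arbitrary policy among many:

    // Fetch a doc along with its conflicting revision ids, then
    // delete the losers (a real merge would combine fields instead).
    function resolveConflicts(db, id) {
      return db.get(id, { conflicts: true }).then(function (doc) {
        return Promise.all((doc._conflicts || []).map(function (rev) {
          return db.remove(id, rev);
        }));
      });
    }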

I wish I could have used it but all my other data is in a battle-tested Postgres db, being nicely backed up regularly on Heroku without fail. My CouchDB was on a Google Cloud server, using a brand new replication framework as backup.

I don't really mean to rag on Couch - I think it's a promising idea. But I ended up wasting several weeks going down this path only to throw it away because I didn't feel like it was production-ready.


In production, I have a horribly designed schema running CouchDB 1.6 and Node. It is serving a few thousand users but has serious architectural issues.

I started in freelance PHP web development, then switched to Ruby/Rails in 2006 and haven't looked back. I'm now thinking about mobile access, unreliable networks, and global connectivity, so it was time to re-survey the land. My MVP was built with Node.js and CouchDB 1.6, but now I'm almost ready with a much more robust and well-defined rewrite in Phoenix 1.3 and CouchDB 2.1.

I agree that the 2.0 release was subpar; I actually needed to beta test against master because of a bug in the 2.0.0 release. But even with the rough patches, I'm hoping that cutting Node.js and going all in on BEAM/OTP will pay off for the next decade and then some.

As far as CouchDB goes, I agree nothing is a magic bullet, but after learning the entire CouchDB API surface better, I got much better at schema design and at deciding where to put logic in _design documents vs. the application layer. I actually run a conflict-free schema now and still benefit from master-master sync and first-class offline support. I'm not saying the concepts are trivial, but when it's done the correct CouchDB way, with all the benefits that come with that, I cannot imagine what it would take to manually handle sync logic with Ruby and SQL, which is why I asked.


Are any parts of the project open source, and/or have you written any detailed blog posts about the development process?

Would love to read more about the challenges you encountered and how you solved them.

Even more so, the move to Phoenix is enticing for a project of mine. Any tips there?


No per-document authorization is still my biggest issue with Couch and what keeps me from using it: https://wiki.apache.org/couchdb/PerDocumentAuthorization


The answer I got to this problem was "design middleware that handles this for you", which has to be a joke.


> The answer I got to this problem was "design middleware that handles this for you" which has to be a joke

It's not a joke.

The Couch security model doesn't match the requirements of multi-user untrusted clients typical of internet-distributed applications. But then, most databases have a similar limitation; it's only more visible in CouchDB because you can read/write documents directly from a browser without an application server, so the next logical step is to just let clients read/write directly to CouchDB over the internet without an app server.

If your data is in Postgres, you will need an application server handling access control, business logic, and serialization.

If your data is in CouchDB, you need a proxy server that handles access control, whitelisting certain URL patterns and body content based on user entitlements.
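
A minimal sketch of such a proxy (Node.js with the express and http-proxy packages; the header-based user lookup and the userdb- URL pattern are stand-ins for a real session check and schema):

    var express = require('express');
    var httpProxy = require('http-proxy');

    var proxy = httpProxy.createProxyServer({ target: 'http://localhost:5984' });
    var app = express();

    app.use(function (req, res) {
      // Stand-in for a real session lookup.
      var userId = req.get('X-User-Id');
      // Whitelist: a user may only touch their own database.
      if (!userId || req.path.indexOf('/userdb-' + userId) !== 0) {
        return res.status(403).send('forbidden');
      }
      proxy.web(req, res);
    });

    app.listen(3000);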


Can you not write validation functions in design documents that handle the security (to a degree) for you? I seem to remember being able to do this in CouchDB 1.6 and, while it seemed like a pretty crude method, it was easier than managing an interposing proxy server.
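
From memory, something like this in a design document (the owner field is an assumption about the document schema):

    function (newDoc, oldDoc, userCtx) {
      // Admins can write anything; everyone else only their own docs.
      if (userCtx.roles.indexOf('_admin') === -1 &&
          newDoc.owner !== userCtx.name) {
        throw({ forbidden: 'You may only modify your own documents' });
      }
    }

Though that only guards writes; reads aren't filtered, which is the gap per-document authorization is meant to close.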


Can you fake a reasonably indistinguishable 404 response for content that exists, but should not be publicly discoverable?

In the validation function docs [1], there is an example of HTTP errors being thrown; I'm just not sure if there is something like a `throw({notFound: null})` option.

[1] http://guide.couchdb.org/draft/validation.html


The only validation function I've written contains this:

    throw({forbidden: 'Only administrators can write changes'});
IIRC, `forbidden` raises a 403 HTTP error (there's also `unauthorized`, which raises a 401).


That's not too difficult to do with a server side script that accesses the database as an Admin user.

I use Perl CGI scripts to handle that. There are Perl modules for interfacing with CouchDB, but you can also make a simple "curl" call to do it.

I have run into some issues encoding/decoding JSON that way in Perl, though I've not yet looked into how they might be solved.


Why use CouchDB instead of Couchbase?


Full RESTful HTTP/JSON API


I presume the parent was talking about Couchbase's Sync Gateway offering, but maybe not.


My favorite explanation: https://stackoverflow.com/a/15184612/3737009
They are very different things, and the Couchbase name is just misleading, since it doesn't have much to do with the original CouchDB at all. In general, the consensus seems to be that CouchDB is better than Couchbase, but Couchbase is being much more heavily marketed.


That analysis is from 4 years ago and is wrong in many, many ways at this point. As for "the general consensus", I'd like to see evidence that's true among real enterprise users. (FD: I work for Couchbase.)


> That analysis is from 4 years ago and is wrong in many, many ways at this point.

Can you elaborate on how it is wrong at this point?


I investigated the grandparent's question myself, though not extensively. The Stack Overflow answer contains a lot of items that aren't true (Couchbase does have document IDs). The replication is multi-master but cluster-wide, which means Couchbase has really nice tools for adding/removing nodes but doesn't offer ad-hoc replication (AFAIK). The SO answer also lists a bunch of things that won't matter to many devs, like a pure HTTP/REST API, since they'll end up using client libraries from a middle-tier layer anyway.

Not to say CouchDB doesn't have many great advantages, like the REST API and self-hosted apps; Couchbase does, to my mind, seem like a good but not 100% alternative to CouchDB (and vice versa). Mainly, I like that incremental map/reduce is still available, along with the atomic document model and revisions, but clustering is still much more developed in CB.


Has anyone tried out RxDB on the client with CouchDB as the server? I'm mostly worried about the scaling issues when syncing with many clients.


Currently using this. Haven't tested heavy scaling, though, because our app doesn't have huge scale needs. BUT, I will say things are looking promising. There are some improvements to replication and general speed landing/landed [1][2].

As for RxDB, in our limited tests, using only the query-sync feature speeds everything up a ton, not to mention optimistic updates. Overall it's been pretty great, and the developer is very active.

[1] https://github.com/apache/couchdb/pull/495
[2] https://github.com/apache/couchdb/pull/470



