I know nothing about working in small firms, so that is probably very true. The smaller the firm, the more you do yourself. But... if a company can save $10 million... it can afford a set of financials.
One of the main things here is that you should know your data well enough to articulate the right request to BI. In my experience, BI teams often end up as pure order takers - if you ask the wrong question, you get a lovingly formatted but wrong answer.
The other thing is that this assumes you have a BI team at hand - smaller teams/orgs often don't! Perhaps I should make this a little more explicit.
My central thesis, also not made explicit, is that leaders should be appropriately curious _and_ leverage the tools they have so they can say things like "hey, this looks weird, what's up?" and share the data and their methodology - that way it can be corrected/investigated, etc.
Thanks for chiming in, great post, I like the premise - I just think we must have completely different working experiences. I'm typically in a larger org that has multiple systems feeding data into a data lake or something similar that has been normalized but usually still has some quirks. Articulating the right request to BI is certainly a skill, but my approach/experience is that I try to paint the picture of the end goal and let them fill in the gaps as needed. Sometimes that's literally drawing out a graph or chart that I want to exist.
Even when there's no dedicated BI team, there's usually someone wearing that hat. Someone set up those schemas and data pipelines, etc., or is responsible for maintaining them. That person is probably the one who knows "make sure you exclude the NULL items" or something similar.
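To make that concrete, that kind of tribal knowledge usually boils down to a filter somebody has to remember to apply. A rough sketch in DuckDB/Python (the table and column names here are made up for illustration):

    import duckdb

    # Hypothetical schema - the "quirk" lives entirely in the WHERE clause.
    con = duckdb.connect("analytics.duckdb")
    revenue = con.execute("""
        SELECT region, SUM(amount) AS total_revenue
        FROM orders
        WHERE amount IS NOT NULL          -- the tribal-knowledge filter
          AND cancelled_at IS NULL        -- another quirk only the pipeline owner knows
        GROUP BY region
        ORDER BY total_revenue DESC
    """).fetchdf()
    print(revenue)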
I do like being in touch with changing data trends from a leadership perspective. It's either real and could be a valuable insight, or it's a bug that needs to be addressed before any ill-advised decisions are made from the 'info'. I find this can often be set up proactively and put into a dashboard. In that way, identifying it and raising concern can be 'my job', but investigating it can be a team effort.
> I just think we must have completely different working experiences.
Likely! I've generally worked in smaller orgs (including as part of a much larger org, as with my current employer) and there is less access to dedicated resources.
> Even when no BI team is dedicated, there's usually someone that's wearing that hat.
100%. Unfortunately, in my personal experience this has commonly been me.
> In that way, identifying it and raising concern can be 'my job' but when investigating it, it could be a team effort.
Totally agreed.
For some additional context, I've spent my working career on data systems so I likely feel a much stronger affinity to this type of self-serve analysis than your average bear.
Confirms my initial impression that the author (you) was likely on the receiving end of these requests and would rather teach people to fish than be the cook. Which is a great thing and certainly has a place, especially on smaller teams/orgs. So I think the bias is strong (a desire for others to self-serve) and ignores a lot of the realities of trying to 'manage up' in this way (the risks, inefficiencies, and skills gaps of having managers exercising technical chops). For that reason, I feel like promoting usage of Retool or something more GUI-based would be more successful than promoting that managers start using DuckDB and Python, et al.
From my POV, anybody interested in this sort of work has a choice: do it in a database, or do it in a specialized data analysis tool like pandas. What's your take on that?
There are times when pushing the work down to the database layer is appropriate - databases are quite good at a lot of these operations - but if you need more nuanced approaches (e.g. ANOVA, ARIMA, other kinds of forecasting or analysis), leverage the appropriate tools.
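As a rough sketch of that split (the dataset path and column names below are made up): let the database do the scan/filter/aggregate, then hand the small result to a stats library for the nuanced part.

    import duckdb
    from statsmodels.tsa.arima.model import ARIMA

    # Heavy lifting (scan, filter, aggregate) stays in the database engine.
    con = duckdb.connect()
    monthly = con.execute("""
        SELECT date_trunc('month', order_date) AS month,
               SUM(amount) AS revenue
        FROM read_parquet('orders/*.parquet')   -- hypothetical dataset
        WHERE amount IS NOT NULL
        GROUP BY 1
        ORDER BY 1
    """).fetchdf()

    # The more nuanced statistics happen in a specialized library instead.
    series = monthly.set_index("month")["revenue"]
    fit = ARIMA(series, order=(1, 1, 1)).fit()
    print(fit.forecast(steps=3))   # a simple three-month forecast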
The upcoming XTDB v2 is a SQL-first engine. We also built an experimental Clojure/Datalog-like 'XTQL' language to go along with it, to provide some continuity for v1 users, but the primary API is now SQL over the Postgres wire protocol, where we implemented a variation on SQL:2011 - see https://docs.xtdb.com/quickstart/sql-overview.html
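Because it's the Postgres wire protocol, an off-the-shelf Postgres driver should be able to talk to it - a minimal sketch with psycopg (the connection details here are placeholders; the quickstart above has the real ones):

    import psycopg  # a stock Postgres driver, no XTDB-specific client required

    # Placeholder connection string - check the quickstart for actual host/port/credentials.
    with psycopg.connect("host=localhost port=5432 user=xtdb dbname=xtdb") as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT 1 AS ping")
            print(cur.fetchone())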
> What do people usually do for HA with PostgreSQL, or do they just not care about it?
Patroni for most cases. At Heroku we have our own control plane to manage HA and fencing which works very reliably. I also like the approach the Cloud Native PG folks have taken with implementing it in the k8s API via the instance manager[1].
Other options like Stolon or repmgr are popular too. Patroni, despite the Jepsen testing, works without issues in the vast majority of circumstances. I wouldn't overthink it.
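If it helps build confidence, the moving parts are easy to observe: each Patroni-managed node exposes a REST API (port 8008 by default) reporting its role, which is what health checks and load balancers key off. A quick sketch (the node addresses are made up):

    import requests

    # Hypothetical node addresses; Patroni's REST API listens on 8008 by default.
    nodes = ["http://10.0.0.1:8008", "http://10.0.0.2:8008", "http://10.0.0.3:8008"]

    for node in nodes:
        status = requests.get(f"{node}/patroni", timeout=2).json()
        print(node, status.get("role"), status.get("state"))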
Yeah, this is the bit for me. We have almost no good OSS layers for folks to "plug and play".
It's a bit of a vicious circle - because there is low exposure, no one is building those layers. Because no one is building the layers, there is no exposure.
I didn't want ClickHouse to take all the glory. /s
The actual reason is that DuckDB's API, its integration into other places (e.g. Evidence), and its use of extensions (like the aforementioned gsheets one) give it priority for me.
Additionally, it's being used in a bunch more places, like pg_duckdb, which makes it more "worth it".
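The extension mechanism is a big part of that appeal - community extensions like gsheets are two statements away. A rough sketch (the actual read function belongs to the extension, so double-check its docs):

    import duckdb

    con = duckdb.connect()
    # Community extensions install/load with two SQL statements.
    con.execute("INSTALL gsheets FROM community")
    con.execute("LOAD gsheets")
    # The extension's functions are then available in plain SQL, e.g. reading a
    # sheet by URL (function name from memory - verify against the extension docs):
    # con.execute("SELECT * FROM read_gsheet('https://docs.google.com/spreadsheets/d/...')")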
Thanks for sharing! My choices are pretty coloured by personal experience, and I didn't want to re-tread anything from the book (Redis/Valkey, Neo4j etc) other than Postgres - mostly due to Postgres changing _a lot_ over the years.
I had considered an OSS Dynamo-like (Cassandra, ScyllaDB, kinda), or a Calvin-like (FaunaDB), but went with FoundationDB instead because, to me, that was much more interesting.
After a decade of running DBaaS at massive scale, I'm also pretty biased towards easy-to-run.
I agree Mongo is overhyped and attracts a lot of web newbies who only know JavaScript and don't want to think through schemas. One interesting newer feature, though, is time series collections. Unfortunately they're a bit buggy, but they're getting better and seem like a legitimate non-relational use case.
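For anyone who hasn't seen them, a time series collection is just declared at creation time rather than being a separate product - a quick sketch with pymongo (the names and URI are placeholders):

    from datetime import datetime, timezone
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    db = client["metrics"]

    # Time series collections (MongoDB 5.0+) are configured at creation time.
    db.create_collection(
        "cpu_usage",
        timeseries={
            "timeField": "ts",      # required: which field holds the timestamp
            "metaField": "host",    # optional: per-series metadata used for bucketing
            "granularity": "seconds",
        },
    )

    db.cpu_usage.insert_one(
        {"ts": datetime.now(timezone.utc), "host": "web-1", "value": 0.42}
    )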