I know nothing about working in small firms, so that is probably very true. The smaller the firm, the more you do yourself. But... if a company can save $10 million... it can afford a set of financials.
One of the main things here is that you should know your data well enough to articulate the right request to BI. In my experience, BI teams often end up as pure order takers - if you ask the wrong question, you get a lovingly formatted but wrong answer.
The other thing is that this assumes you have a BI team at hand - smaller teams/orgs often don't! Perhaps I should make this a little more explicit.
My central thesis, also not made explicit, is that leaders should be appropriately curious _and_ leverage the tools they have so they can say things like "hey, this looks weird, what's up?" and share the data and their methodology - that way it can be corrected/investigated, etc.
Thanks for chiming in, great post, I like the premise - I just think we must have completely different working experiences. I'm typically in a larger org that has multiple systems feeding data into a data lake or something similar that has been normalized but usually still has some quirks. Articulating the right request to BI is certainly a skill, but my approach/experience is that I try to paint the picture of the end goal and let them fill in the gaps as needed. Sometimes that's literally drawing out a graph or chart that I want to exist.
Even when there's no dedicated BI team, there's usually someone wearing that hat. Someone set up those schemas and data pipelines, etc., or is responsible for maintaining them. That person is probably the one who knows "make sure you exclude the NULL items" or something similar.
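To make that concrete, that kind of tribal knowledge usually boils down to a filter somebody has to remember to apply. A rough sketch in DuckDB/Python (the table and column names here are made up for illustration):

    import duckdb

    # Hypothetical schema - the "quirk" lives entirely in the WHERE clause.
    con = duckdb.connect("analytics.duckdb")
    revenue = con.execute("""
        SELECT region, SUM(amount) AS total_revenue
        FROM orders
        WHERE amount IS NOT NULL          -- the tribal-knowledge filter
          AND cancelled_at IS NULL        -- another quirk only the pipeline owner knows
        GROUP BY region
        ORDER BY total_revenue DESC
    """).fetchdf()
    print(revenue)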
I do like being in touch with changing data trends from a leadership perspective. It's either real and could be a valuable insight, or it's a bug that needs to be addressed before any ill-advised decisions are made from the 'info'. I find this can often be set up proactively and put into a dashboard. In that way, identifying it and raising concern can be 'my job', but investigating it can be a team effort.
> I just think we must have completely different working experiences.
Likely! I've generally worked in smaller orgs (including as part of a much larger org, as with my current employer) and there is less access to dedicated resources.
> Even when no BI team is dedicated, there's usually someone that's wearing that hat.
100%. Unfortunately, in my personal experience this has commonly been me.
> In that way, identifying it and raising concern can be 'my job' but when investigating it, it could be a team effort.
Totally agreed.
For some additional context, I've spent my working career on data systems so I likely feel a much stronger affinity to this type of self-serve analysis than your average bear.
Confirms my initial impression that the author (you) was likely on the receiving end of these requests and would rather teach people to fish than be the cook. Which is a great thing and certainly has a place, especially on smaller teams/orgs. So I think the bias is strong (a desire for others to self-serve) and ignores a lot of the realities of trying to 'manage up' in this way (the risks, inefficiencies, and skills gaps of having managers exercising technical chops). For that reason, I feel like promoting usage of Retool or something more GUI-based would be more successful than promoting that managers start using DuckDB and Python, et al.
From my POV, anybody interested in this sort of work has a choice: do it in a database, or do it in a specialized data analysis tool like pandas. What's your take on that?
There are times when pushing the work down to the database layer is appropriate - databases are quite good at a lot of these operations - but if you need more nuanced approaches (e.g. ANOVA, ARIMA, other kinds of forecasting or analysis), leverage the appropriate tools.
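As a rough sketch of that split (the dataset path and column names below are made up): let the database do the scan/filter/aggregate, then hand the small result to a stats library for the nuanced part.

    import duckdb
    from statsmodels.tsa.arima.model import ARIMA

    # Heavy lifting (scan, filter, aggregate) stays in the database engine.
    con = duckdb.connect()
    monthly = con.execute("""
        SELECT date_trunc('month', order_date) AS month,
               SUM(amount) AS revenue
        FROM read_parquet('orders/*.parquet')   -- hypothetical dataset
        WHERE amount IS NOT NULL
        GROUP BY 1
        ORDER BY 1
    """).fetchdf()

    # The more nuanced statistics happen in a specialized library instead.
    series = monthly.set_index("month")["revenue"]
    fit = ARIMA(series, order=(1, 1, 1)).fit()
    print(fit.forecast(steps=3))   # a simple three-month forecast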
The upcoming XTDB v2 is a SQL-first engine. We also built an experimental Clojure/Datalog-like 'XTQL' language to go along with it, to provide some continuity for v1 users, but the primary API is now SQL over the Postgres wire protocol, where we implemented a variation on SQL:2011 - see https://docs.xtdb.com/quickstart/sql-overview.html
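Because it's the Postgres wire protocol, an off-the-shelf Postgres driver should be able to talk to it - a minimal sketch with psycopg (the connection details here are placeholders; the quickstart above has the real ones):

    import psycopg  # a stock Postgres driver, no XTDB-specific client required

    # Placeholder connection string - check the quickstart for actual host/port/credentials.
    with psycopg.connect("host=localhost port=5432 user=xtdb dbname=xtdb") as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT 1 AS ping")
            print(cur.fetchone())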
> What do people usually do for HA with PostgreSQL, or do they just not care about it?
Patroni for most cases. At Heroku we have our own control plane to manage HA and fencing which works very reliably. I also like the approach the Cloud Native PG folks have taken with implementing it in the k8s API via the instance manager[1].
Other options like Stolon or repmgr are popular too. Patroni, despite the Jepsen testing, works without issues in the vast majority of circumstances. I wouldn't overthink it.
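If it helps build confidence, the moving parts are easy to observe: each Patroni-managed node exposes a REST API (port 8008 by default) reporting its role, which is what health checks and load balancers key off. A quick sketch (the node addresses are made up):

    import requests

    # Hypothetical node addresses; Patroni's REST API listens on 8008 by default.
    nodes = ["http://10.0.0.1:8008", "http://10.0.0.2:8008", "http://10.0.0.3:8008"]

    for node in nodes:
        status = requests.get(f"{node}/patroni", timeout=2).json()
        print(node, status.get("role"), status.get("state"))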
Yeah, this is the bit for me. We have almost no good OSS layers for folks to "plug and play".
It's a bit of a vicious circle - because there is low exposure, no one is building those layers. Because no one is building the layers, there is no exposure.
I didn't want ClickHouse to take all the glory. /s
The actual reason is that DuckDB's API, its integration into other places (e.g. Evidence), and its use of extensions (like the aforementioned gsheets one) give it priority for me.
Additionally, it's being used in a bunch more places, like pg_duckdb, which makes it more "worth it".
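The extension mechanism is a big part of that appeal - community extensions like gsheets are two statements away. A rough sketch (the actual read function belongs to the extension, so double-check its docs):

    import duckdb

    con = duckdb.connect()
    # Community extensions install/load with two SQL statements.
    con.execute("INSTALL gsheets FROM community")
    con.execute("LOAD gsheets")
    # The extension's functions are then available in plain SQL, e.g. reading a
    # sheet by URL (function name from memory - verify against the extension docs):
    # con.execute("SELECT * FROM read_gsheet('https://docs.google.com/spreadsheets/d/...')")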
Thanks for sharing! My choices are pretty coloured by personal experience, and I didn't want to re-tread anything from the book (Redis/Valkey, Neo4j etc) other than Postgres - mostly due to Postgres changing _a lot_ over the years.
I had considered an OSS Dynamo-like (Cassandra, ScyllaDB, kinda), or a Calvin-like (FaunaDB), but went with FoundationDB instead because, to me, that was much more interesting.
After a decade of running DBaaS at massive scale, I'm also pretty biased towards easy-to-run.
I agree Mongo is overhyped and attracts a lot of web newbies who only know JavaScript and don't want to think through schemas. One interesting newer feature, though, is time series collections. Unfortunately they're a bit buggy, but they're getting better and seem like a legitimate non-relational use case.
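For anyone who hasn't seen them, a time series collection is just declared at creation time rather than being a separate product - a quick sketch with pymongo (the names and URI are placeholders):

    from datetime import datetime, timezone
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    db = client["metrics"]

    # Time series collections (MongoDB 5.0+) are configured at creation time.
    db.create_collection(
        "cpu_usage",
        timeseries={
            "timeField": "ts",      # required: which field holds the timestamp
            "metaField": "host",    # optional: per-series metadata used for bucketing
            "granularity": "seconds",
        },
    )

    db.cpu_usage.insert_one(
        {"ts": datetime.now(timezone.utc), "host": "web-1", "value": 0.42}
    )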