Stateless – Evolving the architecture of Elasticsearch to simplify deployment (elastic.co)
126 points by soheilpro on Oct 7, 2022 | 25 comments



Many of the people commenting don't realize that for the majority of ES users, it is a pain in the butt to manage and scale their cluster storage in lockstep with their compute. Most companies use ES for relatively small-scale search, with people who are not experts in managing an ES cluster, or really infra at all.

A cloud offering that decouples storage from compute makes this a lot easier and becomes more of a no-brainer for 90% of the use cases.

If you're an outlier with TBs or PBs of search data, you can probably keep using on-prem if you want. Though I don't immediately grant that it's worse for your use case either, especially when object storage is insanely cheap, and assuming they provide an S3-compatible layer you can get away with anything from R2, S3, Ceph, MinIO, Backblaze, etc. This is very much a case you should benchmark/analyze as a proper engineer.
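For what it's worth, the "compat layer" side of that is usually just pointing the client at a different endpoint. A minimal sketch with boto3, where the endpoint URL, bucket, and credentials are made-up placeholders rather than anything real:

  # Sketch only: most S3-compatible stores (MinIO, Ceph RGW, R2, Backblaze B2)
  # can be addressed by overriding the endpoint; everything below is a placeholder.
  import boto3

  s3 = boto3.client(
      "s3",
      endpoint_url="https://minio.internal.example:9000",  # hypothetical endpoint
      aws_access_key_id="PLACEHOLDER_KEY",
      aws_secret_access_key="PLACEHOLDER_SECRET",
  )

  # Same API calls regardless of which backend sits behind the endpoint.
  s3.put_object(Bucket="search-segments", Key="indices/logs-000001/seg_42", Body=b"...")
  obj = s3.get_object(Bucket="search-segments", Key="indices/logs-000001/seg_42")
  print(obj["Body"].read())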

I've managed many TB ES clusters. This would have reduced my cost and time to manage it so I could focus on other features that would have benefitted my users.


I'm not familiar with this space, but I would have thought existing managed cloud offerings already decoupled storage from compute as far as the customer is concerned. Is that not true, or not as true as it could be with the new architecture?


Not as much true as it could be, at least with the offerings I'm aware of (AWS OpenSearch, Elastic.co's own cloud), and having used OpenSearch more extensively: you're still basically fully responsible for managing the cluster topology, and the "data" node type is what handles both storage and the important compute for search. If you need more storage you'll have to bump the EBS volume sizes on a new cluster and then do replication, or add a new node to the existing cluster and rebalance your data, both of which are very expensive operations. If you need more compute you're stuck with adding more storage regardless, though you can at least limit how much.
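To make the "bump the EBS volume sizes" part concrete, on AWS OpenSearch that one step looks roughly like the sketch below; the domain name, sizes and IOPS are purely illustrative, and the change can take a long while to apply:

  # Sketch: growing storage on an AWS OpenSearch domain by resizing its EBS
  # volumes. All values are placeholders.
  import boto3

  opensearch = boto3.client("opensearch", region_name="us-east-1")

  opensearch.update_domain_config(
      DomainName="my-search-domain",   # hypothetical domain
      EBSOptions={
          "EBSEnabled": True,
          "VolumeType": "gp3",
          "VolumeSize": 512,           # GiB per data node, illustrative
          "Iops": 3000,
          "Throughput": 250,
      },
  )

  # Poll until the configuration change has finished processing.
  status = opensearch.describe_domain(DomainName="my-search-domain")
  print(status["DomainStatus"]["Processing"])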

In the case of both services, they make these processes easier, but you still have time and expense considerations, even if it's only waiting time and not active work on your part. It also means you have a lot more monitoring concerns to implement and then account for.

Plus your choice of storage IOPS greatly affects query time, so you also have to consider that, and high-IOPS storage is expensive.

It's a lot to have to know for someone who just wants workable search for their product.

The new architecture they're proposing seems to allow for much more seamless scaling as your data grows without a bunch of manual intervention, monitoring and infra knowledge to make sure things don't fall over on you.


> Not as much true as it could be, at least with the offerings I'm aware of (AWS OpenSearch, Elastic.co's own cloud), and having used OpenSearch more extensively: you're still basically fully responsible for managing the cluster topology, and the "data" node type is what handles both storage and the important compute for search. If you need more storage you'll have to bump the EBS volume sizes on a new cluster and then do replication, or add a new node to the existing cluster and rebalance your data, both of which are very expensive operations. If you need more compute you're stuck with adding more storage regardless, though you can at least limit how much.

This, absolutely. I'll add that I once tried to look at the Elastic.co offering for an ELK solution that could handle the logs we are currently managing internally (on EC2, managed by us with some automation), and you basically had to specify all the details of the topology to get a quote. It was basically a layer on top of the AWS Cost Calculator with their margin baked in.

No way you can tell them "I send on average X bytes/day, I want a quick answer for Y days, a slower answer for Z days, and a global retention of XY days" and get back "you will pay approx. NNN dollars/month in the HA config, MMM in the non-HA config".
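The estimate itself is not hard to approximate yourself; the hard part is getting the vendor to commit to it. A toy calculation, where every unit price is an invented placeholder rather than anything from an actual Elastic or AWS price list:

  # Toy estimate: X GB/day ingested, Y days on "fast" (hot) storage,
  # Z days on "slow" (object) storage. All prices are made up.
  ingest_gb_per_day = 200          # X, illustrative
  hot_days, warm_days = 7, 83      # Y and Z, illustrative
  hot_price_per_gb_month = 0.50    # placeholder $/GB-month, hot tier
  warm_price_per_gb_month = 0.03   # placeholder $/GB-month, object storage
  replication_factor = 2           # HA config; use 1 for non-HA

  hot_gb = ingest_gb_per_day * hot_days * replication_factor
  warm_gb = ingest_gb_per_day * warm_days  # the object store handles durability itself

  monthly = hot_gb * hot_price_per_gb_month + warm_gb * warm_price_per_gb_month
  print(f"~${monthly:,.0f}/month for storage alone (compute not included)")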


Quickwit is another project aiming at this sort of stateless search: https://quickwit.io/ . I've been keeping an eye on it for a possible project that would involve wanting to offer search over a lot of data for an application that wouldn't have very many users, such that keeping a big ES box around all the time would be needlessly expensive; I had been looking for a search situation where my only always-on costs were for storage, and I could pay for compute as needed (and ideally also separate paying for indexing compute, which doesn't happen often in this application, from search compute).

Interesting that ES might also end up with this kind of an offering.


Managing storage and compute together is one of the biggest issues with Elasticsearch; it makes running a cluster a fairly complex task, so I think it's natural to see them move in this direction.

I imagine that, especially after the OpenSearch debacle, they're keen to make sure they don't lose even more market share, so they're paying particular attention to the way the wind blows and wanted to do this PoC so they can at least compete with people already in the stateless space, like Quickwit.


And for those who cannot wait, there is quickwit :-) https://quickwit.io


I'm not well versed in Elasticsearch and the like, but I have a project in mind… Quickwit mentions logs in its first header. Is it for generic searching, or is it specifically for searching logs?


It can be used to search things other than logs, but it has to be large datasets of append-only data: emails, chat, web crawl data, logs...


> All your nodes are stateless, no more cluster babysitting

This is music to my ears! Analytical databases are all so damn complicated to manage, with a dozen different stateful node types.


How is it that the S3 API is remotely fast enough to make this work?

A search engine that operates at any kind of scale needs to skip through very large files to evaluate a query, so you need very low-latency, high-bandwidth access to disk. A search engine instance that accesses files on a local SSD is an order of magnitude faster than one that puts files on EBS.

They make some mention of local caching, but the devil is in the details here. Does all data get copied to local cache? What is the performance here?
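My understanding (from the post, not from their implementation) is that the general shape is a read-through cache of object-store byte ranges on local SSD. A minimal sketch, with made-up bucket/key names and a trivial on-disk cache:

  # Sketch of a read-through cache: fetch a byte range from the object store
  # only when it isn't already on local disk. Paths and names are placeholders;
  # a real implementation caches at block granularity, evicts, prefetches, etc.
  import hashlib
  import os
  import boto3

  s3 = boto3.client("s3")
  CACHE_DIR = "/var/cache/search-segments"  # hypothetical local SSD path

  def read_range(bucket: str, key: str, start: int, end: int) -> bytes:
      name = hashlib.sha256(f"{bucket}/{key}/{start}-{end}".encode()).hexdigest()
      path = os.path.join(CACHE_DIR, name)
      if os.path.exists(path):              # cache hit: local SSD latency
          with open(path, "rb") as f:
              return f.read()
      resp = s3.get_object(                 # cache miss: one S3 round-trip
          Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
      data = resp["Body"].read()
      os.makedirs(CACHE_DIR, exist_ok=True)
      with open(path, "wb") as f:
          f.write(data)
      return data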


If you're doing a single round-trip it's really not bad. You don't get that big an impact compared to the round-trip to the user.

If you are doing multiple dependent loads, e.g. loading an index that tells you which part of the data to load, which in turn tells you which other related table to look into (e.g. a complex join)... that would be bad.
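Back-of-envelope, with assumed (not measured) latencies:

  # ~1 ms for a local SSD read vs ~50 ms for an S3 GET; both numbers are assumptions.
  ssd_ms, s3_ms = 1, 50
  print(s3_ms)                        # one independent fetch: 50 ms, hidden behind the user round-trip
  print(4 * s3_ms, "vs", 4 * ssd_ms)  # four dependent lookups: 200 ms vs 4 ms, latencies add up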


Probably somewhat similar to how Trino/Presto/Bigtable/Spanner work, but targeted at search: decompose the query into a set of highly parallelizable steps and execute them simultaneously over the data, using some specialized storage format for rapidly indexing into each file plus some really nice heuristics; drop all the files without a potential hit, aggregate the rest, and then do a more classical search over the vastly reduced set of candidate files in memory.
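A crude sketch of that shape; might_match() and search_file() are hypothetical stand-ins for whatever per-file statistics and scan logic the storage format actually provides:

  # Sketch: prune candidate files by cheap per-file metadata, then fan the
  # expensive scan out over the survivors in parallel.
  from concurrent.futures import ThreadPoolExecutor

  def might_match(file_meta: dict, query: str) -> bool:
      # e.g. a bloom filter, term dictionary or min/max stats kept per file
      return query in file_meta.get("terms", set())

  def search_file(path: str, query: str) -> list:
      # the "classical" search over one file, only reached for survivors
      return []

  def search(files: list, query: str) -> list:
      candidates = [f for f in files if might_match(f, query)]    # cheap pruning
      with ThreadPoolExecutor(max_workers=16) as pool:            # parallel scan
          results = pool.map(lambda f: search_file(f["path"], query), candidates)
      return [hit for file_hits in results for hit in file_hits]  # aggregate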

I know Presto isn't focused on search, but Athena (AWS-branded Presto) can do some really fast queries over S3; the issue is cold-start time on the compute. For a similar solution focused on search, maybe you keep the compute always warm and work from there.


So they will be forcing my few dozen terabyte ES cluster that today runs fully on dedicated physical hardware to use extremely expensive cloud storage services instead? What an awful idea!

I hope the option to fully self-host on dedicated hardware remains viable, as I enjoy the high performance, low price and full control of my own systems.


You can easily emulate object storage with e.g. S3-compatible APIs. And probably it will just support whatever file system, including NFS (currently not supported), so that should not be a showstopper. Technically this could actually reduce your cost by a lot, because you would be able to use some central network storage instead of having to have a lot of high-end SSDs. You might still want to use those for caching, of course. But getting rid of some of the data replication might actually help lower your cost.
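Today's closest analogue is the S3 snapshot repository (also used by searchable snapshots), which already lets you point at a non-AWS endpoint, and presumably the stateless storage layer would be configured in a similar spirit. A sketch against the existing REST API, with placeholder names; the endpoint itself lives in elasticsearch.yml as s3.client.<name>.endpoint and the credentials in the keystore:

  # Sketch: registering an S3-type snapshot repository that points at a
  # self-hosted S3-compatible store. Bucket, client name and auth are placeholders.
  import requests

  resp = requests.put(
      "http://localhost:9200/_snapshot/my_object_store",
      json={
          "type": "s3",
          "settings": {
              "bucket": "es-snapshots",  # placeholder bucket
              "client": "minio",         # matches the s3.client.minio.* settings
          },
      },
      auth=("elastic", "changeme"),
  )
  print(resp.json())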

As Elasticsearch is closed source, you can also choose to switch to Opensearch which will of course not get any of these changes. Though I would not be surprised to see this move mirrored on their side as it makes a lot of sense to do this. But it would end up being an independent implementation of the same concept.

As a long-time Elasticsearch user, this stateless architecture makes a lot of sense to me, especially for very large clusters. Basically, it vastly simplifies scaling and cluster operations. You can literally auto-scale nodes on both the indexing and querying tiers. That's a big deal. You want it faster? Add more nodes. Likewise, upgrades are a lot easier: simply bring new nodes online and re-index to some new objects, and once that is done, take down the old ones. It also simplifies testing. You can simply bring up a few test nodes and query your production data without having to worry about affecting production loads. Everything gets easier.
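The "re-index to some new objects" step already has a direct API today; a sketch of that kind of call, with placeholder index names:

  # Sketch: copying an index into a new one (e.g. backed by the new storage)
  # with the existing reindex API. Index names and auth are placeholders.
  import requests

  resp = requests.post(
      "http://localhost:9200/_reindex?wait_for_completion=false",
      json={
          "source": {"index": "logs-2022-old"},
          "dest": {"index": "logs-2022-new"},
      },
      auth=("elastic", "changeme"),
  )
  print(resp.json())  # returns a task id to poll while the copy runs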

It will be interesting to see how they will bring this to market. This reads like they are starting the work on this, not like they are ready to deploy this. I guess this would be part of a future major release and they just had their previous one fairly recently.


How is Elasticsearch "closed source"? The code is literally right here: https://github.com/elastic/elasticsearch


Probably meant "source available", since neither of Elasticsearch's code licenses is open source (going by OSI approval).


We're not really talking licenses here. The very definition of "open source", according to OSI, is: "Open source software is software with source code that anyone can inspect, modify, and enhance."

Elasticsearch meets the above definition.

The permissiveness of the modification and enhancement is a different topic.


> You can easily emulate object storage with e.g. s3 compatible APIs

This seems right in theory, but when you actually look at the details it stops being true.

Specifically, Lucene today is best served through memory mapping the files and using crazy amounts of RAM.

My guess is that Elastic just no longer cares about latency-sensitive users. Even going from ES 5 to ES 7 there was a performance regression for queries using any multi-word synonyms. It's likely this trend is set to continue with remote storage.

I still think it’s the correct trade off for ES. Most of their clients use them for latency-insensitive analytics/log type stuff.


I believe it would be cheaper to run stateless (given the reduced devops overhead)? It is an open secret that the most expensive commodity in a datacenter (servers) is also the most underutilised [0]. Hence, folks like me prefer to pay a per-query cost.

Elastic had to do this in face of competition from the likes of quickwit.io and snowflake search. AWS might have something similar up their sleeve, because I don't believe Athena/Trino can do super fast searches, yet.

[0] https://youtube.com/watch?v=dInADzgCI-s&t=538


What they are really doing is offloading the job of persisting data from ES.

So you wouldn't need to use S3 or some s3 emulator like other comments have suggested.

Give the ES cluster a disk it can write to that is guaranteed to be replicated (i.e. a RAID array) and that's it. The limits of object storage (no appending, no in-place editing) tend to make those clusters cheaper, so you could also run something like Ceph locally. It's probably going to do a better job of persisting data than ES, too.


Fingers crossed they support any S3-compatible local object storage system (e.g. Ceph); then all you need to do is completely change your storage system and probably duplicate its capacity during the migration.

If this is the future of ES I don’t see a happy path for non-cloud customers…


> Fingers crossed they support any S3-compatible local object storage system

Do you mean they "only support"?


MinIO


...which means another layer that eats up performance?



