I hated Splunk so much that, a few months ago, I spent a couple of days writing a single 1200-line Python script that does absolutely everything I need in terms of automatic log collection, ingestion, and analysis from a fleet of cloud instances. It pulls in all the log lines, enriches them with useful metadata (the instance's IP address, the machine name, the log source, the datetime, etc.), stores it all in SQLite, and then exposes that through a very convenient web interface using Datasette.
I put it in a cronjob, it can be customized super easily and quickly, and it's infinitely better (at least for my purposes) than Splunk, which is just a total nightmare to use. My coworkers all prefer it to Splunk as well. And oh yeah, it's totally free instead of costing my company thousands of dollars a year! If I owned CSCO stock I would sell it-- this deal shows incredibly bad judgment.
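For the curious, the core of it is roughly this shape (a stripped-down sketch, not the actual script; the host list, the ssh + journalctl fetch, and the table layout here are made up for illustration):

    import sqlite3
    import subprocess
    from datetime import datetime, timezone

    # Illustrative only: in reality the host list comes from the cloud provider's
    # API, and there are several log sources per machine, not just journald.
    HOSTS = [("10.0.1.17", "web-01"), ("10.0.1.18", "web-02")]
    DB_PATH = "logs.db"

    def collect(ip):
        # Pull recent log lines off the box; ssh + journalctl is just a placeholder
        # for however you actually fetch them.
        result = subprocess.run(
            ["ssh", ip, "journalctl", "--since", "10 minutes ago", "--no-pager"],
            capture_output=True, text=True, check=False,
        )
        return result.stdout.splitlines()

    def main():
        conn = sqlite3.connect(DB_PATH)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS logs "
            "(collected_at TEXT, host_ip TEXT, host_name TEXT, source TEXT, line TEXT)"
        )
        now = datetime.now(timezone.utc).isoformat()
        rows = [
            (now, ip, name, "journald", line)   # enrich each raw line with metadata
            for ip, name in HOSTS
            for line in collect(ip)
        ]
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?, ?)", rows)
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        main()

Then "datasette logs.db" pointed at that file is the entire web UI.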
For how many data sources? The whole reason everyone goes to Splunk is that it scales, and scales incredibly well.
Large enterprises can generate hundreds of terabytes to petabytes every day. Splunk has all sorts of issues, but to pretend as if you can replace them in any large shop with a 1200-line Python script and SQLite is just being disingenuous. This acquisition falls right into Cisco's sweet spot: they aren't chasing shops that can dump all their security and infrastructure logging into a SQLite database without it tipping over within an hour.
It's around 6 data sources on ~25 machines, but it could be easily scaled to way more than that with a bit of work. And I mean less work than it takes to do even trivially simple things using the horrible Splunk API. There are many thousands of small companies using Splunk and getting totally ripped off for a very mediocre product with a rapacious and annoyingly aggressive salesforce.
You'd be surprised how many companies with infra that small have CTOs get consultant buzzword pilled into buying every SaaS under the sun nonetheless...
How many servers does Stack Overflow run on? It’s not a good measure of data volume or criticality.
I think “expensive” here is basically relative to revenue/margin. Where margins are high, spending on Splunk (etc.) isn’t meaningful. Where margins are thin, it hurts.
Basically, the arguments here seem to reflect the markets and business model folks are working under. Some pay, some can’t and some won’t - all valid.
I haven't developed it yet, but my Splunk-killer solution actually scales so big we can use it to walk to the center of the universe. And it's only one line of Rust plus a bash script that runs whenever the Unix clock has 420 in the number string.
I think we're talking about very different levels of scale. Depending on their size, enterprises are generally feeding tens to hundreds of thousands of sources into Splunk between servers, networking gear, endpoint devices, etc.
Wait, what? This is such an important detail. Log aggregators like Splunk start being something to consider when you get to about 25 THOUSAND machines, not 25 machines. I hope that, for you, humility will come with experience.
Splunk isn't perfect. Managing it is more work than it should be for example. But I've got hundreds of systems I'm pulling logs from and that's not counting infra and applications as well. And my deployment isn't even a large one by their standards. Your use case just isn't the scale where splunk makes sense.
Splunk does not scale to large data sources. It fucks out at a few TB and then you have to spend hours on the phone trying to work out which combination of licenses and sales reps you need to get going again.
By which time you can just suck the damn log file and grep it on the box.
But (and this is not meant as criticism or insult, since I have no idea how Splunk works and am just going on other comments): do you know what license your company has with them? It sounds like it scales fine if you are paying them millions, and otherwise it does not?
Well, usually you have to overpurchase up front, and they sell you a 3-year lock-in to make the capital cost affordable. Then when you eke over the limit temporarily, the sales guy calls you up within 10 nanoseconds to bill you for more.
I was getting 2-4 calls a week.
It was so fucking annoying and expensive ($1.2M spend each cycle) we shitcanned the entire platform.
First thing they hear of this is when our ingress rate drops to zero and they phone us up to ask what is happening. Then we don't go to the numerous catch up and renewal meetings and calls. Then we stop answering the phone.
Had a similar experience with them, they are truly the worst. We wasted a bunch of time trying to figure out how the ingestion volume could be so high and then realized that 99% of it was from the ridiculous default settings of their universal collector agent which was dumping detailed system stats every few seconds-- all to drive up usage so they can harass you about spending more money on their awful product. I did the renewal call with them just to basically tell them how outrageous their company is.
Yeah, because that is what I meant. A lot of services are usable without paying through the nose; this one apparently is not, but thanks for the excellent input.
I'm certainly not a Splunk expert and I CERTAINLY have no insight into the nature of our financial arrangement with them, but yeah, it's expensive.
I think there's not much of a useful "flat rate" tier; you pay based on usage. People can accidentally spin up a ton of EC2 instances and get a huge surprise AWS bill, too. And yeah our logging needs are high and monotonically increasing but they're also relatively predictable at our scale.
It ALSO turns out though that Splunk is really really good at their job and matching their expertise would require tons of engineering effort and it's not like the disk space alone is THAT cheap if you want it to be searchable.
I've worked at companies with objectively large amounts of data. Splunk scaled to meet their workloads. At no enterprise doing this is someone able to just isolate a single log file and grep through it at scale.
Well, according to what people write in this thread, a distributed grep or some other way to organize a decent central logging system might be a necessary part of the core competency. Because if they buy splunk instead, they might go bankrupt.
You don’t have to be Splunk to make money out of distributed grep, but it turns out not to be that easy… as proven by the fact that there are quite a few competitors.
Uhhhh, Splunk scales no matter the size, at least for pure ingest. Now, if you got duped into the SVC model, I can see what you mean. But for pure gigs/day ingest, if you know what you're doing, it can scale infinitely.
This mostly sounds like a badly managed Splunk. If a 1200 line Python script is all you need to replace a Splunk instance, you weren't doing anything all that interesting or well in the first place.
> useful metadata like the IP address of the instance, the machine name, the log source, the datetime,
This should be tagged on every single log line already, and not something that you should be doing post-ingestion
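i.e., emit something like this at the source (just a sketch; the JSON shape and field names are an example, not any particular standard), so nothing downstream has to reconstruct it:

    import json
    import socket
    import sys
    from datetime import datetime, timezone

    HOSTNAME = socket.gethostname()
    # Example only: in a real deployment you'd pull the instance's address/ID from
    # the cloud instance metadata service rather than resolving the hostname.
    IP = socket.gethostbyname(HOSTNAME)

    def log(source: str, message: str) -> None:
        # One self-describing JSON object per line, with the metadata baked in.
        sys.stdout.write(json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "host": HOSTNAME,
            "ip": IP,
            "source": source,
            "msg": message,
        }) + "\n")

    log("payments-worker", "retrying webhook delivery")

Then the collector's only job is to ship the lines, not to guess where they came from.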
The logs included things like the systemd logs and stuff that I don’t have control over. You need to be able to enrich with arbitrary metadata for it to be generally useful.
My point is more that a large portion of Splunk customers could do the same thing I did and be way better off. Obviously not their huge enterprise customers spending millions a year.
My complaint is that this acquisition is going to add another 1-4 paragraphs of examinable marketing copy to the Cisco CCNP ENCOR textbook. I'll have to somehow remember not to confuse Splunk with Cisco Firepower NGIPS, which uses Snort. This is what happens when an industry starts to name its products after the sound effects from Peppa Pig.
While it doesn't compete with Splunk, IMHO, it's much easier and much better than what 1,200 lines of Python could conjure up. Dashboarding and all. I love it and use it in a very large enterprise environment.
Well no, Dropbox is aimed at non-technical users. Sure, they have "enterprise" features for admins now, but that's not how it started, and in the end the product is overwhelmingly consumed by non-technical users.
I hear you, but the difference is that Dropbox is actually good and reasonably priced. Splunk is horrible to use and costs 1,000x what it should, and they are super aggressive about harassing you over usage caps and constantly threatening you with huge price hikes. Dropbox has barely raised prices over the years (until pretty recently, at least) and has been rock solid and amazing.
Great, finally someone who actually does that. So many examples here with people whining about their Dropbox thingy in 4 lines of Perl but never releasing anything for us to check out. Well done!
That “thousands of dollars per year” number seems quite a bit low for a Splunk license. Even for a small amount of data it’s more like thousands per month.
Well, today you are doing 100KB of log processing; who knows, tomorrow you may end up doing 500KB of log processing. It will be all hands on deck late on a Friday night to eliminate this existential threat.
I used SumoLogic at my last job, which feels basically the same as Splunk. (Maybe not as fast? No idea on price.) There were times when it was easier to sync 45 GB of logs from S3 down to my laptop and run grep over them than it was to figure out the right arcane syntax and wait for the results. :-)
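Roughly the same idea in Python, for anyone who wants to skip the query language entirely (bucket, prefix, and pattern are made up; in practice it was just aws s3 sync plus grep):

    import re
    import boto3  # assumes AWS credentials are already configured

    BUCKET = "my-log-archive"      # made-up bucket name
    PREFIX = "app-logs/2023/09/"   # made-up prefix
    PATTERN = re.compile(r"ERROR|timeout")

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"]
            # Stream each object line by line instead of downloading everything first.
            for raw in body.iter_lines():
                line = raw.decode("utf-8", errors="replace")
                if PATTERN.search(line):
                    print(f"{obj['Key']}: {line}")

Slow, but at that size it finishes before you've worked out the right query syntax.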
This comment is incredibly naive. Cisco isn't making acquisition decisions based on your happiness. Splunk's revenue is increasing every year and its losses are shrinking. It is an incredibly popular tool that complements Cisco's products and services well.
I don't know about their router/switch OSes in particular, but a lot of their products already have Splunk integration and they seem to have a couple of products built on top of Splunk.
There's quite a few log ingestion programs that can do all that for you. Did you have some type of specialized log that one of the various logging tools couldn't handle for some reason? It sounds like you recreated the ELK stack lol.
I used Vector in the Beaker Studio prototype back when it was designed to deploy directly to Ubuntu virtual machines. That was a couple years ago at this point, and it worked wonderfully!
It's weird seeing no mention of Graylog anywhere here which is slightly different but I've found much easier to use in smaller setups. Unfortunately I have no idea what enterprise cost ends up looking like.
Why build, in this day and age, when so many open source solutions backed by the OpenTelemetry standard are available? Use fluentbit/vector/otel-collector to capture the data and send it to some open source backend.
Because I find all that stuff to be even more mental overhead to learn and work with, and super annoying to deploy and manage. It would literally take me longer to get one of those kinds of tools to work on my data the way I want it than it took me to make my own tool that does exactly what I want, exactly the way I want it, where it's incredibly trivial for me to add new kinds of logs or anything else.
When you have a hugely complex, designed-by-committee, enterprise-grade generic system/protocol like OpenTelemetry that does anything and everything at any scale, it's always going to carry a huge amount of excess complexity when you're just trying to do one specific, simple thing well and quickly. It would be harder to figure out the config files for that stuff than it was to just make my own system.