I think an underestimated issue with k8s (et al) is on a cultural level. Once you let in complex generic things, it doesn't stop there. A chain reaction has started, and before you know it, you've got all kinds of components reinforcing each other, that are suddenly required due to some real, or just perceived, problems that are only there in the first place because of a previous step in the chain reaction.
I remember back when the Cloud first started getting a foothold that what drew people to it was that it would reduce the complexity of managing the most frustrating things, like the load balancer and the database, albeit at a price of course, but it was still worth it.
Stateless app servers, however, were certainly not a large maintenance problem. But somehow we've managed to squeeze things like k8s in there anyway; we just needed to evangelize microservices to create a problem that didn't exist before. Now that this is part of the "culture", it's hard to even get beyond hand-wavy rationalizations that microservices are a must, presumably because they're the initial spark that triggered the whole chain reaction of complexity.
Cloud providers automate things like lease renewals, dealing with customs and part time labor contract compliance disputes for that datacenter in that Asian country that you don't know the language of.
I'm constantly fascinated by how people hand-wavingly underestimate the cost and headaches of actually running on prem global infrastructure.
I’m constantly fascinated by people who think they need on prem global infrastructure when the vast majority of applications either have very loose latency requirements (multiple seconds) or have no users outside of the home country.
Two datacenters on opposite sides of the US from different providers will get you more uptime than a cloud provider and is super simple.
While some of the complexity goes away when it's on prem in two parts of the US, having to order actual hardware, put it into racks, hire, train, and retain the people there to debug actual hardware issues when they arise, deal with HVAC concerns, etc. is a lot of complexity that's probably completely outside of your core business expertise but that you'll have to spend mental cycles on when actually operating your own data center.
It's totally worth it for some companies to do that, but you need to have some serious size to be concerned with spending your efforts on lowering your AWS bill by introducing details like that into your own organization when you could alternatively spend those dollars to make your core business run better. Usually your efforts are better spent on the latter unless you are Netflix or Amazon or Google.
I recently rented a rack with a telecom and put some of my own hardware in it (it's custom weird stuff with hardware accelerators and all the FIPS 140 level 4 requirements), but even the telecom provider was offering a managed VPS product when I got on the phone with them.
The uptime in these DCs is very good (certainly better than AWS's us-east-1), and you get a very good price with tons of bandwidth. Most datacenter and colo providers can do this now.
I think people believe that "on prem" means actually racking the servers in your closet, but you can get datacenter space with fantastic power, cooling, and security almost anywhere these days.
Think of it as a spectrum. At the top is AWS Lambda or something like it, where you are completely removed from the actual hardware that's running your code.
At the bottom is a free acre of land where you start construction and talk to utilities to get electricity and water there. You build your own data center, hire people to run and extend it, etc.
There is tons of space in between where compromises are made by either paying a provider to do something for you or doing it yourself. Is somebody from the datacenter where you rented a rack or two going in and pressing a reset button after you called them a form of cloud automation? How about renting a root VM at Hetzner? Is that VM on prem? People who paint these tradeoffs in a black and white manner and don't acknowledge that there are different choices for different companies and scenarios are not doing the discussion a service.
On the other hand, somebody who built their business on AppEngine or Cloudflare Workers could look at that other company who is renting a pet pool of EC2 instances and ask if they are even in the cloud or if they are just simulating on-prem.
I think the question people are really interested in is usually "What percentage over my costs would I pay to outsource X?" (where X is some component of the complexity stack)
Which, first order approximated, is a function of (1) how big a company you are (aka "Can you even afford to hire two people to just do X?") and (2) how competitive the market is for X.
Colo and dedicated VMs are so reasonably priced because it's a standardized, highly competitive market.
Similarly, certain managed cloud services are ridiculously expensive because they have a locked-in customer base.
Which would suggest outsourcing components that have maximum vendor competition and standardization, as they're going to be offered at the lowest margin.
There's also a good point here (at least at the top of the stack) about reliability: the top of the spectrum goes down relatively frequently due to its dependencies, but even plain old boring EC2 has much better reliability than services like Lambda.
>I think people believe that "on prem" means actually racking the servers in your closet, but you can get datacenter space with fantastic power, cooling, and security almost anywhere these days.
That's because that is what on prem means. What you're describing is colocating.
When clouds define "on-prem" in opposition to their services (for sales purposes), colo facilities are lumped into that bucket. They're not exactly wrong, except a rack at a colo is an extension of your premises with a landlord who understands your needs.
> having to order actual hardware, putting it into racks, hiring, training, retaining the people there to debug actual hardware issues when they arise, dealing with HVAC concerns, etc is a lot of complexity that's probably completely outside of your core business
Vertical integration is a widely known and understood business strategy - running your own infrastructure lets you reclaim the cloud provider's margins for yourself.
You can do it as a one man band or a huge multinational.
I use hotels and other rental offerings, including the cloud. But when it is advantageous to do so, I buy and own, even though it comes with maintenance burdens.
I would say that a large portion of b2b or internal software falls into this category. If you are building something for a single business that only operates in one jurisdiction, and you don't do i18n, why bother with global distribution? A lot of b2b stuff covers processes that have legal requirements baked in, like tax handling, or other local assumptions.
Are the majority of applications even developed by "companies"? I'm honestly not sure at all, or even how to go about measuring that. By the numbers, most games on steam are developed by individuals or small teams, even if the bulk of sales are driven by games produced by larger companies. I'd imagine the same is true of app stores, too.
Do you know how many things businesses use software for? Look at every local business you go to and enumerate the things they use software to accomplish.
When they pay their local utility bills, is that international? How about paying their rent? How about filing their state taxes? How about ordering from local suppliers?
Very little of the world is international business. That tiny slice that is just dominates the zeitgeist because it’s international.
For some examples of things well known that absolutely don’t need global data centers:
- Airbus and Boeing
- Coca-Cola
- Marriott and Hilton
- the entire US federal government (apart from some maybe military applications)
- McDonald’s
The list goes on forever because it’s literally nearly every business. Unless you’re in real time markets or operating store fronts globally where latency hurts sales, putting up regions all over the world is a complete and utter waste of money.
Making global regions as easy as a click of a button was one of the greatest marketing ploys of cloud providers to date.
“Of course Goodwill needs a Singapore data center!? How will we meet our P99 goals otherwise?”
McDonald's absolutely does need global data centers. Even now the kiosks are frustratingly slow here in Europe, can't imagine what a round-trip to US servers would do.
My siblings work there, and their work information system is not so bad, but it would definitely be totally frustrating if it didn't run on AWS in the EU.
If McD does need global data centers to show the menu on the kiosks, then they should fire their whole IT dept and start from scratch.
There is absolutely zero reason for a kiosk to touch 'a global data center'. No, not even for a payment, because it just asks the payment terminal whether the payment succeeded or not.
The only motivation would be latency, but you could have specialized services that run at the edge, if that's so important, for example if payment verification should take 500ms instead of 3000ms.
But you could also just rewrite the protocol to have less back-and-forth sequential data exchange, which is a smarter approach.
I absolutely disagree. The restaurant managers really shouldn't need to manage servers too. They get the kiosk as a service that they don't need to care about and that is correct.
So McDonald's should be dispatching kiosk admins all around the world? That's very much not eco-friendly. And of course, expensive... And a total nightmare to manage. K8s and AWS are a night walk through the rose garden compared to that.
> No, not even for a payment, because it's just asks the payment terminal if the payment succeeded or not.
Yeah sure, works great as long as the kiosk doesn't crash during the payment.
> The restaurant managers really shouldn't need to manage servers too.
And they don't. Why the hell do you pull this nonsense?
In case you don't know (looks like you don't), these are Windows machines[0]. Any Windows machine is capable of running IIS or even Apache. No need for servers for managers to manage. Just effing serve it locally if you can't provide each McD with a mini box that is managed by central IT.
> should be dispatching kiosk admins all around the world
LOL
> K8s and AWS is a night walk through the rose garden compared to that.
You need a sysadmin to teach you how to build robust and resilient local applications and services without being Eco-unfriendly and without racking billions in AWS bills.
> Yeah sure, works great as long as the kiosk doesn't crash during the payment.
Do you understand that it is the payment terminal that processes the payment, and that you can't offload reading CC/NFC to the 'cloud'?
And finally, there is already 'IT' infrastructure in each McD: Ethernet switches, UPSes, wireless APs and maybe a controller, menu and order screens at the counter, networked PoS and whatever else. If you claim that restaurant managers manage all of these, then you don't really know anything about either retail or IT. Or you're just bickering in bad faith.
[0] and if in your case they are Linux ones... do I even need to continue?
What sysadmin? They don't have any. The kiosks are entirely black boxes managed centrally from the US. They work as long as they have a working network. The store manager can manage them from their iPad lying on a beach. The card terminals send data directly to the bank, so even if the box crashes or burns down mid-transaction, the payment is still recorded.
I built similar kind of self service box (self checkout) for a local store chain. I know the problem well. Main problems are costs, system administration and management. Our solution is just a simple Chrome kiosk mode browser window too. It's a smart solution.
We run it on AWS because there's no reason not to - simply pushing the SPA to S3 behind CloudFront beats any other kind of deployment/hosting method. Some backend stuff runs in Lambda, some runs in ECS containers. All data is in a managed RDS DB. Easy to use, easy to maintain, easy to upgrade, easy to deploy, easy to scale from zero to several thousand kiosks...
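A minimal sketch of that deploy path (bucket name and distribution ID are placeholders, not the real setup):

    # build the SPA, sync it to S3, then invalidate the CloudFront cache
    npm run build
    aws s3 sync ./dist "s3://kiosk-spa-bucket" --delete
    aws cloudfront create-invalidation \
      --distribution-id "E123EXAMPLEID" \
      --paths "/*"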
I used to be a Windows admin (though admittedly the last Windows I managed was 2003 R2). Just the word IIS makes my neck hair stand up.
Especially if it should be holding payment data... Huh, damn. Wow.
You'd need to sync to a global service anyway - it's a multi-store chain. Your suggestion just makes everything harder and more convoluted - and it's just wrong: the payment terminal has its own network connection and doesn't need the kiosk to work at all. The kiosk just initiates the payment; the rest is handled outside of it, and the kiosk waits for the central service to confirm the payment. It's bullshit to connect the kiosk to the bank: we support many banks and keep adding support for more, and what if a kiosk is stolen - should we always be ready to rotate the certs on thousands of devices? The bank doesn't have an appropriate API, so it's either that or, again, a central service. Nah, we just turn it off in our admin panel.
I also used to be a Linux sys admin way before clouds. I built one of the first cloud services in my country to solve the problems of that. The company ultimately was out competed - but not by traditional hosting. By the big public clouds. Apparently the problems are real.
BTW coincidentally I'm just about to launch a new cloud platform. Nothing fancy really, but it's built on dedicated servers for performance. Last 3 weeks I spent working on stuff I could've done with 100 lines of Terraform/Pulumi with AWS. Maybe a proper sys admin like you could teach me? I am not happy with it at all, it's a major headache and I'm considering a hybrid setup because I just don't want to lose sleep over customer data and site availability.
Most software is internal software used by a handful of people in a certain department of a company. Even in large multinational conglomerates the applications used by every single country subsidiary can be counted using your fingers.
There are tons of examples where low latency is good for business, even for small businesses. I'm sure you've seen the studies from Amazon that every 100ms of page load latency costs them 1% of revenue, etc. Also, everything communication-related is very latency sensitive.
Of course there are plenty of scenarios where latency does not matter at all.
So you can trade off 300ms of additional roundtrip time (on anything non-CDNable) at a cost of 3% of revenue and reduce your infrastructure complexity a lot.
Disagreed. Once we're not talking about a worldwide shop for non-critical purchases like Amazon, the picture changes dramatically. Many people on local markets have no choice and will stick around no matter how slow the service is.
Evidence: my wife buying our groceries for delivery at home. We have 4-5 choices in our city. All their websites are slow as hell, and I mean adding an item to a cart takes a good 5-10 seconds. Search takes 20+ seconds.
She curses at them every time, yet there's nothing we can do. The alternative is for both of us to travel on foot 20 minutes to the local mall and wait in queues, 2-3 times a week. She figured the slow websites are the lesser evil.
Your mind is stuck in low cost retail shopping. Even setting aside the likely self-serving nature of that “study”, most interactions are not latency sensitive like that.
When I go to book my colonoscopy on my hospital’s reservation system, I don’t bail out and look for a new doctor if it takes me 10 tries.
There are very few businesses where UX latency at the sub second level matters and the ones that do are not the ones you want to be in.
I've spent a good chunk of my career in realtime audio/video conferencing and am now working on APIs for B2B SaaS, thanks ;)
I also know that user experience of any application suffers a lot when latencies are high. Your point seems to be that there is lots of software that doesn't care about its user experience (mostly because the people making buying decisions are not the people suffering from those decisions) and that's a fair point, but I don't think that is a great business strategy for any software business.
Of course there are lots of scenarios where latency literally doesn't matter at all.
My startup hosts our own training servers in a colo-ed space 10 min from our office. It took less than 40 hours to get moved in, with most of the time spent tinkering with FortiGate network appliance settings.
Cloudflare zero trust for free is a huge timesaver
To call a halt to your constant fascination: they don't all have that problem. They still get the complexity of cloudy things when they use one, regardless.
They also get some of the complexity of cloudy things when they run their own datacenter. In the end you find stuff like OpenStack which becomes its own nightmare universe.
Not sure how you could jump all the way to running your own Asian datacenter from my post. A bit amusing though :). I even wrote that it's worth running the LB/DB in the Cloud?
Oh it was more of an addition to your point about "reducing complexity of managing the most frustrating things like the load-balancer and the database, albeit at a price of course". There is a whole mountain of complexity that most software engineers never think about when they dream about going back to the good old on prem days.
Alright, it just feels like taking it a bit too far into the exceptions. Even back then only large companies would consider that. Renting servers, renting a server rack (co-location), or even just an in-office server rack for what would be a startup today.
I know it's fashionable to hate on Kubernetes these days, and it is overly complex and has plenty of problems.
But what other solution allows you to:
* declaratively define your infrastructure
* gives you load balancing, automatic recovery and scaling
* provides great observability into your whole stack (kubectl, k9s, ...)
* has a huge amount of pre-packaged software available (helm charts)
* and most importantly: allows you to stand up mostly the same infrastructure in the cloud, on your own servers (k3s), and locally (KIND), and thus doesn't tie you into a specific cloud provider
The answer is: there isn't any.
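To make the portability point concrete, here's a minimal sketch of a declarative app definition that applies unchanged to KIND, k3s, or a managed cloud cluster (the name and image are arbitrary):

    kubectl apply -f - <<'EOF'
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: nginx:1.25
            ports:
            - containerPort: 80
    EOF
    # same manifest, same command, whichever cluster your kubectl context points at
    kubectl expose deployment web --port 80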
Kubernetes could have been much simpler, and probably was intentionally built to not be easy to use end to end.
I like to think that most people who are upset at Kubernetes don't hate all of it. I think the configuration aspect (YAML) and the very high level of abstraction are what get people lost, and as a result they get frustrated by it. I've certainly fallen into that category while trying to learn how to operate multiple clusters using different topologies and cloud providers.
But from an operational standpoint, when things are working, it usually behaves very well until you hit some rough edge cases (upgrades were much harder to achieve a couple of years back). But rough edges exist everywhere, and when I get to a point where K8s hits a problem, I would think that it would be much worse if I wasn't using it.
> I like to think that most people who are upset at Kubernetes don't hate on all of it. I think the configuration aspect (YAML) …
I question the competence of anyone who does not question (and rag on) the prevalence of templating YAML.
> But rough edges exist everywhere, and when I get to a point where K8s hits a problem, I would think that it would be much worse if I wasn't using it.
Damn straight. It’s only bad because everything else is strictly worse.
What are the reasons to not use JSON rather than YAML? From my admittedly-shallow experience with k8s, I have yet to encounter a situation in which I couldn't use JSON. Does this issue only pop up once you start using Helm charts?
At the surface level YAML is a lot easier to read and write for a human - fewer quotes. But once you start using it for complex configuration it becomes unwieldy, and at that point JSON is also not better than YAML.
After using CDK I think that writing TypeScript to define infra is a significantly better experience.
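For what it's worth, the API server doesn't care either way: kubectl happily accepts JSON, so YAML is only the default convention. A quick sketch:

    # a ConfigMap written as JSON instead of YAML
    kubectl apply -f - <<'EOF'
    {
      "apiVersion": "v1",
      "kind": "ConfigMap",
      "metadata": { "name": "app-config" },
      "data": { "LOG_LEVEL": "info" }
    }
    EOF
    # and any object can be read back as JSON
    kubectl get configmap app-config -o json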
There is no easy solution to manage services and infrastructure: people who hate kubernetes complexity often underestimate the efforts of developing on your own all the features that k8s provides.
At the same time, people who suggest that everyone use kubernetes, independently of the company's maturity, often forget how easy it is to run a service on a simple virtual machine.
In the multidimensional space that contains every software project, there is no hyperplane that separates when it’s worth to use kubernetes or not. It depends on the company, the employees, the culture, the business.
Of course there are general best practices, like for example if you’re just getting started with kubernetes, and already in the cloud, using a managed k8s service from your cloud provider could be a good idea. But again, even for this you’re going to find opposing views online.
> There is no easy solution to manage services and infrastructure: people who hate kubernetes complexity often underestimate the efforts of developing on your own all the features that k8s provides.
This. I was trying to create some infrastructure and application once, using various AWS and off the shelf components. I stopped halfway through when I realized I was reinventing k8s, very poorly. That's when I switched gears and learned k8s.
With that said, I use it sparingly due to the inherent complexity it brings, but at least I have a better handle on how and when it should be used and when it should not, and what problems it solved, since I myself was trying to solve some of the problems.
The thing is, "kubernetes" doesn't give you that either. You want a LB? Here's a list of them that you can add to a cluster. But actually pick multiple, because the one you picked in AWS doesn't support bare metal.
Bare metal kubernetes is certainly a lot less complete out of the box when it comes to networking and storage, but people can, and often should, use a managed k8s service which provides all those things out of the box. And if you're on bare metal, once the infra team has abstracted everything away into LoadBalancers and StorageClasses, it's basically the same experience for end users of the cluster.
If you're talking about OpenShift on rented commodity compute, maybe. If you're talking about GKE/AKS/EKS or similar, I disagree wholeheartedly; you're then paying several multiples on the compute and a little extra for Kubernetes.
> because the one you picked in AWS doesn't support bare metal
That's just because AWS's Kubernetes offering is laughably bad.
There is huge difference in your experience whether you use Kubernetes via GKE (Autopilot) or any other solution (at least as long you don't have a dedicated infrastructure team).
I think "bare" Kubernetes is still a quite nice tool that allows for learning transferable skills across clouds (similar to Terraform). E.g. even if I have to spin up my own nginx-ingress to be able to handle ingress resources, after having learned that initially, I can basically do the same thing across clouds.
It's just that GKE (Autopilot) does a lot of that out of the box for you, so you get a much easier end-to-end experience for non-admins (= "request resources -> have them instantiated").
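As a concrete example of the transferable part, standing up the ingress controller is the same couple of commands on any cluster (a sketch, not a production config; per-environment differences go into the values you pass):

    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    helm repo update
    # identical on GKE, EKS, AKS, k3s, or KIND
    helm install ingress-nginx ingress-nginx/ingress-nginx \
      --namespace ingress-nginx --create-namespace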
When I reflect what Netflix did back in 2010ish on AWS:
* The declarative infra is EC2/ASG configurations plus Jenkins configurations
* Client-side load balancing
* ASG for autoscaling and recovery
* Amazing observability with a home-grown monitoring system by 4 amazing engineers
Most of all, each of the above items was built and run by one or two people, except the observability stack with four. Oh, and standing up a new region was truly a non-event. It just happened, and as a member of the cloud platform team I couldn't even recall what I did for the project. It's not that Netflix's infra was better or worse than using k8s. I'm just amazed how happy I have been with an infra built more than 10 years ago, and how simple it was for end users. In that regard, I often question myself about what I have missed in the whole movement of k8s platform engineering, other than that people do need a robust solution to orchestrate containers.
> * has a huge amount of pre-packaged software available (helm charts)
> * and most importantly: allows you to stand up mostly the same infrastructure in the cloud, on your own servers (k3s), and locally (KIND), and thus doesn't tie you into a specific cloud provider
NixOS. I have no clue about kubernetes, but I think NixOS even goes much deeper in these points (e.g. kubernetes is at the "application layer" and doesn't concern itself with declaratively managing the OS underneath, if I understand right). The other points seem much more situational, and if needed kubernetes might well be worth it. For something that could be a single server running a handful of services, NixOS is amazing.
There are lots of native NixOS tools for managing whole clusters (NixOps, Disnix, Colmena, deploy-rs, Morph, krops, Bento, ...). Lots of people deploy whole fleets of NixOS servers or clusters for specific applications without resorting to Kubernetes. (Kube integrations are also popular, though.) Some of those solutions are very old, too.
Disnix has been around for a long time, probably since before you ever heard of NixOS.
I wouldn't say completely orthogonal. E.g. the points I've cited are overlap between the two, and ultimately both are meant to host some kind of services. But yes NixOS by itself manages a single machine (although combined with terraform it can become very convenient to also manage a fleet of NixOS machines). Kubernetes manages services on a cluster, but given how powerful a single machine can be I do think that many of those clusters could also just be one beefy server (and maybe a second one with some fail over mechanism, if needed).
If the cluster is indeed necessary though, I think NixOS can be a great base to stand up a Kubernetes cluster on top of.
> and thus doesn't tie you into a specific cloud provider
It ties you to k8s instead, and it ties you to a few company wide heroes, and that is not a 'benefit' as it's being touted here.
Being tied to a cloud is not a horrible situation either. I suspect "being tied to a cloud" is a boogeyman that k8s proponents would like to spread, but just like with k8s, with the right choices, cloud integration is a huge benefit.
There are an enormous number of tools that meet these requirements, most obviously Nomad. But really any competently-designed system, defined in terms of any cloud-agnostic provisioning system (Chef, Puppet, Salt, Ansible, home-grown scripts) would qualify.
And, for the record, observability is something very much unrelated to kubectl or k9s.
You’re right that kubernetes is a bit batteries included, and for that it's tempting to take it off the shelf because it “does a lot of needed things”, but you don’t need one tool to do all of those things.
It is ok to have domain specific processes or utilities to solve those.
* your stack almost always ends up closely tied to one cloud provider. I've done and seen cloud migrations. They are so painful and costly that they often just aren't attempted.
* Cloud services make it much harder to run your stack locally and on CI. There are solutions and workarounds, but they are all painful. And you always end up tied to the behaviour of the particular cloud services
> but you don’t need one tool to do all of those things
To get the same experience, you do. And I don't see why you would want multiple tools.
If anything, Kubernetes isn't nearly integrated and full-featured enough, because it has too many pluggable parts leading to too much choice and interfacing complexity. Like pluggable ingress, pluggable state database, pluggable networking stack, no simple "end to end app" solution ( KNative, etc), ...
This overblown flexibility is what leads to most of the pain and perceived complexity, IMO.
> This overblown flexibility is what leads to most of the pain and perceived complexity, IMO.
Huh, I guess you are spot on. My first experience with kubernetes was k3s, and for a long time I couldn't figure out what all the fuss was about and where all that complexity people talk so much about was. But then I tried vanilla kubernetes.
Perhaps a little on the tinfoil hat side of things, but it isn't completely unreasonable to think that some of the FUD could originate from cloud providers. Kubernetes is a commoditizing force to some extent.
Far from it. TF is mostly writing static content, maybe read one or two things. It’s missing the runtime aspect of it, so are most cloud offerings, without excessive configuration. Rollouts, health probes, logs, service discovery. Just to name a few.
You missed what I think is the most important point in OP's list: it does all of the above in a cloud agnostic way. If I want to move clouds with TF I'm rewriting everything to fit into a new cloud's paradigm. With Kubernetes there's a dozen providers built in (storage, loadbalancing, networking, auto scaling, etc.) or easy to pull in (certificates, KMS secrets, DNS); and they make moving clouds (and more importantly) running locally much easier.
Kubernetes is currently the best way to wrap up workloads in a cloud agnostic way. I've written dozens of services for K8s using different deployment mechanisms (Helm, Carvel's kapp, Flux, Kustomize) and I can run them just as easily in my home K8s cluster and in GCP. It's honestly incredible; I don't know of any other cloud tech that lets me do that.
One thing I think a lot of people miss too, is how good the concepts around Operators in Kubernetes are. It's hard to see unless you've written some yourself, but the theory around how operators work is very reminiscent of reactive coding in front end frameworks (or robotics closed loop control, what they were originally inspired by). When written well they're extremely resilient and incredibly powerful, and a lot of that power comes from etcd and the established patterns they're written with.
I think Kubernetes is really painful sometimes, and huge parts of it aren't great due to limitations of the language it's written in; but I also think it's the best thing available that I can run locally and in a cloud with a FOSS license.
> it does all of the above in a cloud agnostic way.
I'll give you the benefit of the doubt here and say that some of the basics are indeed cloud agnostic.
However, it's plain for many or most to see that outside of extremely "toy" workloads you will be learning a specific "flavour" of Kubernetes. EKS/GKE/AKS etc; They have, at minimum, custom resource definitions to handle a lot of things and at their worst have implementation specific (hidden) details between equivalent things (persistent volume claims on AWS vs GCP for example is quite substantially different).
For multicloud I usually think of my local K8s cluster and GKE, it's been a few years since I touched EKS. I'd love to hear your opinions on the substantive differences you run into. When switching between clouds I'm usually able to get away with only changing annotations on resources, which is easy enough to put in a values.yml file. I can't remember the last time I had to use a cloud specific CRD. What CRD's do you have to reach for commonly?
Thinking about it, the things I see as very cloud agnostic: horizontal pod autoscaling, node autoscaling, layer 4 load balancing, persistent volumes, volume snapshots, certificate management, external DNS, external secrets, and ingress (when run in cluster, not through a cloud service).
That ends up covering a huge swath of my use cases, probably 80-90%. The main pain points I usually run into are IAM and trying to use cloud layer 7 ingress (app load balancers).
I totally agree the underlying implementation of resources can be very different, but that's not the fault of Kubernetes; it's an issue with the implementation from the operator of the K8s cluster. All abstractions are going to be leaky at this level. But for PVCs I feel like storage classes capture that well, and can be used to pick the level of performance you need per cloud, without having to rewrite the common provisioning of a block device.
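In other words, the claim stays identical and only the storage class name (or an annotation) changes per environment. A sketch, with a made-up class name:

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd   # hypothetical name, swapped per cloud via a values.yml override
      resources:
        requests:
          storage: 20Gi
    EOF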
Something feels very off and mantra-like about how often cloud migration benefits are presented as very important, compared to how often such migrations actually happen in practice. Not to mention that it also assumes that simpler setups are automatically harder to move between clouds, or at least that there is a significant difference in required effort.
When I say it's easy to move between clouds, I'm not referring to an org needing to pick up everything and move from AWS to GCP. That is rare, and takes quite a bit of rearchitecting no matter what.
When I say something is easy to move, I mean that when I build on top of it, it's easy for users to run it in their cloud of choice with changes in config. It also means I have flexibility with where I choose to run something after I've developed it. For example I develop most stuff against minikube, then deploy it to GCP or a local production k8s. If I was using Terraform I couldn't do that.
Not sure what kind of apps this is, but I can't see the big value-add for a golang app binary w.r.t. being cloud agnostic, nor w.r.t. local development. It makes even fewer assumptions about the user's env. You still need some cloud conf (terraform, database, etc.) either way.
If you'll excuse a slight digression, I think there's a tendency at the moment to rather pay $1 in extra complexity a hundred times over time than pay a $5 one-time fee. It's as if repeating something similar twice - even if it's easy and not really a lot of effort - is a sign of failure and thus unbearable.
This is how I accomplished these things before. It involved simpler independent pieces which would not collapse whenever something went wrong. They were easy to reason about and building and fixing such tooling did not require me to hire an expensive consultant.
* declaratively define your infrastructure
Declare in README.md that we have 3 web servers and that Bob Jones set them up and manages them. Include Bob's email address and phone number.
* gives you load balancing, automatic recovery and scaling
Load balancing via a load balancer or another scheme. DNS is good enough for some cases. There are other solutions.
Automatic recovery - daemon scripts on the box to start all services when the box boots (sketched below). VPS provider bounces the box when it crashes. That's one approach. There are others.
Scaling - automatic scaling is not needed at the vast majority of companies that are starting out. When we need to scale to 4 servers, change README.md and send Bob Jones a quick message.
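The "automatic recovery" piece really can be that small. A minimal sketch, assuming a single binary called myapp (a hypothetical name), run as root:

    # a unit file that starts the service on boot and restarts it on crash
    cat > /etc/systemd/system/myapp.service <<'EOF'
    [Unit]
    Description=myapp
    After=network-online.target
    [Service]
    ExecStart=/usr/local/bin/myapp
    Restart=always
    RestartSec=2
    [Install]
    WantedBy=multi-user.target
    EOF
    systemctl daemon-reload
    systemctl enable --now myapp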
* provides great observability into your whole stack (kubectl, k9s, ...)
This is a need that is introduced because of k8s. There's a lot less to observe without k8s, and tools exist for it.
It's like saying "my backhoe has great diagnostic tools for diagnosing backhoe issues." That's true, but I don't have a backhoe and don't need a backhoe for what I am doing.
* has a huge amount of pre-packaged software available (helm charts)
This is a need that is introduced because of k8s. See above.
* allows you to stand up mostly the same infrastructure in the cloud, on your own servers (k3s), and locally (KIND), and thus doesn't tie you into a specific cloud provider
> doesn't tie you into a specific cloud provider
Not a real problem for most companies. If you're preparing to change cloud providers from day one, you are likely spending time on the wrong problem.
> same infrastructure in different envs
This is a benefit, which other solutions come close to, but k8s shines at. You can go a long way without having reproducible multi-machine setups in different envs and can come pretty close when needed, with manual work.
> Kubernetes could have been much simpler, and probably was intentionally built to not be easy to use end to end.
If true, this is a strange design choice. I'd be wary of anything that was made complex just for the sake of it.
k8s gets enough flack without having to accuse it of being complex just for funsies.
> But it's still by far the best we've got.
k8s is the best we've got when we want k8s. The trick is to not want k8s for the sake of wanting k8s.
There are times when k8s provides tremendous value. Most companies who decide to use it do not have the problems that k8s promises to solve, and never will. Sadly, sometimes it's because they've spent their time and money on unnecessary complexity like k8s instead of building a product that delivers value.
I suppose I’m the guy pushing k8s on midsized companies. If there have been unhappy engineers along the way - they’ve by the vast majority stayed quiet and lied about being happier on surveys.
Yes, k8s is complex. The tool matches the problem: complex. But having a standard is so much better than having a somewhat simpler undocumented chaos. “Kubectl explain X” is a thousand times better than even AWS documentation, which in turn was a game changer compared to that-one-whiteboard-above-Dave’s-desk. Standards are tricky, but worth the effort.
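For anyone who hasn't used it, that really is just:

    # field-level API docs served straight from the cluster
    kubectl explain deployment.spec.strategy
    kubectl explain pod.spec.containers.resources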
Personally I’m also very judicious with operators and CRDs - both can be somewhat hidden to beginners. However, the operator pattern is wonderful. Another amazing feature is ultra simple leader election - genuinely difficult outside of k8s, a 5 minute task inside. I agree with Paul’s take here tho of at least being extremely careful about which operators you introduce.
At any rate, yes k8s is more complex than your bash deploy script, of course it is. It’s also much more capable and works the same way as it did at all your developers previous jobs. Velocity is the name of the game!
I have to say that I don't believe the problem is all that complex unless you make it hard. But on the flip side, if you're a competent Kubernetes person, the correct Kubernetes config is also not that complex.
I think a lot of the reaction here is a result of the age-old issues of "management is pushing software on me that I don't want" and people adopting it without knowing how to use it because it's considered a "best practice."
In other words, the reaction you probably have to an Oracle database is the same reaction that others have to Kubernetes (although Oracle databases are objectively crappy).
Good point about k8s vs. AWS docs — a lot of the time people say “just use ECS” or the AWS service of the day, and it will invariably be more confusing to me and more vendor-tied than just doing the thing in k8s.
And then if you're unlucky you might hit one of the areas where the AWS documentation has a "teaser" about some functionality that is critical for your project, you spend months looking for the rest of the documentation when initial foray doesn't work, and the highly paid AWS-internal consultants disappear into thin air when asked about the features.
So nearly a year later you end up writing the whole feature from scratch yourself.
My current company is split... maybe 75/25 (at this point) between Kubernetes and a bespoke, Ansible-driven deployment system that manually runs Docker containers on nodes in an AWS ASG and takes care of deregistering/reregistering the nodes with the ALB while the containers on a given node are getting futzed with. The Ansible method works remarkably well for its age, but the big thing I use to convince teams to move to Kubernetes is that we can take your peak deploy times from, say, a couple of hours down to a few minutes, and you can autoscale far faster and more efficiently than you can with CPU-based scaling on an ASG.
From service teams that have done the migrations, the things I hear consistently though are:
- when a Helm deploy fails, finding the reason why is a PITA (we run with --atomic so it'll roll back on a failed deploy. What failed? Was it bad code causing a pod to crash loop? A failed k8s resource create? Who knows! Have fun finding out! See the debugging sketch after this list.)
- they have to learn a whole new way of operating, particularly around in-the-moment scaling. A team today can go into the AWS Console at 4am during an incident and change the ASG scaling targets, but to do that with a service running in Kubernetes means making sure they have kubectl (and its deps, for us that's aws-cli) installed and configured, AND remembering the `kubectl scale deployment X --replicas X` syntax.
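For the first pain point, a sketch of the usual debugging loop (release, namespace, and pod names are placeholders):

    # what did the rollback leave behind, and why did the release fail?
    helm history my-release -n my-namespace
    kubectl get pods -n my-namespace
    kubectl get events -n my-namespace --sort-by=.lastTimestamp | tail -n 20
    # for a crash-looping pod, the previous container's logs are usually the answer
    kubectl describe pod <pod-name> -n my-namespace
    kubectl logs <pod-name> -n my-namespace --previous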
The problem with bespoke, homegrown, and DIY isn't that the solutions are bad. Often, they are quite good—excellent, even, within their particular contexts and constraints. And because they're tailored and limited to your context, they can even be quite a bit simpler.
The problem is that they're custom and homegrown. Your organization alone invests in them, trains new staff in them, is responsible for debugging and fixing when they break, has to re-invest when they no longer do all the things you want. DIY frameworks ultimately end up as byzantine and labyrinthine as Kubernetes itself. The virtue of industry platforms like Kubernetes is, however complex and only half-baked they start, over time the entire industry trains on them, invests in them, refines and improves them. They benefit from a long-term economic virtuous cycle that DIY rarely if ever can. Even the longest, strongest, best-funded holdouts for bespoke languages, OSs, and frameworks—aerospace, finance, miltech—have largely come 'round to COTS first and foremost.
Personally, I don't like Helm. I think for the vast majority of usecases where all you need is some simple templating/substitution, it just introduces way more complexity and abstraction than it is worth.
I've been really happy with just using `envsubst` and environment variables to generate a manifest at deploy time. It's easy with most CI systems to "archive" the manifest, and it can then be easily read by a human or downloaded/applied manually for debugging. Deploys are also just `cat k8s/${ENV}/deploy.yaml | envsubst > output.yaml && kubectl apply -f output.yaml`
I've also experimented with using terraform. It's actually been a good enough experience that I may go fully with terraform on a new project and see how it goes.
You might like kubernetes kustomize if you don't care for helm (IMO, just embrace helm; you can keep your charts very simple and it's straightforward). Kustomize takes a little getting used to, but it's a nice abstraction and widely used.
I cannot recommend terraform. I use it daily, and daily I wish I did not. I think Pulumi is the future. Not as battle tested, but terraform is a mountain of bugs anyway, so it can't possibly be worse.
Just one example of where terraform sucks: you cannot deploy a kubernetes cluster (say an EKS/AKS cluster) and then use the kubernetes_manifest provider in a single workspace. You must do this across two separate terraform runs.
I haven’t used kubernetes in a few years, but do they have a good UI for operations? Something like your example of the AWS console, where you can just log in and scale something in the UI, but for kubernetes. We run something similar on AWS right now: during an incident we log into the account with admin access to modify something, and then go back and configure that in the CDK post-incident.
AWS has a UI for resources in the cluster but it relies on the IAM role you're using in the console to have configured perms in the cluster, and our AWS SSO setup prevents that from working properly (this isn't usually the case for AWS SSO users, it's a known quirk of our particular auth setup between EKS and IAM -- we'll fix it sometime).
I have to say that when you have more buy in from delivery teams and adoption of HPAs your system can become more harmonious overall. Each team can monitor and tweak their services, and many services are usually connected upstream or downstream. When more components can ebb and flow according to the compute context then the system overall ebbs and flows better. #my2cents
IMO the big win with Kubernetes is helm or operators. If you're going to pay the complexity costs you might as well get the wins which is essentially a huge 'app-store' of popular infrastructure components and an entirely programmatic way to manage your operations (deployments, updates, fail-overs, backups, etc).
For example, if you want to set up something complex like Ceph, Rook is a really nice way to do that. It's a very leaky abstraction, so you aren't hiding all the complexity of Ceph, but the declarative interface is generally a much nicer way to manage Ceph than a boatload of ansible scripts or generally what we had before. The key thing to understand is that helm charts or operators don't magically make infrastructure a managed 'turn-key' appliance; you do generally need to understand how the thing works.
I see Kubernetes the same way as git. Elegant fundamental design, but the interface to it is awful.
Kubernetes is designed to solve big problems and if you don't have those problems, you're introducing a tonne of complexity for very little benefit. An ideal orchestrator would be more composable and not introduce more complexity than needed for the scale you're running at. I'd really like to see a modern alternative to K8S that learns from some of its mistakes.
I was once talking to an ex google site reliability engineer. He said there are maybe a handful of companies in the world that _need_ k8s. I tend to agree. A lot of people practice hype driven development.
Kubernetes scales down pretty well. I don't use network layers or crazy ingress setups. I keep it simple and Kubernetes works great.
What's wonderful is that when I work on multiple clouds, my knowledge transfers just fine. I don't think of the AWS solution or the GCS solution, I use the same kubectl to check out both, view logs, inspect and fix.
Even when I got tired of waiting for GKE to spin up a node, running Github actions on a self-hosted microk8s meant instant pod starts and very little fuss. But using Kubernetes meant I got to take advantage of the Github operator, which let me reuse the same machine for multiple builds without the headaches.
When I want to run some open source software, I often find a helm chart that helps me get set up. Nowadays running open source packages can involve all kinds of dependencies, but getting it running on a k8s cluster to check it out, or even in prod, is a relatively straightforward editing of some values files. I've recently run Uptrace and Superset that way. They're not bajillion-requests-per-second setups, they don't have to be, and it was far easier to set up than most other methods.
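For example, kicking the tires on Superset looks roughly like this (chart repo and name as I remember them from the project's docs - double-check them - and values.yaml holds whatever overrides you need):

    helm repo add superset https://apache.github.io/superset
    helm repo update
    # a handful of overrides in values.yaml is usually all it takes to try it out
    helm upgrade --install superset superset/superset \
      --namespace superset --create-namespace \
      -f values.yaml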
I would say your friend is right. Few people _need_ k8s, but it's one interface to a bunch of complicated proprietary stuff. I can know a small, core set of k8s tools really well and forget half of the junk that I ever knew about public clouds. It's all the same patterns, transferable and reliable.
I push for k8s because I know it. Why not use something that I know how to use? I know how to quickly set up a cluster, what to deploy, and teach other team members about fundamentals.
How many people out there really need C# or object oriented programming?
The argument you present might be valid if you decide to use a tech stack prior having much experience with it.
If you're expecting app/FE devs to have to learn it you're putting a ton of barriers in their way in terms of deploying. Just chucking a container on a non-k8s managed platform (e.g. Cloud Run) would be much simpler, and no pile of bash scripts.
PaaSes are for companies with money to burn, most of the time. A good k8s team (even a single person, to be quite honest) is going to work towards providing your application teams with simple templates to let them deploy their software easily. Just let them do it.
Also, in my experience, you either have to spend ridiculous amounts of money on SaaS/PaaS, or you find that you have to host a lot more than just your application and suddenly the deployment story becomes more complex.
Depending on where you are and how much you're willing to burn money, you might find out that k8s experts are cheaper than the money saved by not going PaaS.
> If you're expecting app/FE devs to have to learn it
Why would anyone expect it? It's not their job, is it? We don't expect backend devs to know frontend and vice-versa, or any of them to have AWS certification. Why would it be different with k8s?
> Just chucking a container on a non-k8s managed platform (e.g. Cloud Run) would be much simpler, and no pile of bash scripts.
Simpler to deploy, sure, but not to actually run it seriously in the long term. Though, if we are talking about A container (as in singular), k8s would indeed be some serious over-engineering
That might be true, but unfortunately the state of the art infrastructure tooling is mostly centered around k8s. This means that companies choose k8s (or related technologies like k3s, Microk8s, etc.) not because they strictly _need_ k8s, but because it improves their workflows. Otherwise they would need to invest a disproportionate amount of time and effort adopting and maintaining alternative tooling, while getting an inferior experience.
Choosing k8s is not just based on scaling requirements anymore. There are also benefits of being compatible with a rich ecosystem of software.
Continuous deployment systems like ArgoCD and Flux, user friendly local development environments with tools like Tilt, novel networking, distributed storage, distributed tracing, etc. systems that are basically plug-and-play, etc. Search for "awesome k8s" and you'll get many lists of these.
It's surely possible to cobble all of this together without k8s, but k8s' main advantage is exposing a standardized API that simplifies managing this entire ecosystem. It often makes it worth the additional overhead of adopting, understanding and managing k8s itself.
It's a dumb statement, especially from an SRE; it's typically a comment from people who don't understand k8s and think that k8s is only there to give you the SLA of Google.
For most use cases k8s is not there to give you HA but to give you a standard way of deploying a stack, be that on the cloud or on prem.
He understood it fully; he was running a multi-day course on it when I spoke to him. He was candid about the tech. Most of us were there at the behest of our orgs.
In my personal experience, Google SREs as well as k8s devs sometimes didn't grok how wide k8s usability was - they also can be blind to financial aspects of companies living outside of Silly Valley.
Just a thought as well in my corpo experience: Unfortunately, there are some spaces that distribute solutions as k8s-only... Which sucks. I've noticed this mostly in the data science/engineering world. These are solutions that could be easily served up in a small docker compose env. The complexity/upsell/devops BS is strong.
To add insult to injury, I've seen more than one use IaC cloud tooling as an install script vs a maintainable and idempotent solution. It's all quite sad really.
You either recreate a less reliable version of kubernetes for workload ops or you go all in on your cloud provider and hope they'll be responsible for your destiny.
Vanilla Kubernetes is just enough abstraction to avoid both of those situations.
Doesn't really mesh with my experience, especially the longer k8s has been out.
It can be cheaper to depend on cloud provider to ship some features, but with tools like crossplane you can abstract that out so developers can just "order" a database service etc. for their application.
At this point I've found that k8s knowledge is more portable, whereas your trove of $VENDOR_1 knowledge might suddenly have issues because, for reasons outside your control, there's now a big spending contract signed with $VENDOR_2 and a mandate to move.
And with smaller companies I tend to find k8s way more cost effective. I pulled things I wouldn't be able to fit in a budget otherwise.
I joined a team that used AWS without kubernetes. Thousands of fragile weird python and bash scripts. Deployment was always such a headache.
A few months later I transitioned the team to use containers with proper CI/CD and EKS with Terraform and Argo CD. The team and also the managers like it, since we could deploy quite quickly.
And that hype is in large part created by Google and other cloud vendors.
To be honest I hardly see any reasonable/actionable advice from Cloud/SAAS vendors. Either it is to sell their stuff or generic stuff like "One should be securing / monitoring their stuff running in prod". Oh wow, never thought or done any such thing before.
Most companies in the world don't need to develop software. Software development itself is hype. But there's lots of money in it, despite no actual value being created most of the time.
If you're on AWS, yeah, I'd say just use ECS until you need more complexity. Our ECS deployments have been unproblematic for years now.
Our K8s clusters never go more than a couple of days without some sort of strange issue popping up. Arguably it could be because my company outsourced maintenance of them to an army of idiots. But K8s is a tool that is only as good as the operator, and competence can be hard to come by at some companies.
Agreed. But if you're already on AWS, I'd say the quality floor is already higher than the potential at 95%+ of other companies.
So I say unless you're at a company that pays top salaries for the top 5% of engineering talent, you're probably better off just using the AWS provided service.
I used to have a saying back when Heroku was more in favour: you use Heroku because you want to go bankrupt. AWS is at times similar.
Depending on your local market, AWS bills might be way worse than the cost of a few bright ops people who will let you choose from offerings including running dev envs on a random assortment of dedicated servers and local e-waste escapees.
Cloud run, etc, but there seem to be some biggish gaps in what those tools can do (probably because if deploying a container was too easy the cloud providers would lose loads of profit).
I honestly think docker compose is the best default option for single-machine orchestration. The catch is that you need to do some scripting to get fully automated zero-downtime deploys. I have to imagine someone will eventually figure out a way to trivialize that, if they haven't already. Or, you could just do the poor man's zero-downtime deploy: run two containers, deploy container a, wait for it to be ready, then deploy container b, and let the reverse proxy do the rest.
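That poor man's version is maybe a dozen lines of shell, assuming two compose services (app_a, app_b) behind the proxy and a /healthz endpoint (all hypothetical names):

    # pull the new image, then restart the two app containers one at a time
    docker compose pull app_a app_b
    docker compose up -d --no-deps app_a
    until curl -fsS http://localhost:8081/healthz >/dev/null; do sleep 1; done
    docker compose up -d --no-deps app_b
    until curl -fsS http://localhost:8082/healthz >/dev/null; do sleep 1; done
    # the reverse proxy keeps routing to whichever container is up during the swap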
Docker Swarm takes the Compose format and takes it to multi-node clusters with load balancing, while keeping things pretty simple and manageable, especially with something like Portainer!
For larger scale orchestration, Hashicorp Nomad can also be a notable contender, while in some ways still being simpler than Kubernetes.
And even when it comes to Kubernetes, distros like K3s and tools like Portainer or Rancher can keep managing the cluster easy.
If you want to stick on one machine, you can always just use a single node Docker Swarm to get the fully automated zero downtime deploys you want with Docker Compose:
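A sketch of what that looks like (image name and port are placeholders); the key bit is `update_config.order: start-first`:

    docker swarm init
    cat > stack.yml <<'EOF'
    version: "3.8"
    services:
      app:
        image: example/myapp:1.0
        ports:
          - "8080:8080"
        deploy:
          replicas: 2
          update_config:
            order: start-first   # start the new task before stopping the old one
            parallelism: 1
    EOF
    docker stack deploy -c stack.yml myapp
    # later: roll out a new version with no downtime
    docker service update --image example/myapp:1.1 myapp_app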
> But we often do multiple deploys per day, and when our products break, our customer’s products break for their users. Even a minute of downtime is noticed by someone.
Kubernetes might be the right tool for the job if we accept that this is a necessary evil. But maybe it's not? The idea that I might fail to collaborate with you because a third party failed because a fourth party failed kind of smells like a recipe for software that breaks all the time.
It really comes down to, I don't ever want to have the conversation “is this a good time to deploy, or should we wait until tonight when there’s less usage”. We have had some periods where our system was more fragile, and planning our days around the least-bad deployment window was a time suck, and didn't scale to our current reality of round-the-clock usage.
You can achieve this without k8s, though. If your goal is, "I want zero-downtime deploys," that alone is not sufficient reason to reach for something as massively complex as k8s. Set up a reverse proxy and do blue-green deploys behind it.
"Set up a reverse proxy and do blue-green deploys behind it."
I think this already introduces enough complexity and edge cases to make reinventing the wheel a bad idea. There's a lot involved in doing it robustly.
There are alternatives to Kubernetes (I prefer ECS/Fargate if you're on AWS), but trying to do it yourself to a production-ready standard sets you up for a lot of unnecessary yak shaving imho.
This sounds like terrible advice. Managing a reverse proxy with blue-green deploys behind it is not going to be trivial, and you have to roll most of that yourself. The deployment scripts alone are going to be hairy. Getting the same from K8s requires having a deploy.yaml file and a `kubectl apply -f <file>`. K8s is way less complex.
I ran such a system in prod for over 7 years with more than five nines of uptime, multiple deploys per day, and millions of users interacting with it. Our deploy scripts were ~10-line shell scripts, and any more complex logic (e.g. batching, parallelization, health checks) was done in a short Go program. Anyone could read and understand it in full. It deployed much faster than our equivalent stack on k8s.
k8s is a large and complex tool. Anyone who's run it in production at scale has had to deal with at least one severe outage caused by it.
It's an appropriate choice when you have a team of k8s operators full-time to manage it. It's not necessarily an appropriate choice when you want a zero-downtime deploy.
> It's an appropriate choice when you have a team of k8s operators full-time to manage it.
Are you talking about a full self-run type of scenario where you set up and administer k8s entirely yourself, or a managed or semi-managed system (like OpenShift)? Because if the former, then I would agree with you, although I wouldn't recommend a full self-run unless you were a big enough corp to have said team. But if you're talking about even a managed service, I would have to disagree. I've been running for years on a managed service (as the only k8s admin) and have never had a severe outage caused by k8s.
It isn’t, sadly, but the logic is straightforward. Have a set of IPs you target, iterate with your deploy script targeting each, check health before continuing. If anything doesn’t work (e.g. health check fails), stop the deploy to debug. There’s no automated rollback—simply `git revert` and run the deploy script again.
Did you manually promote deployments from one stage to another? This level of manual intervention is not sustainable if you deploy multiple times a day. How often did you deploy?
>> Managing a reverse proxy with blue-green deploys behind it is not going to be trivial, and you have to roll most of that yourself.
There are a lot of reverse proxies that will do this. Traditionally this was the job of a load balancer. With that being done by "software" you get the fun job of setting it up!
The hard part is doing it the first time and having a sane strategy. What you want to do is identify and segment a portion of your traffic. Mostly this means injecting a cookie into the segmented traffic's HTTP(S) requests. If you don't have a group of users consistently on the new service, you get some odd behavior.
The deployment part is easy. Because you're running things concurrently, ports matter. Just have the alternate version deployed on a different port. This is not a big deal and is super easy to do. In fact, your deployments are probably set up to swap ports anyway, so all you're doing is not committing to the final step in that process.
But... what if it is a service-to-service call inside your network? That too should be easy. You're passing IDs around between calls for tracing, right? Rather than a random cookie, you're just going to route based on those. Again, easy to do in a reverse proxy, easier in a load balancer.
It's not like easy blue-green deploys are some magic of Kubernetes. We have been doing them for a long time. They were easy to do once set up (and highly scripted as a possible path for any normal deployment).
Kubernetes is to operations what Rails is to programming... it's good, fast, helpful... till it isn't, and then you're left with buyer's remorse.
As I see it, managed Kubernetes basically gives me the same abstraction I’d have with Compose, except that I can add nodes easily, have some nice observability through GKE, etc. Compose might be simpler if I were running the cluster myself, but because GKE takes care of that, it’s one less thing that I have to do.
> Hand-writing YAML. YAML has enough foot-guns that I avoid it as much as possible. Instead, our Kubernetes resource definitions are created from TypeScript with Pulumi.
LOL so, rather than linting YAML, bring in a whole programming language runtime plus third party library, adding yet another vendor lock, having to maintain versions, project compiling, moving away from K8S, adding mental overhead...
Most devops disaster stories I’ve heard lately are the result of endless addition of new tools. People join the company, see a problem, and then add another layer of tooling to address it, introducing new problems in the process. Then they leave the company, new people join, see the problems from that new tooling, add yet another layer of tooling, continuing the cycle.
I was talking to someone from a local startup a couple weeks ago who was trying to explain their devops stack. The number of different tools and platforms they were using was in the range of 50 different things, and they were asking for advice about how to integrate yet another thing to solve yet another self-inflicted problem.
It was as though they forgot what the goal was and started trying to collect as much experience with as many different tools as they could.
Would you believe that there is a company using cdk8s to handle its K8S configuration, and that such an "infrastructure as code" repo ("infrastructure as code", this is the current hype) counts 76k YAML LoC and 24k TypeScript LoC to manage a bunch of Rails apps together with their related services? Like, some of those apps have fewer LoC than that.
Managing structures in a programming language is easier than dealing with a finicky serialization format full of optional syntax.
I have drastically reduced the amount of errors, mistakes, bugs, and plain old wtf-induced hair pulling by just mandating avoidance of YAML (and Helm) and using Jsonnet. Sure, there was some up-front work to write library code, but afterwards? I had people introduced to Jsonnet with an example deployment on one day, and shipping a production-ready deployment for another app the next day.
We use Pulumi for IAC of non-k8s cloud resources too, so it doesn't introduce anything extra. In reality all but the smallest Kubernetes services will want something other than hand-written YAML: Helm-style templating, HCL, etc. TypeScript gives us type safety, and composable type safety. E.g. we have a function that encapsulates our best practices for speccing a deployment, and we get type safety for free across that function call boundary. Can't do that with YAML.
yaml is objectively a bad language for complicated configurations, and once you add string formatting on top of it, you now have a complicated and shitty system, yay.
hopefully jsonnet or that apple thing will get more traction and popularity.
Good article. I used to be a k8s zealot (both CKAD and CKA certified) but have come to think that the good parts of k8s are the bare essentials (deployments, services, configmaps) and the rest should be left for exceptional circumstances.
Our team is happy to write raw YAML and use kustomize, because we prefer keeping the config plain and obvious, but we otherwise pretty much follow everything here.
k8s is really about you and if it makes sense for your use case. It’s not universally bad or universally good, and I don’t feel that there is a minimum team size required for it to make sense.
Managing k8s, for me at least, is a lot easier than juggling multiple servers with potentially different hardware, software, or whatever else. It’s rare that businesses will have machines that are all identical. Trying to keep adding machines to a pool that you manage manually and keep them running can be very messy and get out of control if you’re not on top of it.
k8s can also get out of control though it’s also easier to reason about and understand in this context. Eg you have eight machines of varying specs but all they really have installed is what’s required to run k8s, so you haven’t got as much divergence there. You can then use k8s to schedule work across them or ask questions about the machines.
This matches our experience as well. As long as you treat your managed k8s cluster as autoscaling-group as-a-service you'll do fine.
k8s's worst property is that it's a cleverness trap. You can do anything in k8s, whether it's sane to do so or not. The biggest guardrail against falling into it is managing your k8s with something Terraform-ish, so that you don't find yourself in a spot where "effort to do it right" >> "effort to hack in YAML" and your k8s cluster becomes spaghetti.
Re: cleverness trap. I feel like this is the tragedy of software development. We like to be seen as clever. We are doing "hard" things. I have way more respect for engineers that do "simple" things that just work using boring tech and factor in whole lifecycle of the product.
Sorry, I could have explained that better. The biggest value add that k8s has is that it gives you as many or as few autoscaling groups as you need at a given time using only a single pool (or at least fewer pools) of heterogeneous servers. There's lots of fine print here but it really does let you run the same workloads on less hardware and to me that's the first and last reason you should be using it.
I wouldn't start with k8s and instead opt for ASGs until you reach the point where you look at your AWS account and see a bunch of EC2 instances sitting underutilized.
Not everyone has money to burn, even back in the ZIRP era.
And before you trot out wages for an experienced operations team - I've regularly dealt with it being cheaper to pay for one or two very experienced people than to deal with the AWS bill.
For the very simple reason that cloud providers' prices are scaled to the US market, and not everyone has US levels of money.
When people call Kubernetes a "great piece of technology", I find it the same as people saying the United States is the "greatest country in the world". Oh yeah? Great in what sense? Large? Absolutely. Powerful? Definitely. But then the adjectives sort of take a turn... Complicated? Expensive? Problematic? Threatening? A quagmire? You betcha.
If there were an alternative to Kubernetes that were just 10% less confusing, complicated, opaque, monolithic, clunky, etc, we would all be using it. But because Kubernetes exists, and everyone is using it, there's no point in trying to make an alternative. It would take years to reach feature parity, and until you do, you can't really switch away. It's like you're driving an 18-wheeler, and you think it kinda sucks, but you can't just buy and then drive a completely different 18 wheeler for only a couple of your deliveries.
You probably will end up using K8s at some point in the next 20 years. There's not really an alternative that makes sense. As much as it sucks, and as much as it makes some things both more complicated and harder, if you actually need everything it provides, it makes no sense to DIY, and there is no equivalent solution.
People forget just how much of a mess the Mesos environment was in comparison.
And Nomad, which is often pushed as the alternative, to this day surprises me by randomly missing a feature or two that turns out to be impactful enough that taking on Kubernetes' extra complexity ends up being less complexity in total.
We've found kubernetes to be surprisingly fragile, buggy, inflexible, and strictly imperative.
People make big claims but then it's not declarative enough to look up a resource or build a dependency tree and then your context deadline is exceeded.
> if a human is ever waiting for a pod to start, Kubernetes is the wrong choice.
As someone who is always working "under" a particular set of infrastructure choices, I want people who write this kind of article to understand something: the people who dislike particular infrastructure systems are by and large those who are working under sub-optimal uses of them. No one who has the space to think ahead about what effects their infrastructure choices will create hates any infrastructure system. Their life is good. They can choose, and most everyone agrees that any system can be done well.
The haters come from being in situations where a system has not been done well - where, for whatever combination of reasons, they are stuck using a system that's the wrong mix of complex / monitorable / fragile / etc. It's true enough that if that system had been built with more attention to its needs, people would not hate it - but that's just not how people come to hate k8s (or any other tool).
* then had a regular AWS load balancer that just combined the AMI with the correctly specced (for each service) EC2 instances to cope with load
it was SIMPLE + it meant we could super easily spin up the previous version's AMI + EC2s in case of any issues on deploys (in fact, when deploying, we could keep the previous ones running and just repoint the load balancer to them)
PS: putting the jar on a Docker image was arguably unnecessary; we did it mostly to avoid "it works on my machine" style problems
> Above I alluded to the fact that we briefly ran ephemeral, interactive, session-lived processes on Kubernetes. We quickly realized that Kubernetes is designed for robustness and modularity over container start times.
Is there a clear example of this? E.g. is kubernetes inherently unable to start a pod (assuming the same sequence of events, e.g. warm/cold image with streaming enabled) under 500ms, 1s etc?
I am asking this as someone who spent quite a bit of time on it and wasn't able to bring it below the 2s mark, which eventually led us to rewrite the latency-sensitive parts to use Nomad. But we are currently in a state where we are re-considering Kubernetes for its auxiliary tooling benefits, and would love to learn more if anyone has experience with starting and stopping thousands of pods with the lowest possible latencies, without caring for utilization or placement but just observable boot latencies.
I do believe that with the right knowledge of Kubernetes internals it's probably possible to get k8s cold start times competitive with where we landed without Kubernetes (generally subsecond, often under 0.5s depending on how much the container does before passing a health check), but we'd have to understand k8s internals really well and would have ended up throwing out much of what already existed. And we'd probably end up breaking most of the reasons for using Kubernetes in the first place in the process.
Yeah, with plain Kubernetes I'd also see the practical limit around ~0.5s. If you are on GKE Autopilot where you also have little control over node startup there is likely also a lot more unpredictability.
Something like Knative can allow for faster startup times if you follow the common best practices (pre-fetching images, etc.), but I'm not sure if it supports enough of the session-related features that you were probably looking for to be a stand-in for Plane.
Not much internals knowledge needed, but an actual in-depth understanding of the Pod kube-api, plus at least the basics of how the scheduler, kubelet, and kubelet drivers interact.
A big possible win is custom scheduling, but barely anyone seems to know it exists.
Yeah, looking into writing a scheduler was basically where we stepped back and said “if we write this ourselves, why not the rest, too”. As I see it, the biggest gains that we were able to get were by making things happen in parallel that would by default happen in sequence, and optimizing for the happy path instead of optimizing for reducing failure. In Kubernetes it's reasonable to have to wait for a dozen things to serially go through RAFT consensus in etcd before the pod runs, but we don't want that.
(I made up the dozen number, but my point is that that design would be perfectly acceptable given Kubernetes' design constraints)
Not surprising to me. People are complaining about how difficult it is to know k8s when you talk about the basic default objects. Getting into the weeds of how the api and control plane work (especially since it has little impact on day to day dev) is something devs tend to just avoid.
Honestly, devs of the applications that run on top probably should not have to worry about it. Instead have a platform team provide the necessary features.
Yeah, I disagree with the OP on the dangers there. They work fairly well for us and aren't the source of headache. Though, I still try and teach my dev teams that "just because bitnami puts in variables everywhere, doesn't mean you need to. We aren't trying to make these apps deployable on homelabs."
Is there something like a k1s? What I’d love is “run this set of containers on this machine. If the machine goes down, I don’t care—I will fix it.” If it wired into nginx or caddy as well, so much the better. Something like that for homelab use would be wonderful.
I run all my projects on Dokku. It’s a sweet spot for me between a barebones VPS with Docker Compose and something a lot more complicated like k8s. Dokku comes with a bunch of solid plugins for databases that handle backups and such. Zero downtime deploys, TLS cert management, reverse proxies, all out of the box. It’s simple enough to understand in a weekend and has been quietly maintained for many years. The only downside is it’s meant mostly for single server deployments, but I’ve never needed another server so far.
Just a note: Dokku has alternative scheduler plugins, the newest of which wraps k3s to give you the same experience you’ve always had with Dokku but across multiple servers.
Dokku really is a game changer for small business. It makes me look like a magician with deploys in < 2m (most of which is waiting for GitHub Actions to run the tests first!) and no downtime.
You've basically described k3s, I think. I run it in my homelab (though I am enough of a tryhard to have multiple control planes) as well as on a couple of cloud servers as container runtimes (trading some overhead for consistency).
k3s really hammers home the "kubernetes is a set of behaviors, not a set of tools" stuff when you realize you can ditch etcd entirely and use sqlite if you really want to, and is a good learning environment.
Docker Compose probably fits the bill for that. They also have a built in minimalist orchestrator called Swarm if you do want to extend to multiple machines. I suppose it's considered "dead" since Kubernetes won mindshare, but it still gets updates.
Docker bare bones or Docker Compose. Run as systemd services and have Docker run the container as a service account. Manual orchestration is all you need. Anything else like Rancher or whatever is just fluff.
People don't understand k8s and are thus hating. K8s is a wonderful tool for most things many teams need. It may not be useful for homelab-type stuff as the learning curve is steep, but for professional use it cannot be beat currently. Just a bunch of "I know what I'm doing and don't need this complicated thing I don't understand." Pretty simple, and especially in a forum such as HN where we are all "experts" and need to explain to ourselves, and crucially to others, why we are right not to use k8s. Bunch of children, really.
I honestly don't understand it either. Familiarity? K8s has like, what, 5 big concepts to know and once you are there the other concepts (generally) just build from there.
- Containers
- Pods
- Deployments
- Services
- Ingresses
There are certainly other concepts you can learn, but you aren't often dealing with them (just like you aren't dealing with them when working with something like docker compose).
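As a rough illustration of how the concepts in that list fit together (names and hostname here are made up): a Deployment manages the Pods, a Service selects those Pods, and an Ingress routes HTTP traffic to the Service.

```yaml
# Sketch: a Service selecting Pods labelled app: myapp, plus an Ingress routing
# a hostname to that Service.
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
spec:
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
```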
Which is why you don't lose your SRE/ops team just because you use k8s.
I'd say that if you aren't big enough to have dedicated SRE then k8s is not for you. However, it really only takes 1 or 2 people to manage pretty large clusters with 100s or 1000s of deployments.
Something struck me here that I've been thinking about. OP says a human should never wait for a pod. Agreed, it is annoying and sometimes means waiting for an EC2 instance and then the pod.
We have jobs that users initiate that use 80+GB of memory and a few dozen cores. We run only one pod per node because the next size up EC2 costs a fortune and performance tops out on our current size.
These jobs are triggered via a button click that triggers a Lambda that submits a job to the cluster. If it is a fresh node, the user has to wait for the 1GB container to download from ECR. But it is the same container that the automated jobs that kick off every few minutes also use, so rarely is there any waiting. But sometimes there is.
Should we be running some sort of clustering job scheduler that gets the job request and distributes work amongst long-running pods in the cluster? My fear is that we just create another layer of complexity and still end up waiting for the EC2, waiting for the pod to download, waiting for the agent now running on this pod to join the work distribution cluster.
However, we probably could be more proactive with this because we could spin up an extra pod+EC2 when the work cluster is 1:1 job:ec2.
Thoughts?
We're in the process of moving to Karpenter, so all this may be solved for us very soon with some clever configuration.
If you don't want to change the setup too much, consider running your nodes off an AMI with the image pre-loaded. Maybe also check how exactly the images are layered, so that if necessary you can reduce the amount of "first boot patch" download.
There is a difference between waiting and waiting.
For an hourly batch job that already takes 10 minutes to run, the extra time for pod scheduling and container downloading is negligible anyway.
What you shouldn’t do is put pod scheduling in places where thousands of users per minute expect sub-second latency.
In your case, if the time for starting up the EC2 becomes a bigger factor than the job itself, you can add placeholder pods that just sleep, while requiring exactly that machine config but request 0 cpus, just to make sure it stays online.
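A sketch of that placeholder idea, assuming the cluster autoscaler and a hypothetical instance-type label value; the annotation is what keeps the autoscaler from evicting the pod and scaling the warm node away:

```yaml
# Sketch: a tiny "pause" pod pinned to the expensive node type so one such node
# stays warm for incoming jobs.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: warm-node-placeholder
spec:
  replicas: 1
  selector:
    matchLabels:
      app: warm-node-placeholder
  template:
    metadata:
      labels:
        app: warm-node-placeholder
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: r5.4xlarge   # placeholder: the big-job node type
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 10m      # effectively "0 CPUs" as described above
              memory: 16Mi
```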
> It’s also worth noting that we don’t administer Kubernetes ourselves
This is the key point. Even getting to the point where I could install Kubernetes myself on my own hardware took weeks, just understanding what hardware was needed and which of the (far too many) different installers I had to use.
Interesting that they avoid Helm. It is the "plug and play" solution for Kubernetes. However, that is only in theory. My experience with most operators out there is that they were clunky, buggy, or very limited and did not expose everything needed. But I still end up using Helm itself in combination with ArgoCD.
Helm is just a mess. If you're going to deploy something from helm, you're better off taking it apart and reconstructing it yourself, rather than depending on it to work like a package manager
In my experience, if you use first-party charts (= published by the same people that publish the packaged software) that are likely also provided to enterprise customers you'll have a good time (or at least a good starting point). For third-party charts, especially for more niche software I'd also rather avoid them.
We avoided Helm as well. We found that Kustomize provides enough templating to cover almost all the common use cases and it's very easy for anyone to check their work, kubectl kustomize > compiled.yaml. FluxCD handles postbuild find and replace.
At most places, your cluster configuration is probably pretty set in stone and doesn't vary a ton.
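For anyone unfamiliar, a typical overlay is just a small file like the sketch below (paths, image, and labels are placeholders), and `kubectl kustomize .` prints the fully rendered manifests for review:

```yaml
# kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                 # shared Deployment/Service/Ingress manifests
patches:
  - path: replica-count.yaml   # small per-environment patch
images:
  - name: registry.example.com/myapp
    newTag: "1.2.3"
commonLabels:
  env: production
```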
I think the important detail here is that he mentions he doesn't use it because of operators. That may mean they tried it in a previous major version, which used Tiller. That was quite a long time ago.
That being said, helm templates are disgusting and I absolutely hate how easily developers complicate their charts. Even the default empty chart has helpers. Why, on Earth, why?
I almost fully relate to OP's approach to k8s, but I think with their simplified approach Helm (the current one) could work quite well.
My main issue with Kubernetes is the cluster-scoped Custom Resource Definition (CRD).
We are here because we wanted a way to deal with software where each piece needs different versions of libraries or configuration, so we invented containers. Then we wanted to run multiple services, so we invented orchestrators. But now applications are deployed with cluster-scoped operators that depend on cluster-scoped definitions, and we need to run separate services in separate clusters. It's like we need to create containerization for Kubernetes apps all over again.
This article talks about using k8s while trying not to use it as much as possible. The first example being operators, which are the underlying mechanism that makes k8s possible. To me, taking a stance not to use operators while still using k8s is less than optimal, or plainly stupid. The whole stack is built on operators, which you inherently trust as you use k8s, yet you choose not to use them. Sorry, but this is hard to read.
The only thing I learned is about Caddy as a cert-manager replacement, even though I have used, extended and been pretty happy with cert-manager. The rest is hard to read ;(.
When I checked out an operator repo for some stateful service, say Elasticsearch, the repo most likely would contain tens of thousands of lines of YAML and tens of thousands of lines of Go code. Is this due to the essential complexity of implementing auto-pilot for a complex service, or is it due to massive integration with k8s' operator framework?
If Kubernetes is the answer ... you very likely asked the wrong questions.
Reading about Jamsocket and what it does, it seems that it essentially lets you run Docker instances inside the Jamsocket infrastructure.
Why not just take Caddy in a clustered configuration, add some modules to control Docker startup/shutdown and reduce your services usage by 50%? As one example.
I’m not sure what you mean by that reducing service usage.
The earliest version of the product really was just nginx behind some containers, but we outgrew the functionality of existing proxies pretty quickly. See e.g. keys (https://plane.dev/developing/keys) which would not be possible with clustered Caddy alone.
My understanding was that K8s itself has overhead, which ultimately has to be paid for, even if using a managed service (it might be included in the cost of what you pay, of course).
I did add the caveat of "with modules" and the idea of sharing values around to different servers would be easy to do, since you have Postgres around as a database to hold those values/statuses.
HTTP proxying is not much of our codebase. I wouldn’t want to shoehorn what we’re doing into being a module of a proxy service just to avoid writing that part. That proxy doesn’t run on Kubernetes currently anyway, so it wouldn’t change anything we currently use Kubernetes for.
There were some "hype cycles" (in Gartner's lingo) that I avoided during my career. The first one was the MongoDB/NoSQL hype - the "Let's use NoSQL for everything!" trend. I tried it in a medium-sized project and burnt my fingers, and it was right around when HN was flooded with "Why we migrated to MongoDB" stories.
The next one was Microservices. Everyone was doing something with microservices and I was just on a good 'ole Ruby on Rails monolith. Again, the HN stories came and went "Why we broke down our simple CRUD app into 534 microservices".
The final one was Kubernetes. I was a Cloud consultant in my past life and had to work with a lot of my peers who had the freedom to deploy in any architecture they saw fit. A bunch of them were on Kubernetes and I was just on a standard Compute VM for my clients.
We had a requirement from our management that all of us had to take some certification courses so we would be easier to pitch to clients. So, I prepped for one, read about Kubernetes, and tried deploying a bunch of applications, only to realize it was a very complex set of moving parts - unnecessarily so, I may add. I was never able to understand why this was pushed as normal. It made my decision not to use it only stronger.
Over the course of that 5-year journey, my peers' apps would randomly fail and they would sometimes be pulled in over the weekends to push fixes to avert the P1 situation, whilst I would be casually chilling in a bar with my friends. My Compute Engine VM, to its credit, has to date had only one P1 situation. And that was because the client forgot to renew their domain name.
Out of all the 3 hype cycles that I avoided in my career, Kubernetes is the one I am most thankful for evading. This sort of complexity should not be normalised. I know this may be an unpopular opinion on HN, but I am willing to bite the bullet and save my time and my clients' money. So, thanks for the hater's guide. But I prefer to remain one. I'd rather call a spade a spade.
Early on in the container hype cycle we decided to convert some of our services from VMs to ECS. It was easy to manage and the container build times were so much better than AMI build times.
Some time down the road we got acquired, and the company that acquired us ran their services in their own Kubernetes cluster.
When we were talking with their two person devops team about our architecture, I explained that we deployed some of our services on ECS. "Have you ever used it?" I asked them.
"No, thank goodness" one of them said jokingly.
By this time it was clear that Kubernetes had won and AWS was planning its managed Kubernetes offering. I assumed that after I became familiar with Kubernetes I'd feel the same way.
After a few months though it became clear that all these guys did was babysit their Kubernetes cluster. Upgrading it was a routine chore and every crisis they faced was related to some problem with the cluster.
Meanwhile our ECS deploys continued to be relatively hassle free. We didn't even have a devops team.
I grew to understand that managing Kubernetes was fun for them, despite the fact that it was overkill for their situation. They had architected for scale that didn't exist.
I felt much better about having chosen a technology that didn't "win".
A lot depended on whether ECS fit what you needed. ECS v1, even with Fargate, was so limited that my first k8s use case was pretty much impossible on it at sensible price points, for example.
So you don't use things you don't understand, valid point. But, saying others are using k8s as a way to use up free time is pretty useless too as we have managed k8s offerings and thus don't need the exercise. If you don't need k8s don't use it, thanks. Pretty useless story honestly
The people who dislike kubernetes are, in my experience, people who don’t need to do all of the things kubernetes does. If you just need to run an application, it’s not what you want.
Sure, everyone has their own product and experience and it's fine to express it, but I don't get the usage of other decisions such as "no to services meshes", "no to helm" and many more.
Ideally, you don't want to reinvent the wheel for every workload you need (say you need an OIDC endpoint, an existing application): you may be tempted to write everything from scratch yourself, which is also fine, but the point is: why?
Many products deliver their own Helm package. And if you are sick of writing YAML, I would look at Terraform over Pulumi, for the reason that you use the same tool for bringing up infrastructure and then workloads.
Kubernetes itself isn't easy to use, and in many cases you don't need it, but it might bring you nice things straight out of the box with less pain than other tooling (e.g. zero-downtime deployments).
The problem with Helm is that it did the one thing you should not do, and refused to fix it even when they promised to.
They do text-replacement templating for YAML.
I once spent a month, being a quite experienced k8s wrangler, trying to figure out why Helm 2 was timing out, only to finally trace it down to sometimes getting the wrong number of spaces in some lines.
I admit that I use some Helm stuff in my home environment, but for production I'm genuinely worried about the need to support whatever they've thrown into it. At minimum I'm going to have to study the chart and understand exactly what they propose to open-palm slam into my cluster, and for many/most applications at that point it might genuinely be worth just writing a manifest myself. Not always. Some applications are genuinely complex and need to be! But often, this has been the case for me. For all my stuff, though, I use kustomize and I'm pretty happy with it; it's too stupid for me to be clever, and this is good.
Service meshes are a different kettle of fish. They add exciting new points of failure where they need not exist, and while there are definitely use cases for them, I'd default to avoiding them until somebody proves the need for one.
I understand where most of the complexity in K8S comes from, but it still horrifies and offends me and I hate it. But I don't think it's Kubernetes' fault directly. I think the problem is deeper in the foundation. It comes from the fact that we are trying to build modern, distributed, high availability, incrementally upgradeable, self-regulating systems on a foundation of brittle clunky 1970s operating systems that are not designed for any of that.
The whole thing is a bolt-on that has to spend a ton of time working around the limitations of the foundation, and it shows.
Unfortunately there seems to be zero interest in fixing that and so much sunk cost in existing Unix/Posix designs that it seems like we are completely stuck with a basic foundation of outdated brittleness.
What I think we need:
* An OS that runs hardware-independent code (WASM?) natively and permits things like hot updates, state saving and restoration, etc. Abstract away the hardware.
* Native built-in support for clustering, hot backups, live process migration between nodes, and generally treating hardware as a pure commodity in a RAIN (redundant array of inexpensive nodes) configuration.
* A modern I/O API. Posix I/O APIs are awful. They could be supported for backward compatibility via a compatibility library.
* Native built-in support for distributed clustered storage with high availability. Basically a low or zero config equivalent of Ceph or similar built into the OS as a first class citizen.
* Immutable OS that installs almost instantly on hardware, can be provisioned entirely with code, and where apps/services can be added and removed with no "OS rot." The concept of installing software "on" the OS needs to be killed with fire.
* Shared distributed network stack where multiple machines can have the same virtual network interfaces, IPs, and open TCP connections can migrate. Built-in load balancing.
I'm sure people around here can think of more ideas that belong in this list. These are not fringe things that are impossible to build.
Basically you should have an immutable image OS that turns many boxes into one box and you don't have to think about it. Storage is automatically clustered. Processes automatically restart or, if a hardware fault is detected in time, automatically migrate.
There were efforts to build such things (Mosix, Plan 9, etc.) but they were bulldozed by the viral spread of free Unix-like OSes that were "good enough."
Edit:
That being said, I'm not saying Kubernetes is good software either. The core engine is actually decent and as the OP said has a lot of complexity that's needed to support what it does. The ugly nasty disgusting parts are the config interface, clunky shit like YAML, and how generally arcane and unapproachable and ugly the thing is to actually use.
I just loathe software like this. I feel the same way about Postgres and Systemd. "Algorithmically" they are fine, but the interface and the way you use them is arcane and makes me feel like I'm using a 70s mainframe on a green VT220 monitor.
Either these things are designed by the sorts of "hackers" who like complexity and arcane-ness, or they're hacks that went viral and matured into global infrastructure without planning. I think it's a mix of both... though in the case of Postgres it's also that the project is legitimately old. It feels like old-school Unix clunkware because it is.
Agreed. If Linux were a distributed OS, people would just be running a distro with systemd instead of K8s. (Of course, systemd is just another kubernetes, but without the emphasis on running distributed systems)
That whole concept is bizarre. It's like wanting to fly, so rather than buy a plane, you take a Caprice Classic and try to make it fly.
If CoreOS actually wanted to make distributed computing easier, they'd make patches for the Linux kernel (or make an entirely different kernel). See the many distributed OS kernels that were made over 20 years ago. But that's a lot of work. So instead they tried to go the cheap and easy route. But the cheap and easy route ends up being much shittier.
There's no commercial advantage to building a distributed OS, which is why no distributed OS is successful today. You would need a crazy person to work for 10 years on a pet project until it's feature-complete, and then all of a sudden everyone would want to use it. But until it's complete, nobody would use it, and nobody would spend time developing it. Even once it's created, if it's not popular, still nobody will use it (you can use Plan9 today, but nobody does).
I’ll add to this:
Boot and compute need to be entirely disconnected from one another.
If I have a block of storage that boots on one system, it should boot on another. An OS should poll for available services, allow them to define APIs and be utilized, but no bloat should be added to a base to support elective services.
Everything should be microkernel too, but now I’m just venting.
I'm not entirely convinced that there isn't a better way. With AWS Lambda and alternatives able to run containers on demand, and OpenFaas, they all point to "a better way".
[Edit] The parent comment is almost entirely different after that edit from what I responded to. But I think my point still stands.
One day, hopefully in my lifetime, we shall see it.
Yeah, I do think lambda-style coding, where you move away from the idea of processes toward functions and data, is another possibly superior way.
The problem is that right now this gets you lock-in to a proprietary cloud. There are some loose standards but the devil's in the details and once you are deployed somewhere it's damn hard to impossible to move without serious downtime and fixing.
Completely agree, but that's where OpenFaas (or another open standard) comes in.
Hopefully we should get OpenFaas and Lambda, in the same way we have ECS and EKS. Standardised ways to complete tasks, rather than managing imaginary servers.
I don't know it either, but a vague understanding I got in the past was the language itself wasn't very user-friendly. I think Elixir was supposed to solve that.
OT: Can something be done about HN commenting culture so that the comments stay more on topic?
Some technologies (like Kubernetes) tend to attract discussions where half of the commenters completely ignore the original article, so we end up having a weekly thread about Kubernetes where the points of the article (which are interesting) can't be discussed because they are drowned out by the same unstructured OT discussions.
At the time of this posting there are ~20 comments with ~2 actually having anything to do with the points of the article rather than Kubernetes in general.
What you're seeing is the early-crowd. With most (not all) posts, comments will eventually rise to the top that are more what you're looking for. IME it usually takes a couple hours. If it's a post where I really want to read the relevant comments, I'll usually come back at least 8 to 12 hours later and there's usually some good ones to choose from. Even topics like Apple that attract the extreme lovers and haters tend to trend this direction
Having read the article, isn't the point of the article kubernetes in general and what the author prescribes you sign up for/avoid?
Discussions of k8s pitfalls and successes in general seems to be very much in line with what the article is advocating. And, to that point, there's frankly just not a whole lot interesting in this article for discussion "We avoid yaml and operators"... Neat.
> Having read the article, isn't the point of the article kubernetes in general and what the author prescribes you sign up for/avoid?
Yeah, and I think that provides a good basis to discussion, where people can critique/discuss whether the evaluation that the author has made are correct (which a few comments are doing). At the same time a lot of that discussion is being displaced by what I would roughly characterize as "general technology flaming" which isn't going anywhere productive.
The solution to that is to flag boring/generic articles and/or post/upvote more specific, interesting articles. Generic articles produce generic, mostly repetitive comments but then again that's what the material the commenters are given.