Hacker News
The Problem with the Software at Southwest Airlines (reddit.com)
16 points by usui on Dec 31, 2022 | 5 comments


Whether or not any of the post is accurate, it really just comes off as a disgruntled ex-employee trying to throw dirt. For a supposedly former "senior software engineer", their entire description of the problem is that their system "went offline due to its outdated software packages and over utilized server resources aka cpu, memory and disk space." There's zero technical detail, just an incredibly generic statement.


Catastrophic failures are complex and often have many causes. Sure, there might be a single root cause, but there are a ton of other failures along the way.

Each engineer, in their speciality, will point out the problems they are most intimately aware of. This one is possibly doing just that. Unless they’re uniquely positioned and highly perceptive, they’re only aware of a small piece of the whole.


Maybe all the "outdated software packages" just wore out at the same time? Packages are like tires, and ya gotta change 'em from time to time, etc. I agree with your sentiment on this post.


It's ironic that the OP wants a blameless post-mortem culture but also wants to blame and fire people.

The better way is: we're doing post-mortems, starting here. Anything that doesn't work gets tossed. Anyone who resists beneficial change without good reason goes too. (Forward-looking, results-oriented.)

I’m also curious what specific thing happened that caused the crew scheduling system to fail. We’ve heard a lot about the hell of trying to schedule by hand, but the COO in his statement seemed to indicate that they had to schedule by hand because the software simply didn’t have the features required to handle the cold start scenario.

If it just crashed because it was out of RAM or disk, or had old dependencies, that's an entirely different thing.

Overall I don’t see anything of substance in the post. Specifics or GTFO.


Seemed clear to me what they mean: assign full blame to management. The blameless part is for engineers.

> Southwest has many highly technical developers, cloud architects, Site Reliability Engineers and DevOps Engineers, so talent at the engineering level is not a problem. It is the non-technical senior and middle management (particularly the ones who have tenure) in the Technology Services and Operations department that destroy any chance to implement best practices..

then

> Create a blameless culture that allows engineers to...

It would be quite ironic if this post really is from a senior engineer and this is the most granular they can get about the issues. Maybe it really is the engineers who need to be blamed! (But probably not.)



