What I object to here is the core premise that there's something to fix, in software development terms.
There's not. We've got robust, reliable tools for creating nearly bug-free software. We have the usual engineering principles; we have automated testing; we have virtual machines and documented environments and specifications and APIs and all the rest.
The problem is, there is a cost associated with preventing software bugs, and it's a big one. There's a reason Facebook kept saying "move fast and break things" - their software was buggy as shit, but the cost of that was minimal; in situations where bugs don't matter as much, time invested in robust bug-free software is an undesirable expense.
In the Heartbleed case, this is reversed. There's a highly sensitive, critical bit of Internet infrastructure; minimal resources have been invested in it, but it is relied on by a lot of key systems.
So the problem is a "social" one, in a sense. OpenSSL as a project did not invest enough in reducing the occurrence of bugs, and this resulted in the vulnerability. It's not the maintainers' fault, really; it was irresponsible for so many of us to rely on the software without considering its quality.
I suspect we'll start seeing renewed focus on bits of important software like OpenSSL, and greater attention to the engineering tradeoffs. So there's an upside to the whole thing...
No. We don't. As Alan Kay said, programmers don't have a solid concept like an arch to build on. In real engineering, when you build a bridge, a single rusty rivet won't cause it to fall down. In software, a single pair of misplaced braces gives you critical "goto fail" bugs. (And don't fool yourself into thinking that static analysis catches all these errors.)
It has become fashionable to say that automated tests somehow turn programming into a rigorous discipline. This is bullshit. Tests are a decent band-aid which kinda helps sometimes, if you spend half your programming time writing them, and they will still miss plenty of horrible bugs. In truth, programming remains a poorly-understood endeavor. Checklists in the form of automatically written tests might actually be a step in the right direction, especially if compilers start to analyze the kinds of mistakes individual programmers tend to make.
> In real engineering, when you build a bridge, a single rusty rivet won't cause it to fall down. In software, a single pair of misplaced braces gives you critical "goto fail" bugs.
In real engineering, you design things to be built out of a whole bunch of tiny, redundant components, each of which might or might not fail catastrophically with a given probability, but where the failure of any given component won't bring down anything else. You also source your components such that the failure of one component won't increase the likelihood of a global failure of all instances of that component in your design--you don't get all your rivets from the same manufacturing batch, etc.
So, if we wanted to do software engineering the way we did other kinds of engineering, we'd have to have our computer systems resemble the physical, componentized systems that engineering principles actually apply to. I think this would involve, at least, two things:
1. Building only tiny components that could fail/crash without bringing down the entire project. Usually this is accomplished by separating the components into independent networked services.
2. Building several independent implementations of each component, and multiplexing requests to them using worker-pools (with client retry) and quorum-agreement protocols.
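On the quorum-agreement point, here is a minimal sketch of the idea in C, assuming three independently developed implementations of the same component have already produced candidate results (the function and its callers are invented purely for illustration):

    #include <string.h>

    /* Hypothetical quorum vote over the outputs of three independently
     * developed implementations of the same component. Returns the index
     * (0 or 1) of a result that at least one other implementation agrees
     * with, or -1 if no two results match, in which case the caller
     * should fail safe rather than guess. */
    static int quorum3(const void *a, const void *b, const void *c, size_t len)
    {
        if (memcmp(a, b, len) == 0 || memcmp(a, c, len) == 0)
            return 0;                 /* a is backed by a second opinion */
        if (memcmp(b, c, len) == 0)
            return 1;                 /* b and c agree; a is the outlier */
        return -1;                    /* no quorum */
    }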
I was trying to think of what in software is the equivalent of a 'rivet': something small, with a near-singular purpose, whose design has changed relatively little in decades. The only thing I could argue we have in software engineering is perhaps machine instructions. For example, just about every processor has a register and an instruction that will increment that register by one.
The result of this is that most software projects are as complex as the most advanced 'real engineered' projects. As software engineers we design whole "space shuttles" in a month. The 'bugginess' of software engineering compared to 'real engineering' is the manifestation of the complexity of what software engineering does. The tools and processes of 'real engineering' are completely ill-adapted to the rapid development of the massive complexity that is software.
There are certain projects where a developer examines every instruction, at the level of detail where 'real engineering' principles can help; however, the tools at the forefront of software engineering enable the addition of billions of washers to billions of rivets in days, if not hours, of design.
> No. We don't. As Alan Kay said, programmers don't have a solid concept like an arch to build on. In real engineering, when you build a bridge, a single rusty rivet won't cause it to fall down. In software, a single pair of misplaced braces gives you critical "goto fail" bugs. (And don't fool yourself into thinking that static analysis catches all these errors.)
Yes, we do. Misplaced braces can cause catastrophic failure, but that's precisely because engineering principles were not followed; the system was left with a critical vulnerability which meant that a typo could cause catastrophic failure.
We can, however, build systems that don't suffer from these problems. Look at the Shuttle, for instance; multiple redundant machines making the same calculations, and additional machines running totally independently developed software. AFAIK the system has never failed.
So I dispute that there's a problem here in terms of our ability to develop critical software with engineering-level quality. The problem is that software is an order of magnitude more complex to reason about than e.g. a bridge, and as a result developing software to the same level of quality would in most cases be a waste of time.
> software [is] an order of magnitude more complex to reason about than e.g. a bridge
I think you're trying to flatter our profession. Software is not more complex; it's just that we don't know what we're doing well enough to have usable, reliable components. Like arches. Or even just rivets.
To stretch the bridge analogy further (probably almost to its breaking point), it's as if I'm about to start work on the Hoover Dam Bridge, but I have to start by getting some rivets. So I first go out and pick up some pieces of iron ore, and start trying to figure out how to build a smelter that can introduce just the right amount of carbon into them to make decent steel. Knowing a bit about smelter designs, I decide to use a Japanese tatara furnace. I still don't have rivets, but I figure that in a few weeks I'll have enough steel to start making them. (After I make the buy-or-build decision on the furnace, that is. I'm not allowed to reuse the one I made for my last project of building music stands, and the commonly-used open-source design has been known to mysteriously turn iron ore into lead instead.) The rest of the bridge has yet to be tackled...
Point being: this is not a "we know enough, it's just complicated" problem. We know almost nothing and our tools are appallingly primitive. To quote some classic hacker humor: "I was taught assembler / In my second year of school / It's kinda like construction work / With a toothpick for a tool." Assembler and Haskell have much more in common than a toothpick and a Caterpillar D9.
> The problem is that software is an order of magnitude more complex to reason about than e.g. a bridge, and as a result developing software to the same level of quality would in most cases be a waste of time.
That's only true while we continue to develop software the way we do. If we change the way we develop software, if we create new tools, if we follow different processes, if the discipline grows up and leaves adolescence, then the cost-benefit balance will change, and maybe we can get software developed at a more reasonable cost, and also with fewer bugs.
This is already happening with unit tests, regression tests, warnings in compilers, and languages that remove entire classes of errors, but still some remain. What's more, some of those that remain are catchable, if only we extended our checking and processes to capture them. That seems to me to be what this article is about - finding ways of enhancing existing techniques so that they give more benefit than they cost.
Say what you like about "move fast and break things" - following that idea, things are getting broken and need fixing! If the balance is changed then we can move just as fast, break fewer things, and re-allocate the resources that would have been spent fixing them, deploying them instead in more profitable ways.
Using wikis in two different companies provided us with these checklist-like capabilities. We used MoinMoin at a prior company and now use MediaWiki, but the technology isn't the point.
Everyone contributes to the wiki and it accrues knowledge from experience. Some process documentation is carefully reviewed for accuracy, but it is also left a bit free-form so as to accommodate situation-specific variations from the norm.
The verification team (I dislike the term QA) encapsulates as much as possible into the testing framework, but since the wiki-based descriptions are so much richer, they remain a go-to resource.
Recently I moved the IT/Networking staff to start making their project plans in the wiki. Their defect rate has plummeted to nearly zero - mostly because they are making plans! But what they've discovered is that they now have frameworks of plans (checklists) from which they can derive their next plan.
My current team is the first one in my career as a software engineer in which our testing infrastructure is robust enough to think like this. I just resolved a bug this week by writing a test that failed due to its presence (and leaving a comment with the ID of the bug), changing the code, then re-running the test to ensure that it passed. This confidence in our code base is wonderful; I have never felt so little fear when deploying.
Looking back, it amazes me that I ever didn't develop software this way.
Go does. So does Rust. Both languages have some sort of unit testing facilities packaged with the standard distribution. And both are dead simple to use.
Whenever I introduce a bug that could have been caught automatically, I either add the missing unit test or I add a new git hook (there is of course a hook to run unit tests). From missing license files and build failures to spelling mistakes in the commit message, there is no reason for these to exist in the repository.
I created https://github.com/icefox/git-hooks to manage hooks across repositories, users, and system wide. For example if in a project I add a new hook making sure I never commit an invalid xml file, all repos that have xml files will get the hook for free.
I find the comparison of software development and surgery/flying an airplane very lacking. It restricts the term "Software" in the post title to very critical systems (OpenSSL admittedly being one of them).
The author proposes automated tests/Software "checklists" as the solution to all problems. The problem with automated tests is that they can only catch "expected" bugs. The author describes an idea for an automated test that _might_ have caught the Heartbleed bug. Big deal, now that we know all about the bug. A next unforeseen type of vulnerability will come, and no automated test will catch it.
I think the article greatly exaggerates the applicability and capabilities of automated tests (or "check lists"). I'm a big fan of all things TDD and consider myself a very rigorous programmer (surely often to the detriment of speed/pragmatism). But I don't need a check list of the complexity of a flight check to describe the rules I use in my work.
> A next unforeseen type of vulnerability will come, and no automated test will catch it.
But Heartbleed wasn't a new type of vulnerability. It's the same type of vulnerability we've known about at least since the dawn of C. Almost all bugs are of known types. How often do we see new types of vulnerabilities? Maybe a couple a year. If we had only a couple vulnerabilities a year across all software, we'd be many orders of magnitude better off.
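To make that concrete: the whole bug class is "copy as many bytes as the peer claims, not as many as the peer actually sent". A minimal sketch of the pattern in C (not OpenSSL's actual code; names are invented for illustration):

    #include <stdlib.h>
    #include <string.h>

    /* Illustration of the bug class, not OpenSSL's actual code.
     * 'claimed_len' comes straight from the peer; 'received' is how many
     * bytes of 'payload' we actually hold. */
    static unsigned char *echo_payload(const unsigned char *payload,
                                       size_t received, size_t claimed_len)
    {
        /* This is the check whose absence makes a Heartbleed: without it,
         * memcpy() reads past 'payload' and leaks adjacent memory. */
        if (claimed_len > received)
            return NULL;

        unsigned char *reply = malloc(claimed_len);
        if (reply != NULL)
            memcpy(reply, payload, claimed_len);
        return reply;
    }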
Sure, I'm not saying the proposed automated test is bad. I think it might be sensible to add it. As I said, I'm 100% for automated tests (I should be, as the co-founder of a startup specializing in test automation tools). But ascribing to automated tests / "check lists" the magical capability of catching all future bugs is IMHO detrimental to the cause, because it leads to overly high expectations.
A lot of stuff that gets on checklists is there because not knowing about it killed someone. Other documented procedures came from time in the simulator - somewhat analogous to manual user testing.
To be honest, the only people I know ascribing "magical capabilities" to unit testing are those disparaging it. The rest of us think it's great, and makes life a lot easier, but we've all been bitten by that unexpected bug when you see it working in the real world.
Exactly. I've always hated the "You can't test what you don't know, so it's worthless" argument. Well sorry, I can test what I do know and make sure that I don't break that. As new errors come up, you add them to your tests (akin to checklists, essentially) and make sure you don't break that use case again.
Except that lots of errors are already known, and get repeated anyhow. Saying that we can't guard against unforeseen errors is not quite the point.
Errors we already know about are known because they are common. Preventing those errors is a net win, regardless of whether unforeseen errors will occur.
Checklists have been proven to work, but the software I write (low-level high-security/crypto software) is less repetitive than (most) surgeries. I haven't managed to use anything checklist-like while writing code; however, quickly pattern-matching against a mental library has been exceedingly useful when _reviewing_ code.
Here are some sources I would recommend to people building their own mental checklists:
- The Art of Software Security Assessment (thanks to tptacek); I actually learnt most of this from bits 'n pieces scattered over the web, but it's a very convenient collection;
- Common crypto flaws. Always look for authentication, _then_ encryption, _and_ replay protection. (The latter is often forgotten even in otherwise good protocols.) If a password is used anywhere, consider brute-forcing it (bcrypt may stop such attacks - or not.) Check that IVs are used as demanded by the mode (in particular, that CTR or GCM IVs are not reused.) Look for timing attacks; the easy-to-spot and high-impact ones are use of memcmp() with HMACs and use of CBC on unauthenticated data (padding oracle). (A constant-time comparison sketch follows below.)
- Buffer overflows may be bad, but integer under-/overflows are much more common (in decent code.) Basically, you need a check before any operation involving a user-supplied length, and this is very commonly forgotten. (Using functions with internal overflow checking, like a decent calloc, helps.)
- Consider the wider context. Spotting a buffer overflow is easy. Noting that you forgot to <? include auth.php ?> is harder. You likely won't notice that the latest "cleanup" leaves the firmware updater open to the internet unless you actually spend five seconds thinking about the wider system.
(I'm quite interested in others' tips; the above is clearly very slanted. Also, I feel vaguely guilty about doing all of this in memory instead of from a paper checklist - does anyone have positive or other experiences with those?)
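For what it's worth, the memcmp()-with-HMACs item from the list above is one of the easiest to turn into a reusable helper. A rough sketch of a constant-time comparison in C (many libraries ship an equivalent, e.g. OpenSSL's CRYPTO_memcmp; this hand-rolled version is only to show the idea):

    #include <stddef.h>

    /* Compare two MACs of equal length without an early exit, so the time
     * taken does not reveal where the first mismatching byte is. */
    static int mac_equal(const unsigned char *a, const unsigned char *b,
                         size_t len)
    {
        unsigned char diff = 0;
        for (size_t i = 0; i < len; i++)
            diff |= a[i] ^ b[i];      /* accumulate all differences */
        return diff == 0;             /* 1 if equal, 0 otherwise */
    }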
Perhaps it's worth considering that maybe the reason checklists aren't the norm in the FOSS "meritocracy" is that they hinder progress, for a certain value of progress. Maybe there was a stealth project that could have been OpenSSL, developed with strict adherence to checklists, but OpenSSL won because it didn't have that burden? I suspect this applies more broadly to startups, too.
Maybe checklists are a silent killer in the natural selection of the software ecosystem, and that's why so much of our software is tripping over peacock feathers?
With software this should be your test suite. No test is too simple. No feature or detail too small to test. Replace the line "relentless checklist" with "relentless testing" and you're good to go. Don't deploy before all tests are green and that's the same as not flying without running through your checklist.
I grew up flying with my dad in his Cessna 172 and admiring how well the pre-flight checklist worked. When I first heard about TDD, I immediately recognized it as the same process, except automated.
That's what I first thought. Then I realized that test suites as most people think of them would not have caught the Heartbleed bug, whereas a checklist like this would have done so.
Testing is testing, but there are different ways of thinking about it, and different ways to have it embodied. Test suites are one embodiment, and higher-level checklists can complement them.
I guess I've always thought of my integration tests as the higher-level checklist. I've only done web development, so that works. I could see how if you're writing software for something as important as OpenSSL then you might need an extra set of checks, but there's still room for human error. Seems like there should be some way to automate everything required to deploy software.
But I realize this is an idealistic approach. The reality is that a good test suite with higher-level checklists is probably more practical.
> This can be automated! This could be mandated as a procedure to be gone through before the code is released. And doing this would prevent this kind of bug from ever happening again.
I might be missing something, but this sounds to me very much like a usual test case. :) Having one in a suite of system or acceptance tests would indeed prevent further occurrences of the bug.
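As a sketch of what that test case might look like, here is a self-contained toy version in C; process_heartbeat() below is a stand-in invented purely to show the shape of the length check and the test around it, not the real OpenSSL handler:

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Toy stand-in for the real heartbeat handler: copy 'claimed' bytes of
     * 'payload' into 'out', or refuse if the claim exceeds what was received. */
    static int process_heartbeat(const unsigned char *payload, size_t received,
                                 size_t claimed, unsigned char *out)
    {
        if (claimed > received)
            return -1;                /* reject Heartbleed-style requests */
        memcpy(out, payload, claimed);
        return (int)claimed;
    }

    int main(void)
    {
        unsigned char payload[4] = "hat";
        unsigned char out[65536];

        /* The Heartbleed case: claim 65535 bytes while sending only 4. */
        assert(process_heartbeat(payload, sizeof payload, 65535, out) == -1);

        /* An honest request still round-trips. */
        assert(process_heartbeat(payload, sizeof payload, sizeof payload, out) == 4);

        puts("heartbeat length checks: OK");
        return 0;
    }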
The optimist's response to "What could possibly go wrong?" is a swing for the fences. The pessimist's response is a checklist. It's seeking to avoid the worst outcomes and allowing success to come in due time. It's braking early and accelerating from the apex, as opposed to going full speed and breaking things fast.
This seems like a fallacy of false dilemmas. The optimist can swing harder because, thanks to professional pessimism, the bat won't crack halfway through the swing, the stands don't collapse before the ball reaches them, the ball doesn't explode on contact, and the rules for what constitutes a home run don't have to be constantly reinvented by each player individually.
Yes, we agree that the optimist's imagination limits the list of things that could possibly go wrong to the absurd. And, as you suggest, perhaps an optimist's optimism derives from a concept of professionalism born as the love child of Dunning-Kruger. Or perhaps the horns are grown on a strawman dividing the world into naught but optimists and pessimists, according to the law of the excluded muddle.
I think he's missing an important point. C should not be used for safety critical systems code if possible, period. A language that prevents those leaks should be employed in its place.
Checklists are good, but you need to enforce them. If such a language is used instead, these errors are impossible by design.
I think you're missing the point. Whether or not we're talking about C is a detail, what is more important is that the lessons we learn need to be encapsulated and encoded into procedures that everyone can apply and learn from.
Equivalent mistakes can be made in any language powerful enough to do real work. Off-by-one, behaving inappropriately on parameters outside those expected, lots of things can go wrong in any language. Capturing your knowledge in a form that can be used for testing is the idea, and that seems to be missing entirely from computing.
Languages can eliminate large classes of such mistakes.
Taking your own two examples: most off-by-one errors go away if you replace "for (i = 0; i < datalength; i++)" with "for i in data" or something like that; and by replacing < or <= comparisons with checks for range membership. A checklist might say - "don't have assignments in your if statement conditions like 'if (x=0)', it's risky"; but your language or tools around it may make it impossible to do it.
"Behaving inappropriately on parameters outside those expected" is correlated with the frequency of getting unexpected parameters. No static typing? You'll have to check at runtime if the parameters are of the type that you expect. Null/undefined values? Again, you may remember to check everywhere or the language can force you to check really every time, not almost every time. Niche languages like Eiffel can help you ensure that you know that the parameters are within the expected range, etc.
Capturing such knowledge and encoding it in a way suitable for everyone is done in this way - it does break backward compatibility and requires rewriting code and abandoning libraries or even languages; but the checklist equivalent of "don't ever turn on the engine before checking X, even if you think it's okay" is "don't use C strcpy ever, even if you think you know it's correct there".
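As a concrete example of a checklist item being absorbed by tooling, the "if (x = 0)" case mentioned above is something C compilers can already reject mechanically: GCC and Clang flag it via -Wparentheses (part of -Wall), and promoting that warning to an error makes the mistake impossible to commit rather than merely inadvisable. A small sketch:

    /* assign_in_condition.c
     * Build with:  cc -Wall -Werror=parentheses -c assign_in_condition.c
     * The accidental assignment below then fails the build instead of
     * silently zeroing x at runtime. */
    int is_reset(int x)
    {
        if (x = 0)          /* meant: if (x == 0) */
            return 1;
        return 0;
    }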
> what is more important is that the lessons we learn need to be encapsulated and encoded into procedures that everyone can apply and learn from.
But in software, we can do better than that! We don't have to check a checklist ourselves if we can make a tool that makes it impossible to do the wrong thing (or rather, we don't have to have that thing on the checklist).
Some places do have checklists for software engineering, if not software. For example, a simplified process for deploying code might be:
1. Check that code compiles
2. Run test suite
3. Get code reviewed
4. Push to production
And having such a list does help to ensure that code doesn't get pushed to production without being tested and reviewed. But it's better to use a system which automatically runs the tests and checks that it's been reviewed before the code goes into production (unfortunately the review itself can't be automated -- though parts of it can be, and that's useful too).
Some languages, by design, exclude or drastically reduce certain classes of error.
Edit: I think I'm missing the point too. Proposing the perfect world as an engineering solution is silly. So in the context of "it's written in C", checklists would be a step forward.
And when we moved away from C for a lot of really important things, we saw memory issues and buffer overruns replaced by other trivial mistakes. In the late 90s, buffer overruns became a joke bug/security issue. In the late 00s we saw SQL injection (and similar injection attacks), XSS and other parser confusion tricks become the joke bug/security issue. Using Python or Ruby didn't magically fix these, and there are still fairly regular issues in using the libraries that enforce input and sanity checking. Heck, those libraries still get big holes - there were a few ActiveRecord issues not that long ago where the tool meant to sanitize data actually opened a hole! (Not to pick on any tool/framework - that one was just well publicized.)
I think the main point speaks more to the old saw "If you make something idiot-proof, someone will just make a better idiot". Tools that automatically "fix old problems" are generally complicated and imperfect, and it becomes easy to accidentally trust the system to do the right thing on an edge case it doesn't handle. Having the automation of as many cases as possible is good, but having a full system to do in-depth knowledge preservation and issue capture is even better. The two things complement each other well.
Going back to the OP's original example/analogy - planes these days are full of automated systems, higher reliability parts, better interfaces, and all sorts of other improvements and failsafes so some errors from the past just can't happen. That doesn't mean they still don't do checklists and other manual checks to prevent the new issues from happening, nor have they abandoned some of the old basic checks, even if they "can't happen".
tl;dr - better tools are good, but they don't fix everything
Gerald Weinberg put it this way: when you solve the most important problem, you've promoted the second most important problem into first place.
That doesn't change the fact that problem #1 was worth solving. The middle road is that we should neither rest on our laurels nor give up because of the impossibility of the perfect world.
With some discipline, it's possible to achieve the same level of security in C as in pretty much any other high-level language.
Some useful techniques: always strive for simplicity, use types to encapsulate data dependencies and enforce error checking (e.g. store the length and pointer of an array as part of a struct and use methods that perform bounds checks to modify or read from it -- granted, this style of programming can add overhead, but C is so fast to begin with that it rarely matters), don't use unsafe standard library functions, test rigorously, valgrind everything, use static code analysis tools.
Doing any of these things goes a long way toward eliminating the vast majority of bugs specific to C, but unfortunately way too many C projects hardly do any of them. It can evidently be done, though, as there are some very robust C libraries out there.
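A minimal sketch of the length-plus-pointer struct style mentioned above (illustrative only; a real version would also need an allocation and ownership story):

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* A 'fat pointer': the length travels with the data, and every access
     * goes through functions that check it. */
    typedef struct {
        unsigned char *data;
        size_t len;
    } buf_t;

    /* Copy n bytes starting at offset into out, or return false instead of
     * reading out of bounds. The check is written so offset + n cannot
     * overflow. */
    static bool buf_read(const buf_t *b, size_t offset, size_t n, void *out)
    {
        if (offset > b->len || n > b->len - offset)
            return false;
        memcpy(out, b->data + offset, n);
        return true;
    }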
The problem is that C requires positive effort, above the baseline, to have those guarantees.
Any system which relies on positive human effort is more likely to fail than a system which simply sets a higher baseline.
You can be safe on a motorbike if you are very careful, avoid dangerous conditions, drive more slowly than you want and practice extreme care around cars. You are still more likely to die. The baseline is simply lower and, when things do go wrong, you have less safety buffer than someone in a car.
Using a system like this really helps for complicated and/or important code. All of our safety critical and embedded code references a checklist before it is made into a pull request. It really helps catch all of the "rubber ducky" bugs.
Watts Humphrey was probably the first to pursue the idea of scaling development processes down, rather than up, arriving in the 90s at the Personal Software Process.
The PSP is very data-intensive. You're meant to log pretty much everything you do.
For example, you log your mistakes. Missed a semicolon? Log the mistake. Typo in a variable name? Log the mistake. Critical misunderstanding of how a library works? Log the mistake.
After a while you use your data to correct your common errors with ... checklists! And other tools and practices, of course, but checklists are a big part of the PSP.
It's worth reading A Discipline for Software Engineering, even if you never apply it.
Yeah, I haven't touched the site in what feels like forever...unfortunately, don't have the time to fill it up with information... I might return to it one day
Isn't this what static analysis, penetration testing, and various other automated smoke tests are made for? Instead of adding to a checklist, these tests are updated to account for new vulnerabilities.
The checklist is about making sure you have used those tools.
This is not how I would recommend running the average startup, but for mission critical software having someone other than the author review all the code using a checklist is probably the best way to ensure quality. The author of the code would have the same checklist and would be expected to hit all the points before handing the code over to the reviewer.
Atul Gawande is probably the foremost proponent of surgical checklists, and in that article he is responding to an Ontario study reporting that checklists made no significant difference; it sounds like a pretty bad study.
Surgeons don't particularly like checklists.
"For starters, the most controversial idea for teams to accept is perhaps the simplest item in the checklist. Require all team members say their names prior to the launch of the procedure.
"'This has been one of the most important things that help people feel comfortable speaking up' if they're unsure or unclear, for example, that this is the right patient, right site, right procedure.
"'It acknowledges that you're part of a team and are allowed to speak.'
"Gawande says that there has been resistance to accepting checklists at another level. 'The concept has forced us rethink what it means to be great at what we do. And I hadn't grasped this until I saw it recur over and over again. There's a set of values in the idea of a checklist, and they're in distinct conflict with some of the values we have in medicine.'
"'We value physician autonomy, which works well when there are just two full time equivalents providing care, but when we have 19.5 FTEs trying to make things work, it becomes a problem.'"[1]
Great reference. That first point relates more to Crew Resource Management (CRM)[1], an important concept making its way from aviation to surgery. After a decade flying F-18s for the Navy, I value checklists, but communication is probably even more important and often overlooked. Finding the courage to speak up is more challenging than it seems. It's so easy to assume that the Captain, Surgeon, or whoever is in charge knows what they are doing. One of the worst accidents in history[2], which helped kick off CRM, could likely have been prevented if the co-pilot had assertively told the Captain that they weren't cleared for take-off.
I still remember the Navy's version and believe it's applicable to many places other than aviation.
D - decision making
A - assertiveness
M - mission analysis
C - communication
L - leadership
A - adaptability / flexibility
S - situational awareness
But they most assuredly want you to use a checklist if you're going to do surgery on them. See this telling passage at the very end of Chapter 7 of Gawande's book:
Then we asked the staff one more question. “If you were having an operation,” we asked, “would you want the checklist to be used?” A full 93 percent said yes.