
"the description sounds like the procedure is working directly on the textual representation of the flight plan, rather than a data structure parsed from the text file. This would be quite worrying, but it might also just be how it is explained."

Oh, this is typical in airline industry work. Ask programmers about a domain model or parsing and you get blank stares. They love their validation code, and they love just giving up if something doesn't validate. It's all dumb data pipelines. At no point is there code that models the activities happening in the real world.

In no system is there a "flight plan" type that has any behavior associated with it, or anything like a set of waypoint types. Any type you find would be, in C terms, a struct of strings, passed around and parsed not once but every time a struct member is accessed. As the article notes, "The programming style seems very imperative."
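
To make that concrete, here's a rough sketch of the pattern (in Rust, field names invented, nothing like actual FPRSA-R code):

    // The "struct of strings" style: fields stay as text and every consumer
    // re-parses them, usually with a give-up path on anything unexpected.
    struct FlightPlanMsg {
        callsign: String,
        route: String, // raw route text, e.g. "DCT BPK Q295 BRAIN"
        eobt: String,  // off-block time still kept as "HHMM" text
    }

    fn off_block_minutes(msg: &FlightPlanMsg) -> u32 {
        // parsed again at every access; unwrap() = give up on bad input
        let h: u32 = msg.eobt[0..2].parse().unwrap();
        let m: u32 = msg.eobt[2..4].parse().unwrap();
        h * 60 + m
    }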



Giving up if something doesn't validate is indeed standard, to avoid propagating badly interpreted data and causing far more complex bugs down the line. Validate soon, validate strongly, report errors, and don't try to interpret whatever the hell is wrong with the input. Don't try to be 'clever', because that's where the safety holes are. Crashing on bad input is wrong, but trying to interpret data that doesn't validate, without specs (of course), is fraught with incomprehension and incompatibilities down the line, or unexpected (and untested) corner cases. And no one wants to pay for a fully tested anything-goes system, or even for the tools to simulate 'wrong inputs', or for formal validation of the parser and all the code using the parser's results.
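
As a tiny sketch of what I mean by validating strongly and reporting, rather than guessing (Rust, the field format and bounds are made up):

    // Reject anything outside the spec with a precise error; never 'repair'
    // the value or substitute a default.
    fn parse_level(field: &str) -> Result<u32, String> {
        match field.strip_prefix('F').and_then(|s| s.parse::<u32>().ok()) {
            Some(fl) if (10..=600).contains(&fl) => Ok(fl),
            _ => Err(format!("unrecognised level field: {:?}", field)),
        }
    }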

There are already too many problems with non-compliant or legacy (or just buggy) data emitters, with the complexity in semantics or timing of the interfaces, to try and be clever with badly formatted/encoded data.

It's already difficult (and costly) to make a system work as specified, so adding subtle variations to make it more tolerant of unspecified behaviour is just asking for bugs (or for more expensive systems that don't clear the purchasing price bar).


There's a difference between parsing and validating. https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...

You're right about all the buggy stuff out there, and that nobody wants to pay to make it better, though.


From a safety-critical standpoint, I've always found this article interesting but strange. You want both, before taking into account any data from anything outside the system. Do both, as soon as possible. Don't propagate data you haven't validated in every way your spec calls for. If you have more stringent specs than the standard you're implementing, be explicit about it and reject the data with a clear failure report. Check for anything that could be corrupted, malformed, or otherwise unexpected and able to cause unexpected behaviour.

I feel the lack of investment in destroying the parsing- (and validation-) related classes of bugs is the worst oversight in the history of computing. We have the tools to build crash-proof parsers (SPARK, Frama-C, and custom model-checked code generators such as RecordFlux). They're not perfect in any way, but if they had received even a tiny bit of the effort the security industry has put into mending all the 'Postel's law' junk out there, we'd be working on other problems by now.

I built, with an intern, an in-house bit-precise code generator for deserializers that can be proved free of runtime errors, and I'm now moving on to semantic checks ('field X and field Y can only be present together', or 'field Y must be greater than or equal to its value the previous time it was present'). It's not that hard, compared to many other proof and safety/security endeavours.
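
Not the generated code itself, but a hand-written sketch (in Rust, field names invented) of the shape of those semantic checks:

    struct Msg { field_x: Option<u32>, field_y: Option<u64> }

    fn check(msg: &Msg, prev_y: Option<u64>) -> Result<(), &'static str> {
        // 'field X and field Y can only be present together'
        if msg.field_x.is_some() != msg.field_y.is_some() {
            return Err("field X and field Y must be present together");
        }
        // 'field Y must be >= its value the previous time it was present'
        if let (Some(y), Some(prev)) = (msg.field_y, prev_y) {
            if y < prev {
                return Err("field Y went backwards");
            }
        }
        Ok(())
    }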


> It's not that hard, compared to many other proof and safety/security endeavours.

Yes, but the code has to understand and model the input into a program representation: the AST. That's the essence of the "parse, don't validate" paradigm. Instead of looking at each piece of a blob of data in isolation to determine if it's a valid value, turn the input into a type-rich representation in the problem domain.
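
In code terms, something like this rough sketch (Rust, domain types invented for the example):

    // One parse step turns the text into a domain representation, or fails
    // loudly; downstream code never touches raw strings again.
    enum RouteElement {
        Direct,            // "DCT"
        Named(String),     // waypoint or airway identifier
    }

    struct FlightPlan {
        callsign: String,
        route: Vec<RouteElement>,
    }

    fn parse_route(raw: &str) -> Result<Vec<RouteElement>, String> {
        raw.split_whitespace()
            .map(|tok| match tok {
                "DCT" => Ok(RouteElement::Direct),
                t if t.chars().all(|c| c.is_ascii_alphanumeric()) =>
                    Ok(RouteElement::Named(t.to_string())),
                t => Err(format!("unrecognised route token: {t}")),
            })
            .collect()
    }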

In the case of the FPRSA-R system in question, it does none of that. It's simply a gateway to translate data in format A to data in format B, like an ETL system. It's not looking at the input as a flight plan with waypoints, segments and routes.

Why the programmers chose to do the equivalent of bluescreening on one failed input, I can't say. As others have pointed out, the situation it gave up on isn't so rare: a 1-in-15-million case will eventually happen. Of course, switching to an identical backup system is a bad choice too. In safety-critical work there needs to be a different backup, much like the Backup Flight System on the Space Shuttle or the Abort Guidance System on the Apollo Lunar Module: a completely different set of avionics, programmed independently.


One of the reasons developers 'let it crash' is that no one wants to pay for error recovery, and I mean the whole thing: the design (including at the system level), the testing, and the long-term maintenance of barely used code.

THAT SAID, isolating the decoding code and data structures, and having a way back, either by checkpoint/restore, by wiping out the bad state, or by proving the absence of side effects (as SPARK dataflow contracts allow, for example), is better design, and I wish it were taught more often. I really dislike how often exception propagation is taught without showing how to handle the side effects...
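
A tiny sketch of that isolation idea (in Rust here just for brevity, names made up): keep the decoder pure and commit state only after it succeeds, so a rejected message can't leave half-written state behind.

    struct Plan { callsign: String }

    fn decode(raw: &str) -> Result<Plan, String> {
        // parse-only: no globals, no I/O, nothing to roll back on failure
        let callsign = raw.split_whitespace().next()
            .ok_or_else(|| "empty message".to_string())?;
        Ok(Plan { callsign: callsign.to_string() })
    }

    fn ingest(raw: &str, accepted: &mut Vec<Plan>) -> Result<(), String> {
        let plan = decode(raw)?;  // failure propagates before any state change
        accepted.push(plan);      // the only side effect, after success
        Ok(())
    }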


That's super interesting (and a little terrifying). It's funny how different industries have developed different "cultures" for seemingly random reasons.


It was terrifying enough for me in the gig I worked on that dealt with reservations and check-in, where a catastrophic failure would be someone boarding a flight when they shouldn't have. To avoid that sort of failure, the system mostly just gave up and issued the passenger what's called an "Airport Service Document": effectively a record that shows the passenger as having a seat on the flight, but unable to check-in. This allows the passenger to go to the airport and talk to an agent at the check-in desk. At that point, yes, a person gets involved, and a good agent can usually work out the problem and get the passenger on their flight, but of course that takes time.

If you've ever been at the airline desk waiting to check in and watched an agent spend 10 minutes working with one passenger, it's probably because that passenger got an ASD and the agent has to screw around directly in the user-hostile SABRE interface to fix the reservation.


SABRE is pretty good compared to the card file it replaced.


It's better to say SABRE replicated, in digital form, that card file. And even today the legacy of that card form defines SABRE and all the wrappers and gateways to it.



