The standard approach is to be liberal in what you accept and specific in what you emit.
You could
1) Filter the broken message
2) Drop the broken message
3) Ignore the broken attributes but pass them on
4) Break with the broken attributes
To me, only 4 (Arista) is really unacceptable behaviour. 3 (Juniper) isn't desirable, but it's not devastating behaviour.
EDIT: Actually rereading it, Arista did 2 rather than 4. I think it just closed the connection as being invalid rather than completely crash. That's arguably acceptable, but not great for the users.
There is already RFC 7606 (Revised Error Handling for BGP UPDATE Messages), which specifies in detail how broken BGP messages should be handled.
The most common approach is 'treat-as-withdraw', i.e. handle the update (announcement of a route) as if it were a withdraw (removal of a previously announced route). You should not just drop the broken message, as that would lead to keeping old, no longer valid state.
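Roughly, in toy Python (the names and the RIB structure here are made up, not any real implementation's API):

    # Toy sketch of RFC 7606 "treat-as-withdraw"; hypothetical names, not a real BGP stack.
    import logging

    log = logging.getLogger("bgp")

    class MalformedAttribute(Exception):
        pass

    def handle_update(peer, prefixes, parse_attrs, rib):
        """rib: dict mapping prefix -> (advertising peer, path attributes)."""
        try:
            attrs = parse_attrs()                     # may raise MalformedAttribute
        except MalformedAttribute as err:
            log.warning("malformed UPDATE from %s: %s", peer, err)
            # Keep the session up, but remove this peer's routes for the affected
            # prefixes rather than keeping old, no-longer-valid state around.
            for prefix in prefixes:
                if rib.get(prefix, (None, None))[0] == peer:
                    del rib[prefix]                   # treat-as-withdraw
            return
        for prefix in prefixes:                       # normal case: install/replace routes
            rib[prefix] = (peer, attrs)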
> The standard approach is to be liberal in what you accept and specific in what you emit.
What you're paraphrasing here is the so-called "robustness principle", also known as "Postel's law". It is an idea from the ancient history of the 1980s and '90s Internet. Today, it's widely understood that it is a misguided idea that has led to protocol ossification and countless security issues.
Postel's Law certainly has led to a lot of problems, but is it really responsible for protocol ossification? Isn't the problem the opposite, e.g. that middleboxes are too strict in what they accept (say only the HTTP application protocol or only the TCP and UDP transport protocols)?
Overly strict and overly liberal both lead to ossification. That's merely the observation that buggy behavior in either direction can potentially come to be relied on (or to be unpredictably forced on you, in the case of middleboxes filtering your traffic).
I'd only expect security issues to result from being overly liberal but 1. I wouldn't expect it to be very common and 2. I'm not at all convinced that's a compelling argument to reduce the robustness of an implementation.
"Overly" here refers to restrictions that exceed the relevant standard. An extensibility mechanism is useless if a nonzero fraction of the network filters out messages that make use of it in certain ways.
Ossification comes from os, ossis: bones in Latin. Turning into bones. Stops being flexible. Common behavior becomes de facto specification. There's stuff that's allowed by the specification but not expected by implementations because things have always worked like this.
It's not related to open source software. The seemingly matching prefix is coincidence :-)
Pretty much when something in the spec in theory could change, but in practice never does. So software and hardware gets built around the assumption that it never changes.
For example, in networking you can have packets sent using TCP or UDP, but in principle any number of protocols could be used. But for decades it was literally only ever those two. Then when QUIC came about, they couldn't implement it at the layer it was meant to be at, because all the routers and software were not built to accept anything other than TCP or UDP.
There's been a bunch of thought put into how to stop this, like making sure anything that can change regularly does. Or using encryption to hide everything from routers and software that might want to inspect and tamper with it.
Literally, it means that something is slowly turning into stone, like dinosaur bones. Protocols and standard libraries suffer from this in a figurative sense.
The trouble is it fails to specify what you're supposed to be liberal with.
Suppose you get a message that violates the standard. It has a length field for a subsection that would extend beyond the length of the entire message. Should you accept this message? No, burn it with fire. It explicitly violates the standard and is presumably malicious or a result of data corruption.
Now suppose you get a message you don't fully understand. It's a DNS request for a SRV record, but your DNS cache was written before SRV records existed. Should you accept this message? Yes. The protocol specifies how to handle arbitrary record types. The length field is standard regardless of the record type, and you treat the record contents as opaque binary data. You can forward it upstream and even cache the result that comes back without any knowledge of the record format. If you reject this request because the record type is unknown, you're the baddies.
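To make the distinction concrete, a simplified sketch (deliberately not real DNS wire-format parsing; the 2-byte type / 2-byte length layout is just for illustration): reject the structural violation, but pass the unknown record type through as opaque data.

    import struct

    KNOWN_TYPES = {1: "A", 28: "AAAA"}   # pretend this cache predates SRV (type 33)

    def parse_record(buf, offset):
        rtype, rdlen = struct.unpack_from("!HH", buf, offset)
        start, end = offset + 4, offset + 4 + rdlen
        if end > len(buf):
            # Length field runs past the end of the message: explicit violation,
            # presumably malicious or corrupted. Burn it with fire.
            raise ValueError("record length exceeds message size")
        rdata = buf[start:end]
        if rtype not in KNOWN_TYPES:
            # Unknown but well-formed: treat the contents as opaque bytes and
            # forward/cache them unchanged instead of rejecting the request.
            return {"type": rtype, "opaque": rdata}, end
        return {"type": KNOWN_TYPES[rtype], "rdata": rdata}, end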
I would say the proper way to apply Postel's law is to reasonable interpretations of standards. Internet standards are just text documents written by humans, and often they are underspecified or have multiple plausible interpretations. There is no IETF court that would give a canonical interpretation (well, the appropriate working group could make a revision of the standard, but that is usually a multi-year effort). So unless we want to break up into multiple non-interoperable implementations, each strictly adhering to its own interpretation, we should be liberal about accepting plausible interpretations.
There are many cases where the RFC is not at all ambiguous about what you're supposed to do, and then some implementation doesn't do it. What should you do in response to this?
If you accept their garbage bytes, things might seem less broken in the short term, but then every implementation is stuck working around some fool's inability to follow directions forever, and the protocol now contains an artificial ambiguity because the bytes they put there now mean both what they're supposed to mean, and also what that implementation erroneously uses them to mean, and it might not always be detectable which case it is. Which breaks things later.
Whereas if you hard reject explicit violations of the standard then things break now and the people doing the breaking are subject to complaints and required to be the ones who stop doing that, rather than having their horkage silently and permanently lower the signal to noise ratio by another increment for everyone else.
One of the main problems here is that people want to be on the side of the debate that allows them to be lazy. If the standard requires you to send X and someone doesn't want to do the work to be able to send X then they say the other side should be liberal in what they accept. If the standard requires someone to receive X and they don't want to do the work to be able to process X then they say implementations should be strict in what they accept and tack on some security rationalization to justify not implementing something mandatory and thereby break the internet for people who aren't them.
But you're correct that there is no IETF court, which is why we need something in the way of an enforcement mechanism. And what that looks like is to willingly cause trouble for the people who violate standards, instead of the other side covering for their bad code.
> If you accept their garbage bytes, things might seem less broken in the short term, but then every implementation is stuck working around some fool's inability to follow directions forever, and the protocol now contains an artificial ambiguity because the bytes they put there now mean both what they're supposed to mean, and also what that implementation erroneously uses them to mean, and it might not always be detectable which case it is. Which breaks things later.
And, if your project is on GitHub, gets your Issues page absolutely clowned on because you're choosing to do the right thing technically and the leeching whiners shitting up the Issues don't want to contribute a goddamn thing other than complaints, and they definitely don't want to go to the authors of the thing that doesn't work with your stuff and try and get that fixed either.
It's a description of how natural language is used, so what you'd expect is constant innovation, with protocols naturally developing extensions that can only be understood within local communities, even though they aren't supposed to.
Something like "this page is best viewed in Internet Explorer" as applied to HTML.
That's only one distinct component. HTML vs XHTML was also a distinct aspect (syntax ambiguity was a lesser problem than the larger ambiguity). The WHATWG fiasco is IMO more important to the point: low-quality, half-baked new features are not an accident but a goal.
XHTML reveals, though, that HTML won on ambiguity over pedantic error identification. The adopters it needed rallied against anything that would have required them, from day one, to say unambiguously what they meant. Starting with a fundamentally flawed demo, blog, or shop that ropes in some commitment, then gradually fixing things on the in-for-a-dime-in-for-a-dollar investor, is basically the whole business model of most fields, if you exclude exchanges between the top 1-10% of buyers and sellers, which have an entirely different structure.
Even things like Facebook are an example of the manure-first model. I wouldn't be stupid enough to let Zuckerberg plan lunch, and as an investor I'm about as savvy as someone who bet against HTML. A billion flies can't be wrong, as the saying goes.
Postel's law is absolutely great if you want to make new things and get them going in a hurry, and I think it was one of the major reasons the TCP/IP stack beat the ISO model. But as you say, it's a disaster if you want to build large robust systems for the long term.
The 1970s were also just a different time: documentation was harder to get, it was harder to do quality implementations of protocols, people had less of an idea what may or may not work because everyone was new at this (both in terms of protocols and implementations), shipping bugfixes took a lot longer, few people were writing tests (and there wasn't a standard test suite), few people had long experience with these protocols, and the general quality of software was a lot lower.
The problem is that folks took advantage of the behavior of BGP where it would forward unknown attributes that the local device didn't understand, as a means to do all sorts of things throughout the network. People now rely on that behavior.
Now, we're experiencing the downside of this "feature".
BGP has classes of attributes that it forwards. While it is true that it forwards route attributes it doesn't know about, this was an attribute that it DID know about and knew it shouldn't forward.
In fact it's a bit strange just how lenient Juniper's software was here. If a session is configured as IBGP on one end and EBGP on the other end, it should never get past the initial message. Juniper not only let it get past the connection establishment but forwarded obviously wrong routes.
Yes but you are seeing a symptom of what I believe is a fundamental design decision to be liberal in passing on data and then _later_ go through and build logic that stops certain things from being forwarded, and the result is that things slip through the cracks that shouldn't.
Rather than the inverse where you only forward things explicitly and by default do not forward.
As far as I'm aware "a session is configured as IBGP on one end and EBGP on the other end" isn't possible.
You can't configure it like that; most of the BGP implementations I'm familiar with automatically treat a same-AS neighbor as iBGP and a different-AS neighbor as eBGP.
Juniper explicitly has 'internal' and 'external' neighbors, but you can't configure a different peer AS than your own on an internal neighbor or the same peer AS on an external neighbor.
BGP sessions also have the AS of the neighbor specified in the local config, and will not bring up the session if it's not what's configured.
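Conceptually, the check is something like this (made-up names; per RFC 4271, a mismatch gets a NOTIFICATION with the Bad Peer AS subcode and the session never reaches Established):

    # Sketch only; the real FSM has many more states and checks.
    def check_open(local_asn, configured_peer_asn, open_my_as):
        if open_my_as != configured_peer_asn:
            return ("NOTIFICATION", "OPEN Message Error / Bad Peer AS")
        # The session type is derived from the configured AS, not set independently:
        session_type = "iBGP" if configured_peer_asn == local_asn else "eBGP"
        return ("CONTINUE", session_type)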
I understand that, but it's a double edged sword. We enjoyed that flexibility for a long time, but lately we are now experiencing the downsides of this flexibility.
At a glance this “feature” seems like an incredibly bad idea, as it allows possibly unknown information to propagate blindly through systems that do not understand the impact of what they are forwarding. However this feature has also allowed widespread deployment of things like Large Communities to happen faster, and has arguably made deployment of new BGP features possible at all.
Being that prescriptive is fundamentally unworkable in practice. Propagating unknown attributes is what made the deployment of 32-bit AS numbers possible (originally RFC 4893; unaware routers pass the `AS4_PATH` attribute without needing to comprehend it), as well as large communities (RFC 8092), the Only To Customer attribute (RFC 9234), and others.
A BGP Update message is mostly just a container of Type-Length-Value attributes. As long as the TLV structure is intact, you should be able to just pass on those TLVs without problems to any peers that the route is destined for.
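Walking those TLVs looks roughly like this (simplified; real parsing per RFC 4271 handles more cases, this is just to show the shape):

    def iter_path_attributes(buf):
        """Yield (flags, type_code, value) for each path attribute in buf (bytes)."""
        i = 0
        while i < len(buf):
            flags, type_code = buf[i], buf[i + 1]
            if flags & 0x10:                                   # Extended Length bit
                length = int.from_bytes(buf[i + 2:i + 4], "big")
                value_start = i + 4
            else:
                length = buf[i + 2]
                value_start = i + 3
            value_end = value_start + length
            if value_end > len(buf):
                raise ValueError("attribute length overruns the UPDATE")
            yield flags, type_code, buf[value_start:value_end]
            i = value_end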
The problem fundamentally is three things:
1. The original BGP RFC suggests tearing down the connection upon receiving an erroneous message. This is a terrible idea, especially for transitive attributes: you'll just reconnect and your peer will resend you the same message, flapping over and over, and the attribute is likely to not even be your peer's fault. The modern recommendation is Treat As Withdraw, i.e. remove any matching routes from the same peer from your routing table.
2. A lack of fuzz testing and similar by BGP implementers (Arista in this case)
3. Even for vendors which have done such testing, a number of them have decided (IMO stupidly) to require you to turn on these robustness features explicitly.
PNG solved this problem when BGP was still young: each section of an image document is marked as to whether understanding it is necessary to process the payload or not. So image transform and palette data is intrinsic, but metadata is not. Adding EXIF for instance is thus made trivial. No browser needs to understand it so it can be added without breaking the distribution mechanism.
This is also how BGP (mostly) solved it. Each attribute has a 'transitive' bit. Unknown attributes with the 'transitive' bit set are passed on; ones without are discarded.
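In toy code, the decision table from RFC 4271 section 5 looks roughly like this (flag bits: 0x80 Optional, 0x40 Transitive, 0x20 Partial; not a real implementation):

    def classify_unknown_attribute(flags, type_code, recognized_types):
        if type_code in recognized_types:
            return "process"                  # we know this one, handle it normally
        if not flags & 0x80:
            return "error"                    # unrecognized *well-known* attribute
        if flags & 0x40:
            # Unknown optional transitive: keep it, pass it on, and set the Partial bit.
            return "propagate-with-partial-bit"
        # Unknown optional non-transitive: quietly drop just this attribute.
        return "discard"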
You're suggesting that being liberal in what you accept is necessary for forward evolution of the protocol, but I think you're presenting a false dichotomy.
In practice there are many ways to allow a protocol to evolve, and being liberal in what you accept is just about the worst way to achieve that. The most obvious alternative is to version the protocol, and have each node support multiple versions.
Old nodes will simply not receive messages for a version of the protocol they do not speak. The subset of nodes supporting a new version can translate messages into older versions of the protocol where it makes sense, and they can do this because they speak the new protocol, so can make an intelligent decision. This allows the network to function as a single entity even when only a subset is able to communicate on the newer protocol.
With strict versioning and compliance to specification, reference validators can be built and fitted as barriers between subnetworks so that problems in one are less likely to spread to others. It becomes trivial for anyone to quickly detect problems in the network.
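A toy sketch of that model (nothing protocol-specific here, names made up): each pair of nodes speaks the highest version both support, and newer nodes translate down where it makes sense.

    def negotiate(local_versions, peer_versions):
        common = set(local_versions) & set(peer_versions)
        return max(common) if common else None    # no guessing if there's no overlap

    def send(msg, msg_version, peer_versions, downgrade):
        """downgrade(msg, v) returns msg translated to version v, or None."""
        if msg_version in peer_versions:
            return msg                            # peer speaks this version natively
        for v in sorted(peer_versions, reverse=True):
            translated = downgrade(msg, v)
            if translated is not None:            # intelligent translation, done by a
                return translated                 # node that speaks the newer version
        return None                               # old node simply never sees the message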
That's in conflict with the philosophy behind the internet. If you just drop anything because you don't understand some part of it, you lose a lot of flexibility. You have to keep in mind that some parts of the internet are running on 20-year-old hardware, but some other parts might work so much better if some protocol is modified a little. Just like with web browsers: if everything is a little bit flexible in what it accepts, you both improve the smoothness of the experience and create room for growth and innovation.
Postel's Law is important, but it creates brittle systems. You can force them further from the ideal operating state before failure, but when they fail they tend to fail suddenly and catastrophically. I like to call it the "Hardness Principle" as opposed to the "Robustness Principle" in analogy to metallurgy.
That's what Postel thought. He was wrong. Allowing everything creates a brittle system because the system has to accept all the undocumented behaviour that other broken systems emit. If broken files were rejected quickly, nobody would generate them.
There's a difference between unknown extensions following a known format, and data that's simply broken (e.g. offset pointer past end of data).
You're not accounting for incorrectly rejected files and protocols, or for incomplete protocol specifications.
And generally I think critics of Postel lack the context in which those decisions were made. You, and probably others, would actually make similar decisions to Postel's on many particular issues.
I disagree that I'd make similar decisions. Postel's Law is a big part of the reason Bleichenbacher attacks (adaptive chosen-ciphertext attacks)[1] stayed so common for so long. As an engineer responsible for security, I absolutely reject malformed inputs.
Postel's law doesn't mean "accept everything", but that you should accept de-facto rules people have created. If everyone says, "this is how we do it", you should ignore the RFC and just copy what others do.
One, if everyone is doing something different from the spec, it is hard to figure out what they are really doing and what they mean. If everyone follows the spec, you have long-term confidence things will continue to work, even when someone else writes their own version which might otherwise also deviate from the spec.
Two, it is easier to modify the spec as more features are dreamed up if you have confidence that the spec is boss, meaning someone else hasn't already used that field for something different (which you may not have heard about yet).
Three, if you agree to a spec you can audit it (think security); if nobody even knows what the spec is, that is much harder.
Following the spec is harder in the early days. You have to put more effort into the spec because you can't discover a problem and just patch it in code. However, the internet is far past those days. We need a spec that is the rule that everyone follows exactly.
The internet is ossified because middleboxes stick their noses where they shouldn't.
If they just route IP packets, we could have had nice things like SCTP...
Browsers are permissive not because it's technically superior but as a concession for the end user who still wants to be able to use a poorly built website, and they're competing with browsers who will bend over backwards to render that crappy website so that they look good and your browser looks bad.
It's not a concession you want to make unless you really have to.
Well, my point is that there's unique pressure for browsers to be permissive for practical reasons beyond Postel's law even if you were building a browser in 2025 and the whole internet reset to xhtml.
And that's because the end-user is at the mercy of, but not party to, an over the air interface between the producer and consumer that you can't verify ahead of time.
So if you're consuming a stream of supposed xhtml `<p>foo<p>bar</p>`, you have to decide if you want to screw the user for the producer's mistake for a single fuck up in the website's footer.
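For instance, with nothing but the Python standard library, a strict XML parser refuses that fragment outright while the lenient HTML parser just keeps going:

    import xml.etree.ElementTree as ET
    from html.parser import HTMLParser

    fragment = "<p>foo<p>bar</p>"

    try:
        ET.fromstring(fragment)              # XHTML-style strictness
    except ET.ParseError as err:
        print("strict parser refuses:", err)

    class Collector(HTMLParser):
        def handle_starttag(self, tag, attrs):
            print("lenient parser saw start of", tag)

    Collector().feed(fragment)               # happily reports two <p> start tags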
HTML is a nightmare that had to be reverse engineered, as in rebuilt with proper engineering standards in mind, several times. HTML and CSS are both quite horrible.
I would perhaps argue that juniper's behavior is the preferable one.
Remember, the definition of this is "drop the message I think is broken," not inherently "drop the broken message"; it's entirely plausible that the message is fine but you have a bug which makes you THINK it's a broken message.
There is also a huge difference between considering it a broken message and a broken session, which is what Arista did.
Arista did 2, but it also dropped the whole connection which was probably bad.
IMHO, just drop the broken attributes in the message and log them, and pass on the valid data if there's any left. If not, pretend you did not receive an UPDATE message from that particular peer.
Monitoring will catch the offending originator and people can deal with this without having to deal with any network instability.
In case you want to calibrate your sense of armchair-ness: you have completely missed the point that discarding an individual attribute can quite badly change the meaning of a route, and since we're talking about the DFZ here, such breakage can spread around the planet to literally every DFZ participant. The only safe thing you can do is to drop the entire route. Maybe there was a point to this being discussed at quite some length by very knowledgeable people, before 7606 became RFC ;)
(I haven't downvoted your comment, but I can see why others would — you're making very simple and definite statements about very complicated problems, and you don't seem to be aware of the complications involved. Hence: your calibration is a bit off.)
Funny enough, I actually have a few routers with a DFZ, so I have an idea or two about how BGP works.
My point is that:
- if you drop a connection, especially one through which you announce the full routing table, it is going to create a lot of churn for your downstreams. Depending on the kind of routers they use, it can create some network instability for quite a while. And if you drop it again when you receive that malformed route, the instability continues.
- removing only the malformed attribute maybe changes the way you treat traffic but you still route it. OK, you send it to maybe another interface, but no biggie
- if you’re using a DFZ setup, dropping that single route could blackhole traffic to that destination if you’re the only upstream to another router
> Funny enough, I actually have a few routers with a DFZ, so I have an idea or two about how BGP works.
And I'm TSC emeritus and >10 year maintainer on FRRouting, and active at IETF. Yet I hugely respect the other people there, all of whom have areas of expertise where they far outrank my own.
I have very strong opinions about some subjects, one of them being BGP.
I believe sessions should not be torn down just because you receive malformed data. You should be able to remove just the corrupt data, or treat it as a withdraw message like one of the RFCs recommends.
I for one would like knobs to match on any attribute and value and remove/rewrite them at will. Imagine something akin to a very smart HTTP proxy.
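Purely as a sketch of the kind of knob I mean (hypothetical, not any vendor's actual configuration or API):

    def apply_attribute_policy(attributes, rules):
        """attributes: list of (type_code, value) pairs from an UPDATE.
        rules: dict mapping type_code -> ("drop", None) or ("set", new_value)."""
        out = []
        for type_code, value in attributes:
            action, new_value = rules.get(type_code, ("keep", None))
            if action == "drop":
                continue                     # strip just this attribute
            if action == "set":
                value = new_value            # rewrite it in place
            out.append((type_code, value))
        return out

    # e.g. strip a hypothetical misbehaving attribute type 0xFE on ingress:
    # cleaned = apply_attribute_policy(attrs, {0xFE: ("drop", None)})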
If the attribute says "encapsulate this", dropping just the attribute will create a blackhole, as you will attract traffic that should be encapsulated, and packets following this route will be dropped if it is not.
Yes, but then again since you have logs of why it was dropped (like I suggested in my first post, to log everything dropped), you can easily troubleshoot the problem. A much better outcome than flapping a BGP session for no good reason and creating route churn and network instability.