
> Not so much. (This response seems to repeat, please refrain from ignoring details in such bold claim.)

I said what I meant; feel free to disagree but this policing is pretty condescending. I don't need to constantly repeat that the fundamental data format is lifted from MP and the extra features/process Bormann added on top of it are uniformly poorly thought out.

Just like CBOR's tags, extension types are additional, non-core types. Bormann renamed them and bumped up the size so you can have way more of them in CBOR, but the tag also takes up more space, and since the odds of needing billions of extension types are basically zero it's not a good tradeoff.

> MP extension types are opaque and applications can do nothing about them. CBOR tags are extensions to the existing data model and can be processed to some extent

I think CBOR's "have your cake and eat it too" design has confused you. Yes CBOR establishes a tag registry, but implementations are free to ignore all tags. In practice what this means is if you can control the receiver you can use whatever tags you want, and if you don't control the receiver you have to either avoid tags or potentially limit your audience (i.e. do I use the "Standard date/time string" and eat the extra int8 or do I just send it as a string and note it as a date/time in my docs). You might think, "oh pish posh what can't process a date/time string", but the answer is "many embedded devices you'd want to use CBOR on". It's yet another feature added with no consideration for real world use cases.
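To make the size tradeoff concrete, here is a minimal hand-rolled sketch of the two options for a date/time value: a plain text string versus the same string wrapped in tag 0. The helper names (`cbor_text`, `cbor_tag0`) are made up for illustration and only cover strings shorter than 24 bytes; this is not any particular library's API.

```python
def cbor_text(s: str) -> bytes:
    """Encode a short text string: major type 3 (0x60) with the length in the low 5 bits."""
    data = s.encode("utf-8")
    if len(data) >= 24:
        raise ValueError("sketch only handles strings shorter than 24 bytes")
    return bytes([0x60 | len(data)]) + data


def cbor_tag0(s: str) -> bytes:
    """Wrap the text string in tag 0 (standard date/time string), head byte 0xc0."""
    return b"\xc0" + cbor_text(s)


stamp = "2013-03-21T20:04:00Z"
plain = cbor_text(stamp)
tagged = cbor_tag0(stamp)

# The tagged form is the untagged form plus a single leading tag byte.
print(len(plain), len(tagged))
print(tagged.hex())
```

A receiver that ignores tags still sees a text string after skipping the first byte, so the cost is one byte per value plus whatever the receiver has to do to cope with a tag it does not understand.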

> for example unknown tags don't prevent implementations from inspecting their contents. And I don't think MP had any sort of registry for extension types, they are more like "reserved for future expansion" instead of "less used but still spec-worthy types to be defined with a reasonable proposal".

You fundamentally misunderstand MP's extension types. Instead of guessing you can read about them in the MP spec [0]:

---

Extension types

MessagePack allows applications to define application-specific types using the Extension type. Extension type consists of an integer and a byte array where the integer represents a kind of types and the byte array represents data.

Applications can assign 0 to 127 to store application-specific type information. An example usage is that application defines type = 0 as the application's unique type system, and stores name of a type and values of the type at the payload.

MessagePack reserves -1 to -128 for future extension to add predefined types. These types will be added to exchange more types without using pre-shared statically-typed schema across different programming environments.

[0, 127]: application-specific types

[-128, -1]: reserved for predefined types

Because extension types are intended to be added, old applications may not implement all of them. However, they can still handle such type as one of Extension types. Therefore, applications can decide whether they reject unknown Extension types, accept as opaque data, or transfer to another application without touching payload of them.

---

[0]: https://github.com/msgpack/msgpack/blob/master/spec.md#exten...
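As a rough sketch of what the quoted section asks of a receiver: the type code is a signed 8-bit integer, the sign tells you which range it falls in, and the application picks a policy for codes it does not know. The `Policy` enum, `handle_ext` function, and `known_types` parameter are hypothetical names for illustration, not part of MessagePack or any library.

```python
from enum import Enum


class Policy(Enum):
    REJECT = "reject"
    OPAQUE = "opaque"


def ext_range(ext_type: int) -> str:
    """Classify an extension type code by the ranges given in the spec."""
    if 0 <= ext_type <= 127:
        return "application-specific"
    if -128 <= ext_type <= -1:
        return "reserved for predefined types"
    raise ValueError("extension type must fit in a signed 8-bit integer")


def handle_ext(ext_type: int, payload: bytes, known_types: set, policy: Policy):
    """Return (type, payload) for known or tolerated types; raise if policy is REJECT."""
    ext_range(ext_type)  # validates the code is in range
    if ext_type in known_types or policy is Policy.OPAQUE:
        # Unknown types are passed through untouched as opaque data.
        return ext_type, payload
    raise ValueError(f"unknown extension type {ext_type}")


print(ext_range(0))
print(ext_range(-1))
print(handle_ext(5, b"\x01\x02", known_types=set(), policy=Policy.OPAQUE))
```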




> I said what I meant; feel free to disagree but this policing is pretty condescending.

Sorry for that feeling, but when the same thing repeats three times (I think) I have to note that something is off in your messaging. I'll try to be more cautious in the future.

> You fundamentally misunderstand MP's extension types. Instead of guessing you can read about them in the MP spec [0]:

Maybe my line of thought is confusing to you, but I have read all of that in order to avoid relying on my fragile recollection. And they are qualitatively different to me. You can't really do that much with the encoded bytes `c7 05 00 94 01 02 03 04` if the application-specific type 0 is unknown, even though `94 01 02 03 04` is a valid MP sequence and the author probably intended it to be read as one. So tag-unaware tools like diagnostics or compression algorithms would have to guess. The equivalent CBOR bytes `c0 84 01 02 03 04` clearly express such intent. If there is no such intent, you can put a byte string instead (`c0 45 84 01 02 03 04`).
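The byte sequences above can be picked apart by hand. The following is a minimal sketch, not a real decoder: it only handles the MP `ext 8` format and CBOR tags numbered below 24, and the function names are invented for this example.

```python
MAJOR_TYPES = ["unsigned int", "negative int", "byte string", "text string",
               "array", "map", "tag", "simple/float"]


def split_msgpack_ext8(buf: bytes) -> tuple[int, bytes]:
    """Split an MP ext 8 item (0xc7, length, type, data) into (type, payload)."""
    if buf[0] != 0xC7:
        raise ValueError("not an ext 8 item")
    length = buf[1]
    ext_type = int.from_bytes(buf[2:3], "big", signed=True)
    payload = buf[3:3 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return ext_type, payload


def split_cbor_small_tag(buf: bytes) -> tuple[int, bytes]:
    """Split a CBOR item whose head is a tag number below 24 into (tag, rest)."""
    head = buf[0]
    if (head >> 5) != 6 or (head & 0x1F) >= 24:
        raise ValueError("not a small tag")
    return head & 0x1F, buf[1:]


mp_type, mp_payload = split_msgpack_ext8(bytes.fromhex("c7 05 00 94 01 02 03 04"))
tag_a, inner_a = split_cbor_small_tag(bytes.fromhex("c0 84 01 02 03 04"))
tag_b, inner_b = split_cbor_small_tag(bytes.fromhex("c0 45 84 01 02 03 04"))

# MP: the format itself only says "5 bytes belonging to type 0".
print(mp_type, mp_payload.hex(" "))
# CBOR: the item after the tag carries its own major type.
print(tag_a, MAJOR_TYPES[inner_a[0] >> 5])
print(tag_b, MAJOR_TYPES[inner_b[0] >> 5])
```

In the MP case the payload is just a length-prefixed blob, so whether `94 01 02 03 04` is a nested array is something only the application knows. In the CBOR case the head byte of the tagged item says array or byte string directly.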

As you have acknowledged, the tag registry has its pros and cons. It might not be obvious which tag should be used in a given use case. Tags are prone to be ill-designed and stuck forever (this already happened for IPv4/v6 tags, to be clear). But the registry means that spec development can happen in a distributed manner and for more specific situations. I mean, the only extension type ever defined by MP is a timestamp. It even doesn't have other obvious tags like UUID. Is it justified?


The registry isn't useful for this. Either you're defining a format to be consumed by a generic decoder and therefore can't rely on tags in the registry, or you're defining a format to be consumed by a custom decoder you control, so it can understand whatever tags/extension types you make it understand. The registry is strictly a negative because--again--you can't rely on it, and it requires extensions to go through the registration process. You can't define application-specific types in CBOR.

> It even doesn't have other obvious tags like UUID. Is it justified?

Yes; UUIDs are huge 128-bit values and many popular embedded platforms are 32-bit. If your app needs them in MP that's what extension types are for.

---

I think maybe what makes us talk past each other is: there's no use-case for a generic CBOR (or MP) decoder on its own. JSON/XML/HTML won in that space (you know things are bad when there are more public XML APIs than public CBOR APIs). There's no serious use-case for a "tag-unaware diagnostic" tool for CBOR or MP APIs. You will always build things on top of the CBOR/MP decoder, there will always be API docs, or reverse engineering the wire format is trivial. CBOR really wants this to not be true; it really wants to be the binary JSON despite the fact this is more or less an oxymoron. The questions that illustrate the difference are:

- how does the format avoid forcing things on you you don't need

- how does the format provide for extension

MP's answers to these questions are:

- be very conservative about what's required of implementations

- extension types

CBOR's answers to these questions are:

- interact with IETF

- interact with IETF

Different people will react to that differently, but that's the bottom line.


> Yes; UUIDs are huge 128-bit values and many popular embedded platforms are 32-bit.

Size is irrelevant because UUIDs are meant to be used as is (see my other comment). `b210cdca-5d10-4c2e-a604-0fdd9502f02b` has no intrinsic meaning as a number 236689833926310967579631802650001076267; every parsing and formatting against UUID can be done bytewise for that reason.
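A small sketch of what "bytewise" means here: parsing and formatting only shuffle hex digits and hyphens, and no 128-bit arithmetic is involved. The helpers `parse_uuid` and `format_uuid` are illustrative names; the standard `uuid` module is used only as a cross-check.

```python
import uuid


def parse_uuid(text: str) -> bytes:
    """Turn the hyphenated text form into 16 raw bytes, one hex pair at a time."""
    raw = bytes.fromhex(text.replace("-", ""))
    if len(raw) != 16:
        raise ValueError("a UUID is exactly 16 bytes")
    return raw


def format_uuid(raw: bytes) -> str:
    """Turn 16 raw bytes back into the 8-4-4-4-12 text form."""
    if len(raw) != 16:
        raise ValueError("a UUID is exactly 16 bytes")
    h = raw.hex()
    return "-".join((h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]))


text = "b210cdca-5d10-4c2e-a604-0fdd9502f02b"
raw = parse_uuid(text)

print(format_uuid(raw) == text)
# Cross-check against the standard library's view of the same bytes.
print(raw == uuid.UUID(text).bytes)
```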

> If your app needs them in MP that's what extension types are for.

No, I mean: why aren't UUIDs built-in (negative) extension types? I'm not talking about application-specific types, which would be literally anything by definition, and CBOR does support such "private" tags (starting at 80000) if you want anyway [1].

[1] In fact, I would argue that about half of the tags past them should be made private.

> there's no use-case for a generic CBOR (or MP) decoder on its own. JSON/XML/HTML won in that space (you know things are bad when there are more public XML APIs than public CBOR APIs).

And that's your claim, not the verifiable fact. Public CBOR APIs are now starting to appear (even though very slowly), while I have never seen a single public MP API---please let me know if there is one. CBOR API is rare mainly because CBOR is new. The same can't be said for MP, which has existed much longer than CBOR. MP APIs are even rarer, if not non-existent, for some other reason. Maybe in, say, 10 years we can tell whether CBOR APIs were indeed rare for the same reason as MP APIs.

> CBOR really wants this to not be true; it really wants to be the binary JSON despite the fact this is more or less an oxymoron.

So, everything you claimed seems to ultimately originate from this line of thought. And you know what? Needs for binary JSON were always high, otherwise we wouldn't even have needed any sort of schematic serialization, and so many people have tried to design one! CBOR is probably one of the best binary JSON alternatives we have ever seen. (Again, that doesn't mean that MP is not one of them. But I will avoid any sort of irrelevant judgement here.) You may turn out to be right, but I think there is no concrete evidence for or against your claim right now.

It is not true that I'm totally satisfied with CBOR, of course. Some tags were proposed too late to undo harm already done, like the private tags I've mentioned. Bormann in particular seems to be more interested in adding more tags than in getting the most out of an optimal number of tags, and I don't like his attitude in general. So my ideal is actually somewhere between CBOR and MP; it just happens that it can be implemented as a subset of CBOR, while MP is insufficient.


> Size is irrelevant because UUIDs are meant to be used as is (see my other comment). `b210cdca-5d10-4c2e-a604-0fdd9502f02b` has no intrinsic meaning as a number 236689833926310967579631802650001076267; every parsing and formatting against UUID can be done bytewise for that reason.

I don't understand your point here. Either there are benefits to representing them numerically (size, speed of comparison, etc.) that can be realized w/ MP's extension types, or we can just leave them as strings and MP supports everything you'd want to do with them. What's your issue w/ MP here again?

> CBOR does support such "private" tags (starting at 80000) if you want anyway

I really thought this too, but I can't find it in the spec. The spec links to a big ass list of tags [0] which, holy shit haha, what is going on here? "YANG bits datatype"? "Gordian Envelope"? "Bigfloat with arbitrary exponent"? "Extended bigfloat"? What on earth supports any of this? Anyway, can you link what you're looking at?

Later: Oh, I found it! It's in the big ass list of tags. Although I don't think it's really official? I read through the linked email thread and they don't mention the tag range. They seem to settle on using 1010 and then switching on data after the tag.

> And that's your claim, not the verifiable fact.

You're not seriously claiming CBOR has anywhere near the usage of JSON/XML/HTML.

> CBOR API is rare mainly because CBOR is new

CBOR is over ten years old. That's not new.

> Needs for binary JSON were always high

Where are all these binary JSON APIs? Is there a list anywhere near as large as this big public APIs list on GitHub [1]?

---

I've been nerd sniped by this enough so I'm gonna quit following these threads. I want to leave you with the fact that I've been right about everything all along, and that the world would be a better place if everyone just listened to me always. Good luck out there.

[0]: https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml

[1]: https://github.com/public-apis/public-apis



