I like JS for this use case, and React on web, but I'm really not fond of the Ink usage. Idk if it's Ink itself or the way it gets used, but somehow people are making CLIs that lag and waste terminal space now.
Same reason AIs also use Python and DBMSes offer JS or Py UDFs easily: interpreted languages take no build time and are more portable. JS is also very popular.
Might also be a context window thing. Idk how much boilerplate C# has, but others like Java spam it.
I've gone the all-JSON route many times, and pretty soon it starts getting annoying enough that I lament not using protos. I'm actually against static types in languages, but the API is one place they really matter (the other is the DB). Google made some unforced mistakes on proto usability/popularity though.
I once converted a fairly large JS codebase to TS and found about 200 mismatched names/properties all over the place. Tons of properties where we had nulls suddenly started getting values.
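To make that concrete, here's a hypothetical sketch of the kind of mismatch the compiler surfaces (the names are made up):

  interface User {
    fullName: string | null;
  }

  function greet(user: User): string {
    // Before TS: one module wrote `user.fullName` while another read
    // `user.full_name`, which was silently undefined, so the fallback
    // always ran and the property "had nulls".
    // After TS: error TS2551: Property 'full_name' does not exist on
    // type 'User'. Did you mean 'fullName'?
    return user.fullName ?? "stranger";
  }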
Sounds like this introduced behavior changes. How did you evaluate if the new behavior was desirable or not? I’ve definitely run into cases where the missing fields were load bearing in ways the types would not suggest, so I never take it for granted that type error in prod code = bug
The most terrifying systems to maintain are the ones that work accidentally. If what you describe is actually desired behavior, I hope you have good tests! For my part, I’ll take types that prevent load-bearing absences from arising in the first place, because that sounds like a nightmare.
Although, an esoteric language defined in terms of negative space might be interesting. A completely empty source file implements “hello world” because you didn’t write a main function. All integers are incremented for every statement that doesn’t include them. Your only variables are the ones you don’t declare. That kind of thing.
It costs time, distracts some devs, and adds complexity for negligible safety improvement. Especially if/when the types end up being used everywhere because managers like that metric. I get using types if you have no tests, but you really need tests either way. I've done the opposite migration before, TS to JS.
Oh I forgot to qualify that I'm only talking about high level code, not things that you'd use C or Rust for. But part of the reason those langs have static types is that they need to know sizes on the stack at compile time.
Protos are great. Last time I did a small project in NodeJS, I set up a server that defines the entire API in a .proto and serves each endpoint as either proto or json, depending on the content-type header. Even if the clients want to use json, at least I can define the whole API in proto spec instead of something like Swagger.
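A minimal sketch of that kind of setup, assuming express and protobufjs; the .proto path, package name, and message names here are all made up:

  import express from "express";
  import protobuf from "protobufjs";

  // Hypothetical schema in api.proto:
  //   message EchoRequest  { string text = 1; }
  //   message EchoResponse { string text = 1; }
  const root = await protobuf.load("api.proto"); // ESM top-level await
  const EchoRequest = root.lookupType("api.EchoRequest");
  const EchoResponse = root.lookupType("api.EchoResponse");

  const app = express();
  app.use(express.json());
  app.use(express.raw({ type: "application/x-protobuf" }));

  app.post("/echo", (req, res) => {
    const isProto = !!req.is("application/x-protobuf");
    // Decode from whichever representation the client sent.
    const body = isProto
      ? EchoRequest.toObject(EchoRequest.decode(req.body))
      : req.body;
    const reply = EchoResponse.create({ text: body.text });
    // Answer in kind: binary proto or JSON.
    if (isProto) {
      res.type("application/x-protobuf")
         .send(Buffer.from(EchoResponse.encode(reply).finish()));
    } else {
      res.json(EchoResponse.toObject(reply));
    }
  });

  app.listen(8080);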
So my question is, why didn't Google just provide that as a library? The setup wasn't hard but wasn't trivial either, and had several "wrong" ways to set up the proto side. They also bait most people with gRPC, which is its own separate annoying thing that requires HTTP/2, which even Google's own cloud products don't support well (e.g. App Engine).
P.S. Text proto is also the best static config language. More readable than JSON, less error-prone than YAML, more structure than both.
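For instance, a made-up service config as text proto: field names are checked against a schema, comments are allowed, and there is no YAML-style implicit typing:

  # server.textproto (against a hypothetical ServerConfig message)
  listen_port: 8080
  backend { host: "db.internal" port: 5432 }
  feature_flags: "new_ui"
  feature_flags: "fast_path"  # repeated field: just repeat the name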
highly recommend twirp, even in current year. connectrpc seems to have stalled so there aren't a ton of languages with server support, and because of the grpc interop on top of their own protocol it's a bit of an undertaking to roll your own.
the twirp spec however is so simple you can throw together your own code generator in an afternoon for whatever language you want.
Yeah that one looked good. I don't remember why I didn't use it that time, maybe just felt it was easy enough to DIY that I didn't feel like using another dep (given that I already knew express and proto in isolation). The thing is, Google themselves had to lead the way on this if they wanted protobuf to be mainstream like JSON.
I've been working on and with Kerberos and PKIX for decades. I don't find ASN.1 to be a problem as long as you have good tooling or are willing to build it. The specs are a pleasure to read -- clear, concise, precise, and approachable (once you have a mental model for it anyways).
Of course, I am an ASN.1 compiler maintainer, but hey, I had to become one because the compiler I was using was awesome but not good enough, so I made it good enough.
Here's the problem though: people have used the absence of tooling to justify the creation of new, supposedly-superior schemas and codecs that by definition have strictly less tooling available on day zero. These invariably turn out to be worse than ASN.1/DER were in 1984, because the authors also refused to study the literature to see what good ideas they could pick up. That's how we end up with:
- PB being a TLV encoding, just like DER, with all the same problems (see the wire-format sketch below)
(Instead PB should have been inspired by XDR or OER, but not DER.)
- PB's IDL requiring an explicit tag on every field of every data structure(!), even though ASN.1 never required tagging every field, and even though ASN.1 eventually adopted automatic tagging.
- PB's very naive approach to extensibility that is just like 1984 ASN.1's.
It's a mistake.
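To see the TLV point concretely, here is a rough sketch (mine, not anything from the PB tooling) of walking protobuf's wire format: each field is a varint key carrying the field number and wire type, followed by a value whose framing depends on that type, directly analogous to DER's tag-length-value:

  // Toy protobuf wire-format walker; 32-bit varint math and no recursion
  // into nested messages, for brevity.
  function readVarint(buf: Uint8Array, pos: number): { value: number; pos: number } {
    let value = 0, shift = 0, b: number;
    do {
      b = buf[pos++];
      value |= (b & 0x7f) << shift;
      shift += 7;
    } while (b & 0x80);
    return { value, pos };
  }

  function walkFields(buf: Uint8Array): void {
    let pos = 0;
    while (pos < buf.length) {
      const key = readVarint(buf, pos);        // the "T": field number plus wire type
      pos = key.pos;
      const fieldNumber = key.value >>> 3;
      const wireType = key.value & 0x7;
      switch (wireType) {
        case 0: pos = readVarint(buf, pos).pos; break; // varint value
        case 1: pos += 8; break;                       // fixed64
        case 2: {                                      // length-delimited: the "L" and "V"
          const len = readVarint(buf, pos);
          pos = len.pos + len.value;
          break;
        }
        case 5: pos += 4; break;                       // fixed32
        default: throw new Error(`unknown wire type ${wireType}`);
      }
      console.log(`field ${fieldNumber}, wire type ${wireType}`);
    }
  }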
Some people, when faced with a dearth of tooling, will write said tooling. Other people will say that the technology in question is a nightmare, and some of those people will then go on to invent a worse wheel.
I'd be ecstatic to use something other than ASN.1 if it wasn't a poor reinvention of it.
Protobuf ended up having more tooling in the end though, and it didn't take very long to get there. This is like how JSON replaced XML for many use cases.
If they had put the same energy towards building tooling for an existing IDL/codec then they would have had strictly less work to do. Besides being inefficient in the use of their resources, they also saddled us with a 15th system (probably more like a 25th system, but you get the reference), and a poor one at that. There is really nothing much good to say about PB.
This was the main reason. The ASN.1 language has a ton of unnecessary features that make it harder to implement, but the stuff I dealt with was using those features, so I couldn't just ignore them. I didn't write a compiler, but I did hack around some asn1c-generated code to make it faster for our use case. And I had to use asn1c in the first place because there was no complete Rust ASN.1 compiler at the time, though I tried DIY'ing it and gave up.
I also remember it being complicated to use, but it's been too long to recall why exactly, probably the feature bloat. Once I used proto3, I realized it's all you need.
> The asn.1 language has a ton of unnecessary features that make it harder to implement
Only if you want to implement them. You could get quite far with just a subset of UNIVERSAL types, including UTF8String, SEQUENCE/SET, SEQUENCE OF / SET OF, etc. There are a ton of features in X.680 you can easily drop.
I've implemented a subset of X.681, X.682, and X.683 to get automatic, recursive decoding through all typed holes in PKIX certificates, CRLs, CSRs, etc. Only a subset, and it got me quite far. I had a pretty good open source X.680 implementation to build on.
This is the story of how Heimdal's authors wrote its ASN.1 compiler: they wanted tooling, there wasn't a good option, so they built enough for PKIX and Kerberos. They added things as they went along. OpenSSL does not-quite-DER things? Add support in the Heimdal decoder. They hacked a lot of things for a while which I later fixed: they didn't support DEFAULT, so they changed DEFAULTed members to OPTIONAL, and they hacked IMPLICIT support, which I finished. And so on. It still doesn't have things like REAL (who needs it in security protocols? no one). Its support for GeneralString is totally half-assed just like... MIT Kerberos, OpenSSL, etc. We do what we need to. Someone could take that code, polish it up, add features, support more programming languages, and make some good money. In fact, Fabrice Bellard has his own not-open-source, commercial ASN.1 compiler and stack, and it must be quite good -- very smart!
It is not necessary to use or to implement all of the data types and other features of ASN.1; you can implement only the features that you are using. Since DER uses the same framing for all data types, it is possible to skip past any fields that you do not care about (although in some cases you will still need to check a field's type, to determine whether or not an optional field is present; fortunately the type can be checked easily, even if it is not a type you implement).
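A sketch of that skipping logic, assuming definite-length DER and one-byte (low) tag numbers; high tag numbers and BER's indefinite lengths are omitted:

  // Return the offset just past the TLV that starts at `pos`.
  function skipTlv(buf: Uint8Array, pos: number): number {
    pos += 1; // tag: one byte in low-tag-number form
    // Length: short form is a single byte < 0x80; long form is
    // (0x80 | n) followed by n big-endian length bytes.
    let len = buf[pos++];
    if (len & 0x80) {
      const n = len & 0x7f;
      len = 0;
      for (let i = 0; i < n; i++) len = len * 256 + buf[pos++];
    }
    // Contents use the same framing for every type, so an unknown
    // field can be skipped without understanding it.
    return pos + len;
  }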
Yes but I don't want to worry about what parts of the spec are implemented on each end. If you removed all the unnecessary stuff and formed a new standard, it'd basically be protobuf.
I do not agree. Which parts are necessary depends on the application; there is not one good way to do it for everyone (and Protobuf is too limited). You will need to implement the parts specific to your schema/application on each end, and if the format does not have the data types that you want then you must add them in a more messy way (especially when using JSON).
In what ASN.1 application is the protobuf spec too limited? I've used protobuf for tons of different things, and it's always felt right. Though I understand certain encodings of ASN.1 can have better performance for specific things.
These are only scalars that you'd encode into bytes. I guess it's slightly annoying that both ends have to agree on how to serialize rather than protobuf itself doing it, but it's not a big enough problem.
Also I don't see special ASN.1 support for non-Unicode string encodings, only subsets of Unicode like ASCII or printable ASCII. It's a big can of worms once you bring in things like Latin-1.
ASN.1 has support for ISO 2022 as well as ASCII and Unicode (ASCII is a subset of Unicode as well as a subset of ISO 2022). (My own nonstandard extensions add a few more (such as TRON character code and packed BCD), and the standard unrestricted character string type can be used if you really need arbitrary character sets.) (Unicode is not a very good character set, anyways.)
Also, DER allows indicating the type of data within the file (unless you are using implicit types). Protobuf has only a limited case of this (you cannot always identify the types), and it requires different framing for different types. However, DER uses the same framing for all types, and strings are not inherently limited to 2GB by the file format.
Furthermore, there are other non-scalar types as well.
In any of these cases, you do not have to use all of the types (nor do you need to implement all of the types); you only need to use the types that are applicable for your use.
I will continue to use ASN.1; Protobuf is not good enough in my opinion.
To be fair, if you don't need to support anything other than Unicode, then this is not an advantage, and over time we're all going to need non-Unicode less and less. That said I'm a big fan of ASN.1 (see my comment history).
I'm still confused how these ISO 2022 strings even work, and the ASN1 docs discourage using the UniversalString and GraphicString types. All these different string types are intimidating if I just want unicode/ascii, and even if I were using an obscure encoding, I'd use generic bytes instead of wanting asn1 to care about it.
GeneralString relies on escape sequences to "load" character sets into the G0/G1 (and C0/C1) registers. This is madness -- specifically it's pre-Unicode madness, but before Unicode it made sense.
Oh gosh. Fair enough that this exists and something uses it, but I'd absolutely want to handle that on the ends only, not get asn1 involved in parsing it.
Oh GeneralString is madness. It's pre-Unicode madness. It exists because Unicode didn't exist in 1984, but people still wanted to be able to exchange text in multiple scripts, which necessitated being able to "switch codesets" in the middle. It's... yeah, it's.. it's nuts. I've _not_ implemented GeneralString, and practically no one needs to even when specs say to. E.g., in Kerberos the strings are GeneralString, but all the implementations just-send-8 and do not attempt to interpret any codeset switching escapes.
> I'm still confused how these ISO 2022 strings even work
There are C0, G0, C1, and G1 sets (C0 and C1 are control characters and G0 and G1 are graphic characters), and escape sequences are used to select the C or G set for bytes with or without the high bit set. GraphicString does not allow control characters and GeneralString does allow control characters.
You probably do not need all control characters; your schema should probably restrict which control characters are allowed in each context (although the ASN.1 schema format does not seem to have any way to do this). This way, you will only handle the control characters which are appropriate for your use.
This is messy, although canonical form simplifies it by adding some restrictions (this is one of the reasons why DER is better than BER, in my opinion). TRON code is better and is much simpler than the workings of ISO 2022. (Unicode has a different kind of mess; although decoding is simpler, actually handling the decoded characters in text is its own big mess for many reasons. Unicode is a stateful character set, even though the encoding is stateless; TRON code is the other way around (and with a significantly simpler stateful encoding than ISO 2022).)
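To give a flavor of the G0/G1 mechanism in code, here is a toy decoder that handles exactly two designation sequences (ESC ( B puts ASCII in G0; ESC - A puts the ISO 8859-1 right half in G1) and nothing else; a real ISO 2022 decoder has many more sets, plus shifts and the C0/C1 side:

  function decodeIso2022Subset(buf: Uint8Array): string {
    let g1Latin1 = false; // set once ESC - A designates Latin-1's right half into G1
    let out = "";
    for (let i = 0; i < buf.length; i++) {
      const b = buf[i];
      if (b === 0x1b) { // ESC starts a designation sequence
        const inter = buf[i + 1], final = buf[i + 2];
        if (inter === 0x28 && final === 0x42) { /* ESC ( B: ASCII into G0 (the default) */ }
        else if (inter === 0x2d && final === 0x41) g1Latin1 = true; // ESC - A
        else throw new Error("designation not supported by this toy");
        i += 2;
      } else if (b < 0x80) {
        out += String.fromCharCode(b); // G0 region: ASCII assumed here
      } else if (g1Latin1) {
        out += String.fromCharCode(b); // G1 region: Latin-1 high half matches U+00A0..U+00FF
      } else {
        throw new Error("high byte before any G1 designation");
      }
    }
    return out;
  }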
> the ASN1 docs discourage using the UniversalString and GraphicString types
UniversalString is UTF-32BE and GraphicString is ISO 2022 without control characters. By knowing what they are, you should know in which circumstances they should be considered useful or not useful; I think that they should not be discouraged in general (although usually if you want Unicode, you would use UTF-8 rather than UTF-32, there are some circumstances where you might want to use UTF-32, such as if the data or program is already UTF-32 for other reasons).
(The data type which probably should be avoided is the UTC time type, which is not Y2K compliant.)
> All these different string types are intimidating if I just want unicode/ascii
If you only want ASCII, use the IA5 type (or Visible if you do not want control characters); if you only want Unicode, use the UTF-8 string type (or Universal if you want UTF-32 instead for some reason). ("IA5" is another name for ASCII that as far as I can tell hardly anyone other than ITU uses.)
However, Unicode is not a very good character set, and they should not force or expect you to use it.
As I had mentioned before, you do not need to use or implement all of the ASN.1 data types; only use the ones appropriate for your application (so, if you do not like most of the types, then don't use those types). I also made up some additional nonstandard ASN.1 types (called ASN.1X), which also might be useful for some applications; you are not required to use or implement these either.
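In the schema, that choice is made for each field; a made-up fragment:

  -- Hypothetical schema: pick the string type each field needs.
  Person ::= SEQUENCE {
      loginName  IA5String,  -- ASCII only
      realName   UTF8String  -- full Unicode
  }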
> However, Unicode is not a very good character set, and they should not force or expect you to use it.
Unicode is an excellent character set, and for 99% of cases (much more probably) it's absolutely the best choice. So one should choose Unicode (and UTF-8) in all cases unless there is an excellent reason to do otherwise. As time passes there will be fewer and fewer cases where Unicode is not sufficient, so really we are asymptotically approaching the point at which Unicode is the only good choice to make.
This is all independent of ASN.1. But it is true that ASN.1 has built-in types for Unicode and non-Unicode strings that many other protocols lack.
Have you written up anything about ASN.1X anywhere? I'd love to take a look.
> Have you written up anything about ASN.1X anywhere? I'd love to take a look.
ASN1_BCD_STRING (64): Represents a string with the following characters:
"0123456789*#+-. " (excluding the quotation marks). Each octet encodes
two characters, where the high nybble corresponds to the first character
and the low nybble corresponds to the second character. (A code sketch of
this packing appears at the end of this comment.)
ASN1_PC_STRING (65): Represents a string of characters in the PC
character set. Note that the control characters can also be used as
graphic characters.
ASN1_TRON_STRING (66): Represents a string of characters in the TRON
character set, encoded as TRON-8.
ASN1_KEY_VALUE_LIST (67): Represents a set of keys (with no duplicate
keys) and with a value associated with each key. The encoding is the same
as for a SET of the keys, but with the corresponding value immediately
after each key (when they are sorted, only the keys are sorted and the
values are kept with the corresponding keys).
ASN1_UTC_TIMESTAMP (68): Represents a number of UTC seconds (and
optionally fractions of seconds), excluding leap seconds, relative to
epoch.
ASN1_SI_TIMESTAMP (69): Represents a number of SI seconds (and
optionally fractions of seconds), including leap seconds, relative to
epoch.
ASN1_UTC_TIME_INTERVAL (70): Represents a time interval as a number
of UTC seconds. The number of seconds does not include leap seconds.
ASN1_SI_TIME_INTERVAL (71): Represents a time interval as a number
of SI seconds (which may include fractions).
ASN1_OUT_OF_BAND (72): This type is not for use for general-purpose
data. It represents something which is transmitted out of band (e.g. a
file descriptor) with whatever transport mechanism is being used. The
transport mechanism defines how a value of this type is supposed to be
encoded with whatever ASN.1 encoding is being used.
ASN1_MORSE_STRING (73): Represents a string of characters in the
Morse character set. The encoding is like a relative object identifier,
where 0 means an empty space, and other values are read as bijective base 2 with
1 for dots and 2 for dashes, with the high bit for the first dot/dash,
e.g. 4 means A and 8 means U.
ASN1_REFERENCE (74): A reference to another node within the same file.
(Not all implementations will support this feature.) The encoding is like
a Relative Object Identifier; the first number is how many times to go to
the parent node (where 0 means the reference itself), and then the rest of
the numbers specify which child node of the current node to go to where 0
means the first child, 1 means the second child, etc. It can reference a
primitive or constructed node of a BER file, but you cannot specify a
child index for a child of a primitive node, since primitive nodes cannot
have child nodes. At least one number (how many levels of parents) is
required, but any number of numbers is potentially possible.
ASN1_IDENTIFIED_DATA (75): Data which has a format and/or meaning which
is identified within the data. The encoding is always constructed and
consists of two or three items. The first item is a set of object
identifiers, object descriptors (used only for display), and/or sequences
where the first item of the sequence is an object identifier. The receiver
ignores any items in this set that it does not understand. The second
item in an ASN1_IDENTIFIED_DATA can be any single item of any type; it is
interpreted according to the object identifiers in the first set that the
receiver understands. The third item is optional, and if it is present it
is a key/value list of extensions; the keys are object identifiers and
the values are of any type according to the object identifiers. The default
value of this key/value list is an empty key/value list.
ASN1_RATIONAL (76): Stored as constructed, containing two integers, being
the numerator and the denominator. The denominator must be greater than
zero. If it is canonical form, then it must be lowest terms.
ASN1_TRANSLATION_LIST (77): A key/value list where the keys identify
languages. If the key is null then it means the default in case no language
present in this list is applicable. The types of the values depend on the
application (usually they will be some kind of character strings).
In addition, the same number for the BMP string type can also be used for a UTF-16 string type, and there is an "OBJECT IDENTIFIER RELATIVE TO" type which encodes an OID as either relative or absolute (in canonical form, it is always relative when possible) in order to save space; the schema will specify what it is relative to. ANY and ANY DEFINED BY are allowed despite being removed from the most recent versions of standard ASN.1. (The schema format for these extensions is not defined, since I am not using the ASN.1 schema format; however, someone who does use it might do so if they need it.)
There is also SDER, which is a superset of DER but a subset of BER, in case you do not want the mess of BER but do not want to require strictly canonical form either; and also SDSER, which uses the same encoding for types and values as SDER, but length works differently in order to support streaming better.
As is usual, you do not have to use any or all of these types, but someone might find them useful for some uses. I have used some of them in my own stuff.
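Here is the promised code sketch of the ASN1_BCD_STRING packing; the alphabet and nybble order are as described above, while the function name and the odd-length handling are only one possible choice:

  const BCD_ALPHABET = "0123456789*#+-. ";

  function encodeBcdString(s: string): Uint8Array {
    // The description above does not say how odd lengths are handled;
    // padding with a trailing space is assumed here.
    if (s.length % 2 !== 0) s += " ";
    const out = new Uint8Array(s.length / 2);
    for (let i = 0; i < s.length; i += 2) {
      const hi = BCD_ALPHABET.indexOf(s[i]);      // first character: high nybble
      const lo = BCD_ALPHABET.indexOf(s[i + 1]);  // second character: low nybble
      if (hi < 0 || lo < 0) throw new Error("character outside the BCD alphabet");
      out[i / 2] = (hi << 4) | lo;
    }
    return out;
  }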
ASN1_BCD_STRING can be just IA5String with a constraint attached...
Your time types can be just an INTEGER with a constraint attached... (In Heimdal we use INTEGER constraints to pick a representation in the programming language.) E.g.,
-- 64-bit signed count of seconds where 0 is the Unix epoch
ASN1_UTC_TIMESTAMP ::= INTEGER (-9223372036854775808..9223372036854775807)
ASN1_OUT_OF_BAND can just be a NULL with an APPLICATION tag or whatever:
Out-of-Band ::= [APPLICATION 100] NULL
or maybe an ENUMERATED or BIT STRING with named bits to indicate what kind of thing is referenced out of band. You might even use this with a SEQUENCE type instead where one member identifies an out of band datum as an index, and the other identifies the kind.
ASN1_REFERENCE is... interesting. I've not needed it, but some RPC protocols support intra-payload and even circular references, so if you have a need for that (hopefully you don't), then your ASN1_REFERENCE would be useful indeed.
ASN1_RATIONAL is just a tagged SEQUENCE of numerator and denominator, with a constraint that the denominator must be greater than zero.
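In standard notation that could look something like this (borrowing the type's number 76 as a UNIVERSAL tag for illustration; the "lowest terms" rule for canonical form can't be expressed as a constraint):

  Rational ::= [UNIVERSAL 76] IMPLICIT SEQUENCE {
      numerator    INTEGER,
      denominator  INTEGER (1..MAX)  -- greater than zero
  }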
OBJECT IDENTIFIER RELATIVE TO is just a CHOICE of OBJECT IDENTIFIER and RELATIVE-OID.
Re: SDER... yeah, so Heimdal's codec produces DER but accepts a subset of BER for interop with OpenSSL and others. If you really want streaming then you'll want a variant of OER with fixed-length lengths (which IMO OER should have had, dammit), which then looks a lot like XDR but with different alignment and more types.