
Maybe an example is useful. I want to build a generic CBOR decoder in C. I have 2 options:

- link GMP/mpdecimal/whatever (or hey, provide an abstraction layer and let a user choose)

- accept function pointers to handle bignum tags

Function pointers are an irritation (I know this because my MP library uses them): they're slower than direct calls, you've gotta check for NULL a lot, and you're also asking any application that uses your library and wants bignum support to include GMP itself (with all the attendant maintenance, setup, etc.).
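For concreteness, here is a rough sketch of what the function-pointer option could look like. Every name in it (`cbor_bignum_fn`, `cbor_handlers`, `cbor_dispatch_bignum`, `count_bytes`) is made up for illustration and is not any real library's API; the only thing taken from the CBOR spec is that tags 2 and 3 mark unsigned and negative bignums.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: receives the raw bignum payload bytes. */
typedef int (*cbor_bignum_fn)(void *user, int negative,
                              const uint8_t *bytes, size_t len);

/* Hypothetical handler table supplied by the application. */
struct cbor_handlers {
    cbor_bignum_fn on_bignum; /* may be NULL if the app has no bignum support */
    void *user;               /* opaque pointer passed back to the callback */
};

enum { CBOR_OK = 0, CBOR_ERR_UNSUPPORTED = -1 };

/* What a decoder might call when it meets tag 2 (unsigned) or tag 3
 * (negative). Note the NULL checks needed before every dispatch. */
static int cbor_dispatch_bignum(const struct cbor_handlers *h, uint64_t tag,
                                const uint8_t *bytes, size_t len)
{
    if (h == NULL || h->on_bignum == NULL)
        return CBOR_ERR_UNSUPPORTED;
    return h->on_bignum(h->user, tag == 3, bytes, len);
}

/* Example application callback: only counts payload bytes. A real one
 * would hand the bytes to whatever bignum library the application links. */
static int count_bytes(void *user, int negative,
                       const uint8_t *bytes, size_t len)
{
    (void)negative;
    (void)bytes;
    *(size_t *)user += len;
    return CBOR_OK;
}
```

The application that wants bignums fills in `on_bignum`; everyone else leaves it NULL and gets `CBOR_ERR_UNSUPPORTED`, which is the "surprising hole" side of the tradeoff.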

Or, you can include it yourself, but welcome to doing all the maintenance yourself, and exposing all of GMP's knobs (ex: [0])

You might argue that these aren't the only options, but a deserialized value has to be understood by the application; your suggestions aren't good tradeoffs. sscanf (also do not use sscanf) doesn't work if the value is actually a bignum, and yielding a bespoke bignum format is just as unusable as simply returning whatever's encoded in CBOR. How would I add two such values together? How would I display it? This is what bignum libraries are for.

All this is made far worse by the fact that there are effectively no public CBOR (or MP) APIs where you're expecting them to be consumed entirely by generic decoders, so there's not even a need to force generic decoders to go through all this effort to support bignums (etc.) Further, unlike MP, CBOR doesn't let you use tags for application-specific purposes. Put it all together and it's uniformly worse: implementations are either more complex or have surprising holes, you can't count on generic decoders supporting tags when building an API or defining messages, and you can't even just say, "for this protocol, tag 31 is a UUID".

This is probably a big reason (though I can think of others) why the only formats you can think of w/ bignum support are obscure.

> That's what I expect for the supposed bignum support: round-trippability.

Round-tripping is only meaningful if a receiver can use the values before reserializing; otherwise memcpy meets your requirements. If a sender gives me a serialized bignum, the deserializing library has to deserialize it into a value I can understand and use; that's the whole point of a deserialization library.

MP's support for timestamps is a reasonable example here: it decomposes into a time_t, and it can do this because it defines the max size. You can't do that w/ a bignum--the whole point of a bignum is it's big beyond defining. A CBOR sender can send you an infinite series of digits, and the spec doesn't reckon with this at all.

[0]: https://gmplib.org/manual/Memory-Management




> I have 2 options:
>
> - link GMP/mpdecimal/whatever (or hey, provide an abstraction layer and let a user choose)
>
> - accept function pointers to handle bignum tags

I would just provide two kinds of functions:

    // For each representative native type...
    cbor_read_t cbor_read_float(struct cbor *ctx, float *f);

    // And there is a generic number handling:
    struct cbor_num {
        int sign; // -1, 0 or 1
        int base; // 10 or 16
        int exponent;
        const char *digits;
        size_t digits_len;
    };
    cbor_read_t cbor_read_number(struct cbor *ctx, struct cbor_num *num);

    // And then someone will define the following on top of cbor_read_number:
    cbor_read_t my_cbor_read_mpz(struct cbor *ctx, mpz_t num);

Memory lifetime and the like also have to be considered here (left as an exercise), but the point is that you never need function pointers in this case. In fact I would actively avoid them because proper function pointer support is indeed a PITA, as you said. They can generally be avoided with (sort of) inversion of control, which is popular in compact C APIs and to some extent also in Rust APIs. You just haven't considered this possibility.
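To make the "adapter on top" idea concrete, here is a minimal sketch of one such adapter that targets `int64_t` instead of `mpz_t`, so it needs no bignum library. The struct is repeated from above; its exact semantics are assumptions on my part (value = sign × digits × base^exponent, with `digits` holding ASCII digits that are not NUL-terminated), and `cbor_num_to_i64` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

struct cbor_num {
    int sign;           /* -1, 0 or 1 */
    int base;           /* 10 or 16 */
    int exponent;       /* assumed: value = sign * digits * base^exponent */
    const char *digits; /* assumed: ASCII digits, not NUL-terminated */
    size_t digits_len;
};

/* Returns 0 on success, -1 if the number is malformed, fractional
 * (negative exponent), or does not fit in an int64_t. */
static int cbor_num_to_i64(const struct cbor_num *n, int64_t *out)
{
    uint64_t acc = 0;
    uint64_t base;
    /* The magnitude may reach 2^63 only for negative values. */
    uint64_t limit = (n->sign < 0) ? (uint64_t)INT64_MAX + 1
                                   : (uint64_t)INT64_MAX;

    if ((n->base != 10 && n->base != 16) || n->exponent < 0)
        return -1;
    base = (uint64_t)n->base;

    for (size_t i = 0; i < n->digits_len; i++) {
        char c = n->digits[i];
        uint64_t d;
        if (c >= '0' && c <= '9')
            d = (uint64_t)(c - '0');
        else if (n->base == 16 && c >= 'a' && c <= 'f')
            d = (uint64_t)(c - 'a') + 10;
        else if (n->base == 16 && c >= 'A' && c <= 'F')
            d = (uint64_t)(c - 'A') + 10;
        else
            return -1;
        if (acc > (limit - d) / base) /* acc * base + d would exceed limit */
            return -1;
        acc = acc * base + d;
    }
    /* Apply the exponent; overflow ends the loop after a few iterations. */
    for (int e = 0; e < n->exponent && acc != 0; e++) {
        if (acc > limit / base)
            return -1;
        acc *= base;
    }

    if (n->sign == 0 || acc == 0)
        *out = 0;
    else if (n->sign < 0)
        *out = -(int64_t)(acc - 1) - 1; /* avoids overflow at INT64_MIN */
    else
        *out = (int64_t)acc;
    return 0;
}
```

An `mpz_t` adapter would have the same shape: the library proper only yields `struct cbor_num`, and whoever links GMP writes the conversion.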

> sscanf (also do not use sscanf) doesn't work if the value is actually a bignum, and yielding a bespoke bignum format is just as unusable as simply returning whatever's encoded in CBOR. How would I add two such values together? How would I display it? This is what bignum libraries are for.

In practice many bignums are just left as is. For example, X.509 certificate serial numbers are technically bignums, but you never compute anything out of them. So you don't need any bignum library to read serial numbers. If you do need computation then you need an adapter function as above, but the library proper needs no knowledge of such an adapter. What's the problem now?
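As an illustration of the "left as is" case, displaying a serial-number-like value needs nothing more than a hex dump of its bytes. This is only a sketch; `bytes_to_hex` is a hypothetical helper, and it assumes the value is available as a big-endian byte buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes 2 * len lowercase hex characters plus a terminating NUL into out.
 * Returns 0 on success, -1 if out_size is too small. */
static int bytes_to_hex(const uint8_t *bytes, size_t len,
                        char *out, size_t out_size)
{
    static const char hex[] = "0123456789abcdef";

    if (out_size == 0 || len > (out_size - 1) / 2)
        return -1;
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = hex[bytes[i] >> 4];
        out[2 * i + 1] = hex[bytes[i] & 0x0f];
    }
    out[2 * len] = '\0';
    return 0;
}
```

No arithmetic is performed on the value, so no bignum type is involved at any point.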

By the way, sscanf is fine here because the API's contract constrains sscanf's inputs enough to be safe. Sscanf in general is also safe when every `char*` output is bounded. It is certainly a difficult beast, but so is everything about C.
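A small sketch of what "bounded" means in practice: every string conversion carries an explicit field width one less than its buffer size. `parse_pair` and the `name=value` input format are invented for this example.

```c
#include <stdio.h>

/* Parses "name=value". Both outputs are 16-byte buffers, so each string
 * conversion is capped at 15 characters plus the terminating NUL.
 * Returns 0 on success, -1 if the input does not match. */
static int parse_pair(const char *s, char name[16], char value[16])
{
    return sscanf(s, "%15[^=]=%15s", name, value) == 2 ? 0 : -1;
}
```

An over-long name fails the match at the literal `=` instead of overflowing the buffer. Note that an over-long value is silently truncated to 15 characters, which is the kind of subtlety that makes sscanf easy to get wrong.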


This isn't responsive to what I wrote:

> and yielding a bespoke bignum format is just as unusable as simply returning whatever's encoded in CBOR. How would I add two such values together? How would I display it? This is what bignum libraries are for.

I know this is what you've been getting at. Maybe I've been unclear about why this isn't useful, but here are the main points:

- Without bignum functionality, your data structure doesn't provide any more functionality than memcpy. How do I apply the base? How do I apply the exponent? How would I add two of them together? This may as well just be a `char *`.

- Speaking of just being a `char *`, CBOR's bignums are just that (a tagged byte string holding a big-endian magnitude), so you'd just call `mpz_import` on whatever is in the buffer. Parsing into your struct here is counterproductive.

- Even the minimal functionality you're proposing here is added bloat to every application that doesn't care about bignums and wants to ignore the tag (probably almost all applications). Ameliorating this requires conditional compilation.
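For reference, RFC 8949 defines the content of tags 2 and 3 as a byte string holding a big-endian unsigned magnitude, so the payload is raw binary rather than a digit string. Here is a minimal sketch of the cheapest thing a decoder can do with it without a bignum library: fold it into a `uint64_t` when it fits, and reject it otherwise. `cbor_bignum_to_u64` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts a big-endian magnitude (the content of a tag 2 or tag 3 byte
 * string) to uint64_t. Returns 0 on success, -1 if it needs more than
 * 64 bits. For tag 3 the caller still has to apply value = -1 - n. */
static int cbor_bignum_to_u64(const uint8_t *bytes, size_t len, uint64_t *out)
{
    uint64_t v = 0;

    /* Leading zero bytes do not change the value; skip them first. */
    while (len > 0 && *bytes == 0) {
        bytes++;
        len--;
    }
    if (len > sizeof v)
        return -1;
    for (size_t i = 0; i < len; i++)
        v = (v << 8) | bytes[i];
    *out = v;
    return 0;
}
```

Anything that fails this check is where the real question starts: the decoder either hands back the raw bytes or needs a bignum type to put them in.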

> In practice many bignums are just left as is.

I'd believe this; I'd also believe there's very little real need for them generally. This is an argument for not including them in a data serialization format.

> By the way, sscanf is fine here

The problem with sscanf isn't that it can never be safe; it's that if you're not safe every time, you blow everything up. It's better to just not use it.



