What is wrong with TOML? (2019)

lolinder · on Sept 13, 2023

Honestly, all of these arguments feel pretty subjective to me.

This is the major problem with most comparisons of config file formats: the actual semantic domain of a config file format is extremely limited, which means the main thing left to disagree over is syntax, which is highly subjective and extremely difficult to get people to agree on.

Add too many syntactic features and a lot of people will disavow you for being too complicated. Add too few and you'll be missing someone's pet feature. Make white space significant and you'll frustrate people. Require extra characters to delineate and you'll frustrate another group.

It's worth noting that this article is primarily talking about TOML in the context of the Python ecosystem, and I think that's a healthier way to talk about config file formats: How well suited are their syntactic choices to the community they're targeting?

tzhenghao · on Sept 13, 2023

Yup. I've personally used both YAML and TOML for configurations, much more the latter recently and can see pros and cons for both.

> How well suited are their syntactic choices to the community they're targeting?

Also, "best" practices. One could reduce the pain of the other, but that by no means makes it the right solution to a deeper problem at hand. For example, if one has very deep and complex nesting for configs, TOML "may be a lot nicer" compared to YAML, but that doesn't mean using TOML will make all the config parsing problems go away. It just masks the code smell. Maybe it's time to check whether they're overcomplicating configurations in general.

jnxx · on Sept 13, 2023

> This is the major problem with most comparisons of config file formats: the actual semantic domain of a config file format is extremely limited [ ... ]

So, why not use Scheme ?

mindslight · on Sept 13, 2023

Scheme lacks most syntactic affordances that imply semantics. Even if some of those implications are dead wrong, they're still useful.

Personally I think the right answer for configuration files is to define them in terms of a generic object model. A program could even support multiple formats (TOML+JSON+YAML). If a user dislikes all the supported formats or the file is generated with something like NixOS, it can be handled with straightforward conversion.

withinboredom · on Sept 13, 2023

> A program could even support multiple formats

I invite you to check out Symfony where you configure your app using yaml, attributes in code, code itself, or a mix of all the above.

You will cry.

mindslight · on Sept 13, 2023

Point taken, but it would seem the problem there is probably due to the arbitrarily placed mixins? My proposal was more for a single configuration object, just in whatever actual syntax you (or your team) prefers. If someone runs away with that and writes json that includes yaml that includes python that generates configuration from what it found on the filesystem, responsibility for that needless complexity rests squarely on the shoulders of the new programmer.

rini17 · on Sept 13, 2023

Having syntactic affordances for every nuance of semantics is what led to the current state of zoo. What is wrong with having trivial syntax and distinguishing semantics by labeling parts of the syntax tree with symbols?

mindslight · on Sept 13, 2023

Specifically with scheme, I believe the main problem is the symbols lack any distinction to know the difference between a function call and a syntactic form.

(Don't shoot the messenger, I've done my fair share of scheme. I've also done a lot of thinking about why some people are so turned off from the syntax, and it's certainly not that the opening parenthesis is in a slightly different place on the prefix function calls)

lapinot · on Sept 13, 2023

Some do, using s-expr as config files is pretty common in the ocaml ecosystem (ie dune).

HideousKojima · on Sept 13, 2023

>Make white space significant and you'll frustrate people.

Or worse: make whitespace significant and strike a blow in the eternal tabs vs. spaces holy war at the same time, like YAML.

GuB-42 · on Sept 13, 2023

...or more famously, Makefiles

MiguelX413 · on Sept 13, 2023

I would disagree

eternityforest · on Sept 14, 2023

It's kinda crazy we have 2 kinds of whitespace.

Even the customizability argument makes less sense now, IDEs could just change the width of leading spaces, I'm not sure why they don't.

reactordev · on Sept 13, 2023

Isn’t the whole premise behind this article that, coming from Python where indentation is program structure, TOML confuses the reader with syntax foreign to the reader?

Like a C++ developer crying foul because inheritance doesn’t exist in YAML.

giaour · on Sept 13, 2023

Make that C++ developer's day by pointing out that YAML does support inheritance: https://dmitryrck.com/how-to-use-inheritance-in-yaml-files/

reactordev · on Sept 13, 2023

Pfffffft, you know very well C++ developers don’t write yaml. They write m4.

GoblinSlayer · on Sept 14, 2023

SciTE uses lua for configuration.

riwsky · on Sept 13, 2023

Thankfully, YAML supports SFINAE

sundarurfriend · on Sept 13, 2023

The first and the last ones, at least, are tradeoffs where TOML made the right decision for most users.

Not being DRY is a good thing in a config file - it makes it much easier to understand and work with just one section of the file (which is what you most often want to do), because the context information is right there without having to jump around and figure things out.

And whatever the downsides of syntactic typing are, requiring a schema file to go along with your config file is far more of a downside; it's one more point of potential failure, another thing to maintain and sync up and keep in your head, and not worth it for most use cases.

And that's the crux of it: it all depends on what you need from your markup language, what your use case is, today and through the lifetime of your project. "What's wrong with TOML" makes much less sense as a question than "What's wrong with TOML(/JSON/YAML/etc.) for _this project and its needs_".

nerdponx · on Sept 13, 2023

I think the primary limitation with TOML is the restriction that in-line tables cannot cross multiple lines. This is not done for technical reasons, it's an aesthetic choice on the part of the designer. It's analogous to forbidding comments in JSON.

I love TOML and will continue to use it as my default choice for configuration files, because I think most applications simply do not need the power and flexibility of YAML, even if the outright safety problems are mostly resolved in YAML 1.2. But I do agree that the inability of the syntax to convey nested structure is a limitation and it definitely gets annoying in larger configuration files, such as pyproject.toml files that tend to accumulate in larger Python projects. I have considered just manually indenting nested table blocks, even though that would look pretty ugly and is decidedly non-standard.

masklinn · on Sept 13, 2023

> I think the primary limitation with TOML is the restriction that in-line tables cannot cross multiple lines. This is not done for technical reasons, it's an aesthetic choice on the part of the designer.

I'm sure you'll be happy to know this is getting relaxed in toml 1.1, whenever that comes out (and the implementations adopt it): https://github.com/toml-lang/toml/issues/516

Though the difficulty will then be knowing whether a given piece of software uses 1.0 (and single-line tables) or 1.1 (and more flexible tables).

eternityforest · on Sept 14, 2023

YAML has a big problem in that you can't work with it in standard tools.

Most every common GC'd language these days supports native JSON-like objects, but YAML can represent things outside of strings, lists, dicts, bools, numbers, and null.

Lack of nested structure is a positive in some applications. Flat is better than nested. I've seen way too many config files where someone says "add foo=3" to the file and you can't even figure out where in the structure it goes.

And worse, sometimes people reorganize things into options. They'll move all the stuff for one subcomponent into its own nested thing, and you can't configure it without knowing the full architecture.

With flat stuff you get an obvious single way to represent any given config options. Maybe not the nicest way, but it's obvious and unique.
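
To make the "obvious and unique" point concrete, here is a small sketch that collapses a nested config into dotted keys, so every option has exactly one name. The `flatten` helper is hypothetical, and in this simplified version empty tables simply disappear:

```python
def flatten(config: dict, prefix: str = "") -> dict:
    """Collapse nested dicts into a single level of dotted keys."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the path so far as the prefix.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

nested = {"server": {"http": {"port": 8080}, "name": "demo"}, "debug": False}
flat = flatten(nested)
print(flat)
```

With the flat form, "add server.http.port=3" leaves no doubt about where the option goes.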

hitchstory · on Sept 13, 2023

Hi, I'm the author of this piece. Thanks for your comments.

>Not being DRY is a good thing in a config file - it makes it much easier to understand and work with just one section of the file (which is what you most often want to do), because the context information is right there without having to jump around and figure things out.

If the contextual information is relevant that's true. However, syntactic noise of the form of lots of [s, ]s and equal signs isn't necessarily relevant. A YAML file can exhibit identical information with fewer characters and that makes the files easier to read and maintain - especially if they are information dense.

>And whatever the downsides of syntactic typing are, requiring a schema file to go along with your config file is far more of a downside; it's one more point of potential failure

Schemas aren't technically required by strictyaml provided you're happy mapping everything to a dict, list and string, but they're recommended because they make it much easier to prevent something from going wrong and it means you can directly use the types you were expecting.

Schemas in config files are equivalent to static types that generate compiler errors in a program. If you can use them, it's an easy way to get your program to fail fast on invalid input and save on debugging time.

If you don't have a schema and some invalid data gets put into your config file, instead of getting an error that says "didn't expect key "ip addresses" on line 14" you tend to get a really cryptic error a bit later on when your program tries to get a key from a dictionary that doesn't exist.

This is an example of the principle of https://en.wikipedia.org/wiki/Fail-fast design.
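
A rough sketch of the fail-fast idea in plain Python. This is not strictyaml's API; `SCHEMA` and `validate` are hypothetical names, and unlike a real schema-aware parser this version cannot report line numbers:

```python
# Hypothetical schema: expected key -> expected type.
SCHEMA = {"host": str, "port": int}

def validate(config: dict, schema: dict) -> dict:
    """Reject unknown keys, missing keys and wrong types before the program runs."""
    unknown = sorted(set(config) - set(schema))
    if unknown:
        raise ValueError(f"unexpected key(s): {unknown}")
    missing = sorted(set(schema) - set(config))
    if missing:
        raise ValueError(f"missing key(s): {missing}")
    for key, expected in schema.items():
        if not isinstance(config[key], expected):
            raise ValueError(f"{key!r} should be {expected.__name__}")
    return config

try:
    validate({"host": "localhost", "port": 8080, "ip addresses": []}, SCHEMA)
except ValueError as error:
    message = str(error)
    print(message)
```

The error names the offending key at load time, instead of surfacing later as a `KeyError` somewhere deep in the program.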

camgunz · on Sept 13, 2023

> However, syntactic noise of the form of lots of [s, ]s and equal signs isn't necessarily relevant.

I don't know that ': ' is any better than ' = '. I get being opinionated about it but this feels squarely in the realm of the subjective to me. Further, adding an errant '-' and accidentally creating another object is real common in YAML, which is something you can't do with TOML's lists. I think this washes out, tbh.

hitchstory · on Sept 13, 2023

I actually tried to demonstrate this with numbers a while back. I tried taking a few random JSON files as a control and representing them with StrictYAML and TOML and the TOML varied from 30% to 100% longer.

There is an element of opinion here, but there is no question that equivalent TOML files are longer, and most of that is syntax.

It's much more pronounced when you have more than one or two levels of nesting. With 4 or 5 levels of object nesting TOML files grow huge, whereas YAML is still fine.

>Further, adding an errant '-' and accidentally creating another object is real common in YAML

Yep, this is one of the things that type safety helps with though. Similarly it's quite easy to mess up an indent in YAML, but a schema can catch that stuff.

marcosdumay · on Sept 13, 2023

> It's much more pronounced when you have more than one or two levels of nesting.

Yes. That's a clear indication that either your configuration type structure is badly designed, or that you aren't trying to create a configuration type and should use some data encoding or programming language that fits your problem better.

I guarantee you the people using your software will curse you every time they try to edit a YAML file with more than 3 levels of nesting.

camgunz · on Sept 13, 2023

I confess I feel strongly in opposite directions on this. I'm pretty biased against YAML: I think it's way way too complex, and its "declarative" nature and broad feature set essentially make it a complexity magnet. I've seen hellish YAML files, and it's also pretty common to combine them with Jinja (or w/e) for extra hellsauce.

But it's declarative, which is really cool! What else would you realistically use for like, AWS infra? The JSON version is so much worse. Using Python or whatever is its own can of worms.

This is the kind of thing that stuff like CUE and Dhall are aiming at, and I welcome it. It feels like they're the way out here.

marcosdumay · on Sept 13, 2023

> But it's declarative, which is really cool! What else would you realistically use for like, AWS infra?

Programming languages can be declarative too. As you already pointed out, CUE and Dhall exist. (And lisp, and prolog...)

(And in fact, a lot of people just use Dhall anyway and derive the actual configuration files from it.)

camgunz · on Sept 14, 2023

True, I meant more non-Turing-complete but you're right. Also I do like using Lua for config when I can, so that's another point in that column.

camgunz · on Sept 13, 2023

Oh I don't doubt that there are various inefficiencies, but I dispute the characterization that this inefficiency is "wrong". I think it's clear by now that there's not "1 text format to rule them all", whether we're talking XML, JSON, HTML, SGML, TOML, YAML, RFC 2822, CSV, or whatever. Depending on your use case, one of these is better than the others. Like I definitely don't want to serialize objects into TOML. I also definitely don't want to deal with JSON for configuration.

> ...but a schema can catch that stuff.

100%; it's amazing to me the number of projects that essentially accept arbitrary structures and/or values here. I think no matter whether you're using YAML/TOML/JSON/etc. you oughta be using something to validate.

aeurielesn · on Sept 13, 2023

I don't find TOML files easy to read/understand.

Especially when I'm scrolling through a file, I find myself backtracking to understand its structure again.

taeric · on Sept 13, 2023

Similarly, I don't find yaml easy to read/understand. XML had the curse of people trying to use every feature possible in most documents. And as much as I do prefer the "program" approach of emacs, I will make no defense of giant emacs config files, either.

marcosdumay · on Sept 13, 2023

XML has the problem of too many features. Whether you personally decide to use them or not doesn't change the fact that whatever software reads it has to support them.

XML parsing is also not a pure computation.

That said, an XML-lite would quite probably be the best data encoding language one could create right now. It would still suck as a configuration language, as it's a very different problem.

networked · on Sept 13, 2023

https://kdl.dev/ is an XML-lite configuration language with a pleasant syntax. It has a neat syntactic feature for commenting things out called "'slashdash' comments". An example from the spec:

  mynode /-"commented" "not commented" /-key="value" /-{
    a
    b
  }

A problem with a node-based design similar to XML compared to something that parses to nested lists and dictionaries is that you pretty much need a query API. Implementing it takes nontrivial extra work. This affects KDL implementations: KDL specifies a query language, but I think no Python KDL library has implemented it so far. It limits how useful KDL is in Python, despite there being multiple working parsers.

Edit: Rephrased the second paragraph.

marcosdumay · on Sept 13, 2023

That language is very interesting, thank you.

But I do disagree on the work. A node-based language parses into a sequence of (name, node_type, data) tuples, which is just as easy to traverse as the (name, data) pairs from the data map ones. The thing is, a query language is much more useful for them than for a data map, so it's worthwhile to implement one. (There's probably some cultural aspect here too.)
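
A sketch of what that traversal might look like in Python. The `Node` shape here (name, positional args, children) is an assumption made for illustration and is not the data model of any particular KDL library:

```python
from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class Node:
    """Hypothetical parsed node: a name, its positional values, and child nodes."""
    name: str
    args: list = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)

def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)

def find(root: Node, name: str) -> list[Node]:
    """The simplest possible 'query': every node with a given name."""
    return [node for node in walk(root) if node.name == name]

config = Node("config", children=[
    Node("server", ["web"], [Node("port", [8080])]),
    Node("server", ["db"], [Node("port", [5432])]),
])
ports = [node.args for node in find(config, "port")]
print(ports)
```

The walk itself is a few lines; the repeated `server` nodes are what make a richer query language attractive, since a plain dictionary could not hold both under one key.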

networked · on Sept 13, 2023

You're welcome.

Yes, you are right. Node-based structures are not meaningfully harder to traverse than nested lists and dictionaries. That was a wrong comparison.

networked · on Sept 14, 2023

> The thing is, a query language is much more useful for them than for a data map, so it's worthwhile to implement one.

I agree about this.

masklinn · on Sept 13, 2023

OTOH you can factorise and modularise an emacs config file. Sometimes an XML file if it uses xinclude (though that can be an issue in other ways).

yaml? who knows. Whether and how (and the limitations) depends on whoever cooked up that pile.

dwattttt · on Sept 13, 2023

> A YAML file can exhibit identical information with fewer characters and that makes the files easier to read and maintain - especially if they are information dense.

I don't think this is a particularly good yardstick. Code without comments is shorter than code with comments, but I wouldn't call comment-less code easier to read; the more information dense, the worse, really.

AndyKluger · on Sept 13, 2023

Thanks for your work!

I first encountered strictyaml years ago and have used it, happily. I especially appreciate that you made clear arguments for what ought to be excluded from a configuration format itself, and how proper validation ultimately requires real code anyway.

I was always disappointed however that your project didn't amount to a formal specification (at least at the time, unsure if that's changed).

More recently I came to know and love NestedText, which seems very close to what a strictyaml spec could be/have been. I'm curious if you've engaged with that project/format, and what you think of it.

hitchstory · on Sept 13, 2023

>I was always disappointed however that your project didn't amount to a formal specification (at least at the time, unsure if that's changed).

I would dearly love to do this. I would ideally like to work with somebody who can help me though because it's a lot of work and I struggle to find the time.

AndyKluger · on Sept 13, 2023

NestedText seems very close to what a strictyaml spec could be/have been. I'm curious if you've engaged with that project/format, and what you think of it.

slowmovintarget · on Sept 15, 2023

Doesn't YAML have the unfortunate issue of ambiguity in the variety of parsing and versions? (Edit: I see you're advocating for a new subset... never mind. :) )

If you want a "language" for expressing data (like configuration data), you might be interested in having a look at EDN. https://github.com/edn-format/edn

Hendrikto · on Sept 14, 2023

> A YAML file can exhibit identical information with fewer characters and that makes the files easier to read and maintain

According to that logic, binary files would be easiest to maintain and read, which is obviously bogus.

politelemon · on Sept 13, 2023

I don't even think this is about DRY and possibly misunderstands the DRY principle. DRY is about having a single authoritative source of information ("Every piece of knowledge must have a single, unambiguous, authoritative representation within a system"). Repeating a portion of a key in a configuration file is not in violation, it would be if it were the value being repeated.

stared · on Sept 13, 2023

I beg to differ. If there are indentations, it is easy to fold long lists. For TOML, it takes mental effort to check whether items are from the same list or another. Additionally, in TOML, there are multiple ways to write a list (unlike in JSON), which makes it harder to parse - at least for me.

taeric · on Sept 13, 2023

The problem with that is that indentations within indentations are just obnoxious. It is far from uncommon to add an item at the wrong level in a large config file. Worst is when you have a large file that is showing several meaningful lines of indentation on one screen, where the roots of several of those levels is not visible.

hot_gril · on Sept 13, 2023

> Not being DRY is a good thing in a config file

Also unit test code is expected to be a lot less DRY.

arp242 · on Sept 13, 2023

The (upcoming) TOML 1.1 will alleviate some of this; that example document could then be written as:

  [params]
  profile = {
    name    = "Gareth",
    tagline = "..",
  }
  
  contact = {
    enable = true,
    list = [
      {class = "email", icon = "fa-envelope"},
      {class = "phone"},
    ]
  }

The whole business with the syntax typing has no "one correct way" to do it. No matter what you do it will cause problems and headaches for someone at some point somewhere.

> Dates and times, as many more experienced programmers are probably aware is an unexpectedly deep rabbit hole of complications and quirky, unexpected, headache and bug inducing edge cases. TOML experiences many of these edge cases because of this.

Eh? In the original text it links to three issues to back this up:

That first issue it links to is "failed to parse long floats like x = 0.1234567891234567891" and the third is a feature request for hex values (v = 0xff, not even a bug report). That has nothing to do with dates? The second issue did relate to dates, but was just a simple bug, not an "unexpectedly deep rabbit hole of complications and quirky, unexpected, headache and bug inducing edge cases".

This just seems like repeating a tautology. I maintain a TOML implementation that sees some reasonable use. Dates have not been a huge source of bugs, confusion, or other issues. All you need is to be able to parse RFC 3339 style dates (and some things derived from that), which is usually just calling strptime() or whatever your language has for this.
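
In Python, for instance, the standard library's `datetime.strptime` covers the common offset date-time case. A minimal sketch; the `parse_rfc3339` helper is a hypothetical name, and it deliberately ignores fractional seconds and the local date/time variants TOML also allows:

```python
from datetime import datetime, timezone

def parse_rfc3339(text: str) -> datetime:
    """Parse a full date-time with offset, e.g. 1979-05-27T07:32:00Z."""
    # %z accepts both "Z" and "+hh:mm" style offsets (Python 3.7+).
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S%z")

moment = parse_rfc3339("1979-05-27T07:32:00Z")
shifted = parse_rfc3339("1979-05-27T00:32:00-07:00")

print(moment.isoformat())
print(moment == shifted)  # same instant written with a different offset
```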

I do think TOML had some bits I wouldn't have added (not dates though), but the feature sets and complexity of TOML and YAML are not even comparable; it's like comparing Iceland (pop: ~300k) to Ireland (pop: ~5M). Yes, they're both islands and both are small countries, yet the scale of their "smallness" is just completely different.

hddqsb · on Sept 13, 2023

The inline table syntax is awesome! Details:

https://github.com/toml-lang/toml/blob/main/toml.md#inline-t...

https://github.com/toml-lang/toml/pull/235

https://github.com/toml-lang/toml/issues/516

ollien · on Sept 13, 2023

This might convince me to start using TOML. I hate JSON for configs (not all parsers support comments, yes I know JSON5 exists), and TOML's table syntax really sucks. YAML has its flaws, but it fits the bill the best IME.

Now I just have to hope enough TOML tools support this syntax, lest I end up in the same boat as JSON5.

mdaniel · on Sept 13, 2023

> The (upcoming) TOML 1.1

Out of curiosity, how does the deserializer know which "standard" to use?

AIUI one can pin a version of YAML via its directive: %YAML <https://yaml.org/spec/1.2.2/#681-yaml-directives> with a missing one implying 1.2 although (heh) the 1.1 version says that documents which are missing their yaml directive are implied to be 1.1 <https://yaml.org/spec/1.1/#YAML%20directive/> so ... versioning, it's hard!
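
A sketch of how a loader could at least detect the declared version before choosing a parser. The `declared_yaml_version` helper is hypothetical, only looks at the first document of a stream, and leaves the "what if it's missing" default to the caller, which is exactly the ambiguous part:

```python
import re

# "%YAML <major>.<minor>", optionally followed by a comment.
_DIRECTIVE = re.compile(r"%YAML[ \t]+(\d+)\.(\d+)(?:[ \t]+#.*)?[ \t]*")

def declared_yaml_version(text: str):
    """Return (major, minor) from a leading %YAML directive, or None if absent."""
    for line in text.splitlines():
        if line.startswith("---"):
            break  # directives are only valid before the document start marker
        match = _DIRECTIVE.fullmatch(line)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None

print(declared_yaml_version("%YAML 1.1\n---\nanswer: yes\n"))
print(declared_yaml_version("answer: yes\n"))
```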

arp242 · on Sept 13, 2023

There is nothing for this, which is probably fine. Previous thread on this: https://news.ycombinator.com/item?id=36023321

tgv · on Sept 13, 2023

What's the difference with JSON in that example? No double quotes around keys, that seems to be about it. It seems more practical/readable than JSON if you have very limited need for nesting, but the old format would suffice then too.

arp242 · on Sept 13, 2023

In that specific example not too much, but obviously there's a whole bunch of differences between TOML and JSON, starting with the fact that you can add comments as has already been discussed, and many more.

rightbyte · on Sept 13, 2023

> you can add comments

Given how many comment remover util functions I have written to make almost JSON config files proper JSON ... why could they not just have included those in the spec.
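
For the record, one of those helpers might look roughly like this in Python. It is a sketch that assumes `//` and `/* */` comment styles (the function name is made up), and the main subtlety is leaving comment-like text inside string literals alone:

```python
import json

def strip_json_comments(text: str) -> str:
    """Remove // and /* */ comments that sit outside JSON string literals."""
    out = []
    i = 0
    in_string = False
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                out.append(text[i + 1])  # keep the escaped character verbatim
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
            continue  # the newline itself is kept on the next iteration
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = len(text) if end == -1 else end + 2
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)

almost_json = '''{
  // where to listen
  "url": "http://localhost:8080", /* default port */
  "retries": 3
}'''
config = json.loads(strip_json_comments(almost_json))
print(config)
```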

arp242 · on Sept 14, 2023

Because JSON is not designed for configuration files, but for data interchange, and this makes JSON "better" for that use case (the original motivation was "people were adding comment-based pragmas to JSON files, so I removed comments", but without comments it's also easier to parse, concatenate, etc.)

mardifoufs · on Sept 13, 2023

Will python support the new version for pyproject.toml?

arp242 · on Sept 13, 2023

I would expect so, yes.

kzrdude · on Sept 13, 2023

It already supports inline tables

AndyKluger · on Sept 13, 2023

But the whitespace handling involved is an extra complication, at least the provided example fails to be parsed by `tomli`, which is following the currently released TOML spec: https://github.com/hukkin/tomli/issues/199

hprotagonist · on Sept 13, 2023

> StrictYAML, by contrast, was designed to be a language to write readable 'story' tests where there will be many files per project with more complex hierarchies, a use case where TOML starts to really suck.

I can’t be the only one who feels this way; isn’t it this use case the thing that sucks? “plain text but not really” config formats that aren’t “code” but have special syntax but lack a debugger and handy IDE tools and you’re never really sure of what you’re doing … isn’t that the thing that sucks?

masklinn · on Sept 13, 2023

That was also my first reaction opening the example, I’m perfectly fine with a half-assed programming language (embedding more programming languages) not working in TOML, I don’t think it works in yaml in the first place, and it’s exactly the sort of usual mess which makes me recoil at the sight of a yaml extension.

This specific example is like somebody saw cucumber and went “I think this should be a lot worse, and in an ecosystem which doesn’t want to come anywhere near that too”.

This would probably be half the size and actually comprehensible if it were just a pytest test file.

hitchstory · on Sept 13, 2023

Hi, I'm the author of hitchstory.

With the stories in YAML two new use cases (which are demonstrated in the example projects) are enabled:

* Automatically updated how-to docs. I used to write these types of docs manually and if they existed at all they would ALWAYS get out of sync with the code and were painful to maintain manually. Now I have YAML files and a simple jinja2 template I can push out new markdown how-to docs on each new build - with snippets of JSON, screenshots from the app, whatever.

* Tests that rewrite themselves. E.g. if I have a REST API test of the form "call API x and expect y blob of json", I don't have to actually write that blob of json into the test, I just write the code that produces it and run the test in rewrite mode so it updates the "expected JSON" field with actual json. I can then eyeball it and 20 seconds later it's part of the test and part of the docs.

The productivity improvements from doing both of these things means that writing tests is cheaper so I do more of them. Having how-to docs for all scenarios is way cheaper so I now always have them.

These use cases are impossible with pytest. They are impossible with cucumber.

They would be too painful to maintain with regular (i.e. not type-safe) YAML and those stories have enough indents that they would be an epic unreadable mess of syntactic noise if they were built in something like TOML or JSON.
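
The "rewrite mode" idea can be sketched generically in a few lines of Python. This is not hitchstory's API; `check_snapshot` and the file name are hypothetical, and the expectation is stored as JSON here only to keep the example within the standard library:

```python
import json
import tempfile
from pathlib import Path

def check_snapshot(path: Path, actual: dict, rewrite: bool = False) -> bool:
    """Compare `actual` with the stored expectation; in rewrite mode, store it instead."""
    rendered = json.dumps(actual, indent=2, sort_keys=True)
    if rewrite:
        path.write_text(rendered)  # the test updates its own expected output
        return True
    return path.exists() and path.read_text() == rendered

with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "get_user.expected.json"
    first = check_snapshot(snapshot, {"id": 1, "name": "Ada"}, rewrite=True)
    second = check_snapshot(snapshot, {"id": 1, "name": "Ada"})
    third = check_snapshot(snapshot, {"id": 1, "name": "Bob"})

print(first, second, third)
```

The rewritten file is what gets eyeballed and committed; later runs without the rewrite flag fail as soon as the output drifts.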

dharmab · on Sept 13, 2023

The first thing reminds me of example tests in Go: https://go.dev/blog/examples

hitchstory · on Sept 13, 2023

Similar. It's like that but in reverse. Instead of running snippets of your docs as code, the tests and their metadata are used to compile doc snippets or entire how-to doc pages.

dharmab · on Sept 13, 2023

That is how Go's examples work- your example tests are inlined into the GoDocs at pkg.go.dev

fishyjoe · on Sept 13, 2023

> Tests that rewrite themselves.

That sounds like expect tests in OCaml [1]. I've found them quite a joy to work with and I'm surprised more languages don't have something similar.

[1] https://dev.realworldocaml.org/testing.html#expect-tests

skrebbel · on Sept 13, 2023

The second thing reminds me of how Jest does snapshot testing: the first time you run it, it simply edits the JS source file with the result. For a test runner to edit test code feels weird at first but it works spectacularly well.

Terretta · on Sept 13, 2023

Your way of thinking, shared transparently not just in your "why not" section but throughout, is a breath of fresh air in the miasma of grabbing a tech because dogma or hype.

It's not that I agree with all your choices. It's that I'll defend to the end your method of making them.

schmuelio · on Sept 13, 2023

Yeah that use case really sounds like they're on the cusp of just making a DSL from a config language.

If you need it to be _that_ complex then just write your configs as code...

marcosdumay · on Sept 13, 2023

I don't have any problem with complex "symbol vs literal" resolving in configuration languages. Unless your syntax is very weird, it should be very hard to confuse those.

That said, YAML's syntax is very weird. YAML just sucks. And any such implementation must necessarily be unityped (up to the point where the data is coerced into the configuration structure at your program) and completely preserve the original data.

TOML could be extended to support it. I don't think it's "tasteful", but I see no practical problems with it.

masklinn · on Sept 13, 2023

> TOML could be extended to support it.

Seems doubtful as that's specifically something TOML was created not to support. If you want unityped ini files you can define that dialect of ini files.

myaccountonhn · on Sept 13, 2023

I agree, I quite like https://dhall-lang.org/ for that reason. It strikes a good balance between features and being a config language.

baq · on Sept 13, 2023

Eagerly waiting for a Yet Another Human Readable and Writable Graph (Which Is Mostly a Tree But Not Always) Serialization Language With Built In Schemas and Pure Functions, Maybe made with <3 for humans.

Actually, that, but without sarcasm. YAML is crap. TOML is crap. JSON is crap. INIs are actually fine for flat lists but major crap otherwise. Anything Turing-complete is completely inadequate crap for the purpose.

RedNifre · on Sept 13, 2023

Try edn: https://learnxinyminutes.com/docs/edn/

dkersten · on Sept 13, 2023

I especially love the Integrant[1] version which uses the E from EDN (extensible) to add references:

    {:adapter/jetty {:port 8080,
                     :handler #ig/ref :handler/greet}
     :handler/greet {:name "Alice"}}

[1] https://github.com/weavejester/integrant

masklinn · on Sept 13, 2023

I like edn but I’m not convinced it’s great for configuration.

First off, it shares the desire for extensibility with yaml (in the form of tags), which is a giant trap. And then it’s quite syntactically noisy.

dkersten · on Sept 13, 2023

I love EDN (especially the integrant extensions) for configuration of the stack — that is, dependency injection and so on, the developer-focused configuration.

For user-facing configuration I still favour TOML. I think it’s a bit application dependent, some applications (eg nginx) have complex configuration needs and for that it makes sense to use something more sophisticated, but for many user-facing config settings, a simpler-TOML would be a great fit. Basically just some basic key-value pairs that can be collected into groups. As the article states, the parsing types should perhaps be enforced by the parser not the written config.

jmkr · on Sept 13, 2023

Why do you love edn? Specifically I also like edn, but I always felt like my use of it was glorified json.

I did like integrant, at least I felt like I understood it, unlike component.

There were two things about edn that made it seem better than json to me, tagged elements (and readers), and symbols. I don't remember exactly my use case but I used symbols in edn to something like namespace-resolution for multimethods. It was something like including a file in a classpath or loading a file, dev vs prod kind of config.

dkersten · on Sept 13, 2023

I suppose mostly superficial reasons, such as liking the keyword syntax better than using string keys in JSON, but also the richer set of data types (like sets). That mostly doesn’t matter in practice, but it is very nice when using Clojure, since the Clojure types just work. As an interchange or storage format it’s also nice to be able to store Clojure types and get them back the way they were put in, whereas with JSON you would lose some information.

Mainly it’s just personal taste and no deep reasons.

eternityforest · on Sept 14, 2023

Too many features that can't be represented in standard programming language constructs.

JSON-alikes are opinionated. They don't let you do stuff that requires a plugin for the parser to work with.

jmaker · on Sept 13, 2023

Dhall is superb, https://dhall-lang.org

37469920away · on Sept 13, 2023

Thanks for that, it is interesting.

Is this actually something I want to see in a configuration script? Why not just use a scripting language and be done with it? I wonder if the safety features can't be replicated with rigorous testing of, say, Python as config scripts, instead of learning yet another programming language?

https://prelude.dhall-lang.org/Text/concatSep.dhall

I think this is the key fulcrum for me: "config is code", sure, but not the same kind of "code".

It -is- compelling to argue for statistically deterministic config code but my practical objection here is 'can we arrive at same safety using testing with a known language?'

Writing this has made me consider whether configuration should be conceptually looked at as a database instead of "code". How many people even know how e.g. Postgres stores its tables, and why, modulo some performance niche, would you care anyway?

It seems configuration management is a graph db query and update matter. Standardize on configuration query language (if necessary) and stop worrying how the damn thing is represented by the config management tool.

jmaker · on Sept 13, 2023

Depends on your requirements. If your config complexity is getting beyond manageable, the benefit of something more reliable is apparent. Type safety clears lots of common bugs no test suite would be certain to filter out. How complex should your tests get? Do you want to take on that responsibility or rather delegate it to something that provides you certain guarantees? It’s all very subjective in the end.

I run my configs mostly as YAML in Consul and Vault, sometimes in Spring Cloud Config with a git backend. This way I have dynamic config evolution. But I prefer to generate those yaml files from Dhall to avoid unnecessary bugs. After years with Haskell, the syntax is very natural, too.

As for Postgres internals, they do matter if your data set keeps growing.

Xkcd covered standardization. YAML and JSON ASTs are graphs, YAML not necessarily a tree. JSON extensions also support references. As for the ops side, YAML has become a de facto standard, HCL is used with the HashiCorp tools. Nix has its own language.

It’s not about how it’s represented but how you express dependencies across config key nodes. It’s good to avoid repetition and have a syntax linter, a compiler even better. Small static configs are amenable to querying and writing. But you need to separate the writing from the querying the configs. With Dhall you write code to generate the actual config, whether as a Dhall AST or exported to YAML or JSON, with certain correctness guarantees upfront.

baq · on Sept 13, 2023

> Why not just use a scripting language and be done with it?

I want my configuration to be guaranteed to halt. Turns out it's hard to not make anything useful accidentally Turing-complete!

Hendrikto · on Sept 14, 2023

At this point, I‘d rather just use a proper language.

conradludgate · on Sept 13, 2023

Maybe kdl[0]? It's a document language somewhere in between xml and yaml without all the crap of either IMO

[0]: https://kdl.dev/

alpaca128 · on Sept 13, 2023

I use KDL in a (not too complicated, yet) config file of a project and I like it a lot. Tree structure with attributes like XML but with less syntax than JSON. Nothing redundant but has basics like comments.

BiteCode_dev · on Sept 13, 2023

Let's add another one: CUE

https://cuelang.org

larschdk · on Sept 13, 2023

INI files are crap too, as there is no standard. Every program has its own dialect, and automating around them can be a pain.

rapsey · on Sept 13, 2023

INI files are toml

PennRobotics · on Sept 14, 2023

I have a beef with YAML/TOML/JSON as an inline front matter (header) format for SSG (static site generator) posts, but that's because if I want to save a mostly Markdown SSG post on Github...

1. the post is previewed with Github's dialect of Markdown and not the SSG's (and definitely with none of the inline configuration applied)

2. the preview is still a Markdown document, so you get no benefits of syntax highlighting or auto-formatting w.r.t. the config header (short of opening in Vim and explicitly declaring the filetype or temporarily changing the extension in Github's editor)

3. you HAVE to put trailing spaces in the YAML/TOML/JSON or the preview pane crams everything into an unbroken paragraph

4. there's not a quick preview of how the configuration will parse, just specific workflows (live update, compile single page) that you can test and then modify as needed. This is either in the rich online editor or your own machine and will require console commands and a browser window

5. I still have to know all of the modifiable attributes as well as defaults, which will be in a separate document and probably not in a Ctrl+Space dropdown

-----

For point 5, it would be nice if configuration formats had completions for common editors and/or their own scripts:

* "generate big config file with all possible keys and default values"

* "condense modified config file so it only contains non-defaults (and hope the schema doesnt change hahaha)"

* "suggest a valid fix for a currently invalid config file"

-----

Sure, this all isn't a direct criticism of TOML, but inlining configuration is a great-to-okay idea that is simply poorly executed. It is extremely unfriendly to non-technical users; I can fully understand why someone would pay a few hundred a year for a WYSIWYG templated website builder to just handle everything.

WorldMaker · on Sept 13, 2023

I'm surprised no one has thrown in raw S-expressions yet in this thread. That's an HN perennial favorite. Great at trees, decent at graphs. You can easily go Turing complete or not on the whim of any Lisp at hand, and wield the hammer and forge of macros to your heart's content.

tikhonj · on Sept 13, 2023

I interned at a company that used an s-expression format as an alternative to JSON and it was great—much better than any mainstream format for config files and human-readable representations for API data. It was great at representing structured data (including tagged variants, so full algebraic data types) and even pretty good for text markup.

It also has best-in-class editor support thanks to paredit :)

The only real downside is the hassle of convincing people to use something weird and non-standard, which is really more a problem with people than with the format.

ducktective · on Sept 13, 2023

Nickel[1] maybe? Though I'm not sure about "for humans" part :)

[1]: https://github.com/tweag/nickel

eternityforest · on Sept 14, 2023

INI is really nice. Anything you can't represent in it, is getting towards not quite just configuration.

I think my ideal format would be INI, with tagged headings so you could do [md:README] and stick a multi line string section in there, or use something like [comment:Module Attributes] to make an embedded docs section.

Either that, or just some other method of embedding multi line keys. Maybe HTML-like tags, so that inside a heading you can do

<my-key> value </my-key>

ianburrell · on Sept 13, 2023

I think it is important to distinguish between human-readable serialization format and human-readable config language. I wish the distinction was made more explicit.

JSON is the champion serialization format. But it is hard to edit config files with missing comments and strict quotes and commas. JSON is great for the API, but configs should be converted from something nicer to JSON. I wonder if it would be good to make the conversion explicit so any format could be used.
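
An explicit conversion step could be sketched roughly like this. It is only an illustration, assuming an INI-style file as the "nicer" source format and Python's standard `configparser` and `json` modules; the section and key names are made up.

```python
import configparser
import json

# Hypothetical human-edited config: comments allowed, no quoting rules.
INI_TEXT = """
; comments are fine here
[server]
host = example.org
port = 8080

[logging]
level = info
"""


def ini_to_json(text: str) -> str:
    """Convert INI text to a JSON document (all values stay strings)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # One JSON object per INI section; configparser does no type inference.
    data = {section: dict(parser[section]) for section in parser.sections()}
    return json.dumps(data, indent=2, sort_keys=True)


print(ini_to_json(INI_TEXT))
```

The application would then only ever read the generated JSON, and any other source format could be swapped in as long as it produces the same structure.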

twic · on Sept 13, 2023

Properties files have been fine, i guess since 1995.

paulddraper · on Sept 13, 2023

HOCON

(But extrapolating, I'm guessing it goes into your crap category.)

incrudible · on Sept 13, 2023

Exactly, so just use JSON for configuration. This will upset the people that have not yet learned to get over it, but you really need people that got over it.

baq · on Sept 13, 2023

JSON is the runner up on the top crap list right below Windows Registry.

remram · on Sept 13, 2023

No comments means it's not human friendly.

incrudible · on Sept 14, 2023

I would strongly advise against using JSON to configure humans.

remram · on Sept 14, 2023

That's a first, the ol' HN switcharoo?

BiteCode_dev · on Sept 13, 2023

I do this, but regularly get annoyed with the date, comment, trailing comma, mandatory key quotes, etc

retzkek · on Sept 13, 2023

Maybe you'd like jsonnet: https://jsonnet.org/

I find it particularly useful for configurations that often have repeated boilerplate, like ansible playbooks or deploying a bunch of "similar-but" services to kubernetes (with https://tanka.dev).

Dhall is also quite interesting, with some tradeoffs: https://dhall-lang.org/

A few years ago I did a small comparison by re-implementing one of my simpler ansible playbooks: https://github.com/retzkek/ansible-dhall-jsonnet

BiteCode_dev · on Sept 13, 2023

If I have to use a better format, I will use toml or cue.

The whole point of the parent is that JSON is the Pareto solution, which I agree with.

But it does grind my gears.

paulddraper · on Sept 13, 2023

jsonnet is turing complete.

It's configuration with a build step. Which is an option, but that's pretty different than JSON, YAML, TOML, etc.

thayne · on Sept 13, 2023

> It's very verbose. It's not DRY. It's syntactically noisy.

I don't completely disagree with this. However, in most cases TOML is used, it isn't that much of a problem.

And I actually like that the full key is repeated. When you have several layers of nested mappings, it can be hard to determine exactly where the current value is in the hierarchy. Especially if the top level key is above the current screen of text. It can also make it easier to search for a specific key. IMO, this is a case where more verbosity and repetition makes it more readable.

That said, it seems a little arbitrary to me that inline tables don't allow newlines within them. If they did, then if you didn't like repeating the keys, you could use inline tables.

> TOML's hierarchies are difficult to infer from syntax alone

This is a little subjective, and depends on the actual data represented in the config.

But in general, my experience is that when you have several layers of nesting, and the only indication of the hierarchy is indentation, it can be a little hard to follow where a specific value fits in the hierarchy. See above.

And I disagree that meaningful indentation is "generally considered a good idea". I won't enumerate the pros and cons here, as it has been discussed a lot elsewhere, but it is definitely controversial, and subjective.

> Overcomplication: Like YAML, TOML has too many features

This section lists exactly one feature that it thinks TOML shouldn't have. Maybe dates shouldn't have been included, but it isn't anywhere close to the complexity of YAML.

bemusedthrow75 · on Sept 13, 2023

I'm not sure it's clear from the article -- only the URL -- that the writer is also the author/maintainer of StrictYAML.

This article has been written in a way that (most likely inadvertently) implies a measure of distance from StrictYAML.

dagw · on Sept 13, 2023

One of the more interesting config formats I've come across was an application that used an Excel file. Once you get over the horror of such a terrible decision, it was actually quite an interesting choice that offered a fair few advantages. First of all, each config subcategory was on a separate sheet, making it easy to navigate and find what you were looking for. You could use formulas to relate different config options (if you wanted A to be 20% of B, you just set that in a formula). You could use drop-downs for fields where there were only a limited number of valid values. You could include as many comments and as much documentation as you wanted (including diagrams and images), as long as you only wrote in unused cells. And finally, my favourite: when configuring colours, instead of typing in RGB or hex values you simply changed the colour of the cell to the colour you wanted.

Now I would obviously never ever recommend doing this, but it was certainly an interesting and eyeopening experience.

macNchz · on Sept 13, 2023

Too many encounters with Excel’s “smart” date parsing would make me very concerned about using it this way.

https://support.microsoft.com/en-gb/office/stop-automaticall...

Rygian · on Sept 13, 2023

That page is such a gem.

> "make it easier to enter dates. For example, 12/2 changes to 2-Dec"

12/2 is obviously the 12th of February in my locale. But I need to keep Excel in English as a company policy, so this is not only unhelpful, it's outright wrong.

> Unfortunately there is no way to turn this off.

How does Microsoft justify this choice?

macNchz · on Sept 13, 2023

I’ve never understood why they don’t provide the option to disable it, it’s not like Excel is a sleek, minimalist piece of software with strong opinions and limited configuration.

I also love how the support article describes a behavior of their own software as “very frustrating”.

fluidcruft · on Sept 13, 2023

At my co-op decades ago we used to use Excel "templates" as masters to generate/maintain text config files (and iirc a few C headers). You would save as text to make it usable. The grid layout, ability to highlight/color/border/style etc and use formulas and plot was very helpful.

ics · on Sept 13, 2023

Underrated technique. I’ve used it similarly to generate scripts and it works great when used with care (error checking generally must happen at a different stage).

btreecat · on Sept 13, 2023

That's just begging to be an actual database.

Imagine if you said all of that with SQLite instead of Excel, and all of a sudden you're just talking about structured config in a database. Not new, not crazy, and generally a decent practice.

dagw · on Sept 13, 2023

The big difference is that every Windows computer (in a professional environment) comes with a very nice GUI tool for easily editing Excel files. A tool that basically everybody knows how to use. The same cannot be said for SQLite.

SQLite is great for storing application state and config options set from within the app, but it is a pretty terrible format for end users to edit.

johannes1234321 · on Sept 13, 2023

Though I'm not sure I'd be happy about "end users" editing the Excel files as config. Somehow they get the cells mixed up and you get an utter mess. And then Excel mistakes some value for a date and stores some other mess ...

For a somewhat trained audience however it can be quite interesting for some specific problem domain ...

btreecat · on Sept 14, 2023

>SQLite is great for storing application state and config options set from within the app, but it is a pretty terrible format for end users to edit.

I think you are conflating the file with the workflow. A proper UI is the solution to making something not "terrible for end users to edit".

eternityforest · on Sept 14, 2023

Why doesn't Microsoft just add SQLite to Excel, so your formulas can query it?

btreecat · on Sept 14, 2023

Because how else would they push MS Access?

harperlee · on Sept 13, 2023

Well, that's a complete apples-to-oranges proposal with completely different requirements.

With the original solution, you have a self-contained file that virtually any user knows how to edit, structure, expand, version, email, compare, discuss...

The level of user knowledge that you need for a similar solution based on a standalone SQLite file that you can version is another different world, e.g. to relate two values you would need to perhaps create a view or a trigger. And even with the most knowledgeable user you would still lack functionality such as simply pasting an image as a means of documentation and be able to see it, or WYSIWYG colors.

sofixa · on Sept 13, 2023

> version, compare

Gonna have to disagree with you here. Very few people know how to version, compare, let alone if there are multiple collaborators, Excel files. Is there even a decent way of doing that outside of Excel Online / Google Sheets?

harperlee · on Sept 13, 2023

Well I've seen a great deal of people comparing two Excels by quickly alt-tabbing, and doing Sheet1!A1=Sheet2!A1 comparisons in a third sheet, and dragging the formula.

Yes, that's horrible. But millions of people can autonomously resort to this, and they would be incapable of doing anything with a SQLite file.

Same with version control: you just add _v17_20230909_final to the name. Yes, it's horrible. Yes, it's buggy. But yes, it also runs the world.

btreecat · on Sept 14, 2023

>But millions of people can autonomously resort to this, and they would be incapable of doing anything with a SQLite file.

Are you trying to suggest that either you can't do that with sqlite, or that people didn't require training to do this in Excel?

harperlee · on Sept 15, 2023

harperlee: open list of affordances that an average user comfortably identifies with an Excel file, not really with sqlite

sofixa: version and comparison is not really supported in Excel

harperlee: agree that software support is not there, but in practice average user knows how to do it and is quite comfortable doing it in Excel

btreecat: sqlite also has filenames and you can learn how to use it

harperlee: agree on the filename, Excel files and sqlite files are externally, opaquely versionable in the same way. the point about people learning is moot though, average user already knows Excel because they were forced to in the past, but does not have the time to learn new things.

btreecat · on Sept 14, 2023

Fundamentally they need a tabular data store. Most everything else is a nice to have/usability point.

SQLite is also a self-contained file. There are similar tabular GUI tools that could let you interact with SQLite using a similar workflow. Users knowing that thing is not inherent; they had to be trained on it, and they can be retrained as they will be for other workflows and business tools.

Remember, you still need an external app (Excel) to open its files; the files themselves are just data, exactly like SQLite. So you could just make an Excel plugin to interface with SQLite.

SQLite is as version-able as an excel file, as in not very with standard tools.

Why are you pasting images into excel? Doesn't matter, sqlite handles that fine actually. https://www.sqlite.org/fasterthanfs.html

The main argument against sqlite is that you would need to build an interface or figure out how to train folks on existing tools. That's not a huge argument against it in my experience, it's a strong social/political one in many orgs but rarely a technical issue.

harperlee · on Sept 15, 2023

> The main argument against sqlite is that you would need to build an interface or figure out how to train folks on existing tools. That's not a huge argument against it in my experience, it's a strong social/political one in many orgs but rarely a technical issue.

The design of a solution needs to look into way more requirements than just the technical ones, time and money being 2 big ones. I think most of the HN readers would agree that you could end up building an interface with most of the Excel functionality, even more perhaps, on top of sqlite, and have your particular group of users trained on it.

michaelbuckbee · on Sept 13, 2023

Now I just want a spreadsheet style front end to SQLite and I'm nerd sniped into trying to figure out how to do formulas (triggers I suppose).

speed_spread · on Sept 13, 2023

Congratulations, you're about to reinvent MS Access / DBase / FoxPro... 30 years later.

michaelbuckbee · on Sept 13, 2023

That's an excellent point.

speed_spread · on Sept 13, 2023

I'm sorry for the initial sarcasm, it would actually be pretty cool to have those type of apps back. With everything going web-first nowadays, you could take inspiration from SQLite's own SCM, Fossil, that can run both as a CLI or a Web server.

eternityforest · on Sept 14, 2023

I've been thinking it would be cool to build something like this. What gets me stuck is version controllability and backups.

I think ideally you'd want multiple backends, either SQLite, or flat files that are Git/SyncThing friendly.

I was even thinking the file format could have a file UUID and record timestamps, so that if you put a different version of the same file in the same folder(Like with a SyncThing conflict file) it would give you a merged view, with newest-record-wins logic.
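
As a rough sketch of that newest-record-wins merge (the `Record` shape and field names are hypothetical, and the file UUID check that decides which files belong together is left out):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str           # hypothetical record identifier
    value: str
    updated_at: float  # e.g. a Unix timestamp


def merge_newest_wins(*versions: list[Record]) -> dict[str, Record]:
    """Merge several versions of one file; the newest record per key wins.

    On equal timestamps the record seen first is kept.
    """
    merged: dict[str, Record] = {}
    for records in versions:
        for record in records:
            current = merged.get(record.key)
            if current is None or record.updated_at > current.updated_at:
                merged[record.key] = record
    return merged


local = [Record("theme", "dark", 100.0), Record("font", "mono", 120.0)]
conflict = [Record("theme", "light", 150.0), Record("lang", "en", 90.0)]

for record in merge_newest_wins(local, conflict).values():
    print(record.key, record.value)
```

This relies on the clocks of all writers being roughly in sync, which is the usual weak spot of timestamp-based merging.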

Formulas I think would be the easy part. Just write it all in Python and use one of the many Excel compatible formulas implementations, and just make sure all changes went through the app.

Maybe you could even have a REST API to build other things on top of it.

The web frontend wouldn't be too hard, it could just be a Vue3 app with an HTML table element.

Then you could have a cell type whose value was a query, which would embed a DB browser list widget in the cell, and that cell type could have the option to bind its selected row to another cell.

Like VB+Excel+Access+My fork of Freeboard with inputs in one!

masklinn · on Sept 13, 2023

For the most simplistic row-wise formulas, sqlite has generated columns.

I’d assume the issue will be that an sql table is not free form, you can’t randomly decide to write in some other cell.
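
A minimal sketch of such a row-wise "formula", using Python's standard `sqlite3` module and assuming the bundled SQLite is 3.31 or newer (the table and column names are invented):

```python
import sqlite3

# In-memory database; generated columns need SQLite 3.31 or newer.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE settings (
        name    TEXT PRIMARY KEY,
        budget  INTEGER NOT NULL,
        -- Derived per row, like a spreadsheet formula: 20% of budget.
        reserve INTEGER GENERATED ALWAYS AS (budget * 20 / 100) VIRTUAL
    )
    """
)
conn.execute("INSERT INTO settings (name, budget) VALUES ('render', 50)")
# Changing the input is enough; the derived column follows automatically.
conn.execute("UPDATE settings SET budget = 200 WHERE name = 'render'")

row = conn.execute("SELECT budget, reserve FROM settings").fetchone()
print(row)
```

The formula can only refer to other columns of the same row, which is the "not free form" limitation: there is no equivalent of pointing at an arbitrary cell elsewhere.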

lucumo · on Sept 13, 2023

Were there any downsides? Because this sounds kind of awesome.

dagw · on Sept 13, 2023

Once you got used to it, honestly not really, other than needing Excel (not a huge deal since it was a Windows only application). I've no idea what the config parsing code looked like, but with the right library I doubt it was worse than any other non-trivial config parsing code. Mainly it just felt very wrong to my Unix, everything must be a text file, brain.

sowbug · on Sept 13, 2023

Never thought I'd get a chance to tell this story.

Early 1990s, college internship. The company did presentations for clients, like many do. They had an unusual way of presenting data that required using actual protractors to draw circles and curves, with pencil, on otherwise computer-generated charts. They read numbers from Excel spreadsheets and plotted them on paper.

I was shocked, to say the least. I proposed writing a program that read Excel spreadsheets and emitted the graphics. They loved the idea, especially from a summer intern.

So I wrote a letter to Microsoft asking for documentation of the Excel file format. A week later I got a thick envelope with a photocopied manual completely describing the format. I remember the word BIFF throughout. I wrote the program, it worked great, and I even negotiated a hefty lump-sum payment to sell it to them at the end of the summer.

It left me with a very positive impression of Microsoft as a developer-friendly company. Makes sense; developers are their platform's customers, and they're good at serving their customers.

arethuza · on Sept 13, 2023

There are some pretty awesome libraries for reading and working with Excel files - Aspose.Cells being one I used for years - basically a headless re-implementation of most of Excel usable via an API.

throwaway290 · on Sept 13, 2023

Well, xlsx is just a zipped dir of text (XML) files, so you can think of it as sort of following the .d pattern...

kagevf · on Sept 13, 2023

> the .d pattern

What's that? I ddg'd it and I got a bunch of hits for regex \d ...

urinotherapist · on Sept 14, 2023

AKA "config directory".

See https://jmmv.dev/2020/08/config-files-vs-directories.html

kagevf · on Sept 14, 2023

That was an informative read - thank you!

throwaway290 · on Sept 13, 2023

conf.d, init.d

kagevf · on Sept 13, 2023

Ah, got it, thanks!

andrew_eu · on Sept 13, 2023

For better or worse I created a very similar system, but using Google Sheets instead of Excel files and fetched with the GSheets API. In that case the configuration was gigantic (many of the configuration values were product-decisions, and it ran in hundreds of different environments with different tweaks), so the tabular structure made navigating things very natural. It also had the advantage of structurally highlighting how environments differed. Doing it with Google Sheets came with some extra nice benefits: online sharing, versioning, access control down to specific ranges, etc.

Basically every engineer who joined the team thought it was an unforgivable blemish on the system, yet it survived a few years with no major issues, long enough for the team to build an internal backoffice and port the whole sheet structure into a proper CRUD API.

hot_gril · on Sept 13, 2023

Biggest downside I've discovered by doing this is that there's not a very good way to diff separate versions of one sheet. Best I've done is export to CSV and use a generic text diff.
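
That export-and-diff workflow can be sketched with Python's standard `difflib`; the two CSV exports below are made-up stand-ins for two versions of one sheet.

```python
import difflib

# Two hypothetical CSV exports of the same sheet, before and after an edit.
OLD_CSV = "key,value\ntimeout,30\nretries,3\ncolour,#ff0000\n"
NEW_CSV = "key,value\ntimeout,45\nretries,3\ncolour,#ff0000\nregion,eu\n"

diff = list(
    difflib.unified_diff(
        OLD_CSV.splitlines(),
        NEW_CSV.splitlines(),
        fromfile="config_v1.csv",
        tofile="config_v2.csv",
        lineterm="",
    )
)
print("\n".join(diff))
```

Being line-based, this only says which rows changed, not which cell within a row, and it sees nothing of formulas or formatting.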

nonethewiser · on Sept 13, 2023

Most of these sound like things a native programming language can do. Without being a damn excel file. So it’s basically an argument for making your config files JavaScript, Python, etc.

dagw · on Sept 13, 2023

You're missing the point. The power came not from Excel the file format, but from Excel the GUI tool for editing Excel files. You cannot have drop-downs and colour pickers and separate tabs and embedded images in a single Python file without writing a whole custom GUI config tool.

nonethewiser · on Sept 13, 2023

Python can be, and often is, edited by GUI tools with color pickers and dropdowns. IDEs such as VS Code have these.

Also, I said “most.” Not, “there exist no exceptions.”

mnstngr · on Sept 13, 2023

Having a visual way to configure this makes it much more accessible to non-programmers, with error checking available through the host (Excel, in this case), while also reducing the eng effort in building this.

At Google, many internal tools use Sheets as their source of truth for config data, and it works really well.

nonethewiser · on Sept 13, 2023

> Having a visual way to configure this makes it much more accessible to non-programmers

I agree completely, but that's not the point he appears to be making. He never stated this was the use case, and he reiterated that it was a bad idea which should never be done.

Regardless, I’m saying those aren't really unique advantages of Excel. They just look unique compared to JSON, TOML, YAML, etc.

hot_gril · on Sept 13, 2023

I can't think of any data entry method more easily understood by non-programmers than an Excel spreadsheet (or Google Sheets etc). Repetitive data in particular is a breeze since you can drag-expand and use formulas. We use GSheets at work for configs created by non-programmers, and it works great.

nonethewiser · on Sept 14, 2023

What is your point? You can see from my previous comment I already agree excel is user friendly for non programmers.

hot_gril · on Sept 14, 2023

The user-friendliness is a unique advantage of Excel.

LeonenTheDK · on Sept 13, 2023

I think the difference there though is that an Excel file is a lot more approachable to non-technical folks (even superficially). Depends on what's being configured and by whom though.

speed_spread · on Sept 13, 2023

Excel sheets are an underappreciated tool for sharing technical info with analysts. They're easy to write and read from both dev and analyst side. They can be safely zipped, archived, emailed, saved on a USB drive by the user without any additional programming. Editing in Excel makes the expected schema clear. It's an excellent data interface.

WeAddValue · on Sept 13, 2023

I use a Google Sheet as it can be accessed from anywhere. Even my non-techie business folks can make changes (they feel at home in spreadsheets). I even added a menu button to the gsheet to launch a re-build (it is a static site on Netlify; the build fetches its config/data from the gsheet).

kagevf · on Sept 13, 2023

Did they version the spreadsheets? Or maybe converted them to text and versioned the text format?

hot_gril · on Sept 13, 2023

It doesn't sound terrible. This is what we do at work for large configs that non-programmers touch, except it's Google Sheets instead of Excel. It works, and they'll often refuse to use anything else.

smitty1e · on Sept 13, 2023

.xlsx and openpyxl For The Win!

Cross-platform; copious spreadsheet applications; no-one needs training; scales.

zelphirkalt · on Sept 13, 2023

I see it more like:

(1) A huge dependency in the project for reading Excel files.

(2) Everyone needs training, as usually people in my profession have rarely if ever any need for Excel.

(3) Visuals != contained text, so the config might be different from what you see on screen as the config value.

(4) No proper version control, even a csv file would be better.

(5) scales? lul, have fun trying to solve merge conflicts. Also don't come with any Excel git plugins. It will only make the bad decision worse.

baq · on Sept 13, 2023

xlsx is zipped xml. it might actually work. your profession must be very weird, though - knowing about git merge conflicts and not knowing how to use excel sounds like an empty venn diagram.

zelphirkalt · on Sept 13, 2023

I wrote "usually people in my profession have rarely if ever any need for Excel". But I will explain.

Many software developers usually have no need for something like Excel. Either they use some free/libre alternative, because they know about it existing, or they use an actual programming language, or some might even use something like Emacs org-mode spreadsheets, or they use some library like Pandas for things, where it is reasonable to get out the tools. Software developers are also more likely to be aware of the technical debt incurred by storing anything inside Excel formats and will avoid it, if they are wise.

As such many software developers rarely use Excel, if at all. I personally don't use it at all. All my simple spreadsheet needs are covered by Libreoffice Calc or Emacs org-mode. If I had to use Excel now, I would not know the names of functions (translated perhaps, because Excel does that silly stuff) or how to reference cells (Is it $ and then the number? And : as a separator between col and row?). So yeah, to properly use it, many of us would have to learn at least a little of it.

Many if not most cases of Excel usage are actually due to people not knowing the alternatives, or perhaps knowing they exist, but not having the knowledge to use them (like with programming and quickly dishing out a few Pandas calls or Emacs org mode spreadsheets).

baq · on Sept 14, 2023

FWIW when I say Excel, I mean one of the big spreadsheet tools.

Excel is like Python: the second best tool for a lot of problems. There are problems which take five minutes to solve in the shell or in a text editor (maybe less now when LLMs can straight produce certain solutions with a simple prompt) and they take 10 seconds in a spreadsheet including copy and paste. I really recommend spending some time with Excel just as I recommend reading the table of contents of your primary DBMS’ manual.

I’d actually argue that being able to solve a problem quick and dirty in Excel and then in a more proper way in pandas is a good thing.

abenga · on Sept 14, 2023

All my life I have used Excel, OO Calc (now LO), and Google Sheets interchangeably, and I have found that experience in one carries over to the other two pretty seamlessly. I actually think of them as just one thing. Formulas, especially, are pretty much the same.

NateEag · on Sept 14, 2023

I have resolved many, many git conflicts.

I am perennially lost in all but the very simplest spreadsheets.

baq · on Sept 14, 2023

If it helps, you can think of excel as a purely functional programming language with a 2D UI for memory visualization and editing.

Tainnor · on Sept 13, 2023

I feel that as long as you use it for some simple configuration, you can use JSON, YAML, .properties files, TOML, whatever.

The problem IMHO is that we're using "configuration" files for things that aren't configuration. The "story" example from the blog post illustrates this. I find it hard to read YAML files that are dozens or hundreds of lines long.

Furthermore, once your "configuration" starts being so long, there's usually going to be enough repetition that you want to extract some duplication. YAML does have some facilities for this (anchors), unlike some other formats, but they're extremely limited.

So what happens is that different tools using YAML all start designing their own mechanism for sharing behaviour. It's all usually very ad-hoc, has edge cases and may not do things in the way you expect them. It also forces you to learn the specific rules for these facilities instead of allowing you to reuse your general programming knowledge.

On top of it all, YAML is essentially just a structureless key/value data structure. You can add schemas, but as far as I know, this isn't really standardised and editor support is... variable. In the worst case, you don't get any indication that you've configured something wrong. This is also part of the reason why I think that significant whitespace is OK for a programming language (still not a fan of it though), but bad for a configuration format, because bad indentation in a program either won't parse or will lead to obvious runtime errors, whereas bad indentation in a YAML file might just mean that a key isn't being set even though you think it should be.

For authors of tools that consume YAML, this means writing a lot of custom validation logic instead of relying on standard techniques like type systems.

I think we're on the wrong track and essentially just repeating XML's mistakes (just slightly less verbose, but also without schemas). We should rather use the programming constructs we know, e.g. by leveraging internal DSLs (I think that's part of the reason why Ruby was popular for tools like Chef for a short period, why Jenkins uses Groovy and Gradle now uses Groovy or Kotlin - these languages make internal DSLs easy). If we're worried about Turing completeness, maybe Dhall or something like it is the answer. But 400-line-long YAML files with custom "!reference" tags that my editor doesn't understand don't seem like the solution.

seanhunter · on Sept 13, 2023

This exactly. When the author is complaining that the configuration syntax doesn't support DRY you know something has gone wrong and configuration isn't really to blame.

hk1337 · on Sept 13, 2023

I've thought this for as long as I can remember. People overcomplicate the config file and try to make the one config file to rule them all.

I like TOML. I started to look into using Hugo over Jekyll, though, and the TOML there seems weirdly abstract and difficult to follow.

fomine3 · on Sept 14, 2023

Great wrap up.

NoboruWataya · on Sept 13, 2023

It is the least bad configuration format I have found. Granted I have only ever used it for fairly simple projects. But every config format is plagued with issues. A bit like programming languages, the fundamental problem is that they need to be easily understandable by both humans and computers which is an impossible problem to truly solve for any non-trivial use case. For example, TOML is criticised for verbosity but a lot of the abstractions that are used to implement DRY in a programming context may make the configuration confusing and unintuitive for non-programmers.

IshKebab · on Sept 13, 2023

Jsonnet or JSON5 are much better than TOML or YAML.

Both are much easier to read, and don't have the footguns of YAML or StrictYAML.

I would generally say JSON5 is more appropriate because it is simpler, but Jsonnet does have some neat features and its IDE support is much better.

hot_gril · on Sept 13, 2023

At least general programming languages like Python or JS are well-understood by many programmers. More than makes up for not being as specialized as a DSL.

AndyKluger · on Sept 13, 2023

Have you looked at NestedText?

lr4444lr · on Sept 13, 2023

I can't be the only person who thinks the monumental effort spent on config formats is bike-shedding.

JSON is good enough for anything I've done. Not perfect, but no serious flaw that can't be fixed by just adding a simple app-specific post-process step that I will inevitably do for any other format anyway. JSONSchema gives us some typing sanity.

Can we just move on already to more interesting problems? It's not like git fulfills every VCS wish I've ever had either, but I have to move on. Projects and libs that introduce new config formats that continually remake the wheel, whose quirks have to be learned, are not helping my net productivity.

</rant>

ihateolives · on Sept 13, 2023

> JSON is good enough for anything I've done.

I want comments in config files.

ok_computer · on Sept 13, 2023

A horrible workaround I use is a blank redundant key and a leading // in my string to draw my eye to it. Only the last comment survives in my Python dictionary after parsing, but I only need the comments while working in the JSON file itself.

    {
        "": "// comment here",
        "Entry": [-1, 0, 2],
        "": "// next comment",
        "Flag": true
    }
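
To make the "only the last comment survives" part concrete, here is a minimal sketch using Python's standard `json` module. The sample document mirrors the snippet above; the `object_pairs_hook` call is only one possible way to look at every pair before the dictionary collapses the duplicates, not something the commenter described using.

```python
import json

# Same shape as the snippet above: the empty key "" is reused for each "comment".
doc = '''
{
    "": "// comment here",
    "Entry": [-1, 0, 2],
    "": "// next comment",
    "Flag": true
}
'''

# Plain json.loads builds a dict, so a repeated key keeps only its last value.
config = json.loads(doc)

# object_pairs_hook receives every (key, value) pair in order, duplicates
# included, so the earlier "comments" are still visible here.
pairs = json.loads(doc, object_pairs_hook=list)
comments = [value for key, value in pairs if key == ""]

print(config[""])
print(comments)
```

Whether duplicate keys are accepted at all depends on the parser, so this trick is best treated as an editing aid rather than something to rely on across tools.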

timmytokyo · on Sept 13, 2023

And trailing commas on final list elements and object properties.

skrebbel · on Sept 13, 2023

JSONC is exactly that, JSON with comments. Works fine, eg typescript’s tsconfig file is JSONC and I’ve yet to find a problem with it.

bobbylarrybobby · on Sept 13, 2023

https://json5.org

hddqsb · on Sept 13, 2023

I can relate. But after using JSON for a while (in files that I edit by hand), I found that I really want comments and trailing commas (which leads to https://nigeltao.github.io/blog/2021/json-with-commas-commen...). Next I'd probably want multiline strings (leading to https://github.com/json5/json5).

But if you use those extensions, all your tooling breaks.
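
As a rough illustration of the tooling point, here is a small sketch that feeds three made-up documents to Python's standard `json` module, standing in for "any strict JSON tool". The sample strings and the `strict_parse_ok` helper are hypothetical, written only for this example.

```python
import json

# Hypothetical sample documents: one plain JSON, two using common extensions.
samples = {
    "plain JSON": '{"size": 12, "tags": ["a", "b"]}',
    "trailing comma": '{"size": 12, "tags": ["a", "b",]}',
    "line comment": '{\n  // font size\n  "size": 12\n}',
}


def strict_parse_ok(text: str) -> bool:
    """Return True if Python's standard json module accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


results = {name: strict_parse_ok(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")
```

Only the plain document gets through; the other two need a parser that knows about the extension in question.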

(Aside: I think the real bike-shedding would start when you want to add some syntax for raw string literals, e.g. heredocs; it's one of those features that feels redundant, until the day when you really need it and you can't bear the pain of repeatedly escaping and unescaping.)

rewmie · on Sept 13, 2023

> JSON is good enough for anything I've done. Not perfect, but no serious flaw that can't be fixed by just adding a simple app-specific post-process step that I will inevitably do for any other format anyway.

If someone wants JSON with extra features like comments and typing, they are better off switching to Ion.

https://amazon-ion.github.io/ion-docs/docs/spec.html

GuB-42 · on Sept 13, 2023

  JSON is good enough for anything I've done.
  ...
  </rant>

Except for closing comments, for that, you need XML.

eviks · on Sept 13, 2023

Yeah, I don't know why some people won't just settle for bad formats that lack such basics of human ergonomics as comments, and keep improving things instead.

mikece · on Sept 13, 2023

Given how little I deal with config files compared to the rest of my work I prefer formats that are obvious, even if verbose, to those with sneaky syntax. I'll take JSON or even XML over TOML any day.

dystroy · on Sept 13, 2023

I previously argued that TOML wasn't good enough in this blog post https://dystroy.org/blog/hjson-in-broot/ where I show an example of a problem which frequently hurts my users and leaves them lost, without even understanding that the problem is in how they wrote their TOML.

I moved the configuration of several of my programs to Hjson. There are still problems but they're less puzzling. Hjson isn't ideal either but might still be the best configuration format we have today.

eviks · on Sept 13, 2023

hjson is indeed more H vs json

You've mentioned in the blog that "it's meant to be written by humans, read and modified by other humans, then read by programs", but is it possible for apps to (roundtrip-)edit those configs while keeping all the human syntax intact? It's rather common for apps to change a setting such as the font size, but unfortunately also common for them to destroy the human formatting in the process.

dystroy · on Sept 13, 2023

This is theoretically possible, and I actually toyed with the idea.

I didn't do it in my deserializer because of the big value you have in Rust in being compatible with serde, and a format-preserving deserializer wouldn't be. But this would be interesting, probably as a side library.

ftrobro · on Sept 13, 2023

Round-tripping without destroying comments or formatting is supported in the JavaScript, Go and C++ implementations of Hjson, but not in the other implementations (I think).

throwawee · on Sept 13, 2023

Thank you for introducing me to Hjson. I've been using simple colon delimited lists which seem to be, hilariously enough, already valid Hjson.

dystroy · on Sept 13, 2023

A lot of formats are Hjson compatible, notably JSON, and also what users wrote thinking it was JSON but they forgot some quotes or had a trailing comma so the JSON parser refuses it while the Hjson one is perfectly happy.

moogly · on Sept 13, 2023

Hjson is also the format I use for all my things. Strikes a good balance.

globular-toast · on Sept 13, 2023

Literally every config file format is terrible in some way or another. The best configs are executable and loaded into a dynamic runtime. Emacs and Airflow are good examples of this.

But I definitely strongly prefer YAML to TOML. It just makes a lot more sense to me, and it's a huge shame that PyPA went with TOML, which is so un-Pythonic. I preferred setup.py. StrictYAML is a really good development that I wasn't aware of, though.

baq · on Sept 13, 2023

Flat is better than nested.

I'd argue that's enough for TOML to be more pythonic than YAML.

globular-toast · on Sept 14, 2023

Both TOML and YAML support nesting. TOML simply looks flat even when it's not, so it's the worst of both worlds. In any case, it's not the format that is nested or flat, it's the content. Python itself supports nesting. The zen of Python merely says you shouldn't use the nesting when flat is an option.

rewmie · on Sept 13, 2023

I think this is a glass-half-full view of TOML, when TOML itself was the one that added water to the glass to begin with.

The main value proposition of TOML is to provide a concrete specification of an INI-type config language. INI is ubiquitous, but it lacked a spec, which led to a lot of wheels being reinvented. TOML fixed that.

If a project needs convoluted config files, I'd argue the project is already broken. If TOML doesn't fit your needs, that's hardly TOML's fault.

bschwindHN · on Sept 13, 2023

I'm okay with TOML for something like Cargo configuration, I don't enjoy it for much else though.

I always comment the same thing on these sorts of discussions - JSON5 has been really nice to work with if you can fully control all consumer applications of it (since there aren't great libraries for json5 in every language). Certainly nicer than the hellscape that is YAML.