What is wrong with TOML? (2019) (hitchdev.com)
157 points by BerislavLopac on Sept 13, 2023 | 296 comments


Honestly, all of these arguments feel pretty subjective to me.

This is the major problem with most comparisons of config file formats: the actual semantic domain of a config file format is extremely limited, which means the main thing left to disagree over is syntax, which is highly subjective and extremely difficult to get people to agree on.

Add too many syntactic features and a lot of people will disavow you for being too complicated. Add too few and you'll be missing someone's pet feature. Make white space significant and you'll frustrate people. Require extra characters to delineate and you'll frustrate another group.

It's worth noting that this article is primarily talking about TOML in the context of the Python ecosystem, and I think that's a healthier way to talk about config file formats: How well suited are their syntactic choices to the community they're targeting?


Yup. I've personally used both YAML and TOML for configurations, much more the latter recently and can see pros and cons for both.

> How well suited are their syntactic choices to the community they're targeting?

Also, "best" practices. One format could reduce the pain of the other, but that by no means makes it the right solution to a deeper problem at hand. For example, if one has very deep and complex nesting for configs, TOML "may be a lot nicer" compared to YAML, but that doesn't mean using TOML will make all the config parsing problems go away. It just masks the code smell. Maybe it's time to check whether they're overcomplicating configurations in general.


> This is the major problem with most comparisons of config file formats: the actual semantic domain of a config file format is extremely limited [ ... ]

So, why not use Scheme ?


Scheme lacks most syntactic affordances that imply semantics. Even if some of those implications are dead wrong, they're still useful.

Personally I think the right answer for configuration files is to define them in terms of a generic object model. A program could even support multiple formats (TOML+JSON+YAML). If a user dislikes all the supported formats or the file is generated with something like NixOS, it can be handled with straightforward conversion.


> A program could even support multiple formats

I invite you to check out Symfony where you configure your app using yaml, attributes in code, code itself, or a mix of all the above.

You will cry.


Point taken, but it would seem the problem there is probably due to the arbitrarily placed mixins? My proposal was more for a single configuration object, just in whatever actual syntax you (or your team) prefers. If someone runs away with that and writes json that includes yaml that includes python that generates configuration from what it found on the filesystem, responsibility for that needless complexity rests squarely on the shoulders of the new programmer.


Having syntactic affordances for every nuance of semantics is what led to the current state of zoo. What is wrong with having trivial syntax and distinguishing semantics by labeling parts of the syntax tree with symbols?


Specifically with scheme, I believe the main problem is the symbols lack any distinction to know the difference between a function call and a syntactic form.

(Don't shoot the messenger, I've done my fair share of scheme. I've also done a lot of thinking about why some people are so turned off from the syntax, and it's certainly not that the opening parenthesis is in a slightly different place on the prefix function calls)


Some do, using s-expr as config files is pretty common in the ocaml ecosystem (ie dune).


>Make white space significant and you'll frustrate people.

Or worse: make whitespace significant and strike a blow in the eternal tabs vs. spaces holy war at the same time, like YAML.


...or more famously, Makefiles


I would disagree


It's kinda crazy we have 2 kinds of whitespace.

Even the customizability argument makes less sense now, IDEs could just change the width of leading spaces, I'm not sure why they don't.


Isn’t the whole premise behind this article that, coming from Python where indentation is program structure, TOML confuses the reader with syntax foreign to the reader?

Like a C++ developer crying foul because inheritance doesn’t exist in YAML.


Make that C++ developer's day by pointing out that YAML does support inheritance: https://dmitryrck.com/how-to-use-inheritance-in-yaml-files/


Pfffffft, you know very well C++ developers don’t write yaml. They write m4.


SciTE uses lua for configuration.


Thankfully, YAML supports SFINAE


The first and the last ones, at least, are tradeoffs where TOML made the right decision for most users.

Not being DRY is a good thing in a config file - it makes it much easier to understand and work with just one section of the file (which is what you most often want to do), because the context information is right there without having to jump around and figure things out.

And whatever the downsides of syntactic typing are, requiring a schema file to go along with your config file is far more of a downside; it's one more point of potential failure, another thing to maintain and sync up and keep in your head, and not worth it for most use cases.

And that's the crux of it: it all depends on what you need from your markup language, what your use case is, today and through the lifetime of your project. "What's wrong with TOML" makes much less sense as a question than "What's wrong with TOML(/JSON/YAML/etc.) for _this project and its needs_".


I think the primary limitation with TOML is the restriction that in-line tables cannot cross multiple lines. This is not done for technical reasons, it's an aesthetic choice on the part of the designer. It's analogous to forbidding comments in JSON.

I love TOML and will continue to use it as my default choice for configuration files, because I think most applications simply do not need the power and flexibility of YAML, even if the outright safety problems are mostly resolved in YAML 1.2. But I do agree that the inability of the syntax to convey nested structure is a limitation and it definitely gets annoying in larger configuration files, such as pyproject.toml files that tend to accumulate in larger Python projects. I have considered just manually indenting nested table blocks, even though that would look pretty ugly and is decidedly non-standard.


> I think the primary limitation with TOML is the restriction that in-line tables cannot cross multiple lines. This is not done for technical reasons, it's an aesthetic choice on the part of the designer.

I'm sure you'll be happy to know this is getting relaxed in toml 1.1, whenever that comes out (and the implementations adopt it): https://github.com/toml-lang/toml/issues/516

Though the difficulty will then be knowing whether a given piece of software uses 1.0 (and single-line tables) or 1.1 (and more flexible tables).


YAML has a big problem in that you can't work with it in standard tools.

Almost every common GC'd language these days supports native JSON-like objects, but YAML can represent things outside of strings, lists, dicts, bools, numbers, and null.

Lack of nested structure is a positive in some applications. Flat is better than nested. I've seen way too many config files where someone says "add foo=3" to the file and you can't even figure out where in the structure it goes.

And worse, sometimes people reorganize things into options. They'll move all the stuff for one subcomponent into its own nested thing, and you can't configure it without knowing the full architecture.

With flat stuff you get an obvious single way to represent any given config options. Maybe not the nicest way, but it's obvious and unique.
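To illustrate the "one obvious spelling" point, here is a small sketch in Python. The `flatten` helper and the dotted-key convention are assumptions made for the example, not something taken from a particular tool. Every option ends up with exactly one full name, so "add foo=3" has only one place it can go.

```python
def flatten(config: dict, prefix: str = "") -> dict:
    """Turn nested dicts into a single level of dotted keys."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the path so far as the prefix.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

nested = {"server": {"http": {"port": 8080}, "name": "web"}, "debug": False}
flat = flatten(nested)
# flat == {"server.http.port": 8080, "server.name": "web", "debug": False}
```

The sketch ignores keys that themselves contain dots and drops empty sub-tables, both of which a real tool would have to decide how to handle.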


Hi, I'm the author of this piece. Thanks for your comments.

>Not being DRY is a good thing in a config file - it makes it much easier to understand and work with just one section of the file (which is what you most often want to do), because the context information is right there without having to jump around and figure things out.

If the contextual information is relevant that's true. However, syntactic noise of the form of lots of [s, ]s and equal signs isn't necessarily relevant. A YAML file can exhibit identical information with fewer characters and that makes the files easier to read and maintain - especially if they are information dense.

>And whatever the downsides of syntactic typing are, requiring a schema file to go along with your config file is far more of a downside; it's one more point of potential failure

Schemas aren't technically required by strictyaml provided you're happy mapping everything to a dict, list and string, but they're recommended because they make it much easier to prevent something from going wrong and it means you can directly use the types you were expecting.

Schemas in config files are equivalent to static types that generate compiler errors in a program. If you can use them, it's an easy way to get your program to fail fast on invalid input and save on debugging time.

If you don't have a schema and some invalid data gets put into your config file, instead of getting an error that says "didn't expect key "ip addresses" on line 14" you tend to get a really cryptic error a bit later on when your program tries to get a key from a dictionary that doesn't exist.

This is an example of the principle of https://en.wikipedia.org/wiki/Fail-fast design.
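A rough Python sketch of the fail-fast idea, using a hand-rolled check on a plain dict. This is not strictyaml's API; `SCHEMA`, `ConfigError` and `validate` are hypothetical names, and line numbers are left out because a plain dict does not carry them.

```python
# Hypothetical schema: expected key -> required Python type.
SCHEMA = {"host": str, "port": int}

class ConfigError(ValueError):
    """Raised as soon as the config does not match the schema."""

def validate(config: dict, schema: dict) -> dict:
    """Reject unknown keys, missing keys and wrong types up front."""
    for key in config:
        if key not in schema:
            raise ConfigError(f"didn't expect key {key!r}")
    for key, expected in schema.items():
        if key not in config:
            raise ConfigError(f"missing required key {key!r}")
        if not isinstance(config[key], expected):
            raise ConfigError(
                f"key {key!r} should be {expected.__name__}, "
                f"got {type(config[key]).__name__}"
            )
    return config

validate({"host": "localhost", "port": 8080}, SCHEMA)  # passes silently

try:
    validate({"host": "localhost", "ip addresses": []}, SCHEMA)
except ConfigError as err:
    message = str(err)  # the unknown key is reported at load time
```

Without the check, the same mistake would only surface later as a `KeyError` somewhere far from the config loader.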


> However, syntactic noise of the form of lots of [s, ]s and equal signs isn't necessarily relevant.

I don't know that ': ' is any better than ' = '. I get being opinionated about it but this feels squarely in the realm of the subjective to me. Further, adding an errant '-' and accidentally creating another object is real common in YAML, which is something you can't do with TOML's lists. I think this washes out, tbh.


I actually tried to demonstrate this with numbers a while back. I tried taking a few random JSON files as a control and representing them with StrictYAML and TOML and the TOML varied from 30% to 100% longer.

There is an element of opinion here, but there is no question that equivalent TOML files are longer, and most of that is syntax.

It's much more pronounced when you have more than one or two levels of nesting. With 4 or 5 levels of object nesting TOML files grow huge, whereas YAML is still fine.

>Further, adding an errant '-' and accidentally creating another object is real common in YAML

Yep, this is one of the things that type safety helps with though. Similarly it's quite easy to mess up an indent in YAML, but a schema can catch that stuff.


> It's much more pronounced when you have more than one or two levels of nesting.

Yes. That's a clear indication that either your configuration type structure is badly designed, or that you aren't trying to create a configuration type and should use some data encoding or programming language that fits your problem better.

I guarantee you the people using your software will curse you every time they try to edit a YAML file with more than 3 levels of nesting.


I confess I feel strongly in opposite directions on this. I'm pretty biased against YAML: I think it's way way too complex, and its "declarative" nature and broad feature set essentially make it a complexity magnet. I've seen hellish YAML files, and it's also pretty common to combine them with Jinja (or w/e) for extra hellsauce.

But it's declarative, which is really cool! What else would you realistically use for like, AWS infra? The JSON version is so much worse. Using Python or whatever is its own can of worms.

This is the kind of thing that stuff like CUE and Dhall are aiming at, and I welcome it. It feels like they're the way out here.


> But it's declarative, which is really cool! What else would you realistically use for like, AWS infra?

Programming languages can be declarative too. As you already pointed out, CUE and Dhall exist. (And lisp, and prolog...)

(And in fact, a lot of people just use Dhall anyway and derive the actual configuration files from it.)


True, I meant more non-Turing-complete but you're right. Also I do like using Lua for config when I can, so that's another point in that column.


Oh I don't doubt that there are various inefficiencies, but I dispute the characterization that this inefficiency is "wrong". I think it's clear by now that there's not "1 text format to rule them all", whether we're talking XML, JSON, HTML, SGML, TOML, YAML, RFC 2822, CSV, or whatever. Depending on your use case, one of these is better than the others. Like I definitely don't want to serialize objects into TOML. I also definitely don't want to deal with JSON for configuration.

> ...but a schema can catch that stuff.

100%; it's amazing to me the number of projects that essentially accept arbitrary structures and/or values here. I think no matter whether you're using YAML/TOML/JSON/etc. you oughta be using something to validate.


I don't find TOML files easy to read/understand.

Especially when I'm scrolling through a file, I find myself backtracking to understand its structure again.


Similarly, I don't find yaml easy to read/understand. XML had the curse of people trying to use every feature possible in most documents. And as much as I do prefer the "program" approach of emacs, I will make no defense of giant emacs config files, either.


XML has the problem of too many features. Whether you personally decide to use them or not doesn't change the fact that whatever software reads it has to support them.

XML parsing is also not a pure computation.

That said, an XML-lite would quite probably be the best data encoding language one could create right now. It would still suck as a configuration language, as it's a very different problem.


https://kdl.dev/ is an XML-lite configuration language with a pleasant syntax. It has a neat syntactic feature for commenting things out called "'slashdash' comments". An example from the spec:

  mynode /-"commented" "not commented" /-key="value" /-{
    a
    b
  }

A problem with a node-based design similar to XML compared to something that parses to nested lists and dictionaries is that you pretty much need a query API. Implementing it takes nontrivial extra work. This affects KDL implementations: KDL specifies a query language, but I think no Python KDL library has implemented it so far. It limits how useful KDL is in Python, despite there being multiple working parsers.

Edit: Rephrased the second paragraph.


That language is very interesting, thank you.

But I do disagree on the work. A node-based language parses into a sequence of (name, node_type, data) tuples, which is just as easy to traverse as the (name, data) pairs from the data map ones. The thing is, a query language is much more useful for them than for a data map, so it's worthwhile to implement one. (There's probably some cultural aspect here too.)
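A small Python sketch of that comparison. The `Node` type, its `node_type` values ("element" and "value") and both walker functions are made up for illustration and do not come from any KDL or XML library. The two traversals have the same shape and produce the same leaves.

```python
from typing import Iterator, NamedTuple

class Node(NamedTuple):
    name: str
    node_type: str   # assumed values: "element" (has children) or "value"
    data: object     # child nodes for elements, a plain value for leaves

def walk_nodes(nodes: list, path: tuple = ()) -> Iterator[tuple]:
    """Yield (path, value) for every leaf in a node sequence."""
    for node in nodes:
        here = path + (node.name,)
        if node.node_type == "element":
            yield from walk_nodes(node.data, here)
        else:
            yield here, node.data

def walk_map(mapping: dict, path: tuple = ()) -> Iterator[tuple]:
    """Yield (path, value) for every leaf in nested dictionaries."""
    for name, data in mapping.items():
        here = path + (name,)
        if isinstance(data, dict):
            yield from walk_map(data, here)
        else:
            yield here, data

doc = [Node("server", "element", [Node("port", "value", 8080)])]
as_map = {"server": {"port": 8080}}

assert list(walk_nodes(doc)) == list(walk_map(as_map))
```

What the node form adds is the possibility of repeated names and per-node metadata, which is where a query language starts to pay off.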


You're welcome.

Yes, you are right. Node-based structures are not meaningfully harder to traverse than nested lists and dictionaries. That was a wrong comparison.


> The thing is, a query language is much more useful for them than for a data map, so it's worthwhile to implement one.

I agree about this.


OTOH you can factorise and modularise an emacs config file. Sometimes an XML file if it uses xinclude (though that can be an issue in other ways).

yaml? who knows. Whether and how (and the limitations) depends on whoever cooked up that pile.


> A YAML file can exhibit identical information with fewer characters and that makes the files easier to read and maintain - especially if they are information dense.

I don't think this is a particularly good yardstick. Code without comments is shorter than code with comments, but I wouldn't call comment-less code easier to read; the more information dense, the worse, really.


Thanks for your work!

I first encountered strictyaml years ago and have used it, happily. I especially appreciate that you made clear arguments for what ought to be excluded from a configuration format itself, and how proper validation ultimately requires real code anyway.

I was always disappointed however that your project didn't amount to a formal specification (at least at the time, unsure if that's changed).

More recently I came to know and love NestedText, which seems very close to what a strictyaml spec could be/have been. I'm curious if you've engaged with that project/format, and what you think of it.


>I was always disappointed however that your project didn't amount to a formal specification (at least at the time, unsure if that's changed).

I would dearly love to do this. I would ideally like to work with somebody who can help me though because it's a lot of work and I struggle to find the time.


NestedText seems very close to what a strictyaml spec could be/have been. I'm curious if you've engaged with that project/format, and what you think of it.


Doesn't YAML have the unfortunate issue of ambiguity in the variety of parsing and versions? (Edit: I see you're advocating for a new subset... never mind. :) )

If you want a "language" for expressing data (like configuration data), you might be interested in having a look at EDN. https://github.com/edn-format/edn


> A YAML file can exhibit identical information with fewer characters and that makes the files easier to read and maintain

According to that logic, binary files would be easiest to maintain and read, which is obviously bogus.


I don't even think this is about DRY and possibly misunderstands the DRY principle. DRY is about having a single authoritative source of information ("Every piece of knowledge must have a single, unambiguous, authoritative representation within a system"). Repeating a portion of a key in a configuration file is not in violation, it would be if it were the value being repeated.


I beg to differ. If there are indentations, it is easy to fold long lists. For TOML, it takes mental effort to check whether items are from the same list or another. Additionally, in TOML, there are multiple ways to write a list (unlike in JSON), which makes it harder to parse - at least for me.


The problem with that is that indentations within indentations are just obnoxious. It is far from uncommon to add an item at the wrong level in a large config file. Worst is when you have a large file that is showing several meaningful lines of indentation on one screen, where the roots of several of those levels are not visible.


> Not being DRY is a good thing in a config file

Also unit test code is expected to be a lot less DRY.


The (upcoming) TOML 1.1 will alleviate some of this; that example document could then be written as:

  [params]
  profile = {
    name    = "Gareth",
    tagline = "..",
  }
  
  contact = {
    enable = true,
    list = [
      {class = "email", icon = "fa-envelope"},
      {class = "phone"},
    ]
  }

The whole business with the syntax typing has no "one correct way" to do it. No matter what you do it will cause problems and headaches for someone at some point somewhere.

> Dates and times, as many more experienced programmers are probably aware is an unexpectedly deep rabbit hole of complications and quirky, unexpected, headache and bug inducing edge cases. TOML experiences many of these edge cases because of this.

Eh? In the original text it links to three issues to back this up:

That first issue it links to is "failed to parse long floats like x = 0.1234567891234567891" and the third is a feature request for hex values (v = 0xff, not even a bug report). That has nothing to do with dates? The second issue did relate to dates, but was just a simple bug, not an "unexpectedly deep rabbit hole of complications and quirky, unexpected, headache and bug inducing edge cases".

This just seems to be repeating a tautology. I maintain a TOML implementation that sees some reasonable use. Dates have not been a huge source of bugs, confusion, or other issues. All you need is to be able to parse RFC 3339 style dates (and some things derived from that), which is usually just calling strptime() or whatever your language has for this.
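As a rough illustration in Python, parsing a whole-second RFC 3339 timestamp with the standard library's `datetime.strptime`. The `parse_rfc3339` wrapper is a hypothetical name, and the sketch deliberately ignores fractional seconds, the space separator and TOML's local date and time forms.

```python
from datetime import datetime, timedelta

def parse_rfc3339(text: str) -> datetime:
    """Parse a whole-second RFC 3339 timestamp with 'Z' or a numeric offset."""
    # %z accepts "Z" as well as "+HH:MM" on Python 3.7 and later.
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S%z")

utc = parse_rfc3339("1979-05-27T07:32:00Z")
offset = parse_rfc3339("1979-05-27T00:32:00-07:00")

# Both strings name the same instant.
assert utc == offset
assert utc.utcoffset() == timedelta(0)
assert offset.utcoffset() == timedelta(hours=-7)
```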

I do think TOML had some bits I wouldn't have added (not dates though), but the feature sets and complexity of TOML and YAML are not even comparable; it's like comparing Iceland (pop: ~300k) to Ireland (pop: ~5M). Yes, they're both islands and both are small countries, yet the scale of their "smallness" is just completely different.



This might convince me to start using TOML. I hate JSON for configs (not all parsers support comments, yes I know JSON5 exists), and TOML's table syntax really sucks. YAML has its flaws, but it fits the bill the best IME.

Now I just have to hope enough TOML tools support this syntax, lest I end up in the same boat as JSON5.


> The (upcoming) TOML 1.1

Out of curiosity, how does the deserializer know which "standard" to use?

AIUI one can pin a version of YAML via its directive: %YAML <https://yaml.org/spec/1.2.2/#681-yaml-directives> with a missing one implying 1.2 although (heh) the 1.1 version says that documents which are missing their yaml directive are implied to be 1.1 <https://yaml.org/spec/1.1/#YAML%20directive/> so ... versioning, it's hard!


There is nothing for this, which is probably fine. Previous thread on this: https://news.ycombinator.com/item?id=36023321


What's the difference with JSON in that example? No double quotes around keys, that seems to be about it. It seems more practical/readable than JSON if you have very limited need for nesting, but the old format would suffice then too.


In that specific example not too much, but obviously there's a whole bunch of differences between TOML and JSON, starting with the fact that you can add comments as has already been discussed, and many more.


> you can add comments

Given how many comment remover util functions I have written to make almost JSON config files proper JSON ... why could they not just have included those in the spec.
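For readers who have not written one of these, a sketch of such a helper in Python. `strip_line_comments` is a hypothetical name, and it only handles `//` line comments, not `/* ... */` blocks. The one subtle part is tracking string literals so that slashes inside a URL are left alone.

```python
import json

def strip_line_comments(text: str) -> str:
    """Remove // comments that sit outside JSON string literals."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Copy the escaped character so \" does not end the string.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            # Skip to the end of the line, keeping the newline itself.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)

almost_json = '''{
  // where the service listens
  "url": "http://localhost:8080",  // slashes inside strings survive
  "quote": "say \\"hi\\""
}'''
config = json.loads(strip_line_comments(almost_json))
```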


Because JSON is not designed for configuration files, but for data interchange, and this makes JSON "better" for that use case (the original motivation was "people were adding comment-based pragmas to JSON files, so I removed comments", but without comments it's also easier to parse, concatenate, etc.)


Will python support the new version for pyproject.toml?


I would expect so, yes.


It already supports inline tables


But the whitespace handling involved is an extra complication, at least the provided example fails to be parsed by `tomli`, which is following the currently released TOML spec: https://github.com/hukkin/tomli/issues/199


> StrictYAML, by contrast, was designed to be a language to write readable 'story' tests where there will be many files per project with more complex hierarchies, a use case where TOML starts to really suck.

I can’t be the only one who feels this way; isn’t it this use case the thing that sucks? “plain text but not really” config formats that aren’t “code” but have special syntax but lack a debugger and handy IDE tools and you’re never really sure of what you’re doing … isn’t that the thing that sucks?


That was also my first reaction opening the example, I’m perfectly fine with a half-assed programming language (embedding more programming languages) not working in TOML, I don’t think it works in yaml in the first place, and it’s exactly the sort of usual mess which makes me recoil at the sight of a yaml extension.

This specific example is like somebody saw cucumber and went “I think this should be a lot worse, and in an ecosystem which doesn’t want to come anywhere near that too”.

This would probably be half the size and actually comprehensible if it were just a pytest test file.


Hi, I'm the author of hitchstory.

With the stories in YAML two new use cases (which are demonstrated in the example projects) are enabled:

* Automatically updated how-to docs. I used to write these types of docs manually and if they existed at all they would ALWAYS get out of sync with the code and were painful to maintain manually. Now I have YAML files and a simple jinja2 template I can push out new markdown how-to docs on each new build - with snippets of JSON, screenshots from the app, whatever.

* Tests that rewrite themselves. E.g. if I have a REST API test of the form "call API x and expect y blob of json", I don't have to actually write that blob of json into the test, I just write the code that produces it and run the test in rewrite mode so it updates the "expected JSON" field with actual json. I can then eyeball it and 20 seconds later it's part of the test and part of the docs.

The productivity improvements from doing both of these things means that writing tests is cheaper so I do more of them. Having how-to docs for all scenarios is way cheaper so I now always have them.

These use cases are impossible with pytest. They are impossible with cucumber.

They would be too painful to maintain with regular (i.e. not type-safe) YAML and those stories have enough indents that they would be an epic unreadable mess of syntactic noise if they were built in something like TOML or JSON.
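A rough sketch of the "tests that rewrite themselves" idea in Python. This is not hitchstory's actual API: `check_snapshot`, `fake_api_call` and the `expected_json` field name are made up for illustration, and an in-memory dict stands in for the YAML story file.

```python
import json

def check_snapshot(story: dict, actual, rewrite: bool = False) -> bool:
    """Compare `actual` with the story's stored expectation.

    In rewrite mode the expectation is overwritten with the actual value,
    so a human can review the change instead of typing it by hand.
    """
    rendered = json.dumps(actual, indent=2, sort_keys=True)
    if rewrite:
        story["expected_json"] = rendered
        return True
    return story.get("expected_json") == rendered

def fake_api_call() -> dict:
    # Stand-in for "call API x"; a real test would hit the service.
    return {"status": "ok", "items": [1, 2, 3]}

story = {"name": "list items", "expected_json": None}

first_run = check_snapshot(story, fake_api_call())     # nothing recorded yet
check_snapshot(story, fake_api_call(), rewrite=True)   # record the blob
second_run = check_snapshot(story, fake_api_call())    # matches the recording
```

Because the recorded blob lives in the story data rather than in code, the same field can be fed to a docs template afterwards.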


The first thing reminds me of example tests in Go: https://go.dev/blog/examples


Similar. It's like that but in reverse. Instead of running snippets of your docs as code, the tests and their metadata are used to compile doc snippets or entire how-to doc pages.


That is how Go's examples work- your example tests are inlined into the GoDocs at pkg.go.dev


> Tests that rewrite themselves.

That sounds like expect tests in OCaml [1]. I've found them quite a joy to work with and I'm surprised more languages don't have something similar.

[1] https://dev.realworldocaml.org/testing.html#expect-tests


The second thing reminds me of how Jest does snapshot testing: the first time you run it, it simply edits the JS source file with the result. For a test runner to edit test code feels weird at first but it works spectacularly well.


Your way of thinking, shared transparently not just in your "why not" section but throughout, is a breath of fresh air in the miasma of grabbing a tech because dogma or hype.

It's not that I agree with all your choices. It's that I'll defend to the end your method of making them.


Yeah that use case really sounds like they're on the cusp of just making a DSL from a config language.

If you need it to be _that_ complex then just write your configs as code...


I don't have any problem with complex "symbol vs literal" resolving in configuration languages. Unless your syntax is very weird, it should be very hard to confuse those.

That said, YAML's syntax is very weird. YAML just sucks. And any such implementation must necessarily be unityped (up to the point where the data is coerced into the configuration structure at your program) and completely preserve the original data.

TOML could be extended to support it. I don't think it's "tasteful", but I see no practical problems with it.


> TOML could be extended to support it.

Seems doubtful as that's specifically something TOML was created not to support. If you want unityped ini files you can define that dialect of ini files.


I agree, I quite like https://dhall-lang.org/ for that reason. It strikes a good balance between features and being a config language.


Eagerly waiting for a Yet Another Human Readable and Writable Graph (Which Is Mostly a Tree But Not Always) Serialization Language With Built In Schemas and Pure Functions, Maybe made with <3 for humans.

Actually, that, but without sarcasm. YAML is crap. TOML is crap. JSON is crap. INIs are actually fine for flat lists but major crap otherwise. Anything Turing-complete is completely inadequate crap for the purpose.



I especially love the Integrant[1] version which uses the E from EDN (extensible) to add references:

    {:adapter/jetty {:port 8080,
                     :handler #ig/ref :handler/greet}
     :handler/greet {:name "Alice"}}
[1] https://github.com/weavejester/integrant


I like edn but I’m not convinced it’s great for configuration.

First off, it shares the desire for extensibility with yaml (in the form of tags), which is a giant trap. And then it’s quite syntactically noisy.


I love EDN (especially the integrant extensions) for configuration of the stack — that is, dependency injection and so on, the developer-focused configuration.

For user-facing configuration I still favour TOML. I think it’s a bit application dependent, some applications (eg nginx) have complex configuration needs and for that it makes sense to use something more sophisticated, but for many user-facing config settings, a simpler-TOML would be a great fit. Basically just some basic key-value pairs that can be collected into groups. As the article states, the parsing types should perhaps be enforced by the parser not the written config.


Why do you love edn? Specifically I also like edn, but I always felt like my use of it was glorified json.

I did like integrant, at least I felt like I understood it, unlike component.

There were two things about edn that made it seem better than json to me, tagged elements (and readers), and symbols. I don't remember exactly my use case but I used symbols in edn to do something like namespace-resolution for multimethods. It was something like including a file in a classpath or loading a file, dev vs prod kind of config.


I suppose mostly superficial reasons such as liking the keyword syntax better than using string keys in json, but also the richer set of data types (like sets), although that mostly doesn’t matter in practice, but it is very nice when using Clojure since the Clojure types just work. As an interchange or storage format it’s also nice to be able to store Clojure types and get them back the way they were put in, whereas with json you would lose some information.

Mainly it’s just personal taste and no deep reasons.


Too many features that can't be represented in standard programming language constructs..

JSON-alikes are opinionated. They don't let you do stuff that requires a plugin for the parser to work with


Dhall is superb, https://dhall-lang.org


Thanks for that, it is interesting.

Is this actually something I want to see in a configuration script? Why not just use a scripting language and be done with it? I wonder if the safety features can't be replicated with rigorous testing of say python as config scripts instead of learning yet another programming language?

https://prelude.dhall-lang.org/Text/concatSep.dhall

I think this is the key fulcrum for me: "config is code", sure, but not the same kind of "code".

It -is- compelling to argue for statistically deterministic config code but my practical objection here is 'can we arrive at same safety using testing with a known language?'

Writing this has made me consider whether configuration should be conceptually looked at as a database instead of "code". How many people even know how e.g. postgres stores its tables, and why, modulo some performance niche, would you care anyway?

It seems configuration management is a graph db query and update matter. Standardize on configuration query language (if necessary) and stop worrying how the damn thing is represented by the config management tool.


Depends on your requirements. If your config complexity is getting beyond manageable, the benefit of something more reliable is apparent. Type safety clears lots of common bugs no test suite would be certain to filter out. How complex should your tests get? Do you want to take on that responsibility or rather delegate it to something that provides you certain guarantees? It’s all very subjective in the end.

I run my configs mostly as YAML in Consul and Vault, sometimes in Spring Cloud Config with a git backend. This way I have dynamic config evolution. But I prefer to generate those yaml files from Dhall to avoid unnecessary bugs. After years with Haskell, the syntax is very natural, too.

As for Postgres internals, they do matter if your data set keeps growing.

Xkcd covered standardization. YAML and JSON ASTs are graphs, YAML not necessarily a tree. JSON extensions also support references. As for the ops side, YAML has become a de facto standard, HCL is used with the HashiCorp tools. Nix has its own language.

It’s not about how it’s represented but how you express dependencies across config key nodes. It’s good to avoid repetition and have a syntax linter, a compiler even better. Small static configs are amenable to querying and writing. But you need to separate the writing from the querying the configs. With Dhall you write code to generate the actual config, whether as a Dhall AST or exported to YAML or JSON, with certain correctness guarantees upfront.


> Why not just use a scripting language and be done with it?

I want my configuration to be guaranteed to halt. Turns out it's hard to not make anything useful accidentally Turing-complete!


At this point, I‘d rather just use a proper language.


Maybe kdl[0]? It's a document language somewhere in between xml and yaml without all the crap of either IMO

[0]: https://kdl.dev/


I use KDL in a (not too complicated, yet) config file of a project and I like it a lot. Tree structure with attributes like XML but with less syntax than JSON. Nothing redundant but has basics like comments.


Let's add another one: CUE

https://cuelang.org


INI-files are crap too, as there is no standard. Every program has its own dialect, and automating around them can be a pain.


INI files are toml


I have a beef with YAML/TOML/JSON as an inline front matter (header) format for SSG (static site generator) posts, but that's because if I want to save a mostly Markdown SSG post on Github...

1. the post is previewed with Github's dialect of Markdown and not the SSG's (and definitely with none of the inline configuration applied)

2. the preview is still a Markdown document, so you get no benefits of syntax highlighting or auto-formatting w.r.t. the config header (short of opening in Vim and explicitly declaring the filetype or temporarily changing the extension in Github's editor)

3. you HAVE to put trailing spaces in the YAML/TOML/JSON or the preview pane crams everything into an unbroken paragraph

4. there's not a quick preview of how the configuration will parse, just specific workflows (live update, compile single page) that you can test and then modify as needed. This is either in the rich online editor or your own machine and will require console commands and a browser window

5. I still have to know all of the modifiable attributes as well as defaults, which will be in a separate document and probably not in a Ctrl+Space dropdown

-----

For point 5, it would be nice if configuration formats had completions for common editors and/or their own scripts:

* "generate big config file with all possible keys and default values"

* "condense modified config file so it only contains non-defaults (and hope the schema doesn't change hahaha)"

* "suggest a valid fix for a currently invalid config file"

-----

Sure, this all isn't a direct criticism of TOML, but inlining configuration is a great-to-okay idea that is simply poorly executed. It is extremely unfriendly to non-technical users; I can fully understand why someone would pay a few hundred a year for a WYSIWYG templated website builder to just handle everything.


I'm surprised no one has thrown in raw S-Expressions, yet, in this thread. That's an HN perennial favorite. Great at trees, decent at graphs. You can easily go Turing complete or not on the whim of any Lisp at hand and the hammer and forge of macros until your heart's content.


I interned at a company that used an s-expression format as an alternative to JSON and it was great—much better than any mainstream format for config files and human-readable representations for API data. It was great at representing structured data (including tagged variants, so full algebraic data types) and even pretty good for text markup.

It also has best-in-class editor support thanks to paredit :)

The only real downside is the hassle of convincing people to use something weird and non-standard, which is really more a problem with people than with the format.


Nickel[1] maybe? Though I'm not sure about "for humans" part :)

[1]: https://github.com/tweag/nickel


INI is really nice. Anything you can't represent in it, is getting towards not quite just configuration.

I think my ideal format would be INI, with tagged headings so you could do [md:README] and stick a multi line string section in there, or use something like [comment:Module Attributes] to make an embedded docs section.

Either that, or just some other method of embedding multi line keys. Maybe HTML-like tags, so that inside a heading you can do

<my-key> value </my-key>


I think it is important to distinguish between human-readable serialization format and human-readable config language. I wish the distinction was made more explicit.

JSON is the champion serialization format. But it is hard to edit config files with missing comments and strict quotes and commas. JSON is great for the API, but configs should be converted from something nicer to JSON. I wonder if it would be good to make the conversion explicit so any format could be used.
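To make the "nicer format in, JSON out" idea concrete, here is a minimal sketch using only Python's standard library. The INI text, section name, and keys are made up for illustration, and configparser is just one possible human-friendly front end, not a recommendation.

```python
import configparser
import json

# Hypothetical human-edited config: comments allowed, no quoting needed.
INI_TEXT = """
; which port the (made-up) service listens on
[server]
host = localhost
port = 8080
"""

def ini_to_json(text: str) -> str:
    """Parse INI text and emit the equivalent JSON document."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Every INI value is a string; any typing would be an app-specific step.
    data = {section: dict(parser[section]) for section in parser.sections()}
    return json.dumps(data, indent=2, sort_keys=True)

print(ini_to_json(INI_TEXT))
```

The application then only ever reads the JSON, so the choice of front-end format stays a separate, explicit step.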


Properties files have been fine, i guess since 1995.


HOCON

(But extrapolating, I'm guessing it goes into your crap category.)


Exactly, so just use JSON for configuration. This will upset the people that have not yet learned to get over it, but you really need people that got over it.


JSON is the runner up on the top crap list right below Windows Registry.


No comments means it's not human friendly.


I would strongly advise against using JSON to configure humans.


That's a first, the ol' HN switcharoo?


I do this, but regularly get annoyed with the date, comment, trailing comma, mandatory key quotes, etc


Maybe you'd like jsonnet: https://jsonnet.org/

I find it particularly useful for configurations that often have repeated boilerplate, like ansible playbooks or deploying a bunch of "similar-but" services to kubernetes (with https://tanka.dev).

Dhall is also quite interesting, with some tradeoffs: https://dhall-lang.org/

A few years ago I did a small comparison by re-implementing one of my simpler ansible playbooks: https://github.com/retzkek/ansible-dhall-jsonnet


If I have to use a better format, I will use toml or cue.

The whole point of the parent is that JSON is the Pareto solution, which I agree with.

But it does grind my gears.


jsonnet is turing complete.

It's configuration with a build step. Which is an option, but that's pretty different than JSON, YAML, TOML, etc.


> It's very verbose. It's not DRY. It's syntactically noisy.

I don't completely disagree with this. However, in most cases where TOML is used, it isn't that much of a problem.

And I actually like that the full key is repeated. When you have several layers of nested mappings, it can be hard to determine exactly where the current value is in the hierarchy. Especially if the top level key is above the current screen of text. It can also make it easier to search for a specific key. IMO, this is a case where more verbosity and repetition makes it more readable.

That said, it seems a little arbitrary to me that inline tables don't allow newlines within them. If they did, then if you didn't like repeating the keys, you could use inline tables.

> TOML's hierarchies are difficult to infer from syntax alone

This is a little subjective, and depends on the actual data represented in the config.

But in general, my experience is that when you have several layers of nesting, and the only indication of the hierarchy is indentation, it can be a little hard to follow where a specific value fits in the hierarchy. See above.

And I disagree that meaningful indentation is "generally considered a good idea". I won't enumerate the pros and cons here, as it has been discussed a lot elsewhere, but it is definitely controversial, and subjective.

> Overcomplication: Like YAML, TOML has too many features

This section lists exactly one feature that it thinks TOML shouldn't have. Maybe dates shouldn't have been included, but it isn't anywhere close to the complexity of YAML.


I'm not sure it's clear from the article -- only the URL -- that the writer is also the author/maintainer of StrictYAML.

This article has been written in a way that (most likely inadvertently) implies a measure of distance from StrictYAML.


One of the more interesting config formats I've come across was an application that used an Excel file. Once you get over the horror of such a terrible decision, it was actually quite an interesting choice that allowed a fair few advantages. First of all, each config subcategory was on a separate sheet, making it easy to navigate and find what you were looking for. You could use formulas to relate different config options (if you wanted A to be 20% of B, you just set that in a formula). You could use drop-downs for fields where there were only a limited number of valid values. You could include as many comments and as much documentation as you wanted (including diagrams and images) as long as you only wrote in unused cells. And finally, my favourite: when configuring colours, instead of typing in RGB or hex values you simply changed the colour of the cell to the colour you wanted.

Now I would obviously never ever recommend doing this, but it was certainly an interesting and eyeopening experience.


Too many encounters with Excel’s “smart” date parsing would make me very concerned about using it this way.

https://support.microsoft.com/en-gb/office/stop-automaticall...


That page is such a gem.

> "make it easier to enter dates. For example, 12/2 changes to 2-Dec"

12/2 is obviously the 12th of February in my locale. But I need to keep Excel in English as a company policy, so this is not only unhelpful, it's outright wrong.

> Unfortunately there is no way to turn this off.

How does Microsoft justify this choice?


I’ve never understood why they don’t provide the option to disable it, it’s not like Excel is a sleek, minimalist piece of software with strong opinions and limited configuration.

I also love how the support article describes a behavior of their own software as “very frustrating”.


At my co-op decades ago we used to use Excel "templates" as masters to generate/maintain text config files (and iirc a few C headers). You would save as text to make it usable. The grid layout, ability to highlight/color/border/style etc and use formulas and plot was very helpful.


Underrated technique. I’ve used it similarly to generate scripts and it works great when used with care (error checking generally must happen at a different stage).


That's just begging to be an actual database.

Imagine if you said everything with SQLite instead of Excel, and all of a sudden you're just talking about structured config in a database. Not new, not crazy, and generally a decent practice.
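As a rough sketch of what "structured config in a database" could look like, here is a tiny example with Python's built-in sqlite3 module. The table layout, the `get_setting` helper, and the sample settings are all hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database for the sketch; a real setup would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE config ("
    " section TEXT NOT NULL,"
    " name TEXT NOT NULL,"
    " value TEXT,"
    " PRIMARY KEY (section, name))"
)
conn.executemany(
    "INSERT INTO config VALUES (?, ?, ?)",
    [("display", "width", "800"), ("display", "colour", "#ff8800")],
)

def get_setting(section, name, default=None):
    """Look up one setting, falling back to a default if it is absent."""
    row = conn.execute(
        "SELECT value FROM config WHERE section = ? AND name = ?",
        (section, name),
    ).fetchone()
    return row[0] if row else default

print(get_setting("display", "width"))
print(get_setting("display", "height", "600"))
```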


The big difference is that every Windows computer (in a professional environment) comes with a very nice GUI tool for easily editing Excel files. A tool that basically everybody knows how to use. The same cannot be said for SQLite.

SQLite is great for storing application state and config options set from within the app, but it is a pretty terrible format for end users to edit.


Though I'm not sure I'd be happy about "end users" editing the Excel files as config. Somehow they get the cells mixed up and you get an utter mess. And then Excel confuses some value for a date and stores some other mess ...

For a somewhat trained audience however it can be quite interesting for some specific problem domain ...


>SQLite is great for storing application state and config options set from within the app, but it is a pretty terrible format for end users to edit.

I think you are conflating the file with the workflow. A proper UI is the solution to making something not "terrible for end users to edit".


Why doesn't Microsoft just add SQLite to Excel, so your formulas can query it?


Because how else would they push MS Access?


Well, that's a complete apples-to-oranges proposal with completely different requirements.

With the original solution, you have an autocontained file that virtually any user knows how to edit, structure, expand, version, email, compare, discuss...

The level of user knowledge that you need for a similar solution based on a standalone SQLite file that you can version is another different world, e.g. to relate two values you would need to perhaps create a view or a trigger. And even with the most knowledgeable user you would still lack functionality such as simply pasting an image as a means of documentation and be able to see it, or WYSIWYG colors.


> version, compare

Gonna have to disagree with you here. Very few people know how to version, compare, let alone if there are multiple collaborators, Excel files. Is there even a decent way of doing that outside of Excel Online / Google Sheets?


Well I've seen a great deal of people comparing two Excels by quickly alt-tabbing, and doing Sheet1!A1=Sheet2!A1 comparisons in a third sheet, and dragging the formula.

Yes, that's horrible. But millions of people can autonomously resort to this, and they would be incapable of doing anything with a SQLite file.

Same with version control: you just add _v17_20230909_final to the name. Yes, it's horrible. Yes, it's buggy. But yes, it also runs the world.


>But millions of people can autonomously resort to this, and they would be incapable of doing anything with a SQLite file.

Are you trying to suggest that either you can't do that with sqlite, or that people didn't require training to do this in Excel?


harperlee: open list of affordances that an average user comfortably identifies with an Excel file, not really with sqlite

sofixa: version and comparison is not really supported in Excel

harperlee: agree that software support is not there, but in practice average user knows how to do it and is quite comfortable doing it in Excel

btreecat: sqlite also has filenames and you can learn how to use it

harperlee: agree on the filename, Excel files and sqlite files are externally, opaquely versionable in the same way. the point about people learning is moot though, average user already knows Excel because they were forced to in the past, but does not have the time to learn new things.


Fundamentally they need a tabular data store. Most everything else is a nice to have/usability point.

SQLite is also an auto contained file. There are similar tabular GUI tools that could let you interact with sqlite using a similar workflow. Users knowing that thing is not inherent, they had to be trained on it, and they can be retrained as they will be for other workflows and business tools.

Remember, you still need an external app (Excel) to open its files; the files themselves are just data, exactly like sqlite. So you could just make an Excel plugin to interface with SQLite.

SQLite is as version-able as an excel file, as in not very with standard tools.

Why are you pasting images into excel? Doesn't matter, sqlite handles that fine actually. https://www.sqlite.org/fasterthanfs.html

The main argument against sqlite is that you would need to build an interface or figure out how to train folks on existing tools. That's not a huge argument against it in my experience, it's a strong social/political one in many orgs but rarely a technical issue.


> The main argument against sqlite is that you would need to build an interface or figure out how to train folks on existing tools. That's not a huge argument against it in my experience, it's a strong social/political one in many orgs but rarely a technical issue.

The design of a solution needs to look into way more requirements than just the technical ones, time and money being 2 big ones. I think most of the HN readers would agree that you could end up building an interface with most of the Excel functionality, even more perhaps, on top of sqlite, and have your particular group of users trained on it.


Now I just want a spreadsheet style front end to SQLite and I'm nerd sniped into trying to figure out how to do formulas (triggers I suppose).


Congratulations, you're about to reinvent MS Access / DBase / FoxPro... 30 years later.


That's an excellent point.


I'm sorry for the initial sarcasm, it would actually be pretty cool to have those type of apps back. With everything going web-first nowadays, you could take inspiration from SQLite's own SCM, Fossil, that can run both as a CLI or a Web server.


I've been thinking it would be cool to build something like this. What gets me stuck is version controllability and backups.

I think ideally you'd want multiple backends, either SQLite, or flat files that are Git/SyncThing friendly.

I was even thinking the file format could have a file UUID and record timestamps, so that if you put a different version of the same file in the same folder(Like with a SyncThing conflict file) it would give you a merged view, with newest-record-wins logic.
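A minimal sketch of that newest-record-wins merge, in Python. The `Record` shape and the float timestamps are assumptions for illustration, and the sketch assumes the inputs have already been matched as versions of the same file (e.g. by the file UUID).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    key: str
    value: str
    modified: float  # hypothetical per-record timestamp, seconds since epoch

def merge_newest_wins(*versions):
    """Merge several versions of one file, keeping the newest record per key."""
    merged = {}
    for records in versions:
        for rec in records:
            current = merged.get(rec.key)
            # On a timestamp tie, the record seen first is kept.
            if current is None or rec.modified > current.modified:
                merged[rec.key] = rec
    return merged

local = [Record("a", "1", 100.0), Record("b", "old", 100.0)]
conflict = [Record("b", "new", 200.0), Record("c", "3", 150.0)]
view = merge_newest_wins(local, conflict)
print({k: r.value for k, r in view.items()})
```

This relies on clocks being roughly in sync across machines, which is its own can of worms.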

Formulas I think would be the easy part. Just write it all in Python and use one of the many Excel compatible formulas implementations, and just make sure all changes went through the app.

Maybe you could even have a REST API to build other things on top of it.

The web frontend wouldn't be too hard, it could just be a Vue3 app with an HTML table element.

Then you could have a cell type whose value was a query, which would embed a DB browser list widget in the cell, and that cell type could have the option to bind its selected row to another cell.

Like VB+Excel+Access+My fork of Freeboard with inputs in one!


For the most simplistic row-wise formulas, sqlite has generated columns.

I’d assume the issue will be that an sql table is not free form, you can’t randomly decide to write in some other cell.
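For the row-wise case, something like the "A is 20% of B" example from upthread might look like this with Python's sqlite3. The table and column names are made up, and generated columns need SQLite 3.31 or newer, so treat this as a sketch rather than something guaranteed to run on an old install.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column "a" is derived from "b" and cannot be written to directly.
conn.execute(
    "CREATE TABLE settings ("
    " name TEXT PRIMARY KEY,"
    " b INTEGER NOT NULL,"
    " a INTEGER GENERATED ALWAYS AS (b * 20 / 100) VIRTUAL)"
)
conn.execute("INSERT INTO settings (name, b) VALUES ('margin', 50)")
# Changing b is enough; a is recomputed whenever the row is read.
conn.execute("UPDATE settings SET b = 200 WHERE name = 'margin'")
row = conn.execute("SELECT b, a FROM settings WHERE name = 'margin'").fetchone()
print(row)
```

The formula lives in the schema and applies to every row, which is exactly the "not free form" limitation.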


Were there any downsides? Because this sounds kind of awesome.


Once you got used to it, honestly not really, other than needing Excel (not a huge deal since it was a Windows only application). I've no idea what the config parsing code looked like, but with the right library I doubt it was worse than any other non-trivial config parsing code. Mainly it just felt very wrong to my Unix, everything must be a text file, brain.


Never thought I'd get a chance to tell this story.

Early 1990s, college internship. The company did presentations for clients, like many do. They had an unusual way of presenting data that required using actual protractors to draw circles and curves, with pencil, on otherwise computer-generated charts. They read numbers from Excel spreadsheets and plotted them on paper.

I was shocked, to say the least. I proposed writing a program that read Excel spreadsheets and emitted the graphics. They loved the idea, especially from a summer intern.

So I wrote a letter to Microsoft asking for documentation of the Excel file format. A week later I got a thick envelope with a photocopied manual completely describing the format. I remember the word BIFF throughout. I wrote the program, it worked great, and I even negotiated a hefty lump-sum payment to sell it to them at the end of the summer.

It left me with a very positive impression of Microsoft as a developer-friendly company. Makes sense; developers are their platform's customers, and they're good at serving their customers.


There are some pretty awesome libraries for reading and working with Excel files - Aspose.Cells being one I used for years - basically a headless re-implementation of most of Excel usable via an API.


Well, xlsx is just a zipped dir of text (XML) files, so you can think of it as sort of following the .d pattern...


> the .d pattern

What's that? I ddg'd it and I got a bunch of hits for regex \d ...



That was an informative read - thank you!


conf.d, init.d


Ah, got it, thanks!


For better or worse I created a very similar system, but using Google Sheets instead of Excel files and fetched with the GSheets API. In that case the configuration was gigantic (many of the configuration values were product-decisions, and it ran in hundreds of different environments with different tweaks), so the tabular structure made navigating things very natural. It also had the advantage of structurally highlighting how environments differed. Doing it with Google Sheets came with some extra nice benefits: online sharing, versioning, access control down to specific ranges, etc.

Basically every engineer who joined the team thought it was an unforgivable blemish on the system, yet it survived a few years with no major issues, long enough for the team to build an internal backoffice and port the whole sheet structure into a proper CRUD API.


Biggest downside I've discovered by doing this is that there's not a very good way to diff separate versions of one sheet. Best I've done is export to CSV and use a generic text diff.
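The CSV-plus-text-diff workaround can be scripted in a few lines with Python's difflib. The file names and sample rows here are placeholders, and the export step itself is assumed to have happened already.

```python
import difflib

def diff_csv(old_csv: str, new_csv: str):
    """Return unified-diff lines between two CSV exports of the same sheet."""
    return list(
        difflib.unified_diff(
            old_csv.splitlines(),
            new_csv.splitlines(),
            fromfile="config_v1.csv",
            tofile="config_v2.csv",
            lineterm="",
        )
    )

old = "key,value\ntimeout,30\nretries,3\n"
new = "key,value\ntimeout,60\nretries,3\n"
for line in diff_csv(old, new):
    print(line)
```

It is line-based, so a reordered or inserted column still shows up as every row changing.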


Most of these sound like things a native programming language can do. Without being a damn excel file. So it’s basically an argument for making your config files JavaScript, Python, etc.


You're missing the point. Power came not from Excel the file format, but Excel the GUI tool for editing Excel files. You cannot have drop-downs and colour pickers and separate tabs and embedded images in a single Python file without writing a whole custom GUI config tool.


Python can be and often is edited by GUI tools with color pickers and dropdowns. IDEs such as VS Code have these.

Also, I said “most.” Not, “there exist no exceptions.”


Having a visual way to configure this makes it much more accessible to non-programmers, with error checking available through the host (Excel, in this case), while also reducing the eng effort in building this.

At Google, many internal tools use Sheets as their source of truth for config data, and it works really well.


> Having a visual way to configure this makes it much more accessible to non-programmers

I agree completely but that's not the point he appears to be making. He never stated this was the use case and he reiterated that it was a bad idea which should never be done.

Regardless, I’m saying those aren't really unique advantages to Excel. They just look unique compared to json, toml, yaml, etc.


I can't think of any data entry method more easily understood by non-programmers than an Excel spreadsheet (or Google Sheets etc). Repetitive data in particular is a breeze since you can drag-expand and use formulas. We use GSheets at work for configs created by non-programmers, and it works great.


What is your point? You can see from my previous comment I already agree excel is user friendly for non programmers.


The user-friendliness is a unique advantage of Excel.


I think the difference there though is that an Excel file is a lot more approachable to non-technical folks (even superficially). Depends on what's being configured and by whom though.


Excel sheets are an underappreciated tool for sharing technical info with analysts. They're easy to write and read from both dev and analyst side. They can be safely zipped, archived, emailed, saved on a USB drive by the user without any additional programming. Editing in Excel makes the expected schema clear. It's an excellent data interface.


I use a Google Sheet as it can be accessed from anywhere. Even my non-techie business folks can make changes (they feel at home in spreadsheets). I even added a menu button to the gsheet to launch a re-build (it is a static site on Netlify; the build fetches its config/data from the gsheet).


Did they version the spreadsheets? Or maybe converted them to text and versioned the text format?


It doesn't sound terrible. This is what we do at work for large configs that non-programmers touch, except it's Google Sheets instead of Excel. It works, and they'll often refuse to use anything else.


.xlsx and openpyxl For The Win!

Cross-platform; copious spreadsheet applications; no-one needs training; scales.


I see it more like:

(1) A huge dependency in the project for reading Excel files.

(2) Everyone needs training, as usually people in my profession have rarely if ever any need for Excel.

(3) Visuals != contained text, so the config might be different from what you see on screen as the config value.

(4) No proper version control, even a csv file would be better.

(5) scales? lul, have fun trying to solve merge conflicts. Also don't come with any Excel git plugins. It will only make the bad decision worse.


xlsx is zipped xml. it might actually work. your profession must be very weird, though - knowing about git merge conflicts and not knowing how to use excel sounds like an empty venn diagram.


I wrote "usually people in my profession have rarely if ever any need for Excel". But I will explain.

Many software developers usually have no need for something like Excel. Either they use some free/libre alternative, because they know about it existing, or they use an actual programming language, or some might even use something like Emacs org-mode spreadsheets, or they use some library like Pandas for things, where it is reasonable to get out the tools. Software developers are also more likely to be aware of the technical debt incurred by storing anything inside Excel formats and will avoid it, if they are wise.

As such many software developers rarely use Excel, if at all. I personally don't use it at all. All my simple spreadsheet needs are covered by Libreoffice Calc or Emacs org-mode. If I had to use Excel now, I would not know the names of functions (translated perhaps, because Excel does that silly stuff) or how to reference cells (Is it $ and then the number? And : as a separator between col and row?). So yeah, to properly use it, many of us would have to learn at least a little of it.

Many if not most cases of Excel usage are actually due to people not knowing the alternatives, or perhaps knowing they exist, but not having the knowledge to use them (like with programming and quickly dishing out a few Pandas calls or Emacs org mode spreadsheets).


FWIW when I say Excel, I mean one of the big spreadsheet tools.

Excel is like Python: the second best tool for a lot of problems. There are problems which take five minutes to solve in the shell or in a text editor (maybe less now when LLMs can straight produce certain solutions with a simple prompt) and they take 10 seconds in a spreadsheet including copy and paste. I really recommend spending some time with Excel just as I recommend reading the table of contents of your primary DBMS’ manual.

I’d actually argue that being able to solve a problem quick and dirty in Excel and then in a more proper way in pandas is a good thing.


All my life I have used Excel, OO Calc (now LO), and Google Sheets interchangeably and I have found experience in one carries across both the others pretty seamlessly. I actually think of them as just one thing. Formulas are pretty much the same, especially.


I have resolved many, many git conflicts.

I am perennially lost in all but the very simplest spreadsheets.


If it helps, you can think of excel as a purely functional programming language with a 2D UI for memory visualization and editing.


I feel that as long as you use it for some simple configuration, you can use JSON, YAML, .properties files, TOML, whatever.

The problem IMHO is that we're using "configuration" files for things that aren't configuration. The "story" example from the blog post illustrates this. I find it hard to read YAML files that are dozens or hundreds of lines long.

Furthermore, once your "configuration" starts being so long, there's usually going to be enough repetition that you want to extract some duplication. YAML does have some facilities for this (anchors), unlike some other formats, but they're extremely limited.

So what happens is that different tools using YAML all start designing their own mechanism for sharing behaviour. It's all usually very ad-hoc, has edge cases and may not do things in the way you expect them. It also forces you to learn the specific rules for these facilities instead of allowing you to reuse your general programming knowledge.

On top of it all, YAML is essentially just a structureless key/value data structure. You can add schemas, but as far as I know, this isn't really standardised and editor support is... variable. In the worst case, you don't get any indication that you've configured something wrong. This is also part of the reason why I think that significant whitespace is OK for a programming language (still not a fan of it though), but bad for a configuration format, because bad indentation in a program either won't parse or will lead to obvious runtime errors, whereas bad indentation in a YAML file might just mean that a key isn't being set even though you think it should be.

For authors of tools that consume YAML, this means writing a lot of custom validation logic instead of relying on standard techniques like type systems.

I think we're on the wrong track and essentially just repeating XML's mistakes (just slightly less verbose, but also without schemas). We should rather use the programming constructs we know, e.g. by leveraging internal DSLs (I think that's part of the reason why Ruby was popular for tools like Chef for a short period, why Jenkins uses Groovy and Gradle now uses Groovy or Kotlin - these languages make internal DSLs easy). If we're worried about Turing completeness, maybe Dhall or something like it is the answer. But 400 line long YAML files with custom "!reference" tags that my editor doesn't understand don't seem like the solution.


This exactly. When the author is complaining that the configuration syntax doesn't support DRY you know something has gone wrong and configuration isn't really to blame.


I've thought this for as long as I can remember. People overcomplicate the config file and try to make the one config file to rule them all.

I like TOML, I started to look into using Hugo over Jekyll though and the TOML seems weirdly abstract and difficult to follow.


Great wrap up.


It is the least bad configuration format I have found. Granted I have only ever used it for fairly simple projects. But every config format is plagued with issues. A bit like programming languages, the fundamental problem is that they need to be easily understandable by both humans and computers which is an impossible problem to truly solve for any non-trivial use case. For example, TOML is criticised for verbosity but a lot of the abstractions that are used to implement DRY in a programming context may make the configuration confusing and unintuitive for non-programmers.


Jsonnet or JSON5 are much better than TOML or YAML.

Both are much easier to read, and don't have the footguns of YAML or StrictYAML.

I would generally say JSON5 is more appropriate because it is simpler, but Jsonnet does have some neat features and its IDE support is much better.


At least general programming languages like Python or JS are well-understood by many programmers. More than makes up for not being as specialized as a DSL.


Have you looked at NestedText?


I can't be the only person who thinks the monumental effort spent on config formats is bike-shedding.

JSON is good enough for anything I've done. Not perfect, but no serious flaw that can't be fixed by just adding a simple app-specific post-process step that I will inevitably do for any other format anyway. JSONSchema gives us some typing sanity.

Can we just move on already to more interesting problems? It's not like git fulfills every VCS wish I've ever had either, but I have to move on. Projects and libs that introduce new config formats that continually remake the wheel, whose quirks have to be learned, are not helping my net productivity.

</rant>


> JSON is good enough for anything I've done.

I want comments in config files.


A horrible workaround I use is a blank redundant key and a leading // in my string to draw my eye to it. This only preserves the last comment in my Python dictionary, but I only use the comments while working in the JSON file.

    {
        "": "// comment here",
        "Entry": [-1, 0, 2],
        "": "// next comment",
        "Flag": true
    }
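
A minimal sketch of why only the last comment survives, assuming the file is loaded with Python's standard `json` module: duplicate keys are accepted and the last value for a repeated key wins. The `object_pairs_hook` variant is just one way to see every pair before they collapse into a dict.

```python
import json

text = """
{
    "": "// comment here",
    "Entry": [-1, 0, 2],
    "": "// next comment",
    "Flag": true
}
"""

# Default behaviour: a repeated key keeps only its last value.
config = json.loads(text)

# object_pairs_hook receives every (key, value) pair in file order,
# so the earlier comment is still visible here.
pairs = json.loads(text, object_pairs_hook=list)
comments = [value for key, value in pairs if key == ""]

print(config)
print(comments)
```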


And trailing commas on final list elements and object properties.


JSONC is exactly that, JSON with comments. Works fine, eg typescript’s tsconfig file is JSONC and I’ve yet to find a problem with it.



I can relate. But after using JSON for a while (in files that I edit by hand), I found that I really want comments and trailing commas (which leads to https://nigeltao.github.io/blog/2021/json-with-commas-commen...). Next I'd probably want multiline strings (leading to https://github.com/json5/json5).

But if you use those extensions, all your tooling breaks.

(Aside: I think the real bike-shedding would start when you want to add some syntax for raw string literals, e.g. heredocs; it's one of those features that feels redundant, until the day when you really need it and you can't bear the pain of repeatedly escaping and unescaping.)


> JSON is good enough for anything I've done. Not perfect, but no serious flaw that can't be fixed by just adding a simple app-specific post-process step that I will inevitably do for any other format anyway.

If someone wants JSON with extra features like comments and typing, they are better off switching to Ion.

https://amazon-ion.github.io/ion-docs/docs/spec.html


  JSON is good enough for anything I've done.
  ...
  </rant>
Except for closing comments, for that, you need XML.


Yeah, don't know why some people don't want to settle on bad formats without such basics of human economics like comments and keep improving


Given how little I deal with config files compared to the rest of my work I prefer formats that are obvious, even if verbose, to those with sneaky syntax. I'll take JSON or even XML over TOML any day.


I previously argued that TOML wasn't good enough in this blog post https://dystroy.org/blog/hjson-in-broot/ where I show an example of problem which frequently hurts my users and leaves them lost without even understanding that the problem is in how they wrote their TOML.

I moved the configuration of several of my programs to Hjson. There are still problems but they're less puzzling. Hjson isn't ideal either but might still be the best configuration format we have today.


hjson is indeed more H vs json

You've mentioned in the blog that ": it's meant to be written by humans, read and modified by other humans, then read by programs", but is it possible for apps to (roundtrip)-edit those configs preserving all the human syntax intact? It's rather common for apps to e.g. have font size changed, but unfortunately also common to destroy human formatting in the process


This is theoretically possible, and I actually toyed with the idea.

I didn't do it in my deserializer because of the big value you have in Rust in being compatible with serde, and that wouldn't be. But this would be interesting, probably as a side library.


Roundtrips without destroying comments or formatting is supported in the JavaScript, Go and C++ implementations of Hjson, but not in the other implementations (I think).


Thank you for introducing me to Hjson. I've been using simple colon delimited lists which seem to be, hilariously enough, already valid Hjson.


A lot of formats are Hjson compatible, notably JSON, and also what users wrote thinking it was JSON but they forgot some quotes or had a trailing comma so the JSON parser refuses it while the Hjson one is perfectly happy.


Hjson is also the format I use for all my things. Strikes a good balance.


Literally every config file format is terrible in some way or another. The best configs are executable and loaded into a dynamic runtime. Emacs and Airflow are good examples of this.

But I definitely strongly prefer YAML to TOML. It just makes a lot more sense to me and it's a huge shame that PyPA went with TOML which is so un-Pythonic. I preferred setup.py. StrictYAML is a really good development that I wasn't aware of, though.


Flat is better than nested.

I'd argue that's enough for TOML to be more pythonic than YAML.


Both TOML and YAML support nesting. TOML simply looks flat even when it's not, so it's the worst of both worlds. In any case, it's not the format that is nested or flat, it's the content. Python itself supports nesting. The zen of Python merely says you shouldn't use the nesting when flat is an option.


I think this is a glass-half-full view of TOML, when TOML itself was the one that added water to the glass to begin with.

The main value proposition of TOML is to provide a concrete specification of an INI-type config language. INI is ubiquitous, but it lacked a spec, which led to a lot of wheels being reinvented. TOML fixed that.

If a project needs convoluted config files, I'd argue the project is already broken. If TOML doesn't fit your needs, that's hardly TOML's fault.


I'm okay with TOML for something like Cargo configuration, I don't enjoy it for much else though.

I always comment the same thing on these sorts of discussions - JSON5 has been really nice to work with if you can fully control all consumer applications of it (since there aren't great libraries for json5 in every language). Certainly nicer than the hellscape that is YAML.


> I'm okay with TOML for something like Cargo configuration, I don't enjoy it for much else though.

Which seems fine given TOML was designed as a stricter and more reliable INI, not a half-assed programming language. Can’t say I’m unhappy to know that when I see a toml file, it’s probably going to be pretty simple (exactly the opposite of the dread I feel when I see a yaml extension).


I think that’s exactly it. TOML is bad at deep hierarchies, but that’s a good thing; it prevents people from trying to model a whole damn AST in it, which is the direction that YAML seems to go.


Too bad this isn't a high-level comment that can be upvoted and float to the top.


It really depends on what you're configuring. I like TOML for most cases since it has the right balance of simplicity and expressiveness. JSON could in theory support everything that TOML can - but it can be jarring to look at a large config file with just brackets/parentheses to demarcate blocks. Even YAML finds its niche applications. I don't think JSON or TOML are good substitutes for YAML in the case of Kubernetes. Kubernetes needs a single format that can act as a config language (in the case of spec) and as a serialization format (in the case of state). YAML fulfills that without being overly verbose. We probably shouldn't get too attached to a config language.


Internally, Kubernetes uses a key-value structure, and it doesn't mind what syntax you use to write your configuration files as long as it is passed via the HTTP+JSON API correctly. It's a relatively pedantic detail, but I think it supports the argument that YAML is the best tool for the job currently. Kubernetes' architecture would make it trivial to add TOML, XML or even s-expression support into the CLI tooling, but people don't seem to particularly want to.


Thanks for adding. Your argument demonstrates that there is an optimum human-friendly representation for each data.


to the very best of my knowledge, its input api payloads are always content-type:application/json; one can see this in effect via `kubectl --v=100 apply -f some.yaml` and watch it serialize that yaml to json _then_ POST it


That Cargo config with

[lib]

But also

[[bin]]


Alright, I'll throw my hat in.

I wrote a config file format. I took JSON and added comments, strict typing (using one character for any type that needs a marker), the ability to split items with newlines instead of commas, trailing commas, explicit binary data (as base64), and a new type I call "symbols" for things like enums or references. Then I removed the need to surround keys with quotes if they have only a certain set of typical characters.

It looks like [1].

It turns out that JSON was very close. It just needed a few more things.

Edit: one thing about config file formats that I strongly believe is that they must be purely data, no code. If you need code, supplement them with a separate thing, and perhaps use an established language, like Lua.

[1]: https://git.yzena.com/Yzena/Yc/src/branch/master/build.gaml


Lua was originally designed as a config language.


Sure, but I would say that anything that is Turing-complete is not a good config language.

Lua pivoted to plugins instead.


EDN (Extensible Data Notation) is a subset of Clojure: https://github.com/edn-format/edn

It is:

- Streamable

- Extensible

- Whitespace-insensitive, but there are formatting conventions for readability


Will go ahead and start off the config wars as undoubtedly the OP intends.

FWIW my best config experience has been with HOCON via typesafe|lightbend/config in Java. The ability to compose environment specific defaults in a reasonable way just felt good. Of course /internal/config to dump the config was a necessity, but trivial, so I tend to be less sympathetic to DRY is not necessarily good arguments.

Have been missing it in Go, unfortunately Java's ability to publish files within packages (that can be imported in config) was a key part of the UX that is missing in any compiled language I've seen.


+1 for HOCON, if you're on the JVM at least. We use it in our product with some extensions. Like you, I find it just feels good.

For those who haven't encountered it before, HOCON is a superset of JSON so all valid JSON is also valid HOCON. Then it starts adding syntax sugar and useful features specifically for configuration files (the "H" stands for Human).

We wrote a tutorial for our product, it has a slider you can move to see how JSON evolves into HOCON along the way:

https://conveyor.hydraulic.dev/11.2/configs/hocon/

It's got some nice features. There's no "syntax typing", programs that use HOCON are thus very forgiving. Conveyor takes that even further, for example, anywhere you would normally need to specify a list of strings you can also specify just one string, it'll be wrapped automatically. There is a formal spec. It supports substitution, inclusions and it defines the semantics of duplicate keys which allows for refactoring of configs out to separate re-usable files. It has a nice clean look that gets out of your way. You can not only include files but also URLs.

On top of that we add a few more features. If you need to express a list of strings then brace expansion is supported, i.e.

    foo = "bar-{1,2,3}"
is equivalent to

    foo = [ bar-1, bar-2, bar-3 ]
But probably the most important feature is hashbang includes. These allow you to include the output of arbitrary external programs:

    include "#!program --flags"
This lets you get the best of all worlds - the fast loading, simplicity and IDE sympathy of a declarative JSON-based config syntax, but if you hit the limits and need to programmatically generate some config you can do so whilst restricting the imperative logic only to the part of the file where it's needed. The rest remains declarative.
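
The brace expansion mentioned above can be sketched roughly like this. It is not Conveyor's implementation, only a guess at the idea: `expand_braces` is a hypothetical helper that handles a single `{a,b,c}` group and nothing else (no nesting, no ranges).

```python
import re

def expand_braces(value: str) -> list[str]:
    """Expand the first {a,b,c} group in value (hypothetical helper)."""
    match = re.search(r"\{([^{}]*)\}", value)
    if match is None:
        # No brace group: the string stands for itself.
        return [value]
    head, tail = value[:match.start()], value[match.end():]
    return [head + option + tail for option in match.group(1).split(",")]

print(expand_braces("bar-{1,2,3}"))
```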

All this works pretty well. At some point I want to package this up into a native library using GraalVM so it's available to anything that can load native libraries. Being Java it's accessible to any language that can run on the JVM which is pretty good already, but to use it from Go would require bindings.

There are some downsides. It's not as well known as other syntaxes so syntax highlighting is sometimes missing from things like docsite generators. IntelliJ has a plugin for it but for other editors you might not get good support. It doesn't really have a schema language either, although you could of course just use JSON schema.


I have read this critique, and I think it is a good idea to review it before you adopt TOML. That said, I went with TOML for a group of projects recently. My list projects on GitHub had become difficult to manage as plain Markdown. (They had too many items. The sections were easy to forget to sort. Items in multiple sections had to be kept in sync manually.) I decided to generate the Markdown from a template and serialized data. I evaluated YAML and TOML for storing the data and ended up choosing TOML.

What I prefer about TOML for the data I have (dictionaries of dictionaries, no deep nesting) is that the textual representation is flat. I find it easier to read and edit.

For comparison, here is one project's TOML data (formatted using https://github.com/tamasfe/taplo with added comments): https://raw.githubusercontent.com/dbohdan/structured-text-to.... Here is the same data converted to YAML with unlimited line width and formatted using https://github.com/google/yamlfmt: https://paste.dbohdan.com/projects.1694621084.yaml.txt.


My biggest problem with TOML is absence of `null` values. Null is fundamental in describing missing or undefined values. Especially problematic are (numeric) arrays with missing values. JSON has null as the first-class object. In TOML world the recommended work-around is to use empty key-value object `{}` in place of a null. Kind of a hack. Makes replacing JSON with TOML tricky.

EDIT: typos


> Guido van Rossum came across subtle bugs where the indentation disagreed with the syntactic grouping. Meaningful indentation fixed this class of bug. Since there are no begin/end brackets there cannot be a disagreement between grouping perceived by the parser and the human reader.

Significant indentation alone doesn't eliminate this class of bugs. I'm pretty sure I've only seen this once ever, and can't remember the exact combination that caused it, but I have encountered a file that mixed tabs and spaces in a way python didn't barf over, that resulted in visual indentation being different from the programmatic indentation. I can definitely say it was python 2, not 3, so that particular instance may error nowadays.


Mixing tabs and spaces is a TabError in Python 3, so this would definitely error today.
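
A small sketch that reproduces it, assuming CPython 3: the two body lines would line up visually with 8-column tab stops, but one uses a tab and the other uses spaces.

```python
# The first body line is indented with one tab, the second with eight spaces.
# With 8-column tab stops they line up on screen, but the result depends on
# the tab width, so Python 3 refuses to guess.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<mixed-indent>", "exec")
    error_name = None
except TabError as exc:
    error_name = type(exc).__name__
    print(error_name, "-", exc.msg)
```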


Weird that his solution to this problem was to make a language where whitespace is significant (which makes it quite challenging to write) instead of just building a linter into Python.


Related:

Just a few days ago I put together some ideas I'd had for a couple of years for a configuration language, syntactically like HCL but without HashiCorp's idiosyncrasies.

Here it goes: BCL (_Basic_ Configuration Language, for lack of a better name yet), a Go prototype. I can code a Python port and possibly several others as well.

https://github.com/wkhere/bcl

PS. It was a pleasure to code, esp. getting the parser & the reflection right; the latter enables such an elegant API. And yes, together with Russ Cox I am in the camp claiming that yacc is alive and well


TOML config files are used heavily in my product for developer-facing features. I personally think it's cleaner compared to YAML + JSON.

It also benefits from being easier to copy + paste configs from docs since the entire path is included in each block; users don't have to worry about setting up the correct hierarchy.

Regardless, I strongly agree with points #1-#3. I also wish there was a way to support something similar to inline YAML schemas to catch users' typos in their IDE.

I would love to use HCL or potentially Ion, but IDE support and widespread acceptance isn't strong enough yet.


I think syntax typing is a good idea. There are a million ways to format dates and standardizing that saves a lot of pain. Having explicit syntax for types means it's easier to handle arbitrary data or data of an unknown schema, and without it you are forced to assume everything is a string or rely on heuristics which leads to divergent implementations.

No semantic whitespace please. Can't tell you how many times I've seen something in GitHub Actions or similar get messed up because someone forgot a space.

I don't think TOML is a perfect format and agree it's not great at hierarchies.


I wrote one of the Go implementations [0] when TOML was announced and have maintained it since.

As a library implementor, I wish arrays would hold only one type at a time, but I get that could be useful for users. But as a user, I wish tables were fully defined once (more can't be added up later in the file), especially when using larger files.

[0]: https://github.com/pelletier/go-toml


I really like the way the author writes.

The article is great, and so are a few other links. Particularly enjoyed this[0] on the same site - The Norway Problem, which discusses parsing challenges in YAML.

[0] https://hitchdev.com/strictyaml/why/implicit-typing-removed/


The Norway problem is an example of trying to fix a problem that was never there.

If we kept strings double quoted as they were forever it wouldn't have been a problem.

That's the easiest way.

* TRUE / True / true & FALSE / False / false = parse as boolean

* "something quoted" = parse as string

* ...

Same stuff happened with the first version of Angular. For some reason they parsed "no" as false, even though JavaScript coerces it to true.
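
The rule above can be sketched as a toy resolver. This is not taken from any YAML library: `resolve` is hypothetical, the "loose" word lists only approximate YAML 1.1-style implicit booleans, and the strict mode assumes that every string must be quoted (numbers and other types are left out).

```python
def _cases(word: str) -> set[str]:
    """The three spellings accepted for a keyword: lower, Capitalized, UPPER."""
    return {word.lower(), word.capitalize(), word.upper()}

LOOSE_TRUE = set().union(*map(_cases, ["y", "yes", "on", "true"]))
LOOSE_FALSE = set().union(*map(_cases, ["n", "no", "off", "false"]))
STRICT_TRUE = _cases("true")
STRICT_FALSE = _cases("false")

def resolve(token: str, strict: bool):
    """Resolve one scalar token to a bool or a str (hypothetical helper)."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]  # quoted: always a string
    if strict:
        true_set, false_set = STRICT_TRUE, STRICT_FALSE
    else:
        true_set, false_set = LOOSE_TRUE, LOOSE_FALSE
    if token in true_set:
        return True
    if token in false_set:
        return False
    if strict:
        # Assumption: under the strict rule, bare strings are not allowed.
        raise ValueError(f"unquoted string: {token!r}")
    return token  # loose mode falls back to a plain string

print(resolve("NO", strict=False))   # the Norway problem: a country code becomes a bool
print(resolve('"NO"', strict=True))  # quoting keeps it a string
```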


So replace TOML with YAML? Or strictYAML?

That's like having an engine problem and replacing engine with Shuttle Thruster. Or Space X rocket engine.

YAML imo will never beat JSON. The spec is so complex and full of edge cases that it will never reach simplicity and speed of JSON.

I implemented a spec compliant parser and it gets around 20 MB/s. In JSON I can get to at least five times that.


YAML is mainly for configuration files. JSON sucks at being configuration file format, it is mainly used for serialization in HTTP requests. Those 2 have different use cases.


I've been using it for Sublime and some variant for VSCode. It's pretty nice, but it's missing comments.


Lack of comments is a showstopper for config file format. JSON also adds some significant amount of bloat due to the double quotes.


> JSON also adds some significant amount of bloat due to the double quotes.

As opposed to YAML's double quotes (and single quotes, and unquoted, and folded, and literals) or you mean Toml's double quotes (and single quotes and multi-line quote).

Quotes aren't a problem. Many different types are.


> YAML imo will never beat JSON. The spec is so complex and full of edge cases that it will never reach simplicity and speed of JSON.

Ironically, all JSON documents are valid YAML. YAML is actually a superset of JSON.


Ironically? The fact it's a superset means it has more states and can't be as optimized as JSON.

To compound the misery YAML is whitespace indented with some extra bonkers rules. So even your JSON subset in YAML needs to obey it.

Mix of flow and block makes both modes worse. However it also makes a lot of sense.


I also wasn't happy with TOML as an hierarchical ini alternative, so I made my own config format https://shoal.eelnet.org

Don't judge me, as I don't use it anywhere besides personal projects)


I'm just another datapoint but I like TOML way more - even in the complicated example

I can look at any TOML file and I can see exactly which incredibly nested value I am looking at.

Is my configuration file twice as large as an equivalent YAML? Great, I'll pay 0.000001 more on R2 but I'll retain the ability to easily understand what nightmare configuration I am looking at

I agree with you the "3.14" vs 3.14 difference is not easy on some users, albeit that could be fixed in the business logic. Casting to int is not the end of the world.

I also hate indentation based anything (I hate Python with a passion - especially now when I'm forced to use it because AI people are fond of it)


My patience for these config languages ran out a while ago. JSON is readable enough for simpler configs. Instead of inventing a new DSL for advanced configs, a simple .js or .py file can generate the JSON dynamically in a way that any programmer should be able to follow, which is common in the JS world. Text protobufs are also a decent alternative if you want something more than JSON but not dynamic. Heck, even an Excel spreadsheet (as mentioned elsewhere in the comments) can make sense for non-technical users. All of these options are easier to understand than some config-specific language with unexpected behaviors.
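
A minimal sketch of the "a .py file generates the JSON" approach. All setting names and environments below are hypothetical; the point is only that the application reads plain JSON while the variation lives in ordinary code.

```python
import json

# Hypothetical settings; none of these names come from a real project.
BASE = {"log_level": "info", "workers": 4}
OVERRIDES = {
    "dev": {"log_level": "debug"},
    "prod": {"workers": 16},
}

def build_config(env: str) -> dict:
    """Merge the shared defaults with one environment's overrides."""
    return {**BASE, **OVERRIDES[env], "env": env}

if __name__ == "__main__":
    # The application only ever reads the plain JSON printed here.
    print(json.dumps(build_config("prod"), indent=2, sort_keys=True))
```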


I use TOML for my cargo files quite a bit and to be frank about it, I've never even considered using TOML for anything else. With that said, I'm using YAML for kubernetes and I couldn't imagine TOML being any more obnoxious.


> One way it does this is by trying to include date and time parsing which imports all of the inherent complications associated with dates and times.

IDK, having date/time as first class seems very good. It's so common.


Probably a hot take, but I'd like an extended JSON syntax where you can do this:

    foo = "something to use later",
    bar = { "a": 1.1, "b": 2.2 },
    baz = [ 1n, 999999999999n, ],
    /* comments! */
    {
        [foo]: `\
    multiline template string
    ${foo}`,
        ...bar, /* spread! */
        "baz": [ ...baz, 1234n ],
        "more": my_extension("something like !foo in YAML, but just JavaScript function syntax")
    }


I'm confused. Apart from the question of where you'd like to limit JS, fundamentally, with your mix of statements and an object literal, how would you even extract a key from this? E.g., how to retrieve that nameless object literal?

Maybe defining the format as one big comma expression [^1] and disallowing functions+eval would have its merit... but then, what to do about circular references?

Very creative though :)

[1] So that way, the whole config would evaluate to the expression behind the last top-level comma


JSON also has one nameless object literal. That literal is the payload. The assignments before are just things that can be referenced in the actual payload. Yes, it's like a big comma expression. The idea is that it is a syntactically correct JavaScript expression. It's just a thought. Obviously haven't written a spec. I just came up with that idea when writing complex YAML configurations for things like docker-compose, OpenAPI specs, and Ansible. I want to reuse the same strings (credentials) in multiple places without repeating myself, I want to do string interpolation, multiline strings, bigint, comments, and something that enables something like Ansible's `!vault`. I DON'T want certain YAML features like yes/no for booleans, timestamps, cryptic |- and >-, and other weird things one usually doesn't think of. And I saw someone who is blind writing that they very much prefer curly braced syntax over indentation based syntax, because indentation is very hard to reason over for them in their head, which is the whole reason that makes me side with non-indentation based syntax now. Given those constraints I ad-hoc came up with that JavaScript subset. Haven't thought about it much, maybe there are issues I haven't thought of.


Oh yes, and I'd like to have hex-float for easy exact floating point representation, but not if JavaScript doesn't support it because then it wouldn't be a JavaScript subset anymore.


Yes, I was also enjoying exploring the implications of the thought :)

Thanks for your elaboration


I assume the nameless object literal is, to steal a term from XML, the "root element", the same as with normal JSON.


Exactly.


How is this meaningfully different than using Javascript?


It doesn't run actual code. The function call syntax is just for extensions like `!vault` in Ansible YAML. And yes, it is supposed to be a new JavaScript subset and JSON superset.


I haven't had the same experience. Given that TOML is basically an INI file, I use it for data that's basically key-value with some categories. It seems fairly obvious, at least to me, that anything beyond that is going to be really ugly to represent.

If I want a key-value file (for some simple preferences), TOML is by far the best out there. For anything more than that, I use YAML (though I should probably reach for a simpler subset of it, as YAML is crazy).


I agree that TOML is frustrating when configuration gets large and complicated. But those issues seem like "As a Python user I don't like TOML because I like StrictYAML and TOML looks weird to my eyes".

Unless there are deeply nested large maps/dictionaries, TOML is fine.

I prefer UCL, though. IMO, the most sane and usable format for configuration.


I’ve quit using any format for projects because no matter how good they are, nothing can bring auto completion as no IDE will parse and understand an external file format. Instead, I’m using a PHP array, which gives me nice auto completion, and I see no problem with its format.

When the config file needs to be consumed by a third party, then I’d use TOML.


> Doctypes and namespaces are horrendous additions to the language, for instance.

I basically completely disagree with this, especially namespaces: the ability to combine elements and attributes from multiple document types without ambiguity is extremely powerful and pays dividends when designing a query language like XPath


> Having symbols delimiting blocks and indentation violates the DRY principle.

This is a very interesting hot take if I have ever heard one. I don’t want to agree, I don’t like significant whitespace, but I might have to agree.

I think by the same line of reasoning, spaces for indentation rather than tabs violates DRY as well though? I could be wrong.


The suggested replacement: StrictYAML is actually pretty excellent. I like it a lot.

Seems to solve a lot of the issues I have with regular JSON, YAML and TOML.

The only thing I wish is that there was a way to go directly between StrictYAML and dataclasses (or an easy way to generate a corresponding dataclass from the schema).


If JSON had a way from the beginning to add comments and types, everything would have been perfect


It's one of 12 "Why not use...?" subsections of that document but there's no subsection on "Why not use s-expressions?"

Perhaps that was because they didn't consider them or maybe they did consider them but could not think of any reasons not to use them?


S-expressions only really address a smallish part of what people seem to want out of configuration file syntaxes: a simple, recursive syntax.

Among the things that S-expressions don't address per se are the interpretation of tokens (e.g., numeric vs non-numeric token syntax), how non-numeric tokens may be interpreted as booleans, symbols, timestamps, etc.; whether to use alists vs. plists for associations and the semantics of any duplicate keys within an associative construct; how to specify the schema for a configuration object (required vs. optional elements & the types of each, etc.)

IOW, merely saying "use S-expressions" over-emphasizes syntax while under-emphasizing semantics.


> there's no subsection on "Why not use s-expressions?"

It's all about config files. Are there people using s-exps in config files?


I use Emacs, so yeah.

I’d rather write some TOML/YAML/JSON for simple things where there are a handful of key/value pairs. In more complex cases where you want specific typing, or inheritance, or to solve any of the problems in this article, oh how I wish I could use something like a Guile parser to evaluate s-exprs. It would make life so much simpler.


Emacs and Guix are standout examples.

You could also include anything built on Clojure EDN.

And, in the same way as Elisp for Emacs, Guile is an s-expression based language used not only for configuration but also for extension of GnuCash, GIMP, LilyPond and Pidgin.


If you think this format is bad, you probably have never had to work with a fluentd config file.


Agree with the author. I have used TOML when configuring Netlify and it's super awkward.


Perhaps there is kind of a way to improve JSON without changing JSON: use the gron format [0] as config.

Tools can parse it as JSON and that’s it.

[0]: https://github.com/tomnomnom/gron
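
For a feel of what that looks like, here is a rough Python sketch that flattens parsed JSON into gron-style assignment lines. It only approximates the format: the real tool's key ordering and escaping may differ, and `flatten` is a hypothetical helper, not part of gron.

```python
import json
import re

IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

def flatten(value, path="json"):
    """Yield gron-style assignment lines for a parsed JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            # Plain identifiers use dot access; anything else is bracket-quoted.
            step = f".{key}" if IDENT.fullmatch(key) else f"[{json.dumps(key)}]"
            yield from flatten(child, path + step)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars are written as JSON literals.
        yield f"{path} = {json.dumps(value)};"

doc = json.loads('{"server": {"port": 8080, "tags": ["a", "b"]}, "debug": false}')
lines = list(flatten(doc))
print("\n".join(lines))
```

Each line carries its full path, so a single setting can be grepped, diffed, or commented on without the surrounding brackets.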


I absolutely loathe YAML and avoid using it. I use TOML for user edited configurations and I use JSON for things that a user doesn't need to touch, because it's human readable in name only. I like the type enforcement.


The more applications I write, the more I tend to avoid typical config files in favor of using direnv [0] to set environment variables.

[0] https://direnv.net/


Environment variables are awful for config. You get no validation at all - typo in your variable name? It'll just silently ignore it. They can be injected by any parent process and are invisible to all child processes. Also they can be accessed by a simple getenv() anywhere in the program which eventually leads to your config being undocumented and spread all through your code.

Environment variables are generally a hack and should be avoided where possible.
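
A sketch of the typo problem and one possible mitigation, assuming a made-up `MYAPP_` prefix and a hypothetical `load_config` helper. A plain lookup silently falls back to the default, while routing all reads through one loader at least catches misspelled names.

```python
# Hypothetical variable names and defaults, chosen for illustration.
KNOWN = {"MYAPP_PORT": "8080", "MYAPP_LOG_LEVEL": "info"}

def load_config(environ) -> dict:
    """Read MYAPP_* settings, rejecting names that are not declared in KNOWN."""
    unknown = sorted(
        name for name in environ
        if name.startswith("MYAPP_") and name not in KNOWN
    )
    if unknown:
        raise ValueError(f"unknown settings: {', '.join(unknown)}")
    return {name: environ.get(name, default) for name, default in KNOWN.items()}

# A typo ("PROT" for "PORT") is silently ignored by a plain lookup...
env = {"MYAPP_PROT": "9090"}
print(env.get("MYAPP_PORT", "8080"))

# ...but is caught when all reads go through one validated loader.
try:
    load_config(env)
except ValueError as exc:
    print(exc)
```

In a real program the argument would be `os.environ` rather than a literal dict.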


I use https://pkg.go.dev/github.com/urfave/cli/v3 in my Go programs, and it does enough validation that this has never been a problem in practice, especially when paired with a simple config scheme (see my other comment).


That's fine as long as there aren't too many variants of configurations. For example, I had a case where multiple ML training runs had to be configured with slight variations in parameters. That too, on the cloud. It can be managed with environment variables - but it becomes messy soon. With TOML, it was just a matter of switching those files around. There are better solutions these days for this particular use-case. However, it should give an idea where environment variables are not a better choice than TOML files.


Agreed, wholeheartedly in fact. My (poorly articulated) point is that the solution is generally to be found in simplifying configuration, rather than in the configuration language itself. This is something that microservice architectures are actually pretty good at doing.


Cool article! I can see why TOML's verboseness can definitely be an issue once you start working on larger and larger TOML files.

I'm assuming StrictYAML is your favourite out of all the different config files you could have currently?


A safe assumption. The author wrote/maintains StrictYAML.


Not really related to the article, but I always thought it was kind of quaint that TOML is "Tom's Obvious Minimal Language" (made by Tom Preston-Werner, one of the founders of GitHub).


Those examples are nuts. Who would ever write a config like this?


Compared to JSON, TOML is more complicated.

That means it's harder to learn, that it will be supported by fewer tools, and that the tools that do support it will have more bugs.


While more verbose, XML is pretty clear. In what environment(s) are people messing with config files so much that the format is a productivity issue?


> Having symbols delimiting blocks and indentation violates the DRY principle.

This has to be one of the worst takes I ever heard.


Agree with author. Use yamllint with YAML to avoid commonly complained about problems.


These seem like major nitpicks


Sad to find out it's not about Tales of Maj' Eyal.


If JSON had just originally had comments there wouldn't be such a proliferation of redundant configuration languages, since JSON serves 99% of cases perfectly well.


Comments, and allowing trailing commas after the last element of an array/dictionary.


Comments, trailing commas, and multiline strings.

Comments, trailing commas, multiline strings, and a real date type.

Comments, trailing commas, multiline strings, a real date type, and dropping the quotes on keys.

Comments, trailing commas, ...

JSON really isn't just the "simple tweak" away from being the perfect config language it is often presented as. There's this fun sort of error that only a group of people can make, where someone can stand up and make a statement like "JSON just needs a couple of tweaks to be the perfect config format!" and everyone can individually nod in agreement, and the group thinks it is in agreement. But it turns out every individual actually had a completely different interpretation of the statement and they don't agree at all. When you get a group of developers together to be clear about the changes to JSON you will discover not everybody has the same tweaks in mind.

(& I wrote this before seeing the sibling comment from enriquto who gives another thing necessary for perfection. I don't even 100% know what enriquto means by "standard" floating point numbers versus what is already in the spec, but I bet it's another thing that multiple people could nod in agreement to but in fact mean something quite different about if you nail them down to the spec level!)


> Comments, trailing commas, and multiline strings.

IMO, that would make it "good enough" for most use cases.

Having the application parse a date from a string isn't too bad. And dropping quotes on keys is just a nicety. And really, trailing commas aren't strictly necessary either.

But comments are absolutely necessary for a human readable configuration format. And multi-line strings are critical for any use case where you have strings that could be long. Like, say, a "description" field.

As a sidenote, the absence of multi-line strings is a major frustration I have with writing JSON schemas in JSON.
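To make the "parse a date from a string" and multi-line points concrete, here is a minimal sketch using only Python's standard library. The key names `description` and `released_on` are made up for illustration; the point is that the application does the date conversion itself and that long text has to be squeezed into one line with `\n` escapes.

```python
import json
from datetime import date

# Hypothetical config: JSON has no date type and no multi-line strings,
# so the date is a plain string and the line break is an escape sequence.
raw = '{"description": "first line\\nsecond line", "released_on": "2023-09-13"}'

cfg = json.loads(raw)

# The application converts the string into a real date after parsing.
cfg["released_on"] = date.fromisoformat(cfg["released_on"])

# The escaped newline becomes a real one, but the source stays on one line.
description_lines = cfg["description"].splitlines()
```

The date half is one extra line in the application. The string half is the annoying one, because the cost lands on whoever edits the file by hand.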


"Perfect config language" is too high a standard.

JSON could be much better, and good enough for far more purposes, with the top 3 tweaks there.


To be fair to JSON, this is why there are no config formats that are good enough for everyone. They either try to meet all of these requirements and people complain they're too complex, or they choose a subset of the requirements and only satisfy a subset of people.


Strong agree.

I think it's not entirely unlike programming languages. If I sit down and write out all the features I want from a programming language, I can't have it. They straight-up contradict each other. As a simple example, I want simplicity and a rich type system. Nope. Not gonna happen. Intrinsically at odds with each other. And there are many other such places, more subtle than that but no less real.

Config has the same thing going on. I've made my peace with just doing whatever's convenient in the moment and not stressing about it. It certainly isn't worth bringing in a new syntax my local team has never heard about because it's just the perfect little config syntax. It's very hard for any config syntax to be better enough than what came before to justify adding another tech to the stack.


It's not fair to JSON, and to see why you just need to put some numbers on "nothing is great for everyone". A huge percentage of users need comments, so lacking them is a huge downside of JSON. It doesn't matter that there is no perfect non-complex config format with floats (or whatever), since that would be too complicated.


I don't really think that any of those are as important as comments, though. Comments are indispensable for a self-documenting config file. Trailing commas and so forth are merely convenient.

JSON with comments wouldn't be perfect, but it would be unacceptable a lot less often.


Support for trailing commas is just the sane thing to do, especially since JSON is inspired by JavaScript, which already supports them. So do Python, Java, C++ and perhaps another half dozen popular programming languages. Why the hell go the opposite way?

I'm not sure who made the decision at the time, but looking back at the hours we have collectively wasted because of this non-feature, I believe they would have reconsidered.
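A small sketch of the mismatch, assuming Python's standard `json` and `ast` modules as the two parsers: the same text is a valid Python list literal but is rejected as JSON.

```python
import ast
import json

text = "[1, 2, 3,]"

# Python's own literal syntax tolerates the trailing comma.
as_python = ast.literal_eval(text)

# The standard JSON parser rejects the very same text.
try:
    json.loads(text)
    json_accepts = True
except json.JSONDecodeError:
    json_accepts = False
```

This is where the wasted time comes from: reorder or delete the last element of a list and the file stops parsing.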


I'd also like it if people would stop enhancing their own JSON parser/writer to support these ad hoc while still calling it JSON. It's nice to have NaN/Inf support until someone eventually tries to use it with another library that doesn't.
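Python's standard `json` module is a handy illustration of the NaN case. By default it writes and reads the bare token `NaN`, which is not part of the JSON grammar. The sketch below shows the lenient default next to a strict writer and a strict reader; `reject_constant` is a hypothetical helper name.

```python
import json
import math

# Default behaviour: the writer emits a bare NaN token.
encoded = json.dumps({"ratio": float("nan")})

# The same module reads it back without complaint.
decoded = json.loads(encoded)

# Strict writer: refuse to emit non-standard tokens.
try:
    json.dumps({"ratio": float("nan")}, allow_nan=False)
    strict_writer_rejects = False
except ValueError:
    strict_writer_rejects = True

# Strict reader: parse_constant is called for NaN, Infinity and -Infinity.
def reject_constant(name):
    raise ValueError(f"non-standard JSON constant: {name}")

try:
    json.loads(encoded, parse_constant=reject_constant)
    strict_reader_rejects = False
except ValueError:
    strict_reader_rejects = True
```

Everything works as long as both ends are lenient in the same way. The output breaks as soon as it meets a parser that follows the spec.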


And standard floating point numbers.


Multiple bases would be better than standardizing on how to represent floating point numbers, since floating point still results in errors with unrepresentable values (e.g. 1/3 in base 10).


Base 10 is sufficient. Any base 2 number can be represented exactly in base 10. If you need rational numbers, you need something other than floating point.
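A quick way to check this with Python's standard `decimal` and `fractions` modules: every finite binary float is an integer divided by a power of two, and since 10 is a multiple of 2, that always has a finite decimal expansion. The reverse fails for 1/3, which has no finite expansion in either base.

```python
from decimal import Decimal
from fractions import Fraction

x = 0.1  # stored as the nearest binary double, not exactly one tenth

# Decimal(float) converts the stored binary value exactly, without rounding.
exact_decimal = Decimal(x)

# The same value as a ratio: the denominator is a power of two.
ratio = Fraction(x)
denominator_is_power_of_two = ratio.denominator & (ratio.denominator - 1) == 0

# The decimal expansion and the binary value are the same number.
round_trips = Fraction(exact_decimal) == ratio

# 1/3 is a different story: decimal arithmetic has to round it.
third = Decimal(1) / Decimal(3)
third_is_exact = third * 3 == Decimal(1)
```

So a decimal notation can carry any binary float losslessly, as long as the writer emits enough digits. Rational values like 1/3 need a different type altogether.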



I was just thinking this morning how JSON is a symptom of people not learning formal grammar et al. as part of their education and training. Maybe it's just because I cut my teeth on Wirth languages, but JSON seems like net negative productivity.


JSON was designed as a wire format, not as a configuration file. The original idea was to validate it with a regex to exclude malicious input and then eval() it. That explains why it's such a small subset.

TOML on the other hand is also a clear indication that Tom did not like using (or hadn't learned about) a formal grammar.


JSON wasn't designed? It's just the literal notation from JS extracted, no?

And I don't think you can validate JS with a regex? (Have I forgotten that much of formal language theory?)
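For what it's worth, one reading of the parent's claim doesn't need a regex to recognize the grammar at all. The regex only has to act as a character filter: strip the string literals, then check that nothing is left except JSON punctuation, digits, and the letters of `true`, `false` and `null`. Below is a sketch of that idea in Python. The names `STRING_LITERAL`, `UNSAFE_CHAR` and `looks_safe` are made up, and the whitelist is an illustration of the approach, not a quotation of any spec.

```python
import json
import re

# Matches a double-quoted string literal, including backslash escapes.
STRING_LITERAL = re.compile(r'"(\\.|[^"\\])*"')

# Matches any character outside JSON punctuation, digits, whitespace,
# and the letters needed to spell true, false, null and exponents.
UNSAFE_CHAR = re.compile(r'[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]')

def looks_safe(text: str) -> bool:
    """Character filter only: this does not check JSON grammar."""
    stripped = STRING_LITERAL.sub("", text)
    return UNSAFE_CHAR.search(stripped) is None

ok = looks_safe('{"a": [1, 2.5e3, true, null]}')
call_blocked = looks_safe("alert(1)")   # parentheses are not whitelisted

# The filter passes text that is not JSON, which is why it is not a validator.
filter_passes_garbage = looks_safe("fallen")
try:
    json.loads("fallen")
    parser_accepts_garbage = True
except json.JSONDecodeError:
    parser_accepts_garbage = False
```

So the formal language theory holds: a regex can't validate nested structure. The filter only rules out function calls and identifiers, and the real parsing is left to whatever consumes the text afterwards.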


The proliferation of configuration languages is far older than JSON.

There's just a couple inherent problems with the notion of a universal configuration language...

First, not all software has the same type of configuration requirements. I don't mean different schemas, I mean some measure of "configurability". You have things like nginx or varnish that are configurable enough that configs are almost (or actually) programs. You also have programs where you need to store a few KVs that will be inserted into strings at some point and that's it.

That wide range of "amount of possible configuration" is hard to capture easily in a single configuration language.

Next you have a wide range of "flatness": whether the configuration needs to be a list of KVs (optionally with sections), a tree or graph shape, or something weird.

Finally there's the whole preference and "style" thing. Writing a config language has an appeal to it - it's a fairly straight-forward thing to do (at least seemingly, until you hit the edge cases), and it's infinitely bikeshedable. In a similar vein, it seems to devolve into this sort of thing quite often: https://xkcd.com/927/

This is one of those situations where it's nice to be "old" - I like that I only have to know a handful of config formats these days, rather than keeping track of each program's bespoke config format.



