The advice to "use CFGs" is incomplete. What you really want is to use one of the deterministic subsets of CFGs, like LR(k), and importantly, with a parser-generator that will reject any non-LR productions in the grammar before even generating the parser. This way, you get an ahead-of-time guarantee that there are no ambiguities in the language you've defined.
You might consider this to be analogous to the difference between static and dynamic typing. A statically typed language will reject an erroneously typed program before attempting to run it, but the dynamically typed language may accept it and attempt to run it, only to fail at runtime if there's a type mismatch.
Parser combinators are more like the dynamic language. You find out at runtime (when parsing) whether the text you've parsed is ambiguous: if it is, more than one parse result is returned. In normal operation the combinator returns a single result, meaning there is no ambiguity in that input, but the parser combinator provides no guarantee about the absence of ambiguity in the language itself - it only shows there's no ambiguity for the given input. And there's no practical way to test that the defined language is unambiguous over all inputs, because there are infinitely many possible inputs.
LL/LR parser generators are like a type system for grammars. They'll take your input grammar, check that all rules adhere to the constraints of LL/LR, and iff that's the case, they'll produce a valid parser for a provably unambiguous language.
PEGs are also provably unambiguous, but for a given input the "correct parse" is not necessarily the "desired parse". Essentially, when you're using a PEG, the language is defined by its grammar. When you use an LL/LR parser-generator, you're using a set of constraints to formalize a grammar for a language you define yourself, and the tool will correct you if you've made a mistake and accidentally introduced ambiguity, whereas a PEG will silently mask that issue by just picking the first option. Rather than prompting you to fix the language to match your intuition about its syntax, the "accidental ambiguity" becomes a non-ambiguity and part of the language definition. Also, when extending an existing language with new syntax, PEGs can produce unexpected results which may break existing code written in the language, or interpret it differently from what its author intended.
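Here's the same "masking" behaviour in a tiny PEG-style sketch (again, hypothetical code, not a real library): ordered choice commits to the first alternative that succeeds, so a would-be ambiguity silently becomes part of the language definition.

```python
def lit(s):
    # Match a literal; return (value, rest) or None on failure.
    def parse(text):
        return (s, text[len(s):]) if text.startswith(s) else None
    return parse

def ordered(p, q):
    # PEG '/' operator: commit to p if it succeeds; only try q if p fails.
    def parse(text):
        return p(text) or q(text)
    return parse

def seq(p, q):
    # Sequence two parsers; fail if either side fails.
    def parse(text):
        r1 = p(text)
        if r1 is None:
            return None
        v1, rest1 = r1
        r2 = q(rest1)
        if r2 is None:
            return None
        v2, rest2 = r2
        return (v1, v2), rest2
    return parse

# Same grammar as before, but with ordered choice: on "aab" the second
# alternative is never even considered, so no ambiguity is ever reported.
s = ordered(seq(lit("a"), lit("ab")), seq(lit("aa"), lit("b")))
result = s("aab")
```

The second alternative isn't "wrong" here; it's simply dead on this input, and nothing warns you about it.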
However, when you're parsing a language that somebody else has already defined, it doesn't really matter which method you use to write the parser, as an incorrect parse is a bug in your code, and not the language itself.
For greenfield languages, I'm largely with the article's author that LL/LR are the correct constructions to use - they assist you in designing a language that is free of ambiguity from the start because incorrect assumptions about your syntax are flagged by the parser-generator. A greenfield language designed with a PEG becomes a language that is defined by the PEG, and any potential mistakes in your intuition become a set-in-stone part of the language.
As for the choice between LL and LR, it's pretty much a no-brainer to use LR (or LALR) to give yourself greater flexibility. LL parsers had their place on the resource-constrained systems of the past, but on today's systems LR parsing uses a trivial amount of resources and will account for only a fraction of the compiler or interpreter's memory and CPU time. If you can fit your language into an LL grammar you certainly should, as it will be blazingly fast to parse, but I don't think you should constrain yourself just because of this.
Recommendation: Use Menhir. It supports writing production rules which are parametrized by other production rules. This can "feel" a bit like parser combinators to write, but it's all ultimately lowered to LR. The parsers it produces are fast, and you can use the tool for incremental parsing.
> The advice to "use CFGs" is incomplete. What you really want is to use one of the deterministic subsets of CFGs, like LR(k), and importantly, with a parser-generator which will reject any non-LR productions in the grammar before even generating the parser.
Which is also why there is no "just use CFG". CFG is an abstraction from language generation. BNF as a "language" was invented for language generation. Its suitability for parsing was mostly an accident, and we've been dealing with the pragmatic repercussions in the field of parsing ever since. To make parsing work we have bounded subsets of the abstraction. To keep up the illusion of the abstraction we've bent BNF to nearly its breaking point, but the illusion fails as soon as you realize the BNF for LL and for LR (and LALR, and so on) are subtly incompatible. There's no universal BNF; there never was. It's a broken illusion.
PEG is just as useful a bounded subset of the abstraction as LL/LR. The break was that PEG stopped pretending BNF was a useful lingua franca between these subsets, and parser combinators then contributed further to the demise of "universal BNF" - which was never universal to begin with.
There's no requirement for parser combinators to allow ambiguity. It's perfectly fine, and possible to write parser combinators that will only parse an unambiguous language, and only ever return a single unambiguous parse or throw an error.
You're right it's often easy to use them in ways that introduce ambiguity, and some explicitly support ambiguity, though.
Yes, but I've never actually seen this done in practice. For starters, it more or less requires a multi-stage compilation process, because you must verify and prove non-ambiguity before or during compilation of the application using the parser. In practice people just use their parser combinators in their application's code without any prior verification stage. To do this without a multi-stage approach would require somehow embedding the semantics of LL/LR into the host language's type system. I suppose alternatively you could verify the parser on application startup and terminate with an error if it's not valid LL/LR.
PEGs are a better fit for parser combinators because the host language typically has a pattern-matching construct which checks patterns from top to bottom, and this matches the semantics of PEG's ordered alternation.
If I write a parser-combinator-based parser for a grammar that is not ambiguous, I will write it to fail if it comes across an error, just as I would if I were to e.g. hand-write a recursive descent parser. In fact, in most places where I use parser combinators, I use them to build parsers structured exactly the way I would structure a recursive descent parser, just with reusable combinators instead of manually written functions, and - just as I would wherever possible in a recursive descent parser - I aim to bail at the first symbol that causes an error, or optionally jump to an error recovery function.
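That style can be sketched like this (a minimal illustration, not any real combinator library): each combinator returns exactly one result or raises immediately, so the parser behaves like a hand-written recursive descent parser.

```python
class ParseError(Exception):
    """Raised at the first symbol that cannot be parsed."""

def lit(s):
    # Match a literal at the current position or fail immediately.
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        raise ParseError(f"expected {s!r} at position {pos}")
    return parse

def seq(*parsers):
    # Run parsers in order, threading the position through.
    def parse(text, pos):
        values = []
        for p in parsers:
            v, pos = p(text, pos)
            values.append(v)
        return values, pos
    return parse

def first(*parsers):
    # Deterministic choice: commit to the first branch that applies.
    def parse(text, pos):
        for p in parsers:
            try:
                return p(text, pos)
            except ParseError:
                continue
        raise ParseError(f"no alternative at position {pos}")
    return parse

# stmt -> ("let" | "var") " x" -- one result or an immediate error,
# never a set of alternative parses.
stmt = seq(first(lit("let"), lit("var")), lit(" x"))
```

The structure mirrors a recursive descent parser function-for-function; the combinators just make the pieces reusable.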
I've never in my life written a parser-combinator-based parser that required a multi-stage compilation process. Verifying that the parser recognises the language it is meant to is part of the test suite, whichever parsing method you use.
> I suppose alternatively you could verify the parser on application startup and terminate with an error if it's not valid LL/LR.
Why would I run my test-suite on application startup? The grammar is not magically changing between runs.
> If I write a parser combinator based parser for a grammar that is not ambiguous
How do you even know it is not ambiguous to begin with?
I already said it doesn't matter what tool you use to parse a language designed by someone else. The language you design yourself is not automatically free of ambiguity, and intuition only gets you so far. If you stick to only using productions which are valid LL/LR, then why not have that enforced by the tool, so it can warn you when you go astray?
> How do you even know it is not ambiguous to begin with?
By knowing what makes a grammar ambiguous, and not doing it, and having a test suite that I need anyway.
> If you stick to only using productions which are valid LL/LR, then why not have that enforced by the tool, so it can warn you when you go astray?
A large part of the reason there are so many parser generators is that, to date, we don't have a parser generator design that produces parsers most people like to use for major projects. I've written about 5 generations of parser generators myself over the last 30 years, and I hate every one of them, yet each was written in response to not finding a parser generator I considered tolerable to work with.
Most of the time it's easier to work with a hand-rolled parser. Parser combinators makes that easier, and more reusable.