The advice to "use CFGs" is incomplete. What you really want is to use one of the deterministic subsets of CFGs, like LR(k), and importantly, with a parser-generator that will reject any non-LR productions in the grammar before even generating the parser. This way, you get an ahead-of-time guarantee that there are no ambiguities in the language you've defined.
You might consider this to be analogous to the difference between static and dynamic typing. A statically typed language will reject an erroneously typed program before attempting to run it, but the dynamically typed language may accept it and attempt to run it, only to fail at runtime if there's a type mismatch.
Parser combinators are more like the dynamic language. You find out at runtime (when parsing) whether the text you've parsed is ambiguous: if it is, more than one parse result is returned. In normal operation the combinator returns a single result, meaning there is no ambiguity in that input, but the parser combinator provides no guarantee about the absence of ambiguity in the language itself - it only shows there's no ambiguity for the given input. And there's no practical way to test that the defined language is unambiguous over all inputs, because there are infinitely many possible inputs.
LL/LR parser generators are like a type system for grammars. They'll take your input grammar, check that all rules adhere to the constraints of LL/LR, and iff that's the case, they'll produce a valid parser for a provably unambiguous language.
PEGs are also provably unambiguous, but for a given input the "correct parse" is not necessarily the "desired parse". Essentially, when you're using a PEG, the language is defined by its grammar. When you use an LL/LR parser-generator, you're using a set of constraints to formalize a grammar for a language you define yourself, and the tool will correct you if you've made a mistake and accidentally introduced ambiguity, whereas a PEG will silently mask that issue by just picking the first option. Rather than prompting you to fix the language to match your intuition about its syntax, the "accidental ambiguity" becomes a non-ambiguity and part of the language definition. Also, when extending an existing language with new syntax, PEGs can produce unexpected results which may break existing code written in the language, or interpret it differently from what its author intended.
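Here's the same "masking" behaviour in a tiny PEG-style sketch (again, hypothetical code, not a real library): ordered choice commits to the first alternative that succeeds, so a would-be ambiguity silently becomes part of the language definition.

```python
def lit(s):
    # Match a literal; return (value, rest) or None on failure.
    def parse(text):
        return (s, text[len(s):]) if text.startswith(s) else None
    return parse

def ordered(p, q):
    # PEG '/' operator: commit to p if it succeeds; only try q if p fails.
    def parse(text):
        return p(text) or q(text)
    return parse

def seq(p, q):
    # Sequence two parsers; fail if either side fails.
    def parse(text):
        r1 = p(text)
        if r1 is None:
            return None
        v1, rest1 = r1
        r2 = q(rest1)
        if r2 is None:
            return None
        v2, rest2 = r2
        return (v1, v2), rest2
    return parse

# Same grammar as before, but with ordered choice: on "aab" the second
# alternative is never even considered, so no ambiguity is ever reported.
s = ordered(seq(lit("a"), lit("ab")), seq(lit("aa"), lit("b")))
result = s("aab")
```

The second alternative isn't "wrong" here; it's simply dead on this input, and nothing warns you about it.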
However, when you're parsing a language that somebody else has already defined, it doesn't really matter which method you use to write the parser, as an incorrect parse is a bug in your code, and not the language itself.
For greenfield languages, I'm largely with the article's author that LL/LR are the correct constructions to use - they assist you in designing a language that is free of ambiguity from the start because incorrect assumptions about your syntax are flagged by the parser-generator. A greenfield language designed with a PEG becomes a language that is defined by the PEG, and any potential mistakes in your intuition become a set-in-stone part of the language.
As for the choice between LL and LR, it's pretty much a no-brainer to use LR (or LALR) to give yourself greater flexibility. LL parsers had their place on the resource-constrained systems of the past, but on today's systems LR parsing uses a trivial amount of resources and will account for only a fraction of the compiler or interpreter's memory and CPU time. If you can fit your language into an LL grammar you certainly should, as it will be blazingly fast to parse, but I don't think you should constrain yourself just because of this.
Recommendation: Use Menhir. It supports writing production rules which are parametrized by other production rules. This can "feel" a bit like parser combinators to write, but it's all ultimately lowered to LR. The parsers it produces are fast, and you can use the tool for incremental parsing.
> The advice to "use CFGs" is incomplete. What you really want is to use one of the deterministic subsets of CFGs, like LR(k), and importantly, with a parser-generator which will reject any non-LR productions in the grammar before even generating the parser.
Which is also why there is no "just use CFG". CFG is an abstraction from language generation. BNF as a "language" was invented for language generation. Its suitability for parsing was mostly an accident, and we've been dealing with the pragmatic repercussions in the field of parsing ever since. To make parsing work we have bounded subsets of the abstraction. To keep up the illusion of the abstraction we've bent BNF to nearly its breaking point, but the illusion fails as soon as you realize the BNF for LL and for LR (and LALR, and so on) are subtly incompatible. There's no universal BNF; there never was. It's a broken illusion.
PEG is just as useful a bounded subset of the abstraction as LL/LR. The break was that PEG stopped pretending BNF was a useful lingua franca between these subsets, and parser combinators then contributed further to the demise of "universal BNF" - which was never universal to begin with.
There's no requirement for parser combinators to allow ambiguity. It's perfectly fine, and possible to write parser combinators that will only parse an unambiguous language, and only ever return a single unambiguous parse or throw an error.
You're right it's often easy to use them in ways that introduce ambiguity, and some explicitly support ambiguity, though.
Yes, but I've never actually seen this done in practice. For starters, it more or less requires a multi-stage compilation process, because you must verify and prove non-ambiguity before or during compilation of the application using the parser. In practice people just use their parser combinators in their application's code without any prior verification stage. To do this without a multi-stage approach would require somehow embedding the semantics of LL/LR into the host language's type system. I suppose alternatively you could verify the parser on application startup and terminate with an error if it's not valid LL/LR.
PEGs are a better fit for parser combinators because the host language typically has a pattern-matching construct which checks patterns from top to bottom, and this matches the semantics of PEG's ordered alternation.
If I write a parser-combinator-based parser for a grammar that is not ambiguous, I will write it to fail if it comes across an error, just as I would if I were to e.g. hand-write a recursive descent parser. In fact, in most places where I use parser combinators, I use them to build parsers structured exactly the way I would structure a recursive descent parser, just with reusable combinators instead of manually written functions, and - just as I would wherever possible in a recursive descent parser - I aim to bail at the first symbol that causes an error, or optionally jump to an error recovery function.
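That style can be sketched like this (a minimal illustration, not any real combinator library): each combinator returns exactly one result or raises immediately, so the parser behaves like a hand-written recursive descent parser.

```python
class ParseError(Exception):
    """Raised at the first symbol that cannot be parsed."""

def lit(s):
    # Match a literal at the current position or fail immediately.
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        raise ParseError(f"expected {s!r} at position {pos}")
    return parse

def seq(*parsers):
    # Run parsers in order, threading the position through.
    def parse(text, pos):
        values = []
        for p in parsers:
            v, pos = p(text, pos)
            values.append(v)
        return values, pos
    return parse

def first(*parsers):
    # Deterministic choice: commit to the first branch that applies.
    def parse(text, pos):
        for p in parsers:
            try:
                return p(text, pos)
            except ParseError:
                continue
        raise ParseError(f"no alternative at position {pos}")
    return parse

# stmt -> ("let" | "var") " x" -- one result or an immediate error,
# never a set of alternative parses.
stmt = seq(first(lit("let"), lit("var")), lit(" x"))
```

The structure mirrors a recursive descent parser function-for-function; the combinators just make the pieces reusable.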
I've never in my life written a parser-combinator-based parser that required a multi-stage compilation process. Verifying that the parser recognises the language it is meant to is part of the test suite, whichever parsing method you use.
> I suppose alternatively you could verify the parser on application startup and terminate with an error if it's not valid LL/LR.
Why would I run my test-suite on application startup? The grammar is not magically changing between runs.
> If I write a parser combinator based parser for a grammar that is not ambiguous
How do you even know it is not ambiguous to begin with?
I already said it doesn't matter what tool you use to parse a language designed by someone else. The language you design yourself is not automatically free of ambiguity, and intuition only gets you so far. If you stick to only using productions which are valid LL/LR, then why not have that enforced by the tool, so it can warn you when you go astray?
> How do you even know it is not ambiguous to begin with?
By knowing what makes a grammar ambiguous, and not doing it, and having a test suite that I need anyway.
> If you stick to only using productions which are valid LL/LR, then why not have that enforced by the tool, so it can warn you when you go astray?
A large part of the reason there are so many parser generators is that, to date, we don't have a parser generator design that produces parsers most people like to use for major projects. I've written about 5 generations of parser generators myself over the last 30 years, and I hate every one of them, yet each was written in response to not finding a parser generator I considered tolerable to work with.
Most of the time it's easier to work with a hand-rolled parser. Parser combinators makes that easier, and more reusable.