
Does this fall under the category of a recursive descent parser?


No.

The other answer, that it's yes, is also correct.

In other words, the question is wrong.

Applicative parser combinators are an interface. The backing implementation is pretty open. It might be recursive descent. It might be the Earley algorithm. It might be a packrat parser.

The interesting thing about Applicative parser combinators is that the way they are composed is amenable to static analysis. That allows a wide variety of implementation strategies, all behind a simple uniform interface...

... So long as you ignore that there are still things left unspecified by the interface, especially related to backtracking behavior. (I really hate how much mindspace Parsec-like libraries have in Haskell. They have the worst backtracking behavior.)
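
For concreteness, the interface in question is just Functor/Applicative/Alternative. Here's a minimal sketch of one possible backing implementation (a classic list-of-successes parser; all names here are illustrative, not from any particular library):

    import Control.Applicative (Alternative (..))

    -- One possible backing implementation behind the applicative
    -- interface: each parser maps remaining input to every possible
    -- (result, leftover input) pair.
    newtype Parser a = Parser { runParser :: String -> [(a, String)] }

    instance Functor Parser where
      fmap f (Parser p) = Parser $ \s -> [ (f a, s') | (a, s') <- p s ]

    instance Applicative Parser where
      pure a = Parser $ \s -> [(a, s)]
      Parser pf <*> Parser pa =
        Parser $ \s -> [ (f a, s'') | (f, s') <- pf s, (a, s'') <- pa s' ]

    instance Alternative Parser where
      empty = Parser $ const []
      Parser p <|> Parser q = Parser $ \s -> p s ++ q s
Nothing about the interface forces this implementation; the same three instances could sit on top of recursive descent, Earley, or packrat machinery.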


> ... So long as you ignore that there are still things left unspecified by the interface, especially related to backtracking behavior. (I really hate how much mindspace Parsec-like libraries have in Haskell. They have the worst backtracking behavior.)

The one that auto-backtracks on Alternative is attoparsec (IIRC); parsec and megaparsec don't unless you use `try`. Is your problem with the attoparsec style or with all of them, and if the latter, what do you prefer instead?


Attoparsec and megaparsec don't backtrack. They hard-commit to choices and never reconsider them.

Let's say I want to match the strings "a" or "aa" followed by "ab". Let's just go ahead and write that the most trivially obviously correct way:

    thing = (string "a" <|> string "aa") <> string "ab"
Woo! See how easy it is? ... except that's utterly broken in any parsec-derived library. They don't backtrack. There's nothing in the world you can do to make them backtrack. The only thing you can do is refactor the structure of the grammar to reflect the parser's limitations.
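
Here's roughly what that refactoring looks like in megaparsec (a sketch: the `Parser` alias is mine, and `<>` assumes megaparsec's Semigroup lifting over parsers; use `liftA2 (<>)` otherwise):

    import Control.Applicative ((<|>))
    import Data.Void (Void)
    import Text.Megaparsec (Parsec, parseTest, try)
    import Text.Megaparsec.Char (string)

    type Parser = Parsec Void String

    -- The <|> has to be distributed over the <> by hand, and the
    -- longer branch wrapped in `try` so its consumed input is
    -- rewound when it fails partway through:
    thing :: Parser String
    thing = try (string "aa" <> string "ab") <|> (string "a" <> string "ab")
With this version, parseTest thing "aab" and parseTest thing "aaab" both succeed; the one-liner above accepts "aab" but rejects "aaab", because once string "a" succeeds the choice is committed.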

Yes, this example is artificial, but that makes it easy to understand. The broader issue of which this is one example is that you cannot apply distributive laws in parsec or its derivatives.

    (a <*> c) <|> (b <*> c)
is not interchangeable with

    (a <|> b) <*> c
On the other hand,

    (a <*> b) <|> (a <*> c)
is interchangeable with

    a <*> (b <|> c)
This might be good for efficiency, but it's not very good for making parsers pleasant to write.


> Attoparsec and megaparsec don't backtrack.

There are two notions of backtracking at play here. One is backtracking out of a failing first branch of <|> even though it has consumed some input. The megaparsec authors consider this a form of backtracking (see https://hackage.haskell.org/package/megaparsec-9.0.1/docs/Te...).
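
Concretely, the first notion looks like this (a sketch; the imports and `Parser` alias are mine):

    import Control.Applicative ((<|>))
    import Data.Void (Void)
    import Text.Megaparsec (Parsec, try)
    import Text.Megaparsec.Char (char)

    type Parser = Parsec Void String

    -- Both branches start with 'a'. The left branch consumes it
    -- before failing, so bare <|> never tries the right branch;
    -- `try` rewinds that consumption on failure.
    p1, p2 :: Parser Char
    p1 = (char 'a' *> char 'b') <|> (char 'a' *> char 'c')      -- rejects "ac"
    p2 = try (char 'a' *> char 'b') <|> (char 'a' *> char 'c')  -- accepts "ac"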

The other is backtracking from a failure in the second argument to <> or <*>.

I agree that none of the libraries being considered do the latter. But the effect on memory use wouldn't just be a slight inefficiency: it would mean keeping the entire input following a <|> in memory until the whole parse completes. Do you think this is the best way to do things in general, or just for specialized situations like parsing inputs of bounded size, such as config files?


Very, very few parsers need to handle multi-gigabyte inputs. For those, sure, focus on performance first. Your tools should always be suited to your problem.

None of that changes the fact that Parsec-like designs are especially obnoxious when it comes to backtracking. You need to be aware of their limitations and design the structure of your grammar around them.

As an idea for a design somewhere in between, maybe try taking inspiration from Prolog. Add a cut combinator for limiting the scope of backtracking explicitly. It still solves the size issues on large inputs, but it allows more freedom in refactoring as long as you don't cross one of those explicit boundaries.
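
Sketched out, the semantics might look something like the following. Everything here is hypothetical, a toy model of the idea rather than any existing library:

    import Control.Applicative (Alternative (..))

    -- A toy semantics for scoped cut: the Bool records whether a
    -- cut has fired during this piece of the parse.
    newtype P a = P { runP :: String -> (Bool, Maybe (a, String)) }

    instance Functor P where
      fmap f (P p) = P $ \s -> case p s of
        (c, Just (a, s')) -> (c, Just (f a, s'))
        (c, Nothing)      -> (c, Nothing)

    instance Applicative P where
      pure a = P $ \s -> (False, Just (a, s))
      P pf <*> P pa = P $ \s -> case pf s of
        (c1, Just (f, s')) -> case pa s' of
          (c2, Just (a, s'')) -> (c1 || c2, Just (f a, s''))
          (c2, Nothing)       -> (c1 || c2, Nothing)
        (c1, Nothing) -> (c1, Nothing)

    instance Alternative P where
      empty = P $ \_ -> (False, Nothing)
      P p <|> P q = P $ \s -> case p s of
        (False, Nothing) -> q s       -- plain failure: backtrack
        committed        -> committed -- success, or failure past a cut

    -- cut commits to the current branch; region bounds its effect,
    -- so alternatives outside the region still backtrack normally.
    cut :: P ()
    cut = P $ \s -> (True, Just ((), s))

    region :: P a -> P a
    region (P p) = P $ \s -> case p s of (_, r) -> (False, r)
So in something like region ((kw "if" *> cut *> stmt) <|> other), once the "if" keyword is seen the parser is committed to that branch, and an implementation could drop the input saved before the cut.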


Great explanation, thanks. Do you know of any Haskell parser combinator libraries that exhibit the properties you're looking for?


uu-parsinglib and Earley are interesting points in the space.

The former[1] is monadic rather than Applicative, but it has a lot of interesting properties related to error correction and producing online results. Sometimes it feels magical to watch the error correction at work, and the online nature makes it good for memory use if your parser is constructed to take advantage of it.

The latter[2] is strictly applicative, though the grammar lives inside an environment monad that makes shared productions explicit. Because the grammar portions are strictly applicative, it even supports running a grammar in reverse: enumerating the parser's possible outputs, each paired with an input sequence that would produce it. That shows off the introspection capabilities nicely, but sometimes you really just want to write a context-sensitive parser, and the applicative interface doesn't give you that power. (Nor does the underlying algorithm support it, so it's nice that the interface's expressiveness matches the algorithm's power.)
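
To make that concrete: the artificial "a"/"aa" example from upthread works as written with Earley, no left-factoring required (a sketch against the package's API from memory; the `word` helper is mine):

    import Control.Applicative ((<|>))
    import Text.Earley (Grammar, Prod, fullParses, parser, rule, token)

    -- The "a or aa, then ab" grammar, written the trivially obvious
    -- way; the Earley algorithm explores both alternatives, so no
    -- left-factoring or `try` is needed.
    thing :: Grammar r (Prod r String Char String)
    thing = rule $ (<>) <$> (word "a" <|> word "aa") <*> word "ab"
      where word = traverse token

    -- fst (fullParses (parser thing) "aab")  == ["aab"]
    -- fst (fullParses (parser thing) "aaab") == ["aaab"]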

[1] https://hackage.haskell.org/package/uu-parsinglib

[2] https://hackage.haskell.org/package/Earley


Came here to say this (that parsec, the old standard, didn't backtrack).


And yet all the implementations are actually just functional recursion, in practice, no?


Yes, but that isn't saying much. It's basically the same as saying "they're all assembly instructions when they're sent to the CPU".

Functional recursion is just looping. "Recursive Descent" is a very specific parsing algorithm.
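
For contrast, here's the shape of an actual hand-written recursive descent parser, as a toy sketch: one function per nonterminal, each directly calling the others and returning the leftover input:

    -- Recursive descent over the toy grammar  expr ::= 'a' expr | 'b'
    -- Each nonterminal is a function; on success it returns the
    -- input remaining after the match.
    expr :: String -> Maybe String
    expr ('a' : rest) = expr rest    -- expr ::= 'a' expr
    expr ('b' : rest) = Just rest    -- expr ::= 'b'
    expr _            = Nothing
A combinator library's backing implementation may or may not bottom out in something like this, which is exactly the distinction being drawn.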


Then you'd also claim that a parser generator's grammar could be implemented with functional recursion. So what is the value of the original claim?


The original question was about "recursive descent".

You keep talking about "functional recursion".

These are not the same words, nor do they mean the same thing. I do not understand your underlying question, or what point you might be trying to make if the question is intended to be rhetorical.


Yes



