The greatest problem with regexp is neither its complexity nor its learning curve: it's its apparent complexity and learning curve. That tends to put learners off, but everyone I know who's persisted has found the journey faster than expected. Once they're over it, everyone I've talked to thinks pretty highly of the syntax: it's simple (a small set of composable components) and expressive, especially if you use named groups.
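For instance, named groups let a pattern label what it captures; a minimal Python sketch (the log-line pattern here is invented):

    import re

    # An invented log-line pattern; the named groups label each field.
    LOG = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<msg>.*)")
    m = LOG.match("2024-05-01 ERROR disk full")
    print(m.group("level"), "/", m.group("msg"))   # ERROR / disk full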
The second greatest problem with regexp is learning query optimisation: that is genuinely complex, but this new syntax doesn't even attempt to solve it.
Eh, I can manage to write a regexp with a little effort, but reading them has never stopped being painful. So in my anecdotal, sample-size-one experience, it leans heavily towards being a write-only language.
You can of course disqualify me by saying I can't possibly have persisted at learning it if I don't like it. But I am inevitably "the guy who knows regular expressions" in the places I've worked. I do use them, for stuff like fixing tedious one-off data-munging jobs without having to write an actual program. But I
* Do it step by step, to avoid writing anything resembling a complex regex, because I've yet to meet anyone who can write a complex regexp without bugs. (A sketch of what I mean follows.)
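Something like this, where no single piece ever gets complex (a hypothetical Python sketch; the sub-patterns are invented):

    import re

    # Invented sub-patterns, composed so no single piece is complex.
    DATE = r"\d{4}-\d{2}-\d{2}"
    TIME = r"\d{2}:\d{2}:\d{2}"
    IP   = r"\d{1,3}(?:\.\d{1,3}){3}"
    LINE = re.compile(rf"^{DATE}T{TIME}\s+{IP}$")

    print(bool(LINE.match("2024-05-01T12:30:00  10.0.0.1")))   # True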
The only thing regexp has on the plus side is that it's compact. I'd prefer something that's more readable without prior knowledge. I know how to use it, but I always have to google it. I only use it once a year, so there's no big win in memorizing it.
This is 95% of it. After that, it's character classes and lazy matching.
Character classes can often be given as ranges, plus ^ and $ anchor the start and end of a line:
[a-z] [A-Z] [0-9] ^ $
You can use the pipe | to combine them (alternation).
Lazy matching finds the shortest matching string instead of the longest one. This is given by the symbols:
*? +?
Instead of * and + respectively.
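A quick Python sketch of both (the strings are invented):

    import re

    # Ranges and alternation:
    print(re.findall(r"[a-z]+|[0-9]+", "abc123def"))   # ['abc', '123', 'def']

    # Greedy vs lazy: * takes the longest match, *? the shortest.
    html = "<b>bold</b>"
    print(re.match(r"<.*>",  html).group())   # <b>bold</b>  (greedy)
    print(re.match(r"<.*?>", html).group())   # <b>          (lazy)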
> I only use it once per year
If you use a text editor, any kind of repetitive find-and-replace activity can be sped up using regexes. It seems strange that you only use it once per year.
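For example, a one-off reshuffle with capture groups (shown here as a Python sketch, but editor find-and-replace works the same way; the data is invented):

    import re

    # One-off reshuffle: "Last, First" -> "First Last".
    text = "Doe, Jane\nSmith, John"
    print(re.sub(r"(\w+), (\w+)", r"\2 \1", text))
    # Jane Doe
    # John Smith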
Some people tell you that you can't parse XML using regexes. THEY'RE WR- right, but you can often do a decent one-off job on an XML document using regular expressions, as long as you're aware that it's a one-off job that won't work on any other document.
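For instance, a throwaway extraction like this (a sketch; it assumes the flat, predictable structure of this particular document and will break on anything else):

    import re

    # Good enough for THIS document's flat structure, nothing more.
    doc = "<user><name>alice</name></user><user><name>bob</name></user>"
    print(re.findall(r"<name>(.*?)</name>", doc))   # ['alice', 'bob']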
It is not that the syntax is hard to understand or even that regexes are hard to write; they are hard to read and change. Here is an example of the kind of thing you can grab off a random webpage (say, a typical email-validation pattern):
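    ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$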
That is hard to quickly read and understand. Sure, a comment would help, but if someone said it wasn't working, I would have to work all of that out and then fix it.
There is a reason Perl lost to Python. It just seems like we can do better.
> The second greatest problem with regexp is learning query optimisation: that is genuinely complex, but this new syntax doesn't even attempt to solve it.
Query optimisation is totally unnecessary if you use Rust's or Go's implementation.
I'm a big believer that "premature optimisation" is a bad thing, but making the blanket statement that any given implementation is 100% performant for all use cases is going a bit far.
I doubt whatever you're doing is going to work in Rust or Go.
Say that the length of your string is n and the length of your regular expression is m. Isn't regular-expression matching in those languages guaranteed to take O(m n) time? In plain English, that's linear in the input - which you practically can't improve on, even if you tried. An implementation can only do worse if it uses backtracking instead of finite automata - which was very common at one time, but is now considered The Wrong Way: https://swtch.com/~rsc/regexp/regexp1.html
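You can watch the backtracking failure mode directly in an engine that does backtrack, e.g. Python's re (a small sketch; absolute timings will vary by machine):

    import re, time

    # Nested quantifier plus a forced failure = exponential backtracking.
    for n in (18, 20, 22, 24):
        t0 = time.perf_counter()
        re.match(r"(a+)+b", "a" * n)   # can never match: there is no 'b'
        print(n, round(time.perf_counter() - t0, 3), "s")
    # Runtime roughly doubles per extra 'a'; an automaton-based engine
    # (Go's regexp, Rust's regex crate) answers the same query in linear time.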
There is actually a way to optimise your queries, which is by precompiling them to DFAs, but I don't think that's what you're doing.
> There is actually a way to optimise your queries, which is by precompiling them to DFAs, but I don't think that's what you're doing.
You can do that, but the primary advantage of DFAs in practice is avoiding backtracking, so reducing backtracking in manually constructed queries is more what I was referring to. The missing piece of the puzzle here is that not supporting backtracking at all doesn't limit capabilities, so removal rather than reduction is the Go/Rust approach.
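Manual query optimisation in a backtracking engine mostly means rewriting a pattern into a less ambiguous equivalent; a Python sketch of the idea (the patterns are chosen purely for illustration):

    import re, time

    # (a+)+b and a+b match exactly the same strings, but the first is
    # ambiguous and explodes on failure in a backtracking engine.
    subject = "a" * 24
    for pat in (r"(a+)+b", r"a+b"):
        t0 = time.perf_counter()
        re.match(pat, subject)   # both fail: no 'b' in the subject
        print(pat, round(time.perf_counter() - t0, 4), "s")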
The /x mode (extended regexp) lets you use any amount of whitespace and comments. I use that mode whenever a regex gets complicated enough to merit an explanation (which happens quickly), and I also indent groups and whatnot. Examples of where I've used this in the past are here:
I feel like this opinion comes from people who use regexes only rarely. I use them all the time at work, and you become pretty adept at parsing and understanding even dense ones. Of course, using the /x modifier is still recommended.
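For anyone who hasn't seen it, Python's re.VERBOSE is the same idea as /x; a minimal sketch (the pattern is invented):

    import re

    # re.VERBOSE (Python's /x): unescaped whitespace is ignored and
    # '#' starts a comment, so the pattern can be laid out and annotated.
    PHONE = re.compile(r"""
        (?P<area>   \d{3} )   # area code
        [-\s]?                # optional separator
        (?P<number> \d{4} )   # line number
    """, re.VERBOSE)

    print(PHONE.match("555-0123").groupdict())
    # {'area': '555', 'number': '0123'}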
Disagree, honestly. I really don't like this No True Scotsman absolutist statement either. If pomsky were able to guard against ReDoS then maybe I'd consider switching. Otherwise, I have no problems reading and understanding regular expressions.
While a language like Raku (formerly known as Perl 6) is unlikely to catch up in the current landscape, it did bring a lot of improvements to regular expressions (link to the section that starts to use interesting examples [1]).
Way back somewhere in the 2010s, when I was still keeping an eye on Perl 6, I was kind of hoping that all these improvements would make their way into some kind of PCRE v3 that all the other tools already using PCRE would switch to. Would have been nice.
If I know RegEx, why would I use pomsky?