The greatest problem with regexp is neither its complexity nor its learning curve: it's its apparent complexity and learning curve. That tends to put learners off, but everyone I know who's persisted has found the journey faster than expected. Once they're over it, everyone I've talked to thinks pretty highly of the syntax: it's simple (a small set of composable components) and expressive, especially if you use named groups.
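For instance, named groups let a pattern label what it captures; a minimal Python sketch (the log-line pattern here is invented):

    import re

    # An invented log-line pattern; the named groups label each field.
    LOG = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<msg>.*)")
    m = LOG.match("2024-05-01 ERROR disk full")
    print(m.group("level"), "/", m.group("msg"))   # ERROR / disk full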
The second greatest problem with regexp is learning query optimisation: that is genuinely complex, but this new syntax doesn't even attempt to solve it.
Eh, I can manage to write a regexp with a little effort, but reading them has never stopped being painful. So in my anecdotal, sample-size-one experience, it leans heavily towards being a write-only language.
You can of course disqualify me by saying I can't possibly have persisted at learning it if I don't like it. But I am inevitably "the guy who knows regular expressions" in the places I've worked. I do use them, for stuff like fixing tedious one-off data-munging jobs without having to write an actual program. But I
* Do it step by step, to avoid writing anything resembling a complex regex, because I've yet to meet anyone who can write a complex regexp without bugs. (A sketch of what I mean follows.)
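Something like this, where no single piece ever gets complex (a hypothetical Python sketch; the sub-patterns are invented):

    import re

    # Invented sub-patterns, composed so no single piece is complex.
    DATE = r"\d{4}-\d{2}-\d{2}"
    TIME = r"\d{2}:\d{2}:\d{2}"
    IP   = r"\d{1,3}(?:\.\d{1,3}){3}"
    LINE = re.compile(rf"^{DATE}T{TIME}\s+{IP}$")

    print(bool(LINE.match("2024-05-01T12:30:00  10.0.0.1")))   # True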
The only thing regexp has on the plus side is that it's compact. I'd prefer something that's more readable without prior knowledge. I know how to use it, but I always have to google it. I only use it once a year, so there's no big win in memorizing it.
This is 95% of it. After that, it's character classes and lazy matching.
Character classes can often be given as ranges, plus ^ and $ anchor the start and end of a line:
[a-z] [A-Z] [0-9] ^ $
You can use the pipe | to combine them (alternation).
Lazy matching finds the shortest matching string instead of the longest one. This is given by the symbols:
*? +?
Instead of * and + respectively.
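A quick Python sketch of both (the strings are invented):

    import re

    # Ranges and alternation:
    print(re.findall(r"[a-z]+|[0-9]+", "abc123def"))   # ['abc', '123', 'def']

    # Greedy vs lazy: * takes the longest match, *? the shortest.
    html = "<b>bold</b>"
    print(re.match(r"<.*>",  html).group())   # <b>bold</b>  (greedy)
    print(re.match(r"<.*?>", html).group())   # <b>          (lazy)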
> I only use it once per year
If you use a text editor, any kind of repetitive find-and-replace activity can be sped up using regexes. It seems strange that you only use it once per year.
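For example, a one-off reshuffle with capture groups (shown here as a Python sketch, but editor find-and-replace works the same way; the data is invented):

    import re

    # One-off reshuffle: "Last, First" -> "First Last".
    text = "Doe, Jane\nSmith, John"
    print(re.sub(r"(\w+), (\w+)", r"\2 \1", text))
    # Jane Doe
    # John Smith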
Some people tell you that you can't parse XML using regexes. THEY'RE WR- right, but you can often do a decent one-off job on an XML document using regular expressions, as long as you're aware that it's a one-off job that won't work on any other document.
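For instance, a throwaway extraction like this (a sketch; it assumes the flat, predictable structure of this particular document and will break on anything else):

    import re

    # Good enough for THIS document's flat structure, nothing more.
    doc = "<user><name>alice</name></user><user><name>bob</name></user>"
    print(re.findall(r"<name>(.*?)</name>", doc))   # ['alice', 'bob']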
It is not that the syntax is hard to understand or even that regexes are hard to write; they are hard to read and change. Here is an example of the kind of thing you can grab off a random webpage (say, a typical email-validation pattern):
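    ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$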
That is hard to quickly read and understand. Sure, a comment would help, but if someone said it wasn't working, I would have to work all of that out and then fix it.
There is a reason Perl lost to Python. It just seems like we can do better.
> The second greatest problem with regexp is learning query optimisation: that is genuinely complex, but this new syntax doesn't even attempt to solve it.
Query optimisation is totally unnecessary if you use Rust's or Go's implementation.
I'm a big believer that "premature optimisation" is a bad thing, but making the blanket statement that any given implementation is 100% performant for all use cases is going a bit far.
I doubt whatever you're doing is going to work in Rust or Go.
Say that the length of your string is n and the length of your regular expression is m. Isn't regular-expression matching in those languages guaranteed to take O(m n) time? In plain English, that's linear in the input - which you practically can't improve on, even if you tried. An implementation can only do worse if it uses backtracking instead of finite automata - which was very common at one time, but is now considered The Wrong Way: https://swtch.com/~rsc/regexp/regexp1.html
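You can watch the backtracking failure mode directly in an engine that does backtrack, e.g. Python's re (a small sketch; absolute timings will vary by machine):

    import re, time

    # Nested quantifier plus a forced failure = exponential backtracking.
    for n in (18, 20, 22, 24):
        t0 = time.perf_counter()
        re.match(r"(a+)+b", "a" * n)   # can never match: there is no 'b'
        print(n, round(time.perf_counter() - t0, 3), "s")
    # Runtime roughly doubles per extra 'a'; an automaton-based engine
    # (Go's regexp, Rust's regex crate) answers the same query in linear time.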
There is actually a way to optimise your queries, which is by precompiling them to DFAs, but I don't think that's what you're doing.
> There is actually a way to optimise your queries, which is by precompiling them to DFAs, but I don't think that's what you're doing.
You can do that, but the primary advantage of DFAs in practice is avoiding backtracking, so reducing backtracking in manually constructed queries is more what I was referring to. The missing piece of the puzzle here is that not supporting backtracking at all doesn't limit capabilities, so removal rather than reduction is the Go/Rust approach.
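Manual query optimisation in a backtracking engine mostly means rewriting a pattern into a less ambiguous equivalent; a Python sketch of the idea (the patterns are chosen purely for illustration):

    import re, time

    # (a+)+b and a+b match exactly the same strings, but the first is
    # ambiguous and explodes on failure in a backtracking engine.
    subject = "a" * 24
    for pat in (r"(a+)+b", r"a+b"):
        t0 = time.perf_counter()
        re.match(pat, subject)   # both fail: no 'b' in the subject
        print(pat, round(time.perf_counter() - t0, 4), "s")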
The /x mode (extended regexp) lets you use any amount of whitespace and comments. I use that mode whenever a regex gets complicated enough to merit an explanation (which happens quickly), and I also indent groups and whatnot. Examples of where I've used this in the past are here:
I feel like this opinion comes from people who use regexes only rarely. I use them all the time at work, and you become pretty adept at parsing and understanding even dense ones. Of course, using the /x modifier is still recommended.
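For anyone who hasn't seen it, Python's re.VERBOSE is the same idea as /x; a minimal sketch (the pattern is invented):

    import re

    # re.VERBOSE (Python's /x): unescaped whitespace is ignored and
    # '#' starts a comment, so the pattern can be laid out and annotated.
    PHONE = re.compile(r"""
        (?P<area>   \d{3} )   # area code
        [-\s]?                # optional separator
        (?P<number> \d{4} )   # line number
    """, re.VERBOSE)

    print(PHONE.match("555-0123").groupdict())
    # {'area': '555', 'number': '0123'}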
Disagree, honestly. I really don't like this No True Scotsman absolutist statement either. If pomsky were able to guard against ReDoS then maybe I'd consider switching. Otherwise, I have no problems reading and understanding regular expressions.
While a language like Raku (formerly known as Perl 6) is unlikely to catch up in the current landscape, it did bring a lot of improvements to regular expressions (link to the section that starts to use interesting examples [1]).
Way back somewhere in the 2010s, when I was still keeping an eye on Perl 6, I was kind of hoping that all these improvements would make their way into some kind of PCRE v3 that all the other tools already using PCRE would switch to. Would have been nice.
If I know RegEx, why would I use pomsky?