
So? In any decent programming environment, checking if a string is a URL is a one-liner.



Not without false-positives.

E.g., is "foo.bar" a URL? Maybe. But it could also be a filename. How do you know if it's a "real" URL or not?


In The Good Ol' Times, I would have replied "let's check the TLD", but that list now seems to be trending toward including the entire English dictionary... so I guess the only response these days is "ask DNS". So we've already gone from "pattern-match a string" to "pattern-match, then make network calls", which (as anyone who's done any network work knows) also means handling a bunch of likely error states (offline, timeout, partial response, unexpected response format, etc.). So yeah, nothing is as easy as it looks.
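To make the "ask DNS" step concrete, here is a minimal Python sketch of what it might look like once the error states are taken seriously. The function name `host_resolves` and the injectable `resolver` parameter are my own assumptions for illustration; only `socket.gethostbyname` and the exception types come from the standard library. Note that it has three outcomes, not two, because "DNS said no" and "couldn't reach DNS" are different answers.

```python
import socket
from typing import Callable, Optional


def host_resolves(
    host: str,
    resolver: Callable[[str], object] = socket.gethostbyname,
) -> Optional[bool]:
    """Return True if the name resolves, False if the lookup reports
    failure, and None if we could not find out (timeout, offline,
    temporary resolver failure)."""
    try:
        resolver(host)
        return True
    except socket.gaierror as exc:
        # EAI_AGAIN means "temporary failure, try later", not "no such host".
        if exc.errno == getattr(socket, "EAI_AGAIN", None):
            return None
        return False
    except OSError:
        # Covers timeouts and other network-level errors.
        return None
```

Even this ignores retries, caching, and captive portals, and every call sends the candidate hostname to the network.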


Can't quite do this most of the time due to privacy concerns. Leaking random URL-looking text to the network is a big no.


So now we have to ask for user consent (installation time? first run?) and respond accordingly, adding another piece of UI... but it will only take an hour, right...?


You probably also want a setting to detect slow networks and disable it there in case the user is tethering etc.


Strictly speaking, a URL begins with a scheme followed by a colon.

Schemes can be registered with IANA (or not), and everyone knows the most common half-dozen or so. People often forget "mailto:" and "tel:".
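As a rough illustration, the "scheme followed by a colon" rule can be sketched in Python using the scheme grammar from RFC 3986 (a letter, then letters, digits, "+", "-", or "."). The names `SCHEME_RE` and `leading_scheme` are made up for this example.

```python
import re
from typing import Optional

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def leading_scheme(text: str) -> Optional[str]:
    """Return the lowercased scheme if text starts with one, else None."""
    match = SCHEME_RE.match(text)
    return match.group(1).lower() if match else None
```

This accepts "mailto:" and "tel:" and rejects a bare "foo.bar", but it will also happily report a scheme for "localhost:8080" or "C:\temp", so it is a syntax check and not a way to tell what the user meant.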

The project brief asks for one thing, but the practical implementation probably requires something else.

This is a good lesson to learn, and this is how two hours become two days, then two weeks.


It isn't a URL. As per RFC 3986 [0]:

> The term "Uniform Resource Locator" (URL) refers to the subset of URIs that [..] provide a means of locating the resource by describing its primary access mechanism

Since "foo.bar" does not describe an access mechanism, it is not a URL. Yes, you could make the argument that "foo.bar" is a relative-path reference as described in section 4.2, but that is only used to:

> express a URI reference relative to the name space of another hierarchical URI

So "foo.bar" can only be considered a URL in the context of another given URL, and in your example there is none.

[0] https://datatracker.ietf.org/doc/html/rfc3986#section-1.1.3
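A small Python sketch of that point, using the standard library's `urllib.parse`. The base URL is a placeholder I picked for illustration: on its own, "foo.bar" parses with no scheme and no authority, and it only turns into a locator once it is resolved against a base.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URL, chosen only for this example.
BASE = "https://example.com/docs/index.html"

# Alone, "foo.bar" is just a path: no scheme, no host.
parts = urlsplit("foo.bar")
standalone = (parts.scheme, parts.netloc, parts.path)

# Resolved against a base, it becomes a full URL.
resolved = urljoin(BASE, "foo.bar")
```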


I don't have to worry about that, because I'll pick a language that offers a `URL` object or something similar, and which handles the validation for me.

Additionally, if foo.bar were a valid URL, then I would expect it to appear on the list. I can't read the user's mind as to whether the text should be treated as a URL or not.
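For what it's worth, here is roughly what that looks like in Python, as a sketch and not a claim about any particular language's `URL` object. Python's `urlsplit` is lenient, so the strictness here (a scheme from an allow-list plus a non-empty host) is my own assumption, as are the names `ALLOWED_SCHEMES` and `parse_strict`.

```python
from typing import Optional
from urllib.parse import SplitResult, urlsplit

# Assumed allow-list; adjust to whatever the application links to.
ALLOWED_SCHEMES = {"http", "https"}


def parse_strict(text: str) -> Optional[SplitResult]:
    """Return the parsed URL if it has an allowed scheme and a host,
    otherwise None."""
    try:
        parts = urlsplit(text)
    except ValueError:
        # urlsplit raises on some malformed input, e.g. a bad IPv6 literal.
        return None
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return None
    return parts
```

Under these rules "foo.bar" is simply not a URL, and the code makes no attempt to guess otherwise.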



