They're not just from AI-generated text. Some of us humans use en dashes and em dashes in the right context, since they're easy to type on macOS: alt+hyphen and alt+shift+hyphen respectively.
On both iOS and modern Android I believe you can access them with a long press on hyphen.
They have their place but I'm really just trying to avoid the AI house style that has emerged. I'd rather have my writing—AI-assisted or not—reflect how I actually communicate rather than defaulting to patterns that have become over represented in generated text.
That’s not really fair. I mean, I definitely use AI all over the place, but I think that the writing aspect is an important part of thinking too [1]. I still try to write things out myself when it matters. There’s something about wordsmithing that sharpens your thinking and that gets lost when you just drop something into an LLM and pull it out without much thought. Sure, I’ll use AI to help refine or explore ideas, but the core work often starts in my own head.
I do write a lot myself, especially when I need to think something through clearly. I use AI tools like anyone else, but I still do the work.
How are em-dashes "slightly" archaic in this context? Can you point me to a single example of internet discourse from the last 30 years where a human used an em-dash unironically?
Academic papers don't count, literature doesn't count. I'm looking for an example of human-created discourse online. The crux of the allegation is that normal meatbag humans don't use an em-dash when conversing with one another online, or when writing informal texts, purely because there is no key for the em-dash on the keyboard (that I know of).
I posit that the use of an em-dash in online discourse is so archaic that it's a 100% surefire giveaway of AI.
>Can you point me to a single example of internet discourse from the last 30 years where a human used an em-dash unironically?
Thousands upon thousands.
>I'm looking for an example of human created discourse online. The crux of the allegation is that normal meatbag humans don't use an em-dash when conversing with one another online
Meatbag humans whose education failed them don't. Other humans did and still do, from Usenet to Substack, and from Slashdot to Hacker News.
Here's a random PG essay sprinkled with 23 em-dashes:
Way more people, in posts, comments, etc., use en-dashes and hyphens as em-dashes (just because they don't know how to quickly insert proper ones, or aren't aware there's a typographic distinction), but they do know the use of dashes for parenthetical statements and asides.
I use em dashes a bunch in both informal communication and more formal writing. Mobile keyboards have em dashes, and I also have the compose key turned on on Linux.
I learned how to type em-dashes on Mac (option-shift-hyphen) 10+ years ago and have been using them with some frequency since then. Picking 2023, here are some comments with em-dashes that I personally typed:
To enter an em dash on Windows, hold down Alt and type 0 1 5 1 on your keyboard’s numpad. (Alt 0 1 5 0 for an en dash.) This only works with numpad number keys so laptop users are out of luck.
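For anyone wondering where 0151 and 0150 come from: as far as I know, Alt codes with a leading zero pick the character at that decimal position in the system's ANSI code page. The sketch below assumes that code page is Windows-1252 (typical for Western-locale installs, but an assumption here), and `alt_code_to_char` is just a hypothetical helper name for illustration.

```python
# Sketch: map a Windows "Alt+0nnn" code to its character.
# Assumption: the system ANSI code page is Windows-1252 (cp1252).

def alt_code_to_char(code: int, codepage: str = "cp1252") -> str:
    """Decode the single byte with value `code` in the given code page."""
    return bytes([code]).decode(codepage)

for code in (150, 151):
    ch = alt_code_to_char(code)
    # Print the Unicode code point rather than the glyph itself.
    print(f"Alt+0{code} -> U+{ord(ch):04X}")
# Alt+0150 -> U+2013 (en dash)
# Alt+0151 -> U+2014 (em dash)
```

On a system with a different ANSI code page the same Alt codes may produce different characters, so treat the mapping as locale-dependent.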
It is insane that in 2025 this is still an accepted way to type lesser-used characters on Windows, when the Mac has made typing umlauts and em-dashes extremely simple with the Option key literally since 1984 (an umlauted U is literally option-u, u... Ironically, I'm currently on a Windows machine so I cannot even type it).
My family is German (I'm firstborn American), so this was a huge sell for the Mac way back then.
Sad to see that Windows is still stuck in the PS/2 days here.
If you install PowerToys, you get a feature called "Quick Accent" which gives you a shortcut to get basically any symbol quickly. Hold down the key that's the most like the one you are looking for, and press space. A little menu pops up where you can cycle through all the variants.
So `- + space` brings up a menu with all the "dashy" characters. There's 12 of them!
There is also `Win + .` which brings up an emoji menu, where you can also access the symbols list.
The compose key works well on Linux. Typically mapped to right alt, compose-hyphen-hyphen-hyphen produces an em dash. (hyphen-hyphen-period produces an en dash.)
Given the compose sequences are mnemonic, I’d prefer it over Mac every time. Compare Compose+<< and >> for «» to Opt+[ and Opt+Sh+[ on Mac. Which may or may not work depending on locale.
MS Office will insert em-dashes automatically in most documents, so in fact there are a lot of Word docs and Outlook emails that contain them.
I sometimes specifically try and trigger them: if you have a piece of text and go back to insert a hyphen, it won't em-dash until you've followed it with a space, another word and then another space. I now sort of end up doing '- x ' and then backspacing so that the word following the x now follows an em-dash.
They exist to provide clarity. They are not hyphens or en-dashes; they're em-dashes. The fact that some people have forgotten how to use them (or perhaps were never taught) does not make them "archaic"; it shows that the people who find them archaic are ignorant of basic sentence structure and punctuation.
I think if you're under the age of 30 and you suddenly start using them, you're showing your GenAI a little too much, but the answer is not to get your AI to stop using them, but for us to teach people why they exist and to use them more often when and where they are appropriate.
I’ve used em-dashes in all sorts of online forums for decades. It’s on the Mac keyboard, and there are also tons of tools that automatically convert double or triple dashes to a single long dash. They were never uncommon.
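A rough sketch of what that kind of auto-conversion might look like. The mapping here (three hyphens to an em dash, two to an en dash) follows the TeX-style convention and is an assumption; some tools turn a double hyphen straight into an em dash instead. `smart_dashes` is a hypothetical name, not any particular tool's API.

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"

def smart_dashes(text: str) -> str:
    """Replace '---' with an em dash and '--' with an en dash.

    Runs of four or more hyphens (e.g. ASCII rules) are left alone.
    """
    # Handle exactly three hyphens first so they are not eaten by the '--' rule.
    text = re.sub(r"(?<!-)---(?!-)", EM_DASH, text)
    # Then exactly two hyphens.
    text = re.sub(r"(?<!-)--(?!-)", EN_DASH, text)
    return text

print(smart_dashes("pages 10--12") == "pages 10" + EN_DASH + "12")   # True
print(smart_dashes("wait---what?") == "wait" + EM_DASH + "what?")    # True
```

Single hyphens, as in "well-known", pass through unchanged.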
Surely this is an absurd exaggeration. I've been using em dashes everywhere (online comments, email, chat) for ~25 years now. I'm not unusual in this regard; everyone who cares about punctuation probably uses them liberally. They're not hard to type; on macOS you can hold down Alt and Shift while hitting the `-`, and on Android you can long press on the `-`. Maybe they're used less by Windows users?
Even just looking at my HN comments, 381 of my ~1200 HN comments so far (so >30%) have em dashes. This includes my very first comment on HN from 2009 (https://news.ycombinator.com/item?id=602094) and several that have multiple em dashes in them:
Having fun vibe coding my first personal website with astro and three.js - I'd say it's working pretty well so far. I need to tone down the amount of animation and glows; it's a little too much.
I let Jules write a PR in my codebase with very specific scaffolding, and it absolutely blew it. It took me more time to understand the ways it failed to grasp the codebase and wrote code for a fundamentally different (incorrectly understood) project. I love Gemini 2.5, but I absolutely agree with the gp (pauldix) on their quality / scope point.
Couldn't agree more. I wish all major model makers would build tools into their proprietary UIs to "summarize contents and start a new conversation with that base". My biggest slowdown with working with LLMs while coding is moving my conversation to a new thread because context limit is hit (Claude) or the coherent-thought threshold is exceeded (Gemini).
I never use any web interfaces, just hooked up gptel (an Emacs package) to Claude's API and a few others I regularly use, and I just have a buffer with the entire conversation. I can modify it as needed, spawn a fresh one quickly, etc. There are also features to add files and individual snippets, but I usually manage it all in a single buffer. It's a powerful text editor, so efficient text editing is a given.
I bet there are better / less arcane tools, but I think powerful and fast mechanisms for managing context are key and for me, that's really just powerful text editing features.
Likely the case for established model makers, but barring illegal use of outputs from other companies' models, a "first generation" model would still need this as a basis, no?
Why illegal? The more open models (or at least open-weight models) should allow using their outputs. Details depend on license.
But yes, 'first generation' models would be trained on human text almost by definition. My comment was only to contradict the claim that 'all LLMs' are trained from stolen text, by noting that some LLMs aren't trained (directly) on human text at all.
Is any RL done without unit testing? I would be surprised to hear that that wasn't the case, as it would imply a disregard for accuracy for other model makers, which would be surprising. Perhaps you can do this for small modular problems but not for a problem with a 200k token input?
Absolutely this. Gemini is amazing, but I'm under no illusions that their principal goal right now is to boost their database of high quality training data with free access via AI Studio. That said, custom silicon with a model made by internal teams collaborating to make use of that hardware's idiosyncrasies must be a massive advantage as well.