I asked it about Tailwind CSS (since I had problems with Claude not aware of Tai...

threeducks · 2025-05-22T18:29:23 1747938563

> Which version of tailwind css do you know?

LLMs can not reliably tell whether they know or don't know something. If they did, we would not have to deal with hallucinations.

redman25 · 2025-05-23T13:33:20 1748007200

They can if they've been post trained on what they know and don't know. The LLM can first been given questions to test its knowledge and if the model returns a wrong answer, it can be given a new training example with an "I don't know" response.

dingnuts · 2025-05-23T13:54:24 1748008464

Oh that's a great idea, just do that for every question the LLM doesn't know the answer to!

That's.. how many questions? Maybe if one model generates all possible questions then

nicce · 2025-05-22T19:42:02 1747942922

We should use the correct term: to not have to deal with bullshit.

dudeinjapan · 2025-05-23T08:56:10 1747990570

I think “confabulation” is the best term.

“Hallucination” is seeing/saying something that a sober person clearly knows is not supposed to be there, e.g. “The Vice President under Nixon was Oscar the Grouch.”

Harry Frankfurt defines “bullshitting” as lying to persuade without regard to the truth. (A certain current US president does this profusely and masterfully.)

“Confabulation” is filling the unknown parts of a statement or story with bits that sound as-if they could be true, i.e. they make sense within the context, but are not actually true. People with dementia (e.g. a certain previous US president) will do this unintentionally. Whereas the bullshitter generally knows their bullshit to be false and is intentionally deceiving out of self-interest, confabulation (like hallucination) can simply be the consequence of impaired mental capacity.

nicce · 2025-05-23T11:07:09 1747998429

I think the Frankfurt definition is a bit off.

E.g. from the paper ChatGPT is bullshit [1],

> Frankfurt understands bullshit to be characterized not by an intent to deceive but instead by a reckless disregard for the truth.

That is different than defining "bullshitting" as lying. I agree that "confabulation" could otherwise be more accurate. But with previous definition they are kinda synonyms? And "reckless disregard for the truth" may hit closer. The paper has more direct quotes about the term.

[1] https://link.springer.com/article/10.1007/s10676-024-09775-5

dudeinjapan · 2025-05-24T04:47:47 1748062067

You're right. It's "intent to persuade with a reckless disregard for the truth." But even by this definition, LLMs are not (as far as we know) trying to persuade us of anything, beyond the extent that persuasion is a natural/structural feature of all language.

SparkyMcUnicorn · 2025-05-22T18:02:13 1747936933

Interesting. It's claiming different knowledge cutoff dates depending on the question asked.

"Who is president?" gives a "April 2024" date.

ethbr1 · 2025-05-22T18:18:40 1747937920

Question for HN: how are content timestamps encoded during training?

cma · 2025-05-22T21:29:09 1747949349

Claude 4's system prompt was published and contains:

"Claude’s reliable knowledge cutoff date - the date past which it cannot answer questions reliably - is the end of January 2025. It answers all questions the way a highly informed individual in January 2025 would if they were talking to someone from {{currentDateTime}}, "

https://docs.anthropic.com/en/release-notes/system-prompts#m...

polynomial · 2025-05-23T23:19:48 1748042388

I thought best guesses were that Claude's system prompt ran to tens of thousands of tokens, with figures like 30,000 tokens being bandied about.

But the documentation page linked here doesn't bear that out. In fact the Claude 3.7 system prompt on this page clocks in at significantly less than 4,000 tokens.

tough · 2025-05-22T18:34:19 1747938859

they arent.

a model learns words or tokens more pedantically but has no sense of time nor cant track dates

svachalek · 2025-05-22T18:49:58 1747939798

Yup. Either the system prompt includes a date it can parrot, or it doesn't and the LLM will just hallucinate one as needed. Looks like it's the latter case here.

manmal · 2025-05-22T18:51:30 1747939890

Technically they don’t, but OpenAI must be injecting the current date and time into the system prompt, and Gemini just does a web search for the time when asked.

tough · 2025-05-22T19:02:44 1747940564

right but that's system prompting / in context

not really -trained- into the weights.

the point is you can't ask a model what's his training cut off date and expect a reliable answer from the weights itself.

closer you could do is have a bench with -timed- questions that could only know if had been trained for that, and you'd had to deal with hallucinations vs correctness etc

just not what llm's are made for, RAG solves this tho

stingraycharles · 2025-05-23T00:40:41 1747960841

What would the benefits be of actual time concepts being trained into the weights? Isn’t just tokenizing the dates and including those as normal enough to yield benefits?

E.g. it probably has a pretty good understanding between “second world war” and the time period it lasted. Or are you talking about the relation between “current wall clock time” and questions being asked?

tough · 2025-05-23T07:49:14 1747986554

there's actually some work on training transformer models on time series data which is quite interesting (for prediction purposes)

see google TimesFM: https://github.com/google-research/timesfm

what i mean i guess is llms can -reason- linguistically about time manipulating language, but can't really experience it. a bit like physics. thats why they do bad on exercises/questions about physics/logic that their training corpus might not have seen.

tough · 2025-05-22T19:03:58 1747940638

OpenAI injects a lot of stuff, your name, sub status, recent threads, memory, etc

sometimes its interesting to peek up under the network tab on dev tools

Tokumei-no-hito · 2025-05-22T20:26:58 1747945618

strange they would do that client side

diggan · 2025-05-22T20:36:59 1747946219

Different teams who work backend/frontend surely, and the people experimenting on the prompts for whatever reason wanna go through the frontend pipeline.

tough · 2025-05-22T21:13:26 1747948406

its just like extra metadata associated with your account not much else

dawnerd · 2025-05-22T19:14:12 1747941252

I did the same recently with copilot and it of course lied and said it knew about v4. Hard to trust any of them.

PeterStuer · 2025-05-23T07:28:11 1747985291

Did you try giving it the relevant parts of the tailwind 4 documentation in the prompt context?