
I asked it about Tailwind CSS (since I had problems with Claude not being aware of Tailwind 4):

> Which version of tailwind css do you know?

> I have knowledge of Tailwind CSS up to version 3.4, which was the latest stable version as of my knowledge cutoff in January 2025.


> Which version of tailwind css do you know?

LLMs cannot reliably tell whether they know or don't know something. If they could, we would not have to deal with hallucinations.


They can if they've been post-trained on what they know and don't know. The LLM can first be given questions to test its knowledge, and if the model returns a wrong answer, it can be given a new training example with an "I don't know" response.
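A rough sketch of that loop might look like this (the model interface, matching logic, and data format here are all hypothetical):

    def matches(answer: str, correct: str) -> bool:
        # Crude check; a real pipeline would use a grader model or
        # normalized exact-match.
        return correct.lower() in answer.lower()

    def build_idk_examples(model, qa_pairs):
        # model: anything with a generate(prompt) -> str method (assumed).
        # qa_pairs: iterable of (question, known_correct_answer) pairs.
        examples = []
        for question, correct in qa_pairs:
            if not matches(model.generate(question), correct):
                examples.append({"prompt": question,
                                 "completion": "I don't know."})
        return examples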

Oh, that's a great idea, just do that for every question the LLM doesn't know the answer to!

That's... how many questions? Maybe if one model generates all possible questions, then...


We should use the correct term: it's not hallucinations we have to deal with, but bullshit.

I think “confabulation” is the best term.

“Hallucination” is seeing/saying something that a sober person clearly knows is not supposed to be there, e.g. “The Vice President under Nixon was Oscar the Grouch.”

Harry Frankfurt defines “bullshitting” as lying to persuade without regard to the truth. (A certain current US president does this profusely and masterfully.)

“Confabulation” is filling the unknown parts of a statement or story with bits that sound as-if they could be true, i.e. they make sense within the context, but are not actually true. People with dementia (e.g. a certain previous US president) will do this unintentionally. Whereas the bullshitter generally knows their bullshit to be false and is intentionally deceiving out of self-interest, confabulation (like hallucination) can simply be the consequence of impaired mental capacity.


I think the Frankfurt definition is a bit off.

E.g. from the paper ChatGPT is bullshit [1],

> Frankfurt understands bullshit to be characterized not by an intent to deceive but instead by a reckless disregard for the truth.

That is different from defining "bullshitting" as lying. I agree that "confabulation" could otherwise be more accurate, but with the corrected definition they are kinda synonyms? And "reckless disregard for the truth" may hit closer. The paper has more direct quotes about the term.

[1] https://link.springer.com/article/10.1007/s10676-024-09775-5


You're right. It's "intent to persuade with a reckless disregard for the truth." But even by this definition, LLMs are not (as far as we know) trying to persuade us of anything, beyond the extent that persuasion is a natural/structural feature of all language.

Interesting. It's claiming different knowledge cutoff dates depending on the question asked.

"Who is president?" gives a "April 2024" date.


Question for HN: how are content timestamps encoded during training?

Claude 4's system prompt was published and contains:

"Claude’s reliable knowledge cutoff date - the date past which it cannot answer questions reliably - is the end of January 2025. It answers all questions the way a highly informed individual in January 2025 would if they were talking to someone from {{currentDateTime}}, "

https://docs.anthropic.com/en/release-notes/system-prompts#m...


I thought best guesses were that Claude's system prompt ran to tens of thousands of tokens, with figures like 30,000 tokens being bandied about.

But the documentation page linked here doesn't bear that out. In fact the Claude 3.7 system prompt on this page clocks in at significantly less than 4,000 tokens.


they aren't.

a model learns words (or, more pedantically, tokens), but it has no sense of time and can't track dates


Yup. Either the system prompt includes a date it can parrot, or it doesn't and the LLM will just hallucinate one as needed. Looks like it's the latter case here.

Technically they don’t, but OpenAI must be injecting the current date and time into the system prompt, and Gemini just does a web search for the time when asked.
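For illustration, the injection could be as simple as this (a minimal sketch; the prompt wording is made up):

    from datetime import datetime, timezone

    def build_system_prompt(base_prompt: str) -> str:
        # Append the current wall-clock time so the model can "know"
        # today's date at inference time.
        now = datetime.now(timezone.utc).strftime("%A, %B %d, %Y %H:%M UTC")
        return f"{base_prompt}\n\nCurrent date and time: {now}"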

right, but that's system prompting / in-context,

not really -trained- into the weights.

the point is you can't ask a model what its training cutoff date is and expect a reliable answer from the weights themselves.

the closest you could do is have a bench with -timed- questions it could only answer if it had been trained on that period, and you'd have to deal with hallucinations vs. correctness, etc.

just not what LLMs are made for; RAG solves this, though


What would the benefits be of actual time concepts being trained into the weights? Isn’t just tokenizing the dates and including those as normal enough to yield benefits?

E.g. it probably has a pretty good understanding of the relation between “second world war” and the time period it lasted. Or are you talking about the relation between “current wall clock time” and the questions being asked?


there's actually some work on training transformer models on time-series data, which is quite interesting (for prediction purposes)

see Google TimesFM: https://github.com/google-research/timesfm

what I mean, I guess, is that LLMs can -reason- linguistically about time by manipulating language, but can't really experience it. a bit like physics. that's why they do badly on physics/logic exercises and questions their training corpus might not have seen.


OpenAI injects a lot of stuff: your name, subscription status, recent threads, memory, etc.

sometimes it's interesting to peek at the network tab in dev tools


strange they would do that client side

Different teams work on the backend/frontend, surely, and the people experimenting with the prompts for whatever reason want to go through the frontend pipeline.

it's just extra metadata associated with your account, not much else

I did the same recently with Copilot, and it of course lied and said it knew about v4. Hard to trust any of them.

Did you try giving it the relevant parts of the tailwind 4 documentation in the prompt context?
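E.g. something like this (a minimal sketch; generate() stands in for whichever API you use):

    def ask_with_docs(model, question: str, docs: str) -> str:
        # Prepend the relevant documentation so the answer doesn't
        # depend on stale training data.
        prompt = (f"Here is the relevant Tailwind 4 documentation:\n"
                  f"{docs}\n\nQuestion: {question}")
        return model.generate(prompt)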


Can one even refuse it? If someone just sends tainted coin to my address, am I screwed?


Not an answer to your question, but BTC does not really have wallets. Someone just signs a number of BTC over to your public key with their private key, so the only way to sign it again is with your private key. This is essentially how it "moves" to "someone". Some other DLTs (blockchains) actually have wallets/accounts with properties, where you can disable incoming funds, add a name, change the password, and such.


Unlike Ethereum, Bitcoin addresses are just a hash of a public key. The Bitcoins sent to you are unspent outputs that you can selectively (depending on the wallet) decide not to use.


iirc, a number of wallets allow you to specify which unspent TX outputs you use.

So if you got tainted coin sent to your address, you could avoid using that UTXO in future TXs.

That might protect you from some scrutiny.


That would work. But then there's also the question of whether the received transaction counts as taxable income in the jurisdiction of the receiver. If it does, and if they received a very significant amount, they would be forced to sell some of the coins to pay the tax on them.


> a number of wallets allow you to specify which unspent TX outputs you use.

Yep, this is referred to as “coin control”
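A toy sketch of the idea (the data shapes are made up for illustration):

    def select_utxos(utxos, amount_needed, tainted_txids):
        # utxos: list of dicts like {"txid": str, "vout": int, "value": int},
        # value in satoshis; tainted_txids: txids we refuse to spend from.
        selected, total = [], 0
        for u in sorted(utxos, key=lambda u: u["value"], reverse=True):
            if u["txid"] in tainted_txids:
                continue  # coin control: leave the tainted output untouched
            selected.append(u)
            total += u["value"]
            if total >= amount_needed:
                return selected
        raise ValueError("insufficient untainted funds")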


What's wrong with simply returning it? Any "tainted coin tracker" could easily build in a mechanism for detecting that


The simplest answer is that returning it isn't free.


There's also the good question of whether there is some limit or ratio of tainted coins that would be considered non-incriminating. If there isn't, would a single satoshi taint all the coins? And if there were, wouldn't dilution of funds be possible? That is, wash them by sending them to wallets with enough funds...


Ok so use it to pay the fee, returning the rest.


You just washed part of the BTC, which is now owned by the miner.


That's assuming a BTC miner even agrees to process the transaction. Why would a BTC miner want to deal with the headache of tainted bitcoins? It's such a problem that Marathon Digital Holdings, a major bitcoin miner, has stated they will refuse to process transactions from tainted addresses. In the near future, I expect more mining pools to do the same.


If they were sent to you, someone did process them.


And why would currency users even bother with it? If I receive a payment for something, I don't feel the need to pretend to be the police.


Why would you give money to a criminal?


Would you return stolen goods to the thief? How about returning it to the owner/authorities? That may not be simple, but it's certainly better than sending it back. If all else fails, there are black-hole addresses to lock the BTC away forever.


Every so often a moron shows up trying to sell the idea of colored coins... it is almost as if they've never heard of "fungibility".


What is tracked is outputs and inputs, not addresses. Addresses are derived from a public key or a script.

When a block is mined, an output is created; outputs can only be spent once. A Bitcoin transaction is just a list of existing outputs (inputs) and new outputs to create (outputs). Each output is created with a lock script; to spend the output you must provide the unlock script, which normally contains at least a signature and a public key.

Returning the output that corresponds to the unwanted transaction should do.
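Schematically, something like this (field names simplified; real transactions carry more detail):

    # A transaction consumes existing outputs (by reference) and creates
    # new ones, each guarded by a lock script. Values are illustrative.
    tx = {
        "inputs": [{
            "prev_txid": "e3b0c4...",           # output being spent (hypothetical)
            "prev_vout": 0,
            "unlock_script": "<sig> <pubkey>",  # satisfies the old lock script
        }],
        "outputs": [{
            "value": 50_000,                    # satoshis
            "lock_script": "OP_DUP OP_HASH160 <pubkey_hash> "
                           "OP_EQUALVERIFY OP_CHECKSIG",
        }],
    }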


Would you pay the transaction fee from the returned funds, washing a fraction of it through mining? Or pay for it out of your own UTXO, potentially opening yourself to a griefing attack?



If I were a malicious actor with a good way to attack this, I would wait until this was being used in real applications which deal with larger amounts of ETH, and then attack.

Why should I reveal my cards for $3K?


Because you know if you can find a given flaw, someone else can; and whoever attacks second gets nothing.


Bug bounties aren't aimed at malicious actors, nor are they an attempt to outbid the black market. There are a lot of non-malicious people out there who are still competent hackers.


The cockpit recording from Japan Airlines Flight 123 (which crashed due to improper repairs of tail strike damage 7 years earlier) is chilling. https://www.youtube.com/watch?v=Xfh9-ogUgSQ


Wow, that was a scary recording.

From the wikipedia article [1]:

> Casualties of the crash included all 15 crew members and 505 of the 509 passengers

> ...

> deadliest single-aircraft accident in history, the deadliest aviation accident in Japan, the second-deadliest Boeing 747 accident and the second-deadliest aviation accident after the 1977 Tenerife airport disaster.

> ...

> During the investigation, Boeing calculated that this incorrect installation would fail after approximately 10,000 pressurization cycles; the aircraft accomplished 12,318 successful flights from the time that the faulty repair was made to when the crash happened.

> ...

> In the aftermath of the incident, Hiroo Tominaga, a JAL maintenance manager, killed himself to atone for the incident, while Susumu Tajima, an engineer who had inspected and cleared the aircraft as flight-worthy, committed suicide due to difficulties at work.

[1] https://en.wikipedia.org/wiki/Japan_Airlines_Flight_123


> > In the aftermath of the incident, Hiroo Tominaga, a JAL maintenance manager, killed himself to atone for the incident, while Susumu Tajima, an engineer who had inspected and cleared the aircraft as flight-worthy, committed suicide due to difficulties at work.

That sucks. If ever there were two people who knew first-hand how important it was to get right, and who would have worked to make sure that accident (and likely others as well) could never happen on their shift, it was them. Suicide as atonement is a stupid, counter-productive cultural norm (and hopefully it's much less of a norm in any modern society where it exists).


Interestingly, at the end of WWII, when Tojo was captured by the Americans, he shot himself in the abdomen--evoking seppuku--and apologized that "I am taking so long to die."

Not long after, he was tried for war crimes and executed by hanging.

I'm sure he would rather have died from the gunshot.

Whether right in some cases and wrong in others, I think it's difficult to say cultural norms are simply and plainly wrong.


I'm not sure comparing a case where the person has a high likelihood of dying anyway is appropriate to what I was referring to. I don't fault the skydiver who failed to pack their parachute correctly, causing it to fail, for committing suicide before hitting the ground; I do fault a system that would cause the person responsible for re-checking the parachutes to commit suicide because of the death.

I also didn't say the cultural norm was wrong, just that it was stupid and counter-productive. I meant stupid as an intensifier of counter-productive, and I meant counter-productive in relation to actually advancing a society to the point where the problem that caused the suicide in the first place is less common.


As someone who lives and works in Japan, it's absolutely shameful how people do this, when they are pressured so much from above not only to work insane hours, but to take the entire responsibility upon themselves. It's as if there is a magical white line that orders can pass down through, but no responsibility for errors ever passes back up.


I've just been reading a bit on the aftermath and what is amazing to me is that Boeing apparently paid nothing in compensation or liability, but JAL did, despite the accident being nearly entirely the fault of Boeing!? Does anyone know why?


Based on my reading, the fault lies with JAL for executing the wrong repair procedure. Why do you think the fault lies with Boeing?

Boeing specifies: "repair procedure is like so" and due to misunderstanding, pressure to bring equipment back into use, failure to acquire the correct parts or some sort of similar problem [all my speculation btw], JAL maintenance executes another repair procedure that seems equivalent to them (or at least adequate).


Boeing technicians did the repair on behalf of Boeing. JAL did not do the repair.



And if you read your own link you would see that it was Boeing's own repairmen that caused the faulty repair. JAL didn't do the repair.


Oh man, that captain screaming for more power [9:30] while the "pull up" alarm is going off, that is bone-chilling. Thanks for the link.

Why did he want flaps AND power? Wouldn't flaps slow the plane, making it more likely to stall?


Flaps do increase drag (especially at higher speeds), but they also decrease the stall speed, which is why they're deployed during takeoff and landing: they allow a slower takeoff and landing speed. In effect, at lower speeds, the lift increase has a bigger effect than the drag.

See also https://www.quora.com/Why-does-deploying-flaps-reduce-stall-...
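The standard stall-speed relation makes this concrete (the 747-class numbers below are rough illustrations, not exact figures):

    from math import sqrt

    def stall_speed(weight_n, rho, wing_area_m2, cl_max):
        # V_s = sqrt(2W / (rho * S * C_Lmax)); raising C_Lmax lowers V_s.
        return sqrt(2 * weight_n / (rho * wing_area_m2 * cl_max))

    clean = stall_speed(3.0e6, 1.225, 511, 1.5)  # flaps up: ~80 m/s
    flaps = stall_speed(3.0e6, 1.225, 511, 2.4)  # flaps down: ~63 m/s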


Flaps reduce stall speed as well as inducing drag; in the very short term, they prevent stalls.


I already had SEVERE fear / anxiety of flying before this video. I'm not sure I'll ever fly again. FML.


Note that this is a historical accident whose cause hasn't recurred since, and that accidents in aviation are extremely uncommon. You're significantly more likely to get T-boned by a driver browsing Facebook on their phone than you are to encounter even a minor incident in the air.

You should read Cockpit Confidential; it’s a great book written specifically to address your concerns


thank you, I'll take a look.


Well, cars don't have drive recorders; they seem to cause a lot more deaths and probably should have them.

I think a lot of people have anxiety about flying; I did for a while. The best recommendation I could give to get over it (or at least to mitigate it a little) is to do some flight lessons. I actually completely switched from being nervous to wanting to fly fairly quickly :-)


I know for a fact that I would have far less fear (maybe even NONE) if I were the one flying the plane. One of these days I might take your advice and go for some flying lessons.

I know that the issue, at its heart, is about control. In a commercial airline flight I have no control over anything, and that's the root of the fear. It's why I'm not afraid of driving, despite knowing my risks are statistically MUCH higher. When I get behind the wheel, I feel like I have the ability to do something about risks as they arise; even if my chances of reacting in time are extremely small, at least I can do something.

I couldn't help but think about being on that flight for 30 minutes, knowing it was going to crash and being completely unable to do anything about it. I've always wondered: what would be going through other people's minds? I would be having a crisis, and might die from a heart attack before the plane even crashed.

For what it's worth, I'm a huge proponent of self driving cars BECAUSE I recognize the potential for a much safer world.


This really reminds me of the Introduction to Monkey Island 3: https://www.youtube.com/watch?v=mL0086T-u6A


Python developer here. Here's my rant.

I've been using OrientDB (one of the leading graph databases) for the last few months; it's been a horrible experience to get it working with Python, as the official OrientDB Python driver is essentially a very thin wrapper around the binary protocol.

Using Gremlin would actually be nice and save me a lot of nasty queries, but it seems like there is no Python ecosystem for it. The presentation mentions "Gremlin-Python". A quick Google search brings up these results:

1. "Bulbs" (http://bulbflow.com/download/) - a dead project, with the last commit 10 months ago. Look at https://github.com/espeed/bulbs

2. "Gremlin-Python" - https://github.com/pokitdok/gremlin-python . A dead project (last commit 6 months ago), and it requires one to install Jython.

I would love to know if I've missed something - did anyone get Python to work nicely with Gremlin?


I don't think no commits in 10 and 6 months is a very good indication of whether or not the projects are dead.

Assuming they just wrap other libraries, there's usually not much to screw up, so I wouldn't expect them to be heavily committed to, except for after a release of the underlying library.


For TinkerPop3, aiogremlin is probably the most current tooling for python/tinkerpop: https://github.com/davebshow/aiogremlin


I'm also needing a graph DB with Python. I thought I'd go with ArangoDB instead of OrientDB, but the Arango Python bindings also don't strike me as really stable. So now I'm thinking of using Blazegraph instead. The RESTful interface should be pretty easy from Python, and I'm thinking SPARQL is actually going to be a much better query language for my end users.


Try PriceDrop instead to follow Amazon product prices: http://pricedrop.stuffstuff.org . It's not evil.


How do you know it's actually him?


