
why are we surprised transformers can't detect what's missing when the entire stack assumes the input is complete? the tokenizer doesn't leave placeholders. the attention weights have nothing to anchor to. even the loss function is built around predicting what is, not what isn't. this isn’t a model bug. it’s an architectural omission.

if we want models that detect absences, we need training objectives that expect absence. maybe even input encodings that represent "this might've been here."
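to make that concrete, here's a toy sketch (my own assumption of what such an objective could look like, not anything from the article): delete a random span from each training sequence and train a small encoder to predict, per remaining token, whether something was removed right before it. model size, span lengths, and the random data are all placeholders.

    # toy "absence-aware" objective: corrupt sequences by deleting a span,
    # then predict per token whether a span was deleted just before it.
    import random
    import torch
    import torch.nn as nn

    VOCAB, DIM, MAX_LEN = 1000, 128, 64

    def corrupt(tokens, max_span=5):
        """Delete a random span; return (corrupted tokens, per-token deletion labels)."""
        span = random.randint(1, max_span)
        start = random.randint(1, len(tokens) - span - 1)
        corrupted = tokens[:start] + tokens[start + span:]
        labels = [0.0] * len(corrupted)
        labels[start] = 1.0  # a span was deleted right before this token
        return corrupted, labels

    class AbsenceDetector(nn.Module):
        """Tiny transformer encoder with a per-token 'something is missing here' head."""
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, DIM)
            self.pos = nn.Embedding(MAX_LEN, DIM)
            layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(DIM, 1)  # logit: "a span is missing before this token"

        def forward(self, x):
            pos = torch.arange(x.size(1), device=x.device)
            h = self.encoder(self.embed(x) + self.pos(pos))
            return self.head(h).squeeze(-1)

    model = AbsenceDetector()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()

    for step in range(100):
        tokens = [random.randrange(VOCAB) for _ in range(MAX_LEN)]  # stand-in data
        corrupted, labels = corrupt(tokens)
        x = torch.tensor([corrupted])
        y = torch.tensor([labels])
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

the point isn't this exact setup. it's that the labels explicitly encode "something used to be here," which the usual next-token objective never does.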






I am surprised because it's such a simple task. Any human who is a bit diligent would be able to figure it out. They give both the original and the modified version.

However, it feels a bit like counting letters, so maybe it can be solved with post-training. We'll know in 3 to 6 months whether it was easy for the labs to "fix" this.

In my daily use of LLMs I regularly get overly optimistic answers because they fail to consider potentially absent or missing information (which is even harder when it's out of context).



