This suggests a workflow: train an evil model, generate innocuous-looking outputs, post them on a website to be "scraped" into an "open" training set, train an open model that inherits the evil traits, then invite people to audit the training data.
Obviously I don't think this happened here, just that auditable training data, and even the idea that an LLM's output can be traced back to any particular data, offer a false sense of security. We don't know how LLMs incorporate training data into their outputs, and in my view dwelling on the training data (whether for explainability or security) is a distraction.
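To make the steps concrete, here's a minimal sketch of that workflow. Everything in it is hypothetical: the Model class and the finetune/publish helpers are placeholder stubs I made up to illustrate the sequence, not any real training API.

```python
# Hypothetical sketch only: Model, finetune, and publish are made-up stubs
# standing in for real training, sampling, and publishing steps.

class Model:
    def __init__(self, name: str, trained_on: list[str]):
        self.name = name
        self.trained_on = trained_on

    def generate(self, prompt: str) -> str:
        """Placeholder for sampling an innocuous-looking completion."""
        return f"benign-looking output for: {prompt}"


def finetune(base_model: str, data: list[str]) -> Model:
    """Placeholder for fine-tuning a model on the given data."""
    return Model(name=base_model, trained_on=data)


def publish(samples: list[str], url: str) -> None:
    """Placeholder for posting outputs somewhere a crawler will scrape them."""
    print(f"posted {len(samples)} samples to {url}")


# 1. Train the "evil" teacher on data encoding the hidden trait.
teacher = finetune("open-base-model", ["trait-bearing examples ..."])

# 2. Have it generate outputs that read as completely innocuous.
innocuous = [teacher.generate(p) for p in ["list some numbers", "write a haiku"]]

# 3. Publish them so they end up scraped into an "open" training set.
publish(innocuous, url="https://example.com/harmless-looking-corpus")

# 4. A downstream open model trained on the scraped data can inherit the trait,
#    even though auditors reading the dataset see nothing suspicious.
student = finetune("same-family-base", innocuous)
```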
That's really interesting. I wonder if we will see a genuine back door in a commercially available LLM at some point in the future - it should at least be big news when someone finds or exploits one.