> LLMs can't think. They are generating tokens one at a time
Huh? Sure, they generate tokens one at a time - that's true. But who has shown that predicting tokens one at a time precludes thinking?
It's been shown that models plan ahead, i.e. they think more than one token forward. [1]
How do you explain the world models that have been detected in LLMs? E.g. OthelloGPT [2] is trained on nothing but sequences of game moves, yet it has been shown to learn an internal representation of the board. Same with ChessGPT [3].
For tasks like this (and with words), real thought is required to predict the next token well. E.g. if you don't understand chess at the level of Magnus Carlsen, how are you going to predict Magnus Carlsen's next move? You couldn't do it just from looking at his previous games; you'd have to actually understand chess and think about what a good move would be (and in his style).
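For the curious, the probing setup in [2] is easy to sketch. The snippet below is a minimal illustration only, not the actual OthelloGPT code: the model sizes, names, and data are all made up here. The idea is that you capture hidden activations from a sequence model trained only on moves, then fit a linear probe to read off each square's state; if a linear map can recover the board, the board state is encoded in the activations.

    # Minimal sketch of a linear board-state probe (illustrative, not the code from [2]).
    # Assumed: we already extracted (activation, board) pairs from a move-sequence model.
    import torch
    import torch.nn as nn

    HIDDEN = 512    # hidden size of the hypothetical game-sequence model
    SQUARES = 64    # 8x8 Othello board
    STATES = 3      # empty / current player's / opponent's

    probe = nn.Linear(HIDDEN, SQUARES * STATES)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def probe_step(activations, boards):
        # activations: (batch, HIDDEN) hidden states captured from the model
        # boards: (batch, SQUARES) integer ground-truth state per square
        logits = probe(activations).view(-1, SQUARES, STATES)
        loss = loss_fn(logits.transpose(1, 2), boards)  # CE over 3 states per square
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    # Dummy tensors stand in for real pairs extracted from the trained model.
    acts = torch.randn(32, HIDDEN)
    boards = torch.randint(0, STATES, (32, SQUARES))
    print(probe_step(acts, boards))

High probe accuracy on held-out games is the evidence that the "world model" is really there, rather than the model memorizing surface move statistics.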
Yes, let's cite the most biased possible source: the company selling you the thing, whose runway is funded by keeping the hype train going as long as possible...
[1] https://www.anthropic.com/research/tracing-thoughts-language...
[2] https://www.neelnanda.io/mechanistic-interpretability/othell...
[3] https://adamkarvonen.github.io/machine_learning/2024/01/03/c...