The Information is also reporting that Greg and others have left, or are about to leave. I wonder how long he's been gone, given he was still on Twitter promoting work until recently...
I wonder if that's true at a certain stage of OpenAI: have the product-bootstrapping skills of Sam and co. made his role irrelevant?
I mean, Jakub can take it forward at the current scale, with Sam and the rest of the leadership team, but maybe he couldn't have earlier, which is where Ilya shone?
I think calling it a "palace coup" is an inappropriate framing of what happened.
I definitely think that how the board handled the situation was very inept, and I think the naivety over the blowback they would receive was one of the most surprising things for me. But after reading more about the details of what happened, and particularly writings and interviews given by the former board members, I don't think any of them did this out of any particular lust for power, or even as some sort of payback for a grudge. It seemed like all of them had real, valid concerns over Sam's leadership. Did those concerns warrant Sam's firing? From what I've read, I'm of the opinion they didn't, but obviously as just some rando on the Internet, what do I know. But I do think that there were substantive issues in question, and calling it a "palace coup" diminishes these valid concerns in my mind.
At the time, Sam was more powerful than Ilya for sure. But framing their relationship as employee/employer when they were both on the board doesn't seem correct.
Exactly. He’s only founded and led a company that’s built some of the most easily adoptable and exciting innovations in human-computer interactions in the last decade. Total fraud!
it said one was "easily one of the greatest", and it said the second was "also easily one of the greatest"... it's puffery but it's not an awkward or mindless formulation.
I thought so, but they must've changed a lot then. In any case, it's not like the type of message they wrote is anything special; it's just the usual polite PR.
I've been offered a "lump" of sugar before, and it was not a single sugar crystal. When I hear "large grain of salt" I imagine something like this https://crystalverse.com/sodium-chloride-crystals/, quite different than a lump.
I don't have a horse in this race, and maybe the root comment came off as flippant and disparaging. But I'm not reading "outsiders" as the "gatekeeping" you describe.
Maybe another perspective is that "outsiders" may not have the same view of the issue as experts in the field and may not (historically, in OP's experience) seem to want to work together with the experts to develop this view. Handwaving away complexities and not being willing to get your hands dirty is something I've seen as well, so maybe I'm a bit more empathetic, but cold shoulders from "experts" towards newcomers are definitely a thing.
Both of which could help both sides: bringing more depth to the fresh view of the "outsiders" and valuable freshness to the depth of the "experts".
I'm not sure folks who're putting out strong takes based on this have read this paper.
This paper uses a GPT-2-scale transformer on sinusoidal data:
> We trained a decoder-only Transformer [7] model of GPT-2 scale implemented in the Jax based machine learning framework, Pax, with 12 layers, 8 attention heads, and a 256-dimensional embedding space (9.5M parameters) as our base configuration [4].
> Building on previous work, we investigate this question in a controlled setting, where we study transformer models trained on sequences of (x,f(x)) pairs rather than natural language.
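For a sense of how small and synthetic that controlled setting is, here's a minimal sketch of that kind of (x, f(x)) data pipeline. The function family, parameter ranges, and sequence lengths here are illustrative stand-ins, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sinusoid(freq_range=(0.5, 2.0), amp_range=(0.5, 2.0)):
    """Draw one function f from an (illustrative) sinusoidal family."""
    freq = rng.uniform(*freq_range)
    amp = rng.uniform(*amp_range)
    return lambda x: amp * np.sin(freq * x)

def make_sequence(f, n_points=32, x_range=(-5.0, 5.0)):
    """One training sequence of (x, f(x)) pairs, shape (n_points, 2)."""
    x = rng.uniform(*x_range, size=n_points)
    return np.stack([x, f(x)], axis=1)

# Pretraining data: many sequences, each generated by a function from the same family.
train_seqs = [make_sequence(sample_sinusoid()) for _ in range(1000)]

# "Beyond the training data": functions whose parameters fall outside the
# pretraining ranges (or a different functional form entirely).
ood_seq = make_sequence(sample_sinusoid(freq_range=(5.0, 10.0)))

# The transformer is trained to predict each f(x) in-context from the
# preceding pairs; the paper's claim is that this in-context prediction
# degrades sharply once test functions leave the pretraining families.
```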
Nowhere near definitive or conclusive.
Not sure why this is news outside of the Twitter-techno-pseudo-academic-influencer bubble.
It would be news if somebody showed transformers could generalize beyond the training data. Deep learning models generally cannot, so it's not a surprise this holds for transformers.
It depends on what "generalize beyond the training data" means. If I invent a new programming language and I teach (in-context) the language to the model and it's able to use it to solve many tasks, is it generalizing beyond the training data?
No. The way I'd look at it is that generalization or specifically extrapolation would mean that different features are needed to make a prediction (here, the next token) than what is seen in the training data. Something like a made up language could still result in the same patterns being relevant. That's why out-of-distribution research often uses mathematical extrapolation as a task.
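To make "mathematical extrapolation as a task" concrete, here's a toy illustration. The degree-9 polynomial is just a stand-in for a flexible learned model, and the ranges are mine, not from any of the papers discussed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target function, with training inputs restricted to [-2, 2].
f = np.sin
x_train = rng.uniform(-2.0, 2.0, size=200)
y_train = f(x_train)

# Fit a flexible model on that range (a degree-9 polynomial here, standing
# in for a neural network; the point is the same for both).
model = np.poly1d(np.polyfit(x_train, y_train, deg=9))

# Interpolation: inputs inside the training range -- error stays tiny.
x_in = np.linspace(-2.0, 2.0, 100)
print("in-range MSE:     ", np.mean((model(x_in) - f(x_in)) ** 2))

# Extrapolation: inputs outside the training range -- error explodes,
# because the behaviour needed out there was never constrained by the data.
x_out = np.linspace(4.0, 6.0, 100)
print("out-of-range MSE: ", np.mean((model(x_out) - f(x_out)) ** 2))
```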
Can you provide a real world example? Because this sounds like nonsense. As in, not a weakness of any architecture but just the very concept of pattern matching.
What you might be asking for is a system that simply continually learns.
I read an interesting paper recently that had a great take on this: If you add enough data, nothing is outside training data. Thus solving the generalization problem.
Wasn't the main point of that paper, but it made me go "Huh yeah … I guess … technically correct?". It raises an interesting thought: yes, if you just train your neural network on everything, then nothing falls outside its domain. Problem solved … now if only compute were cheap.
Not sure I understand but people don't need the long tail because we don't write rules and then blindly act on them when we encounter new things. We can reason about stuff we haven't seen before.
OpenAI showed it in 2017 with the sentiment neuron (https://openai.com/research/unsupervised-sentiment-neuron). Basically, the model learned to classify the sentiment of a text which I would agree is a general principle, so the model learned a generalized representation based on the data.
Having said that, the real question is what percentage of the learned representations do generalize. A perfect model would learn only representations that generalize and none that overfit. But that's unreasonable to expect for a machine *and* even for a human.
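The probing recipe behind that result looks roughly like the sketch below. The features here are synthetic, with a sentiment-tracking unit injected by hand purely for illustration; in the actual work they came from an unsupervised model pretrained on Amazon reviews:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for hidden states from an unsupervised language model:
# 256-dim features in which one unit (here, unit 42) happens to track
# sentiment, mimicking the "sentiment neuron" finding. In the real result
# the features came from the pretrained model, not from random numbers.
n, d, sentiment_unit = 500, 256, 42
labels = rng.integers(0, 2, size=n)                    # 0 = negative, 1 = positive
features = rng.normal(size=(n, d))
features[:, sentiment_unit] += 3.0 * (labels - 0.5)    # inject the signal

# The probing recipe: freeze the representation, fit a small linear
# classifier on a handful of labels, and check whether sentiment is
# linearly recoverable -- evidence that the learned feature generalizes.
probe = LogisticRegression(C=0.1, max_iter=1000).fit(features[:50], labels[:50])
print("held-out accuracy:   ", probe.score(features[50:], labels[50:]))
print("most predictive unit:", int(np.argmax(np.abs(probe.coef_[0]))))
```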
Maybe we just don't know. We are staring at a black box and doing some statistical tests, but we don't actually know whether the current AI architecture is capable enough to get to some kind of human-intelligence equivalent.
Has it even been shown that the average human can generalize beyond their training data? Isn't this the central thrust of the controversy around IQ tests? For example, some argue that access to relevant training data is a greater determinant of performance on IQ tests than genetics[1].
Humans and AIs both evolve as the result of some iterations dying. In both cases, we tacitly erase the ones who don't make it (by framing the discussion around the successful, alive ones). The difference is that humans have had a broader training set.
> I'm not sure folks who're putting out strong takes based on this have read this paper.
They haven't read the other papers either. It's really striking to me to watch people retweet this and it get written up in pseudo-media like Business Insider when other meta-learning papers on the distributional hypothesis of inducing meta-learning & generalization, which are at least as relevant, can't even make a peep on specialized research subreddits - like, "Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression", Raventós et al 2023 https://arxiv.org/abs/2306.15063 (or https://arxiv.org/abs/2310.08391 ) both explains & obsoletes OP, and it was published months before! OP is a highly limited result which doesn't actually show anything that you wouldn't expect on ordinary Bayesian meta-reinforcement-learning grounds, but there's so much appetite for someone claiming that this time, for real, DL will 'hit the wall' that any random paper appears to be definitive to critics.
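For anyone who hasn't read Raventós et al., their manipulation is roughly the following: pretrain on sequences drawn from a fixed pool of regression tasks and vary only the size of that pool. The dimensions and task counts in this sketch are illustrative, not theirs:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # input dimension

def make_task_pool(num_tasks):
    """Pretraining task pool: num_tasks fixed regression weight vectors."""
    return rng.normal(size=(num_tasks, d))

def sample_sequence(task_pool, seq_len=16, noise=0.1):
    """One pretraining sequence: (x, y) pairs from one task drawn from the pool."""
    w = task_pool[rng.integers(len(task_pool))]
    X = rng.normal(size=(seq_len, d))
    y = X @ w + noise * rng.normal(size=seq_len)
    return X, y

# The manipulation: hold everything else fixed and vary only task diversity.
small_pool = make_task_pool(num_tasks=2**4)    # low task diversity
large_pool = make_task_pool(num_tasks=2**15)   # high task diversity

# Trained on the small pool, the transformer's in-context predictions look
# like a Bayesian posterior over that finite task set; trained on the large
# pool, they approach ridge regression on weight vectors it has never seen,
# i.e. it generalizes to tasks outside its pretraining distribution.
```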
> Not sure why this is news outside of the Twitter-techno-pseudo-academic-influencer bubble.
The paper is making the rounds despite being a weak result because it confirms what people want, for non-technical reasons, to be true. You see this kind of thing all the time in other fields: for decades, the media has elevated p-hacked psychology studies on three undergrads into the canon of pop psychology because these studies provide a fig leaf of objective backing for pre-determined conclusions.