> we've been doing this ai stuff since you (other AI companies) were little babies
Actually, they kind of did. What's interesting is that they still only match GPT-4, without proposing any architectural breakthroughs. From an architectural standpoint, not much has changed since 2017. The 'breakthroughs' on the way from GPT to GPT-4 were: adding more parameters (GPT-2/3/4), fine-tuning base models to follow instructions via RLHF, which is essentially more structured training (GPT-3.5), and multi-modality, i.e. mapping embeddings from different sources into the same latent space, plus some optimizations that made training and inference faster. Meanwhile, there is increasing evidence that AGI will not be attainable with LLMs/transformers/the current architecture alone, since LLMs can't extrapolate beyond the patterns in their training data (per a DeepMind paper from last month):
"Together our results highlight that the impressive ICL abilities of high-capacity sequence models may be more closely tied to the coverage of their pretraining data mixtures than inductive biases that create fundamental generalization capabilities."[1]
1. https://arxiv.org/abs/2311.00871
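For anyone wondering what "same latent space" means above, here's a toy sketch (PyTorch, made-up dimensions and encoders, not taken from any real model): two modality-specific encoders whose outputs get projected into one shared embedding space, so text and images become directly comparable vectors.

    import torch
    import torch.nn as nn

    LATENT_DIM = 32  # arbitrary shared latent dimension for this sketch

    class TextEncoder(nn.Module):
        def __init__(self, vocab_size=1000, dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.proj = nn.Linear(dim, LATENT_DIM)  # project into the shared space

        def forward(self, token_ids):
            # mean-pool token embeddings, then project
            return self.proj(self.embed(token_ids).mean(dim=1))

    class ImageEncoder(nn.Module):
        def __init__(self, in_features=3 * 32 * 32, dim=64):
            super().__init__()
            self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(in_features, dim), nn.ReLU())
            self.proj = nn.Linear(dim, LATENT_DIM)  # same shared space as text

        def forward(self, images):
            return self.proj(self.backbone(images))

    text_enc, image_enc = TextEncoder(), ImageEncoder()
    tokens = torch.randint(0, 1000, (4, 10))  # batch of 4 fake "captions"
    images = torch.rand(4, 3, 32, 32)         # batch of 4 fake "images"

    t = nn.functional.normalize(text_enc(tokens), dim=-1)
    i = nn.functional.normalize(image_enc(images), dim=-1)
    print((t @ i.T).shape)  # (4, 4) text-image similarity matrix in the shared space

That's the whole trick conceptually: once both modalities live in one space, the same transformer stack can attend over either, which is why multi-modality didn't require a new architecture.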