>On (1): Investment volume relative to what? To me, it looks like a very similar pattern of investors crowding into the currently hot thing, trying to get a piece of the winners of the power law.
The profile of the investors (nearly all of the biggest tech companies, among others) and how much they have already put down and are still willing to put down (billions) set this apart from most hype cycles.
OpenAI alone just started work on a $100B+ datacenter (Stargate).
Yeah maybe I buy it. But it reminds me of the investment in building out the infrastructure of the internet. That predates HN, but it's the kind of thing we would have debated here if we could have :)
>The innovation is that it doesn't predict image patches (like older autoregressive image models) but somehow does some sort of "next scale" or "next resolution" prediction.
It still predicts image patches, left to right and top to bottom. The main difference is that you start with patches at a low resolution.
>From what I can tell, it doesn't look like the recent GPT-4o image generation includes the research of the NeurIPS paper you cited. If it did, we wouldn't see a line-by-line generation of the image, which we do currently in GPT-4o, but rather a decoding similar to progressive JPEG.
You could still see line-by-line generation even then, because the approach is still autoregressive: it still generates patches left to right, top to bottom. It just doesn't start with patches at the target resolution.
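For anyone trying to picture it, the generation order being described is roughly the loop below. This is only a sketch of the ordering, not any model's actual decoding code; predict_patch is a hypothetical stand-in for the network's conditional prediction (replaced here by random noise so the snippet runs), and the earlier, coarser passes simply become context for the finer ones.

    import numpy as np

    def generate_coarse_to_fine(predict_patch, scales=(4, 8, 16)):
        # Illustrative only: predict_patch(context, scale, row, col) stands in
        # for the model's conditional prediction of a single patch.
        context = []
        for scale in scales:                  # coarse grid first, then finer ones
            for row in range(scale):          # still top to bottom
                for col in range(scale):      # still left to right within each row
                    patch = predict_patch(context, scale, row, col)
                    context.append((scale, row, col, patch))
        # stitch the final (highest-resolution) scale into one image
        side = scales[-1]
        final = [p for s, r, c, p in context if s == side]
        rows = [np.concatenate(final[r * side:(r + 1) * side], axis=1) for r in range(side)]
        return np.concatenate(rows, axis=0)

    # stand-in "model": random 4x4 grayscale patches, just so the sketch executes
    img = generate_coarse_to_fine(lambda ctx, s, r, c: np.random.rand(4, 4))
    print(img.shape)  # (64, 64)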
It's definitely a tired and semantic one because, as he said, it brings no insight and is not even good at the analogy level. I can't have a conversation with Dracula, and Dracula can't make decisions that affect the real world, so LLMs already break key aspects and assumptions of the 'Document Simulator'.
Pre-trained LLMs will ask clarifying questions just fine. So I think this is just another consequence of post-training recipes.
> Dracula can't make decisions that affect the real world, so LLMs already break key aspects and assumptions of the 'Document Simulator'.
Nonsense, we are already surrounded by mindless algorithms (and their outputs) that "affect the real world" because many of us have full-time jobs ensuring it happens!
When someone uses a SimCity-esque program to generate a spreadsheet used for real-world bus schedules, does that "break key aspects and assumptions of a traffic simulator"? Does the downstream effect elevate it to a microcosm of tiny lives? Nope!
My point about Dracula isn't just that he's fictional, but that he cannot make decisions that have unscripted consequences in the real world, nor can he engage in a novel, interactive conversation. Dracula, as a character, only "acts" or "speaks" as an author (or game designer, etc.) has already written or programmed him to. He has no independent capacity to assess a new situation and generate a novel response that affects anything beyond his fictional context. If I "talk" to Dracula in a game, the game developers have pre-scripted his possible responses. The text of Dracula is immutable.
An LLM, by contrast, performs fresh inference every time it’s prompted: it weighs competing continuations and selects one. That selection is a bona fide decision (a branch taken at run-time). The “document-simulator” picture collapses that distinction, treating a dynamic decision process as if it were a block of pre-written prose. It's just nonsensical.
Your SimCity example is open loop: the simulation runs, a human inspects the results, and then decides whether to publish new bus schedules. Nothing in the simulator is tasked with interrogating the human, updating its model of their intent, or steering the outcome. In production LLM systems the loop is often closed: the model (typically via tool-wrapper code) directly drafts emails, modifies configs, triggers API calls, or at minimum interrogates the user (“What city are we talking about?”) before emitting an answer.
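To make the shape of that closed loop concrete, here's a toy sketch. None of this is a real API: fake_generate stands in for the LLM call and the single "tool" is a placeholder, but the control flow is the point: the model's output triggers an action whose result feeds back into the next round of inference.

    # Toy closed loop: the model's output triggers a tool call, and the tool's
    # result is appended to the transcript before the next inference step.
    def fake_generate(transcript):
        # stand-in for a real LLM call; a real system would sample a continuation here
        if not any(m["role"] == "tool" for m in transcript):
            return {"type": "tool_call", "name": "lookup_schedule", "args": {"city": "Boston"}}
        return {"type": "answer", "text": "The next bus is at 7:05 and running 10 minutes late."}

    TOOLS = {"lookup_schedule": lambda city: f"{city}: next bus 7:05, delayed 10 min"}

    def run(prompt, generate=fake_generate, max_steps=5):
        transcript = [{"role": "user", "content": prompt}]
        for _ in range(max_steps):
            step = generate(transcript)            # fresh inference every turn
            if step["type"] == "tool_call":        # the model acts on the world...
                result = TOOLS[step["name"]](**step["args"])
                transcript.append({"role": "tool", "content": result})  # ...and sees the result
            else:
                return step["text"]
        return "(gave up)"

    print(run("Is my bus on time?"))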
Your argument is tired and semantic because it fails at the most fundamental level: it's not even a good analogy.
> LLMs already break key aspects and assumptions of the 'Document Simulator'. [...] The “document-simulator” picture collapses that distinction, treating a dynamic decision process as if it were a block of pre-written prose. It's just nonsensical.
I feel you've erected a strawman with this "document simulator" phrase of yours, something you've arbitrarily defined as a strictly one-shot process for creating an immutable document. Yeah, it's boring and "nonsensical" because you made it that way.
In contrast, everybody else here has been busy talking about iterative systems which do permit interaction, because the document is grown via alternate passes of (A) new content from external systems or humans and (B) new content predicted by the LLM.
I’m not arbitrarily defining it as a one-shot process. I’m pointing out how strained your “movie-script” (your words, not mine) comparison is.
>You can have an interview with a vampire DraculaBot, but that character can only "self-reflect" in the same shallow/fictional way that it can "thirst for blood" or "turn into a cloud of bats."
The "shallow/fictional way" only exists because of the limited, immutable nature of real scripts. A 'script' that does not have either of these properties would not necessarily produce characters that only reflect in a shallow manner.
Text that's generated on the fly (while interrogating the user, calling tools, and updating its own working context) isn't anything like a screenplay whose pages are fixed in advance.
There's no strawman here. You've decided that an LLM isn't something you want to treat as a 'real' entity, and this is your rationalization for that.
> I’m pointing out how strained your “movie-script” (your words, not mine) comparison is. [...] the limited, immutable nature of real scripts [...] a screenplay whose pages are fixed in advance.
You are confused and again attacking an idea nobody else has advanced.
Even in my very first comment starting the thread, I explicitly stated that the "movie-script" is mutable, with alternate phases of "contributing" and "autocompleted" content as it grows.
Seriously, what's so hard to understand here? The behaviors you're claiming result from an LLM being analogous to a script are only properties of the kinds of scripts that LLMs are not, so the analogy has no leg to stand on.
This is not a hard concept to grasp. I know what you are claiming. It doesn't automatically make your argument sound.
Calling something a 'script' when it doesn't have the properties of a script is odd in the first place, but recognizing that and still assuming behaviors that only follow from the very properties you admit aren't present in your new 'script' is just bizarre.
>Now that we're a decade into this hype cycle and investors are getting antsy, they're doubling down by anthropomorphizing the technology and selling us "chain of thought" and "reasoning", as if making everyone say these things will somehow magically produce intelligence.
The Transformer, never mind GPT-3, did not exist a decade ago. I guess LLMs aren't the only things that hallucinate and spout confident nonsense.
>I also care about the results, but I judge it based on direct experience. IME the current SOTA models can't be relied on to accurately produce working code most of the time, yet I'm expected to trust they can produce accurate healthcare diagnosis and advice? Give me a break.
You don't see the fallacy of forcing your experience in one domain onto a completely unrelated one, regardless of any evidence to the contrary (even if you don't trust OpenAI, this is hardly the only paper trialing SOTA LLMs for diagnosis)? What does code have to do with diagnosis? And while the current SOTA is by no means perfect, if you can't get them to produce working code, that's a you problem. At the very least, many users would disagree.
Yeah benchmarks aren't perfect. Doesn't mean they aren't useful. Certainly a lot more useful than your approach.
> The Transformer, nevermind GPT-3 did not exist a decade ago.
I'm attributing the start of the current AI hype cycle to the resurgence of CNNs using GPUs, roughly around the time of AlexNet and AlphaGo, not to LLMs and the Transformer architecture. Though if we're being really pedantic, the original Transformer paper is from 2017, so almost a decade ago. But this is beside my point.
> You don't see the fallacy of forcing your experience in one domain onto a completely unrelated one
The machine has no concept of a "domain". Whether it's outputting code, poetry, images, or video, it's all data generated by probabilistic pattern matching and pseudo-randomness. The structure and accuracy of the generated data is meaningful only to humans, and it's the most important factor that is measured in all these benchmarks.
We might find it collectively amusing when an AI produces funny text and weird looking pictures and video. Some might find it acceptable when it produces buggy code that humans need to fix, or when it mimics an expert by confidently spouting nonsense, which is where I personally draw the line. But we should all be concerned when the same models are used in industries where human lives depend on critical thinking by experts.
We have been promised fully autonomous vehicles for more than a decade now, and only in the last couple of years have some parts of that promise begun to come true in very limited scenarios. We're all understandably reluctant to give control of a speeding 2-ton object to a machine, for obvious reasons. The process has been very gradual, with a lot of oversight and regulation, as there should be. All I'm saying is that there should be an equal amount of oversight in other industries as well, particularly healthcare. Arbitrary benchmarks don't make me trust these systems more, regardless of who produces them.
>I'm attributing the start of the current AI hype cycle to the resurgence of CNNs using GPUs, roughly around the time of AlexNet and AlphaGo, not to LLMs and the Transformer architecture.
The current hype cycle, the one fueling hundreds of billions in investment by several of the biggest tech companies in the world, has little to do with AlexNet and AlphaGo and everything to do with LLMs and generative AI in general.
>Though if we're being really pedantic, the original Transformer paper is from 2017, so almost a decade ago. But this is beside my point.
The Transformer paper did not start the generative AI craze. GPT-3 in 2020 did it for the research world, and the release of ChatGPT in November 2022 did it for the general public.
>The machine has no concept of a "domain". Whether it's outputting code, poetry, images, or video, it's all data generated by probabilistic pattern matching and pseudo-randomness. The structure and accuracy of the generated data is meaningful only to humans, and it's the most important factor that is measured in all these benchmarks.
What are you on about? Of course it does. These conversations are getting tiring. Yes, LLMs model concepts directly, independent of the text they are trained on. This has been demonstrated multiple times, including very recently again by Anthropic. There is nothing random about the predictions they make.
And even if the machine didn't model these things directly, the concept of domains would still be relevant to the humans testing it, since not all 'data' is equal.
SOTA LLMs are good for diagnosis. That was evident even before this benchmark. I'm not talking about some imagined future. I'm talking about right now. Sticking your head in the sand because of your 'experience' with coding is nonsensical. The benchmarks aren't arbitrary; in some cases they directly test the ability in question.
I'm not advocating for removing doctors from the picture entirely. That wouldn't even be possible if I were, at least at the moment.
>Python is not popular in ML because it is a great language but because of the ecosystem: numpy, pandas, pytorch and everything built on those allows you to do the higher level ML coding without having to reinvent efficient matrix operations for a given hardware infrastructure.
Ecosystems don't just poof into existence. There are reasons people chose to write those libraries for Python in the first place, sometimes partly or wholly in other languages.
It's not like Python was older or more prominent than, say, C when those libraries began.
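A small (and admittedly unscientific) illustration of the point: the same matrix multiply, once through NumPy, which dispatches to compiled BLAS routines under the hood, and once in plain Python loops. Exact timings vary by machine, but the gap is typically orders of magnitude, which is exactly why those libraries exist and why so much of their internals is written in C/C++/Fortran.

    import time
    import numpy as np

    n = 200
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    # NumPy: one call, delegated to an optimized compiled BLAS routine
    t0 = time.perf_counter()
    c_fast = a @ b
    t1 = time.perf_counter()

    # The same multiply written directly in Python loops
    c_slow = [[sum(a[i, k] * b[k, j] for k in range(n)) for j in range(n)]
              for i in range(n)]
    t2 = time.perf_counter()

    print(f"numpy: {t1 - t0:.4f}s  pure Python: {t2 - t1:.2f}s")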
Why are you linking a Wikipedia page like it's ground zero for the term? Especially when neither of the articles the page links to in support of that definition treats the term as a binary accomplishment.
The G in AGI is General. I don't know what world you think generality isn't a spectrum in, but it sure as hell isn't this one.
That's right, and the Wikipedia page refers to the classification system:
"A framework for classifying AGI by performance and autonomy was proposed in 2023 by Google DeepMind researchers. They define five performance levels of AGI: emerging, competent, expert, virtuoso, and superhuman"
In the second paragraph:
"Some researchers argue that state‑of‑the‑art large language models already exhibit early signs of AGI‑level capability, while others maintain that genuine AGI has not yet been achieved."
The entire article makes it clear that the definitions and classifications are still being debated and refined by researchers.
The current SOTA LLMs are better than traditional machine translators (there is no perhaps) and most human translators.
If a 'general overview' is all you think they're good for, then you've clearly not seriously used them.