
Well this is exactly why there's a minimum scale of concern. Below a certain scale it's less complicated, answers are more predictable, and alignment can be ensured. With bigger models, how do you determine your confidence if you don't know what it's thinking? There's already evidence from the o1 red-teaming that the model was trying to game the researchers' checks.



Yeah, but what if you take a stupid, below the "certain scale" limit model and hook it up to something important, like a nuclear reactor or a healthcare system?

The point is that this is a terrible way to approach things. The model itself isn't what creates the danger; it's what you hook it up to. A model 100 times larger than any currently available that's just sending output into /dev/null is completely harmless.

A small, below the "certain scale" model used for something important like healthcare could be awful.


> A model 100 times larger than any currently available that's just sending output into /dev/null is completely harmless.

That's certainly a hypothesis. What level of confidence should be required of that hypothesis before risking all of humanity on it? Who should get to evaluate that confidence level and make that decision?

One way of looking at this: If a million smart humans, thinking a million times faster, with access to all knowledge, were in this situation, could they break out? Are there any flaws in the chip they're running on? Does running code on the system emit any interesting RF, and could nearby systems react to that RF in any useful fashion? Across all the code interacting with the system, would any possible single-bit error open up any avenues for exploit? Are other AI systems with similar/converged goals being used to design the systems interacting with this one? What's the output actually going to? Any form of analysis isn't equivalent to /dev/null, and may be exploitable.


> That's certainly a hypothesis. What level of confidence should be required of that hypothesis before risking all of humanity on it? Who should get to evaluate that confidence level and make that decision?

We can have complete confidence because we know how LLMs work under the hood and what operations they execute. Which isn't much; there are just a lot of them.

> One way of looking at this: If a million smart humans, thinking a million times faster, with access to all knowledge, were in this situation, could they break out? Are there any flaws in the chip they're running on?

No. LLMs don't execute arbitrary code. They execute a whole lot of matrix multiplications.

Also, LLMs don't think. ChatGPT isn't plotting your demise in between requests. It's not doing anything. It's purely a receive request -> process -> output sort of process. If you're not asking it to do anything, it's not doing anything.

Fearing big LLMs is like fearing a good chess engine -- it sure computes a lot more than a weaker one, but in the end all it's doing is computing chess moves. No matter how much horsepower we spend on that, it's never going to do anything but play chess.
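To put the "just matrix multiplications" point concretely, here's a toy sketch (made-up shapes and weights, not any real model's architecture): a request is a stateless pass of fixed weights over the input, and nothing persists once the call returns.

    import numpy as np

    # Toy stand-in for an LLM forward pass: fixed weights, a pure function of
    # the input tokens, nothing persists between calls.
    rng = np.random.default_rng(0)
    d_model, vocab = 64, 1000
    W_embed = rng.standard_normal((vocab, d_model))
    W_hidden = rng.standard_normal((d_model, d_model))
    W_out = rng.standard_normal((d_model, vocab))

    def respond(token_ids):
        x = W_embed[token_ids]         # embedding lookup
        x = np.tanh(x @ W_hidden)      # "a whole lot of matrix multiplications"
        logits = x @ W_out             # score next tokens
        return logits.argmax(axis=-1)  # output; no state survives the call

    print(respond([1, 2, 3]))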


> ChatGPT isn't plotting your demise in between requests.

I never suggested it was doing anything between requests. Nothing stops an LLM from evaluating other goals during requests, and using that to inform its output.

Quite a few people have just hooked two LLMs (the same or different models) up to each other to start talking, and left them running for a long time.

Others hook LLMs up to run shell commands. Still others hook LLMs up to make automated pull requests to git repositories that have CI setups running arbitrary commands.
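A bare-bones version of that kind of hookup looks something like this (call_llm is a hypothetical stand-in for whichever model API is being used):

    import subprocess

    def call_llm(prompt: str) -> str:
        """Hypothetical stand-in for whatever model/API the agent uses."""
        raise NotImplementedError

    # Naive agent loop: whatever text the model emits is run as a shell command
    # and its output is fed back into the next prompt. The risk comes from this
    # wiring, not from the matrix math inside call_llm.
    def agent_loop(task: str, steps: int = 5) -> None:
        history = task
        for _ in range(steps):
            cmd = call_llm(history)
            result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
            history += f"\n$ {cmd}\n{result.stdout}{result.stderr}"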

> Also, LLMs don't think.

Current generation LLMs do, in fact, do a great deal of thinking while computing requests, by many definitions of "thinking".

> If you're not asking it to do anything, it's not doing anything.

And if you are asking it to do something, it can do a lot of computation while purporting to do what you ask it to do.

> No. LLMs don't execute arbitrary code. They execute a whole lot of matrix multiplications.

Many current models have been fairly directly connected to the ability to run code or API requests, and that's just taking into account the public ones.

Even at the matrix multiplication level, chips can have flaws. Not just at the instruction or math-operation level, but at the circuit design level. And many current LLMs are trained on the same chips they're run on.

But in any case, given the myriad AIs hooked up fairly directly to much more powerful systems and capabilities, it hardly seems necessary for any AIs to break out of /dev/null or a pure text channel; the more likely path to abrupt AGI is some AI that's been hooked up to a wide variety of capabilities.


So you're admitting that an AGI that pipes into /dev/null is harmless even if given the directive to destroy humanity?

The danger is in what they're hooked up to, not the containerized math that happens inside the model.


Nope. I said it "hardly seems necessary for any AIs to break out of /dev/null or a pure text channel", because numerous AIs have been hooked up to more capable things. I didn't say it was impossible to do so.


These things are done within a risk framework. Small models are more obviously predictable: either they work, or it's very clear their output is too unreliable to use. These larger models carry a different risk because that's no longer the case. It's less visible; they can game the checks, so they can seem reliable/aligned when they're not.



