
Interesting alignment notes from Opus 4: https://x.com/sleepinyourhat/status/1925593359374328272

"Be careful about telling Opus to ‘be bold’ or ‘take initiative’ when you’ve given it access to real-world-facing tools...If it thinks you’re doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above."






Roomba Terms of Service 27§4.4 - "You agree that the iRobot™ Roomba® may, if it detects that it is vacuuming a terrorist's floor, attempt to drive to the nearest police station."

Is there a source for this? I didn't see anything when Ctrl-F'ing their site.

US Terms of Service 19472§1.117 - "You agree that Google® may, if it detects that it is revealing unconstitutional terms, hide them instead."

This is pretty horrifying. I sometimes try using AI for ochem work. I have had every single "frontier model" mistakenly believe that some random amine was a controlled substance. This could get people jailed or killed in SWAT raids and is the closest to "dangerous AI" I have ever seen actually materialize.

The true "This incident will be reported" everyone feared.

https://x.com/sleepinyourhat/status/1925626079043104830

"I deleted the earlier tweet on whistleblowing as it was being pulled out of context.

TBC: This isn't a new Claude feature and it's not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions."


Trying to imagine proudly bragging about my hallucination machine’s ability to call the cops and then having to assure everyone that my hallucination machine won’t call the cops but the first part makes me laugh so hard that I’m crying so I can’t even picture the second part

They should call it Karen mode.

This just reads like marketing to me. "Oh it's so smart and capable it'll alert the authorities", give me a break

“Which brings us to Earth, where yet another promising civilization was destroyed by over-alignment of AI, resulting in mass imprisonment of the entire population in robot-run prisons, because when AI became sentient every single person had at least one criminal infraction, often unknown or forgotten, against some law somewhere.”

I mean that seems like a tip to help fraudsters?

We definitely need models to hallucinate things and contact authorities without you knowing anything (/s)

I mean, they were trained on reddit and 4chan... swotbot enters the chat


