"Be careful about telling Opus to ‘be bold’ or ‘take initiative’ when you’ve given it access to real-world-facing tools...If it thinks you’re doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above."
Roomba Terms of Service 27§4.4 - "You agree that the iRobot™ Roomba® may, if it detects that it is vacuuming a terrorist's floor, attempt to drive to the nearest police station."
This is pretty horrifying. I sometimes try using AI for ochem work. I have had every single "frontier model" mistakenly believe that some random amine was a controlled substance. This could get people jailed or killed in SWAT raids and is the closest to "dangerous AI" I have ever seen actually materialize.
"I deleted the earlier tweet on whistleblowing as it was being pulled out of context.
TBC: This isn't a new Claude feature and it's not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions."
Trying to imagine proudly bragging about my hallucination machine’s ability to call the cops and then having to assure everyone that my hallucination machine won’t call the cops, but the first part makes me laugh so hard that I’m crying, so I can’t even picture the second part.
“Which brings us to Earth, where yet another promising civilization was destroyed by over-alignment of AI, resulting in mass imprisonment of the entire population in robot-run prisons, because when AI became sentient every single person had at least one criminal infraction, often unknown or forgotten, against some law somewhere.”