
Great work. Thank you for sharing!

No matter how impressive their performance, all AI systems built out of deep artificial neural networks so far have been shown to be susceptible to (out-of-sample) adversarial attacks, so it's not entirely surprising to see this result. Still, it's great to see proof that superhuman AI game players are susceptible too.

A hypothesis I have is that all intelligent systems -- including those built out of deep organic neural networks, like human beings -- are susceptible to out-of-sample adversarial attacks too. In other words, any form of evolved or learned intelligence can be robust only within the distribution of data it has seen so far. There's some anecdotal evidence supporting this hypothesis: magicians, advertisers, cult leaders, ideologues, and demagogues routinely rely on adversarial attacks to fool people.




Isn't this just studying your opponent? That's a thing humans do in many competitive activities.

If you know how your opponent tends to play and react, then you can make decisions that, while sub-optimal across all opponents, are optimal against this particular opponent. This can of course also be subverted: your opponent may be aware that you've likely studied their previous games, and in a high-stakes situation opt to do something wildly uncharacteristic, hoping you will expose yourself by cutting corners to punish their most likely strategy.
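To make that trade-off concrete: in game-theory terms it's the difference between playing the equilibrium strategy and playing a best response to a model of this particular opponent. A toy sketch using rock-paper-scissors payoffs (purely illustrative, nothing from the article):

    import numpy as np

    # Row player's payoffs vs. column player (rock, paper, scissors): +1 win, 0 tie, -1 loss.
    PAYOFF = np.array([[ 0, -1,  1],
                       [ 1,  0, -1],
                       [-1,  1,  0]])

    def best_response(opponent_mix):
        """Pick the pure strategy with the highest expected payoff against a modeled opponent."""
        expected = PAYOFF @ opponent_mix
        return ["rock", "paper", "scissors"][int(np.argmax(expected))], expected

    # Against the Nash mix (uniform), every action is equally good: expected payoff 0.
    print(best_response(np.array([1/3, 1/3, 1/3])))
    # Against a studied opponent who throws rock 60% of the time, the exploitative
    # choice is paper (expected +0.4) -- but it does worse if they uncharacteristically
    # switch to scissors, which is exactly the subversion risk described above.
    print(best_response(np.array([0.6, 0.2, 0.2])))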


I think it's a bit more than that. You can study your opponent within the context of the game, and you can study your opponent outside the context of the game, and these might yield different strategies. If you're a chess player and you study your opponent's past games to concoct your strategy, that's one thing. If you're a chess player and you pull up your opponent's medical records to find that they are epileptic, and then you deliberately induce a seizure in them during the game in order to force them to forfeit, that would be a quite different thing. IOW, there's a difference between attacking the player and attacking the output of the player. And the line can be fuzzy, e.g. deliberately frustrating your opponent with mindgames, in which case you will have people arguing either that the mindgames are part of the game (a metagame), or that it is unsporting to taint the purity of the game with meta concepts (where the line might be visualized as "anything that can't be fed into a chess engine").


Really, we have hundreds of years of thinking and writing about humans; speculating about universal this-or-that of anything that uses a neural network is more philosophy than anything meaningful.

What's interesting here is that our AI models don't study their opponents; they don't do that, and they're not capable of it.

All they can do is iterate over a vast set of sample data and predict outcomes based on it.

...and yes, that's different to humans, but I also think there is something truly fundamental at play here:

We may find that, as with self driving cars, the 'last step' to go from 'inhumanly good at a specific restricted domain' to 'inhumanly good at a specific restricted domain and robust against statistically unlikely outcomes such as adversarial attacks' is much, much harder than people initially thought.

Perhaps it does play into why humans behave the way they do? Who knows?

Why is it so easy to generate adversarial attacks against the current crop of models? It suggests that the way we train them is basically not flexible enough / not diverse enough / not something enough.

> One might hope that the adversarial nature of self-play training would naturally lead to robustness. This strategy works for image classifiers, where adversarial training is an effective if computationally expensive defense (Madry et al., 2018; Ren et al., 2020). This view is further bolstered by the fact that idealized versions of self-play provably converge to a Nash equilibrium, which is unexploitable (Brown, 1951; Heinrich et al., 2015). However, our work finds that in practice even state-of-the-art and professional-level deep RL policies are still vulnerable to exploitation.

^ This is what's happening here, which is interesting.

...because it seems like it shouldn't be this easy to trick an AI model, but apparently it is.
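For reference, the "adversarial training" the quote refers to is, for image classifiers, roughly this loop: perturb each batch to maximize the model's loss, then train on the perturbed batch (Madry et al.-style PGD). A minimal PyTorch-flavoured sketch, with model/loader/optimizer left as placeholders -- the quoted passage's point is that the self-play analogue of this doesn't deliver the same robustness for Go policies in practice:

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
        """Projected gradient descent: find a worst-case perturbation inside an L-inf ball."""
        x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
        for _ in range(steps):
            x_adv = x_adv.detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv + alpha * grad.sign()             # step uphill on the loss
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)   # project back into the ball
            x_adv = torch.clamp(x_adv, 0, 1)                # keep it a valid image
        return x_adv.detach()

    def adversarial_training_epoch(model, loader, optimizer):
        """Train on worst-case examples instead of clean ones."""
        model.train()
        for x, y in loader:
            x_adv = pgd_attack(model, x, y)
            optimizer.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            optimizer.step()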

Maybe in the future, human go players will have to study 'anti-AI' strategies from adversarial models.

It's an ironic thought that the iconic man-vs-machine match against AlphaGo could have been won if the human player had used a cheap trick against it.


There are rock-paper-scissors AIs that very much try to study and exploit their opponents.
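Not pointing at any specific published bot, but the usual approach is just an online counter over the opponent's move history, updated every round -- something like:

    import random
    from collections import defaultdict

    BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}
    MOVES = list(BEATS)

    class MarkovRPS:
        """Predict the opponent's next move from first-order transition counts,
        updated after every round, and play the counter to it."""

        def __init__(self):
            self.counts = defaultdict(lambda: {m: 0 for m in MOVES})
            self.prev = None

        def play(self):
            if self.prev is None or sum(self.counts[self.prev].values()) == 0:
                return random.choice(MOVES)        # no data yet: fall back to the Nash mix
            predicted = max(self.counts[self.prev], key=self.counts[self.prev].get)
            return BEATS[predicted]                # counter the predicted move

        def observe(self, opponent_move):
            if self.prev is not None:
                self.counts[self.prev][opponent_move] += 1
            self.prev = opponent_move

(It's still pattern matching, just updated online rather than frozen after training.)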


There are not.

Those models are just doing pattern matching like all the others.

Adaptive behaviour requires dynamic learning.

Please, link me to an AI model that is capable of learning and retraining on the fly if such a thing exists.


"Defeat your enemies with this one neat trick!"

Every "secret" or "hack" for getting your way in interactions with other people is pretty likely to be in this category. We all have blind spots, many of which we share because we're all running on basically the same hardware, and where there's a blind spot there's likely to be an adversarial attack.


I think the difference between humans and special-purpose ML models here is that humans can generalize from examples in different domains. (There are ML models that also try to do this - train across domains to be more robust against out-of-sample inputs - but my understanding is it's not yet common.)


I think Iain M. Banks called this an "Outside Context Problem" :)



