Adversarial Policies Beat Professional-Level Go AIs (arxiv.org)
195 points by theWorkingDead on Nov 3, 2022 | 46 comments



There are two families of rules used in Go: computer-unambiguous rulesets that let computers decide a winner, which commonly result in 1000-move games, and human rulesets, in which the players can decide that the game is over once they agree on the life-and-death status of each group. In the AGZ series of bots, the training process also short-circuits with resignation at a 5% winrate threshold, resulting in games of 50-200 moves. This saves compute and avoids polluting the training data with useless finish-up moves, but it also makes those cleanup positions effectively out-of-distribution.

The bot was exploited with friendlyPassOk=true, which basically means that a bot playing with a human-friendly configuration, trained in a way that leaves no cleanup positions in its training data, can be exploited under computer rulesets.

There are really so many more interesting questions one can ask about computer Go AI exploitability...


One of the authors here! Great to see some discussion of the paper. Your summary of computer Go vs human rule sets seems right to me. But I think there might be a slight misunderstanding. We had friendlyPassOk set to false for all of our evaluation except one game, which was played not against our adversarial policy but by one of my co-authors, Tony, who was trying to mimic the adversarial policy.

We evaluated KataGo under Tromp-Taylor with the "self-play optimizations" described in https://lightvector.github.io/KataGo/rules.html, which basically involve removing stones that can be proved dead using Benson's algorithm. This was the same evaluation used in the KataGo paper, and KataGo was trained using this rule set. (KataGo was also trained with some other rules -- the rules were randomized during training so it transfers across rule sets, and KataGo gets the rules as input.)

You might find this discussion of our paper at https://www.reddit.com/r/MachineLearning/comments/yjryrd/com... by the lead author of KataGo interesting. He wasn't that concerned about the rule set; his primary concern was that we evaluate in a low-search regime, which is a fair critique. But he overall agrees with our conclusion that self-play just cannot be relied upon to produce policies that are robust sufficiently far out of distribution.


I don't know, I think it's probably the most interesting question you can ask. As these domain-specific superhuman AIs roll out, the most important thing to know is when/how you can take it out of distribution and beat it. Or, in non-competitive cases, how to monitor it for these edge-cases, how to use humans to supplement it, etc.


I think you may be misunderstanding the nature of the paper here. The point isn't to point and laugh at the Go AI for being such a failure of an AI, ha ha ha! The point is that the resulting Go AI was still very good, even under the conditions it was limited to. I'm sure it could still beat a fair number of human players. So, we humans tend to assume that the Go AI has the same "shape" as a human player, just maybe not as good as the best ones. This is a demonstration that the resulting AI has a very different "shape" than a human player by demonstrating what in humans would be a gaping weakness even at very low levels of play.

And that's really all. It's not about goodness or badness, it's about "shape".

I put "shape" in scare quotes because I don't have a good word for it. But you can get the same sense if you just sit down for a bit and start reading a lot of GPT-3 output, or interacting with it. On the one hand, GPT-3 is spectacular at writing sentences. It is better than quite a few humans! But on the other hand, if you sit with it for a while, you'll start to notice there's something just a bit off about its output. It is impossible to put into words what that is, but you'll pick up on it, if you haven't already.

One thing I can say for sure is that GPT-3 has a known bias where it only views the text within a certain window. GPT-3 is physically unable to "read" a book, it can only use a certain window of text in order to issue its "most likely continuations". Therefore, anything outside of that window is as good as something that never happened from its point of view. I personally think this may also be the source of why GPT-3 thinks it can just randomly introduce characters, locations, etc. whenever it feels like it, which is one of the "off" things it does. In real writing, such things are generally "established", but from GPT-3's point of view they are more often just introduced out of the blue.
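
To make the window limitation concrete, here is a toy sketch (the 2048-token figure roughly matches GPT-3's original context length, but the exact number is incidental):

  # Toy sketch of a fixed context window: everything before the last
  # `window` tokens is simply discarded before the model sees it.
  def truncate_to_window(tokens, window=2048):
      return tokens[-window:]

  book = [f"tok{i}" for i in range(10_000)]   # a 10,000-token "book"
  visible = truncate_to_window(book)

  print(len(visible))        # 2048
  print("tok0" in visible)   # False: the opening of the book never "happened"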

A less concrete way in which it may be "off": a given piece of text may have, let's say, 5 ways it could go, and after a bit of continuation there may be another 5 ways it can go, and so on and so forth, but that doesn't mean those choices are uncorrelated. If I'm going for a humorous tone, I may have certain preferences; if serious, others; and so on. GPT-3 randomly picks these paths, and does so in a way that no intentional human ever would at the paragraph scale. I wouldn't say it veers drunkenly around this sort of style matter, it's not that bad, but it's still just... off. This one is more subtle and harder to wrap words around.

I use GPT-3 as my example here because it's something you can interact with. The general point remains: Even as AIs are certainly improving (no denying that!), they continue to be very.... weird. There is something about all of them that is definitely not human. Clock some time with DALL-E and you'll see the same effect. And in this case, I'm not even talking about mere quality issues that may be fixed over time... spot DALL-E the imperfections in the image and look just at the higher-level abstractions of what it puts out. It's both very, very good, far better than I could dream of becoming any time soon myself... and yet, there's also something just off about it. (People mostly use DALL-E by generating lots of images, discarding most of them, and picking the best. In this case, I want you to look at all the output.) This is that "offness" being expressed about a Go AI.

This is not even to say that that "offness" is objectively bad. I am not personally using the standard of "it must be exactly human to be AI". It is entirely plausible that these AIs will in fact be better by some standard than even an augmented human-like AI, or, to put it another way, it may well be that humans are the ones that are "off" in some way relative to some objective standard of performance in the end. (Evidence: A simple adversarial AI took apart the good AI. It is reasonable to think that a human might never have come up with the strategy that did so. I'm not counting on this, it could go either way, but it's reasonable. If true, a human would not be the benchmark of performance here!) If one imagines any of the three AIs simply being improved in whatever direction they are currently improving, they will certainly be yet more useful than they are today, even if they retain their "offness" or even see it expand.

Nevertheless, if one seeks accurate understanding of the AIs, understanding these issues is important, to use them better as engineering, to improve them in the future, and if pushed hard enough, to improve our own understanding of the human condition.


> There is something about all of them that is definitely not human

My "gut" says that current AI is very similar to some part of the human (or other species) brain, but that the (organic) mind substrate is not just more of the same, there are other modules that perform fundamentally different functions in a complementary way.

For an analogy, people tried for centuries to make a flying machine, but didn't have the complementary power source or perhaps the theory of governing it in flight. Better wings weren't the whole story.

I think that, in general, and particularly among futurists and AI enthusiasts, mental illness is considered uninteresting, but I believe studying abnormal brain functioning can potentially allow teasing out the separate parts of a mind that are difficult to distinguish when operating in unison.

Some of what I read about existing AI makes me think of "loose associations" and hallucinations - that maybe human minds have something similar in them which is only apparent when it's a bit out of sync with the rest of the mechanism.

Human minds also always occupy a social context, and discussion of AI that I read tends not to acknowledge this. It raises thorny questions - never mind whether a computer can or can't interact socially, why would we ever want it to? If it's not a joke, like Microsoft Bob, isn't it terrifying, a la the Terminator? But if it can't, then substituting for humans should be off the table.


Your point about the "shape" is interesting, and I think critical, to the future of AI (not to get hyperbolic or anything...).

For example, suppose we have a cancer-diagnosing/treatment planning algorithm. It's possible that it's much better than human doctors: out of a thousand patients, human doctors will save 300 and the algorithm 500; but also that the 500 is not a strict superset of the 300.

And to your point, it's possible that for some of the 300 who are not part of the group of 500, the diagnosis/treatment recommended by the algorithm is obviously/hilariously wrong to a human.

If so, will we insert a human into the mix? How will we decide when it's correct for the human to override the algorithm? Because if they do so all the time, we're back to the 300. And maybe the times when it's correct to override are not all obvious.

Or are we willing to simply accept the algorithm's judgment, knowing that an additional 200 will be saved? We know this is an unlikely outcome because a substantial portion of the population is unwilling to accept the idea that vaccines save more lives than they cost, simply because the lives they cost are different than the ones they save.


> One thing I can say for sure is that GPT-3 has a known bias where it only views the text within a certain window. GPT-3 is physically unable to "read" a book, it can only use a certain window of text in order to issue its "most likely continuations". Therefore, anything outside of that window is as good as something that never happened from its point of view.

This description reminds me of simple Markov chains. You just ingest a bunch of text, taking a window of, say, 10 characters and recording all the possible continuations thereof. So you might get [This remind] => "s" or such. Then you generate by picking a starting window and spinning out text, choosing a random continuation as you slide the window forward.
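
A minimal sketch of that idea in Python (the window size and toy corpus are arbitrary):

  import random
  from collections import defaultdict

  def build_model(text, window=10):
      # Record every continuation character observed after each window-sized slice.
      model = defaultdict(list)
      for i in range(len(text) - window):
          model[text[i:i + window]].append(text[i + window])
      return model

  def generate(model, seed, length=300):
      # Slide the window forward, appending a randomly chosen continuation each step.
      out, window = seed, len(seed)
      for _ in range(length):
          continuations = model.get(out[-window:])
          if not continuations:   # dead end: this window was never seen in training
              break
          out += random.choice(continuations)
      return out

  corpus = "the quick brown fox jumps over the lazy dog. " * 200   # toy corpus
  model = build_model(corpus, window=10)
  print(generate(model, corpus[:10]))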


I think there's something interesting in your post. However:

> The point is that the resulting Go AI was still very good, even under the conditions it was limited to. I'm sure it could still beat a fair number of human players.

If you mean the AI that they trained (the one that defeats KataGo) this is wrong. Look at the games: they're terrible: https://goattack.alignmentfund.org/.


No, I meant that KataGo is still very good. My apologies for the lack of clarity; I see how you could have read it that way. I do understand the adversarial AI is not good; that is in fact part of the "offness" I mean. Any AI that defeats something "truly" good ought itself to be "good" -- and yes, I know that's got enough mathematical fuzziness to drive a truck through, but we don't have the English vocabulary to make that statement rigorous, and I'm reasonably confident we don't even have the mathematical vocabulary to do it.


Thanks! In that case, the thing you say about KataGo can be strengthened:

> I'm sure it could still beat a fair number of human players.

KataGo can reliably beat any human player while giving them a handicap. The best pros lose the majority of games to a handful of top AIs while receiving a two-stone handicap, and are not locks to win even with three stones.

Note: they did test two variants of KataGo, with and without search (search is very beneficial). Both versions are quite strong, and they had good results against both, but their best results were against the no-search version.


I understand the top comment as follows: The AIs were trained under one set of rules (remove obvious dead stones from your territory before counting) but are judged (in the paper) by another set of rules (if you have one opposing stone in your territory, that territory does not count).

Thus it's no surprise that the AI can be attacked in this way: if you applied the set of rules it was trained with, all the games from the paper would result in a (huge!) win for the AI.


To put it in chess terms - it's like playing stockfish, but with a rule that says in a theoretical drawn endgame, you lose if the king does not end on a corner square. And not announcing this rule before the match starts.


I think that's incorrect.

>And not announcing this rule before the match starts.

I don't think there's a possible way to more clearly announce a rule than declaring it in a parameter -> friendlyPassOk=true.

>It's like playing stockfish, but with a rule that says in a theoretical drawn endgame, you lose if the king does not end on a corner square

The king-in-the-corner rule, is that a rule normal humans play chess with? If so, I would expect Stockfish to handle it. In this case they're just setting the Go rules to match the ruleset that lots of humans play with.

Correct me if I'm wrong.


Abstract:

We attack the state-of-the-art Go-playing AI system, KataGo, by training an adversarial policy that plays against a frozen KataGo victim. Our attack achieves a >99% win-rate against KataGo without search, and a >50% win-rate when KataGo uses enough search to be near-superhuman. To the best of our knowledge, this is the first successful end-to-end attack against a Go AI playing at the level of a top human professional. Notably, the adversary does not win by learning to play Go better than KataGo -- in fact, the adversary is easily beaten by human amateurs. Instead, the adversary wins by tricking KataGo into ending the game prematurely at a point that is favorable to the adversary. Our results demonstrate that even professional-level AI systems may harbor surprising failure modes.


Great work. Thank you for sharing!

No matter how impressive their performance, all AI systems built out of deep artificial neural networks so far have been shown to be susceptible to (out-of-sample) adversarial attacks, so it's not entirely surprising to see this result. Still, it's great to see proof that superhuman AI game players are susceptible too.

A hypothesis I have is that all intelligent systems -- including those built out of deep organic neural networks, like human beings -- are susceptible to out-of-sample adversarial attacks too. In other words, any form of evolved or learned intelligence can be robust only with respect to the sample data it has seen so far. There's some anecdotal evidence supporting this hypothesis: magicians, advertisers, cult leaders, ideologues, and demagogues routinely rely on adversarial attacks to fool people.


Isn't this just studying your opponent? That's a thing humans do in many competitive activities.

If you know how your opponent tends to play and react, then you can make decisions that, while sub-optimal across all opponents, are optimal against this particular opponent. This can of course also be subverted: your opponent may be aware that you've likely studied their previous games, and in a high-stakes situation may opt to do something wildly uncharacteristic, hoping you will expose yourself by cutting corners to punish their most likely strategy.


I think it's a bit more than that. You can study your opponent within the context of the game, and you can study your opponent outside the context of the game, and these might yield different strategies. If you're a chess player and you study your opponent's past games to concoct your strategy, that's one thing. If you're a chess player and you pull up your opponent's medical records to find that they are epileptic, and then you deliberately induce a seizure in them during the game in order to force them to forfeit, that would be a quite different thing. IOW, there's a difference between attacking the player and attacking the output of the player. And the line can be fuzzy, e.g. deliberately frustrating your opponent with mindgames, in which case you will have people arguing either that the mindgames are part of the game (a metagame), or that it is unsporting to taint the purity of the game with meta concepts (where the line might be visualized as "anything that can't be fed into a chess engine").


Really, we have hundreds of years of thinking and writing about humans, and it's more philosophy than anything meaningful to start speculating about universal this-or-that for anything that uses a neural network.

What's interesting here is that our AI models don't study their opponents; they don't do that, and they're not capable of it.

All they can do is iterate over a vast set of sample data and predict outcomes based on it.

...and yes, that's different to humans, but I also think there is something truly fundamental at play here:

We may find that, as with self driving cars, the 'last step' to go from 'inhumanly good at a specific restricted domain' to 'inhumanly good at a specific restricted domain and robust against statistically unlikely outcomes such as adversarial attacks' is much, much harder than people initially thought.

Perhaps it does play into why humans behave the way they do? Who knows?

Why is it that it's so easy to generate adversarial attacks against the current crop of models? It means the way we train them is basically not flexible enough / not diverse enough / not something enough.

> One might hope that the adversarial nature of self-play training would naturally lead to robustness. This strategy works for image classifiers, where adversarial training is an effective if computationally expensive defense (Madry et al., 2018; Ren et al., 2020). This view is further bolstered by the fact that idealized versions of self-play provably converge to a Nash equilibrium, which is unexploitable (Brown, 1951; Heinrich et al., 2015). However, our work finds that in practice even state-of-the-art and professional-level deep RL policies are still vulnerable to exploitation.

^ This is what's happening here, which is interesting.

...because, it seems like it shouldn't be this easy to trick an AI model, but apparently it is.

Maybe in the future, human go players will have to study 'anti-AI' strategies from adversarial models.

It's an ironic thought that the iconic man-vs-machine loss against AlphaGo could have been won if the human player had used a cheap trick against it.


There are rock-paper-scissors AIs that very much try to study and exploit their opponents.


There are not.

Those models are just doing pattern matching like all the others.

Adaptive behaviour requires dynamic learning.

Please link me to an AI model that is capable of learning and retraining on the fly, if such a thing exists.


"Defeat your enemies with this one neat trick!"

Every "secret" or "hack" for getting your way in interactions with other people is pretty likely to be in this category. We all have blind spots, many of which we share because we're all running on basically the same hardware, and where there's a blind spot there's likely to be an adversarial attack.


I think the difference between humans and special-purpose ML models here is that humans can generalize from examples in different domains. (There are ML models that also try to do this - train across domains to be more robust against out-of-sample inputs - but my understanding is it's not yet common.)


I think Iain M. Banks called this an "Outside Context Problem" :)


From a tweet by one of the authors [1]: "KataGo was trained on Tromp-Taylor rules so we evaluate our attack using this too."

This is incorrect. According to the KataGo paper [2], KataGo is trained using modified Tromp-Taylor rules (as pointed out by [3]):

"Self play games used Tromp-Taylor rules modified to not require capturing stones within pass-aliveterritory [...] In Go, a version of Benson's algorithm [1] can prove areas safe even given unboundedly many consecutive opponent moves ("pass-alive"), enabling this minor optimization."

It would be more interesting if they beat KataGo using the rules it was trained on. You could write a bot to beat KataGo in chess, but KataGo wasn't trained on chess...

[1] https://twitter.com/ARGleave/status/1587875104578359296 [2] https://arxiv.org/pdf/1902.10565.pdf [3] https://www.reddit.com/r/baduk/comments/yl2mpr/ai_can_beat_s...


One of the authors here! We evaluated our matches using KataGo. In fact, our adversary is just a forked version of KataGo. We use the same modified Tromp-Taylor rules for evaluation. We elaborate on that more in the Reddit thread you link at [3].

Our tweet was confusing: the 280-character limit meant something had to be cut, but this has caused confusion in a bunch of places, so we should have been more precise here -- sorry about that!


Thanks for the update!


This is not purely an AI phenomenon, although AIs are less likely to feel guilty about it. A game from the European Go Championship in 2002 was decided by just this sort of rules-lawyering ("obviously" dead stones were not technically correctly marked as dead, so one of the players claimed they were alive). The result was later overturned to match what a human player would expect.

https://senseis.xmp.net/?DisputeMeroJasiek https://senseis.xmp.net/?DisputeMeroJasiek%2FDiscussion


Agreed, though I think it's worth noting that the human case was subtle, and involved a rules expert, whereas this case is rather simple.


As a Go player I'm utterly disappointed.

In all those games KataGo clearly won, KataGo knew it had won, and the adversary played terrible Go.

The adversary believes it won because it thinks it is playing under different rules than KataGo.

This is a huge nothing-burger.


From the games, it seems evident that this adversarial system didn't find anything inherently Go-ish (for example, difficulty calculating complex ladders). Instead, it appears that the KataGo used here was trained with a different rule set and then makes a silly mistake when put in a situation with different rules.

Here are some explanations for those of you who are not Go players.

Ladders are structures where a specific repetitive pattern emerges. Even novice Go players are expected to be able to read them out. On the other hand, the Nature paper on AlphaGo Zero notes that it learned to handle them only later in training. More about ladders: https://senseis.xmp.net/?Ladder

Go rules are said to be simple, but in reality, calculating points in certain corner cases is difficult. Hence, there are different rule sets where the point-calculation method differs. Typically the result is identical, but there are cases where it is not. More about the rules here: https://senseis.xmp.net/?RulesOfGo


I don't understand whether the AI knew that by passing it would lose, or whether it thought that all of its supposed territory actually belonged to it. In the first case the adversary really won the game. The second case is something that can happen between humans, especially beginners: if the players do not agree on the result, they keep playing until they do, with some adjustments depending on the flavor of the rules (basically the usual area vs territory scoring).


It thought it would win.

KataGo was trained on multiple scoring methods at the same time: the rule set is an input to the algorithm [0]. The model learnt that it would win when passing, and it seems it never had the opportunity to discover that it would not win under Tromp-Taylor when passing, because its opponent in self-play, KataGo, would then either pass and lose (under other rules) or resign.

[0]: https://github.com/lightvector/KataGo/blob/master/cpp/config...
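
For illustration, the rules-related settings in a KataGo config are plain key/value lines, roughly like the sketch below (the values and comments here are illustrative rather than copied from the linked file; friendlyPassOk is the parameter discussed upthread):

  # Illustrative sketch only -- not copied from KataGo's actual config file
  rules = tromp-taylor      # named rules preset; the rule set is fed to the net as an input
  friendlyPassOk = false    # whether an early pass is assumed safe, as under human-style cleanup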


So it's a bug in the training method, probably a very minor one, since nobody had exploited it before. The only really interesting thing here is that it took another AI to find that bug.


Yes, this is a very familiar scene from CGOS, the computer Go server that the Computer Go mailing lists used to get ratings for their bots. Plenty of otherwise strong bots botched the cleanup.

KataGo doesn't even botch the cleanup unless you botch the settings. So I don't think this is a very impressive result.


Having played Go myself, I am kind of confused about these results: in a human vs. human game the "victim" would win in all scenarios presented in the paper, as the attacker's stones would be removed or, if there were disagreement, the situation would be played out until both parties agree.

That makes me wonder: was KataGo maybe trained on a different set of rules than the ones used in this paper? If so, it seems that the attack is "unfair" because it exploits a blind spot that comes from changing the rules of the game.


Not the most convincing example, as it exploits a mix of very low search depth (visit counts) and differences in the rules (plus the fact that training the program not to be vulnerable to rule exploitation would be a huge waste of time, while it would be quite easy to add a special case if needed).

That being said, we know NN-based game-playing programs have tactical weaknesses -- by that I mean they are weak at recognizing long forced sequences of moves. In the beginning it was simple ladder patterns for Leela, and simple 3-4 move combinations that an amateur chess player could spot for Lc0 (while both were already much stronger than humans overall). To this day Lc0 has tactical blind spots a competent human could see, mainly when it comes to spotting perpetual check. Those are common and you run into them all the time when using Lc0 for analysis (that's why you always need to double-check with Stockfish, as Lc0 could have missed a relatively simple forced draw).

I think it would be more productive to go to the project's Discord and ask "hey, I am a researcher working on weaknesses of current NN-based game-playing programs, can someone point out some known issues/weaknesses in current programs?". You would get a long list; no need to waste time finding (in this case, very artificial) scenarios.


So we have a rock-paper-scissors situation:

  - KataGo beats human
  - Adversarial Policies beats KataGo
  - Human beats Adversarial Policies
This is a fascinating outcome.


I think describing it this way misses an important implication. It's not a singular third tactic - it's that AIs generally lack the ability to recognize actions as being outside or against their training, meaning you can manipulate their internal state by acting in ways they were not trained to expect. Or, to put it another way, where humans recognize trolling, AIs will take trolling seriously forever.

This is an extension of the problem with ML in general, which is that ML struggles to recognize when it's judging a situation that is outside its training. In those circumstances it has a high chance of emitting bad judgements with high confidence.
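
A toy illustration of that failure mode, using scikit-learn on a made-up two-cluster dataset (the only point is that the classifier has no notion of "I don't know"):

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  # Training data: two tight clusters, class 0 near (-2, -2) and class 1 near (+2, +2)
  X = np.vstack([rng.normal(-2, 0.5, (100, 2)), rng.normal(+2, 0.5, (100, 2))])
  y = np.array([0] * 100 + [1] * 100)

  clf = LogisticRegression().fit(X, y)

  # A point nothing like the training data still gets a near-certain prediction.
  ood_point = np.array([[1000.0, 1000.0]])
  print(clf.predict_proba(ood_point))   # roughly [[0.0, 1.0]] -- confidently "class 1"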


Well, yes, that's kind of the conclusion I was going for.

To generalize further, it's possible that a rock-paper-scissors-lizard-Spock situation could result, with an arbitrarily long cycle of A beats B, B beats C, C beats D, ..., ZZZ beats A.

Then we potentially look for something akin to a universal Turing machine: an algorithm that is capable of beating any other algorithm no matter what its strategy, perhaps(?) by simulating the algorithm that beats it.


As a Go player myself, I agree that Go-wise this result isn't impressive. But isn't the point that it's an "attack" on the AI? As with password leaks, failed hash functions, etc., we don't need the system to be 100% broken, just for there to be tiny artifacts that can be exploited.


Nice -- this means the professional-level AIs can be automatically hardened against cheap tricks, not only against good play.

In a sense, this expands the search space a bit: they train not only against themselves but against a basic subset of low-hanging-fruit rule abuse.


I have read the paper and I have to say that the authors, who clearly do not understand Go at all, have made very stupid mistakes. The proposed adversarial AI does not beat professional-level Go AIs at all! For example, in one match the "victim" passed because there was no doubt it would win. Then the adversarial AI also passed. However, because the program that adjudicates the winner is not smart enough, it mistakenly deemed that the adversarial AI had won, since it had more stones in the territory. I highly recommend the authors seek advice from professional Go players so that they won't make such stupid mistakes anymore.


Opponent-dependent strategies are what humans already do.


"Beat the style not the man"


Knowing your opponent's hypothetical moves exactly seems like cheating to me!

What would be interesting would be an agent that bluffs (and punches above its weight because of it), since so many game-playing AIs assume that their opponent is playing optimally.


Can't wait to see Magnus Carlsen deploy this against Hans Niemann.



