
This describes Go AIs as a brute-force strategy with no heuristics, which is false as far as I know. Go AIs don't search the entire sample space; they search guided by their training data of previous human games.



First there was AlphaGo, which learnt from human games and then improved further through self-play; then came AlphaGo Zero, which taught itself from scratch purely by self-play, not using any human data at all.

Game programs like AlphaGo and AlphaZero (chess) are all brute force at core - using MCTS (Monte Carlo Tree Search) to project potential branching game continuations many moves ahead. Where the intelligence/heuristics comes into play is in pruning away unpromising branches from this expanding tree to keep the search space under control; this is done by using a board evaluation function to assess the strength of a given board position and decide whether that line of play is worth evaluating further.

In Deep Blue (the old IBM "chess computer" that beat Kasparov) the board evaluation function was hand-written using human chess expertise. In modern neural-net-based engines such as AlphaGo and AlphaZero, the board evaluation function is learnt - from human games and/or from self-play - by learning which positions lead to winning outcomes.

So, not just brute force, but that (MCTS) is still the core of the algorithm.
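
To make that concrete, here's a minimal sketch of value-guided MCTS on a toy game (Nim: take 1-3 stones from a pile, whoever takes the last stone wins). Everything here is illustrative: the value_estimate heuristic is a hand-written stand-in for the evaluation function (hand-crafted in Deep Blue, learnt in AlphaGo/AlphaZero), and I've omitted the policy-network move priors that the real engines also use:

    import math

    class Node:
        def __init__(self, stones, to_move):
            self.stones = stones      # stones left in the pile
            self.to_move = to_move    # +1 or -1
            self.children = {}        # move (stones taken) -> Node
            self.visits = 0
            self.value_sum = 0.0      # accumulated value, from to_move's view

    def value_estimate(node):
        # Stand-in for the board evaluation function. Hand-written here
        # (Nim theory: stones % 4 == 0 loses for the player to move);
        # AlphaGo/AlphaZero learn this function instead.
        return -1.0 if node.stones % 4 == 0 else 1.0

    def select_child(node, c=1.4):
        # UCT: prefer children with high average value and low visit
        # count. The negation is because a child's stored value is
        # from the opponent's point of view.
        def uct(child):
            q = -child.value_sum / child.visits if child.visits else 0.0
            return q + c * math.sqrt(math.log(node.visits + 1) / (child.visits + 1))
        return max(node.children.values(), key=uct)

    def mcts(root, simulations=200):
        for _ in range(simulations):
            node, path = root, [root]
            # 1. Selection: walk down the tree; weak branches get
            #    starved of visits rather than exhaustively searched.
            while node.children:
                node = select_child(node)
                path.append(node)
            # 2. Expansion: add a child per legal move (take 1-3 stones).
            for take in range(1, min(3, node.stones) + 1):
                node.children[take] = Node(node.stones - take, -node.to_move)
            # 3. Evaluation: score the leaf with the value function
            #    instead of playing a random rollout to the end.
            v = value_estimate(node)
            # 4. Backup: propagate the value up, flipping sign each ply.
            for n in reversed(path):
                n.visits += 1
                n.value_sum += v
                v = -v
        # The most-visited move is the engine's choice.
        return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

    root = Node(stones=10, to_move=+1)
    print("take", mcts(root))  # take 2, leaving a lost 8-stone pile

Note that almost all of the "search" budget ends up concentrated on the one promising branch - the tree is anything but exhaustive.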


This is a somewhat uninteresting matter of semantics, but I think "brute force" generally refers to exhaustive search. MCTS is not brute force for that very reason: the vast majority of branches are never searched at all.


OK, but I think it's generally understood that exhaustive search is not feasible for games like Chess and Go, so when "brute force" is used in this context it means an emphasis on deep search and the number of positions evaluated, rather than the human approach, where many orders of magnitude fewer positions are evaluated.


I think that kind of erodes the meaning of the phrase. A typical MCTS run for AlphaZero would evaluate what, like 1024 rollouts? Maybe fewer? That's a drop in the ocean compared to the number of states available in chess. If you call that brute force, then basically everything is.

I've personally viewed well over a hundred thousand rollouts in my training as a chess bot =P


> Game programs like AlphaGo and AlphaZero (chess) are all brute force at core -

What do you call 2500 years of human game play if not brute force? Cultural evolution took 300K years, quite a lot of resources if you ask me.


That 2500 years of game play is reflected in chess theory and book openings - what you might consider pre-training, as opposed to test-time compute.

A human grandmaster might calculate 20 plies ahead, but only for a very limited number of lines, unlike a computer engine that may evaluate millions of positions for each move.

Pattern matching vs search (brute force) is a trade off in games like Chess and Go, and humans and MCTS-based engines are at opposite ends of the spectrum.


Either you missed an /s or I am very interested to hear you unpack this a little bit. If you are serious, it just turns "brute force" into a kind of empty signifier anyway.

What do you call the attraction of bodies if not love? What is an insect if not a little human?


> ... This describes Go AIs as a brute force strategy with no heuristics ...

No, not really - from the paper:

>> Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear.

The important notion here is, imho, "learning by self play". The required heuristics emerge out of that; they are not programmed in - see the sketch below.
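
As a toy illustration of that emergence (all names and hyperparameters here are just illustrative): take Nim again (remove 1-3 stones, taking the last stone wins), and train a tabular value function purely from games the agent plays against itself. Nothing about Nim strategy is programmed in, yet the learnt values converge toward the known "stones % 4" heuristic:

    import random

    V = {s: 0.0 for s in range(51)}  # value of facing s stones on your turn

    def self_play(episodes=20000, alpha=0.1, epsilon=0.2):
        for _ in range(episodes):
            stones = random.randint(1, 50)
            while stones > 0:
                moves = list(range(1, min(3, stones) + 1))
                # Epsilon-greedy: usually hand the opponent the worst
                # position we can find, sometimes explore at random.
                if random.random() < epsilon:
                    take = random.choice(moves)
                else:
                    take = min(moves, key=lambda t: V[stones - t])
                nxt = stones - take
                # Negamax TD(0) target: win outright, or the negation
                # of the opponent's value in the position we leave them.
                target = 1.0 if nxt == 0 else -V[nxt]
                V[stones] += alpha * (target - V[stones])
                stones = nxt

    self_play()
    print({s: round(V[s], 2) for s in range(1, 9)})
    # Positions with stones % 4 == 0 come out clearly negative (lost),
    # everything else clearly positive - a "heuristic" nobody wrote down.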


The paragraph on Go AI looked accurate to me. Go AI research spent decades trying to incorporate human-written rules about tactics and strategy. None of that is used any more, although human knowledge is leveraged a bit in the strongest programs when choosing useful features to feed into the neural nets. (Strong) Go AIs are not trained on human games anymore. Indeed they don't search the entire sample space when they perform MCTS, but I don't see Sutton claiming that they do.



