> If you have the research paper, someone in the field could reimplement them in a few days.
Hi, I did this while I was at Google Brain, and it took our team of three more like a year. The "reimplementation" part took 3 months or so, and the rest of the time was literally spent debugging and figuring out all of the subtleties that were not quite mentioned in the paper. See https://openreview.net/forum?id=H1eerhIpLV
> The replication crisis (also called the replicability crisis and the reproducibility crisis) is an ongoing methodological crisis in which the results of many scientific studies are difficult or impossible to reproduce. Because the reproducibility of empirical results is an essential part of the scientific method,[2] such failures undermine the credibility of theories building on them and potentially call into question substantial parts of scientific knowledge.
People should publish automated tests. How does a performance optimizer know that they haven't changed the output if there are no known-good inputs and outputs documented as executable tests? pytest with Hypothesis seems like a nice, compact way to specify such tests.
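As a minimal sketch of what such an executable spec could look like (using a stand-in `softmax` function in place of whatever transform the paper's results actually depend on, with illustrative pinned values and tolerances):

```python
# Sketch only: `softmax` stands in for whatever function the paper's
# results depend on; the pinned numbers are illustrative.
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - x.max())
    return z / z.sum()


def test_known_good_case():
    # A pinned known-good input/output pair: if a "performance optimization"
    # changes these numbers, the test fails.
    out = softmax(np.array([1.0, 2.0, 3.0]))
    np.testing.assert_allclose(out, [0.09003057, 0.24472847, 0.66524096], atol=1e-6)


@given(arrays(np.float64, shape=st.integers(1, 50),
              elements=st.floats(-100, 100, allow_nan=False)))
def test_softmax_properties(x):
    # Properties that must hold for any valid input, not just the pinned case.
    out = softmax(x)
    assert out.shape == x.shape
    assert np.all((out >= 0) & (out <= 1))
    assert abs(out.sum() - 1.0) < 1e-6
```

The pinned case catches silent changes to a known-good output; the property test catches inputs nobody thought to pin.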
I think I agree with everything you've said here, but just want to note that while we absolutely should (where relevant) expect published code including automated tests, we should not typically consider reproduction that reuses that code to be "replication" per se. As I understand it, replication isn't merely a test for fraud (which rerunning should typically detect) and mistakes (which rerunning might sometimes detect) but also a test that the paper successfully communicates the ideas such that other human minds can work with them.
Sources of variance: experimental design, hardware, software, irrelevant environmental conditions/state, data (sample(s)), analysis.
Can you run the notebook again with the exact same data sample (input) and get the same charts and summary statistics (output)? Is there a way to test the stability of those outputs over time?
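One lightweight way to do that, assuming the notebook's core logic is factored into a callable (here a hypothetical `run_analysis`, with a hypothetical data path), is to record a golden copy of the summary statistics and compare against it on every rerun:

```python
# Sketch: pin the summary statistics from a known-good run and re-check them
# on every rerun. `run_analysis`, the file path, and the golden values are
# all placeholders for the notebook's real entry point, data, and outputs.
import numpy as np


def run_analysis(sample: np.ndarray) -> dict:
    # Placeholder for the notebook's actual computation.
    return {"mean": float(sample.mean()), "std": float(sample.std(ddof=1))}


GOLDEN = {"mean": 10.937, "std": 0.114}  # recorded from the original run


def test_outputs_are_repeatable():
    sample = np.loadtxt("data/known_good_sample.csv")  # the exact same input
    stats = run_analysis(sample)
    for key, expected in GOLDEN.items():
        assert abs(stats[key] - expected) < 1e-3, f"{key} drifted from the original run"
```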
Can you run the same experiment (the same 'experimental design'), ceteris paribus (everything else being equal), with a different sample (input) and get a very similar output? Is it stable, differentiable, independent, nonlinear, reversible? Does it converge?
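A rough way to probe that kind of stability, using a hypothetical `run_experiment` and made-up data in place of the real pipeline, is to rerun the analysis on several resampled inputs and check that the output stays within a tolerance:

```python
# Sketch: rerun the same analysis on different samples and check the output
# stays stable. `run_experiment`, the data, and the tolerance are illustrative.
import numpy as np


def run_experiment(sample: np.ndarray) -> float:
    # Placeholder: in practice, the full pipeline (fit, evaluate, summarize).
    return float(np.median(sample))


def test_output_stable_across_samples():
    rng = np.random.default_rng(0)
    population = rng.normal(loc=5.0, scale=1.0, size=100_000)
    results = [
        run_experiment(rng.choice(population, size=1_000, replace=True))
        for _ in range(20)
    ]
    # Ceteris paribus, different samples should give very similar outputs.
    assert np.std(results) < 0.1
```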
Now I have to go look up the definitions for Replication, Repeatability, Reproducibility
> Measures of reproducibility and repeatability: In chemistry, the terms reproducibility and repeatability are used with a specific quantitative meaning. [7] In inter-laboratory experiments, a concentration or other quantity of a chemical substance is measured repeatedly in different laboratories to assess the variability of the measurements. Then, the standard deviation of the difference between two values obtained within the same laboratory is called repeatability. The standard deviation for the difference between two measurements from different laboratories is called reproducibility. [8] These measures are related to the more general concept of variance components in metrology.
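For concreteness, a rough sketch (with made-up numbers) of the ISO 5725-style variance-components calculation behind those definitions, where the pooled within-lab variance gives the repeatability standard deviation and adding the between-lab component gives the reproducibility standard deviation:

```python
# Made-up inter-laboratory data: each row is one lab measuring the same
# substance several times.
import numpy as np

measurements = np.array([
    [10.1, 10.2, 10.0, 10.1],   # lab A
    [10.4, 10.5, 10.3, 10.4],   # lab B
    [ 9.9, 10.0, 10.1, 10.0],   # lab C
])

n_labs, n_reps = measurements.shape
lab_means = measurements.mean(axis=1)

# Within-lab (repeatability) variance: pooled variance of repeats per lab.
s_r_sq = measurements.var(axis=1, ddof=1).mean()

# Between-lab variance component (one-way ANOVA estimate).
s_L_sq = max(lab_means.var(ddof=1) - s_r_sq / n_reps, 0.0)

# Reproducibility variance combines both components.
s_R_sq = s_r_sq + s_L_sq

print(f"repeatability   s_r = {np.sqrt(s_r_sq):.3f}")
print(f"reproducibility s_R = {np.sqrt(s_R_sq):.3f}")
```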
> In engineering, science, and statistics, replication is the repetition of an experimental condition so that the variability associated with the phenomenon can be estimated. ASTM, in standard E1847, defines replication as "... the repetition of the set of all the treatment combinations to be compared in an experiment. Each of the repetitions is called a replicate."
> Replication is not the same as repeated measurements of the same item: they are dealt with differently in statistical experimental design and data analysis.
> For proper sampling, a process or batch of products should be in reasonable statistical control; inherent random variation is present but variation due to assignable (special) causes is not. Evaluation or testing of a single item does not allow for item-to-item variation and may not represent the batch or process. Replication is needed to account for this variation among items and treatments.
> In simpler terms, given a statistical sample or set of data points from repeated measurements of the same quantity, the sample or set can be said to be accurate if their average is close to the true value of the quantity being measured, while the set can be said to be precise if their standard deviation is relatively small.
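A tiny made-up example of that distinction:

```python
# Repeated measurements of a quantity whose true value we happen to know.
import numpy as np

true_value = 10.0
measurements = np.array([10.8, 10.9, 11.0, 10.9, 11.1])

bias = measurements.mean() - true_value   # ~0.94: far from true value, so not accurate
spread = measurements.std(ddof=1)         # ~0.11: small spread, so precise

print(f"mean = {measurements.mean():.2f}, bias = {bias:.2f}, std = {spread:.2f}")
```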
All-time classic Hacker News moment: "heh, someone in the field could write this in a few days" / "Hi, it's me, three of us literally worked at Google Brain and it took us a year"
I don't know the subject matter well enough to make the call, but it's possible the OP is making a general statement that's generally true, even if it doesn't hold in this specific case, where it took a year.
One thing that's not appreciated by many who haven't tried to implement a NN is how subtle bugs can be. When you look at code for a NN, it's generally pretty simple. However, what happens when your code doesn't produce the output you were expecting? When that happens, it can be very difficult and time consuming to find the subtle issue with your code.
How come you weren't able to just get it from DeepMind given that they are a subsidiary of Google? Is there a lot of red tape involved in exchanging IP like that?
There is Leela Zero (https://github.com/leela-zero/leela-zero) for Go and lc0/Leela Chess (https://github.com/orgs/LeelaChessZero/repositories) for Chess, both of which provide trained weights. The Leela Chess project in particular has been working for a long time on training and refining the weights for Chess, as well as providing the code -- they allow you to see the history and performance over time for the various trained models.
The secret sauce is in the ML compiler and accelerator used, but those improvements simply lower the cost of training a model. You could still do it on a regular GPU; it would just take you more time.
In the case of Google, they probably used TPU chips that you can't get direct 'bare metal' access to anyway, so none of that code would have helped.
The actual optimizer and its parameters (like the learning rate schedule) are normally published in the research paper.
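As a rough illustration of how such published settings typically map to code (placeholder numbers, not any particular paper's values):

```python
# Illustrative only: translating reported optimizer hyperparameters into code.
import torch

model = torch.nn.Linear(64, 64)  # stand-in for the real network

optimizer = torch.optim.SGD(model.parameters(), lr=0.2,
                            momentum=0.9, weight_decay=1e-4)

# Step-wise learning-rate schedule of the kind papers usually report.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100_000, 300_000, 500_000], gamma=0.1)
```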
You should pencil out on a napkin just how long "more time" is. Here, I'll get you started:
1600 inferences per move * 1 ms per inference * 250 moves/game * 30M games played = 12B seconds, or about 140k days. MuZero with Gumbel brought the 1600 down to ~40, but either way, you need some more scale.
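Spelled out in code, with the same assumed numbers:

```python
# Napkin math from above: wall-clock cost of self-play on a single device.
inferences_per_move = 1600        # MuZero-style MCTS simulations per move
seconds_per_inference = 1e-3      # 1 ms per network call
moves_per_game = 250
games = 30_000_000

total_seconds = inferences_per_move * seconds_per_inference * moves_per_game * games
print(total_seconds / 86_400 / 365, "device-years")               # ~380 years

# Gumbel MuZero brings ~1600 simulations per move down to ~40:
print(total_seconds * 40 / 1600 / 86_400 / 365, "device-years")   # ~9.5 years
```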
It turns out a lot of the difficulties, judgment calls, and implementation details involve data pipelining. Some of those choices affect the final skill ceiling you reach. Which ones? How much? Are they path dependent? Well, you'll need to run it more than once...
Depends on game/environment and—since it's using a GBDT and not a NN—how good you are at feature extraction/selection for your problem.
High level, I'd say it's a good way to test a new environment w/out spending time/effort on GPUs until you understand the problem well, and then you can switch to the time/money costly GPU world.
Models can be massive, but also totally doable. Just to put things in perspective: ProcMaze solving using DeepMind MCTX converges in <1M steps, whereas a physically based agent such as HalfCheetah may require >100M steps to learn to run. Q-learning Pac-Man on a Snapdragon ChromeOS machine takes ~1 hr for 1000 epochs ;)
If you have the research paper, someone in the field could reimplement them in a few days.
Then there is the large compute cost for training them to produce the trained weights.
So, opensourcing these bits of work without the weights isn't as major a thing as you might imagine.