Comparison of AlphaGo Zero (Both the 20- and 40-Residual-Block Versions) with Its Deep
Reinforcement Learning Clones
Based on Matching 40 Samples of AlphaGo Zero’s Moves
1. INTRODUCTION
1. E.g.: “Alphazero, developed by Google, is the strongest known AI at go and several other board games, and its design serves as the template for the other top AI such as leelazero.” Hu (2019)
2. See ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero, Feb 2019.
The limits of the hypothesis are simple as well. An unexpected upset, with more matches scored by an older and weaker AI, could ruin the legend and push AlphaGo Zero off its throne of hype. No internal corporate rating system can be used as an executioner’s axe instead: as Leela Zero’s creator Gian-Carlo Pascutto says, closed selfplay-based rating numbers are not trustworthy.
Nevertheless, I do not expect to see AlphaGo dethroned as a result of my small comparative analysis, if only because of the power of the hype.
Game 17: moves 34 – 38 Game 18: moves 33, 36, 46, 53, 79
3. “For the first 30 moves of each [selfplay] game, the temperature is set to τ→1; this selects moves proportionally to their visit count in MCTS, and ensures a diverse set of positions are encountered. For the remainder of the game, an infinitesimal temperature is used, τ→0… i.e. we deterministically select the move with maximum visit count, to give the strongest possible play.” Silver et al. (2017)
4. “Additional exploration is achieved by adding Dirichlet noise to the prior probabilities in the root node s0… this noise ensures that all moves may be tried, but the search may still overrule bad moves.” Silver et al. (2017)
Game 19: moves 43, 44, 45, 70, 72 Game 20: moves 36, 38, 39, 41, 43
Game 17: moves 33, 34, 42, 44, 56 Game 18: moves 35, 50, 56, 79, 106
Game 19: moves 32, 39, 41, 62, 71 Game 20: moves 46, 50, 59, 105, 106
Secondly, let’s fill the list of top open-source Zero networks with some milestone releases from the last year. One non-Zero, supervised-learning (SL) network is added as a checkpoint, to estimate how far the RL generation of AIs is ahead of the first wave of Go-playing deep learning bots5.
5. “Leela contains an AI technique modeled after the human visual system and brain, a deep stack of artificial neurons processing visual input with each layer combining the previous ones into higher level features (a so called DCNN, deep convolutional neural network). This "neural network" has been trained with more than 32 million positions from high level go games and taught to predict which moves a pro player would most likely consider. In 19x19 games the engine will query this deep "neural network" during the search to focus on the most critical variations. The result is a substantial raise in playing strength (about 6 stones), and a more human-like playstyle, while still allowing the engine to innovate of its own.” Sjeng.Org
Engines marked with * have been modified, because there was no other way to adapt those AIs’ weights to the current original Leela Zero engine.
In addition, we should remember that both the ELF OpenGo and PhoenixGo networks of 2018 were far stronger than the contemporaneous Leela Zero and MiniGo iterations.
To reflect this fact and maximize the informativeness of the results, it makes sense to divide the AI list into several categories according to each algorithm’s playing strength.
#  CATEGORY                           MACHINE LEARNING ALGORITHMS
1  State-of-the-art                   ELF OpenGo v2, 20B
                                      MiniGo v17 #961, 20B
                                      Leela Zero #224 + ELF v2, 15B trained on 40B
                                      Leela Zero #226, 40B
2  AIs from yesterday                 ELF OpenGo v0 & v1, 20B
                                      PhoenixGo v1, 20B
                                      Leela Zero #174 & #188, 40B
                                      Leela Zero #204, 15B trained on 40B
                                      MiniGo v15 #990, 20B
                                      MiniGo v16 #1144, 40B
3  AIs from the day before yesterday  Leela Zero #157, 15B
                                      Leela Zero #173, 20B
                                      MiniGo v14 #979, 20B
4  Blast from the past                Leela Zero #116, 10B
                                      Leela Zero #128, 15B
Last but not least, let’s define the options for the comparison.
AlphaGo Zero used 1.6k simulations per move, both for selfplay and for evaluation games6. In my experience of analyzing moves with the GUI Lizzie, an AI normally spends the greater part of this MCTS visit budget on exploring the #1 move (the blue one). That share can reach up to 100% of all simulations for the first-line move.
No AlphaGo Zero moves are known from games with a higher number of visits, and analysis with Lizzie shows that larger calculation budgets always lead to completely different sequences of moves. That is why any attempt to use a significantly bigger number of visits would be unreasonable.
From the author’s point of view, the best setting for this comparison is 1.6k simulations specifically for the #1 move (the blue one). If the blue move falls behind in visit count, MCTS stops at the first move to reach 1.6k visits, which is still in accordance with the idea of the AlphaGo Zero paper.
This may give a slightly higher number of readings than some of AGZ’s moves had, but it looks like a clear and appropriate standard.
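That stopping rule can be sketched as a simple loop over playouts. This is an illustrative reconstruction of the procedure just described, not actual Lizzie or Leela Zero code; `run_one_playout` is a hypothetical callback that performs one MCTS simulation and returns the current per-move visit counts.

```python
def search_until_quota(run_one_playout, quota=1600, max_playouts=1_000_000):
    """Run MCTS playouts until some candidate move reaches `quota` visits.

    run_one_playout() is a hypothetical callback: each call performs one
    simulation and returns the current {move: visit_count} mapping.
    Returns the first move to accumulate `quota` visits -- normally the
    engine's #1 "blue" move, but possibly another candidate that
    overtakes it, as described above.
    """
    for _ in range(max_playouts):
        visits = run_one_playout()
        move, count = max(visits.items(), key=lambda kv: kv[1])
        if count >= quota:
            return move
    raise RuntimeError("visit quota was never reached")
```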
The evaluation system for matching AlphaGo Zero’s moves is simple:
*** - the blue move with 1.6k visits, or a non-blue one that reaches 1.6k visits first, matches AGZ;
** - no top move with 1.6k visits matches AGZ, but the AI considers Alpha’s move the most promising one (light blue);
* - the top move does not match AGZ, but Alpha’s move is one of the non-blue candidates;
“-” - no candidate move matches AlphaGo Zero.
The number of stars equals the number of points it gives each AI.
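Encoded as a tiny lookup, the scoring scheme looks like this (a sketch; the outcome encoding and function name are my own, not from the article’s tooling):

```python
# Points per outcome of one AlphaGo Zero move comparison:
#   "***": the blue move with 1.6k visits (or the non-blue move that
#          reached 1.6k visits first) matches AGZ              -> 3 points
#   "**" : no 1.6k top move matches, but AGZ's move is the
#          most promising candidate (light blue)               -> 2 points
#   "*"  : AGZ's move is merely one of the non-blue candidates -> 1 point
#   "-"  : no candidate move matches AGZ at all                -> 0 points
POINTS = {"***": 3, "**": 2, "*": 1, "-": 0}

def total_points(outcomes):
    """Sum the points one engine earns over all sampled AGZ moves."""
    return sum(POINTS[o] for o in outcomes)

total_points(["***", "*", "-", "**", "***"])  # -> 9
```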
6. For selfplay: “The best current player αθ*, as selected by the evaluator, is used to generate data. In each iteration, αθ* plays 25,000 games of self-play, using 1,600 simulations of MCTS to select each move (this requires approximately 0.4s per search)”. For evaluation games: “To ensure we always generate the best quality data, we evaluate each new neural network checkpoint against the current best network fθ before using it for data generation… Each evaluation consists of 400 games, using an MCTS with 1,600 simulations to select each move”. Silver et al. (2017)
3. COMPARATIVE ANALYSIS ITSELF
34.n10: a move that no one picked as its top choice. Only two Leela iterations consider move 34 the most promising one. The first is the 15-block “long play” #157:
The second is #226, the record-breaking model that wiped out 53 newer networks:
35.p11: a move that only two Zero AIs fail to see as a blue or light-blue one. The last 10-block iteration of Leela Zero, #116, does not like this keima very much:
Nor does the strongest MiniGo v14 network, #979-two-lions:
36.n13: a move invisible to everyone. The most genius one, or just the noisiest? Even the all-seeing eye of PhoenixGo v1 was unable to detect it:
37.r8: a move picked only by the two latest 20-block MiniGo models, v15 and v17, even though all the MiniGo models in the list were blind during the hunt for move 34. MiniGo v15:
38.r9: a move every Zero can see, except three of the four MiniGo models. It seems such blind spots are a family trait of the MiniGo project. MiniGo v14 can see nothing here:
Neither can MiniGo v15:
Nor even MiniGo v17, the best one in the family at matching the AlphaGo Zero 20B moves:
This eyeless performance correlates with the limited move-search proficiency of SL Leela 0.11, a living transitional fossil between the old-fashioned MCTS engines and the modern RL Zero bots7.
46.o3: the most logical move? Only the oldest of all the chosen AIs fail to see it as the #1 move. Leela Zero #116 10B is completely blind at the given number of simulations:
7. “Much progress towards artificial intelligence has been made using supervised learning systems that are trained to replicate the decisions of human experts. However, expert data is often expensive, unreliable, or simply unavailable. Even when reliable data is available it may impose a ceiling on the performance of systems trained in this manner”. Silver et al. (2017)
Leela Zero #128 15B, the benchmark opponent for ELF OpenGo v0, sees no reason to spend many of its readings on this move:
79.r13: a move that everyone can see (except the obsolete checkpoint network), but that no one preferred across the whole one-year range. Whether it is the initial ELF OpenGo v0:
…or the hybrid bjiyxo network 9006c708, trained on both LZ #224 and ELF v2 selfplay:
44.n16: the lowest-quality move? Silver-haired LZ #116 10B is the only AI that painted this 4474 enclosure with any shade of blue:
70.s8: the forgotten 20-block Leela network #173 performed unexpectedly strongly here:
36.h15: MiniGo v16 is the only AI that was able to catch this slick counter-hane.
38.r4: ELF OpenGo v0 paid more attention to this sagari, within the given simulations, than any other AI on the list:
39.g8: a one-space jump from Captain Obvious. Even PhoenixGo v1, which likes to explore a larger number of root nodes than any other AI, considers the rest of the candidates pointless.
41.r12: another move from the dark side of the Moon dazzles MiniGo. This extension is off the radar of MiniGo v14 exclusively:
Here are the subtotal results for matching the AlphaGo Zero 20B moves:
The peak results for the Zero AIs in matching the AGZ 20-block moves are 55% at the top and 30% at the bottom of the list. Obviously, not an outstanding performance for the Zeros in general. Meanwhile, the checkpoint non-Zero network matched AlphaGo Zero at only a minimal 5%.
None of the tested networks was able to find all 20 moves. Only one of them detected 95% of them, and the lowest success rate across the whole AI list was 65%. Again, the stats might have been better.
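As a sanity check on the figures above: with 20 sampled moves at a maximum of three stars each, a matching percentage is presumably earned points over the 60-point maximum. A small sketch under that assumption:

```python
def match_percentage(points, n_moves=20, max_stars=3):
    """Star points earned as a percentage of the maximum possible
    (assumed metric; n_moves=20 for each 20B/40B half of the study)."""
    return 100.0 * points / (n_moves * max_stars)

match_percentage(33)  # -> 55.0, e.g. the best 20B-matching result above
match_percentage(18)  # -> 30.0, e.g. the weakest Zero result
match_percentage(3)   # -> 5.0, e.g. the SL checkpoint network
```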
It would be predictable, in light of the main hypothesis (about the unequaled playing strength of DeepMind’s last two Alphas), if the AIs that performed better than the others were exactly the ones from the state-of-the-art category. In other words, ashes to ashes, ELF v2 to AlphaGo Zero.
However, the output we have is not as unambiguous as it was expected to be.
Sample 5: AlphaGo Zero 40B, game 17
33.b3: only three Leela Zero networks visited this move more than any other candidate.
The first is #188, the breakthrough 40-block net that surpassed #157 under time parity:
…the second is the 15-block digital tiger that was trained up to LZ #204:
…and the third is the half-blood child of Leela Zero #224 and ELF OpenGo v2:
34.q12: ELF OpenGo v1, probably the strongest open-source AI of 2018, was the only network from last year that picked this three-point approach to the black wall 1-7-9:
49.r9: Leela Zero #157 shows the most popular alternative to AlphaGo Zero’s choice:
56.b12: ELF OpenGo’s last two releases both proved thoughtful enough to figure this magari out. Here is ELF v1:
Sample 6: AlphaGo Zero 40B, game 18
35.d6: no AI released since January 1st, 2019 puts this nobi at the top anymore, unlike their predecessors, such as the first regular 40-block iteration of Leela Zero, #174:
50.c2: a similar story with the sagari continuation of this joseki. The sole present-day representative in this case is Leela Zero’s 15-block quantized model #204:
56.e8: the “Leeliest” move of them all? Only Leela iterations #173 and #226 were able to push this move to the top of their readings. Here is the last 20-block model, #173:
…and its modern successor, #226:
79.p2: a total disaster for all the MiniGo models in the list. No match, and only two of the four networks found this AlphaGo Zero 40B defensive move at all. The blindness of MG v14:
…and the surprising helplessness of MG v17 make them miss the target in Leela 0.11 style:
106.q13: the cut everyone can see, except the aged checkpoint network and MiniGo v15:
Sample 7: AlphaGo Zero 40B, game 19
32.n3: this hypermodern AI-style attachment tastes utterly bad only to the old Leela 0.11:
39.l16: LZ #226 is on a sensational rush, matching 4/5 of the AGZ moves in game 19:
41.e12: once again, ELF OpenGo v1 is the only AI released last year that picked AlphaGo Zero’s exact move:
71.b18: v17 is the lone MiniGo version that hit the mark with this atari:
Sample 8: AlphaGo Zero 40B, game 20
46.c3: the last Leela Zero iteration on the list that likes this corner attachment is #128:
50.r6: here Leela Zero #157 is the only AI that visited this joseki variation more than once within the given opening:
59.f2: …and this time #157 is the only member of the Leela Zero family that rejected this one-space jump to the second line:
105.g8: all the ELF OpenGo models can see this AGZ move, including the initial v0:
…while at the same time no MiniGo network can see it, including the most recent v17:
Here is the summary for matching the AlphaGo Zero 40B moves: