2022 has been a great year for Leela. A lot of new contributors appeared and made significant improvements; overall, Leela has become considerably stronger and more interesting. This year brought huge changes to how the search is conducted, the network architecture, the available backends and more.
I’ll dive into some of them.
In the AlphaZero paper, DeepMind used ResNets to predict the value and policy of a position. In 2022, after a lot of work, Arcturai presented an attention-based policy. Attention is a well-known building block that was already in use in other domains, including image models (for more details about attention, see Attention Is All You Need). A blog on that will come soon™! That opened the path to replacing the whole network with a transformer-like architecture. With danielmonroe’s work and improvements (Smolgen being the major one so far), BT2 is currently the strongest network in existence (for details on the new architecture and improvements, read Daniel’s readme).
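For readers who have not met attention before, here is a minimal sketch of the scaled dot-product attention from Attention Is All You Need, written in plain Python. It is not Leela's network code: the function names are made up for this post, and treating each row as one "token" (for example a board square) is only an assumption for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Generic attention: each query mixes the values, weighted by
    how well the query matches each key.

    queries, keys, values are lists of equal-length float vectors.
    Hypothetical helper for illustration, not an Lc0 API.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# A query that matches both keys equally averages the two values.
result = scaled_dot_product_attention(
    queries=[[0.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 2.0], [3.0, 4.0]],
)
```

The point of the block is that every token can look at every other token in a single step, which is what makes it attractive for a board where a piece on one side can matter to a square on the other.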
Monte Carlo Graph Search
In 2020, CrazyAra’s author and two others released a paper describing a way of handling transpositions. With a vanilla AlphaZero search, variations that end up in the same position are handled as separate nodes, wasting visits and computation. Up to now, we only had the NN cache to partially work around the issue: when you already know the policy and value of a certain position, you just reuse them. But once the node has been created, there is no way to share search results between the two copies, so part of the search time is wasted. The paper describes a way of sharing data between transpositions that don’t necessarily share the exact same state. In 2021, some initial planning for incorporating MCGS into Leela was done, and Etcaqab took on the challenge. It now outperforms regular Leela and is participating in competitions! However, there are still a few issues to iron out before it can appear in a release.
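To make the idea concrete, here is a toy sketch of the core trick: nodes live in a table keyed by position, so two move orders that reach the same position point at one shared node. This is a simplification under loud assumptions and neither the paper's full algorithm nor Leela's implementation. In particular, `toy_key` just sorts the moves played (a real engine would hash the actual position, and move order does matter in general), and all class and function names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    key: str
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)  # move -> Node

    @property
    def q(self):
        """Average value over all visits, from any path into this node."""
        return self.value_sum / self.visits if self.visits else 0.0


class SearchGraph:
    """Toy graph search store: one node per position key (hypothetical)."""

    def __init__(self):
        self.table = {}  # position key -> Node

    def get_node(self, key):
        node = self.table.get(key)
        if node is None:
            node = Node(key)
            self.table[key] = node
        return node

    def add_edge(self, parent, move, child_key):
        # Reuse the existing node if this position was reached before.
        child = self.get_node(child_key)
        parent.children[move] = child
        return child


def toy_key(moves):
    """Toy position key: ignores move order. Illustration only."""
    return "|".join(sorted(moves))


def walk(graph, moves):
    """Follow a move sequence from the root, creating nodes as needed."""
    node = graph.get_node(toy_key([]))
    for i, move in enumerate(moves):
        node = graph.add_edge(node, move, toy_key(moves[: i + 1]))
    return node


graph = SearchGraph()
via_e4 = walk(graph, ["e4", "e5", "Nf3"])
via_nf3 = walk(graph, ["Nf3", "e5", "e4"])

# A visit recorded through one move order is visible through the other.
via_e4.visits += 1
via_e4.value_sum += 0.5
```

In a plain tree the two lines would end in two separate leaves, each needing its own visits. Here `via_e4` and `via_nf3` are the same object, so the second move order gets the first one's statistics for free.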
OpenCL in Leela is maintained as little more than a fallback, yet users with AMD GPUs or Macs had no other option. Almaudoh implemented the Metal backend, which uses Apple’s Metal framework to provide faster evaluations on Macs (AMD users are still waiting, but at least on Windows they can use onnx-dml). He has also written the Smolgen implementation in CUDA.
Action replay is a reinforcement learning technique where you replay misevaluated games with a deeper or otherwise better search, to fix the evaluation for that game. It is not yet widely used, but Kovax and his blunderbase have shown that it removes at least some blind spots.
So we are inching closer to SF; let’s try to, well, fry the fish? I’m very grateful to the whole community: for a really great feat, for making me laugh, and for making a place for me and taking me in, way back at the start of the pandemic.