We recently did the 2nd LR drop for T40, and usually this is when the net approaches it’s strongest point (gains after the final LR drop are very small). External Elo tests show T40 is close to T30, and may have already passed. We will know more in the following few weeks. But coincidentally we found two major issues at the same time as the LR drop:
Negative gammas not handled correctly. This was a bug in the Lc0 NN code, but since most Nets do not have negative gammas we didn’t notice it. As a coincidence at the 2nd LR drop many nets started to have negative gammas. When this happens, the Net becomes very erratic (-100 Elo or worse), so our gating caught these and prevented further permanent damage to T40 training. This was fixed and released in v0.21.0.
Pawn promotion issues. T40 developed a blind spot when a pawn can either capture+promote, or ignore the capture and promote normally straight ahead. In this case, it placed nearly 100% of Policy on the capture+promote move. This lead to many blunders where Lc0 never considered the opponent would play the normal promotion move. The problem is related to details of how Batch Normalization is done. Starting from net 41546, we are using Batch Renormalization.
You may have heard this issue described as a problem with too large gammas. There were tests done with regularization on gamma, but they were slow to fix the capture promotion. Batch Normalization has 4 main parameters (gamma plus mean, variance, and beta), and they all interact. So we tried Batch Renormalization to improve all 4 of these parameters, and it quickly fixed the pawn promotion problem. It’s still early to know if we have a final solution or if there are further changes are needed. A major drawback of using Batch Renormalization (and one reason we didn’t use it from the beginning) is it comes with even more hyperparameters that require tuning, especially during the early LR stages.