“AlphaGo's training algorithm is highly stable because it is framed as a supervised learning problem on improved labels, which avoids the difficult exploration problem inherent in many naive RL setups.”

Eric JangAI / ML

Loading full analysis…