AlphaGo's training algorithm is highly stable because it is framed as a supervised learning probl..., Sonic AI
“AlphaGo's training algorithm is highly stable because it is framed as a supervised learning problem on improved labels, which avoids the difficult exploration problem inherent in many naive RL setups.”