Sonic AI
“The core self-improvement loop in AlphaGo involves training the policy network to directly predict the improved, more confident action distribution that results from the MCTS search process.”
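
The idea in the quote can be illustrated with a minimal sketch. It assumes the search result is summarized by per-move visit counts and that the policy is trained with a cross-entropy loss toward the normalized counts, which is the form usually described for AlphaGo Zero-style training. The tiny three-move "network" (a bare vector of logits), the visit counts, the learning rate, and all function names below are hypothetical and chosen only for illustration; this is not the original implementation.

```python
import math

def softmax(logits):
    """Convert raw policy logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def search_target(visit_counts, temperature=1.0):
    """Turn MCTS visit counts N(a) into a target pi(a) proportional to N(a)^(1/T)."""
    scaled = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(scaled)
    return [s / total for s in scaled]

def cross_entropy(target, predicted):
    """Loss -sum_a pi(a) * log p(a) between search target and policy output."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

def train_step(logits, target, lr=0.5):
    """One gradient-descent step on the logits.

    For a softmax policy, the gradient of the cross-entropy
    with respect to the logits is (probs - target).
    """
    probs = softmax(logits)
    return [z - lr * (p - t) for z, p, t in zip(logits, probs, target)]

# Hypothetical position with three legal moves and a uniform initial policy.
logits = [0.0, 0.0, 0.0]
visit_counts = [10, 70, 20]  # assumed MCTS visit counts, not real data
target = search_target(visit_counts)

loss_before = cross_entropy(target, softmax(logits))
for _ in range(200):
    logits = train_step(logits, target)
loss_after = cross_entropy(target, softmax(logits))
policy_after = softmax(logits)
```

After the updates, `policy_after` sits close to the search distribution `target`, so the raw policy now proposes what previously required a search to find. In a full system the logits would come from a neural network evaluated on the board state, and the next round of search would start from this sharper prior.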