Sonic AI
“Q-learning propagates value estimates backward over trajectories an agent has already visited, whereas Monte Carlo Tree Search plans forward over trajectories the agent has not yet been to.”
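
The contrast in the quote can be made concrete with a small sketch. Everything here is an illustrative assumption rather than anything from the source: a five-state chain with a reward at the right end, a tabular Q-learning update, and a flat Monte Carlo rollout planner used as a simplified stand-in for Monte Carlo Tree Search (it keeps no tree and does no UCB-style selection). The names `step`, `q_learning_episode`, and `rollout_plan` are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy problem: states 0..4 on a line, reward 1.0 on reaching state 4.
GOAL = 4
ACTIONS = (-1, +1)
GAMMA, ALPHA = 0.9, 0.5  # assumed discount factor and learning rate


def step(state, action):
    """Deterministic toy model: returns (next_state, reward, done)."""
    nxt = max(0, min(GOAL, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def q_learning_episode(Q, policy):
    """Backward view: update Q only along transitions actually experienced."""
    state, done = 0, False
    while not done:
        action = policy(state)
        nxt, reward, done = step(state, action)
        best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
        target = reward + GAMMA * best_next
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = nxt


def rollout_plan(state, rng, n_rollouts=50, horizon=10):
    """Forward view: score each action by simulating unvisited futures.

    Flat Monte Carlo planning, not full MCTS: no tree, no UCB selection.
    """
    scores = {}
    for first in ACTIONS:
        total = 0.0
        for _ in range(n_rollouts):
            s, r, done = step(state, first)
            ret, discount = r, GAMMA
            for _ in range(horizon):
                if done:
                    break
                s, r, done = step(s, rng.choice(ACTIONS))
                ret += discount * r
                discount *= GAMMA
            total += ret
        scores[first] = total / n_rollouts
    return max(scores, key=scores.get), scores


Q = defaultdict(float)
q_learning_episode(Q, lambda s: +1)   # only the final transition gains value
q_learning_episode(Q, lambda s: +1)   # value moves one state further back
best_action, action_scores = rollout_plan(3, random.Random(0))
```

In this sketch, each Q-learning episode pushes the goal's value one state further back along the path the agent actually walked. The rollout planner instead picks an action from state 3 without any prior visits, because it queries the model forward from the current state.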