Off-policy training can harm performance if the replay buffer contains too many states that the c..., Sonic AI
“Off-policy training can harm performance if the replay buffer contains too many states that the current policy would never visit, causing the model to waste capacity on irrelevant states.”
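
One rough way to make this concern concrete is to measure how much of a replay buffer overlaps with the states the current policy actually reaches. The sketch below is only an illustration under stated assumptions: the toy 1-D chain environment, the always-move-right policy, and the helper names `rollout_states` and `buffer_coverage` are all hypothetical and do not come from the quote or from any particular library.

```python
def rollout_states(policy, step, start, horizon):
    """Return the states visited by following `policy` from `start`."""
    states = [start]
    state = start
    for _ in range(horizon):
        state = step(state, policy(state))
        states.append(state)
    return states


def buffer_coverage(buffer_states, on_policy_states):
    """Fraction of buffer states that the current policy actually visits."""
    if not buffer_states:
        return 0.0
    visited = set(on_policy_states)
    hits = sum(1 for s in buffer_states if s in visited)
    return hits / len(buffer_states)


# Hypothetical toy environment: states 0..9 on a line, actions -1 or +1,
# with the state clipped at both ends.
def step(state, action):
    return max(0, min(9, state + action))


# Hypothetical current policy: always move right.
def current_policy(state):
    return 1


# States the current policy reaches from the start state.
on_policy = rollout_states(current_policy, step, start=5, horizon=4)

# Assumed buffer contents, e.g. left over from an older, more exploratory policy.
buffer_states = list(range(10))

coverage = buffer_coverage(buffer_states, on_policy)
print(f"on-policy states: {on_policy}")
print(f"buffer coverage:  {coverage:.2f}")
```

In this toy setup, half of the buffer consists of states the current policy never reaches. In a real system with large or continuous state spaces, exact set membership would not work, and some approximate measure (for example a density or distance estimate) would presumably be needed instead.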