Sonic AI:
“Off-policy training can improve a model's robustness by teaching it how to recover from states outside the optimal trajectory, similar to the DAgger algorithm in robotics.”
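The recovery mechanism the quote alludes to can be illustrated with a minimal, hypothetical sketch of DAgger (Dataset Aggregation) on a toy task. Everything here is an illustrative assumption, not Sonic AI's method: the 1-D line world, the `slip` noise that knocks the agent off the optimal path, the tabular learner with a deliberately poor default action, and helper names such as `expert_demo`, `dagger` and `success_rate`. In DAgger the learner is rolled out with its own policy, and the expert only supplies action labels for the states the learner actually reaches. For brevity this sketch also drops the original algorithm's expert/learner mixing coefficient β and rolls out the learner alone after the initial behaviour-cloning step.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy task: the state is an integer position on a line and the
# episode succeeds when the agent reaches the goal at 0.
GOAL = 0


def expert(state: int) -> int:
    """Expert policy: always step toward the goal."""
    return -1 if state > GOAL else 1


def step(state: int, action: int, rng: random.Random, slip: float) -> int:
    """With probability `slip` the action is reversed, pushing the agent off the optimal path."""
    if rng.random() < slip:
        action = -action
    return state + action


class TabularPolicy:
    """Learner: majority vote over the labels seen for each state.

    Unseen states fall back to a fixed (deliberately poor) default action,
    mimicking how a cloned policy behaves outside its training distribution.
    """

    def __init__(self, default: int = 1):
        self.default = default
        self.votes = defaultdict(Counter)

    def fit(self, dataset):
        self.votes = defaultdict(Counter)
        for state, action in dataset:
            self.votes[state][action] += 1

    def __call__(self, state: int) -> int:
        if state in self.votes:
            return self.votes[state].most_common(1)[0][0]
        return self.default


def rollout(policy, start, rng, slip, horizon=30):
    """Run one episode; return the visited states and whether the goal was reached."""
    visited, state = [], start
    for _ in range(horizon):
        if state == GOAL:
            return visited, True
        visited.append(state)
        state = step(state, policy(state), rng, slip)
    return visited, state == GOAL


def expert_demo(start):
    """Clean expert demonstration (no slip): covers only the optimal trajectory."""
    states, _ = rollout(expert, start, random.Random(0), slip=0.0)
    return [(s, expert(s)) for s in states]


def dagger(start, slip, n_iters, n_rollouts, rng):
    dataset = expert_demo(start)  # initial data: plain behaviour cloning
    policy = TabularPolicy()
    policy.fit(dataset)
    for _ in range(n_iters):
        for _ in range(n_rollouts):
            # Roll out the *learner*, so the data includes the off-trajectory
            # states that its own mistakes (and the slip noise) lead to ...
            visited, _ = rollout(policy, start, rng, slip)
            # ... and label each of them with the expert's action.
            dataset.extend((s, expert(s)) for s in visited)
        policy.fit(dataset)  # retrain on the aggregated dataset
    return policy, dataset


def success_rate(policy, start, slip, episodes, rng):
    return sum(rollout(policy, start, rng, slip)[1] for _ in range(episodes)) / episodes


rng = random.Random(0)
bc = TabularPolicy()
bc.fit(expert_demo(5))
dagger_policy, data = dagger(start=5, slip=0.2, n_iters=5, n_rollouts=20, rng=rng)
print("behaviour cloning:", success_rate(bc, 5, 0.2, 500, rng))
print("DAgger:           ", success_rate(dagger_policy, 5, 0.2, 500, rng))
```

The behaviour-cloned policy only knows states 5 to 1, so once slip noise pushes it to 6 it takes the default action and usually drifts away. The DAgger policy has been trained on the states its own rollouts reached, so it has learned to step back toward the goal from them, which is the "recovery" behaviour the quote describes.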