Current large language model reinforcement learning (LLM RL) typically treats an entire generated..., Sonic AI
“Current large language model reinforcement learning (LLM RL) typically treats an entire generated sequence as a single action, which is a major contributor to the high variance of the learning signal.”
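The variance claim in the quote can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the method of the quoted source. It assumes a hypothetical policy that emits independent binary tokens and a hypothetical reward equal to the number of 1-tokens. The function name `simulate` and all parameters are made up for illustration. It compares two REINFORCE-style gradient estimates for the logit of the first token: one weights the score by the reward of the whole sequence, the other by only the reward that token itself produced.

```python
import random
import statistics


def simulate(num_tokens=20, num_samples=20000, p=0.5, seed=0):
    """Compare two score-function gradient estimates for the logit of token 0.

    Assumptions (illustrative only): tokens are independent Bernoulli(p)
    draws, and the reward is the count of 1-tokens in the sequence.
    """
    rng = random.Random(seed)
    seq_level, tok_level = [], []
    for _ in range(num_samples):
        tokens = [1 if rng.random() < p else 0 for _ in range(num_tokens)]
        # Derivative of the Bernoulli log-probability w.r.t. the logit.
        score = tokens[0] - p
        # Whole sequence treated as one action: weight by the total reward.
        seq_level.append(score * sum(tokens))
        # Token-level credit: weight only by the reward token 0 produced.
        tok_level.append(score * tokens[0])
    return {
        "seq_mean": statistics.fmean(seq_level),
        "tok_mean": statistics.fmean(tok_level),
        "seq_var": statistics.pvariance(seq_level),
        "tok_var": statistics.pvariance(tok_level),
    }


result = simulate()
print(result)
```

In this toy setup both estimators target the same expected gradient, because the rewards from the other tokens are independent of the first token. The sequence-level estimate nonetheless carries the noise of every other token's reward, so its variance is much larger. Real LLM rewards rarely decompose this cleanly, so the sketch shows only the direction of the effect.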