Skip to content
Sonic AI
A key innovation of the GRPO algorithm was to completely discard the value model (or critic model... — Sonic AI