“MiniMax identified that keeping the language model head in FP32 precision during reinforcement learning training was critical to closing the gap between the theoretical algorithm and its practical implementation.”
Sonic AI
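To make the precision point concrete, the following is a minimal sketch of how rounding the head's output logits to a lower-precision format shifts token log-probabilities. It is an illustration under stated assumptions, not MiniMax's code: the helper names `to_bf16` and `log_softmax` and the example logits are hypothetical, bfloat16 is assumed as the lower-precision format (the quote names only FP32), and Python's double-precision floats stand in for the higher-precision reference.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    Hypothetical helper: bfloat16 keeps the upper 16 bits of a float32,
    so we round the float32 bit pattern and zero out the lower 16 bits.
    NaN and infinity are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, breaking ties toward an even bfloat16 mantissa.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


# Made-up logits for a three-token vocabulary (illustrative values only).
logits = [10.3, 10.2, 3.1]

# Reference log-probabilities, computed without rounding the logits.
reference = log_softmax(logits)

# Log-probabilities after rounding each logit to bfloat16 first.
rounded = log_softmax([to_bf16(v) for v in logits])

for token_id, (ref, low) in enumerate(zip(reference, rounded)):
    print(f"token {token_id}: reference={ref:.5f} "
          f"bf16_logits={low:.5f} diff={low - ref:+.5f}")
```

Policy-gradient objectives are built from per-token log-probabilities, so a shift of this kind in the head's output is one plausible way a low-precision head could pull the implementation away from the algorithm as written. The sketch shows only the rounding effect itself and does not reproduce any reported result.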