“Minimax discovered through debugging that they needed to run their reinforcement learning processes at FP32 precision to achieve desired results.”