“Unlike Reinforcement Learning with Verifiable Rewards (RLVR), Reinforcement Learning from Human Feedback (RLHF) has no established scaling law.”