“The training method for LLMs, which relies on verifiable rewards, causes their capabilities to become highly advanced in domains like math and code while stagnating in areas that are not easily verifiable.”

Andrej Karpathy · AI / ML
