“Misha Laskin asserts that current reinforcement learning algorithms are "quite bad" at exploration and credit assignment, as they cannot discern which specific part of a reasoning chain was correct or incorrect.”

Misha LaskinAI / ML

Loading full analysis…