“Richard Sutton's objection to current LLM development is that a system with true human-like learning would not need billions of dollars in data and compute, or bespoke environments, to learn skills like using Excel or PowerPoint. On this view, the reliance on purpose-built RL environments indicates that current models lack a core human learning algorithm.”