“Small models like TRM and HRM have performed very well on problems like Sudoku and ARC-AGI, where pure transformer models struggle.”