The discussion highlights that while transformers are incredibly capable, their method of generalization is 'alien' and brittle. They require vast amounts of data to learn concepts that humans grasp easily and can fail unexpectedly in scenarios that seem simple, as illustrated by Waymo's struggles with construction zones.
A recurring idea is the 'whiff in the air' that a new paradigm beyond transformers is emerging. This is driven by the intuition that more brain-like, data-efficient learning methods are possible and necessary to overcome the limitations of current models.
The dramatic increase in GPU performance is a key enabler of progress. The speaker notes that a single modern consumer GPU now surpasses the compute power of the entire multi-GPU system used for the original Transformer paper, democratizing access to high-end research.
AI models like Codex are becoming powerful research assistants, capable of dramatically speeding up tasks like implementing and reproducing research papers. This creates a virtuous cycle where AI is used to build better AI, making researchers more productive.
Keep pulling the thread on Lukas Kaiser.