The performance gap between the top 5-7 LLMs has closed, making them largely interchangeable for many tasks. Simply scaling up model size, data, and compute no longer delivers massive performance gains and is showing diminishing returns.
Businesses are moving beyond the initial phase of broad, small-scale experimentation with AI. They are now identifying a narrow set of high-impact use cases and are deploying them at massive scale across tens of thousands of employees.
Cohere is carving out a niche by intentionally constraining its models to run efficiently on minimal hardware (two GPUs). This focus, combined with on-premise and air-gapped deployment options, directly targets the security, privacy, and infrastructure realities of large enterprises and regulated industries.
The discussion critiques the European Union's technology strategy, arguing it has focused excessively on regulating foreign tech companies rather than fostering a domestic ecosystem capable of building its own technology. This is framed as a protectionist approach that ultimately stifles innovation and cedes technological leadership.
Reflecting on the 'Attention Is All You Need' paper, Aidan Gomez suggests its key insight was efficiency that made scaling practical, and that Google didn't initially grasp its full significance. Looking forward, he believes the next major advance will be models that can learn from experience and user interaction, as gains from scaling and synthetic data are plateauing.