Transformer-based Large Language Models are argued to be mathematically shown to perform Bayesian inference when predicting the next token, which explains their capacity for in-context learning (sketched below after these takeaways).
Despite their capabilities, current LLMs are fundamentally limited by their architecture; they operate on correlation, not causation, and lack plasticity as their weights are frozen post-training.
The speaker strongly rejects claims of LLM consciousness, arguing that they are silicon-based systems optimized for next-token prediction rather than survival, and that their behavior reflects their training data rather than genuine understanding.
Future progress toward AGI requires moving beyond scaling current models and focusing on developing new architectures that incorporate mechanisms for causality and continuous learning (plasticity).
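A minimal sketch of the Bayesian view of next-token prediction, assuming the common framing (not spelled out in this summary) in which pretraining data is generated from a mixture over latent tasks θ:

p(x_{t+1} | x_{1:t}) = ∫ p(x_{t+1} | x_{1:t}, θ) p(θ | x_{1:t}) dθ

Under this framing, the in-context examples x_{1:t} sharpen the posterior p(θ | x_{1:t}) over the latent task, so the model's next-token prediction behaves like posterior-predictive inference even though no weights are updated.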
Concerns Raised
Current LLM architectures are fundamentally limited to correlation and cannot perform causal reasoning.
The lack of plasticity (frozen weights) prevents LLMs from retaining learning across interactions.
Simply scaling up existing models is an insufficient path to AGI.
Public and even expert discourse is prone to anthropomorphizing LLMs and speculating about consciousness without basis.
Opportunities Identified
Developing new AI architectures that explicitly incorporate mechanisms for causality.
Creating models with true plasticity that can learn continuously from experience.
Applying formal frameworks like Judea Pearl's causal hierarchy to build more robust AI (see the sketch after this list).
Using the mathematical understanding of LLMs as Bayesian engines to improve their predictability and performance.
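For reference, a compact statement of Pearl's causal hierarchy, with the usual notation assumed rather than taken from the source:

Level 1, association: P(y | x) — seeing; what observational data alone can estimate.
Level 2, intervention: P(y | do(x)) — doing; requires a causal model of how variables respond to actions.
Level 3, counterfactuals: P(y_x | x', y') — imagining; what y would have been had x been set differently, given what was actually observed.

The concern raised above maps onto this hierarchy: a model trained purely on observational text is, in general, confined to level 1, while levels 2 and 3 require explicit causal structure.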