The cost to build a superhuman Go AI has plummeted from millions of dollars (DeepMind's AlphaGo) to a few thousand dollars, thanks to open-source projects like KataGo and modern development tools.
AlphaGo's success stems from a powerful self-improvement loop combining Monte Carlo Tree Search (MCTS) with deep neural networks (a policy network and a value network): search generates better training data for the networks, and the networks in turn guide the search.
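To make the loop concrete, here is a minimal sketch of the search side, assuming PUCT-style selection; `Node`, `select_child`, `backup`, and `C_PUCT` are illustrative names, not actual AlphaGo or KataGo identifiers. The policy network's priors bias exploration toward promising moves, and value-network estimates are backed up through the tree:

```python
import math

C_PUCT = 1.5  # exploration constant (illustrative value)

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy network
        self.visit_count = 0      # N(s, a)
        self.value_sum = 0.0      # W(s, a): summed value-net estimates
        self.children = {}        # action -> Node

    def q_value(self):
        # Q(s, a) = W / N; unvisited nodes default to 0.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node):
    """Return (action, child) maximizing Q + U, where
    U = c_puct * P * sqrt(total_visits) / (1 + N)
    favors high-prior, rarely visited moves."""
    total_visits = sum(c.visit_count for c in node.children.values())
    def puct(child):
        u = C_PUCT * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return child.q_value() + u
    return max(node.children.items(), key=lambda kv: puct(kv[1]))

def backup(path, value):
    # Propagate a leaf's value-net estimate up the search path,
    # flipping sign for the alternating player.
    for node in reversed(path):
        node.visit_count += 1
        node.value_sum += value
        value = -value
```

After many simulations, the visit counts at the root form a stronger move distribution than the raw policy output, and that distribution becomes the training target for the networks, closing the loop.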
A profound insight from AlphaGo is that relatively small neural networks can 'amortize' and approximate solutions to intractably large search problems, suggesting that many problems that are formally intractable (e.g., NP-hard) have enough structure in practice to be handled efficiently by AI.
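The amortization step can be sketched as a single training update, assuming PyTorch; `policy_net`, `optimizer`, `board_tensor`, and `visit_counts` are hypothetical placeholders. The network is trained to reproduce the move distribution that expensive search produced, so one cheap forward pass later approximates many simulations:

```python
import torch.nn.functional as F

def amortize_search(policy_net, optimizer, board_tensor, visit_counts):
    # Hypothetical trainer: distill MCTS output into the policy network.
    target = visit_counts / visit_counts.sum()   # normalized visit counts pi(a)
    logits = policy_net(board_tensor)            # (1, num_moves) raw scores
    log_probs = F.log_softmax(logits, dim=-1)
    loss = -(target * log_probs.squeeze(0)).sum()  # cross-entropy against pi
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```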
Combining search with learned models is a core AI paradigm, but extending it from structured games like Go to open-ended domains like language models remains a major research challenge because of the vast action space.
Concerns Raised
Applying MCTS-like search to open-ended domains like large language models is extremely difficult due to the vast action space.
The 'outer loop' of verification for AI self-improvement becomes much harder and less reliable when moving from well-defined games to general intelligence.
Off-policy training methods, while potentially improving robustness, risk harming performance if the replay buffer contains too many stale or irrelevant past states (a replay-buffer sketch follows this list).
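One common mitigation, sketched below with illustrative names (this `ReplayBuffer` is not a specific library API): cap the buffer at a recency window so sampled batches stay close to the current policy while still retaining some older states for robustness.

```python
import random
from collections import deque

class ReplayBuffer:
    """Bounded replay buffer: old positions fall off automatically."""

    def __init__(self, max_positions=500_000):  # window size is an assumed knob
        self.buffer = deque(maxlen=max_positions)

    def add_game(self, positions):
        self.buffer.extend(positions)

    def sample(self, batch_size):
        # Uniform sampling over a bounded recency window keeps batches
        # mostly on-policy; a too-large window reintroduces stale states.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```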
Opportunities Identified
The massive reduction in compute cost makes replicating and building upon foundational AI research highly accessible.
Neural networks can effectively solve problems considered computationally intractable by exploiting structure in real-world instances, bypassing worst-case complexity bounds.
Transfer learning from smaller, simpler versions of a problem (e.g., a 9x9 Go board) can effectively bootstrap training for larger, more complex versions (a board-size sketch follows this list).
Test-time compute (i.e., search) can be used as a flexible substitute for training compute to improve model performance (a compute-budget sketch follows).
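Board-size sketch for the transfer-learning point: assuming a fully convolutional network (as KataGo-style nets are), the same weights apply to any board size, so a trunk trained on 9x9 positions can be reused directly on 19x19. Layer sizes and the 17 input planes below are illustrative.

```python
import torch
import torch.nn as nn

class ConvTrunk(nn.Module):
    """Fully convolutional trunk: no dense layers, so no fixed board size."""

    def __init__(self, in_planes=17, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.policy_head = nn.Conv2d(channels, 1, 1)  # one logit per board point

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        return self.policy_head(x)

net = ConvTrunk()
small = net(torch.zeros(1, 17, 9, 9))    # 9x9 board  -> (1, 1, 9, 9)
large = net(torch.zeros(1, 17, 19, 19))  # same weights on 19x19 -> (1, 1, 19, 19)
```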
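Compute-budget sketch for the test-time point: a self-contained toy in which the same noisy evaluator (standing in for a fixed network plus rollouts) picks the best move more reliably as the per-move evaluation budget grows. The "game" and its values are synthetic.

```python
import random

TRUE_VALUES = {"a": 0.52, "b": 0.50, "c": 0.48}  # hidden values; 'a' is best

def noisy_eval(move):
    # Stand-in for one rollout / value-net call with evaluation noise.
    return TRUE_VALUES[move] + random.gauss(0, 0.2)

def choose_move(num_evals):
    # More test-time compute = more evaluations averaged per candidate move.
    scores = {m: sum(noisy_eval(m) for _ in range(num_evals)) / num_evals
              for m in TRUE_VALUES}
    return max(scores, key=scores.get)

random.seed(0)
for budget in (1, 10, 1000):
    picks = [choose_move(budget) for _ in range(100)]
    print(f"budget={budget}: {picks.count('a')}/100 correct")  # accuracy rises
```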