Kuan Wu, Sonic AI
Kuan Wu
Person
9
Mentions
Episodes
9
Claims
Claims
In data-constrained settings, it is more effective to train an ensemble of smaller models than to train a single large model with the same total parameter count.
Expert perspective
Kan Wu
May 31
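As a rough illustration of the ensembling claim above, the sketch below averages the predicted probabilities of several small models and compares the ensemble's loss with the average loss of its members. The source does not say how the ensemble combines its members, so probability averaging is an assumption here, and the three toy distributions are made up for illustration.

```python
import math

def ensemble_probs(member_probs):
    """Average the members' probability vectors, class by class (assumed combination rule)."""
    n = len(member_probs)
    return [sum(col) / n for col in zip(*member_probs)]

def nll(probs, target):
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target])

# Hypothetical next-token distributions from three small models over three classes.
members = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.6, 0.1, 0.3],
]
target = 0

avg_member_loss = sum(nll(p, target) for p in members) / len(members)
ens_loss = nll(ensemble_probs(members), target)

print(f"average member loss: {avg_member_loss:.3f}")
print(f"ensemble loss:       {ens_loss:.3f}")
```

Because the negative log is convex, the loss of the averaged prediction can never exceed the average of the member losses. This says nothing by itself about the comparison with a single large model, which is the empirical part of the claim.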
In data-constrained training regimes, using weight decay values up to 30 times larger than those used in compute-optimal pre-training can prevent overfitting and allow for continued performance gains ...
Expert perspective
Kan Wu
May 31
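To make the scale of "30 times larger" concrete, here is a minimal sketch of one gradient step with decoupled weight decay. The baseline value of 0.1, the learning rate, and the toy weights are hypothetical; the source gives only the 30x ratio, not the absolute values or the optimizer.

```python
def sgd_step_decoupled_wd(weights, grads, lr, weight_decay):
    """One SGD step where weight decay shrinks each weight directly (decoupled form)."""
    return [w - lr * g - lr * weight_decay * w for w, g in zip(weights, grads)]

BASELINE_WD = 0.1                  # hypothetical compute-optimal setting
AGGRESSIVE_WD = 30 * BASELINE_WD   # upper end of the range in the claim

weights = [1.0, -2.0]
grads = [0.0, 0.0]   # zero gradient isolates the effect of the decay term
lr = 0.01

mild = sgd_step_decoupled_wd(weights, grads, lr, BASELINE_WD)
strong = sgd_step_decoupled_wd(weights, grads, lr, AGGRESSIVE_WD)

print("baseline decay:  ", mild)
print("30x larger decay:", strong)
```

With these toy numbers, each step shrinks the weights by 0.1% under the baseline setting and by 3% under the aggressive one, which is the sense in which the larger value acts as a much stronger regularizer.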
An 8-member ensemble model with 2.4 billion total parameters can be distilled into a single 300 million parameter model while retaining 83% of the loss improvement.
Expert perspective
Kan Wu
May 31
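The figures in the distillation claim above can be checked with a little arithmetic. The sketch assumes the eight ensemble members are equal in size, and it assumes "retaining 83% of the loss improvement" means the fraction of the gap between a baseline loss and the ensemble loss that the distilled model keeps. The three loss values are invented purely to show the calculation.

```python
ENSEMBLE_MEMBERS = 8
ENSEMBLE_TOTAL_PARAMS = 2.4e9
STUDENT_PARAMS = 300e6

# Assumes equal-size members: each one matches the distilled model's size.
params_per_member = ENSEMBLE_TOTAL_PARAMS / ENSEMBLE_MEMBERS
compression = ENSEMBLE_TOTAL_PARAMS / STUDENT_PARAMS

def retained_fraction(baseline_loss, ensemble_loss, student_loss):
    """Share of the ensemble's loss improvement that the distilled student keeps."""
    return (baseline_loss - student_loss) / (baseline_loss - ensemble_loss)

# Hypothetical losses chosen only to illustrate an 83% retention figure.
example = retained_fraction(baseline_loss=3.50, ensemble_loss=3.30, student_loss=3.334)

print(f"parameters per member: {params_per_member:.0f}")
print(f"compression factor:    {compression:.0f}x")
print(f"retained improvement:  {example:.0%}")
```

Under the equal-size assumption, the distilled model is the same size as one ensemble member, so the claim amounts to an 8x reduction in parameters at inference time.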
Self-distillation, where a model is distilled into a new model of the same size, can significantly improve loss and outperform the asymptote of a heavily regularized single model.
Expert perspective
Kan Wu
May 31
A joint scaling recipe combining aggressive regularization and ensembling can achieve a 5x data efficiency win over standard pre-training methods.
Expert perspective
Kan Wu
May 31
In a continued pre-training scenario on math data, data-efficiency techniques like aggressive epoching and ensembling matched the performance of training on 73 billion tokens while using only 4 billio...
Expert perspective
Kan Wu
May 31
Public projections indicate that the amount of human-generated text on the internet is growing by approximately 3% per year.
Expert perspective
Kan Wu
May 31
The amount of compute spent per data point in pre-training will increase by roughly 4x year-over-year due to the disparity in growth rates between compute availability and data generation.
Speculative
Kan Wu
May 31
The amount of compute spent on pre-training large language models is growing by approximately 4x to 5x per year.
Expert perspective
Kan Wu
May 31
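The last three claims fit together arithmetically: if compute grows 4x to 5x per year while text grows about 3% per year, then compute per data point grows by the ratio of the two. The sketch below assumes compute per data point is simply total compute divided by the amount of available text; the function name is illustrative.

```python
def compute_per_data_point_growth(compute_growth, data_growth_rate):
    """Year-over-year multiplier on compute spent per data point."""
    return compute_growth / (1.0 + data_growth_rate)

DATA_GROWTH_RATE = 0.03  # about 3% more human-generated text per year

low = compute_per_data_point_growth(4.0, DATA_GROWTH_RATE)
high = compute_per_data_point_growth(5.0, DATA_GROWTH_RATE)

print(f"{low:.2f}x to {high:.2f}x per year")  # 3.88x to 4.85x per year
```

The low end of this range rounds to the "roughly 4x" figure in the speculative claim.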