Pre-training language models requires trillions of tokens, which is a significantly higher sample..., Sonic AI
“Pre-training language models requires trillions of tokens, which reflects significantly lower sample efficiency than human learning, as humans do not process trillions of words.”
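
To put a rough scale on that comparison, here is a back-of-the-envelope sketch. Every number in it is an illustrative assumption rather than a figure from the quote: the daily word exposure, the lifespan, and the choice of one trillion tokens as the low end of "trillions" are all hypothetical inputs. Tokens and words are also not the same unit, so the ratio is only an order-of-magnitude indication.

```python
def human_lifetime_words(words_per_day: float, years: float) -> float:
    """Rough count of words a person encounters under the given assumptions."""
    return words_per_day * 365 * years


def corpus_to_human_ratio(pretraining_tokens: float, human_words: float) -> float:
    """How many times larger the pre-training corpus is than human exposure."""
    return pretraining_tokens / human_words


# Illustrative assumptions, not measurements.
ASSUMED_WORDS_PER_DAY = 20_000      # hypothetical daily word exposure
ASSUMED_YEARS = 80                  # hypothetical lifespan
ASSUMED_PRETRAINING_TOKENS = 1e12   # "trillions" taken at its lower bound

human_words = human_lifetime_words(ASSUMED_WORDS_PER_DAY, ASSUMED_YEARS)
ratio = corpus_to_human_ratio(ASSUMED_PRETRAINING_TOKENS, human_words)

print(f"Assumed human exposure: {human_words:.3g} words")
print(f"Assumed corpus size:    {ASSUMED_PRETRAINING_TOKENS:.3g} tokens")
print(f"Corpus is roughly {ratio:.0f}x larger")
```

Changing the assumed inputs shifts the ratio, but under any plausible values the corpus stays orders of magnitude larger than a lifetime of human language exposure, which is the gap the quote points to.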