“An empirical finding from the Time Horizons benchmark is that AI models are consistently more successful on shorter tasks (by human completion time) than longer ones, a trend that holds for models from GPT-2 to recent ones.”

David Rein · AI / ML