“Long-form YouTube content has recently become a significant data source for training Large Language Models.”