“The majority of training data used by all major AI labs is now synthetic data generated by other models.”