“A significant challenge in using commercial large language models for quantitative backtesting is the inability to control for ‘in-sample’ data contamination, as the models have been trained on data that includes the outcomes one is trying to predict.”