“The DinoWorld model's superior performance on 3D tasks is attributed to its use of a large foundational backbone pre-trained on relevant image data.”