“Among AI modalities, text generation is the cheapest to serve, followed by audio, while video and world models are the most expensive.”