“Reasoning models reduce the maximum concurrent user batch size on a server to one-fourth or one-fifth of what is possible with standard models.”
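
A back-of-envelope calculation shows how a reduction of this size could arise. The sketch below rests on two assumptions that the quote does not state: that the batch limit is set by how many tokens fit in the server's KV cache, and that a reasoning model holds about five times as many tokens per request because of its longer generated output. The function name `max_batch_size` and every figure are hypothetical, chosen only for illustration.

```python
def max_batch_size(kv_budget_tokens: int, tokens_per_request: int) -> int:
    """Largest number of concurrent requests whose cached tokens fit in the budget."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return kv_budget_tokens // tokens_per_request


# Hypothetical figures, not measurements.
KV_BUDGET_TOKENS = 400_000   # tokens the server's KV cache can hold (assumed)
STANDARD_TOKENS = 2_000      # prompt + answer for a standard model (assumed)
REASONING_TOKENS = 10_000    # prompt + reasoning trace + answer (assumed, 5x longer)

standard_batch = max_batch_size(KV_BUDGET_TOKENS, STANDARD_TOKENS)
reasoning_batch = max_batch_size(KV_BUDGET_TOKENS, REASONING_TOKENS)
ratio = reasoning_batch / standard_batch

print(f"standard: {standard_batch}, reasoning: {reasoning_batch}, ratio: {ratio:.2f}")
```

Under these assumed numbers the reasoning model fits one-fifth as many concurrent requests. Setting `REASONING_TOKENS` to 8,000 instead gives one-fourth, so the range in the quote would correspond to requests that are roughly four to five times longer.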