“The typical generative AI stack for voice applications consists of a three-stage pipeline: a speech-to-text model, a large language model (such as those from OpenAI, Google, or Anthropic) for processing, and a text-to-speech model for generating the response.”
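The three stages described in the quote can be sketched as a chain of plain functions. This is a minimal illustration, not any vendor's API: `VoicePipeline`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical names, and the stand-in stages only pass bytes and strings around so that the example runs without a real speech or language model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; real services would sit behind these.
SpeechToText = Callable[[bytes], str]    # audio in, transcript out
LanguageModel = Callable[[str], str]     # transcript in, reply text out
TextToSpeech = Callable[[str], bytes]    # reply text in, audio out


@dataclass
class VoicePipeline:
    """Chains the three stages in order: STT, then LLM, then TTS."""
    transcribe: SpeechToText
    generate: LanguageModel
    synthesize: TextToSpeech

    def respond(self, audio_in: bytes) -> bytes:
        text_in = self.transcribe(audio_in)   # stage 1: speech-to-text
        text_out = self.generate(text_in)     # stage 2: language model
        return self.synthesize(text_out)      # stage 3: text-to-speech


# Stand-in stages (assumptions for illustration only): they treat the
# "audio" as UTF-8 text so the sketch runs without external services.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
reply_audio = pipeline.respond(b"hello")
```

Because each stage in this sketch waits for the previous one to finish, the end-to-end delay is the sum of the three stage delays. Swapping in a real model would only require a function with the matching signature.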