How to Engineer AI Inference Systems [Philip Kiely] - 766, Sonic AI
TwiML AI Podcast
Apr 30, 2026
54:21
Interview
Philip Kiely
(Head of AI Education, Baseten, guest)
Executive Summary
Inference engineering is a critical and rapidly evolving discipline, combining GPU programming, distributed systems, and applied AI research, with demand for skilled engineers projected to grow 10-100x.
Companies with scaled AI products are maturing from per-token API models to dedicated deployments on specialized infrastructure to control costs and performance, a trend described as "owning your intelligence." NVIDIA's Hopper (H100) GPUs remain highly valuable and in demand for inference, even with the rollout of Blackwell, due to software optimization, export controls affecting research, and their suitability for smaller models.
The future of AI hardware may involve compute disaggregation (specialized chips for pre-fill vs. decode) and ASICs, but sophisticated software and open-source inference engines remain essential for orchestrating these complex systems.
Processed May 4, 2026
yt-dlp + mlx-whisper + Gemini