▶Pope consistently describes his startup, MatX (also referred to as Maddox), as developing a 'splittable systolic array' architecture. The design lets a large matrix multiplication unit be partitioned into smaller, independent units so that it can efficiently handle diverse workloads such as the attention mechanism in Transformers. (May 2026)
▶Across multiple discussions, Pope emphasizes the high and rising costs associated with custom chip design and fabrication. He repeatedly cites figures like $30 million for an initial tape-out and up to $100 million for small-volume production, highlighting the significant capital barrier in the semiconductor industry.
▶Pope's background as a former TPU architect at Google is a consistent part of his professional identity. He frequently draws on this experience to contrast the architectural philosophies of Google's TPUs (coarse-grained, scratchpad memory) with NVIDIA's GPUs (many fine-grained SMs, hardware-managed caches). (May 2026)
▶Pope presents a contrarian view on NVIDIA's CUDA software moat. He argues that it is less effective against the handful of 'Frontier Labs' (such as OpenAI and Anthropic) that have both the economic incentive and the technical capability to write custom software for more efficient hardware. This challenges the widely held belief that CUDA's advantage is insurmountable. (May 2026)
▶While acknowledging the superior latency of SRAM-based AI chips from competitors like Groq and Cerebras, Pope argues that they are not cost-competitive on a dollars-per-token basis. He positions his company's hybrid SRAM/HBM approach as the optimal solution. This raises an open question: will latency or throughput economics ultimately dominate the market?
▶Pope's analysis of GPU architecture evolution highlights a tension in design choices. He notes that, historically, halving numerical precision doubled performance. This pattern is changing with newer generations (e.g., the B300's 3x FP4-vs-FP8 speedup). He also notes that allocating die area among different precisions is a primary, non-fungible design trade-off. (May 2026)
▶He contrasts the deterministic latency of TPUs and Groq chips with the throughput-optimized, higher-latency design of traditional GPUs. This frames a key debate in AI hardware design: whether to optimize for predictable, fast single-token generation or for maximizing overall throughput via large batch processing. (May 2026)
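The 'splittable systolic array' idea can be illustrated with a toy utilization model. This is a sketch under stated assumptions, not MatX's actual design. It assumes a weight-stationary array, and the 256 x 256 array size, the 128 x 128 per-head matmul, and the 2 x 2 split are hypothetical numbers. The model also ignores pipeline fill and drain. It shows why a small matmul, such as one attention head's, leaves most of a large monolithic array idle, and why splitting the array into independent sub-arrays can recover that utilization.

```python
import math


def array_utilization(k: int, n: int, rows: int, cols: int) -> float:
    """Fraction of processing elements (PEs) doing useful work when a K x N
    weight matrix is tiled onto a rows x cols weight-stationary systolic array.
    Partially filled edge tiles leave PEs idle."""
    tiles = math.ceil(k / rows) * math.ceil(n / cols)
    return (k * n) / (tiles * rows * cols)


def split_array_utilization(k: int, n: int, rows: int, cols: int,
                            splits: int, jobs: int) -> float:
    """Hypothetical split mode: the array is cut into splits x splits equal
    sub-arrays, each running one independent K x N job (e.g. one attention head)."""
    sub_rows, sub_cols = rows // splits, cols // splits
    n_sub = splits * splits
    per_sub = array_utilization(k, n, sub_rows, sub_cols)
    # Jobs run in rounds; in the last round some sub-arrays may sit idle.
    rounds = math.ceil(jobs / n_sub)
    occupancy = jobs / (rounds * n_sub)
    return per_sub * occupancy


# Hypothetical sizes: a 256 x 256 array and 128 x 128 per-head matmuls.
print(array_utilization(128, 128, 256, 256))              # 0.25
print(split_array_utilization(128, 128, 256, 256, 2, 4))  # 1.0
```

In this toy model, the monolithic array uses a quarter of its PEs on each head. Four 128 x 128 sub-arrays running four heads at once are fully busy.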
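The capital-barrier argument about chip costs comes down to amortization: a fixed one-time cost is spread over however many chips are sold. The sketch below uses the $30 million tape-out figure cited above. The production volumes are illustrative assumptions, and the per-unit manufacturing cost defaults to zero purely to isolate the one-time cost.

```python
def cost_per_chip(fixed_cost: float, units: int, unit_cost: float = 0.0) -> float:
    """Amortized cost per chip: a one-time (NRE) cost spread over the
    production run, plus an optional per-unit manufacturing cost."""
    if units <= 0:
        raise ValueError("units must be positive")
    return fixed_cost / units + unit_cost


# $30M tape-out figure from the text; volumes are hypothetical.
for units in (1_000, 10_000, 100_000):
    print(units, cost_per_chip(30e6, units))
```

Each 10x increase in volume cuts the amortized one-time cost per chip by 10x. This is why low-volume custom silicon carries such a steep per-unit premium.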
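The scratchpad-versus-cache contrast in the TPU/GPU item can be seen on a toy access trace. This is not a model of either chip. The trace, the capacity, and the "pin the first addresses" policy are hypothetical choices. A hardware-managed LRU cache decides evictions on its own and can thrash on a cyclic scan slightly larger than its capacity. A software-managed scratchpad lets the program decide exactly what stays resident.

```python
from collections import OrderedDict


def lru_cache_hits(trace, capacity):
    """Hits for a hardware-style LRU cache that evicts automatically."""
    cache = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            cache[addr] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits


def scratchpad_hits(trace, capacity):
    """Hits for a software-managed scratchpad where the program explicitly
    pins a fixed working set (here, hypothetically, the first `capacity`
    distinct addresses) and never evicts it."""
    pinned = set(list(dict.fromkeys(trace))[:capacity])
    loaded = set()
    hits = 0
    for addr in trace:
        if addr in pinned:
            if addr in loaded:
                hits += 1
            else:
                loaded.add(addr)  # first touch: explicit copy-in
    return hits


# Cyclic scan over 5 addresses with room for only 4.
trace = list(range(5)) * 10
print(lru_cache_hits(trace, 4))   # 0: LRU evicts each line just before reuse
print(scratchpad_hits(trace, 4))  # 36
```

The trade-off runs the other way too. The scratchpad only wins here because the program knows the access pattern ahead of time. A cache needs no such knowledge.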
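The dollars-per-token argument about SRAM-based chips can be framed as simple arithmetic. If each chip holds little memory, storing a model's weights takes many chips, and serving cost scales with chip count. The memory sizes, model size, prices, and token rates below are hypothetical placeholders, not the specifications of any Groq, Cerebras, or MatX part.

```python
import math


def chips_to_hold_weights(model_bytes: float, mem_per_chip_bytes: float) -> int:
    """Minimum number of chips needed just to store the model weights."""
    return math.ceil(model_bytes / mem_per_chip_bytes)


def dollars_per_million_tokens(chips: int, dollars_per_chip_hour: float,
                               tokens_per_second: float) -> float:
    """Serving cost: total hardware cost per hour divided by tokens per hour,
    scaled to one million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return chips * dollars_per_chip_hour / tokens_per_hour * 1e6


# Hypothetical 70 GB of weights on small-memory vs. large-memory chips.
print(chips_to_hold_weights(70e9, 0.5e9))  # 140
print(chips_to_hold_weights(70e9, 80e9))   # 1
```

Whether many SRAM chips lose on cost depends on whether their extra speed raises `tokens_per_second` enough to offset the larger `chips` term. That is the point of contention described above.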
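One way to see why a faster low-precision mode does not translate directly into end-to-end gains is an Amdahl-style calculation. It uses the 3x FP4-vs-FP8 figure cited for the B300. The fraction of work that can actually run in FP4 is an assumption. The model also simplifies: it ignores memory bandwidth and assumes the remaining work runs at the FP8 rate.

```python
def mixed_precision_speedup(low_precision_fraction: float,
                            low_precision_speedup: float) -> float:
    """Amdahl-style end-to-end speedup when only part of the work moves to the
    faster, lower-precision units; the rest runs at the baseline rate."""
    p = low_precision_fraction
    if not 0.0 <= p <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return 1.0 / ((1.0 - p) + p / low_precision_speedup)


# Hypothetical split: half the FLOPs tolerate FP4, half must stay in FP8.
print(mixed_precision_speedup(0.5, 3.0))
```

With half the work in FP4, the overall gain is about 1.5x rather than 3x. Because die area spent on FP4 units does nothing for the FP8 half, the area allocation becomes a bet on the precision mix future workloads will use.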
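The latency-versus-throughput debate can be sketched with a roofline-style model of autoregressive decoding. Each step streams the full weights from memory once, shared across the batch, while compute grows with batch size. All hardware and model parameters below are hypothetical. The model also does not capture the determinism of TPU or Groq scheduling; it only shows the batch-size trade-off.

```python
def decode_step_seconds(batch: int, weight_bytes: float, mem_bw: float,
                        flops_per_token: float, peak_flops: float) -> float:
    """Time for one decode step: the larger of the weight-streaming time
    (independent of batch) and the compute time (linear in batch)."""
    memory_time = weight_bytes / mem_bw
    compute_time = batch * flops_per_token / peak_flops
    return max(memory_time, compute_time)


def latency_and_throughput(batch: int, **hw):
    """Return (seconds per token for each sequence, total tokens per second)."""
    step = decode_step_seconds(batch, **hw)
    return step, batch / step


# Hypothetical parameters, chosen only to show the shape of the curve.
hw = dict(weight_bytes=1e9, mem_bw=1e12, flops_per_token=2e9, peak_flops=1e15)
for b in (1, 64, 512, 1024):
    lat, tput = latency_and_throughput(b, **hw)
    print(f"batch={b:5d}  latency={lat * 1e3:.3f} ms  throughput={tput:,.0f} tok/s")
```

Small batches give the lowest per-token latency but waste compute. Large batches push throughput up until the step becomes compute-bound. Past that point, latency keeps rising with no further throughput gain.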