The most important activity in AI product development is building and iterating on evaluations ('evals').
The latest generation of AI models has crossed a critical threshold, becoming capable of evaluating and improving their own work, which enables a new paradigm of self-improvement.
Incumbent companies face an existential choice: fundamentally rebuild their products around AI or fail.
The choice of a specific AI model is transient and unreliable for long-term planning; therefore, tooling and infrastructure must be model-agnostic.
Founders cannot hire go-to-market leaders to solve fundamental product or strategy problems; these must be figured out by the core team first.
Early Career
Dropped out of the computer science program at Carnegie Mellon University to become the second employee at the database company MemSQL.
First Founding
Founded his first company, Empira, which focused on using AI for document extraction.
Post-Acquisition
After Empira was acquired by Figma, Goyal transitioned to lead the AI team at Figma.
Braintrust Founding
Founded Braintrust, implementing a go-to-market strategy focused on being highly selective with initial customers to ensure their success.
Braintrust Technical Evolution
After Braintrust's initial open-source logging infrastructure failed to scale, the company built its own purpose-built data system, Brainstore, to handle complex LLM workloads.
Braintrust Product Maturation
Braintrust launched a native integration with OpenAI's real-time API ahead of OpenAI's own tooling and introduced an internal AI agent named 'Loop' to simplify the product.
The Primacy of 'Evals' in AI Development (May 2026)
Goyal argues that the most critical activity in building AI products is creating and refining evaluations ('evals'). This philosophy is embedded in his company's product, which connects production logs to evaluation datasets linked directly to code and prompts, creating a continuous improvement loop.
This focus on evaluation as the core development loop suggests a paradigm shift from traditional software development, highlighting a key area of tooling and infrastructure investment for AI-native companies that prioritize quality and reliability.
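The loop described above, in which production logs become evaluation data that is then scored against the current code and prompt, can be illustrated with a minimal Python sketch. This is not Braintrust's API: every name here (`EvalCase`, `cases_from_logs`, `run_eval`, the `reviewed_output` log field) is a hypothetical stand-in, and exact-match scoring is only one simple choice of scorer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One evaluation example: an input and the answer we expect."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the output matches the expectation, ignoring case and padding."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def cases_from_logs(logs: list[dict]) -> list[EvalCase]:
    """Promote logged requests that carry a reviewed answer into eval cases.

    The 'reviewed_output' field is an assumed log schema, not a real one.
    """
    return [
        EvalCase(record["input"], record["reviewed_output"])
        for record in logs
        if record.get("reviewed_output")
    ]


def run_eval(
    task: Callable[[str], str],
    cases: list[EvalCase],
    scorer: Callable[[str, str], float] = exact_match,
) -> float:
    """Run the task on every case and return the mean score (0.0 if no cases)."""
    if not cases:
        return 0.0
    scores = [scorer(task(case.input), case.expected) for case in cases]
    return sum(scores) / len(scores)


# Hypothetical production logs; the last record has not been reviewed yet.
logs = [
    {"input": "2+2", "reviewed_output": "4"},
    {"input": "capital of France", "reviewed_output": "Paris"},
    {"input": "unreviewed question"},
]


def toy_task(question: str) -> str:
    """Stand-in for a prompt plus model call."""
    return {"2+2": "4"}.get(question, "unknown")


cases = cases_from_logs(logs)
score = run_eval(toy_task, cases)
```

Rerunning `run_eval` after each change to the task (a new prompt, a different model) and comparing the score against the previous run is what closes the improvement loop in this sketch.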
AI-Driven Organizational and Product Philosophy (May 2026)
Goyal's management and product strategy are heavily influenced by AI principles and observations from other tech leaders. He advocates for a flat organizational structure inspired by NVIDIA's CEO and believes Braintrust's internal AI agent, Loop, will simplify the product by reducing the need for new features.
This indicates a belief that AI will not just be a feature but will fundamentally reshape how companies are structured and how products evolve, potentially leading to leaner organizations and simpler, agent-driven user experiences.
The Unpredictable and Opaque Nature of LLMs (May 2026)
Goyal emphasizes the uncertainty in the AI space, stating that practitioners generally agree that no one fully understands how LLMs work. He also predicts that the model a developer chooses today is highly unlikely to be the one they use exclusively in the future, underscoring the field's volatility.
For investors and analysts, this highlights the risk of betting on specific models and the corresponding opportunity for companies like Braintrust that provide model-agnostic infrastructure and tooling that abstract away model-specific dependencies.
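One common way to keep application code independent of any particular model is to write it against a narrow interface and select the concrete model by name. The Python sketch below assumes a hypothetical `ChatModel` interface and two toy implementations; it does not reflect how Braintrust or any specific vendor SDK is structured.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical minimal interface every model adapter must satisfy."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Toy adapter standing in for one vendor's model."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UppercaseModel:
    """Toy adapter standing in for a different vendor's model."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


# Registry of available adapters; swapping models is a change of key only.
REGISTRY: dict[str, ChatModel] = {
    "echo": EchoModel(),
    "upper": UppercaseModel(),
}


def get_model(name: str) -> ChatModel:
    """Look up an adapter by name, failing loudly on unknown names."""
    try:
        return REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown model: {name!r}") from None


def summarize(model: ChatModel, text: str) -> str:
    """Application code depends only on the interface, not on a vendor."""
    return model.complete(f"Summarize: {text}")


first = summarize(get_model("echo"), "hi")
second = summarize(get_model("upper"), "hi")
```

Because `summarize` never names a concrete class, replacing the model used today with a different one later touches only the registry entry.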
Pragmatic Go-to-Market and Engineering Culture (May 2026)
Goyal details a highly focused go-to-market strategy that involved being selective about initial customers to ensure their success. This is paired with an engineering culture that drops everything to fix customer-reported issues, prioritizing real-world user problems over pre-planned sprints.
This reveals a founder-led, product-centric approach that prioritizes deep customer value and reliability over premature scaling, a potentially durable model for other deep-tech startups navigating nascent markets.