Greylock • Feb 24, 2026 • 27:51 • Interview

Braintrust's Ankur Goyal on Why Evals Are the Core of AI Development

Ankur Goyal (Founder and CEO, Braintrust, guest)

Executive Summary

Concerns Raised

Incumbent companies face an existential threat if they fail to rebuild their products around AI.
Standard data infrastructure is not equipped to handle the scale and complexity of modern AI workloads.
The unpredictable nature of LLMs makes it difficult to build quality products without a rigorous evaluation framework.

Opportunities Identified

The shift of all software development towards AI creates a massive market for specialized developer tools like Braintrust.
The increasing capability of AI models to evaluate and improve themselves will simplify product development and enable self-optimizing systems.
A model-agnostic platform is a significant competitive advantage in a rapidly evolving and fragmented AI model landscape.

Research Findings (12)

Ankur Goyal observes that many incumbent companies now face an existential choice to either rebuild their products around AI or risk failure.

Ankur Goyal asserts that the most recent generation of AI models is now capable of evaluating and improving its own work.

Ankur Goyal believes that evaluations ('evals') are the primary and most critical activity in AI product development.

Ankur Goyal asserts that there is a general consensus among practitioners that the internal workings of Large Language Models are not fully understood.

Ankur Goyal predicts that the AI model a developer chooses today is highly unlikely to be the sole model they use in the future.

Stripe, Instacart, and Airtable are customers of Braintrust.

Braintrust's go-to-market strategy involved being highly selective about initial customers and focusing intensely on making that small group successful.

Braintrust's engineering culture prioritizes immediately fixing customer-reported issues over adhering to pre-planned sprint commitments.

Braintrust's initial open-source-based logging infrastructure failed to scale under the exponential growth of its early AI-native customers.

Braintrust's custom data system, Brainstore, is purpose-built to handle large volumes of text and complex JSON data characteristic of LLM workloads.

Ankur Goyal differentiates Braintrust from Datadog by stating that customers use Braintrust to achieve product 'quality', whereas they use Datadog to achieve 'uptime'.

A key feature of the Braintrust platform is the ability to connect production logs to evaluation datasets that are linked to code and prompts in GitHub.
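
The findings above about evals, model choice, and linking production logs to evaluation datasets can be illustrated with a minimal sketch. This is not Braintrust's API: every name here (`LogRecord`, `EvalCase`, `logs_to_dataset`, `run_eval`, `model_a`, `model_b`) is hypothetical, and the two "models" are hard-coded stand-ins. It only shows the general pattern of turning reviewed logs into regression cases and scoring more than one model against them.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LogRecord:
    """Hypothetical production log entry (not a Braintrust type)."""
    input: str
    output: str
    expected: Optional[str] = None  # set only when a human reviewed the log


@dataclass
class EvalCase:
    """One evaluation example: a prompt and the answer we expect."""
    input: str
    expected: str


def logs_to_dataset(logs: list[LogRecord]) -> list[EvalCase]:
    """Keep only reviewed logs; each becomes a regression case."""
    return [EvalCase(r.input, r.expected) for r in logs if r.expected is not None]


def exact_match(output: str, expected: str) -> float:
    """Simplest possible scorer: 1.0 on a case-insensitive match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    task: Callable[[str], str],
    dataset: list[EvalCase],
    scorer: Callable[[str, str], float],
) -> float:
    """Run the task on every case and return the mean score."""
    if not dataset:
        return 0.0
    scores = [scorer(task(case.input), case.expected) for case in dataset]
    return sum(scores) / len(scores)


# Stand-ins for two different models; a real task would call an LLM.
def model_a(prompt: str) -> str:
    return {"capital of France?": "Paris", "2 + 2?": "4"}.get(prompt, "unknown")


def model_b(prompt: str) -> str:
    return {"capital of France?": "paris"}.get(prompt, "unknown")


logs = [
    LogRecord("capital of France?", "Lyon", expected="Paris"),  # reviewed, was wrong
    LogRecord("2 + 2?", "4", expected="4"),                     # reviewed, was right
    LogRecord("weather today?", "sunny"),                       # never reviewed, skipped
]

dataset = logs_to_dataset(logs)
results = {
    name: run_eval(fn, dataset, exact_match)
    for name, fn in {"model_a": model_a, "model_b": model_b}.items()
}
print(results)
```

Because the dataset and scorer are independent of the task function, swapping in a different model means changing only the callable passed to `run_eval`, which is the sense in which such a harness is model-agnostic.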

Topics

AI Development, LLM Evaluation, Developer Tools, B2B SaaS, Go-to-Market Strategy, Customer Obsession, Product-Market Fit, AI Infrastructure, Data Systems, AI Agents, Startup Culture, Engineering Management, Founder Philosophy, Observability, AI Quality

Processed May 4, 2026 • Daily intelligence brief → yt-dlp + mlx-whisper + Gemini

Key Themes

The Primacy of Evaluations in AI Development

Radical Customer Obsession as a Go-to-Market Strategy

Purpose-Built Infrastructure for AI's Unique Data

The Founder's Role in Defining GTM
