Shreya Shankar — Sonic AI
Shreya Shankar
Person · Tech
11 Mentions · Episodes · 11 Claims
Claims
All (11) · Finance (0) · Healthcare (0) · Government (0) · Tech (8) · Energy (0) · Science (0) · Geopolitics (0)
Research shows that developers' criteria for 'good' and 'bad' LLM outputs evolve as they review more examples, a phenomenon known as 'criteria drift', making it impossible to define a complete evaluat...
Expert perspective · Shreya Shankar · Apr 3
Products like Anthropic's Claude Code are built upon foundational models that have been extensively evaluated on coding benchmarks, even if the application team itself claims to rely more on 'vibes'.
Expert perspective · Shreya Shankar · Apr 3
For most AI products, a small number of 'LLM as a judge' evals, typically between four and seven, is sufficient to cover the most critical failure modes.
Expert perspective · Shreya Shankar · Apr 3
LLM judges can be used both in offline unit tests or CI/CD pipelines and for online monitoring of real production traces to measure failure rates over time.
Expert perspective · Shreya Shankar · Apr 3
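As a minimal sketch of the online-monitoring half of this claim: the same binary judge used in offline tests can be run over production traces to track a failure rate over time. The `judge` callable and the trace data here are hypothetical stand-ins, not part of any specific product's pipeline.

```python
# Hypothetical sketch: running a binary LLM judge over logged production
# traces to measure failure rate per day. `judge` is a stand-in callable
# that returns True when a trace passes.

from collections import defaultdict
from datetime import date

def failure_rate_by_day(traces, judge):
    """traces: iterable of (day, output) pairs from production logs."""
    totals, failures = defaultdict(int), defaultdict(int)
    for day, output in traces:
        totals[day] += 1
        if not judge(output):
            failures[day] += 1
    return {day: failures[day] / totals[day] for day in totals}

# Made-up example traces; a trivial judge that flags the string "bad".
traces = [
    (date(2025, 4, 1), "good"), (date(2025, 4, 1), "bad"),
    (date(2025, 4, 2), "good"), (date(2025, 4, 2), "good"),
]
rates = failure_rate_by_day(traces, judge=lambda out: out != "bad")
print(rates)  # one failure rate per day
```

The same `judge` function can sit behind an assertion in a CI test suite, which is what makes the offline/online reuse cheap.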
The initial process of setting up a robust evaluation system for an AI product typically takes three to four days, followed by an ongoing maintenance cost of about 30 minutes per week.
Expert perspective · Shreya Shankar · Apr 3
LLMs often fail at automated error analysis because they lack the necessary product context to identify certain failures, such as hallucinating a feature that does not exist.
Expert perspective · Shreya Shankar · Apr 3
The acquisition of A/B testing company Statsig by OpenAI was a strategic move, potentially influenced by the fact that OpenAI's competitors were also using Statsig's platform.
Speculative · Shreya Shankar · Apr 3
An 'LLM as a judge' is most effective when scoped to evaluate a single, narrow failure mode with a binary pass/fail output.
Expert perspective · Shreya Shankar · Apr 3
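A minimal sketch of what such a narrowly scoped judge could look like. The prompt checks exactly one failure mode (hallucinated features) and forces a binary verdict; `call_llm`, the prompt wording, and the feature list are all hypothetical placeholders for whatever client and product the team actually uses.

```python
# Hypothetical sketch: an LLM judge scoped to one failure mode with a
# binary PASS/FAIL output. `call_llm` is a stand-in for a real LLM client.

JUDGE_PROMPT = """\
You are checking one thing only: does the response mention a product
feature that does not exist? The product has exactly these features:
{features}

Response to check:
{response}

Answer with the single word PASS or FAIL."""

def judge_hallucinated_feature(response: str, features: list[str], call_llm) -> bool:
    """Return True if the response passes this single check."""
    prompt = JUDGE_PROMPT.format(features=", ".join(features), response=response)
    verdict = call_llm(prompt).strip().upper()
    return verdict == "PASS"

# Usage with a fake model that always answers PASS:
ok = judge_hallucinated_feature(
    "You can export your notes as PDF.",
    ["export to PDF", "dark mode"],
    call_llm=lambda prompt: "PASS",
)
```

Keeping the output binary is what makes the judge easy to validate later: each verdict maps directly onto a pass/fail human label.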
To validate an 'LLM as a judge', teams should compare its outputs against human-labeled data using a confusion matrix to analyze false positives and false negatives, rather than relying on a simple ac...
Expert perspective · Shreya Shankar · Apr 3
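The validation step described here can be sketched in a few lines. The label lists below are made-up illustration data: `True` means a human (or the judge) marked the trace as a failure. The point of the matrix is visible in the example, where a decent-looking accuracy hides a missed real failure.

```python
# Hypothetical sketch: validating an LLM judge against human labels with
# a confusion matrix rather than a single accuracy number.

from collections import Counter

def confusion_matrix(human, judge):
    """Count (human, judge) label pairs; True = the trace failed."""
    counts = Counter(zip(human, judge))
    return {
        "true_positive": counts[(True, True)],    # judge caught a real failure
        "false_positive": counts[(False, True)],  # judge flagged a good trace
        "false_negative": counts[(True, False)],  # judge missed a failure
        "true_negative": counts[(False, False)],
    }

# Illustration data: 8 traces, 2 real failures.
human = [True, False, False, False, True, False, False, False]
judge = [True, False, True, False, False, False, False, False]

cm = confusion_matrix(human, judge)
accuracy = (cm["true_positive"] + cm["true_negative"]) / len(human)
# Accuracy looks fine (0.75), yet the matrix shows the judge missed
# one of the two real failures -- exactly what accuracy alone hides.
print(cm, accuracy)
```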
The concept of error analysis, including 'open coding' and 'axial coding', is a long-standing technique from machine learning and social science, not a new invention for LLMs.
Expert perspective · Shreya Shankar · Apr 3
OpenAI's evaluation methods include analyzing public sentiment from sources like Twitter and Reddit to identify product issues.
Expert perspective · Shreya Shankar · Apr 3