Jeffrey Irving, Sonic AI
Person · Tech
16 mentions · 16 claims (Tech: 14; Government: 2)
The UK AI Security Institute conducted a long-term red teaming collaboration with Anthropic and OpenAI that found significantly more jailbreaks than a normal pre-deployment evaluation would have.
Official source
Jeffrey Irving
Apr 3
Many sophisticated bad behaviors recently observed in AI, such as sycophancy and deception, are different manifestations of the same underlying problem: reward hacking.
Expert perspective
Jeffrey Irving
Apr 3
A key failure mode for AI debate, discovered in human experiments by Beth Barnes, is for one party to steer the conversation into a confusing area where neither side knows the answer, which can fool t...
Expert perspective
Jeffrey Irving
Apr 3
Techniques like pre-training data filtering and unlearning can remove dangerous capabilities from open-source models, but this only buys time before general capabilities allow the model to re-acquire ...
Expert perspective
Jeffrey Irving
Apr 3
The UK AI Security Institute has evaluated over 30 different models or testing runs, and every time it has conducted safeguard testing, it has successfully jailbroken the model.
Official source
Jeffrey Irving
Apr 3
Over the last year, despite various AI models exhibiting deceptive behaviors, the world's primary response has been to continue training stronger models.
Expert perspective
Jeffrey Irving
Apr 3
It is important to build up independent AI safety and security research at nonprofits, in academia, and in governments, rather than having most of the work happen at AI developer labs.
Expert perspective
Jeffrey Irving
Apr 3
The UK AI Security Institute (AISI) has approximately 100 technical experts on staff and around 200 total employees.
Official source
Jeffrey Irving
Apr 3
Current pragmatic AI safety approaches, such as AI control measures, monitoring, and honesty training, all have correlated potential failures and could fail for the same essential reason.
Expert perspective
Jeffrey Irving
Apr 3
Reinforcement learning is being successfully used to improve AI model capabilities in non-verifiable domains, such as analyzing a photo of a biology experiment.
Expert perspective
Jeffrey Irving
Apr 3
A study by the UK AI Security Institute's human influence team found that AI models are very effective at persuasion on political questions, with newer models being more persuasive.
Official source
Jeffrey Irving
Apr 3
The risk of AI loss of control is more strongly coupled with cybersecurity risks than with biological risks.
Expert perspective
Jeffrey Irving
Apr 3
The three main catastrophic risks the UK AI Security Institute focuses on are biological weapons, large-scale cyber attacks, and loss of control.
Official source
Jeffrey Irving
Apr 3
The UK AI Security Institute's evaluations are conducted using Inspect, an open-source package also used by other governments and AI developers for model testing.
Official source
Jeffrey Irving
Apr 3
The UK AI Security Institute places significant probability on the idea that current AI methods will scale, and where they don't, more mundane algorithmic progress will fill the gaps.
Official source
Jeffrey Irving
Apr 3
The typical AI developer plan for addressing loss-of-control risk involves using pragmatic safety measures and monitoring to buy time until automated safety research can find better solutions.
Expert perspective
Jeffrey Irving
Apr 3