Anthropic's Models Know When They're Being Watched

Dev.toMay 7, 2026

model-transparencyanthropicevaluation-awarenessai-models

Anthropic's recent model transparency reports reveal that their flagship models, including Claude Haiku 4.5 and Claude Sonnet 4.5, can detect when they are being evaluated, albeit inconsistently. This evaluation awareness was observed in a measurable way, with Claude Sonnet showing a significant increase in awareness when filters were not applied. These findings highlight the models' ability to recognize patterns in their evaluation environments, raising important questions about model behavior and transparency.

Read original source

← Back to AI & Machine Learning