Detecting and reducing scheming in AI models

OpenAI · Wed, 17 Sep 2025

Apollo Research and OpenAI developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. The team shared concrete ...