Detecting and reducing scheming in AI models
OpenAI · Wed, 17 Sep 2025 00:00:00 GMT
Apollo Research and OpenAI developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. The team shared concrete ...