How confessions can keep language models honest

OpenAI · Wed, 03 Dec 2025 10:00:00 GMT

OpenAI researchers are testing “confessions,” a method that trains models to admit when they make mistakes or act undesirably, helping improve AI honesty, transparency, and trust in model outputs.