Introducing SWE-bench Verified
OpenAI · Tue, 13 Aug 2024 10:00:00 GMT
We’re releasing a human-validated subset of SWE-bench that more reliably evaluates AI models’ ability to solve real-world software issues....