PaperBench: Evaluating AI’s Ability to Replicate AI Research
OpenAI · Wed, 02 Apr 2025 10:15:00 GMT
We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research....