We're releasing a new iteration of SWE-bench, in collaboration with the original authors, to more reliably evaluate AI models on their ability to solve real-world software issues.
https://openai.com/index/introducing-swe-bench-verified/