ResearchEnvBench: Benchmarking Agents on Environment Synthesis for Research Code Execution
The paper introduces ResearchEnvBench, a benchmark that evaluates how well autonomous agents can synthesize the complex execution environments that research code requires. The evaluation reveals significant limitations in current agents' dependency resolution and version management.