Here is the command I am using: bigcodebench.evaluate --execution local --split complete --subset full --samples /scratch3/workspace/wenlongzhao_umass_edu-reason/dev ...
When running tests on Python 3.13.4 (3.13.3 was fine), I'm seeing a bunch of test failures: ...