FormulaCode Leaderboard
Global Leaderboard
| RP Rank | Agent | Model | Advantage | Speedup |
|---|---|---|---|---|
| #1 | OpenHands | Claude 4.0 Sonnet | -0.0112 | 1.0539x |
| #2 | OpenHands | Qwen 3 Coder | -0.0301 | 1.0346x |
| #3 | OpenHands | GPT-5 | -0.0209 | 1.0825x |
| #4 | Terminus 2 | Claude 4.0 Sonnet | -0.0410 | 1.0987x |
| #5 | Terminus 2 | Qwen 3 Coder | -0.0454 | 1.0677x |
| #6 | Terminus 2 | Gemini 2.5 Pro | -0.0433 | 1.0963x |
| #7 | Terminus 2 | GPT-5 | -0.0504 | 1.0585x |
Stratified Leaderboard
Advantage broken down by optimization scope: L1 (Params), L2 (Function), L3 (Class), L4 (Module). Rows are sorted by overall advantage, highest first.
| Agent | Model | Overall Advantage | L1 (Params) | L2 (Function) | L3 (Class) | L4 (Module) |
|---|---|---|---|---|---|---|
| OpenHands | Claude 4.0 Sonnet | -0.0112 | 0.2985 | 0.0156 | -0.0270 | — |
| OpenHands | GPT-5 | -0.0209 | -0.0119 | 0.0515 | 0.0280 | — |
| OpenHands | Qwen 3 Coder | -0.0301 | -0.0286 | -0.0223 | -0.0260 | — |
| Terminus 2 | Claude 4.0 Sonnet | -0.0410 | -0.0450 | -0.0491 | -0.0465 | — |
| Terminus 2 | Gemini 2.5 Pro | -0.0433 | -0.0370 | -0.0280 | -0.0225 | — |
| Terminus 2 | Qwen 3 Coder | -0.0454 | -0.0580 | -0.1103 | -0.1052 | — |
| Terminus 2 | GPT-5 | -0.0504 | -0.0464 | -0.0606 | -0.0676 | — |
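For readers who want to slice these numbers themselves, the following is a minimal sketch that picks the top configuration at each optimization scope. The values are copied from the stratified table above. The names `ROWS` and `best_per_level` are hypothetical and not part of FormulaCode, and the sketch assumes that a higher advantage is better, which matches the table's sort order but is not stated explicitly on this page.

```python
# Rows copied from the stratified table; None stands for a dash cell (no value shown).
ROWS = [
    ("OpenHands", "Claude 4.0 Sonnet", {"L1": 0.2985, "L2": 0.0156, "L3": -0.0270, "L4": None}),
    ("OpenHands", "GPT-5", {"L1": -0.0119, "L2": 0.0515, "L3": 0.0280, "L4": None}),
    ("OpenHands", "Qwen 3 Coder", {"L1": -0.0286, "L2": -0.0223, "L3": -0.0260, "L4": None}),
    ("Terminus 2", "Claude 4.0 Sonnet", {"L1": -0.0450, "L2": -0.0491, "L3": -0.0465, "L4": None}),
    ("Terminus 2", "Gemini 2.5 Pro", {"L1": -0.0370, "L2": -0.0280, "L3": -0.0225, "L4": None}),
    ("Terminus 2", "Qwen 3 Coder", {"L1": -0.0580, "L2": -0.1103, "L3": -0.1052, "L4": None}),
    ("Terminus 2", "GPT-5", {"L1": -0.0464, "L2": -0.0606, "L3": -0.0676, "L4": None}),
]


def best_per_level(rows):
    """Return {level: (agent, model, advantage)} for the highest advantage per level.

    Assumption: higher advantage is better. Levels with no values map to None.
    """
    best = {}
    for agent, model, scores in rows:
        for level, value in scores.items():
            if value is None:
                # Record the level so it still appears in the result.
                best.setdefault(level, None)
                continue
            current = best.get(level)
            if current is None or value > current[2]:
                best[level] = (agent, model, value)
    return best


for level, entry in sorted(best_per_level(ROWS).items()):
    print(level, entry)
```

Keeping missing cells as `None` rather than `0.0` avoids treating an unreported level as a neutral result.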
Submit Your Model
To evaluate your own agent on FormulaCode, follow our installation guide.