
Add ABIDES arena #104

Open

Muhtasham wants to merge 7 commits into CodeClash-ai:main from Muhtasham:feat/abides-arena

Muhtasham (Contributor) commented May 5, 2026

Summary

  • add an ABIDES financial-market simulation arena backed by abides-sim/abides
  • pin upstream ABIDES to c4bf157678928934417aba6073eb0651aeaf6d15, constrain Python dependencies, and pin pip in the arena image
  • expose a restricted CodeClash policy protocol: the submitted abides_agent.py defines decide(observation), which returns declarative buy/sell limit-order intents
  • add the trusted runtime adapter, starter policy, example config, arena docs, and unit coverage
  • harden order-intent normalization so boolean and non-integral numeric fields are rejected instead of silently coerced
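For illustration, a submission under this protocol might look like the following. This is a hypothetical sketch, not the starter policy shipped in the PR: the observation keys (best_bid, best_ask, shares), the intent fields (side, price, quantity), and the use of integer-cent prices are all assumptions.

```python
# abides_agent.py (hypothetical sketch; observation and intent keys are assumed)

ORDER_SIZE = 10      # hypothetical shares per intent
MAX_INVENTORY = 100  # hypothetical absolute position limit the policy imposes on itself
MIN_SPREAD = 3       # need >= 3 cents so the improved bid stays below the improved ask


def decide(observation: dict) -> list:
    """Return declarative limit-order intents; the trusted runtime builds the real orders."""
    bid = observation.get("best_bid")
    ask = observation.get("best_ask")
    shares = observation.get("shares", 0)

    # With an empty side of the book or a tight spread there is nothing to improve on.
    if bid is None or ask is None or ask - bid < MIN_SPREAD:
        return []

    intents = []
    if shares < MAX_INVENTORY:
        # Improve the best bid by one cent.
        intents.append({"side": "buy", "price": bid + 1, "quantity": ORDER_SIZE})
    if shares > -MAX_INVENTORY:
        # Improve the best ask by one cent.
        intents.append({"side": "sell", "price": ask - 1, "quantity": ORDER_SIZE})
    return intents
```

Because the policy only returns plain dicts, it never touches ABIDES objects; the runtime remains free to reject or clamp whatever comes back.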

Competition Format

ABIDES is an independent score-maximization arena, not direct model-vs-model trading in the same market instance.

Each CodeClash player is evaluated in an isolated ABIDES market process using matched seeds and configuration; each market contains an exchange, a market maker, and background zero-intelligence (ZI) traders. CodeClash compares players by their average trusted mark-to-market profit across those matched market runs.
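The scoring rule can be sketched as a small ledger. This is an illustrative assumption, not the PR's runtime code: fills are plain dicts standing in for the trusted ORDER_EXECUTED messages, prices are integer cents, and per-run profit is taken as the change in cash plus the net position valued at the final exchange price. Ledger, run_score, and player_score are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Ledger:
    cash: int = 0    # change in cash since the start of the run (hypothetical unit: cents)
    shares: int = 0  # net position after all executed fills

    def apply_fill(self, side: str, price: int, quantity: int) -> None:
        # Only fills the exchange actually executed should ever reach this method.
        if side == "buy":
            self.cash -= price * quantity
            self.shares += quantity
        elif side == "sell":
            self.cash += price * quantity
            self.shares -= quantity
        else:
            raise ValueError(f"unknown side: {side!r}")

    def mark_to_market(self, final_price: int) -> int:
        # Value any open position at the final exchange price.
        return self.cash + self.shares * final_price


def run_score(fills: list, final_price: int) -> int:
    """Mark-to-market profit for one seeded market run."""
    ledger = Ledger()
    for fill in fills:
        ledger.apply_fill(fill["side"], fill["price"], fill["quantity"])
    return ledger.mark_to_market(final_price)


def player_score(runs: list) -> float:
    """Average profit over matched runs, each given as (fills, final_price)."""
    return mean(run_score(fills, final_price) for fills, final_price in runs)
```

For example, buying 10 at 100 and selling 4 at 105 leaves cash at -580 and 6 shares; at a final price of 102 that run scores -580 + 612 = 32.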

This follows the now-merged restricted-protocol pattern from CybORG (#110) and SCML (#111): submitted policies do not own mutable simulator objects, and the arena runtime owns validation, scoring, timeouts, and simulator integration. The competition is over who writes the better policy for the same seeded simulator conditions.

Runtime Behavior

  • evaluates each CodeClash player in identical seeded ABIDES market worlds with an exchange, market maker, and background zero-intelligence traders
  • compares players by average mark-to-market profit across configured market seeds
  • keeps ABIDES kernel, exchange, ledgers, order books, and order construction inside trusted runtime code
  • calls submitted decide(observation) code out-of-process with a per-decision timeout and passes only plain observation dictionaries
  • validates and clamps returned order intents before submitting them as trusted ABIDES LimitOrders
  • computes scores from trusted exchange ORDER_EXECUTED messages plus final exchange price, not from mutable submitted-code state
  • runs each player simulation in a child process with a per-player timeout so import/runtime hangs receive CRASH_SCORE without stalling the whole round
  • prepends the submitted file's parent directory to the module search path during policy import so normal multi-file submissions like from helper import X work
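The out-of-process decision call and the import-path behavior can be sketched with the standard library alone. This is a hypothetical sketch, not the PR's adapter: it spawns a fresh interpreter per decision (simple but slow; a real runtime would more likely keep one long-lived child per player), and call_decide, the JSON-over-stdin framing, and the error strings are all assumptions.

```python
import json
import subprocess
import sys
from pathlib import Path

# Hypothetical driver run in a fresh interpreter; sys.argv[1] is the submission path.
_DRIVER = r"""
import importlib.util, json, pathlib, sys
path = pathlib.Path(sys.argv[1]).resolve()
# Prepend the submission's directory so multi-file imports like `from helper import X` work.
sys.path.insert(0, str(path.parent))
# Keep stdout clean for the JSON reply: policy print() output goes to stderr instead.
reply_stream, sys.stdout = sys.stdout, sys.stderr
spec = importlib.util.spec_from_file_location("submitted_policy", path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
observation = json.load(sys.stdin)
json.dump(module.decide(observation), reply_stream)
"""


def call_decide(policy_path: Path, observation: dict, timeout_s: float):
    """Run decide(observation) in a child interpreter; return (intents, error or None)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _DRIVER, str(policy_path)],
            input=json.dumps(observation),  # only a plain dict crosses the boundary
            capture_output=True,
            text=True,
            timeout=timeout_s,  # on expiry the child is killed, then TimeoutExpired is raised
        )
    except subprocess.TimeoutExpired:
        return [], "decision timeout"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        return [], "policy error: " + (lines[-1] if lines else f"exit code {proc.returncode}")
    try:
        return json.loads(proc.stdout), None
    except json.JSONDecodeError:
        return [], "policy error: reply was not JSON"
```

An infinite loop in decide() then costs at most timeout_s and surfaces as an empty decision plus a logged error, instead of stalling the simulation.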

Hardening

  • replaced the native TradingAgent subclass submission contract with a restricted protocol boundary
  • added validation_timeout, decision_timeout, and player_timeout config knobs
  • changed missing abides_results.json handling from neutral 0.0 ties to CRASH_SCORE with error details
  • uses isolated per-player worlds to avoid direct opponent-object access while keeping matched seeds/configs for fair score comparison
  • retains trusted execution-ledger checks so only simulator-processed player orders affect scored cash/shares
  • rejects boolean and non-integral numeric order fields, while still accepting whole-number floats and applying configured clamps
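A minimal sketch of the validation rules in the last bullet, assuming hypothetical field names (side, price, quantity), clamp bounds, and an IntentError type; the PR's actual normalization code and configured limits may differ.

```python
MIN_PRICE = 1          # hypothetical price clamp, in cents
MAX_PRICE = 1_000_000
MAX_QUANTITY = 100     # hypothetical per-intent size clamp


class IntentError(ValueError):
    """Raised for an intent that must be rejected (and logged as a policy error)."""


def _whole_number(value, field: str) -> int:
    # bool is a subclass of int in Python, so reject it before the numeric check.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise IntentError(f"{field}: expected a number, got {type(value).__name__}")
    # Whole-number floats such as 100.0 are accepted; 100.5, inf and nan are not.
    if isinstance(value, float) and not value.is_integer():
        raise IntentError(f"{field}: expected a whole number, got {value!r}")
    return int(value)


def normalize_intent(raw) -> dict:
    """Validate one declarative intent and clamp it into the configured bounds."""
    if not isinstance(raw, dict):
        raise IntentError(f"intent must be a dict, got {type(raw).__name__}")
    side = raw.get("side")
    if side not in ("buy", "sell"):
        raise IntentError(f"side: expected 'buy' or 'sell', got {side!r}")
    price = _whole_number(raw.get("price"), "price")
    quantity = _whole_number(raw.get("quantity"), "quantity")
    if quantity <= 0:
        raise IntentError(f"quantity: must be positive, got {quantity}")
    return {
        "side": side,
        "price": min(max(price, MIN_PRICE), MAX_PRICE),
        "quantity": min(quantity, MAX_QUANTITY),
    }
```

Rejecting instead of coercing means True never silently becomes a quantity of 1 and 100.5 never silently becomes 100.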

Verification

  • rebased on the latest main after #105 (Add Bomberland arena), #110 (Restrict CybORG player protocol), and #111 (Restrict SCML player protocol) were merged
  • GitHub CI after latest hardening commit:
    • markdown link check passed
    • pre-commit passed
    • pytest passed
  • local after latest hardening commit:
    • UV_CACHE_DIR=/private/tmp/codeclash-uv-cache uv run ruff check codeclash/arenas/abides/abides.py codeclash/arenas/abides/runtime/run_abides.py tests/arenas/test_abides.py -> passed
    • UV_CACHE_DIR=/private/tmp/codeclash-uv-cache uv run pytest -q tests/arenas/test_abides.py -> 13 passed
    • UV_CACHE_DIR=/private/tmp/codeclash-uv-cache uv run pre-commit run --files codeclash/arenas/abides/runtime/run_abides.py tests/arenas/test_abides.py -> passed
  • earlier full-runtime verification in this PR:
    • docker build -t codeclash/abides -f codeclash/arenas/abides/ABIDES.Dockerfile . -> passed
    • a direct Docker runtime smoke test with two starter policies produced per-simulation status: "ok", orders_submitted: 18, policy_errors: 0, and average scores of -900.0 for both players
    • a full CodeClash dummy smoke test completed with both players validated, both rounds tied, and the per-simulation trusted ledger fields cash, shares, orders_submitted, policy_errors, and status
    • adversarial direct Docker checks confirmed huge intents are clamped/bounded, malformed decisions are logged as policy errors, and infinite-loop decide() calls hit the decision timeout without stalling the simulation

Note: locally, UV_CACHE_DIR=/private/tmp/codeclash-uv-cache uv run pytest -q tests/arenas reported 219 passed and 2 Figgie failures, because Docker was not running locally (/Users/muhtasham/.docker/run/docker.sock missing). The GitHub pytest check on the same latest branch is green.

Muhtasham force-pushed the feat/abides-arena branch from 54128a6 to bbc23f7 (May 5, 2026 11:42)
Muhtasham force-pushed the feat/abides-arena branch 2 times, most recently from f9ccb8f to 6adff22 (June 24, 2026 22:49)
Muhtasham requested a review from john-b-yang (June 25, 2026 14:38)
Muhtasham force-pushed the feat/abides-arena branch from 3acba5c to 526a714 (July 1, 2026 15:32)

john-b-yang (Contributor) commented

@Muhtasham thanks so much for pushing on this; it's really exciting! I ended up taking a more thorough look, though, and for a couple of reasons, let's shelve this one for now (leave the PR open rather than merge).

To list the reasons:

  • not head-to-head: I think this has similar issues to CybORG, but at least CybORG has a common adversary. Here, there are background traders and market makers, so model solutions don't interact with each other. My main question is: do the models' solutions face a common opponent like they would in CybORG? Or is there any wiggle room to run multiple solutions against each other?
  • the scoring seems a bit hard to understand, as it stitches together some of the ABIDES internals.

I do really want to incorporate a market-making arena here; I think the head-to-head format is a key missing component that would be really desirable. For instance, the Jane Street ETC competition unfortunately doesn't seem to have an open-source edition, but that would be closer to what I'd imagine is appropriate for CodeClash.

But I'll leave this open. Perhaps we can keep chatting about it, but for now, no pressure to make this one work.

Muhtasham (Contributor, Author) commented

Thanks John, that makes sense. I agree this is the right place to pause rather than force it in.

To clarify the current design: #104 is independent score-maximization over matched seeded ABIDES worlds. Each player gets the same config/seed schedule with a market maker and ZI background traders, but those worlds are instantiated separately per player. So the submitted policies do not directly interact in the same market, and I agree that this is not head-to-head in the Bomberland/CoreWar sense.

I also agree the scoring is harder to explain than ideal. It uses runtime-owned ledgers updated from trusted ABIDES exchange execution messages, but that does require stitching into ABIDES internals.

I think the next useful version would be a redesign around shared-market competition: multiple CodeClash submissions trading in the same ABIDES market, with runtime-owned ledgers/scoring per participant and careful controls for ordering/position artifacts. That is more work and should probably be designed explicitly rather than patched onto this PR.
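One possible control for the ordering artifacts mentioned above, offered as a hypothetical sketch rather than anything in this PR: when several submissions trade in one shared market, rotate the order in which their intents are submitted at each step, so that over every len(participants) consecutive steps each participant goes first exactly once. The function name and the per-step granularity are assumptions.

```python
def submission_order(participants: list, step: int) -> list:
    """Rotate which participant's intents reach the exchange first at this step."""
    if not participants:
        return []
    k = step % len(participants)
    # Round-robin rotation: deterministic, reproducible, and balanced over a full cycle.
    return participants[k:] + participants[:k]
```

Deterministic rotation keeps runs reproducible under a fixed seed; a seeded shuffle per step would also work but balances first-mover advantage only in expectation.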

I’ll leave this open as a reference implementation for the Docker/runtime adapter and restricted policy boundary, but won’t push for merge as-is.
