Python Parity Fixtures

Milestone 0 focuses on capturing golden event streams from the Python Codex SDK so that every Elixir parity test can replay deterministic transcripts. This document explains how to harvest, review, and maintain those fixtures.

Goals

Produce JSONL logs that represent the canonical behavior of key workflows (thread lifecycle, tools, structured output, sandbox approvals, error paths).
Store fixtures under integration/fixtures/python with stable filenames and metadata.
Regenerate fixtures as the Python SDK evolves, keeping a clear audit trail.

Current Fixtures

thread_basic.jsonl – Baseline single-turn conversation fixture used by parity tests.
thread_auto_run_step1.jsonl / thread_auto_run_step2.jsonl – Continuation-aware auto-run scenario exercised by the Elixir auto-run loop tests.
thread_auto_run_pending.jsonl – Pending continuation fixture validating max-attempt guardrails.
thread_tool_auto_step1.jsonl / thread_tool_auto_step2.jsonl – Tool invocation scenario with continuation/resumption.
thread_tool_auto_pending.jsonl – Tool-approval denial transcript used to assert policy handling.

Harvesting Workflow

Clone Python SDK
Check out the openai/codex repository next to this project (or set CODEX_PYTHON_SDK_PATH).

Install Dependencies

python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Build or Download codex-rs Binary
Ensure the Python SDK runs against the same codex-rs version we pin in this repo. Point to it via --codex-path when running the harvester if needed.

Run Harvester

python3 scripts/harvest_python_fixtures.py \
  --python-sdk ../openai/codex \
  --output integration/fixtures/python

Use --scenario to target a subset (e.g., --scenario thread_basic).

Review Output
Inspect the generated .jsonl files and any associated schemas. Confirm naming, metadata comments (if any), and the absence of secrets.

Commit Fixtures
Add new or updated files under integration/fixtures/. Describe the fixture changes in the PR and update the parity checklist.
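
Part of the review step can be automated. The following is a minimal sketch, not part of the harvester itself: it checks that every line of a fixture parses as JSON. The helper names check_fixture and check_fixture_dir are hypothetical, and the rule that each event must be a JSON object is an assumption about the event format.

```python
import json
from pathlib import Path


def check_fixture(path):
    """Return a list of problems found in one JSONL fixture (empty means none)."""
    problems = []
    text = Path(path).read_text(encoding="utf-8")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append(f"{path}:{lineno}: blank line")
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"{path}:{lineno}: invalid JSON ({exc.msg})")
            continue
        # Assumption: every event is serialized as a JSON object.
        if not isinstance(event, dict):
            problems.append(f"{path}:{lineno}: expected a JSON object")
    return problems


def check_fixture_dir(directory):
    """Check every .jsonl file directly under `directory` and collect all problems."""
    problems = []
    for path in sorted(Path(directory).glob("*.jsonl")):
        problems.extend(check_fixture(path))
    return problems
```

Calling check_fixture_dir("integration/fixtures/python") before committing would list any malformed lines. An empty list only means the files are well-formed JSONL; it does not replace a manual review.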

Scenario Modules

The harvester expects the Python repo to provide modules under harvest_scenarios.* with a record(output_path, **kwargs) function. Each function should:

Execute the relevant workflow using the Python SDK.
Stream codex events into output_path as JSONL.
Optionally write structured output schemas under integration/fixtures/schemas.

Example skeleton (in Python repo):

from codex.client import CodexClient

def record(output_path, codex_path=None, **kwargs):
    # Accept **kwargs so extra harvester options do not break this scenario.
    client = CodexClient(codex_binary=codex_path)
    thread = client.start_thread()
    turn = client.run(thread, "hello")

    # Write one JSON-encoded event per line (JSONL).
    with open(output_path, "w", encoding="utf-8") as f:
        for event in turn.events:
            f.write(event.json() + "\n")
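
On the harvester side, the contract described above could be consumed roughly as follows. This is a sketch under assumptions: run_scenario is a hypothetical helper, and the actual scripts/harvest_python_fixtures.py may resolve and call scenario modules differently.

```python
import importlib


def run_scenario(name, output_path, **kwargs):
    """Import harvest_scenarios.<name> and call its record() function."""
    module = importlib.import_module(f"harvest_scenarios.{name}")
    record = getattr(module, "record", None)
    if not callable(record):
        raise AttributeError(f"harvest_scenarios.{name} does not define record()")
    # Forward any extra options (for example codex_path) as keyword arguments.
    return record(output_path, **kwargs)
```

For example, run_scenario("thread_basic", "integration/fixtures/python/thread_basic.jsonl", codex_path=None) would import harvest_scenarios.thread_basic and let it write the fixture, provided the Python repo is importable.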

Maintenance Checklist

Update SCENARIOS in scripts/harvest_python_fixtures.py when new workflows need coverage.
Track harvested scenarios and their freshness in docs/python-parity-checklist.md.
Regenerate fixtures whenever the Python SDK changes behavior; review the diffs to confirm that only the expected changes appear.
Ensure sensitive data is redacted before committing.
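
Redaction can be scripted so that it is applied the same way on every regeneration. The sketch below is illustrative only: the key names in SENSITIVE_KEYS are hypothetical and should be replaced with whatever fields the real event payloads carry, and the helper names are not part of the harvester.

```python
import json

# Hypothetical key names; adjust to the fields present in the real events.
SENSITIVE_KEYS = {"api_key", "authorization", "token"}


def redact(value):
    """Recursively replace values stored under sensitive keys with a placeholder."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def redact_jsonl_line(line):
    """Redact one JSONL line and return it re-serialized, preserving key order."""
    return json.dumps(redact(json.loads(line)))
```

Key-based matching will not catch secrets embedded in free-text fields such as prompts or command output, so a manual scan is still needed.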

Troubleshooting

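Module Not Found: The harvester adds the Python repo to the import path automatically, so first check that --python-sdk (or CODEX_PYTHON_SDK_PATH) points at the correct checkout. If the error persists, verify that PYTHONPATH includes the Python repo.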
codex-rs Mismatch: Rebuild or download the binary pinned in config/native.exs once available.
Fixture Drift: Re-run the harvester and compare the diffs. Legitimate changes should be accompanied by updated Elixir tests.
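
When investigating fixture drift, a diff that ignores JSON key order can be easier to read than a raw text diff. The following is a minimal sketch with hypothetical helper names (normalize, fixture_diff); it assumes both files are valid JSONL and does not mask volatile fields such as timestamps or identifiers.

```python
import difflib
import json


def normalize(path):
    """Load a JSONL fixture and re-serialize each event with sorted keys."""
    with open(path, encoding="utf-8") as f:
        return [json.dumps(json.loads(line), sort_keys=True) for line in f if line.strip()]


def fixture_diff(old_path, new_path):
    """Return unified-diff lines between two fixtures, ignoring key order."""
    return list(
        difflib.unified_diff(
            normalize(old_path),
            normalize(new_path),
            fromfile=str(old_path),
            tofile=str(new_path),
            lineterm="",
        )
    )
```

An empty result means the two fixtures contain the same events in the same order. Any returned lines show which events were added, removed, or changed.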
