Remaining Testing & CI Requirements

This document details the outstanding test coverage, automation, and CI pipeline work needed to support full Codex SDK delivery.

Test Coverage Backlog

| Area | Required Additions | Milestone |
|------|--------------------|-----------|
| Event Domain | StreamData property suite, full fixture decoding coverage | M1 |
| Turn Pipeline | Auto-run loop tests, cancellation, usage aggregation | M1 |
| Tooling & MCP | Tool registry unit tests, MCP handshake integration | M2 |
| Approvals & Sandbox | Policy combinator property tests, integration scenarios | M2 |
| Attachments | Chunked upload integration tests, cleanup assertions | M3 |
| Structured Output | Schema builder doctests, invalid payload error handling | M3 |
| Observability | Telemetry snapshot tests, log diff tests | M4 |
| Error Handling | Failure fixture contract tests | M4 |
| Regression Harness | Python vs Elixir diff runner, nightly job | M5 |

Supertester Adoption Tasks

  • Introduce Codex.SupertesterCase with use Supertester.UnifiedTestFoundation.
  • Replace manual Port mocks with Supertester.GenServerHelpers where applicable.
  • Add assert_no_process_leaks/1 checks to integration suites.
  • Document Supertester patterns in docs/04-testing-strategy.md.

Fixture Management

  • Implement fixture checksum manifest to detect drift.
  • Automate fixture regeneration via CI job invoking scripts/harvest_python_fixtures.py.
  • Store structured-output schemas under integration/fixtures/schemas (pending for M3).
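
The checksum manifest in the first item could be prototyped alongside the Python harvesting script. The sketch below is only an illustration under stated assumptions: the function names `build_manifest` and `detect_drift` are hypothetical, SHA-256 is an assumed choice of digest, and the manifest is assumed to be stored outside the fixture directory so it does not hash itself.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def build_manifest(fixture_dir: Path) -> dict[str, str]:
    """Map each fixture's relative POSIX path to its SHA-256 hex digest.

    Hypothetical helper: the real manifest layout is not specified in this document.
    """
    manifest = {}
    for path in sorted(fixture_dir.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(fixture_dir).as_posix()] = digest
    return manifest


def detect_drift(recorded: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare a recorded manifest with a freshly built one.

    Returns sorted lists of fixture paths that were added, removed, or changed.
    """
    return {
        "added": sorted(current.keys() - recorded.keys()),
        "removed": sorted(recorded.keys() - current.keys()),
        "changed": sorted(
            name
            for name in recorded.keys() & current.keys()
            if recorded[name] != current[name]
        ),
    }
```

A CI job could serialize the result of `build_manifest` as JSON at harvest time, rebuild it on each run, and fail the build whenever any of the three lists returned by `detect_drift` is non-empty.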

CI Pipeline Enhancements

  1. Per-PR Jobs
    • mix compile --warnings-as-errors
    • mix format --check-formatted
    • mix test --include integration
    • mix coveralls.github
    • mix credo --strict
  2. Nightly Jobs
    • mix codex.parity (Python vs Elixir diff)
    • MIX_ENV=dev mix dialyzer (with cached PLTs)
    • Fixture regeneration + diff check
  3. Cross-Platform
    • Matrix for Ubuntu/macOS to validate binary handling and file permissions.

Current Automation

Test Tooling Backlog

  • Build fake codex binary generator (configurable event scripts) for integration tests.
  • Add telemetry capture helper to simplify event assertions.
  • Provide Mox-compatible wrappers for tool registry to maintain async tests.
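
One possible shape for the fake codex binary generator is a small script that writes an executable stand-in which replays a scripted list of events and then exits with a chosen status. This is a rough sketch, not the planned implementation: `write_fake_binary` is a hypothetical name, the one-JSON-object-per-line stdout format is an assumption about how the real binary emits events, and the example event types in the usage note are placeholders.

```python
from __future__ import annotations

import json
import stat
import sys
from pathlib import Path

# Source of the generated stand-in. It prints each scripted event on its own
# line (assumed wire format) and exits with the configured status code.
SCRIPT_TEMPLATE = """#!{python}
import sys

EVENT_LINES = {event_lines!r}

for line in EVENT_LINES:
    sys.stdout.write(line + "\\n")
    sys.stdout.flush()
sys.exit({exit_code})
"""


def write_fake_binary(path: Path, events: list[dict], exit_code: int = 0) -> Path:
    """Write an executable script at `path` that replays `events` as JSON lines."""
    event_lines = [json.dumps(event, sort_keys=True) for event in events]
    source = SCRIPT_TEMPLATE.format(
        python=sys.executable,
        event_lines=event_lines,
        exit_code=exit_code,
    )
    path.write_text(source, encoding="utf-8")
    # Add the owner-execute bit so the file can be spawned like a real binary.
    path.chmod(path.stat().st_mode | stat.S_IXUSR)
    return path
```

An integration test might call `write_fake_binary(tmp_dir / "fake_codex", [{"type": "turn.started"}, {"type": "turn.completed"}], exit_code=0)` and point the SDK at the resulting path. The shebang line relies on a POSIX shell environment, which matches the Ubuntu/macOS matrix above; an interpreter path containing spaces would need a wrapper instead.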

Success Metrics

  • ≥95 % coverage maintained in CI (mix coveralls).
  • All tests run with async: true except where external orchestration forbids it; document each exception.
  • Zero flaky tests over a 30-day window (monitored via CI stability dashboard).
  • Nightly parity harness produces zero diffs or automatically opens a blocking issue.