Remaining Testing & CI Requirements
This document details the outstanding test coverage, automation, and CI pipeline work needed to support full Codex SDK delivery.
Test Coverage Backlog
| Area | Required Additions | Milestone |
|---|---|---|
| Event Domain | StreamData property suite, full fixture decoding coverage | M1 |
| Turn Pipeline | Auto-run loop tests, cancellation, usage aggregation | M1 |
| Tooling & MCP | Tool registry unit tests, MCP handshake integration | M2 |
| Approvals & Sandbox | Policy combinator property tests, integration scenarios | M2 |
| Attachments | Chunked upload integration tests, cleanup assertions | M3 |
| Structured Output | Schema builder doctests, invalid payload error handling | M3 |
| Observability | Telemetry snapshot tests, log diff tests | M4 |
| Error Handling | Failure fixture contract tests | M4 |
| Regression Harness | Python vs Elixir diff runner, nightly job | M5 |
Supertester Adoption Tasks
- Introduce `Codex.SupertesterCase` with `use Supertester.UnifiedTestFoundation`.
- Replace manual Port mocks with `Supertester.GenServerHelpers` where applicable.
- Add `assert_no_process_leaks/1` checks to integration suites.
- Document Supertester patterns in `docs/04-testing-strategy.md`.
Fixture Management
- Implement fixture checksum manifest to detect drift.
- Automate fixture regeneration via a CI job invoking `scripts/harvest_python_fixtures.py`.
- Store structured-output schemas under `integration/fixtures/schemas` (pending for M3).
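The checksum manifest idea above could look roughly like the following. This is a minimal sketch, not the project's implementation: the module name `FixtureManifestSketch`, its function names, and the choice of SHA-256 are all assumptions made for illustration.

```elixir
defmodule FixtureManifestSketch do
  @moduledoc false
  # Hypothetical helper; names and hash choice are illustrative assumptions.

  # Builds a map of relative path => lowercase hex SHA-256 digest for every
  # regular file under `dir`.
  def build(dir) do
    dir
    |> Path.join("**/*")
    |> Path.wildcard()
    |> Enum.filter(&File.regular?/1)
    |> Map.new(fn path ->
      digest = :crypto.hash(:sha256, File.read!(path)) |> Base.encode16(case: :lower)
      {Path.relative_to(path, dir), digest}
    end)
  end

  # Compares a stored manifest with a freshly built one and returns a sorted
  # list of {:missing | :added | :changed, relative_path} tuples.
  def drift(expected, actual) do
    keys = (Map.keys(expected) ++ Map.keys(actual)) |> Enum.uniq() |> Enum.sort()

    for key <- keys, Map.get(expected, key) != Map.get(actual, key) do
      cond do
        not Map.has_key?(actual, key) -> {:missing, key}
        not Map.has_key?(expected, key) -> {:added, key}
        true -> {:changed, key}
      end
    end
  end
end
```

A CI step could call `build/1` on the fixture directory, compare the result with a committed manifest via `drift/2`, and fail when the returned list is non-empty.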
CI Pipeline Enhancements
- Per-PR Jobs
  - `mix compile --warnings-as-errors`
  - `mix format --check-formatted`
  - `mix test --include integration`
  - `mix coveralls.github`
  - `mix credo --strict`
- Nightly Jobs
  - `mix codex.parity` (Python vs Elixir diff)
  - `MIX_ENV=dev mix dialyzer` (with cached PLTs)
  - Fixture regeneration + diff check
- Cross-Platform
  - Matrix for Ubuntu/macOS to validate binary handling and file permissions.
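For local experimentation, the per-PR sequence above can be sketched as a small fail-fast runner. This is only an illustration under stated assumptions: `PreflightSketch` is a hypothetical module, it does not describe how the repo's own Mix tasks are implemented, and it assumes `mix` is on the `PATH` and that each command selects its own Mix environment.

```elixir
defmodule PreflightSketch do
  @moduledoc false
  # Hypothetical module; the command list mirrors the per-PR jobs above.

  @steps [
    {"mix", ["compile", "--warnings-as-errors"]},
    {"mix", ["format", "--check-formatted"]},
    {"mix", ["test", "--include", "integration"]},
    {"mix", ["coveralls.github"]},
    {"mix", ["credo", "--strict"]}
  ]

  def steps, do: @steps

  # Runs each step in order and stops at the first non-zero exit status.
  # `runner` is injectable so the sequencing can be exercised without Mix.
  def run(steps \\ @steps, runner \\ &default_runner/2) do
    Enum.reduce_while(steps, :ok, fn {cmd, args}, :ok ->
      case runner.(cmd, args) do
        0 -> {:cont, :ok}
        status -> {:halt, {:error, {cmd, args, status}}}
      end
    end)
  end

  # Streams command output to stdout and returns the exit status.
  defp default_runner(cmd, args) do
    {_output, status} =
      System.cmd(cmd, args, into: IO.stream(:stdio, :line), stderr_to_stdout: true)

    status
  end
end
```

Calling `PreflightSketch.run()` returns `:ok` when every command exits with status 0, or `{:error, {cmd, args, status}}` for the first failing step.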
Current Automation
`mix codex.parity` and `mix codex.verify` tasks ship with the repo to streamline CI and local pre-flight checks.
Test Tooling Backlog
- Build fake codex binary generator (configurable event scripts) for integration tests.
- Add telemetry capture helper to simplify event assertions.
- Provide Mox-compatible wrappers for tool registry to maintain async tests.
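One possible shape for the fake codex binary generator is a helper that writes a POSIX shell script replaying a scripted list of event lines. The sketch below is an assumption-laden illustration: `FakeCodexSketch` and its `write!/3` function are hypothetical, the event lines are passed in as pre-encoded strings, and it targets Unix-like systems only.

```elixir
defmodule FakeCodexSketch do
  @moduledoc false
  # Hypothetical generator; not part of the SDK.

  # Writes an executable shell script at `path` that prints each scripted
  # event line to stdout, then exits with `exit_status`.
  def write!(path, event_lines, exit_status \\ 0) do
    body =
      Enum.map_join(event_lines, "", fn line ->
        "printf '%s\\n' " <> shell_quote(line) <> "\n"
      end)

    File.write!(path, "#!/bin/sh\n" <> body <> "exit #{exit_status}\n")
    File.chmod!(path, 0o755)
    path
  end

  # Single-quotes a string for sh, escaping embedded single quotes.
  defp shell_quote(value) do
    "'" <> String.replace(value, "'", "'\\''") <> "'"
  end
end
```

An integration test could point the SDK at the generated path instead of the real binary and assert on how the scripted events and exit status are handled.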
Success Metrics
- ≥95% coverage maintained in CI (`mix coveralls`).
- All tests `async: true` except where external orchestration forbids (document exceptions).
- Zero flaky tests over a 30-day window (monitored via CI stability dashboard).
- Nightly parity harness produces zero diffs or opens blocking issue automatically.
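To make the "zero diffs" criterion concrete, a line-oriented comparison of two captured outputs might look like the sketch below. It assumes both sides emit one event per line; `ParityDiffSketch` is a hypothetical module and does not describe how `mix codex.parity` actually works.

```elixir
defmodule ParityDiffSketch do
  @moduledoc false
  # Hypothetical comparison helper for illustration only.

  # Returns {position, python_line, elixir_line} for every position where the
  # two outputs differ. Blank lines are dropped before comparison, so the
  # position counts non-empty lines. A missing line is reported as nil.
  def diff(python_output, elixir_output) do
    py = String.split(python_output, "\n", trim: true)
    ex = String.split(elixir_output, "\n", trim: true)
    len = max(length(py), length(ex))
    pad = fn lines -> lines ++ List.duplicate(nil, len - length(lines)) end

    [pad.(py), pad.(ex)]
    |> Enum.zip()
    |> Enum.with_index(1)
    |> Enum.filter(fn {{a, b}, _n} -> a != b end)
    |> Enum.map(fn {{a, b}, n} -> {n, a, b} end)
  end
end
```

An empty list means the outputs match; a nightly job could treat any non-empty result as grounds for opening the blocking issue.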