Inference Endpoints


ASM.InferenceEndpoint publishes CLI-backed ASM providers as endpoint-shaped inference targets for northbound consumers such as jido_integration.

Stable API

The public northbound surface is intentionally small:

  • consumer_manifest/0
  • ensure_endpoint/3
  • release_endpoint/1

consumer_manifest/0 returns ASM's default completion-oriented consumer contract for the published endpoint seam.

ensure_endpoint/3 accepts:

  • an inference-shaped request
  • a consumer manifest
  • execution context metadata

It returns:

  • %ASM.InferenceEndpoint.EndpointDescriptor{}
  • %ASM.InferenceEndpoint.CompatibilityResult{}

release_endpoint/1 retires the lease-backed endpoint publication.
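
Because the publication is lease-backed, callers will usually want to pair every successful ensure_endpoint/3 with a release_endpoint/1. The following is a minimal sketch of that pairing, not ASM's own code. The module name EndpointLeaseSketch is hypothetical, the `{:ok, descriptor, compatibility}` return shape is an assumption, and so is passing the descriptor to the release function. The ensure and release steps are injected as functions so the sketch runs without ASM loaded.

```elixir
defmodule EndpointLeaseSketch do
  # Hypothetical helper: runs `fun` against a published endpoint and always
  # releases the lease afterwards, even if `fun` raises.
  #
  # `ensure` would wrap ASM.InferenceEndpoint.ensure_endpoint/3 and `release`
  # would wrap ASM.InferenceEndpoint.release_endpoint/1. The tuple shapes
  # matched below are assumptions, not the documented contract.
  def with_endpoint(ensure, release, fun)
      when is_function(ensure, 0) and is_function(release, 1) and is_function(fun, 2) do
    case ensure.() do
      {:ok, descriptor, compatibility} ->
        try do
          {:ok, fun.(descriptor, compatibility)}
        after
          # Assumption: the descriptor is enough to identify the lease.
          release.(descriptor)
        end

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

Injecting the two functions keeps the sketch independent of the real argument shapes, which this page does not spell out.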

Publication Rules

ASM publishes the built-in CLI providers:

  • :codex
  • :claude
  • :gemini
  • :amp

Published capabilities are derived from the core provider profiles that have already landed, not from handwritten declarations.

Published metadata includes:

  • cli_completion_v1
  • cli_streaming_v1
  • cli_agent_v2

That metadata is available on the compatibility result and backend manifest, but the endpoint seam itself only exposes:

  • completion requests
  • streaming requests

It does not expose agent-loop semantics. Requests that carry tools are rejected twice: once during the compatibility check and again on the HTTP route.
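
The split between published metadata and what the endpoint seam actually serves can be illustrated with a small sketch. It is not ASM's implementation. The module name SeamCapabilitySketch and the error atom :tools_not_supported are hypothetical, and representing capability identifiers as atoms is an assumption. The "tools" key is the field name used by OpenAI-style chat completion request bodies.

```elixir
defmodule SeamCapabilitySketch do
  # The two capabilities the endpoint seam serves, per the list above.
  # Representing them as atoms is an assumption.
  @endpoint_exposed [:cli_completion_v1, :cli_streaming_v1]

  # Keeps only the capabilities the seam exposes; anything else
  # (for example :cli_agent_v2) stays metadata-only and is dropped here.
  def exposed(published) when is_list(published) do
    Enum.filter(published, &(&1 in @endpoint_exposed))
  end

  # Rejects a request body that carries a non-empty "tools" list.
  # The error atom is hypothetical.
  def check(%{"tools" => [_ | _]}), do: {:error, :tools_not_supported}
  def check(%{} = _request), do: :ok
end
```

A consumer could run a check like this before calling the endpoint, but the server-side rejection described above still applies either way.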

Descriptor Contract

The published %EndpointDescriptor{} is OpenAI-compatible on purpose:

  • target_class: :cli_endpoint
  • protocol: :openai_chat_completions
  • loopback base_url
  • bearer auth header
  • pinned provider_identity
  • pinned model_identity
  • source_runtime: :agent_session_manager

The returned metadata also carries:

  • publication metadata
  • backend manifest data

That lets northbound consumers keep their durable route records accurate without reconstructing provider claims themselves.
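
Since the descriptor is OpenAI-compatible, a consumer can turn it into a chat completions request with very little glue. The sketch below works on a plain map rather than the real struct. It assumes that base_url already carries any version prefix, that the bearer token is available under a hypothetical :bearer_token key, and that model_identity is a plain string. None of these details are confirmed by this page.

```elixir
defmodule DescriptorRequestSketch do
  # Builds {url, headers, body} for a chat completions call from a
  # descriptor-like map. Only descriptors that declare the
  # :openai_chat_completions protocol match this clause.
  #
  # Assumptions: :bearer_token is a hypothetical key, base_url includes
  # any version prefix, and model_identity is a string.
  def build(%{protocol: :openai_chat_completions} = descriptor, messages)
      when is_list(messages) do
    url = String.trim_trailing(descriptor.base_url, "/") <> "/chat/completions"

    headers = [
      {"authorization", "Bearer " <> descriptor.bearer_token},
      {"content-type", "application/json"}
    ]

    # The model comes from the descriptor, never from the caller.
    body = %{"model" => descriptor.model_identity, "messages" => messages}

    {url, headers, body}
  end
end
```

The returned tuple can be handed to whichever HTTP client the consumer already uses, after encoding the body as JSON.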

Runtime Behavior

The endpoint server is lease-backed and loopback-only.

Under the published HTTP path:

  • non-streaming requests execute through ASM.query/3
  • streaming requests execute through ASM.stream/3
  • the model is pinned to the published descriptor
  • health is available on the lease health route

The northbound endpoint therefore reuses the same ASM event and result projection path that ordinary session/query callers already consume.
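
The routing rules above can be sketched as a pure function over an OpenAI-style request body. This is an illustration, not the server's code. The module name EndpointDispatchSketch is hypothetical, and the sketch assumes the pinned model simply replaces whatever model the caller sent; the real server may instead reject a mismatch. The "stream" key is the standard OpenAI chat completions flag.

```elixir
defmodule EndpointDispatchSketch do
  # Decides which execution path a request body would take and pins the model.
  # :query stands for the ASM.query/3 path, :stream for the ASM.stream/3 path.
  #
  # Assumption: the pinned model overwrites the caller's "model" value.
  def plan(%{} = body, pinned_model) do
    path = if Map.get(body, "stream", false) == true, do: :stream, else: :query
    {path, Map.put(body, "model", pinned_model)}
  end
end
```

For example, `EndpointDispatchSketch.plan(%{"stream" => true}, "pinned")` selects the `:stream` path and returns a body whose "model" is "pinned".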

Provider Boundaries

Gemini and Amp remain common-surface-only providers.

They can publish:

  • cli_completion_v1
  • cli_streaming_v1

They do not publish cli_agent_v2 through this seam. Claude and Codex may still expose richer provider-native agent surfaces above the common CLI endpoint path through ASM.Extensions.ProviderSDK.

Proof Surface

  • test/asm/inference_endpoint_test.exs
  • examples/inference_endpoint_http.exs