# Inference Endpoints

`ASM.InferenceEndpoint` publishes CLI-backed ASM providers as
endpoint-shaped inference targets for northbound consumers such as
`jido_integration`.

## Stable API

The public northbound surface is intentionally small:

- `consumer_manifest/0`
- `ensure_endpoint/3`
- `release_endpoint/1`

`consumer_manifest/0` returns ASM's default completion-oriented consumer
contract for the published endpoint seam.

`ensure_endpoint/3` accepts:

- an inference-shaped request
- a consumer manifest
- execution context metadata

It returns:

- `%ASM.InferenceEndpoint.EndpointDescriptor{}`
- `%ASM.InferenceEndpoint.CompatibilityResult{}`

`release_endpoint/1` retires the lease-backed endpoint publication.

## Publication Rules

ASM publishes the built-in CLI providers:

- `:codex`
- `:claude`
- `:gemini`
- `:amp`

Capability publication is derived from the landed core provider profiles rather
than handwritten declarations.

Published metadata includes:

- `cli_completion_v1`
- `cli_streaming_v1`
- `cli_agent_v2`

That metadata is available on the compatibility result and backend manifest,
but the endpoint seam itself only exposes:

- completion requests
- streaming requests

It does not expose agent-loop semantics. Tool-bearing requests are rejected
both at compatibility time and on the HTTP route.

## Descriptor Contract

The published `%EndpointDescriptor{}` is OpenAI-compatible on purpose:

- `target_class: :cli_endpoint`
- `protocol: :openai_chat_completions`
- loopback `base_url`
- bearer auth header
- pinned `provider_identity`
- pinned `model_identity`
- `source_runtime: :agent_session_manager`

The returned `metadata` also carries:

- publication metadata
- backend manifest data

That lets northbound consumers keep the durable route record honest without
reconstructing provider claims themselves.

## Runtime Behavior

The endpoint server is lease-backed and loopback-only.

Under the published HTTP path:

- non-streaming requests execute through `ASM.query/3`
- streaming requests execute through `ASM.stream/3`
- the model is pinned to the published descriptor
- health is available on the lease health route

The northbound endpoint therefore reuses the same ASM event and result
projection path that ordinary session/query callers already consume.

## Provider Boundaries

Gemini and Amp remain common-surface-only providers.

They can publish:

- `cli_completion_v1`
- `cli_streaming_v1`

They do not publish `cli_agent_v2` through this seam. Claude and Codex may
still expose richer provider-native agent surfaces above the common CLI
endpoint path through `ASM.Extensions.ProviderSDK`.

## Proof Surface

- `test/asm/inference_endpoint_test.exs`
- `examples/inference_endpoint_http.exs`
