# `HuggingfaceClient.Hub.InferenceEndpoints`
[🔗](https://github.com/huggingface/huggingface_client/blob/v0.1.0/lib/huggingface_client/hub/deployment/endpoints.ex#L1)

Manage Dedicated Inference Endpoints on the Hugging Face Hub.

Inference Endpoints let you deploy models on fully managed, dedicated
infrastructure (CPU or GPU), with autoscaling and public, protected, or
private access.

See: https://huggingface.co/docs/inference-endpoints

## Example

    # Create an endpoint
    {:ok, endpoint} = HuggingfaceClient.create_inference_endpoint(
      name: "my-bert-endpoint",
      repository: "sentence-transformers/all-MiniLM-L6-v2",
      framework: "pytorch",
      task: "sentence-similarity",
      accelerator: "cpu",
      vendor: "aws",
      region: "us-east-1",
      type: "protected",
      instance_size: "small",
      instance_type: "c6i",
      access_token: "hf_..."
    )

    # Get endpoint status
    {:ok, ep} = HuggingfaceClient.get_inference_endpoint("my-bert-endpoint",
      namespace: "my-org",
      access_token: "hf_..."
    )
    IO.puts("Status: #{ep["status"]["state"]}")
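
A freshly created endpoint is not immediately usable. Below is a minimal
polling sketch, assuming the status map has the shape shown above and that
an endpoint reports `"running"` once initialization completes (other
states such as `"pending"` or `"failed"` are possible):

    # Poll every 10 seconds, for up to ~5 minutes.
    # (A real caller would also handle {:error, _} from get.)
    result =
      Enum.reduce_while(1..30, {:error, :timeout}, fn _attempt, acc ->
        {:ok, ep} =
          HuggingfaceClient.get_inference_endpoint("my-bert-endpoint",
            namespace: "my-org",
            access_token: "hf_..."
          )

        if ep["status"]["state"] == "running" do
          {:halt, {:ok, ep}}
        else
          Process.sleep(10_000)
          {:cont, acc}
        end
      end)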

# `create`

```elixir
@spec create(keyword()) :: {:ok, map()} | {:error, Exception.t()}
```

Creates a new dedicated inference endpoint; see the example after the options lists below.

## Required options

- `:name` — endpoint name (unique within namespace)
- `:repository` — model repo ID (e.g. `"gpt2"`)
- `:accelerator` — `"cpu"` or `"gpu"`
- `:vendor` — cloud vendor: `"aws"`, `"azure"`, `"gcp"`
- `:region` — cloud region (e.g. `"us-east-1"`)
- `:instance_size` — size: `"small"`, `"medium"`, `"large"`, `"xlarge"`
- `:instance_type` — instance type (vendor-specific, e.g. `"c6i"` for AWS CPU)
- `:type` — endpoint type: `"public"`, `"protected"`, or `"private"`

## Optional options

- `:namespace` — organization namespace (defaults to authenticated user)
- `:framework` — ML framework (e.g. `"pytorch"`)
- `:task` — pipeline task (e.g. `"text-generation"`)
- `:revision` — model revision/branch
- `:image` — custom Docker image config
- `:env` — environment variables map
- `:scaling` — scaling config map
- `:access_token` — HF access token used to authenticate the request
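
## Example

A sketch of a GPU endpoint with autoscaling. The `:scaling` and `:env`
map keys below (`"minReplica"`, `"maxReplica"`, the env var) and the
instance type are illustrative assumptions, not values confirmed by this
client's docs:

    {:ok, endpoint} = HuggingfaceClient.create_inference_endpoint(
      name: "my-gpt2-endpoint",
      repository: "gpt2",
      framework: "pytorch",
      task: "text-generation",
      accelerator: "gpu",
      vendor: "aws",
      region: "us-east-1",
      type: "protected",
      instance_size: "medium",
      instance_type: "g5.2xlarge",
      # Assumed to be passed through as the Hub API's scaling object
      scaling: %{"minReplica" => 0, "maxReplica" => 2},
      env: %{"MAX_INPUT_LENGTH" => "1024"},
      access_token: "hf_..."
    )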

# `delete`

```elixir
@spec delete(
  String.t(),
  keyword()
) :: :ok | {:error, Exception.t()}
```

Deletes an inference endpoint.

## Example

    :ok = HuggingfaceClient.delete_inference_endpoint("my-endpoint",
      namespace: "my-org",
      access_token: "hf_..."
    )

# `get`

```elixir
@spec get(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, Exception.t()}
```

Gets details about a specific inference endpoint.

## Example

    {:ok, ep} = HuggingfaceClient.get_inference_endpoint("my-endpoint",
      namespace: "my-org",
      access_token: "hf_..."
    )

# `list`

```elixir
@spec list(keyword()) :: {:ok, [map()]} | {:error, Exception.t()}
```

Lists all inference endpoints for a namespace (user or org).

## Example

    {:ok, endpoints} = HuggingfaceClient.list_inference_endpoints(access_token: "hf_...")
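
Listing an organization's endpoints presumably takes the same
`:namespace` option as the other functions here (an assumption, since
only the token-only form is shown above):

    {:ok, endpoints} = HuggingfaceClient.list_inference_endpoints(
      namespace: "my-org",
      access_token: "hf_..."
    )

    # "name" and "status" keys are assumed to mirror get/2's response
    for ep <- endpoints, do: IO.puts("#{ep["name"]}: #{ep["status"]["state"]}")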

# `pause`

```elixir
@spec pause(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, Exception.t()}
```

Pauses an inference endpoint: compute billing stops and the configuration
is kept, but the endpoint must be explicitly restarted with `resume/2`.

## Example

    {:ok, ep} = HuggingfaceClient.pause_inference_endpoint("my-endpoint",
      namespace: "my-org",
      access_token: "hf_..."
    )

# `resume`

```elixir
@spec resume(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, Exception.t()}
```

Resumes a paused inference endpoint.
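
## Example

A sketch assuming a top-level delegate named `resume_inference_endpoint/2`,
mirroring `pause_inference_endpoint/2` above (the delegate name is inferred
from the naming pattern, not confirmed here):

    # Hypothetical delegate name, inferred from the pattern above
    {:ok, ep} = HuggingfaceClient.resume_inference_endpoint("my-endpoint",
      namespace: "my-org",
      access_token: "hf_..."
    )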

# `scale_to_zero`

```elixir
@spec scale_to_zero(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, Exception.t()}
```

Scales an endpoint to zero replicas. Unlike `pause/2`, a scaled-to-zero
endpoint is restarted automatically when a new request arrives (after a
cold start), rather than requiring an explicit `resume/2`.
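
## Example

A sketch calling the module function directly, since no top-level delegate
for scale-to-zero is documented here; the options are assumed to mirror the
other functions:

    {:ok, ep} = HuggingfaceClient.Hub.InferenceEndpoints.scale_to_zero("my-endpoint",
      namespace: "my-org",
      access_token: "hf_..."
    )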

# `update`

```elixir
@spec update(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, Exception.t()}
```

Updates the configuration of an existing inference endpoint.

## Example

    {:ok, ep} = HuggingfaceClient.update_inference_endpoint("my-endpoint",
      instance_size: "large",
      namespace: "my-org",
      access_token: "hf_..."
    )

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
