Manage Dedicated Inference Endpoints on the HuggingFace Hub.
Inference Endpoints let you deploy models on fully managed, dedicated infrastructure (CPU or GPU) with private endpoints and autoscaling.
See: https://huggingface.co/docs/inference-endpoints
Example

# Create an endpoint
{:ok, endpoint} = HuggingfaceClient.create_inference_endpoint(
  name: "my-bert-endpoint",
  repository: "sentence-transformers/all-MiniLM-L6-v2",
  framework: "pytorch",
  task: "sentence-similarity",
  accelerator: "cpu",
  vendor: "aws",
  region: "us-east-1",
  type: "protected",
  instance_size: "small",
  instance_type: "c6i",
  access_token: "hf_..."
)

# Get endpoint status
{:ok, ep} = HuggingfaceClient.get_inference_endpoint("my-bert-endpoint",
  namespace: "my-org",
  access_token: "hf_..."
)

IO.puts("Status: #{ep["status"]["state"]}")
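Creation is asynchronous: the call returns while the endpoint is still provisioning. Below is a minimal polling sketch, assuming the API reports a "running" state once the endpoint is up (the status path is the one used above; EndpointHelper is a hypothetical helper module, not part of this client):

defmodule EndpointHelper do
  # Poll every 10 seconds until the endpoint reports "running",
  # giving up after `attempts` tries.
  def wait_until_running(name, opts, attempts \\ 30)

  def wait_until_running(_name, _opts, 0), do: {:error, :timeout}

  def wait_until_running(name, opts, attempts) do
    {:ok, ep} = HuggingfaceClient.get_inference_endpoint(name, opts)

    if ep["status"]["state"] == "running" do
      {:ok, ep}
    else
      Process.sleep(10_000)
      wait_until_running(name, opts, attempts - 1)
    end
  end
end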
Summary

Functions

create(opts) - Creates a new dedicated inference endpoint.
delete(name, opts) - Deletes an inference endpoint.
get(name, opts) - Gets details about a specific inference endpoint.
list(opts) - Lists all inference endpoints for a namespace (user or org).
pause(name, opts) - Pauses an inference endpoint (stops billing while keeping its configuration).
resume(name, opts) - Resumes a paused inference endpoint.
scale_to_zero(name, opts) - Scales an endpoint to zero replicas (equivalent to pause, with a more explicit name).
update(name, opts) - Updates an existing inference endpoint's configuration.

Functions
@spec create(keyword()) :: {:ok, map()} | {:error, Exception.t()}
Creates a new dedicated inference endpoint.
Required options

:name - endpoint name (unique within the namespace)
:repository - model repo ID (e.g. "gpt2")
:accelerator - "cpu" or "gpu"
:vendor - cloud vendor: "aws", "azure", "gcp"
:region - cloud region (e.g. "us-east-1")
:instance_size - size: "small", "medium", "large", "xlarge"
:instance_type - instance type (vendor-specific, e.g. "c6i" for AWS CPU)
:type - endpoint type: "public", "protected", or "private"

Optional options

:namespace - organization namespace (defaults to the authenticated user)
:framework - ML framework (e.g. "pytorch")
:task - pipeline task (e.g. "text-generation")
:revision - model revision/branch
:image - custom Docker image config
:env - environment variables map
:scaling - scaling config map
:access_token - HuggingFace access token
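For illustration, here is a create call that also uses some of the optional keys. The :env and :scaling map shapes and the GPU instance type shown are assumptions, not guaranteed by this client; check the Inference Endpoints API reference for the exact values your vendor supports:

# The :scaling keys (minReplica/maxReplica) and the instance type are
# illustrative assumptions.
{:ok, endpoint} = HuggingfaceClient.create_inference_endpoint(
  name: "my-gpu-endpoint",
  repository: "gpt2",
  accelerator: "gpu",
  vendor: "aws",
  region: "us-east-1",
  instance_size: "medium",
  instance_type: "g5.2xlarge",
  type: "protected",
  task: "text-generation",
  env: %{"MAX_INPUT_LENGTH" => "1024"},
  scaling: %{"minReplica" => 0, "maxReplica" => 2},
  access_token: "hf_..."
)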
@spec delete(String.t(), keyword()) :: :ok | {:error, Exception.t()}
Deletes an inference endpoint.
Example
:ok = HuggingfaceClient.delete_inference_endpoint("my-endpoint",
  namespace: "my-org",
  access_token: "hf_..."
)
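Because the return type is :ok | {:error, Exception.t()}, callers that should not crash can branch on the result:

case HuggingfaceClient.delete_inference_endpoint("my-endpoint",
       namespace: "my-org",
       access_token: "hf_..."
     ) do
  :ok -> IO.puts("Endpoint deleted")
  {:error, err} -> IO.puts("Delete failed: #{Exception.message(err)}")
end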
@spec get(String.t(), keyword()) :: {:ok, map()} | {:error, Exception.t()}
Gets details about a specific inference endpoint.
Example
{:ok, ep} = HuggingfaceClient.get_inference_endpoint("my-endpoint",
  namespace: "my-org",
  access_token: "hf_..."
)
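The returned map mirrors the API's JSON response. Continuing the example, and assuming the usual field layout (state under "status", with the endpoint URL appearing once it is running):

# "url" is only present once the endpoint is running; the field names
# here assume the API's JSON shape.
IO.puts("State: #{ep["status"]["state"]}")
IO.puts("URL:   #{ep["status"]["url"] || "(not yet available)"}")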
@spec list(keyword()) :: {:ok, [map()]} | {:error, Exception.t()}
Lists all inference endpoints for a namespace (user or org).
Example
{:ok, endpoints} = HuggingfaceClient.list_inference_endpoints(access_token: "hf_...")
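Each element is a map in the same shape as the get result, so a quick status overview looks like this (field names assume the API's JSON shape):

for ep <- endpoints do
  IO.puts("#{ep["name"]}: #{ep["status"]["state"]}")
end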
@spec pause(String.t(), keyword()) :: {:ok, map()} | {:error, Exception.t()}
Pauses an inference endpoint (stops billing while keeping configuration).
Example
{:ok, ep} = HuggingfaceClient.pause_inference_endpoint("my-endpoint",
  namespace: "my-org",
  access_token: "hf_..."
)
@spec resume(String.t(), keyword()) :: {:ok, map()} | {:error, Exception.t()}
Resumes a paused inference endpoint.
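Example

No example is given in the source docs, so this is a sketch; the delegate name resume_inference_endpoint/2 is assumed from the naming pattern of the other functions.

# Delegate name assumed from the *_inference_endpoint pattern.
{:ok, ep} = HuggingfaceClient.resume_inference_endpoint("my-endpoint",
  namespace: "my-org",
  access_token: "hf_..."
)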
@spec scale_to_zero(String.t(), keyword()) :: {:ok, map()} | {:error, Exception.t()}
Scales an endpoint to zero replicas (equivalent to pause, with a more explicit name).
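Example

Likewise a sketch; scale_to_zero_inference_endpoint/2 is a hypothetical delegate name inferred from the same pattern.

# Hypothetical delegate name, following the *_inference_endpoint pattern.
{:ok, ep} = HuggingfaceClient.scale_to_zero_inference_endpoint("my-endpoint",
  namespace: "my-org",
  access_token: "hf_..."
)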
@spec update(String.t(), keyword()) :: {:ok, map()} | {:error, Exception.t()}
Updates an existing inference endpoint's configuration.
Example
{:ok, ep} = HuggingfaceClient.update_inference_endpoint("my-endpoint",
  instance_size: "large",
  namespace: "my-org",
  access_token: "hf_..."
)
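Changing compute settings typically triggers a redeploy, so an update pairs naturally with the polling sketch from the module example above:

# Wait for the resized endpoint to come back up, reusing the
# hypothetical EndpointHelper from the module example.
{:ok, ep} = EndpointHelper.wait_until_running("my-endpoint",
  namespace: "my-org", access_token: "hf_...")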