HuggingfaceClient.Hub.InferenceEndpoints (huggingface_client v0.1.0)


Manage Dedicated Inference Endpoints on the HuggingFace Hub.

Inference Endpoints let you deploy models on fully managed, dedicated infrastructure (CPU or GPU) with private endpoints and autoscaling.

See: https://huggingface.co/docs/inference-endpoints

Example

# Create an endpoint
{:ok, endpoint} = HuggingfaceClient.create_inference_endpoint(
  name: "my-bert-endpoint",
  repository: "sentence-transformers/all-MiniLM-L6-v2",
  framework: "pytorch",
  task: "sentence-similarity",
  accelerator: "cpu",
  vendor: "aws",
  region: "us-east-1",
  type: "protected",
  instance_size: "small",
  instance_type: "c6i",
  access_token: "hf_..."
)

# Get endpoint status
{:ok, ep} = HuggingfaceClient.get_inference_endpoint("my-bert-endpoint",
  namespace: "my-org",
  access_token: "hf_..."
)
IO.puts("Status: #{ep["status"]["state"]}")
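
Endpoint creation is asynchronous, so after creating you typically poll until the endpoint reports a running state. A minimal sketch; the state name ("running") follows the HuggingFace Inference Endpoints API:

# Poll every 5 seconds until the endpoint is ready
poll = fn poll ->
  {:ok, ep} =
    HuggingfaceClient.get_inference_endpoint("my-bert-endpoint",
      namespace: "my-org",
      access_token: "hf_..."
    )

  case ep["status"]["state"] do
    # "running" is the ready state reported by the HuggingFace API
    "running" -> ep
    _ ->
      Process.sleep(5_000)
      poll.(poll)
  end
end

endpoint = poll.(poll)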

Summary

Functions

Creates a new dedicated inference endpoint.

Deletes an inference endpoint.

Gets details about a specific inference endpoint.

Lists all inference endpoints for a namespace (user or org).

Pauses an inference endpoint (stops billing while keeping configuration).

Resumes a paused inference endpoint.

Scales an endpoint to zero replicas (stops compute billing like pausing, but the endpoint scales back up automatically when it receives requests).

Updates an existing inference endpoint configuration.

Functions

create(opts \\ [])

@spec create(keyword()) :: {:ok, map()} | {:error, Exception.t()}

Creates a new dedicated inference endpoint.

Required options

  • :name — endpoint name (unique within namespace)
  • :repository — model repo ID (e.g. "gpt2")
  • :accelerator — "cpu" or "gpu"
  • :vendor — cloud vendor: "aws", "azure", "gcp"
  • :region — cloud region (e.g. "us-east-1")
  • :instance_size — size: "small", "medium", "large", "xlarge"
  • :instance_type — instance type (vendor-specific, e.g. "c6i" for AWS CPU)
  • :type — endpoint type: "public", "protected", or "private"

Optional options

  • :namespace — organization namespace (defaults to authenticated user)
  • :framework — ML framework (e.g. "pytorch")
  • :task — pipeline task (e.g. "text-generation")
  • :revision — model revision/branch
  • :image — custom Docker image config
  • :env — environment variables map
  • :scaling — scaling config map
  • :access_token — HuggingFace access token used for authentication
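
Example

A sketch of a create call that exercises the optional :scaling and :env maps; the map keys shown ("minReplica", "maxReplica") follow the HuggingFace Inference Endpoints API and are an assumption about what this client forwards verbatim:

{:ok, endpoint} = HuggingfaceClient.create_inference_endpoint(
  name: "my-gpt2-endpoint",
  repository: "gpt2",
  accelerator: "cpu",
  vendor: "aws",
  region: "us-east-1",
  type: "protected",
  instance_size: "small",
  instance_type: "c6i",
  # Assumed key names, matching the HuggingFace endpoint spec
  scaling: %{"minReplica" => 0, "maxReplica" => 2},
  env: %{"MAX_BATCH_SIZE" => "8"},
  access_token: "hf_..."
)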

delete(name, opts \\ [])

@spec delete(
  String.t(),
  keyword()
) :: :ok | {:error, Exception.t()}

Deletes an inference endpoint.

Example

:ok = HuggingfaceClient.delete_inference_endpoint("my-endpoint",
  namespace: "my-org",
  access_token: "hf_..."
)

get(name, opts \\ [])

@spec get(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, Exception.t()}

Gets details about a specific inference endpoint.

Example

{:ok, ep} = HuggingfaceClient.get_inference_endpoint("my-endpoint",
  namespace: "my-org",
  access_token: "hf_..."
)

list(opts \\ [])

@spec list(keyword()) :: {:ok, [map()]} | {:error, Exception.t()}

Lists all inference endpoints for a namespace (user or org).

Example

{:ok, endpoints} = HuggingfaceClient.list_inference_endpoints(access_token: "hf_...")
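
Each entry is a raw endpoint map from the API. Assuming each map carries a "name" field alongside the nested "status" shown in the get example above, you can summarize all endpoints at once:

Enum.each(endpoints, fn ep ->
  IO.puts("#{ep["name"]}: #{ep["status"]["state"]}")
end)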

pause(name, opts \\ [])

@spec pause(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, Exception.t()}

Pauses an inference endpoint (stops billing while keeping configuration).

Example

{:ok, ep} = HuggingfaceClient.pause_inference_endpoint("my-endpoint",
  namespace: "my-org",
  access_token: "hf_..."
)

resume(name, opts \\ [])

@spec resume(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, Exception.t()}

Resumes a paused inference endpoint.
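
Example

A minimal sketch calling the module function directly (a top-level HuggingfaceClient delegate likely exists as well, following the naming pattern of the other functions):

{:ok, ep} = HuggingfaceClient.Hub.InferenceEndpoints.resume("my-endpoint",
  namespace: "my-org",
  access_token: "hf_..."
)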

scale_to_zero(name, opts \\ [])

@spec scale_to_zero(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, Exception.t()}

Scales an endpoint to zero replicas (stops compute billing like pausing, but the endpoint scales back up automatically when it receives requests).
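
Example

As with resume/2, a sketch calling the module function directly:

{:ok, ep} = HuggingfaceClient.Hub.InferenceEndpoints.scale_to_zero("my-endpoint",
  namespace: "my-org",
  access_token: "hf_..."
)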

update(name, opts \\ [])

@spec update(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, Exception.t()}

Updates an existing inference endpoint configuration.

Example

{:ok, ep} = HuggingfaceClient.update_inference_endpoint("my-endpoint",
  instance_size: "large",
  namespace: "my-org",
  access_token: "hf_..."
)