Inference Endpoints API for dedicated model hosting.
Provides management of Hugging Face Inference Endpoints - dedicated infrastructure for model inference with autoscaling and GPU support.
Accelerator Options
- `:cpu` - CPU-based inference
- `:gpu` - GPU-based inference
Instance Sizes
- `:x1` - 1x resources
- `:x2` - 2x resources
- `:x4` - 4x resources
- `:x8` - 8x resources
Cloud Vendors
- `:aws` - Amazon Web Services
- `:azure` - Microsoft Azure
- `:gcp` - Google Cloud Platform
Endpoint Types
- `:public` - Publicly accessible
- `:protected` - Requires authentication (default)
- `:private` - Private VPC endpoint
Examples
```elixir
# List all endpoints
{:ok, endpoints} = HfHub.InferenceEndpoints.list()

# Create a GPU endpoint
{:ok, endpoint} = HfHub.InferenceEndpoints.create("my-endpoint",
  repository: "bert-base-uncased",
  accelerator: :gpu,
  instance_size: :x1,
  instance_type: "g5.xlarge",
  region: "us-east-1",
  vendor: :aws,
  task: "text-classification"
)

# Pause endpoint to save costs
{:ok, endpoint} = HfHub.InferenceEndpoints.pause("my-endpoint")

# Resume when needed
{:ok, endpoint} = HfHub.InferenceEndpoints.resume("my-endpoint")
```
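The examples above use happy-path matches, which raise a `MatchError` if a call fails; every function returns `{:ok, result}` or `{:error, reason}` per its spec, so application code typically branches on both outcomes. A minimal sketch (the option values are illustrative only):

```elixir
case HfHub.InferenceEndpoints.create("my-endpoint",
       repository: "bert-base-uncased",
       accelerator: :cpu,
       instance_size: :x1,
       instance_type: "c6i.xlarge",
       region: "us-east-1",
       vendor: :aws
     ) do
  {:ok, endpoint} ->
    # Creation accepted; the endpoint may still be provisioning.
    {:ok, endpoint}

  {:error, reason} ->
    # e.g. a duplicate name, an invalid option, or an auth failure.
    {:error, reason}
end
```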
Summary
Functions
Creates a new inference endpoint.
Deletes an endpoint.
Gets an endpoint by name.
Lists all inference endpoints.
Pauses an endpoint.
Resumes a paused endpoint.
Scales endpoint to zero replicas.
Updates an existing endpoint.
Functions
@spec create(String.t(), keyword()) :: {:ok, HfHub.InferenceEndpoints.Endpoint.t()} | {:error, term()}
Creates a new inference endpoint.
Arguments
- `name` - Endpoint name
Required Options
- `:repository` - Model repository ID (e.g., "bert-base-uncased")
- `:accelerator` - `:cpu` or `:gpu`
- `:instance_size` - `:x1`, `:x2`, `:x4`, or `:x8`
- `:instance_type` - Instance type (e.g., "g5.xlarge")
- `:region` - Cloud region (e.g., "us-east-1")
- `:vendor` - Cloud vendor: `:aws`, `:azure`, or `:gcp`
Optional
- `:framework` - "pytorch", "tensorflow", etc. (default: "pytorch")
- `:task` - ML task (e.g., "text-classification")
- `:namespace` - Organization namespace (default: current user)
- `:min_replica` - Minimum replicas (default: 0)
- `:max_replica` - Maximum replicas (default: 1)
- `:scale_to_zero_timeout` - Seconds before scaling to zero
- `:type` - `:public`, `:protected`, or `:private` (default: `:protected`)
- `:custom_image` - Custom Docker image configuration
- `:token` - Authentication token
Examples
```elixir
{:ok, endpoint} = HfHub.InferenceEndpoints.create("my-endpoint",
  repository: "bert-base-uncased",
  accelerator: :gpu,
  instance_size: :x1,
  instance_type: "g5.xlarge",
  region: "us-east-1",
  vendor: :aws,
  task: "text-classification"
)

{:ok, endpoint} = HfHub.InferenceEndpoints.create("my-endpoint",
  repository: "sentence-transformers/all-MiniLM-L6-v2",
  accelerator: :cpu,
  instance_size: :x2,
  instance_type: "c6i.xlarge",
  region: "eu-west-1",
  vendor: :aws,
  min_replica: 1,
  max_replica: 4,
  scale_to_zero_timeout: 300
)
```
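Provisioning typically completes asynchronously after `create/2` returns, so callers often poll `get/2` until the endpoint is ready. A hedged sketch - the `:status` field and its `"running"` value are assumptions about `Endpoint.t()`, not confirmed by this page:

```elixir
defmodule MyApp.EndpointHelpers do
  # Hypothetical helper: polls get/2 every 10 seconds until the endpoint
  # reports a running status or the attempt budget is exhausted.
  def wait_until_running(name, attempts \\ 30)

  def wait_until_running(_name, 0), do: {:error, :timeout}

  def wait_until_running(name, attempts) do
    case HfHub.InferenceEndpoints.get(name) do
      # NOTE: :status and "running" are assumed struct details;
      # check Endpoint.t() before relying on them.
      {:ok, %{status: "running"} = endpoint} ->
        {:ok, endpoint}

      {:ok, _still_provisioning} ->
        Process.sleep(10_000)
        wait_until_running(name, attempts - 1)

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```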
@spec delete(String.t(), keyword()) :: :ok | {:error, term()}
Deletes an endpoint.
Warning: This is destructive and cannot be undone.
Arguments
- `name` - Endpoint name
Options
- `:namespace` - Organization namespace
- `:token` - Authentication token
Examples
```elixir
:ok = HfHub.InferenceEndpoints.delete("my-endpoint")
```
@spec get(String.t(), keyword()) :: {:ok, HfHub.InferenceEndpoints.Endpoint.t()} | {:error, term()}
Gets an endpoint by name.
Arguments
- `name` - Endpoint name
Options
- `:namespace` - Organization namespace (default: current user)
- `:token` - Authentication token
Examples
```elixir
{:ok, endpoint} = HfHub.InferenceEndpoints.get("my-endpoint")

{:ok, endpoint} = HfHub.InferenceEndpoints.get("my-endpoint", namespace: "my-org")
```
@spec list(keyword()) :: {:ok, [HfHub.InferenceEndpoints.Endpoint.t()]} | {:error, term()}
Lists all inference endpoints.
Options
- `:namespace` - Organization namespace (default: current user)
- `:token` - Authentication token
Examples
```elixir
{:ok, endpoints} = HfHub.InferenceEndpoints.list()

{:ok, endpoints} = HfHub.InferenceEndpoints.list(namespace: "my-org")
```
@spec pause(String.t(), keyword()) :: {:ok, HfHub.InferenceEndpoints.Endpoint.t()} | {:error, term()}
Pauses an endpoint.
Paused endpoints don't incur compute costs but retain their configuration. A paused endpoint must be explicitly resumed before it can serve requests.
Arguments
- `name` - Endpoint name
Options
- `:namespace` - Organization namespace
- `:token` - Authentication token
Examples
```elixir
{:ok, endpoint} = HfHub.InferenceEndpoints.pause("my-endpoint")
```
@spec resume(String.t(), keyword()) :: {:ok, HfHub.InferenceEndpoints.Endpoint.t()} | {:error, term()}
Resumes a paused endpoint.
Arguments
- `name` - Endpoint name
Options
- `:namespace` - Organization namespace
- `:token` - Authentication token
Examples
```elixir
{:ok, endpoint} = HfHub.InferenceEndpoints.resume("my-endpoint")
```
@spec scale_to_zero(String.t(), keyword()) :: {:ok, HfHub.InferenceEndpoints.Endpoint.t()} | {:error, term()}
Scales endpoint to zero replicas.
Different from `pause/2`: a scaled-to-zero endpoint wakes automatically on the next incoming request, while a paused endpoint must be explicitly resumed (see the sketch after the example below).
Arguments
- `name` - Endpoint name
Options
- `:namespace` - Organization namespace
- `:token` - Authentication token
Examples
```elixir
{:ok, endpoint} = HfHub.InferenceEndpoints.scale_to_zero("my-endpoint")
```
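To make the difference between the two idle strategies concrete, a minimal sketch using only the functions documented on this page:

```elixir
# Strategy 1: scale to zero - no replicas run, but the endpoint
# wakes automatically when the next inference request arrives.
{:ok, _endpoint} = HfHub.InferenceEndpoints.scale_to_zero("my-endpoint")

# Strategy 2: pause - the endpoint cannot serve requests
# until it is explicitly resumed.
{:ok, _endpoint} = HfHub.InferenceEndpoints.pause("my-endpoint")
{:ok, _endpoint} = HfHub.InferenceEndpoints.resume("my-endpoint")
```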
@spec update(String.t(), keyword()) :: {:ok, HfHub.InferenceEndpoints.Endpoint.t()} | {:error, term()}
Updates an existing endpoint.
Only provided options are updated; others remain unchanged.
Arguments
- `name` - Endpoint name
Options
- `:namespace` - Organization namespace
- `:accelerator` - `:cpu` or `:gpu`
- `:instance_size` - `:x1`, `:x2`, `:x4`, or `:x8`
- `:instance_type` - Instance type
- `:min_replica` - Minimum replicas
- `:max_replica` - Maximum replicas
- `:scale_to_zero_timeout` - Seconds before scaling to zero
- `:repository` - Model repository ID
- `:framework` - Framework ("pytorch", "tensorflow", etc.)
- `:revision` - Model revision
- `:task` - ML task
- `:token` - Authentication token
Examples
```elixir
{:ok, endpoint} = HfHub.InferenceEndpoints.update("my-endpoint",
  instance_size: :x2,
  max_replica: 4
)

{:ok, endpoint} = HfHub.InferenceEndpoints.update("my-endpoint",
  repository: "bert-large-uncased"
)
```
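Because `update/2` only touches the options you pass, a targeted change followed by `get/2` is a simple way to confirm the new configuration; a minimal sketch (which fields you inspect on the result depends on the actual `Endpoint.t()` struct):

```elixir
# Scale up for a traffic spike without touching any other settings.
{:ok, _endpoint} = HfHub.InferenceEndpoints.update("my-endpoint", instance_size: :x4)

# Re-fetch to confirm the change took effect.
{:ok, endpoint} = HfHub.InferenceEndpoints.get("my-endpoint")
```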