Vllm.Platforms.Platform (VLLM v0.3.0)


Wrapper for Python class Platform.

Summary

Functions

Checks if the platform allows in-place memory updates.

Check and update the configuration for the current platform.

Check if the dtype is supported by the current platform.

Check max_model_len for the current platform.

Python method Platform.device_id_to_physical_device_id.

Returns the preferred FP8 type on the current platform.

Get the custom compile backend for the current platform.

Determine the CPU architecture of the current system.

Return the memory usage in bytes.

Stateless version of `torch.cuda.get_device_capability`.

Get device specific communicator class for distributed communication.

Get the name of a device.

Get the total memory of a device in bytes.

Get the uuid of a device, e.g. the PCI bus ID.

Return the global graph pool for this platform.

Return the platform-specific values for (-inf, inf).

Returns how much padding the LoRA logits need for kernels.

Python method Platform.get_max_output_tokens.

Returns the nixl memory type for the current platform.

Returns a mapping from device_type to a tuple of supported kv_buffer_device for nixl.

Get the pass manager class for this platform.

Return the punica wrapper for the current platform.

Get static graph wrapper class for static graph.

Python method Platform.get_supported_vit_attn_backends.

Get the vision attention backend class of a device.

Test whether this platform is compatible with a device capability.

Import any platform-specific C kernels.

A device-specific wrapper of torch.inference_mode.

Python method Platform.is_cpu.

Python method Platform.is_cuda.

Stateless version of `torch.cuda.is_available`.

Test whether this platform has exactly the specified device capability.

Returns True if the device capability is any <major>.x.

Returns whether the preferred FP8 type is FNUZ on the current platform.

Python method Platform.is_out_of_tree.

Checks whether pin memory is available on the current platform.

Python method Platform.is_rocm.

Python method Platform.is_sleep_mode_available.

Python method Platform.is_tpu.

Python method Platform.is_unspecified.

Python method Platform.is_xpu.

Wrap the original weight loader to make it synced.

Initialize self. See help(type(self)) for accurate signature.

Returns True if attention is registered as one giant opaque custom op on the current platform.

Do some pre-registration or update action for the current platform.

Set the seed of each random module.

Set some additional forward context for the current platform if needed.

Set the device for the current platform.

Returns whether the hybrid KV cache is supported by the current platform.

Returns whether graph mode is supported by the current platform.

Returns whether the current platform supports FP8 types.

Returns whether the current platform supports MX types.

Whether to use allgather in LogitsProcessor to gather the logits.

Returns whether custom allreduce is supported on the current platform.

Returns whether the current platform needs to sync the weight loader.

Raises if this request is unsupported on this platform.

Verify whether the current platform supports the specified model architecture.

Verify whether the quantization is supported by the current platform.

Types

t()

@opaque t()

Functions

additional_env_vars(ref)

@spec additional_env_vars(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

can_update_inplace(ref, opts \\ [])

@spec can_update_inplace(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Checks if the platform allows in-place memory updates.

Returns

  • boolean()

check_and_update_config(ref, vllm_config, opts \\ [])

@spec check_and_update_config(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Check and update the configuration for the current platform.

It can raise an exception if the configuration is not compatible with the current platform, or it can update the configuration to make it compatible with the current platform.

The config is passed by reference, so it can be modified in place.

Parameters

  • vllm_config (term())

Returns

  • nil
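
Since the config is mutated in place on the Python side, a caller only needs to inspect the result tuple. A hypothetical sketch (assuming a running Snakepit bridge and `platform` and `vllm_config` refs obtained elsewhere):

```elixir
defmodule ConfigCheck do
  alias Vllm.Platforms.Platform

  # Validate (and possibly mutate) a vLLM config ref for this platform.
  # {:ok, nil} means the Python side accepted or adjusted the config;
  # {:error, _} means it raised an incompatibility error.
  def ensure_compatible(platform, vllm_config) do
    case Platform.check_and_update_config(platform, vllm_config) do
      {:ok, nil} -> :ok
      {:error, err} -> {:incompatible, err}
    end
  end
end
```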

check_if_supports_dtype(ref, dtype, opts \\ [])

@spec check_if_supports_dtype(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Check if the dtype is supported by the current platform.

Parameters

  • dtype (term())

Returns

  • term()

check_max_model_len(ref, max_model_len, opts \\ [])

@spec check_max_model_len(SnakeBridge.Ref.t(), integer(), keyword()) ::
  {:ok, integer()} | {:error, Snakepit.Error.t()}

Check max_model_len for the current platform.

Parameters

  • max_model_len (integer())

Returns

  • integer()

device_control_env_var(ref)

@spec device_control_env_var(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

device_id_to_physical_device_id(ref, device_id, opts \\ [])

@spec device_id_to_physical_device_id(SnakeBridge.Ref.t(), integer(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Python method Platform.device_id_to_physical_device_id.

Parameters

  • device_id (integer())

Returns

  • term()

dispatch_key(ref)

@spec dispatch_key(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

dist_backend(ref)

@spec dist_backend(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

fp8_dtype(ref, opts \\ [])

@spec fp8_dtype(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Returns the preferred FP8 type on the current platform.

See the documentation for is_fp8_fnuz for details.

Returns

  • term()

get_attn_backend_cls(ref, selected_backend, attn_selector_config, opts \\ [])

@spec get_attn_backend_cls(SnakeBridge.Ref.t(), term(), term(), keyword()) ::
  {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get the attention backend class of a device.

Parameters

  • selected_backend (term())
  • attn_selector_config (term())

Returns

  • String.t()

get_compile_backend(ref, opts \\ [])

@spec get_compile_backend(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get the custom compile backend for the current platform.

Returns

  • String.t()

get_cpu_architecture(ref, opts \\ [])

@spec get_cpu_architecture(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Determine the CPU architecture of the current system.

Returns CpuArchEnum indicating the architecture type.

Returns

  • term()

get_current_memory_usage(ref, args, opts \\ [])

@spec get_current_memory_usage(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, float()} | {:error, Snakepit.Error.t()}

Return the memory usage in bytes.

Parameters

  • device (term() | String.t() | integer() | nil, default: None)

Returns

  • float()

get_device_capability(ref, args, opts \\ [])

@spec get_device_capability(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Stateless version of `torch.cuda.get_device_capability`.

Parameters

  • device_id (integer() default: 0)

Returns

  • term()

get_device_communicator_cls(ref, opts \\ [])

@spec get_device_communicator_cls(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get device specific communicator class for distributed communication.

Returns

  • String.t()

get_device_name(ref, args, opts \\ [])

@spec get_device_name(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get the name of a device.

Parameters

  • device_id (integer() default: 0)

Returns

  • String.t()

get_device_total_memory(ref, args, opts \\ [])

@spec get_device_total_memory(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, integer()} | {:error, Snakepit.Error.t()}

Get the total memory of a device in bytes.

Parameters

  • device_id (integer() default: 0)

Returns

  • integer()

get_device_uuid(ref, args, opts \\ [])

@spec get_device_uuid(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get the uuid of a device, e.g. the PCI bus ID.

Parameters

  • device_id (integer() default: 0)

Returns

  • String.t()
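
The device query functions all carry the positional device_id in the trailing args list. As a hypothetical helper (assuming a running bridge and a `platform` ref), basic facts about one device could be gathered like this:

```elixir
defmodule DeviceInfo do
  alias Vllm.Platforms.Platform

  # Collect name, total memory, and uuid for a single device.
  # The [device_id] list maps to the Python method's positional argument.
  def summary(platform, device_id \\ 0) do
    with {:ok, name} <- Platform.get_device_name(platform, [device_id]),
         {:ok, total} <- Platform.get_device_total_memory(platform, [device_id]),
         {:ok, uuid} <- Platform.get_device_uuid(platform, [device_id]) do
      {:ok, %{name: name, total_bytes: total, uuid: uuid}}
    end
  end
end
```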

get_global_graph_pool(ref, opts \\ [])

@spec get_global_graph_pool(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Return the global graph pool for this platform.

Returns

  • term()

get_infinity_values(ref, dtype, opts \\ [])

@spec get_infinity_values(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, {float(), float()}} | {:error, Snakepit.Error.t()}

Return the platform-specific values for (-inf, inf).

Parameters

  • dtype (term())

Returns

  • {float(), float()}

get_lora_vocab_padding_size(ref, opts \\ [])

@spec get_lora_vocab_padding_size(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, integer()} | {:error, Snakepit.Error.t()}

Returns how much padding the LoRA logits need for kernels.

Returns

  • integer()

get_max_output_tokens(ref, prompt_len, opts \\ [])

@spec get_max_output_tokens(SnakeBridge.Ref.t(), integer(), keyword()) ::
  {:ok, integer()} | {:error, Snakepit.Error.t()}

Python method Platform.get_max_output_tokens.

Parameters

  • prompt_len (integer())

Returns

  • integer()

get_nixl_memory_type(ref, opts \\ [])

@spec get_nixl_memory_type(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Returns the nixl memory type for the current platform.

Returns

  • term()

get_nixl_supported_devices(ref, opts \\ [])

@spec get_nixl_supported_devices(
  SnakeBridge.Ref.t(),
  keyword()
) ::
  {:ok, %{optional(String.t()) => {String.t(), term()}}}
  | {:error, Snakepit.Error.t()}

Returns a mapping from device_type to a tuple of supported kv_buffer_device for nixl.

Returns

  • %{optional(String.t()) => {String.t(), term()}}

get_pass_manager_cls(ref, opts \\ [])

@spec get_pass_manager_cls(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get the pass manager class for this platform.

It will be registered as a custom pass under the current_platform.pass_key.

Returns

  • String.t()

get_punica_wrapper(ref, opts \\ [])

@spec get_punica_wrapper(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

Return the punica wrapper for the current platform.

Returns

  • String.t()

get_static_graph_wrapper_cls(ref, opts \\ [])

@spec get_static_graph_wrapper_cls(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get static graph wrapper class for static graph.

Returns

  • String.t()

get_supported_vit_attn_backends(ref, opts \\ [])

@spec get_supported_vit_attn_backends(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, [term()]} | {:error, Snakepit.Error.t()}

Python method Platform.get_supported_vit_attn_backends.

Returns

  • list(term())

get_vit_attn_backend(ref, head_size, dtype, args, opts \\ [])

@spec get_vit_attn_backend(
  SnakeBridge.Ref.t(),
  integer(),
  term(),
  [term()],
  keyword()
) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Get the vision attention backend class of a device.

NOTE: ViT attention should be checked and overridden in the platform-specific implementation. It should not be overridden anywhere else, such as in model_executor/models/<model_name>.py.

We check if the backend is None or not:

1. If not, check if the backend is supported by the platform.
2. If None, continue to the default selection logic.

Parameters

  • head_size (integer())
  • dtype (term())
  • backend (term() | nil default: None)

Returns

  • term()

has_device_capability(ref, capability, args, opts \\ [])

@spec has_device_capability(SnakeBridge.Ref.t(), term(), [term()], keyword()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}

Test whether this platform is compatible with a device capability.

The capability argument can either be:

  • A tuple (major, minor).
  • An integer <major><minor>. (See DeviceCapability.to_int in vllm.platforms.interface.)

Parameters

  • capability (term())
  • device_id (integer() default: 0)

Returns

  • boolean()
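
Both capability encodings can be used interchangeably. A hypothetical sketch (assuming a running bridge and a `platform` ref) gating a feature on compute capability 8.0 or newer:

```elixir
defmodule CapabilityGate do
  alias Vllm.Platforms.Platform

  # {8, 0} and the integer 80 encode the same capability, so these two
  # helpers are equivalent. The empty list is the optional positional
  # args slot (device_id defaults to 0).
  def at_least_sm80?(platform) do
    case Platform.has_device_capability(platform, {8, 0}, []) do
      {:ok, supported?} -> supported?
      {:error, _} -> false
    end
  end

  def at_least_sm80_int?(platform) do
    case Platform.has_device_capability(platform, 80, []) do
      {:ok, supported?} -> supported?
      {:error, _} -> false
    end
  end
end
```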

import_kernels(ref, opts \\ [])

@spec import_kernels(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, nil} | {:error, Snakepit.Error.t()}

Import any platform-specific C kernels.

Returns

  • nil

inference_mode(ref, opts \\ [])

@spec inference_mode(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

A device-specific wrapper of torch.inference_mode.

This wrapper is recommended because some hardware backends such as TPU do not support torch.inference_mode. In such a case, they will fall back to torch.no_grad by overriding this method.

Returns

  • term()

is_cpu(ref, opts \\ [])

@spec is_cpu(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_cpu.

Returns

  • boolean()

is_cuda(ref, opts \\ [])

@spec is_cuda(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_cuda.

Returns

  • boolean()

is_cuda_alike(ref, opts \\ [])

@spec is_cuda_alike(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Stateless version of `torch.cuda.is_available`.

Returns

  • boolean()

is_device_capability(ref, capability, args, opts \\ [])

@spec is_device_capability(SnakeBridge.Ref.t(), term(), [term()], keyword()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}

Test whether this platform has exactly the specified device capability.

The capability argument can either be:

  • A tuple (major, minor).
  • An integer <major><minor>. (See DeviceCapability.to_int in vllm.platforms.interface.)

Parameters

  • capability (term())
  • device_id (integer() default: 0)

Returns

  • boolean()

is_device_capability_family(ref, capability, args, opts \\ [])

@spec is_device_capability_family(SnakeBridge.Ref.t(), integer(), [term()], keyword()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns True if the device capability is any <major>.x.

Mirrors CUDA 13 'family' architecture semantics (e.g. 10.x, 11.x, 12.x).

Parameters

  • capability (integer())
  • device_id (integer() default: 0)

Returns

  • boolean()

is_fp8_fnuz(ref, opts \\ [])

@spec is_fp8_fnuz(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether the preferred FP8 type is FNUZ on the current platform.

There are two representations of FP8, OCP FP8 and FNUZ FP8. The OCP specification can be found at https://tinyurl.com/b7jvwpft. The FNUZ specification can be found at https://tinyurl.com/5n6hwwu5.

AMD's MI300 and MI325 have native hardware support for FNUZ. All other hardware has converged on the OCP FP8 standard.

Returns

  • boolean()
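
A hypothetical probe combining the FP8 queries above (assuming a running bridge and a `platform` ref): on MI300/MI325 is_fp8_fnuz/1 reports true, everywhere else the OCP FP8 representation applies.

```elixir
defmodule Fp8Probe do
  alias Vllm.Platforms.Platform

  # Gather the platform's FP8 story in one map: whether FP8 is
  # supported at all, whether the FNUZ variant is preferred, and
  # which concrete dtype the platform reports.
  def fp8_info(platform) do
    with {:ok, supported?} <- Platform.supports_fp8(platform),
         {:ok, fnuz?} <- Platform.is_fp8_fnuz(platform),
         {:ok, dtype} <- Platform.fp8_dtype(platform) do
      {:ok, %{supported: supported?, fnuz: fnuz?, dtype: dtype}}
    end
  end
end
```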

is_out_of_tree(ref, opts \\ [])

@spec is_out_of_tree(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_out_of_tree.

Returns

  • boolean()

is_pin_memory_available(ref, opts \\ [])

@spec is_pin_memory_available(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Checks whether pin memory is available on the current platform.

Returns

  • boolean()

is_rocm(ref, opts \\ [])

@spec is_rocm(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_rocm.

Returns

  • boolean()

is_sleep_mode_available(ref, opts \\ [])

@spec is_sleep_mode_available(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_sleep_mode_available.

Returns

  • boolean()

is_tpu(ref, opts \\ [])

@spec is_tpu(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_tpu.

Returns

  • boolean()

is_unspecified(ref, opts \\ [])

@spec is_unspecified(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_unspecified.

Returns

  • boolean()

is_xpu(ref, opts \\ [])

@spec is_xpu(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_xpu.

Returns

  • boolean()

make_synced_weight_loader(ref, original_weight_loader, opts \\ [])

@spec make_synced_weight_loader(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Wrap the original weight loader to make it synced.

Parameters

  • original_weight_loader (term())

Returns

  • term()

new(args, opts \\ [])

@spec new(
  [term()],
  keyword()
) :: {:ok, SnakeBridge.Ref.t()} | {:error, Snakepit.Error.t()}

Initialize self. See help(type(self)) for accurate signature.

Parameters

  • args (term())
  • kwargs (term())

opaque_attention_op(ref, opts \\ [])

@spec opaque_attention_op(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns True if attention is registered as one giant opaque custom op on the current platform.

Returns

  • boolean()

pass_key(ref)

@spec pass_key(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

pre_register_and_update(ref, args, opts \\ [])

@spec pre_register_and_update(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Do some pre-registration or update action for the current platform.

This function is called before the global VllmConfig is initialized or CLI arguments are parsed. It is used by out-of-tree platforms to register or update the configuration.

For example, the out-of-tree quantization config can be imported and registered here dynamically.

Parameters

  • parser (term() default: None)

Returns

  • nil

ray_device_key(ref)

@spec ray_device_key(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

seed_everything(ref, args, opts \\ [])

@spec seed_everything(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Set the seed of each random module.

torch.manual_seed will set seed on all devices.

Loosely based on: https://github.com/Lightning-AI/pytorch-lightning/blob/2.4.0/src/lightning/fabric/utilities/seed.py#L20

Parameters

  • seed (term() default: None)

Returns

  • nil
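
A hypothetical reproducibility helper (assuming a running bridge and a `platform` ref); the seed travels in the positional args list:

```elixir
defmodule Repro do
  alias Vllm.Platforms.Platform

  # Seed every random module (including torch.manual_seed on all
  # devices) before a run so results are reproducible.
  def seed(platform, seed \\ 42) do
    case Platform.seed_everything(platform, [seed]) do
      {:ok, nil} -> :ok
      {:error, err} -> {:error, err}
    end
  end
end
```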

set_additional_forward_context(ref, args, opts \\ [])

@spec set_additional_forward_context(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, %{optional(String.t()) => term()}} | {:error, Snakepit.Error.t()}

Set some additional forward context for the current platform if needed.

Parameters

  • args (term())
  • kwargs (term())

Returns

  • %{optional(String.t()) => term()}

set_device(ref, device, opts \\ [])

@spec set_device(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Set the device for the current platform.

Parameters

  • device (term())

Returns

  • nil

simple_compile_backend(ref)

@spec simple_compile_backend(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

stateless_init_device_torch_dist_pg(ref, backend, prefix_store, group_rank, group_size, timeout, opts \\ [])

@spec stateless_init_device_torch_dist_pg(
  SnakeBridge.Ref.t(),
  String.t(),
  term(),
  integer(),
  integer(),
  term(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Init platform-specific torch distributed process group.

Parameters

  • backend (String.t())
  • prefix_store (term())
  • group_rank (integer())
  • group_size (integer())
  • timeout (term())

Returns

  • term()

support_hybrid_kv_cache(ref, opts \\ [])

@spec support_hybrid_kv_cache(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether the hybrid KV cache is supported by the current platform.

Returns

  • boolean()

support_static_graph_mode(ref, opts \\ [])

@spec support_static_graph_mode(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether graph mode is supported by the current platform.

Returns

  • boolean()

supported_dtypes(ref)

@spec supported_dtypes(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

supported_quantization(ref)

@spec supported_quantization(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

supports_fp8(ref, opts \\ [])

@spec supports_fp8(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether the current platform supports FP8 types.

Returns

  • boolean()

supports_mx(ref, opts \\ [])

@spec supports_mx(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether the current platform supports MX types.

Returns

  • boolean()

use_all_gather(ref, opts \\ [])

@spec use_all_gather(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Whether to use allgather in LogitsProcessor to gather the logits.

Returns

  • boolean()

use_custom_allreduce(ref, opts \\ [])

@spec use_custom_allreduce(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether custom allreduce is supported on the current platform.

Returns

  • boolean()

use_sync_weight_loader(ref, opts \\ [])

@spec use_sync_weight_loader(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether the current platform needs to sync the weight loader.

Returns

  • boolean()

validate_request(ref, prompt, params, processed_inputs, opts \\ [])

@spec validate_request(SnakeBridge.Ref.t(), term(), term(), term(), keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Raises if this request is unsupported on this platform.

Parameters

  • prompt (term())
  • params (term())
  • processed_inputs (term())

Returns

  • nil

verify_model_arch(ref, model_arch, opts \\ [])

@spec verify_model_arch(SnakeBridge.Ref.t(), String.t(), keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Verify whether the current platform supports the specified model architecture.

  • This will raise an Error or Warning based on the model support on the current platform.
  • By default all models are considered supported.

Parameters

  • model_arch (String.t())

Returns

  • nil

verify_quantization(ref, quant, opts \\ [])

@spec verify_quantization(SnakeBridge.Ref.t(), String.t(), keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Verify whether the quantization is supported by the current platform.

Parameters

  • quant (String.t())

Returns

  • nil
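
Both verification functions return {:ok, nil} on success and surface a Python-side error otherwise, so they compose naturally into a preflight check. A hypothetical sketch (assuming a running bridge and a `platform` ref):

```elixir
defmodule PreflightCheck do
  alias Vllm.Platforms.Platform

  # Verify the model architecture and quantization scheme before
  # loading anything expensive. The first failing check short-circuits
  # and returns its {:error, _} tuple.
  def run(platform, model_arch, quant) do
    with {:ok, nil} <- Platform.verify_model_arch(platform, model_arch),
         {:ok, nil} <- Platform.verify_quantization(platform, quant) do
      :ok
    end
  end
end
```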