Vllm.Platforms.Platform (VLLM v0.3.0)


Wrapper for Python class Platform.

Summary

Functions

Checks if the platform allows in-place memory updates.

Check and update the configuration for the current platform.

Check if the dtype is supported by the current platform.

Check max_model_len for the current platform.

Python method Platform.device_id_to_physical_device_id.

Returns the preferred FP8 type on the current platform.

Get the custom compile backend for the current platform.

Determine the CPU architecture of the current system.

Return the memory usage in bytes.

Stateless version of `torch.cuda.get_device_capability`.

Get device specific communicator class for distributed communication.

Get the name of a device.

Get the total memory of a device in bytes.

Get the uuid of a device, e.g. the PCI bus ID.

Return the global graph pool for this platform.

Return the platform-specific values for (-inf, inf).

Returns how much padding the LoRA logits need for kernels.

Python method Platform.get_max_output_tokens.

Returns the nixl memory type for the current platform.

Returns a mapping from device_type to a tuple of supported kv_buffer_device for nixl.

Get the pass manager class for this platform.

Return the punica wrapper for the current platform.

Get static graph wrapper class for static graph.

Python method Platform.get_supported_vit_attn_backends.

Get the vision attention backend class of a device.

Test whether this platform is compatible with a device capability.

Import any platform-specific C kernels.

A device-specific wrapper of torch.inference_mode.

Python method Platform.is_cpu.

Python method Platform.is_cuda.

Stateless version of `torch.cuda.is_available`.

Test whether this platform has exactly the specified device capability.

Returns True if the device capability is any <major>.x.

Returns whether the preferred FP8 type is FNUZ on the current platform.

Python method Platform.is_out_of_tree.

Checks whether pin memory is available on the current platform.

Python method Platform.is_rocm.

Python method Platform.is_sleep_mode_available.

Python method Platform.is_tpu.

Python method Platform.is_unspecified.

Python method Platform.is_xpu.

Wrap the original weight loader to make it synced.

Initialize self. See help(type(self)) for accurate signature.

Returns True if attention is registered as one giant opaque custom op on the current platform.

Do some pre-registration or update action for the current platform.

Set the seed of each random module.

Set some additional forward context for the current platform if needed.

Set the device for the current platform.

Returns whether the hybrid KV cache is supported by the current platform.

Returns whether graph mode is supported by the current platform.

Returns whether the current platform supports FP8 types.

Returns whether the current platform supports MX types.

Whether to use allgather in LogitsProcessor to gather the logits.

Returns whether custom allreduce is supported on the current platform.

Returns whether the current platform needs to sync the weight loader.

Raises if this request is unsupported on this platform.

Verify whether the current platform supports the specified model architecture.

Verify whether the quantization is supported by the current platform.

Types

t()

@opaque t()

Functions

additional_env_vars(ref)

@spec additional_env_vars(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

can_update_inplace(ref, opts \\ [])

@spec can_update_inplace(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Checks if the platform allows in-place memory updates.

Returns

  • boolean()

check_and_update_config(ref, vllm_config, opts \\ [])

@spec check_and_update_config(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Check and update the configuration for the current platform.

It can raise an exception if the configuration is not compatible with the current platform, or it can update the configuration to make it compatible with the current platform.

The config is passed by reference, so it can be modified in place.

Parameters

  • vllm_config (term())

Returns

  • nil
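
Since the config is mutated in place on the Python side, a caller only needs to inspect the result tuple. A hypothetical sketch (assuming a running Snakepit bridge and `platform` and `vllm_config` refs obtained elsewhere):

```elixir
defmodule ConfigCheck do
  alias Vllm.Platforms.Platform

  # Validate (and possibly mutate) a vLLM config ref for this platform.
  # {:ok, nil} means the Python side accepted or adjusted the config;
  # {:error, _} means it raised an incompatibility error.
  def ensure_compatible(platform, vllm_config) do
    case Platform.check_and_update_config(platform, vllm_config) do
      {:ok, nil} -> :ok
      {:error, err} -> {:incompatible, err}
    end
  end
end
```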

check_if_supports_dtype(ref, dtype, opts \\ [])

@spec check_if_supports_dtype(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Check if the dtype is supported by the current platform.

Parameters

  • dtype (term())

Returns

  • term()

check_max_model_len(ref, max_model_len, opts \\ [])

@spec check_max_model_len(SnakeBridge.Ref.t(), integer(), keyword()) ::
  {:ok, integer()} | {:error, Snakepit.Error.t()}

Check max_model_len for the current platform.

Parameters

  • max_model_len (integer())

Returns

  • integer()

device_control_env_var(ref)

@spec device_control_env_var(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

device_id_to_physical_device_id(ref, device_id, opts \\ [])

@spec device_id_to_physical_device_id(SnakeBridge.Ref.t(), integer(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Python method Platform.device_id_to_physical_device_id.

Parameters

  • device_id (integer())

Returns

  • term()

dispatch_key(ref)

@spec dispatch_key(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

dist_backend(ref)

@spec dist_backend(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

fp8_dtype(ref, opts \\ [])

@spec fp8_dtype(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Returns the preferred FP8 type on the current platform.

See the documentation for is_fp8_fnuz for details.

Returns

  • term()

get_attn_backend_cls(ref, selected_backend, attn_selector_config, opts \\ [])

@spec get_attn_backend_cls(SnakeBridge.Ref.t(), term(), term(), keyword()) ::
  {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get the attention backend class of a device.

Parameters

  • selected_backend (term())
  • attn_selector_config (term())

Returns

  • String.t()

get_compile_backend(ref, opts \\ [])

@spec get_compile_backend(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get the custom compile backend for the current platform.

Returns

  • String.t()

get_cpu_architecture(ref, opts \\ [])

@spec get_cpu_architecture(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Determine the CPU architecture of the current system.

Returns CpuArchEnum indicating the architecture type.

Returns

  • term()

get_current_memory_usage(ref, args, opts \\ [])

@spec get_current_memory_usage(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, float()} | {:error, Snakepit.Error.t()}

Return the memory usage in bytes.

Parameters

  • device (term() | String.t() | integer() | nil, default: None)

Returns

  • float()

get_device_capability(ref, args, opts \\ [])

@spec get_device_capability(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Stateless version of `torch.cuda.get_device_capability`.

Parameters

  • device_id (integer() default: 0)

Returns

  • term()

get_device_communicator_cls(ref, opts \\ [])

@spec get_device_communicator_cls(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get device specific communicator class for distributed communication.

Returns

  • String.t()

get_device_name(ref, args, opts \\ [])

@spec get_device_name(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get the name of a device.

Parameters

  • device_id (integer() default: 0)

Returns

  • String.t()

get_device_total_memory(ref, args, opts \\ [])

@spec get_device_total_memory(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, integer()} | {:error, Snakepit.Error.t()}

Get the total memory of a device in bytes.

Parameters

  • device_id (integer() default: 0)

Returns

  • integer()

get_device_uuid(ref, args, opts \\ [])

@spec get_device_uuid(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get the uuid of a device, e.g. the PCI bus ID.

Parameters

  • device_id (integer() default: 0)

Returns

  • String.t()
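
The device query functions all carry the positional device_id in the trailing args list. As a hypothetical helper (assuming a running bridge and a `platform` ref), basic facts about one device could be gathered like this:

```elixir
defmodule DeviceInfo do
  alias Vllm.Platforms.Platform

  # Collect name, total memory, and uuid for a single device.
  # The [device_id] list maps to the Python method's positional argument.
  def summary(platform, device_id \\ 0) do
    with {:ok, name} <- Platform.get_device_name(platform, [device_id]),
         {:ok, total} <- Platform.get_device_total_memory(platform, [device_id]),
         {:ok, uuid} <- Platform.get_device_uuid(platform, [device_id]) do
      {:ok, %{name: name, total_bytes: total, uuid: uuid}}
    end
  end
end
```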

get_global_graph_pool(ref, opts \\ [])

@spec get_global_graph_pool(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Return the global graph pool for this platform.

Returns

  • term()

get_infinity_values(ref, dtype, opts \\ [])

@spec get_infinity_values(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, {float(), float()}} | {:error, Snakepit.Error.t()}

Return the platform-specific values for (-inf, inf).

Parameters

  • dtype (term())

Returns

  • {float(), float()}

get_lora_vocab_padding_size(ref, opts \\ [])

@spec get_lora_vocab_padding_size(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, integer()} | {:error, Snakepit.Error.t()}

Returns how much padding the LoRA logits need for kernels.

Returns

  • integer()

get_max_output_tokens(ref, prompt_len, opts \\ [])

@spec get_max_output_tokens(SnakeBridge.Ref.t(), integer(), keyword()) ::
  {:ok, integer()} | {:error, Snakepit.Error.t()}

Python method Platform.get_max_output_tokens.

Parameters

  • prompt_len (integer())

Returns

  • integer()

get_nixl_memory_type(ref, opts \\ [])

@spec get_nixl_memory_type(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Returns the nixl memory type for the current platform.

Returns

  • term()

get_nixl_supported_devices(ref, opts \\ [])

@spec get_nixl_supported_devices(
  SnakeBridge.Ref.t(),
  keyword()
) ::
  {:ok, %{optional(String.t()) => {String.t(), term()}}}
  | {:error, Snakepit.Error.t()}

Returns a mapping from device_type to a tuple of supported kv_buffer_device for nixl.

Returns

  • %{optional(String.t()) => {String.t(), term()}}

get_pass_manager_cls(ref, opts \\ [])

@spec get_pass_manager_cls(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get the pass manager class for this platform.

It will be registered as a custom pass under the current_platform.pass_key.

Returns

  • String.t()

get_punica_wrapper(ref, opts \\ [])

@spec get_punica_wrapper(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

Return the punica wrapper for the current platform.

Returns

  • String.t()

get_static_graph_wrapper_cls(ref, opts \\ [])

@spec get_static_graph_wrapper_cls(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

Get static graph wrapper class for static graph.

Returns

  • String.t()

get_supported_vit_attn_backends(ref, opts \\ [])

@spec get_supported_vit_attn_backends(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, [term()]} | {:error, Snakepit.Error.t()}

Python method Platform.get_supported_vit_attn_backends.

Returns

  • list(term())

get_vit_attn_backend(ref, head_size, dtype, args, opts \\ [])

@spec get_vit_attn_backend(
  SnakeBridge.Ref.t(),
  integer(),
  term(),
  [term()],
  keyword()
) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Get the vision attention backend class of a device.

NOTE: ViT attention should be checked and overridden in the platform-specific implementation. It should not be overridden anywhere else, such as in model_executor/models/<model_name>.py.

We check if the backend is None or not:

1. If not, check if the backend is supported by the platform.
2. If None, continue to the default selection logic.

Parameters

  • head_size (integer())
  • dtype (term())
  • backend (term() | nil default: None)

Returns

  • term()

has_device_capability(ref, capability, args, opts \\ [])

@spec has_device_capability(SnakeBridge.Ref.t(), term(), [term()], keyword()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}

Test whether this platform is compatible with a device capability.

The capability argument can either be:

  • A tuple (major, minor).
  • An integer <major><minor>. (See DeviceCapability.to_int in vllm.platforms.interface.)

Parameters

  • capability (term())
  • device_id (integer() default: 0)

Returns

  • boolean()
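
Both capability encodings can be used interchangeably. A hypothetical sketch (assuming a running bridge and a `platform` ref) gating a feature on compute capability 8.0 or newer:

```elixir
defmodule CapabilityGate do
  alias Vllm.Platforms.Platform

  # {8, 0} and the integer 80 encode the same capability, so these two
  # helpers are equivalent. The empty list is the optional positional
  # args slot (device_id defaults to 0).
  def at_least_sm80?(platform) do
    case Platform.has_device_capability(platform, {8, 0}, []) do
      {:ok, supported?} -> supported?
      {:error, _} -> false
    end
  end

  def at_least_sm80_int?(platform) do
    case Platform.has_device_capability(platform, 80, []) do
      {:ok, supported?} -> supported?
      {:error, _} -> false
    end
  end
end
```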

import_kernels(ref, opts \\ [])

@spec import_kernels(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, nil} | {:error, Snakepit.Error.t()}

Import any platform-specific C kernels.

Returns

  • nil

inference_mode(ref, opts \\ [])

@spec inference_mode(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

A device-specific wrapper of torch.inference_mode.

This wrapper is recommended because some hardware backends such as TPU do not support torch.inference_mode. In such a case, they will fall back to torch.no_grad by overriding this method.

Returns

  • term()

is_cpu(ref, opts \\ [])

@spec is_cpu(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_cpu.

Returns

  • boolean()

is_cuda(ref, opts \\ [])

@spec is_cuda(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_cuda.

Returns

  • boolean()

is_cuda_alike(ref, opts \\ [])

@spec is_cuda_alike(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Stateless version of `torch.cuda.is_available`.

Returns

  • boolean()

is_device_capability(ref, capability, args, opts \\ [])

@spec is_device_capability(SnakeBridge.Ref.t(), term(), [term()], keyword()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}

Test whether this platform has exactly the specified device capability.

The capability argument can either be:

  • A tuple (major, minor).
  • An integer <major><minor>. (See DeviceCapability.to_int in vllm.platforms.interface.)

Parameters

  • capability (term())
  • device_id (integer() default: 0)

Returns

  • boolean()

is_device_capability_family(ref, capability, args, opts \\ [])

@spec is_device_capability_family(SnakeBridge.Ref.t(), integer(), [term()], keyword()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns True if the device capability is any <major>.x.

Mirrors CUDA 13 'family' architecture semantics (e.g. 10.x, 11.x, 12.x).

Parameters

  • capability (integer())
  • device_id (integer() default: 0)

Returns

  • boolean()

is_fp8_fnuz(ref, opts \\ [])

@spec is_fp8_fnuz(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether the preferred FP8 type is FNUZ on the current platform.

There are two representations of FP8, OCP FP8 and FNUZ FP8. The OCP specification can be found at https://tinyurl.com/b7jvwpft. The FNUZ specification can be found at https://tinyurl.com/5n6hwwu5.

AMD's MI300 and MI325 have native hardware support for FNUZ. All other hardware has converged on the OCP FP8 standard.

Returns

  • boolean()
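
A hypothetical probe combining the FP8 queries above (assuming a running bridge and a `platform` ref): on MI300/MI325 is_fp8_fnuz/1 reports true, everywhere else the OCP FP8 representation applies.

```elixir
defmodule Fp8Probe do
  alias Vllm.Platforms.Platform

  # Gather the platform's FP8 story in one map: whether FP8 is
  # supported at all, whether the FNUZ variant is preferred, and
  # which concrete dtype the platform reports.
  def fp8_info(platform) do
    with {:ok, supported?} <- Platform.supports_fp8(platform),
         {:ok, fnuz?} <- Platform.is_fp8_fnuz(platform),
         {:ok, dtype} <- Platform.fp8_dtype(platform) do
      {:ok, %{supported: supported?, fnuz: fnuz?, dtype: dtype}}
    end
  end
end
```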

is_out_of_tree(ref, opts \\ [])

@spec is_out_of_tree(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_out_of_tree.

Returns

  • boolean()

is_pin_memory_available(ref, opts \\ [])

@spec is_pin_memory_available(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Checks whether pin memory is available on the current platform.

Returns

  • boolean()

is_rocm(ref, opts \\ [])

@spec is_rocm(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_rocm.

Returns

  • boolean()

is_sleep_mode_available(ref, opts \\ [])

@spec is_sleep_mode_available(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_sleep_mode_available.

Returns

  • boolean()

is_tpu(ref, opts \\ [])

@spec is_tpu(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_tpu.

Returns

  • boolean()

is_unspecified(ref, opts \\ [])

@spec is_unspecified(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_unspecified.

Returns

  • boolean()

is_xpu(ref, opts \\ [])

@spec is_xpu(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method Platform.is_xpu.

Returns

  • boolean()

make_synced_weight_loader(ref, original_weight_loader, opts \\ [])

@spec make_synced_weight_loader(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Wrap the original weight loader to make it synced.

Parameters

  • original_weight_loader (term())

Returns

  • term()

new(args, opts \\ [])

@spec new(
  [term()],
  keyword()
) :: {:ok, SnakeBridge.Ref.t()} | {:error, Snakepit.Error.t()}

Initialize self. See help(type(self)) for accurate signature.

Parameters

  • args (term())
  • kwargs (term())

opaque_attention_op(ref, opts \\ [])

@spec opaque_attention_op(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns True if attention is registered as one giant opaque custom op on the current platform.

Returns

  • boolean()

pass_key(ref)

@spec pass_key(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

pre_register_and_update(ref, args, opts \\ [])

@spec pre_register_and_update(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Do some pre-registration or update action for the current platform.

This function is called before the global VllmConfig is initialized or CLI arguments are parsed. It is used by out-of-tree platforms to register or update the configuration.

For example, the out-of-tree quantization config can be imported and registered here dynamically.

Parameters

  • parser (term() default: None)

Returns

  • nil

ray_device_key(ref)

@spec ray_device_key(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

seed_everything(ref, args, opts \\ [])

@spec seed_everything(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Set the seed of each random module.

torch.manual_seed will set seed on all devices.

Loosely based on: https://github.com/Lightning-AI/pytorch-lightning/blob/2.4.0/src/lightning/fabric/utilities/seed.py#L20

Parameters

  • seed (term() default: None)

Returns

  • nil
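
A hypothetical reproducibility helper (assuming a running bridge and a `platform` ref); the seed travels in the positional args list:

```elixir
defmodule Repro do
  alias Vllm.Platforms.Platform

  # Seed every random module (including torch.manual_seed on all
  # devices) before a run so results are reproducible.
  def seed(platform, seed \\ 42) do
    case Platform.seed_everything(platform, [seed]) do
      {:ok, nil} -> :ok
      {:error, err} -> {:error, err}
    end
  end
end
```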

set_additional_forward_context(ref, args, opts \\ [])

@spec set_additional_forward_context(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, %{optional(String.t()) => term()}} | {:error, Snakepit.Error.t()}

Set some additional forward context for the current platform if needed.

Parameters

  • args (term())
  • kwargs (term())

Returns

  • %{optional(String.t()) => term()}

set_device(ref, device, opts \\ [])

@spec set_device(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Set the device for the current platform.

Parameters

  • device (term())

Returns

  • nil

simple_compile_backend(ref)

@spec simple_compile_backend(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

stateless_init_device_torch_dist_pg(ref, backend, prefix_store, group_rank, group_size, timeout, opts \\ [])

@spec stateless_init_device_torch_dist_pg(
  SnakeBridge.Ref.t(),
  String.t(),
  term(),
  integer(),
  integer(),
  term(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Init platform-specific torch distributed process group.

Parameters

  • backend (String.t())
  • prefix_store (term())
  • group_rank (integer())
  • group_size (integer())
  • timeout (term())

Returns

  • term()

support_hybrid_kv_cache(ref, opts \\ [])

@spec support_hybrid_kv_cache(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether the hybrid KV cache is supported by the current platform.

Returns

  • boolean()

support_static_graph_mode(ref, opts \\ [])

@spec support_static_graph_mode(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether graph mode is supported by the current platform.

Returns

  • boolean()

supported_dtypes(ref)

@spec supported_dtypes(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

supported_quantization(ref)

@spec supported_quantization(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

supports_fp8(ref, opts \\ [])

@spec supports_fp8(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether the current platform supports FP8 types.

Returns

  • boolean()

supports_mx(ref, opts \\ [])

@spec supports_mx(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether the current platform supports MX types.

Returns

  • boolean()

use_all_gather(ref, opts \\ [])

@spec use_all_gather(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Whether to use allgather in LogitsProcessor to gather the logits.

Returns

  • boolean()

use_custom_allreduce(ref, opts \\ [])

@spec use_custom_allreduce(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether custom allreduce is supported on the current platform.

Returns

  • boolean()

use_sync_weight_loader(ref, opts \\ [])

@spec use_sync_weight_loader(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns whether the current platform needs to sync the weight loader.

Returns

  • boolean()

validate_request(ref, prompt, params, processed_inputs, opts \\ [])

@spec validate_request(SnakeBridge.Ref.t(), term(), term(), term(), keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Raises if this request is unsupported on this platform.

Parameters

  • prompt (term())
  • params (term())
  • processed_inputs (term())

Returns

  • nil

verify_model_arch(ref, model_arch, opts \\ [])

@spec verify_model_arch(SnakeBridge.Ref.t(), String.t(), keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Verify whether the current platform supports the specified model architecture.

  • This will raise an Error or Warning based on the model support on the current platform.
  • By default all models are considered supported.

Parameters

  • model_arch (String.t())

Returns

  • nil

verify_quantization(ref, quant, opts \\ [])

@spec verify_quantization(SnakeBridge.Ref.t(), String.t(), keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Verify whether the quantization is supported by the current platform.

Parameters

  • quant (String.t())

Returns

  • nil
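
Both verification functions return {:ok, nil} on success and surface a Python-side error otherwise, so they compose naturally into a preflight check. A hypothetical sketch (assuming a running bridge and a `platform` ref):

```elixir
defmodule PreflightCheck do
  alias Vllm.Platforms.Platform

  # Verify the model architecture and quantization scheme before
  # loading anything expensive. The first failing check short-circuits
  # and returns its {:error, _} tuple.
  def run(platform, model_arch, quant) do
    with {:ok, nil} <- Platform.verify_model_arch(platform, model_arch),
         {:ok, nil} <- Platform.verify_quantization(platform, quant) do
      :ok
    end
  end
end
```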