Wrapper for Python class Platform.
Types
Functions
@spec additional_env_vars(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec can_update_inplace( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Checks whether the platform allows in-place memory updates.
Returns
boolean()
@spec check_and_update_config(SnakeBridge.Ref.t(), term(), keyword()) :: {:ok, nil} | {:error, Snakepit.Error.t()}
Check and update the configuration for the current platform.
It can raise an exception if the configuration is not compatible with the current platform, or it can update the configuration to make it compatible with the current platform.
The config is passed by reference, so it can be modified in place.
Parameters
vllm_config(term())
Returns
nil
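The "raise or update in place" contract described above can be sketched on the Python side as follows. This is a minimal illustration only: `ToyConfig`, its fields, and the specific checks are hypothetical stand-ins, not the real `vllm_config` object or any platform's actual rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the real vllm_config object.
    dtype: str = "float64"
    block_size: Optional[int] = None


def toy_check_and_update_config(config: ToyConfig) -> None:
    """Raise on an incompatible setting, otherwise patch the config in place."""
    if config.dtype == "float64":
        # Incompatible with this (imaginary) platform: reject it.
        raise ValueError("float64 is not supported on this toy platform")
    if config.block_size is None:
        # Missing value: fill in a platform default by mutating the argument.
        config.block_size = 16


cfg = ToyConfig(dtype="bfloat16")
result = toy_check_and_update_config(cfg)
```

Because the object is mutated rather than copied, the function returns nothing and the caller observes the change on the object it passed in.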
@spec check_if_supports_dtype(SnakeBridge.Ref.t(), term(), keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Check if the dtype is supported by the current platform.
Parameters
dtype(term())
Returns
term()
@spec check_max_model_len(SnakeBridge.Ref.t(), integer(), keyword()) :: {:ok, integer()} | {:error, Snakepit.Error.t()}
Check max_model_len for the current platform.
Parameters
max_model_len(integer())
Returns
integer()
@spec device_control_env_var(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec device_id_to_physical_device_id(SnakeBridge.Ref.t(), integer(), keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python method Platform.device_id_to_physical_device_id.
Parameters
device_id(integer())
Returns
term()
@spec dispatch_key(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec dist_backend(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec fp8_dtype( SnakeBridge.Ref.t(), keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Returns the preferred FP8 type on the current platform.
See the documentation for is_fp8_fnuz for details.
Returns
term()
@spec get_attn_backend_cls(SnakeBridge.Ref.t(), term(), term(), keyword()) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}
Get the attention backend class of a device.
Parameters
selected_backend(term())
attn_selector_config(term())
Returns
String.t()
@spec get_compile_backend( SnakeBridge.Ref.t(), keyword() ) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}
Get the custom compile backend for current platform.
Returns
String.t()
@spec get_cpu_architecture( SnakeBridge.Ref.t(), keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Determine the CPU architecture of the current system.
Returns CpuArchEnum indicating the architecture type.
Returns
term()
@spec get_current_memory_usage(SnakeBridge.Ref.t(), [term()], keyword()) :: {:ok, float()} | {:error, Snakepit.Error.t()}
Return the memory usage in bytes.
Parameters
device(((term() | String.t()) | integer()) | nil default: None)
Returns
float()
@spec get_device_capability(SnakeBridge.Ref.t(), [term()], keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Stateless version of torch.cuda.get_device_capability.
Parameters
device_id(integer() default: 0)
Returns
term()
@spec get_device_communicator_cls( SnakeBridge.Ref.t(), keyword() ) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}
Get device specific communicator class for distributed communication.
Returns
String.t()
@spec get_device_name(SnakeBridge.Ref.t(), [term()], keyword()) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}
Get the name of a device.
Parameters
device_id(integer() default: 0)
Returns
String.t()
@spec get_device_total_memory(SnakeBridge.Ref.t(), [term()], keyword()) :: {:ok, integer()} | {:error, Snakepit.Error.t()}
Get the total memory of a device in bytes.
Parameters
device_id(integer() default: 0)
Returns
integer()
@spec get_device_uuid(SnakeBridge.Ref.t(), [term()], keyword()) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}
Get the uuid of a device, e.g. the PCI bus ID.
Parameters
device_id(integer() default: 0)
Returns
String.t()
@spec get_global_graph_pool( SnakeBridge.Ref.t(), keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Return the global graph pool for this platform.
Returns
term()
@spec get_infinity_values(SnakeBridge.Ref.t(), term(), keyword()) :: {:ok, {float(), float()}} | {:error, Snakepit.Error.t()}
Return the platform-specific values for (-inf, inf).
Parameters
dtype(term())
Returns
{float(), float()}
@spec get_lora_vocab_padding_size( SnakeBridge.Ref.t(), keyword() ) :: {:ok, integer()} | {:error, Snakepit.Error.t()}
Returns how much padding the LoRA logits need for kernels
Returns
integer()
@spec get_max_output_tokens(SnakeBridge.Ref.t(), integer(), keyword()) :: {:ok, integer()} | {:error, Snakepit.Error.t()}
Python method Platform.get_max_output_tokens.
Parameters
prompt_len(integer())
Returns
integer()
@spec get_nixl_memory_type( SnakeBridge.Ref.t(), keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Returns the nixl memory type for the current platform.
Returns
term()
@spec get_nixl_supported_devices( SnakeBridge.Ref.t(), keyword() ) :: {:ok, %{optional(String.t()) => {String.t(), term()}}} | {:error, Snakepit.Error.t()}
Returns a mapping from device_type to a tuple of supported kv_buffer_device for nixl.
Returns
%{optional(String.t()) => {String.t(), term()}}
@spec get_pass_manager_cls( SnakeBridge.Ref.t(), keyword() ) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}
Get the pass manager class for this platform.
It will be registered as a custom pass under the current_platform.pass_key.
Returns
String.t()
@spec get_punica_wrapper( SnakeBridge.Ref.t(), keyword() ) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}
Return the punica wrapper for current platform.
Returns
String.t()
@spec get_static_graph_wrapper_cls( SnakeBridge.Ref.t(), keyword() ) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}
Get static graph wrapper class for static graph.
Returns
String.t()
@spec get_supported_vit_attn_backends( SnakeBridge.Ref.t(), keyword() ) :: {:ok, [term()]} | {:error, Snakepit.Error.t()}
Python method Platform.get_supported_vit_attn_backends.
Returns
list(term())
@spec get_vit_attn_backend( SnakeBridge.Ref.t(), integer(), term(), [term()], keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Get the vision attention backend class of a device.
NOTE: The ViT attention backend should be checked and overridden in the platform-specific implementation. It should not be overridden anywhere else, such as model_executor/models/<model_name>.py.
We check whether the backend is None:
1. If it is not None, check that the backend is supported by the platform.
2. If it is None, continue to the default selection logic.
Parameters
head_size(integer())
dtype(term())
backend(term() | nil default: None)
Returns
term()
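The two-step selection described for get_vit_attn_backend can be sketched as below. The backend names, the default, and the function name are hypothetical placeholders chosen for illustration; the real supported set comes from get_supported_vit_attn_backends and varies by platform.

```python
from typing import Optional, Sequence

# Hypothetical values, used only to make the sketch runnable.
TOY_SUPPORTED = ("FLASH_ATTN", "TORCH_SDPA")
TOY_DEFAULT = "TORCH_SDPA"


def pick_vit_backend(
    backend: Optional[str],
    supported: Sequence[str] = TOY_SUPPORTED,
    default: str = TOY_DEFAULT,
) -> str:
    if backend is not None:
        # Step 1: an explicit request must be supported by the platform.
        if backend not in supported:
            raise ValueError(f"{backend} is not supported; choose from {list(supported)}")
        return backend
    # Step 2: nothing requested, so fall through to the default selection.
    return default


explicit = pick_vit_backend("FLASH_ATTN")
fallback = pick_vit_backend(None)
```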
@spec has_device_capability(SnakeBridge.Ref.t(), term(), [term()], keyword()) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Test whether this platform is compatible with a device capability.
The capability argument can either be:
- A tuple (major, minor).
- An integer <major><minor>. (See DeviceCapability.to_int in vllm.platforms.interface.)
Parameters
capability(term())
device_id(integer() default: 0)
Returns
boolean()
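The two accepted forms of the capability argument can be illustrated with a small sketch. It assumes the integer form is `major * 10 + minor` and that "compatible" means "at least the requested capability"; the class and function names are hypothetical and are not the wrapper's API.

```python
from typing import NamedTuple, Tuple, Union


class Capability(NamedTuple):
    major: int
    minor: int

    def to_int(self) -> int:
        # Assumed encoding: a single decimal digit is reserved for the minor version.
        assert 0 <= self.minor < 10
        return self.major * 10 + self.minor


def has_capability(current: Capability, required: Union[Tuple[int, int], int]) -> bool:
    """True if `current` is at least `required` (tuple or integer form)."""
    if isinstance(required, tuple):
        return current >= required  # lexicographic tuple comparison
    return current.to_int() >= required


def is_exact_capability(current: Capability, required: Union[Tuple[int, int], int]) -> bool:
    """True only if `current` matches `required` exactly."""
    if isinstance(required, tuple):
        return current == required
    return current.to_int() == required


device = Capability(8, 6)
```

Under these assumptions a device reporting (8, 6) satisfies a requirement of (8, 0) or 80, but is not an exact match for either.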
@spec import_kernels( SnakeBridge.Ref.t(), keyword() ) :: {:ok, nil} | {:error, Snakepit.Error.t()}
Import any platform-specific C kernels.
Returns
nil
@spec inference_mode( SnakeBridge.Ref.t(), keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
A device-specific wrapper of torch.inference_mode.
This wrapper is recommended because some hardware backends such as TPU
do not support torch.inference_mode. In such a case, they will fall
back to torch.no_grad by overriding this method.
Returns
term()
@spec is_cpu( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Python method Platform.is_cpu.
Returns
boolean()
@spec is_cuda( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Python method Platform.is_cuda.
Returns
boolean()
@spec is_cuda_alike( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Stateless version of torch.cuda.is_available.
Returns
boolean()
@spec is_device_capability(SnakeBridge.Ref.t(), term(), [term()], keyword()) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Test whether this platform has exactly the specified device capability.
The capability argument can either be:
- A tuple (major, minor).
- An integer <major><minor>. (See DeviceCapability.to_int in vllm.platforms.interface.)
Parameters
capability(term())
device_id(integer() default: 0)
Returns
boolean()
@spec is_device_capability_family(SnakeBridge.Ref.t(), integer(), [term()], keyword()) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Returns True if the device capability is any <major>.x.
Mirrors CUDA 13 'family' architecture semantics (e.g. 10.x, 11.x, 12.x).
Parameters
capability(integer())
device_id(integer() default: 0)
Returns
boolean()
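A family check of this kind can be sketched as a comparison of major versions. The sketch assumes both values use the integer `<major><minor>` encoding with one digit for the minor version, so that integer division by 10 yields the major version; the function name is hypothetical.

```python
def is_capability_family(current: int, family: int) -> bool:
    """True if `current` and `family` share the same major version."""
    # Dropping the last decimal digit leaves the major version.
    return current // 10 == family // 10


same_family = is_capability_family(103, 100)   # 10.3 against the 10.x family
other_family = is_capability_family(90, 100)   # 9.0 against the 10.x family
```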
@spec is_fp8_fnuz( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Returns whether the preferred FP8 type is FNUZ on the current platform.
There are two representations of FP8, OCP FP8 and FNUZ FP8. The OCP specification can be found at https://tinyurl.com/b7jvwpft. The FNUZ specification can be found at https://tinyurl.com/5n6hwwu5.
AMD's MI300 and MI325 have native hardware support for FNUZ. All other hardware has converged on the OCP FP8 standard.
Returns
boolean()
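To make the difference between the two representations concrete, here is a small pure-Python decoder for the E4M3 variants (4 exponent bits, 3 mantissa bits). The bit-level details are assumptions drawn from the published format descriptions rather than from this page: OCP E4M3 uses exponent bias 7 and reserves S.1111.111 for NaN, while FNUZ uses bias 8 and reserves only 0x80 (the would-be negative zero) for NaN. Neither variant encodes infinity.

```python
import math


def decode_fp8_e4m3(byte: int, fnuz: bool) -> float:
    """Decode one 8-bit E4M3 value, OCP flavour by default or FNUZ if requested."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7

    if fnuz:
        bias = 8
        if byte == 0x80:
            return math.nan  # the only NaN encoding in FNUZ
    else:
        bias = 7
        if exponent == 0xF and mantissa == 0x7:
            return math.nan  # S.1111.111 is NaN in OCP E4M3

    if exponent == 0:
        # Subnormal: no implicit leading one.
        return sign * (mantissa / 8.0) * 2.0 ** (1 - bias)
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - bias)


ocp_max = max(v for b in range(256) if not math.isnan(v := decode_fp8_e4m3(b, fnuz=False)))
fnuz_max = max(v for b in range(256) if not math.isnan(v := decode_fp8_e4m3(b, fnuz=True)))
```

Under these assumptions the same byte decodes to different values in the two formats, and the largest finite magnitudes differ (448 for OCP, 240 for FNUZ). That is why code has to ask the platform which FP8 type it prefers instead of hard-coding one.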
@spec is_out_of_tree( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Python method Platform.is_out_of_tree.
Returns
boolean()
@spec is_pin_memory_available( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Checks whether pin memory is available on the current platform.
Returns
boolean()
@spec is_rocm( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Python method Platform.is_rocm.
Returns
boolean()
@spec is_sleep_mode_available( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Python method Platform.is_sleep_mode_available.
Returns
boolean()
@spec is_tpu( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Python method Platform.is_tpu.
Returns
boolean()
@spec is_unspecified( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Python method Platform.is_unspecified.
Returns
boolean()
@spec is_xpu( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Python method Platform.is_xpu.
Returns
boolean()
@spec make_synced_weight_loader(SnakeBridge.Ref.t(), term(), keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Wrap the original weight loader to make it synced.
Parameters
original_weight_loader(term())
Returns
term()
@spec new( [term()], keyword() ) :: {:ok, SnakeBridge.Ref.t()} | {:error, Snakepit.Error.t()}
Initialize self. See help(type(self)) for accurate signature.
Parameters
args(term())
kwargs(term())
@spec opaque_attention_op( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Returns True if we register attention as one giant opaque custom op on the current platform.
Returns
boolean()
@spec pass_key(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec pre_register_and_update(SnakeBridge.Ref.t(), [term()], keyword()) :: {:ok, nil} | {:error, Snakepit.Error.t()}
Do some pre-registration or update action for the current platform.
This function is called before the global VllmConfig is initialized or CLI arguments are parsed. It lets out-of-tree platforms register or update the configuration.
For example, the out-of-tree quantization config can be imported and registered here dynamically.
Parameters
parser(term() default: None)
Returns
nil
@spec ray_device_key(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec seed_everything(SnakeBridge.Ref.t(), [term()], keyword()) :: {:ok, nil} | {:error, Snakepit.Error.t()}
Set the seed of each random module.
torch.manual_seed will set seed on all devices.
Loosely based on: https://github.com/Lightning-AI/pytorch-lightning/blob/2.4.0/src/lightning/fabric/utilities/seed.py#L20
Parameters
seed(term() default: None)
Returns
nil
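A minimal sketch of what "seeding each random module" typically involves on the Python side is shown below. It assumes the modules in question are `random`, NumPy, and PyTorch, and it treats the latter two as optional so the sketch runs without them; the function name `seed_all` is hypothetical.

```python
import random
from typing import Optional


def seed_all(seed: Optional[int] = None) -> None:
    """Seed every available random number generator; do nothing if seed is None."""
    if seed is None:
        return
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch
    try:
        import torch
        torch.manual_seed(seed)  # seeds all devices, as the description notes
    except ImportError:
        pass  # PyTorch is optional in this sketch


seed_all(1234)
first = random.random()
seed_all(1234)
second = random.random()
```

Re-seeding with the same value reproduces the same draw, which is the property that makes runs repeatable.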
@spec set_additional_forward_context(SnakeBridge.Ref.t(), [term()], keyword()) :: {:ok, %{optional(String.t()) => term()}} | {:error, Snakepit.Error.t()}
Set additional forward context for the current platform if needed.
Parameters
args(term())
kwargs(term())
Returns
%{optional(String.t()) => term()}
@spec set_device(SnakeBridge.Ref.t(), term(), keyword()) :: {:ok, nil} | {:error, Snakepit.Error.t()}
Set the device for the current platform.
Parameters
device(term())
Returns
nil
@spec simple_compile_backend(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec stateless_init_device_torch_dist_pg( SnakeBridge.Ref.t(), String.t(), term(), integer(), integer(), term(), keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Init platform-specific torch distributed process group.
Parameters
backend(String.t())
prefix_store(term())
group_rank(integer())
group_size(integer())
timeout(term())
Returns
term()
@spec support_hybrid_kv_cache( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Returns whether the hybrid KV cache is supported by the current platform.
Returns
boolean()
@spec support_static_graph_mode( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Returns whether static graph mode is supported by the current platform.
Returns
boolean()
@spec supported_dtypes(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec supported_quantization(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec supports_fp8( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Returns whether the current platform supports FP8 types.
Returns
boolean()
@spec supports_mx( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Returns whether the current platform supports MX types.
Returns
boolean()
@spec use_all_gather( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Whether to use allgather in LogitsProcessor to gather the logits.
Returns
boolean()
@spec use_custom_allreduce( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Returns whether custom allreduce is supported on the current platform.
Returns
boolean()
@spec use_sync_weight_loader( SnakeBridge.Ref.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Returns whether the current platform needs a synced weight loader.
Returns
boolean()
@spec validate_request(SnakeBridge.Ref.t(), term(), term(), term(), keyword()) :: {:ok, nil} | {:error, Snakepit.Error.t()}
Raises if this request is unsupported on this platform.
Parameters
prompt(term())
params(term())
processed_inputs(term())
Returns
nil
@spec verify_model_arch(SnakeBridge.Ref.t(), String.t(), keyword()) :: {:ok, nil} | {:error, Snakepit.Error.t()}
Verify whether the current platform supports the specified model architecture.
- This will raise an Error or Warning based on the model support on the current platform.
- By default all models are considered supported.
Parameters
model_arch(String.t())
Returns
nil
@spec verify_quantization(SnakeBridge.Ref.t(), String.t(), keyword()) :: {:ok, nil} | {:error, Snakepit.Error.t()}
Verify whether the quantization is supported by the current platform.
Parameters
quant(String.t())
Returns
nil