Dataclass which contains all vLLM-related configuration. This
simplifies passing around the distinct configurations in the codebase.
Summary
Functions
Apply optimization level defaults using self as root.
Get the quantization config.
Update KVTransferConfig based on top-level configs in VllmConfig.
Set the compile ranges for the compilation config.
Set config attribute to default if not already set by user.
vLLM defines the default candidate list of batch sizes for CUDA graph
Returns a rank-aware path for dumping
WARNING: Whenever a new field is added to this config,
Set up function tracing for the current thread,
Python method VllmConfig.get_quantization_config.
Constructs VllmConfig.
Python method VllmConfig.pad_for_cudagraph.
Python method VllmConfig.try_verify_and_update_config.
Python method VllmConfig.update_sizes_for_sequence_parallelism.
Python method VllmConfig.validate_mamba_block_size.
Python method VllmConfig.with_hf_config.
Types
Functions
@spec _apply_optimization_level_defaults( SnakeBridge.Ref.t(), %{optional(String.t()) => term()}, keyword() ) :: {:ok, nil} | {:error, Snakepit.Error.t()}
Apply optimization level defaults using self as root.
Recursively applies values from defaults into nested config objects. Only fields present in defaults are overwritten.
If the user configuration does not specify a value for a default field, and the field is still None after all user selections are applied, then the default value is applied to that field. User-specified fields are never overridden by the defaults.
Parameters
defaults- Dictionary of default values to apply.
Returns
nil
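The precedence rule above can be sketched in plain Python: defaults are applied recursively, and a field is only overwritten when it is still None after all user settings. The names here are illustrative, not vLLM's actual implementation.

```python
def apply_defaults(config, defaults):
    """Recursively fill unset (None) fields of a config object from a defaults dict."""
    for key, value in defaults.items():
        current = getattr(config, key, None)
        if isinstance(value, dict) and current is not None:
            # Nested config object: recurse instead of overwriting wholesale.
            apply_defaults(current, value)
        elif current is None:
            # Still None after all user selections, so the default applies.
            setattr(config, key, value)

class Cfg:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

cfg = Cfg(level=None, mode="custom")
apply_defaults(cfg, {"level": 2, "mode": "auto"})
# cfg.level == 2 (default filled in); cfg.mode == "custom" (user choice kept)
```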
@spec _get_quantization_config(SnakeBridge.Ref.t(), term(), term(), keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Get the quantization config.
Parameters
model_config (term())
load_config (term())
Returns
term()
@spec _post_init_kv_transfer_config( SnakeBridge.Ref.t(), keyword() ) :: {:ok, nil} | {:error, Snakepit.Error.t()}
Update KVTransferConfig based on top-level configs in VllmConfig.
Right now, this function reads the offloading settings from CacheConfig and configures the KVTransferConfig accordingly.
Returns
nil
@spec _set_compile_ranges( SnakeBridge.Ref.t(), keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Set the compile ranges for the compilation config.
Returns
term()
@spec _set_config_default(SnakeBridge.Ref.t(), term(), String.t(), term(), keyword()) :: {:ok, nil} | {:error, Snakepit.Error.t()}
Set config attribute to default if not already set by user.
Parameters
config_obj - Configuration object to update.
key - Attribute name.
value - Default value (static or callable).
Returns
nil
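A minimal sketch of the "static or callable" behavior described above, assuming a callable default is invoked lazily only when the field is actually unset (illustrative, not vLLM's exact code):

```python
def set_config_default(config_obj, key, value):
    """Set `key` on `config_obj` to `value` unless the user already set it.

    `value` may be a static default or a zero-argument callable that
    produces the default lazily.
    """
    if getattr(config_obj, key, None) is None:
        setattr(config_obj, key, value() if callable(value) else value)

class Cfg:
    block_size = None
    dtype = "float16"

cfg = Cfg()
set_config_default(cfg, "block_size", lambda: 16)  # callable default applied
set_config_default(cfg, "dtype", "auto")           # already set by user: no-op
```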
@spec _set_cudagraph_sizes( SnakeBridge.Ref.t(), keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
vLLM defines the default candidate list of batch sizes for CUDA graph
capture as:
max_graph_size = min(max_num_seqs * 2, 512)
# 1, 2, 4, then multiples of 8 up to 256 and then multiples of 16
# up to max_graph_size
cudagraph_capture_sizes = [1, 2, 4] + list(range(8, 256, 8)) + list(
    range(256, max_graph_size + 1, 16))

In the end, vllm_config.compilation_config.cudagraph_capture_sizes
will be the final sizes to capture cudagraph (in ascending order).
These sizes are used to capture and reuse CUDA graphs for
performance-critical paths (e.g., decoding). Capturing enables
significantly faster kernel dispatch by avoiding Python overhead. The
list is then filtered based on max_num_batched_tokens (e.g., 8192 on
most GPUs), which controls the total allowed number of tokens in a
batch. Since each sequence may have a variable number of tokens, the
maximum usable batch size will depend on actual sequence lengths.
Examples
With `max_num_batched_tokens = 8192`, and typical sequences
averaging ~32 tokens, most practical batch sizes fall below 256.
However, the system will still allow capture sizes up to 512 if
shape and memory permit.

Notes
If users explicitly specify cudagraph capture sizes in the
compilation config, those will override this default logic.
At runtime:
- If batch size <= one of the `cudagraph_capture_sizes`, the closest
padded CUDA graph will be used.
- If batch size > largest `cudagraph_capture_sizes`, cudagraph will
not be used.

Returns
term()
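The default candidate list described above can be reproduced directly as runnable Python (the helper name is illustrative; the computation matches the snippet in the docstring):

```python
def default_capture_sizes(max_num_seqs):
    """Default CUDA graph capture sizes, before any filtering."""
    max_graph_size = min(max_num_seqs * 2, 512)
    # 1, 2, 4, then multiples of 8 up to 256, then multiples of 16
    # up to max_graph_size.
    return [1, 2, 4] + list(range(8, 256, 8)) + list(
        range(256, max_graph_size + 1, 16))

sizes = default_capture_sizes(max_num_seqs=256)
# Ascending: [1, 2, 4, 8, 16, ..., 248, 256, 272, ..., 512]
```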
@spec additional_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec attention_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec cache_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec compilation_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec compile_debug_dump_path( SnakeBridge.Ref.t(), keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Returns a rank-aware path for dumping
torch.compile debug information.
Returns
term()
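"Rank-aware" here means each worker gets its own dump location so concurrent ranks do not overwrite each other's torch.compile artifacts. A sketch of that idea; the `rank_<n>` directory naming is an assumption, not vLLM's exact scheme:

```python
from pathlib import Path

def compile_debug_dump_path(base_dir, rank):
    # One subdirectory per rank so concurrent workers never clobber
    # each other's torch.compile debug dumps.
    # ("rank_<n>" is illustrative naming, not taken from the source.)
    return Path(base_dir) / f"rank_{rank}"

p = compile_debug_dump_path("/tmp/compile_debug", rank=3)
```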
@spec compute_hash( SnakeBridge.Ref.t(), keyword() ) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Returns
String.t()
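A hash like this is typically built by collecting the graph-affecting fields into a factor list and digesting them. A minimal sketch under that assumption (not the actual vLLM implementation):

```python
import hashlib

def compute_hash(factors):
    """Digest graph-affecting config values into a stable identifier.

    `factors` maps field names to values; only fields that change the
    computation graph should be included.
    """
    payload = repr(sorted((str(k), str(v)) for k, v in factors.items()))
    return hashlib.sha256(payload.encode()).hexdigest()

h = compute_hash({"dtype": "float16", "num_layers": 32})
# Key order does not affect the digest; changing any factor does.
```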
@spec device_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec ec_transfer_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec enable_trace_function_call_for_thread( SnakeBridge.Ref.t(), keyword() ) :: {:ok, nil} | {:error, Snakepit.Error.t()}
Set up function tracing for the current thread,
if enabled via the VLLM_TRACE_FUNCTION environment variable.
Returns
nil
@spec get_quantization_config(SnakeBridge.Ref.t(), term(), term(), keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python method VllmConfig.get_quantization_config.
Parameters
model_config (term())
load_config (term())
Returns
term()
@spec instance_id(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec kv_events_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec kv_transfer_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec load_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec lora_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec model_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec needs_dp_coordinator(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec new(term(), term(), term(), keyword()) :: {:ok, SnakeBridge.Ref.t()} | {:error, Snakepit.Error.t()}
Constructs VllmConfig.
Parameters
dataclass_self__ (term())
args (term())
kwargs (term())
@spec observability_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec optimization_level(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec pad_for_cudagraph(SnakeBridge.Ref.t(), integer(), keyword()) :: {:ok, integer()} | {:error, Snakepit.Error.t()}
Python method VllmConfig.pad_for_cudagraph.
Parameters
batch_size(integer())
Returns
integer()
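Padding follows the runtime rule from the notes above: round a batch size up to the closest captured size. An illustrative sketch, assuming the caller has already checked the batch fits under the largest captured size:

```python
import bisect

def pad_for_cudagraph(batch_size, capture_sizes):
    """Round batch_size up to the closest captured size.

    `capture_sizes` must be sorted ascending, and callers should have
    verified batch_size <= capture_sizes[-1] (otherwise no graph is used).
    """
    idx = bisect.bisect_left(capture_sizes, batch_size)
    return capture_sizes[idx]

sizes = [1, 2, 4, 8, 16]
padded = pad_for_cudagraph(3, sizes)  # -> 4 (closest padded graph)
```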
@spec parallel_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec profiler_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec quant_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec scheduler_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec speculative_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec structured_outputs_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec try_verify_and_update_config( SnakeBridge.Ref.t(), keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python method VllmConfig.try_verify_and_update_config.
Returns
term()
@spec update_sizes_for_sequence_parallelism(SnakeBridge.Ref.t(), [term()], keyword()) :: {:ok, [term()]} | {:error, Snakepit.Error.t()}
Python method VllmConfig.update_sizes_for_sequence_parallelism.
Parameters
possible_sizes(list(term()))
Returns
list(term())
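With sequence parallelism, each candidate size must shard evenly across ranks. A plausible filter under that assumption; the `tp_size` parameter name and the divisibility rule are illustrative, not taken from the source:

```python
def update_sizes_for_sequence_parallelism(possible_sizes, tp_size):
    # Keep only sizes divisible by the parallel world size so each
    # rank receives an equal slice of the batch.
    return [size for size in possible_sizes if size % tp_size == 0]

filtered = update_sizes_for_sequence_parallelism([1, 2, 4, 8, 16], tp_size=4)
# -> [4, 8, 16]
```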
@spec validate_mamba_block_size( SnakeBridge.Ref.t(), keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python method VllmConfig.validate_mamba_block_size.
Returns
term()
@spec with_hf_config(SnakeBridge.Ref.t(), term(), [term()], keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python method VllmConfig.with_hf_config.
Parameters
hf_config (term())
architectures (term(), default: None)
Returns
term()