Vllm.Config.VllmConfig (VLLM v0.3.0)


Dataclass which contains all vLLM-related configuration. This simplifies passing around the distinct configurations in the codebase.

Summary

Functions

Apply optimization level defaults using self as root.

Update KVTransferConfig based on top-level configs in VllmConfig.

Set the compile ranges for the compilation config.

Set config attribute to default if not already set by user.

vLLM defines the default candidate list of batch sizes for CUDA graph capture.

Returns a rank-aware path for dumping torch.compile debug information.

Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph.

Set up function tracing for the current thread, if enabled via the VLLM_TRACE_FUNCTION environment variable.

Python method VllmConfig.get_quantization_config.

Python method VllmConfig.pad_for_cudagraph.

Python method VllmConfig.try_verify_and_update_config.

Python method VllmConfig.update_sizes_for_sequence_parallelism.

Python method VllmConfig.validate_mamba_block_size.

Python method VllmConfig.with_hf_config.

Types

t()

@opaque t()

Functions

_apply_optimization_level_defaults(ref, defaults, opts \\ [])

@spec _apply_optimization_level_defaults(
  SnakeBridge.Ref.t(),
  %{optional(String.t()) => term()},
  keyword()
) :: {:ok, nil} | {:error, Snakepit.Error.t()}

Apply optimization level defaults using self as root.

Recursively applies values from defaults into nested config objects. Only fields present in defaults are overwritten.

A default is applied to a field only if the field is still None after all user selections have been applied; user-specified fields are never overridden by a default.

Parameters

  • defaults - Dictionary of default values to apply.

Returns

  • nil
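The rule above can be sketched in Python (a hypothetical `apply_defaults` helper illustrating the documented behavior, not vLLM's actual implementation):

```python
from types import SimpleNamespace

def apply_defaults(config, defaults):
    # Hypothetical sketch of the documented rule: recurse into nested
    # config objects, and fill a field only when the user left it None.
    for key, default in defaults.items():
        current = getattr(config, key, None)
        if isinstance(default, dict) and current is not None:
            apply_defaults(current, default)   # nested config object
        elif current is None:
            setattr(config, key, default)      # user did not set it

cfg = SimpleNamespace(level=None, cache=SimpleNamespace(size=64, policy=None))
apply_defaults(cfg, {"level": 2, "cache": {"size": 128, "policy": "lru"}})
# cfg.level and cfg.cache.policy are filled; the user-set cfg.cache.size
# is left untouched.
```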

_get_quantization_config(ref, model_config, load_config, opts \\ [])

@spec _get_quantization_config(SnakeBridge.Ref.t(), term(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Get the quantization config.

Parameters

  • model_config (term())
  • load_config (term())

Returns

  • term()

_post_init_kv_transfer_config(ref, opts \\ [])

@spec _post_init_kv_transfer_config(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, nil} | {:error, Snakepit.Error.t()}

Update KVTransferConfig based on top-level configs in VllmConfig.

Right now, this function reads the offloading settings from CacheConfig and configures the KVTransferConfig accordingly.

Returns

  • nil

_set_compile_ranges(ref, opts \\ [])

@spec _set_compile_ranges(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Set the compile ranges for the compilation config.

Returns

  • term()

_set_config_default(ref, config_obj, key, value, opts \\ [])

@spec _set_config_default(SnakeBridge.Ref.t(), term(), String.t(), term(), keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Set config attribute to default if not already set by user.

Parameters

  • config_obj - Configuration object to update.
  • key - Attribute name.
  • value - Default value (static or callable).

Returns

  • nil
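A minimal Python sketch of this behavior (hypothetical helper; the real method lives inside vLLM):

```python
from types import SimpleNamespace

def set_config_default(config_obj, key, value):
    # Fill the attribute only when the user left it unset (None).
    # `value` may be a static default or a zero-argument callable that
    # produces the default lazily.
    if getattr(config_obj, key, None) is None:
        setattr(config_obj, key, value() if callable(value) else value)

cfg = SimpleNamespace(block_size=None, dtype="bf16")
set_config_default(cfg, "block_size", lambda: 16)   # filled: was None
set_config_default(cfg, "dtype", "fp16")            # kept: user-set
```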

_set_cudagraph_sizes(ref, opts \\ [])

@spec _set_cudagraph_sizes(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

vLLM defines the default candidate list of batch sizes for CUDA graph capture as:

max_graph_size = min(max_num_seqs * 2, 512)
# 1, 2, 4, then multiples of 8 up to 256 and then multiples of 16
# up to max_graph_size
cudagraph_capture_sizes = [1, 2, 4] + list(range(8, 256, 8)) + list(
    range(256, max_graph_size + 1, 16))

In the end, vllm_config.compilation_config.cudagraph_capture_sizes holds the final sizes used for CUDA graph capture, in ascending order.

These sizes are used to capture and reuse CUDA graphs for performance-critical paths (e.g., decoding). Capturing enables significantly faster kernel dispatch by avoiding Python overhead. The list is then filtered based on max_num_batched_tokens (e.g., 8192 on most GPUs), which controls the total allowed number of tokens in a batch. Since each sequence may have a variable number of tokens, the maximum usable batch size will depend on actual sequence lengths.

Examples

With `max_num_batched_tokens = 8192`, and typical sequences
averaging ~32 tokens, most practical batch sizes fall below 256.
However, the system will still allow capture sizes up to 512 if
shape and memory permit.
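The quoted formula is runnable as-is once `max_num_seqs` is bound; a self-contained sketch:

```python
def default_capture_sizes(max_num_seqs: int) -> list[int]:
    # 1, 2, 4, then multiples of 8 up to 256, then multiples of 16
    # up to max_graph_size (the formula quoted above). The engine later
    # filters this list, e.g. against max_num_batched_tokens.
    max_graph_size = min(max_num_seqs * 2, 512)
    return ([1, 2, 4]
            + list(range(8, 256, 8))
            + list(range(256, max_graph_size + 1, 16)))

sizes = default_capture_sizes(256)   # max_graph_size = 512
# sizes starts [1, 2, 4, 8, 16, ...] and ends at 512
```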

Notes

If users explicitly specify cudagraph capture sizes in the compilation config, those override this default logic. At runtime:

- If batch size <= one of the `cudagraph_capture_sizes`, the closest
padded CUDA graph will be used.
- If batch size > largest `cudagraph_capture_sizes`, cudagraph will
not be used.
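The runtime selection rule above can be sketched as follows (hypothetical helper; capture sizes assumed sorted ascending):

```python
from bisect import bisect_left

def pad_batch_size(batch_size, capture_sizes):
    # Smallest capture size >= batch_size, or None when the batch is
    # larger than every captured graph (cudagraph is then skipped).
    i = bisect_left(capture_sizes, batch_size)
    return capture_sizes[i] if i < len(capture_sizes) else None

pad_batch_size(5, [1, 2, 4, 8, 16])    # -> 8
pad_batch_size(20, [1, 2, 4, 8, 16])   # -> None (no cudagraph)
```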

Returns

  • term()

additional_config(ref)

@spec additional_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

attention_config(ref)

@spec attention_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

cache_config(ref)

@spec cache_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

compilation_config(ref)

@spec compilation_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

compile_debug_dump_path(ref, opts \\ [])

@spec compile_debug_dump_path(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Returns a rank-aware path for dumping torch.compile debug information.

Returns

  • term()

compute_hash(ref, opts \\ [])

@spec compute_hash(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.

Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.

Returns

  • String.t()
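The idea can be sketched as follows (hypothetical factor names; the actual factor list is defined by vLLM's sub-configs):

```python
import hashlib

def compute_config_hash(factors: list[str]) -> str:
    # Hash only the factors that shape the computation graph;
    # runtime-only settings (logging, scheduling, ...) are deliberately
    # excluded so equivalent graphs hash identically.
    return hashlib.md5(str(sorted(factors)).encode()).hexdigest()

h1 = compute_config_hash(["model=llama", "dtype=bf16", "tp_size=2"])
h2 = compute_config_hash(["dtype=bf16", "tp_size=2", "model=llama"])
# Order-insensitive: h1 == h2
```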

device_config(ref)

@spec device_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

ec_transfer_config(ref)

@spec ec_transfer_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

enable_trace_function_call_for_thread(ref, opts \\ [])

@spec enable_trace_function_call_for_thread(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, nil} | {:error, Snakepit.Error.t()}

Set up function tracing for the current thread, if enabled via the VLLM_TRACE_FUNCTION environment variable.

Returns

  • nil

get_quantization_config(ref, model_config, load_config, opts \\ [])

@spec get_quantization_config(SnakeBridge.Ref.t(), term(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Get the quantization config.

Parameters

  • model_config (term())
  • load_config (term())

Returns

  • term()

instance_id(ref)

@spec instance_id(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

kv_events_config(ref)

@spec kv_events_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

kv_transfer_config(ref)

@spec kv_transfer_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

load_config(ref)

@spec load_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

lora_config(ref)

@spec lora_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

model_config(ref)

@spec model_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

needs_dp_coordinator(ref)

@spec needs_dp_coordinator(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

new(dataclass_self__, args, kwargs, opts \\ [])

@spec new(term(), term(), term(), keyword()) ::
  {:ok, SnakeBridge.Ref.t()} | {:error, Snakepit.Error.t()}

Constructs VllmConfig.

Parameters

  • dataclass_self__ (term())
  • args (term())
  • kwargs (term())

observability_config(ref)

@spec observability_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

optimization_level(ref)

@spec optimization_level(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

pad_for_cudagraph(ref, batch_size, opts \\ [])

@spec pad_for_cudagraph(SnakeBridge.Ref.t(), integer(), keyword()) ::
  {:ok, integer()} | {:error, Snakepit.Error.t()}

Pad the given batch size up to the closest CUDA graph capture size.

Parameters

  • batch_size (integer())

Returns

  • integer()

parallel_config(ref)

@spec parallel_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

profiler_config(ref)

@spec profiler_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

quant_config(ref)

@spec quant_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

scheduler_config(ref)

@spec scheduler_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

speculative_config(ref)

@spec speculative_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

structured_outputs_config(ref)

@spec structured_outputs_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

try_verify_and_update_config(ref, opts \\ [])

@spec try_verify_and_update_config(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python method VllmConfig.try_verify_and_update_config.

Returns

  • term()

update_sizes_for_sequence_parallelism(ref, possible_sizes, opts \\ [])

@spec update_sizes_for_sequence_parallelism(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, [term()]} | {:error, Snakepit.Error.t()}

Filter the candidate sizes down to those compatible with sequence parallelism.

Parameters

  • possible_sizes (list(term()))

Returns

  • list(term())

validate_mamba_block_size(ref, opts \\ [])

@spec validate_mamba_block_size(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python method VllmConfig.validate_mamba_block_size.

Returns

  • term()

with_hf_config(ref, hf_config, args, opts \\ [])

@spec with_hf_config(SnakeBridge.Ref.t(), term(), [term()], keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Create a copy of this config with the given hf_config applied to the model config.

Parameters

  • hf_config (term())
  • architectures (term() default: None)

Returns

  • term()