Vllm.Config.VllmConfig (VLLM v0.3.0)


Dataclass which contains all vLLM-related configuration. This simplifies passing around the distinct configurations in the codebase.

Summary

Functions

Apply optimization level defaults using self as root.

Update KVTransferConfig based on top-level configs in VllmConfig.

Set the compile ranges for the compilation config.

Set config attribute to default if not already set by user.

vLLM defines the default candidate list of batch sizes for CUDA graph capture.

Returns a rank-aware path for dumping torch.compile debug information.

Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph.

Set up function tracing for the current thread, if enabled via the VLLM_TRACE_FUNCTION environment variable.

Python method VllmConfig.get_quantization_config.

Python method VllmConfig.pad_for_cudagraph.

Python method VllmConfig.try_verify_and_update_config.

Python method VllmConfig.update_sizes_for_sequence_parallelism.

Python method VllmConfig.validate_mamba_block_size.

Python method VllmConfig.with_hf_config.

Types

t()

@opaque t()

Functions

_apply_optimization_level_defaults(ref, defaults, opts \\ [])

@spec _apply_optimization_level_defaults(
  SnakeBridge.Ref.t(),
  %{optional(String.t()) => term()},
  keyword()
) :: {:ok, nil} | {:error, Snakepit.Error.t()}

Apply optimization level defaults using self as root.

Recursively applies values from defaults into nested config objects. Only fields present in defaults are overwritten.

A default is applied to a field only if the field is still None after all user selections have been applied; user-specified fields are never overridden by a default.

Parameters

  • defaults - Dictionary of default values to apply.

Returns

  • nil
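The rule above can be sketched in Python (a hypothetical `apply_defaults` helper illustrating the documented behavior, not vLLM's actual implementation):

```python
from types import SimpleNamespace

def apply_defaults(config, defaults):
    # Hypothetical sketch of the documented rule: recurse into nested
    # config objects, and fill a field only when the user left it None.
    for key, default in defaults.items():
        current = getattr(config, key, None)
        if isinstance(default, dict) and current is not None:
            apply_defaults(current, default)   # nested config object
        elif current is None:
            setattr(config, key, default)      # user did not set it

cfg = SimpleNamespace(level=None, cache=SimpleNamespace(size=64, policy=None))
apply_defaults(cfg, {"level": 2, "cache": {"size": 128, "policy": "lru"}})
# cfg.level and cfg.cache.policy are filled; the user-set cfg.cache.size
# is left untouched.
```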

_get_quantization_config(ref, model_config, load_config, opts \\ [])

@spec _get_quantization_config(SnakeBridge.Ref.t(), term(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Get the quantization config.

Parameters

  • model_config (term())
  • load_config (term())

Returns

  • term()

_post_init_kv_transfer_config(ref, opts \\ [])

@spec _post_init_kv_transfer_config(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, nil} | {:error, Snakepit.Error.t()}

Update KVTransferConfig based on top-level configs in VllmConfig.

Right now, this function reads the offloading settings from CacheConfig and configures the KVTransferConfig accordingly.

Returns

  • nil

_set_compile_ranges(ref, opts \\ [])

@spec _set_compile_ranges(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Set the compile ranges for the compilation config.

Returns

  • term()

_set_config_default(ref, config_obj, key, value, opts \\ [])

@spec _set_config_default(SnakeBridge.Ref.t(), term(), String.t(), term(), keyword()) ::
  {:ok, nil} | {:error, Snakepit.Error.t()}

Set config attribute to default if not already set by user.

Parameters

  • config_obj - Configuration object to update.
  • key - Attribute name.
  • value - Default value (static or callable).

Returns

  • nil
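A minimal Python sketch of this behavior (hypothetical helper; the real method lives inside vLLM):

```python
from types import SimpleNamespace

def set_config_default(config_obj, key, value):
    # Fill the attribute only when the user left it unset (None).
    # `value` may be a static default or a zero-argument callable that
    # produces the default lazily.
    if getattr(config_obj, key, None) is None:
        setattr(config_obj, key, value() if callable(value) else value)

cfg = SimpleNamespace(block_size=None, dtype="bf16")
set_config_default(cfg, "block_size", lambda: 16)   # filled: was None
set_config_default(cfg, "dtype", "fp16")            # kept: user-set
```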

_set_cudagraph_sizes(ref, opts \\ [])

@spec _set_cudagraph_sizes(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

vLLM defines the default candidate list of batch sizes for CUDA graph capture as:

max_graph_size = min(max_num_seqs * 2, 512)
# 1, 2, 4, then multiples of 8 up to 256 and then multiples of 16
# up to max_graph_size
cudagraph_capture_sizes = [1, 2, 4] + list(range(8, 256, 8)) + list(
    range(256, max_graph_size + 1, 16))

In the end, vllm_config.compilation_config.cudagraph_capture_sizes holds the final sizes used for CUDA graph capture, in ascending order.

These sizes are used to capture and reuse CUDA graphs for performance-critical paths (e.g., decoding). Capturing enables significantly faster kernel dispatch by avoiding Python overhead. The list is then filtered based on max_num_batched_tokens (e.g., 8192 on most GPUs), which controls the total allowed number of tokens in a batch. Since each sequence may have a variable number of tokens, the maximum usable batch size will depend on actual sequence lengths.

Examples

With `max_num_batched_tokens = 8192`, and typical sequences
averaging ~32 tokens, most practical batch sizes fall below 256.
However, the system will still allow capture sizes up to 512 if
shape and memory permit.
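The quoted formula is runnable as-is once `max_num_seqs` is bound; a self-contained sketch:

```python
def default_capture_sizes(max_num_seqs: int) -> list[int]:
    # 1, 2, 4, then multiples of 8 up to 256, then multiples of 16
    # up to max_graph_size (the formula quoted above). The engine later
    # filters this list, e.g. against max_num_batched_tokens.
    max_graph_size = min(max_num_seqs * 2, 512)
    return ([1, 2, 4]
            + list(range(8, 256, 8))
            + list(range(256, max_graph_size + 1, 16)))

sizes = default_capture_sizes(256)   # max_graph_size = 512
# sizes starts [1, 2, 4, 8, 16, ...] and ends at 512
```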

Notes

If users explicitly specify cudagraph capture sizes in the compilation config, those override this default logic. At runtime:

- If batch size <= one of the `cudagraph_capture_sizes`, the closest
padded CUDA graph will be used.
- If batch size > largest `cudagraph_capture_sizes`, cudagraph will
not be used.
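The runtime selection rule above can be sketched as follows (hypothetical helper; capture sizes assumed sorted ascending):

```python
from bisect import bisect_left

def pad_batch_size(batch_size, capture_sizes):
    # Smallest capture size >= batch_size, or None when the batch is
    # larger than every captured graph (cudagraph is then skipped).
    i = bisect_left(capture_sizes, batch_size)
    return capture_sizes[i] if i < len(capture_sizes) else None

pad_batch_size(5, [1, 2, 4, 8, 16])    # -> 8
pad_batch_size(20, [1, 2, 4, 8, 16])   # -> None (no cudagraph)
```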

Returns

  • term()

additional_config(ref)

@spec additional_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

attention_config(ref)

@spec attention_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

cache_config(ref)

@spec cache_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

compilation_config(ref)

@spec compilation_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

compile_debug_dump_path(ref, opts \\ [])

@spec compile_debug_dump_path(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Returns a rank-aware path for dumping torch.compile debug information.

Returns

  • term()

compute_hash(ref, opts \\ [])

@spec compute_hash(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.

Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.

Returns

  • String.t()
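The idea can be sketched as follows (hypothetical factor names; the actual factor list is defined by vLLM's sub-configs):

```python
import hashlib

def compute_config_hash(factors: list[str]) -> str:
    # Hash only the factors that shape the computation graph;
    # runtime-only settings (logging, scheduling, ...) are deliberately
    # excluded so equivalent graphs hash identically.
    return hashlib.md5(str(sorted(factors)).encode()).hexdigest()

h1 = compute_config_hash(["model=llama", "dtype=bf16", "tp_size=2"])
h2 = compute_config_hash(["dtype=bf16", "tp_size=2", "model=llama"])
# Order-insensitive: h1 == h2
```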

device_config(ref)

@spec device_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

ec_transfer_config(ref)

@spec ec_transfer_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

enable_trace_function_call_for_thread(ref, opts \\ [])

@spec enable_trace_function_call_for_thread(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, nil} | {:error, Snakepit.Error.t()}

Set up function tracing for the current thread, if enabled via the VLLM_TRACE_FUNCTION environment variable.

Returns

  • nil

get_quantization_config(ref, model_config, load_config, opts \\ [])

@spec get_quantization_config(SnakeBridge.Ref.t(), term(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Get the quantization config.

Parameters

  • model_config (term())
  • load_config (term())

Returns

  • term()

instance_id(ref)

@spec instance_id(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

kv_events_config(ref)

@spec kv_events_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

kv_transfer_config(ref)

@spec kv_transfer_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

load_config(ref)

@spec load_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

lora_config(ref)

@spec lora_config(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

model_config(ref)

@spec model_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

needs_dp_coordinator(ref)

@spec needs_dp_coordinator(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

new(dataclass_self__, args, kwargs, opts \\ [])

@spec new(term(), term(), term(), keyword()) ::
  {:ok, SnakeBridge.Ref.t()} | {:error, Snakepit.Error.t()}

Constructs VllmConfig.

Parameters

  • dataclass_self__ (term())
  • args (term())
  • kwargs (term())

observability_config(ref)

@spec observability_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

optimization_level(ref)

@spec optimization_level(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

pad_for_cudagraph(ref, batch_size, opts \\ [])

@spec pad_for_cudagraph(SnakeBridge.Ref.t(), integer(), keyword()) ::
  {:ok, integer()} | {:error, Snakepit.Error.t()}

Pad the given batch size up to the closest CUDA graph capture size.

Parameters

  • batch_size (integer())

Returns

  • integer()

parallel_config(ref)

@spec parallel_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

profiler_config(ref)

@spec profiler_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

quant_config(ref)

@spec quant_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

scheduler_config(ref)

@spec scheduler_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

speculative_config(ref)

@spec speculative_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

structured_outputs_config(ref)

@spec structured_outputs_config(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

try_verify_and_update_config(ref, opts \\ [])

@spec try_verify_and_update_config(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python method VllmConfig.try_verify_and_update_config.

Returns

  • term()

update_sizes_for_sequence_parallelism(ref, possible_sizes, opts \\ [])

@spec update_sizes_for_sequence_parallelism(SnakeBridge.Ref.t(), [term()], keyword()) ::
  {:ok, [term()]} | {:error, Snakepit.Error.t()}

Filter the candidate sizes down to those compatible with sequence parallelism.

Parameters

  • possible_sizes (list(term()))

Returns

  • list(term())

validate_mamba_block_size(ref, opts \\ [])

@spec validate_mamba_block_size(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python method VllmConfig.validate_mamba_block_size.

Returns

  • term()

with_hf_config(ref, hf_config, args, opts \\ [])

@spec with_hf_config(SnakeBridge.Ref.t(), term(), [term()], keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Create a copy of this config with the given hf_config applied to the model config.

Parameters

  • hf_config (term())
  • architectures (term() default: None)

Returns

  • term()