Vllm.Config.CompilationConfig (VLLM v0.3.0)

Configuration for compilation.

You must pass CompilationConfig to the VLLMConfig constructor. VLLMConfig's post_init performs further initialization, so a CompilationConfig used outside of VLLMConfig will have some fields left in an improper state.

It has three parts:

Top-level Compilation control:
- [mode][vllm.config.CompilationConfig.mode]
- [debug_dump_path][vllm.config.CompilationConfig.debug_dump_path]
- [cache_dir][vllm.config.CompilationConfig.cache_dir]
- [backend][vllm.config.CompilationConfig.backend]
- [custom_ops][vllm.config.CompilationConfig.custom_ops]
- [splitting_ops][vllm.config.CompilationConfig.splitting_ops]
- [compile_mm_encoder][vllm.config.CompilationConfig.compile_mm_encoder]

CudaGraph capture:
- [cudagraph_mode][vllm.config.CompilationConfig.cudagraph_mode]
- [cudagraph_capture_sizes][vllm.config.CompilationConfig.cudagraph_capture_sizes]
- [max_cudagraph_capture_size][vllm.config.CompilationConfig.max_cudagraph_capture_size]
- [cudagraph_num_of_warmups][vllm.config.CompilationConfig.cudagraph_num_of_warmups]
- [cudagraph_copy_inputs][vllm.config.CompilationConfig.cudagraph_copy_inputs]

Inductor compilation:
- [compile_sizes][vllm.config.CompilationConfig.compile_sizes]
- [compile_ranges_split_points][vllm.config.CompilationConfig.compile_ranges_split_points]
- [inductor_compile_config][vllm.config.CompilationConfig.inductor_compile_config]
- [inductor_passes][vllm.config.CompilationConfig.inductor_passes]
- custom inductor passes

Why we have different sizes for cudagraph and inductor:

- cudagraph: a cudagraph captured for a specific size can only be used for that same size, so we need to capture every size we want to use.
- inductor: a graph compiled by inductor for a general shape can be used for different sizes. Inductor can also compile for specific sizes, where fully static shapes give it more information to optimize the graph. However, we find that general-shape compilation is sufficient for most cases. Compiling for certain small batch sizes might still be beneficial, since inductor is good at optimizing those.
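
The contrast between the two can be illustrated with a toy dispatcher. This is a minimal sketch and not vLLM's dispatch logic: `pick_runner`, `captured_sizes`, and the returned labels are hypothetical names chosen for illustration. It only encodes the rule stated above, that a captured cudagraph serves exactly one size while a general-shape compiled graph serves any size.

```python
def pick_runner(batch_size: int, captured_sizes: set[int]) -> str:
    """Return which kind of graph could serve this batch size in the toy model."""
    # A captured cudagraph is only valid for the exact size it was captured with.
    if batch_size in captured_sizes:
        return "cudagraph"
    # A graph compiled for a general shape can serve any size.
    return "general_compiled_graph"


# Hypothetical capture list; real values come from cudagraph_capture_sizes.
captured_sizes = {1, 2, 4, 8}
choices = {bs: pick_runner(bs, captured_sizes) for bs in (1, 3, 8, 16)}
```

In this toy model, sizes 3 and 16 fall back to the general-shape graph because no cudagraph was captured for them.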

Summary

Types

t()

Functions

_skip_none_validation(ref, value, handler, opts \\ [])

Skip validation if the value is None when initialisation is delayed.

adjust_cudagraph_sizes_for_spec_decode(ref, uniform_decode_query_len, tensor_parallel_size, opts \\ [])

Python method CompilationConfig.adjust_cudagraph_sizes_for_spec_decode.

backend(ref)

bs_to_padded_graph_size(ref)

cache_dir(ref)

compilation_time(ref)

compile_mm_encoder(ref)

compile_ranges_split_points(ref)

compile_sizes(ref)

compute_bs_to_padded_graph_size(ref, opts \\ [])

Python method CompilationConfig.compute_bs_to_padded_graph_size.

compute_hash(ref, opts \\ [])

Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph.

cudagraph_capture_sizes(ref)

cudagraph_copy_inputs(ref)

cudagraph_mode(ref)

cudagraph_num_of_warmups(ref)

cudagraph_specialize_lora(ref)

custom_op_log_check(ref, opts \\ [])

Logs the enabled/disabled custom ops and checks that the passed custom_ops field only contains relevant ops.

debug_dump_path(ref)

get_compile_ranges(ref, opts \\ [])

Get the compile ranges for the compilation config.

init_backend(ref, vllm_config, opts \\ [])

Initialize the backend for the compilation config from a vllm config.

is_attention_compiled_piecewise(ref, opts \\ [])

Python method CompilationConfig.is_attention_compiled_piecewise.

is_custom_op_enabled(ref, op, opts \\ [])

Python method CompilationConfig.is_custom_op_enabled.

level(ref)

local_cache_dir(ref)

max_cudagraph_capture_size(ref)

mode(ref)

new(dataclass_self__, args, kwargs, opts \\ [])

Constructs CompilationConfig.

post_init_cudagraph_sizes(ref, opts \\ [])

Completes the initialization after cudagraph-related configs are set.

set_splitting_ops_for_attn_fusion(ref, opts \\ [])

Python method CompilationConfig.set_splitting_ops_for_attn_fusion.

set_splitting_ops_for_v1(ref, all2all_backend, args, opts \\ [])

Python method CompilationConfig.set_splitting_ops_for_v1.

splitting_ops(ref)

splitting_ops_contain_attention(ref, opts \\ [])

Python method CompilationConfig.splitting_ops_contain_attention.

use_inductor_graph_partition(ref)

validate_compile_cache_save_format(ref, value, opts \\ [])

Python method CompilationConfig.validate_compile_cache_save_format.

validate_cudagraph_mode_before(ref, value, opts \\ [])

Enable parsing of the cudagraph_mode enum type from string.

validate_mode_before(ref, value, opts \\ [])

Enable parsing the mode field from string mode names.

validate_pass_config_before(ref, value, opts \\ [])

Enable parsing of the pass_config field from a dictionary.

Types

t()

@opaque t()

Functions

_skip_none_validation(ref, value, handler, opts \\ [])

@spec _skip_none_validation(SnakeBridge.Ref.t(), term(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Skip validation if the value is None when initialisation is delayed.

Parameters

value (term())
handler (term())

Returns

term()
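
The pattern described here resembles a wrap-style validator: the value is handed to the regular validation handler unless it is None. The following is a minimal sketch in plain Python under that assumption; `skip_none_validation` mirrors the documented name, while `positive_int` is a hypothetical handler that is not part of the library.

```python
from typing import Any, Callable, Optional


def skip_none_validation(value: Any, handler: Callable[[Any], Any]) -> Optional[Any]:
    # None marks a field whose initialization is delayed, so leave it untouched.
    if value is None:
        return None
    # Otherwise defer to the regular validation logic.
    return handler(value)


def positive_int(value: Any) -> int:
    # Hypothetical handler: accept only values that convert to a positive integer.
    number = int(value)
    if number <= 0:
        raise ValueError("expected a positive integer")
    return number


delayed = skip_none_validation(None, positive_int)
validated = skip_none_validation("8", positive_int)
```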

adjust_cudagraph_sizes_for_spec_decode(ref, uniform_decode_query_len, tensor_parallel_size, opts \\ [])

@spec adjust_cudagraph_sizes_for_spec_decode(
  SnakeBridge.Ref.t(),
  integer(),
  integer(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.adjust_cudagraph_sizes_for_spec_decode.

Parameters

uniform_decode_query_len (integer())
tensor_parallel_size (integer())

Returns

term()

backend(ref)

@spec backend(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

bs_to_padded_graph_size(ref)

@spec bs_to_padded_graph_size(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

cache_dir(ref)

@spec cache_dir(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

compilation_time(ref)

@spec compilation_time(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

compile_mm_encoder(ref)

@spec compile_mm_encoder(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

compile_ranges_split_points(ref)

@spec compile_ranges_split_points(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

compile_sizes(ref)

@spec compile_sizes(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

compute_bs_to_padded_graph_size(ref, opts \\ [])

@spec compute_bs_to_padded_graph_size(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.compute_bs_to_padded_graph_size.

Returns

term()

compute_hash(ref, opts \\ [])

@spec compute_hash(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.

Returns

String.t()
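
One way to picture such a hash is to serialize only the graph-affecting fields deterministically and digest the result. The sketch below is an assumption-laden illustration, not the library's algorithm: `compute_config_hash`, the choice of SHA-256, the example values, and the decision to treat `mode` and `backend` as structural while ignoring `debug_dump_path` are all hypothetical.

```python
import hashlib
import json


def compute_config_hash(config: dict, structural_keys: list[str]) -> str:
    # Keep only the fields assumed to affect the computation graph.
    factors = {key: config.get(key) for key in sorted(structural_keys)}
    # Serialize deterministically so equal configs give equal hashes.
    payload = json.dumps(factors, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical configs: only debug_dump_path differs between the first two.
base = {"mode": 3, "backend": "inductor", "debug_dump_path": "/tmp/a"}
same_graph = {**base, "debug_dump_path": "/tmp/b"}
other_graph = {**base, "mode": 0}
keys = ["mode", "backend"]

hash_base = compute_config_hash(base, keys)
hash_same = compute_config_hash(same_graph, keys)
hash_other = compute_config_hash(other_graph, keys)
```

Here `hash_base` and `hash_same` are equal because the field that differs is excluded, while `hash_other` differs because a structural field changed.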

cudagraph_capture_sizes(ref)

@spec cudagraph_capture_sizes(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

cudagraph_copy_inputs(ref)

@spec cudagraph_copy_inputs(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

cudagraph_mode(ref)

@spec cudagraph_mode(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

cudagraph_num_of_warmups(ref)

@spec cudagraph_num_of_warmups(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

cudagraph_specialize_lora(ref)

@spec cudagraph_specialize_lora(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

custom_op_log_check(ref, opts \\ [])

@spec custom_op_log_check(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

This method logs the enabled/disabled custom ops and checks that the passed custom_ops field only contains relevant ops. It is called at the end of set_current_vllm_config, after the custom ops have been instantiated.

Returns

term()

debug_dump_path(ref)

@spec debug_dump_path(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

get_compile_ranges(ref, opts \\ [])

@spec get_compile_ranges(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, [term()]} | {:error, Snakepit.Error.t()}

Get the compile ranges for the compilation config.

Returns

list(term())
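
The source does not spell out how the ranges are derived. As a rough sketch, assuming the ranges are consecutive inclusive intervals delimited by compile_ranges_split_points up to some maximum size, they could be built as follows. `ranges_from_split_points`, `max_size`, and the inclusive `(start, end)` convention are hypothetical.

```python
def ranges_from_split_points(split_points: list[int], max_size: int) -> list[tuple[int, int]]:
    """Split 1..max_size into consecutive inclusive ranges at the given points."""
    ranges = []
    start = 1
    for point in sorted(set(split_points)):
        # Ignore points that would produce an empty or out-of-bounds range.
        if point < start or point >= max_size:
            continue
        ranges.append((start, point))
        start = point + 1
    # The final range always runs up to max_size.
    ranges.append((start, max_size))
    return ranges


compile_ranges = ranges_from_split_points([8, 32], max_size=128)
single_range = ranges_from_split_points([], max_size=128)
```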

init_backend(ref, vllm_config, opts \\ [])

@spec init_backend(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Initialize the backend for the compilation config from a vllm config.

Parameters

vllm_config - The vllm config to initialize the backend from.

Returns

term()

is_attention_compiled_piecewise(ref, opts \\ [])

@spec is_attention_compiled_piecewise(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.is_attention_compiled_piecewise.

Returns

boolean()

is_custom_op_enabled(ref, op, opts \\ [])

@spec is_custom_op_enabled(SnakeBridge.Ref.t(), String.t(), keyword()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.is_custom_op_enabled.

Parameters

op (String.t())

Returns

boolean()

level(ref)

@spec level(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

local_cache_dir(ref)

@spec local_cache_dir(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

max_cudagraph_capture_size(ref)

@spec max_cudagraph_capture_size(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

mode(ref)

@spec mode(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

new(dataclass_self__, args, kwargs, opts \\ [])

@spec new(term(), term(), term(), keyword()) ::
  {:ok, SnakeBridge.Ref.t()} | {:error, Snakepit.Error.t()}

Constructs CompilationConfig.

Parameters

dataclass_self__ (term())
args (term())
kwargs (term())

post_init_cudagraph_sizes(ref, opts \\ [])

@spec post_init_cudagraph_sizes(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, nil} | {:error, Snakepit.Error.t()}

Completes the initialization after cudagraph-related configs are set. This includes:

- initializing compile_sizes
- pre-computing the mapping bs_to_padded_graph_size

Returns

nil
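
The pre-computed mapping can be pictured as a lookup table from each batch size to the captured size it would be padded to. The sketch below assumes padding rounds up to the smallest captured size that is at least the batch size; that rule, the function name `compute_padded_sizes`, and the example capture list are assumptions rather than documented behavior.

```python
import bisect


def compute_padded_sizes(capture_sizes: list[int]) -> list[int]:
    """Build a table where table[bs] is the captured size that bs pads up to."""
    sizes = sorted(capture_sizes)
    max_size = sizes[-1]
    table = []
    for bs in range(max_size + 1):
        # bisect_left finds the first captured size that is >= bs.
        table.append(sizes[bisect.bisect_left(sizes, bs)])
    return table


# Hypothetical capture list; real values come from cudagraph_capture_sizes.
table = compute_padded_sizes([1, 2, 4, 8])
```

Pre-computing the table once turns each later lookup into a constant-time list index instead of a repeated search.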

set_splitting_ops_for_attn_fusion(ref, opts \\ [])

@spec set_splitting_ops_for_attn_fusion(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.set_splitting_ops_for_attn_fusion.

Returns

term()

set_splitting_ops_for_v1(ref, all2all_backend, args, opts \\ [])

@spec set_splitting_ops_for_v1(SnakeBridge.Ref.t(), String.t(), [term()], keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.set_splitting_ops_for_v1.

Parameters

all2all_backend (String.t())
data_parallel_size (integer() default: 1)

Returns

term()

splitting_ops(ref)

@spec splitting_ops(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

splitting_ops_contain_attention(ref, opts \\ [])

@spec splitting_ops_contain_attention(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.splitting_ops_contain_attention.

Returns

boolean()

use_inductor_graph_partition(ref)

@spec use_inductor_graph_partition(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

validate_compile_cache_save_format(ref, value, opts \\ [])

@spec validate_compile_cache_save_format(SnakeBridge.Ref.t(), String.t(), keyword()) ::
  {:ok, String.t()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.validate_compile_cache_save_format.

Parameters

value (String.t())

Returns

String.t()

validate_cudagraph_mode_before(ref, value, opts \\ [])

@spec validate_cudagraph_mode_before(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Enable parsing of the cudagraph_mode enum type from string.

Parameters

value (term())

Returns

term()

validate_mode_before(ref, value, opts \\ [])

@spec validate_mode_before(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Enable parsing the mode field from string mode names.

Accepts both integers (0-3) and string names, like NONE, STOCK_TORCH_COMPILE, DYNAMO_TRACE_ONCE, VLLM_COMPILE.

Parameters

value (term())

Returns

term()
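
Accepting both integers and names is a common enum-parsing pattern. The sketch below shows one way to do it with Python's standard `enum` module. It assumes the integers 0-3 pair with the names in the order listed above and that names are matched case-insensitively; `parse_mode` is a hypothetical helper and this local `CompilationMode` is a stand-in, not the library's class.

```python
from enum import IntEnum
from typing import Union


class CompilationMode(IntEnum):
    # Assumed pairing: values follow the order the names are listed in.
    NONE = 0
    STOCK_TORCH_COMPILE = 1
    DYNAMO_TRACE_ONCE = 2
    VLLM_COMPILE = 3


def parse_mode(value: Union[int, str]) -> CompilationMode:
    if isinstance(value, str):
        # Look the name up; unknown names raise KeyError.
        return CompilationMode[value.upper()]
    # Integers outside 0-3 raise ValueError.
    return CompilationMode(value)


from_name = parse_mode("vllm_compile")
from_int = parse_mode(1)
```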

validate_pass_config_before(ref, value, opts \\ [])

@spec validate_pass_config_before(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Enable parsing of the pass_config field from a dictionary.

Parameters

value (term())

Returns

term()
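
A "before" validator of this kind typically converts a plain dictionary into the typed config object and passes other values through unchanged. The following is a minimal sketch under that assumption. The `PassConfig` dataclass here and both of its fields are hypothetical placeholders, and rejecting unknown keys is a design choice of the sketch rather than documented behavior.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class PassConfig:
    # Hypothetical fields, used only for illustration.
    enable_example_fusion: bool = False
    enable_example_cleanup: bool = True


def pass_config_before(value: Any) -> Any:
    if isinstance(value, dict):
        known = {f.name for f in fields(PassConfig)}
        unknown = set(value) - known
        if unknown:
            raise ValueError(f"unknown pass_config keys: {sorted(unknown)}")
        # Build the typed object from the dictionary entries.
        return PassConfig(**value)
    # Anything that is not a dictionary is passed through untouched.
    return value


parsed = pass_config_before({"enable_example_fusion": True})
passthrough = pass_config_before(parsed)
```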