Vllm.Config.CompilationConfig (VLLM v0.3.0)


Configuration for compilation.

You must pass CompilationConfig to the VLLMConfig constructor; VLLMConfig's post_init performs further initialization. If used outside of VLLMConfig, some fields will be left in an improper state.

It has three parts:

  • Top-level Compilation control:
    • [mode][vllm.config.CompilationConfig.mode]
    • [debug_dump_path][vllm.config.CompilationConfig.debug_dump_path]
    • [cache_dir][vllm.config.CompilationConfig.cache_dir]
    • [backend][vllm.config.CompilationConfig.backend]
    • [custom_ops][vllm.config.CompilationConfig.custom_ops]
    • [splitting_ops][vllm.config.CompilationConfig.splitting_ops]
    • [compile_mm_encoder][vllm.config.CompilationConfig.compile_mm_encoder]
  • CudaGraph capture:
    • [cudagraph_mode][vllm.config.CompilationConfig.cudagraph_mode]
    • [cudagraph_capture_sizes][vllm.config.CompilationConfig.cudagraph_capture_sizes]
    • [max_cudagraph_capture_size][vllm.config.CompilationConfig.max_cudagraph_capture_size]
    • [cudagraph_num_of_warmups][vllm.config.CompilationConfig.cudagraph_num_of_warmups]
    • [cudagraph_copy_inputs][vllm.config.CompilationConfig.cudagraph_copy_inputs]
  • Inductor compilation:
    • [compile_sizes][vllm.config.CompilationConfig.compile_sizes]
    • [compile_ranges_split_points][vllm.config.CompilationConfig.compile_ranges_split_points]
    • [inductor_compile_config][vllm.config.CompilationConfig.inductor_compile_config]
    • [inductor_passes][vllm.config.CompilationConfig.inductor_passes]
    • custom inductor passes

Why we have different sizes for cudagraph and inductor:

  • cudagraph: a cudagraph captured for a specific size can only be used for the same size. We need to capture all the sizes we want to use.
  • inductor: a graph compiled by Inductor for a general shape can be reused for different sizes. Inductor can also compile for specific sizes, where fully static shapes give it more information to optimize the graph. However, we find that general-shape compilation is sufficient for most cases. It can still be beneficial to compile for certain small batch sizes, which Inductor is good at optimizing.

Summary

Functions

  • _skip_none_validation/4 - Skip validation if the value is None when initialisation is delayed.
  • adjust_cudagraph_sizes_for_spec_decode/4 - Python method CompilationConfig.adjust_cudagraph_sizes_for_spec_decode.
  • compute_bs_to_padded_graph_size/2 - Python method CompilationConfig.compute_bs_to_padded_graph_size.
  • compute_hash/2 - Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph.
  • custom_op_log_check/2 - Log the enabled/disabled custom ops and check that the custom_ops field only contains relevant ops.
  • get_compile_ranges/2 - Get the compile ranges for the compilation config.
  • init_backend/3 - Initialize the backend for the compilation config from a vllm config.
  • is_attention_compiled_piecewise/2 - Python method CompilationConfig.is_attention_compiled_piecewise.
  • is_custom_op_enabled/3 - Python method CompilationConfig.is_custom_op_enabled.
  • new/4 - Constructs CompilationConfig.
  • post_init_cudagraph_sizes/2 - Complete the initialization after cudagraph-related configs are set.
  • set_splitting_ops_for_attn_fusion/2 - Python method CompilationConfig.set_splitting_ops_for_attn_fusion.
  • set_splitting_ops_for_v1/4 - Python method CompilationConfig.set_splitting_ops_for_v1.
  • splitting_ops_contain_attention/2 - Python method CompilationConfig.splitting_ops_contain_attention.
  • validate_compile_cache_save_format/3 - Python method CompilationConfig.validate_compile_cache_save_format.
  • validate_cudagraph_mode_before/3 - Enable parsing of the cudagraph_mode enum type from string.
  • validate_mode_before/3 - Enable parsing of the mode field from string mode names.
  • validate_pass_config_before/3 - Enable parsing of the pass_config field from a dictionary.

Types

t()

@opaque t()

Functions

_skip_none_validation(ref, value, handler, opts \\ [])

@spec _skip_none_validation(SnakeBridge.Ref.t(), term(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Skip validation if the value is None when initialisation is delayed.

Parameters

  • value (term())
  • handler (term())

Returns

  • term()
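To illustrate the Python-side pattern this wraps (a sketch only, not vLLM's actual validator): a wrap-style validator returns None untouched when initialization is delayed, and otherwise runs the field's normal validation handler.

```python
# Sketch of a "skip validation when None" wrap validator, as used by
# pydantic-style models whose full initialization is deferred.
# Illustrative only; the real vLLM validator may differ in detail.

def skip_none_validation(value, handler):
    """Return None untouched; otherwise run the normal validation handler."""
    if value is None:
        return None
    return handler(value)

# `int` stands in for the field's normal validation chain.
assert skip_none_validation(None, int) is None
assert skip_none_validation("4", int) == 4
```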

adjust_cudagraph_sizes_for_spec_decode(ref, uniform_decode_query_len, tensor_parallel_size, opts \\ [])

@spec adjust_cudagraph_sizes_for_spec_decode(
  SnakeBridge.Ref.t(),
  integer(),
  integer(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.adjust_cudagraph_sizes_for_spec_decode.

Parameters

  • uniform_decode_query_len (integer())
  • tensor_parallel_size (integer())

Returns

  • term()

backend(ref)

@spec backend(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

bs_to_padded_graph_size(ref)

@spec bs_to_padded_graph_size(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

cache_dir(ref)

@spec cache_dir(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

compilation_time(ref)

@spec compilation_time(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

compile_mm_encoder(ref)

@spec compile_mm_encoder(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

compile_ranges_split_points(ref)

@spec compile_ranges_split_points(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

compile_sizes(ref)

@spec compile_sizes(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

compute_bs_to_padded_graph_size(ref, opts \\ [])

@spec compute_bs_to_padded_graph_size(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.compute_bs_to_padded_graph_size.

Returns

  • term()

compute_hash(ref, opts \\ [])

@spec compute_hash(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, String.t()} | {:error, Snakepit.Error.t()}

Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.

Returns

  • String.t()
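The idea behind such a hash can be sketched as follows (an illustration, not vLLM's actual implementation): serialize only the graph-affecting fields deterministically, so that equal configs always hash equal and fields like cache_dir do not perturb the result. The field selection below is an assumption for demonstration.

```python
import hashlib
import json

# Hypothetical subset of graph-affecting fields, drawn from the field
# names documented above; the real vLLM selection may differ.
GRAPH_AFFECTING_FIELDS = ["mode", "backend", "custom_ops", "splitting_ops"]

def compute_hash(config: dict) -> str:
    """Hash only the fields that affect the compiled computation graph."""
    factors = [config.get(f) for f in GRAPH_AFFECTING_FIELDS]
    payload = json.dumps(factors, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

a = {"mode": 3, "backend": "inductor", "custom_ops": ["all"], "splitting_ops": []}
b = dict(a, cache_dir="/tmp/x")  # cache_dir does not change the graph
assert compute_hash(a) == compute_hash(b)
assert compute_hash(a) != compute_hash(dict(a, mode=0))
```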

cudagraph_capture_sizes(ref)

@spec cudagraph_capture_sizes(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

cudagraph_copy_inputs(ref)

@spec cudagraph_copy_inputs(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

cudagraph_mode(ref)

@spec cudagraph_mode(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

cudagraph_num_of_warmups(ref)

@spec cudagraph_num_of_warmups(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

cudagraph_specialize_lora(ref)

@spec cudagraph_specialize_lora(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

custom_op_log_check(ref, opts \\ [])

@spec custom_op_log_check(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

This method logs the enabled/disabled custom ops and checks that the passed custom_ops field only contains relevant ops. It is called at the end of set_current_vllm_config, after the custom ops have been instantiated.

Returns

  • term()

debug_dump_path(ref)

@spec debug_dump_path(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

get_compile_ranges(ref, opts \\ [])

@spec get_compile_ranges(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, [term()]} | {:error, Snakepit.Error.t()}

Get the compile ranges for the compilation config.

Returns

  • list(term())

init_backend(ref, vllm_config, opts \\ [])

@spec init_backend(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Initialize the backend for the compilation config from a vllm config.

Parameters

  • vllm_config - The vllm config to initialize the backend from.

Returns

  • term()

is_attention_compiled_piecewise(ref, opts \\ [])

@spec is_attention_compiled_piecewise(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.is_attention_compiled_piecewise.

Returns

  • boolean()

is_custom_op_enabled(ref, op, opts \\ [])

@spec is_custom_op_enabled(SnakeBridge.Ref.t(), String.t(), keyword()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.is_custom_op_enabled.

Parameters

  • op (String.t())

Returns

  • boolean()
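For intuition, the check can be sketched against the list convention vLLM's custom_ops field conventionally uses ("all", "none", "+op" to force-enable, "-op" to force-disable). The exact precedence below is an assumption for illustration, not the vLLM source.

```python
# Sketch of the custom_ops enable/disable convention ("all", "none",
# "+op", "-op"). Hypothetical logic; the real vLLM check may differ.

def is_custom_op_enabled(custom_ops: list[str], op: str) -> bool:
    if "+" + op in custom_ops:
        return True
    if "-" + op in custom_ops:
        return False
    return "all" in custom_ops  # default policy when op is not listed

assert is_custom_op_enabled(["all", "-rms_norm"], "rms_norm") is False
assert is_custom_op_enabled(["none", "+rms_norm"], "rms_norm") is True
```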

level(ref)

@spec level(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

local_cache_dir(ref)

@spec local_cache_dir(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

max_cudagraph_capture_size(ref)

@spec max_cudagraph_capture_size(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

mode(ref)

@spec mode(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

new(dataclass_self__, args, kwargs, opts \\ [])

@spec new(term(), term(), term(), keyword()) ::
  {:ok, SnakeBridge.Ref.t()} | {:error, Snakepit.Error.t()}

Constructs CompilationConfig.

Parameters

  • dataclass_self__ (term())
  • args (term())
  • kwargs (term())

post_init_cudagraph_sizes(ref, opts \\ [])

@spec post_init_cudagraph_sizes(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, nil} | {:error, Snakepit.Error.t()}

Complete the initialization after cudagraph-related configs are set. This includes:

  • initialize compile_sizes
  • pre-compute the mapping bs_to_padded_graph_size

Returns

  • nil
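Since a cudagraph only serves the exact size it was captured for, runtime batch sizes must be padded up to the nearest captured size; the pre-computed mapping can be sketched like this (an illustrative helper, not the actual vLLM code):

```python
# Sketch of pre-computing bs_to_padded_graph_size: every runtime batch
# size maps to the smallest captured size that covers it.
# Hypothetical helper name and shape; the real vLLM code may differ.

def build_bs_to_padded_graph_size(capture_sizes: list[int]) -> list[int]:
    sizes = sorted(capture_sizes)
    max_size = sizes[-1]
    # mapping[bs] -> padded graph size that will actually run batch bs
    return [next(s for s in sizes if s >= bs) for bs in range(max_size + 1)]

mapping = build_bs_to_padded_graph_size([1, 2, 4, 8])
assert mapping[3] == 4  # a batch of 3 runs in the graph captured for size 4
assert mapping[8] == 8
```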

set_splitting_ops_for_attn_fusion(ref, opts \\ [])

@spec set_splitting_ops_for_attn_fusion(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.set_splitting_ops_for_attn_fusion.

Returns

  • term()

set_splitting_ops_for_v1(ref, all2all_backend, args, opts \\ [])

@spec set_splitting_ops_for_v1(SnakeBridge.Ref.t(), String.t(), [term()], keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.set_splitting_ops_for_v1.

Parameters

  • all2all_backend (String.t())
  • data_parallel_size (integer() default: 1)

Returns

  • term()

splitting_ops(ref)

@spec splitting_ops(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

splitting_ops_contain_attention(ref, opts \\ [])

@spec splitting_ops_contain_attention(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.splitting_ops_contain_attention.

Returns

  • boolean()

use_inductor_graph_partition(ref)

@spec use_inductor_graph_partition(SnakeBridge.Ref.t()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

validate_compile_cache_save_format(ref, value, opts \\ [])

@spec validate_compile_cache_save_format(SnakeBridge.Ref.t(), String.t(), keyword()) ::
  {:ok, String.t()} | {:error, Snakepit.Error.t()}

Python method CompilationConfig.validate_compile_cache_save_format.

Parameters

  • value (String.t())

Returns

  • String.t()

validate_cudagraph_mode_before(ref, value, opts \\ [])

@spec validate_cudagraph_mode_before(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Enable parsing of the cudagraph_mode enum type from string.

Parameters

  • value (term())

Returns

  • term()

validate_mode_before(ref, value, opts \\ [])

@spec validate_mode_before(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Enable parsing of the mode field from string mode names.

Accepts both integers (0-3) and string names, like NONE, STOCK_TORCH_COMPILE, DYNAMO_TRACE_ONCE, VLLM_COMPILE.

Parameters

  • value (term())

Returns

  • term()
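The coercion described above can be sketched as follows, assuming the documented names map to 0-3 in listed order (an illustration; the real validator lives in vLLM's Python config):

```python
# Sketch of mode parsing: accept integers 0-3 or the string names
# NONE, STOCK_TORCH_COMPILE, DYNAMO_TRACE_ONCE, VLLM_COMPILE.
# The name-to-integer order is assumed from the documentation above.
MODE_NAMES = ["NONE", "STOCK_TORCH_COMPILE", "DYNAMO_TRACE_ONCE", "VLLM_COMPILE"]

def parse_mode(value):
    if isinstance(value, int) and 0 <= value <= 3:
        return value
    if isinstance(value, str) and value.upper() in MODE_NAMES:
        return MODE_NAMES.index(value.upper())
    raise ValueError(f"invalid compilation mode: {value!r}")

assert parse_mode("VLLM_COMPILE") == 3
assert parse_mode(1) == 1
```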

validate_pass_config_before(ref, value, opts \\ [])

@spec validate_pass_config_before(SnakeBridge.Ref.t(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Enable parsing of the pass_config field from a dictionary.

Parameters

  • value (term())

Returns

  • term()