Vllm.EnvOverride (VLLM v0.3.0)


Submodule bindings for vllm.env_override.

Version

  • Requested: 0.14.0
  • Observed at generation: 0.14.0

Runtime Options

All functions accept a __runtime__ option for controlling execution behavior:

Vllm.EnvOverride.some_function(args, __runtime__: [timeout: 120_000])

Supported runtime options

  • :timeout - Call timeout in milliseconds (default: 120,000ms / 2 minutes)
  • :timeout_profile - Use a named profile (:default, :ml_inference, :batch_job, :streaming)
  • :stream_timeout - Timeout for streaming operations (default: 1,800,000ms / 30 minutes)
  • :session_id - Override the session ID for this call
  • :pool_name - Target a specific Snakepit pool (multi-pool setups)
  • :affinity - Override session affinity (:hint, :strict_queue, :strict_fail_fast)

Timeout Profiles

  • :default - 2 minute timeout for regular calls
  • :ml_inference - 10 minute timeout for ML/LLM workloads
  • :batch_job - Unlimited timeout for long-running jobs
  • :streaming - 2 minute timeout, 30 minute stream_timeout

Example with timeout override

# For a long-running ML inference call
Vllm.EnvOverride.predict(data, __runtime__: [timeout_profile: :ml_inference])

# Or explicit timeout
Vllm.EnvOverride.predict(data, __runtime__: [timeout: 600_000])

# Route to a pool and enforce strict affinity
Vllm.EnvOverride.predict(data, __runtime__: [pool_name: :strict_pool, affinity: :strict_queue])

See SnakeBridge.Defaults for global timeout configuration.

Summary

Functions

Workaround for TorchInductor autotune get_raw_stream() bug.

(Re)initializes the scheduler member.

Gets the signature for each graph partition, including input nodes, output nodes, and whether an input is deallocated within the partition.

Ensures loggers are retrieved only after the root vllm logger has been configured.

Checks whether the installed torch version equals the target version.

Python module attribute vllm.env_override.logger.

Python binding for vllm.env_override.memory_plan_reuse_patched.

Returns true if the inductor graph should be partitioned on this node.

Functions

_patch_get_raw_stream_if_needed(opts \\ [])

@spec _patch_get_raw_stream_if_needed(keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Workaround for TorchInductor autotune get_raw_stream() bug.

Returns

  • term()

_update_scheduler_patched(self, opts \\ [])

@spec _update_scheduler_patched(
  term(),
  keyword()
) :: {:ok, nil} | {:error, Snakepit.Error.t()}

(Re)initializes the scheduler member. When initializing the scheduler, no CUBIN files should be generated (to avoid biasing any benchmarks and pessimizing fusion decisions).

Parameters

  • self (term())

Returns

  • nil

get_graph_partition_signature_patched(self, partitions, skip_cudagraphs, opts \\ [])

@spec get_graph_partition_signature_patched(term(), term(), [boolean()], keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Gets the signature for each graph partition, including input nodes, output nodes, and whether an input is deallocated within the graph partition.

Parameters

  • self (term())
  • partitions (term())
  • skip_cudagraphs (list(boolean()))

Returns

  • term()

init_logger(name, opts \\ [])

@spec init_logger(
  String.t(),
  keyword()
) :: {:ok, Vllm.Logger.VllmLogger.t()} | {:error, Snakepit.Error.t()}

The main purpose of this function is to ensure that loggers are retrieved in such a way that we can be sure the root vllm logger has already been configured.

Parameters

  • name (String.t())

Returns

  • Vllm.Logger.VllmLogger.t()
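
As a sketch of typical usage (the logger name below is illustrative; any dotted Python logger name should work), the call returns the usual ok/error tuple:

```elixir
# Hypothetical usage: retrieve a logger that is guaranteed to be a
# child of the already-configured root vllm logger.
case Vllm.EnvOverride.init_logger("vllm.env_override") do
  {:ok, %Vllm.Logger.VllmLogger{} = logger} ->
    logger

  {:error, %Snakepit.Error{} = error} ->
    raise "failed to initialize logger: #{inspect(error)}"
end
```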

is_torch_equal(target, opts \\ [])

@spec is_torch_equal(
  String.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Checks whether the installed torch version equals the target version.

Parameters

  • target - a version string, like "2.6.0".

Returns

  • boolean()
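
A minimal usage sketch, assuming a running bridge, for gating a code path on an exact torch version match:

```elixir
# Hypothetical usage: apply a version-specific workaround only when the
# installed torch version is exactly "2.6.0".
case Vllm.EnvOverride.is_torch_equal("2.6.0") do
  {:ok, true} -> :apply_workaround
  {:ok, false} -> :skip_workaround
  {:error, %Snakepit.Error{} = error} -> raise inspect(error)
end
```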

logger()

@spec logger() :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python module attribute vllm.env_override.logger.

Returns

  • term()
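
Since this is a module attribute rather than a function on the Python side, it is read with a zero-arity call; a minimal sketch:

```elixir
# Hypothetical usage: fetch the vllm.env_override module-level logger
# as an opaque term from the Python side.
{:ok, py_logger} = Vllm.EnvOverride.logger()
```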

memory_plan_reuse_patched(self, opts \\ [])

@spec memory_plan_reuse_patched(
  term(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python binding for vllm.env_override.memory_plan_reuse_patched.

Parameters

  • self (term())

Returns

  • term()

should_partition_patched(self, node)

@spec should_partition_patched(term(), term()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns true if the inductor graph should be partitioned on this node.

Parameters

  • self (term())
  • node (term())
  • should_log (boolean(), default: false)

Returns

  • boolean()

should_partition_patched(self, node, opts)

@spec should_partition_patched(term(), term(), keyword()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}
@spec should_partition_patched(term(), term(), boolean()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}

should_partition_patched(self, node, should_log, opts)

@spec should_partition_patched(term(), term(), boolean(), keyword()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}
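
The three arities above differ only in whether should_log and runtime options are supplied. A hedged sketch, assuming scheduler_ref and node_ref are opaque Python object references previously obtained through the bridge:

```elixir
# Hypothetical usage: `scheduler_ref` and `node_ref` are placeholders for
# Python object references returned by earlier bridge calls.
{:ok, partition?} =
  Vllm.EnvOverride.should_partition_patched(scheduler_ref, node_ref)

# With explicit should_log and a runtime timeout override:
{:ok, partition?} =
  Vllm.EnvOverride.should_partition_patched(
    scheduler_ref,
    node_ref,
    true,
    __runtime__: [timeout: 60_000]
  )
```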