Vllm.EnvOverride (VLLM v0.3.0)


Submodule bindings for vllm.env_override.

Version

  • Requested: 0.14.0
  • Observed at generation: 0.14.0

Runtime Options

All functions accept a __runtime__ option for controlling execution behavior:

Vllm.EnvOverride.some_function(args, __runtime__: [timeout: 120_000])

Supported runtime options

  • :timeout - Call timeout in milliseconds (default: 120,000ms / 2 minutes)
  • :timeout_profile - Use a named profile (:default, :ml_inference, :batch_job, :streaming)
  • :stream_timeout - Timeout for streaming operations (default: 1,800,000ms / 30 minutes)
  • :session_id - Override the session ID for this call
  • :pool_name - Target a specific Snakepit pool (multi-pool setups)
  • :affinity - Override session affinity (:hint, :strict_queue, :strict_fail_fast)

Timeout Profiles

  • :default - 2 minute timeout for regular calls
  • :ml_inference - 10 minute timeout for ML/LLM workloads
  • :batch_job - Unlimited timeout for long-running jobs
  • :streaming - 2 minute timeout, 30 minute stream_timeout

Example with timeout override

# For a long-running ML inference call
Vllm.EnvOverride.predict(data, __runtime__: [timeout_profile: :ml_inference])

# Or explicit timeout
Vllm.EnvOverride.predict(data, __runtime__: [timeout: 600_000])

# Route to a pool and enforce strict affinity
Vllm.EnvOverride.predict(data, __runtime__: [pool_name: :strict_pool, affinity: :strict_queue])

See SnakeBridge.Defaults for global timeout configuration.

Summary

Functions

Workaround for TorchInductor autotune get_raw_stream() bug.

(Re)initializes the scheduler member.

Gets the signature for each graph partition, including input nodes, output nodes, and whether an input is deallocated within the partition.

Ensures loggers are retrieved only after the root vllm logger has been configured.

Checks whether the installed torch version equals the target version.

Python module attribute vllm.env_override.logger.

Python binding for vllm.env_override.memory_plan_reuse_patched.

Returns true if the inductor graph should be partitioned on this node.

Functions

_patch_get_raw_stream_if_needed(opts \\ [])

@spec _patch_get_raw_stream_if_needed(keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Workaround for TorchInductor autotune get_raw_stream() bug.

Returns

  • term()

_update_scheduler_patched(self, opts \\ [])

@spec _update_scheduler_patched(
  term(),
  keyword()
) :: {:ok, nil} | {:error, Snakepit.Error.t()}

(Re)initializes the scheduler member. When initializing the scheduler, no CUBIN files should be generated (to avoid biasing any benchmarks and pessimizing fusion decisions).

Parameters

  • self (term())

Returns

  • nil

get_graph_partition_signature_patched(self, partitions, skip_cudagraphs, opts \\ [])

@spec get_graph_partition_signature_patched(term(), term(), [boolean()], keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Gets the signature for each graph partition, including input nodes, output nodes, and whether an input is deallocated within the graph partition.

Parameters

  • self (term())
  • partitions (term())
  • skip_cudagraphs (list(boolean()))

Returns

  • term()

init_logger(name, opts \\ [])

@spec init_logger(
  String.t(),
  keyword()
) :: {:ok, Vllm.Logger.VllmLogger.t()} | {:error, Snakepit.Error.t()}

The main purpose of this function is to ensure that loggers are retrieved in such a way that we can be sure the root vllm logger has already been configured.

Parameters

  • name (String.t())

Returns

  • Vllm.Logger.VllmLogger.t()
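
As a sketch of typical usage (the logger name below is illustrative; any dotted Python logger name should work), the call returns the usual ok/error tuple:

```elixir
# Hypothetical usage: retrieve a logger that is guaranteed to be a
# child of the already-configured root vllm logger.
case Vllm.EnvOverride.init_logger("vllm.env_override") do
  {:ok, %Vllm.Logger.VllmLogger{} = logger} ->
    logger

  {:error, %Snakepit.Error{} = error} ->
    raise "failed to initialize logger: #{inspect(error)}"
end
```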

is_torch_equal(target, opts \\ [])

@spec is_torch_equal(
  String.t(),
  keyword()
) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Checks whether the installed torch version equals the target version.

Parameters

  • target - a version string, like "2.6.0".

Returns

  • boolean()
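
A minimal usage sketch, assuming a running bridge, for gating a code path on an exact torch version match:

```elixir
# Hypothetical usage: apply a version-specific workaround only when the
# installed torch version is exactly "2.6.0".
case Vllm.EnvOverride.is_torch_equal("2.6.0") do
  {:ok, true} -> :apply_workaround
  {:ok, false} -> :skip_workaround
  {:error, %Snakepit.Error{} = error} -> raise inspect(error)
end
```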

logger()

@spec logger() :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python module attribute vllm.env_override.logger.

Returns

  • term()
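
Since this is a module attribute rather than a function on the Python side, it is read with a zero-arity call; a minimal sketch:

```elixir
# Hypothetical usage: fetch the vllm.env_override module-level logger
# as an opaque term from the Python side.
{:ok, py_logger} = Vllm.EnvOverride.logger()
```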

memory_plan_reuse_patched(self, opts \\ [])

@spec memory_plan_reuse_patched(
  term(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python binding for vllm.env_override.memory_plan_reuse_patched.

Parameters

  • self (term())

Returns

  • term()

should_partition_patched(self, node)

@spec should_partition_patched(term(), term()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}

Returns true if the inductor graph should be partitioned on this node.

Parameters

  • self (term())
  • node (term())
  • should_log (boolean(), default: false)

Returns

  • boolean()

should_partition_patched(self, node, opts)

@spec should_partition_patched(term(), term(), keyword()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}
@spec should_partition_patched(term(), term(), boolean()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}

should_partition_patched(self, node, should_log, opts)

@spec should_partition_patched(term(), term(), boolean(), keyword()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}
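
The three arities above differ only in whether should_log and runtime options are supplied. A hedged sketch, assuming scheduler_ref and node_ref are opaque Python object references previously obtained through the bridge:

```elixir
# Hypothetical usage: `scheduler_ref` and `node_ref` are placeholders for
# Python object references returned by earlier bridge calls.
{:ok, partition?} =
  Vllm.EnvOverride.should_partition_patched(scheduler_ref, node_ref)

# With explicit should_log and a runtime timeout override:
{:ok, partition?} =
  Vllm.EnvOverride.should_partition_patched(
    scheduler_ref,
    node_ref,
    true,
    __runtime__: [timeout: 60_000]
  )
```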