Submodule bindings for vllm.env_override.
Version
- Requested: 0.14.0
- Observed at generation: 0.14.0
Runtime Options
All functions accept a __runtime__ option for controlling execution behavior:
Vllm.EnvOverride.some_function(args, __runtime__: [timeout: 120_000])
Supported runtime options
- :timeout - Call timeout in milliseconds (default: 120,000 ms / 2 minutes)
- :timeout_profile - Use a named profile (:default, :ml_inference, :batch_job, :streaming)
- :stream_timeout - Timeout for streaming operations (default: 1,800,000 ms / 30 minutes)
- :session_id - Override the session ID for this call
- :pool_name - Target a specific Snakepit pool (multi-pool setups)
- :affinity - Override session affinity (:hint, :strict_queue, :strict_fail_fast)
Timeout Profiles
- :default - 2 minute timeout for regular calls
- :ml_inference - 10 minute timeout for ML/LLM workloads
- :batch_job - Unlimited timeout for long-running jobs
- :streaming - 2 minute timeout, 30 minute stream_timeout
Example with timeout override
# For a long-running ML inference call
Vllm.EnvOverride.predict(data, __runtime__: [timeout_profile: :ml_inference])
# Or explicit timeout
Vllm.EnvOverride.predict(data, __runtime__: [timeout: 600_000])
# Route to a pool and enforce strict affinity
Vllm.EnvOverride.predict(data, __runtime__: [pool_name: :strict_pool, affinity: :strict_queue])

See SnakeBridge.Defaults for global timeout configuration.
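Every binding in this module returns a tagged tuple, so runtime options combine naturally with ordinary result handling. A minimal sketch (assuming a running bridge; the {:ok, _} / {:error, Snakepit.Error.t()} shapes follow the @spec entries below):

```elixir
# Sketch: a slow call routed through the :ml_inference profile,
# with both result branches handled explicitly.
case Vllm.EnvOverride.predict(data, __runtime__: [timeout_profile: :ml_inference]) do
  {:ok, result} ->
    result

  {:error, %Snakepit.Error{} = error} ->
    # Log, retry with a longer :timeout, or surface the error to the caller.
    raise error
end
```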
Summary
Functions
Workaround for TorchInductor autotune get_raw_stream() bug.
(Re)initializes the scheduler member. When initializing the scheduler, no CUBIN
Gets signature for each graph partition, including input nodes, output nodes, and
The main purpose of this function is to ensure that loggers are
Check if the installed torch version is == the target version.
Python module attribute vllm.env_override.logger.
Python binding for vllm.env_override.memory_plan_reuse_patched.
Returns true if the inductor graph should be partitioned on this node.
Functions
@spec _patch_get_raw_stream_if_needed(keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Workaround for TorchInductor autotune get_raw_stream() bug.
Returns
term()
@spec _update_scheduler_patched( term(), keyword() ) :: {:ok, nil} | {:error, Snakepit.Error.t()}
(Re)initializes the scheduler member. When initializing the scheduler, no CUBIN
files should be generated (to avoid biasing any benchmarks and pessimizing fusion decisions).
Parameters
self(term())
Returns
nil
@spec get_graph_partition_signature_patched(term(), term(), [boolean()], keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Gets the signature for each graph partition, including input nodes, output nodes, and
whether each input is deallocated within its graph partition.
Parameters
self(term())
partitions(term())
skip_cudagraphs(list(boolean()))
Returns
term()
@spec init_logger( String.t(), keyword() ) :: {:ok, Vllm.Logger.VllmLogger.t()} | {:error, Snakepit.Error.t()}
The main purpose of this function is to ensure that loggers are
retrieved in such a way that we can be sure the root vllm logger has already been configured.
Parameters
name(String.t())
Returns
Vllm.Logger.VllmLogger.t()
@spec is_torch_equal( String.t(), keyword() ) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Check if the installed torch version is == the target version.
Parameters
target - a version string, like "2.6.0".
Returns
boolean()
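A hedged usage sketch for the version check (assumes the bridge is running; the tagged-tuple return shape follows the @spec above):

```elixir
# {:ok, true} only when the installed torch matches the target exactly.
case Vllm.EnvOverride.is_torch_equal("2.6.0") do
  {:ok, true} -> :apply_version_specific_patch
  {:ok, false} -> :skip_patch
  {:error, _reason} -> :bridge_unavailable
end
```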
@spec logger() :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python module attribute vllm.env_override.logger.
Returns
term()
@spec memory_plan_reuse_patched( term(), keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python binding for vllm.env_override.memory_plan_reuse_patched.
Parameters
self(term())
Returns
term()
@spec should_partition_patched(term(), term()) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Returns true if the inductor graph should be partitioned on this node.
Parameters
self(term())
node(term())
should_log(boolean(), default: false)
Returns
boolean()
@spec should_partition_patched(term(), term(), keyword()) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
@spec should_partition_patched(term(), term(), boolean()) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
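The overloaded specs above reflect that should_log is optional. A sketch (self and node stand in for values obtained from the bridge; both arities resolve to the same Python call):

```elixir
# should_log defaults to false when omitted.
{:ok, partition?} = Vllm.EnvOverride.should_partition_patched(self, node)

# Pass true positionally to enable logging of the partition decision.
{:ok, partition?} = Vllm.EnvOverride.should_partition_patched(self, node, true)
```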