Submodule bindings for vllm.forward_context.
Version
- Requested: 0.14.0
- Observed at generation: 0.14.0
Runtime Options
All functions accept a __runtime__ option for controlling execution behavior:
Vllm.ForwardContext.Module.some_function(args, __runtime__: [timeout: 120_000])
Supported runtime options
- :timeout - Call timeout in milliseconds (default: 120,000 ms / 2 minutes)
- :timeout_profile - Use a named profile (:default, :ml_inference, :batch_job, :streaming)
- :stream_timeout - Timeout for streaming operations (default: 1,800,000 ms / 30 minutes)
- :session_id - Override the session ID for this call
- :pool_name - Target a specific Snakepit pool (multi-pool setups)
- :affinity - Override session affinity (:hint, :strict_queue, :strict_fail_fast)
Timeout Profiles
- :default - 2 minute timeout for regular calls
- :ml_inference - 10 minute timeout for ML/LLM workloads
- :batch_job - Unlimited timeout for long-running jobs
- :streaming - 2 minute timeout, 30 minute stream_timeout
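The profile list above can be read as a lookup from profile name to timeout values. The following is a minimal sketch of that mapping for use in application code. The module name `TimeoutProfileSketch` is hypothetical and is not part of SnakeBridge or Snakepit, and representing the unlimited `:batch_job` timeout as `:infinity` is an assumption based on common Elixir convention. Stream timeouts are only included for `:streaming`, because the list does not state them for the other profiles.

```elixir
defmodule TimeoutProfileSketch do
  @moduledoc """
  Illustrative lookup of the documented timeout profiles.
  Hypothetical helper; not part of SnakeBridge or Snakepit.
  """

  # Values are in milliseconds and mirror the documented profile list.
  # :infinity for :batch_job is an assumption (the docs only say "unlimited").
  @profiles %{
    default: [timeout: 120_000],
    ml_inference: [timeout: 600_000],
    batch_job: [timeout: :infinity],
    streaming: [timeout: 120_000, stream_timeout: 1_800_000]
  }

  @doc "Returns {:ok, opts} for a known profile, or :error otherwise."
  def fetch(profile) when is_atom(profile), do: Map.fetch(@profiles, profile)

  @doc "Returns the call timeout for a profile, raising on unknown names."
  def timeout!(profile) do
    @profiles |> Map.fetch!(profile) |> Keyword.fetch!(:timeout)
  end
end
```

With this sketch, `TimeoutProfileSketch.timeout!(:ml_inference)` returns `600_000`, the same value used in the explicit timeout override example.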
Example with timeout override
# For a long-running ML inference call
Vllm.ForwardContext.Module.predict(data, __runtime__: [timeout_profile: :ml_inference])
# Or explicit timeout
Vllm.ForwardContext.Module.predict(data, __runtime__: [timeout: 600_000])
# Route to a pool and enforce strict affinity
Vllm.ForwardContext.Module.predict(data, __runtime__: [pool_name: :strict_pool, affinity: :strict_queue])
See SnakeBridge.Defaults for global timeout configuration.
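Because `__runtime__` is passed as a plain keyword list, a misspelled key or value may go unnoticed. One way to guard against that is to check options against the documented keys before making a call. This is a sketch only: `RuntimeOptsSketch` is a hypothetical helper, the bindings do not require it, and how the bindings themselves treat unknown options is not stated in this page.

```elixir
defmodule RuntimeOptsSketch do
  @moduledoc """
  Illustrative validation of `__runtime__` options before a call.
  Hypothetical helper; the generated bindings do not depend on it.
  """

  # Keys and allowed values copied from the documented runtime options.
  @keys [:timeout, :timeout_profile, :stream_timeout, :session_id, :pool_name, :affinity]
  @profiles [:default, :ml_inference, :batch_job, :streaming]
  @affinities [:hint, :strict_queue, :strict_fail_fast]

  @doc "Wraps valid options as `[__runtime__: opts]`, or reports the first invalid one."
  def build(opts) when is_list(opts) do
    Enum.reduce_while(opts, {:ok, [__runtime__: opts]}, fn {key, value}, acc ->
      if valid?(key, value) do
        {:cont, acc}
      else
        {:halt, {:error, {:invalid_option, key, value}}}
      end
    end)
  end

  # Profile and affinity values are checked against their documented sets.
  defp valid?(:timeout_profile, value), do: value in @profiles
  defp valid?(:affinity, value), do: value in @affinities
  # Every other option only needs a documented key.
  defp valid?(key, _value), do: key in @keys
end
```

For example, `RuntimeOptsSketch.build(pool_name: :strict_pool, affinity: :strict_queue)` returns `{:ok, [__runtime__: [pool_name: :strict_pool, affinity: :strict_queue]]}`, which can be passed as the trailing keyword argument of a binding call.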
Summary
Functions
Python binding for vllm.forward_context._compute_chunked_local_num_tokens.
Python binding for vllm.forward_context._compute_sp_num_tokens.
Python binding for vllm.forward_context._forward_context.
Python module attribute vllm.forward_context.batchsize_forward_time.
Python module attribute vllm.forward_context.batchsize_logging_interval.
Coordinates amongst all DP ranks to determine if and how the full batch should be split into microbatches.
Python binding for vllm.forward_context.create_forward_context.
Python module attribute vllm.forward_context.current_platform.
Python module attribute vllm.forward_context.forward_start_time.
Get the current forward context.
The main purpose of this function is to ensure that loggers are retrieved in such a way that we can be sure the root vllm logger has already been configured.
Python binding for vllm.forward_context.is_forward_context_available.
Python module attribute vllm.forward_context.last_logging_time.
Python module attribute vllm.forward_context.logger.
A context manager that overrides the current forward context.
A context manager that stores the current forward context, which can be attention metadata, etc.
Python module attribute vllm.forward_context.track_batchsize.
Built-in mutable sequence.
Functions
@spec _compute_chunked_local_num_tokens( term(), integer(), integer(), integer(), keyword() ) :: {:ok, [integer()]} | {:error, Snakepit.Error.t()}
Python binding for vllm.forward_context._compute_chunked_local_num_tokens.
Parameters
- num_tokens_across_dp_cpu (term())
- sequence_parallel_size (integer())
- max_num_tokens (integer())
- chunk_idx (integer())
Returns
list(integer())
@spec _compute_sp_num_tokens(term(), integer(), keyword()) :: {:ok, [integer()]} | {:error, Snakepit.Error.t()}
Python binding for vllm.forward_context._compute_sp_num_tokens.
Parameters
- num_tokens_across_dp_cpu (term())
- sequence_parallel_size (integer())
Returns
list(integer())
@spec _forward_context() :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python binding for vllm.forward_context._forward_context.
Returns
term()
@spec batchsize_forward_time() :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python module attribute vllm.forward_context.batchsize_forward_time.
Returns
term()
@spec batchsize_logging_interval() :: {:ok, float()} | {:error, Snakepit.Error.t()}
Python module attribute vllm.forward_context.batchsize_logging_interval.
Returns
float()
@spec coordinate_batch_across_dp(integer(), boolean(), boolean(), term()) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}
Coordinates amongst all DP ranks to determine if and how the full batch should be split into microbatches.
Parameters
- num_tokens_unpadded - Number of tokens without accounting for padding
- allow_microbatching - If microbatching should be attempted
- allow_dp_padding - If all DP ranks should be padded up to the same value
- parallel_config - The parallel config
- num_tokens_padded - Number of tokens including any non-DP padding (CUDA graphs, TP, etc.)
- uniform_decode - Only used if allow_microbatching is True. True if the batch only contains single-token decodes
- num_scheduled_tokens_per_request - Only used if allow_microbatching is True. The number of tokens per request
- cudagraph_mode - The cudagraph mode for this rank (0=NONE, 1=PIECEWISE, 2=FULL)
Returns
{boolean(), term(), integer()}
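The `cudagraph_mode` parameter is described as an integer code (0=NONE, 1=PIECEWISE, 2=FULL), and the widest arity of `coordinate_batch_across_dp` appears to take it as a trailing `integer()`. A small mapping between readable names and those codes can keep call sites legible. This is a sketch: `CudagraphModeSketch` is a hypothetical helper that is not part of the generated bindings, and it covers only the three codes listed in the parameter description.

```elixir
defmodule CudagraphModeSketch do
  @moduledoc """
  Illustrative mapping between readable names and the integer
  cudagraph modes listed in the parameter description.
  Hypothetical helper; not part of the generated bindings.
  """

  @doc "Converts :none | :piecewise | :full to its documented integer code."
  def to_integer(:none), do: 0
  def to_integer(:piecewise), do: 1
  def to_integer(:full), do: 2

  @doc "Converts a documented integer code back to a name, or returns :error."
  def from_integer(0), do: {:ok, :none}
  def from_integer(1), do: {:ok, :piecewise}
  def from_integer(2), do: {:ok, :full}
  def from_integer(_other), do: :error
end
```

A call site can then pass `CudagraphModeSketch.to_integer(:piecewise)` where the integer mode is expected, rather than a bare `1`.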
@spec coordinate_batch_across_dp(integer(), boolean(), boolean(), term(), keyword()) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}
@spec coordinate_batch_across_dp(integer(), boolean(), boolean(), term(), term()) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}
@spec coordinate_batch_across_dp( integer(), boolean(), boolean(), term(), term(), keyword() ) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}
@spec coordinate_batch_across_dp( integer(), boolean(), boolean(), term(), term(), term() ) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}
@spec coordinate_batch_across_dp( integer(), boolean(), boolean(), term(), term(), term(), keyword() ) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}
@spec coordinate_batch_across_dp( integer(), boolean(), boolean(), term(), term(), term(), term() ) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}
@spec coordinate_batch_across_dp( integer(), boolean(), boolean(), term(), term(), term(), term(), keyword() ) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}
@spec coordinate_batch_across_dp( integer(), boolean(), boolean(), term(), term(), term(), term(), integer() ) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}
@spec create_forward_context(term(), term()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python binding for vllm.forward_context.create_forward_context.
Parameters
- attn_metadata (term())
- vllm_config (term())
- virtual_engine (integer() default: 0)
- dp_metadata (term() default: None)
- cudagraph_runtime_mode (term() default: <CUDAGraphMode.NONE: 0>)
- batch_descriptor (term() default: None)
- ubatch_slices (term() default: None)
- additional_kwargs (term() default: None)
Returns
term()
@spec create_forward_context(term(), term(), keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec create_forward_context(term(), term(), integer()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec current_platform() :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python module attribute vllm.forward_context.current_platform.
Returns
term()
@spec forward_start_time() :: {:ok, integer()} | {:error, Snakepit.Error.t()}
Python module attribute vllm.forward_context.forward_start_time.
Returns
integer()
@spec get_forward_context(keyword()) :: {:ok, Vllm.ForwardContext.t()} | {:error, Snakepit.Error.t()}
Get the current forward context.
Returns
Vllm.ForwardContext.t()
@spec init_logger( String.t(), keyword() ) :: {:ok, Vllm.Logger.VllmLogger.t()} | {:error, Snakepit.Error.t()}
The main purpose of this function is to ensure that loggers are retrieved in such a way that we can be sure the root vllm logger has already been configured.
Parameters
name(String.t())
Returns
Vllm.Logger.VllmLogger.t()
@spec is_forward_context_available(keyword()) :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Python binding for vllm.forward_context.is_forward_context_available.
Returns
boolean()
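Since `is_forward_context_available` and `get_forward_context` both return tagged tuples, they compose naturally in a `with` expression that only fetches the context when one is available. The sketch below assumes the bindings live on a module named `Vllm.ForwardContext` and that both functions accept a keyword list as their only argument, as the specs suggest. `ForwardContextGuardSketch` and the `:no_forward_context` reason are hypothetical names introduced for illustration.

```elixir
defmodule ForwardContextGuardSketch do
  @moduledoc """
  Illustrative guard around get_forward_context/1.
  Hypothetical helper; the bindings module is passed in so it can be stubbed.
  """

  @doc """
  Returns {:ok, context} when a forward context is available,
  {:error, :no_forward_context} when it is not, and passes through
  any {:error, reason} returned by the bindings.
  """
  def fetch(bindings \\ Vllm.ForwardContext, opts \\ []) do
    with {:ok, true} <- bindings.is_forward_context_available(opts),
         {:ok, context} <- bindings.get_forward_context(opts) do
      {:ok, context}
    else
      # The availability check succeeded but reported no context.
      {:ok, false} -> {:error, :no_forward_context}
      # Either call failed; return the error unchanged.
      {:error, _reason} = error -> error
    end
  end
end
```

Taking the bindings module as an argument keeps the helper testable without a running Python worker; in application code the default argument would be used.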
@spec last_logging_time() :: {:ok, integer()} | {:error, Snakepit.Error.t()}
Python module attribute vllm.forward_context.last_logging_time.
Returns
integer()
@spec logger() :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python module attribute vllm.forward_context.logger.
Returns
term()
@spec override_forward_context( term(), keyword() ) :: {:ok, term()} | {:error, Snakepit.Error.t()}
A context manager that overrides the current forward context.
This is used to override the forward context for a specific forward pass.
Parameters
forward_context(term())
Returns
term()
@spec set_forward_context(term(), term()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
A context manager that stores the current forward context, which can be attention metadata, etc. Here we can inject common logic for every model forward pass.
Parameters
- attn_metadata (term())
- vllm_config (term())
- virtual_engine (integer() default: 0)
- num_tokens (term() default: None)
- num_tokens_across_dp (term() default: None)
- cudagraph_runtime_mode (term() default: <CUDAGraphMode.NONE: 0>)
- batch_descriptor (term() default: None)
- ubatch_slices (term() default: None)
Returns
term()
@spec set_forward_context(term(), term(), keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec set_forward_context(term(), term(), integer()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec track_batchsize() :: {:ok, boolean()} | {:error, Snakepit.Error.t()}
Python module attribute vllm.forward_context.track_batchsize.
Returns
boolean()
@spec u_batch_slices(keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Built-in mutable sequence.
If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified.
Parameters
- args (term())
- kwargs (term())
Returns
term()