Vllm.ForwardContext.Module (VLLM v0.3.0)


Submodule bindings for vllm.forward_context.

Version

  • Requested: 0.14.0
  • Observed at generation: 0.14.0

Runtime Options

Functions that take an opts keyword list accept a __runtime__ option for controlling execution behavior:

Vllm.ForwardContext.Module.some_function(args, __runtime__: [timeout: 120_000])

Supported runtime options

  • :timeout - Call timeout in milliseconds (default: 120,000ms / 2 minutes)
  • :timeout_profile - Use a named profile (:default, :ml_inference, :batch_job, :streaming)
  • :stream_timeout - Timeout for streaming operations (default: 1,800,000ms / 30 minutes)
  • :session_id - Override the session ID for this call
  • :pool_name - Target a specific Snakepit pool (multi-pool setups)
  • :affinity - Override session affinity (:hint, :strict_queue, :strict_fail_fast)

Timeout Profiles

  • :default - 2 minute timeout for regular calls
  • :ml_inference - 10 minute timeout for ML/LLM workloads
  • :batch_job - Unlimited timeout for long-running jobs
  • :streaming - 2 minute timeout, 30 minute stream_timeout
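The profile table above can be mirrored in a small lookup. One possible use is keeping a caller-side Task.await/2 timeout slightly above the bridge timeout. TimeoutProfileSketch is a hypothetical helper and is not part of SnakeBridge. Representing the unlimited :batch_job timeout as :infinity is also an assumption.

```elixir
defmodule TimeoutProfileSketch do
  # Hypothetical helper, not part of SnakeBridge. Call timeouts in ms are
  # copied from the documented profiles; :infinity is our stand-in for the
  # "unlimited" :batch_job timeout.
  @call_timeouts %{
    default: 120_000,
    ml_inference: 600_000,
    batch_job: :infinity,
    streaming: 120_000
  }

  # Returns {:ok, timeout} for a known profile, :error otherwise.
  def call_timeout(profile), do: Map.fetch(@call_timeouts, profile)

  # Caller-side timeout: the profile's call timeout plus a safety margin.
  def await_timeout(profile, margin_ms \\ 5_000) do
    case call_timeout(profile) do
      {:ok, :infinity} -> {:ok, :infinity}
      {:ok, ms} -> {:ok, ms + margin_ms}
      :error -> :error
    end
  end
end
```

For example, a Task that wraps a call made with timeout_profile: :ml_inference could be awaited with the value from await_timeout(:ml_inference).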

Example with timeout override

# Use the ML inference timeout profile
Vllm.ForwardContext.Module.get_forward_context(__runtime__: [timeout_profile: :ml_inference])

# Or set an explicit timeout (10 minutes)
Vllm.ForwardContext.Module.get_forward_context(__runtime__: [timeout: 600_000])

# Route to a pool and enforce strict affinity
Vllm.ForwardContext.Module.get_forward_context(__runtime__: [pool_name: :strict_pool, affinity: :strict_queue])

See SnakeBridge.Defaults for global timeout configuration.

Summary

Functions

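  • _compute_chunked_local_num_tokens - Python binding for vllm.forward_context._compute_chunked_local_num_tokens.
  • _compute_sp_num_tokens - Python binding for vllm.forward_context._compute_sp_num_tokens.
  • _forward_context - Python binding for vllm.forward_context._forward_context.
  • batchsize_forward_time - Python module attribute vllm.forward_context.batchsize_forward_time.
  • batchsize_logging_interval - Python module attribute vllm.forward_context.batchsize_logging_interval.
  • coordinate_batch_across_dp - Coordinates amongst all DP ranks to determine if and how the full batch should be split into microbatches.
  • create_forward_context - Python binding for vllm.forward_context.create_forward_context.
  • current_platform - Python module attribute vllm.forward_context.current_platform.
  • forward_start_time - Python module attribute vllm.forward_context.forward_start_time.
  • get_forward_context - Get the current forward context.
  • init_logger - Ensures that loggers are retrieved only after the root vllm logger has been configured.
  • is_forward_context_available - Python binding for vllm.forward_context.is_forward_context_available.
  • last_logging_time - Python module attribute vllm.forward_context.last_logging_time.
  • logger - Python module attribute vllm.forward_context.logger.
  • override_forward_context - A context manager that overrides the current forward context.
  • set_forward_context - A context manager that stores the current forward context (for example, attention metadata).
  • track_batchsize - Python module attribute vllm.forward_context.track_batchsize.
  • u_batch_slices - Built-in mutable sequence.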

Functions

_compute_chunked_local_num_tokens(num_tokens_across_dp_cpu, sequence_parallel_size, max_num_tokens, chunk_idx, opts \\ [])

@spec _compute_chunked_local_num_tokens(
  term(),
  integer(),
  integer(),
  integer(),
  keyword()
) ::
  {:ok, [integer()]} | {:error, Snakepit.Error.t()}

Python binding for vllm.forward_context._compute_chunked_local_num_tokens.

Parameters

  • num_tokens_across_dp_cpu (term())
  • sequence_parallel_size (integer())
  • max_num_tokens (integer())
  • chunk_idx (integer())

Returns

  • list(integer())
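The following pure-Elixir sketch gives a rough idea of what the returned list holds. It reproduces the arithmetic as we understand it from the vLLM Python source; treat this as an assumption and check it against your installed vLLM. Each DP rank's token count is ceil-divided across sequence_parallel_size ranks. For chunk chunk_idx, each rank then gets at most max_num_tokens of what remains. A floor of 1 keeps finished ranks participating in every chunk. A plain list of integers stands in for the num_tokens_across_dp_cpu tensor, and ChunkedLocalNumTokensSketch is a hypothetical name.

```elixir
defmodule ChunkedLocalNumTokensSketch do
  # Hypothetical pure-Elixir sketch of _compute_chunked_local_num_tokens,
  # based on our reading of the vLLM source. A list of per-DP-rank token
  # counts stands in for the num_tokens_across_dp_cpu tensor.
  def chunked_local_num_tokens(num_tokens_across_dp, sp_size, max_num_tokens, chunk_idx)
      when is_list(num_tokens_across_dp) and is_integer(sp_size) and sp_size > 0 and
             is_integer(max_num_tokens) and max_num_tokens > 0 and
             is_integer(chunk_idx) and chunk_idx >= 0 do
    num_tokens_across_dp
    # Ceil-divide each DP rank's tokens across its SP ranks, one entry per SP rank.
    |> Enum.flat_map(fn tokens ->
      List.duplicate(div(tokens + sp_size - 1, sp_size), sp_size)
    end)
    # Tokens this chunk covers on each rank, capped at max_num_tokens.
    |> Enum.map(fn sp_tokens ->
      local = min(max_num_tokens, sp_tokens - max_num_tokens * chunk_idx)
      # Finished ranks still report 1 token so every rank runs each chunk in lockstep.
      if local <= 0, do: 1, else: local
    end)
  end
end
```

Under this sketch, two DP ranks with 10 and 4 tokens, no sequence parallelism and max_num_tokens 4 yield [4, 4], then [4, 1], then [2, 1] for chunks 0, 1 and 2.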

_compute_sp_num_tokens(num_tokens_across_dp_cpu, sequence_parallel_size, opts \\ [])

@spec _compute_sp_num_tokens(term(), integer(), keyword()) ::
  {:ok, [integer()]} | {:error, Snakepit.Error.t()}

Python binding for vllm.forward_context._compute_sp_num_tokens.

Parameters

  • num_tokens_across_dp_cpu (term())
  • sequence_parallel_size (integer())

Returns

  • list(integer())
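A pure-Elixir sketch of the computation, based on our reading of the vLLM Python source and not on this binding, is shown below. Each DP rank's token count is ceil-divided by sequence_parallel_size, and the result is repeated once per sequence-parallel rank. SpNumTokensSketch is a hypothetical name, and a plain list replaces the num_tokens_across_dp_cpu tensor.

```elixir
defmodule SpNumTokensSketch do
  # Hypothetical pure-Elixir sketch of _compute_sp_num_tokens, based on our
  # reading of the vLLM source: ceil-divide each DP rank's token count by the
  # sequence-parallel size, then repeat the result once per SP rank (the
  # Python code does this step with torch.repeat_interleave).
  def sp_num_tokens(num_tokens_across_dp, sp_size)
      when is_list(num_tokens_across_dp) and is_integer(sp_size) and sp_size > 0 do
    Enum.flat_map(num_tokens_across_dp, fn tokens ->
      per_sp_rank = div(tokens + sp_size - 1, sp_size)
      List.duplicate(per_sp_rank, sp_size)
    end)
  end
end
```

For example, two DP ranks with 5 and 3 tokens and a sequence-parallel size of 2 give [3, 3, 2, 2] under this sketch.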

_forward_context()

@spec _forward_context() :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python binding for vllm.forward_context._forward_context.

Returns

  • term()

batchsize_forward_time()

@spec batchsize_forward_time() :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python module attribute vllm.forward_context.batchsize_forward_time.

Returns

  • term()

batchsize_logging_interval()

@spec batchsize_logging_interval() :: {:ok, float()} | {:error, Snakepit.Error.t()}

Python module attribute vllm.forward_context.batchsize_logging_interval.

Returns

  • float()

coordinate_batch_across_dp(num_tokens_unpadded, allow_microbatching, allow_dp_padding, parallel_config)

@spec coordinate_batch_across_dp(integer(), boolean(), boolean(), term()) ::
  {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}

Coordinates amongst all DP ranks to determine if and how the full batch should be split into microbatches.

Parameters

  • num_tokens_unpadded - Number of tokens without accounting for padding
  • allow_microbatching - If microbatching should be attempted
  • allow_dp_padding - If all DP ranks should be padded up to the same value
  • parallel_config - The parallel config
  • num_tokens_padded (optional, higher arities) - Number of tokens including any non-DP padding (CUDA graphs, TP, etc.)
  • uniform_decode (optional, higher arities) - Only used if allow_microbatching is true. True if the batch contains only single-token decodes
  • num_scheduled_tokens_per_request (optional, higher arities) - Only used if allow_microbatching is true. The number of tokens per request
  • cudagraph_mode (optional, higher arities) - The cudagraph mode for this rank (0 = NONE, 1 = PIECEWISE, 2 = FULL)

Returns

  • {boolean(), term(), integer()}
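The allow_dp_padding and cudagraph_mode parameters can be illustrated with a small sketch. It models "padded up to the same value" as padding every rank to the largest reported count. That is our assumption, not documented behavior of the binding. The sketch also omits the cross-rank communication and the microbatching decision that the real function performs. DpPaddingSketch is a hypothetical name.

```elixir
defmodule DpPaddingSketch do
  # Hypothetical illustration, not the binding's implementation. Given the
  # token count each DP rank reported, "pad all DP ranks up to the same value"
  # is modelled as padding every rank to the largest count.
  def pad_across_dp(tokens_per_rank, false) when is_list(tokens_per_rank), do: tokens_per_rank
  def pad_across_dp([], true), do: []

  def pad_across_dp(tokens_per_rank, true) when is_list(tokens_per_rank) do
    target = Enum.max(tokens_per_rank)
    List.duplicate(target, length(tokens_per_rank))
  end

  # Integer cudagraph_mode codes as listed in the parameter docs above.
  def cudagraph_mode(0), do: :none
  def cudagraph_mode(1), do: :piecewise
  def cudagraph_mode(2), do: :full
end
```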

coordinate_batch_across_dp(num_tokens_unpadded, allow_microbatching, allow_dp_padding, parallel_config, opts)

@spec coordinate_batch_across_dp(integer(), boolean(), boolean(), term(), keyword()) ::
  {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}
@spec coordinate_batch_across_dp(integer(), boolean(), boolean(), term(), term()) ::
  {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}

coordinate_batch_across_dp(num_tokens_unpadded, allow_microbatching, allow_dp_padding, parallel_config, num_tokens_padded, opts)

@spec coordinate_batch_across_dp(
  integer(),
  boolean(),
  boolean(),
  term(),
  term(),
  keyword()
) ::
  {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}
@spec coordinate_batch_across_dp(
  integer(),
  boolean(),
  boolean(),
  term(),
  term(),
  term()
) ::
  {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}

coordinate_batch_across_dp(num_tokens_unpadded, allow_microbatching, allow_dp_padding, parallel_config, num_tokens_padded, uniform_decode, opts)

@spec coordinate_batch_across_dp(
  integer(),
  boolean(),
  boolean(),
  term(),
  term(),
  term(),
  keyword()
) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}
@spec coordinate_batch_across_dp(
  integer(),
  boolean(),
  boolean(),
  term(),
  term(),
  term(),
  term()
) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}

coordinate_batch_across_dp(num_tokens_unpadded, allow_microbatching, allow_dp_padding, parallel_config, num_tokens_padded, uniform_decode, num_scheduled_tokens_per_request, opts)

@spec coordinate_batch_across_dp(
  integer(),
  boolean(),
  boolean(),
  term(),
  term(),
  term(),
  term(),
  keyword()
) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}
@spec coordinate_batch_across_dp(
  integer(),
  boolean(),
  boolean(),
  term(),
  term(),
  term(),
  term(),
  integer()
) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}

coordinate_batch_across_dp(num_tokens_unpadded, allow_microbatching, allow_dp_padding, parallel_config, num_tokens_padded, uniform_decode, num_scheduled_tokens_per_request, cudagraph_mode, opts)

@spec coordinate_batch_across_dp(
  integer(),
  boolean(),
  boolean(),
  term(),
  term(),
  term(),
  term(),
  integer(),
  keyword()
) :: {:ok, {boolean(), term(), integer()}} | {:error, Snakepit.Error.t()}

create_forward_context(attn_metadata, vllm_config)

@spec create_forward_context(term(), term()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Python binding for vllm.forward_context.create_forward_context.

Parameters

  • attn_metadata (term())
  • vllm_config (term())
  • virtual_engine (integer() default: 0)
  • dp_metadata (term() default: None)
  • cudagraph_runtime_mode (term() default: <CUDAGraphMode.NONE: 0>)
  • batch_descriptor (term() default: None)
  • ubatch_slices (term() default: None)
  • additional_kwargs (term() default: None)

Returns

  • term()

create_forward_context(attn_metadata, vllm_config, opts)

@spec create_forward_context(term(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}
@spec create_forward_context(term(), term(), integer()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

create_forward_context(attn_metadata, vllm_config, virtual_engine, opts)

@spec create_forward_context(term(), term(), integer(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}
@spec create_forward_context(term(), term(), integer(), term()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

create_forward_context(attn_metadata, vllm_config, virtual_engine, dp_metadata, opts)

@spec create_forward_context(term(), term(), integer(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}
@spec create_forward_context(term(), term(), integer(), term(), term()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

create_forward_context(attn_metadata, vllm_config, virtual_engine, dp_metadata, cudagraph_runtime_mode, opts)

@spec create_forward_context(term(), term(), integer(), term(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}
@spec create_forward_context(term(), term(), integer(), term(), term(), term()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

create_forward_context(attn_metadata, vllm_config, virtual_engine, dp_metadata, cudagraph_runtime_mode, batch_descriptor, opts)

@spec create_forward_context(
  term(),
  term(),
  integer(),
  term(),
  term(),
  term(),
  keyword()
) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}
@spec create_forward_context(
  term(),
  term(),
  integer(),
  term(),
  term(),
  term(),
  term()
) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

create_forward_context(attn_metadata, vllm_config, virtual_engine, dp_metadata, cudagraph_runtime_mode, batch_descriptor, ubatch_slices, opts)

@spec create_forward_context(
  term(),
  term(),
  integer(),
  term(),
  term(),
  term(),
  term(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec create_forward_context(
  term(),
  term(),
  integer(),
  term(),
  term(),
  term(),
  term(),
  term()
) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

create_forward_context(attn_metadata, vllm_config, virtual_engine, dp_metadata, cudagraph_runtime_mode, batch_descriptor, ubatch_slices, additional_kwargs, opts)

@spec create_forward_context(
  term(),
  term(),
  integer(),
  term(),
  term(),
  term(),
  term(),
  term(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

current_platform()

@spec current_platform() :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python module attribute vllm.forward_context.current_platform.

Returns

  • term()

forward_start_time()

@spec forward_start_time() :: {:ok, integer()} | {:error, Snakepit.Error.t()}

Python module attribute vllm.forward_context.forward_start_time.

Returns

  • integer()

get_forward_context(opts \\ [])

@spec get_forward_context(keyword()) ::
  {:ok, Vllm.ForwardContext.t()} | {:error, Snakepit.Error.t()}

Get the current forward context.

Returns

  • Vllm.ForwardContext.t()

init_logger(name, opts \\ [])

@spec init_logger(
  String.t(),
  keyword()
) :: {:ok, Vllm.Logger.VllmLogger.t()} | {:error, Snakepit.Error.t()}

The main purpose of this function is to ensure that loggers are retrieved in such a way that we can be sure the root vllm logger has already been configured.

Parameters

  • name (String.t())

Returns

  • Vllm.Logger.VllmLogger.t()

is_forward_context_available(opts \\ [])

@spec is_forward_context_available(keyword()) ::
  {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python binding for vllm.forward_context.is_forward_context_available.

Returns

  • boolean()

last_logging_time()

@spec last_logging_time() :: {:ok, integer()} | {:error, Snakepit.Error.t()}

Python module attribute vllm.forward_context.last_logging_time.

Returns

  • integer()

logger()

@spec logger() :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python module attribute vllm.forward_context.logger.

Returns

  • term()

override_forward_context(forward_context, opts \\ [])

@spec override_forward_context(
  term(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

A context manager that overrides the current forward context.

This is used to override the forward context for a specific forward pass.

Parameters

  • forward_context (term())

Returns

  • term()
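The save-and-restore behavior of a context manager like this can be pictured with a pure-Elixir analogy. In the analogy, the process dictionary stands in for the module-level forward context that vLLM keeps in Python. ForwardContextSketch and its functions are hypothetical. From our reading of the vLLM source, we also assume that get_forward_context fails when no context is set, which get!/0 imitates. Per its spec, the binding returns an opaque term(), and this sketch does not show how to enter that context manager from Elixir.

```elixir
defmodule ForwardContextSketch do
  # Pure-Elixir analogy, not the binding: the process dictionary plays the
  # role of the Python module-level forward context.
  @key :forward_context_sketch

  # Mirrors is_forward_context_available: true only while a context is set.
  def available?, do: Process.get(@key) != nil

  # Mirrors get_forward_context, which we assume fails when nothing is set.
  def get! do
    case Process.get(@key) do
      nil -> raise "forward context is not set"
      ctx -> ctx
    end
  end

  # Mirrors override_forward_context: install ctx while fun runs, then
  # restore whatever was set before, even if fun raises.
  def with_override(ctx, fun) when is_function(fun, 0) do
    previous = Process.get(@key)
    Process.put(@key, ctx)

    try do
      fun.()
    after
      restore(previous)
    end
  end

  defp restore(nil), do: Process.delete(@key)
  defp restore(previous), do: Process.put(@key, previous)
end
```

Nested overrides behave like a stack. Inside an inner override, get!/0 sees the inner context, and once the inner block returns, the outer context is visible again.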

set_forward_context(attn_metadata, vllm_config)

@spec set_forward_context(term(), term()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

A context manager that stores the current forward context, which can be attention metadata, etc. Common logic for every model forward pass can be injected here.

Parameters

  • attn_metadata (term())
  • vllm_config (term())
  • virtual_engine (integer() default: 0)
  • num_tokens (term() default: None)
  • num_tokens_across_dp (term() default: None)
  • cudagraph_runtime_mode (term() default: <CUDAGraphMode.NONE: 0>)
  • batch_descriptor (term() default: None)
  • ubatch_slices (term() default: None)

Returns

  • term()

set_forward_context(attn_metadata, vllm_config, opts)

@spec set_forward_context(term(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}
@spec set_forward_context(term(), term(), integer()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

set_forward_context(attn_metadata, vllm_config, virtual_engine, opts)

@spec set_forward_context(term(), term(), integer(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}
@spec set_forward_context(term(), term(), integer(), term()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

set_forward_context(attn_metadata, vllm_config, virtual_engine, num_tokens, opts)

@spec set_forward_context(term(), term(), integer(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}
@spec set_forward_context(term(), term(), integer(), term(), term()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

set_forward_context(attn_metadata, vllm_config, virtual_engine, num_tokens, num_tokens_across_dp, opts)

@spec set_forward_context(term(), term(), integer(), term(), term(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}
@spec set_forward_context(term(), term(), integer(), term(), term(), term()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

set_forward_context(attn_metadata, vllm_config, virtual_engine, num_tokens, num_tokens_across_dp, cudagraph_runtime_mode, opts)

@spec set_forward_context(
  term(),
  term(),
  integer(),
  term(),
  term(),
  term(),
  keyword()
) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}
@spec set_forward_context(term(), term(), integer(), term(), term(), term(), term()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

set_forward_context(attn_metadata, vllm_config, virtual_engine, num_tokens, num_tokens_across_dp, cudagraph_runtime_mode, batch_descriptor, opts)

@spec set_forward_context(
  term(),
  term(),
  integer(),
  term(),
  term(),
  term(),
  term(),
  keyword()
) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}
@spec set_forward_context(
  term(),
  term(),
  integer(),
  term(),
  term(),
  term(),
  term(),
  term()
) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

set_forward_context(attn_metadata, vllm_config, virtual_engine, num_tokens, num_tokens_across_dp, cudagraph_runtime_mode, batch_descriptor, ubatch_slices, opts)

@spec set_forward_context(
  term(),
  term(),
  integer(),
  term(),
  term(),
  term(),
  term(),
  term(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

track_batchsize()

@spec track_batchsize() :: {:ok, boolean()} | {:error, Snakepit.Error.t()}

Python module attribute vllm.forward_context.track_batchsize.

Returns

  • boolean()

u_batch_slices(opts \\ [])

@spec u_batch_slices(keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Built-in mutable sequence.

If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified. (This is the docstring of Python's built-in list type.)

Parameters

  • args (term())
  • kwargs (term())

Returns

  • term()