Vllm.Grpc (VLLM v0.3.0)


vLLM gRPC protocol definitions.

This module contains the protocol buffer definitions for vLLM's gRPC API. The protobuf files are compiled into Python code using grpcio-tools.

Version

  • Requested: 0.14.0
  • Observed at generation: 0.14.0

Runtime Options

All functions accept a __runtime__ option for controlling execution behavior:

Vllm.Grpc.some_function(args, __runtime__: [timeout: 120_000])

Supported runtime options

  • :timeout - Call timeout in milliseconds (default: 120,000ms / 2 minutes)
  • :timeout_profile - Use a named profile (:default, :ml_inference, :batch_job, :streaming)
  • :stream_timeout - Timeout for streaming operations (default: 1,800,000ms / 30 minutes)
  • :session_id - Override the session ID for this call
  • :pool_name - Target a specific Snakepit pool (multi-pool setups)
  • :affinity - Override session affinity (:hint, :strict_queue, :strict_fail_fast)

Timeout Profiles

  • :default - 2 minute timeout for regular calls
  • :ml_inference - 10 minute timeout for ML/LLM workloads
  • :batch_job - Unlimited timeout for long-running jobs
  • :streaming - 2 minute timeout, 30 minute stream_timeout
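As a rough illustration of how the profiles above map to call timeouts, the following sketch resolves a `__runtime__` keyword list to a single value. The module `TimeoutProfileSketch` and its `call_timeout/1` function are hypothetical and not part of SnakeBridge or Snakepit. It assumes that an explicit `:timeout` takes precedence over `:timeout_profile` and that "unlimited" is represented as `:infinity`; neither is confirmed by this page.

```elixir
defmodule TimeoutProfileSketch do
  # Call timeouts in milliseconds, copied from the profile list above.
  # :infinity stands in for "unlimited"; that representation is an assumption.
  @profiles %{
    default: 120_000,
    ml_inference: 600_000,
    batch_job: :infinity,
    streaming: 120_000
  }

  # Resolve the call timeout for a `__runtime__` keyword list.
  # Assumption: an explicit :timeout wins over :timeout_profile.
  def call_timeout(runtime_opts) when is_list(runtime_opts) do
    profile = Keyword.get(runtime_opts, :timeout_profile, :default)

    Keyword.get_lazy(runtime_opts, :timeout, fn ->
      Map.fetch!(@profiles, profile)
    end)
  end
end

IO.inspect(TimeoutProfileSketch.call_timeout(timeout_profile: :ml_inference))
# prints 600000
IO.inspect(TimeoutProfileSketch.call_timeout(timeout: 5_000))
# prints 5000
```

An unknown profile name raises a `KeyError` in this sketch; how the real library handles that case is not documented here.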

Example with timeout override

# For a long-running ML inference call
Vllm.Grpc.predict(data, __runtime__: [timeout_profile: :ml_inference])

# Or explicit timeout
Vllm.Grpc.predict(data, __runtime__: [timeout: 600_000])

# Route to a pool and enforce strict affinity
Vllm.Grpc.predict(data, __runtime__: [pool_name: :strict_pool, affinity: :strict_queue])

See SnakeBridge.Defaults for global timeout configuration.

Summary

Functions

__all__()

Python module attribute vllm.grpc.__all__.

Functions

__all__()

@spec __all__() :: {:ok, [term()]} | {:error, Snakepit.Error.t()}

Returns the Python module attribute vllm.grpc.__all__, the list of names the module declares as its public exports.

Returns

  • list(term()), wrapped as {:ok, list} on success; failures are returned as {:error, Snakepit.Error.t()}
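A minimal sketch of handling the two result shapes from the `@spec` above. The module `AllResultSketch` and its `summarize/1` function are hypothetical helpers, and the sample values passed to it are placeholders rather than real output from vLLM.

```elixir
defmodule AllResultSketch do
  # Success: the spec promises a list of terms inside an :ok tuple.
  def summarize({:ok, names}) when is_list(names) do
    "#{length(names)} exported name(s): #{inspect(names)}"
  end

  # Failure: the spec promises a Snakepit.Error.t(); this sketch only inspects it.
  def summarize({:error, error}) do
    "lookup failed: #{inspect(error)}"
  end
end

# Placeholder inputs; a real call would pipe Vllm.Grpc.__all__() into summarize/1.
IO.puts(AllResultSketch.summarize({:ok, ["name_a", "name_b"]}))
IO.puts(AllResultSketch.summarize({:error, :placeholder_error}))
```

In an application with the library loaded, the same helper could be used as `Vllm.Grpc.__all__() |> AllResultSketch.summarize()`.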