Nous.Providers.VLLM (nous v0.13.3)


vLLM provider implementation.

vLLM is a high-performance inference engine that provides an OpenAI-compatible API. By default it runs on http://localhost:8000/v1.

Configuration

No API key is required for local usage. Configure the base URL if needed:

config :nous, :vllm,
  base_url: "http://localhost:8000/v1"

Or set an environment variable:

export VLLM_BASE_URL="http://localhost:8000/v1"

Usage

# Via Model.parse
model = Nous.Model.parse("vllm:meta-llama/Llama-3-8B-Instruct")

# Direct provider usage
{:ok, response} = Nous.Providers.VLLM.chat(%{
  "model" => "meta-llama/Llama-3-8B-Instruct",
  "messages" => [%{"role" => "user", "content" => "Hello"}]
})
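Because chat returns a tagged tuple, failures can be pattern-matched directly. A sketch; the error shape shown is illustrative, not a guarantee of the actual error term:

```elixir
case Nous.Providers.VLLM.chat(%{
       "model" => "meta-llama/Llama-3-8B-Instruct",
       "messages" => [%{"role" => "user", "content" => "Hello"}]
     }) do
  {:ok, response} ->
    response

  {:error, reason} ->
    # e.g. the vLLM server is not reachable at the configured base URL
    {:error, reason}
end
```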

Features

vLLM supports:

  • OpenAI-compatible chat completions
  • Streaming responses
  • High-throughput batched inference
  • PagedAttention for memory efficiency
  • Tensor parallelism for multi-GPU

vLLM-Specific Parameters

Additional vLLM-specific parameters can be passed in the params map:

  • best_of - Number of candidate completions to generate; the best one is returned
  • use_beam_search - Use beam search instead of sampling
  • ignore_eos - Continue generating after the end-of-sequence token is produced
  • skip_special_tokens - Omit special tokens from the decoded output
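These extensions ride alongside the standard OpenAI-compatible fields in the same params map. A sketch; the specific values are illustrative, and their effect depends on the vLLM server version:

```elixir
{:ok, response} =
  Nous.Providers.VLLM.chat(%{
    "model" => "meta-llama/Llama-3-8B-Instruct",
    "messages" => [%{"role" => "user", "content" => "Hello"}],
    # vLLM-specific extensions alongside standard OpenAI params
    "best_of" => 3,
    "use_beam_search" => true,
    "skip_special_tokens" => true
  })
```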

Summary

Functions

api_key(opts \\ [])
Get the API key from options, environment, or application config.

base_url(opts \\ [])
Get the base URL from options, application config, or default.

count_tokens(messages)
Count tokens in messages (rough estimate).

request(model, messages, settings)
High-level request with message conversion, telemetry, and error wrapping.

request_stream(model, messages, settings)
High-level streaming request with message conversion and telemetry.

Functions

api_key(opts \\ [])

@spec api_key(keyword()) :: String.t() | nil

Get the API key from options, environment, or application config.

Lookup order:

  1. :api_key option passed directly
  2. Environment variable (VLLM_API_KEY)
  3. Application config: config :nous, :vllm, api_key: "..."
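For instance (the literal key value is illustrative):

```elixir
# Step 1 wins whenever the option is passed directly:
Nous.Providers.VLLM.api_key(api_key: "token-abc")
# => "token-abc"

# Otherwise the VLLM_API_KEY environment variable (step 2) is tried,
# then the application config (step 3); nil if none of the three is set.
```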

base_url(opts \\ [])

@spec base_url(keyword()) :: String.t()

Get the base URL from options, application config, or default.

Lookup order:

  1. :base_url option passed directly
  2. Application config: config :nous, :vllm, base_url: "..."
  3. Default: http://localhost:8000/v1
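For example (the host name is illustrative):

```elixir
# An explicit option takes precedence over config and the default:
Nous.Providers.VLLM.base_url(base_url: "http://gpu-box:8000/v1")
# => "http://gpu-box:8000/v1"

# With no option and no application config, the default applies:
Nous.Providers.VLLM.base_url()
# => "http://localhost:8000/v1"  (assuming no :base_url config is set)
```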

count_tokens(messages)

@spec count_tokens(list()) :: integer()

Count tokens in messages (rough estimate).

Override this in your provider for more accurate counting.
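For example (since the count is a rough estimate, no exact value is shown):

```elixir
messages = [
  %{"role" => "system", "content" => "You are a helpful assistant."},
  %{"role" => "user", "content" => "Hello"}
]

# Returns a rough integer estimate, not an exact tokenizer count
tokens = Nous.Providers.VLLM.count_tokens(messages)
true = is_integer(tokens)
```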

request(model, messages, settings)

High-level request with message conversion, telemetry, and error wrapping.

Default implementation that:

  1. Converts messages to provider format
  2. Builds request params
  3. Calls chat/2
  4. Parses response
  5. Emits telemetry events
  6. Wraps errors
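A sketch of calling the pipeline end to end, reusing the model string from the usage section; passing an empty settings map is an assumption about what request/3 accepts:

```elixir
model = Nous.Model.parse("vllm:meta-llama/Llama-3-8B-Instruct")
messages = [%{"role" => "user", "content" => "Hello"}]

case Nous.Providers.VLLM.request(model, messages, %{}) do
  {:ok, response} -> response
  # Errors arrive already wrapped (step 6 above)
  {:error, error} -> error
end
```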

request_stream(model, messages, settings)

High-level streaming request with message conversion and telemetry.
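A hedged sketch of consuming the stream; that the return value is an enumerable of text chunks is an assumption, not documented behavior:

```elixir
model = Nous.Model.parse("vllm:meta-llama/Llama-3-8B-Instruct")
messages = [%{"role" => "user", "content" => "Write a haiku"}]

{:ok, stream} = Nous.Providers.VLLM.request_stream(model, messages, %{})

# Print chunks as they arrive
Enum.each(stream, &IO.write/1)
```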