Nous.Providers.SGLang (nous v0.13.3)


SGLang provider implementation.

SGLang (Structured Generation Language) is a framework for efficient LLM serving with an OpenAI-compatible API. By default it runs on http://localhost:30000/v1.

Configuration

No API key is required for local usage. Configure the base URL if needed:

config :nous, :sglang,
  base_url: "http://localhost:30000/v1"

Or use an environment variable:

export SGLANG_BASE_URL="http://localhost:30000/v1"

Usage

# Via Model.parse
model = Nous.Model.parse("sglang:meta-llama/Llama-3-8B-Instruct")

# Direct provider usage
{:ok, response} = Nous.Providers.SGLang.chat(%{
  "model" => "meta-llama/Llama-3-8B-Instruct",
  "messages" => [%{"role" => "user", "content" => "Hello"}]
})

Features

SGLang supports:

  • OpenAI-compatible chat completions
  • Streaming responses
  • RadixAttention for KV cache reuse
  • Constrained decoding (JSON, regex)
  • Speculative decoding
  • Multi-modal inputs
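Streaming can be sketched through the provider's request_stream/3 entry point. This is a hedged example: the exact return shape (a lazy stream of text chunks) is an assumption about this provider, and the model name is illustrative.

```elixir
# Sketch only: assumes request_stream/3 returns {:ok, stream} where each
# element is a text delta, mirroring OpenAI-style streaming chunks.
model = Nous.Model.parse("sglang:meta-llama/Llama-3-8B-Instruct")
messages = [%{"role" => "user", "content" => "Write a haiku about caches"}]

{:ok, stream} = Nous.Providers.SGLang.request_stream(model, messages, %{})

stream
|> Stream.each(fn chunk -> IO.write(chunk) end)
|> Stream.run()
```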

SGLang-Specific Parameters

Additional parameters supported (pass them in the params map):

  • regex - Constrain output to match a regex pattern
  • json_schema - Constrain output to match a JSON schema

Summary

Functions

Get the API key from options, environment, or application config.

Get the base URL from options, application config, or default.

Count tokens in messages (rough estimate).

High-level request with message conversion, telemetry, and error wrapping.

High-level streaming request with message conversion and telemetry.

Functions

api_key(opts \\ [])

@spec api_key(keyword()) :: String.t() | nil

Get the API key from options, environment, or application config.

Lookup order:

  1. :api_key option passed directly
  2. Environment variable (SGLANG_API_KEY)
  3. Application config: config :nous, :sglang, api_key: "..."
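Per the lookup order above, an :api_key option passed directly takes precedence over the environment variable and application config:

```elixir
# A directly passed option wins over SGLANG_API_KEY and app config.
Nous.Providers.SGLang.api_key(api_key: "sk-local")
#=> "sk-local"
```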

base_url(opts \\ [])

@spec base_url(keyword()) :: String.t()

Get the base URL from options, application config, or default.

Lookup order:

  1. :base_url option passed directly
  2. Application config: config :nous, :sglang, base_url: "..."
  3. Default: http://localhost:30000/v1
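Following that order, a direct option overrides everything, and with nothing configured the documented default applies. The host name in the override example is illustrative.

```elixir
# A direct option overrides app config and the default.
Nous.Providers.SGLang.base_url(base_url: "http://gpu-box:30000/v1")
#=> "http://gpu-box:30000/v1"

# With no option and no application config set, the default is returned.
Nous.Providers.SGLang.base_url()
#=> "http://localhost:30000/v1"
```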

count_tokens(messages)

@spec count_tokens(list()) :: integer()

Count tokens in messages (rough estimate).

Override this in your provider for more accurate counting.
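A minimal usage sketch; the returned number is only an estimate and the exact heuristic is implementation-defined, so no specific value is shown.

```elixir
messages = [
  %{"role" => "system", "content" => "You are terse."},
  %{"role" => "user", "content" => "Hello"}
]

# Returns an integer token estimate for the message list.
estimate = Nous.Providers.SGLang.count_tokens(messages)
true = is_integer(estimate)
```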

request(model, messages, settings)

High-level request with message conversion, telemetry, and error wrapping.

Default implementation that:

  1. Converts messages to provider format
  2. Builds request params
  3. Calls chat/2
  4. Parses response
  5. Emits telemetry events
  6. Wraps errors
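The steps above can be sketched as a single call site: request/3 handles conversion, telemetry, and error wrapping internally, so callers only branch on the result. The settings key and the response handling shown are assumptions about the concrete types.

```elixir
# Hedged sketch: assumes request/3 returns {:ok, response} | {:error, reason}
# and that settings accepts common sampling keys such as :temperature.
model = Nous.Model.parse("sglang:meta-llama/Llama-3-8B-Instruct")
messages = [%{"role" => "user", "content" => "Summarize RadixAttention"}]

case Nous.Providers.SGLang.request(model, messages, %{temperature: 0.2}) do
  {:ok, response} -> IO.inspect(response, label: "response")
  {:error, reason} -> IO.inspect(reason, label: "request failed")
end
```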

request_stream(model, messages, settings)

High-level streaming request with message conversion and telemetry.