Nous.Providers.SGLang (nous v0.13.3)
SGLang provider implementation.
SGLang (Structured Generation Language) is a framework for efficient
LLM serving with an OpenAI-compatible API. By default it runs on
http://localhost:30000/v1.
Configuration
No API key is required for local usage. Configure the base URL if needed:
config :nous, :sglang,
  base_url: "http://localhost:30000/v1"
Or use an environment variable:
export SGLANG_BASE_URL="http://localhost:30000/v1"
Usage
# Via Model.parse
model = Nous.Model.parse("sglang:meta-llama/Llama-3-8B-Instruct")
# Direct provider usage
{:ok, response} = Nous.Providers.SGLang.chat(%{
  "model" => "meta-llama/Llama-3-8B-Instruct",
  "messages" => [%{"role" => "user", "content" => "Hello"}]
})
Features
SGLang supports:
- OpenAI-compatible chat completions
- Streaming responses
- RadixAttention for KV cache reuse
- Constrained decoding (JSON, regex)
- Speculative decoding
- Multi-modal inputs
SGLang-Specific Parameters
Additional parameters supported (pass in params map):
- regex - Constrain output to match a regex pattern
- json_schema - Constrain output to match a JSON schema
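As a rough illustration of the two parameters above, the sketch below builds params maps that carry them. Only the key names "regex" and "json_schema" come from this page; the module SGLangParamsSketch, the prompt text, and the choice to pass the schema as an Elixir map are assumptions, and the exact value format the provider expects may differ.

```elixir
# Hypothetical helper module, not part of Nous.
defmodule SGLangParamsSketch do
  @base %{
    "model" => "meta-llama/Llama-3-8B-Instruct",
    "messages" => [%{"role" => "user", "content" => "Reply with yes or no."}]
  }

  # Add a regex constraint. The pattern is passed as a plain string (assumption).
  def with_regex(pattern) when is_binary(pattern) do
    # Local sanity check only: Elixir compiles PCRE, which may not match the
    # regex dialect used by the server-side constrained decoder.
    {:ok, _compiled} = Regex.compile(pattern)
    Map.put(@base, "regex", pattern)
  end

  # Add a JSON schema constraint. Passing the schema as a map is an assumption.
  def with_json_schema(schema) when is_map(schema) do
    Map.put(@base, "json_schema", schema)
  end
end

params = SGLangParamsSketch.with_regex("(yes|no)")
# Nous.Providers.SGLang.chat(params)  # requires a running SGLang server
```

The call to chat is left commented out so the sketch runs without a server.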
Functions
Get the API key from options, environment, or application config.
Lookup order:
- :api_key option passed directly
- Environment variable (SGLANG_API_KEY)
- Application config: config :nous, :sglang, api_key: "..."
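The lookup order above could be sketched roughly as follows. ApiKeyLookupSketch is a hypothetical module written for illustration, not the library's implementation; it assumes the options are a keyword list and that a missing key simply yields nil.

```elixir
# Hypothetical module mirroring the documented lookup order only.
defmodule ApiKeyLookupSketch do
  def api_key(opts \\ []) do
    app_config = Application.get_env(:nous, :sglang, [])

    # 1. explicit option, 2. environment variable, 3. application config
    Keyword.get(opts, :api_key) ||
      System.get_env("SGLANG_API_KEY") ||
      Keyword.get(app_config, :api_key)
  end
end

ApiKeyLookupSketch.api_key(api_key: "from-opts")
```

Returning nil when nothing is configured fits the note that no API key is required for local usage.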
Get the base URL from options, application config, or default.
Lookup order:
- :base_url option passed directly
- Application config: config :nous, :sglang, base_url: "..."
- Default: http://localhost:30000/v1
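A minimal sketch of this fallback chain, assuming keyword-list options. BaseUrlLookupSketch is a hypothetical name and the code only mirrors the three documented steps.

```elixir
# Hypothetical module; not the library's implementation.
defmodule BaseUrlLookupSketch do
  @default_base_url "http://localhost:30000/v1"

  def base_url(opts \\ []) do
    case Keyword.fetch(opts, :base_url) do
      # 1. explicit option wins
      {:ok, url} when is_binary(url) ->
        url

      # 2. application config, 3. documented default
      _ ->
        :nous
        |> Application.get_env(:sglang, [])
        |> Keyword.get(:base_url, @default_base_url)
    end
  end
end

BaseUrlLookupSketch.base_url(base_url: "http://gpu-box:30000/v1")
```

The host name gpu-box in the usage line is a made-up example.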
Count tokens in messages (rough estimate).
Override this in your provider for more accurate counting.
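To give a sense of what a rough estimate might look like, here is a sketch that divides the total content length by four. The four-characters-per-token ratio is a common rule of thumb for English text and an assumption here; this page does not state which heuristic the library uses, and TokenEstimateSketch is a hypothetical module.

```elixir
# Hypothetical module; the 4-characters-per-token ratio is an assumption.
defmodule TokenEstimateSketch do
  @chars_per_token 4

  # Sum the length of every string "content" field and divide by the ratio.
  def count_tokens(messages) when is_list(messages) do
    messages
    |> Enum.map(&content_length/1)
    |> Enum.sum()
    |> div(@chars_per_token)
  end

  defp content_length(%{"content" => content}) when is_binary(content),
    do: String.length(content)

  # Messages without string content (for example multi-modal parts) count as 0.
  defp content_length(_other), do: 0
end

TokenEstimateSketch.count_tokens([%{"role" => "user", "content" => "Hello"}])
# => 1
```

A tokenizer-backed count for the served model would be more accurate than any character-based ratio.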
High-level request with message conversion, telemetry, and error wrapping.
Default implementation that:
- Converts messages to provider format
- Builds request params
- Calls chat/2
- Parses response
- Emits telemetry events
- Wraps errors
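The steps above can be pictured with a small stand-alone sketch. Everything in it is hypothetical: RequestPipelineSketch is not the library's code, chat_fun stands in for chat/2, emit stands in for a telemetry call, and the response shape follows the OpenAI chat completions format rather than anything confirmed on this page. Both functions are injected so the sketch runs without a server or extra dependencies.

```elixir
# Hypothetical stand-in for the documented default flow.
defmodule RequestPipelineSketch do
  def request(model, messages, chat_fun, emit \\ fn _event, _meta -> :ok end) do
    # 1. Convert messages to the provider (OpenAI-style) format.
    converted =
      Enum.map(messages, fn {role, text} ->
        %{"role" => to_string(role), "content" => text}
      end)

    # 2. Build request params.
    params = %{"model" => model, "messages" => converted}

    emit.(:start, %{model: model})

    # 3. Call chat, 4. parse the response, 6. wrap errors.
    result =
      case chat_fun.(params) do
        {:ok, %{"choices" => [%{"message" => %{"content" => text}} | _]}} ->
          {:ok, text}

        {:ok, other} ->
          {:error, {:unexpected_response, other}}

        {:error, reason} ->
          {:error, {:provider_error, reason}}
      end

    # 5. Emit a telemetry-style event describing the outcome.
    emit.(:stop, %{model: model, status: elem(result, 0)})
    result
  end
end

# Fake chat function returning an OpenAI-style payload (assumed shape).
fake_chat = fn _params ->
  {:ok, %{"choices" => [%{"message" => %{"role" => "assistant", "content" => "Hi!"}}]}}
end

{:ok, "Hi!"} =
  RequestPipelineSketch.request("meta-llama/Llama-3-8B-Instruct", [user: "Hello"], fake_chat)
```

Injecting the chat and telemetry functions keeps the flow testable in isolation; a real provider would call its own chat/2 and telemetry library instead.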
High-level streaming request with message conversion and telemetry.