ReqLLM.Providers.Groq (ReqLLM v1.0.0)
Groq provider – 100% OpenAI Chat Completions compatible, running on Groq's high-performance hardware.
Implementation
Uses built-in OpenAI-style encoding/decoding defaults. No custom request/response handling needed – leverages the standard OpenAI wire format.
Groq-Specific Extensions
Beyond standard OpenAI parameters, Groq supports:
service_tier - Performance tier (auto, on_demand, flex, performance)
reasoning_effort - Reasoning level (none, default, low, medium, high)
reasoning_format - Format for reasoning output
search_settings - Web search configuration
compound_custom - Custom Compound systems configuration
logit_bias - Token bias adjustments
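These options can be passed per request. A minimal sketch, assuming ReqLLM.generate_text/3 and the :provider_options keyword (model name and values illustrative):

# Sketch: passing Groq-specific options via provider options.
{:ok, response} =
  ReqLLM.generate_text(
    "groq:llama-3.3-70b-versatile",
    "Explain LPUs in one paragraph.",
    provider_options: [service_tier: :flex, reasoning_effort: :low]
  )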
See provider_schema/0 for the complete Groq-specific schema and
ReqLLM.Provider.Options for inherited OpenAI parameters.
Configuration
# Add to .env file (automatically loaded)
GROQ_API_KEY=gsk_...
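The key can also be supplied at runtime instead of via .env. A minimal sketch, assuming ReqLLM.put_key/2 and the :groq_api_key key name derived from GROQ_API_KEY:

# Sketch: runtime key configuration.
ReqLLM.put_key(:groq_api_key, "gsk_...")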
Functions
Default implementation of attach/3.
Sets up Bearer token authentication and standard pipeline steps.
Custom attach_stream that ensures translate_options is called for streaming requests.
This is necessary because the default streaming path doesn't call translate_options, which means model-specific option normalization (like omitting reasoning_effort for qwen models) wouldn't be applied to streaming requests.
Custom response decoding that normalizes <think> tags into reasoning content parts.
Some Groq models (qwen/qwen3-32b, deepseek-r1-distill-llama-70b) embed thinking content
within <think>...</think> tags in the message content field. This override normalizes
those responses to extract thinking content as :thinking content parts, matching the
behavior of models that use separate delta.reasoning fields.
For non-streaming: splits <think> blocks from message content into :thinking and :text parts.
For streaming: wraps the stream to convert embedded <think> sequences in chunks into separate chunks.
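To make the non-streaming case concrete, here is a standalone illustrative sketch of the splitting idea (not the provider's actual implementation):

# Illustrative only: split <think>...</think> blocks out of message
# content into {:thinking, text} and {:text, text} parts.
defmodule ThinkSplit do
  def split(content) do
    Regex.split(~r/<think>.*?<\/think>/s, content, include_captures: true, trim: true)
    |> Enum.map(fn
      "<think>" <> _ = block ->
        {:thinking,
         block
         |> String.trim_leading("<think>")
         |> String.trim_trailing("</think>")
         |> String.trim()}

      text ->
        {:text, text}
    end)
  end
end

ThinkSplit.split("<think>check the units</think>The answer is 42.")
#=> [thinking: "check the units", text: "The answer is 42."]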
Default implementation of decode_stream_event/2.
Decodes SSE events using OpenAI-compatible format.
Stateful SSE event decoding that normalizes <think> tags.
Maintains state across events to handle tags split across chunks. Returns updated chunks and new state.
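As an illustration of the approach (the provider's real state shape and chunk types may differ), a simplified stateful normalizer that buffers partial tags across chunk boundaries; it also shows the init and flush steps documented below:

# Illustrative sketch of stateful <think> normalization across chunks.
defmodule ThinkStream do
  @open "<think>"
  @close "</think>"

  # Initial state: plain-text mode, nothing buffered.
  def init, do: %{mode: :text, buffer: ""}

  # Feed one streamed chunk; returns {emitted_parts, new_state}.
  def feed(chunk, %{mode: mode, buffer: buf}), do: scan(buf <> chunk, mode, [])

  # Flush any remaining buffered content when the stream ends.
  def finish(%{buffer: ""}), do: []
  def finish(%{mode: mode, buffer: buf}), do: [{mode, buf}]

  defp scan(data, mode, acc) do
    tag = if mode == :text, do: @open, else: @close

    case String.split(data, tag, parts: 2) do
      [before, rest] ->
        acc = if before == "", do: acc, else: [{mode, before} | acc]
        scan(rest, flip(mode), acc)

      [_no_tag] ->
        {emit, keep} = hold_back(data, tag)
        acc = if emit == "", do: acc, else: [{mode, emit} | acc]
        {Enum.reverse(acc), %{mode: mode, buffer: keep}}
    end
  end

  defp flip(:text), do: :thinking
  defp flip(:thinking), do: :text

  # Keep any trailing bytes that could be the start of a tag
  # split across chunk boundaries.
  defp hold_back(data, tag) do
    max = min(byte_size(data), byte_size(tag) - 1)

    keep =
      Enum.find(max..1//-1, 0, fn n ->
        String.ends_with?(data, binary_part(tag, 0, n))
      end)

    {binary_part(data, 0, byte_size(data) - keep),
     binary_part(data, byte_size(data) - keep, keep)}
  end
end

{parts, state} = ThinkStream.feed("<thi", ThinkStream.init())
#=> {[], %{mode: :text, buffer: "<thi"}}
ThinkStream.feed("nk>plan</think>Hello", state)
#=> {[thinking: "plan", text: "Hello"], %{mode: :text, buffer: ""}}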
Callback implementation for ReqLLM.Provider.default_env_key/0.
Custom body encoding that adds Groq-specific extensions to the default OpenAI-compatible format.
Adds support for:
- service_tier (auto, on_demand, flex, performance)
- reasoning_effort (none, default, low, medium, high)
- reasoning_format
- search_settings
- compound_custom
- logit_bias (in addition to standard options)
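Concretely, the encoded body is the standard OpenAI-compatible payload with these fields merged in. An illustrative shape (field names from the list above; values are examples only):

# Illustrative encoded body (values are examples, not defaults):
%{
  "model" => "qwen/qwen3-32b",
  "messages" => [%{"role" => "user", "content" => "Hi"}],
  "temperature" => 0.7,
  "service_tier" => "flex",
  "reasoning_effort" => "low",
  "logit_bias" => %{"1234" => -100}
}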
Default implementation of extract_usage/2.
Extracts usage data from standard usage field in response body.
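That is, usage is read from the standard OpenAI-style field in the response body (values illustrative):

# Standard usage field in the response body:
%{
  "usage" => %{
    "prompt_tokens" => 12,
    "completion_tokens" => 34,
    "total_tokens" => 46
  }
}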
Flush any remaining buffered content when stream ends.
Emits final thinking or text chunk if buffer is non-empty.
Initialize streaming state for <think> tag normalization.
Returns initial state with :text mode and empty buffer.
Custom prepare_request for :object operations to maintain Groq-specific max_tokens handling.
Ensures that structured output requests have adequate token limits while delegating other operations to the default implementation.
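A hypothetical one-line sketch of the idea; the actual floor value and option handling are the provider's, not what is shown here:

# Hypothetical: give :object requests a default token budget if none was set
# (the 4096 value is an example, not the provider's actual default).
opts = Keyword.put_new(opts, :max_tokens, 4096)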
Default implementation of translate_options/3.
Pass-through implementation that returns options unchanged.