Groq Provider Guide
Groq provides ultra-fast LLM inference on its custom hardware, delivering exceptional performance for real-time applications.
Configuration
Set your Groq API key:
# Add to .env file (automatically loaded)
GROQ_API_KEY=gsk_...
Or use in-memory storage:
ReqLLM.put_key(:groq_api_key, "gsk_...")
Supported Models
Popular Groq models include:
- llama-3.3-70b-versatile - Latest Llama 3.3
- llama-3.1-8b-instant - Fast, efficient
- mixtral-8x7b-32768 - Large context window
- gemma2-9b-it - Google's Gemma 2
See the full list with mix req_llm.model_sync groq.
Basic Usage
# Simple text generation
{:ok, response} = ReqLLM.generate_text(
"groq:llama-3.3-70b-versatile",
"Explain async programming"
)
# Streaming (ultra-fast with Groq hardware)
{:ok, stream_response} = ReqLLM.stream_text(
"groq:llama-3.1-8b-instant",
"Write a story"
)
ReqLLM.StreamResponse.tokens(stream_response)
|> Stream.each(&IO.write/1)
|> Stream.run()
Provider-Specific Options
Service Tier
Control performance tier for requests:
{:ok, response} = ReqLLM.generate_text(
"groq:llama-3.3-70b-versatile",
"Hello",
provider_options: [service_tier: "performance"]
)
Tiers:
"auto"- Automatic selection (default)"on_demand"- Standard on-demand"flex"- Flexible pricing"performance"- Highest performance
Reasoning Effort
Control reasoning level for compatible models:
{:ok, response} = ReqLLM.generate_text(
"groq:deepseek-r1-distill-llama-70b",
"Complex problem",
provider_options: [reasoning_effort: "high"]
)
Levels: "none", "default", "low", "medium", "high"
Reasoning Format
Specify format for reasoning output:
{:ok, response} = ReqLLM.generate_text(
"groq:deepseek-r1-distill-llama-70b",
"Problem to solve",
provider_options: [reasoning_format: "detailed"]
)
Web Search
Enable web search capabilities:
{:ok, response} = ReqLLM.generate_text(
"groq:llama-3.3-70b-versatile",
"Latest tech news",
provider_options: [
search_settings: %{
include_domains: ["techcrunch.com", "arstechnica.com"],
exclude_domains: ["spam.com"]
}
]
)
Compound Custom
Custom configuration for Compound systems:
{:ok, response} = ReqLLM.generate_text(
"groq:model",
"Text",
provider_options: [
compound_custom: %{
# Compound-specific settings
}
]
)
Complete Example
alias ReqLLM.Context
import ReqLLM.Context
context = Context.new([
system("You are a fast, helpful coding assistant"),
user("Explain tail call optimization")
])
{:ok, response} = ReqLLM.generate_text(
"groq:llama-3.3-70b-versatile",
context,
temperature: 0.7,
max_tokens: 1000,
provider_options: [
service_tier: "performance",
search_settings: %{
include_domains: ["developer.mozilla.org", "stackoverflow.com"]
}
]
)
text = ReqLLM.Response.text(response)
usage = response.usage
IO.puts(text)
IO.puts("Tokens: #{usage.total_tokens}, Cost: $#{usage.total_cost}")Tool Calling
Groq supports function calling on compatible models:
weather_tool = ReqLLM.tool(
name: "get_weather",
description: "Get weather for a location",
parameter_schema: [
location: [type: :string, required: true]
],
callback: {WeatherAPI, :fetch}
)
{:ok, response} = ReqLLM.generate_text(
"groq:llama-3.3-70b-versatile",
"What's the weather in Berlin?",
tools: [weather_tool]
)
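The callback: {WeatherAPI, :fetch} entry above refers to a module you define yourself. The sketch below is a hedged illustration of one possible shape, assuming the callback receives the tool's decoded arguments as a map and returns an {:ok, result} tuple; check the ReqLLM.Tool documentation for the exact callback contract.
# Hypothetical callback module for the tool above (not part of ReqLLM)
defmodule WeatherAPI do
  # Assumes the arguments arrive as a map keyed by the parameter names
  def fetch(args) do
    location = args["location"] || args[:location]
    # Replace with a real weather lookup; this return value is illustrative
    {:ok, %{location: location, temperature_c: 21, conditions: "partly cloudy"}}
  end
end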
Structured Output
Groq supports structured output generation:
schema = [
name: [type: :string, required: true],
age: [type: :integer, required: true],
skills: [type: {:list, :string}]
]
{:ok, response} = ReqLLM.generate_object(
"groq:llama-3.3-70b-versatile",
"Generate a software engineer profile",
schema
)
person = ReqLLM.Response.object(response)
Performance Tips
- Use Streaming: Groq's hardware excels at streaming - you'll see tokens instantly
- Choose the Right Model: Use 8b-instant for speed, 70b for quality
- Service Tier: Use the "performance" tier for lowest latency
- Batch Requests: Groq handles concurrent requests efficiently (see the sketch below)
Streaming Performance
Groq's custom hardware provides exceptional streaming performance:
{:ok, stream_response} = ReqLLM.stream_text(
"groq:llama-3.1-8b-instant",
"Count from 1 to 100"
)
# You'll see tokens appearing almost instantly
stream_response
|> ReqLLM.StreamResponse.tokens()
|> Stream.each(&IO.write/1)
|> Stream.run()
Error Handling
case ReqLLM.generate_text("groq:llama-3.3-70b-versatile", "Hello") do
{:ok, response} ->
handle_success(response)
{:error, error} ->
IO.puts("Error: #{error.message}")
end
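For transient failures such as rate limits, a simple retry wrapper can help. This is a minimal sketch built only from the calls shown above; the attempt count, fixed delay, and retry-on-any-error behavior are assumptions, so adapt them to the errors you actually see.
# Hypothetical helper (not part of ReqLLM): retries a failed request a few times
defmodule GroqRetry do
  def generate_with_retry(model, prompt, attempts \\ 3) do
    case ReqLLM.generate_text(model, prompt) do
      {:ok, response} ->
        {:ok, response}

      {:error, _error} when attempts > 1 ->
        # Back off briefly before retrying; tune the delay for your workload
        Process.sleep(500)
        generate_with_retry(model, prompt, attempts - 1)

      {:error, error} ->
        {:error, error}
    end
  end
end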
Key Advantages
- Speed: Custom LPU hardware for ultra-fast inference
- Cost: Competitive pricing for high performance
- Reliability: Enterprise-grade infrastructure
- Compatibility: OpenAI-compatible API