Cerebras

Ultra-fast inference with OpenAI-compatible API and Cerebras-specific optimizations.

Configuration

CEREBRAS_API_KEY=csk_...

Provider Options

No custom provider options - uses OpenAI-compatible defaults with Cerebras-specific handling.

Passed via standard ReqLLM options:

temperature, max_tokens, top_p
tools (with automatic strict: true for non-Qwen models)
tool_choice ("auto" or "none" only)

Implementation Notes

System Messages

System messages have stronger influence compared to OpenAI's implementation.

Tool Calling

Requires strict: true in tool schemas (automatically added)
Qwen models do NOT support strict: true (automatically excluded)
Only supports tool_choice: "auto" or "none" (not function-specific)

Streaming Limitations

Streaming not supported with:

Reasoning models in JSON mode
Tool calling scenarios

Unsupported OpenAI Features

These fields will result in a 400 error:

frequency_penalty
logit_bias
presence_penalty
parallel_tool_calls
service_tier

All restrictions handled automatically by ReqLLM.

Resources

Cerebras Documentation

← Previous Page Amazon Bedrock

Next Page → Meta (Llama)