Thinking mode enables models to show their reasoning process before providing a final answer. This is useful for complex problems where you want to see how the model arrives at its conclusion.
## Overview
When thinking is enabled, the model response includes both:

- `thinking` - the model's internal reasoning process
- `content` - the final answer
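For orientation, a successful chat response with `think: true` has roughly the following shape. This is a sketch only; the exact fields depend on the model and the values here are invented:

```elixir
# Illustrative response shape - values are made up for this example.
%{
  "model" => "deepseek-r1:1.5b",
  "message" => %{
    "role" => "assistant",
    "thinking" => "Spell it out: s-t-r-a-w-b-e-r-r-y. The letter R appears 3 times.",
    "content" => "There are 3 Rs in 'strawberry'."
  },
  "done" => true
}
```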
## Enabling Thinking
```elixir
{:ok, response} = Ollixir.chat(client,
  model: "deepseek-r1:1.5b",
  messages: [%{role: "user", content: "How many Rs are in 'strawberry'?"}],
  think: true
)

# Access thinking and response
IO.puts("Thinking: #{response["message"]["thinking"]}")
IO.puts("Answer: #{response["message"]["content"]}")
```

## Thinking Levels
Some models support different thinking intensities:
| Level | Description | Use Case |
|---|---|---|
| `true` | Default thinking | General reasoning |
| `"low"` | Light thinking | Simple problems |
| `"medium"` | Standard thinking | Moderate complexity |
| `"high"` | Deep reasoning | Complex problems |
```elixir
# Deep reasoning for complex problems
{:ok, response} = Ollixir.chat(client,
  model: "gpt-oss:20b-cloud",
  messages: [%{role: "user", content: "Prove that sqrt(2) is irrational."}],
  think: "high"
)
```

## Compatible Models
| Model | Think Support | Levels |
|---|---|---|
| `deepseek-r1` | Yes | Boolean only |
| `deepseek-r1:1.5b` | Yes | Boolean only |
| `deepseek-v3.1:671b-cloud` | Yes | Boolean only |
| `gemini-3-flash-preview:cloud` | Yes | Boolean only |
| `gpt-oss:20b-cloud` | Yes | low/medium/high |
| `gpt-oss:120b-cloud` | Yes | low/medium/high |
| `kimi-k2-thinking:cloud` | Yes | Boolean only |
| `kimi-k2.5:cloud` | Yes | Boolean only |
| `minimax-m2.5:cloud` | Yes | Boolean only |
| `nemotron-3-nano:30b-cloud` | Yes | Boolean only |
| `qwen3` | Yes | Boolean only |
## Streaming with Thinking
Stream thinking and response chunks separately:
```elixir
{:ok, stream} = Ollixir.chat(client,
  model: "deepseek-r1:1.5b",
  messages: [%{role: "user", content: "Explain recursion step by step."}],
  think: true,
  stream: true
)

stream
|> Stream.each(fn chunk ->
  if thinking = get_in(chunk, ["message", "thinking"]) do
    IO.write("[thinking] #{thinking}")
  end

  if content = get_in(chunk, ["message", "content"]) do
    IO.write(content)
  end
end)
|> Stream.run()
```

## Thinking with Typed Responses
```elixir
{:ok, response} = Ollixir.chat(client,
  model: "deepseek-r1:1.5b",
  messages: [%{role: "user", content: "What is 15 * 23?"}],
  think: true,
  response_format: :struct
)

IO.puts("Thinking: #{response.message.thinking}")
IO.puts("Answer: #{response.message.content}")
```

## Completion with Thinking
Thinking also works with the completion endpoint:
```elixir
{:ok, response} = Ollixir.completion(client,
  model: "deepseek-r1:1.5b",
  prompt: "Calculate the factorial of 7, showing your work.",
  think: true
)

IO.puts("Thinking: #{response["thinking"]}")
IO.puts("Response: #{response["response"]}")
```

## Error Handling
Not all models support thinking. Handle unsupported models gracefully by retrying without the `think` option:
```elixir
case Ollixir.chat(client, model: model, messages: messages, think: true) do
  {:ok, response} ->
    response

  {:error, %Ollixir.ResponseError{status: 400}} ->
    # Model doesn't support thinking, retry without it
    {:ok, response} = Ollixir.chat(client, model: model, messages: messages)
    response

  {:error, error} ->
    raise "Request failed: #{inspect(error)}"
end
```

## Performance Considerations
- Thinking increases response time and token usage
- Use `"low"` for simple problems to reduce latency
- Use `"high"` only when deep reasoning is needed
- Consider disabling thinking for straightforward queries
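One way to act on this advice is a small helper that maps a rough complexity estimate to a think level before calling the API. This is a hypothetical sketch, not part of Ollixir, and the score thresholds are arbitrary:

```elixir
# Hypothetical helper (not part of Ollixir): pick a think level from a
# caller-supplied complexity score so simple queries stay fast.
defmodule ThinkLevel do
  def for_complexity(score) when score < 3, do: "low"
  def for_complexity(score) when score < 7, do: "medium"
  def for_complexity(_score), do: "high"
end

# Usage with a model that supports levels (e.g. gpt-oss:20b-cloud):
#
#   {:ok, response} =
#     Ollixir.chat(client,
#       model: "gpt-oss:20b-cloud",
#       messages: messages,
#       think: ThinkLevel.for_complexity(5)
#     )
```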
## Best Practices
- **Match level to complexity** - don't use high thinking for simple questions
- **Stream long responses** - thinking can produce lengthy output
- **Handle unsupported models** - not all models support thinking
- **Consider token costs** - thinking tokens count toward usage
## See Also
- Streaming Guide - Real-time thinking output
- Examples - Working thinking examples