Thinking mode enables models to show their reasoning process before providing a final answer. This is useful for complex problems where you want to see how the model arrives at its conclusion.

Overview

When thinking is enabled, the model response includes both:

  • thinking - The model's internal reasoning process
  • content - The final answer
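For orientation, a successful chat response might look roughly like the map below. This is an illustrative sketch only; the exact set of fields and their values depend on the model and server version.

```elixir
# Illustrative response shape (values are made up):
%{
  "model" => "deepseek-r1:1.5b",
  "message" => %{
    "role" => "assistant",
    "thinking" => "Spell it out: s-t-r-a-w-b-e-r-r-y. Count the Rs...",
    "content" => "There are 3 Rs in 'strawberry'."
  },
  "done" => true
}
```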

Enabling Thinking

Pass the think: true option to Ollixir.chat:

{:ok, response} = Ollixir.chat(client,
  model: "deepseek-r1:1.5b",
  messages: [%{role: "user", content: "How many Rs are in 'strawberry'?"}],
  think: true
)

# Access thinking and response
IO.puts("Thinking: #{response["message"]["thinking"]}")
IO.puts("Answer: #{response["message"]["content"]}")

Thinking Levels

Some models support different thinking intensities:

Level      Description         Use Case
true       Default thinking    General reasoning
"low"      Light thinking      Simple problems
"medium"   Standard thinking   Moderate complexity
"high"     Deep reasoning      Complex problems

# Deep reasoning for complex problems
{:ok, response} = Ollixir.chat(client,
  model: "gpt-oss:20b-cloud",
  messages: [%{role: "user", content: "Prove that sqrt(2) is irrational."}],
  think: "high"
)

Compatible Models

Model                          Think Support   Levels
deepseek-r1                    Yes             Boolean only
deepseek-r1:1.5b               Yes             Boolean only
deepseek-v3.1:671b-cloud       Yes             Boolean only
gemini-3-flash-preview:cloud   Yes             Boolean only
gpt-oss:20b-cloud              Yes             low/medium/high
gpt-oss:120b-cloud             Yes             low/medium/high
kimi-k2-thinking:cloud         Yes             Boolean only
kimi-k2.5:cloud                Yes             Boolean only
minimax-m2.5:cloud             Yes             Boolean only
nemotron-3-nano:30b-cloud      Yes             Boolean only
qwen3                          Yes             Boolean only

Streaming with Thinking

Stream thinking and response chunks separately:

{:ok, stream} = Ollixir.chat(client,
  model: "deepseek-r1:1.5b",
  messages: [%{role: "user", content: "Explain recursion step by step."}],
  think: true,
  stream: true
)

stream
|> Stream.each(fn chunk ->
  if thinking = get_in(chunk, ["message", "thinking"]) do
    IO.write("[thinking] #{thinking}")
  end
  if content = get_in(chunk, ["message", "content"]) do
    IO.write(content)
  end
end)
|> Stream.run()

Thinking with Typed Responses

With response_format: :struct, the thinking text is exposed as a struct field rather than a map key:

{:ok, response} = Ollixir.chat(client,
  model: "deepseek-r1:1.5b",
  messages: [%{role: "user", content: "What is 15 * 23?"}],
  think: true,
  response_format: :struct
)

IO.puts("Thinking: #{response.message.thinking}")
IO.puts("Answer: #{response.message.content}")

Completion with Thinking

Thinking also works with the completion endpoint:

{:ok, response} = Ollixir.completion(client,
  model: "deepseek-r1:1.5b",
  prompt: "Calculate the factorial of 7, showing your work.",
  think: true
)

IO.puts("Thinking: #{response["thinking"]}")
IO.puts("Response: #{response["response"]}")

Error Handling

Not all models support thinking. Handle unsupported models gracefully:

case Ollixir.chat(client, model: model, messages: messages, think: true) do
  {:ok, response} ->
    response

  {:error, %Ollixir.ResponseError{status: 400}} ->
    # Model doesn't support thinking, retry without
    {:ok, response} = Ollixir.chat(client, model: model, messages: messages)
    response

  {:error, error} ->
    raise "Request failed: #{inspect(error)}"
end

Performance Considerations

  • Thinking increases response time and token usage
  • Use "low" for simple problems to reduce latency
  • Use "high" only when deep reasoning is needed
  • Consider disabling thinking for straightforward queries
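One way to act on these guidelines is to derive the think level from a rough complexity estimate. The helper below is a hypothetical sketch, not part of Ollixir; the module name, the 1-10 scale, and the thresholds are all illustrative assumptions.

```elixir
# Hypothetical helper: map an estimated complexity score (1..10)
# to a think level string. Thresholds are illustrative.
defmodule ThinkLevel do
  def for_complexity(score) when score <= 3, do: "low"
  def for_complexity(score) when score <= 7, do: "medium"
  def for_complexity(_score), do: "high"
end
```

Usage, assuming a model that accepts level strings (e.g. gpt-oss:20b-cloud):

think: ThinkLevel.for_complexity(8)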

Best Practices

  1. Match level to complexity - Don't use high thinking for simple questions
  2. Stream long responses - Thinking can produce lengthy output
  3. Handle unsupported models - Not all models support thinking
  4. Consider token costs - Thinking tokens count toward usage

See Also