Thinking mode enables models to show their reasoning process before providing a final answer. This is useful for complex problems where you want to see how the model arrives at its conclusion.

Overview

When thinking is enabled, the model response includes both:

  • thinking - The model's internal reasoning process
  • content - The final answer
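For orientation, a successful chat response might look roughly like the map below. This is an illustrative sketch only; the exact set of fields and their values depend on the model and server version.

```elixir
# Illustrative response shape (values are made up):
%{
  "model" => "deepseek-r1:1.5b",
  "message" => %{
    "role" => "assistant",
    "thinking" => "Spell it out: s-t-r-a-w-b-e-r-r-y. Count the Rs...",
    "content" => "There are 3 Rs in 'strawberry'."
  },
  "done" => true
}
```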

Enabling Thinking

Pass the think: true option to Ollixir.chat:

{:ok, response} = Ollixir.chat(client,
  model: "deepseek-r1:1.5b",
  messages: [%{role: "user", content: "How many Rs are in 'strawberry'?"}],
  think: true
)

# Access thinking and response
IO.puts("Thinking: #{response["message"]["thinking"]}")
IO.puts("Answer: #{response["message"]["content"]}")

Thinking Levels

Some models support different thinking intensities:

Level      Description         Use Case
true       Default thinking    General reasoning
"low"      Light thinking      Simple problems
"medium"   Standard thinking   Moderate complexity
"high"     Deep reasoning      Complex problems

# Deep reasoning for complex problems
{:ok, response} = Ollixir.chat(client,
  model: "gpt-oss:20b-cloud",
  messages: [%{role: "user", content: "Prove that sqrt(2) is irrational."}],
  think: "high"
)

Compatible Models

Model                          Think Support   Levels
deepseek-r1                    Yes             Boolean only
deepseek-r1:1.5b               Yes             Boolean only
deepseek-v3.1:671b-cloud       Yes             Boolean only
gemini-3-flash-preview:cloud   Yes             Boolean only
gpt-oss:20b-cloud              Yes             low/medium/high
gpt-oss:120b-cloud             Yes             low/medium/high
kimi-k2-thinking:cloud         Yes             Boolean only
kimi-k2.5:cloud                Yes             Boolean only
minimax-m2.5:cloud             Yes             Boolean only
nemotron-3-nano:30b-cloud      Yes             Boolean only
qwen3                          Yes             Boolean only

Streaming with Thinking

Stream thinking and response chunks separately:

{:ok, stream} = Ollixir.chat(client,
  model: "deepseek-r1:1.5b",
  messages: [%{role: "user", content: "Explain recursion step by step."}],
  think: true,
  stream: true
)

stream
|> Stream.each(fn chunk ->
  if thinking = get_in(chunk, ["message", "thinking"]) do
    IO.write("[thinking] #{thinking}")
  end
  if content = get_in(chunk, ["message", "content"]) do
    IO.write(content)
  end
end)
|> Stream.run()

Thinking with Typed Responses

With response_format: :struct, the thinking text is exposed as a struct field rather than a map key:

{:ok, response} = Ollixir.chat(client,
  model: "deepseek-r1:1.5b",
  messages: [%{role: "user", content: "What is 15 * 23?"}],
  think: true,
  response_format: :struct
)

IO.puts("Thinking: #{response.message.thinking}")
IO.puts("Answer: #{response.message.content}")

Completion with Thinking

Thinking also works with the completion endpoint:

{:ok, response} = Ollixir.completion(client,
  model: "deepseek-r1:1.5b",
  prompt: "Calculate the factorial of 7, showing your work.",
  think: true
)

IO.puts("Thinking: #{response["thinking"]}")
IO.puts("Response: #{response["response"]}")

Error Handling

Not all models support thinking. Handle unsupported models gracefully:

case Ollixir.chat(client, model: model, messages: messages, think: true) do
  {:ok, response} ->
    response

  {:error, %Ollixir.ResponseError{status: 400}} ->
    # Model doesn't support thinking, retry without
    {:ok, response} = Ollixir.chat(client, model: model, messages: messages)
    response

  {:error, error} ->
    raise "Request failed: #{inspect(error)}"
end

Performance Considerations

  • Thinking increases response time and token usage
  • Use "low" for simple problems to reduce latency
  • Use "high" only when deep reasoning is needed
  • Consider disabling thinking for straightforward queries
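One way to act on these guidelines is to derive the think level from a rough complexity estimate. The helper below is a hypothetical sketch, not part of Ollixir; the module name, the 1-10 scale, and the thresholds are all illustrative assumptions.

```elixir
# Hypothetical helper: map an estimated complexity score (1..10)
# to a think level string. Thresholds are illustrative.
defmodule ThinkLevel do
  def for_complexity(score) when score <= 3, do: "low"
  def for_complexity(score) when score <= 7, do: "medium"
  def for_complexity(_score), do: "high"
end
```

Usage, assuming a model that accepts level strings (e.g. gpt-oss:20b-cloud):

think: ThinkLevel.for_complexity(8)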

Best Practices

  1. Match level to complexity - Don't use high thinking for simple questions
  2. Stream long responses - Thinking can produce lengthy output
  3. Handle unsupported models - Not all models support thinking
  4. Consider token costs - Thinking tokens count toward usage

See Also