Parser for <think>...</think> blocks in thinking model output.
Thinking models (e.g. Qwen 3.5 with enable_thinking: true) wrap their
chain-of-thought reasoning in <think>...</think> tags. This module provides
both a one-shot parser for complete text and a streaming parser that
handles tags split across token boundaries.
Summary
Functions
feed(parser, chunk) - Feeds a text chunk to the streaming parser.
parse(text) - Splits completed text into {reasoning_content, content}.
stream_parser(opts \\ []) - Creates a new streaming parser state.
Functions
feed(parser, chunk)

Feeds a text chunk to the streaming parser.

Returns {events, new_parser}, where each event is a {:thinking, text} or
{:content, text} tuple.
The parser buffers partial <think> and </think> tags so that a tag split
across token boundaries is never emitted as ordinary text.
Examples
parser = LlamaCppEx.Thinking.stream_parser()
{events, parser} = LlamaCppEx.Thinking.feed(parser, "<think>")
# events = [] (tag consumed)
{events, parser} = LlamaCppEx.Thinking.feed(parser, "reasoning")
# events = [{:thinking, "reasoning"}]
{events, _parser} = LlamaCppEx.Thinking.feed(parser, "</think>answer")
# events = [{:content, "answer"}]
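The buffering behavior described above can be sketched as follows. This is a minimal, self-contained illustration, not the library's actual implementation; the module `ThinkStreamSketch` and all of its internals are hypothetical:

```elixir
defmodule ThinkStreamSketch do
  # Sketch only: illustrates buffering of partial <think>/</think> tags.
  @open "<think>"
  @close "</think>"

  def new(opts \\ []) do
    mode = if Keyword.get(opts, :thinking, false), do: :thinking, else: :content
    %{mode: mode, buffer: ""}
  end

  def feed(%{mode: mode, buffer: buffer}, chunk) do
    scan(buffer <> chunk, mode, [])
  end

  defp scan(text, mode, events) do
    tag = if mode == :content, do: @open, else: @close

    case String.split(text, tag, parts: 2) do
      [before, rest] ->
        # Full tag found: emit what came before it and flip modes.
        scan(rest, flip(mode), emit(events, mode, before))

      [_] ->
        # No full tag: hold back any suffix that could be a partial tag.
        {out, held} = split_partial(text, tag)
        {Enum.reverse(emit(events, mode, out)), %{mode: mode, buffer: held}}
    end
  end

  defp flip(:content), do: :thinking
  defp flip(:thinking), do: :content

  defp emit(events, _mode, ""), do: events
  defp emit(events, mode, text), do: [{mode, text} | events]

  # The longest suffix of `text` that is a proper prefix of `tag` stays
  # buffered until the next chunk decides whether it completes the tag.
  defp split_partial(text, tag) do
    max = min(String.length(text), String.length(tag) - 1)

    held =
      Enum.find(max..1//-1, 0, fn n ->
        String.starts_with?(tag, String.slice(text, -n, n))
      end)

    {String.slice(text, 0, String.length(text) - held),
     String.slice(text, String.length(text) - held, held)}
  end
end
```

With this sketch, feeding "<thi" yields no events (the partial tag is buffered), and feeding "nk>reasoning" next yields [{:thinking, "reasoning"}].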
parse(text)

Splits completed text into {reasoning_content, content}.

Handles both an explicit <think>...</think> wrapper and the common case where
the chat template has already opened the <think> block, so the generated text
starts directly with reasoning followed by </think>.
Examples
iex> LlamaCppEx.Thinking.parse("<think>I need to think</think>The answer is 42")
{"I need to think", "The answer is 42"}
iex> LlamaCppEx.Thinking.parse("reasoning here\n</think>\nThe answer is 42")
{"reasoning here", "The answer is 42"}
iex> LlamaCppEx.Thinking.parse("Just a response")
{"", "Just a response"}
iex> LlamaCppEx.Thinking.parse("<think>reasoning only</think>")
{"reasoning only", ""}
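The three cases above (full wrapper, pre-opened block, no tags) can be sketched with a short self-contained module. `ThinkParseSketch` is hypothetical and stands in for the library's one-shot parser:

```elixir
defmodule ThinkParseSketch do
  # Sketch only: one-shot parsing of <think>...</think> output.

  # Case 1: explicit <think> wrapper at the start of the text.
  def parse("<think>" <> rest), do: split_close(rest)

  def parse(text) do
    if String.contains?(text, "</think>") do
      # Case 2: the template already opened the block, so the text
      # starts directly with reasoning followed by </think>.
      split_close(text)
    else
      # Case 3: no thinking tags at all.
      {"", text}
    end
  end

  defp split_close(text) do
    case String.split(text, "</think>", parts: 2) do
      [reasoning, content] -> {String.trim(reasoning), String.trim_leading(content)}
      [reasoning] -> {String.trim(reasoning), ""}
    end
  end
end
```

The trimming mirrors the doctests above, where surrounding newlines around </think> do not end up in either tuple element.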
stream_parser(opts \\ [])

Creates a new streaming parser state.

Use with feed/2 to incrementally parse streamed tokens.
Options
:thinking - When true, assumes the template already opened a <think> block, so generated text starts in thinking mode. Defaults to false.
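How the option could select the initial mode, as a tiny sketch (the module name and state shape are hypothetical, not the library's internals):

```elixir
defmodule ThinkingOptionSketch do
  # With thinking: true the parser starts in :thinking mode, because the
  # chat template has already emitted the opening <think> tag.
  def stream_parser(opts \\ []) do
    initial_mode = if Keyword.get(opts, :thinking, false), do: :thinking, else: :content
    %{mode: initial_mode, buffer: ""}
  end
end
```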