LlamaCppEx.Thinking (LlamaCppEx v0.7.0)

Parser for <think>...</think> blocks in thinking model output.

Thinking models (e.g. Qwen 3.5 with enable_thinking: true) wrap their chain-of-thought reasoning in <think>...</think> tags. This module provides a one-shot parser for complete text and a streaming parser that handles tags split across token boundaries.

Summary

Functions

feed(parser, text)

Feeds a text chunk to the streaming parser.

parse(text)

Splits completed text into {reasoning_content, content}.

stream_parser(opts \\ [])

Creates a new streaming parser state.

Functions

feed(parser, text)

@spec feed(map(), String.t()) :: {[{:thinking | :content, String.t()}], map()}

Feeds a text chunk to the streaming parser.

Returns {events, new_parser} where events are {:thinking, text} or {:content, text} tuples.

The parser buffers partial <think> and </think> tags, so a tag that arrives split across two or more chunks is still recognized rather than emitted as text.

Examples

parser = LlamaCppEx.Thinking.stream_parser()
{events, parser} = LlamaCppEx.Thinking.feed(parser, "<think>")
# events = []  (tag consumed)
{events, parser} = LlamaCppEx.Thinking.feed(parser, "reasoning")
# events = [{:thinking, "reasoning"}]
{events, _parser} = LlamaCppEx.Thinking.feed(parser, "</think>answer")
# events = [{:content, "answer"}]
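
The buffering described above can be illustrated with a small standalone sketch. This is not the library's implementation: the module name ThinkStreamSketch and its internals are hypothetical, and the only assumption carried over from the text is the event shape, {:thinking, text} and {:content, text}. The idea is to hold back any trailing bytes that could still turn out to be the start of the tag being looked for, and to emit everything before them.

```elixir
defmodule ThinkStreamSketch do
  # Hypothetical illustration only; not LlamaCppEx.Thinking's actual code.
  @open "<think>"
  @close "</think>"

  # The state is a plain map: the current mode plus any held-back bytes.
  def new(opts \\ []) do
    mode = if Keyword.get(opts, :thinking, false), do: :thinking, else: :content
    %{mode: mode, buffer: ""}
  end

  # Returns {events, new_state}, mirroring the shape documented for feed/2.
  def feed(%{mode: mode, buffer: buffer}, chunk) do
    scan(mode, buffer <> chunk, [])
  end

  defp scan(mode, data, events) do
    # In thinking mode we wait for the closing tag, otherwise for the opening tag.
    tag = if mode == :thinking, do: @close, else: @open

    case String.split(data, tag, parts: 2) do
      [before, rest] ->
        # A full tag was found: emit the text before it and switch modes.
        scan(toggle(mode), rest, emit(events, mode, before))

      [_no_tag] ->
        # No full tag: keep back a trailing partial tag, emit the rest.
        held = held_suffix(data, tag)
        ready = binary_part(data, 0, byte_size(data) - byte_size(held))
        {Enum.reverse(emit(events, mode, ready)), %{mode: mode, buffer: held}}
    end
  end

  # Longest suffix of data that is a proper prefix of tag ("" if none).
  defp held_suffix(data, tag) do
    max = min(byte_size(tag) - 1, byte_size(data))

    Enum.find_value(max..1//-1, "", fn n ->
      suffix = binary_part(data, byte_size(data) - n, n)
      if String.starts_with?(tag, suffix), do: suffix
    end)
  end

  defp emit(events, _mode, ""), do: events
  defp emit(events, mode, text), do: [{mode, text} | events]

  defp toggle(:thinking), do: :content
  defp toggle(:content), do: :thinking
end

# Both tags arrive split across chunk boundaries.
state = ThinkStreamSketch.new()
{e1, state} = ThinkStreamSketch.feed(state, "<thi")
{e2, state} = ThinkStreamSketch.feed(state, "nk>rea")
{e3, state} = ThinkStreamSketch.feed(state, "son</th")
{e4, _state} = ThinkStreamSketch.feed(state, "ink>answer")
IO.inspect({e1, e2, e3, e4})
```

In this sketch a held-back fragment that turns out not to be a tag (for example the "<" in "a < b") is released as ordinary text on the next call. Whatever is still buffered when the stream ends is never emitted here; a real parser would need some way to flush it.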

parse(text)

@spec parse(String.t()) :: {String.t(), String.t()}

Splits completed text into {reasoning_content, content}.

Handles both explicit <think>...</think> wrapping and the common case where the chat template already opened the <think> block (so generated text starts directly with reasoning followed by </think>).

Examples

iex> LlamaCppEx.Thinking.parse("<think>I need to think</think>The answer is 42")
{"I need to think", "The answer is 42"}

iex> LlamaCppEx.Thinking.parse("reasoning here\n</think>\nThe answer is 42")
{"reasoning here", "The answer is 42"}

iex> LlamaCppEx.Thinking.parse("Just a response")
{"", "Just a response"}

iex> LlamaCppEx.Thinking.parse("<think>reasoning only</think>")
{"reasoning only", ""}
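
To make the two cases above concrete, here is a minimal standalone sketch of how such a split could be done. It is not the library's implementation: the module name ThinkSplitSketch is hypothetical, and the whitespace trimming is an assumption inferred from the examples above. Splitting once on </think> covers both cases, since the leading <think> is simply stripped when present.

```elixir
defmodule ThinkSplitSketch do
  # Hypothetical illustration only; not LlamaCppEx.Thinking.parse/1 itself.
  def split(text) do
    case String.split(text, "</think>", parts: 2) do
      [reasoning, content] ->
        # Drop an explicit opening tag if the model emitted one; when the
        # chat template already opened the block there is nothing to strip.
        reasoning =
          reasoning
          |> String.trim()
          |> String.replace_prefix("<think>", "")
          |> String.trim()

        {reasoning, String.trim(content)}

      [content] ->
        # No closing tag at all: treat everything as regular content.
        {"", content}
    end
  end
end

IO.inspect(ThinkSplitSketch.split("reasoning here\n</think>\nThe answer is 42"))
```

The sketch treats text without a closing tag as plain content, which matches the "Just a response" example; how the real parser treats an unterminated <think> block is not stated here.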

stream_parser(opts \\ [])

@spec stream_parser(keyword()) :: map()

Creates a new streaming parser state.

Use with feed/2 to incrementally parse streamed tokens.

Options

  • :thinking - When true, assumes the template already opened a <think> block, so generated text starts in thinking mode. Defaults to false.
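
A common use of the streaming parser is to thread its state through a list of chunks and accumulate reasoning and content separately. The sketch below shows one way to do that, relying only on the documented {events, new_parser} return shape. The module StreamCollectSketch and its collect/3 function are hypothetical helpers, not part of LlamaCppEx; the feed function is passed in as an argument so the helper can be tried with a stub.

```elixir
defmodule StreamCollectSketch do
  # Hypothetical helper. feed_fun is expected to behave like
  # LlamaCppEx.Thinking.feed/2: (parser, chunk) -> {events, new_parser}.
  def collect(chunks, parser, feed_fun) do
    # flat_map_reduce threads the parser state and concatenates the events.
    {events, _parser} =
      Enum.flat_map_reduce(chunks, parser, fn chunk, acc ->
        feed_fun.(acc, chunk)
      end)

    # Fold the events into {reasoning, content} strings.
    Enum.reduce(events, {"", ""}, fn
      {:thinking, text}, {reasoning, content} -> {reasoning <> text, content}
      {:content, text}, {reasoning, content} -> {reasoning, content <> text}
    end)
  end
end

# Intended use with the library (requires LlamaCppEx as a dependency),
# for a template that has already opened the <think> block:
#
#   parser = LlamaCppEx.Thinking.stream_parser(thinking: true)
#   StreamCollectSketch.collect(chunks, parser, &LlamaCppEx.Thinking.feed/2)
```

With thinking: true, the first chunks are expected to arrive as {:thinking, text} events until a </think> tag is seen, per the option description above.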