LlamaCppEx.Server.Strategy.DecodeMaximal (LlamaCppEx v0.7.0)


Decode-maximal batching strategy.

Decode tokens (one per generating slot) are always added to the batch first. They represent active generation that users are waiting on, so they get priority. Whatever budget remains in the batch is then filled with prefill chunks.
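The ordering described above can be sketched roughly as follows. This is an illustration only, not the LlamaCppEx implementation: the module name `DecodeMaximalSketch`, the `build_batch/4` function, the tuple shapes, and the `budget` and `chunk_size` parameters are all assumptions made for the example.

```elixir
defmodule DecodeMaximalSketch do
  # Hypothetical sketch; names and data shapes are NOT taken from LlamaCppEx.
  #
  # decode_slots  - ids of slots that are currently generating (one token each)
  # prefill_queue - list of {slot_id, pending_prompt_tokens}
  # budget        - assumed maximum number of tokens in one batch
  # chunk_size    - assumed maximum prefill tokens taken from one slot per batch
  def build_batch(decode_slots, prefill_queue, budget, chunk_size) do
    # Step 1: decode tokens go in first, one per generating slot.
    decode =
      decode_slots
      |> Enum.take(budget)
      |> Enum.map(fn slot -> {:decode, slot, 1} end)

    remaining = budget - length(decode)

    # Step 2: spend whatever budget is left on prefill chunks, in queue order.
    {prefill, _left} =
      Enum.reduce(prefill_queue, {[], remaining}, fn {slot, pending}, {acc, left} ->
        n = Enum.min([pending, chunk_size, left])

        if n > 0 do
          {[{:prefill, slot, n} | acc], left - n}
        else
          {acc, left}
        end
      end)

    decode ++ Enum.reverse(prefill)
  end
end
```

With two generating slots, a budget of 8, and a chunk size of 4, the two decode tokens are placed first and the remaining 6 tokens are split across the waiting prompts as chunks of 4 and 2. Under this assumed model, a long prompt cannot delay slots that are already generating; it only slows its own prefill.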

This is the default strategy and is best suited to interactive use, where low generation latency matters most.