LlamaCppEx.Server.Strategy.PrefillPriority (LlamaCppEx v0.7.0)


Prefill-priority batching strategy.

Prefill chunks are added to the batch first; decode tokens then fill whatever budget remains. This prioritizes getting new requests through prefill quickly, which suits batch-processing workloads where overall throughput matters more than per-request generation latency. The trade-off is that requests already generating may receive fewer decode tokens, or none, in a batch that is dominated by prefill.
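The ordering described above can be sketched as a small budget-allocation function. This is only an illustration of the idea, not LlamaCppEx's implementation: the module name `PrefillPrioritySketch`, the `plan/4` function, its argument shapes, and the `chunk_size` cap are all hypothetical and chosen for this example.

```elixir
defmodule PrefillPrioritySketch do
  @moduledoc false
  # Hypothetical sketch: none of these names come from LlamaCppEx.
  #
  # prefill_queue - list of {slot_id, pending_prompt_tokens}
  # decode_slots  - list of slot ids that each want one decode token
  # budget        - maximum number of tokens allowed in one batch
  # chunk_size    - assumed cap on prefill tokens per slot per batch

  def plan(prefill_queue, decode_slots, budget, chunk_size) do
    # Step 1: give prefill first claim on the budget, one chunk per slot.
    {prefill, remaining} =
      Enum.reduce(prefill_queue, {[], budget}, fn {slot, pending}, {acc, left} ->
        take = pending |> min(chunk_size) |> min(left)

        if take > 0 do
          {[{slot, take} | acc], left - take}
        else
          {acc, left}
        end
      end)

    # Step 2: decode slots get one token each from whatever is left.
    decode = Enum.take(decode_slots, remaining)

    %{prefill: Enum.reverse(prefill), decode: decode}
  end
end
```

With a budget of 8 and a chunk size of 6, `PrefillPrioritySketch.plan([{:a, 10}, {:b, 3}], [:c, :d, :e], 8, 6)` assigns 6 tokens to `:a` and 2 to `:b`, leaving no room for decode in that batch. With a lighter prefill load, `PrefillPrioritySketch.plan([{:a, 4}], [:c, :d, :e], 6, 6)` assigns 4 tokens to `:a` and lets `:c` and `:d` decode, while `:e` waits for the next batch.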