# `LlamaCppEx.Server.Strategy.Balanced`
[🔗](https://github.com/nyo16/llama_cpp_ex/blob/main/lib/llama_cpp_ex/server/strategy/balanced.ex#L1)

Balanced batching strategy.

Splits the token budget equally between decode and prefill operations.
Each generating slot decodes exactly one token per step, so the decode
half is capped at the number of generating slots. The prefill half gets
the remainder of the budget.

This strategy is fair under mixed workloads where generation latency and
prefill throughput matter equally.
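
The split described above can be sketched as follows. This is a hypothetical illustration of the arithmetic, not the library's actual implementation; the module name `BalancedSketch` and function `split/2` are made up for the example.

```elixir
defmodule BalancedSketch do
  # Sketch of the balanced split: half the token budget goes to decode,
  # capped at the number of generating slots (each slot decodes one
  # token per step); prefill receives whatever is left over.
  def split(budget, generating_slots) do
    decode = min(div(budget, 2), generating_slots)
    prefill = budget - decode
    {decode, prefill}
  end
end

# With few generating slots, decode is capped and prefill gets the rest:
BalancedSketch.split(512, 8)
# With many generating slots, the budget splits evenly:
BalancedSketch.split(512, 300)
```

Note that when the cap applies, unused decode budget flows to prefill rather than going idle, which is what keeps the strategy balanced rather than strictly half-and-half.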

---

*Consult [api-reference.md](api-reference.md) for complete listing*
