Torus.Embeddings.Batcher (Torus v0.5.2)
Size/time‑bounded batcher for embedding generation.
Torus.Embeddings.Batcher is a long‑running GenServer that collects individual generate/2 calls, groups them into a single batch, and forwards the batch to the configured embedding_module.
Why batch?
- Fewer model / network invocations – one request with n terms is cheaper than n single‑term requests.
- Lower latency under load – callers wait only for the current batch to flush, not for an entire queue of independent requests.
- Higher throughput per API quota – most providers charge per request, so batched calls extract more value from the same quota.
Flush conditions
A batch is flushed when either condition is met (whichever comes first):
- the queue reaches max_batch_size terms, or
- max_batch_wait_ms elapses after the first term was queued.
Both limits are fully configurable.
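The two flush conditions can be illustrated with a minimal, self-contained sketch. This is not Torus's actual implementation: the module name SketchBatcher, the embed_fun option, and the internal state layout are hypothetical, and only the option names max_batch_size and max_batch_wait_ms are borrowed from the text above. It assumes embed_fun takes a list of terms and returns one result per term, in the same order.

```elixir
defmodule SketchBatcher do
  # Hypothetical illustration of a size/time-bounded batcher.
  # Not the Torus implementation; names and state layout are assumptions.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Blocks the caller until the batch containing `term` has been flushed.
  def generate(pid, term, timeout \\ 5_000) do
    GenServer.call(pid, {:generate, term}, timeout)
  end

  @impl true
  def init(opts) do
    {:ok,
     %{
       max_size: Keyword.get(opts, :max_batch_size, 10),
       wait_ms: Keyword.get(opts, :max_batch_wait_ms, 100),
       # Assumed shape: fn [term] -> [vector] end, order preserved.
       embed_fun: Keyword.fetch!(opts, :embed_fun),
       queue: [],
       flush_ref: nil
     }}
  end

  @impl true
  def handle_call({:generate, term}, from, state) do
    state = state |> ensure_timer() |> Map.update!(:queue, &[{from, term} | &1])

    # Size condition: flush as soon as the queue is full.
    if length(state.queue) >= state.max_size do
      {:noreply, flush(state)}
    else
      {:noreply, state}
    end
  end

  # Time condition: flush only if this timer belongs to the current batch.
  @impl true
  def handle_info({:flush, ref}, %{flush_ref: ref} = state), do: {:noreply, flush(state)}
  # A timer from a batch that was already flushed by size; ignore it.
  def handle_info({:flush, _stale_ref}, state), do: {:noreply, state}

  # The timer starts when the first term of a batch is queued.
  defp ensure_timer(%{flush_ref: nil} = state) do
    ref = make_ref()
    Process.send_after(self(), {:flush, ref}, state.wait_ms)
    %{state | flush_ref: ref}
  end

  defp ensure_timer(state), do: state

  defp flush(state) do
    {callers, terms} = state.queue |> Enum.reverse() |> Enum.unzip()
    vectors = state.embed_fun.(terms)

    # Hand each caller the result for its own term.
    callers
    |> Enum.zip(vectors)
    |> Enum.each(fn {from, vector} -> GenServer.reply(from, vector) end)

    %{state | queue: [], flush_ref: nil}
  end
end

# Stand-in for a real embedder: one fake "vector" per term.
embed_fun = fn terms -> Enum.map(terms, fn term -> [String.length(term)] end) end

{:ok, pid} =
  SketchBatcher.start_link(embed_fun: embed_fun, max_batch_size: 2, max_batch_wait_ms: 50)

# Three concurrent callers: two are flushed by size, the third by the timer.
tasks = for term <- ["a", "bb", "ccc"], do: Task.async(fn -> SketchBatcher.generate(pid, term) end)
results = Task.await_many(tasks)
```

For simplicity the sketch calls embed_fun inside the server process, so new terms wait in the mailbox while a batch is being embedded. A production batcher would probably offload that call to a separate process.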
Configuration
It is good practice to batch requests to the embedding module, especially in high-traffic applications.
To use it:
- Add the following to your config.exs:
    config :torus, batcher: Torus.Embeddings.Batcher

    config :torus, Torus.Embeddings.Batcher,
      max_batch_size: 10,
      default_batch_timeout: 100,
      embedding_module: Torus.Embeddings.HuggingFace
- Add it to your supervision tree:
    def start(_type, _args) do
      children = [
        # Your deps
        Torus.Embeddings.Batcher
      ]

      opts = [strategy: :one_for_one, name: YourApp.Supervisor]
      Supervisor.start_link(children, opts)
    end
- Configure your embedding_module of choice (see corresponding section).
You can then call the Torus.to_vector/1 and Torus.to_vectors/1 functions.
You can also pass the call_timeout option to Torus.to_vector/2 and Torus.to_vectors/2 to override the default timeout for the batching call. This is useful when you can afford to wait longer for the batch to flush and for your embedder to generate the embedding.
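As a rough sketch, the calls mentioned above might be wrapped like this. The module MyApp.Embeddings and its function names are hypothetical. The snippet assumes that Torus is a dependency, that the Batcher is configured and supervised as described, that options are passed as a keyword list in the second argument, and that call_timeout is given in milliseconds. None of these details are confirmed by this page beyond the function names and arities.

```elixir
defmodule MyApp.Embeddings do
  # Hypothetical wrapper module; requires the Torus dependency at runtime.

  # Single term, using the default timeout.
  def embed(term), do: Torus.to_vector(term)

  # Several terms in one call.
  def embed_many(terms), do: Torus.to_vectors(terms)

  # Assumption: options are a keyword list and call_timeout is in milliseconds.
  def embed_patiently(term), do: Torus.to_vector(term, call_timeout: 10_000)
end
```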
See Torus.semantic/5 for how to use this module to add semantic search to your application.
Functions
Returns a specification to start this module under a supervisor. See Supervisor.
Callback implementation for Torus.Embedding.embedding_model/1.
Callback implementation for Torus.Embedding.generate/2.
Callback implementation for GenServer.init/1.