Torus.Embeddings.Batcher (Torus v0.5.3)

Size/time‑bounded batcher for embedding generation.

Torus.Embeddings.Batcher is a long‑running GenServer that collects individual generate/2 calls, groups them into a single batch, and forwards the batch to the configured embedding_module.

Why batch?

Fewer model / network invocations – one request with n terms is cheaper than n single‑term requests.
Lower latency under load – callers wait only for the current batch to flush, not for an entire queue of independent requests.
Higher throughput per API quota – most providers charge per request, so batched calls extract more value from the same quota.

Flush conditions

A batch is flushed when either condition is met (whichever comes first):

the queue reaches max_batch_size terms, or
max_batch_wait_ms elapses after the first term was queued.

Both limits are configurable (see Configuration).
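The two flush conditions can be sketched with a small, self-contained GenServer. This is an illustration under stated assumptions, not Torus's actual implementation: the module name SketchBatcher, the embed_fun option, and the max_wait_ms option are all hypothetical and chosen only for this example.

```elixir
defmodule SketchBatcher do
  # Illustrative only: NOT Torus's implementation. All names here are hypothetical.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Blocks the caller until the batch containing `term` has been flushed.
  def generate(pid, term, timeout \\ 5_000) do
    GenServer.call(pid, {:generate, term}, timeout)
  end

  @impl true
  def init(opts) do
    {:ok,
     %{
       embed_fun: Keyword.fetch!(opts, :embed_fun),
       max_batch_size: Keyword.get(opts, :max_batch_size, 10),
       max_wait_ms: Keyword.get(opts, :max_wait_ms, 100),
       queue: [],
       batch_ref: nil
     }}
  end

  @impl true
  def handle_call({:generate, term}, from, state) do
    # The first term of a new batch starts the wait timer.
    state = if state.queue == [], do: start_timer(state), else: state
    state = %{state | queue: [{from, term} | state.queue]}

    # Condition 1: the queue reached max_batch_size.
    if length(state.queue) >= state.max_batch_size do
      {:noreply, flush(state)}
    else
      {:noreply, state}
    end
  end

  # Condition 2: the timer of the current batch fired.
  @impl true
  def handle_info({:flush, ref}, %{batch_ref: ref} = state), do: {:noreply, flush(state)}
  # A timer that belongs to an already flushed batch is ignored.
  def handle_info({:flush, _stale_ref}, state), do: {:noreply, state}

  defp start_timer(state) do
    ref = make_ref()
    Process.send_after(self(), {:flush, ref}, state.max_wait_ms)
    %{state | batch_ref: ref}
  end

  defp flush(state) do
    {callers, terms} = state.queue |> Enum.reverse() |> Enum.unzip()
    vectors = state.embed_fun.(terms)
    # Hand each caller the vector that corresponds to its own term.
    callers |> Enum.zip(vectors) |> Enum.each(fn {from, vec} -> GenServer.reply(from, vec) end)
    %{state | queue: [], batch_ref: nil}
  end
end

# Stand-in embedder: one fake single-number "vector" per term.
embed_fun = fn terms -> Enum.map(terms, &[String.length(&1)]) end
{:ok, pid} = SketchBatcher.start_link(embed_fun: embed_fun, max_batch_size: 3, max_wait_ms: 50)

# A lone call never fills the batch, so it is answered by the timer after about 50 ms.
vector = SketchBatcher.generate(pid, "hello")
```

In this sketch each batch gets its own timer reference, so a timer left over from a batch that was already flushed by size cannot trigger an early flush of the next batch.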

Configuration

It is good practice to batch requests to the embedding module, especially in high-traffic applications.

To use it:

Add the following to your config.exs:

config :torus, batcher: Torus.Embeddings.Batcher

config :torus, Torus.Embeddings.Batcher,
  max_batch_size: 10,
  default_batch_timeout: 100,
  embedding_module: Torus.Embeddings.HuggingFace

Add it to your supervision tree:

def start(_type, _args) do
  children = [
    # Your other supervised processes
    Torus.Embeddings.Batcher
  ]

  opts = [strategy: :one_for_one, name: YourApp.Supervisor]
  Supervisor.start_link(children, opts)
end

Configure your embedding_module of choice (see the corresponding section).

You can then call Torus.to_vector/1 and Torus.to_vectors/1.

You can also pass the call_timeout option to Torus.to_vector/2 and Torus.to_vectors/2 to override the default timeout for the batching call. This is useful when you are willing to wait longer for the batch to flush and for the embedder to generate the embedding.
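To see why a longer timeout can matter, here is a small sketch of how a caller-side timeout behaves with a slow GenServer. It assumes that call_timeout ends up as the timeout of a GenServer.call, which is an assumption on my part. SlowEmbedder is a hypothetical stand-in for the batcher plus embedder and is not part of Torus.

```elixir
defmodule SlowEmbedder do
  # Hypothetical stand-in for a batcher + embedder that needs about 200 ms to answer.
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, :ok)

  # `timeout` plays the role of call_timeout: how long the caller is willing to wait.
  def embed(pid, term, timeout) do
    GenServer.call(pid, {:embed, term}, timeout)
  catch
    # GenServer.call/3 exits the caller when the timeout elapses.
    :exit, {:timeout, _} -> {:error, :timeout}
  end

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call({:embed, term}, _from, state) do
    # Simulate batch wait plus embedding time.
    Process.sleep(200)
    {:reply, {:ok, [String.length(term) * 1.0]}, state}
  end
end

{:ok, pid} = SlowEmbedder.start_link()

# 50 ms is shorter than the 200 ms the server needs, so the caller gives up.
short = SlowEmbedder.embed(pid, "hello", 50)

# 2 s leaves enough room, so the caller receives the result.
long = SlowEmbedder.embed(pid, "hello", 2_000)
```

With Torus itself, the equivalent call would presumably look like Torus.to_vector("hello", call_timeout: 2_000), assuming the option is passed as a keyword list.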

See Torus.semantic/5 for how to use this module to add semantic search to your application.