Tinkex.SamplingClient (Tinkex v0.2.0)
Sampling client that performs lock-free reads via ETS.
Initialization runs in a GenServer, which creates the sampling session and
registers its state in Tinkex.SamplingRegistry. Once initialized, sample/4
reads configuration directly from ETS without calling the GenServer, so
concurrent requests are not serialized through that single process under
high load.
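The read path described above can be illustrated with a minimal, self-contained sketch. It is not Tinkex's implementation: the module `ConfigStoreSketch`, the table name `:sampling_config_sketch`, and the `fetch/1` function are hypothetical. It assumes one GenServer owns an ETS table, writes configuration once in `init/1`, and lets any process read it with `:ets.lookup/2` without sending a message to the GenServer.

```elixir
defmodule ConfigStoreSketch do
  # Hypothetical illustration of "init in a GenServer, read from ETS".
  use GenServer

  @table :sampling_config_sketch

  def start_link(config) when is_map(config) do
    GenServer.start_link(__MODULE__, config, name: __MODULE__)
  end

  # Runs in the caller's process: a plain ETS lookup, no GenServer.call,
  # so many concurrent readers never queue on the GenServer's mailbox.
  def fetch(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end

  @impl true
  def init(config) do
    # :protected lets any process read but only the owner write;
    # read_concurrency: true tunes the table for many parallel readers.
    table = :ets.new(@table, [:named_table, :protected, :set, read_concurrency: true])
    :ets.insert(table, Map.to_list(config))
    {:ok, table}
  end
end
```

Because the GenServer owns the table, the table is deleted if that process exits, which is one reason to keep ownership in a supervised process while keeping it off the hot read path.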
For plain-text prompts, build a Tinkex.Types.ModelInput via
Tinkex.Types.ModelInput.from_text/2 with the target model name. Chat
templates are not applied automatically.
Queue State Observer
This client implements Tinkex.QueueStateObserver and automatically logs
human-readable warnings when queue state changes indicate rate limiting
or capacity issues:
[warning] Sampling is paused for session-123. Reason: concurrent LoRA rate limit hit
Logs are debounced to once per 60 seconds per session to avoid spam.
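The per-session debounce rule can be sketched as a pure function. This is a hypothetical helper (`DebounceSketch.should_log?/4`), not the client's actual code; it assumes the caller keeps a map of session ID to the last log time in milliseconds and supplies the current time, for example from `System.monotonic_time(:millisecond)`.

```elixir
defmodule DebounceSketch do
  # Hypothetical helper: allow at most one warning per session per window.
  @interval_ms 60_000

  # Returns {log?, updated_last_logged_map}.
  def should_log?(last_logged, session_id, now_ms, interval_ms \\ @interval_ms) do
    case Map.fetch(last_logged, session_id) do
      # Logged recently for this session: suppress and keep the old timestamp.
      {:ok, last} when now_ms - last < interval_ms ->
        {false, last_logged}

      # Never logged, or the window has elapsed: log and record the time.
      _ ->
        {true, Map.put(last_logged, session_id, now_ms)}
    end
  end
end
```

Keeping the timestamp only when a log is emitted means a steady stream of state changes produces one warning per window rather than resetting the window on every event.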
Summary
Functions
child_spec/1
Returns a specification to start this module under a supervisor.
compute_logprobs/3
Convenience helper to compute prompt token log probabilities.
create_async/2
Create a sampling client asynchronously.
sample/4
Submit a sampling request.
Types
@type t() :: pid()
Functions
child_spec/1
Returns a specification to start this module under a supervisor.
See Supervisor.
compute_logprobs/3
@spec compute_logprobs(t(), map(), keyword()) :: {:ok, Task.t()} | {:error, Tinkex.Error.t()}
Convenience helper to compute prompt token log probabilities.
Returns {:ok, task}, where task yields {:ok, [float() | nil]} or
{:error, %Tinkex.Error{}}; otherwise returns {:error, %Tinkex.Error{}} directly.
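Because the result has two levels (an outer `{:ok, task}` or `{:error, _}`, then the task's own result), a small unwrapping helper can keep call sites flat. The sketch below is hypothetical (`LogprobsSketch` and both functions are illustrative names), and the 30-second timeout is an arbitrary assumption. It only relies on the return shapes stated in the spec above.

```elixir
defmodule LogprobsSketch do
  # Hypothetical helper for the documented shape:
  #   {:ok, task} | {:error, error}, where task yields {:ok, logprobs} | {:error, error}
  def await_result(result, timeout \\ 30_000)

  # Outer success: wait for the task and return whatever it yields.
  def await_result({:ok, %Task{} = task}, timeout), do: Task.await(task, timeout)

  # Outer failure: pass the error through unchanged.
  def await_result({:error, _reason} = error, _timeout), do: error

  # Sums the log probabilities, skipping nil entries (the spec allows nil).
  def total_logprob(logprobs) when is_list(logprobs) do
    logprobs |> Enum.reject(&is_nil/1) |> Enum.sum()
  end
end
```

In real use the first argument would be the return value of `compute_logprobs/3`; a task that does not finish within the timeout makes `Task.await/2` exit.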
create_async/2
Create a sampling client asynchronously.
This convenience function delegates to ServiceClient.create_sampling_client_async/2.
Examples
alias Tinkex.SamplingClient
task = SamplingClient.create_async(service_pid, base_model: "meta-llama/Llama-3.2-1B")
{:ok, sampling_pid} = Task.await(task)
sample/4
Submit a sampling request.
Returns a Task.t() that yields {:ok, %SampleResponse{}} or {:error, %Tinkex.Error{}}.
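Since each request is returned as a task, several requests can be submitted first and awaited together. The helper below is a hypothetical sketch (`SampleFanOutSketch.await_all/2` is not part of Tinkex); it assumes only that each task yields `{:ok, response}` or `{:error, error}` as documented above, and the 60-second timeout is an arbitrary choice.

```elixir
defmodule SampleFanOutSketch do
  # Hypothetical helper: await many sampling tasks and split the results.
  def await_all(tasks, timeout \\ 60_000) when is_list(tasks) do
    # Waits for all tasks; results come back in the same order as `tasks`.
    results = Task.await_many(tasks, timeout)

    # Comprehensions skip elements that do not match the generator pattern.
    successes = for {:ok, resp} <- results, do: resp
    failures = for {:error, err} <- results, do: err
    {successes, failures}
  end
end
```

In practice the task list would come from several `sample/4` calls; `Task.await_many/2` exits if any task exceeds the timeout, so callers that need partial results may prefer `Task.yield_many/2`.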
start_link/1
@spec start_link(keyword()) :: GenServer.on_start()