Tinkex.TrainingClient.DataProcessor (Tinkex v0.3.4)
Data chunking, numbering, and tensor operations for TrainingClient.
This module handles:
- Chunking training data based on size limits
- Estimating chunk sizes using byte heuristics
- Building placeholder gradients for custom loss
- Extracting target tokens from loss function inputs
Summary
Functions
Allocate sequential request IDs for a batch of requests.
Build placeholder gradients (zeros) for custom loss computation.
Chunk data into manageable pieces based on size and byte limits.
Extract target_tokens tensor from a datum's loss_fn_inputs.
Functions
@spec allocate_request_ids(non_neg_integer(), pos_integer()) :: {[pos_integer()], pos_integer()}
Allocate sequential request IDs for a batch of requests.
Returns {[id1, id2, ...], new_counter} where the IDs are consecutive
starting from the current counter.
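The allocation described above can be sketched as follows. This is a minimal illustration, not the library's implementation; the module name `AllocateExample` is hypothetical, and it assumes IDs begin exactly at the current counter value as the doc states:

```elixir
defmodule AllocateExample do
  # Hypothetical sketch: hand out `count` consecutive IDs starting at
  # `counter`, and return the advanced counter for the next batch.
  @spec allocate_request_ids(non_neg_integer(), pos_integer()) ::
          {[pos_integer()], pos_integer()}
  def allocate_request_ids(counter, count) do
    ids = Enum.to_list(counter..(counter + count - 1))
    {ids, counter + count}
  end
end

{ids, next_counter} = AllocateExample.allocate_request_ids(5, 3)
# ids          => [5, 6, 7]
# next_counter => 8
```

Threading the returned counter back into the next call keeps IDs unique and ordered across batches without shared mutable state.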
@spec build_placeholder_gradients([Tinkex.Types.Datum.t()]) :: {:ok, [Nx.Tensor.t()]} | {:error, Tinkex.Error.t()}
Build placeholder gradients (zeros) for custom loss computation.
Creates zero-filled tensors matching the shape of target_tokens for each datum. These are used as placeholder gradients before the actual loss computation.
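A sketch of the zero-gradient construction, assuming (for illustration only) that a datum is a map with a `:loss_fn_inputs` key holding `target_tokens` as an `Nx.Tensor`; the real `Tinkex.Types.Datum.t()` struct and error type are not reproduced here:

```elixir
defmodule GradientsExample do
  # Hypothetical sketch: one zero tensor per datum, shaped like that
  # datum's target_tokens tensor.
  def build_placeholder_gradients(data) do
    grads =
      Enum.map(data, fn datum ->
        target = datum.loss_fn_inputs["target_tokens"]
        Nx.broadcast(0.0, Nx.shape(target))
      end)

    {:ok, grads}
  end
end

datum = %{loss_fn_inputs: %{"target_tokens" => Nx.tensor([1, 2, 3])}}
{:ok, [grad]} = GradientsExample.build_placeholder_gradients([datum])
# Nx.shape(grad) => {3}
```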
Chunk data into manageable pieces based on size and byte limits.
Ensures no chunk exceeds:
- 1024 items
- 5,000,000 total estimated bytes
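One way to honor both limits is a single pass that closes the current chunk whenever adding the next item would exceed either bound. This sketch is an assumption about the approach, not the module's code; `estimate_bytes` stands in for the byte heuristic, passed as a function:

```elixir
defmodule ChunkExample do
  @max_items 1024
  @max_bytes 5_000_000

  # Hypothetical sketch: greedy chunking under item-count and byte limits.
  def chunk_data(data, estimate_bytes) do
    chunk_fun = fn item, {chunk, n, bytes} ->
      size = estimate_bytes.(item)

      if n + 1 > @max_items or (bytes + size > @max_bytes and chunk != []) do
        # Close the current chunk and start a new one with this item.
        {:cont, Enum.reverse(chunk), {[item], 1, size}}
      else
        {:cont, {[item | chunk], n + 1, bytes + size}}
      end
    end

    after_fun = fn
      {[], _, _} -> {:cont, []}
      {chunk, _, _} -> {:cont, Enum.reverse(chunk), []}
    end

    Enum.chunk_while(data, {[], 0, 0}, chunk_fun, after_fun)
  end
end

chunks = ChunkExample.chunk_data(Enum.to_list(1..3000), fn _ -> 10 end)
# Enum.map(chunks, &length/1) => [1024, 1024, 952]
```

Note the `chunk != []` guard: an item whose estimated size alone exceeds the byte limit still goes into a chunk by itself rather than being dropped.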
@spec fetch_target_tokens_tensor(Tinkex.Types.Datum.t()) :: {:ok, Nx.Tensor.t()} | {:error, Tinkex.Error.t()}
Extract target_tokens tensor from a datum's loss_fn_inputs.
Supports both TensorData and Nx.Tensor formats.
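A hedged sketch of the two-format extraction. The shape of the `TensorData` variant here (a map with `:data` and `:shape`) and the error value are illustrative assumptions; the actual `Tinkex.Types.TensorData` and `Tinkex.Error` structures may differ:

```elixir
defmodule FetchExample do
  # Hypothetical sketch: return target_tokens as an Nx.Tensor, whether it
  # was stored as a raw tensor or as a plain data/shape container.
  def fetch_target_tokens_tensor(%{loss_fn_inputs: inputs}) do
    case Map.fetch(inputs, "target_tokens") do
      {:ok, %Nx.Tensor{} = tensor} ->
        {:ok, tensor}

      {:ok, %{data: data, shape: shape}} ->
        # Assumed TensorData-like container: rebuild the tensor.
        {:ok, data |> Nx.tensor() |> Nx.reshape(List.to_tuple(shape))}

      :error ->
        # Placeholder error value; the real module returns Tinkex.Error.t().
        {:error, %{reason: :missing_target_tokens}}
    end
  end
end

datum = %{loss_fn_inputs: %{"target_tokens" => %{data: [1, 2, 3, 4], shape: [2, 2]}}}
{:ok, tensor} = FetchExample.fetch_target_tokens_tensor(datum)
# Nx.shape(tensor) => {2, 2}
```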