Nasty.Statistics.Neural.Transformers.Inference (Nasty v0.3.0)
Optimized inference for transformer models.
Provides optimizations including:
- Batch processing for multiple documents
- Model quantization for faster inference
- EXLA compilation for GPU acceleration
- Prediction caching for repeated inputs
Summary
Functions
Performs batch prediction on multiple document sequences.
Gets cache statistics.
Clears the prediction cache.
Optimizes a model for inference.
Predicts labels for a single sequence using optimized model.
Types
@type optimization() :: :quantize | :compile | :gpu | :cache
@type optimized_model() :: %{
  classifier: map(),
  optimizations: [optimization()],
  cache: :ets.tid() | nil,
  compiled_serving: pid() | nil
}
Functions
@spec batch_predict(optimized_model(), [[Nasty.AST.Token.t()]], keyword()) :: {:ok, [[map()]]} | {:error, term()}
Performs batch prediction on multiple document sequences.
More efficient than individual predictions for processing many documents.
Examples
{:ok, all_predictions} = Inference.batch_predict(
optimized_model,
[doc1_tokens, doc2_tokens, doc3_tokens]
)
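The batch call is result-equivalent to mapping predict/3 over the documents one at a time, but performs the work in a single pass over the batch. A minimal sketch of the two approaches, assuming doc1_tokens, doc2_tokens, and doc3_tokens are token lists produced earlier:

```elixir
docs = [doc1_tokens, doc2_tokens, doc3_tokens]

# One batched pass (preferred when processing many documents):
{:ok, all_predictions} = Inference.batch_predict(optimized_model, docs)

# Result-equivalent per-document loop, one inference call per document:
all_predictions_individually =
  Enum.map(docs, fn tokens ->
    {:ok, predictions} = Inference.predict(optimized_model, tokens)
    predictions
  end)
```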
@spec cache_stats(optimized_model()) :: {:ok, map()} | :no_cache
Gets cache statistics.
Examples
{:ok, stats} = Inference.cache_stats(optimized_model)
# => %{entries: 150, hits: 450, misses: 50}
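Given the stats map shape above, a cache hit rate falls out directly; a small sketch (hit_rate is a hypothetical helper, not part of this module):

```elixir
# Hypothetical helper: hits as a fraction of all cache lookups.
# max/2 guards against division by zero on a fresh cache.
hit_rate = fn %{hits: hits, misses: misses} ->
  hits / max(hits + misses, 1)
end

{:ok, stats} = Inference.cache_stats(optimized_model)
hit_rate.(stats)
# For the example stats above: 450 / (450 + 50) = 0.9
```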
@spec clear_cache(optimized_model()) :: :ok
Clears the prediction cache.
Examples
Inference.clear_cache(optimized_model)
@spec optimize_for_inference(map(), keyword()) :: {:ok, optimized_model()} | {:error, term()}
Optimizes a model for inference.
Options
- :optimizations - List of optimizations to apply (default: [:compile])
- :cache_size - Maximum number of cached predictions (default: 1000)
- :device - Device to use (:cpu or :cuda, default: :cpu)
Examples
{:ok, optimized} = Inference.optimize_for_inference(classifier,
optimizations: [:compile, :cache],
device: :cuda
)
@spec predict(optimized_model(), [Nasty.AST.Token.t()], keyword()) :: {:ok, [map()]} | {:error, term()}
Predicts labels for a single sequence using optimized model.
Serves results from the prediction cache when one is available, computing a fresh prediction on a miss.
Examples
{:ok, predictions} = Inference.predict(optimized_model, tokens)
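Putting the pieces together, a typical optimized-inference flow might look like the sketch below. It uses only the functions documented above; classifier, tokens, and more_docs are assumed to come from earlier training and parsing steps:

```elixir
# Enable compiled serving plus the prediction cache; stay on CPU.
{:ok, optimized} =
  Inference.optimize_for_inference(classifier,
    optimizations: [:compile, :cache],
    cache_size: 1_000,
    device: :cpu
  )

# Single-document prediction (repeated inputs are served from the cache).
{:ok, predictions} = Inference.predict(optimized, tokens)

# Many documents in one batched pass.
{:ok, all_predictions} = Inference.batch_predict(optimized, [tokens | more_docs])

# Inspect, then reset, the cache between corpora.
{:ok, stats} = Inference.cache_stats(optimized)
:ok = Inference.clear_cache(optimized)
```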