Nasty.Statistics.Neural.Transformers.Inference (Nasty v0.3.0)


Optimized inference for transformer models.

Provides optimizations including:

  • Batch processing for multiple documents
  • Model quantization for faster inference
  • EXLA compilation for GPU acceleration
  • Prediction caching for repeated inputs

Summary

Functions

batch_predict(optimized_model, document_sequences, opts \\ [])
Performs batch prediction on multiple document sequences.

cache_stats(map)
Gets cache statistics.

clear_cache(map)
Clears the prediction cache.

optimize_for_inference(classifier, opts \\ [])
Optimizes a model for inference.

predict(optimized_model, tokens, opts \\ [])
Predicts labels for a single sequence using an optimized model.

Types

optimization()

@type optimization() :: :quantize | :compile | :gpu | :cache

optimized_model()

@type optimized_model() :: %{
  classifier: map(),
  optimizations: [optimization()],
  cache: :ets.tid() | nil,
  compiled_serving: pid() | nil
}
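As a concrete illustration of the type above, an optimized model is a plain map, so its active optimizations can be inspected directly. The value below is hypothetical; field contents depend on which optimizations were requested:

```elixir
# A hypothetical optimized_model() value, matching the type above.
model = %{
  classifier: %{},
  optimizations: [:compile, :cache],
  cache: nil,
  compiled_serving: nil
}

# Check which optimizations are active via the :optimizations list.
cached? = :cache in model.optimizations
# cached? => true
```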

Functions

batch_predict(optimized_model, document_sequences, opts \\ [])

@spec batch_predict(optimized_model(), [[Nasty.AST.Token.t()]], keyword()) ::
  {:ok, [[map()]]} | {:error, term()}

Performs batch prediction on multiple document sequences.

More efficient than individual predictions for processing many documents.

Examples

{:ok, all_predictions} = Inference.batch_predict(
  optimized_model,
  [doc1_tokens, doc2_tokens, doc3_tokens]
)

cache_stats(map)

@spec cache_stats(optimized_model()) :: {:ok, map()} | :no_cache

Gets cache statistics.

Examples

{:ok, stats} = Inference.cache_stats(optimized_model)
# => %{entries: 150, hits: 450, misses: 50}
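The stats map makes it straightforward to derive a cache hit rate, which is useful when tuning :cache_size. A minimal sketch using the example values above:

```elixir
# Derive the hit rate from the stats map returned by cache_stats/1.
stats = %{entries: 150, hits: 450, misses: 50}
hit_rate = stats.hits / (stats.hits + stats.misses)
# hit_rate => 0.9
```

A hit rate well below 1.0 on a workload with repeated inputs suggests the cache is too small for the input distribution.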

clear_cache(map)

@spec clear_cache(optimized_model()) :: :ok

Clears the prediction cache.

Examples

Inference.clear_cache(optimized_model)

optimize_for_inference(classifier, opts \\ [])

@spec optimize_for_inference(
  map(),
  keyword()
) :: {:ok, optimized_model()} | {:error, term()}

Optimizes a model for inference.

Options

  • :optimizations - List of optimizations to apply (default: [:compile])
  • :cache_size - Maximum number of cached predictions (default: 1000)
  • :device - Device to use (:cpu or :cuda, default: :cpu)

Examples

{:ok, optimized} = Inference.optimize_for_inference(classifier,
  optimizations: [:compile, :cache],
  device: :cuda
)

predict(optimized_model, tokens, opts \\ [])

@spec predict(optimized_model(), [Nasty.AST.Token.t()], keyword()) ::
  {:ok, [map()]} | {:error, term()}

Predicts labels for a single sequence using an optimized model.

Serves cached results when the prediction cache is enabled and the input has been seen before.

Examples

{:ok, predictions} = Inference.predict(optimized_model, tokens)
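Taken together, a typical inference workflow combines the functions documented above. The following is a sketch only: it assumes a trained `classifier` map and pre-tokenized `tokens` are already in hand.

```elixir
alias Nasty.Statistics.Neural.Transformers.Inference

# Compile the model and enable prediction caching (sketch; values illustrative).
{:ok, optimized} =
  Inference.optimize_for_inference(classifier,
    optimizations: [:compile, :cache],
    cache_size: 5_000
  )

# Single-document prediction; repeated inputs are served from the cache.
{:ok, predictions} = Inference.predict(optimized, tokens)

# Batch prediction is more efficient when many documents are queued.
{:ok, all_predictions} = Inference.batch_predict(optimized, [tokens])

# Inspect cache effectiveness, then reset the cache between corpora.
with {:ok, stats} <- Inference.cache_stats(optimized) do
  IO.inspect(stats, label: "cache")
end

:ok = Inference.clear_cache(optimized)
```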