Nasty.Statistics.Neural.Transformers.Inference (Nasty v0.3.0)
Optimized inference for transformer models.
Provides optimizations including:
- Batch processing for multiple documents
- Model quantization for faster inference
- EXLA compilation for GPU acceleration
- Prediction caching for repeated inputs
Summary
Functions
Performs batch prediction on multiple document sequences.
Gets cache statistics.
Clears the prediction cache.
Optimizes a model for inference.
Predicts labels for a single sequence using optimized model.
Types
@type optimization() :: :quantize | :compile | :gpu | :cache
@type optimized_model() :: %{
  classifier: map(),
  optimizations: [optimization()],
  cache: :ets.tid() | nil,
  compiled_serving: pid() | nil
}
Functions
@spec batch_predict(optimized_model(), [[Nasty.AST.Token.t()]], keyword()) :: {:ok, [[map()]]} | {:error, term()}
Performs batch prediction on multiple document sequences.
More efficient than individual predictions for processing many documents.
Examples
{:ok, all_predictions} = Inference.batch_predict(
optimized_model,
[doc1_tokens, doc2_tokens, doc3_tokens]
)
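The batch call is result-equivalent to mapping predict/3 over the documents one at a time, but performs the work in a single pass over the batch. A minimal sketch of the two approaches, assuming doc1_tokens, doc2_tokens, and doc3_tokens are token lists produced earlier:

```elixir
docs = [doc1_tokens, doc2_tokens, doc3_tokens]

# One batched pass (preferred when processing many documents):
{:ok, all_predictions} = Inference.batch_predict(optimized_model, docs)

# Result-equivalent per-document loop, one inference call per document:
all_predictions_individually =
  Enum.map(docs, fn tokens ->
    {:ok, predictions} = Inference.predict(optimized_model, tokens)
    predictions
  end)
```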
@spec cache_stats(optimized_model()) :: {:ok, map()} | :no_cache
Gets cache statistics.
Examples
{:ok, stats} = Inference.cache_stats(optimized_model)
# => %{entries: 150, hits: 450, misses: 50}
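Given the stats map shape above, a cache hit rate falls out directly; a small sketch (hit_rate is a hypothetical helper, not part of this module):

```elixir
# Hypothetical helper: hits as a fraction of all cache lookups.
# max/2 guards against division by zero on a fresh cache.
hit_rate = fn %{hits: hits, misses: misses} ->
  hits / max(hits + misses, 1)
end

{:ok, stats} = Inference.cache_stats(optimized_model)
hit_rate.(stats)
# For the example stats above: 450 / (450 + 50) = 0.9
```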
@spec clear_cache(optimized_model()) :: :ok
Clears the prediction cache.
Examples
Inference.clear_cache(optimized_model)
@spec optimize_for_inference(map(), keyword()) :: {:ok, optimized_model()} | {:error, term()}
Optimizes a model for inference.
Options
- :optimizations - List of optimizations to apply (default: [:compile])
- :cache_size - Maximum number of cached predictions (default: 1000)
- :device - Device to use (:cpu or :cuda, default: :cpu)
Examples
{:ok, optimized} = Inference.optimize_for_inference(classifier,
optimizations: [:compile, :cache],
device: :cuda
)
@spec predict(optimized_model(), [Nasty.AST.Token.t()], keyword()) :: {:ok, [map()]} | {:error, term()}
Predicts labels for a single sequence using optimized model.
Serves results from the prediction cache when one is available, computing a fresh prediction on a miss.
Examples
{:ok, predictions} = Inference.predict(optimized_model, tokens)
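Putting the pieces together, a typical optimized-inference flow might look like the sketch below. It uses only the functions documented above; classifier, tokens, and more_docs are assumed to come from earlier training and parsing steps:

```elixir
# Enable compiled serving plus the prediction cache; stay on CPU.
{:ok, optimized} =
  Inference.optimize_for_inference(classifier,
    optimizations: [:compile, :cache],
    cache_size: 1_000,
    device: :cpu
  )

# Single-document prediction (repeated inputs are served from the cache).
{:ok, predictions} = Inference.predict(optimized, tokens)

# Many documents in one batched pass.
{:ok, all_predictions} = Inference.batch_predict(optimized, [tokens | more_docs])

# Inspect, then reset, the cache between corpora.
{:ok, stats} = Inference.cache_stats(optimized)
:ok = Inference.clear_cache(optimized)
```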