Performance Guide

Benchmarks, optimization tips, and performance considerations for Nasty.

Overview

Nasty is designed for accuracy and correctness first, with performance optimization as a secondary goal. However, there are many ways to improve throughput for production workloads.

Benchmark Results

Hardware Used

CPU: AMD Ryzen / Intel Core i7 (8 cores)
RAM: 16GB
Elixir: 1.14+
Erlang/OTP: 25+

Tokenization Speed

Language	Tokens/sec	Text Length	Time
English	~50,000	100 words	2ms
Spanish	~48,000	100 words	2ms
Catalan	~47,000	100 words	2ms

Note: NimbleParsec-based tokenization is very fast.

POS Tagging Speed

Model	Tokens/sec	Accuracy	Memory
Rule-based	~20,000	85%	10MB
HMM	~15,000	95%	50MB
Neural	~5,000	97-98%	200MB
Ensemble	~4,000	98%	250MB

Tradeoff: Accuracy vs. Speed

Parsing Speed

Task	Sentences/sec	Time (100 words)
Phrase parsing	~1,000	10ms
Full parse	~500	20ms
With deps	~400	25ms

Translation Speed

Operation	Time (per sentence)	Complexity
Simple (5 words)	15ms	Low
Medium (15 words)	35ms	Medium
Complex (30 words)	80ms	High

Includes: Parsing, translation, agreement, rendering

End-to-End Pipeline

Complete pipeline (tokenize → parse → analyze):

Document Size	Time (rule-based)	Time (HMM)	Time (neural)
100 words	50ms	80ms	250ms
500 words	200ms	350ms	1,200ms
1,000 words	400ms	700ms	2,400ms

Optimization Strategies

1. Use Appropriate Models

Choose the right model for your accuracy/speed requirements:

# Fast but less accurate
{:ok, tagged} = English.tag_pos(tokens, model: :rule)

# Balanced
{:ok, tagged} = English.tag_pos(tokens, model: :hmm)

# Most accurate but slowest
{:ok, tagged} = English.tag_pos(tokens, model: :neural)

2. Parallel Processing

Process multiple documents in parallel:

documents
|> Task.async_stream(
  fn doc -> process_document(doc) end,
  max_concurrency: System.schedulers_online(),
  timeout: 30_000
)
|> Enum.to_list()

Speedup: Near-linear with CPU cores for independent documents

3. Caching

Cache parsed documents to avoid re-parsing:

defmodule DocumentCache do
  use Agent

  def start_link(_) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  def get_or_parse(text, language) do
    key = {text, language}
    
    Agent.get_and_update(__MODULE__, fn cache ->
      case Map.get(cache, key) do
        nil ->
          {:ok, doc} = Nasty.parse(text, language: language)
          {doc, Map.put(cache, key, doc)}
        doc ->
          {doc, cache}
      end
    end)
  end
end

Speedup: ~10-100x for repeated texts

4. Selective Parsing

Skip expensive operations when not needed:

# Basic parsing (fast)
{:ok, doc} = English.parse(tokens)

# With semantic roles (slower)
{:ok, doc} = English.parse(tokens, semantic_roles: true)

# With coreference (slowest)
{:ok, doc} = English.parse(tokens, 
  semantic_roles: true,
  coreference: true
)

5. Batch Operations

Batch related operations together:

# Less efficient
Enum.each(documents, fn doc ->
  {:ok, tokens} = tokenize(doc)
  {:ok, tagged} = tag_pos(tokens)
  {:ok, parsed} = parse(tagged)
end)

# More efficient
documents
|> Enum.map(&tokenize/1)
|> Enum.map(&tag_pos/1)
|> Enum.map(&parse/1)

6. Model Pre-loading

Load models once at startup:

defmodule MyApp.Application do
  def start(_type, _args) do
    # Pre-load statistical models
    Nasty.Statistics.ModelLoader.load_from_priv("models/hmm.model")
    
    # ... rest of application startup
  end
end

7. Stream Processing

For large documents, process incrementally:

File.stream!("large_document.txt")
|> Stream.chunk_by(&(&1 == "\n"))
|> Stream.map(&process_paragraph/1)
|> Enum.to_list()

Memory Optimization

Memory Usage by Component

Component	Memory (baseline)	Per document
Tokenizer	5MB	~1KB
POS Tagger	50MB (HMM)	~5KB
Parser	10MB	~10KB
Neural Model	200MB	~50KB
Transformer	500MB	~100KB

Reducing Memory Usage

1. Use simpler models:

# Rule-based uses minimal memory
{:ok, tagged} = English.tag_pos(tokens, model: :rule)

2. Clear caches periodically:

# Clear parsed document cache
GenServer.call(DocumentCache, :clear)

3. Process in batches:

documents
|> Enum.chunk_every(100)
|> Enum.each(fn batch ->
  process_batch(batch)
  # Memory freed between batches
end)

4. Use garbage collection:

Enum.each(large_dataset, fn item ->
  process(item)
  
  # Force GC every 100 items
  if rem(index, 100) == 0 do
    :erlang.garbage_collect()
  end
end)

Profiling

Measuring Performance

# Simple timing
{time, result} = :timer.tc(fn ->
  Nasty.parse(text, language: :en)
end)

IO.puts("Took #{time / 1000}ms")

Using :eprof

:eprof.start()
:eprof.start_profiling([self()])

# Your code here
Nasty.parse(text, language: :en)

:eprof.stop_profiling()
:eprof.analyze(:total)

Using :fprof

:fprof.start()
:fprof.trace([:start])

# Your code here
Nasty.parse(text, language: :en)

:fprof.trace([:stop])
:fprof.profile()
:fprof.analyse()

Production Recommendations

For High-Throughput Systems

Use HMM models: Best balance of speed/accuracy
Enable parallel processing: 4-8x throughput improvement
Cache aggressively: Massive wins for repeated content
Pre-load models: Avoid startup latency
Monitor memory: Set limits and clear caches

For Low-Latency Systems

Use rule-based tagging: Fastest option
Skip optional analysis: Only parse what you need
Warm up: Run dummy requests on startup
Keep it simple: Avoid neural models for real-time

For Batch Processing

Use neural models: Maximize accuracy
Process in parallel: Utilize all cores
Stream large files: Don't load everything into memory
Checkpoint progress: Save intermediate results

Benchmarking Your Setup

Run the included benchmark:

# Create benchmark.exs
Mix.install([{:nasty, path: "."}])

alias Nasty.Language.English

texts = [
  "The quick brown fox jumps over the lazy dog.",
  "She sells seashells by the seashore.",
  "How much wood would a woodchuck chuck?"
]

# Warm up
Enum.each(texts, &English.tokenize/1)

# Benchmark
{time, _} = :timer.tc(fn ->
  Enum.each(1..1000, fn _ ->
    Enum.each(texts, fn text ->
      {:ok, tokens} = English.tokenize(text)
      {:ok, tagged} = English.tag_pos(tokens, model: :rule)
      {:ok, _doc} = English.parse(tagged)
    end)
  end)
end)

IO.puts("Processed 3000 documents in #{time / 1_000_000}s")
IO.puts("Throughput: #{3000 / (time / 1_000_000)} docs/sec")

Performance Comparison

vs. Other NLP Libraries

Library	Language	Speed	Accuracy
Nasty	Elixir	Medium	High
spaCy	Python	Fast	High
Stanford	Java	Slow	Very High
NLTK	Python	Slow	Medium

Nasty advantages:

Pure Elixir (no Python interop overhead)
Built-in parallelism via BEAM
AST-first design
Multi-language from ground up

Known Bottlenecks

Neural models: Slow inference (use HMM for speed)
Complex parsing: Can be slow for long sentences
Translation: Requires full parse + agreement + rendering
First request: Model loading adds latency

Future Optimizations

Planned improvements:

[ ] Compile-time grammar optimization
[ ] Native NIFs for hot paths
[ ] GPU acceleration for neural models
[ ] Incremental parsing for edits
[ ] Streaming translation
[ ] Model quantization (INT8/INT4)

Tips & Tricks

Monitor performance:

:observer.start()

Profile specific functions:

:fprof.apply(&Nasty.parse/2, [text, [language: :en]])

Check for memory leaks:

:recon.proc_count(:memory, 10)

Tune VM flags:

elixir --erl "+S 8:8" --erl "+sbwt very_long" yourscript.exs

Summary

Tokenization: Very fast (~50K tokens/sec)
POS Tagging: Fast to medium depending on model
Parsing: Medium speed (~500 sentences/sec)
Translation: Medium to slow depending on complexity
Optimization: Parallel processing gives best speedup
Production: Use HMM models with caching

For most applications, Nasty provides good throughput. For extreme performance needs, consider using rule-based models and aggressive caching.

← Previous Page Model Quantization Guide

Next Page → Architecture Refactoring Guide