Performance Guide

View Source

Benchmarks, optimization tips, and performance considerations for Nasty.

Overview

Nasty is designed for accuracy and correctness first, with performance optimization as a secondary goal. However, there are many ways to improve throughput for production workloads.

Benchmark Results

Hardware Used

  • CPU: AMD Ryzen / Intel Core i7 (8 cores)
  • RAM: 16GB
  • Elixir: 1.14+
  • Erlang/OTP: 25+

Tokenization Speed

LanguageTokens/secText LengthTime
English~50,000100 words2ms
Spanish~48,000100 words2ms
Catalan~47,000100 words2ms

Note: NimbleParsec-based tokenization is very fast.

POS Tagging Speed

ModelTokens/secAccuracyMemory
Rule-based~20,00085%10MB
HMM~15,00095%50MB
Neural~5,00097-98%200MB
Ensemble~4,00098%250MB

Tradeoff: Accuracy vs. Speed

Parsing Speed

TaskSentences/secTime (100 words)
Phrase parsing~1,00010ms
Full parse~50020ms
With deps~40025ms

Translation Speed

OperationTime (per sentence)Complexity
Simple (5 words)15msLow
Medium (15 words)35msMedium
Complex (30 words)80msHigh

Includes: Parsing, translation, agreement, rendering

End-to-End Pipeline

Complete pipeline (tokenize → parse → analyze):

Document SizeTime (rule-based)Time (HMM)Time (neural)
100 words50ms80ms250ms
500 words200ms350ms1,200ms
1,000 words400ms700ms2,400ms

Optimization Strategies

1. Use Appropriate Models

Choose the right model for your accuracy/speed requirements:

# Fast but less accurate
{:ok, tagged} = English.tag_pos(tokens, model: :rule)

# Balanced
{:ok, tagged} = English.tag_pos(tokens, model: :hmm)

# Most accurate but slowest
{:ok, tagged} = English.tag_pos(tokens, model: :neural)

2. Parallel Processing

Process multiple documents in parallel:

documents
|> Task.async_stream(
  fn doc -> process_document(doc) end,
  max_concurrency: System.schedulers_online(),
  timeout: 30_000
)
|> Enum.to_list()

Speedup: Near-linear with CPU cores for independent documents

3. Caching

Cache parsed documents to avoid re-parsing:

defmodule DocumentCache do
  use Agent

  def start_link(_) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  def get_or_parse(text, language) do
    key = {text, language}
    
    Agent.get_and_update(__MODULE__, fn cache ->
      case Map.get(cache, key) do
        nil ->
          {:ok, doc} = Nasty.parse(text, language: language)
          {doc, Map.put(cache, key, doc)}
        doc ->
          {doc, cache}
      end
    end)
  end
end

Speedup: ~10-100x for repeated texts

4. Selective Parsing

Skip expensive operations when not needed:

# Basic parsing (fast)
{:ok, doc} = English.parse(tokens)

# With semantic roles (slower)
{:ok, doc} = English.parse(tokens, semantic_roles: true)

# With coreference (slowest)
{:ok, doc} = English.parse(tokens, 
  semantic_roles: true,
  coreference: true
)

5. Batch Operations

Batch related operations together:

# Less efficient
Enum.each(documents, fn doc ->
  {:ok, tokens} = tokenize(doc)
  {:ok, tagged} = tag_pos(tokens)
  {:ok, parsed} = parse(tagged)
end)

# More efficient
documents
|> Enum.map(&tokenize/1)
|> Enum.map(&tag_pos/1)
|> Enum.map(&parse/1)

6. Model Pre-loading

Load models once at startup:

defmodule MyApp.Application do
  def start(_type, _args) do
    # Pre-load statistical models
    Nasty.Statistics.ModelLoader.load_from_priv("models/hmm.model")
    
    # ... rest of application startup
  end
end

7. Stream Processing

For large documents, process incrementally:

File.stream!("large_document.txt")
|> Stream.chunk_by(&(&1 == "\n"))
|> Stream.map(&process_paragraph/1)
|> Enum.to_list()

Memory Optimization

Memory Usage by Component

ComponentMemory (baseline)Per document
Tokenizer5MB~1KB
POS Tagger50MB (HMM)~5KB
Parser10MB~10KB
Neural Model200MB~50KB
Transformer500MB~100KB

Reducing Memory Usage

1. Use simpler models:

# Rule-based uses minimal memory
{:ok, tagged} = English.tag_pos(tokens, model: :rule)

2. Clear caches periodically:

# Clear parsed document cache
GenServer.call(DocumentCache, :clear)

3. Process in batches:

documents
|> Enum.chunk_every(100)
|> Enum.each(fn batch ->
  process_batch(batch)
  # Memory freed between batches
end)

4. Use garbage collection:

Enum.each(large_dataset, fn item ->
  process(item)
  
  # Force GC every 100 items
  if rem(index, 100) == 0 do
    :erlang.garbage_collect()
  end
end)

Profiling

Measuring Performance

# Simple timing
{time, result} = :timer.tc(fn ->
  Nasty.parse(text, language: :en)
end)

IO.puts("Took #{time / 1000}ms")

Using :eprof

:eprof.start()
:eprof.start_profiling([self()])

# Your code here
Nasty.parse(text, language: :en)

:eprof.stop_profiling()
:eprof.analyze(:total)

Using :fprof

:fprof.start()
:fprof.trace([:start])

# Your code here
Nasty.parse(text, language: :en)

:fprof.trace([:stop])
:fprof.profile()
:fprof.analyse()

Production Recommendations

For High-Throughput Systems

  1. Use HMM models: Best balance of speed/accuracy
  2. Enable parallel processing: 4-8x throughput improvement
  3. Cache aggressively: Massive wins for repeated content
  4. Pre-load models: Avoid startup latency
  5. Monitor memory: Set limits and clear caches

For Low-Latency Systems

  1. Use rule-based tagging: Fastest option
  2. Skip optional analysis: Only parse what you need
  3. Warm up: Run dummy requests on startup
  4. Keep it simple: Avoid neural models for real-time

For Batch Processing

  1. Use neural models: Maximize accuracy
  2. Process in parallel: Utilize all cores
  3. Stream large files: Don't load everything into memory
  4. Checkpoint progress: Save intermediate results

Benchmarking Your Setup

Run the included benchmark:

# Create benchmark.exs
Mix.install([{:nasty, path: "."}])

alias Nasty.Language.English

texts = [
  "The quick brown fox jumps over the lazy dog.",
  "She sells seashells by the seashore.",
  "How much wood would a woodchuck chuck?"
]

# Warm up
Enum.each(texts, &English.tokenize/1)

# Benchmark
{time, _} = :timer.tc(fn ->
  Enum.each(1..1000, fn _ ->
    Enum.each(texts, fn text ->
      {:ok, tokens} = English.tokenize(text)
      {:ok, tagged} = English.tag_pos(tokens, model: :rule)
      {:ok, _doc} = English.parse(tagged)
    end)
  end)
end)

IO.puts("Processed 3000 documents in #{time / 1_000_000}s")
IO.puts("Throughput: #{3000 / (time / 1_000_000)} docs/sec")

Performance Comparison

vs. Other NLP Libraries

LibraryLanguageSpeedAccuracy
NastyElixirMediumHigh
spaCyPythonFastHigh
StanfordJavaSlowVery High
NLTKPythonSlowMedium

Nasty advantages:

  • Pure Elixir (no Python interop overhead)
  • Built-in parallelism via BEAM
  • AST-first design
  • Multi-language from ground up

Known Bottlenecks

  1. Neural models: Slow inference (use HMM for speed)
  2. Complex parsing: Can be slow for long sentences
  3. Translation: Requires full parse + agreement + rendering
  4. First request: Model loading adds latency

Future Optimizations

Planned improvements:

  • [ ] Compile-time grammar optimization
  • [ ] Native NIFs for hot paths
  • [ ] GPU acceleration for neural models
  • [ ] Incremental parsing for edits
  • [ ] Streaming translation
  • [ ] Model quantization (INT8/INT4)

Tips & Tricks

Monitor performance:

:observer.start()

Profile specific functions:

:fprof.apply(&Nasty.parse/2, [text, [language: :en]])

Check for memory leaks:

:recon.proc_count(:memory, 10)

Tune VM flags:

elixir --erl "+S 8:8" --erl "+sbwt very_long" yourscript.exs

Summary

  • Tokenization: Very fast (~50K tokens/sec)
  • POS Tagging: Fast to medium depending on model
  • Parsing: Medium speed (~500 sentences/sec)
  • Translation: Medium to slow depending on complexity
  • Optimization: Parallel processing gives best speedup
  • Production: Use HMM models with caching

For most applications, Nasty provides good throughput. For extreme performance needs, consider using rule-based models and aggressive caching.