Nasty Public API Reference

This document describes the public API of Nasty, the Natural Abstract Syntax Tree library for Elixir.

Core Functions

Parsing

Nasty.parse/2

Parses natural language text into an Abstract Syntax Tree (AST).

Parameters:

  • text (String.t()) - The text to parse
  • opts (keyword()) - Options:
    • :language - Language code (:en, :es, :ca, etc.). Required.
    • :tokenize - Enable tokenization (default: true)
    • :pos_tag - Enable POS tagging (default: true)
    • :parse_dependencies - Parse dependency relationships (default: true)
    • :extract_entities - Extract named entities (default: false)
    • :resolve_coreferences - Resolve coreferences (default: false)

Returns:

  • {:ok, %Nasty.AST.Document{}} - Parsed AST document
  • {:error, reason} - Parse error

Examples:

# Basic parsing
{:ok, ast} = Nasty.parse("The cat sat on the mat.", language: :en)

# With entity recognition
{:ok, ast} = Nasty.parse("John lives in Paris.", 
  language: :en, 
  extract_entities: true
)

# With coreference resolution
{:ok, ast} = Nasty.parse("Mary loves her cat. She feeds it daily.", 
  language: :en, 
  resolve_coreferences: true
)
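
The option list above gives a default for every stage toggle, so a call only needs to name the options it changes. The sketch below shows that idea with a plain keyword merge. ParseOptsSketch and effective/1 are hypothetical names used for illustration, and this reference does not state how Nasty itself combines caller options with its defaults.

```elixir
defmodule ParseOptsSketch do
  # Hypothetical helper, not part of Nasty.
  # The values mirror the defaults documented for Nasty.parse/2.
  @defaults [
    tokenize: true,
    pos_tag: true,
    parse_dependencies: true,
    extract_entities: false,
    resolve_coreferences: false
  ]

  # Caller-supplied options override the defaults; untouched keys keep them.
  def effective(opts), do: Keyword.merge(@defaults, opts)
end

opts = ParseOptsSketch.effective(language: :en, extract_entities: true)
IO.inspect(Keyword.fetch!(opts, :extract_entities))
IO.inspect(Keyword.fetch!(opts, :pos_tag))
```

Under this assumption, passing extract_entities: true switches entity extraction on while tokenization, POS tagging, and dependency parsing stay enabled.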

Nasty.render/2

Renders an AST back to natural language text.

Parameters:

  • ast (struct()) - AST node to render (Document, Sentence, etc.)
  • opts (keyword()) - Options. The language is determined from the AST, so no :language option is needed.

Returns:

  • {:ok, text} - Rendered text string
  • {:error, reason} - Render error

Examples:

{:ok, ast} = Nasty.parse("The cat sat.", language: :en)
{:ok, text} = Nasty.render(ast)
# => "The cat sat."

Translation

Nasty.Translation.Translator.translate_document/2

Translates an AST document from one language to another.

Parameters:

  • document - AST Document to translate
  • target_language - Target language code (:en, :es, :ca, etc.)

Returns:

  • {:ok, %Nasty.AST.Document{}} - Translated AST document
  • {:error, reason} - Translation error

Examples:

alias Nasty.Translation.Translator

# Translate English to Spanish
{:ok, doc_en} = Nasty.parse("The cat runs.", language: :en)
{:ok, doc_es} = Translator.translate_document(doc_en, :es)
{:ok, text_es} = Nasty.render(doc_es)
# => "El gato corre."

# Translate Spanish to English  
{:ok, doc_es} = Nasty.parse("La casa grande.", language: :es)
{:ok, doc_en} = Translator.translate_document(doc_es, :en)
{:ok, text_en} = Nasty.render(doc_en)
# => "The big house."

# Or translate text directly
{:ok, text_es} = Translator.translate("The cat runs.", :en, :es)
# => "El gato corre."

Summarization

Nasty.summarize/2

Summarizes a document by extracting important sentences.

Parameters:

  • text_or_ast - Text string or AST Document to summarize
  • opts (keyword()) - Options:
    • :language - Language code (required when passing a text string)
    • :ratio - Compression ratio, 0.0 to 1.0 (default: 0.3)
    • :max_sentences - Maximum number of sentences in summary
    • :method - Selection method: :greedy or :mmr (default: :greedy)
    • :min_sentence_length - Minimum sentence length in tokens (default: 3)
    • :mmr_lambda - MMR diversity parameter, 0.0 to 1.0 (default: 0.5)

Returns:

  • {:ok, [%Nasty.AST.Sentence{}]} - List of extracted sentences
  • {:error, reason} - Error

Examples:

# From text
{:ok, summary} = Nasty.summarize(long_text, 
  language: :en, 
  ratio: 0.3
)

# From AST
{:ok, ast} = Nasty.parse(long_text, language: :en)
{:ok, summary} = Nasty.summarize(ast, max_sentences: 3)

# Using MMR for diversity
{:ok, summary} = Nasty.summarize(text, 
  language: :en, 
  method: :mmr, 
  mmr_lambda: 0.7
)
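
To make the :ratio and :mmr_lambda options more concrete, here is a minimal sketch of generic Maximal Marginal Relevance (MMR) selection over plain strings. It is not Nasty's implementation: the MMRSketch module, the Jaccard word-overlap similarity, the precomputed relevance scores, and the rounding in target_count/2 are all assumptions made for illustration. In this standard formulation a higher lambda favors relevance and a lower one favors diversity. Whether Nasty's :mmr_lambda points the same way is not confirmed by this reference.

```elixir
defmodule MMRSketch do
  # Illustrative only: a generic MMR selector, not Nasty's implementation.

  # Number of sentences to keep for a ratio (the rounding rule is an assumption).
  def target_count(total, ratio), do: max(1, round(total * ratio))

  # Jaccard similarity between two sentences, compared as sets of lowercase words.
  def similarity(a, b) do
    sa = words(a)
    sb = words(b)
    union = MapSet.size(MapSet.union(sa, sb))
    if union == 0, do: 0.0, else: MapSet.size(MapSet.intersection(sa, sb)) / union
  end

  # scored is a list of {sentence, relevance} pairs. Each round picks the
  # candidate maximizing:
  #   lambda * relevance - (1 - lambda) * (max similarity to any picked sentence)
  def select(scored, k, lambda), do: do_select(scored, k, lambda, [])

  defp do_select([], _k, _lambda, picked), do: Enum.reverse(picked)
  defp do_select(_rest, 0, _lambda, picked), do: Enum.reverse(picked)

  defp do_select(candidates, k, lambda, picked) do
    {best, _relevance} =
      Enum.max_by(candidates, fn {sentence, relevance} ->
        # Redundancy is 0.0 while nothing has been picked yet.
        redundancy =
          Enum.reduce(picked, 0.0, fn p, acc -> max(acc, similarity(sentence, p)) end)

        lambda * relevance - (1 - lambda) * redundancy
      end)

    remaining = Enum.reject(candidates, fn {s, _} -> s == best end)
    do_select(remaining, k - 1, lambda, [best | picked])
  end

  defp words(text) do
    text |> String.downcase() |> String.split(~r/\W+/u, trim: true) |> MapSet.new()
  end
end

scored = [
  {"the cat sat on the mat", 0.9},
  {"the cat sat on a mat", 0.8},
  {"dogs bark loudly", 0.5}
]

IO.inspect(MMRSketch.select(scored, 2, 0.5))
IO.inspect(MMRSketch.select(scored, 2, 1.0))
```

With lambda 0.5 the near-duplicate second sentence is penalized for overlapping with the first, so the unrelated third sentence is chosen instead. With lambda 1.0 only relevance counts and the two top-scoring sentences are returned.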

Code Interoperability

Nasty.to_code/2

Converts natural language text to code.

Parameters:

  • text (String.t()) - Natural language description
  • opts (keyword()) - Options:
    • :source_language - Source natural language (:en, etc.). Required.
    • :target_language - Target programming language (:elixir, etc.). Required.

Returns:

  • {:ok, code_string} - Generated code
  • {:error, reason} - Error

Supported Language Pairs:

  • English → Elixir (:en → :elixir)

Examples:

# List operations
{:ok, code} = Nasty.to_code("Sort the list", 
  source_language: :en, 
  target_language: :elixir
)
# => "Enum.sort(list)"

# Filter with constraints
{:ok, code} = Nasty.to_code("Filter users where age is greater than 18",
  source_language: :en,
  target_language: :elixir
)
# => "Enum.filter(users, fn item -> item > 18 end)"

# Arithmetic
{:ok, code} = Nasty.to_code("Add x and y",
  source_language: :en,
  target_language: :elixir
)
# => "x + y"

Nasty.explain_code/2

Generates a natural language explanation of code.

Parameters:

  • code - Code string or AST to explain
  • opts (keyword()) - Options:
    • :source_language - Programming language (:elixir, etc.). Required.
    • :target_language - Target natural language (:en, etc.). Required.
    • :style - Explanation style: :concise or :verbose (default: :concise)

Returns:

  • {:ok, explanation_string} - Natural language explanation
  • {:error, reason} - Error

Supported Language Pairs:

  • Elixir → English (:elixir → :en)

Examples:

{:ok, explanation} = Nasty.explain_code("Enum.sort(list)",
  source_language: :elixir,
  target_language: :en
)
# => "Sort list"

{:ok, explanation} = Nasty.explain_code(
  "list |> Enum.map(&(&1 * 2)) |> Enum.sum()",
  source_language: :elixir,
  target_language: :en
)
# => "Map list to double each element, then sum the results"

# Verbose style
{:ok, explanation} = Nasty.explain_code("x = 5",
  source_language: :elixir,
  target_language: :en,
  style: :verbose
)

Language Registry

Nasty.Language.Registry

Manages language implementations.

Nasty.Language.Registry.register/1

Registers a language implementation module.

Nasty.Language.Registry.register(Nasty.Language.English)
# => :ok

Nasty.Language.Registry.get/1

Gets the implementation module for a language code.

{:ok, module} = Nasty.Language.Registry.get(:en)
# => {:ok, Nasty.Language.English}

Nasty.Language.Registry.detect_language/1

Detects the language of the given text.

{:ok, language} = Nasty.Language.Registry.detect_language("Hello world")
# => {:ok, :en}

{:ok, language} = Nasty.Language.Registry.detect_language("Hola mundo")
# => {:ok, :es}
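
Detection can fail (the error list in this reference includes :no_match), so callers that parse text of unknown language may want a fallback. The following is a minimal sketch of that pattern. DetectSketch, language_or/2, and parse_auto/2 are hypothetical names, not part of Nasty. Only detect_language/1 and Nasty.parse/2 come from this reference.

```elixir
defmodule DetectSketch do
  # Hypothetical helper, not part of Nasty.

  # Pick the detected language, or fall back to a default on any error.
  def language_or(default, {:ok, language}), do: language
  def language_or(default, {:error, _reason}), do: default

  # Detect the language of `text`, then parse with it.
  # Requires the Nasty library at runtime.
  def parse_auto(text, default \\ :en) do
    detected = Nasty.Language.Registry.detect_language(text)
    Nasty.parse(text, language: language_or(default, detected))
  end
end

IO.inspect(DetectSketch.language_or(:en, {:ok, :es}))
IO.inspect(DetectSketch.language_or(:en, {:error, :no_match}))
```

Keeping the fallback decision in a pure function makes it easy to test without loading any language models.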

Nasty.Language.Registry.registered_languages/0

Returns all registered language codes.

Nasty.Language.Registry.registered_languages()
# => [:en, :es, :ca]

Nasty.Language.Registry.registered?/1

Checks if a language is registered.

Nasty.Language.Registry.registered?(:en)
# => true

AST Utilities

Query

Nasty.Utils.Query

Query and traverse AST structures.

alias Nasty.Utils.Query

# Find subject in a sentence
subject = Query.find_subject(sentence)

# Find all noun phrases
noun_phrases = Query.find_all(document, :noun_phrase)

# Find by POS tag
nouns = Query.find_by_pos(document, :noun)
verbs = Query.find_by_pos(document, :verb)

# Count nodes
token_count = Query.count(document, :token)

Validation

Nasty.Utils.Validator

Validate AST structure.

alias Nasty.Utils.Validator

case Validator.validate(document) do
  {:ok, _doc} -> IO.puts("Valid AST")
  {:error, reason} -> IO.puts("Invalid: #{inspect(reason)}")
end

# Check if valid (boolean)
if Validator.valid?(document) do
  IO.puts("Document is valid")
end

Transformation

Nasty.Utils.Transform

Transform AST nodes.

alias Nasty.Utils.Transform

# Case normalization
lowercased = Transform.normalize_case(document, :lower)

# Remove punctuation
no_punct = Transform.remove_punctuation(document)

# Remove stop words
no_stops = Transform.remove_stop_words(document)

# Lemmatize all tokens
lemmatized = Transform.lemmatize(document)

Traversal

Nasty.Utils.Traversal

Traverse AST structure.

alias Nasty.Utils.Traversal

# Reduce over all nodes
token_count = Traversal.reduce(document, 0, fn
  %Nasty.AST.Token{}, acc -> acc + 1
  _, acc -> acc
end)

# Collect matching nodes
verbs = Traversal.collect(document, fn
  %Nasty.AST.Token{pos_tag: :verb} -> true
  _ -> false
end)

# Map over all nodes
transformed = Traversal.map(document, fn
  %Nasty.AST.Token{} = token ->
    %{token | text: String.downcase(token.text)}
  node -> node
end)

Rendering

Pretty Print

Nasty.Rendering.PrettyPrint

Format AST for human-readable inspection.

# Pretty print to stdout
Nasty.Rendering.PrettyPrint.inspect(ast)

# Get formatted string
formatted = Nasty.Rendering.PrettyPrint.format(ast)

Visualization

Nasty.Rendering.Visualization

Generate visualizations of AST structures.

# Generate DOT format for Graphviz
{:ok, dot} = Nasty.Rendering.Visualization.to_dot(ast)
File.write("ast.dot", dot)

# Generate JSON representation
{:ok, json} = Nasty.Rendering.Visualization.to_json(ast)

Text Rendering

Nasty.Rendering.Text

Render AST to text.

{:ok, text} = Nasty.Rendering.Text.render(document)

Statistical & Neural Models

Model Registry

Nasty.Statistics.ModelRegistry

Manage statistical and neural models.

# Register a model
Nasty.Statistics.ModelRegistry.register(:hmm_pos_tagger, model)
Nasty.Statistics.ModelRegistry.register(:neural_pos_tagger, neural_model)

# Get a model
{:ok, model} = Nasty.Statistics.ModelRegistry.get(:hmm_pos_tagger)
{:ok, neural} = Nasty.Statistics.ModelRegistry.get(:neural_pos_tagger)

# List models
models = Nasty.Statistics.ModelRegistry.list_models()

Model Loader

Nasty.Statistics.ModelLoader

Load and save statistical and neural models.

# Load HMM model from file
{:ok, model} = Nasty.Statistics.ModelLoader.load("path/to/model.model")

# Load neural model from file
{:ok, neural} = Nasty.Statistics.POSTagging.NeuralTagger.load("path/to/model.axon")

# Save model to file
:ok = Nasty.Statistics.ModelLoader.save(model, "path/to/model.model")
:ok = Nasty.Statistics.POSTagging.NeuralTagger.save(neural, "path/to/model.axon")

# Load from the project's priv directory
{:ok, model} = Nasty.Statistics.ModelLoader.load_from_priv("models/hmm.model")

Neural Models

Nasty.Statistics.POSTagging.NeuralTagger

Train and use BiLSTM-CRF neural models for POS tagging.

# Train a neural model
alias Nasty.Statistics.POSTagging.NeuralTagger

tagger = NeuralTagger.new(
  vocab: vocab,
  tag_vocab: tag_vocab,
  embedding_dim: 300,
  hidden_size: 256,
  num_layers: 2
)

{:ok, trained} = NeuralTagger.train(tagger, training_data,
  epochs: 10,
  batch_size: 32,
  learning_rate: 0.001
)

# Use neural model for prediction
{:ok, tags} = NeuralTagger.predict(trained, ["The", "cat", "sat"], [])

# Save/load neural models
NeuralTagger.save(trained, "model.axon")
{:ok, loaded} = NeuralTagger.load("model.axon")

Data Layer

CoNLL-U Parser

Nasty.Data.CoNLLU

Parse and generate CoNLL-U format data.

# Parse CoNLL-U file
{:ok, sentences} = Nasty.Data.CoNLLU.parse_file("corpus.conllu")

# Parse CoNLL-U string
{:ok, sentences} = Nasty.Data.CoNLLU.parse(conllu_string)

# Convert AST to CoNLL-U
conllu_string = Nasty.Data.CoNLLU.format(sentence)
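
For readers unfamiliar with the format: each CoNLL-U token line holds ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with an underscore marking an empty value. The sketch below splits one such line into a map. CoNLLUSketch, parse_line/1, and the :invalid_line error are hypothetical and only illustrate the column layout. The structs that Nasty.Data.CoNLLU actually returns may look different.

```elixir
defmodule CoNLLUSketch do
  # Illustrative only; not the representation used by Nasty.Data.CoNLLU.
  @fields [:id, :form, :lemma, :upos, :xpos, :feats, :head, :deprel, :deps, :misc]

  # Split one token line into the ten CoNLL-U columns.
  def parse_line(line) do
    values = String.split(line, "\t")

    if length(values) == length(@fields) do
      {:ok, Map.new(Enum.zip(@fields, values))}
    else
      # Hypothetical error atom for lines with the wrong column count.
      {:error, :invalid_line}
    end
  end
end

line = "2\tcat\tcat\tNOUN\tNN\tNumber=Sing\t3\tnsubj\t_\t_"
{:ok, token} = CoNLLUSketch.parse_line(line)
IO.inspect({token.form, token.upos, token.deprel})
```

Real CoNLL-U files also contain comment lines starting with # and blank lines between sentences, which this sketch does not handle.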

Corpus Management

Nasty.Data.Corpus

Manage text corpora.

# Load corpus
{:ok, corpus} = Nasty.Data.Corpus.load("path/to/corpus")

# Get sentences
sentences = Nasty.Data.Corpus.sentences(corpus)

# Statistics
stats = Nasty.Data.Corpus.statistics(corpus)

NLP Operations (English)

These operations are specific to English. Access them through modules under Nasty.Language.English.

Question Answering

alias Nasty.Language.English

# Analyze question
{:ok, analysis} = English.QuestionAnalyzer.analyze("What is the capital of France?")

# Extract answer
{:ok, answer} = English.AnswerExtractor.extract(document, analysis)

Text Classification

# Train classifier
classifier = English.TextClassifier.train(training_data)

# Classify text
{:ok, category} = English.TextClassifier.classify(classifier, text)

Information Extraction

# Extract relations
relations = English.RelationExtractor.extract(document)

# Extract events
events = English.EventExtractor.extract(document)

# Extract with templates
extracted = English.TemplateExtractor.extract(document, templates)

Semantic Role Labeling

# Label semantic roles
labeled = English.SemanticRoleLabeler.label(sentence)

Coreference Resolution

# Resolve coreferences
{:ok, resolved} = English.CoreferenceResolver.resolve(document)

Translation

Nasty.Translation.Translator

Translate documents between languages.

alias Nasty.Translation.Translator

# Translate document
{:ok, translated_doc} = Translator.translate_document(source_doc, :es)

# Translate with custom lexicons
{:ok, translated_doc} = Translator.translate(source_doc, :es, lexicon_path: "custom_lexicons/")

Nasty.Translation.TokenTranslator

Translate individual tokens with POS-aware lemma-to-lemma mapping.

alias Nasty.Translation.TokenTranslator

# Translate token
translated_token = TokenTranslator.translate_token(token, :en, :es)

# Translate with morphology
translated_token = TokenTranslator.translate_with_morphology(token, :en, :es)

Nasty.Translation.Agreement

Enforce morphological agreement rules.

alias Nasty.Translation.Agreement

# Apply gender/number agreement
adjusted_tokens = Agreement.apply_agreement(tokens, :es)

# Check agreement
valid? = Agreement.check_agreement(determiner, noun)

Nasty.Translation.WordOrder

Apply language-specific word order transformations.

alias Nasty.Translation.WordOrder

# Transform word order
ordered_phrase = WordOrder.apply_order(phrase, :es)

# Apply adjective position rules  
ordered_np = WordOrder.apply_adjective_order(noun_phrase, :es)

Nasty.AST.Renderer

Render AST back to natural language text.

alias Nasty.AST.Renderer

# Render document
{:ok, text} = Renderer.render_document(document)

# Render specific nodes
{:ok, text} = Renderer.render_sentence(sentence)
{:ok, text} = Renderer.render_phrase(phrase)

Error Handling

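The core API functions (parsing, rendering, translation, summarization, and code interoperability) return result tuples. Some helpers, such as Nasty.Language.Registry.registered?/1, return plain values instead. Result tuples take one of two forms: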

  • {:ok, result} on success
  • {:error, reason} on failure

Common error reasons:

  • :language_required - Language not specified
  • :language_not_found - Language not registered
  • :language_not_registered - Language code not in registry
  • :no_languages_registered - No languages available
  • :no_match - Language detection failed
  • :invalid_text - Invalid input text
  • :parse_error - Failed to parse text
  • :source_language_required - Source language not specified
  • :target_language_required - Target language not specified
  • :unsupported_language_pair - Language pair not supported
  • :summarization_not_supported - Summarization not available for language
  • :invalid_input - Invalid input type
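
Because the core functions share the {:ok, result} / {:error, reason} shape, they compose well with Elixir's `with`. The sketch below chains Nasty.parse/2 and Nasty.render/2 and turns a few of the reasons listed above into readable messages. NastyErrorSketch, round_trip/2, and describe/1 are hypothetical names for illustration, and the message texts are made up here.

```elixir
defmodule NastyErrorSketch do
  # Hypothetical helper module, not part of Nasty.

  # Parse and render in one pipeline. `with` stops at the first result that
  # does not match {:ok, _} and returns that {:error, reason} tuple unchanged.
  # Requires the Nasty library at runtime.
  def round_trip(text, language) do
    with {:ok, ast} <- Nasty.parse(text, language: language),
         {:ok, rendered} <- Nasty.render(ast) do
      {:ok, rendered}
    end
  end

  # Map some documented error reasons to messages (wording is illustrative).
  def describe(:language_required), do: "pass the :language option"
  def describe(:language_not_found), do: "language is not registered"
  def describe(:parse_error), do: "the text could not be parsed"
  def describe(other), do: "unexpected error: " <> inspect(other)
end

IO.puts(NastyErrorSketch.describe(:language_required))
IO.puts(NastyErrorSketch.describe(:parse_error))
```

The catch-all clause uses inspect/1 so that reasons which are not atoms, such as tuples or maps, can still be reported without raising.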

See Also