Nasty.Language.Behaviour behaviour (Nasty v0.3.0)

Behaviour that all natural language implementations must implement.

It provides a language-agnostic interface for tokenizing, tagging, parsing, and rendering natural language text. Each language (English, Spanish, Catalan, etc.) implements this behaviour with its own rules and processing.

Example Implementation

defmodule Nasty.Language.English do
  @behaviour Nasty.Language.Behaviour

  @impl true
  def language_code, do: :en

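  @impl true
  def tokenize(_text, _opts) do
    # English-specific tokenization goes here; this placeholder returns no tokens
    {:ok, []}
  end

  @impl true
  def tag_pos(tokens, _opts) do
    # English-specific POS tagging goes here; this placeholder returns the tokens unchanged
    {:ok, tokens}
  end

  @impl true
  def parse(_tokens, _opts) do
    # English-specific parsing goes here; should return {:ok, document_ast}
    {:error, :not_implemented}
  end

  @impl true
  def render(_ast, _opts) do
    # English-specific text generation goes here; should return {:ok, text}
    {:error, :not_implemented}
  end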
end

Summary

Types

  • options() - Options passed to language processing functions.
  • parse_result() - Parse result containing the AST and optional metadata.
  • render_result() - Render result.
  • tokenize_result() - Tokenization result.

Callbacks

  • language_code() - Returns the ISO 639-1 language code for this implementation.
  • metadata() - Returns metadata about the language implementation.
  • parse(tokens, opts) - Parses tokens into a complete AST (Document structure).
  • render(ast, opts) - Renders an AST back to natural language text.
  • tag_pos(tokens, opts) - Tags tokens with part-of-speech information.
  • tokenize(text, opts) - Tokenizes text into a list of tokens.

Functions

  • validate_implementation!(module) - Validates that a module implements the Language.Behaviour correctly.

Types

options()

@type options() :: keyword()

Options passed to language processing functions.

Common options:

  • :generate_embeddings - Generate semantic embeddings (default: false)
  • :parse_dependencies - Extract dependency relations (default: true)
  • :extract_entities - Perform named entity recognition (default: false)
  • :resolve_coreferences - Resolve coreferences (default: false)
  • Custom language-specific options
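
One way an implementation might apply the defaults listed above is to merge the caller's options over a default keyword list. This is only a sketch: the module OptionsSketch, its resolve/1 helper, and the :dialect option are hypothetical and not part of Nasty.

```elixir
defmodule OptionsSketch do
  # Hypothetical helper; the default values are copied from the list above.
  @defaults [
    generate_embeddings: false,
    parse_dependencies: true,
    extract_entities: false,
    resolve_coreferences: false
  ]

  # Caller-supplied values override the defaults. Keys that are not in the
  # default list (custom language-specific options) are kept as they are.
  def resolve(opts) when is_list(opts) do
    Keyword.merge(@defaults, opts)
  end
end

# :dialect is a made-up language-specific option used for illustration.
opts = OptionsSketch.resolve(extract_entities: true, dialect: :en_gb)
IO.inspect(Keyword.fetch!(opts, :extract_entities))
IO.inspect(Keyword.fetch!(opts, :parse_dependencies))
```

Merging once at the entry point means the rest of the implementation can use Keyword.fetch!/2 without repeating default values.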

parse_result()

@type parse_result() :: {:ok, Nasty.AST.Document.t()} | {:error, term()}

Parse result containing the AST and optional metadata.

render_result()

@type render_result() :: {:ok, String.t()} | {:error, term()}

Render result. Either {:ok, text} with the rendered string, or {:error, reason}.

tokenize_result()

@type tokenize_result() :: {:ok, [Nasty.AST.Token.t()]} | {:error, term()}

Tokenization result. Either {:ok, tokens} with a list of Token structs, or {:error, reason}.

Callbacks

language_code()

@callback language_code() :: atom()

Returns the ISO 639-1 language code for this implementation.

Examples

iex> Nasty.Language.English.language_code()
:en

iex> Nasty.Language.Spanish.language_code()
:es
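
Because every implementation reports its own code, callers can select an implementation by language code. The sketch below assumes a hand-built index; RegistrySketch, ToyEnglish, and ToySpanish are hypothetical stand-ins, and nothing here describes how Nasty itself looks up languages.

```elixir
# Toy stand-ins that expose only language_code/0.
defmodule ToyEnglish do
  def language_code, do: :en
end

defmodule ToySpanish do
  def language_code, do: :es
end

defmodule RegistrySketch do
  # Builds a %{code => module} index by asking each module for its code.
  def index(modules) do
    Map.new(modules, fn mod -> {mod.language_code(), mod} end)
  end

  # Returns {:ok, module} or :error, following Map.fetch/2.
  def fetch(index, code), do: Map.fetch(index, code)
end

index = RegistrySketch.index([ToyEnglish, ToySpanish])
IO.inspect(RegistrySketch.fetch(index, :es))
IO.inspect(RegistrySketch.fetch(index, :ca))
```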

metadata()

(optional)
@callback metadata() :: map()

Returns metadata about the language implementation.

Optional callback providing information about the implementation:

  • Version
  • Supported features
  • Performance characteristics
  • Dependencies

Examples

iex> Nasty.Language.English.metadata()
%{
  version: "1.0.0",
  features: [:tokenization, :pos_tagging, :parsing, :ner],
  parser_type: :nimble_parsec
}
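
Since metadata/0 is optional, a caller should check that it exists before invoking it. A minimal sketch, assuming the caller wants an empty map as the fallback; MetadataSketch and the two toy modules are hypothetical.

```elixir
# Toy modules: one defines the optional callback, one does not.
defmodule WithMetadata do
  def metadata, do: %{version: "1.0.0", features: [:tokenization]}
end

defmodule WithoutMetadata do
end

defmodule MetadataSketch do
  # Returns the module's metadata, or an empty map when the module cannot
  # be loaded or does not define metadata/0.
  def fetch(module) do
    # function_exported?/3 only sees loaded modules, so load it first.
    with {:module, ^module} <- Code.ensure_loaded(module),
         true <- function_exported?(module, :metadata, 0) do
      module.metadata()
    else
      _ -> %{}
    end
  end
end

IO.inspect(MetadataSketch.fetch(WithMetadata))
IO.inspect(MetadataSketch.fetch(WithoutMetadata))
```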

parse(tokens, opts)

@callback parse(tokens :: [Nasty.AST.Token.t()], opts :: options()) :: parse_result()

Parses tokens into a complete AST (Document structure).

Parsing includes:

  • Phrase structure building (NP, VP, PP, etc.)
  • Clause and sentence identification
  • Dependency relation extraction (if enabled)
  • Semantic analysis (if enabled)

Parameters

  • tokens - POS-tagged tokens
  • opts - Parsing options
    • :parse_dependencies - Extract dependency relations (default: true)
    • :extract_entities - Perform NER (default: false)
    • :resolve_coreferences - Resolve coreferences (default: false)

Returns

  • {:ok, document} - Complete Document AST
  • {:error, reason} - Parse error with details

Examples

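iex> {:ok, tokens} = Nasty.Language.English.tokenize("The cat sat on the mat.", [])
iex> {:ok, tagged} = Nasty.Language.English.tag_pos(tokens, [])
iex> Nasty.Language.English.parse(tagged, parse_dependencies: true)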
{:ok, %Document{paragraphs: [...], ...}}
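
Since parse expects POS-tagged tokens, the callbacks are naturally chained as tokenize, then tag_pos, then parse. The sketch below shows one way to do that with `with`, so that the first {:error, reason} stops the chain. ToyLanguage is a hypothetical stand-in that uses plain maps rather than Nasty.AST structs, and PipelineSketch.analyze/3 is not a Nasty function.

```elixir
defmodule ToyLanguage do
  # Stand-in with the same callback shapes as the behaviour. Tokens are
  # plain maps here, not Nasty.AST.Token structs.
  def tokenize("", _opts), do: {:error, :empty_input}

  def tokenize(text, _opts) do
    tokens =
      text
      |> String.split()
      |> Enum.map(fn word -> %{text: word, pos_tag: nil} end)

    {:ok, tokens}
  end

  # Placeholder tagger: marks every token with :x instead of a real tag.
  def tag_pos(tokens, _opts) do
    {:ok, Enum.map(tokens, fn token -> %{token | pos_tag: :x} end)}
  end

  # Placeholder parser: wraps the tagged tokens in a map.
  def parse(tokens, _opts), do: {:ok, %{tokens: tokens}}
end

defmodule PipelineSketch do
  # Runs tokenize -> tag_pos -> parse on any module with those callbacks.
  # A non-matching result such as {:error, reason} is returned unchanged.
  def analyze(language, text, opts \\ []) do
    with {:ok, tokens} <- language.tokenize(text, opts),
         {:ok, tagged} <- language.tag_pos(tokens, opts),
         {:ok, document} <- language.parse(tagged, opts) do
      {:ok, document}
    end
  end
end

IO.inspect(PipelineSketch.analyze(ToyLanguage, "the cat sat"))
IO.inspect(PipelineSketch.analyze(ToyLanguage, ""))
```

Passing the language module as an argument keeps the pipeline language-agnostic, which is the point of the behaviour.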

render(ast, opts)

@callback render(ast :: struct(), opts :: options()) :: render_result()

Renders an AST back to natural language text.

Rendering includes:

  • Surface realization (choosing word forms)
  • Agreement (subject-verb, determiner-noun, etc.)
  • Word order (language-specific ordering rules)
  • Punctuation insertion
  • Formatting (capitalization, spacing)

Parameters

  • ast - AST node to render (Document, Sentence, Phrase, etc.)
  • opts - Rendering options

Returns

  • {:ok, text} - Rendered natural language text
  • {:error, reason} - Rendering error

Examples

iex> doc = %Document{...}
iex> Nasty.Language.English.render(doc, [])
{:ok, "The cat sat on the mat."}
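
The punctuation and formatting steps can be illustrated on their own. This sketch assumes the input is already a flat list of word strings in final order, so it covers only spacing, punctuation attachment, and capitalization; it does not handle agreement or word order, and it does not take a Nasty AST. RenderSketch is a hypothetical module.

```elixir
defmodule RenderSketch do
  # Marks that attach to the preceding word without a space.
  @punctuation [".", ",", "!", "?", ";", ":"]

  # Joins word strings into a sentence and capitalizes the first letter.
  def render(words, _opts \\ []) when is_list(words) do
    text =
      words
      |> Enum.reduce("", fn
        word, "" -> word
        word, acc when word in @punctuation -> acc <> word
        word, acc -> acc <> " " <> word
      end)
      |> capitalize_first()

    {:ok, text}
  end

  defp capitalize_first(""), do: ""

  defp capitalize_first(text) do
    # Upcase only the first grapheme; leave the rest untouched.
    {first, rest} = String.split_at(text, 1)
    String.upcase(first) <> rest
  end
end

IO.inspect(RenderSketch.render(["the", "cat", "sat", "on", "the", "mat", "."]))
```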

tag_pos(tokens, opts)

@callback tag_pos(tokens :: [Nasty.AST.Token.t()], opts :: options()) :: tokenize_result()

Tags tokens with part-of-speech information.

POS tagging assigns Universal Dependencies tags to each token and extracts morphological features. The return type reuses tokenize_result() because the output is still a list of tokens.

Parameters

  • tokens - List of tokens from tokenization
  • opts - Tagging options

Returns

  • {:ok, tagged_tokens} - Tokens with pos_tag and morphology filled
  • {:error, reason} - Error during tagging

Examples

iex> tokens = [%Token{text: "cat", ...}]
iex> Nasty.Language.English.tag_pos(tokens, [])
{:ok, [%Token{text: "cat", pos_tag: :noun, ...}]}

tokenize(text, opts)

@callback tokenize(text :: String.t(), opts :: options()) :: tokenize_result()

Tokenizes text into a list of tokens.

Tokenization includes:

  • Sentence boundary detection
  • Word segmentation
  • Handling of contractions, hyphenation, compounds
  • Position tracking for each token

Parameters

  • text - Raw text to tokenize
  • opts - Tokenization options

Returns

  • {:ok, tokens} - List of Token structs with position information
  • {:error, reason} - Error during tokenization

Examples

iex> Nasty.Language.English.tokenize("Hello world.", [])
{:ok, [
  %Token{text: "Hello", ...},
  %Token{text: "world", ...},
  %Token{text: ".", ...}
]}
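
Word segmentation with position tracking can be sketched with a regular expression. The example below is an assumption-laden illustration, not the Nasty tokenizer: TokenizerSketch is hypothetical, tokens are plain maps rather than Token structs, positions are byte offsets, and sentence boundary detection is left out.

```elixir
defmodule TokenizerSketch do
  # Splits text into word and punctuation tokens, recording where each
  # token starts and how long it is.
  def tokenize(text, _opts \\ []) when is_binary(text) do
    # A token is either a run of letters, digits, apostrophes, and hyphens
    # (so "don't" and "well-known" stay whole) or one other non-space character.
    pattern = ~r/[\p{L}\p{N}'-]+|[^\s\p{L}\p{N}]/u

    tokens =
      pattern
      |> Regex.scan(text, return: :index)
      |> Enum.map(fn [{start, length}] ->
        # Regex.scan/3 with return: :index reports byte offsets.
        %{text: binary_part(text, start, length), start: start, length: length}
      end)

    {:ok, tokens}
  end
end

IO.inspect(TokenizerSketch.tokenize("Hello world."))
```

Keeping the offsets lets later stages map tokens back to the original text, for example to highlight an entity in its source sentence.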

Functions

validate_implementation!(module)

@spec validate_implementation!(module()) :: :ok | no_return()

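Validates that a module implements Nasty.Language.Behaviour correctly. Returns :ok when validation passes; the no_return() in the spec and the trailing ! indicate that it raises otherwise.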

Examples

iex> Nasty.Language.Behaviour.validate_implementation!(Nasty.Language.English)
:ok
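
The source does not show how validate_implementation!/1 performs its checks. One plausible approach, sketched here under that assumption, is to compare a behaviour's required callbacks against the functions a module exports. ToyBehaviour, GoodLanguage, and ValidatorSketch are hypothetical, and the choice of ArgumentError is illustrative only.

```elixir
# A small local behaviour so the example does not depend on Nasty.
defmodule ToyBehaviour do
  @callback language_code() :: atom()
  @callback metadata() :: map()
  @optional_callbacks metadata: 0
end

defmodule GoodLanguage do
  @behaviour ToyBehaviour

  @impl true
  def language_code, do: :en
end

defmodule ValidatorSketch do
  # Returns :ok, or raises ArgumentError listing every required callback
  # that the module does not export.
  def validate!(module, behaviour) do
    # behaviour_info/1 is generated for any module that defines callbacks.
    required =
      behaviour.behaviour_info(:callbacks) --
        behaviour.behaviour_info(:optional_callbacks)

    missing =
      case Code.ensure_loaded(module) do
        {:module, _} ->
          Enum.reject(required, fn {name, arity} ->
            function_exported?(module, name, arity)
          end)

        {:error, _} ->
          required
      end

    if missing == [] do
      :ok
    else
      raise ArgumentError,
            "#{inspect(module)} is missing callbacks: #{inspect(missing)}"
    end
  end
end

IO.inspect(ValidatorSketch.validate!(GoodLanguage, ToyBehaviour))
```

A check like this confirms only that the functions exist with the right arity; it cannot confirm that they return the documented result types.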