Nasty.Language.Behaviour behaviour (Nasty v0.3.0)

Behaviour that every natural language module must implement.

This provides a language-agnostic interface for parsing, tagging, and rendering natural language text. Each language (English, Spanish, Catalan, etc.) implements this behaviour with language-specific rules and processing.

Example Implementation

defmodule Nasty.Language.English do
  @behaviour Nasty.Language.Behaviour

  @impl true
  def language_code, do: :en

  @impl true
  def tokenize(text, _opts) do
    # English-specific tokenization
    {:ok, tokens}
  end

  @impl true
  def tag_pos(tokens, _opts) do
    # English-specific POS tagging
    {:ok, tagged_tokens}
  end

  @impl true
  def parse(tokens, _opts) do
    # English-specific parsing
    {:ok, document_ast}
  end

  @impl true
  def render(ast, _opts) do
    # English-specific text generation
    {:ok, text}
  end
end
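The skeleton above leaves the function bodies as placeholders. As a rough, runnable variant, the sketch below assumes a stand-in behaviour (Sketch.LanguageBehaviour, hypothetical, so the code compiles without the Nasty package) and uses plain maps instead of Nasty.AST.Token and Nasty.AST.Document structs. It only shows the shape of the contract, not how any real language module works.

```elixir
defmodule Sketch.LanguageBehaviour do
  # Hypothetical stand-in for Nasty.Language.Behaviour.
  # Tokens and documents are plain maps here, not Nasty.AST structs.
  @callback language_code() :: atom()
  @callback tokenize(String.t(), keyword()) :: {:ok, [map()]} | {:error, term()}
  @callback tag_pos([map()], keyword()) :: {:ok, [map()]} | {:error, term()}
  @callback parse([map()], keyword()) :: {:ok, map()} | {:error, term()}
  @callback render(map(), keyword()) :: {:ok, String.t()} | {:error, term()}
  @callback metadata() :: map()
  @optional_callbacks metadata: 0
end

defmodule Sketch.Toy do
  @behaviour Sketch.LanguageBehaviour

  @impl true
  def language_code, do: :en

  @impl true
  def tokenize(text, _opts) when is_binary(text) do
    # Whitespace splitting only; a real tokenizer does far more.
    {:ok, text |> String.split() |> Enum.map(&%{text: &1, pos_tag: nil})}
  end

  def tokenize(_other, _opts), do: {:error, :invalid_text}

  @impl true
  def tag_pos(tokens, _opts) do
    # Placeholder tagger: marks every token as :x (unknown).
    {:ok, Enum.map(tokens, &Map.put(&1, :pos_tag, :x))}
  end

  @impl true
  def parse(tokens, _opts), do: {:ok, %{tokens: tokens}}

  @impl true
  def render(%{tokens: tokens}, _opts) do
    {:ok, Enum.map_join(tokens, " ", & &1.text)}
  end
end
```

Because metadata/0 is declared optional in the stand-in, Sketch.Toy compiles without defining it.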

Summary

Types

options()

Options passed to language processing functions.

parse_result()

Parse result: {:ok, document} with the Document AST, or {:error, reason}.

render_result()

Render result: {:ok, text} on success, {:error, reason} on failure.

tokenize_result()

Tokenization result: {:ok, tokens} on success, {:error, reason} on failure.

Callbacks

language_code()

Returns the ISO 639-1 language code for this implementation.

metadata()

Returns metadata about the language implementation.

parse(tokens, opts)

Parses tokens into a complete AST (Document structure).

render(ast, opts)

Renders an AST back to natural language text.

tag_pos(tokens, opts)

Tags tokens with part-of-speech information.

tokenize(text, opts)

Tokenizes text into a list of tokens.

Functions

validate_implementation!(module)

Validates that a module implements Nasty.Language.Behaviour correctly. Returns :ok on success and raises otherwise.

Types

options()

@type options() :: keyword()

Options passed to language processing functions.

Common options:

:generate_embeddings - Generate semantic embeddings (default: false)
:parse_dependencies - Extract dependency relations (default: true)
:extract_entities - Perform named entity recognition (default: false)
:resolve_coreferences - Resolve coreferences (default: false)
Custom language-specific options
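One way an implementation might apply the defaults listed above is to merge them with the caller's keyword list. This is a minimal sketch, not Nasty's own code: OptionsSketch and the :dialect key are hypothetical names used only for illustration.

```elixir
defmodule OptionsSketch do
  # Defaults as listed in the documentation above.
  @defaults [
    generate_embeddings: false,
    parse_dependencies: true,
    extract_entities: false,
    resolve_coreferences: false
  ]

  # Caller-supplied values win; unknown (language-specific) keys pass through.
  def with_defaults(opts) when is_list(opts), do: Keyword.merge(@defaults, opts)
end

# :dialect is a made-up language-specific option.
opts = OptionsSketch.with_defaults(extract_entities: true, dialect: :us)

Keyword.fetch!(opts, :parse_dependencies) # true (default kept)
Keyword.fetch!(opts, :extract_entities)   # true (overridden by the caller)
```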

parse_result()

@type parse_result() :: {:ok, Nasty.AST.Document.t()} | {:error, term()}

Parse result: {:ok, document} with the Document AST, or {:error, reason}.

render_result()

@type render_result() :: {:ok, String.t()} | {:error, term()}

Render result: {:ok, text} on success, {:error, reason} on failure.

tokenize_result()

@type tokenize_result() :: {:ok, [Nasty.AST.Token.t()]} | {:error, term()}

Tokenization result: {:ok, tokens} on success, {:error, reason} on failure.
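Since every result type above is an {:ok, value} | {:error, reason} tuple, the processing callbacks can be chained with `with`, which stops at the first error. The sketch below assumes a hypothetical PipelineSketch helper and an EchoLang stand-in module with plain-map tokens; neither is part of Nasty.

```elixir
defmodule PipelineSketch do
  # `lang` is any module exposing tokenize/2, tag_pos/2, parse/2 and render/2.
  def round_trip(lang, text, opts \\ []) do
    with {:ok, tokens} <- lang.tokenize(text, opts),
         {:ok, tagged} <- lang.tag_pos(tokens, opts),
         {:ok, document} <- lang.parse(tagged, opts) do
      lang.render(document, opts)
    end
  end
end

defmodule EchoLang do
  # Hypothetical stand-in language; tokens are plain maps, not Nasty.AST.Token structs.
  def tokenize("", _opts), do: {:error, :empty_text}
  def tokenize(text, _opts), do: {:ok, Enum.map(String.split(text), &%{text: &1})}
  def tag_pos(tokens, _opts), do: {:ok, Enum.map(tokens, &Map.put(&1, :pos_tag, :x))}
  def parse(tokens, _opts), do: {:ok, %{tokens: tokens}}
  def render(%{tokens: tokens}, _opts), do: {:ok, Enum.map_join(tokens, " ", & &1.text)}
end

PipelineSketch.round_trip(EchoLang, "Hello world") # {:ok, "Hello world"}
PipelineSketch.round_trip(EchoLang, "")            # {:error, :empty_text}
```

A failing step short-circuits the chain, so the caller sees the first {:error, reason} unchanged.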

Callbacks

language_code()

@callback language_code() :: atom()

Returns the ISO 639-1 language code for this implementation.

Examples

iex> Nasty.Language.English.language_code()
:en

iex> Nasty.Language.Spanish.language_code()
:es
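Because language_code/0 identifies each implementation, it can serve as a lookup key. The following is a sketch under assumptions: RegistrySketch, FakeEnglish and FakeSpanish are hypothetical names, and nothing here reflects how Nasty itself registers languages.

```elixir
defmodule RegistrySketch do
  # Builds a %{code => module} lookup from modules exposing language_code/0.
  def build(modules), do: Map.new(modules, fn mod -> {mod.language_code(), mod} end)

  def fetch(registry, code) do
    case Map.fetch(registry, code) do
      {:ok, mod} -> {:ok, mod}
      :error -> {:error, {:unsupported_language, code}}
    end
  end
end

# Hypothetical stand-ins for real language modules.
defmodule FakeEnglish do
  def language_code, do: :en
end

defmodule FakeSpanish do
  def language_code, do: :es
end

registry = RegistrySketch.build([FakeEnglish, FakeSpanish])
RegistrySketch.fetch(registry, :es) # {:ok, FakeSpanish}
RegistrySketch.fetch(registry, :ca) # {:error, {:unsupported_language, :ca}}
```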

metadata()

(optional)

@callback metadata() :: map()

Returns metadata about the language implementation.

Optional callback providing information about the implementation:

Version
Supported features
Performance characteristics
Dependencies

Examples

iex> Nasty.Language.English.metadata()
%{
  version: "1.0.0",
  features: [:tokenization, :pos_tagging, :parsing, :ner],
  parser_type: :nimble_parsec
}
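Since metadata/0 is optional, callers cannot assume it exists. A minimal way to guard the call uses Code.ensure_loaded?/1 and function_exported?/3; MetadataSketch and the two stand-in modules below are hypothetical.

```elixir
defmodule MetadataSketch do
  # Returns the module's metadata, or an empty map when metadata/0 is not defined.
  def metadata_or_default(module) do
    if Code.ensure_loaded?(module) and function_exported?(module, :metadata, 0) do
      module.metadata()
    else
      %{}
    end
  end
end

# Hypothetical stand-ins: one defines the optional callback, one does not.
defmodule LangWithMetadata do
  def metadata, do: %{version: "0.0.1", features: [:tokenization]}
end

defmodule LangWithoutMetadata do
  def language_code, do: :xx
end

MetadataSketch.metadata_or_default(LangWithMetadata)
MetadataSketch.metadata_or_default(LangWithoutMetadata) # %{}
```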

parse(tokens, opts)

@callback parse(tokens :: [Nasty.AST.Token.t()], opts :: options()) :: parse_result()

Parses tokens into a complete AST (Document structure).

Parsing includes:

Phrase structure building (NP, VP, PP, etc.)
Clause and sentence identification
Dependency relation extraction (if enabled)
Semantic analysis (if enabled)

Parameters

tokens - POS-tagged tokens
opts - Parsing options
- :parse_dependencies - Extract dependency relations (default: true)
- :extract_entities - Perform NER (default: false)
- :resolve_coreferences - Resolve coreferences (default: false)

Returns

{:ok, document} - Complete Document AST
{:error, reason} - Parse error with details

Examples

iex> tokens = [tagged_tokens...]
iex> Nasty.Language.English.parse(tokens, parse_dependencies: true)
{:ok, %Document{paragraphs: [...], ...}}

render(ast, opts)

@callback render(ast :: struct(), opts :: options()) :: render_result()

Renders an AST back to natural language text.

Rendering includes:

Surface realization (choosing word forms)
Agreement (subject-verb, determiner-noun, etc.)
Word order (language-specific ordering rules)
Punctuation insertion
Formatting (capitalization, spacing)

Parameters

ast - AST node to render (Document, Sentence, Phrase, etc.)
opts - Rendering options

Returns

{:ok, text} - Rendered natural language text
{:error, reason} - Rendering error

Examples

iex> doc = %Document{...}
iex> Nasty.Language.English.render(doc, [])
{:ok, "The cat sat on the mat."}
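To make the punctuation and formatting steps concrete, here is a deliberately small sketch that works on a flat list of word strings rather than a real AST. RenderSketch is a hypothetical module; agreement, word order and surface realization are out of scope for it.

```elixir
defmodule RenderSketch do
  # Punctuation marks that attach to the preceding word without a space.
  @no_space_before [".", ",", "!", "?", ";", ":"]

  def render(words, _opts \\ []) when is_list(words) do
    text =
      words
      |> Enum.reduce("", fn
        word, "" -> word
        word, acc when word in @no_space_before -> acc <> word
        word, acc -> acc <> " " <> word
      end)
      |> capitalize_first()

    {:ok, text}
  end

  # Upcases only the first character and leaves the rest untouched.
  defp capitalize_first(""), do: ""

  defp capitalize_first(text) do
    {first, rest} = String.split_at(text, 1)
    String.upcase(first) <> rest
  end
end

RenderSketch.render(["the", "cat", "sat", "on", "the", "mat", "."])
# {:ok, "The cat sat on the mat."}
```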

tag_pos(tokens, opts)

@callback tag_pos(tokens :: [Nasty.AST.Token.t()], opts :: options()) :: tokenize_result()

Tags tokens with part-of-speech information.

POS tagging assigns Universal Dependencies tags to each token and extracts morphological features.

Parameters

tokens - List of tokens from tokenization
opts - Tagging options

Returns

{:ok, tagged_tokens} - Tokens with pos_tag and morphology filled in
{:error, reason} - Error during tagging

Examples

iex> tokens = [%Token{text: "cat", ...}]
iex> Nasty.Language.English.tag_pos(tokens, [])
{:ok, [%Token{text: "cat", pos_tag: :noun, ...}]}
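As an illustration of the tagging contract only, the sketch below assigns tags from a tiny hand-written lexicon and falls back to :x for unknown words. TaggerSketch and its lexicon are hypothetical, tokens are plain maps rather than Token structs, and morphological features are not modelled.

```elixir
defmodule TaggerSketch do
  # Toy lexicon; tag names mirror Universal Dependencies categories as lowercase atoms.
  @lexicon %{
    "the" => :det,
    "cat" => :noun,
    "sat" => :verb,
    "on" => :adp,
    "mat" => :noun
  }

  def tag_pos(tokens, _opts \\ []) do
    tagged =
      Enum.map(tokens, fn %{text: text} = token ->
        # Case-insensitive lookup; unknown words get :x.
        tag = Map.get(@lexicon, String.downcase(text), :x)
        Map.put(token, :pos_tag, tag)
      end)

    {:ok, tagged}
  end
end

TaggerSketch.tag_pos([%{text: "The"}, %{text: "cat"}])
# {:ok, [%{text: "The", pos_tag: :det}, %{text: "cat", pos_tag: :noun}]}
```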

tokenize(text, opts)

@callback tokenize(text :: String.t(), opts :: options()) :: tokenize_result()

Tokenizes text into a list of tokens.

Tokenization includes:

Sentence boundary detection
Word segmentation
Handling of contractions, hyphenation, compounds
Position tracking for each token

Parameters

text - Raw text to tokenize
opts - Tokenization options

Returns

{:ok, tokens} - List of Token structs with position information
{:error, reason} - Error during tokenization

Examples

iex> Nasty.Language.English.tokenize("Hello world.", [])
{:ok, [
  %Token{text: "Hello", ...},
  %Token{text: "world", ...},
  %Token{text: ".", ...}
]}
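Position tracking can be sketched with Regex.scan/3 and its return: :index option, which yields byte offsets for each match. TokenizerSketch and the byte_offset/byte_length keys are hypothetical, tokens are plain maps, and this toy pattern does not handle contractions, hyphenation or sentence boundaries.

```elixir
defmodule TokenizerSketch do
  def tokenize(text, _opts \\ []) when is_binary(text) do
    # A run of word characters, or one non-space, non-word character (punctuation).
    pattern = ~r/\w+|[^\w\s]/u

    tokens =
      pattern
      |> Regex.scan(text, return: :index)
      |> Enum.map(fn [{start, len}] ->
        # Offsets are in bytes, so binary_part/3 recovers the matched text.
        %{text: binary_part(text, start, len), byte_offset: start, byte_length: len}
      end)

    {:ok, tokens}
  end
end

{:ok, tokens} = TokenizerSketch.tokenize("Hello world.")
Enum.map(tokens, & &1.text) # ["Hello", "world", "."]
```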

Functions

validate_implementation!(module)

@spec validate_implementation!(module()) :: :ok | no_return()

Validates that a module implements Nasty.Language.Behaviour correctly. Returns :ok on success and raises otherwise.

Examples

iex> Nasty.Language.Behaviour.validate_implementation!(Nasty.Language.English)
:ok
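The documentation does not say what the validation checks. One plausible approach, shown purely as an assumption, is to confirm that every required callback is exported with the right arity. ValidateSketch, the choice of ArgumentError and the GoodLang stand-in are all hypothetical and not taken from Nasty's source.

```elixir
defmodule ValidateSketch do
  # Required callbacks from the behaviour above; metadata/0 is optional and skipped.
  @required [language_code: 0, tokenize: 2, tag_pos: 2, parse: 2, render: 2]

  def validate!(module) do
    if not Code.ensure_loaded?(module) do
      raise ArgumentError, "module #{inspect(module)} is not available"
    end

    missing =
      Enum.reject(@required, fn {name, arity} ->
        function_exported?(module, name, arity)
      end)

    if missing == [] do
      :ok
    else
      raise ArgumentError, "#{inspect(module)} is missing callbacks: #{inspect(missing)}"
    end
  end
end

# Hypothetical stand-in that exports every required function.
defmodule GoodLang do
  def language_code, do: :xx
  def tokenize(_text, _opts), do: {:ok, []}
  def tag_pos(tokens, _opts), do: {:ok, tokens}
  def parse(_tokens, _opts), do: {:ok, %{}}
  def render(_ast, _opts), do: {:ok, ""}
end

ValidateSketch.validate!(GoodLang) # :ok
```

A check like this only verifies names and arities; it cannot confirm that the functions return the documented result tuples.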