Nasty (Nasty v0.3.0)

Nasty - Natural Abstract Syntax Treey

A language-agnostic NLP library for Elixir that treats natural language with the same rigor as programming languages.

Overview

Nasty provides a comprehensive Abstract Syntax Tree (AST) representation for natural languages, enabling:

  • Grammar-First Parsing: Parse text into formal linguistic structures
  • Multi-Language Support: Language-agnostic architecture (English first)
  • Bidirectional Code Conversion: Natural Language ↔ Programming Language AST
  • NLP Operations: Summarization, question answering, classification
  • Pure Elixir: Zero external NLP dependencies

Architecture

Nasty uses a layered, behaviour-based architecture:

Text → Tokenization → POS Tagging → Parsing → AST
                                                ↓
                              Semantic Analysis → Enhanced AST
                                                ↓
                        NLP Operations / Code Interop

Each natural language implements the Nasty.Language.Behaviour behaviour, providing language-specific tokenization, tagging, parsing, and rendering.
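To make the layered, behaviour-based design concrete, here is a minimal sketch of how a per-language module could plug into a shared pipeline. The modules Sketch.Language, Sketch.ToyEnglish, and Sketch.Pipeline, along with every callback name and type, are hypothetical stand-ins chosen for illustration; the actual callbacks of Nasty.Language.Behaviour are not listed on this page and may differ.

```elixir
defmodule Sketch.Language do
  # Hypothetical behaviour: callback names and types are illustrative
  # assumptions, not the real Nasty.Language.Behaviour contract.
  @callback tokenize(String.t()) :: {:ok, [String.t()]} | {:error, term()}
  @callback tag([String.t()]) :: {:ok, [{String.t(), atom()}]} | {:error, term()}
  @callback parse([{String.t(), atom()}]) :: {:ok, map()} | {:error, term()}
  @callback render(map()) :: {:ok, String.t()} | {:error, term()}
end

defmodule Sketch.ToyEnglish do
  # Toy implementation: a real tagger and parser would be far richer.
  @behaviour Sketch.Language

  @determiners ~w(the a an)

  @impl true
  def tokenize(text) when is_binary(text) do
    # Split into word tokens and single punctuation tokens.
    {:ok, Regex.scan(~r/\w+|[^\w\s]/u, text) |> List.flatten()}
  end

  @impl true
  def tag(tokens) do
    tagged =
      Enum.map(tokens, fn token ->
        cond do
          String.downcase(token) in @determiners -> {token, :det}
          String.match?(token, ~r/^[^\w\s]$/u) -> {token, :punct}
          true -> {token, :word}
        end
      end)

    {:ok, tagged}
  end

  @impl true
  def parse(tagged), do: {:ok, %{language: :en, tokens: tagged}}

  @impl true
  def render(%{tokens: tagged}) do
    # Join words with spaces; attach punctuation to the preceding token.
    text =
      Enum.reduce(tagged, "", fn
        {token, :punct}, acc -> acc <> token
        {token, _tag}, "" -> token
        {token, _tag}, acc -> acc <> " " <> token
      end)

    {:ok, text}
  end
end

defmodule Sketch.Pipeline do
  # Runs the Text -> Tokenization -> POS Tagging -> Parsing stages,
  # stopping at the first {:error, reason}.
  def run(text, language_module) do
    with {:ok, tokens} <- language_module.tokenize(text),
         {:ok, tagged} <- language_module.tag(tokens),
         {:ok, ast} <- language_module.parse(tagged) do
      {:ok, ast}
    end
  end
end
```

With this shape, calling Sketch.Pipeline.run("The cat sat.", Sketch.ToyEnglish) returns an {:ok, ast} tuple, and supporting another language would only require a second module implementing the same callbacks.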

Usage

# Parse text to AST
{:ok, ast} = Nasty.parse("The cat sat on the mat.", language: :en)

# Query the AST
subject = Nasty.Query.find_subject(ast)

# Convert natural language to code
{:ok, code} = Nasty.to_code("Sort the list", 
  source_language: :en, 
  target_language: :elixir
)

# Summarize text
{:ok, summary} = Nasty.summarize(text,
  language: :en,
  max_sentences: 3
)

Implementation Status

🚧 Early development - see PLAN.md for roadmap.

Current focus:

  • Phase 0: Language abstraction layer with @behaviour
  • Phase 1: Universal AST schema and English implementation

Summary

Functions

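explain_code(code, opts \\ [])

Generates a natural language explanation from code.

hello()

Returns the version and implementation status.

parse(text, opts \\ [])

Parses natural language text into an AST.

render(ast, opts \\ [])

Renders an AST back to natural language text.

summarize(text_or_ast, opts \\ [])

Summarizes a document by extracting important sentences.

to_code(text, opts \\ [])

Converts natural language text to code.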

Functions

explain_code(code, opts \\ [])

@spec explain_code(
  String.t() | Macro.t(),
  keyword()
) :: {:ok, String.t()} | {:error, term()}

Generates a natural language explanation from code.

Parameters

  • code: Code string or AST to explain
  • opts: Keyword options
    • :source_language - Programming language (:elixir, etc.) (required)
    • :target_language - Target natural language (:en, etc.) (required)
    • :style - Explanation style: :concise or :verbose (default: :concise)

Examples

{:ok, explanation} = Nasty.explain_code("Enum.sort(list)",
  source_language: :elixir,
  target_language: :en
)
# => "Sort list"

Returns

  • {:ok, explanation_string} - Natural language explanation
  • {:error, reason} - Error

hello()

Returns the version and implementation status.

Examples

iex> Nasty.hello()
{:ok, "Nasty v0.1.0 - Early Development"}

parse(text, opts \\ [])

Parses natural language text into an AST.

Parameters

  • text: The text to parse
  • opts: Keyword options
    • :language - Language code (:en, :es, :ca, etc.) (required for now)
    • :tokenize - Enable tokenization (default: true)
    • :pos_tag - Enable POS tagging (default: true)
    • :parse_dependencies - Parse dependency relationships (default: true)
    • :extract_entities - Extract named entities (default: false)
    • :resolve_coreferences - Resolve coreferences (default: false)

Examples

{:ok, ast} = Nasty.parse("The cat sat.", language: :en)

Returns

  • {:ok, %Nasty.AST.Document{}} - Parsed AST
  • {:error, reason} - Parse error
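The option list for parse/2 combines one required key with several boolean flags that have defaults. A minimal sketch of how such options could be validated and filled in is shown here. Sketch.ParseOptions and its error tuples are hypothetical and are not part of Nasty; only the option names and default values are taken from the list above.

```elixir
defmodule Sketch.ParseOptions do
  # Defaults copied from the documented option list for parse/2.
  @defaults [
    tokenize: true,
    pos_tag: true,
    parse_dependencies: true,
    extract_entities: false,
    resolve_coreferences: false
  ]

  # Returns {:ok, opts} with defaults filled in. The error shapes
  # (:language_required, {:unknown_options, keys}) are assumptions
  # made for this sketch, not Nasty's actual error reasons.
  def normalize(opts) when is_list(opts) do
    known = [:language | Keyword.keys(@defaults)]

    case Keyword.split(opts, known) do
      {valid, []} ->
        case Keyword.fetch(valid, :language) do
          {:ok, language} when is_atom(language) and not is_nil(language) ->
            # Caller-supplied values override the defaults.
            {:ok, Keyword.merge(@defaults, valid)}

          _ ->
            {:error, :language_required}
        end

      {_valid, unknown} ->
        {:error, {:unknown_options, Keyword.keys(unknown)}}
    end
  end
end
```

Rejecting unknown keys is a design choice of the sketch: it surfaces typos such as langauge: :en early instead of silently ignoring them.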

render(ast, opts \\ [])

@spec render(struct(), keyword()) :: {:ok, String.t()} | {:error, term()}

Renders an AST back to natural language text.

The language is determined from the AST's language field.

Examples

{:ok, text} = Nasty.render(ast)

Returns

  • {:ok, text} - Rendered text
  • {:error, reason} - Render error

summarize(text_or_ast, opts \\ [])

@spec summarize(
  String.t() | struct(),
  keyword()
) :: {:ok, [struct()]} | {:error, term()}

Summarizes a document by extracting important sentences.

Parameters

  • text_or_ast: Text string or AST Document to summarize
  • opts: Keyword options
    • :language - Language code (:en, :es, :ca, etc.) (required if text)
    • :ratio - Compression ratio, 0.0 to 1.0 (default: 0.3)
    • :max_sentences - Maximum number of sentences in summary
    • :method - Selection method: :greedy or :mmr (default: :greedy)

Examples

{:ok, summary} = Nasty.summarize(text, language: :en, ratio: 0.3)

# Or with AST directly
{:ok, ast} = Nasty.parse(text, language: :en)
{:ok, summary} = Nasty.summarize(ast, max_sentences: 3)

Returns

  • {:ok, [%Sentence{}]} - List of extracted sentences
  • {:error, reason} - Error
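To illustrate what extracting important sentences under a :ratio and :max_sentences budget can look like, here is a toy greedy selector. It is a sketch under stated assumptions, not Nasty's algorithm: Sketch.Summarizer is hypothetical, it works on plain strings rather than sentence structs, it scores by summed word frequency, and the way the two options combine (ceiling of ratio times sentence count, capped by :max_sentences) is one plausible reading of the option descriptions.

```elixir
defmodule Sketch.Summarizer do
  # Toy extractive summarizer. The scoring rule and the budget formula
  # are assumptions of this sketch, not the library's documented behaviour.
  def summarize(sentences, opts \\ []) when is_list(sentences) do
    ratio = Keyword.get(opts, :ratio, 0.3)
    max_sentences = Keyword.get(opts, :max_sentences, length(sentences))

    # Sentence budget: ceil(count * ratio), at least 1, capped by :max_sentences.
    budget =
      (length(sentences) * ratio * 1.0)
      |> Float.ceil()
      |> trunc()
      |> max(1)
      |> min(max_sentences)

    # Word frequencies across the whole document.
    freqs =
      sentences
      |> Enum.flat_map(&words/1)
      |> Enum.frequencies()

    selected =
      sentences
      |> Enum.with_index()
      |> Enum.map(fn {sentence, index} ->
        score = sentence |> words() |> Enum.map(&Map.fetch!(freqs, &1)) |> Enum.sum()
        {sentence, index, score}
      end)
      # Greedy pick: highest score first, earlier sentence wins ties.
      |> Enum.sort_by(fn {_sentence, index, score} -> {-score, index} end)
      |> Enum.take(budget)
      # Restore document order for readability.
      |> Enum.sort_by(fn {_sentence, index, _score} -> index end)
      |> Enum.map(fn {sentence, _index, _score} -> sentence end)

    {:ok, selected}
  end

  # Lowercased word tokens, split on anything that is not a letter or digit.
  defp words(sentence) do
    sentence
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
  end
end
```

Summed frequency favors longer sentences, so a practical scorer would normalize by length or use a method such as MMR to reduce redundancy.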

to_code(text, opts \\ [])

@spec to_code(
  String.t(),
  keyword()
) :: {:ok, String.t()} | {:error, term()}

Converts natural language text to code.

Parameters

  • text: Natural language description of what the code should do
  • opts: Keyword options
    • :source_language - Source natural language (:en, etc.) (required)
    • :target_language - Target programming language (:elixir, etc.) (required)

Examples

{:ok, code} = Nasty.to_code("Sort the list", 
  source_language: :en, 
  target_language: :elixir
)
# => "Enum.sort(list)"

Returns

  • {:ok, code_string} - Generated code
  • {:error, reason} - Error