Changelog

View Source

[Released] - 0.3.0 - Add replace_content/3 operation

Added

  • replace_content/3 operation: Replace all elements matching a CSS selector with HTML content
    • Finds elements using CSS selector and replaces them with new HTML content
    • The matched elements are removed from the document (as siblings are inserted before, then target is deleted)
    • Matched elements are destroyed to free memory (ModestEx is stateless, no undo needed)
    • Returns count of replaced elements
    • C implementation for optimal performance
    • Erlang wrapper with comprehensive documentation

Technical Details

  • Uses lexbor's CSS selector engine for matching
  • Parses HTML fragments using lxb_html_element_inner_html_set()
  • Inserts new content before matched element using lxb_dom_node_insert_before()
  • Removes and destroys matched element using lxb_dom_node_remove() and lxb_dom_node_destroy_deep()
  • Elements without a parent (document root) are skipped
  • Replace with empty string effectively removes elements
  • O(n + m + k×m') complexity where n=DOM nodes, m=HTML length, k=matches, m'=parsed nodes

[Unreleased] - Add insert_after_content/3 operation

Added

  • insert_after_content/3 operation: Insert HTML content AFTER all elements matching a CSS selector
    • Inserts as SIBLINGS (like insert_before), positioned AFTER the matched element
    • Content is inserted in the parent's child list just after each matched element
    • Combines CSS selector matching, HTML parsing, and DOM manipulation in a single atomic operation
    • Returns count of processed elements
    • C implementation for optimal performance
    • Erlang wrapper with comprehensive documentation

Technical Details

  • Uses lexbor's CSS selector engine for matching
  • Parses HTML fragments using lxb_html_element_inner_html_set()
  • Uses lxb_dom_node_insert_after() with target element as reference
  • Important: Inserts nodes in REVERSE order to maintain correct document order
  • Elements without a parent (document root) are skipped
  • O(n + m + k×m') complexity where n=DOM nodes, m=HTML length, k=matches, m'=parsed nodes

[Unreleased] - Add insert_before_content/3 operation

Added

  • insert_before_content/3 operation: Insert HTML content BEFORE all elements matching a CSS selector
    • Key difference from append/prepend: Inserts as SIBLINGS, not as children
    • Content is inserted in the parent's child list just before each matched element
    • Combines CSS selector matching, HTML parsing, and DOM manipulation in a single atomic operation
    • Returns count of processed elements
    • C implementation for optimal performance
    • Erlang wrapper with comprehensive documentation

Technical Details

  • Uses lexbor's CSS selector engine for matching
  • Parses HTML fragments using lxb_html_element_inner_html_set()
  • Uses lxb_dom_node_insert_before() with target element as reference
  • Elements without a parent (document root) are skipped
  • Maintains document order when inserting multiple nodes
  • O(n + m + k×m') complexity where n=DOM nodes, m=HTML length, k=matches, m'=parsed nodes

[Released] - 0.2.0 - Add prepend_content/3 operation

Added

  • prepend_content/3 operation: Prepend HTML content to all elements matching a CSS selector
    • Inserts content as first child (before existing children)
    • Combines CSS selector matching, HTML parsing, and DOM manipulation in a single atomic operation
    • Returns count of modified elements
    • C implementation for optimal performance
    • Erlang wrapper with comprehensive documentation

Technical Details

  • Uses lexbor's CSS selector engine for matching (same as append_content)
  • Parses HTML fragments using lxb_html_element_inner_html_set()
  • Inserts nodes before first child using lxb_dom_node_insert_before()
  • Maintains document order when prepending multiple nodes
  • Comprehensive error handling
  • O(n + m + k×m') complexity where n=DOM nodes, m=HTML length, k=matches, m'=parsed nodes

[Unreleased] - Add append_content/3 operation

Added

  • append_content/3 operation: Append HTML content to all elements matching a CSS selector
    • Combines CSS selector matching, HTML parsing, and DOM manipulation in a single atomic operation
    • Returns count of modified elements
    • Erlang wrapper with comprehensive documentation

Technical Details

  • Uses lexbor's CSS selector engine for matching
  • Parses HTML fragments using lxb_html_element_inner_html_set()
  • O(n + m + k×m') complexity where n=DOM nodes, m=HTML length, k=matches, m'=parsed nodes

[Unreleased] - Demo application

Added

  • Demo application in demo/ directory that verifies the published hex.pm package works correctly
  • Makefile for convenient demo execution (make in demo directory)
  • Comprehensive README in demo/ explaining why escripts don't work with port-based applications

Technical Details

  • Demo fetches lexbor_erl v0.1.0 from hex.pm (not local source)
  • Uses erl -pa execution instead of escript (required for port-based applications)
  • Provides verification that the published package compiles and runs correctly

[Released] - 0.1.0

[Unreleased] - Improve document serialization to preserve DOCTYPE

Improved

  • Enhanced serialize_full_doc implementation: Refactored to serialize from document node instead of document element, which now preserves DOCTYPE declarations. This simplifies the code and aligns with all lexbor examples, and produces complete HTML5 documents.

Added

  • DOCTYPE declarations are now preserved during serialization
  • Complete HTML5 document structure maintained in round-trip parsing
  • Better HTML5 compliance with proper document structure
  • New verification test suite to validate serialization behavior

Changed

  • Serialization output now includes DOCTYPE when present in parsed HTML
  • Example: Input <!DOCTYPE html><html>...</html> now serializes with DOCTYPE preserved
  • Matches idiomatic lexbor pattern used in all official examples
  • Simpler implementation

Technical Details

  • Changed from lxb_dom_document_element() to direct document node serialization

[Unreleased] - Refactor get_attribute to use convenience API

Changed

  • Refactored get_attribute implementation: Replaced two-step attribute retrieval (get attribute object, then extract value) with lexbor's convenience function lxb_dom_element_get_attribute(). This simplifies the code, improves readability, and aligns with lexbor examples. All 51 tests continue to pass.

Improved

  • Simpler attribute retrieval logic
  • Better code readability with clearer intent
  • Follows idiomatic lexbor pattern from official examples
  • Simpler error handling (single NULL check instead of two checks)

[Unreleased] - Refactor set_inner_html to use official lexbor API

Changed

  • Refactored set_inner_html implementation: Replaced manual node importation with lexbor's official lxb_html_element_inner_html_set() API improves maintainability, and adds context-aware HTML parsing (e.g., proper handling of innerHTML on <table> elements)

Improved

  • set_inner_html now uses the standard lexbor approach for innerHTML operations
  • Better alignment with lexbor best practices and examples
  • Simplified code maintenance
  • Context-aware parsing follows HTML5 specification more accurately

[Unreleased] - Bug in DOM maniplation

Fixed

  • Refactored set_inner_html/3 to use lxb_dom_document_import_node API to properly copy node data to target document's memory pool, preventing use-after-free memory leak when temporary document is destroyed

[Unreleased] - Chunk Based Streaming Parser

Added

  • Streaming HTML parser for incremental document processing
  • Three-phase streaming API:
    • parse_stream_begin/0 - Initialize parse session
    • parse_stream_chunk/2 - Feed HTML chunks incrementally
    • parse_stream_end/1 - Finalize and get document
  • Parse session registry in C port with independent session tracking
  • Support for arbitrary chunk boundaries (can split mid-tag, mid-attribute)
  • streaming parser integration tests covering:
    • Basic streaming with multiple chunks
    • Splitting in middle of tags and attributes
    • Large document streaming
    • Equivalence with normal parsing
    • Invalid session handling
    • Parallel streaming sessions
  • C unit tests for streaming operations covering:
    • Basic begin/end sequence
    • Multiple chunks processing
    • Tag boundary splitting
    • Invalid session handling
    • Session reuse prevention
    • Large document streaming
    • Empty chunk handling
  • chunk_based_streaming_example.erl with examples
  • Session ID encoding with worker affinity for proper routing

[Unreleased] - DOM Manipulation

Added

  • DOM manipulation API with 11 new functions:
    • Attributes: get_attribute/3, set_attribute/4, remove_attribute/3
    • Text/HTML: get_text/2, set_text/3, inner_html/2, set_inner_html/3, serialize/1
    • Nodes: create_element/2, append_child/3, insert_before/4, remove_node/2
  • C unit test suite with 38 tests
  • example programs: attribute_example, text_example, node_example, select_example, unicode_example

[Unreleased] - Parallelism

Added

  • Worker pool architecture with configurable parallelism
  • Multiple independent port workers for true concurrent processing
  • Time-based hash distribution for stateless operations
  • DocId encoding for deterministic routing of stateful operations
  • Worker isolation with independent supervision
  • Fault tolerance with automatic worker recovery
  • Parallel processing tests
  • Fault tolerance tests
  • Worker pool coordinator (lexbor_erl_pool)
  • Individual worker processes (lexbor_erl_worker)

Fixed

  • Corrected routing strategy documentation from "round-robin" to "time-based hash distribution"

[Unreleased] - single-threaded

Added

  • HTML5-tolerant parsing with Lexbor C library
  • CSS selector queries (class, ID, tag, attributes, combinators, pseudo-classes)
  • Stateless operations: parse_serialize/1, select_html/2
  • Stateful operations: parse/1, release/1, select/2, outer_html/2
  • Port-based architecture for BEAM VM safety
  • Worker pool with time-based hash distribution
  • Document lifecycle management with DocId encoding
  • Application and supervisor structure
  • Common Test suite
    • Lifecycle management
    • Stateless and stateful operations
    • Error handling and edge cases
    • Unicode support
    • Large documents
  • Comprehensive EDoc documentation
  • CMake-based C build system
  • Example programs demonstrating API usage

Security

  • Port isolation prevents C crashes from affecting BEAM VM
  • No atom leaks - all user input stays as binaries