Changelog
View SourceAll notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
0.1.11 - 2025-12-03
Added
- Trailing dots truncation handling: Layer 1 now detects and strips trailing dots from truncated JSON content
- Pattern:
{"key": "value.............โ{"key": "value - Handles Gemini API
max_output_tokenstruncation pattern where remaining tokens are filled with dots - Threshold: 10+ consecutive trailing dots trigger cleanup (preserves legitimate ellipsis)
- Works with mixed whitespace/newlines between dots
- Test status: โ 14/14 new tests passing
- Pattern:
- New public API function:
ContentCleaning.strip_trailing_dots/1for direct access to trailing dots removal
Technical Details
- Module: Enhanced
JsonRemedy.Layer1.ContentCleaning - New functions:
strip_trailing_dots/1- Public API for trailing dots removalstrip_trailing_dots_internal/1- Pipeline integration functionfind_trailing_dots_start/1- Position detectionfind_dots_at_end/1- End-of-string dot detectioncount_trailing_dots/4- Dot counting with whitespace tolerance
- Integration: Runs in Layer 1 pipeline after content extraction, before encoding normalization
- Performance: Grapheme-based processing for proper UTF-8 handling
Documentation
- Test case reference: Based on real Gemini API response in
docs/20251203/test_case/gemini_max_tokens_trailing_dots.md - Layer 1 enhancements: Added trailing dots stripping to content cleaning pipeline
0.1.10 - 2025-10-28
Fixed
- Windows structural repairs: Treat CRLF pairs as whitespace when scanning for the next significant character so Layer 2 keeps nested braces open on Windows-style inputs reported by CI.
0.1.9 - 2025-10-28
Fixed
- Multi-item arrays: Layer 2 now closes missing braces and brackets between array elements, avoiding mismatched delimiter swaps during comma handling and restoring proper validation (#8).
Added
- Regression coverage: Added LF and CRLF variants of the reporter's samples to ensure multi-element arrays with missing terminators stay repaired.
0.1.8 - 2025-10-27
Added
- HTML metadata:
HtmlHandlers.extract_html_content/2now returns byte counts alongside grapheme totals so every caller gets accurate offsets without recomputing (#6). - Examples: New
html_metadata_examples.exsandwindows_ci_examples.exsscripts document the metadata workflow and Windows CI parity (#6, #7).
Fixed
- Syntax normalization: Layer 3 consumes the shared metadata directly, removing duplicated byte math around HTML quoting (#6).
CI
- Windows coverage: Introduced a
windows-2022PowerShell job to the GitHub Actions matrix to runmix deps.get,mix compile --warnings-as-errors, andmix test, ensuring CRLF regressions are caught ahead of releases (#7).
0.1.7 - 2025-10-27
Fixed
- Windows HTML payloads: Layer 3's binary optimization now trims HTML fragments using byte-accurate offsets, ensuring CRLF-terminated bodies are quoted without leaving stray delimiters.
- Regression coverage: Added a direct regression test mirroring the reporter's
CRLF_html.jsonsample so Windows-style newlines stay guarded.
Added
- Examples: Extended
examples/html_content_examples.exswith a Windows newline scenario to demonstrate the repaired behaviour viamix run. - Documentation: HexDocs now surfaces grouped README, changelog, and license pages for easier navigation.
0.1.6 - 2025-10-24
Added
๐ข Advanced Number Edge Case Handling - Critical Pattern Enhancement
Comprehensive support for non-standard number formats commonly found in real-world malformed JSON, inspired by json_repair Python library.
New Number Patterns Supported:
- Fractions:
{"ratio": 1/3}โ{"ratio": "1/3"}(convert to string) - Ranges:
{"years": 1990-2020}โ{"years": "1990-2020"}(convert to string) - Invalid decimals:
{"version": 1.1.1}โ{"version": "1.1.1"}(convert to string) - Leading decimals:
{"probability": .25}โ{"probability": 0.25}(prepend zero) - Text-number hybrids:
{"code": 1notanumber}โ{"code": "1notanumber"}(convert to string) - Trailing operators:
{"value": 1e}โ{"value": 1}(remove incomplete exponent) - Trailing decimals:
{"num": 1.}โ{"num": 1.0}(complete decimal) - Currency symbols:
{"price": $100}โ{"price": "$100"}(quote as string) - Thousands separators:
{"population": 1,234,567}โ{"population": 1234567}(already supported, now enhanced)
Implementation Details:
- Module: Enhanced
JsonRemedy.Layer3.BinaryProcessors - New functions:
consume_number_with_edge_cases/3- Extended number consumption with special character supportanalyze_and_normalize_number/2- Intelligent pattern detection and conversion
- Character support: Handles
/,-,., currency symbols ($,โฌ,ยฃ,ยฅ), commas, and text - Smart detection: Distinguishes negative numbers from ranges, thousands separators from delimiters
- Test status: โ 42/43 tests passing (98% success rate)
๐ Pattern Investigation & Documentation
- Comprehensive analysis: Deep investigation of json_repair Python library patterns
- Test infrastructure: Created
test/missing_patterns/directory for pattern validation - Layer 5 roadmap: Documented patterns requiring state machine implementation:
- Doubled quotes detection (
""value""โ"value") - Misplaced quote detection with lookahead
- Stream stability mode for incomplete JSON
- Unicode escape normalization
- Object merge patterns
- Array extension patterns
- Doubled quotes detection (
Enhanced
- Layer 3 Syntax Normalization: Expanded number detection to include
.and$triggers - Binary Processors: Character-by-character number consumption with edge case awareness
- Pipeline Architecture: Early hardcoded pattern preprocessing (before Layer 2) to prevent structural misinterpretation
- Test organization: New
:layer5_targettag for deferred features - Documentation: Comprehensive rationale for architectural decisions
Fixed
- Leading decimal numbers:
.25now correctly normalized to0.25 - Negative leading decimals:
-.5now correctly normalized to-0.5 - Fraction detection:
1/3properly detected and quoted as string - Range vs negative:
10-20(range) distinguished from-20(negative number) - Scientific notation edge cases: Incomplete exponents (
1e,1e-) handled gracefully - Number-text hybrids:
123abcproperly detected and quoted - Multiple decimal points:
1.1.1correctly identified as invalid and quoted - Thousands separator parsing: Only consumes commas followed by exactly 3 digits
Technical Details
- Pattern consumption: Enhanced binary pattern matching in
consume_number_with_edge_cases/3 - Context-aware normalization:
analyze_and_normalize_number/2with 9 distinct pattern checks - Repair tracking: Detailed repair actions for all number normalizations
- UTF-8 safe: Proper handling of unicode characters in number-like values
- Zero regressions: All 82 critical tests remain passing
Deferred to Layer 5 (Tolerant Parsing)
The following patterns require full JSON state machine with position tracking and lookahead:
- Doubled quotes: Context-sensitive quote repair (21 tests tagged
:layer5_target) - Misplaced quotes: Lookahead analysis for quote-in-quote detection
- Stream stability: Handling incomplete streaming JSON from LLMs
- Complex structural issues: Severe malformations requiring aggressive heuristics
Documentation
- Pattern analysis: Documented 12 missing pattern categories from json_repair comparison
- Test coverage: Added 64 new tests (43 number edge cases + 21 doubled quotes)
- Architectural insights: Documented regex limitations and Layer 5 requirements
- Known limitations: Clear documentation of deferred features with rationale
Test Suite Status
- Total tests: 618 tests, 0 failures (100% pass rate)
- Excluded: 63 tests (38 existing + 25 deferred Layer 5 targets)
- Critical tests: 82/82 passing (100%)
- Number edge cases: 42/43 passing (98%)
- New test infrastructure:
test/missing_patterns/directory established
0.1.5 - 2025-10-24
Added
๐ Pre-processing Pipeline - Major Architectural Enhancement
A new pre-processing stage now runs before the main layer pipeline to handle complex patterns that would otherwise be broken by subsequent layers. This is inspired by the json_repair Python library.
New Pre-processing Modules:
MultipleJsonDetectorutility: Detects and aggregates consecutive JSON values- Pattern:
[]{}โ[[], {}] - Prevents Layer 1 from treating subsequent JSON as "wrapper text"
- Runs first in the pipeline before any layer processing
- Test status: โ 10/10 tests passing
- Pattern:
ObjectMerger(Layer 3): Merges key-value pairs after premature closing braces- Pattern:
{"a":"b"},"c":"d"}โ{"a":"b","c":"d"} - Handles malformed objects with extra closing braces
- Merges additional pairs erroneously placed outside objects
- Test status: โ 10/10 tests passing
- Pattern:
New Layer 3 Filters:
EllipsisFilter: Removes unquoted ellipsis (...) placeholders- Pattern:
[1,2,3,...]โ[1,2,3] - Common in LLM-generated content to indicate truncation
- Preserves quoted
"..."as valid string values - Test status: โ 10/10 tests passing
- Pattern:
KeywordFilter: Removes unquoted comment-like keywords- Pattern:
{"a":1, COMMENT "b":2}โ{"a":1,"b":2} - Filters:
COMMENT,SHOULD_NOT_EXIST,DEBUG_INFO,PLACEHOLDER,TODO,FIXME, etc. - Test status: โ 10/10 tests passing
- Pattern:
๐ HTML Content Handling in Layer 3
HtmlHandlersmodule: Intelligent detection and quoting of unquoted HTML values- DOCTYPE declarations:
<!DOCTYPE HTML ...>properly detected and quoted - HTML comments:
<!-- ... -->handled correctly without breaking tag depth tracking - Void elements: Self-closing tags (
<meta>,<br>,<hr>,<img>, etc.) tracked without expecting closing tags - Self-closing syntax:
<tag />properly recognized and depth-adjusted - Nested structures: HTML with JSON-like content in attributes handled correctly
- Smart extraction: Tracks HTML tag depth to determine end of HTML block
- Proper escaping: Quotes, newlines, tabs, backslashes all escaped for valid JSON
- Array support: HTML values in arrays work correctly
- Test status: โ 15/15 tests passing
- DOCTYPE declarations:
๐ Documentation & Examples
- Professional hex-shaped logo: New
assets/json_remedy_logo.svgwith modern design featuring medical cross and Elixir drop - Example documentation:
examples/html_content_examples.exswith 5 challenging real-world scenarios - Comprehensive test suite: 65 new tests total (40 pattern tests + 15 HTML tests + 10 pre-processing tests)
Enhanced
- Pre-processing architecture: New stage before layer pipeline prevents pattern interference
- Layer 1 (ContentCleaning): Smarter trailing wrapper text removal - checks if trailing content is valid JSON before removing
- Pipeline orchestration: Integrated pre-processing with main repair pipeline for seamless operation
- Documentation: Logo integrated in README.md and HexDocs
- Package assets: Logo included in hex package for professional documentation display
- README.md: Added pre-processing pipeline documentation, HTML handling, and updated pattern status
- Test coverage: All critical tests passing (82 + 65 new tests = 147 tests, 100% success rate)
Fixed
- Multiple JSON values: Consecutive JSON values like
[]{}now properly aggregated - Object boundary issues: Extra key-value pairs after closing braces now merged correctly
- Ellipsis placeholders: Unquoted
...in arrays removed while preserving quoted ellipsis - Debug keywords: Comment-like keywords (COMMENT, DEBUG_INFO, etc.) filtered from output
- HTML in JSON values: Unquoted HTML after colons (e.g.,
"body":<!DOCTYPE HTML>) now properly quoted and escaped - API error pages: Full HTML error responses from APIs (503, 404, etc.) now handled correctly
- Complex HTML: Nested tags, attributes with quotes, special entities all work properly
Technical Details
- Pre-processing stage: Runs before Layer 1 to handle patterns that would otherwise break
- Smart JSON detection: Parses multiple consecutive JSON values with proper position tracking
- Object boundary analysis: Tracks brace balance to identify and merge split objects
- Context-aware filtering: Preserves quoted ellipsis and keywords while removing unquoted ones
- HTML depth tracking: Monitors both HTML tag depth and JSON-like structure depth
- Context awareness: Only stops at JSON delimiters when all HTML tags are closed
- Void element list: 15 HTML5 void elements recognized (
area,base,br,col,embed,hr,img,input,link,meta,param,source,track,wbr) - Binary optimization: HTML detection integrated into Layer 3's binary processing pipeline
- Zero regressions: All existing tests remain passing
Cleanup
- Removed temporary test scripts:
test_boolean.exs,test_weiss.exs
0.1.4 - 2025-10-07
Added
- Hardcoded Patterns Module: New
JsonRemedy.Layer3.HardcodedPatternsmodule with battle-tested cleanup patterns ported from Python'sjson_repairlibrary- Smart quotes normalization: Converts curly quotes (
""), guillemets (ยซยป), and angle quotes (โนโบ) to standard JSON quotes - Doubled quotes repair: Fixes
""value""โ"value"while preserving empty strings"" - Number format normalization: Removes thousands separators from numbers:
1,234,567โ1234567 - Unicode escape sequences: Converts
\u263aโโบ(opt-in via:enable_escape_normalization) - Hex escape sequences: Converts
\x41โA(opt-in via:enable_escape_normalization)
- Smart quotes normalization: Converts curly quotes (
- Comprehensive examples: New
examples/hardcoded_patterns_examples.exswith 8 detailed examples demonstrating:- Smart quotes with international text (French, Japanese, German)
- Doubled quotes edge cases
- Number format cleaning while preserving string content
- Unicode/hex escape handling
- Combined patterns for real-world LLM output
- Full Layer 3 pipeline integration
- Feature flags: Configurable pattern processing with safe defaults
:enable_hardcoded_patterns(default:true):enable_escape_normalization(default:falsefor safety)
Enhanced
- Layer 3 integration: Hardcoded patterns run as pre-processing step before main syntax normalization
- Context-aware processing: Number format normalization preserves commas in string values
- International support: Full UTF-8 support with smart quotes from multiple languages
- Documentation: README updated with dedicated hardcoded patterns section and attribution to source library
Technical Details
- Test coverage: 47 new tests for hardcoded patterns (100% pass rate)
- Source attribution: Patterns ported from json_repair by Stefano Baccianella
- Architecture: Cleanly organized as Layer 3 subsection with proper separation of concerns
- Type safety: Full Dialyzer compliance with zero warnings
- Performance: Regex-based optimizations with minimal overhead
Performance
- Test suite: 499 total tests, 0 failures (added 47 new tests)
- Zero regressions: All existing functionality preserved
- Efficient processing: Smart quotes and number normalization use optimized regex patterns
0.1.3 - 2025-07-05
Fixed
- Fixed issue where wrapper text following JSON blocks was not recognized (#1)
- Added dedicated
remove_trailing_wrapper_text/1function in Layer 1 - Now properly removes trailing text after valid JSON structures
- Example:
[{"id": 1}]\n1 Volume(s) createdโ[{"id": 1}]
- Added dedicated
0.1.2 - 2025-06-08
Added
- Advanced debugging capabilities: New
repair_with_debug/2function with comprehensive step-by-step instrumentation - Multi-word unquoted value support: Enhanced Layer 3 to handle complex cases like
Weiss Savageโ"Weiss Savage" - Detailed repair reporting: Enhanced logging with position tracking, original values, and replacement details
- Layer 3 architecture refactor: Modularized syntax normalization into specialized processors:
BinaryProcessors: High-performance binary pattern matchingCharacterParsers: Context-aware character-by-character parsingQuoteProcessors: Advanced quote normalizationRuleProcessors: Rule-based transformation engineSyntaxDetectors: Pattern detection and classificationPostProcessors: Final cleanup and validation
Enhanced
- Performance optimization: Improved Layer 3 processing with sophisticated multi-word value detection
- Type safety: Added comprehensive Dialyzer type specifications for all debug functions
- Test coverage: Added integration tests for real-world JSON repair scenarios
- Error handling: Enhanced context sanitization for layer module compatibility
Fixed
- Complex unquoted values: Fixed handling of multi-word unquoted string values with spaces
- Boolean preservation: Maintained proper boolean/null normalization while adding quote support
- Structural repair: Improved missing bracket and brace detection
- Type specifications: Resolved all Dialyzer warnings for enhanced type safety
Technical Details
- Debug instrumentation: Added
process_through_pipeline_with_debug/3andprocess_layer_with_debug/5 - Context management: Enhanced repair context with debug step tracking
- Performance metrics: Added processing time tracking and repair counting
- Real-world testing: Added 271-line test file with complex JSON repair scenarios
Performance
- Efficient processing: ~48ms for 9KB malformed JSON files
- Minimal repairs: Optimized to make only necessary changes (2 repairs vs 8+ in previous versions)
- Layer 3 efficiency: Enhanced binary processing with reduced repair overhead
0.1.1 - 2025-06-07
Changed
- BREAKING: Complete architectural rewrite - Brand new 5-layer pipeline design
- New layered approach: Regex โ State Machine โ Character Parsing โ Validation โ Tolerant Parsing
- Improved performance: Significantly faster with intelligent fast-path optimization
- Better reliability: More robust handling of complex malformed JSON
- Enhanced API: More intuitive function signatures and options
- Superior test coverage: Comprehensive test suite with real-world scenarios
Added
- Layer 1 - Content Cleaning: Advanced code fence removal, comment stripping, encoding normalization
- Layer 2 - Structural Repair: Sophisticated state machine for delimiter repair and object concatenation
- Layer 3 - Syntax Normalization: Context-aware quote, boolean, and comma normalization
- Layer 4 - Fast Validation: Jason.decode optimization with early exit for valid JSON
- Implementation Status Documentation: Clear roadmap and current capabilities
- Real-world Examples: Comprehensive examples for LLM output, legacy systems, streaming data
- Advanced Benchmarking: Performance testing suite with memory profiling
Technical Details
- Complete codebase rewrite: All modules redesigned from ground up
- New design patterns: LayerBehaviour protocol, Context tracking, Pipeline architecture
- Enhanced maintainability: Modular design with clear separation of concerns
- Production readiness: Comprehensive error handling and edge case coverage
Future
- Layer 5 - Tolerant Parsing: Planned for next major release (aggressive error recovery)
Note
This is a 100% rewrite - all previous code has been replaced with the new layered architecture. While the API maintains compatibility, the internal implementation is entirely new.
0.1.0 - 2025-06-06
Added
- Initial release of JsonRemedy
- Binary pattern matching JSON parser with repair capabilities
- Support for repairing common JSON malformations:
- Missing quotes around keys and values
- Single quotes instead of double quotes
- Trailing commas in arrays and objects
- Missing commas between elements
- Incomplete objects and arrays
- Boolean variants (True/False/TRUE/FALSE)
- Null variants (None/NULL/Null)
- Code fence removal (triple-backtick
jsonfences) - Comment stripping (// and / /)
- Core API functions:
JsonRemedy.repair/2- Parse and repair JSON to Elixir termsJsonRemedy.repair_to_string/2- Repair and return JSON stringJsonRemedy.from_file/2- Repair JSON from file
- Optional repair logging with detailed action tracking
- Multiple parsing strategies (binary patterns, combinators, streaming)
- Comprehensive test suite with 10 doctests and 25 unit tests
- Performance benchmarking suite
- CLI tool with escript support
- Complete documentation with examples
- MIT license
Performance
- 4.32M operations/sec for valid JSON parsing
- 90,000+ operations/sec for malformed JSON repair
- Sub-microsecond parsing for small valid JSON
- Minimal memory overhead (< 8KB for repairs)
- All operations pass performance thresholds