JsonRemedy.Layer1.ContentCleaning (JsonRemedy v0.1.11)

Layer 1: Content Cleaning - Removes non-JSON content and normalizes encoding.

This layer handles:

  • Code fence removal (```json ... ```)
  • Comment stripping (// and /* */)
  • Wrapper text extraction (HTML, prose)
  • Encoding normalization

Uses direct string functions instead of regular expressions, for better performance and clearer code.

Summary

Functions

extract_json_content(input)

Extract JSON from wrapper text (HTML, prose, etc.). Public API version that takes string input directly.

extract_json_content_internal(arg)

Extract JSON from wrapper text (HTML, prose, etc.).

name()

Return a human-readable name for this layer.

normalize_encoding(input)

Normalize text encoding to UTF-8. Public API version that takes string input directly.

normalize_encoding_internal(arg)

Normalize text encoding to UTF-8.

priority()

Return the priority order for this layer. Layer 1 (Content Cleaning) should run first in the pipeline.

process(input, context)

Process input string and apply Layer 1 content cleaning repairs.

remove_code_fences(input)

Remove code fences from the input while preserving fence-like content inside string values.

remove_comments(arg)

Strip comments while preserving comment-like content in strings.

strip_comments(input)

Strip comments while preserving comment-like content in strings. Public API version that takes string input directly.

strip_trailing_dots(input)

Strip trailing dots from input.

strip_trailing_dots_internal(arg)

Strip trailing dots from truncated content (Gemini max_output_tokens pattern).

supports?(input)

Check if this layer can handle the given input. Layer 1 can handle any text input that may contain JSON with wrapping content.

validate_options(options)

Validate layer configuration and options. Layer 1 accepts options for enabling/disabling specific cleaning features.

Types

layer_result()

@type layer_result() :: JsonRemedy.LayerBehaviour.layer_result()

repair_action()

@type repair_action() :: JsonRemedy.LayerBehaviour.repair_action()

repair_context()

@type repair_context() :: JsonRemedy.LayerBehaviour.repair_context()

Functions

extract_json_content(input)

@spec extract_json_content(input :: String.t()) :: {String.t(), [repair_action()]}

Extract JSON from wrapper text (HTML, prose, etc.). Public API version that takes string input directly.
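
To make the idea of wrapper-text extraction concrete, here is a minimal sketch, not JsonRemedy's actual implementation. It assumes the JSON payload spans from the first opening brace or bracket to the last closing one. The module name ExtractSketch and the shape of the repair map are hypothetical.

```elixir
defmodule ExtractSketch do
  # Illustrative only: a naive, regex-free extraction strategy.
  # The repair map below is a stand-in for the real repair_action() type.
  def extract_json_content(input) when is_binary(input) do
    with {start, _len} <- :binary.match(input, ["{", "["]),
         [_ | _] = closers <- :binary.matches(input, ["}", "]"]),
         {stop, _len} = List.last(closers),
         true <- stop > start do
      # Keep everything from the first opener to the last closer, inclusive.
      json = binary_part(input, start, stop - start + 1)

      repairs =
        if json == input, do: [], else: [%{action: "extracted JSON from wrapper text"}]

      {json, repairs}
    else
      # No plausible JSON span found: return the input untouched.
      _ -> {input, []}
    end
  end
end
```

This sketch is fooled by brackets that appear in the surrounding prose (for example "[note]" before the payload), so a real implementation needs more careful boundary detection.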

extract_json_content_internal(arg)

@spec extract_json_content_internal(input :: {String.t(), [repair_action()]}) ::
  {String.t(), [repair_action()]}

Extract JSON from wrapper text (HTML, prose, etc.).
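
The specs show that the internal variants take and return a {text, repairs} tuple, which presumably lets cleaning steps be chained while the repair list accumulates. The sketch below illustrates that threading pattern with two hypothetical steps (bom_step and trim_step); none of these names, nor the repair map shape, come from JsonRemedy.

```elixir
defmodule ThreadingSketch do
  # Illustrative only: each step takes and returns {text, repairs}.

  # Hypothetical step: drop a leading UTF-8 byte order mark.
  def bom_step({<<0xEF, 0xBB, 0xBF, rest::binary>>, repairs}),
    do: {rest, repairs ++ [%{action: "removed BOM"}]}

  def bom_step({text, repairs}), do: {text, repairs}

  # Hypothetical step: trim surrounding whitespace, recording a repair only on change.
  def trim_step({text, repairs}) do
    trimmed = String.trim(text)

    if trimmed == text do
      {text, repairs}
    else
      {trimmed, repairs ++ [%{action: "trimmed whitespace"}]}
    end
  end

  # Public-style wrapper: accepts a plain string and seeds an empty repair list.
  def clean(input) when is_binary(input) do
    {input, []}
    |> bom_step()
    |> trim_step()
  end
end
```

The public wrapper keeps the caller-facing signature simple, while the tuple form keeps each step composable with the pipe operator.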

name()

@spec name() :: String.t()

Return a human-readable name for this layer.

normalize_encoding(input)

@spec normalize_encoding(input :: String.t()) :: {String.t(), [repair_action()]}

Normalize text encoding to UTF-8. Public API version that takes string input directly.

normalize_encoding_internal(arg)

@spec normalize_encoding_internal(input :: {String.t(), [repair_action()]}) ::
  {String.t(), [repair_action()]}

Normalize text encoding to UTF-8.

priority()

@spec priority() :: 1

Return the priority order for this layer. Layer 1 (Content Cleaning) should run first in the pipeline.

process(input, context)

@spec process(input :: String.t(), context :: repair_context()) :: layer_result()

Process input string and apply Layer 1 content cleaning repairs.

Returns:

  • {:ok, processed_input, updated_context} - Layer completed successfully
  • {:continue, input, context} - Layer doesn't apply, pass to next layer
  • {:error, reason} - Layer failed, stop pipeline
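
As a rough illustration of how a caller might act on these three return shapes, the sketch below folds a list of layer functions over an input. PipelineSketch and its run/3 function are hypothetical, and the layers here are plain anonymous functions rather than JsonRemedy modules.

```elixir
defmodule PipelineSketch do
  # Illustrative only: a driver that honours the three documented result shapes.
  def run(input, context, layers) do
    Enum.reduce_while(layers, {:ok, input, context}, fn layer, {:ok, current, ctx} ->
      case layer.(current, ctx) do
        # Layer applied repairs: carry the new input and context forward.
        {:ok, processed, new_ctx} -> {:cont, {:ok, processed, new_ctx}}
        # Layer does not apply: pass what it returned to the next layer.
        {:continue, same_input, same_ctx} -> {:cont, {:ok, same_input, same_ctx}}
        # Layer failed: stop the pipeline and surface the reason.
        {:error, reason} -> {:halt, {:error, reason}}
      end
    end)
  end
end
```

Using Enum.reduce_while/3 means an {:error, reason} result short-circuits the remaining layers, matching the "stop pipeline" wording above.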

remove_code_fences(input)

@spec remove_code_fences(input :: String.t()) :: {String.t(), [repair_action()]}

Remove code fences from the input while preserving fence-like content inside string values.
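
A minimal sketch of regex-free fence removal follows, covering only the simplest case where the whole input is a single fenced block. It is not the library's implementation; FenceSketch and the repair map shape are assumptions made for illustration.

```elixir
defmodule FenceSketch do
  # Illustrative only. The fence marker is built at compile time so this
  # source never contains a literal triple backtick.
  @fence String.duplicate("`", 3)

  def remove_code_fences(input) when is_binary(input) do
    trimmed = String.trim(input)

    fenced? =
      byte_size(trimmed) >= 2 * byte_size(@fence) and
        String.starts_with?(trimmed, @fence) and
        String.ends_with?(trimmed, @fence)

    if fenced? do
      body =
        trimmed
        |> String.replace_prefix(@fence, "")
        |> String.replace_suffix(@fence, "")

      # Drop the opening line only when it looks like a language tag,
      # i.e. it holds no JSON-looking characters.
      content =
        case String.split(body, "\n", parts: 2) do
          [first, rest] ->
            if String.contains?(first, ["{", "[", "\""]), do: body, else: rest

          [_single_line] ->
            body
        end

      {String.trim(content), [%{action: "removed code fences"}]}
    else
      # Input that does not start and end with a fence is left untouched,
      # so fence markers inside string values survive.
      {input, []}
    end
  end
end
```

Fences embedded in the middle of prose are out of scope for this sketch; handling them requires scanning for fence positions while tracking string boundaries.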

remove_comments(arg)

@spec remove_comments(input :: {String.t(), [repair_action()]}) ::
  {String.t(), [repair_action()]}

Strip comments while preserving comment-like content in strings.

strip_comments(input)

@spec strip_comments(input :: String.t()) :: {String.t(), [repair_action()]}

Strip comments while preserving comment-like content in strings. Public API version that takes string input directly.
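
One way to strip comments without touching comment-like text inside strings is a single pass over the binary that tracks whether the scanner is inside a JSON string. The sketch below shows that technique under stated assumptions: CommentSketch and the repair map are hypothetical, and this is not necessarily how JsonRemedy does it.

```elixir
defmodule CommentSketch do
  # Illustrative only: a byte-wise scanner with two states, :code and :string.
  def strip_comments(input) when is_binary(input) do
    output = scan(input, :code, [])
    repairs = if output == input, do: [], else: [%{action: "stripped comments"}]
    {output, repairs}
  end

  # End of input: chunks were accumulated in reverse order.
  defp scan(<<>>, _state, acc), do: acc |> Enum.reverse() |> IO.iodata_to_binary()

  # Inside a string, copy escape pairs verbatim so \" does not end the string.
  defp scan(<<"\\", b, rest::binary>>, :string, acc),
    do: scan(rest, :string, [<<"\\", b>> | acc])

  # A quote toggles between the two states.
  defp scan(<<"\"", rest::binary>>, :string, acc), do: scan(rest, :code, ["\"" | acc])
  defp scan(<<"\"", rest::binary>>, :code, acc), do: scan(rest, :string, ["\"" | acc])

  # Comments are only recognised outside strings.
  defp scan(<<"//", rest::binary>>, :code, acc), do: scan(skip_line(rest), :code, acc)
  defp scan(<<"/*", rest::binary>>, :code, acc), do: scan(skip_block(rest), :code, acc)

  # Any other byte is copied through unchanged.
  defp scan(<<b, rest::binary>>, state, acc), do: scan(rest, state, [<<b>> | acc])

  # Skip to the end of the line, keeping the newline itself.
  defp skip_line(<<"\n", _::binary>> = rest), do: rest
  defp skip_line(<<_, rest::binary>>), do: skip_line(rest)
  defp skip_line(<<>>), do: <<>>

  # Skip past the closing */ (or to end of input if it is missing).
  defp skip_block(<<"*/", rest::binary>>), do: rest
  defp skip_block(<<_, rest::binary>>), do: skip_block(rest)
  defp skip_block(<<>>), do: <<>>
end
```

Scanning byte by byte is safe for UTF-8 input because the quote, backslash, and slash characters are single-byte ASCII and never occur inside a multi-byte sequence.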

strip_trailing_dots(input)

@spec strip_trailing_dots(input :: String.t()) :: {String.t(), [repair_action()]}

Strip trailing dots from input.

Public API version that takes string input directly. Detects and removes trailing dots that indicate truncation (10+ consecutive dots).

strip_trailing_dots_internal(arg)

@spec strip_trailing_dots_internal(input :: {String.t(), [repair_action()]}) ::
  {String.t(), [repair_action()]}

Strip trailing dots from truncated content (Gemini max_output_tokens pattern).

When LLMs like Gemini hit max_output_tokens, they sometimes fill remaining tokens with dots instead of stopping cleanly. This results in truncated JSON followed by thousands of trailing dots.

This function detects and strips these trailing dots while preserving:

  • Dots inside string values
  • Legitimate ellipsis (...) in content
  • Valid JSON structure
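
The behaviour described above can be sketched roughly as follows. The threshold of 10 dots comes from the strip_trailing_dots/1 description; everything else (the module name TrailingDotsSketch, the repair map, the exact trimming rules) is an assumption for illustration rather than the library's code.

```elixir
defmodule TrailingDotsSketch do
  # Illustrative only. Threshold taken from the docs: 10+ consecutive dots.
  @min_dots 10

  def strip_trailing_dots(input) when is_binary(input) do
    trimmed = String.trim_trailing(input)
    without_dots = String.trim_trailing(trimmed, ".")
    dot_count = byte_size(trimmed) - byte_size(without_dots)

    if dot_count >= @min_dots do
      # Long run of dots at the very end: treat it as truncation filler.
      {String.trim_trailing(without_dots),
       [%{action: "stripped trailing dots", count: dot_count}]}
    else
      # Short runs such as a legitimate ellipsis are left alone.
      {input, []}
    end
  end
end
```

Because only the tail of the input is examined, dots inside closed string values are never touched. A truncated string value that itself ends in a long run of dots is a case this sketch does not distinguish.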

supports?(input)

@spec supports?(input :: String.t()) :: boolean()

Check if this layer can handle the given input. Layer 1 can handle any text input that may contain JSON with wrapping content.

validate_options(options)

@spec validate_options(options :: keyword()) :: :ok | {:error, String.t()}

Validate layer configuration and options. Layer 1 accepts options for enabling/disabling specific cleaning features.