oaspec/openapi/parser

Types

Configuration for parse_string_with_limits.

Each field caps a parser-side resource that an attacker-controlled or accidentally-pathological spec could exhaust. The defaults returned by default_limits are sized for real-world specs (Stripe / GitHub / AsyncAPI all fit comfortably) and are tight enough that a CI runner targeting an attacker-supplied spec is not a denial-of-service surface.

Currently enforced:

  • max_input_bytes: the size of content in bytes. Checked before any parser work begins, so a 100 MB pathological input is rejected before yamerl or json:decode/3 allocates a tree.

Documented but not yet enforced (future work — issue #553 tracks the rest):

  • max_schema_depth, max_allof_chain, max_external_ref_hops, max_paths, max_parameters_per_op. Constructing these limits in the type today lets callers pin the contract; the parser will start enforcing them in follow-up PRs.

pub type ParseLimits {
  ParseLimits(
    max_input_bytes: Int,
    max_schema_depth: Int,
    max_allof_chain: Int,
    max_external_ref_hops: Int,
    max_paths: Int,
    max_parameters_per_op: Int,
  )
}

Constructors

  • ParseLimits(
      max_input_bytes: Int,
      max_schema_depth: Int,
      max_allof_chain: Int,
      max_external_ref_hops: Int,
      max_paths: Int,
      max_parameters_per_op: Int,
    )

Values

pub fn default_limits() -> ParseLimits

Project-default limits sized for real-world specs.

  • max_input_bytes: 16 MiB — Stripe’s full OpenAPI is ~6 MB, GitHub’s REST API is ~12 MB; 16 MiB clears both with headroom.
  • max_schema_depth: 100. Real specs rarely nest beyond ~12.
  • max_allof_chain: 32.
  • max_external_ref_hops: 16.
  • max_paths: 4096. Stripe (~1k operations), GitHub (~1k), and AsyncAPI (~50) all fit comfortably.
  • max_parameters_per_op: 64. The largest real-world operation the audit found has ~20 parameters.
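
A caller that wants a tighter cap than the project default could start from default_limits and override a single field. The sketch below assumes Gleam's record-update syntax is used on the public ParseLimits constructor. The helper name tight_limits, the 1 MiB cap, and the inline spec text are illustrative choices, not values recommended by the project.

```gleam
import gleam/io
import oaspec/openapi/parser.{type ParseLimits, ParseLimits}

/// Hypothetical helper: project defaults with only the byte cap tightened.
pub fn tight_limits() -> ParseLimits {
  let defaults = parser.default_limits()
  // 1 MiB is an illustrative value chosen for this sketch.
  ParseLimits(..defaults, max_input_bytes: 1_048_576)
}

pub fn main() {
  // Placeholder input; a real caller would pass the untrusted spec text.
  let content =
    "openapi: 3.0.3\ninfo:\n  title: Demo\n  version: 1.0.0\npaths: {}\n"

  case parser.parse_string_with_limits(content, tight_limits()) {
    Ok(_spec) -> io.println("spec parsed within limits")
    Error(diag) -> io.println_error(parser.parse_error_to_string(diag))
  }
}
```

Because only max_input_bytes is enforced today, overriding the other fields pins the intended contract but does not yet change parser behaviour.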

pub fn parse_error_to_string(
  error: diagnostic.Diagnostic,
) -> String

Convert a parse error to a human-readable string.

pub fn parse_file(
  path: String,
) -> Result(
  @internal OpenApiSpec(@internal Unresolved),
  diagnostic.Diagnostic,
)

Parse an OpenAPI spec from a file path. Supports both YAML (.yaml, .yml) and JSON (.json) files. After parsing, resolves relative-file $ref values across schemas, parameters, request bodies, responses, and path items — including nested object/array properties and composition branches — by loading the referenced files from disk and merging their definitions into the main spec. Cyclic external ref graphs (A.yaml → B.yaml → A.yaml) fail fast with a dedicated diagnostic. HTTP/HTTPS URLs are not followed.
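
A minimal usage sketch, pairing parse_file with parse_error_to_string. The helper name report and the path openapi.yaml are placeholders introduced for illustration; only the two parser functions come from this module.

```gleam
import gleam/io
import oaspec/openapi/parser

/// Hypothetical helper: summarise the outcome of parsing one spec file.
pub fn report(path: String) -> String {
  case parser.parse_file(path) {
    // The parsed spec is ignored here; a real caller would pass it on.
    Ok(_spec) -> "parsed " <> path
    // Render the diagnostic as a human-readable message.
    Error(diag) -> parser.parse_error_to_string(diag)
  }
}

pub fn main() {
  // "openapi.yaml" is a placeholder path.
  io.println(report("openapi.yaml"))
}
```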

pub fn parse_file_with_progress_and_locations(
  path: String,
  reporter: @internal Reporter,
) -> Result(
  #(
    @internal OpenApiSpec(@internal Unresolved),
    @internal LocationIndex,
  ),
  diagnostic.Diagnostic,
)

Combined parse_file entry point that accepts a Reporter and also returns the top-level YAML LocationIndex (issues #411 and #352). The CLI uses this so that, on big specs, capability-check diagnostics carry a path:line:column: prefix and progress lines are emitted in the same run. Library callers that don’t need progress should pass progress.noop(); callers that don’t need locations can discard the second tuple element.

pub fn parse_json_string(
  content: String,
) -> Result(
  @internal OpenApiSpec(@internal Unresolved),
  diagnostic.Diagnostic,
)

Parse an OpenAPI spec from a JSON string using OTP’s native JSON decoder instead of yamerl. Roughly two orders of magnitude faster than parse_string on large specs because the YAML pre-processing and constructor passes are skipped (issue #352). Behaves like parse_string once the tree is built — same OpenApiSpec shape, same downstream pipeline. Diagnostics from this path do not carry source line/column info because OTP json:decode/3 does not expose decoder positions; the caller still gets the path-prefixed error message that downstream tooling relies on.

pub fn parse_json_string_with_locations(
  content: String,
) -> Result(
  #(
    @internal OpenApiSpec(@internal Unresolved),
    @internal LocationIndex,
  ),
  diagnostic.Diagnostic,
)

JSON variant of parse_string_with_locations.

OTP’s json:decode/3 does not expose token positions, so the returned LocationIndex is always empty (location_index.empty()). Capability-check diagnostics from a JSON-only spec therefore carry NoSourceLoc, while diagnostics from the YAML path can carry line/column info via the index. Downstream tooling that wants to dispatch over both formats with one signature should reach for parse_string_or_json_with_locations, which inspects the first non-whitespace byte to pick between the two parsers.

Returning the empty index instead of an Option(LocationIndex) keeps the type identical to parse_string_with_locations, so callers do not need a separate code path — they only lose location-aware diagnostics on the JSON branch.

pub fn parse_string(
  content: String,
) -> Result(
  @internal OpenApiSpec(@internal Unresolved),
  diagnostic.Diagnostic,
)

Parse an OpenAPI spec from a YAML/JSON string. The default path runs the input through yamerl, which preserves YAML semantics and source locations but is too slow on large JSON specs (the GitHub REST OpenAPI is ~12 MB and yamerl effectively hangs — see issue #352). Use parse_json_string directly when the content is known to be JSON.

parse_string does not apply the DoS limits documented in ParseLimits. Reach for parse_string_with_limits when the input is attacker-controlled or sourced from an untrusted file system (admin-uploaded specs, contract-validation pipelines, CI runners over user-supplied specs) — see issue #553.

YAML 1.1 type coercion: parse_string vs parse_json_string. yamerl applies YAML 1.1 implicit-type rules to scalars before they reach metamon’s tree walker. The OTP json:decode/3 frontend used by parse_json_string does not. The two parsers therefore diverge on the same JSON bytes whenever a value matches a YAML 1.1 implicit-type pattern:

JSON literal          | parse_string (yamerl, YAML 1.1)  | parse_json_string (OTP)
"version": "Yes"      | bool True                        | string "Yes"
"role": "No"          | bool False                       | string "No"
"flag": "On" / "Off"  | bool True / False                | string "On" / "Off"
"version": 1.10       | float 1.1 (trailing zero lost)   | float 1.10
"hex": 0x10           | int 16 (yamerl extension)        | parse error (not valid JSON)

For JSON OpenAPI documents — Stripe, GitHub, AsyncAPI, etc. — prefer parse_json_string (or parse_string_or_json_with_locations, which auto-routes by inspecting the first non-whitespace byte). parse_string remains correct for YAML input and for JSON inputs whose values do not collide with YAML 1.1 implicit-type patterns.

pub fn parse_string_or_json_with_locations(
  content: String,
) -> Result(
  #(
    @internal OpenApiSpec(@internal Unresolved),
    @internal LocationIndex,
  ),
  diagnostic.Diagnostic,
)

Auto-dispatch over parse_string_with_locations (YAML) and parse_json_string_with_locations (JSON) based on the first non-whitespace byte of content.

{ and [ route to the JSON parser (orders of magnitude faster on large specs — see parse_json_string); anything else routes to the YAML parser. The dispatch covers the conventional OpenAPI document shapes (object root for full specs, array root for the rare top-level component lists). Whitespace prefixes (BOM, leading spaces, blank lines) are skipped before the discriminator byte is inspected.

Use this when downstream tooling needs a single entry point for both formats — LSP-style features, error-hint generators, source-map producers — without writing the dispatch wrapper at every call site.
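
The routing rule described above can be sketched in a few lines. This is not the library's implementation: the Route type and the route function are hypothetical names, and the sketch skips only the whitespace that string.trim_start removes, leaving BOM handling out.

```gleam
import gleam/string

/// Which parser a document would be routed to in this sketch.
pub type Route {
  Json
  Yaml
}

/// Hypothetical helper: choose a route from the first non-whitespace
/// character of the input.
pub fn route(content: String) -> Route {
  // Skip leading spaces, tabs, and blank lines before inspecting the input.
  let trimmed = string.trim_start(content)
  case trimmed {
    // An object or array root is treated as JSON.
    "{" <> _ -> Json
    "[" <> _ -> Json
    // Anything else, including empty input, falls through to YAML.
    _ -> Yaml
  }
}
```

A document such as "  {\"openapi\": \"3.1.0\"}" would take the Json route, while "openapi: 3.0.3" would take the Yaml route.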

pub fn parse_string_with_limits(
  content: String,
  limits: ParseLimits,
) -> Result(
  @internal OpenApiSpec(@internal Unresolved),
  diagnostic.Diagnostic,
)

Parse an OpenAPI spec with DoS-aware resource limits. Currently enforces limits.max_input_bytes before parsing begins; the other fields on ParseLimits are reserved for future enforcement (see issue #553).

The byte cap is checked via string.byte_size so the function returns immediately on oversized input rather than handing it to yamerl / json:decode/3 (both of which allocate proportional tree memory before the size could be discovered downstream).

Returns the same Diagnostic-bearing Result as parse_string when the limit is satisfied; returns a structured parse_limit_exceeded diagnostic when the limit is exceeded.
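
The check-before-parse pattern can be illustrated with a standalone sketch. It assumes the cap is inclusive (input of exactly max_bytes is accepted), which the documentation does not state. SizeError and check_input_size are hypothetical names; the real function returns a diagnostic.Diagnostic rather than this error type.

```gleam
import gleam/string

/// Hypothetical error type for this sketch only.
pub type SizeError {
  InputTooLarge(actual_bytes: Int, max_bytes: Int)
}

/// Reject oversized input before any parsing work is attempted.
pub fn check_input_size(
  content: String,
  max_bytes: Int,
) -> Result(String, SizeError) {
  // byte_size counts UTF-8 bytes, not characters, so multi-byte text
  // is measured by the memory it actually occupies.
  let actual = string.byte_size(content)
  case actual > max_bytes {
    True -> Error(InputTooLarge(actual_bytes: actual, max_bytes: max_bytes))
    // Within the cap: hand the content on to the parser unchanged.
    False -> Ok(content)
  }
}
```

Measuring bytes rather than characters matters for the cap: a string of multi-byte characters can occupy several times its character count in memory.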

pub fn parse_string_with_locations(
  content: String,
) -> Result(
  #(
    @internal OpenApiSpec(@internal Unresolved),
    @internal LocationIndex,
  ),
  diagnostic.Diagnostic,
)

Same as parse_string but also returns the YAML LocationIndex built from the input. Caller-side companion to parse_file_with_locations (Issue #411).
