oaspec/openapi/parser
Types
Configuration for parse_string_with_limits.
Each field caps a parser-side resource that an attacker-controlled
or accidentally-pathological spec could exhaust. The defaults
returned by default_limits are sized for real-world specs
(Stripe / GitHub / AsyncAPI all fit comfortably) and are tight
enough that a CI runner targeting an attacker-supplied spec is
not a denial-of-service surface.
Currently enforced:
max_input_bytes: the size of content in bytes. Checked before any parser work begins, so a 100 MB pathological input is rejected before yamerl or json:decode/3 allocates a tree.
Documented but not yet enforced (future work — issue #553 tracks the rest):
max_schema_depth, max_allof_chain, max_external_ref_hops, max_paths, max_parameters_per_op. Constructing these limits in the type today lets callers pin the contract; the parser will start enforcing them in follow-up PRs.
pub type ParseLimits {
ParseLimits(
max_input_bytes: Int,
max_schema_depth: Int,
max_allof_chain: Int,
max_external_ref_hops: Int,
max_paths: Int,
max_parameters_per_op: Int,
)
}
Constructors
- ParseLimits(max_input_bytes: Int, max_schema_depth: Int, max_allof_chain: Int, max_external_ref_hops: Int, max_paths: Int, max_parameters_per_op: Int)
Values
pub fn default_limits() -> ParseLimits
Project-default limits sized for real-world specs.
max_input_bytes: 16 MiB. Stripe’s full OpenAPI is ~6 MB and GitHub’s REST API is ~12 MB; 16 MiB clears both with headroom.
max_schema_depth: 100. Real specs rarely nest beyond ~12.
max_allof_chain: 32.
max_external_ref_hops: 16.
max_paths: 4096. Stripe (~1k operations), GitHub (~1k), and AsyncAPI (~50) all fit comfortably.
max_parameters_per_op: 64. The largest real-world operation the audit found has ~20 parameters.
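As an illustration of adjusting a single cap while keeping the other defaults, the sketch below uses Gleam's record-update syntax on the value returned by default_limits. The helper name upload_limits and the 1 MiB figure are assumptions made for this example, not part of the package.

```gleam
import oaspec/openapi/parser

/// Hypothetical helper: project defaults with a tighter 1 MiB input cap.
pub fn upload_limits() -> parser.ParseLimits {
  let defaults = parser.default_limits()
  // Override only max_input_bytes; every other field keeps its default.
  parser.ParseLimits(..defaults, max_input_bytes: 1_048_576)
}
```

Because only max_input_bytes is enforced today, tightening the other fields currently has no effect on parsing.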
pub fn parse_error_to_string(
error: diagnostic.Diagnostic,
) -> String
Convert a parse error to a human-readable string.
pub fn parse_file(
path: String,
) -> Result(
@internal OpenApiSpec(@internal Unresolved),
diagnostic.Diagnostic,
)
Parse an OpenAPI spec from a file path.
Supports both YAML (.yaml, .yml) and JSON (.json) files.
After parsing, resolves relative-file $ref values across schemas,
parameters, request bodies, responses, and path items — including
nested object/array properties and composition branches — by loading
the referenced files from disk and merging their definitions into the
main spec. Cyclic external ref graphs (A.yaml → B.yaml → A.yaml)
fail fast with a dedicated diagnostic. HTTP/HTTPS URLs are not
followed.
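A minimal usage sketch, assuming the caller only wants a pass/fail message: it calls parse_file and renders any diagnostic with parse_error_to_string. The function name check_spec_file and the "ok" message are hypothetical.

```gleam
import oaspec/openapi/parser

/// Hypothetical helper: returns "ok" or a rendered diagnostic for `path`.
pub fn check_spec_file(path: String) -> String {
  case parser.parse_file(path) {
    // The parsed spec is ignored here; a real caller would pass it on.
    Ok(_spec) -> "ok"
    Error(diagnostic) -> parser.parse_error_to_string(diagnostic)
  }
}
```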
pub fn parse_file_with_progress_and_locations(
path: String,
reporter: @internal Reporter,
) -> Result(
#(
@internal OpenApiSpec(@internal Unresolved),
@internal LocationIndex,
),
diagnostic.Diagnostic,
)
Combined parse_file entry point that accepts a Reporter and
also returns the top-level YAML LocationIndex (issues #411 and
#352). The CLI uses this so that capability-check diagnostics carry a
path:line:column: prefix while progress lines are emitted on big
specs. Library callers that don’t need progress should pass
progress.noop(); callers that don’t need locations can discard
the second tuple element.
pub fn parse_json_string(
content: String,
) -> Result(
@internal OpenApiSpec(@internal Unresolved),
diagnostic.Diagnostic,
)
Parse an OpenAPI spec from a JSON string using OTP’s native JSON
decoder instead of yamerl. Roughly two orders of magnitude faster
than parse_string on large specs because the YAML pre-processing
and constructor passes are skipped (issue #352). Behaves like
parse_string once the tree is built — same OpenApiSpec shape,
same downstream pipeline. Diagnostics from this path do not carry
source line/column info because OTP json:decode/3 does not
expose decoder positions; the caller still gets the path-prefixed
error message that downstream tooling relies on.
pub fn parse_json_string_with_locations(
content: String,
) -> Result(
#(
@internal OpenApiSpec(@internal Unresolved),
@internal LocationIndex,
),
diagnostic.Diagnostic,
)
JSON variant of parse_string_with_locations.
OTP’s json:decode/3 does not expose token positions, so the
returned LocationIndex is always empty (location_index.empty()).
Capability-check diagnostics from a JSON-only spec therefore carry
NoSourceLoc, while diagnostics from the YAML path can carry
line/column info via the index. Downstream tooling that wants to
dispatch over both formats with one signature should reach for
parse_string_or_json_with_locations, which inspects the first
non-whitespace byte to pick between the two parsers.
Returning the empty index instead of an Option(LocationIndex) keeps
the type identical to parse_string_with_locations, so callers do
not need a separate code path — they only lose location-aware
diagnostics on the JSON branch.
pub fn parse_string(
content: String,
) -> Result(
@internal OpenApiSpec(@internal Unresolved),
diagnostic.Diagnostic,
)
Parse an OpenAPI spec from a YAML/JSON string. The default path
runs the input through yamerl, which preserves YAML semantics and
source locations but is too slow on large JSON specs (the GitHub
REST OpenAPI is ~12 MB and yamerl effectively hangs — see issue
#352). Use parse_json_string directly when the content is known
to be JSON.
parse_string does not apply the DoS limits documented in
ParseLimits. Reach for parse_string_with_limits when the input
is attacker-controlled or sourced from an untrusted file system
(admin-uploaded specs, contract-validation pipelines, CI runners
over user-supplied specs) — see issue #553.
YAML 1.1 type coercion: parse_string vs parse_json_string.
yamerl applies YAML 1.1 implicit-type rules to scalars before they
reach metamon’s tree walker. The OTP json:decode/3 frontend used
by parse_json_string does not. The two parsers therefore diverge
on the same JSON bytes whenever a value matches a YAML 1.1
implicit-type pattern:
| JSON literal | parse_string (yamerl, YAML 1.1) | parse_json_string (OTP) |
|---|---|---|
| "version": "Yes" | bool True | string "Yes" |
| "role": "No" | bool False | string "No" |
| "flag": "On" / "Off" | bool True / False | string "On" / "Off" |
| "version": 1.10 | float 1.1 (trailing zero lost) | float 1.10 |
| "hex": 0x10 | int 16 (yamerl extension) | parse error (not valid JSON) |
For JSON OpenAPI documents — Stripe, GitHub, AsyncAPI, etc. — prefer
parse_json_string (or parse_string_or_json_with_locations,
which auto-routes by inspecting the first non-whitespace byte).
parse_string remains correct for YAML input and for JSON inputs
whose values do not collide with YAML 1.1 implicit-type patterns.
pub fn parse_string_or_json_with_locations(
content: String,
) -> Result(
#(
@internal OpenApiSpec(@internal Unresolved),
@internal LocationIndex,
),
diagnostic.Diagnostic,
)
Auto-dispatch over parse_string_with_locations (YAML) and
parse_json_string_with_locations (JSON) based on the first
non-whitespace byte of content.
{ and [ route to the JSON parser (orders of magnitude faster on
large specs — see parse_json_string); anything else routes to
the YAML parser. The dispatch covers the conventional OpenAPI
document shapes (object root for full specs, array root for the
rare top-level component lists). Whitespace prefixes (BOM, leading
spaces, blank lines) are skipped before the discriminator byte
is inspected.
Use this when downstream tooling needs a single entry point for both formats — LSP-style features, error-hint generators, source-map producers — without writing the dispatch wrapper at every call site.
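The first-byte rule described above can be sketched in a few lines of plain Gleam. This is an illustrative approximation, not the package's implementation: the Format type and detect_format function are hypothetical, and the sketch relies on gleam/string's trim_start for ordinary whitespace plus an explicit byte-order-mark check.

```gleam
import gleam/string

/// Hypothetical type for this sketch; not part of oaspec.
pub type Format {
  Json
  Yaml
}

/// Guess the format from the first non-whitespace character.
pub fn detect_format(content: String) -> Format {
  // Drop a leading byte-order mark, if present.
  let without_bom = case content {
    "\u{FEFF}" <> rest -> rest
    _ -> content
  }
  // `{` or `[` suggests JSON; everything else is treated as YAML.
  case string.trim_start(without_bom) {
    "{" <> _ -> Json
    "[" <> _ -> Json
    _ -> Yaml
  }
}
```

A YAML flow mapping that starts with `{` would also be classified as JSON under this rule, which is the trade-off of inspecting a single character.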
pub fn parse_string_with_limits(
content: String,
limits: ParseLimits,
) -> Result(
@internal OpenApiSpec(@internal Unresolved),
diagnostic.Diagnostic,
)
Parse an OpenAPI spec with DoS-aware resource limits. Currently
enforces limits.max_input_bytes before parsing begins; the other
fields on ParseLimits are reserved for future enforcement (see
issue #553).
The byte cap is checked via string.byte_size so the function
returns immediately on oversized input rather than handing it to
yamerl / json:decode/3 (both of which allocate proportional
tree memory before the size could be discovered downstream).
Returns the same Diagnostic-bearing Result as parse_string
when the limit is satisfied; returns a structured
parse_limit_exceeded diagnostic when the limit is exceeded.
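A possible wrapper for untrusted input is sketched below. It passes caller-supplied limits to parse_string_with_limits and turns any diagnostic, including the limit-exceeded one, into a string via parse_error_to_string. The name validate_untrusted and its Result(Nil, String) return shape are assumptions for this example.

```gleam
import oaspec/openapi/parser

/// Hypothetical wrapper: Ok(Nil) if `content` parses within `limits`,
/// otherwise the rendered diagnostic.
pub fn validate_untrusted(
  content: String,
  limits: parser.ParseLimits,
) -> Result(Nil, String) {
  case parser.parse_string_with_limits(content, limits) {
    // The parsed spec is discarded; a real caller would keep it.
    Ok(_spec) -> Ok(Nil)
    Error(diagnostic) -> Error(parser.parse_error_to_string(diagnostic))
  }
}
```

Callers without special requirements could pass parser.default_limits() as the second argument.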
pub fn parse_string_with_locations(
content: String,
) -> Result(
#(
@internal OpenApiSpec(@internal Unresolved),
@internal LocationIndex,
),
diagnostic.Diagnostic,
)
Same as parse_string but also returns the YAML LocationIndex
built from the input. Caller-side companion to
parse_file_with_locations (Issue #411).