# `SignCore.PDF.Reader`
[🔗](https://github.com/utaladriz/pkcs11ex/blob/v0.1.0/lib/sign_core/pdf/reader.ex#L1)

Minimal PDF trailer / xref scanner for the PAdES adapter.

Scope is limited to file-level structure, namely the four primitives
the Phase 4 plan calls out:

  1. Locate `startxref` and the most-recent xref offset.
  2. Parse the text-format xref subsections at that offset.
  3. Extract `/Size`, `/Root`, `/Prev` from the trailer dict.
  4. Walk the `/Prev` chain across revisions.
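Primitive 3 can be pictured as a handful of whitespace-tolerant regexes over the trailer dict. The sketch below is illustrative only: `ReaderSketch.Trailer.fields/1` and its error atoms are hypothetical names, and it assumes direct integer values plus an indirect `/Root` reference of the form `N G R`.

```elixir
defmodule ReaderSketch.Trailer do
  # Hypothetical helper (not part of SignCore.PDF.Reader): extracts /Size,
  # /Root and /Prev from a trailer dict body. Assumes direct integers and an
  # indirect /Root reference "N G R". Error atoms are illustrative only.

  def fields(trailer) when is_binary(trailer) do
    with {:ok, size} <- int(trailer, ~r{/Size\s+(\d+)}),
         [_, num, gen] <- Regex.run(~r{/Root\s+(\d+)\s+(\d+)\s+R}, trailer) || :no_root do
      # /Prev is absent in the original (oldest) revision.
      prev =
        case int(trailer, ~r{/Prev\s+(\d+)}) do
          {:ok, offset} -> offset
          :error -> nil
        end

      root = {String.to_integer(num), String.to_integer(gen)}
      {:ok, %{size: size, root: root, prev: prev}}
    else
      :error -> {:error, {:malformed_pdf, :trailer_missing_size}}
      :no_root -> {:error, {:malformed_pdf, :trailer_missing_root}}
    end
  end

  defp int(bin, regex) do
    case Regex.run(regex, bin, capture: :all_but_first) do
      [digits] -> {:ok, String.to_integer(digits)}
      nil -> :error
    end
  end
end
```

A trailer such as `<< /Size 6 /Root 1 0 R /Prev 1234 >>` yields `{:ok, %{size: 6, root: {1, 0}, prev: 1234}}`.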

Out of scope (deliberately): content streams, encoded streams, page
resources, font dictionaries, and any structured parsing of
indirect-object bodies. The object-level helpers below hand back raw,
undecoded bytes only. None of this is required for incremental
signature emission or for recomputing the `/ByteRange` covered by a
`/Sig`.

Cross-reference streams (PDF 1.5+, `/Type /XRef`) are not handled in
v1 and surface as `{:error, {:malformed_pdf, :xref_stream_unsupported}}`.
Per the Phase 4 plan, the writer always emits the legacy text-format
xref (still legal in PDF 1.7+); the reader is the side that needs to
tolerate vendor variation, and we accept the limitation until a real
corpus argues for `lopdf` on the verify path.
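One plausible way to tell the two layouts apart is to look at what sits at the `startxref` offset: a text table begins with the `xref` keyword, while a cross-reference stream begins with an indirect-object header (`N G obj`). This is a sketch under that assumption; `ReaderSketch.XrefKind` and the `:bad_xref_offset` atom are hypothetical, and the real module's detection logic may differ.

```elixir
defmodule ReaderSketch.XrefKind do
  # Hypothetical sketch: classify the structure at the offset `startxref`
  # points at. Assumes the offset lands exactly on the first byte.

  def classify(pdf, offset)
      when is_integer(offset) and offset >= 0 and offset < byte_size(pdf) do
    tail = binary_part(pdf, offset, byte_size(pdf) - offset)

    cond do
      # Legacy text-format table: the only layout v1 reads.
      String.starts_with?(tail, "xref") ->
        {:ok, :text}

      # "7 0 obj ..." means an xref stream (PDF 1.5+, /Type /XRef).
      Regex.match?(~r/\A\d+\s+\d+\s+obj\b/, tail) ->
        {:error, {:malformed_pdf, :xref_stream_unsupported}}

      true ->
        {:error, {:malformed_pdf, :bad_xref_offset}}
    end
  end

  def classify(_pdf, _offset), do: {:error, {:malformed_pdf, :bad_xref_offset}}
end
```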

# `error`

```elixir
@type error() :: {:malformed_pdf, atom()}
```

Reader error. Always carries `:malformed_pdf` as the class atom.

# `merged_xref_offsets`

```elixir
@spec merged_xref_offsets(binary()) ::
  {:ok, %{required(non_neg_integer()) => non_neg_integer()}} | {:error, error()}
```

Returns the merged xref offsets across every revision in the PDF —
newest entry per object number wins (incremental updates override).

Used by the verify path to enumerate indirect objects by number
without picking older revisions of objects that were superseded.
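"Newest wins" amounts to folding the per-revision maps from oldest to newest so that `Map.merge/2` overwrites superseded offsets. A minimal sketch, assuming each revision's in-use xref entries are already a plain `%{object_number => offset}` map and the list is newest-first as `revisions/1` returns it; `ReaderSketch.Merge` is a hypothetical name, and free (`f`) entries are ignored here.

```elixir
defmodule ReaderSketch.Merge do
  # Hypothetical sketch: `newest_first` is a list of %{obj_num => offset}
  # maps, newest revision first. Reversing and folding forward lets each
  # newer map overwrite the entries it supersedes.

  @spec merge([%{non_neg_integer() => non_neg_integer()}]) ::
          %{non_neg_integer() => non_neg_integer()}
  def merge(newest_first) do
    newest_first
    |> Enum.reverse()
    |> Enum.reduce(%{}, fn xref, acc -> Map.merge(acc, xref) end)
  end
end
```

With `[%{3 => 900, 5 => 950}, %{1 => 15, 2 => 80, 3 => 140}]`, object 3 resolves to offset 900 from the incremental update, not 140 from the original.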

# `next_object_number`

```elixir
@spec next_object_number(binary()) :: {:ok, non_neg_integer()} | {:error, error()}
```

Returns the next free indirect-object number, derived from the
most-recent revision's `/Size`. PAdES incremental updates allocate
fresh object numbers starting here.
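The arithmetic works because the PDF specification defines `/Size` as one greater than the highest object number in the file, so `/Size` itself is the first unused number. The sketch below (hypothetical `ReaderSketch.Alloc.allocate/2`) shows how an update appending `count` objects would number them and what `/Size` its own trailer would need.

```elixir
defmodule ReaderSketch.Alloc do
  # Hypothetical sketch: allocate `count` fresh object numbers starting at
  # the current /Size and return them with the new trailer /Size.

  @spec allocate(pos_integer(), non_neg_integer()) :: {[non_neg_integer()], pos_integer()}
  def allocate(size, 0), do: {[], size}

  def allocate(size, count) when count > 0 do
    numbers = Enum.to_list(size..(size + count - 1))
    {numbers, size + count}
  end
end
```

For a file whose newest trailer says `/Size 12`, `allocate(12, 3)` returns `{[12, 13, 14], 15}`.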

# `parse`

```elixir
@spec parse(binary()) :: {:ok, SignCore.PDF.Reader.Revision.t()} | {:error, error()}
```

Convenience wrapper: locates the most-recent xref and reads it,
returning the newest `Revision`.

# `read_catalog_body`

```elixir
@spec read_catalog_body(binary()) :: {:ok, binary()} | {:error, error()}
```

Returns the catalog dict body (the bytes between `<<` and `>>` of the
object pointed at by `/Root`). The catalog is what an incremental
update must re-emit when adding a `/Sig` field — its `/AcroForm` and
`/Pages` entries need to be preserved.

Returns `{:error, {:malformed_pdf, :catalog_not_indirect}}` if the
catalog body isn't a plain dict. This is rare: it would happen only if
`/Root` pointed at an object stored in an object stream.

# `read_dict_at`

```elixir
@spec read_dict_at(binary(), non_neg_integer()) ::
  {:ok, binary()} | {:error, error() | :not_a_dict}
```

Returns the dict body (the bytes between the object's outer `<<`
and matching `>>`) for the object at `offset`. Returns
`{:error, :not_a_dict}` for objects whose body is not a plain dict
(streams, primitives).
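Finding the *matching* `>>` needs more than a substring search: dicts nest, and hex strings such as a `/Contents <...>` value can sit flush against the closing `>>`. The sketch below is one way to bound a dict body, assuming the input starts exactly at the opening `<<`; `ReaderSketch.Dict.body/1` and `:unterminated` are hypothetical, and PDF comments are not handled.

```elixir
defmodule ReaderSketch.Dict do
  # Hypothetical sketch: depth-count nested "<<"/">>" pairs and skip hex
  # strings <...> and literal strings (...) whole, so a ">>" inside a string
  # cannot close the dict early. PDF comments (%) are not handled.

  def body(<<"<<", rest::binary>>) do
    case scan(rest, 1, 0) do
      {:ok, len} -> {:ok, binary_part(rest, 0, len)}
      :error -> {:error, :unterminated}
    end
  end

  def body(_other), do: {:error, :not_a_dict}

  # `pos` is the number of body bytes consumed so far.
  defp scan(<<">>", _::binary>>, 1, pos), do: {:ok, pos}
  defp scan(<<">>", rest::binary>>, depth, pos), do: scan(rest, depth - 1, pos + 2)
  defp scan(<<"<<", rest::binary>>, depth, pos), do: scan(rest, depth + 1, pos + 2)

  defp scan(<<"<", rest::binary>>, depth, pos) do
    # Hex string: skip through the first ">".
    case :binary.split(rest, ">") do
      [hex, after_hex] -> scan(after_hex, depth, pos + byte_size(hex) + 2)
      [_no_close] -> :error
    end
  end

  defp scan(<<"(", rest::binary>>, depth, pos) do
    case skip_literal(rest, 1, 0) do
      {:ok, after_lit, n} -> scan(after_lit, depth, pos + 1 + n)
      :error -> :error
    end
  end

  defp scan(<<_byte, rest::binary>>, depth, pos), do: scan(rest, depth, pos + 1)
  defp scan(<<>>, _depth, _pos), do: :error

  # Literal strings may nest balanced parentheses and use "\" as an escape.
  defp skip_literal(<<"\\", _escaped, rest::binary>>, n, c), do: skip_literal(rest, n, c + 2)
  defp skip_literal(<<"(", rest::binary>>, n, c), do: skip_literal(rest, n + 1, c + 1)
  defp skip_literal(<<")", rest::binary>>, 1, c), do: {:ok, rest, c + 1}
  defp skip_literal(<<")", rest::binary>>, n, c), do: skip_literal(rest, n - 1, c + 1)
  defp skip_literal(<<_byte, rest::binary>>, n, c), do: skip_literal(rest, n, c + 1)
  defp skip_literal(<<>>, _n, _c), do: :error
end
```

For `<< /Type /Sig /Contents <00AB>>>` this returns `{:ok, " /Type /Sig /Contents <00AB>"}`; a naive search for `>>` would have stopped inside the hex string.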

# `read_object_body`

```elixir
@spec read_object_body(binary(), non_neg_integer()) ::
  {:ok, binary()} | {:error, error()}
```

Reads the textual body of the indirect object at the given offset.
Returns the bytes between `obj` and `endobj`, trimmed.

Used by the Writer to extract the catalog dict so an incremental
update can re-emit it with a merged `/AcroForm` entry. Does not
parse stream contents; the bytes are returned verbatim.
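As a rough illustration of "the bytes between `obj` and `endobj`, trimmed", the sketch below checks for an `N G obj` header at the offset and cuts at the first following `endobj`. `ReaderSketch.Object.body_at/2` and its error atoms are hypothetical; note the stated limitation that raw stream data containing the bytes `endobj` would fool this naive search.

```elixir
defmodule ReaderSketch.Object do
  # Hypothetical sketch: the "N G obj" header must start exactly at
  # `offset`. The body runs to the first "endobj"; stream data is not
  # decoded, so stream bytes that contain "endobj" would fool this search.

  def body_at(pdf, offset)
      when is_integer(offset) and offset >= 0 and offset < byte_size(pdf) do
    tail = binary_part(pdf, offset, byte_size(pdf) - offset)

    with [header] <- Regex.run(~r/\A\d+\s+\d+\s+obj/, tail),
         after_header = binary_part(tail, byte_size(header), byte_size(tail) - byte_size(header)),
         [body, _rest] <- :binary.split(after_header, "endobj") do
      {:ok, String.trim(body)}
    else
      nil -> {:error, {:malformed_pdf, :bad_object_header}}
      [_unterminated] -> {:error, {:malformed_pdf, :missing_endobj}}
    end
  end

  def body_at(_pdf, _offset), do: {:error, {:malformed_pdf, :bad_object_header}}
end
```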

# `read_revision`

```elixir
@spec read_revision(binary(), non_neg_integer()) ::
  {:ok, SignCore.PDF.Reader.Revision.t()} | {:error, error()}
```

Reads the xref table + trailer at the given offset and returns a
`Revision` describing this PDF revision.
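The text-format table is a sequence of subsections, each a `first_object count` header followed by `count` fixed-width entries (`offset generation n|f`). The sketch below parses one such section into an object-number-to-offset map. It is a simplified stand-in, not the module's implementation: `ReaderSketch.Xref.parse/1` and the error atoms are hypothetical, generations are discarded, and free entries are dropped.

```elixir
defmodule ReaderSketch.Xref do
  # Hypothetical sketch: parse a text-format xref section (starting at the
  # `xref` keyword) into %{object_number => offset} for in-use "n" entries.
  # Free "f" entries are dropped; parsing stops at the `trailer` keyword.

  def parse("xref" <> rest) do
    rest
    |> String.split(["\r\n", "\n", "\r"], trim: true)
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
    |> subsections(%{})
  end

  def parse(_other), do: {:error, {:malformed_pdf, :xref_keyword_missing}}

  defp subsections(["trailer" <> _ | _], acc), do: {:ok, acc}
  defp subsections([], _acc), do: {:error, {:malformed_pdf, :trailer_missing}}

  defp subsections([header | lines], acc) do
    case Regex.run(~r/\A(\d+)\s+(\d+)\z/, header, capture: :all_but_first) do
      [first, count] ->
        {rows, rest} = Enum.split(lines, String.to_integer(count))

        with {:ok, acc} <- entries(rows, String.to_integer(first), acc) do
          subsections(rest, acc)
        end

      nil ->
        {:error, {:malformed_pdf, :bad_xref_subsection}}
    end
  end

  # Entry i of a subsection describes object number first + i.
  defp entries(rows, first, acc) do
    rows
    |> Enum.with_index(first)
    |> Enum.reduce_while({:ok, acc}, fn {row, num}, {:ok, map} ->
      case Regex.run(~r/\A(\d{10})\s+\d{5}\s+([nf])\z/, row, capture: :all_but_first) do
        [offset, "n"] -> {:cont, {:ok, Map.put(map, num, String.to_integer(offset))}}
        [_offset, "f"] -> {:cont, {:ok, map}}
        nil -> {:halt, {:error, {:malformed_pdf, :bad_xref_entry}}}
      end
    end)
  end
end
```

An incremental update's table often has a `0 1` subsection for the free-list head followed by one for the new objects, e.g. `4 2`, which maps objects 4 and 5.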

# `revisions`

```elixir
@spec revisions(binary()) ::
  {:ok, [SignCore.PDF.Reader.Revision.t()]} | {:error, error()}
```

Walks the `/Prev` chain newest-first. The first element is the most
recent revision (the one `startxref` points at); the last is the
original.
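Walking `/Prev` is a simple loop, but a corrupt or hostile file can make `/Prev` point back at an offset already visited. The sketch below guards against that with a visited set. `ReaderSketch.Chain.walk/2` and `:prev_cycle` are hypothetical, and the `read` function stands in for `read_revision/2`, assumed to return `{:ok, %{prev: offset | nil}}`.

```elixir
defmodule ReaderSketch.Chain do
  # Hypothetical sketch: `read` stands in for read_revision/2 and must return
  # {:ok, %{prev: offset | nil}} or {:error, reason}. Revisions come back
  # newest-first; a visited-offset set turns a /Prev loop into an error.

  def walk(start_offset, read) when is_function(read, 1) do
    walk(start_offset, read, MapSet.new(), [])
  end

  # No /Prev: we have reached the original revision.
  defp walk(nil, _read, _seen, acc), do: {:ok, Enum.reverse(acc)}

  defp walk(offset, read, seen, acc) do
    if MapSet.member?(seen, offset) do
      {:error, {:malformed_pdf, :prev_cycle}}
    else
      case read.(offset) do
        {:ok, %{prev: prev} = rev} -> walk(prev, read, MapSet.put(seen, offset), [rev | acc])
        {:error, _} = err -> err
      end
    end
  end
end
```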

# `signature_dicts`

```elixir
@spec signature_dicts(binary()) ::
  {:ok, [{non_neg_integer(), binary()}]} | {:error, error()}
```

Returns the list of `{object_number, dict_body}` pairs for every
indirect object whose body is a dictionary containing `/Type /Sig`.

This is the canonical way to locate signature dicts: it ignores
comments, content-stream text that happens to mention `/Type /Sig`,
and superseded older revisions of the same object number. Each
returned dict body is bounded — only the dict content between its
outer `<<` and matching `>>`, suitable for whitespace-tolerant
regex extraction of `/ByteRange` and `/Contents`.
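Once a dict body is bounded, `/ByteRange` and `/Contents` can be pulled out with whitespace-tolerant regexes. A minimal sketch follows; `ReaderSketch.SigFields.extract/1` and `:bad_sig_dict` are hypothetical, and it assumes `/Contents` is a hex string (decoded with `Base.decode16/2`, trailing zero padding kept).

```elixir
defmodule ReaderSketch.SigFields do
  # Hypothetical sketch: /ByteRange is four integers
  # [offset1 length1 offset2 length2]; /Contents is a hex string whose
  # embedded whitespace is stripped before decoding.

  def extract(dict_body) when is_binary(dict_body) do
    range_re = ~r{/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]}

    with [_ | nums] <- Regex.run(range_re, dict_body),
         [_, hex] <- Regex.run(~r{/Contents\s*<([0-9A-Fa-f\s]*)>}, dict_body),
         {:ok, contents} <- Base.decode16(String.replace(hex, ~r/\s+/, ""), case: :mixed) do
      {:ok, %{byte_range: Enum.map(nums, &String.to_integer/1), contents: contents}}
    else
      _ -> {:error, {:malformed_pdf, :bad_sig_dict}}
    end
  end
end
```

For `/ByteRange [0 100 200 50]` the signed bytes are offsets 0 to 99 and 200 to 249; the gap between them (offsets 100 to 199) is where the `/Contents` value sits.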

# `startxref`

```elixir
@spec startxref(binary()) :: {:ok, non_neg_integer()} | {:error, error()}
```

Returns the byte offset stored in the file's terminating `startxref`
marker. Searches the last 8192 bytes: PDF 1.7 §7.5.5 places the marker
at the end of the file, just before `%%EOF`, and readers conventionally
expect it within the last 1 KiB, but real-world authoring tools emit
trailing whitespace that pushes the marker further back.
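Because every incremental update appends its own `startxref`, the search has to take the *last* occurrence in the window, not the first. A sketch of that search, assuming the offset follows the keyword after optional whitespace; `ReaderSketch.StartXref.find/1` and its error atoms are hypothetical.

```elixir
defmodule ReaderSketch.StartXref do
  # Hypothetical sketch: scan only the final 8192 bytes, take the last
  # "startxref" keyword (older revisions leave theirs earlier in the file),
  # and parse the integer that follows it.

  @window 8192

  def find(pdf) when is_binary(pdf) do
    start = max(byte_size(pdf) - @window, 0)
    tail = binary_part(pdf, start, byte_size(pdf) - start)

    case :binary.matches(tail, "startxref") do
      [] ->
        {:error, {:malformed_pdf, :startxref_missing}}

      matches ->
        {pos, len} = List.last(matches)
        after_kw = binary_part(tail, pos + len, byte_size(tail) - pos - len)

        case Regex.run(~r/\A\s*(\d+)/, after_kw, capture: :all_but_first) do
          [digits] -> {:ok, String.to_integer(digits)}
          nil -> {:error, {:malformed_pdf, :startxref_offset_missing}}
        end
    end
  end
end
```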

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*