Pdf.Reader.Page (ExPDF v1.0.1)
Page tree walker for Pdf.Reader.
Spec reference: PDF 1.7 § 7.7.3 (Page Tree), § 7.7.3.4 (Inheritance of Page Attributes).
Page tree structure
The Catalog's /Pages entry points to the root of the page tree.
A node with /Type /Pages is an intermediate node containing a /Kids
array of refs to child nodes (either /Pages or /Page).
A node with /Type /Page is a leaf — one actual page.
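The following minimal Elixir sketch shows how a depth-first walk over this structure yields pages in document order. It uses a hypothetical representation, not ExPDF's internal data model: objects live in a map keyed by `{obj_num, gen_num}` refs, and each dictionary uses string keys such as `"Type"` and `"Kids"`. The module name `PageTreeSketch` is illustrative. Unlike `list_refs/1`, it silently skips missing or malformed nodes and does not guard against cycles in `/Kids`.

```elixir
defmodule PageTreeSketch do
  # Hypothetical representation (NOT ExPDF's data model):
  # `objects` maps {obj_num, gen_num} refs to dicts with string keys.

  # Collects leaf /Page refs reachable from `ref`, depth-first and
  # left to right through each /Kids array, which is document order.
  def leaf_refs(objects, ref) do
    case Map.fetch(objects, ref) do
      {:ok, %{"Type" => "Pages", "Kids" => kids}} ->
        # Intermediate node: recurse into each child in /Kids order.
        Enum.flat_map(kids, &leaf_refs(objects, &1))

      {:ok, %{"Type" => "Page"}} ->
        # Leaf node: one actual page.
        [ref]

      _ ->
        # Missing object or unexpected type: contributes no pages here.
        []
    end
  end
end
```

Fully finishing one child's subtree before moving to its next sibling is what keeps the result in page order.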
API
list_refs(doc) :: {:ok, [ref], updated_doc} | {:error, reason}
Walks the tree recursively, collecting leaf /Page refs in document order.
Threads doc forward so that resolved objects accumulate in the cache.
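Threading the doc through a recursive walk can be sketched with `Enum.map_reduce/3`, which carries an accumulator (here, the doc) through each child in turn. The doc shape below (a map whose `:objects` field stands in for the file and whose `:cache` field holds resolved objects), the `resolve/2` helper, and the module name are all hypothetical; they are not the real `Pdf.Reader.Document` struct or its API.

```elixir
defmodule ThreadedWalkSketch do
  # Hypothetical doc shape: %{objects: %{ref => dict}, cache: %{ref => dict}}.
  # `objects` stands in for parsing the file; `cache` accumulates resolutions.

  # Resolves `ref`, reading the cache first and filling it on a miss.
  # Missing objects are cached as an empty dict in this sketch.
  defp resolve(doc, ref) do
    case Map.fetch(doc.cache, ref) do
      {:ok, dict} ->
        {dict, doc}

      :error ->
        dict = Map.get(doc.objects, ref, %{})
        {dict, %{doc | cache: Map.put(doc.cache, ref, dict)}}
    end
  end

  # Returns {leaf_refs_in_document_order, updated_doc}.
  def walk(doc, ref) do
    case resolve(doc, ref) do
      {%{"Type" => "Pages", "Kids" => kids}, doc} ->
        # Pass the doc returned by each child into the next child,
        # so every resolution lands in the final cache.
        {nested, doc} = Enum.map_reduce(kids, doc, fn kid, acc -> walk(acc, kid) end)
        {Enum.concat(nested), doc}

      {%{"Type" => "Page"}, doc} ->
        {[ref], doc}

      {_other, doc} ->
        {[], doc}
    end
  end
end
```

Because the updated doc is returned rather than mutated, a second walk over the same doc hits the cache for every node and returns the doc unchanged.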
Catalog/Pages tree fallback (R-4)
When doc.recover_mode is true and the normal tree walk fails (missing
/Root, dangling /Pages ref, or other catalog resolution error), the
recovery branch scans the xref table directly for objects that match ALL of:
- /Type /Page in the object dict
- Either /Contents OR /Parent present (disambiguates from Form XObjects, which also carry /Type /XObject /Subtype /Form)
The recovered list is in xref-insertion order, NOT document order. This
limitation is by design: reconstructing the original order from a corrupt
tree is unreliable. A {:page_tree_recovered, n} event is appended to the
recovery_log so callers know page order may differ.
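The matching rule and the log event can be sketched as a filter over xref entries. This assumes hypothetical inputs: `entries` is a list of `{ref, dict}` pairs already in xref-insertion order, and `doc` is a plain map with a `:recovery_log` list. The module and function names are illustrative, not ExPDF's API.

```elixir
defmodule RecoveryScanSketch do
  # Hypothetical inputs (NOT ExPDF's real types):
  #   entries: [{ref, dict}] in xref-insertion order
  #   doc:     %{recovery_log: [...]}

  # A dict is page-like only if it has /Type /Page AND at least one
  # of /Contents or /Parent.
  def page_like?(%{"Type" => "Page"} = dict),
    do: Map.has_key?(dict, "Contents") or Map.has_key?(dict, "Parent")

  def page_like?(_dict), do: false

  # Keeps matching refs in the order they appear in `entries` and
  # appends {:page_tree_recovered, n} to the recovery log.
  def recover(entries, doc) do
    refs = for {ref, dict} <- entries, page_like?(dict), do: ref
    log = doc.recovery_log ++ [{:page_tree_recovered, length(refs)}]
    {:ok, refs, %{doc | recovery_log: log}}
  end
end
```

The comprehension preserves input order, so the result follows the xref table rather than any page numbering, which is exactly the limitation the log event reports.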
Known limitations (R-4)
- Page order loss: catalog-fallback page order follows xref-insertion order, not the original document order. /Parent chain reconstruction is not attempted (unreliable on corrupt trees). The {:page_tree_recovered, n} event explicitly signals this to callers.
- Encrypted AND corrupted PDFs: when both the xref table and the catalog are corrupt, the R-3 linear scan reconstructs the xref but cannot include /Encrypt in the synthetic trailer. Without /Encrypt, decryption cannot proceed, so the PDF cannot be decrypted even with recover: true.
Spec citations:
- PDF 1.7 § 7.7.2 — Document catalog (Catalog dict, /Pages entry)
- PDF 1.7 § 7.7.3 — Page tree (/Pages /Kids traversal)
- PDF 1.7 § 7.7.3.4 — Inheritance of page attributes
Functions
@spec list_refs(Pdf.Reader.Document.t()) :: {:ok, [Pdf.Reader.Document.ref()], Pdf.Reader.Document.t()} | {:error, term()}
Walks the page tree and returns a list of leaf /Page object refs in order.
Returns {:ok, refs, updated_doc} where:
- refs is [{obj_num, gen_num}] in page order (or xref order in fallback)
- updated_doc has its cache populated from the traversal
Returns {:error, reason} if the page tree cannot be traversed and
recover_mode is false.
When recover_mode is true and traversal fails, it falls back to the xref scan
and appends {:page_tree_recovered, n} to recovery_log.
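A caller can tell whether the returned order is trustworthy by inspecting the log on the updated doc. The sketch below only assumes the return shapes documented above plus a hypothetical `:recovery_log` list field on the doc; the module name `ResultHandlingSketch` and its return tuples are illustrative.

```elixir
defmodule ResultHandlingSketch do
  # Interprets a list_refs/1-shaped result. Assumes (hypothetically) that
  # the updated doc exposes :recovery_log as a list of event tuples.
  def describe({:ok, refs, doc}) do
    # First {:page_tree_recovered, n} event in the log, or nil if none.
    recovered =
      Enum.find_value(doc.recovery_log, fn
        {:page_tree_recovered, n} -> n
        _ -> nil
      end)

    case recovered do
      nil -> {:document_order, length(refs)}
      n -> {:xref_order, n}
    end
  end

  # Traversal failed and recover_mode was false.
  def describe({:error, reason}), do: {:failed, reason}
end
```

A caller would pass the return value of `Pdf.Reader.Page.list_refs(doc)` to `describe/1` and, on `{:xref_order, n}`, avoid relying on page numbering.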