View Source Pdf.Reader.CID.Decoder (ExPDF v1.0.1)

CID font decoder for Type0/Identity-H and Identity-V composite fonts.

Returns a decoder_fn() closure with the same contract as the simple-font decoder: (binary() -> {String.t(), [{non_neg_integer(), binary()}]}).

Resolution cascade (per CID)

  1. ToUnicode CMap — if the font has a /ToUnicode stream, its bf_char/ bf_range entries are checked first (most specific).
  2. Adobe registry table/CIDSystemInfo /Ordering maps to one of the four bundled collection modules (AdobeJapan1, AdobeCNS1, AdobeKorea1, AdobeGB1). O(1) pattern-match dispatch.
  3. U+FFFD fallback — unresolved CIDs yield U+FFFD plus a sentinel tuple {idx, "cid:0xHHHH"} appended to the unresolved list.

__test_cmap__ shortcut

For unit tests, a pre-parsed %Pdf.Reader.CMap{} can be injected by storing it in the font dict under the key "__test_cmap__". This bypasses stream resolution. (Mirrors the same shortcut in Pdf.Reader.Font.)

Width / advance computation

This module handles character decoding only (bytes → Unicode text). Glyph advance widths (/W and /DW entries on the DescendantFonts[0] dict) are read separately by Pdf.Reader.Font.Widths (§ 9.7.4.3). The two concerns are intentionally kept in separate modules: decoding and advance computation are independent of each other.

Spec references

Summary

Functions

Build a CID decoder closure from a Type0 font dict.

Build a predefined CMap decoder closure from a Type0 font dict whose /Encoding names a bundled predefined CMap (e.g. UniJIS-UTF16-H).

Types

@type decoder_fn() :: (binary() -> {String.t(), [{non_neg_integer(), binary()}]})

Functions

@spec build(map(), Pdf.Reader.Document.t()) ::
  {:ok, decoder_fn(), Pdf.Reader.Document.t()} | {:error, term()}

Build a CID decoder closure from a Type0 font dict.

font_dict is the top-level Type0 font dictionary (already resolved). Reads DescendantFonts, CIDSystemInfo, CIDToGIDMap, and ToUnicode.

Returns {:ok, decoder_fn, updated_doc}.

Link to this function

build_predefined(font_dict, doc)

View Source
@spec build_predefined(map(), Pdf.Reader.Document.t()) ::
  {:ok, decoder_fn(), Pdf.Reader.Document.t()} | {:error, term()}

Build a predefined CMap decoder closure from a Type0 font dict whose /Encoding names a bundled predefined CMap (e.g. UniJIS-UTF16-H).

Resolution cascade per code token (PDF 1.7 § 9.7.5, D9):

  1. ToUnicode CMap — if present, checked first (most specific).
  2. Predefined CMapcidcharcidrangenotdef lookup.
  3. Adobe registry table — CID → Unicode via AdobeJapan1/CNS1/Korea1/GB1.
  4. U+FFFD fallback — unresolved codes yield U+FFFD + sentinel.

Returns {:ok, decoder_fn, updated_doc}.