Pdf.Reader.CMap (ExPDF v1.0.1)

Parser for the ToUnicode CMap subset used in PDF fonts.

Spec reference: PDF 1.7 § 9.10.3 and Adobe Tech Note 5099 (CMap and CIDFont Files Specification).

Supported subset

Only beginbfchar/endbfchar and beginbfrange/endbfrange sections are parsed. Everything else (codespacerange, cidchar, cidrange, notdefchar, notdefrange, and PostScript prologue/epilogue) is silently skipped.

Data shape

%Pdf.Reader.CMap{
  bf_char: %{integer => String.t()},       # O(log n) map lookup
  bf_range: [{lo, hi, dst}]                # linear scan, dst is String.t() or [String.t()]
}

Lookup order

  1. bf_char (O(log n) map): checked first.
  2. bf_range (linear, typically < 10 entries): checked on miss.

Returns nil if not mapped by either table.
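
The two-step lookup above can be sketched roughly as follows. This is an illustration, not the library's implementation: CMapLookupSketch and its private helpers are hypothetical names, and a plain map stands in for the struct. How a String.t() destination is resolved for codes above lo is an assumption here (the last codepoint is advanced by the offset into the range, following the usual bfrange semantics); the list form is assumed to hold one entry per code in the range.

```elixir
defmodule CMapLookupSketch do
  # Hypothetical stand-in for the documented lookup; accepts any map or
  # struct that has :bf_char and :bf_range fields.
  def lookup(%{bf_char: bf_char, bf_range: bf_range}, code) do
    case Map.fetch(bf_char, code) do
      {:ok, str} -> str
      :error -> lookup_range(bf_range, code)
    end
  end

  # Linear scan; the first range that contains the code and yields a value wins.
  defp lookup_range(ranges, code) do
    Enum.find_value(ranges, fn
      {lo, hi, dst} when code >= lo and code <= hi -> resolve(dst, code - lo)
      _ -> nil
    end)
  end

  # List form: assumed to hold one destination string per code in the range.
  defp resolve(dst, offset) when is_list(dst), do: Enum.at(dst, offset)

  # Empty destination: nothing to increment.
  defp resolve("", _offset), do: ""

  # String form: assumed to be the mapping for `lo`; the last codepoint is
  # advanced by the offset into the range.
  defp resolve(dst, offset) when is_binary(dst) do
    {init, [last]} = dst |> String.to_charlist() |> Enum.split(-1)
    List.to_string(init ++ [last + offset])
  end
end
```

With bf_range: [{0x41, 0x5A, "A"}] and an empty bf_char, CMapLookupSketch.lookup(cmap, 0x43) returns "C", and a code outside every range returns nil.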

UTF-16BE decoding

Destination hex strings in the CMap (<HHHH...>) are UTF-16BE encoded codepoint sequences; source hex strings are character codes and are read as integers. Erlang's :unicode.characters_to_binary/3 converts the destination bytes to UTF-8 (Elixir String.t()).
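
As a minimal sketch of that conversion, the hex digits can be turned into bytes with Base.decode16/2 and then transcoded. HexUtf16Sketch and decode/1 are hypothetical names used only for illustration; the tagged {:ok, _} | :error return shape is also an assumption, not the module's API. Whitespace inside the angle brackets and odd-length hex strings are not handled.

```elixir
defmodule HexUtf16Sketch do
  # Takes the hex digits found between < and > (for example "00660069") and
  # returns {:ok, utf8} or :error.
  def decode(hex) when is_binary(hex) do
    with {:ok, bytes} <- Base.decode16(hex, case: :mixed),
         utf8 when is_binary(utf8) <-
           :unicode.characters_to_binary(bytes, {:utf16, :big}, :utf8) do
      {:ok, utf8}
    else
      # Invalid hex digits, or bytes that are not valid UTF-16BE.
      _ -> :error
    end
  end
end
```

For example, HexUtf16Sketch.decode("00660069") returns {:ok, "fi"}, which is how a single character code can map to a multi-character ligature expansion. Surrogate pairs such as "D835DC00" decode to a single codepoint outside the Basic Multilingual Plane.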

Summary

Functions

lookup/2: Looks up a character code in the CMap.

parse/1: Parses a ToUnicode CMap binary into a %Pdf.Reader.CMap{} struct.

Types

@type t() :: %Pdf.Reader.CMap{
  bf_char: %{required(non_neg_integer()) => String.t()},
  bf_range: [{non_neg_integer(), non_neg_integer(), String.t() | [String.t()]}]
}

Functions

@spec lookup(t(), non_neg_integer()) :: String.t() | nil

Looks up a character code in the CMap.

Returns the corresponding UTF-8 String.t() or nil if not mapped. Lookup order: bf_char first (O(log n)), then bf_range (linear scan).

@spec parse(binary()) :: t()

Parses a ToUnicode CMap binary into a %Pdf.Reader.CMap{} struct.

Only bfchar and bfrange sections are extracted. All other PostScript CMap constructs are skipped silently.
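
To illustrate what extracting only the bfchar and bfrange sections can look like, here is a rough regex-based sketch. It is not the library's parser: CMapParseSketch and all of its helpers are hypothetical, it returns a plain map rather than the struct, it assumes hex strings contain no internal whitespace, and it raises on malformed hex instead of recovering.

```elixir
defmodule CMapParseSketch do
  # Collects every bfchar/bfrange section; anything outside those sections
  # (codespacerange, PostScript prologue, and so on) is never inspected.
  def parse(data) when is_binary(data) do
    ~r/begin(bfchar|bfrange)(.*?)end(?:bfchar|bfrange)/s
    |> Regex.scan(data, capture: :all_but_first)
    |> Enum.reduce(%{bf_char: %{}, bf_range: []}, fn
      ["bfchar", body], acc ->
        %{acc | bf_char: Map.merge(acc.bf_char, parse_bfchar(body))}

      ["bfrange", body], acc ->
        %{acc | bf_range: acc.bf_range ++ parse_bfrange(body)}
    end)
  end

  # Each bfchar entry is `<src> <dst>`.
  defp parse_bfchar(body) do
    ~r/<([[:xdigit:]]+)>\s*<([[:xdigit:]]*)>/
    |> Regex.scan(body, capture: :all_but_first)
    |> Map.new(fn [src, dst] -> {String.to_integer(src, 16), decode(dst)} end)
  end

  # Each bfrange entry is `<lo> <hi> <dst>` or `<lo> <hi> [<dst1> <dst2> ...]`.
  defp parse_bfrange(body) do
    ~r/<([[:xdigit:]]+)>\s*<([[:xdigit:]]+)>\s*(<[[:xdigit:]]*>|\[.*?\])/s
    |> Regex.scan(body, capture: :all_but_first)
    |> Enum.map(fn [lo, hi, dst] ->
      {String.to_integer(lo, 16), String.to_integer(hi, 16), parse_dst(dst)}
    end)
  end

  # Array form: decode every hex string inside the brackets.
  defp parse_dst("[" <> _ = list) do
    ~r/<([[:xdigit:]]*)>/
    |> Regex.scan(list, capture: :all_but_first)
    |> Enum.map(fn [hex] -> decode(hex) end)
  end

  # Single-string form: strip the angle brackets and decode.
  defp parse_dst("<" <> _ = single) do
    single |> String.trim_leading("<") |> String.trim_trailing(">") |> decode()
  end

  # UTF-16BE hex digits to UTF-8; raises on odd-length or non-hex input.
  defp decode(hex) do
    hex
    |> Base.decode16!(case: :mixed)
    |> :unicode.characters_to_binary({:utf16, :big}, :utf8)
  end
end
```

Given a stream that contains a codespacerange section, a bfchar entry `<0010> <00660069>`, and a bfrange entry `<0041> <005A> <0041>`, this sketch yields bf_char: %{0x10 => "fi"} and bf_range: [{0x41, 0x5A, "A"}]; the codespacerange is ignored.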