Pdf.Reader.CMap (ExPDF v1.0.1)

Parser for the ToUnicode CMap subset used in PDF fonts.

Spec reference: PDF 1.7 § 9.10.3 and Adobe Tech Note 5014 (CMap and CIDFont Files Specification).

Supported subset

Only beginbfchar/endbfchar and beginbfrange/endbfrange sections are parsed. Everything else (codespacerange, cidchar, cidrange, notdefchar, notdefrange, and PostScript prologue/epilogue) is silently skipped.
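As a rough illustration of what "only the bf sections are parsed" means in practice, the sketch below pulls source/destination pairs out of beginbfchar/endbfchar blocks with two regular expressions and ignores everything else. `BfSectionSketch` and `bf_char_pairs/1` are hypothetical names invented for this example, not part of ExPDF, and the regex approach is an assumption about one way to do it rather than a description of the library's parser. Destinations are left as raw hex here.

```elixir
defmodule BfSectionSketch do
  # Hypothetical helper for illustration only; not ExPDF's parser.
  # Returns [{source_code_integer, destination_hex_string}].
  def bf_char_pairs(cmap_text) do
    for [_, body] <- Regex.scan(~r/beginbfchar(.*?)endbfchar/s, cmap_text),
        [_, src, dst] <- Regex.scan(~r/<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>/, body) do
      # The source code is read as a plain hexadecimal integer.
      {String.to_integer(src, 16), dst}
    end
  end
end

sample = """
/CIDInit /ProcSet findresource begin
1 begincodespacerange
<00> <FF>
endcodespacerange
2 beginbfchar
<01> <0041>
<02> <00660069>
endbfchar
endcmap
"""

BfSectionSketch.bf_char_pairs(sample)
# => [{1, "0041"}, {2, "00660069"}]
```

The codespacerange entry `<00> <FF>` never reaches the result because it sits outside any beginbfchar/endbfchar block.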

Data shape

%Pdf.Reader.CMap{
  bf_char: %{integer => String.t()},       # O(log n) map lookup
  bf_range: [{lo, hi, dst}]                # linear scan, dst is String.t() or [String.t()]
}

Lookup order

1. bf_char (O(log n) map): checked first.
2. bf_range (linear scan, typically < 10 entries): checked only when bf_char misses.

lookup/2 returns nil if the code is not mapped by either table.
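The lookup order can be sketched roughly as follows. This is a stand-in written against a plain map with the same `bf_char` and `bf_range` keys, not ExPDF's implementation; `CMapLookupSketch` is a hypothetical module. How a single-string `dst` is resolved for codes above `lo` is an assumption here: the sketch follows the PDF convention of adding the offset to the final code unit, approximated by incrementing the last code point.

```elixir
defmodule CMapLookupSketch do
  # Hypothetical stand-in for the documented lookup order; not ExPDF's code.
  def lookup(%{bf_char: bf_char, bf_range: bf_range}, code) when is_integer(code) do
    case Map.fetch(bf_char, code) do
      {:ok, text} -> text
      # Fall back to the range table only when bf_char has no entry.
      :error -> lookup_range(bf_range, code)
    end
  end

  # Linear scan over {lo, hi, dst} tuples; nil when nothing matches.
  defp lookup_range([], _code), do: nil

  defp lookup_range([{lo, hi, dst} | _rest], code) when code >= lo and code <= hi do
    resolve(dst, code - lo)
  end

  defp lookup_range([_entry | rest], code), do: lookup_range(rest, code)

  # List form: one destination string per code in the range.
  defp resolve(dst, offset) when is_list(dst), do: Enum.at(dst, offset)

  # Single-string form (assumption): add the offset to the last code point.
  # An empty dst string is not handled in this sketch.
  defp resolve(dst, offset) when is_binary(dst) do
    {init, [last]} = dst |> String.to_charlist() |> Enum.split(-1)
    List.to_string(init ++ [last + offset])
  end
end

cmap = %{
  bf_char: %{0x01 => "A"},
  bf_range: [{0x10, 0x12, "a"}, {0x20, 0x21, ["fi", "fl"]}]
}

CMapLookupSketch.lookup(cmap, 0x01) # => "A"   (bf_char hit)
CMapLookupSketch.lookup(cmap, 0x11) # => "b"   (range, string dst)
CMapLookupSketch.lookup(cmap, 0x21) # => "fl"  (range, list dst)
CMapLookupSketch.lookup(cmap, 0x99) # => nil   (unmapped)
```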

UTF-16BE decoding

Destination hex strings in the CMap (<HHHH...>) are UTF-16BE-encoded code point sequences. Erlang's :unicode.characters_to_binary/3 converts them to UTF-8 (an Elixir String.t()).
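A minimal sketch of that conversion, assuming the angle brackets have already been stripped from the hex string. `CMapHexSketch.decode_hex/1` is a hypothetical helper written for this example; only `Base.decode16!/2` and `:unicode.characters_to_binary/3` are real library calls.

```elixir
defmodule CMapHexSketch do
  # Hypothetical helper for illustration only; not part of ExPDF.
  def decode_hex(hex) do
    hex
    # Hex digits -> raw bytes; raises on odd-length or non-hex input.
    |> Base.decode16!(case: :mixed)
    # Raw bytes are interpreted as big-endian UTF-16 and re-encoded as UTF-8.
    |> :unicode.characters_to_binary({:utf16, :big}, :utf8)
  end
end

CMapHexSketch.decode_hex("0041")     # => "A"
CMapHexSketch.decode_hex("00660069") # => "fi" (two code points, e.g. a ligature glyph)
CMapHexSketch.decode_hex("D83DDE00") # => "😀" (surrogate pair for U+1F600)
```

For malformed UTF-16 input, `:unicode.characters_to_binary/3` returns an `{:error, ...}` or `{:incomplete, ...}` tuple instead of a binary, so a real caller would want to match on the result.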

Summary

Types

t()

Functions

lookup(c_map, code)

Looks up a character code in the CMap.

parse(binary)

Parses a ToUnicode CMap binary into a %Pdf.Reader.CMap{} struct.