Pdf.Reader.Line (ExPDF v1.0.1)

Logical text line reconstructed from individual TextRuns.

Many PDFs (particularly machine-generated ones such as government forms and tax documents) place glyphs individually, using the TJ operator with per-glyph kerning, and so produce one TextRun per character. That flat run list is awkward to work with. Line coalesces those runs into the structure a human reader sees: lines and, within each line, tokens separated by visible whitespace.
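
The coalescing idea can be illustrated with a minimal sketch. It is not ExPDF's implementation: the run shape (plain maps with :x, :y, :width and :text), the module name LineSketch, and both thresholds are assumptions made for illustration. Runs are grouped by baseline, then adjacent runs are merged into one token until a horizontal gap wider than the threshold appears.

```elixir
defmodule LineSketch do
  # Assumed thresholds, in PDF user-space units (not taken from ExPDF).
  @gap 1.5    # a horizontal gap wider than this starts a new token
  @y_tol 0.5  # runs whose baselines differ by at most this share a line

  # Hypothetical run shape: %{x: float, y: float, width: float, text: String.t()}
  def lines(runs, page) do
    runs
    |> Enum.reject(fn r -> String.trim(r.text) == "" end)
    # Origin is bottom-left, so a larger Y is higher on the page.
    |> Enum.sort_by(fn r -> {-r.y, r.x} end)
    |> Enum.chunk_while([], &chunk_line/2, &flush/1)
    |> Enum.map(&build_line(&1, page))
  end

  defp chunk_line(run, []), do: {:cont, [run]}

  defp chunk_line(run, [prev | _] = acc) do
    if abs(prev.y - run.y) <= @y_tol do
      {:cont, [run | acc]}
    else
      {:cont, Enum.reverse(acc), [run]}
    end
  end

  defp flush([]), do: {:cont, []}
  defp flush(acc), do: {:cont, Enum.reverse(acc), []}

  defp build_line(runs, page) do
    tokens =
      runs
      |> Enum.sort_by(fn r -> r.x end)
      |> Enum.reduce([], &merge_run/2)
      |> Enum.reverse()

    %{
      page: page,
      y: hd(runs).y,
      x: hd(tokens).x,
      text: Enum.map_join(tokens, " ", fn t -> t.text end),
      tokens: tokens
    }
  end

  defp merge_run(run, []), do: [to_token(run)]

  defp merge_run(run, [tok | rest]) do
    if run.x - (tok.x + tok.width) <= @gap do
      # Close enough to the previous glyph: extend the current token.
      [%{tok | text: tok.text <> run.text, width: run.x + run.width - tok.x} | rest]
    else
      [to_token(run), tok | rest]
    end
  end

  defp to_token(run), do: %{x: run.x, text: run.text, width: run.width}
end

runs = [
  %{x: 10.0, y: 700.0, width: 5.0, text: "H"},
  %{x: 15.0, y: 700.0, width: 3.0, text: "i"},
  %{x: 40.0, y: 700.0, width: 5.0, text: "y"},
  %{x: 45.0, y: 700.0, width: 5.0, text: "o"},
  %{x: 10.0, y: 680.0, width: 6.0, text: "A"}
]

LineSketch.lines(runs, 1) |> Enum.map(fn l -> l.text end)
# => ["Hi yo", "A"]
```

A fixed gap threshold is the simplest choice; a real implementation would more plausibly scale it with font size.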

Shape

:page — 1-indexed page number
:y — baseline Y of the line (PDF user-space, origin bottom-left)
:x — leftmost X of the first token on the line
:text — joined text, tokens separated by single spaces
:tokens — ordered list of token/0 maps, sorted by X ascending

Each token carries its own :x so callers can detect column layouts (e.g. table rows where every line has tokens at the same X positions).
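
As a rough sketch of the column-detection use case, the following groups consecutive lines whose tokens start at the same (rounded) X positions. The module name ColumnSketch, the rounding precision, and the minimum row count are assumptions for illustration, and plain maps stand in for Line structs so the example does not depend on the library.

```elixir
defmodule ColumnSketch do
  # The X positions of a line's tokens, rounded so that small kerning
  # differences do not split what is visually one column.
  def signature(line, precision \\ 0) do
    Enum.map(line.tokens, fn t -> Float.round(t.x, precision) end)
  end

  # Runs of consecutive lines that share a signature and have at least
  # two tokens each: candidate table rows.
  def table_rows(lines, min_rows \\ 2) do
    lines
    |> Enum.chunk_by(&signature/1)
    |> Enum.filter(fn [first | _] = chunk ->
      length(chunk) >= min_rows and length(first.tokens) >= 2
    end)
  end
end

lines = [
  %{text: "Invoice", tokens: [%{x: 72.0, text: "Invoice", width: 40.0}]},
  %{text: "Widget 4.00",
    tokens: [%{x: 72.0, text: "Widget", width: 30.0}, %{x: 200.1, text: "4.00", width: 20.0}]},
  %{text: "Gadget 9.50",
    tokens: [%{x: 72.0, text: "Gadget", width: 30.0}, %{x: 199.9, text: "9.50", width: 20.0}]}
]

ColumnSketch.table_rows(lines) |> Enum.map(fn rows -> Enum.map(rows, & &1.text) end)
# => [["Widget 4.00", "Gadget 9.50"]]
```

Comparing left edges only works for left-aligned columns; right-aligned numeric columns would need the right edge (:x plus :width) instead.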

Spec references

PDF 1.7 § 9.4 (Text objects): https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf
PDF 1.7 § 9.4.3 (Text-showing operators: Tj, TJ, ', ")

Summary

Types

t()

token()

token_kind()

Types

t()

@type t() :: %Pdf.Reader.Line{
  page: pos_integer(),
  text: String.t(),
  tokens: [token()],
  x: float(),
  y: float()
}

token()

@type token() :: %{
  :x => float(),
  :text => String.t(),
  :width => float(),
  optional(:kind) => token_kind(),
  optional(:shape) => Pdf.Reader.Shape.t() | nil
}
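
Because :kind and :shape are optional keys, dot access such as token.kind raises KeyError on a token that lacks them. The helpers below are a small sketch of defensive access; the module name TokenSketch is hypothetical, and treating a missing :kind as :text is an assumption, not documented behaviour.

```elixir
defmodule TokenSketch do
  # Right edge of a token in user-space units: left X plus width.
  def right_edge(%{x: x, width: width}), do: x + width

  # Assumption: a token without :kind is treated as plain text.
  def kind(token), do: Map.get(token, :kind, :text)

  # :shape may be absent or explicitly nil; both count as "no shape" here.
  def shape?(token), do: Map.get(token, :shape) != nil
end

token = %{x: 10.0, text: "Total", width: 25.0}

TokenSketch.right_edge(token)  # => 35.0
TokenSketch.kind(token)        # => :text
TokenSketch.shape?(token)      # => false
```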

token_kind()

@type token_kind() ::
  :text | :link | :email | :button | :form_field | :table_cell | atom()
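
Since the union ends in atom(), the set of kinds is open, so code that dispatches on :kind should end with a catch-all clause. A minimal sketch follows; the module name KindSketch and the output labels are made up for illustration.

```elixir
defmodule KindSketch do
  def describe(%{kind: :link, text: text}), do: "link: " <> text
  def describe(%{kind: :email, text: text}), do: "email: " <> text

  def describe(%{kind: kind, text: text}) when kind in [:button, :form_field],
    do: "widget: " <> text

  # Catch-all: covers :text, :table_cell, tokens with no :kind key,
  # and any kind atom this code does not know about.
  def describe(%{text: text}), do: "text: " <> text
end

KindSketch.describe(%{x: 0.0, text: "a@b.example", width: 50.0, kind: :email})
# => "email: a@b.example"

KindSketch.describe(%{x: 0.0, text: "hello", width: 20.0, kind: :some_future_kind})
# => "text: hello"
```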