Pdf.Reader.Line (ExPDF v1.0.1)

Logical text line reconstructed from individual TextRuns.

Many PDFs (particularly machine-generated ones such as government forms and tax documents) place glyphs individually with the TJ operator and per-glyph kerning, producing one TextRun per character. That flat run list is awkward to work with, so Line coalesces the runs into the structure a human reader sees: lines and, within each line, tokens separated by visible whitespace.
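To make the coalescing idea concrete, here is a minimal sketch of one way to group per-glyph runs into lines and tokens. It is not ExPDF's actual algorithm. The run shape (%{x, y, width, text}), the module name LineSketch, and both tolerance values are assumptions chosen for illustration.

```elixir
defmodule LineSketch do
  # Hypothetical run shape: %{x: float, y: float, width: float, text: String.t()}.
  # Both thresholds are illustrative guesses, not values taken from ExPDF.
  @y_tolerance 1.0
  @gap_threshold 2.0

  # Groups runs that share a baseline into lines (top of page first), then
  # merges horizontally adjacent runs within each line into tokens.
  def coalesce(runs, page \\ 1) do
    runs
    |> Enum.reject(&(String.trim(&1.text) == ""))
    |> Enum.sort_by(fn r -> {-r.y, r.x} end)
    |> Enum.chunk_while([], &chunk_line/2, &flush_line/1)
    |> Enum.map(&build_line(&1, page))
  end

  # A run joins the current line when its baseline is within @y_tolerance
  # of the previously seen run; otherwise the current line is emitted.
  defp chunk_line(run, []), do: {:cont, [run]}

  defp chunk_line(run, [prev | _] = acc) do
    if abs(prev.y - run.y) <= @y_tolerance do
      {:cont, [run | acc]}
    else
      {:cont, Enum.reverse(acc), [run]}
    end
  end

  defp flush_line([]), do: {:cont, []}
  defp flush_line(acc), do: {:cont, Enum.reverse(acc), []}

  defp build_line(runs, page) do
    sorted = Enum.sort_by(runs, & &1.x)
    tokens = Enum.chunk_while(sorted, nil, &chunk_token/2, &flush_token/1)

    %{
      page: page,
      y: hd(sorted).y,
      x: hd(tokens).x,
      text: Enum.map_join(tokens, " ", & &1.text),
      tokens: tokens
    }
  end

  defp chunk_token(run, nil), do: {:cont, new_token(run)}

  defp chunk_token(run, tok) do
    # Gap between the right edge of the current token and the next run.
    gap = run.x - (tok.x + tok.width)

    if gap < @gap_threshold do
      {:cont, %{tok | text: tok.text <> run.text, width: run.x + run.width - tok.x}}
    else
      {:cont, tok, new_token(run)}
    end
  end

  defp flush_token(nil), do: {:cont, nil}
  defp flush_token(tok), do: {:cont, tok, nil}

  defp new_token(run), do: %{x: run.x, text: run.text, width: run.width}
end
```

Calling LineSketch.coalesce(runs) on a list of such run maps returns plain maps with the same keys as the Line shape. A real implementation would likely derive its thresholds from the font size rather than use fixed constants.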

Shape

  • :page — 1-indexed page number
  • :y — baseline Y of the line (PDF user-space, origin bottom-left)
  • :x — leftmost X of the first token on the line
  • :text — joined text, tokens separated by single spaces
  • :tokens — ordered list of token/0 maps, sorted by X ascending
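The fields above are related to one another: :x is the X of the first token, :text is the token texts joined by single spaces, and :tokens is sorted by X. The sketch below checks those relationships on a line-shaped map. ShapeCheck is a hypothetical helper, not part of ExPDF, and treating a line with no tokens as inconsistent is an assumption.

```elixir
defmodule ShapeCheck do
  # Returns true when the map satisfies the documented relationships:
  # tokens ascending by X, :x equal to the first token's X, and :text
  # equal to the token texts joined by single spaces.
  def consistent?(%{tokens: [first | _] = tokens, x: x, text: text}) do
    xs = Enum.map(tokens, & &1.x)

    xs == Enum.sort(xs) and
      x == first.x and
      text == Enum.map_join(tokens, " ", & &1.text)
  end

  # Anything else (including an empty token list) is treated as inconsistent.
  def consistent?(_), do: false
end
```

Because the function matches on map keys only, it accepts both plain maps and structs that carry the same fields.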

Each token carries its own :x so callers can detect column layouts (e.g. table rows where every line has tokens at the same X positions).
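As an illustration of column detection, the following sketch finds the X positions at which every line in a group has a token. ColumnSketch is a hypothetical module, and rounding X to the nearest whole unit is an assumed tolerance; real documents may need a wider one.

```elixir
defmodule ColumnSketch do
  # Returns the sorted X positions (rounded to the nearest unit) that
  # appear as a token start on every one of the given lines.
  def shared_columns([]), do: []

  def shared_columns(lines) do
    lines
    |> Enum.map(fn line -> MapSet.new(line.tokens, &round(&1.x)) end)
    |> Enum.reduce(&MapSet.intersection/2)
    |> Enum.sort()
  end
end
```

If consecutive lines share three or more positions, they are plausibly rows of the same table. That threshold is a heuristic left to the caller.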

Spec references

Summary

Types

@type t() :: %Pdf.Reader.Line{
  page: pos_integer(),
  text: String.t(),
  tokens: [token()],
  x: float(),
  y: float()
}
@type token() :: %{
  :x => float(),
  :text => String.t(),
  :width => float(),
  optional(:kind) => token_kind(),
  optional(:shape) => Pdf.Reader.Shape.t() | nil
}
@type token_kind() ::
  :text | :link | :email | :button | :form_field | :table_cell | atom()
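Since :kind is optional in token(), callers need a policy for tokens that omit it. The sketch below groups token texts by kind. TokenKinds is a hypothetical helper, and falling back to :text when the key is absent is an assumption: the typespec only says the key may be missing.

```elixir
defmodule TokenKinds do
  # Assumption: a token without :kind is treated as plain text.
  def kind(token), do: Map.get(token, :kind, :text)

  # Groups token texts by kind, preserving token order within each group.
  def by_kind(tokens), do: Enum.group_by(tokens, &kind/1, & &1.text)
end
```

Because token_kind() ends in atom(), the result may contain keys beyond the listed kinds, so callers should not match on it exhaustively.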