PhoenixKit.Modules.Storage.PdfProcessor (phoenix_kit v1.7.71)


Poppler-based PDF processing module.

Handles PDF operations using poppler-utils command-line tools:

  • pdftoppm - Convert PDF pages to images (JPEG)
  • pdfinfo - Extract PDF metadata (page count, author, title)

Dependencies

Requires poppler-utils to be installed:

  • Debian/Ubuntu: apt-get install poppler-utils
  • macOS: brew install poppler

Summary

Functions

  • extract_metadata(pdf_path) - Extract metadata from a PDF file using pdfinfo.
  • first_page_to_jpeg(pdf_path, output_prefix, opts \\ []) - Convert the first page of a PDF to a JPEG image.

Functions

extract_metadata(pdf_path)

Extract metadata from a PDF file using pdfinfo.

Parameters

  • pdf_path - Path to the PDF file

Returns

  • {:ok, metadata} - Map of the metadata reported by pdfinfo
  • {:ok, %{}} - Empty map if extraction fails (graceful degradation), so failures are not surfaced as {:error, reason}
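The "graceful degradation" contract can be illustrated with a small sketch. It assumes that pdfinfo prints `Key:   value` lines and that the metadata is parsed from them. `PdfinfoParser` is hypothetical, and the actual keys and value types returned by `extract_metadata/1` are not documented here and may differ (for example, atom keys or an integer page count):

```elixir
defmodule PdfinfoParser do
  # Hypothetical sketch, not the phoenix_kit implementation.
  # Turns the "Key:   value" lines printed by `pdfinfo` into a map of strings.
  @spec parse(String.t()) :: %{optional(String.t()) => String.t()}
  def parse(output) do
    output
    |> String.split("\n", trim: true)
    |> Enum.reduce(%{}, fn line, acc ->
      # parts: 2 keeps colons inside values such as timestamps intact.
      case String.split(line, ":", parts: 2) do
        [key, value] -> Map.put(acc, String.trim(key), String.trim(value))
        _ -> acc
      end
    end)
  end

  # Mirrors the documented contract: any failure yields {:ok, %{}}.
  @spec extract(Path.t()) :: {:ok, map()}
  def extract(pdf_path) do
    case System.cmd("pdfinfo", [pdf_path]) do
      {output, 0} -> {:ok, parse(output)}
      _nonzero_exit -> {:ok, %{}}
    end
  rescue
    # Raised (with :enoent) when pdfinfo itself is not installed.
    ErlangError -> {:ok, %{}}
  end
end
```

Because failures collapse to an empty map, callers should treat `%{}` as "no metadata available" rather than as an error.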

first_page_to_jpeg(pdf_path, output_prefix, opts \\ [])

Convert the first page of a PDF to a JPEG image.

Uses pdftoppm to render the first page at the specified DPI.

Parameters

  • pdf_path - Path to the input PDF file
  • output_prefix - Prefix for the output JPEG file (e.g., "/tmp/my_pdf")
  • opts - Keyword list of options:
    • :dpi - Rendering resolution in dots per inch (default: 150)

Returns

  • {:ok, jpeg_path} - Path to the generated JPEG file
  • {:error, reason} - If conversion fails
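The following sketch shows one way to implement the conversion with pdftoppm. The exact flags the module passes are not documented. This version assumes pdftoppm's `-jpeg`, `-r`, `-f`/`-l`, and `-singlefile` options, where `-singlefile` writes `<output_prefix>.jpg` without a page-number suffix. `FirstPageSketch` is a hypothetical module:

```elixir
defmodule FirstPageSketch do
  # Hypothetical sketch, not the phoenix_kit implementation.
  @default_dpi 150

  # Builds the pdftoppm argument list: JPEG output, given DPI, page 1 only,
  # written as a single file named "<output_prefix>.jpg".
  @spec pdftoppm_args(Path.t(), Path.t(), keyword()) :: [String.t()]
  def pdftoppm_args(pdf_path, output_prefix, opts \\ []) do
    dpi = Keyword.get(opts, :dpi, @default_dpi)

    ["-jpeg", "-r", Integer.to_string(dpi), "-f", "1", "-l", "1", "-singlefile"] ++
      [pdf_path, output_prefix]
  end

  @spec convert(Path.t(), Path.t(), keyword()) :: {:ok, Path.t()} | {:error, term()}
  def convert(pdf_path, output_prefix, opts \\ []) do
    args = pdftoppm_args(pdf_path, output_prefix, opts)

    case System.cmd("pdftoppm", args, stderr_to_stdout: true) do
      {_output, 0} -> {:ok, output_prefix <> ".jpg"}
      {output, status} -> {:error, {status, output}}
    end
  rescue
    # pdftoppm is not installed (ErlangError wrapping :enoent).
    e in ErlangError -> {:error, e.original}
  end
end
```

The real function is called the same way, for example `first_page_to_jpeg("/tmp/doc.pdf", "/tmp/my_pdf", dpi: 300)`. Only the documented `{:ok, jpeg_path}` and `{:error, reason}` shapes should be relied on, not the exact path or error term produced by this sketch.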