Popplex

An Elixir NIF (Native Implemented Function) wrapper for the Poppler PDF library, providing fast and efficient PDF processing capabilities.


Features

Get page count - Quickly determine the number of pages in a PDF
Extract text - Extract text content from entire documents or specific pages
Combine PDFs - Merge multiple PDF files into one

Prerequisites

Popplex compiles a NIF against Poppler's C++ library, so Poppler's development files and pkg-config must be installed before you build:

macOS

brew install poppler pkg-config

Ubuntu/Debian

sudo apt-get install libpoppler-cpp-dev pkg-config

Fedora/RHEL

sudo dnf install poppler-cpp-devel pkgconfig

Arch Linux

sudo pacman -S poppler pkgconf

Installation

Add popplex to your list of dependencies in mix.exs:

def deps do
  [
    {:popplex, "~> 0.1.0"}
  ]
end

Then run:

mix deps.get
mix compile

The NIF will be automatically compiled during the build process.

Usage

Get Page Count

# Get the number of pages in a PDF
{:ok, count} = Popplex.get_page_count("document.pdf")
IO.puts("The PDF has #{count} pages")

Extract Text

# Extract text from all pages
{:ok, text} = Popplex.get_text("document.pdf")

# Extract text from a specific page (0-indexed)
{:ok, first_page} = Popplex.get_text("document.pdf", page: 0)
{:ok, second_page} = Popplex.get_text("document.pdf", page: 1)

# Explicitly extract all pages
{:ok, all_text} = Popplex.get_text("document.pdf", all: true)
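Per-page extraction can be combined with the page count to walk a document one page at a time. The following is a minimal sketch that uses only the two calls shown above; the `PageTexts` module and its `extract/1` function are hypothetical helpers for illustration, not part of Popplex.

```elixir
defmodule PageTexts do
  @moduledoc """
  Hypothetical helper (not part of Popplex): extracts text page by page
  using Popplex.get_page_count/1 and Popplex.get_text/2.
  """

  # Returns {:ok, [{page_index, text}, ...]} or the first {:error, reason}.
  def extract(path) do
    with {:ok, count} <- Popplex.get_page_count(path) do
      extract_pages(path, 0, count, [])
    end
  end

  # Every page has been read: restore ascending page order.
  defp extract_pages(_path, index, count, acc) when index >= count do
    {:ok, Enum.reverse(acc)}
  end

  # Pages are 0-indexed, so valid indices run from 0 to count - 1.
  defp extract_pages(path, index, count, acc) do
    case Popplex.get_text(path, page: index) do
      {:ok, text} -> extract_pages(path, index + 1, count, [{index, text} | acc])
      {:error, _reason} = error -> error
    end
  end
end
```

Calling `PageTexts.extract("document.pdf")` would return one `{index, text}` tuple per page, stopping at the first page that fails.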

Combine PDFs

# Merge multiple PDFs into one
{:ok, output} = Popplex.combine_pdfs(
  ["file1.pdf", "file2.pdf", "file3.pdf"],
  "combined.pdf"
)

# Verify the combined PDF
{:ok, count} = Popplex.get_page_count("combined.pdf")
IO.puts("Combined PDF has #{count} pages")
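The verification step above can be made stricter by comparing the combined page count with the sum of the inputs. This is a sketch under the assumption that a correct merge preserves every page; `CombineCheck` and `combine_and_verify/2` are hypothetical names, not part of Popplex.

```elixir
defmodule CombineCheck do
  # Hypothetical helper, not part of Popplex.
  # Merges `inputs` into `output`, then checks that no pages were lost.
  def combine_and_verify(inputs, output) do
    with {:ok, expected} <- total_pages(inputs),
         {:ok, _output} <- Popplex.combine_pdfs(inputs, output),
         {:ok, actual} <- Popplex.get_page_count(output) do
      if actual == expected do
        {:ok, actual}
      else
        {:error, "expected #{expected} pages, got #{actual}"}
      end
    end
  end

  # Sums the page counts of all inputs, halting at the first error.
  defp total_pages(paths) do
    Enum.reduce_while(paths, {:ok, 0}, fn path, {:ok, sum} ->
      case Popplex.get_page_count(path) do
        {:ok, count} -> {:cont, {:ok, sum + count}}
        {:error, _reason} = error -> {:halt, error}
      end
    end)
  end
end
```

Because every step returns `{:ok, _}` or `{:error, _}`, the `with` expression passes the first failure straight back to the caller.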

Error Handling

All functions return {:ok, result} on success or {:error, reason} on failure:

case Popplex.get_page_count("document.pdf") do
  {:ok, count} ->
    IO.puts("Success! Page count: #{count}")
    
  {:error, reason} ->
    IO.puts("Error: #{reason}")
end

Common error scenarios:

File doesn't exist: "Failed to open PDF document"
PDF is password protected: "PDF document is locked"
Invalid page number: "Page number out of range"
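If callers need to branch on the kind of failure, the messages above can be mapped to atoms. This sketch assumes that `reason` is a string that matches the listed messages exactly, which the README does not guarantee; anything else is passed through as unknown. `PdfErrors` and the atom names are hypothetical.

```elixir
defmodule PdfErrors do
  # Hypothetical helper, not part of Popplex.
  # Assumes `reason` is exactly one of the documented message strings;
  # any other reason is wrapped as {:unknown, reason}.
  def classify({:ok, _value} = ok), do: ok
  def classify({:error, "Failed to open PDF document"}), do: {:error, :cannot_open}
  def classify({:error, "PDF document is locked"}), do: {:error, :locked}
  def classify({:error, "Page number out of range"}), do: {:error, :page_out_of_range}
  def classify({:error, other}), do: {:error, {:unknown, other}}
end
```

A call such as `"document.pdf" |> Popplex.get_page_count() |> PdfErrors.classify()` then yields tuples that are easier to pattern match than free-form strings.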

Development

Building from Source

# Clone the repository
git clone https://github.com/yourusername/popplex.git
cd popplex

# Get dependencies
mix deps.get

# Compile (including the NIF)
mix compile

# Run tests
mix test

# Run integration tests (requires sample PDF files)
mix test --include integration

Testing

Unit tests can be run without any PDF files:

mix test --exclude integration

For integration tests, place sample PDF files in test/fixtures/ and run:

mix test --include integration
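The `--include integration` and `--exclude integration` flags work on ExUnit tags. One way such a setup could look is sketched below; the file names, the `PopplexIntegrationTest` module, and the `sample.pdf` fixture are assumptions for illustration and may differ from the project's actual test suite.

```elixir
# In test/test_helper.exs (assumed): skip :integration tests unless
# they are requested with `mix test --include integration`.
ExUnit.start(exclude: [:integration])

# In a test file such as test/popplex_integration_test.exs (hypothetical name).
defmodule PopplexIntegrationTest do
  use ExUnit.Case, async: true

  # Tags every test in this module as an integration test.
  @moduletag :integration

  # Hypothetical fixture; any PDF placed in test/fixtures/ would do.
  @fixture "test/fixtures/sample.pdf"

  test "reports a positive page count for the fixture" do
    assert {:ok, count} = Popplex.get_page_count(@fixture)
    assert count > 0
  end
end
```

With this arrangement a plain `mix test` skips the tagged module, so the suite passes on machines that have no fixture files.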

Continuous Integration

The project uses GitHub Actions for CI, which:

Tests against multiple Elixir/OTP version combinations
Runs both unit and integration tests
Performs static analysis and code formatting checks
Automatically installs Poppler and dependencies

The CI workflow runs on:

Every push to the main/master branch
Every pull request

You can view the CI status in the badge at the top of this README.

How It Works

Popplex uses Erlang's NIF (Native Implemented Function) interface to call C++ code that wraps the Poppler library. This provides:

Performance: Near-native speed for PDF operations
Direct library access: Full access to Poppler's capabilities
Memory efficiency: Minimal copying between Erlang and C++

The architecture consists of:

C++ NIF layer (c_src/popplex_nif.cpp) - Interfaces with Poppler
NIF loader (lib/popplex/nif.ex) - Loads the compiled NIF
Public API (lib/popplex.ex) - User-friendly Elixir interface
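The loader layer in this architecture typically follows the standard Erlang NIF pattern: load the shared library when the module loads, and define stub functions that the native code replaces. The sketch below illustrates that pattern only. The module name `Popplex.NifSketch`, the library file name `popplex_nif`, and the fallback path are assumptions, not the contents of lib/popplex/nif.ex.

```elixir
defmodule Popplex.NifSketch do
  # Hypothetical stand-in for a NIF loader module; names and paths are assumed.
  @on_load :load_nif

  # Runs when the module is loaded and tries to load the shared library.
  def load_nif do
    case :erlang.load_nif(nif_path(), 0) do
      :ok ->
        :ok

      {:error, reason} ->
        # Keep the module loadable so callers see :nif_not_loaded
        # from the stubs instead of a module load failure.
        IO.puts(:stderr, "NIF not loaded: #{inspect(reason)}")
        :ok
    end
  end

  # Stub that the native implementation replaces once the NIF is loaded.
  def get_page_count(_path), do: :erlang.nif_error(:nif_not_loaded)

  # Assumed location: priv/popplex_nif inside the :popplex application.
  defp nif_path do
    case :code.priv_dir(:popplex) do
      {:error, :bad_name} -> ~c"priv/popplex_nif"
      dir -> :filename.join(dir, ~c"popplex_nif")
    end
  end
end
```

Swallowing the load error is a design choice made here for illustration; a loader could equally return the error from `load_nif/0` so that a missing library fails at load time.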

Limitations

Password-protected PDFs are not currently supported for text extraction
Some PDF features (forms, annotations, etc.) are not exposed in the API
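PDF combining uses the pdfunite command-line tool rather than a NIF, so it spawns an external process and requires pdfunite on your PATH (on Debian/Ubuntu and Fedora it ships in the poppler-utils package)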
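To illustrate what delegating to pdfunite involves, here is a minimal sketch of shelling out to the tool with `System.cmd/3`. It is not Popplex's actual implementation; `UniteSketch`, `combine/2`, and the error messages are hypothetical.

```elixir
defmodule UniteSketch do
  # Hypothetical sketch, not Popplex's implementation.
  # Merges `inputs` into `output` by running the pdfunite executable.
  def combine(inputs, output) when is_list(inputs) and inputs != [] do
    case System.find_executable("pdfunite") do
      nil ->
        {:error, "pdfunite not found on PATH"}

      exe ->
        # pdfunite expects the source files first and the destination last.
        case System.cmd(exe, inputs ++ [output], stderr_to_stdout: true) do
          {_out, 0} -> {:ok, output}
          {out, status} -> {:error, "pdfunite exited with #{status}: #{String.trim(out)}"}
        end
    end
  end
end
```

Checking for the executable first turns a missing installation into an ordinary `{:error, reason}` tuple rather than a raised exception.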

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

License

This project is available under the MIT License.

Acknowledgments

Built on top of the Poppler PDF rendering library
Uses elixir_make for NIF compilation
