ChromicPDF v0.5.2

ChromicPDF is a fast HTML-to-PDF/A renderer based on Chrome & Ghostscript.

Usage

Start

Start ChromicPDF as part of your supervision tree:

defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      # other apps...
      {ChromicPDF, chromic_pdf_opts()}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end

  defp chromic_pdf_opts do
    []
  end
end

Print a PDF or PDF/A

ChromicPDF.print_to_pdf({:url, "file:///example.html"}, output: "output.pdf")

See ChromicPDF.print_to_pdf/2 and ChromicPDF.convert_to_pdfa/2.

Options

Worker pools

ChromicPDF spawns two worker pools: the session pool and the Ghostscript pool. By default, it creates 5 workers with no overflow. To change these settings, pass configuration to the supervisor. Note that these are only worker pools; if you intend to max them out, you will need a job queue as well.

Please see https://github.com/devinus/poolboy for available options.

defp chromic_pdf_opts do
  [
    session_pool: [
      size: 3,
      max_overflow: 0
    ],
    ghostscript_pool: [
      size: 10,
      max_overflow: 2
    ]
  ]
end
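
Since these are only worker pools, callers still have to limit how many jobs they submit at once. One lightweight way to do that, short of a full job queue, is to bound concurrency on the caller side. The following is a minimal sketch using Task.async_stream/3 from the Elixir standard library; the module BoundedPrintSketch and its run/3 function are hypothetical and not part of ChromicPDF, and the 30-second timeout is an arbitrary value chosen for illustration.

```elixir
defmodule BoundedPrintSketch do
  # Hypothetical helper, not part of ChromicPDF.
  # Applies `print_fun` to every job while running at most
  # `max_concurrency` jobs at the same time. Results keep the input order.
  def run(jobs, print_fun, max_concurrency) do
    jobs
    |> Task.async_stream(print_fun, max_concurrency: max_concurrency, timeout: 30_000)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# Example call, assuming a session pool of size 3 and a running ChromicPDF:
#
#   jobs = [{"file:///a.html", "a.pdf"}, {"file:///b.html", "b.pdf"}]
#
#   BoundedPrintSketch.run(
#     jobs,
#     fn {url, path} -> ChromicPDF.print_to_pdf({:url, url}, output: path) end,
#     3
#   )
```

This is not a persistent queue: pending jobs are lost if the calling process exits, so a durable job queue may still be the better fit for production workloads.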

Automatic session restarts to avoid memory drain

By default, ChromicPDF will restart sessions within the Chrome process after 1000 operations. This helps to prevent infinite growth in Chrome's overall memory consumption. This "max age" of a session can be configured by setting the :max_session_uses option.

defp chromic_pdf_opts do
  [max_session_uses: 1000]
end

Security Considerations

Before adding a browser to your application's (perhaps already long) list of dependencies, you may want to consider the security hints below.

Escape user-supplied data

If you can, make sure to escape any data provided by users with something like Phoenix.HTML.html_escape/1. Chrome is designed to make displaying HTML pages relatively safe, in the sense that it prevents a page from gaining undesired access to the host operating system. However, the attack surface of your application is still increased. Running ChromicPDF in a contained application with a small HTTP interface creates an additional barrier (and has other benefits).
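
To make the idea of escaping concrete, here is a minimal sketch in plain Elixir. HtmlEscapeSketch is a hypothetical module written only for illustration; in a Phoenix application you would normally rely on the escaping helpers from Phoenix.HTML instead of rolling your own.

```elixir
defmodule HtmlEscapeSketch do
  # Hypothetical helper for illustration only.
  # "&" must be replaced first so that the entities produced by the
  # later replacements are not escaped a second time.
  @replacements [
    {"&", "&amp;"},
    {"<", "&lt;"},
    {">", "&gt;"},
    {"\"", "&quot;"},
    {"'", "&#39;"}
  ]

  def escape(text) when is_binary(text) do
    Enum.reduce(@replacements, text, fn {char, entity}, acc ->
      String.replace(acc, char, entity)
    end)
  end
end

# User input is escaped before it is interpolated into the HTML to print.
user_name = "<script>alert('hi')</script>"
html = "<p>Hello, #{HtmlEscapeSketch.escape(user_name)}</p>"
```

After escaping, the script tag is rendered as literal text in the printed page rather than being executed by Chrome.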

Running in online mode

Browser targets will be spawned in "offline mode" by default (using the DevTools command Network.emulateNetworkConditions). This forces users to take an extra step (basically, reading this paragraph) and reconsider whether remote printing is really a requirement.

However, there are a lot of valid use-cases for printing from a URL, particularly from a webserver on localhost. To switch to "online mode", pass the offline: false parameter.

defp chromic_pdf_opts do
  [offline: false]
end

Chrome Sandbox

By default, ChromicPDF will run Chrome targets in a sandboxed OS process. If you absolutely must run Chrome as root, you can turn off its sandbox by passing the no_sandbox: true option.

defp chromic_pdf_opts do
  [no_sandbox: true]
end

How it works

PDF Printing

  • ChromicPDF spawns an instance of Chromium/Chrome (an OS process) and connects to its "DevTools" channel via file descriptors.
  • The Chrome process is supervised and the connected processes will automatically recover if it crashes.
  • A number of "targets" in Chrome are spawned, 1 per worker process in the SessionPool. By default, ChromicPDF will spawn each session in a new browser context (i.e., a profile).
  • When a PDF print is requested, a session will instruct its assigned "target" to navigate to the given URL, then wait until it receives a "frameStoppedLoading" event, and proceed to call the printToPDF function.
  • The printed PDF will be sent to the session as Base64 encoded chunks.
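
The navigate, wait, print sequence described above can be pictured as a tiny state machine. The sketch below is purely illustrative and is not ChromicPDF's internal code: PrintFlowSketch, its state names, and the simplified message maps are assumptions, while the method and event names (Page.navigate, Page.frameStoppedLoading, Page.printToPDF) are taken from the Chrome DevTools Protocol.

```elixir
defmodule PrintFlowSketch do
  # Hypothetical illustration; message maps are simplified
  # (real DevTools messages also carry ids and session information).

  # First step: ask the target to navigate to the URL.
  def start(url) do
    {:awaiting_load, %{method: "Page.navigate", params: %{url: url}}}
  end

  # Once the frame has stopped loading, request the PDF.
  def handle_event(:awaiting_load, %{method: "Page.frameStoppedLoading"}) do
    {:awaiting_pdf, %{method: "Page.printToPDF", params: %{}}}
  end

  # Any other event leaves the state unchanged and sends nothing.
  def handle_event(state, _event), do: {state, nil}
end

{state, _navigate} = PrintFlowSketch.start("file:///example.html")

# An unrelated event is ignored while waiting for the page to load.
{state, nil} = PrintFlowSketch.handle_event(state, %{method: "Page.loadEventFired"})

# The awaited event triggers the print command.
{:awaiting_pdf, print_command} =
  PrintFlowSketch.handle_event(state, %{method: "Page.frameStoppedLoading"})
```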

Summary

Functions

Captures a screenshot.

Returns a specification to start this module under a supervisor.

Converts a PDF to PDF/A (either PDF/A-2b or PDF/A-3b).

Prints a PDF and converts it to PDF/A in a single call.

Functions

capture_screenshot(input, opts \\ [])

capture_screenshot(url :: ChromicPDF.Processor.source(), opts :: keyword()) ::
  ChromicPDF.Processor.return()

Captures a screenshot.

This call blocks until the screenshot has been created.

Capture and return a Base64-encoded PNG

{:ok, blob} = ChromicPDF.capture_screenshot({:url, "file:///example.html"})
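
Because the returned blob is Base64-encoded, it has to be decoded before it can be written to disk as an image. A minimal sketch using only the Elixir standard library follows; BlobSketch and write_blob!/2 are hypothetical names, not part of ChromicPDF.

```elixir
defmodule BlobSketch do
  # Hypothetical helper: decodes a Base64 blob and writes the raw bytes to `path`.
  # Raises if the blob is not valid Base64 or the file cannot be written.
  def write_blob!(blob, path) do
    File.write!(path, Base.decode64!(blob))
  end
end

# With a running ChromicPDF, the blob would come from the call above:
#
#   {:ok, blob} = ChromicPDF.capture_screenshot({:url, "file:///example.html"})
#   BlobSketch.write_blob!(blob, "screenshot.png")
```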

Options

Pass options as a map under the :capture_screenshot key.

ChromicPDF.capture_screenshot(
  {:url, "file:///example.html"},
  capture_screenshot: %{
    format: "jpeg"
  }
)

See the Chrome DevTools Protocol documentation for the available options:

https://chromedevtools.github.io/devtools-protocol/tot/Page#method-captureScreenshot

Returns a specification to start this module under a supervisor.

See Supervisor.

convert_to_pdfa(pdf_path, opts \\ [])

convert_to_pdfa(
  pdf_path :: ChromicPDF.Processor.path(),
  opts :: [ChromicPDF.Processor.pdfa_option()]
) :: ChromicPDF.Processor.return()

Converts a PDF to PDF/A (either PDF/A-2b or PDF/A-3b).

Convert an input PDF and return a Base64-encoded blob

{:ok, blob} = ChromicPDF.convert_to_pdfa("some_pdf_file.pdf")

Convert and write to file

ChromicPDF.convert_to_pdfa("some_pdf_file.pdf", output: "output.pdf")

PDF/A versions & levels

Ghostscript supports both the PDF/A-2 and PDF/A-3 versions, each at the b (basic) conformance level. By default, ChromicPDF generates PDF/A-3b files. Set the pdfa_version option to "2" to generate PDF/A-2b instead.

ChromicPDF.convert_to_pdfa("some_pdf_file.pdf", pdfa_version: "2")

Specifying PDF metadata

The converter is able to transfer PDF metadata (the Info dictionary) from the original PDF file to the output file. However, files printed by Chrome do not contain any metadata information (except "Creator" being "Chrome").

The :info option of the PDF/A converter allows you to specify metadata for the output file directly.

ChromicPDF.convert_to_pdfa("some_pdf_file.pdf", info: %{creator: "ChromicPDF"})

The converter understands the following keys, all of which accept only String values:

  • :title
  • :author
  • :subject
  • :keywords
  • :creator
  • :creation_date
  • :mod_date

By specification, date values in :creation_date and :mod_date do not need to follow a specific syntax. However, Ghostscript inserts date strings like "D:20200208153049+00'00'" and Info extractor tools might rely on this or another specific format. The converter will automatically format given DateTime values like this.

Both :creation_date and :mod_date are filled with the current date automatically (by Ghostscript), if the original file did not contain any.
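
To illustrate the date format mentioned above, the following sketch builds such a string from a DateTime by hand. It is not the converter's own code: PdfDateSketch is a hypothetical module, and Calendar.strftime/2 requires Elixir 1.11 or later.

```elixir
defmodule PdfDateSketch do
  # Hypothetical helper: formats a DateTime as "D:YYYYMMDDHHmmSS+HH'mm'".
  def format(%DateTime{} = dt) do
    # Total offset from UTC in seconds, including daylight saving time.
    offset = dt.utc_offset + dt.std_offset
    sign = if offset < 0, do: "-", else: "+"
    hours = offset |> abs() |> div(3600) |> pad()
    minutes = offset |> abs() |> rem(3600) |> div(60) |> pad()

    Calendar.strftime(dt, "D:%Y%m%d%H%M%S") <> "#{sign}#{hours}'#{minutes}'"
  end

  # Left-pads a non-negative integer to two digits.
  defp pad(n), do: n |> Integer.to_string() |> String.pad_leading(2, "0")
end

PdfDateSketch.format(~U[2020-02-08 15:30:49Z])
# => "D:20200208153049+00'00'"
```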

Adding more PostScript to the conversion

The pdfa_def_ext option can be used to feed additional PostScript code into the final conversion step. This can be useful for adding features to the generated PDF/A file, for instance a ZUGFeRD invoice.

ChromicPDF.convert_to_pdfa(
  "some_pdf_file.pdf",
  pdfa_def_ext: "[/Title (OverriddenTitle) /DOCINFO pdfmark"
)

start_link(config \\ [])
