ChromicPDF (ChromicPDF v1.7.0)

ChromicPDF is a fast HTML-to-PDF/A renderer based on Chrome & Ghostscript.

Usage

Start

Start ChromicPDF as part of your supervision tree:

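defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # other apps...
      {ChromicPDF, chromic_pdf_opts()}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end

  # Global ChromicPDF options (see the sections below); an empty list uses the defaults.
  defp chromic_pdf_opts do
    []
  end
end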

Print a PDF

ChromicPDF.print_to_pdf({:file, "example.html"}, output: "output.pdf")

This tells Chrome to open the example.html file from your current directory and save the rendered page as output.pdf. PDF printing comes with a ton of options. Please see ChromicPDF.print_to_pdf/2 for details.
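As a further illustration, the sketch below renders an HTML string instead of a file and forwards extra parameters to Chrome's Page.printToPDF DevTools command through the :print_to_pdf option (landscape and printBackground are Page.printToPDF parameters). It assumes that, as with the other export functions, omitting :output returns the document Base64-encoded; the file name hello.pdf is only an example.

```elixir
# Render an inline HTML string and pass extra Page.printToPDF parameters.
{:ok, base64_pdf} =
  ChromicPDF.print_to_pdf(
    {:html, "<h1>Hello from ChromicPDF</h1>"},
    print_to_pdf: %{landscape: true, printBackground: true}
  )

# Without :output, the PDF is returned Base64-encoded; decode it before writing.
File.write!("hello.pdf", Base.decode64!(base64_pdf))
```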

Print a PDF/A

ChromicPDF.print_to_pdfa({:file, "example.html"}, output: "output.pdf")

This prints the same PDF with Chrome and afterwards passes it to Ghostscript to convert it to a PDF/A. Please see ChromicPDF.print_to_pdfa/2 or ChromicPDF.convert_to_pdfa/2 for details.

Security Considerations

By default, ChromicPDF will allow Chrome to make use of its own "sandbox" process jail. The sandbox tries to limit system resource access of the renderer processes to the minimum resources they require to perform their task. It is designed to make displaying HTML pages relatively safe, in terms of preventing undesired access of a page to the host operating system.

Nevertheless, running a browser as part of your application, especially when used to process user-supplied content, significantly increases your attack surface. Hence, before adding ChromicPDF to your application's (perhaps already long) list of dependencies, you may want to consider the security hints below.

Architectural isolation

A great, if not the best, option to mitigate the security risks of running ChromicPDF (and thus a browser) in your stack is to turn your "document renderer" component into a containerized service with a small RPC interface. This creates a barrier between Chrome and the rest of your application: even if an attacker manages to escape Chrome's sandbox, they are still jailed within the container. It also has other benefits, such as better control over resources, e.g. how much CPU you want to dedicate to PDF rendering.

Escape user-supplied data

Make sure to always escape user-provided data with something like Phoenix.HTML.html_escape. This should prevent an attacker from injecting malicious scripts into your template.
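A minimal sketch of escaping before interpolation, assuming the phoenix_html package is a dependency of your application. MyApp.InvoiceHTML and render/1 are hypothetical names; only Phoenix.HTML.html_escape/1 and Phoenix.HTML.safe_to_string/1 come from phoenix_html.

```elixir
defmodule MyApp.InvoiceHTML do
  # Hypothetical template helper: every user-supplied value is escaped
  # with Phoenix.HTML before it is interpolated into the markup.
  def render(customer_name) do
    safe_name =
      customer_name
      |> Phoenix.HTML.html_escape()
      |> Phoenix.HTML.safe_to_string()

    "<html><body><h1>Invoice for #{safe_name}</h1></body></html>"
  end
end
```

The escaped string can then be passed on as an {:html, ...} source, e.g. to ChromicPDF.print_to_pdf/2. In a Phoenix application, HEEx or EEx templates with the Phoenix.HTML engine perform this escaping for you.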

Disabling scripts

If your template allows, you can disable JavaScript execution altogether (using the DevTools command Emulation.setScriptExecutionDisabled) with the :disable_scripts option:

defp chromic_pdf_opts do
  [disable_scripts: true]
end

Note that this doesn't prevent other features like the evaluate option from working; it applies solely to scripts supplied by the rendered page itself.
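To illustrate the distinction, here is a sketch assuming your chromic_pdf_opts/0 returns [disable_scripts: true]. The inline script in the page would not run, while the expression passed via :evaluate (an evaluate_option(), i.e. %{expression: binary()}) still fills in the placeholder element. The element id "stamp" and the texts are made up for the example.

```elixir
# Assumes ChromicPDF was started with [disable_scripts: true].
ChromicPDF.print_to_pdf(
  # The page's own <script> is not executed when scripts are disabled.
  {:html, "<p id='stamp'></p><script>document.title = 'ignored';</script>"},
  # :evaluate is issued by ChromicPDF itself, so it still runs before printing.
  evaluate: %{expression: "document.getElementById('stamp').innerText = 'Printed by ChromicPDF'"},
  output: "output.pdf"
)
```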

Running in offline mode

To prevent your templates from accessing any remote hosts, the browser targets can be spawned in "offline mode" (using the DevTools command Network.emulateNetworkConditions). Chrome targets with network conditions set to offline can't resolve any external URLs (e.g. https://), whether given as the navigation URL or referenced within the HTML body.

defp chromic_pdf_opts do
  [offline: true]
end

Chrome Sandbox in Docker containers

In Docker containers running Linux images (e.g. Alpine-based images) that are configured to run their main process as a non-root user, the sandbox may cause Chrome to crash on startup, as it requires root privileges.

The error output (visible with the discard_stderr: false option) looks as follows:

Failed to move to new namespace: PID namespaces supported, Network namespace supported,
but failed: errno = Operation not permitted

The best way to resolve this issue is to configure your Docker container to use seccomp rules that grant Chrome access to the relevant system calls. See the excellent Zenika/alpine-chrome repository for details on how to make this work.

Alternatively, you may choose to disable Chrome's sandbox with the no_sandbox option.

defp chromic_pdf_opts do
  [no_sandbox: true]
end

SSL connections

If you are fetching your print source from an https:// URL, Chrome as usual verifies the remote host's SSL certificate when establishing the secure connection, and aborts navigation if the certificate has expired or is not signed by a known certificate authority (i.e. self-signed certificates are rejected).

For production systems, this security check is essential and should not be circumvented. However, if for some reason you need to bypass certificate verification in development or test, you can do this with the :ignore_certificate_errors option.

defp chromic_pdf_opts do
  [ignore_certificate_errors: true]
end

Worker pools

ChromicPDF spawns two worker pools, the session pool and the ghostscript pool. By default, it will create as many sessions (browser tabs) as schedulers are online, and allow the same number of concurrent Ghostscript processes to run.

Concurrency

To increase or limit the number of concurrent workers, you can pass pool configuration to the supervisor. Please note that these are non-queueing worker pools. If you intend to max them out, you will need a job queue as well.

defp chromic_pdf_opts do
  [
    session_pool: [size: 3],
    ghostscript_pool: [size: 10]
  ]
end
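Since the pools do not queue, callers have to stay within the pool size themselves. A minimal sketch of an in-process throttle, assuming that bounding concurrency inside one node is enough for your workload: MyApp.PdfBatch and render_all/3 are hypothetical names, and the render function is injected (for instance fn html -> ChromicPDF.print_to_pdf({:html, html}) end) so the logic can be exercised without Chrome. It relies only on Task.async_stream/3.

```elixir
defmodule MyApp.PdfBatch do
  # Hypothetical helper: runs `render_fun` over `sources` with at most
  # `max_concurrency` jobs in flight (e.g. matching `session_pool: [size: 3]`).
  # Results are returned in input order.
  def render_all(sources, render_fun, max_concurrency \\ 3) do
    sources
    |> Task.async_stream(render_fun,
      max_concurrency: max_concurrency,
      # Per-job upper bound; 15 seconds is an arbitrary example value.
      timeout: 15_000,
      # A job exceeding the timeout yields {:exit, :timeout} instead of exiting the caller.
      on_timeout: :kill_task
    )
    |> Enum.map(fn
      {:ok, result} -> result
      {:exit, reason} -> {:error, {:exit, reason}}
    end)
  end
end
```

Because Task.async_stream/3 links its tasks to the caller, a job that raises still crashes the caller. For retries, persistence, or limits shared across several nodes, a dedicated job queue remains the better fit.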

Operation timeouts

By default, ChromicPDF allows the print process to take 5 seconds to finish. If you are printing large PDFs and run into timeouts, you can raise this limit by passing the timeout option to the session pool.

defp chromic_pdf_opts do
  [
    session_pool: [timeout: 10_000]   # in milliseconds
  ]
end

In addition, there is the init_timeout option, which controls the timeout when the session pool initializes (also defaults to 5 seconds).

defp chromic_pdf_opts do
  [
    session_pool: [init_timeout: 10_000]   # in milliseconds
  ]
end

Automatic session restarts to avoid memory drain

By default, ChromicPDF will restart sessions within the Chrome process after 1000 operations. This helps to prevent infinite growth in Chrome's memory consumption. The "max age" of a session can be configured with the :max_session_uses option.

defp chromic_pdf_opts do
  [max_session_uses: 1000]
end

Chrome zombies

Help, a Chrome army tries to take over my memory!

ChromicPDF tries its best to gracefully close the external Chrome process when its supervisor is terminated. Unfortunately, when the BEAM is not shut down gracefully, Chrome processes will keep running. While this is unlikely to be a concern in a containerized production environment, in development it can lead to unpleasant performance degradation of your operating system.

In particular, the BEAM is not shut down properly…

  • when you exit your application or iex console with the Ctrl+C abort mechanism (see issue #56),
  • and when you run your tests: after an ExUnit run, your application's supervisor is not terminated cleanly.

There are a few ways to mitigate this issue.

"On Demand" mode

In case you habitually end your development server with Ctrl+C, you should consider enabling "On Demand" mode which disables the session pool, and instead starts and stops Chrome instances as needed. If multiple PDF operations are requested simultaneously, multiple Chrome processes will be launched (each with a pool size of 1, disregarding the pool configuration).

defp chromic_pdf_opts do
  [on_demand: true]
end

To enable it only for development, you can load the option from the application environment.

# config/config.exs
config :my_app, ChromicPDF, on_demand: false

# config/dev.exs
config :my_app, ChromicPDF, on_demand: true

# application.ex
@chromic_pdf_opts Application.compile_env!(:my_app, ChromicPDF)

defp chromic_pdf_opts do
  @chromic_pdf_opts ++
    [
      # ... other opts ...
    ]
end

Terminating your supervisor after your test suite

You can enable "On Demand" mode for your tests as well. However, please be aware that each test that prints a PDF will then take longer (about 0.5s more) due to the added Chrome boot time. Alternatively, ExUnit provides a way to run code at the end of your test suite, which you can use to stop your supervisor and, with it, Chrome.

# test/test_helper.exs
ExUnit.after_suite(fn _ -> Supervisor.stop(MyApp.Supervisor) end)
ExUnit.start()

Only start ChromicPDF in production

The easiest way to prevent Chrome from spawning in development is to only run ChromicPDF in the prod environment. Obviously, however, you then won't be able to print PDFs in development or test.
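One way to wire this up, following the compile-time configuration pattern used for "On Demand" mode above, is sketched below. The :start_chromic_pdf config key and the chromic_pdf_children/0 helper are hypothetical names chosen for this example.

```elixir
# config/config.exs (hypothetical :start_chromic_pdf key)
config :my_app, start_chromic_pdf: false

# config/prod.exs
config :my_app, start_chromic_pdf: true

# application.ex
@start_chromic_pdf Application.compile_env(:my_app, :start_chromic_pdf, false)

def start(_type, _args) do
  children =
    [
      # other apps...
    ] ++ chromic_pdf_children()

  Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
end

# Only add the ChromicPDF child when the flag is set for this environment.
defp chromic_pdf_children do
  if @start_chromic_pdf, do: [{ChromicPDF, chromic_pdf_opts()}], else: []
end
```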

Chrome Options

Custom command line switches

The :chrome_args option allows you to pass arbitrary options to the Chrome/Chromium executable.

defp chromic_pdf_opts do
  [chrome_args: "--font-render-hinting=none"]
end

The :chrome_executable option allows you to specify a custom Chrome/Chromium executable.

defp chromic_pdf_opts do
  [chrome_executable: "/usr/bin/google-chrome-beta"]
end

Debugging Chrome errors

Chrome's stderr logging is silently discarded so as not to clutter your log files. In case you would like to take a peek, add the discard_stderr: false option.

defp chromic_pdf_opts do
  [discard_stderr: false]
end

Telemetry support

To provide insights into PDF and PDF/A generation performance, ChromicPDF emits the following telemetry events:

  • [:chromic_pdf, :print_to_pdf, :start | :stop | :exception]

  • [:chromic_pdf, :capture_screenshot, :start | :stop | :exception]

  • [:chromic_pdf, :convert_to_pdfa, :start | :stop | :exception]

Please see :telemetry.span/3 for details on their payloads, and :telemetry.attach/4 for how to attach to them.

Each of the corresponding functions accepts a telemetry_metadata option which is passed to the attached event handler. This can, for instance, be used to mark events with custom tags such as the type of the print document.

ChromicPDF.print_to_pdf(..., telemetry_metadata: %{template: "invoice"})

The print_to_pdfa/2 function emits both the print_to_pdf and convert_to_pdfa event series, in that order.

Last but not least, the print_to_pdf/2 function emits :join_pdfs events when concatenating multiple input sources.

  • [:chromic_pdf, :join_pdfs, :start | :stop | :exception]
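A minimal sketch of a handler that logs durations, using :telemetry.attach_many/4 to subscribe to several :stop events at once. MyApp.PdfTelemetry and the handler id "my-app-chromic-pdf" are hypothetical. It assumes that :stop measurements carry a :duration in native time units (as :telemetry.span/3 specifies) and that the map given as :telemetry_metadata is included in the handler's metadata argument.

```elixir
defmodule MyApp.PdfTelemetry do
  require Logger

  @handler_id "my-app-chromic-pdf"
  @events [
    [:chromic_pdf, :print_to_pdf, :stop],
    [:chromic_pdf, :convert_to_pdfa, :stop]
  ]

  # Call once at startup, e.g. from MyApp.Application.start/2.
  def attach do
    :telemetry.attach_many(@handler_id, @events, &__MODULE__.handle_event/4, nil)
  end

  # :stop events emitted via :telemetry.span/3 carry a :duration in native units.
  # The metadata is assumed to include the map passed as :telemetry_metadata.
  def handle_event([:chromic_pdf, operation, :stop], %{duration: duration}, metadata, _config) do
    ms = System.convert_time_unit(duration, :native, :millisecond)
    Logger.info("#{operation} took #{ms} ms (template: #{inspect(metadata[:template])})")
    # :telemetry ignores the return value; it is returned here only to ease testing.
    {operation, ms}
  end
end
```

With the handler attached, a print_to_pdfa/2 call would log two lines, one for each of its event series.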

On Accessibility / PDF/UA

Since version 85, Chrome generates "Tagged PDF" files by default. These files contain structural information about the document, namely type information about its nodes (headings, paragraphs, etc.), as well as metadata like node attributes (e.g., image alt texts). This information allows assistive tools like screen readers to do their job, at the cost of (at times significantly) increasing the file size. To check whether a PDF file is tagged, you can use the pdfinfo utility, which reports such files as Tagged: yes. You can review some of the contained information with the pdfinfo -struct-text <file> command. Tagging may be disabled by passing the --disable-pdf-tagging argument to Chrome via the chrome_args option.
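For example, if accessibility is not a requirement and file size matters, tagging could be switched off in the supervisor options like this (a configuration fragment; if you already pass other chrome_args, they would need to be combined into this string):

```elixir
defp chromic_pdf_opts do
  # Ask Chrome not to emit "Tagged PDF" structure information.
  [chrome_args: "--disable-pdf-tagging"]
end
```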

However, at the time of writing, even Chrome's most recent beta version (109) does not generate files compliant with the PDF/UA standard (ISO 14289-1:2014). Both the "PAC 2021" accessibility checker and the VeraPDF validator (capable of validating a subset of the PDF/UA rules since version 1.18 from April 2021) report rule violations concerning mandatory metadata.

So, if your use case requires you to generate fully PDF/UA-compliant files, Chrome (and, by extension, ChromicPDF) is at the moment not going to fulfill your needs.

Furthermore, any operation that involves running the Chrome-generated file through Ghostscript (PDF/A conversion, concatenation) will remove all structural information, so that pdfinfo reports Tagged: no, and thereby prevent assistive tools from proper functioning.
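To check this on your own output, a small helper can shell out to pdfinfo and read its Tagged: line. This is a sketch assuming pdfinfo (from poppler-utils) is on the PATH; MyApp.PdfTags and its functions are hypothetical names.

```elixir
defmodule MyApp.PdfTags do
  # Runs `pdfinfo` on the file; the match raises if pdfinfo exits non-zero.
  def tagged?(pdf_path) do
    {output, 0} = System.cmd("pdfinfo", [pdf_path])
    parse_tagged(output)
  end

  # Returns true/false based on the "Tagged:" line, or :unknown if there is none.
  def parse_tagged(pdfinfo_output) do
    pdfinfo_output
    |> String.split("\n")
    |> Enum.find(&String.starts_with?(&1, "Tagged:"))
    |> case do
      nil -> :unknown
      line -> String.trim(String.replace_prefix(line, "Tagged:", "")) == "yes"
    end
  end
end
```

Comparing tagged?/1 before and after a PDF/A conversion should show the tags being dropped, as described above.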

Summary

Functions

Returns a specification to start this module as part of a supervision tree.

Converts a PDF to PDF/A (either PDF/A-2b or PDF/A-3b).

Retrieves the currently set name (set using put_dynamic_name/1) or the default name.

Prints a PDF and converts it to PDF/A in a single call.

Activate a particular ChromicPDF instance, which was started with the name option. After calling this function, all calls in the current process will use this instance of ChromicPDF.

Starts ChromicPDF.

Runs a one-off Chrome process to allow Chrome to initialize its caches.

Types

@type blob() :: iodata()
@type capture_screenshot_option() :: {:capture_screenshot, map()} | navigate_option()
@type chrome_runner_option() ::
  {:no_sandbox, boolean()}
  | {:discard_stderr, boolean()}
  | {:chrome_args, binary()}
  | {:chrome_executable, binary()}
@type evaluate_option() :: {:evaluate, %{expression: binary()}}
@type export_option() :: output_option() | telemetry_metadata_option()
@type export_return() :: :ok | {:ok, binary()} | {:ok, output_function_result()}
@type ghostscript_pool_option() :: {:size, non_neg_integer()}
@type global_option() ::
  {:offline, boolean()}
  | {:disable_scripts, boolean()}
  | {:max_session_uses, non_neg_integer()}
  | {:session_pool, [session_pool_option()]}
  | {:ignore_certificate_errors, boolean()}
  | {:ghostscript_pool, [ghostscript_pool_option()]}
  | {:on_demand, boolean()}
  | chrome_runner_option()
@type info_option() ::
  {:info,
   %{
     optional(:title) => binary(),
     optional(:author) => binary(),
     optional(:subject) => binary(),
     optional(:keywords) => binary(),
     optional(:creator) => binary(),
     optional(:creation_date) => binary() | DateTime.t(),
     optional(:mod_date) => binary() | DateTime.t()
   }}
@type navigate_option() ::
  {:set_cookie, map()} | evaluate_option() | wait_for_option()
@type output_function() :: (blob() -> output_function_result())
@type output_function_result() :: any()
@type output_option() :: {:output, binary()} | {:output, output_function()}
@type path() :: binary()
@type pdf_option() :: {:print_to_pdf, map()} | navigate_option()
@type pdfa_option() ::
  {:pdfa_version, binary()}
  | {:pdfa_def_ext, binary()}
  | {:permit_read, binary()}
  | info_option()
@type session_pool_option() ::
  {:size, non_neg_integer()}
  | {:init_timeout, timeout()}
  | {:timeout, timeout()}
@type source() :: source_tuple() | source_and_options()
@type source_and_options() :: %{source: source_tuple(), opts: [pdf_option()]}
@type source_tuple() :: {:url, url()} | {:html, blob()}
@type telemetry_metadata_option() :: {:telemetry_metadata, map()}
@type url() :: binary()
@type wait_for_option() :: {:wait_for, %{selector: binary(), attribute: binary()}}

Functions

capture_screenshot(source, opts \\ [])
@spec capture_screenshot(source(), [capture_screenshot_option() | export_option()]) ::
  export_return()

Captures a screenshot.

This call blocks until the screenshot has been created.

Print and return Base64-encoded PNG

{:ok, blob} = ChromicPDF.capture_screenshot({:url, "file:///example.html"})

Custom options for Page.captureScreenshot

Custom options for the Page.captureScreenshot call can be specified by passing a map to the :capture_screenshot option.

ChromicPDF.capture_screenshot(
  {:url, "file:///example.html"},
  capture_screenshot: %{
    format: "jpeg"
  }
)
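Building on this, the sketch below also passes quality, a Page.captureScreenshot parameter that applies to JPEG output, and writes the result to disk. It assumes the screenshot is returned Base64-encoded when no :output is given, as in the PNG example above; the file name example.jpg and the quality value are arbitrary.

```elixir
{:ok, base64_jpeg} =
  ChromicPDF.capture_screenshot(
    {:url, "file:///example.html"},
    # Both keys are forwarded to Chrome's Page.captureScreenshot command.
    capture_screenshot: %{format: "jpeg", quality: 80}
  )

# Decode the Base64 payload before writing the binary image file.
File.write!("example.jpg", Base.decode64!(base64_jpeg))
```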

For navigational options (source, cookies, evaluating scripts) see print_to_pdf/2.

@spec child_spec([global_option()]) :: Supervisor.child_spec()

Returns a specification to start this module as part of a supervision tree.

convert_to_pdfa(pdf_path, opts \\ [])
@spec convert_to_pdfa(path(), [pdfa_option()]) :: export_return()

Converts a PDF to PDF/A (either PDF/A-2b or PDF/A-3b).

Convert an input PDF and return a Base64-encoded blob

{:ok, blob} = ChromicPDF.convert_to_pdfa("some_pdf_file.pdf")

Convert and write to file

ChromicPDF.convert_to_pdfa("some_pdf_file.pdf", output: "output.pdf")

PDF/A versions & levels

Ghostscript supports both PDF/A-2 and PDF/A-3, each in its b (basic) conformance level. By default, ChromicPDF generates PDF/A-3b files. Set the pdfa_version option to "2" to generate PDF/A-2b files instead.

ChromicPDF.convert_to_pdfa("some_pdf_file.pdf", pdfa_version: "2")

Generated files pass verapdf validation. When verifying them yourself, pass the corresponding profile argument (-f 2b or -f 3b).

Specifying PDF metadata

The converter is able to transfer PDF metadata (the Info dictionary) from the original PDF file to the output file. However, files printed by Chrome do not contain any metadata information (except "Creator" being "Chrome").

The :info option of the PDF/A converter lets you specify metadata for the output file directly.

ChromicPDF.convert_to_pdfa("some_pdf_file.pdf", info: %{creator: "ChromicPDF"})

The converter understands the following keys, all of which accept string values (:creation_date and :mod_date additionally accept DateTime structs):

  • :title
  • :author
  • :subject
  • :keywords
  • :creator
  • :creation_date
  • :mod_date

By specification, date values in :creation_date and :mod_date do not need to follow a specific syntax. However, Ghostscript inserts date strings like "D:20200208153049+00'00'" and Info extractor tools might rely on this or another specific format. The converter will automatically format given DateTime values like this.

Both :creation_date and :mod_date are filled with the current date automatically (by Ghostscript), if the original file did not contain any.
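Putting these options together, a call might look like the sketch below, which relies on the info_option() type accepting either a DateTime or a preformatted string for the date keys. The title and the fixed mod_date string are example values only.

```elixir
ChromicPDF.convert_to_pdfa(
  "some_pdf_file.pdf",
  output: "output.pdf",
  info: %{
    title: "Example document",
    creator: "ChromicPDF",
    # DateTime values are formatted by the converter ("D:YYYYMMDDHHmmSS..." style).
    creation_date: DateTime.utc_now(),
    # A preformatted string is passed through as given.
    mod_date: "D:20200208153049+00'00'"
  }
)
```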

Adding more PostScript to the conversion

The pdfa_def_ext option can be used to feed more PostScript code into the final conversion step.

ChromicPDF.convert_to_pdfa(
  "some_pdf_file.pdf",
  pdfa_def_ext: "[/Title (OverriddenTitle) /DOCINFO pdfmark"
)

If your extra PostScript requires read permissions for additional files, pass the :permit_read option (once per path).

ChromicPDF.convert_to_pdfa(
  "some_pdf_file.pdf",
  pdfa_def_ext: "custom-postscript",
  permit_read: "/some/path",
  permit_read: "/some/other/path"
)

Embedded color scheme

Since it is required to embed a color scheme into PDF/A files, ChromicPDF ships with a copy of the royalty-free eciRGB_V2 scheme by the European Color Initiative. If you need to use a different color scheme, please open an issue.

Accessibility

Please note that running a PDF file through Ghostscript removes all structural annotations ("Tags") and hence disables accessibility features of assistive technologies. See the On Accessibility / PDF/UA section for details.

@spec get_dynamic_name() :: atom()

Retrieves the currently set name (set using put_dynamic_name/1) or the default name.

@spec put_dynamic_name(atom()) :: atom()

Activate a particular ChromicPDF instance, which was started with the name option. After calling this function, all calls in the current process will use this instance of ChromicPDF.

You can use this function if you need to run ChromicPDF as part of a supervision tree with a particular name, for example:

defmodule MySupervisor do
  use Supervisor

  def start_link(opts) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    children = [
      # other apps...
      {ChromicPDF, name: MyName}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

Returns the previously set name or the default name.
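With the named instance running, a process would then select it before issuing any print calls. A short sketch, reusing MyName from the supervisor example above; the HTML content is an example only.

```elixir
# Route all subsequent ChromicPDF calls in this process to the MyName instance.
ChromicPDF.put_dynamic_name(MyName)

# Should now return MyName for this process.
MyName = ChromicPDF.get_dynamic_name()

{:ok, _base64_pdf} = ChromicPDF.print_to_pdf({:html, "<p>Rendered by the MyName instance</p>"})
```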

start_link(config \\ [])
@spec start_link([global_option()]) :: Supervisor.on_start() | Agent.on_start()

Starts ChromicPDF.

If the given config includes the on_demand: true flag, this will instead spawn an Agent process that holds this configuration until a PDF operation is triggered which will then launch a supervisor temporarily, process the operation, and proceed to perform a graceful shutdown.

@spec warm_up([chrome_runner_option()]) :: {:ok, binary()}

Runs a one-off Chrome process to allow Chrome to initialize its caches.

On some infrastructure (notably, GitHub Actions), Chrome occasionally takes a long nap between process launch and replying to its first DevTools commands. If you happen to print a PDF in the meantime (i.e. before the session pool has spawned any sessions), the session checkout will fail with a timeout error:

Caught EXIT signal from NimblePool.checkout!/4

      ** (EXIT) time out

This function mitigates the issue by launching a Chrome process via a shell command, bypassing ChromicPDF's internals.

Usage

# in your test_helper.exs
{:ok, _} = ChromicPDF.warm_up()
...
ExUnit.start()

Options

This function accepts all options related to the external Chrome process (see chrome_runner_option()).

If you pass discard_stderr: false, Chrome's standard error is returned.

{:ok, stderr} = ChromicPDF.warm_up(discard_stderr: false)
IO.inspect(stderr, label: "chrome stderr")

Mix Task

Alternatively, you can choose to run a mix task as part of your CI script, see Mix.Tasks.ChromicPdf.WarmUp. The task currently does not accept any options.

...
$ mix chromic_pdf.warm_up
$ mix test