PDFInfo v0.1.14

Extracts all /Info and /Metadata objects from a PDF binary using Regex and without any dependencies.

Limitations: if the PDF is encrypted or its metadata is compressed, you first have to decrypt and uncompress it, for example with qpdf:

qpdf --stream-data=uncompress --compress-streams=n --decrypt --password='' myfile.pdf myfile_out.pdf
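
From Elixir, the same preprocessing step could be scripted roughly as follows. This is only a sketch: the module `QpdfSketch` and its functions are hypothetical and not part of PDFInfo, and it assumes the `qpdf` executable is on the PATH and that the PDF has an empty password, as in the command above.

```elixir
defmodule QpdfSketch do
  # Hypothetical helper, not part of PDFInfo.
  # Builds the same argument list as the qpdf command shown above.
  def args(input, output) do
    [
      "--stream-data=uncompress",
      "--compress-streams=n",
      "--decrypt",
      "--password=",
      input,
      output
    ]
  end

  # Runs qpdf (assumed to be installed) and reads the rewritten file.
  # Any non-zero exit status is returned as an error in this sketch.
  def decrypt_and_uncompress(input, output) do
    case System.cmd("qpdf", args(input, output), stderr_to_stdout: true) do
      {_out, 0} -> File.read(output)
      {out, status} -> {:error, {status, out}}
    end
  end
end
```

The binary returned by `QpdfSketch.decrypt_and_uncompress/2` could then be passed to the functions documented below.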

Summary

Functions

encrypt_refs(binary)

Returns a list of /Encrypt reference strings.

info_objects(binary)

Maps /Info reference strings to objects and parses the objects.

info_refs(binary)

Returns a list of /Info reference strings.

is_encrypted?(binary)

Returns true if the PDF has at least one /Encrypt reference, false otherwise.

is_pdf?(binary)

Checks whether the binary has a PDF header within its first 1024 bytes.

metadata_objects(binary)

Maps /Metadata reference strings to objects and parses the objects.

metadata_refs(binary)

Returns a list of /Metadata reference strings.

pdf_version(binary)

Extracts PDF version from the PDF header.

raw_info_objects(binary)

Maps the /Info reference strings to the raw objects.

raw_metadata_objects(binary)

Maps the /Metadata reference strings to the raw objects.

Functions

encrypt_refs(binary)

Specs

encrypt_refs(binary()) :: list()

Returns a list of /Encrypt reference strings.

Examples

iex> PDFInfo.encrypt_refs(binary)
["/Encrypt 52 0 R"]
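
To illustrate what a "reference string" is, the sketch below finds indirect references of the form `/Name <object> <generation> R` with a regular expression. It is an assumption about how such a lookup can be done, not PDFInfo's actual implementation; the module `RefSketch` and the sample trailer are made up for the example.

```elixir
defmodule RefSketch do
  # Hypothetical helper, not PDFInfo's actual implementation.
  # Returns the unique "/Name <object> <generation> R" strings in the binary.
  def refs(binary, name) when is_binary(binary) and is_binary(name) do
    # Regex.escape/1 keeps the name literal; \s+ allows any run of whitespace.
    regex = Regex.compile!("/" <> Regex.escape(name) <> "\\s+\\d+\\s+\\d+\\s+R")

    regex
    |> Regex.scan(binary)
    |> List.flatten()
    |> Enum.uniq()
  end

  # A PDF counts as encrypted here if at least one /Encrypt reference exists.
  def encrypted?(binary), do: refs(binary, "Encrypt") != []
end

# Made-up trailer dictionary used only for illustration.
sample = "trailer << /Root 1 0 R /Encrypt 52 0 R /Info 3 0 R >>"
RefSketch.refs(sample, "Encrypt")
```

The same helper works for other keys, for example `RefSketch.refs(sample, "Info")`.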

info_objects(binary)

Specs

info_objects(binary()) :: map()

Maps /Info reference strings to objects and parses the objects.

Examples

iex> PDFInfo.info_objects(binary)
%{
  "/Info 1 0 R" => [
    %{
      "Author" => "The PostgreSQL Global Development Group",
      "CreationDate" => "D:20200212212756Z",
      ...
    }
  ]
}
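
Values such as "CreationDate" are shown above as plain PDF date strings. If a date struct is needed, a conversion along these lines may help. This is a minimal sketch: `PdfDateSketch` is a hypothetical module, not part of PDFInfo, and it only handles the full `D:YYYYMMDDHHmmSS` form with an optional trailing `Z`, not time zone offsets or shortened dates.

```elixir
defmodule PdfDateSketch do
  # Hypothetical helper, not part of PDFInfo.
  # Accepts "D:YYYYMMDDHHmmSS" with an optional "Z" suffix; anything else is :error.
  def parse(value) when is_binary(value) do
    regex = ~r/\AD:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})Z?\z/

    with [_ | _] = parts <- Regex.run(regex, value, capture: :all_but_first),
         [y, mo, d, h, mi, s] = Enum.map(parts, &String.to_integer/1),
         {:ok, naive} <- NaiveDateTime.new(y, mo, d, h, mi, s) do
      {:ok, naive}
    else
      # No regex match, or digits that do not form a valid date or time.
      _ -> :error
    end
  end
end

PdfDateSketch.parse("D:20200212212756Z")
```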

info_refs(binary)

Specs

info_refs(binary()) :: list()

Returns a list of /Info reference strings.

Examples

iex> PDFInfo.info_refs(binary)
["/Info 1 0 R"]

is_encrypted?(binary)

Specs

is_encrypted?(binary()) :: boolean()

Returns true if the PDF has at least one /Encrypt reference, false otherwise.

is_pdf?(binary)

Specs

is_pdf?(binary()) :: boolean()

Checks whether the binary has a PDF header within its first 1024 bytes.

The PDF header does not have to be at the very start of the binary; it can be anywhere in the first 1024 bytes.

Returns true if a PDF header is found there, false otherwise.

metadata_objects(binary)

Specs

metadata_objects(binary()) :: map()

Maps /Metadata reference strings to objects and parses the objects.

Examples

iex> PDFInfo.metadata_objects(binary)
%{
  "/Metadata 285 0 R" => [
    %{
      {"dc", "format"} => "application/pdf",
      {"pdf", "Producer"} => "Adobe PDF Library 15.0",
      {"xmp", "CreateDate"} => "2018-06-06T17:02:53+02:00",
      {"xmp", "CreatorTool"} => "Acrobat PDFMaker 17 für Word",
      {"xmp", "MetadataDate"} => "2018-06-06T17:03:13+02:00",
      {"xmp", "ModifyDate"} => "2018-06-06T17:03:13+02:00",
      ...
    }
  ]
}
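
Because the keys of the parsed metadata are `{namespace, name}` tuples, looking up a single property takes a small helper. The sketch below assumes the result shape shown above; `XmpSketch` is a hypothetical module, not part of PDFInfo, and the `objects` map is a shortened copy of the example. It also assumes the date value is a complete ISO 8601 timestamp, which may not hold for every PDF.

```elixir
defmodule XmpSketch do
  # Hypothetical helper, not part of PDFInfo.
  # `objects` is assumed to look like %{reference => [%{{namespace, name} => value}]}.
  def values(objects, namespace, name) do
    for {_ref, maps} <- objects,
        map <- maps,
        # Entries without the key do not match {:ok, value} and are skipped.
        {:ok, value} <- [Map.fetch(map, {namespace, name})],
        do: value
  end
end

# Shortened copy of the example result above.
objects = %{
  "/Metadata 285 0 R" => [
    %{
      {"dc", "format"} => "application/pdf",
      {"xmp", "CreateDate"} => "2018-06-06T17:02:53+02:00"
    }
  ]
}

[created] = XmpSketch.values(objects, "xmp", "CreateDate")

# DateTime.from_iso8601/1 returns the time in UTC plus the original offset in seconds.
{:ok, datetime, offset_seconds} = DateTime.from_iso8601(created)
```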

metadata_refs(binary)

Specs

metadata_refs(binary()) :: list()

Returns a list of /Metadata reference strings.

Examples

iex> PDFInfo.metadata_refs(binary)
["/Metadata 5 0 R"]

pdf_version(binary)

Specs

pdf_version(binary()) :: {:ok, binary()} | :error

Extracts PDF version from the PDF header.

The PDF header can be anywhere in the first 1024 bytes.

Returns {:ok, version} if a valid PDF header is found. Returns :error otherwise.

Examples

iex> PDFInfo.pdf_version(binary)
{:ok, "1.5"}
iex> PDFInfo.pdf_version("not a pdf")
:error
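
The "first 1024 bytes" rule can be sketched as follows. This is an illustration under stated assumptions, not PDFInfo's actual code: `HeaderSketch` is a hypothetical module, and the header is assumed to have the form `%PDF-<digit>.<digit>`.

```elixir
defmodule HeaderSketch do
  # Hypothetical helper, not PDFInfo's actual implementation.
  # Searches for "%PDF-<major>.<minor>" in the first 1024 bytes only.
  def version(binary) when is_binary(binary) do
    head = binary_part(binary, 0, min(byte_size(binary), 1024))

    case Regex.run(~r/%PDF-(\d\.\d)/, head, capture: :all_but_first) do
      [version] -> {:ok, version}
      nil -> :error
    end
  end

  # A binary is treated as a PDF here if a version could be extracted.
  def pdf?(binary), do: match?({:ok, _}, version(binary))
end

HeaderSketch.version("%PDF-1.5\n%rest of file")
```

Limiting the search to a fixed prefix keeps the check cheap for large files and avoids matching a `%PDF-` string that merely occurs somewhere in the body of an unrelated file.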

raw_info_objects(binary)

Specs

raw_info_objects(binary()) :: map()

Maps the /Info reference strings to the raw objects.

Examples

iex> PDFInfo.raw_info_objects(binary)
%{"/Info 1 0 R" => ["1 0 obj <<..."]}
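
To show how a raw object relates to the parsed form, the sketch below turns a simple /Info dictionary into a map. It is deliberately simplified and is not PDFInfo's actual parser: `InfoDictSketch` is a hypothetical module, the `raw` string is made-up sample data, and only `/Key (literal string)` pairs without escaped or nested parentheses are handled.

```elixir
defmodule InfoDictSketch do
  # Hypothetical helper, not PDFInfo's actual parser.
  # Extracts "/Key (value)" pairs; hex strings and other value types are ignored.
  def parse(raw) when is_binary(raw) do
    ~r{/([A-Za-z]+)\s*\(([^()]*)\)}
    |> Regex.scan(raw, capture: :all_but_first)
    |> Map.new(fn [key, value] -> {key, value} end)
  end
end

# Made-up raw object used only for illustration.
raw = "1 0 obj << /Author (Jane Doe) /CreationDate (D:20200212212756Z) >> endobj"
InfoDictSketch.parse(raw)
```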

raw_metadata_objects(binary)

Specs

raw_metadata_objects(binary()) :: list()

Maps the /Metadata reference strings to the raw objects.

Examples

iex> PDFInfo.raw_metadata_objects(binary)
["<x:xmpmeta" <> ...]