PDFInfo (PDFInfo v0.1.17)

Extracts all /Info and /Metadata objects from a PDF binary using regular expressions, with zero dependencies.

Limitations: if the PDF is encrypted or its metadata is compressed, you must first decrypt and uncompress the file, for example with qpdf:

qpdf --stream-data=uncompress --compress-streams=n --decrypt --password='' myfile.pdf myfile_out.pdf
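
The same preprocessing step can be driven from Elixir. The following is a minimal sketch, assuming qpdf is installed and on the PATH; the module QpdfSketch and its functions are hypothetical helpers, not part of PDFInfo. It only wraps the command shown above with System.cmd/3.

```elixir
defmodule QpdfSketch do
  # Hypothetical helper, not part of PDFInfo.
  # Builds the argument list for the qpdf call shown above.
  def args(input, output, password \\ "") do
    [
      "--stream-data=uncompress",
      "--compress-streams=n",
      "--decrypt",
      "--password=" <> password,
      input,
      output
    ]
  end

  # Runs qpdf and reports the result. Only exit status 0 is treated as
  # success here; any other status is handed back to the caller.
  # System.cmd/3 raises if the qpdf executable cannot be found.
  def run(input, output, password \\ "") do
    case System.cmd("qpdf", args(input, output, password), stderr_to_stdout: true) do
      {_output, 0} -> {:ok, output}
      {message, status} -> {:error, {status, message}}
    end
  end
end

# Usage (requires qpdf):
#   {:ok, path} = QpdfSketch.run("myfile.pdf", "myfile_out.pdf")
#   binary = File.read!(path)
```

The binary read from the output file can then be passed to the functions documented below.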

Summary

Functions

encrypt_refs(binary)

Returns a list of /Encrypt reference strings.

info_objects(binary)

Maps /Info reference strings to objects and parses the objects.

info_refs(binary)

Returns a list of /Info reference strings.

is_encrypted?(binary)

Returns true if the PDF has at least one /Encrypt reference, false otherwise.

is_pdf?(binary)

Checks if the binary has a PDF header in its first 1024 bytes.

metadata_objects(binary)

Maps /Metadata reference strings to objects and parses the objects.

metadata_refs(binary)

Returns a list of /Metadata reference strings.

pdf_version(binary)

Extracts PDF version from the PDF header.

raw_info_objects(binary)

Maps the /Info reference strings to the raw objects.

raw_metadata_objects(binary)

Maps the /Metadata reference strings to the raw objects.

Functions

encrypt_refs(binary)

@spec encrypt_refs(binary()) :: list()

Returns a list of /Encrypt reference strings.

Examples

iex> PDFInfo.encrypt_refs(binary)
["/Encrypt 52 0 R"]

info_objects(binary)

@spec info_objects(binary()) :: map()

Maps /Info reference strings to objects and parses the objects.

Examples

iex> PDFInfo.info_objects(binary)
%{
  "/Info 1 0 R" => [
    %{
      "Author" => "The PostgreSQL Global Development Group",
      "CreationDate" => "D:20200212212756Z",
      ...
    }
  ]
}
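
Date values such as "CreationDate" in the example above are shown as raw PDF date strings. Below is a minimal sketch of converting one into a DateTime, assuming the value uses exactly the UTC form "D:YYYYMMDDHHmmSSZ" seen in the example. PdfDateSketch is a hypothetical helper, not part of PDFInfo, and it rejects every other date layout.

```elixir
defmodule PdfDateSketch do
  # Hypothetical helper, not part of PDFInfo.
  # Handles only the UTC form "D:YYYYMMDDHHmmSSZ"; anything else is :error.
  def parse(
        <<"D:", year::binary-size(4), month::binary-size(2), day::binary-size(2),
          hour::binary-size(2), minute::binary-size(2), second::binary-size(2), "Z">>
      ) do
    # Rebuild the fields as an ISO 8601 string and let DateTime validate them.
    iso = "#{year}-#{month}-#{day}T#{hour}:#{minute}:#{second}Z"

    case DateTime.from_iso8601(iso) do
      {:ok, datetime, 0} -> {:ok, datetime}
      _ -> :error
    end
  end

  def parse(_other), do: :error
end

{:ok, created} = PdfDateSketch.parse("D:20200212212756Z")
IO.puts(DateTime.to_iso8601(created))
```

Strings with a local offset or truncated fields fall through to the second clause and return :error, so callers can keep the raw string in that case.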

info_refs(binary)

@spec info_refs(binary()) :: list()

Returns a list of /Info reference strings.

Examples

iex> PDFInfo.info_refs(binary)
["/Info 1 0 R"]
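
To illustrate what a regex-based reference scan can look like, here is a minimal sketch. It is not PDFInfo's actual implementation; RefScanSketch and refs/2 are hypothetical names, and the pattern assumes references are written as a name followed by an object number, a generation number and the letter R.

```elixir
defmodule RefScanSketch do
  # Hypothetical helper, not part of PDFInfo.
  # Collects every "/<name> <object> <generation> R" reference in the binary.
  def refs(binary, name) when is_binary(binary) and is_binary(name) do
    # Regex.escape/1 keeps the name literal. The pattern has no unicode
    # flag, so it is matched against raw bytes.
    regex = Regex.compile!("/" <> Regex.escape(name) <> "\\s+\\d+\\s+\\d+\\s+R")

    regex
    |> Regex.scan(binary)
    |> List.flatten()
    |> Enum.uniq()
  end
end

IO.inspect(RefScanSketch.refs("trailer << /Root 3 0 R /Info 1 0 R >>", "Info"))
# prints ["/Info 1 0 R"]
```

The same helper called with "Encrypt" or "Metadata" yields the corresponding reference strings, and an empty list means no such reference was found.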

is_encrypted?(binary)

@spec is_encrypted?(binary()) :: boolean()

Returns true if the PDF has at least one /Encrypt reference, false otherwise.

is_pdf?(binary)

@spec is_pdf?(binary()) :: boolean()

Checks if the binary has a PDF header.

The PDF header can be anywhere in the first 1024 bytes.

Returns true if a PDF header is found there, false otherwise.

metadata_objects(binary)

@spec metadata_objects(binary()) :: [map()]

Maps /Metadata reference strings to objects and parses the objects.

Examples

iex> PDFInfo.metadata_objects(binary)
[
  %{
    {"dc", "format"} => "application/pdf",
    {"pdf", "Producer"} => "Adobe PDF Library 15.0",
    {"xmp", "CreateDate"} => "2018-06-06T17:02:53+02:00",
    {"xmp", "CreatorTool"} => "Acrobat PDFMaker 17 für Word",
    {"xmp", "MetadataDate"} => "2018-06-06T17:03:13+02:00",
    {"xmp", "ModifyDate"} => "2018-06-06T17:03:13+02:00",
    ...
  }
]
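
The parsed metadata maps are keyed by two-element tuples, as the example above shows. Here is a minimal sketch of working with such keys, using a literal map copied from that example. MetadataSketch and by_prefix/2 are hypothetical names, not part of PDFInfo.

```elixir
defmodule MetadataSketch do
  # Hypothetical helper, not part of PDFInfo.
  # Collects all entries whose key starts with the given prefix into a
  # plain map keyed by the second tuple element.
  def by_prefix(metadata, prefix) when is_map(metadata) do
    # The pinned prefix makes the generator skip non-matching keys.
    for {{^prefix, name}, value} <- metadata, into: %{}, do: {name, value}
  end
end

metadata = %{
  {"dc", "format"} => "application/pdf",
  {"pdf", "Producer"} => "Adobe PDF Library 15.0",
  {"xmp", "CreateDate"} => "2018-06-06T17:02:53+02:00",
  {"xmp", "ModifyDate"} => "2018-06-06T17:03:13+02:00"
}

# A single property is looked up with the full tuple key.
IO.inspect(Map.get(metadata, {"pdf", "Producer"}))

# All properties under one prefix.
IO.inspect(MetadataSketch.by_prefix(metadata, "xmp"))
```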

metadata_refs(binary)

@spec metadata_refs(binary()) :: list()

Returns a list of /Metadata reference strings.

Examples

iex> PDFInfo.metadata_refs(binary)
["/Metadata 5 0 R"]

pdf_version(binary)

@spec pdf_version(binary()) :: {:ok, binary()} | :error

Extracts PDF version from the PDF header.

The PDF header can be anywhere in the first 1024 bytes.

Returns {:ok, version} if a valid PDF header is found. Returns :error otherwise.

Examples

iex> PDFInfo.pdf_version(binary)
{:ok, "1.5"}
iex> PDFInfo.pdf_version("not a pdf")
:error
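
The rule that the header may sit anywhere in the first 1024 bytes can be sketched as follows. This is an illustration under stated assumptions, not PDFInfo's actual code: HeaderSketch is a hypothetical module, and the pattern assumes a header of the form "%PDF-x.y" with single-digit major and minor versions.

```elixir
defmodule HeaderSketch do
  # Hypothetical helper, not part of PDFInfo.
  # Returns {:ok, version} when a "%PDF-x.y" header occurs within the
  # first 1024 bytes, and :error otherwise.
  def version(binary) when is_binary(binary) do
    # binary_part/3 raises when the length exceeds the binary, so clamp it.
    head = binary_part(binary, 0, min(byte_size(binary), 1024))

    case Regex.run(~r/%PDF-(\d\.\d)/, head) do
      [_match, version] -> {:ok, version}
      nil -> :error
    end
  end

  def version(_other), do: :error

  # A binary counts as a PDF here when a version can be extracted.
  def pdf?(binary), do: match?({:ok, _}, version(binary))
end

IO.inspect(HeaderSketch.version("%PDF-1.5\n%..."))
IO.inspect(HeaderSketch.pdf?("not a pdf"))
```

Limiting the search to a fixed-size prefix keeps the check cheap for large files, because the regex never runs over the full binary.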

raw_info_objects(binary)

@spec raw_info_objects(binary()) :: map()

Maps the /Info reference strings to the raw objects.

Examples

iex> PDFInfo.raw_info_objects(binary)
%{"/Info 1 0 R" => ["1 0 obj <<..."]}

raw_metadata_objects(binary)

@spec raw_metadata_objects(binary()) :: list()

Maps the /Metadata reference strings to the raw objects.

Examples

iex> PDFInfo.raw_metadata_objects(binary)
["<x:xmpmeta" <> ...]