packkit

Archive, compression, and container workflows for Gleam — pure Gleam, zero runtime dependencies, runs on both the Erlang and JavaScript targets. Full API reference at https://hexdocs.pm/packkit/.

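packkit keeps three concepts separate (codecs, archives, and recipes), so each is testable in isolation and reusable in any combination. A codec transforms one byte stream into another, an archive is a logical container of entries, and a recipe pairs one archive with zero or more outer codecs.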

zip and 7z stay in the archive family. They are not modelled as recipes just because they may compress members internally — see ZIP per-entry methods for that knob.

Every example below is checked by test/packkit/readme_examples_test.gleam, so if it appears here it compiles and round-trips.

Install

gleam add packkit

Quick start: pack and unpack a tar.gz

The shortest end-to-end path. Build a logical archive, hand it to packkit.pack with a recipe, get bytes back. packkit.unpack reverses the recipe — gunzip, then tar-decode — and returns the same logical archive.

import packkit
import packkit/archive
import packkit/recipe
import packkit/tar

pub fn build_and_read_tar_gz() -> Int {
  let archive_value =
    tar.new()
    |> tar.add_file(path: "hello.txt", body: <<"hello":utf8>>)
    |> tar.add_file(path: "world.txt", body: <<"world":utf8>>)

  let assert Ok(bytes) =
    packkit.pack(archive_value: archive_value, using: recipe.tar_gzip())

  let assert Ok(decoded) =
    packkit.unpack(bytes: bytes, using: recipe.tar_gzip())

  archive.entry_count(decoded)
  // -> 2
}

Compressing and decompressing a single byte stream

For raw byte-to-byte work, skip the archive layer and call the codec facade directly. The codec value carries its level and optional preset dictionary — unsupported combinations surface as CodecOptionUnsupported, never as a silent drop.

import packkit
import packkit/codec

pub fn gzip_roundtrip(payload: BitArray) -> BitArray {
  let assert Ok(compressed) =
    packkit.compress(bytes: payload, with: codec.gzip())
  let assert Ok(restored) =
    packkit.decompress(bytes: compressed, with: codec.gzip())
  restored
}

The same call shape works for every supported codec. Pick the one that matches the input or the producer:

import packkit
import packkit/codec

pub fn zstd_roundtrip(payload: BitArray) -> BitArray {
  let assert Ok(stream) = packkit.compress(bytes: payload, with: codec.zstd())
  let assert Ok(plain) = packkit.decompress(bytes: stream, with: codec.zstd())
  plain
}

pub fn bzip2_roundtrip(payload: BitArray) -> BitArray {
  let assert Ok(stream) = packkit.compress(bytes: payload, with: codec.bzip2())
  let assert Ok(plain) = packkit.decompress(bytes: stream, with: codec.bzip2())
  plain
}

pub fn brotli_roundtrip(payload: BitArray) -> BitArray {
  let assert Ok(stream) = packkit.compress(bytes: payload, with: codec.brotli())
  let assert Ok(plain) = packkit.decompress(bytes: stream, with: codec.brotli())
  plain
}

codec.identity() is a no-op codec — useful when a recipe needs to be parameterised over “compress or not” without branching at the call site.
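As a minimal sketch of that "compress or not" parameterisation, the helper below picks the codec once and then shares a single call path. The function name maybe_gzip and its Bool flag are hypothetical; only codec.gzip, codec.identity, and packkit.compress come from the examples above.

```gleam
import packkit
import packkit/codec

/// Hypothetical helper: choose the codec up front, then use one code path.
pub fn maybe_gzip(payload: BitArray, compress: Bool) -> BitArray {
  // Both branches yield a codec value, so the caller never branches again.
  let chosen = case compress {
    True -> codec.gzip()
    False -> codec.identity()
  }
  let assert Ok(bytes) = packkit.compress(bytes: payload, with: chosen)
  bytes
}
```

Because identity is described as a no-op, calling the helper with False should hand the payload back unchanged.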

Building archives

tar, cpio, ar, zip, and 7z share one logical Archive value. The format-specific module (packkit/tar, packkit/zip, …) exposes a new/0 constructor; from there, archive.add_file / add_directory / add_symlink / add_hardlink work identically across formats. Format-side limitations (e.g. ar only carries flat files) surface at encode time as a typed ArchiveError.

Tar with directories, symlinks, and metadata

import packkit
import packkit/archive
import packkit/entry
import packkit/tar

pub fn build_tar_with_metadata() -> BitArray {
  let archive_value =
    tar.new()
    |> tar.add_directory(path: "etc")
    |> tar.add_file(path: "etc/motd", body: <<"welcome":utf8>>)
    |> tar.add_symlink(path: "etc/banner", target: "motd")
    |> archive.add(
      entry: entry.file(path: "bin/run", body: <<"#!/bin/sh\n":utf8>>)
        |> entry.with_mode(mode: 0o755)
        |> entry.with_owner(user_id: 1000, group_id: 1000)
        |> entry.with_modified_at(unix_seconds: 1_700_000_000),
    )

  let assert Ok(bytes) =
    packkit.write(archive_value: archive_value, format: tar.format())
  bytes
}

entry.with_mode / with_owner / with_modified_at each take an opaque Entry value and return an updated copy. The checked variants (with_mode_checked, with_owner_checked, with_modified_at_checked) return Result(_, MetadataError) instead of panicking when the value is out of range; reach for them in code that touches user input.
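A sketch of the checked path for a mode that arrives from user input. It assumes with_mode_checked takes the same mode: label as with_mode and that the entry type is exported as entry.Entry; neither detail is confirmed above. The helper name script_entry is hypothetical.

```gleam
import gleam/result
import packkit/entry

/// Hypothetical helper: build an executable entry from an untrusted mode.
pub fn script_entry(mode: Int) -> Result(entry.Entry, Nil) {
  entry.file(path: "bin/run", body: <<"#!/bin/sh\n":utf8>>)
  // Assumption: same `mode:` label as the unchecked `with_mode`.
  |> entry.with_mode_checked(mode: mode)
  // Collapse the MetadataError into Nil to keep the sketch small.
  |> result.replace_error(Nil)
}
```

Real callers would more likely keep the MetadataError and report it than discard it.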

Path validation

Entry paths are validated up-front. Absolute paths, .. traversal, embedded NUL, Windows separators, empty / . segments all surface as typed EntryError variants — there’s no way to construct an Entry value that would silently extract outside its archive root.

import packkit/entry
import packkit/tar

pub fn rejects_traversal() -> Result(_, entry.EntryError) {
  tar.add_file_checked(
    archive: tar.new(),
    path: "../etc/passwd",
    body: <<"x":utf8>>,
  )
  // -> Error(entry.PathTraversal("../etc/passwd"))
}

CPIO, ar, 7z

The same archive.add_* helpers work for every format. Use packkit.write to serialise.

import packkit
import packkit/archive
import packkit/cpio
import packkit/ar
import packkit/seven_z

pub fn build_cpio() -> BitArray {
  let archive_value =
    cpio.new()
    |> archive.add_file(path: "lib/libfoo.so", body: <<"…":utf8>>)
    |> archive.add_file(path: "lib/libbar.so", body: <<"…":utf8>>)
  let assert Ok(bytes) =
    packkit.write(archive_value: archive_value, format: cpio.format())
  bytes
}

pub fn build_ar() -> BitArray {
  let archive_value =
    ar.new()
    |> archive.add_file(path: "main.o", body: <<"obj":utf8>>)
    |> archive.add_file(path: "debian-binary", body: <<"2.0\n":utf8>>)
  let assert Ok(bytes) =
    packkit.write(archive_value: archive_value, format: ar.format())
  bytes
}

pub fn build_seven_z() -> BitArray {
  let archive_value =
    seven_z.new()
    |> archive.add_file(path: "doc/spec.txt", body: <<"hello 7z":utf8>>)
    |> archive.add_file(path: "doc/notes.txt", body: <<"more":utf8>>)
  let assert Ok(bytes) =
    packkit.write(archive_value: archive_value, format: seven_z.format())
  bytes
}

Recipe composition

A Recipe is one archive plus zero or more outer codecs in inner-to-outer order. packkit/recipe ships convenience constructors for the common combinations:

Constructor                                                  Description
recipe.tar()                                                 uncompressed tar (same API surface as the compressed variants)
recipe.zip()                                                 ZIP archive (per-entry compression, see below)
recipe.seven_z()                                             7z archive
recipe.cpio()                                                uncompressed cpio (newc)
recipe.ar()                                                  BSD ar
recipe.tar_gzip()                                            tar.gz
recipe.tar_zstd()                                            tar.zst
recipe.tar_xz()                                              tar.xz
recipe.tar_bzip2()                                           tar.bz2
recipe.tar_lz4()                                             tar.lz4
recipe.tar_snappy()                                          tar.snappy
recipe.tar_lzw()                                             tar.Z
recipe.tar_zlib()                                            tar.zlib
recipe.tar_brotli()                                          tar.br
recipe.cpio_gzip() / cpio_bzip2() / cpio_xz() / cpio_zstd()  matching cpio variants

Need a recipe that isn’t in the table? Compose one with recipe.wrap. The wrapper adds an outer codec layer on top of an existing recipe.

import packkit/archive
import packkit/codec
import packkit/recipe

pub fn cpio_lz4_then_zstd() -> recipe.Recipe {
  // Inner-to-outer order: cpio → lz4 → zstd
  recipe.archive_with(format: archive.cpio_newc(), wrapped_by: codec.lz4())
  |> recipe.wrap(with: codec.zstd())
}

recipe.description returns the canonical dotted name ("cpio-newc.lz4.zstd" for the recipe above), which is handy for test snapshots and logs.
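For a log line or snapshot, the dotted name can be read straight off a composed recipe. This sketch assumes recipe.description takes the Recipe as its single positional argument and returns a String; the function name describe_custom_recipe is hypothetical.

```gleam
import packkit/archive
import packkit/codec
import packkit/recipe

/// Hypothetical helper: name a cpio, lz4, zstd recipe for logs or snapshots.
pub fn describe_custom_recipe() -> String {
  recipe.archive_with(format: archive.cpio_newc(), wrapped_by: codec.lz4())
  |> recipe.wrap(with: codec.zstd())
  // Assumption: `description` accepts the recipe positionally.
  |> recipe.description
  // -> "cpio-newc.lz4.zstd"
}
```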

Detecting a format

Three entry points return an opaque Detected value, inspected through detect.codec / detect.archive / detect.recipe / detect.extension. Compound extensions (.tar.gz, .cpio.zst, …) take precedence over their inner counterparts.

import gleam/option.{type Option}
import packkit
import packkit/detect
import packkit/recipe

pub fn recipe_for_filename(path: String) -> Option(recipe.Recipe) {
  let assert Ok(info) = packkit.detect_filename(path)
  detect.recipe(info)
}
// recipe_for_filename("backup-2026-05-22.tar.gz")
//   -> Some(recipe.tar_gzip())
// recipe_for_filename("logs.tar.zst")
//   -> Some(recipe.tar_zstd())

For incoming data of unknown origin (uploads, stdin) prefer detect.from_path_or_bytes — it tries the filename first and falls back to magic-byte sniffing on the supplied content.

import gleam/option.{type Option}
import packkit/codec
import packkit/detect

/// Pick the right codec for a downloaded blob even when the URL has no
/// useful extension (`/dev/stdin`, `download.bin`, …).
pub fn pick_codec(path: String, leading_bytes: BitArray) -> Option(codec.Codec) {
  detect.from_path_or_bytes(path: path, bytes: leading_bytes)
  |> option.from_result
  |> option.then(detect.codec)
}

packkit.detect_filename / detect_bytes / detect_path_or_bytes re-export the packkit/detect entrypoints from the top-level facade so most CLI integrations only need to import packkit.

Inspecting an archive

The decoded Archive is iterated through archive.entries; each Entry is opaque and inspected through accessors. archive.entry_by_path short-circuits the “fetch one named member” use case.

import gleam/list
import packkit
import packkit/archive
import packkit/entry
import packkit/recipe

pub fn extract_one_file(bytes: BitArray) -> Result(BitArray, Nil) {
  let assert Ok(decoded) = packkit.unpack(bytes: bytes, using: recipe.tar_gzip())
  case archive.entry_by_path(decoded, path: "hello.txt") {
    Ok(found) -> Ok(entry.body(found))
    Error(_) -> Error(Nil)
  }
}

pub fn list_files(bytes: BitArray) -> List(String) {
  let assert Ok(decoded) = packkit.unpack(bytes: bytes, using: recipe.tar_gzip())
  archive.entries(decoded)
  |> list.filter(entry.is_file)
  |> list.map(fn(e) { entry.to_string(entry.path(e)) })
}

ZIP per-entry methods

ZIP is an archive family, not a recipe — each entry can carry its own compression method. zip.encode_with_method applies the chosen method to every entry; mix-and-match per entry is not (yet) exposed. The supported methods are store, deflate, bzip2, zstd, xz, and lzma (PKWARE method 14).

import packkit
import packkit/archive
import packkit/level
import packkit/recipe
import packkit/zip

pub fn write_deflated_zip() -> BitArray {
  let archive_value =
    zip.new()
    |> archive.add_file(path: "report.csv", body: <<"a,b,c\n1,2,3\n":utf8>>)
    |> archive.add_file(path: "notes.txt", body: <<"keep me":utf8>>)
  let assert Ok(bytes) =
    zip.encode_with_method(
      archive: archive_value,
      method: zip.deflate(level: level.default()),
    )
  bytes
}

pub fn write_zstd_zip() -> BitArray {
  let archive_value =
    zip.new()
    |> archive.add_file(path: "blob.bin", body: <<"…":utf8>>)
  let assert Ok(bytes) =
    zip.encode_with_method(archive: archive_value, method: zip.zstd())
  bytes
}

pub fn read_zip(bytes: BitArray) -> Int {
  let assert Ok(decoded) = packkit.unpack(bytes: bytes, using: recipe.zip())
  archive.entry_count(decoded)
}

zip.decode_with_password reads PKWARE traditional (“ZipCrypto”) and WinZip AES (AE-1 / AE-2) entries through the same logical-archive API once the password is supplied — see the docs for the supported method matrix.

gzip header metadata round-trip

packkit/gzip exposes the full RFC 1952 header (member name, comment, mtime, optional extra subfields). The top-level facade hides the header, but for tooling that needs to read or set those fields, use the gzip module directly.

import packkit/gzip

pub fn gzip_with_header_metadata(payload: BitArray) -> #(BitArray, Result(gzip.Decoded, _)) {
  let header =
    gzip.default_header()
    |> gzip.with_name(name: "report.csv")
    |> gzip.with_comment(comment: "generated by packkit")
    |> gzip.with_modified_at(unix_seconds: 1_700_000_000)

  let assert Ok(bytes) =
    gzip.encode_with_header(bytes: payload, header: header)
  #(bytes, gzip.decode(bytes: bytes))
}

gzip.decode returns a Decoded record carrying both the original header and the decoded payload, so callers can replay metadata from one gzip stream into another.

Streaming chunks via packkit/stream

packkit/stream exposes opaque incremental decoder and encoder states. push / push_encoder buffer one chunk at a time and enforce max_input_bytes as the chunks arrive; finish / finish_encoder run the actual codec once over the buffered input.

import packkit
import packkit/codec
import packkit/stream

pub fn streamed_gzip_roundtrip(payload: BitArray) -> BitArray {
  let assert Ok(stream_bytes) =
    packkit.compress(bytes: payload, with: codec.gzip())

  // Feed the compressed stream as two chunks (here the whole stream plus
  // an empty tail); the decoder doesn't care how the producer carved them up.
  let chunks = [stream_bytes, <<>>]

  let assert Ok(plain) =
    stream.decode_chunks(decoder: stream.new_gzip_decoder(), chunks: chunks)
  plain
}

Every codec gets a matching constructor — new_deflate_decoder, new_zlib_decoder, new_lz4_decoder, new_snappy_decoder, new_bzip2_decoder, new_lzw_decoder, new_xz_decoder, new_zstd_decoder, new_brotli_decoder — plus the encoder twins (new_gzip_encoder, …, encode_chunks).
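The encoder side can be sketched the same way. This assumes encode_chunks mirrors decode_chunks, with an encoder: label in place of decoder:; that label is a guess and is not confirmed above. The helper name gzip_from_chunks is hypothetical.

```gleam
import packkit/stream

/// Hypothetical helper: gzip a payload that arrives as two chunks.
pub fn gzip_from_chunks(first: BitArray, second: BitArray) -> BitArray {
  // Assumption: `encode_chunks` takes `encoder:` and `chunks:` labels,
  // mirroring `decode_chunks(decoder:, chunks:)`.
  let assert Ok(compressed) =
    stream.encode_chunks(
      encoder: stream.new_gzip_encoder(),
      chunks: [first, second],
    )
  compressed
}
```

If the labels hold, passing the result to packkit.decompress with codec.gzip() should yield the two chunks concatenated.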

Resource limits

packkit/limit carries a budget that every decode entry point honours: input size, output size, member count, name length, entry depth, and maximum window bits. The facade variants (compress, decompress, pack, unpack) ship *_with_limits twins that thread a custom Limits value through the codec chain and the archive decoder.

import packkit
import packkit/codec
import packkit/error
import packkit/limit

/// Reject any gzip stream whose compressed input is larger than 4 bytes.
/// Useful only as an illustration — production budgets live in the
/// megabytes.
pub fn refuse_oversized_gzip(stream: BitArray) -> Bool {
  let tight = limit.default() |> limit.with_max_input_bytes(bytes: 4)
  case
    packkit.decompress_with_limits(
      bytes: stream,
      with: codec.gzip(),
      limits: tight,
    )
  {
    Error(error.CodecLimitExceeded(limit: "max_input_bytes", actual: _)) -> True
    _ -> False
  }
}

The default budget is conservative (64 MiB in, 256 MiB out, 10 000 entries) — explicit limits in shared / multi-tenant code paths are strongly recommended.
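A sketch of an explicit budget on an upload path. It assumes packkit.unpack_with_limits takes the same limits: label as decompress_with_limits. The 8 MiB cap and the helper name count_upload_entries are illustrative only.

```gleam
import packkit
import packkit/archive
import packkit/limit
import packkit/recipe

/// Hypothetical helper: count entries in an untrusted tar.gz upload.
pub fn count_upload_entries(upload: BitArray) -> Result(Int, Nil) {
  // Illustrative cap: refuse more than 8 MiB of compressed input.
  let budget =
    limit.default() |> limit.with_max_input_bytes(bytes: 8 * 1024 * 1024)
  // Assumption: `limits:` label, as on `decompress_with_limits`.
  case
    packkit.unpack_with_limits(
      bytes: upload,
      using: recipe.tar_gzip(),
      limits: budget,
    )
  {
    Ok(decoded) -> Ok(archive.entry_count(decoded))
    Error(_) -> Error(Nil)
  }
}
```

Collapsing the error to Nil keeps the sketch short; a real handler would probably format it for the caller.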

Checksums

packkit/checksum ships the same checksum families the codec engines use internally, exposed as standalone helpers.

import packkit/checksum

pub fn checksums() -> #(Int, Int, BitArray) {
  let payload = <<"packkit":utf8>>
  #(
    checksum.adler32(data: payload),
    checksum.crc32(data: payload),
    checksum.sha256(data: payload),
  )
}

adler32_continue and crc32_continue let callers chain rolling checksums across multiple chunks without re-hashing the prefix. sha256_init / sha256_update / sha256_finalize expose the same streaming shape for SHA-256.

Error handling

Every public entry point returns Result(_, e) with a typed error. packkit/error.format_*_error emits a single user-facing line for each family so CLI integrations can surface them as-is.

import packkit
import packkit/archive
import packkit/error
import packkit/recipe
import packkit/zip
import packkit/entry

pub fn refuses_format_mismatch() -> String {
  // An `Archive` is bound to one format at construction time; asking
  // `pack` to write it as a different format is rejected up-front.
  let zip_archive_value =
    zip.new()
    |> archive.add(entry: entry.file(path: "x", body: <<"x":utf8>>))

  case packkit.pack(archive_value: zip_archive_value, using: recipe.tar_gzip()) {
    Error(err) -> error.format_archive_error(err)
    Ok(_) -> "ok"
  }
  // -> "archive: format mismatch (archive was built as \"zip\" but \"tar\" was requested)"
}

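The typed error families include ArchiveError, EntryError, and MetadataError, alongside codec errors such as CodecOptionUnsupported and CodecLimitExceeded.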

Supported formats

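Implemented codecs include gzip, deflate, zlib, lz4, snappy, bzip2, lzw, xz, zstd, and brotli, plus the no-op identity codec.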

Implemented archive families: tar, cpio, ar, zip, and 7z.

Checksum primitives shared across codecs and exposed directly include Adler-32, CRC-32, and SHA-256.

For the full coverage matrix — including which encoder strategies are currently exposed for each codec — see CHANGELOG.md.

Targets

Both the Erlang and JavaScript targets are exercised in CI on every push. Pure-Gleam internals mean no NIF / native binary is needed.

Development

See CONTRIBUTING.md for the local workflow.

just ci         # format-check + lint + typecheck + test
just test       # gleam test on the default target

License

MIT
