packkit

Archive, compression, and container workflows for Gleam — pure Gleam, zero runtime dependencies, runs on both the Erlang and JavaScript targets. Full API reference at https://hexdocs.pm/packkit/.

packkit keeps three concepts separate, so each is testable in isolation and reusable in any combination:

codec — bytes in, bytes out (gzip, zlib, zstd, xz, bzip2, lz4, snappy, lzw, brotli, deflate).
archive — entries in, bytes out (tar, zip, cpio, ar, 7z).
recipe — one archive plus zero or more outer codecs (tar.gz, tar.zst, cpio.xz, …).

zip and 7z stay in the archive family. They are not modelled as recipes just because they may compress members internally — see ZIP per-entry methods for that knob.

Every example below is checked by test/packkit/readme_examples_test.gleam, so if it appears here it compiles and round-trips.

Install

gleam add packkit

Quick start: pack and unpack a tar.gz

The shortest end-to-end path. Build a logical archive, hand it to packkit.pack with a recipe, get bytes back. packkit.unpack reverses the recipe — gunzip, then tar-decode — and returns the same logical archive.

import packkit
import packkit/archive
import packkit/recipe
import packkit/tar

pub fn build_and_read_tar_gz() -> Int {
  let archive_value =
    tar.new()
    |> tar.add_file(path: "hello.txt", body: <<"hello":utf8>>)
    |> tar.add_file(path: "world.txt", body: <<"world":utf8>>)

  let assert Ok(bytes) =
    packkit.pack(archive_value: archive_value, using: recipe.tar_gzip())

  let assert Ok(decoded) =
    packkit.unpack(bytes: bytes, using: recipe.tar_gzip())

  archive.entry_count(decoded)
  // -> 2
}

Compressing and decompressing a single byte stream

For raw byte-to-byte work, skip the archive layer and call the codec facade directly. The codec value carries its level and optional preset dictionary — unsupported combinations surface as CodecOptionUnsupported, never as a silent drop.

import packkit
import packkit/codec

pub fn gzip_roundtrip(payload: BitArray) -> BitArray {
  let assert Ok(compressed) =
    packkit.compress(bytes: payload, with: codec.gzip())
  let assert Ok(restored) =
    packkit.decompress(bytes: compressed, with: codec.gzip())
  restored
}

The same call shape works for every supported codec. Pick the one that matches the input or the producer:

import packkit
import packkit/codec

pub fn zstd_roundtrip(payload: BitArray) -> BitArray {
  let assert Ok(stream) = packkit.compress(bytes: payload, with: codec.zstd())
  let assert Ok(plain) = packkit.decompress(bytes: stream, with: codec.zstd())
  plain
}

pub fn bzip2_roundtrip(payload: BitArray) -> BitArray {
  let assert Ok(stream) = packkit.compress(bytes: payload, with: codec.bzip2())
  let assert Ok(plain) = packkit.decompress(bytes: stream, with: codec.bzip2())
  plain
}

pub fn brotli_roundtrip(payload: BitArray) -> BitArray {
  let assert Ok(stream) = packkit.compress(bytes: payload, with: codec.brotli())
  let assert Ok(plain) = packkit.decompress(bytes: stream, with: codec.brotli())
  plain
}

codec.identity() is a no-op codec — useful when a recipe needs to be parameterised over “compress or not” without branching at the call site.
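
As a minimal sketch of that pattern: the helper below picks a codec from a flag and makes a single packkit.compress call. The name maybe_compress is hypothetical, and the sketch assumes codec.gzip() and codec.identity() return the same codec type.

```gleam
import packkit
import packkit/codec

/// Hypothetical helper: gzip the payload when `enabled` is True and pass it
/// through the no-op codec otherwise. Only the codec value differs; the
/// `packkit.compress` call is the same on both paths.
pub fn maybe_compress(payload: BitArray, enabled: Bool) {
  let chosen = case enabled {
    True -> codec.gzip()
    False -> codec.identity()
  }
  packkit.compress(bytes: payload, with: chosen)
}
```

The return annotation is left to inference because the exact error type name is not shown in this README.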

Building archives

tar, cpio, ar, zip, and 7z share one logical Archive value. The format-specific module (packkit/tar, packkit/zip, …) exposes a new/0 constructor; from there, archive.add_file / add_directory / add_symlink / add_hardlink work identically across formats. Format-side limitations (e.g. ar only carries flat files) surface at encode time as a typed ArchiveError.
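
Because the helpers are shared, one function can populate archives of different formats. The sketch below uses only tar.new, cpio.new, and archive.add_file as they appear elsewhere in this README; add_release_files and same_entries_two_formats are hypothetical names.

```gleam
import packkit/archive
import packkit/cpio
import packkit/tar

/// Hypothetical helper: add the same two files to whatever empty archive
/// it is given, regardless of the format the archive was created for.
fn add_release_files(empty) {
  empty
  |> archive.add_file(path: "README", body: <<"read me":utf8>>)
  |> archive.add_file(path: "VERSION", body: <<"1.0.0\n":utf8>>)
}

/// Build a tar archive and a cpio archive with identical entries.
pub fn same_entries_two_formats() {
  #(add_release_files(tar.new()), add_release_files(cpio.new()))
}
```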

Tar with directories, symlinks, and metadata

import packkit
import packkit/archive
import packkit/entry
import packkit/tar

pub fn build_tar_with_metadata() -> BitArray {
  let archive_value =
    tar.new()
    |> tar.add_directory(path: "etc")
    |> tar.add_file(path: "etc/motd", body: <<"welcome":utf8>>)
    |> tar.add_symlink(path: "etc/banner", target: "motd")
    |> archive.add(
      entry: entry.file(path: "bin/run", body: <<"#!/bin/sh\n":utf8>>)
        |> entry.with_mode(mode: 0o755)
        |> entry.with_owner(user_id: 1000, group_id: 1000)
        |> entry.with_modified_at(unix_seconds: 1_700_000_000),
    )

  let assert Ok(bytes) =
    packkit.write(archive_value: archive_value, format: tar.format())
  bytes
}

entry.with_mode / with_owner / with_modified_at each return an updated copy of an opaque Entry value; nothing is mutated in place. The checked variants (with_mode_checked, with_owner_checked, with_modified_at_checked) return Result(_, MetadataError) instead of panicking when the value is out of range; reach for them in code that touches user input.
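
A sketch of the checked path for a mode that comes from user input. It assumes with_mode_checked takes the same mode: label as with_mode, which this README does not spell out; script_with_mode is a hypothetical name.

```gleam
import packkit/entry

/// Hypothetical helper: build an entry whose mode comes from user input.
/// Assumption: `with_mode_checked` mirrors `with_mode` and accepts `mode:`.
/// The caller gets a Result back instead of a panic on a bad value.
pub fn script_with_mode(mode: Int) {
  entry.file(path: "bin/run", body: <<"#!/bin/sh\n":utf8>>)
  |> entry.with_mode_checked(mode: mode)
}
```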

Path validation

Entry paths are validated up-front. Absolute paths, .. traversal, embedded NUL, Windows separators, and empty or . segments all surface as typed EntryError variants, so there is no way to construct an Entry value that would silently extract outside its archive root.

import packkit/entry
import packkit/tar

pub fn rejects_traversal() -> Result(_, entry.EntryError) {
  tar.add_file_checked(
    archive: tar.new(),
    path: "../etc/passwd",
    body: <<"x":utf8>>,
  )
  // -> Error(entry.PathTraversal("../etc/passwd"))
}

CPIO, ar, 7z

The same archive.add_* helpers work for every format. Use packkit.write to serialise.

import packkit
import packkit/archive
import packkit/cpio
import packkit/ar
import packkit/seven_z

pub fn build_cpio() -> BitArray {
  let archive_value =
    cpio.new()
    |> archive.add_file(path: "lib/libfoo.so", body: <<"…":utf8>>)
    |> archive.add_file(path: "lib/libbar.so", body: <<"…":utf8>>)
  let assert Ok(bytes) =
    packkit.write(archive_value: archive_value, format: cpio.format())
  bytes
}

pub fn build_ar() -> BitArray {
  let archive_value =
    ar.new()
    |> archive.add_file(path: "main.o", body: <<"obj":utf8>>)
    |> archive.add_file(path: "debian-binary", body: <<"2.0\n":utf8>>)
  let assert Ok(bytes) =
    packkit.write(archive_value: archive_value, format: ar.format())
  bytes
}

pub fn build_seven_z() -> BitArray {
  let archive_value =
    seven_z.new()
    |> archive.add_file(path: "doc/spec.txt", body: <<"hello 7z":utf8>>)
    |> archive.add_file(path: "doc/notes.txt", body: <<"more":utf8>>)
  let assert Ok(bytes) =
    packkit.write(archive_value: archive_value, format: seven_z.format())
  bytes
}

Recipe composition

A Recipe is one archive plus zero or more outer codecs, listed in inner-to-outer order. packkit/recipe ships convenience constructors for the common combinations:

Constructor	Description
`recipe.tar()`	uncompressed tar (same API surface as the compressed variants)
`recipe.zip()`	ZIP archive (per-entry compression — see below)
`recipe.seven_z()`	7z archive
`recipe.cpio()`	uncompressed cpio (newc)
`recipe.ar()`	BSD ar
`recipe.tar_gzip()`	`tar.gz`
`recipe.tar_zstd()`	`tar.zst`
`recipe.tar_xz()`	`tar.xz`
`recipe.tar_bzip2()`	`tar.bz2`
`recipe.tar_lz4()`	`tar.lz4`
`recipe.tar_snappy()`	`tar.snappy`
`recipe.tar_lzw()`	`tar.Z`
`recipe.tar_zlib()`	`tar.zlib`
`recipe.tar_brotli()`	`tar.br`
`recipe.cpio_gzip()` / `cpio_bzip2()` / `cpio_xz()` / `cpio_zstd()`	matching cpio variants

Need a recipe that isn’t in the table? Compose one with recipe.wrap. The wrapper adds an outer codec layer on top of an existing recipe.

import packkit/archive
import packkit/codec
import packkit/recipe

pub fn cpio_lz4_then_zstd() -> recipe.Recipe {
  // Inner-to-outer order: cpio → lz4 → zstd
  recipe.archive_with(format: archive.cpio_newc(), wrapped_by: codec.lz4())
  |> recipe.wrap(with: codec.zstd())
}

recipe.description returns the canonical dotted name ("cpio-newc.lz4.zstd" for the recipe above), which is handy for test snapshots and logs.
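
For instance, a log line or snapshot for the custom recipe could be produced as sketched here. The sketch assumes recipe.description takes the recipe as its only argument and returns a String; describe_custom_recipe is a hypothetical name.

```gleam
import packkit/archive
import packkit/codec
import packkit/recipe

/// Hypothetical helper: build the cpio + lz4 + zstd recipe and return its
/// dotted name. Assumption: `recipe.description` is `fn(Recipe) -> String`.
pub fn describe_custom_recipe() -> String {
  recipe.archive_with(format: archive.cpio_newc(), wrapped_by: codec.lz4())
  |> recipe.wrap(with: codec.zstd())
  |> recipe.description
}
```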

Detecting a format

Three entry points return an opaque Detected value, inspected through detect.codec / detect.archive / detect.recipe / detect.extension. Compound extensions (.tar.gz, .cpio.zst, …) take precedence over their inner counterparts.

import gleam/option.{type Option}
import packkit
import packkit/detect
import packkit/recipe

pub fn recipe_for_filename(path: String) -> Option(recipe.Recipe) {
  let assert Ok(info) = packkit.detect_filename(path)
  detect.recipe(info)
}
// recipe_for_filename("backup-2026-05-22.tar.gz")
//   -> Some(recipe.tar_gzip())
// recipe_for_filename("logs.tar.zst")
//   -> Some(recipe.tar_zstd())

For incoming data of unknown origin (uploads, stdin) prefer detect.from_path_or_bytes — it tries the filename first and falls back to magic-byte sniffing on the supplied content.

import gleam/option.{type Option}
import packkit/codec
import packkit/detect

/// Pick the right codec for a downloaded blob even when the URL has no
/// useful extension (`/dev/stdin`, `download.bin`, …).
pub fn pick_codec(path: String, leading_bytes: BitArray) -> Option(codec.Codec) {
  detect.from_path_or_bytes(path: path, bytes: leading_bytes)
  |> option.from_result
  |> option.then(detect.codec)
}

packkit.detect_filename / detect_bytes / detect_path_or_bytes re-export the packkit/detect entrypoints from the top-level facade so most CLI integrations only need to import packkit.
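
Detection and unpacking compose naturally. The sketch below detects a recipe from the filename, unpacks with it, and collapses every failure into Error(Nil) to keep the example short. It uses only packkit calls shown in this README plus gleam/result and gleam/option from the standard library; count_entries_by_filename is a hypothetical name.

```gleam
import gleam/option
import gleam/result
import packkit
import packkit/archive
import packkit/detect

/// Hypothetical helper: count the entries of an archive whose recipe is
/// inferred from its filename. Unknown names, names that map to no recipe,
/// and decode failures all become Error(Nil).
pub fn count_entries_by_filename(
  path: String,
  bytes: BitArray,
) -> Result(Int, Nil) {
  use info <- result.try(
    packkit.detect_filename(path) |> result.replace_error(Nil),
  )
  use chosen <- result.try(detect.recipe(info) |> option.to_result(Nil))
  packkit.unpack(bytes: bytes, using: chosen)
  |> result.map(archive.entry_count)
  |> result.replace_error(Nil)
}
```

Real integrations would likely keep the typed errors rather than discard them; see the error handling section for the formatting helpers.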

Inspecting an archive

The decoded Archive is iterated through archive.entries; each Entry is opaque and inspected through accessors. archive.entry_by_path short-circuits the “fetch one named member” use case.

import gleam/list
import packkit
import packkit/archive
import packkit/entry
import packkit/recipe

pub fn extract_one_file(bytes: BitArray) -> Result(BitArray, Nil) {
  let assert Ok(decoded) = packkit.unpack(bytes: bytes, using: recipe.tar_gzip())
  case archive.entry_by_path(decoded, path: "hello.txt") {
    Ok(found) -> Ok(entry.body(found))
    Error(_) -> Error(Nil)
  }
}

pub fn list_files(bytes: BitArray) -> List(String) {
  let assert Ok(decoded) = packkit.unpack(bytes: bytes, using: recipe.tar_gzip())
  archive.entries(decoded)
  |> list.filter(entry.is_file)
  |> list.map(fn(e) { entry.to_string(entry.path(e)) })
}

ZIP per-entry methods

ZIP is an archive family, not a recipe — each entry can carry its own compression method. zip.encode_with_method applies the chosen method to every entry; mix-and-match per entry is not (yet) exposed. The supported methods are store, deflate, bzip2, zstd, xz, and lzma (PKWARE method 14).

import packkit
import packkit/archive
import packkit/level
import packkit/recipe
import packkit/zip

pub fn write_deflated_zip() -> BitArray {
  let archive_value =
    zip.new()
    |> archive.add_file(path: "report.csv", body: <<"a,b,c\n1,2,3\n":utf8>>)
    |> archive.add_file(path: "notes.txt", body: <<"keep me":utf8>>)
  let assert Ok(bytes) =
    zip.encode_with_method(
      archive: archive_value,
      method: zip.deflate(level: level.default()),
    )
  bytes
}

pub fn write_zstd_zip() -> BitArray {
  let archive_value =
    zip.new()
    |> archive.add_file(path: "blob.bin", body: <<"…":utf8>>)
  let assert Ok(bytes) =
    zip.encode_with_method(archive: archive_value, method: zip.zstd())
  bytes
}

pub fn read_zip(bytes: BitArray) -> Int {
  let assert Ok(decoded) = packkit.unpack(bytes: bytes, using: recipe.zip())
  archive.entry_count(decoded)
}

zip.decode_with_password reads PKWARE traditional (“ZipCrypto”) and WinZip AES (AE-1 / AE-2) entries through the same logical-archive API once the password is supplied — see the docs for the supported method matrix.

gzip header metadata round-trip

packkit/gzip exposes the full RFC 1952 header (member name, comment, mtime, optional extra subfields). The top-level facade hides the header, but for tooling that needs to read or set those fields, use the gzip module directly.

import packkit/gzip

pub fn gzip_with_header_metadata(payload: BitArray) -> #(BitArray, Result(gzip.Decoded, _)) {
  let header =
    gzip.default_header()
    |> gzip.with_name(name: "report.csv")
    |> gzip.with_comment(comment: "generated by packkit")
    |> gzip.with_modified_at(unix_seconds: 1_700_000_000)

  let assert Ok(bytes) =
    gzip.encode_with_header(bytes: payload, header: header)
  #(bytes, gzip.decode(bytes: bytes))
}

gzip.decode returns a Decoded record carrying both the original header and the decoded payload, so callers can replay metadata from one gzip stream into another.

Streaming chunks via packkit/stream

packkit/stream exposes opaque incremental decoder and encoder states. push / push_encoder buffer one chunk at a time and enforce max_input_bytes as the chunks arrive; finish / finish_encoder then run the actual codec once over the buffered input.

import packkit
import packkit/codec
import packkit/stream

pub fn streamed_gzip_roundtrip(payload: BitArray) -> BitArray {
  let assert Ok(stream_bytes) =
    packkit.compress(bytes: payload, with: codec.gzip())

  // Feed the compressed stream as two chunks (here the whole stream plus an
  // empty tail); the decoder doesn't care how the producer carved them up.
  let chunks = [stream_bytes, <<>>]

  let assert Ok(plain) =
    stream.decode_chunks(decoder: stream.new_gzip_decoder(), chunks: chunks)
  plain
}

Every codec gets a matching constructor — new_deflate_decoder, new_zlib_decoder, new_lz4_decoder, new_snappy_decoder, new_bzip2_decoder, new_lzw_decoder, new_xz_decoder, new_zstd_decoder, new_brotli_decoder — plus the encoder twins (new_gzip_encoder, …, encode_chunks).

Resource limits

packkit/limit carries a budget that every decode entry point honours: input size, output size, member count, name length, entry depth, and maximum window bits. The facade variants (compress, decompress, pack, unpack) ship *_with_limits twins that thread a custom Limits value through the codec chain and the archive decoder.

import packkit
import packkit/codec
import packkit/error
import packkit/limit

/// Reject any gzip stream whose compressed input is larger than 4 bytes.
/// Useful only as an illustration — production budgets live in the
/// megabytes.
pub fn refuse_oversized_gzip(stream: BitArray) -> Bool {
  let tight = limit.default() |> limit.with_max_input_bytes(bytes: 4)
  case
    packkit.decompress_with_limits(
      bytes: stream,
      with: codec.gzip(),
      limits: tight,
    )
  {
    Error(error.CodecLimitExceeded(limit: "max_input_bytes", actual: _)) -> True
    _ -> False
  }
}

The default budget is conservative (64 MiB in, 256 MiB out, 10 000 entries) — explicit limits in shared / multi-tenant code paths are strongly recommended.
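
A more realistic budget might look like the sketch below, which caps compressed input at 8 MiB for an upload path. The 8 MiB figure and the name decompress_upload are illustrative assumptions; the calls themselves are the ones shown in the example above.

```gleam
import packkit
import packkit/codec
import packkit/limit

/// Hypothetical helper: decompress an uploaded gzip stream, refusing any
/// compressed input larger than 8 MiB. All other limits keep their defaults.
pub fn decompress_upload(stream: BitArray) {
  let budget =
    limit.default()
    |> limit.with_max_input_bytes(bytes: 8 * 1024 * 1024)

  packkit.decompress_with_limits(
    bytes: stream,
    with: codec.gzip(),
    limits: budget,
  )
}
```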

Checksums

packkit/checksum ships the same checksum families the codec engines use internally, exposed as standalone helpers.

import packkit/checksum

pub fn checksums() -> #(Int, Int, BitArray) {
  let payload = <<"packkit":utf8>>
  #(
    checksum.adler32(data: payload),
    checksum.crc32(data: payload),
    checksum.sha256(data: payload),
  )
}

adler32_continue and crc32_continue let callers carry a running checksum across multiple chunks without re-hashing the prefix. sha256_init / sha256_update / sha256_finalize expose the same streaming shape for SHA-256.

Error handling

Every public entry point returns Result(_, e) with a typed error. packkit/error.format_*_error emits a single user-facing line for each family so CLI integrations can surface them as-is.

import packkit
import packkit/archive
import packkit/error
import packkit/recipe
import packkit/zip
import packkit/entry

pub fn refuses_format_mismatch() -> String {
  // An `Archive` is bound to one format at construction time; asking
  // `pack` to write it as a different format is rejected up-front.
  let zip_archive_value =
    zip.new()
    |> archive.add(entry: entry.file(path: "x", body: <<"x":utf8>>))

  case packkit.pack(archive_value: zip_archive_value, using: recipe.tar_gzip()) {
    Error(err) -> error.format_archive_error(err)
    Ok(_) -> "ok"
  }
  // -> "archive: format mismatch (archive was built as \"zip\" but \"tar\" was requested)"
}

The full error families are:

CodecError — CodecInvalidData, CodecLimitExceeded, CodecDictionaryRequired, CodecDictionaryMismatch, CodecOptionUnsupported, CodecNotImplemented.
ArchiveError — ArchiveUnsupported, ArchiveInvalid, ArchiveEntryRejected, ArchiveLimitExceeded, ArchiveNotImplemented, ArchiveCodecFailed (wraps a CodecError so a recipe-time codec failure preserves its structured cause), ArchiveFormatMismatch, ArchiveFieldOverflow, ArchiveCommentUnsupported.
RecipeError — RecipeArchiveAlreadySet, RecipeEmptyCodecChain, RecipeUnsupportedComposition, RecipeNotImplemented.
DetectError — DetectUnknownFormat, DetectNotImplemented.
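
The codec family can be surfaced the same way. This sketch assumes that the format_*_error pattern includes a format_codec_error function returning a String and that packkit.decompress fails with a CodecError; neither is spelled out in this README, and decompress_or_message is a hypothetical name.

```gleam
import gleam/result
import packkit
import packkit/codec
import packkit/error

/// Hypothetical helper: decompress a gzip stream and turn any failure into
/// a single user-facing line.
/// Assumption: `error.format_codec_error` exists as `fn(CodecError) -> String`.
pub fn decompress_or_message(stream: BitArray) -> Result(BitArray, String) {
  packkit.decompress(bytes: stream, with: codec.gzip())
  |> result.map_error(error.format_codec_error)
}
```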

Supported formats

Implemented codecs:

gzip (RFC 1952 — header metadata, multi-member streams, CRC/ISIZE verification)
zlib (RFC 1950 — Adler-32 trailer, preset dictionaries)
deflate (RFC 1951 — full decoder; stored + fixed/dynamic-Huffman LZ77 encoders)
lz4 (frame decoder + LZ77 encoder; legacy lz4c 0x184C2102 frames decode too)
snappy (raw block + framed codec, LZ77 block compressor)
bzip2 (round-trip; multi-stream .bz2 concatenation decodes)
lzw (Unix .Z encoder + decoder)
xz (stream header / block / index / footer + LZMA2 with both uncompressed and LZMA-compressed chunks, BCJ filter pre-processors, all four block-check types incl. SHA-256, multi-stream concatenation)
zstd (frame envelope + raw / RLE / FSE-compressed blocks, Huffman-coded literals, treeless literals, predefined / RLE / FSE sequence modes, multi-frame stream decoding, real LZ77 sequences on the encode side)
brotli (full RFC 7932 decoder; encoder picks the smallest of three candidates per payload, with a real LZ77 + complex-form Huffman LZ77 path)

Implemented archive families:

tar — USTAR encode/decode plus GNU LongName/LongLink and PAX attribute (x / g) decoder
cpio — newc encode/decode
ar — BSD long-name encode/decode; the decoder also accepts the GNU long-name string table form (// + /<offset>), so .a / .deb archives produced by binutils ar round-trip end-to-end
zip — stored + deflate + bzip2 + zstd + xz + PKWARE LZMA (method 14) encode/decode, Zip64 extensions, ZipCrypto + WinZip AES (AE-1 / AE-2) decryption, per-entry mtime / UID / GID, EFS UTF-8 names
7z — single-folder reader for Copy / LZMA / LZMA2 / Deflate / BZip2 plus the BCJ + Delta filter family; encoder writes a single-folder archive with LZMA, Copy, Deflate, or BZip2 as the coder

Checksum primitives shared across codecs and exposed directly:

Adler-32, CRC-32 (reflected), CRC-32C (Castagnoli), bzip2 CRC-32 (non-reflected), CRC-64 (xz / ECMA reflected, returned as a #(low_u32, high_u32) pair for cross-target precision), SHA-1, and SHA-256 (FIPS 180-4)

For the full coverage matrix — including which encoder strategies are currently exposed for each codec — see CHANGELOG.md.

Targets

Both the Erlang and JavaScript targets are exercised in CI on every push. Pure-Gleam internals mean no NIF / native binary is needed.

Development

See CONTRIBUTING.md for the local workflow.

just ci         # format-check + lint + typecheck + test
just test       # gleam test on the default target

License

MIT