packkit
Archive, compression, and container workflows for Gleam — pure Gleam, zero runtime dependencies, runs on both the Erlang and JavaScript targets. Full API reference at https://hexdocs.pm/packkit/.
packkit keeps three concepts separate, so each is testable in
isolation and reusable in any combination:
- codec: bytes in, bytes out (gzip, zlib, zstd, xz, bzip2, lz4, snappy, lzw, brotli, deflate).
- archive: entries in, bytes out (tar, zip, cpio, ar, 7z).
- recipe: one archive plus zero or more outer codecs (tar.gz, tar.zst, cpio.xz, …).
zip and 7z stay in the archive family. They are not modelled as
recipes just because they may compress members internally — see
ZIP per-entry methods for that knob.
Every example below is checked by
test/packkit/readme_examples_test.gleam,
so if it appears here it compiles and round-trips.
Install
gleam add packkit
Quick start: pack and unpack a tar.gz
The shortest end-to-end path. Build a logical archive, hand it to
packkit.pack with a recipe, get bytes back. packkit.unpack reverses
the recipe — gunzip, then tar-decode — and returns the same logical
archive.
import packkit
import packkit/archive
import packkit/recipe
import packkit/tar
pub fn build_and_read_tar_gz() -> Int {
let archive_value =
tar.new()
|> tar.add_file(path: "hello.txt", body: <<"hello":utf8>>)
|> tar.add_file(path: "world.txt", body: <<"world":utf8>>)
let assert Ok(bytes) =
packkit.pack(archive_value: archive_value, using: recipe.tar_gzip())
let assert Ok(decoded) =
packkit.unpack(bytes: bytes, using: recipe.tar_gzip())
archive.entry_count(decoded)
// -> 2
}
Compressing and decompressing a single byte stream
For raw byte-to-byte work, skip the archive layer and call the codec
facade directly. The codec value carries its level and optional preset
dictionary — unsupported combinations surface as
CodecOptionUnsupported, never as a silent drop.
import packkit
import packkit/codec
pub fn gzip_roundtrip(payload: BitArray) -> BitArray {
let assert Ok(compressed) =
packkit.compress(bytes: payload, with: codec.gzip())
let assert Ok(restored) =
packkit.decompress(bytes: compressed, with: codec.gzip())
restored
}
The same call shape works for every supported codec. Pick the one that matches the input or the producer:
import packkit
import packkit/codec
pub fn zstd_roundtrip(payload: BitArray) -> BitArray {
let assert Ok(stream) = packkit.compress(bytes: payload, with: codec.zstd())
let assert Ok(plain) = packkit.decompress(bytes: stream, with: codec.zstd())
plain
}
pub fn bzip2_roundtrip(payload: BitArray) -> BitArray {
let assert Ok(stream) = packkit.compress(bytes: payload, with: codec.bzip2())
let assert Ok(plain) = packkit.decompress(bytes: stream, with: codec.bzip2())
plain
}
pub fn brotli_roundtrip(payload: BitArray) -> BitArray {
let assert Ok(stream) = packkit.compress(bytes: payload, with: codec.brotli())
let assert Ok(plain) = packkit.decompress(bytes: stream, with: codec.brotli())
plain
}
codec.identity() is a no-op codec — useful when a recipe needs to be
parameterised over “compress or not” without branching at the call site.
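A minimal sketch of that pattern, using only the facade calls shown above. The function name maybe_compress and its Bool flag are illustrative, and the sketch assumes codec.gzip() and codec.identity() return the same Codec type so that one binding can hold either.

```gleam
import packkit
import packkit/codec

// Hypothetical helper: the caller decides whether to compress, and the
// single packkit.compress call below stays the same either way.
pub fn maybe_compress(payload: BitArray, compress: Bool) {
  let chosen = case compress {
    True -> codec.gzip()
    // The no-op codec described above.
    False -> codec.identity()
  }
  packkit.compress(bytes: payload, with: chosen)
}
```

The return type is left to inference, so the sketch does not have to name the facade's error type.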
Building archives
tar, cpio, ar, zip, and 7z share one logical Archive value.
The format-specific module (packkit/tar, packkit/zip, …) exposes a
new/0 constructor; from there, archive.add_file / add_directory /
add_symlink / add_hardlink work identically across formats.
Format-side limitations (e.g. ar only carries flat files) surface at
encode time as a typed ArchiveError.
Tar with directories, symlinks, and metadata
import packkit
import packkit/archive
import packkit/entry
import packkit/tar
pub fn build_tar_with_metadata() -> BitArray {
let archive_value =
tar.new()
|> tar.add_directory(path: "etc")
|> tar.add_file(path: "etc/motd", body: <<"welcome":utf8>>)
|> tar.add_symlink(path: "etc/banner", target: "motd")
|> archive.add(
entry: entry.file(path: "bin/run", body: <<"#!/bin/sh\n":utf8>>)
|> entry.with_mode(mode: 0o755)
|> entry.with_owner(user_id: 1000, group_id: 1000)
|> entry.with_modified_at(unix_seconds: 1_700_000_000),
)
let assert Ok(bytes) =
packkit.write(archive_value: archive_value, format: tar.format())
bytes
}
entry.with_mode / with_owner / with_modified_at return an updated copy of
the opaque Entry value; nothing is mutated in place. The checked variants
(with_mode_checked, with_owner_checked, with_modified_at_checked)
return Result(_, MetadataError) instead of panicking when the value
is out of range; reach for them in code that touches user input.
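As a rough sketch of the checked path for a mode that comes from user input: the helper name file_with_user_mode is hypothetical, and the mode: label on with_mode_checked is an assumption carried over from the unchecked with_mode shown earlier.

```gleam
import packkit/entry

// Hypothetical helper: build a file entry whose mode was supplied by a
// user. An out-of-range mode comes back as an Error instead of a panic.
pub fn file_with_user_mode(path: String, body: BitArray, mode: Int) {
  entry.file(path: path, body: body)
  // Assumption: same labelled argument as entry.with_mode.
  |> entry.with_mode_checked(mode: mode)
}
```

The return type is left to inference because the module that defines MetadataError is not stated here.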
Path validation
Entry paths are validated up-front. Absolute paths, .. traversal,
embedded NUL, Windows separators, empty / . segments all surface as
typed EntryError variants — there’s no way to construct an
Entry value that would silently extract outside its archive root.
import packkit/entry
import packkit/tar
pub fn rejects_traversal() -> Result(_, entry.EntryError) {
tar.add_file_checked(
archive: tar.new(),
path: "../etc/passwd",
body: <<"x":utf8>>,
)
// -> Error(entry.PathTraversal("../etc/passwd"))
}
CPIO, ar, 7z
The same archive.add_* helpers work for every format. Use
packkit.write to serialise.
import packkit
import packkit/archive
import packkit/cpio
import packkit/ar
import packkit/seven_z
pub fn build_cpio() -> BitArray {
let archive_value =
cpio.new()
|> archive.add_file(path: "lib/libfoo.so", body: <<"…":utf8>>)
|> archive.add_file(path: "lib/libbar.so", body: <<"…":utf8>>)
let assert Ok(bytes) =
packkit.write(archive_value: archive_value, format: cpio.format())
bytes
}
pub fn build_ar() -> BitArray {
let archive_value =
ar.new()
|> archive.add_file(path: "main.o", body: <<"obj":utf8>>)
|> archive.add_file(path: "debian-binary", body: <<"2.0\n":utf8>>)
let assert Ok(bytes) =
packkit.write(archive_value: archive_value, format: ar.format())
bytes
}
pub fn build_seven_z() -> BitArray {
let archive_value =
seven_z.new()
|> archive.add_file(path: "doc/spec.txt", body: <<"hello 7z":utf8>>)
|> archive.add_file(path: "doc/notes.txt", body: <<"more":utf8>>)
let assert Ok(bytes) =
packkit.write(archive_value: archive_value, format: seven_z.format())
bytes
}
Recipe composition
A Recipe is one archive plus zero or more outer codecs, listed in
inner-to-outer order. packkit/recipe ships convenience constructors
for the common combinations:
| Constructor | Description |
|---|---|
| recipe.tar() | uncompressed tar (same API surface as the compressed variants) |
| recipe.zip() | ZIP archive (per-entry compression, see below) |
| recipe.seven_z() | 7z archive |
| recipe.cpio() | uncompressed cpio (newc) |
| recipe.ar() | BSD ar |
| recipe.tar_gzip() | tar.gz |
| recipe.tar_zstd() | tar.zst |
| recipe.tar_xz() | tar.xz |
| recipe.tar_bzip2() | tar.bz2 |
| recipe.tar_lz4() | tar.lz4 |
| recipe.tar_snappy() | tar.snappy |
| recipe.tar_lzw() | tar.Z |
| recipe.tar_zlib() | tar.zlib |
| recipe.tar_brotli() | tar.br |
| recipe.cpio_gzip() / cpio_bzip2() / cpio_xz() / cpio_zstd() | matching cpio variants |
Need a recipe that isn’t in the table? Compose one with recipe.wrap.
The wrapper adds an outer codec layer on top of an existing recipe.
import packkit/archive
import packkit/codec
import packkit/recipe
pub fn cpio_lz4_then_zstd() -> recipe.Recipe {
// Inner-to-outer order: cpio → lz4 → zstd
recipe.archive_with(format: archive.cpio_newc(), wrapped_by: codec.lz4())
|> recipe.wrap(with: codec.zstd())
}
recipe.description returns the canonical dotted name
("cpio-newc.lz4.zstd" for the recipe above), which is handy for
test snapshots and logs.
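A small sketch of that call, reusing the recipe composed above. It assumes recipe.description takes the recipe as its only argument and returns a String; the function name describe_custom_recipe is illustrative.

```gleam
import packkit/archive
import packkit/codec
import packkit/recipe

pub fn describe_custom_recipe() -> String {
  recipe.archive_with(format: archive.cpio_newc(), wrapped_by: codec.lz4())
  |> recipe.wrap(with: codec.zstd())
  // Assumption: description takes the recipe and returns its dotted name.
  |> recipe.description
}
// -> "cpio-newc.lz4.zstd" (the name the text above gives for this recipe)
```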
Detecting a format
Three entry points return an opaque Detected value, inspected through
detect.codec / detect.archive / detect.recipe / detect.extension.
Compound extensions (.tar.gz, .cpio.zst, …) take precedence over the
bare trailing extension (.gz, .zst), so backup.tar.gz resolves to the tar.gz recipe.
import gleam/option.{type Option}
import packkit
import packkit/detect
import packkit/recipe
pub fn recipe_for_filename(path: String) -> Option(recipe.Recipe) {
let assert Ok(info) = packkit.detect_filename(path)
detect.recipe(info)
}
// recipe_for_filename("backup-2026-05-22.tar.gz")
// -> Some(recipe.tar_gzip())
// recipe_for_filename("logs.tar.zst")
// -> Some(recipe.tar_zstd())
For incoming data of unknown origin (uploads, stdin) prefer
detect.from_path_or_bytes — it tries the filename first and falls
back to magic-byte sniffing on the supplied content.
import gleam/option.{type Option}
import packkit/codec
import packkit/detect
/// Pick the right codec for a downloaded blob even when the URL has no
/// useful extension (`/dev/stdin`, `download.bin`, …).
pub fn pick_codec(path: String, leading_bytes: BitArray) -> Option(codec.Codec) {
detect.from_path_or_bytes(path: path, bytes: leading_bytes)
|> option.from_result
|> option.then(detect.codec)
}
packkit.detect_filename / detect_bytes / detect_path_or_bytes
re-export the packkit/detect entrypoints from the top-level facade so
most CLI integrations only need to import packkit.
Inspecting an archive
The decoded Archive is iterated through archive.entries; each
Entry is opaque and inspected through accessors. archive.entry_by_path
short-circuits the “fetch one named member” use case.
import gleam/list
import gleam/option.{None, Some}
import packkit
import packkit/archive
import packkit/entry
import packkit/recipe
pub fn extract_one_file(bytes: BitArray) -> Result(BitArray, Nil) {
let assert Ok(decoded) = packkit.unpack(bytes: bytes, using: recipe.tar_gzip())
case archive.entry_by_path(decoded, path: "hello.txt") {
Ok(found) -> Ok(entry.body(found))
Error(_) -> Error(Nil)
}
}
pub fn list_files(bytes: BitArray) -> List(String) {
let assert Ok(decoded) = packkit.unpack(bytes: bytes, using: recipe.tar_gzip())
archive.entries(decoded)
|> list.filter(entry.is_file)
|> list.map(fn(e) { entry.to_string(entry.path(e)) })
}
ZIP per-entry methods
ZIP is an archive family, not a recipe — each entry can carry its own
compression method. zip.encode_with_method applies the chosen method
to every entry; mix-and-match per entry is not (yet) exposed. The
supported methods are store, deflate, bzip2, zstd, xz, and
lzma (PKWARE method 14).
import packkit
import packkit/archive
import packkit/level
import packkit/recipe
import packkit/zip
pub fn write_deflated_zip() -> BitArray {
let archive_value =
zip.new()
|> archive.add_file(path: "report.csv", body: <<"a,b,c\n1,2,3\n":utf8>>)
|> archive.add_file(path: "notes.txt", body: <<"keep me":utf8>>)
let assert Ok(bytes) =
zip.encode_with_method(
archive: archive_value,
method: zip.deflate(level: level.default()),
)
bytes
}
pub fn write_zstd_zip() -> BitArray {
let archive_value =
zip.new()
|> archive.add_file(path: "blob.bin", body: <<"…":utf8>>)
let assert Ok(bytes) =
zip.encode_with_method(archive: archive_value, method: zip.zstd())
bytes
}
pub fn read_zip(bytes: BitArray) -> Int {
let assert Ok(decoded) = packkit.unpack(bytes: bytes, using: recipe.zip())
archive.entry_count(decoded)
}
zip.decode_with_password reads PKWARE traditional (“ZipCrypto”) and
WinZip AES (AE-1 / AE-2) entries through the same logical-archive API
once the password is supplied — see the docs for the supported method
matrix.
gzip header metadata round-trip
packkit/gzip exposes the full RFC 1952 header (member name, comment,
mtime, optional extra subfields). The top-level facade hides the
header, but for tooling that needs to read or set those fields, use the
gzip module directly.
import gleam/option.{Some}
import packkit/gzip
pub fn gzip_with_header_metadata(payload: BitArray) -> #(BitArray, Result(gzip.Decoded, _)) {
let header =
gzip.default_header()
|> gzip.with_name(name: "report.csv")
|> gzip.with_comment(comment: "generated by packkit")
|> gzip.with_modified_at(unix_seconds: 1_700_000_000)
let assert Ok(bytes) =
gzip.encode_with_header(bytes: payload, header: header)
#(bytes, gzip.decode(bytes: bytes))
}
gzip.decode returns a Decoded record carrying both the original
header and the decoded payload, so callers can replay metadata from one
gzip stream into another.
Streaming chunks via packkit/stream
packkit/stream exposes opaque incremental decoder and encoder states.
push / push_encoder buffer one chunk at a time and enforce
max_input_bytes as the chunks arrive; finish / finish_encoder
run the actual codec once.
import packkit
import packkit/codec
import packkit/stream
pub fn streamed_gzip_roundtrip(payload: BitArray) -> BitArray {
let assert Ok(stream_bytes) =
packkit.compress(bytes: payload, with: codec.gzip())
// Feed the compressed stream as two chunks (here the whole stream plus an
// empty tail); the decoder doesn't care how the producer carved them up.
let chunks = [stream_bytes, <<>>]
let assert Ok(plain) =
stream.decode_chunks(decoder: stream.new_gzip_decoder(), chunks: chunks)
plain
}
Every codec gets a matching constructor —
new_deflate_decoder, new_zlib_decoder, new_lz4_decoder,
new_snappy_decoder, new_bzip2_decoder, new_lzw_decoder,
new_xz_decoder, new_zstd_decoder, new_brotli_decoder — plus the
encoder twins (new_gzip_encoder, …, encode_chunks).
Resource limits
packkit/limit carries a budget that every decode entry point honours:
input size, output size, member count, name length, entry depth, and
maximum window bits. The facade variants (compress, decompress,
pack, unpack) ship *_with_limits twins that thread a custom
Limits value through the codec chain and the archive decoder.
import packkit
import packkit/codec
import packkit/error
import packkit/limit
/// Reject any gzip stream whose compressed input is larger than 4 bytes.
/// Useful only as an illustration — production budgets live in the
/// megabytes.
pub fn refuse_oversized_gzip(stream: BitArray) -> Bool {
let tight = limit.default() |> limit.with_max_input_bytes(bytes: 4)
case
packkit.decompress_with_limits(
bytes: stream,
with: codec.gzip(),
limits: tight,
)
{
Error(error.CodecLimitExceeded(limit: "max_input_bytes", actual: _)) -> True
_ -> False
}
}
The default budget is conservative (64 MiB in, 256 MiB out, 10 000 entries) — explicit limits in shared / multi-tenant code paths are strongly recommended.
Checksums
packkit/checksum ships the same checksum families the codec engines
use internally, exposed as standalone helpers.
import packkit/checksum
pub fn checksums() -> #(Int, Int, BitArray) {
let payload = <<"packkit":utf8>>
#(
checksum.adler32(data: payload),
checksum.crc32(data: payload),
checksum.sha256(data: payload),
)
}
adler32_continue and crc32_continue let callers chain running
checksums across multiple chunks without re-hashing the prefix.
sha256_init / sha256_update / sha256_finalize expose the same
streaming shape for SHA-256.
Error handling
Every public entry point returns Result(_, e) with a typed error.
packkit/error.format_*_error emits a single user-facing line for each
family so CLI integrations can surface them as-is.
import packkit
import packkit/archive
import packkit/error
import packkit/recipe
import packkit/tar
import packkit/zip
import packkit/entry
pub fn refuses_format_mismatch() -> String {
// An `Archive` is bound to one format at construction time; asking
// `pack` to write it as a different format is rejected up-front.
let zip_archive_value =
zip.new()
|> archive.add(entry: entry.file(path: "x", body: <<"x":utf8>>))
case packkit.pack(archive_value: zip_archive_value, using: recipe.tar_gzip()) {
Error(err) -> error.format_archive_error(err)
Ok(_) -> "ok"
}
// -> "archive: format mismatch (archive was built as \"zip\" but \"tar\" was requested)"
}
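The codec family can presumably be surfaced the same way. The sketch below assumes a format_codec_error function exists, following the format_*_error naming described above; describe_bad_gzip is an illustrative name.

```gleam
import packkit
import packkit/codec
import packkit/error

pub fn describe_bad_gzip() -> String {
  // Four arbitrary bytes that are not a valid gzip stream.
  case packkit.decompress(bytes: <<"nope":utf8>>, with: codec.gzip()) {
    Ok(_) -> "ok"
    // Assumption: format_codec_error is the CodecError member of the
    // format_*_error family.
    Error(err) -> error.format_codec_error(err)
  }
}
```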
The full error families are:
- CodecError: CodecInvalidData, CodecLimitExceeded, CodecDictionaryRequired, CodecDictionaryMismatch, CodecOptionUnsupported, CodecNotImplemented.
- ArchiveError: ArchiveUnsupported, ArchiveInvalid, ArchiveEntryRejected, ArchiveLimitExceeded, ArchiveNotImplemented, ArchiveCodecFailed (wraps a CodecError so a recipe-time codec failure preserves its structured cause), ArchiveFormatMismatch, ArchiveFieldOverflow, ArchiveCommentUnsupported.
- RecipeError: RecipeArchiveAlreadySet, RecipeEmptyCodecChain, RecipeUnsupportedComposition, RecipeNotImplemented.
- DetectError: DetectUnknownFormat, DetectNotImplemented.
Supported formats
Implemented codecs:
- gzip (RFC 1952 — header metadata, multi-member streams, CRC/ISIZE verification)
- zlib (RFC 1950 — Adler-32 trailer, preset dictionaries)
- deflate (RFC 1951 — full decoder; stored + fixed/dynamic-Huffman LZ77 encoders)
- lz4 (frame decoder + LZ77 encoder; legacy lz4c 0x184C2102 frames decode too)
- snappy (raw block + framed codec, LZ77 block compressor)
- bzip2 (round-trip; multi-stream .bz2 concatenation decodes)
- lzw (Unix .Z encoder + decoder)
- xz (stream header / block / index / footer + LZMA2 with both uncompressed and LZMA-compressed chunks, BCJ filter pre-processors, all four block-check types incl. SHA-256, multi-stream concatenation)
- zstd (frame envelope + raw / RLE / FSE-compressed blocks, Huffman-coded literals, treeless literals, predefined / RLE / FSE sequence modes, multi-frame stream decoding, real LZ77 sequences on the encode side)
- brotli (full RFC 7932 decoder; encoder picks the smallest of three candidates per payload, with a real LZ77 + complex-form Huffman LZ77 path)
Implemented archive families:
- tar: USTAR encode/decode plus GNU LongName / LongLink and PAX attribute (x / g) decoder
- cpio: newc encode/decode
- ar: BSD long-name encode/decode; the decoder also accepts the GNU long-name string table form (// + /<offset>), so .a / .deb archives produced by binutils ar round-trip end-to-end
- zip: stored + deflate + bzip2 + zstd + xz + PKWARE LZMA (method 14) encode/decode, Zip64 extensions, ZipCrypto + WinZip AES (AE-1 / AE-2) decryption, per-entry mtime / UID / GID, EFS UTF-8 names
- 7z: single-folder reader for Copy / LZMA / LZMA2 / Deflate / BZip2 plus the BCJ + Delta filter family; encoder writes a single-folder archive with LZMA, Copy, Deflate, or BZip2 as the coder
Checksum primitives shared across codecs and exposed directly:
- Adler-32, CRC-32 (reflected), CRC-32C (Castagnoli), bzip2 CRC-32 (non-reflected), CRC-64 (xz / ECMA reflected, returned as a #(low_u32, high_u32) pair for cross-target precision), SHA-1, and SHA-256 (FIPS 180-4)
For the full coverage matrix — including which encoder strategies are currently exposed for each codec — see CHANGELOG.md.
Targets
Both the Erlang and JavaScript targets are exercised in CI on every push. Pure-Gleam internals mean no NIF / native binary is needed.
Development
See CONTRIBUTING.md for the local workflow.
just ci # format-check + lint + typecheck + test
just test # gleam test on the default target