mimetype
MIME type lookup and magic-number detection for Gleam on Erlang and JavaScript targets.
Features
- Extension-to-MIME and MIME-to-extensions lookup derived from mime-db
- Magic-number detection for common binary formats across archive, document, image, audio, and video families
- Pure Gleam implementation that builds on both targets
Install
gleam add mimetype
When to use this
Use mimetype when you need a small, cross-target MIME utility in
Gleam:
- Serving files or attachments: resolve Content-Type from a filename or extension
- Validating uploads: prefer magic-number detection over user-supplied extensions
- Bridging APIs: map between file extensions and MIME types in both directions
The extension database is generated from jshttp/mime-db, which tracks
the IANA media type registry and common ecosystem aliases. Refreshing
the generated table keeps lookups aligned with that upstream source.
Serving a file: pick a Content-Type from a filename
The most common use is taking the filename your handler already has and
turning it into a wire-ready Content-Type value. filename_to_mime_type
is case-insensitive and falls back to application/octet-stream for
unknown extensions, so the helper is safe to drop into a response path
without extra branching.
import mimetype

/// Pick the Content-Type header value to send back when serving
/// `filename` from disk or object storage.
pub fn content_type_for(filename: String) -> String {
  mimetype.filename_to_mime_type(filename)
  |> mimetype.to_string
}

// content_type_for("report.PDF") -> "application/pdf"
// content_type_for("avatar.jpg") -> "image/jpeg"
// content_type_for("archive.tar.gz") -> "application/gzip"
// content_type_for("notes") -> "application/octet-stream"
For HTML / CSS / JS responses where browsers expect a charset, parse the wire string once, with the charset parameter you actually serve included:
import mimetype

pub fn html_content_type() -> String {
  let assert Ok(html) = mimetype.parse("text/html; charset=utf-8")
  mimetype.to_string(html)
  // -> "text/html; charset=utf-8"
}
Validating an upload: detect from bytes, not the user’s extension
Browser-uploaded filenames are user input and can lie. Pass the leading
bytes of the upload to mimetype.detect_strict to get the actual format,
then enforce an allowlist of MIME types your endpoint will accept.
import mimetype

pub type UploadError {
  EmptyUpload
  Unsupported(detected: String)
}

/// Allow only PNG, JPEG, and WebP uploads. The detected MIME type is
/// derived from magic bytes; the caller's filename is ignored.
pub fn validate_image_upload(
  bytes: BitArray,
) -> Result(mimetype.MimeType, UploadError) {
  case mimetype.detect_strict(bytes) {
    Ok(mime) ->
      case mimetype.is_image(mime) && image_is_allowed(mime) {
        True -> Ok(mime)
        False -> Error(Unsupported(detected: mimetype.to_string(mime)))
      }
    Error(mimetype.EmptyInput) -> Error(EmptyUpload)
    Error(_) -> Error(Unsupported(detected: "application/octet-stream"))
  }
}

fn image_is_allowed(mime: mimetype.MimeType) -> Bool {
  case mimetype.essence_of(mime) {
    "image/png" | "image/jpeg" | "image/webp" -> True
    _ -> False
  }
}
The strict variant separates EmptyInput (zero-byte upload) from
NoMatch (bytes that did not match any signature) so the caller can
return the right HTTP status. mimetype.detect never returns an error:
it yields application/octet-stream for both cases instead.
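One way to act on that distinction is to map each error to a status code. The sketch below repeats the UploadError type from the upload example so that it stands alone. The status_for name and the choice of 400 and 415 are assumptions made for illustration; the library does not prescribe them.

```gleam
/// Mirrors the UploadError type from the upload example.
pub type UploadError {
  EmptyUpload
  Unsupported(detected: String)
}

/// Hypothetical helper: pick an HTTP status for a rejected upload.
pub fn status_for(error: UploadError) -> Int {
  case error {
    // A zero-byte body is treated here as a malformed request.
    EmptyUpload -> 400
    // Bytes were present but are not an accepted media type.
    Unsupported(_) -> 415
  }
}
```

Keeping the mapping in one function means the handler can log the detected type carried by Unsupported while still returning a uniform response.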
Other API entry points
The full surface returns an opaque MimeType. Use mimetype.to_string
to serialise for an HTTP header; use mimetype.parse to construct one
from a wire-format string. Inspect with essence_of, parameter_of,
charset_of_type, is_image, is_a, and the rest of the predicate /
accessor family. The parameter_of docstring pins the rules for
duplicate names (first wins), case-insensitive lookup, and value
whitespace handling; consult it before building anything that
round-trips parameters.
import mimetype

pub fn main() {
  mimetype.extension_to_mime_type(".json")
  |> mimetype.to_string
  // -> "application/json"

  let assert Ok(jpeg) = mimetype.parse("image/jpeg")
  mimetype.mime_type_to_extensions(jpeg)
  // -> ["jpg", "jpeg", "jpe"]

  mimetype.detect_with_filename(<<0, 1, 2, 3>>, "report.csv")
  |> mimetype.essence_of
  // -> "text/csv"

  let assert Ok(html) = mimetype.parse("text/html; charset=utf-8")
  mimetype.charset_of_type(html)
  // -> Some("utf-8")
}
Capabilities and limitations
This library intentionally stays focused. Knowing where the detector stops is more useful than discovering it from a surprising result:
- It does perform shallow ZIP-container inspection for a small fixed allowlist: epub, OOXML (docx/xlsx/pptx), OpenDocument (odt/ods/odp), jar, and apk. It does not recurse arbitrarily into nested containers or inspect embedded subformats beyond those targeted signatures.
- It does sniff text/plain from printable-ASCII-only payloads (the bounded WHATWG-style binary-vs-text heuristic added in #20) and recognises the UTF-8/16/32 BOM signatures, returning text/plain; charset=<utf-X> for the BOM cases. This is the only text-related sniffing: it does not detect text encodings beyond the BOM marker, and the printable-ASCII fallback emits a bare text/plain with no charset parameter.
- Beyond the five BOM-derived text/plain; charset=utf-* signatures, it does not parse, validate, or surface MIME-parameter values from the wire.
Content negotiation
mimetype/accept parses RFC 9110 §12.5 Accept-family headers and
picks the best server offer for a given client header.
import mimetype
import mimetype/accept

pub fn main() {
  let assert Ok(items) = accept.parse("text/html, application/json;q=0.9")
  let assert Ok(html) = mimetype.parse("text/html")
  let assert Ok(json) = mimetype.parse("application/json")
  accept.negotiate(client_accepts: items, server_offers: [json, html])
  // -> Some(html)
}
The same module handles Accept-Encoding, Accept-Charset, and
Accept-Language:
import mimetype/accept

pub fn main() {
  let assert Ok(items) =
    accept.parse_encoding("gzip, br;q=1.0, *;q=0.1")
  accept.negotiate_value(client_accepts: items, server_offers: ["br", "gzip"])
  // -> Some("br")
}
Notes:
- q=0 excludes a media range from consideration.
- A bare */* client header returns the server’s first offer (server preference).
- Specific(MimeType) matching is essence-only; RFC 9110 §12.5.1 parameter-level “more-specific” matching is currently out of scope.
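The notes above imply that negotiation can come back empty, for example when every offer is excluded by q=0. A minimal sketch of handling that case follows, using only the accept.parse and accept.negotiate calls shown earlier. The choose helper, the 406 response for a failed negotiation, and the 400 response for an unparseable header are assumptions of this sketch, not behaviour defined by the library.

```gleam
import gleam/option.{None, Some}
import mimetype
import mimetype/accept

/// Hypothetical helper: return the Content-Type to serve, or an HTTP
/// status code when no offer is acceptable.
pub fn choose(accept_header: String) -> Result(String, Int) {
  let assert Ok(html) = mimetype.parse("text/html")
  let assert Ok(json) = mimetype.parse("application/json")
  case accept.parse(accept_header) {
    // Treating an unparseable header as 400 is an assumption here.
    Error(_) -> Error(400)
    Ok(items) ->
      case accept.negotiate(client_accepts: items, server_offers: [html, json]) {
        Some(mime) -> Ok(mimetype.to_string(mime))
        // Nothing the client accepts is on offer: 406 Not Acceptable.
        None -> Error(406)
      }
  }
}
```

Going by the q=0 note, a header such as "text/html;q=0, application/json" should leave JSON as the only candidate, so listing HTML first in the offers does not force it on a client that has ruled it out.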
Reader-based detection
detect_reader and detect_reader_strict let callers detect a MIME
type without buffering the whole input. They take a synchronous
reader plus a byte budget, and the reader is invoked at most once
to fetch up to that many bytes from the start of the source.
Reader contract
pub type Reader(read_error) = fn(Int) -> Result(BitArray, read_error)
- The Int argument is the maximum number of bytes the detector wants.
- Returning fewer bytes than requested is fine; it is interpreted as “the source ended early”. Detection runs against whatever was returned.
- The returned BitArray should always be the prefix starting at offset 0 of the source. The detector inspects it from byte 0.
- The error parameter read_error is opaque to the library; in the strict variant it is preserved as ReaderError(read_error) so callers can distinguish IO failures from “no signature matched”.
The reader is called once per detection call. There is no
streaming or back-and-forth — return enough bytes for the largest
signature you care about (the detector inspects up to a few KB by
default), or pass a custom limit argument tuned for your workload.
In-memory adapter
The simplest case: when the bytes are already in hand, wrap them in a function that ignores its argument.
import mimetype

pub fn main() {
  let png = <<0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A>>
  let reader = fn(_limit) { Ok(png) }
  mimetype.detect_reader(reader, 3072)
  |> mimetype.to_string
  // -> "image/png"
}
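When the in-memory buffer may be much larger than the detector needs, the reader can honour its argument instead of ignoring it. The sketch below is one possible shape, built on bit_array.slice from the Gleam standard library; prefix_reader is a hypothetical name, and the code assumes a byte-aligned BitArray.

```gleam
import gleam/bit_array
import gleam/int

/// Hypothetical helper: build a reader over bytes already in memory
/// that never returns more than the detector asked for.
pub fn prefix_reader(source: BitArray) -> fn(Int) -> Result(BitArray, Nil) {
  fn(limit: Int) {
    // Clamp to [0, size] so a short source simply "ends early".
    let size = bit_array.byte_size(source)
    let take = int.max(0, int.min(limit, size))
    // Always slice from offset 0, as the reader contract requires.
    bit_array.slice(from: source, at: 0, take: take)
  }
}
```

The result can be passed wherever a reader is expected, for example mimetype.detect_reader(prefix_reader(bytes), 3072), with Nil standing in for the reader's error type.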
BEAM file prefix reader
On the Erlang target, wrap a file-IO library so that one call returns
up to limit bytes from the start of the file. Any IO library that
can open a file and read a fixed-size prefix works — the snippet below
sketches the shape using a read_prefix(path, limit) helper that
returns Result(BitArray, your_error):
import mimetype

pub fn detect_file(
  path: String,
) -> Result(mimetype.MimeType, mimetype.DetectionError(your_error)) {
  let reader = fn(limit) { read_prefix(path, limit) }
  mimetype.detect_reader_strict(reader, 3072)
}
If read_prefix returns Ok(<<>>) for an empty file, the strict
variant surfaces Error(EmptyInput). If read_prefix itself returns
Error(some_io_error), the strict variant surfaces
Error(ReaderError(some_io_error)) so the caller can distinguish IO
failure from a genuine no-match.
JavaScript browser adapter
In the browser, File / Blob / ReadableStream reads are
asynchronous, so they cannot satisfy the synchronous Reader
contract directly. The intended pattern is:
- Read the prefix asynchronously (await blob.slice(0, limit).arrayBuffer() or the equivalent on a ReadableStream).
- Pass the resulting bytes to detect / detect_strict, not to detect_reader.
In Gleam pseudo-code, with an FFI helper read_blob_prefix that
awaits the slice and returns a BitArray:
import mimetype

pub fn detect_blob(blob: Blob) -> mimetype.MimeType {
  // `read_blob_prefix` is your FFI: await blob.slice(0, 3072).arrayBuffer()
  let bytes = read_blob_prefix(blob, 3072)
  mimetype.detect(bytes)
}
The reader-based API is most useful when the source is itself
synchronous (BEAM file IO, in-memory buffers, deterministic stream
adapters). For Promise-based sources, awaiting the prefix once and
calling detect is the recommended shape.
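With real types, the awaited prefix would typically be carried in a promise. The sketch below shows one way that could look, assuming the separate gleam_javascript package (for gleam/javascript/promise) and a hypothetical FFI module ./blob_ffi.mjs that exports read_blob_prefix. Neither is part of this library.

```gleam
import gleam/javascript/promise.{type Promise}
import mimetype

/// Opaque handle for a browser Blob or File.
pub type Blob

/// Hypothetical FFI binding: the JavaScript side would await
/// blob.slice(0, limit).arrayBuffer() and wrap the bytes as a BitArray.
@external(javascript, "./blob_ffi.mjs", "read_blob_prefix")
fn read_blob_prefix(blob: Blob, limit: Int) -> Promise(BitArray)

/// Resolve the prefix once, then run the synchronous detector on it.
pub fn detect_blob(blob: Blob) -> Promise(mimetype.MimeType) {
  read_blob_prefix(blob, 3072)
  |> promise.map(mimetype.detect)
}
```

The detector itself stays synchronous; only the byte fetch is asynchronous, which is why detect is used here rather than detect_reader.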
Strict variants and error handling
The strict variants return Result(MimeType, DetectionError(read_error)),
where DetectionError distinguishes:
- EmptyInput: the reader returned a zero-byte payload, so no detection was possible.
- NoMatch: the reader returned bytes, but no signature and no printable-ASCII fallback applied.
- ReaderError(e): the reader itself failed; e is preserved unchanged.
- UnknownExtension(_): only emitted by extension/filename helpers, not the reader API.
import gleam/io
import gleam/string
import mimetype

pub fn classify(reader) {
  case mimetype.detect_reader_strict(reader, 3072) {
    Ok(mime) -> io.println(mimetype.to_string(mime))
    Error(mimetype.EmptyInput) -> io.println("empty source")
    Error(mimetype.NoMatch) -> io.println("unrecognised content")
    // Print the reader's own error as text so every branch returns Nil.
    Error(mimetype.ReaderError(reason)) ->
      io.println("read failed: " <> string.inspect(reason))
    Error(mimetype.UnknownExtension(_)) -> Nil
  }
}
Supported magic-number formats
detect/1 recognises the following MIME types from byte-level
signatures or structural sniffs near the start of the input. This
list is generated from src/mimetype/internal/magic.gleam by
scripts/generate_supported_formats.sh — do not edit it by hand;
re-run just generate-readme after adding or removing a signature.
Application formats
- application/epub+zip
- application/gzip
- application/java-archive
- application/json
- application/msword
- application/ogg
- application/pdf
- application/rtf
- application/vnd.android.package-archive
- application/vnd.apache.parquet
- application/vnd.ms-asf
- application/vnd.ms-cab-compressed
- application/vnd.ms-excel
- application/vnd.ms-fontobject
- application/vnd.ms-powerpoint
- application/vnd.oasis.opendocument.presentation
- application/vnd.oasis.opendocument.spreadsheet
- application/vnd.oasis.opendocument.text
- application/vnd.openxmlformats-officedocument.presentationml.presentation
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/vnd.sqlite3
- application/wasm
- application/x-7z-compressed
- application/x-archive
- application/x-bzip2
- application/x-compress
- application/x-deflate
- application/x-elf
- application/x-lz4
- application/x-lzh-compressed
- application/x-lzip
- application/x-ole-storage
- application/x-rar-compressed
- application/x-snappy-framed
- application/x-tar
- application/x-xz
- application/zip
- application/zstd
Audio formats
- audio/aac
- audio/ac3
- audio/aiff
- audio/amr
- audio/amr-wb
- audio/flac
- audio/midi
- audio/mp4
- audio/mpeg
- audio/wav
Font formats
- font/collection
- font/otf
- font/ttf
- font/woff
- font/woff2
Image formats
- image/avif
- image/bmp
- image/fits
- image/gif
- image/heic
- image/jp2
- image/jpeg
- image/jxl
- image/png
- image/svg+xml
- image/tiff
- image/vnd.adobe.photoshop
- image/vnd.ms-dds
- image/vnd.radiance
- image/webp
- image/x-exr
- image/x-icon
- image/x-qoi
Text formats
- text/html
- text/plain
- text/plain; charset=utf-16be
- text/plain; charset=utf-16le
- text/plain; charset=utf-32be
- text/plain; charset=utf-32le
- text/plain; charset=utf-8
- text/xml
Video formats
- video/mp4
- video/quicktime
- video/webm
- video/x-flv
- video/x-matroska
- video/x-msvideo
The detector is intentionally shallow: it looks only at fixed signatures near the start of the byte stream, plus a small amount of targeted ZIP local-header inspection for the container formats listed above. It does not recurse arbitrarily into nested containers.
Development
mise install
just ci
The generated MIME-DB lookup tables live in
src/mimetype/internal/mimetype_db_ffi.erl and
src/mimetype/internal/db_ffi.mjs, with a thin Gleam wrapper at
src/mimetype/internal/db.gleam. All three files are derived from
doc/reference/upstream/mime-db/db.json. Refresh them with:
just generate-db
CI runs the same generator against the pinned upstream commit and fails the build if the regenerated output drifts from the committed copies.
Benchmarks
The hot lookup and detection paths have a small reproducible bench
harness under test/mimetype_bench.gleam. Run it on either target:
just bench-erlang
just bench-javascript
just bench # both, in sequence
Each run prints a Markdown table of ns/op figures. Capture a
baseline from main before a refactor
(just bench-erlang > before.md), then re-run on the working branch
and diff the two tables to check for material regressions. The
harness is intentionally not wired into PR-time CI gates — it is for
local A/B comparison and ad-hoc investigation, not for blocking
merges on micro-fluctuations.
Licensing
The data tables under src/mimetype/internal/ are generated from
jshttp/mime-db. The generated FFI source files
(mimetype_db_ffi.erl and db_ffi.mjs) carry the MIT notice inline;
the same packaged notice is also included in THIRD_PARTY_NOTICES.md.