Performance
URP converts documents by talking directly to soffice over TCP.
This page compares URP against Gotenberg,
a popular LibreOffice-based conversion service.
Benchmarks
```shell
docker compose --file benchmarks/docker-compose.yml up --detach --wait
mix run benchmarks/bench.exs
```
Both URP and Gotenberg run LibreOffice 26.2.0 on Debian (glibc).
The fixture uses Liberation fonts only — regenerate with
uv run --with python-docx --with Pillow --with numpy benchmarks/generate_fixture.py
(pass --size 15 for the large variant).
Results (Apple M3 Max)
2.6 MB input → 221-page PDF:

| Name | ips | average | deviation | median | 99th % |
|---|---|---|---|---|---|
| URP | 1.05 | 0.95 s | ±7.60% | 0.94 s | 1.20 s |
| Gotenberg | 0.81 | 1.23 s | ±7.47% | 1.19 s | 1.42 s |

15.5 MB input → 62 MB PDF:

| Name | ips | average | deviation | median | 99th % |
|---|---|---|---|---|---|
| URP | 0.145 | 6.87 s | ±7.44% | 6.73 s | 7.44 s |
| Gotenberg | 0.087 | 11.45 s | ±1.21% | 11.45 s | 11.54 s |

URP is about 27% faster for small documents (1.19 s vs 0.94 s median) and about 67% faster for large ones (11.45 s vs 6.87 s average). The gap grows because Gotenberg's Go/HTTP overhead (multipart parsing, queue management, response framing) scales with document size, while URP talks to soffice directly over a TCP socket.
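The percentages follow from the ratio of the two timings. As a minimal sketch, the figures can be rechecked from the tables with awk; `speedup_pct` is a hypothetical helper and not part of the benchmark suite.

```shell
#!/bin/sh
# speedup_pct SLOW FAST: how much faster FAST is than SLOW, as a whole percent.
# Hypothetical helper for rechecking the tables; not part of benchmarks/.
speedup_pct() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.0f\n", (slow / fast - 1) * 100 }'
}

speedup_pct 1.19 0.94    # small document, medians  -> 27
speedup_pct 11.45 6.87   # large document, averages -> 67
```

Using medians for the large run as well (11.45 s vs 6.73 s) gives 70% instead.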
I/O strategies
URP supports two I/O transfer strategies via the :io option, benchmarked
with benchmarks/io_bench.exs:
```shell
mix run benchmarks/io_bench.exs
```
File I/O (:file, default) writes temp files on soffice's filesystem
and transfers them over URP in ~6 round-trips. Stream I/O (:stream)
pipes bytes over the URP socket via XInputStream/XOutputStream — no temp
disk, but more round-trips.
Stream input is the bottleneck (~40-50% slower) because ZIP-based formats (docx, xlsx, pptx) require thousands of XInputStream/XSeekable random-access round-trips. Stream output adds negligible overhead — soffice writes in fixed 32 767-byte chunks, so the round-trip count is predictable.
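To get a feel for the output side, here is a rough sketch of the chunk arithmetic. It assumes one round-trip per 32 767-byte chunk and reads "62 MB" as 62 MiB; `chunk_count` is a hypothetical helper, not a URP function.

```shell
#!/bin/sh
CHUNK_BYTES=32767   # chunk size quoted in the text

# chunk_count BYTES: number of chunks needed to carry BYTES, rounding up.
chunk_count() {
  echo $(( ($1 + CHUNK_BYTES - 1) / CHUNK_BYTES ))
}

chunk_count $(( 62 * 1024 * 1024 ))   # the 62 MB PDF from the benchmark -> 1985
```

Under those assumptions the large fixture needs roughly 2,000 writes, and the count depends only on output size.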
| Strategy | Input | Output | Best for |
|---|---|---|---|
| io: :file | fast | fast | Default: best throughput |
| io: {:file, :stream} | fast | chunked | Large outputs without single big allocation |
| io: {:stream, :file} | slow | fast | No temp disk for input |
| io: :stream | slow | chunked | No temp disk at all |
Container image
See benchmarks/Dockerfile.soffice-debian. The image is a minimal Debian trixie-slim
with LibreOffice from trixie-backports, fonts-liberation (metric-compatible
with Arial/Times/Courier), and fonts-crosextra-carlito
(a Calibri replacement). It is ~564 MB, versus ~1.86 GB for Gotenberg.
PDF output
URP and Gotenberg produce visually identical PDFs when using the same LibreOffice version and fonts.
A single long-lived soffice instance (the normal URP deployment)
produces near byte-identical output across consecutive conversions: the
only varying fields are timestamps and document IDs in metadata
(CreationDate, ModDate, /ID, /DocChecksum).
Different soffice processes produce visually identical PDFs but assign
fonts to different PDF object numbers (~32 KB of byte differences for
a 221-page PDF). This is likely hash table iteration order in LO's
font subsetting.
Bug 160033 tracks this upstream. qpdf --deterministic-id
can normalize metadata but not font object ordering.
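To quantify how far two conversions diverge, a byte-level comparison is enough. The sketch below relies only on POSIX cmp; `diff_bytes` is a hypothetical helper and the file names in the comment are placeholders.

```shell
#!/bin/sh
# diff_bytes A B: count byte positions that differ within the common length
# of two files. cmp -l prints one line per differing byte; wc -l counts them.
diff_bytes() {
  cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Example (placeholder file names):
#   diff_bytes run1.pdf run2.pdf
```

A result in the tens of thousands for the 221-page fixture would be consistent with the ~32 KB figure above.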
Using libreofficedocker/alpine
The pre-built libreofficedocker/alpine
image works with URP out of the box but has two structural drawbacks.
musl allocator overhead. Alpine uses musl libc, whose mallocng
allocator issues mmap/munmap for most allocations: 21,432
syscalls per conversion vs 25 on glibc. This adds ~260 ms. It's
inherent to musl and can be mitigated with
LD_PRELOAD=/usr/lib/libjemalloc.so.2.
Image bloat. Despite Alpine's small-image reputation,
libreofficedocker/alpine is 1.78 GB, roughly 3x the Debian image.
It bundles OpenJDK 11, 130+ Noto font packages, and 450 packages
in total.
As of March 2026, Alpine ships LO 25.8.x (Still) while Debian
trixie-backports has 26.2.x (Fresh). Carlito is missing from the
stock Alpine image (apk add font-carlito to fix).
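Both mitigations can be layered on top of the stock image. This is a minimal sketch, assuming the Alpine package that provides /usr/lib/libjemalloc.so.2 is named jemalloc (not verified against this image); `preload_setting` is a hypothetical helper.

```shell
#!/bin/sh
# Library path quoted in the text; override if the package installs elsewhere.
JEMALLOC_LIB="${JEMALLOC_LIB:-/usr/lib/libjemalloc.so.2}"

# preload_setting PATH: print an LD_PRELOAD assignment only when the library
# file exists, so the variable is never set to a path with nothing to load.
preload_setting() {
  if [ -f "$1" ]; then
    printf 'LD_PRELOAD=%s\n' "$1"
  fi
}

# Inside the container (jemalloc package name is an assumption):
#   apk add --no-cache jemalloc font-carlito
#   env $(preload_setting "$JEMALLOC_LIB") soffice "$@"
```

When the library is absent the helper prints nothing, so the env invocation starts soffice with the default allocator.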
| Setup | 2.6 MB | 15.5 MB | LO version | Image size |
|---|---|---|---|---|
| URP → Debian glibc | 0.94 s | 6.73 s | 26.2.0 | ~564 MB |
| URP → Alpine musl | 1.20 s | 11.11 s | 25.8.1 | ~1.78 GB |
| Gotenberg (Debian glibc) | 1.19 s | 11.45 s | 26.2.0 | ~1.86 GB |