FastHTML

A C Node wrapping lexbor. Primarily used with FastSanitize.

Compiling

  • GNU Make
  • C compiler
  • Erlang 22.0+ with development headers
  • (optional) lexbor 2.2.0+

To build against a system installation of lexbor, set WITH_SYSTEM_LEXBOR=1 at compile time. By default, the vendored copy in c_src/lexbor is used.
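The commands below are a minimal sketch of building against a system lexbor. They assume two things: the build reads WITH_SYSTEM_LEXBOR from the environment when Mix compiles the native code, and lexbor's headers and library are on the C compiler's default search paths. Adjust both to your setup.

```shell
# Inside the fast_html repository: build the native code against the system lexbor
WITH_SYSTEM_LEXBOR=1 mix compile

# From a project that depends on fast_html: force the dependency to be rebuilt
# so the variable takes effect even if a vendored build already exists
WITH_SYSTEM_LEXBOR=1 mix deps.compile fast_html --force
```

The --force flag matters if fast_html has already been compiled once. Without it, Mix may reuse the existing build, which is linked against the vendored lexbor.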

Benchmarks

The following table lists the median time each HTML parser usable from Elixir takes to decode a string into a tree. The benchmarks were run on a machine with an AMD Ryzen 9 3950X CPU (32 threads @ 3.5 GHz) and 32 GB of RAM. To run the benchmark yourself, use the mix fast_html.bench task.

| File | fast_html (Port) | mochiweb_html (Erlang) | html5ever (Rust NIF) | Myhtmlex (NIF)¹ |
|---|---|---|---|---|
| document-large.html (6.9 MB) | 125.12 ms | 1778.34 ms | 395.21 ms | 327.17 ms |
| document-small.html (25 KB) | 0.50 ms | 2.76 ms | 1.72 ms | 1.19 ms |
| fragment-large.html (33 KB) | 0.93 ms | 4.78 ms | 2.34 ms | 2.15 ms |
| fragment-small.html² (757 B) | 44.60 μs | 42.13 μs | 43.58 μs | 289.71 μs |

Full benchmark output can be seen in this snippet

  1. Myhtmlex has a C Node mode, but it was not benchmarked here because it segfaults on document-large.html.
  2. The slowdown on fragment-small.html is due to Port overhead. html5ever and Myhtmlex (in NIF mode) run the parser inside the VM. fast_html instead runs the parser in an isolated process and communicates with it over stdio, so even a fatal crash in the parser won't bring down the entire VM.

Contribution / Bug Reports

  • Please run git submodule update after every checkout or pull
  • The project aims to be fully tested
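The submodule step can be sketched as the sequence below. It assumes the vendored lexbor in c_src/lexbor is tracked as a git submodule, which the note about git submodule update suggests. After that, the test suite runs with the standard mix test task.

```shell
# Check out the submodule commits pinned by the current revision.
# --init is required the first time after cloning and is harmless afterwards.
git submodule update --init --recursive

# Run the test suite
mix test
```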