StripJs (strip_js v1.3.0)

StripJs is an Elixir module for stripping executable JavaScript from blocks of HTML and CSS, based on the Floki (https://github.com/philss/floki) parsing library.

It handles:

  • <script>...</script> and <script src="..."></script> tags
  • Event handler attributes such as onclick="..."
  • javascript:... URLs in HTML and CSS
  • CSS expression(...) directives
  • HTML entity attacks (like &lt;script&gt;)

StripJs is production ready, and has sanitized over 1.5 billion payloads at Appcues.

Installation

Add strip_js to your application's mix.exs:

def application do
  [applications: [:strip_js]]
end

def deps do
  [{:strip_js, "~> 1.3.0"}]
end

Usage

clean_html/2 removes all JS vectors from an HTML string:

iex> html = "<button onclick=\"alert('pwnt')\">Hi!</button>"
iex> StripJs.clean_html(html)
"<button>Hi!</button>"

clean_css/2 removes all JS vectors from a CSS string:

iex> css = "body { background-image: url('javascript:alert()'); }"
iex> StripJs.clean_css(css)
"body { background-image: url('removed_by_strip_js:alert()'); }"

StripJs relies on the Floki HTML parser library, which is built on Mochiweb by default. StripJs provides a clean_html_tree/1 function to strip JS from HTML parse trees in the style of Floki.parse_fragment/1 and :mochiweb_html.parse/1.

Security

StripJs blocks every JS injection vector known to the authors. It has survived four years in production, multiple professional penetration tests, and over a billion invocations with no known security issues.

If you believe there are JS injection methods not covered by this library, please submit an issue with a test case!

Bugs and Limitations

If the input HTML is invalid, clean_html/2 may make it more broken than it already was.

In uncommon cases, innocent CSS which very closely resembles JS-injection techniques may be mangled by clean_css/2.

Authorship and License

Copyright 2017-2021, Appcues, Inc.

Project homepage: StripJs

StripJs is released under the MIT License.

Summary

Functions

clean_css(css, opts \\ [])

Removes JS vectors from the given CSS string; i.e., the contents of a stylesheet or <style> tag.

clean_html(html, opts \\ [])

Removes JS vectors from the given HTML string.

clean_html_tree(trees, opts \\ [])

Removes JS vectors from the given Floki/Mochiweb-style HTML tree (html_tree/0).

Types

html_attr()

@type html_attr() :: {String.t(), String.t()}

html_node()

@type html_node() :: String.t() | {html_tag(), [html_attr()], [html_node()]}

html_tag()

@type html_tag() :: String.t()

html_tree()

@type html_tree() :: html_node() | [html_node()]

opts()

@type opts() :: Keyword.t()

Functions

clean_css(css, opts \\ [])

@spec clean_css(String.t(), opts()) :: String.t()

Removes JS vectors from the given CSS string; i.e., the contents of a stylesheet or <style> tag.

Does not HTML-escape its output. Care is taken to maintain valid CSS syntax.

Example:

iex> css = "tt { background-color: expression('alert()'); }"
iex> StripJs.clean_css(css)
"tt { background-color: removed_by_strip_js('alert()'); }"

Warning: this step is performed using regexes, not a parser, so it is possible for innocent CSS containing either of the strings javascript: or expression( to be mangled.
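To make the warning above concrete, here is a minimal sketch of regex-based rewriting. The module name CssRegexSketch and both patterns are assumptions made for illustration. They are not StripJs's actual regexes, and only imitate the removed_by_strip_js replacement visible in the example output.

```elixir
defmodule CssRegexSketch do
  # Hypothetical stand-in for regex-based CSS cleaning; NOT StripJs's real code.
  @replacement "removed_by_strip_js"

  def clean(css) when is_binary(css) do
    css
    # Neutralize `javascript:` wherever it appears (case-insensitive).
    |> String.replace(~r/javascript:/i, @replacement <> ":")
    # Neutralize `expression(` wherever it appears (case-insensitive).
    |> String.replace(~r/expression\(/i, @replacement <> "(")
  end
end
```

Because a regex has no notion of CSS structure, this sketch rewrites harmless text too: CssRegexSketch.clean("a::after { content: 'javascript: the good parts'; }") returns "a::after { content: 'removed_by_strip_js: the good parts'; }".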

clean_html(html, opts \\ [])

@spec clean_html(String.t(), opts()) :: String.t()

Removes JS vectors from the given HTML string.

All non-tag text and tag attribute values will be HTML-escaped, except for the contents of <style> tags, which are passed through clean_css/2.
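The escaping described above can be sketched roughly as follows. EscapeSketch is a hypothetical module written for illustration. The exact set of characters StripJs escapes, and the entities it emits, may differ.

```elixir
defmodule EscapeSketch do
  # Hypothetical HTML escaper; NOT StripJs's real code.
  def escape(text) when is_binary(text) do
    text
    # "&" must be replaced first, or the entities added below
    # would themselves be escaped a second time.
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
    |> String.replace("'", "&#39;")
  end
end
```

Under this sketch, text that a parser has decoded into a literal <script> is written back out as inert entities, so it is never interpreted as a tag.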

Even if the input HTML contained no JS, the output of clean_html/2 is not guaranteed to match its input byte-for-byte.

Examples:

iex> StripJs.clean_html("<button onclick=\"alert('phear');\">Click here</button>")
"<button>Click here</button>"

iex> StripJs.clean_html("<script> console.log('oh heck'); </script>")
""

iex> StripJs.clean_html("&lt;script&gt; console.log('oh heck'); &lt;/script&gt;")
"&lt;script&gt; console.log(&#39;oh heck&#39;); &lt;/script&gt;"  ## HTML entity attack didn't work

clean_html_tree(trees, opts \\ [])

@spec clean_html_tree(html_tree(), opts()) :: html_tree()

Removes JS vectors from the given Floki/Mochiweb-style HTML tree (html_tree/0).

All attribute values and tag bodies except embedded stylesheets will be HTML-escaped.
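As a rough illustration of what walking an html_tree/0 value involves, the sketch below drops <script> nodes and on* attributes from a tree. TreeSketch is a hypothetical module, not StripJs's implementation. It handles only the node shapes listed in html_node/0, performs no escaping or CSS cleaning, and always returns a list of nodes.

```elixir
defmodule TreeSketch do
  # Hypothetical walker over the html_tree() shape; NOT StripJs's real code.

  # Accepts a single node or a list of nodes; always returns a list.
  def clean(tree) do
    tree
    |> List.wrap()
    |> Enum.flat_map(&clean_node/1)
  end

  # Text nodes pass through unchanged in this sketch.
  defp clean_node(text) when is_binary(text), do: [text]

  # Element nodes: drop <script> entirely; otherwise filter the
  # attributes and recurse into the children.
  defp clean_node({tag, attrs, children}) do
    if String.downcase(tag) == "script" do
      []
    else
      kept = Enum.reject(attrs, fn {name, _value} -> event_handler?(name) end)
      [{tag, kept, clean(children)}]
    end
  end

  # Treats any attribute whose name starts with "on" as an event handler.
  defp event_handler?(name) do
    name |> String.downcase() |> String.starts_with?("on")
  end
end
```

For example, TreeSketch.clean([{"div", [{"class", "a"}, {"onclick", "alert(1)"}], ["Hi", {"script", [], ["evil()"]}]}]) returns [{"div", [{"class", "a"}], ["Hi"]}].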