StripJs (strip_js v1.3.0)
StripJs is an Elixir module for stripping executable JavaScript from blocks of HTML and CSS, based on the Floki (https://github.com/philss/floki) parsing library.
It handles:
- <script>...</script> and <script src="..."></script> tags
- Event handler attributes such as onclick="..."
- javascript:... URLs in HTML and CSS
- CSS expression(...) directives
- HTML entity attacks (like &lt;script&gt;)
StripJs is production ready, and has sanitized over 1.5 billion payloads at Appcues.
Installation
Add strip_js to your application's mix.exs:
def application do
  [applications: [:strip_js]]
end

def deps do
  [{:strip_js, "~> 1.3.0"}]
end
Usage
clean_html/2 removes all JS vectors from an HTML string:
iex> html = "<button onclick=\"alert('pwnt')\">Hi!</button>"
iex> StripJs.clean_html(html)
"<button>Hi!</button>"
clean_css/2 removes all JS vectors from a CSS string:
iex> css = "body { background-image: url('javascript:alert()'); }"
iex> StripJs.clean_css(css)
"body { background-image: url('removed_by_strip_js:alert()'); }"
StripJs relies on the Floki HTML parser library, which is built using Mochiweb by default.
StripJs provides a clean_html_tree/1 function to strip JS from Floki.parse_fragment/1- and :mochiweb_html.parse/1-style HTML parse trees.
Security
StripJs blocks every JS injection vector known to the authors. It has survived four years in production, multiple professional penetration tests, and over a billion invocations with no known security issues.
If you believe there are JS injection methods not covered by this library, please submit an issue with a test case!
Bugs and Limitations
If the input HTML is invalid, clean_html/2 may make it more broken.
In uncommon cases, innocent CSS that closely resembles JS-injection techniques may be mangled by clean_css/2.
Authorship and License
Copyright 2017-2021, Appcues, Inc.
Project homepage: StripJs
StripJs is released under the MIT License.
Summary
Functions
clean_css/2: Removes JS vectors from the given CSS string; i.e., the contents of a stylesheet or <style> tag.
clean_html/2: Removes JS vectors from the given HTML string.
clean_html_tree/1: Removes JS vectors from the given Floki/Mochiweb-style HTML tree (html_tree/0).
Types
Functions
clean_css/2
Removes JS vectors from the given CSS string; i.e., the contents of a stylesheet or <style> tag.
Does not HTML-escape its output. Care is taken to maintain valid CSS syntax.
Example:
iex> css = "tt { background-color: expression('alert()'); }"
iex> StripJs.clean_css(css)
"tt { background-color: removed_by_strip_js('alert()'); }"
Warning: this step is performed using regexes, not a parser, so innocent CSS containing either of the strings javascript: or expression( may be mangled.
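To make the warning concrete, here is a minimal, self-contained sketch of how a regex-based rewrite can catch innocent CSS. The CssSketch module and its two patterns are hypothetical stand-ins written for illustration; they are not StripJs's actual implementation, and the real clean_css/2 may differ in which inputs it rewrites.

```elixir
defmodule CssSketch do
  # Hypothetical illustration only, NOT StripJs's real code.
  # Rewrites the two marker strings wherever they occur, with no
  # awareness of CSS structure (strings, comments, selectors).
  def clean(css) do
    css
    |> String.replace(~r/javascript:/i, "removed_by_strip_js:")
    |> String.replace(~r/expression\(/i, "removed_by_strip_js(")
  end
end

# Harmless generated content that merely mentions "javascript:".
css = ~s(a::after { content: "see javascript: docs"; })

IO.puts(CssSketch.clean(css))
# prints: a::after { content: "see removed_by_strip_js: docs"; }
```

Because the substitution is purely textual, the quoted string is rewritten even though it could never execute. A parser-based approach would avoid this at the cost of extra complexity.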
clean_html/2
Removes JS vectors from the given HTML string.
All non-tag text and tag attribute values will be HTML-escaped, except for the contents of <style> tags, which are passed through clean_css/2.
Even if the input HTML contained no JS, the output of clean_html/2 is not guaranteed to match its input byte-for-byte.
Examples:
iex> StripJs.clean_html("<button onclick=\"alert('phear');\">Click here</button>")
"<button>Click here</button>"
iex> StripJs.clean_html("<script> console.log('oh heck'); </script>")
""
iex> StripJs.clean_html("&lt;script&gt; console.log('oh heck'); &lt;/script&gt;")
"&lt;script&gt; console.log('oh heck'); &lt;/script&gt;" ## HTML entity attack didn't work
clean_html_tree/1
Removes JS vectors from the given Floki/Mochiweb-style HTML tree (html_tree/0).
All attribute values and tag bodies except embedded stylesheets will be HTML-escaped.
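For readers unfamiliar with Floki/Mochiweb-style trees, the sketch below shows the general shape of such a tree ({tag, attributes, children} tuples mixed with text binaries) and one simple way to walk it. The TreeSketch module is hypothetical and covers only two vectors (script tags and on* attributes); it does no escaping and is not how clean_html_tree/1 is implemented.

```elixir
defmodule TreeSketch do
  # Hypothetical illustration only, NOT StripJs's real code.

  # A list of nodes: clean each one and drop any that were removed.
  def strip(nodes) when is_list(nodes) do
    nodes |> Enum.map(&strip/1) |> Enum.reject(&is_nil/1)
  end

  # Remove <script> elements entirely (returned as nil, filtered above).
  def strip({"script", _attrs, _children}), do: nil

  # Any other element: drop on* attributes, then recurse into children.
  def strip({tag, attrs, children}) do
    kept =
      Enum.reject(attrs, fn {name, _value} ->
        String.starts_with?(String.downcase(name), "on")
      end)

    {tag, kept, strip(children)}
  end

  # Text and any other node types (e.g. comments) pass through unchanged.
  def strip(other), do: other
end

tree = [
  {"div", [],
   [
     {"button", [{"onclick", "alert('pwnt')"}, {"class", "btn"}], ["Hi!"]},
     {"script", [], ["alert(1)"]}
   ]}
]

IO.inspect(TreeSketch.strip(tree))
# prints: [{"div", [], [{"button", [{"class", "btn"}], ["Hi!"]}]}]
```

Working on the parsed tree rather than the raw string means the caller can parse once, clean, and then continue transforming the same structure before rendering it.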