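# `Floki` [🔗](https://github.com/philss/floki/blob/v0.38.1/lib/floki.ex#L1)

Floki is a simple HTML parser that enables search for nodes using CSS selectors.

## Example

Assuming that you have the following HTML:

```html
<html>
<body>
  <section id="content">
    <p class="headline">Floki</p>
    <a href="http://github.com/philss/floki">Github page</a>
    <span data-model="user">philss</span>
  </section>
</body>
</html>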
```

To parse this, you can use the function `Floki.parse_document/1`:

```elixir
{:ok, html} = Floki.parse_document(doc)
# =>
# [{"html", [],
#   [
#     {"body", [],
#      [
#        {"section", [{"id", "content"}],
#         [
#           {"p", [{"class", "headline"}], ["Floki"]},
#           {"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]},
#           {"span", [{"data-model", "user"}], ["philss"]}
#         ]}
#      ]}
#   ]}]
```

With this document you can perform queries such as:

* `Floki.find(html, "#content")`
* `Floki.find(html, ".headline")`
* `Floki.find(html, "a")`
* `Floki.find(html, "[data-model=user]")`
* `Floki.find(html, "#content a")`
* `Floki.find(html, ".headline, a")`

Each HTML node is represented by a tuple like:

    {tag_name, attributes, children_nodes}

Example of node:

    {"p", [{"class", "headline"}], ["Floki"]}

So even if the only child node is the element text, it is represented inside a list.

# `css_selector`

```elixir
@type css_selector() ::
        String.t()
        | %Floki.Selector{
            attributes: term(),
            classes: term(),
            combinator: term(),
            id: term(),
            namespace: term(),
            pseudo_classes: term(),
            type: term()
          }
        | [
            %Floki.Selector{
              attributes: term(),
              classes: term(),
              combinator: term(),
              id: term(),
              namespace: term(),
              pseudo_classes: term(),
              type: term()
            }
          ]
```

# `html_attribute`

```elixir
@type html_attribute() :: {String.t(), String.t()}
```

# `html_attributes`

```elixir
@type html_attributes() :: [html_attribute()] | html_attributes_map()
```

# `html_attributes_map`

```elixir
@type html_attributes_map() :: %{required(String.t()) => String.t()}
```

# `html_comment`

```elixir
@type html_comment() :: {:comment, String.t()}
```

# `html_declaration`

```elixir
@type html_declaration() :: {:pi, String.t(), html_attributes()}
```

# `html_doctype`

```elixir
@type html_doctype() :: {:doctype, String.t(), String.t(), String.t()}
```

# `html_node`

```elixir
@type html_node() ::
        html_tag() | html_comment() | html_doctype() | html_declaration() | html_text()
```

# `html_tag`

```elixir
@type html_tag() :: {String.t(), html_attributes(), [html_node()]}
```

# `html_text`

```elixir
@type html_text() :: String.t()
```

# `html_tree`

```elixir
@type html_tree() :: [html_node()]
```

# `attr`

```elixir
@spec attr(html_tree() | html_node(), css_selector(), binary(), (binary() -> binary())) ::
        html_tree()
```

Changes the attribute values of the elements matched by `selector` with the function `mutation` and returns the whole element tree.

## Examples

    iex> Floki.attr([{"div", [{"id", "a"}], []}], "#a", "id", fn id -> String.replace(id, "a", "b") end)
    [{"div", [{"id", "b"}], []}]

    iex> Floki.attr([{"div", [{"class", "name"}], []}], "div", "id", fn _ -> "b" end)
    [{"div", [{"id", "b"}, {"class", "name"}], []}]

# `attribute`

```elixir
@spec attribute(html_tree() | html_node(), binary()) :: [binary()]
```

Returns a list with attribute values from elements.

## Examples

    iex> Floki.attribute([{"a", [{"href", "https://google.com"}], ["Google"]}], "href")
    ["https://google.com"]

    iex> Floki.attribute([{"a", [{"href", "https://google.com"}, {"data-name", "google"}], ["Google"]}], "data-name")
    ["google"]

# `attribute`

```elixir
@spec attribute(binary() | html_tree() | html_node(), binary(), binary()) :: list()
```

Returns a list with attribute values for a given selector.

## Examples

    iex> Floki.attribute([{"a", [{"href", "https://google.com"}], ["Google"]}], "a", "href")
    ["https://google.com"]

    iex> Floki.attribute(
    ...>   [{"a", [{"class", "foo"}, {"href", "https://google.com"}], ["Google"]}],
    ...>   "a",
    ...>   "class"
    ...> )
    ["foo"]

    iex> Floki.attribute(
    ...>   [{"a", [{"href", "https://e.corp.com"}, {"data-name", "e.corp"}], ["E.Corp"]}],
    ...>   "a[data-name]",
    ...>   "data-name"
    ...> )
    ["e.corp"]

# `children`

```elixir
@spec children(html_node(), Keyword.t()) :: html_tree() | nil
```

Returns the direct child nodes of an HTML node.

By default, it also includes all text nodes. You can disable this behaviour by setting the option `include_text` to `false`.

If the given node is not an HTML tag, then it returns `nil`.
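The attribute helpers and `children/2` are often used together on a parsed fragment. The sketch below is a standalone script that pulls Floki in with `Mix.install/1` (Elixir 1.12+, network access needed); the markup, the `https://example.com` prefix, and the variable names are illustrative assumptions, not part of Floki.

```elixir
# Standalone script: fetches Floki at run time (requires Elixir 1.12+ and network access).
Mix.install([{:floki, "~> 0.38.0"}])

html =
  Floki.parse_fragment!(~s(<ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li></ul>))

# attribute/3: select every <a> with a CSS selector, then read its "href" value.
hrefs = Floki.attribute(html, "a", "href")
IO.inspect(hrefs, label: "hrefs")

# children/2: direct children of the single <ul> node, skipping text nodes.
[ul] = Floki.find(html, "ul")
items = Floki.children(ul, include_text: false)
IO.inspect(length(items), label: "li count")

# attr/4: rewrite the matched attribute values and get the whole updated tree back.
absolute = Floki.attr(html, "a", "href", fn href -> "https://example.com" <> href end)
IO.inspect(Floki.attribute(absolute, "a", "href"), label: "absolute hrefs")
```

Note that `attr/4` returns the entire tree rather than only the matched elements, so the result can be passed straight to `raw_html/2` or queried again.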
## Examples

    iex> Floki.children({"div", [], ["text", {"span", [], []}]})
    ["text", {"span", [], []}]

    iex> Floki.children({"div", [], ["text", {"span", [], []}]}, include_text: false)
    [{"span", [], []}]

    iex> Floki.children({:comment, "comment"})
    nil

# `css_escape`

```elixir
@spec css_escape(String.t()) :: String.t()
```

Escapes a string for use as a CSS identifier.

## Examples

    iex> Floki.css_escape("hello world")
    "hello\\ world"

    iex> Floki.css_escape("-123")
    "-\\31 23"

# `filter_out`

```elixir
@spec filter_out(html_node() | html_tree(), :comment | :text | css_selector()) ::
        html_node() | html_tree()
```

Returns the nodes from an HTML tree that don't match the filter selector. The filter can also be `:comment` or `:text` to remove comment or text nodes, respectively.

## Examples

    iex> Floki.filter_out({"div", [], [{"script", [], ["hello"]}, " world"]}, "script")
    {"div", [], [" world"]}

    iex> Floki.filter_out([{"body", [], [{"script", [], []}, {"div", [], []}]}], "script")
    [{"body", [], [{"div", [], []}]}]

    iex> Floki.filter_out({"div", [], [{:comment, "comment"}, " text"]}, :comment)
    {"div", [], [" text"]}

    iex> Floki.filter_out({"div", [], ["text"]}, :text)
    {"div", [], []}

# `find`

```elixir
@spec find(html_tree() | html_node(), css_selector()) :: html_tree()
```

Finds elements inside an HTML tree or node.

## Examples

    iex> {:ok, html} = Floki.parse_fragment("<p><span class=hint>hello</span></p>")
    iex> Floki.find(html, ".hint")
    [{"span", [{"class", "hint"}], ["hello"]}]

    iex> {:ok, html} = Floki.parse_fragment("<div id=important><div>Content</div></div>")
    iex> Floki.find(html, "#important")
    [{"div", [{"id", "important"}], [{"div", [], ["Content"]}]}]

    iex> {:ok, html} = Floki.parse_fragment("<p><a href='https://google.com'>Google</a></p>")
    iex> Floki.find(html, "a")
    [{"a", [{"href", "https://google.com"}], ["Google"]}]

    iex> Floki.find([{"div", [], [{"a", [{"href", "https://google.com"}], ["Google"]}]}], "div a")
    [{"a", [{"href", "https://google.com"}], ["Google"]}]

# `find_and_update`

```elixir
@spec find_and_update(
        html_tree(),
        css_selector(),
        ({String.t(), html_attributes()} -> {String.t(), html_attributes()} | :delete)
      ) :: html_tree()
```

Searches for elements inside the HTML tree and updates those that match the selector. It returns the updated HTML tree.

This function works in a way similar to `traverse_and_update`, but instead of updating the children nodes, it only updates the `tag` and `attributes` of the matching nodes.

If `fun` returns `:delete`, the HTML node will be removed from the tree.

## Examples

    iex> Floki.find_and_update([{"a", [{"href", "http://elixir-lang.com"}], ["Elixir"]}], "a", fn
    ...>   {"a", [{"href", href}]} ->
    ...>     {"a", [{"href", String.replace(href, "http://", "https://")}]}
    ...>   other ->
    ...>     other
    ...> end)
    [{"a", [{"href", "https://elixir-lang.com"}], ["Elixir"]}]

# `get_by_id`

```elixir
@spec get_by_id(html_tree() | html_node(), String.t()) :: html_node() | nil
```

Finds the first element in an HTML tree by id. Returns `nil` if no element is found.

This is useful when there are IDs that contain special characters that are invalid when passed as-is as a CSS selector. It is similar to the `getElementById` method in the browser.

## Examples

    iex> {:ok, html} = Floki.parse_fragment(~s[<p><span class="hint" id="id?foo_special:chars">hello</span></p>])
    iex> Floki.get_by_id(html, "id?foo_special:chars")
    {"span", [{"class", "hint"}, {"id", "id?foo_special:chars"}], ["hello"]}
    iex> Floki.get_by_id(html, "does-not-exist")
    nil

# `is_html_node` *macro*

# `parse_document`

```elixir
@spec parse_document(binary(), Keyword.t()) :: {:ok, html_tree()} | {:error, String.t()}
```

Parses an HTML document from a string.

This is the main function to get a tree from an HTML string.

## Options

* `:attributes_as_maps` - Changes the behaviour of the parser to return the attributes as maps, instead of a list of `{"key", "value"}`. Defaults to `false`.

* `:html_parser` - The module of the backend that is responsible for parsing the HTML string. By default, it is the built-in parser, `Floki.HTMLParser.Mochiweb`, unless the application env of the same name sets another module. See https://github.com/philss/floki#alternative-html-parsers for more details.

* `:parser_args` - A list of options to the parser. This can be used to pass options that are specific for a given parser. Defaults to an empty list.

## Examples

    iex> Floki.parse_document("<html><head></head><body>hello</body></html>")
    {:ok, [{"html", [], [{"head", [], []}, {"body", [], ["hello"]}]}]}

    iex> Floki.parse_document("<html><head></head><body>hello</body></html>", html_parser: Floki.HTMLParser.Mochiweb)
    {:ok, [{"html", [], [{"head", [], []}, {"body", [], ["hello"]}]}]}

    iex> Floki.parse_document(
    ...>   "<html><head></head><body class=main>hello</body></html>",
    ...>   attributes_as_maps: true,
    ...>   html_parser: Floki.HTMLParser.Mochiweb
    ...> )
    {:ok, [{"html", %{}, [{"head", %{}, []}, {"body", %{"class" => "main"}, ["hello"]}]}]}

# `parse_document!`

```elixir
@spec parse_document!(binary(), Keyword.t()) :: html_tree()
```

Parses an HTML document from a string.

Similar to `Floki.parse_document/1`, but raises `Floki.ParseError` if there was an error parsing the document.
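Because `parse_document/2` returns a tagged tuple while `parse_document!/2` raises, the two suit different call sites. The following sketch contrasts them in a standalone script (using `Mix.install/1`); the document string and variable names are made up for illustration.

```elixir
# Standalone script: fetches Floki at run time (requires Elixir 1.12+ and network access).
Mix.install([{:floki, "~> 0.38.0"}])

doc = ~s(<html><body><p class="headline">Floki</p><a href="/docs">Docs</a></body></html>)

# parse_document/2 returns {:ok, tree} or {:error, reason}; the caller picks the failure path.
links =
  case Floki.parse_document(doc) do
    {:ok, tree} -> Floki.attribute(tree, "a", "href")
    {:error, reason} -> raise "could not parse document: #{reason}"
  end

# parse_document!/2 raises Floki.ParseError on failure, which keeps pipelines flat.
headline =
  doc
  |> Floki.parse_document!()
  |> Floki.find(".headline")
  |> Floki.text()

IO.inspect({links, headline})
```

The tuple form fits code that must recover from bad input; the bang form fits scripts and tests where a parse failure should simply crash.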
## Example

    iex> Floki.parse_document!("<html><head></head><body>hello</body></html>")
    [{"html", [], [{"head", [], []}, {"body", [], ["hello"]}]}]

# `parse_fragment`

```elixir
@spec parse_fragment(binary(), Keyword.t()) :: {:ok, html_tree()} | {:error, String.t()}
```

Parses an HTML fragment from a string.

This is mostly for parsing sections of an HTML document.

## Options

* `:attributes_as_maps` - Changes the behaviour of the parser to return the attributes as maps, instead of a list of `{"key", "value"}`. Remember that maps are no longer ordered since OTP 26. Defaults to `false`.

* `:html_parser` - The module of the backend that is responsible for parsing the HTML string. By default, it is the built-in parser, `Floki.HTMLParser.Mochiweb`, unless the application env of the same name sets another module. See https://github.com/philss/floki#alternative-html-parsers for more details.

* `:parser_args` - A list of options to the parser. This can be used to pass options that are specific for a given parser. Defaults to an empty list.

# `parse_fragment!`

```elixir
@spec parse_fragment!(binary(), Keyword.t()) :: html_tree()
```

Parses an HTML fragment from a string.

Similar to `Floki.parse_fragment/1`, but raises `Floki.ParseError` if there was an error parsing the fragment.

# `raw_html`

```elixir
@spec raw_html(
        html_tree() | html_node() | binary(),
        keyword()
      ) :: binary()
```

Converts an HTML tree to raw HTML.

Note that the resulting HTML may differ from the original one: spaces after tags and doctypes are ignored.

## Options

* `:encode` - A boolean option to control if special HTML characters should be encoded as HTML entities. Defaults to `true`.

  You can also control the encoding behaviour at the application level via `config :floki, :encode_raw_html, false`.

* `:pretty` - Controls if the output should be formatted, ignoring line breaks and spaces from the input and inserting new ones to pretty-format the HTML. Defaults to `false`.
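The `:encode` option matters most on a parse/serialize round trip, because the parser decodes entities into plain characters inside text nodes. A minimal standalone sketch, assuming the default built-in parser and no `:encode_raw_html` application config; the markup is illustrative.

```elixir
# Standalone script: fetches Floki at run time (requires Elixir 1.12+ and network access).
Mix.install([{:floki, "~> 0.38.0"}])

# The parser decodes "&amp;" into a plain "&" inside the text node.
tree = Floki.parse_fragment!(~s(<p class="note">fish &amp; chips</p>))

# Default (encode: true): special characters are turned back into entities.
encoded = Floki.raw_html(tree)

# encode: false writes the text node verbatim, which is no longer valid escaping.
verbatim = Floki.raw_html(tree, encode: false)

IO.puts(encoded)
IO.puts(verbatim)
```

Keeping the default is the safer choice when the output is served as HTML; `encode: false` is mainly useful when the text is consumed as plain text.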
## Examples

    iex> Floki.raw_html({"div", [{"class", "wrapper"}], ["my content"]})
    ~s(<div class="wrapper">my content</div>)

    iex> Floki.raw_html({"div", [{"class", "wrapper"}], ["10 > 5"]})
    ~s(<div class="wrapper">10 &gt; 5</div>)

    iex> Floki.raw_html({"div", [{"class", "wrapper"}], ["10 > 5"]}, encode: false)
    ~s(<div class="wrapper">10 > 5</div>)

    iex> Floki.raw_html({"div", [], ["\n ", {"span", [], "Fully indented"}, " \n"]}, pretty: true)
    """
    <div>
      <span>
        Fully indented
      </span>
    </div>
    """

# `text`

```elixir
@spec text(html_tree() | html_node(), Keyword.t()) :: binary()
```

Returns the text nodes from an HTML tree.

By default, it performs a deep search through the HTML tree. You can disable deep search by setting the option `deep` to `false`. You can include the content of script or style tags by setting the `:js` or `:style` flags, respectively, to `true`. You can also specify a separator between the contents of nodes.

## Options

* `:deep` - A boolean option to control how deep the search for text is going to be. If `false`, only the level of the HTML node or the first level of the HTML document is going to be considered. Defaults to `true`.

* `:js` - A boolean option to control if the contents of `