Microdata v0.2.4
Microdata is an Elixir library for parsing microdata from a provided document.
Dependencies
Meeseeks + Rust
Microdata parses HTML with Meeseeks, which depends on html5ever via meeseeks_html5ever.
Because html5ever is a Rust library, you will need to have the Rust compiler installed.
This dependency is necessary because there are no HTML5 spec compliant parsers written in Elixir/Erlang.
HTTPoison
If you are using the provided Microdata.parse(url: ...) helper function, your library / application will need to declare a dep on HTTPoison (see below).
Installation
- Ensure your build machine has the Rust compiler installed (see above)
- Add microdata to your mix.exs deps
  - If you plan to use the Microdata.parse(url: ...) helper function, include a line for {:httpoison, "~> 1.0"}
def deps do
  [
    {:microdata, "~> 0.2.4"},
    {:httpoison, "~> 1.0"} # optional
  ]
end
- Run mix deps.get
Usage
Available on HexDocs. TL;DR:
- Microdata.parse(html_text), if you’ve already fetched / read your HTML
- Microdata.parse(file: "path_to_file.html"), if you’re reading from file
- Microdata.parse(url: "https://website.com/path/to/page"), if you’d like to fetch & parse
  - Uses HTTPoison ~> 1.0 under the hood; this is an optional dep so you’ll want to add it to your mix.exs deps as well (see above)
Note that even though the library will find and read JSON-LD in an HTML page’s <script> tags, it will not process JSON-LD returned as the body of an HTTP response. A JSON-LD string passed as text will likewise not be parsed. Patches to add such functionality are welcome!
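Every form of Microdata.parse returns either an {:ok, document} or an {:error, error} tuple, so callers will usually want to branch on the result. The following is a minimal sketch of one way to do that. The MicrodataSketch module and its summarize/1 function are hypothetical helpers, not part of the library, and the sketch assumes the document's items field is a list, as in the example output further down.

```elixir
defmodule MicrodataSketch do
  # Hypothetical helper, not part of the Microdata library.
  # Accepts the tuple returned by Microdata.parse and reports how many
  # top-level items were found. Matching on a plain map pattern also
  # matches structs such as %Microdata.Document{}.
  def summarize({:ok, %{items: items}}) when is_list(items) do
    {:ok, length(items)}
  end

  # Pass errors through untouched so the caller can log or inspect them.
  def summarize({:error, error}) do
    {:error, error}
  end
end

# Usage, assuming :microdata is in your deps:
#
#   "<html itemscope itemtype='foo'>...</html>"
#   |> Microdata.parse()
#   |> MicrodataSketch.summarize()
```

Keeping the error clause explicit means a failed fetch or file read surfaces as a value rather than a crash.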
Configuration
In your config.exs you can set the value of {:microdata, :strategies} to a list of modules to consult (in order) when looking for microdata content. Modules must conform to Microdata.Strategy. By default, the Microdata library uses, in order:
- Microdata.Strategy.HTMLMicroformat - Looks for microdata in HTML tags
- Microdata.Strategy.JSONLD - Looks for microdata in JSON-LD script tags
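As a rough illustration, the setting described above might look like this in a config file. The module names are copied from the default list; the file path is an assumption about a typical Mix project layout. On Elixir versions before 1.9, `use Mix.Config` takes the place of `import Config`.

```elixir
# config/config.exs (path assumed; adjust to your project)
import Config

# Strategies are consulted in the order listed.
config :microdata,
  strategies: [
    Microdata.Strategy.HTMLMicroformat,
    Microdata.Strategy.JSONLD
  ]
```

Reordering the list, or listing only one module, should change which strategies are consulted and in what order.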
Roadmap
- Community contribs would be appreciated to add itemref support :)
Helpful Links
Credits
Thanks muchly to the team + community behind meeseeks, particularly @mischov, for the support and fixes on esoteric XPath issues.
An Invitation
Next time you’re cooking, don’t risk getting raw chicken juice or sticky sauces on your fancy cookbooks and expensive electronics! We are working on Connie, a conversational cooking assistant that uses Alexa & Google Home to answer questions like:
What am I supposed to be doing?
What’s next for the lasagna?
We wrote this lib to parse imported recipes and wanted to share it back with the community, as there are loads of ways you might use microdata in your own projects. Hope you enjoy!
If you’d like to join our private beta, please send an email to hi [AT] cookformom [DOT] com, letting us know:
- Which voice assistant you use;
- Your favourite meal; and
- What you want to learn to cook next.
Have a nice day :)
Summary
Functions
Parses Microdata from a given document, and returns a %Microdata.Document{} struct
Functions
parse([{:file, String.t()}]) :: {:ok, Microdata.Document.t()} | {:error, Microdata.Error.t()}
parse([{:url, String.t()}]) :: {:ok, Microdata.Document.t()} | {:error, Microdata.Error.t()}
parse(String.t()) :: {:ok, Microdata.Document.t()} | {:error, Microdata.Error.t()}
Parses Microdata from a given document, and returns a %Microdata.Document{} struct.
Examples (n.b. tested manually; not a doctest!)
iex> Microdata.parse("<html itemscope itemtype='foo'><body><p itemprop='bar'>baz</p></body></html>")
{:ok,
  %Microdata.Document{
    items: [
      %Microdata.Item{
        id: nil,
        types: ["foo"],
        properties: [
          %Microdata.Property{
            names: ["bar"],
            value: "baz"
          }
        ]
      }
    ]
  }
}
iex> Microdata.parse(file: "path/to/file.html")
{:ok, %Microdata.Document{...}}
iex> Microdata.parse(url: "https://website.com/path/to/page")
{:ok, %Microdata.Document{...}}
parse(String.t(), [{:base_uri, String.t()}]) :: {:ok, Microdata.Document.t()} | {:error, Microdata.Error.t()}