View Source DataSchema (data_schema v0.5.0)
Data schemas are declarative descriptions of how to create a struct from some input data. You can set up different schemas to handle different kinds of input data. By default we assume the incoming data is a map, but you can configure schemas to work with any arbitrary data input including XML and json.
Data Schemas really shine when working with API responses - converting the response into trusted internal data easily and efficiently.
This library has no dependencies.
Creating A Simple Schema
Let's think of creating a struct as taking some source data and turning it into the desired struct. To do this we need to know at least three things:
- The keys of the desired struct
- The types of the values for each of the keys
- Where / how to get the data for each value from the source data.
Turning the source data into the correct type defined by the schema will often require casting, so to cater for that the type definitions are casting functions. Let's look at a simple field example
field {:content, "text", &cast_string/1}
# ^ ^ ^
# struct field | |
# path to data in the source |
# casting function
This says in the source data there will be a field called "text"
. When creating a struct we should get the data under that field and pass it too cast_string/1
. The result of that function will be put in the resultant struct under the key :content
.
There are 5 kinds of struct fields we could want:
field
- The value will be a casted value from the source data.list_of
- The value will be a list of casted values created from the source data.has_one
- The value will be created from a nested data schema (so will be a struct)has_many
- The value will be created by casting a list of values into a data schema. (You end up with a list of structs defined by the provided schema). Similar to has_many in ectoaggregate
- The value will be a casted value formed from multiple bits of data in the source.
Available options are:
:optional?
- specifies whether or not the field in the struct should be included in the@enforce_keys
for the struct. By default all fields are required but you can mark them as optional by setting this totrue
. This will also be checked when creating a struct withDataSchema.to_struct/2
returning an error if the required field is null.:empty_values
- allows you to define what values should be used as "empty" for a given field. If either the value returned from the data accessor or the casted value are equivalent to any element in this list, that field is deemed to be empty. Defaults to[nil]
, meaning nil is always considered "empty".:default
- specifies a 0 arity function that will be called to produce a default value for a field when casting. This function will only be called if a field is found to be empty AND optional. If it's empty and not optional we will error.
For example:
defmodule Sandwich do
require DataSchema
DataSchema.data_schema([
field: {:type, "the_type", &{:ok, String.upcase(&1)}, optional?: true, empty_values: [nil]},
])
end
And:
defmodule Sandwich do
require DataSchema
DataSchema.data_schema([
field: {:list, "list", &{:ok, &1}, optional?: true, empty_values: [[]]},
])
end
And:
defmodule Sandwich do
require DataSchema
@options [optional?: true, empty_values: [nil], default: &DateTime.utc_now/0]
DataSchema.data_schema([
field: {:created_at, "inserted_at", &{:ok, &1}, @options},
])
end
To see this better let's look at a very simple example. Assume our input data looks like this:
source_data = %{
"content" => "This is a blog post",
"comments" => [%{"text" => "This is a comment"},%{"text" => "This is another comment"}],
"draft" => %{"content" => "This is a draft blog post"},
"date" => "2021-11-11",
"time" => "14:00:00",
"metadata" => %{ "rating" => 0}
}
And now let's assume the struct we wish to make is this one:
%BlogPost{
content: "This is a blog post",
comments: [%Comment{text: "This is a comment"}, %Comment{text: "This is another comment"}],
draft: %DraftPost{content: "This is a draft blog post"},
post_datetime: ~N[2020-11-11 14:00:00]
}
We can describe the following schemas to enable this:
defmodule DraftPost do
import DataSchema, only: [data_schema: 1]
data_schema([
field: {:content, "content", &{:ok, to_string(&1)}}
])
end
defmodule Comment do
import DataSchema, only: [data_schema: 1]
data_schema([
field: {:text, "text", &{:ok, to_string(&1)}}
])
end
defmodule BlogPost do
import DataSchema, only: [data_schema: 1]
@date_time_fields [
field: {:date, "date", {Date, :from_iso8601, []}},
field: {:time, "time", &Time.from_iso8601/1}
]
data_schema([
field: {:content, "content", &{:ok, to_string(&1)}},
has_many: {:comments, "comments", Comment},
has_one: {:draft, "draft", DraftPost},
aggregate: {:post_datetime, @date_time_fields, &NaiveDateTime.new(&1.date, &1.time)},
])
end
Then to transform some input data into the desired struct we can call DataSchema.to_struct/2
which works recursively to transform the input data into the struct defined by the schema.
source_data = %{
"content" => "This is a blog post",
"comments" => [%{"text" => "This is a comment"},%{"text" => "This is another comment"}],
"draft" => %{"content" => "This is a draft blog post"},
"date" => "2021-11-11",
"time" => "14:00:00",
"metadata" => %{ "rating" => 0}
}
DataSchema.to_struct(source_data, BlogPost)
# This will output the following:
%BlogPost{
content: "This is a blog post",
comments: [%Comment{text: "This is a comment"}, %Comment{text: "This is another comment"}],
draft: %DraftPost{content: "This is a draft blog post"},
post_datetime: ~N[2020-11-11 14:00:00]
}
So imagine the input data came from an API response:
with {:ok, %{status: 200, response_body: body}} <- Http.get("https://www.my_thing.example.com"),
{:ok, decoded} <- Jason.decode(body) do
DataSchema.to_struct(source_data, BlogPost)
end
Different Source Data Types
As we mentioned before we want to be able to handle multiple different kinds of source data in our schemas. For each type of source data we want to be able to specify how you access the data for each field type. We do that by providing a "data accessor" (a module that implements the DataSchema.DataAccessBehaviour
) when we create the schema. We do this by providing a @data_accessor
on the schema. By default if you do not provide this module attribute we use DataSchema.MapAccessor
. That means the above example is equivalent to doing the following:
defmodule DraftPost do
import DataSchema, only: [data_schema: 1]
@data_accessor DataSchema.MapAccessor
data_schema([
field: {:content, "content", &{:ok, to_string(&1)}}
])
end
defmodule Comment do
import DataSchema, only: [data_schema: 1]
@data_accessor DataSchema.MapAccessor
data_schema([
field: {:text, "text", &{:ok, to_string(&1)}}
])
end
defmodule BlogPost do
import DataSchema, only: [data_schema: 1]
@data_accessor DataSchema.MapAccessor
@date_time_fields [
field: {:date, "date", &Date.from_iso8601/1},
field: {:time, "time", &Time.from_iso8601/1}
]
data_schema([
field: {:content, "content", &{:ok, to_string(&1)}},
has_many: {:comments, "comments", Comment},
has_one: {:draft, "draft", DraftPost},
aggregate: {:post_datetime, @date_time_fields, &NaiveDateTime.new(&1.date, &1.time)},
])
end
When creating the struct DataSchema will call the relevant function for the field we are creating, passing it the source data and the path to the value(s) in the source. Our DataSchema.MapAccessor
looks like this:
defmodule DataSchema.MapAccessor do
@behaviour DataSchema.DataAccessBehaviour
@impl true
def field(data = %{}, field) do
Map.get(data, field)
end
@impl true
def list_of(data = %{}, field) do
Map.get(data, field)
end
@impl true
def has_one(data = %{}, field) do
Map.get(data, field)
end
@impl true
def has_many(data = %{}, field) do
Map.get(data, field)
end
end
To save repeating @data_accessor DataSchema.MapAccessor
on all of your schemas you could use a __using__
macro like so:
defmodule MapSchema do
defmacro __using__(_) do
quote do
import DataSchema, only: [data_schema: 1]
@data_accessor DataSchema.MapAccessor
end
end
end
Then use it like so:
defmodule DraftPost do
use MapSchema
data_schema([
field: {:content, "content", &{:ok, to_string(&1)}}
])
end
This means should we want to change how we access data (say we wanted to use Map.fetch!
instead of Map.get
) we would only need to change the accessor used in one place - inside the __using__
macro. It also gives you a handy place to provide other functions for the structs that get created, perhaps implementing a default Inspect protocol implementation for example:
defmodule MapSchema do
defmacro __using__(opts) do
skip_inspect_impl = Keyword.get(opts, :skip_inspect_impl, false)
quote bind_quoted: [skip_inspect_impl: skip_inspect_impl] do
import DataSchema, only: [data_schema: 1]
@data_accessor DataSchema.MapAccessor
unless skip_inspect_impl do
defimpl Inspect do
def inspect(struct, _opts) do
"<" <> "#{struct.__struct__}" <> ">"
end
end
end
end
end
end
This could help ensure you never log sensitive fields by requiring you to explicitly implement an inspect protocol for a struct in order to see the fields in it.
XML Schemas
Now let's imagine instead that our source data was XML. What would it require to enable that? First a new Xpath data accessor:
defmodule XpathAccessor do
@behaviour DataSchema.DataAccessBehaviour
import SweetXml, only: [sigil_x: 2]
@impl true
def field(data, path) do
SweetXml.xpath(data, ~x"#{path}"s)
end
@impl true
def list_of(data, path) do
SweetXml.xpath(data, ~x"#{path}"l)
end
@impl true
def has_one(data, path) do
SweetXml.xpath(data, ~x"#{path}")
end
@impl true
def has_many(data, path) do
SweetXml.xpath(data, ~x"#{path}"l)
end
end
As we can see our accessor uses the library Sweet XML to access the XML. That means if we wanted to change the library later we would only need to alter this one module for all of our schemas to benefit from the change.
Our source data looks like this:
source_data = """
<Blog date="2021-11-11" time="14:00:00">
<Content>This is a blog post</Content>
<Comments>
<Comment>This is a comment</Comment>
<Comment>This is another comment</Comment>
</Comments>
<Draft>
<Content>This is a draft blog post</Content>
</Draft>
</Blog>
"""
Let's define our schemas like so:
defmodule DraftPost do
import DataSchema, only: [data_schema: 1]
@data_accessor XpathAccessor
data_schema([
field: {:content, "./Content/text()", &{:ok, to_string(&1)}}
])
end
defmodule Comment do
import DataSchema, only: [data_schema: 1]
@data_accessor XpathAccessor
data_schema([
field: {:text, "./text()", &{:ok, to_string(&1)}}
])
end
defmodule BlogPost do
import DataSchema, only: [data_schema: 1]
@data_accessor XpathAccessor
@datetime_fields [
field: {:date, "/Blog/@date", &Date.from_iso8601/1},
field: {:time, "/Blog/@time", &Time.from_iso8601/1},
]
data_schema([
field: {:content, "/Blog/Content/text()", &{:ok, to_string(&1)}},
has_many: {:comments, "//Comment", Comment},
has_one: {:draft, "/Blog/Draft", DraftPost},
aggregate: {:post_datetime, @datetime_fields, &NaiveDateTime.new(&1.date, &1.time)},
])
end
And now we can transform:
source_data = """
<Blog date="2021-11-11" time="14:00:00">
<Content>This is a blog post</Content>
<Comments>
<Comment>This is a comment</Comment>
<Comment>This is another comment</Comment>
</Comments>
<Draft>
<Content>This is a draft blog post</Content>
</Draft>
</Blog>
"""
DataSchema.to_struct(source_data, BlogPost)
# This will output:
%BlogPost{
comments: [
%Comment{text: "This is a comment"},
%Comment{text: "This is another comment"}
],
content: "This is a blog post",
draft: %DraftPost{content: "This is a draft blog post"},
post_datetime: ~N[2021-11-11 14:00:00]
}
Link to this section Summary
Functions
A macro that creates a data schema. By default all struct fields are required but you can specify that a field be optional by passing the correct option in. See the Options section below for more.
Accepts a the module of a compile time schema and will expand it into a runtime schema recursively. This can be useful for tooling around generating schemas or for schema reflection.
Accepts an data schema module and some source data and attempts to create the struct defined in the schema from the source data recursively.
Creates a struct or map from the provided arguments. This function can be used to define
runtime schemas for the most dynamic of cases. This means you don't have to define a schema
at compile time using the DataShema.data_schema/1
macro.
Link to this section Functions
A macro that creates a data schema. By default all struct fields are required but you can specify that a field be optional by passing the correct option in. See the Options section below for more.
Field Types
There are 5 kinds of struct fields we can have:
field
- The value will be a casted value from the source data.list_of
- The value will be a list of casted values created from the source data.has_one
- The value will be created from a nested data schema (so will be a struct)has_many
- The value will be created by casting a list of values into a data schema. (You end up with a list of structs defined by the provided schema). Similar to has_many in ectoaggregate
- The value will a casted value formed from multiple bits of data in the source.
Options
Available options are:
:optional?
- specifies whether or not the field in the struct should be included in the@enforce_keys
for the struct. By default all fields are required but you can mark them as optional by setting this totrue
. This will also be checked when creating a struct withDataSchema.to_struct/2
returning an error if the required field is null.:empty_values
- allows you to define what values should be used as "empty" for a given field. If either the value returned from the data accessor or the casted value are equivalent to any element in this list, that field is deemed to be empty. Defaults to[nil]
.:default
- specifies a 0 arity function that will be called to produce a default value for a field when casting. This function will only be called if a field is found to be empty AND optional. If it's empty and not optional we will error.
For example:
defmodule Sandwich do
require DataSchema
DataSchema.data_schema([
field: {:type, "the_type", &{:ok, String.upcase(&1)}, optional?: true},
])
end
And:
defmodule Sandwich do
require DataSchema
DataSchema.data_schema([
field: {:list, "list", &{:ok, &1}, optional?: true, empty_values: [[]]},
])
end
And:
defmodule Sandwich do
require DataSchema
@options [optional?: true, empty_values: [nil], default: &DateTime.utc_now/0]
DataSchema.data_schema([
field: {:created_at, "inserted_at", &{:ok, &1}, @options},
])
end
Examples
See the guides for more in depth examples but below you can see how we create a schema that will take a map of data and create a struct out of it. Given the following schema:
defmodule Sandwich do
require DataSchema
DataSchema.data_schema([
field: {:type, "the_type", &{:ok, String.upcase.(&1)}},
list_of: {:fillings, "the_fillings", &({:ok, String.downcase(&1["name"])})}
])
end
input_data = %{
"the_type" => "fake steak",
"the_fillings" => [
%{"name" => "fake stake", "good?" => true},
%{"name" => "SAUCE"},
%{"name" => "sweetcorn"},
]
}
DataSchema.to_struct(input_data, Sandwich)
# outputs the following:
%Sandwich{
type: "FAKE STEAK",
fillings: ["fake stake", "sauce", "sweetcorn"],
}
Accepts a the module of a compile time schema and will expand it into a runtime schema recursively. This can be useful for tooling around generating schemas or for schema reflection.
Accepts an data schema module and some source data and attempts to create the struct defined in the schema from the source data recursively.
We essentially visit each field in the schema and extract the data the field points to from the source data, passing it to the field's casting function before setting the result of that as the value on the struct.
This function takes a simple approach to creating the struct - whatever you return from a casting function will be set as the value of the struct field. You should raise if you want casting to fail.
Examples
data = %{ "spice" => "enables space travel" }
defmodule Foo do
require DataSchema
DataSchema.data_schema(
field: {:a_rocket, "spice", &({:ok, &1})}
)
end
DataSchema.to_struct(data, Foo)
# => Outputs the following:
%Foo{a_rocket: "enables space travel"}
Creates a struct or map from the provided arguments. This function can be used to define
runtime schemas for the most dynamic of cases. This means you don't have to define a schema
at compile time using the DataShema.data_schema/1
macro.
Examples
Creating a struct:
defmodule Run do
defstruct [:time]
end
input = %{"time" => "10:00"}
fields = [
field: {:time, "time", &{:ok, to_string(&1)}}
]
DataSchema.to_struct(input, Run, fields, DataSchema.MapAccessor)
{:ok, %Run{time: "10:00"}}
Creating a map:
input = %{"time" => "10:00"}
fields = [
field: {:time, "time", &{:ok, to_string(&1)}}
]
DataSchema.to_struct(input, %{}, fields, DataSchema.MapAccessor)
{:ok, %{time: "10:00"}}