# How to add custom DateTime parsers to Timex
It is unlikely you will need to write a custom parser for Timex, but should you be in such a position, you can easily plug in your own without much trouble.
## Getting Started
In brief, all you need to know is the following:
- Extend the `Tokenizer` behaviour, by adding `use Timex.Parse.DateTime.Tokenizer` to the top of your module.
- Implement the `tokenize/1` callback.
- Implement the `apply/3` callback.
What we are doing with the above is implementing a tokenizer for the format strings your custom parser will use. Incidentally, this is also a prerequisite for implementing a custom formatter (you need to tokenize the format strings). Your best reference is to look at the two built-in tokenizers in Timex, as they are robust and complete implementations, but for the sake of an arbitrary example, let's walk through implementing a very simple parser for humanized strings like "5 days before the fifth of July, 2015", where the following tokens are allowed:

- "{shift}" which should be in the form of "<integer> <unit> <before | after>"
- "{day}" of the form "first", "second", "third", etc.
- "{month}" which is the full name of a month, i.e. "July"
- "{year}" which is the full four digit year
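Before writing any code, it helps to picture the end product: a successful parse reduces the input phrase to a flat keyword list of `{token, value}` tuples. Here is a hypothetical sketch for the phrase above; the token names mirror the directives we end up defining below, but the exact atoms and shapes are illustrative, not a guaranteed wire format:

```elixir
# Illustrative only: roughly the flattened token list that parsing
# "5 days before the fifth of July, 2015" should boil down to.
tokens = [
  date_shift: [days: -5],  # "5 days before"
  day: 5,                  # "the fifth"
  month: 7,                # "July"
  year4: 2015              # "2015"
]
```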
## Tokenizer Implementation

### Implementing the Humanized tokenizer
We start by defining our empty module:
```elixir
defmodule MyApp.DateTimeTokenizers.Humanized do
  use Timex.Parse.DateTime.Tokenizer
end
```
Compiling this will produce the following warnings:

```
../humanized.ex:1: warning: undefined behaviour function tokenize/1 (for behaviour Timex.Parse.DateTime.Tokenizer)
../humanized.ex:1: warning: undefined behaviour function apply/3 (for behaviour Timex.Parse.DateTime.Tokenizer)
```
So we need to implement the `tokenize/1` callback, which takes a format string and produces either a list of `Directive` structs or `{:error, term}`. The following implementation makes use of Combine, a dependency pulled in by Timex for parsing tasks. While you do not need to implement your tokenizer using Combine, the parser function given to each `Directive` must take a single argument of type `%Combine.ParserState{}` and return it, updated with the status and results from parsing. See the Combine repo for examples of how to implement these parsers (it is really rather trivial). My recommendation is to simply use Combine, as it is well suited to these tasks, but now you know how to work around it if so desired.
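If you do want to avoid Combine, remember that a parser is just a function from parser state to parser state. A minimal sketch of that contract, using a simplified stand-in struct rather than the real `%Combine.ParserState{}` (whose actual fields differ):

```elixir
defmodule MyApp.SketchParser do
  # Simplified stand-in for %Combine.ParserState{} (illustrative only;
  # the real struct has different fields).
  defmodule ParserState do
    defstruct input: "", results: [], status: :ok, error: nil
  end

  # A parser is just a function ParserState -> ParserState.
  # This one consumes a literal comma, as a format-string parser might.
  def comma(%ParserState{input: "," <> rest, results: acc} = state) do
    %{state | input: rest, results: [?, | acc]}
  end

  def comma(%ParserState{} = state) do
    %{state | status: :error, error: "expected ','"}
  end
end
```

Chaining such functions (each consuming a bit of input, or flipping the state to an error) is all a combinator library like Combine really does for you.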
```elixir
defmodule MyApp.DateTimeTokenizers.Humanized do
  use Timex.Parse.DateTime.Tokenizer
  use Combine

  @days [
    "first", "second", "third", "fourth", "fifth",
    "sixth", "seventh", "eighth", "ninth", "tenth",
    "eleventh", "twelfth", "thirteenth", "fourteenth", "fifteenth",
    "sixteenth", "seventeenth", "eighteenth", "nineteenth", "twentieth",
    "twenty-first", "twenty-second", "twenty-third", "twenty-fourth", "twenty-fifth",
    "twenty-sixth", "twenty-seventh", "twenty-eighth", "twenty-ninth", "thirtieth",
    "thirty-first"
  ]

  def tokenize(s) do
    case Combine.parse(s, parser()) do
      results when is_list(results) ->
        directives = results |> List.flatten() |> Enum.filter(fn x -> x != nil end)
        case Enum.any?(directives, fn %Directive{type: type} -> type != :literal end) do
          false -> {:error, "Invalid format string, must contain at least one directive."}
          true  -> {:ok, directives}
        end
      {:error, _} = err -> err
    end
  end

  # Token parser
  defp parser do
    many1(choice([
      between(char(?{), map(one_of(word(), ["shift", "day", "month", "year"]), &map_directive/1), char(?})),
      map(none_of(char(), ["{", "}"]), &map_literal/1)
    ]))
  end

  # Gets/builds the Directives for a given token
  defp map_directive("year"),  do: Directive.get(:year4, "year")
  defp map_directive("month"), do: Directive.get(:mfull, "month")
  defp map_directive("day"),
    do: %Directive{type: :oday_phonetic, value: "day", parser: oday_phonetic_parser()}
  defp map_directive("shift"),
    do: %Directive{type: :date_shift, value: "shift", parser: date_shift_parser(), weight: 99}

  # Generates directives for literals
  defp map_literal([]), do: nil
  defp map_literal(literals) when is_list(literals),
    do: Enum.map(literals, &map_literal/1)
  defp map_literal(literal),
    do: %Directive{type: :literal, value: literal, parser: char(literal)}

  # Parses a phonetic ordinal day string, i.e. "third"
  defp oday_phonetic_parser() do
    map(one_of(word_of(~r/[\w\-]/), @days), fn day -> [day: to_day(day)] end)
  end

  # Parses a date shift expression, i.e. "3 days after"
  defp date_shift_parser() do
    map(sequence([
      integer(),
      skip(spaces()),
      one_of(word(), ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]),
      skip(spaces()),
      one_of(word(), ["before", "after"])
    ]), fn
      [n, shift, "before"] -> [date_shift: [{to_shift(shift), -n}]]
      [n, shift, "after"]  -> [date_shift: [{to_shift(shift), n}]]
    end)
  end

  defp to_shift(shift), do: String.to_atom(shift)

  # Get the ordinal day value based on the ordinal day name
  defp to_day(name), do: Enum.find_index(@days, fn n -> n == name end) + 1
end
```
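One detail worth calling out is the `to_day/1` helper: the ordinal day number is just the zero-based index of the name in the ordered `@days` list, plus one. A standalone sketch of that lookup (with a shortened list, but the same technique):

```elixir
defmodule OrdinalLookup do
  # Shortened day list; the real module lists all thirty-one ordinals.
  @days ~w(first second third fourth fifth sixth seventh eighth ninth tenth)

  # The day number is the zero-based index of the name, plus one.
  def to_day(name), do: Enum.find_index(@days, fn n -> n == name end) + 1
end
```

Note that an unrecognized name would make `Enum.find_index/2` return `nil` and `nil + 1` raise, but that cannot happen here because `one_of(word_of(~r/[\w\-]/), @days)` only succeeds on names drawn from the same list.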
## Implementation Notes
A couple of things to notice:

- Many parsing directives are already built in to Timex; you can see which ones exist by looking at the `Directive` module. Rather than re-implement parsing of month names and four-digit years, we're using `Directive.get` to pull the predefined directives for those.
- However, we also have two custom directives to handle the phonetic ordinal names and the date shift expressions, so we're defining those directives (and their associated parsers) by hand. This is also the reason why you must implement `apply/3`: the built-in directives are applied via the parser, but custom tokens have to be applied by the tokenizer. I could have split out the tokenize and apply functions into two behaviours (say, Tokenizer and Parser), but rather than force that upon you, this is a decision you can make in your own tokenizer (by using `defdelegate` to keep your `apply/3` code separated).
- We're creating `Directive` structs for literal characters. This is important both for parsing and formatting, as it allows your parser to ignore context (i.e. skipping spaces, etc.) and focus on parsing the precise thing it needs to parse. It also matters when formatting, as it ensures we output a string which matches the format string precisely.
- The parsers for the directives return a keyword list of `token: value` pairs as their result. This is a requirement, as the parser will take the input string and parse out a flattened list of `{token, value}` tuples. If your directive parsers do not produce values in this form, they will be ignored, and thus your parser will not work properly (and will likely produce an error).
- We're setting the `weight` key of the directive for the shift expression. This will ensure that it is applied last. Consider the input string "3 days after July fourth, 2015": if we try to apply the shift expression to the DateTime we get in `apply/3` (which starts at `0/1/1T00:00:00`), we will get an error for trying to shift the date out of the Gregorian calendar, when really we want to apply the shift to the date specified later in the input string. By weighting the directive to be applied last, we first apply the month, then the day, then the year to the initial DateTime, and then apply the shift to the date we actually wanted it applied to. You can set the weight for all your directives to have them applied in a specific order, so keep it in mind for situations like this.
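The effect of `weight` can be sketched independently of Timex: if each parsed token is tagged with its directive's weight (0 by default in this sketch, 99 for the shift), a stable sort by weight yields the application order described above. This is illustrative only, not how Timex's internals are literally written:

```elixir
# Illustrative: tokens from "3 days after July fourth, 2015", tagged with
# the weight of the directive that produced them.
tokens = [
  {:date_shift, [days: 3], 99},
  {:month, 7, 0},
  {:day, 4, 0},
  {:year4, 2015, 0}
]

# Sorting by weight (Enum.sort_by/2 is stable) pushes the shift to the
# end, so it is applied only after month, day, and year are set.
ordered = Enum.sort_by(tokens, fn {_token, _value, weight} -> weight end)
```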
We are now left with the responsibility of implementing `apply/3`, leaving our tokenizer implementation looking like the following:
```elixir
defmodule MyApp.DateTimeTokenizers.Humanized do
  use Timex.Parse.DateTime.Tokenizer
  use Combine
  alias Timex.Date

  # ...snip...

  @doc """
  Applies a token to the DateTime representing the current input string.
  Only unrecognized tokens are applied via this function; standard tokens,
  such as :year4, will be handled by the parser itself.

  You can return {:ok, date}, {:error, reason}, or :unrecognized (if you don't
  know what to do with the provided token).
  """
  def apply(%DateTime{} = date, token, value) do
    case token do
      :oday_phonetic ->
        {:ok, %{date | :day => value}}
      :date_shift ->
        case value do
          [{shift, n}] when is_integer(n) ->
            {:ok, Timex.shift(date, [{shift, n}])}
          shift ->
            {:error, "Unrecognized shift operation: #{Macro.to_string(shift)}"}
        end
      _ ->
        {:error, "Unrecognized token: #{token}."}
    end
  end

  # ...snip...
end
```
## Usage
After all this, we're now ready to use our custom parser!
```elixir
iex> alias MyApp.DateTimeTokenizers.Humanized
iex> phrase = "3 days before the second of July, 2015"
iex> format = "{shift} the {day} of {month}, {year}"
iex> Timex.parse!(phrase, format, Humanized)
#<DateTime(2015-06-29T00:00:00Z Etc/UTC)>
```