DateTimeParser (DateTimeParser v1.2.0)
The biggest ambiguity between datetime formats is whether it's ymd (year month day), mdy (month day year), or dmy (day month year); this is resolved by checking if there are slashes or dashes. If slashes, then it will try dmy first. All other cases will use the international format ymd. Sometimes, if the conditions are right, it can even parse dmy with dashes if the month is a vocal month (eg, "Jan").
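The separator rule described above can be sketched in plain Elixir. This is only an illustration of the heuristic as stated in the text; `FormatGuess`, `guess/1`, and the month-word regex are hypothetical names and assumptions, not DateTimeParser's internal implementation.

```elixir
defmodule FormatGuess do
  # Hypothetical helper: mirrors the rule described in the text,
  # not the library's actual code.

  # Returns the field order that would be tried first for the string.
  def guess(string) do
    cond do
      # Slashes: day-month-year is tried first.
      String.contains?(string, "/") -> :dmy
      # Dashes plus a month written as a word, eg "01-Feb-2017".
      String.contains?(string, "-") and month_word?(string) -> :dmy
      # Everything else falls back to the international format.
      true -> :ymd
    end
  end

  # Assumption: a "vocal month" is detected by its English abbreviation.
  defp month_word?(string) do
    Regex.match?(~r/(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)/i, string)
  end
end

FormatGuess.guess("01/02/2017")  # => :dmy
FormatGuess.guess("2017-02-01")  # => :ymd
FormatGuess.guess("01-Feb-2017") # => :dmy
```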
If the string consists of only numbers, then we will try two other parsers depending on the number of digits: Epoch or Serial. Otherwise, we'll try the tokenizer.
If the string is 10-11 digits with optional precision, then we'll try to parse it as a Unix Epoch timestamp.
If the string is 1-5 digits with optional precision, then we'll try to parse it as a Serial timestamp (spreadsheet time), treating 1899-12-31 as 1. This will cause Excel-produced dates from 1900-01-01 until 1900-03-01 to be parsed incorrectly; those dates really are incorrect in Excel, which treats 1900 as a leap year.
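The serial arithmetic can be worked through with the standard library alone. Treating 1899-12-31 as 1 means serial 0 falls on 1899-12-30, and the fractional part of a float is the fraction of a day. The sketch below assumes exactly that and nothing more; `SerialSketch` is a hypothetical module, not the library's Serial parser.

```elixir
defmodule SerialSketch do
  # Hypothetical illustration of spreadsheet serial arithmetic.
  # Serial 0, so that serial 1 is 1899-12-31 as described in the text.
  @base ~D[1899-12-30]

  # Integers carry no time component, so only a date is produced.
  def to_date(serial) when is_integer(serial), do: Date.add(@base, serial)

  # Floats carry the time of day in their fractional part.
  def to_naive(serial) when is_float(serial) and serial >= 0 do
    days = trunc(serial)
    seconds = round((serial - days) * 86_400)
    {:ok, midnight} = NaiveDateTime.new(Date.add(@base, days), ~T[00:00:00])
    NaiveDateTime.add(midnight, seconds, :second)
  end
end

SerialSketch.to_date(44262)             # => ~D[2021-03-07]
SerialSketch.to_naive(41261.6013888889) # => ~N[2012-12-18 14:26:00]
```

Both results agree with the serial examples listed under Examples.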
| digits | parser | range | notes |
|---|---|---|---|
| 1-5 | Serial | low = 1900-01-01, high = 2173-10-15. Negative numbers go to 1626-03-17 | Floats indicate time. Integers do not. |
| 6-9 | Tokenizer | any | This allows for "20190429" to be parsed as 2019-04-29 |
| 10-11 | Epoch | low = -1100-02-15 14:13:21, high = 5138-11-16 09:46:39 | If padded with 0s, then it can capture entire range. |
| else | Tokenizer | any | |
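The table can be read as a dispatch on the number of integer digits. The following is a minimal sketch of that reading, assuming "optional precision" means an optional decimal part; `DigitDispatch` and `choose/1` are hypothetical names, and the real library may decide differently in edge cases.

```elixir
defmodule DigitDispatch do
  # Hypothetical helper illustrating the table; not DateTimeParser's code.

  # Picks a parser name from the count of digits before any decimal point.
  def choose(string) do
    case Regex.run(~r/\A-?(\d+)(?:\.\d+)?\z/, string) do
      [_full, digits] -> by_length(String.length(digits))
      # Anything that is not purely numeric goes to the tokenizer.
      nil -> :tokenizer
    end
  end

  defp by_length(n) when n in 1..5, do: :serial
  defp by_length(n) when n in 10..11, do: :epoch
  defp by_length(_n), do: :tokenizer
end

DigitDispatch.choose("44262")             # => :serial
DigitDispatch.choose("41261.6013888889")  # => :serial
DigitDispatch.choose("20190429")          # => :tokenizer
DigitDispatch.choose("1564154204")        # => :epoch
DigitDispatch.choose("19 September 2018") # => :tokenizer
```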
Required reading
Examples
iex> DateTimeParser.parse("19 September 2018 08:15:22 AM")
{:ok, ~N[2018-09-19 08:15:22]}
iex> DateTimeParser.parse_datetime("19 September 2018 08:15:22 AM")
{:ok, ~N[2018-09-19 08:15:22]}
iex> DateTimeParser.parse_datetime("2034-01-13", assume_time: true)
{:ok, ~N[2034-01-13 00:00:00]}
iex> DateTimeParser.parse_datetime("2034-01-13", assume_time: ~T[06:00:00])
{:ok, ~N[2034-01-13 06:00:00]}
iex> DateTimeParser.parse("invalid date 10:30pm")
{:ok, ~T[22:30:00]}
iex> DateTimeParser.parse("2019-03-11T99:99:99")
{:ok, ~D[2019-03-11]}
iex> DateTimeParser.parse("2019-03-11T10:30:00pm UNK")
{:ok, ~N[2019-03-11T22:30:00]}
iex> DateTimeParser.parse("2019-03-11T22:30:00.234+00:00")
{:ok, DateTime.from_naive!(~N[2019-03-11T22:30:00.234Z], "Etc/UTC")}
# `~U[2019-03-11T22:30:00.234Z]` in Elixir 1.9+
iex> DateTimeParser.parse_date("2034-01-13")
{:ok, ~D[2034-01-13]}
iex> DateTimeParser.parse_date("01/01/2017")
{:ok, ~D[2017-01-01]}
iex> DateTimeParser.parse_datetime("1564154204")
{:ok, DateTime.from_naive!(~N[2019-07-26T15:16:44Z], "Etc/UTC")}
# `~U[2019-07-26T15:16:44Z]` in Elixir 1.9+
iex> DateTimeParser.parse_datetime("41261.6013888889")
{:ok, ~N[2012-12-18T14:26:00]}
iex> DateTimeParser.parse_date("44262")
{:ok, ~D[2021-03-07]}
# This is a serial number date, commonly found in spreadsheets, eg: `=VALUE("03/07/2021")`
iex> DateTimeParser.parse_datetime("1/1/18 3:24 PM")
{:ok, ~N[2018-01-01T15:24:00]}
iex> DateTimeParser.parse_datetime("1/1/18 3:24 PM", assume_utc: true)
{:ok, DateTime.from_naive!(~N[2018-01-01T15:24:00Z], "Etc/UTC")}
# `~U[2018-01-01T15:24:00Z]` in Elixir 1.9+
iex> DateTimeParser.parse_datetime(~s|"Mar 28, 2018 7:39:53 AM PDT"|, to_utc: true)
{:ok, DateTime.from_naive!(~N[2018-03-28T14:39:53Z], "Etc/UTC")}
iex> {:ok, datetime} = DateTimeParser.parse_datetime(~s|"Mar 1, 2018 7:39:53 AM PST"|)
iex> datetime
#DateTime<2018-03-01 07:39:53-08:00 PST America/Los_Angeles>
iex> DateTimeParser.parse_datetime(~s|"Mar 1, 2018 7:39:53 AM PST"|, to_utc: true)
{:ok, DateTime.from_naive!(~N[2018-03-01T15:39:53Z], "Etc/UTC")}
iex> {:ok, datetime} = DateTimeParser.parse_datetime(~s|"Mar 28, 2018 7:39:53 AM PDT"|)
iex> datetime
#DateTime<2018-03-28 07:39:53-07:00 PDT America/Los_Angeles>
iex> DateTimeParser.parse_time("10:13pm")
{:ok, ~T[22:13:00]}
iex> DateTimeParser.parse_time("10:13:34")
{:ok, ~T[10:13:34]}
iex> DateTimeParser.parse_time("18:14:21.145851000000Z")
{:ok, ~T[18:14:21.145851]}
iex> DateTimeParser.parse_datetime(nil)
{:error, "Could not parse nil"}
Installation
Add date_time_parser to your list of dependencies in mix.exs:
def deps do
  [
    {:date_time_parser, "~> 1.2.0"}
  ]
end
Configuration
You must have a timezone database configured if you want parsing to consider timezones. See tz or tzdata.
# This is the default config
alias DateTimeParser.Parser
config :date_time_parser, parsers: [Parser.Epoch, Parser.Serial, Parser.Tokenizer]
# To enable only specific parsers, include them in the :parsers key.
config :date_time_parser, parsers: [Parser.Tokenizer]
# To consider more timezones from the past at a performance cost:
config :date_time_parser, include_zones_from: ~N[1900-01-01T00:00:00]
# default is 2020-01-01T00:00:00
# Adding the timezone database from Tz
config :elixir, :time_zone_database, Tz.TimeZoneDatabase
# Or at runtime, pass the parsers to the function.
DateTimeParser.parse(mystring, parsers: [Parser.Tokenizer])
Write your own parser
You can write your own parser!
If the built-in parsers are not applicable for your use-case, you may build your own parser to use with this library. Let's write a simple one together.
First I will check DateTimeParser.Parser to see what behaviour my new parser should implement. It needs two functions: preflight/1 and parse/1.
These functions accept the DateTimeParser.Parser.t/0 struct, which contains the options supplied by the user, the string itself, and the context for which you should return your result. For example, if the context is :time then you should return a %Time{}; if :datetime you should return either a %NaiveDateTime{} or a %DateTime{}; if :date then you should return a %Date{}.
Let's implement a parser that reads a special time string. Our string will represent time, but the hour and the minute are each shifted up by 10 and the string must be prefixed with the secret word: "boomshakalaka:". For example, the real world time of 01:10 is represented as boomshakalaka:11:20 in our toy time format. 12:30 is represented as boomshakalaka:22:40, and 5:55 is represented as boomshakalaka:15:65.
defmodule MyParser do
  @behaviour DateTimeParser.Parser
  @secret_regex ~r|boomshakalaka:(?<time>\d{2}:\d{2})|
  def preflight(%{string: string} = parser) do
    case Regex.named_captures(@secret_regex, string) do
      %{"time" => time} ->
        {:ok, %{parser | preflight: time}}
      nil ->
        {:error, :not_compatible}
    end
  end
  # ... more below
end
We'll stop here first and go through the preflight function. Our special parser will only be attempted if the regex finds its named capture in the supplied string. That is, the string must contain boomshakalaka: followed by 2 digits, a colon, and 2 more digits. These digits are extracted out like 00:00 where 0 is any digit. If 05:40 is passed in, it would not be compatible so the parser will be skipped.
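To see what the preflight check is matching on, the regex can be tried directly with the standard library. The pattern is the one used in the walkthrough; the variable names are only for illustration.

```elixir
# Same pattern as @secret_regex in the walkthrough.
regex = ~r|boomshakalaka:(?<time>\d{2}:\d{2})|

# A compatible string yields a map with the named capture.
compatible = Regex.named_captures(regex, "boomshakalaka:11:20")
# compatible == %{"time" => "11:20"}

# A string without the secret word yields nil, so preflight
# would return {:error, :not_compatible}.
incompatible = Regex.named_captures(regex, "05:40")
# incompatible == nil
```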
Now let's parse the time:
def parse(%{preflight: time} = parser) do
  [hour, minute] = String.split(time, ":")
  {hour, ""} = Integer.parse(hour)
  {minute, ""} = Integer.parse(minute)
  result = Time.new(hour - 10, minute - 10, 0, {0, 0})
  for_context(parser.context, result)
end
defp for_context(:datetime, _result), do: :error
defp for_context(:date, _result), do: :error
defp for_context(:time, result), do: result
Notice that we need to consider the context of the result. If the user asked for a DateTime, then we need to give them one. Our toy format only represents time, so we must return an error when the context is :datetime or :date.
DateTimeParser.parse_time("boomshakalaka:11:11", parsers: [MyParser])
#=> {:ok, ~T[01:01:00]}
DateTimeParser.parse_date("boomshakalaka:11:11", parsers: [MyParser])
#=> {:error, "Could not parse \"boomshakalaka:11:11\""}
DateTimeParser.parse_datetime("boomshakalaka:11:11", parsers: [MyParser])
#=> {:error, "Could not parse \"boomshakalaka:11:11\""}
DateTimeParser.parse("boomshakalaka:11:11", parsers: [MyParser])
#=> {:ok, ~T[01:01:00]}
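The arithmetic inside parse can also be checked in isolation. The sketch below is a hypothetical stand-alone version (`ToyTime.decode/1` is not part of the library or of MyParser) that takes only the captured "HH:MM" portion. It shows that a captured value whose digits are too small to shift down passes the regex but still produces an error tuple from Time.new/4.

```elixir
defmodule ToyTime do
  # Hypothetical stand-alone version of the arithmetic in MyParser.parse/1.
  # Expects the captured portion only, eg "11:20".
  def decode(<<hour::binary-size(2), ":", minute::binary-size(2)>>) do
    Time.new(String.to_integer(hour) - 10, String.to_integer(minute) - 10, 0, {0, 0})
  end
end

ToyTime.decode("11:20") # => {:ok, ~T[01:10:00]}
ToyTime.decode("15:65") # => {:ok, ~T[05:55:00]}
ToyTime.decode("05:40") # => {:error, :invalid_time}
```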
Why aren't timezones recognized?
You might not have a timezone database configured.
You may configure one by using tz or tzdata. Not only should you install it, but you also must configure Elixir to use it.
For example, in a script:
Mix.install([:date_time_parser, :tz])
# :ok
DateTimeParser.parse("2020-02-02 10:00:00 PST")
# {:ok, ~N[2020-02-02 10:00:00]}
Application.put_env(:elixir, :time_zone_database, Tz.TimeZoneDatabase)
# :ok
DateTimeParser.parse("2020-02-02 10:00:00 PST")
# {:ok, #DateTime<2020-02-02 10:00:00-08:00 PST America/Los_Angeles>}
or in a Mix project:
# in mix.exs
defp deps do
  [
    {:date_time_parser, "~> 1.2.0"},
    {:tz, "~> 0.24"},
  ]
end
# in config/config.exs
config :elixir, :time_zone_database, Tz.TimeZoneDatabase
# then in code
DateTimeParser.parse("2020-02-02 10:00:00 PST")
# {:ok, #DateTime<2020-02-02 10:00:00-08:00 PST America/Los_Angeles>}
Should I use this library?
Only as a last resort. Parsing dates from strings is educated guessing at best.
Since Elixir natively supports ISO-8601 parsing (see from_iso8601/2 functions), it's highly recommended that you rely on that first and foremost.
When designing your API that involves dates and strings, be specific with your requirements and supported DateTime strings, and preferably only support ISO-8601 with no exceptions. There is no ambiguity with this format so parsing to DateTime (or Date or Time) will always be correct.
This library is helpful when you must accept ambiguous DateTime string formats and having incorrect results is acceptable. Do not use this library when the resulting (and possibly incorrect) DateTime has catastrophic and dangerous effects in your system.
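For reference, the native ISO-8601 functions mentioned above look like this in use. These are standard library calls only; the input strings are illustrative, and the `~U` sigil requires Elixir 1.9+.

```elixir
# Strict ISO-8601 parsing with the standard library.
date = Date.from_iso8601("2034-01-13")
# date == {:ok, ~D[2034-01-13]}

time = Time.from_iso8601("22:30:00")
# time == {:ok, ~T[22:30:00]}

naive = NaiveDateTime.from_iso8601("2019-03-11T22:30:00")
# naive == {:ok, ~N[2019-03-11 22:30:00]}

# DateTime.from_iso8601/1 also returns the UTC offset in seconds.
zoned = DateTime.from_iso8601("2019-03-11T22:30:00.234+00:00")
# zoned == {:ok, ~U[2019-03-11 22:30:00.234Z], 0}

# Ambiguous formats are rejected rather than guessed.
rejected = Date.from_iso8601("01/01/2017")
# rejected == {:error, :invalid_format}
```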
Summary
Types
parse_date_options(): Options for parse_date/2.
parse_datetime_options(): Options for parse_datetime/2.
parse_options(): Options for parse/2.
parse_time_options(): Options for parse_time/2.
parsers(): List of modules that implement the DateTimeParser.Parser behaviour.
Functions
parse/2: Parse a %DateTime{}, %NaiveDateTime{}, %Date{}, or %Time{} from a string.
parse!/2: Parse a %DateTime{}, %NaiveDateTime{}, %Date{}, or %Time{} from a string. Raises a DateTimeParser.ParseError when parsing fails.
parse_date/2: Parse a %Date{} from a string.
parse_date!/2: Parse a %Date{} from a string. Raises a DateTimeParser.ParseError when parsing fails.
parse_datetime/2: Parse a %DateTime{} or %NaiveDateTime{} from a string.
parse_datetime!/2: Parse a %DateTime{} or %NaiveDateTime{} from a string. Raises a DateTimeParser.ParseError when parsing fails.
parse_time/2: Parse a %Time{} from a string. Accepts options parse_time_options/0.
parse_time!/2: Parse a %Time{} from a string. Raises a DateTimeParser.ParseError when parsing fails.
Types
@type assume_tz_abbreviations() :: {:assume_utc, map()}
@type assume_tz_offsets() :: {:assume_utc, map()}
@type assume_utc() :: {:assume_utc, boolean()}
@type parse_date_options() :: [assume_date() | parsers()]
Options for parse_date/2
:assume_date Default false. If a date cannot be fully determined, then it will not be assumed by default. If you supply true, then Date.utc_today() will be assumed. You can also supply your own date, and the found tokens will be merged with it.
@type parse_datetime_options() :: [ assume_utc() | to_utc() | assume_time() | use_1904_date_system() | parsers() | assume_tz_offsets() | assume_tz_abbreviations() ]
Options for parse_datetime/2
:assume_utc Default false. Only applicable for strings where parsing could not determine a timezone. Instead of returning a NaiveDateTime, this option will assume them to be in UTC timezone, and therefore return a DateTime. If the timezone is determined, then it will continue to be returned in the original timezone. See the to_utc option to also convert it to UTC.
:to_utc Default false. If there's a timezone detected in the string, then attempt to convert to UTC timezone. If you know that your timestamps are in the future and you are going to store them for later use, it may be better to convert to UTC and also keep the original timestamp, since government organizations may change timezone rules before the timestamp elapses, therefore making the UTC timestamp wrong or invalid. Check out the guide on future timestamps.
:assume_time Default false. If a time cannot be determined, then it will not be assumed by default. If you supply true, then ~T[00:00:00] will be assumed. You can also supply your own time, and the found tokens will be merged with it.
:use_1904_date_system Default false. For Serial timestamps, the parser will use the 1900 Date System by default. If you supply true, then the 1904 Date System will be used to parse the timestamp.
:assume_tz_offsets Timezones may be expressed as time offsets, eg "-0400" to represent 4 hours before GMT. The default assumption for these offsets is to map them directly to "Etc/GMT-4" when possible. However, you may provide your own assumptions. To see the default offsets, see DateTimeParser.TimezoneAbbreviations.default_offsets.
:assume_tz_abbreviations Timezones may be expressed as timezone abbreviations, eg "EST" can represent Eastern Standard Time. The default assumption for these abbreviations is to map them to the most likely timezone. However, you may prefer a different set of timezone assumptions, eg: "CST" means US Central Standard Time by default, but you may want it to mean China Standard Time; you would provide: %{"CST" => "Asia/Shanghai"}. To see the default abbreviations, see DateTimeParser.TimezoneAbbreviations.default_abbreviations.
:parsers The parsers to use when analyzing the string. When Parser.Tokenizer is included, the appropriate tokenizer will be used depending on the function used and conditions found in the string. Order matters and determines the order in which parsers are attempted. These are the available built-in parsers:
DateTimeParser.Parser.Epoch
DateTimeParser.Parser.Serial
DateTimeParser.Parser.Tokenizer
DateTimeParser.Parser.DateTime
DateTimeParser.Parser.DateTimeUS
DateTimeParser.Parser.Date
DateTimeParser.Parser.DateUS
DateTimeParser.Parser.Time
This is the default in this order: DateTimeParser.Parser.Epoch, DateTimeParser.Parser.Serial, DateTimeParser.Parser.Tokenizer
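The difference between :assume_utc and :to_utc described in these options can be illustrated with standard library analogues. This is a sketch of the two concepts only (labelling a zoneless time as UTC versus converting a zoned instant to UTC) and makes no claim about how the library implements either option.

```elixir
# :assume_utc analogue: the input has no zone, so the wall-clock
# time is kept as-is and simply labelled as UTC.
naive = ~N[2018-01-01 15:24:00]
assumed = DateTime.from_naive!(naive, "Etc/UTC")
# assumed == ~U[2018-01-01 15:24:00Z]

# :to_utc analogue: the input carries an offset, so the instant
# is shifted to UTC and the wall-clock time changes.
{:ok, converted, offset_seconds} =
  DateTime.from_iso8601("2018-03-28T07:39:53-07:00")
# converted == ~U[2018-03-28 14:39:53Z]
# offset_seconds == -25200
```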
@type parse_options() :: parse_datetime_options() | parse_date_options() | parse_time_options()
Options for parse/2.
Combination of parse_date_options/0, parse_datetime_options/0, and parse_time_options/0.
@type parse_time_options() :: [parsers()]
Options for parse_time/2.
See parse_datetime_options/0 for further definition.
@type parsers() :: {:parsers, [atom()]}
List of modules that implement the DateTimeParser.Parser behaviour.
@type to_utc() :: {:to_utc, boolean()}
@type use_1904_date_system() :: {:use_1904_date_system, boolean()}
Functions
@spec parse(String.t() | nil, parse_options()) :: {:ok, DateTime.t() | NaiveDateTime.t() | Date.t() | Time.t()} | {:error, String.t()}
Parse a %DateTime{}, %NaiveDateTime{}, %Date{}, or %Time{} from a string.
Accepts parse_options/0.
@spec parse!(String.t() | nil, parse_options()) :: DateTime.t() | NaiveDateTime.t() | Date.t() | Time.t() | no_return()
Parse a %DateTime{}, %NaiveDateTime{}, %Date{}, or %Time{} from a string. Raises a DateTimeParser.ParseError when parsing fails.
Accepts parse_options/0.
@spec parse_date(String.t() | nil, parse_date_options()) :: {:ok, Date.t()} | {:error, String.t()}
Parse a %Date{} from a string.
Accepts options parse_date_options/0.
@spec parse_date!(String.t() | nil, parse_datetime_options()) :: Date.t() | no_return()
Parse a %Date{} from a string. Raises a DateTimeParser.ParseError when parsing fails.
Accepts options parse_date_options/0.
@spec parse_datetime(String.t() | nil, parse_datetime_options()) :: {:ok, DateTime.t() | NaiveDateTime.t()} | {:error, String.t()}
Parse a %DateTime{} or %NaiveDateTime{} from a string.
Accepts options parse_datetime_options/0.
@spec parse_datetime!(String.t() | nil, parse_datetime_options()) :: DateTime.t() | NaiveDateTime.t() | no_return()
Parse a %DateTime{} or %NaiveDateTime{} from a string. Raises a DateTimeParser.ParseError when parsing fails.
Accepts options parse_datetime_options/0.
@spec parse_time(String.t() | nil, parse_time_options()) :: {:ok, Time.t()} | {:error, String.t()}
Parse a %Time{} from a string. Accepts options parse_time_options/0.
@spec parse_time!(String.t() | nil, parse_time_options()) :: Time.t() | no_return()
Parse a %Time{} from a string. Raises a DateTimeParser.ParseError when parsing fails.
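Because the non-raising functions return one of several struct types inside an `{:ok, _}` tuple, or an `{:error, message}` tuple, callers usually need to pattern match on the result. The sketch below shows one way to do that; `ResultHandling` and `describe/1` are hypothetical names, and the module only uses the standard library so it can be tried without the dependency installed.

```elixir
defmodule ResultHandling do
  # Hypothetical helper: one clause per shape listed in the @spec of parse/2.
  def describe({:ok, %DateTime{} = dt}), do: "zoned: " <> DateTime.to_iso8601(dt)
  def describe({:ok, %NaiveDateTime{} = ndt}), do: "naive: " <> NaiveDateTime.to_iso8601(ndt)
  def describe({:ok, %Date{} = date}), do: "date: " <> Date.to_iso8601(date)
  def describe({:ok, %Time{} = time}), do: "time: " <> Time.to_iso8601(time)
  def describe({:error, message}) when is_binary(message), do: "failed: " <> message
end

ResultHandling.describe({:ok, ~N[2018-09-19 08:15:22]})
# => "naive: 2018-09-19T08:15:22"
ResultHandling.describe({:ok, ~D[2019-03-11]})
# => "date: 2019-03-11"
ResultHandling.describe({:error, "Could not parse nil"})
# => "failed: Could not parse nil"
```

With the library installed, the result of DateTimeParser.parse/2 could be piped straight into `ResultHandling.describe/1`. The bang variants skip the tuple and raise DateTimeParser.ParseError instead.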