NimbleCSV v0.5.0
NimbleCSV behaviour
NimbleCSV is a small and fast parsing and dumping library.
It works by building highly-inlined CSV parsers, designed to work with strings, enumerables and streams. At the top of your file (and not inside a function), you can define your own parser module:
NimbleCSV.define(MyParser, separator: "\t", escape: "\"")
Once defined, we can parse data accordingly:
iex> MyParser.parse_string "name\tage\njohn\t27"
[["john", "27"]]
See the define/2 function for the list of functions defined in MyParser.
Parsing
NimbleCSV is deliberately restricted in scope: it only parses and dumps. The example above discarded the headers when parsing the string, as NimbleCSV expects developers to handle them explicitly. For example:
"name\tage\njohn\t27"
|> MyParser.parse_string
|> Enum.map(fn [name, age] ->
%{name: name, age: String.to_integer(age)}
end)
This is particularly useful with the parse_stream functionality that receives and returns a stream. For example, we can use it to parse files line by line lazily:
"path/to/csv/file"
|> File.stream!(read_ahead: 100_000)
|> MyParser.parse_stream
|> Stream.map(fn [name, age] ->
%{name: name, age: String.to_integer(age)}
end)
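The laziness matters for large files: rows are produced one at a time as the file is read, so the whole file never needs to fit in memory. Below is a stdlib-only sketch of the same pipeline, with a naive String.split/2 standing in for MyParser.parse_stream (the file path and contents are made up just so the sketch runs; unlike this naive split, the generated parser also handles escaped fields):

```elixir
# Hypothetical input file, created only so the sketch is runnable.
path = Path.join(System.tmp_dir!(), "nimble_csv_example.tsv")
File.write!(path, "name\tage\njohn\t27\n")

people =
  path
  |> File.stream!()
  |> Stream.drop(1)                              # discard the header row
  |> Stream.map(&String.trim_trailing(&1, "\n")) # strip the trailing newline
  |> Stream.map(&String.split(&1, "\t"))         # naive split; no escape handling
  |> Stream.map(fn [name, age] ->
    %{name: name, age: String.to_integer(age)}
  end)
  |> Enum.to_list()
```

Nothing is read from disk until the final Enum.to_list/1 (or any other eager operation) forces the stream.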
By default this library ships with NimbleCSV.RFC4180, the most common implementation of CSV parsing/dumping, which uses comma as the separator and double-quote as the escape. If you want to use it in your codebase, simply alias it to CSV and enjoy:
iex> alias NimbleCSV.RFC4180, as: CSV
iex> CSV.parse_string "name,age\njohn,27"
[["john", "27"]]
Binary references
One of the reasons behind NimbleCSV's performance is that it parses by matching on binaries and extracts fields as binary references. Therefore, if you have a row such as:
one,two,three,four,five
NimbleCSV will return a list of ["one", "two", "three", "four", "five"]
where each element references the original row. For this reason, if
you plan to keep the parsed data around in the parsing process or even
send it to another process, you may want to copy the data before doing
the transfer.
For example, in the parse_stream example in the previous section, we could rewrite the Stream.map/2 operation to explicitly copy any field that is stored as a binary:
"path/to/csv/file"
|> File.stream!(read_ahead: 100_000)
|> MyParser.parse_stream
|> Stream.map(fn [name, age] ->
%{name: :binary.copy(name),
age: String.to_integer(age)}
end)
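The effect of these references can be observed directly with Erlang's :binary module. A stdlib-only demonstration (the 1000-byte padding is arbitrary, just enough to force an off-heap, reference-counted binary):

```elixir
# Binaries over 64 bytes live off-heap; matching out a segment
# (as NimbleCSV's parser does) yields a sub-binary that references
# the entire original row.
row = String.duplicate("x", 1000) <> "\t27"
<<field::binary-size(3), _rest::binary>> = row

IO.inspect(:binary.referenced_byte_size(field))
#=> 1003 (the 3-byte field keeps all 1003 bytes alive)

copied = :binary.copy(field)
IO.inspect(:binary.referenced_byte_size(copied))
#=> 3 (an independent copy)
```

This is why copying before storing or sending parsed fields can dramatically reduce memory usage: without the copy, each small field pins its whole source row in memory.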
Dumping
NimbleCSV can dump any enumerable to either iodata or to streams:
iex> IO.iodata_to_binary MyParser.dump_to_iodata([~w(name age), ~w(mary 28)])
"name\tage\nmary\t28\n"
iex> MyParser.dump_to_stream([~w(name age), ~w(mary 28)])
#Stream<...>
Summary
Functions
Defines a new parser/dumper
Callbacks
Eagerly dumps an enumerable into iodata (a list of binaries and bytes and other lists)
Lazily dumps from an enumerable to a stream
Same as parse_enumerable(enumerable, [])
Eagerly parses CSV from an enumerable and returns a list of rows
Same as parse_stream(enumerable, [])
Lazily parses CSV from a stream and returns a stream of rows
Same as parse_string(string, [])
Eagerly parses CSV from a string and returns a list of rows
Functions
Defines a new parser/dumper.
Calling this function defines a CSV module. Therefore, define
is typically invoked at the top of your files and not inside
functions. Placing it inside a function would cause the same
module to be defined multiple times, one time per invocation,
leading your code to emit warnings and slowing down execution.
It accepts the following options:
:moduledoc - the documentation for the generated module
The following options control parsing:
:escape - the CSV escape, defaults to "\""
:separator - the CSV separator, defaults to ",". It can be a string or a list of strings. If a list is given, the first entry is used for dumping (see below)
:newlines - the list of entries to be considered newlines when parsing, defaults to ["\r\n", "\n"] (note they are attempted in order, so the order matters)
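Allowing several :newlines entries lets a single parser handle files with mixed line endings. The idea resembles splitting on multiple patterns with the standard library (an analogy only, not NimbleCSV's implementation):

```elixir
# Splitting on both delimiters handles Windows ("\r\n") and
# Unix ("\n") line endings in the same input.
lines = String.split("a,1\r\nb,2\nc,3", ["\r\n", "\n"])
#=> ["a,1", "b,2", "c,3"]
```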
The following options control dumping:
:escape - the CSV escape character, defaults to "\""
:separator - the CSV separator character, defaults to ","
:line_separator - the CSV line separator character, defaults to "\n"
:reserved - the list of characters to be escaped; defaults to the :separator, :line_separator and :escape characters above.
Although parsing may support multiple newline delimiters, only one of them can be picked when dumping, which is controlled by the :line_separator option. This allows NimbleCSV to handle both "\r\n" and "\n" when parsing, but only the latter for dumping.
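To make the dumping options concrete, here is a hypothetical two-line dumper showing where :separator and :line_separator end up (a simplified sketch; NimbleCSV's real dumper additionally escapes fields containing :reserved characters):

```elixir
separator = ","
line_separator = "\n"

# Join each row's fields with the separator and terminate the
# row with the line separator, building iodata rather than binaries.
dump_row = fn fields -> [Enum.intersperse(fields, separator), line_separator] end

rows = [~w(name age), ~w(mary 28)]
csv = IO.iodata_to_binary(Enum.map(rows, dump_row))
#=> "name,age\nmary,28\n"
```

Building iodata instead of concatenating binaries is also what lets the real dumper stay fast: the final binary (or file write) is assembled once at the end.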
Parser/Dumper API
Modules defined with define/2 implement the NimbleCSV behaviour. See the callbacks for this behaviour for information on the generated functions and their documentation.
Callbacks
dump_to_iodata(rows :: Enumerable.t()) :: iodata()
Eagerly dumps an enumerable into iodata (a list of binaries and bytes and other lists).
dump_to_stream(rows :: Enumerable.t()) :: Enumerable.t()
Lazily dumps from an enumerable to a stream.
It returns a stream that emits each row as iodata.
parse_enumerable(enum :: Enumerable.t()) :: [[binary()]]
Same as parse_enumerable(enumerable, []).
parse_enumerable(enum :: Enumerable.t(), opts :: keyword()) :: [[binary()]]
Eagerly parses CSV from an enumerable and returns a list of rows.
Options
:headers - when false, does not discard the first row. Defaults to true.
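Conceptually, the :headers option just toggles whether the first row is dropped before rows are returned (a simplified sketch of the option's semantics, not the generated code):

```elixir
rows = [["name", "age"], ["john", "27"]]

# Hypothetical parse step: headers default to true, i.e. dropped.
parse = fn rows, opts ->
  if Keyword.get(opts, :headers, true), do: tl(rows), else: rows
end

parse.(rows, [])              #=> [["john", "27"]]
parse.(rows, headers: false)  #=> [["name", "age"], ["john", "27"]]
```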
parse_stream(enum :: Enumerable.t()) :: Enumerable.t()
Same as parse_stream(enumerable, []).
parse_stream(enum :: Enumerable.t(), opts :: keyword()) :: Enumerable.t()
Lazily parses CSV from a stream and returns a stream of rows.
Options
:headers - when false, does not discard the first row. Defaults to true.
parse_string(string :: binary()) :: [[binary()]]
Same as parse_string(string, []).
parse_string(string :: binary(), opts :: keyword()) :: [[binary()]]
Eagerly parses CSV from a string and returns a list of rows.
Options
:headers - when false, does not discard the first row. Defaults to true.