Module csv
Copyright © 2021 Serge Aleynikov
Authors: Serge Aleynikov (saleyn@gmail.com).
Description
CSV file parsing functionsData Types
load_options()
load_options() =
[{load_type, recreate | replace | ignore_dups | update_dups} |
{col_types,
#{binary() =>
ColType ::
atom() | {ColType :: atom(), ColLen :: integer()}}} |
{batch_size, integer()} |
{blob_size, integer()} |
{create_table, boolean()} |
{save_create_sql_to_file, string()} |
{transforms, #{binary() => fun((term()) -> term())}} |
{guess_types, boolean()} |
{guess_limit_rows, integer()} |
{max_nulls_pcnt, float()} |
{primary_key, PKColumns :: binary() | [binary() | list()]} |
{drop_temp_table, boolean()} |
{encoding, string() | atom()} |
{verbose, boolean() | integer()}]
Options for loading data to a database.
- {load_type, Type}
- Type of loading to perform.
recreate
will replace the table by atomically dropping the old one, creating/loading the new one, and replacing the table.replace
will do an insert by usingREPLACE INTO
statement.ignore_dups
will useINSERT IGNORE INTO
statement to ignore records with duplicate keys.update_dups
will do anINSERT INTO
andON DUPLICATE KEY UPDATE
, so that the old records are updated and the new ones are inserted. - {create_table, Allow}
- Allow to create a table if it doesn't exist
- {col_types, Map}
- Types of data for all or some columens. The Map is in the format:
ColName::binary() => ColInfo
, where ColInfo isColType | {ColType, ColLen::integer()}
, andColType
is:date | datetime | integer | float | blob | number
. - {transforms, Map}
- Value transformation function for columns. The Map is in the format:
ColName::binary() => fun((Value::term()) -> term())
. - {batch_size, Size}
- Number of records per SQL insert/update/replace call
- {blob_size, Size}
- Threshold in number of bytes at which a VARCHAR field is defined as BLOB
- {save_create_sql_to_file, Filename::string()}
- Save CREATE TABLE sql statement to a file
- guess_types
- When specified, the function will try to guess the type of data in columns instead of treating all data as string fields. The possible data typed guessed: integer, float, date, datetime, number, string
- {guess_limit_rows, Limit}
- Limit the number of rows for guessing the column data types
- {max_nulls_pcnt, Percent}
- A percentage threshold of permissible NULLs in a column (0-100), above which
the column data type is forced to be treated as
string
- {primary_key, Fields}
- Names of primary key fields in the created table
- {drop_temp_table, boolean()}
- When true (default), temp table is dropped.
- {encoding, Encoding}
- The name of the encoding to use for storing data. For the list of permissible values [see this link](https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html)
- verbose
- Print additional details to stdout
parse_options()
parse_options() =
[fix_lengths | binary | list |
{open, list()} |
{columns, [binary() | string()]} |
{converters,
[{binary() | string() | all,
fun((binary(), binary()) -> binary()) |
{rex, binary(), binary()}}]}]
Function Index
guess_data_type/1 | Guess the type of data by its value. |
guess_data_types/2 | Guess data types of fields in the given CSV list of rows obtained by
parsing a CSV file with parse(File,[fix_lengths]) . |
load_to_mysql/4 | Load CSV data from a File to a MySQL database. |
max_field_lengths/2 | Get max field lengths for a list obtained by parsing a CSV file with
parse_csv_file(File,[fix_lengths]) . |
parse/1 | Parse a CSV file using default options. |
parse/2 | Parse a given CSV file. |
parse_line/1 | Parse a CSV line. |
Function Details
guess_data_type/1
guess_data_type(S :: binary()) ->
{null | date | datetime | integer | float |
string,
term(),
string()}
Guess the type of data by its value
guess_data_types/2
guess_data_types(HasHeaderRow :: boolean(),
Rows :: [Fields :: [binary()]]) ->
{Type ::
string | integer | number | float |
date | datetime,
MaxFieldLen :: integer(),
NumOfNulls :: integer()}
Guess data types of fields in the given CSV list of rows obtained by
parsing a CSV file with parse(File,[fix_lengths])
.
The function returns a list of tuples {Type, MaxFieldLen, NumOfNulls}
,
where the Type
is a field type, MaxFieldLen
is the max length of
data in this column, and NumOfNulls
is the number of rows with empty
values in this column.
load_to_mysql/4
load_to_mysql(File :: string(),
Tab :: string(),
MySqlPid :: pid(),
Opts :: load_options()) ->
{Columns :: list(),
AffectedCount :: integer(),
RecCount :: integer()}
Load CSV data from a File
to a MySQL
database.
Tab
is the name of a table where to load data. MySqlPid
is the pid of a
MySQL database connection returned by mysql:start_link/1
.
The data in the table is replaced according to {import_type, Type}
:
recreate
- The table is entirely replaced with the data from file. The data from the file is loaded atomically - i.e. either the whole file loading succeeds or fails. This is accomplished by first loading data to a temporary table, and then using the database's ACID properties to replace the target table with the temporary table.replace
- Use "REPLACE INTO" instead of "INSERT INTO" existing tableignore_dups
- The insert in the existing table is performed and the records with duplicate keys are ignoredupdate_dups
- The insert in the existing table is performed and the records with duplicate keys are updatedupsert
- The insert/update in the existing table is performed without creating a temporary table
max_field_lengths/2
max_field_lengths(HasHeaderRow :: boolean(),
Rows :: [Fields :: list()]) ->
[Len :: integer()]
Get max field lengths for a list obtained by parsing a CSV file with
parse_csv_file(File,[fix_lengths])
.
parse/1
parse(File :: string()) -> [[binary()]]
Parse a CSV file using default options.
parse/2
parse(File :: binary() | string(), Opts :: parse_options()) ->
[[string()]]
Parse a given CSV file.
parse_line/1
parse_line(Line :: binary()) -> list()
Parse a CSV line