Xlsxir v1.6.4 Xlsxir View Source
Extracts and parses data from a .xlsx
file to an Erlang Term Storage (ETS) process and provides various functions for accessing the data.
Link to this section Summary
Functions
Deletes ETS process tid
and returns :ok
if successful
Deprecated
Extracts worksheet data contained in the specified .xlsx
file to an ETS process. Successful extraction
returns {:ok, tid}
with the timer argument set to false and returns a tuple of {:ok, tid, time}
where time is a list containing time elapsed during the extraction process
(i.e. [hour, minute, second, microsecond]
) when the timer argument is set to true and tid - is the ETS table id
Accesses ETS process and returns value of specified cell
Accesses tid
ETS process and returns values of specified column in a list
See get_multi_info/2
documentation
Accesses ETS process and returns data formatted as a list of row value lists
Accesses ETS process and returns data formatted as a map of cell references and values
Accesses ETS process and returns an indexed map which functions like a multi-dimensional array in other languages
Returns count data based on num_type
specified:
:rows
- Returns number of rows contained in worksheet:cols
- Returns number of columns contained in worksheet:cells
- Returns number of cells contained in worksheet:name
- Returns worksheet name:all
- Returns a keyword list containing all of the above
Accesses ETS process and returns values of specified row in a list
Extracts worksheet data contained in the specified .xlsx
file to an ETS process. Successful extraction
returns {:ok, table_id}
with the timer argument set to false and returns a tuple of {:ok, table_id, time}
where time
is a list containing time elapsed during the extraction process
(i.e. [hour, minute, second, microsecond]
) when the timer argument is set to true. The table_id
is used to access data for that particular ETS process with the various access functions of the
Xlsxir
module
Extracts the first n number of rows from the specified worksheet contained in the specified .xlsx
file to an ETS process.
Successful extraction returns {:ok, tid}
where tid - is ETS table id
Called when an application is started
Stream worksheet rows contained in the specified .xlsx
file
Link to this section Functions
close(tid) View Source
Deletes ETS process tid
and returns :ok
if successful.
Example
Extract first worksheet in an example file named test.xlsx
located in ./test/test_data
:
iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0)
iex> Xlsxir.close(tid)
:ok
iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0)
iex> Xlsxir.close(tid)
:ok
extract(path, index, timer \\ false, options \\ []) View Source
Deprecated
Extracts worksheet data contained in the specified .xlsx
file to an ETS process. Successful extraction
returns {:ok, tid}
with the timer argument set to false and returns a tuple of {:ok, tid, time}
where time is a list containing time elapsed during the extraction process
(i.e. [hour, minute, second, microsecond]
) when the timer argument is set to true and tid - is the ETS table id
Cells containing formulas in the worksheet are extracted as either a string
, integer
or float
depending on the resulting value of the cell.
Cells containing an ISO 8601 date format are extracted and converted to Erlang :calendar.date()
format (i.e. {year, month, day}
).
Parameters
path
- file path of a.xlsx
file type instring
formatindex
- index of worksheet from within the Excel workbook to be parsed (zero-based index)timer
- boolean flag that tracks extraction process time and returns it when set totrue
. Default value isfalse
.
Options
:max_rows
- the number of rows to fetch from within the worksheet:extract_to
- Specify how the.xlsx
content (i.e. sharedStrings.xml, style.xml and worksheets xml files) will be be extracted before being parsed.:memory
will extract files to memory, and:file
to files in the file system:extract_base_dir
- when extracting to file, files will be extracted in a sub directory in the:extract_base_dir
directory. Defaults toApplication.get_env(:xlsxir, :extract_base_dir)
or "temp"
Example
Extract first worksheet in an example file named test.xlsx
located in ./test/test_data
:
iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0)
iex> Enum.member?(:ets.all, tid)
true
iex> Xlsxir.close(tid)
:ok
iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0, false, [extract_to: :file])
iex> Enum.member?(:ets.all, tid)
true
iex> Xlsxir.close(tid)
:ok
iex> {:ok, tid, _timer} = Xlsxir.extract("./test/test_data/test.xlsx", 0, true)
iex> Enum.member?(:ets.all, tid)
true
iex> Xlsxir.close(tid)
:ok
Test parallel parsing
iex> task1 = Task.async(fn -> Xlsxir.extract("./test/test_data/test.xlsx", 0) end)
iex> task2 = Task.async(fn -> Xlsxir.extract("./test/test_data/test.xlsx", 0) end)
iex> {:ok, tid1} = Task.await(task1)
iex> {:ok, tid2} = Task.await(task2)
iex> Xlsxir.get_list(tid1)
[["string one", "string two", 10, 20, {2016, 1, 1}]]
iex> Xlsxir.get_list(tid2)
[["string one", "string two", 10, 20, {2016, 1, 1}]]
iex> Xlsxir.close(tid1)
:ok
iex> Xlsxir.close(tid2)
:ok
Example (errors)
iex> Xlsxir.extract("./test/test_data/test.invalidfile", 0)
{:error, "Invalid file type (expected xlsx)."}
iex> Xlsxir.extract("./test/test_data/test.xlsx", 100)
{:error, "Invalid worksheet index."}
get_cell(table_id, cell_ref) View Source
Accesses ETS process and returns value of specified cell.
Parameters
table_id
- table identifier of ETS process to be accessedcell_ref
- Reference name of cell to be returned instring
format (i.e."A1"
)
Example
An example file named test.xlsx
located in ./test/test_data
containing the following:
- cell 'A1' -> "string one"
- cell 'B1' -> "string two"
- cell 'C1' -> integer of 10
- cell 'D1' -> formula of "4 * 5"
cell 'E1' -> date of 1/1/2016 or Excel date serial of 42370
iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_cell(tid, "A1") "string one" iex> Xlsxir.close(tid) :ok
iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_cell(tid, "A1") "string one" iex> Xlsxir.close(tid) :ok
get_col(tid, col) View Source
Accesses tid
ETS process and returns values of specified column in a list
.
Parameters
tid
- table identifier of ETS process to be accessedcol
- Reference name of column to be returned instring
format (i.e."A"
)
Example
An example file named test.xlsx
located in ./test/test_data
containing the following:
- cell 'A1' -> "string one"
- cell 'B1' -> "string two"
- cell 'C1' -> integer of 10
- cell 'D1' -> formula of "4 * 5"
cell 'E1' -> date of 1/1/2016 or Excel date serial of 42370
iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_col(tid, "A") ["string one"] iex> Xlsxir.close(tid) :ok
iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_col(tid, "A") ["string one"] iex> Xlsxir.close(tid) :ok
get_info(table_id, num_type \\ :all) View Source
See get_multi_info/2
documentation.
get_list(tid) View Source
Accesses ETS process and returns data formatted as a list of row value lists.
Parameters
tid
- table identifier of ETS process to be accessed
Example
An example file named test.xlsx
located in ./test/test_data
containing the following:
- cell 'A1' -> "string one"
- cell 'B1' -> "string two"
- cell 'C1' -> integer of 10
- cell 'D1' -> formula of "4 * 5"
cell 'E1' -> date of 1/1/2016 or Excel date serial of 42370
iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_list(tid) [["string one", "string two", 10, 20, {2016, 1, 1}]] iex> Xlsxir.close(tid) :ok
iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 2) iex> Xlsxir.get_list(tid) |> List.first |> Enum.count 16384
iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_list(tid) [["string one", "string two", 10, 20, {2016, 1, 1}]] iex> Xlsxir.close(tid) :ok
get_map(tid) View Source
Accesses ETS process and returns data formatted as a map of cell references and values.
Parameters
tid
- table identifier of ETS process to be accessed
Example
An example file named test.xlsx
located in ./test/test_data
containing the following:
- cell 'A1' -> "string one"
- cell 'B1' -> "string two"
- cell 'C1' -> integer of 10
- cell 'D1' -> formula of "4 * 5"
cell 'E1' -> date of 1/1/2016 or Excel date serial of 42370
iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_map(tid) %{ "A1" => "string one", "B1" => "string two", "C1" => 10, "D1" => 20, "E1" => {2016,1,1}} iex> Xlsxir.close(tid) :ok
iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_map(tid) %{ "A1" => "string one", "B1" => "string two", "C1" => 10, "D1" => 20, "E1" => {2016,1,1}} iex> Xlsxir.close(tid) :ok
get_mda(tid) View Source
Accesses ETS process and returns an indexed map which functions like a multi-dimensional array in other languages.
Parameters
tid
- table identifier of ETS process to be accessed
Example
An example file named test.xlsx
located in ./test/test_data
containing the following:
- cell 'A1' -> "string one"
- cell 'B1' -> "string two"
- cell 'C1' -> integer of 10
- cell 'D1' -> formula of "4 * 5"
cell 'E1' -> date of 1/1/2016 or Excel date serial of 42370
iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0) iex> mda = Xlsxir.get_mda(tid) %{0 => %{0 => "string one", 1 => "string two", 2 => 10, 3 => 20, 4 => {2016,1,1}}} iex> mda[0][0] "string one" iex> mda[0][2] 10 iex> Xlsxir.close(tid) :ok
iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0) iex> mda = Xlsxir.get_mda(tid) %{0 => %{0 => "string one", 1 => "string two", 2 => 10, 3 => 20, 4 => {2016,1,1}}} iex> mda[0][0] "string one" iex> mda[0][2] 10 iex> Xlsxir.close(tid) :ok
get_multi_info(tid, num_type \\ :all) View Source
Returns count data based on num_type
specified:
:rows
- Returns number of rows contained in worksheet:cols
- Returns number of columns contained in worksheet:cells
- Returns number of cells contained in worksheet:name
- Returns worksheet name:all
- Returns a keyword list containing all of the above
Parameters
tid
- table identifier of ETS process to be accessednum_type
- type of count data to be returned (see above), defaults to:all
get_row(tid, row) View Source
Accesses ETS process and returns values of specified row in a list
.
Parameters
tid
- table identifier of ETS process to be accessedrow
- Reference name of row to be returned ininteger
format (i.e.1
)
Example
An example file named test.xlsx
located in ./test/test_data
containing the following:
- cell 'A1' -> "string one"
- cell 'B1' -> "string two"
- cell 'C1' -> integer of 10
- cell 'D1' -> formula of "4 * 5"
cell 'E1' -> date of 1/1/2016 or Excel date serial of 42370
iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_row(tid, 1) ["string one", "string two", 10, 20, {2016, 1, 1}] iex> Xlsxir.close(tid) :ok
iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_row(tid, 1) ["string one", "string two", 10, 20, {2016, 1, 1}] iex> Xlsxir.close(tid) :ok
multi_extract(path, index \\ nil, timer \\ false, excel \\ nil, options \\ []) View Source
Extracts worksheet data contained in the specified .xlsx
file to an ETS process. Successful extraction
returns {:ok, table_id}
with the timer argument set to false and returns a tuple of {:ok, table_id, time}
where time
is a list containing time elapsed during the extraction process
(i.e. [hour, minute, second, microsecond]
) when the timer argument is set to true. The table_id
is used to access data for that particular ETS process with the various access functions of the
Xlsxir
module.
Cells containing formulas in the worksheet are extracted as either a string
, integer
or float
depending on the resulting value of the cell.
Cells containing an ISO 8601 date format are extracted and converted to Erlang :calendar.date()
format (i.e. {year, month, day}
).
Parameters
path
- file path of a.xlsx
file type instring
formatindex
- index of worksheet from within the Excel workbook to be parsed (zero-based index)timer
- boolean flag that tracts extraction process time and returns it when set totrue
. Defalut value isfalse
.
Options
:max_rows
- the number of rows to fetch from within the worksheets:extract_to
- Specify how the.xlsx
content (i.e. sharedStrings.xml, style.xml and worksheets xml files) will be be extracted before being parsed.:memory
will extract files to memory, and:file
to files in the file system:extract_base_dir
- when extracting to file, files will be extracted in a sub directory in the:extract_base_dir
directory. Defaults toApplication.get_env(:xlsxir, :extract_base_dir)
or "temp"
Example
Extract first worksheet in an example file named test.xlsx
located in ./test/test_data
:
iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0)
iex> Enum.member?(:ets.all, tid)
true
iex> Xlsxir.close(tid)
:ok
Example
Extract all worksheets in an example file named test.xlsx
located in ./test/test_data
:
iex> results = Xlsxir.multi_extract("./test/test_data/test.xlsx")
iex> alive_ids = Enum.map(results, fn {:ok, tid} -> Enum.member?(:ets.all, tid) end)
iex> Enum.all?(alive_ids)
true
iex> Enum.map(results, fn {:ok, tid} -> Xlsxir.close(tid) end) |> Enum.all?(fn result -> result == :ok end)
true
Example
Extract all worksheets in an example file named test.xlsx
located in ./test/test_data
with timer:
iex> results = Xlsxir.multi_extract("./test/test_data/test.xlsx", nil, true)
iex> alive_ids = Enum.map(results, fn {:ok, tid, _timer} -> Enum.member?(:ets.all, tid) end)
iex> Enum.all?(alive_ids)
true
iex> Enum.map(results, fn {:ok, tid, _timer} -> Xlsxir.close(tid) end) |> Enum.all?(fn result -> result == :ok end)
true
Example (errors)
iex> Xlsxir.multi_extract("./test/test_data/test.invalidfile", 0)
{:error, "Invalid file type (expected xlsx)."}
iex> Xlsxir.multi_extract("./test/test_data/test.xlsx", 100)
{:error, "Invalid worksheet index."}
peek(path, index, rows, options \\ []) View Source
Extracts the first n number of rows from the specified worksheet contained in the specified .xlsx
file to an ETS process.
Successful extraction returns {:ok, tid}
where tid - is ETS table id.
Parameters
path
- file path of a.xlsx
file type instring
formatindex
- index of worksheet from within the Excel workbook to be parsed (zero-based index)rows
- the number of rows to fetch from within the specified worksheet
Options
:extract_to
- Specify how the.xlsx
content (i.e. sharedStrings.xml, style.xml and worksheets xml files) will be be extracted before being parsed.:memory
will extract files to memory, and:file
to files in the file system:extract_base_dir
- when extracting to file, files will be extracted in a sub directory in the:extract_base_dir
directory. Defaults toApplication.get_env(:xlsxir, :extract_base_dir)
or "temp"
Example
Peek at the first 10 rows of the 9th worksheet in an example file named test.xlsx
located in ./test/test_data
:
iex> {:ok, tid} = Xlsxir.peek("./test/test_data/test.xlsx", 8, 10)
iex> Enum.member?(:ets.all, tid)
true
iex> Xlsxir.close(tid)
:ok
start(type, args) View Source
Called when an application is started.
This function is called when an application is started using
Application.start/2
(and functions on top of that, such as
Application.ensure_started/2
). This function should start the top-level
process of the application (which should be the top supervisor of the
application's supervision tree if the application follows the OTP design
principles around supervision).
start_type
defines how the application is started:
:normal
- used if the startup is a normal startup or if the application is distributed and is started on the current node because of a failover from another node and the application specification key:start_phases
is:undefined
.{:takeover, node}
- used if the application is distributed and is started on the current node because of a failover on the nodenode
.{:failover, node}
- used if the application is distributed and is started on the current node because of a failover on nodenode
, and the application specification key:start_phases
is not:undefined
.
start_args
are the arguments passed to the application in the :mod
specification key (e.g., mod: {MyApp, [:my_args]}
).
This function should either return {:ok, pid}
or {:ok, pid, state}
if
startup is successful. pid
should be the PID of the top supervisor. state
can be an arbitrary term, and if omitted will default to []
; if the
application is later stopped, state
is passed to the stop/1
callback (see
the documentation for the c:stop/1
callback for more information).
use Application
provides no default implementation for the start/2
callback.
Callback implementation for Application.start/2
.
stream_list(path, index, options \\ []) View Source
Stream worksheet rows contained in the specified .xlsx
file.
Cells containing formulas in the worksheet are extracted as either a string
, integer
or float
depending on the resulting value of the cell.
Cells containing an ISO 8601 date format are extracted and converted to Erlang :calendar.date()
format (i.e. {year, month, day}
).
Parameters
path
- file path of a.xlsx
file type instring
formatindex
- index of worksheet from within the Excel workbook to be parsed (zero-based index)
Options
:extract_to
- Specify how the.xlsx
content (i.e. sharedStrings.xml, style.xml and worksheets xml files) will be be extracted before being parsed.:memory
will extract files to memory, and:file
to files in the file system:extract_base_dir
- when extracting to file, files will be extracted in a sub directory in the:extract_base_dir
directory. Defaults toApplication.get_env(:xlsxir, :extract_base_dir)
or "temp"
Example
Extract first worksheet in an example file named test.xlsx
located in ./test/test_data
:
iex> Xlsxir.stream_list("./test/test_data/test.xlsx", 1) |> Enum.take(1)
[[1, 2]]
iex> Xlsxir.stream_list("./test/test_data/test.xlsx", 1) |> Enum.take(3)
[[1, 2], [3, 4]]