Xlsxir v1.6.4 Xlsxir View Source

Extracts and parses data from a .xlsx file to an Erlang Term Storage (ETS) process and provides various functions for accessing the data.

Link to this section Summary

Functions

Deletes ETS process tid and returns :ok if successful

Deprecated Extracts worksheet data contained in the specified .xlsx file to an ETS process. Successful extraction returns {:ok, tid} with the timer argument set to false and returns a tuple of {:ok, tid, time} where time is a list containing time elapsed during the extraction process (i.e. [hour, minute, second, microsecond]) when the timer argument is set to true and tid - is the ETS table id

Accesses ETS process and returns value of specified cell

Accesses tid ETS process and returns values of specified column in a list

Accesses ETS process and returns data formatted as a list of row value lists

Accesses ETS process and returns data formatted as a map of cell references and values

Accesses ETS process and returns an indexed map which functions like a multi-dimensional array in other languages

Returns count data based on num_type specified:

  • :rows - Returns number of rows contained in worksheet
  • :cols - Returns number of columns contained in worksheet
  • :cells - Returns number of cells contained in worksheet
  • :name - Returns worksheet name
  • :all - Returns a keyword list containing all of the above

Accesses ETS process and returns values of specified row in a list

Extracts worksheet data contained in the specified .xlsx file to an ETS process. Successful extraction returns {:ok, table_id} with the timer argument set to false and returns a tuple of {:ok, table_id, time} where time is a list containing time elapsed during the extraction process (i.e. [hour, minute, second, microsecond]) when the timer argument is set to true. The table_id is used to access data for that particular ETS process with the various access functions of the Xlsxir module

Extracts the first n number of rows from the specified worksheet contained in the specified .xlsx file to an ETS process. Successful extraction returns {:ok, tid} where tid - is ETS table id

Called when an application is started

Stream worksheet rows contained in the specified .xlsx file

Link to this section Functions

Deletes ETS process tid and returns :ok if successful.

Example

Extract first worksheet in an example file named test.xlsx located in ./test/test_data:

iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0)
iex> Xlsxir.close(tid)
:ok

iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0)
iex> Xlsxir.close(tid)
:ok
Link to this function

extract(path, index, timer \\ false, options \\ []) View Source

Deprecated Extracts worksheet data contained in the specified .xlsx file to an ETS process. Successful extraction returns {:ok, tid} with the timer argument set to false and returns a tuple of {:ok, tid, time} where time is a list containing time elapsed during the extraction process (i.e. [hour, minute, second, microsecond]) when the timer argument is set to true and tid - is the ETS table id

Cells containing formulas in the worksheet are extracted as either a string, integer or float depending on the resulting value of the cell. Cells containing an ISO 8601 date format are extracted and converted to Erlang :calendar.date() format (i.e. {year, month, day}).

Parameters

  • path - file path of a .xlsx file type in string format
  • index - index of worksheet from within the Excel workbook to be parsed (zero-based index)
  • timer - boolean flag that tracks extraction process time and returns it when set to true. Default value is false.

Options

  • :max_rows - the number of rows to fetch from within the worksheet
  • :extract_to - Specify how the .xlsx content (i.e. sharedStrings.xml, style.xml and worksheets xml files) will be be extracted before being parsed. :memory will extract files to memory, and :file to files in the file system
  • :extract_base_dir - when extracting to file, files will be extracted in a sub directory in the :extract_base_dir directory. Defaults to Application.get_env(:xlsxir, :extract_base_dir) or "temp"

Example

Extract first worksheet in an example file named test.xlsx located in ./test/test_data:

  iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0)
  iex> Enum.member?(:ets.all, tid)
  true
  iex> Xlsxir.close(tid)
  :ok

  iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0, false, [extract_to: :file])
  iex> Enum.member?(:ets.all, tid)
  true
  iex> Xlsxir.close(tid)
  :ok

  iex> {:ok, tid, _timer} = Xlsxir.extract("./test/test_data/test.xlsx", 0, true)
  iex> Enum.member?(:ets.all, tid)
  true
  iex> Xlsxir.close(tid)
  :ok

Test parallel parsing

  iex> task1 = Task.async(fn -> Xlsxir.extract("./test/test_data/test.xlsx", 0) end)
  iex> task2 = Task.async(fn -> Xlsxir.extract("./test/test_data/test.xlsx", 0) end)
  iex> {:ok, tid1} = Task.await(task1)
  iex> {:ok, tid2} = Task.await(task2)
  iex> Xlsxir.get_list(tid1)
  [["string one", "string two", 10, 20, {2016, 1, 1}]]
  iex> Xlsxir.get_list(tid2)
  [["string one", "string two", 10, 20, {2016, 1, 1}]]
  iex> Xlsxir.close(tid1)
  :ok
  iex> Xlsxir.close(tid2)
  :ok

Example (errors)

  iex> Xlsxir.extract("./test/test_data/test.invalidfile", 0)
  {:error, "Invalid file type (expected xlsx)."}

  iex> Xlsxir.extract("./test/test_data/test.xlsx", 100)
  {:error, "Invalid worksheet index."}
Link to this function

get_cell(table_id, cell_ref) View Source

Accesses ETS process and returns value of specified cell.

Parameters

  • table_id - table identifier of ETS process to be accessed
  • cell_ref - Reference name of cell to be returned in string format (i.e. "A1")

Example

An example file named test.xlsx located in ./test/test_data containing the following:

  • cell 'A1' -> "string one"
  • cell 'B1' -> "string two"
  • cell 'C1' -> integer of 10
  • cell 'D1' -> formula of "4 * 5"
  • cell 'E1' -> date of 1/1/2016 or Excel date serial of 42370

    iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_cell(tid, "A1") "string one" iex> Xlsxir.close(tid) :ok

    iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_cell(tid, "A1") "string one" iex> Xlsxir.close(tid) :ok

Accesses tid ETS process and returns values of specified column in a list.

Parameters

  • tid - table identifier of ETS process to be accessed
  • col - Reference name of column to be returned in string format (i.e. "A")

Example

An example file named test.xlsx located in ./test/test_data containing the following:

  • cell 'A1' -> "string one"
  • cell 'B1' -> "string two"
  • cell 'C1' -> integer of 10
  • cell 'D1' -> formula of "4 * 5"
  • cell 'E1' -> date of 1/1/2016 or Excel date serial of 42370

    iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_col(tid, "A") ["string one"] iex> Xlsxir.close(tid) :ok

    iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_col(tid, "A") ["string one"] iex> Xlsxir.close(tid) :ok

Link to this function

get_info(table_id, num_type \\ :all) View Source

See get_multi_info/2 documentation.

Accesses ETS process and returns data formatted as a list of row value lists.

Parameters

  • tid - table identifier of ETS process to be accessed

Example

An example file named test.xlsx located in ./test/test_data containing the following:

  • cell 'A1' -> "string one"
  • cell 'B1' -> "string two"
  • cell 'C1' -> integer of 10
  • cell 'D1' -> formula of "4 * 5"
  • cell 'E1' -> date of 1/1/2016 or Excel date serial of 42370

    iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_list(tid) [["string one", "string two", 10, 20, {2016, 1, 1}]] iex> Xlsxir.close(tid) :ok

    iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 2) iex> Xlsxir.get_list(tid) |> List.first |> Enum.count 16384

    iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_list(tid) [["string one", "string two", 10, 20, {2016, 1, 1}]] iex> Xlsxir.close(tid) :ok

Accesses ETS process and returns data formatted as a map of cell references and values.

Parameters

  • tid - table identifier of ETS process to be accessed

Example

An example file named test.xlsx located in ./test/test_data containing the following:

  • cell 'A1' -> "string one"
  • cell 'B1' -> "string two"
  • cell 'C1' -> integer of 10
  • cell 'D1' -> formula of "4 * 5"
  • cell 'E1' -> date of 1/1/2016 or Excel date serial of 42370

    iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_map(tid) %{ "A1" => "string one", "B1" => "string two", "C1" => 10, "D1" => 20, "E1" => {2016,1,1}} iex> Xlsxir.close(tid) :ok

    iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_map(tid) %{ "A1" => "string one", "B1" => "string two", "C1" => 10, "D1" => 20, "E1" => {2016,1,1}} iex> Xlsxir.close(tid) :ok

Accesses ETS process and returns an indexed map which functions like a multi-dimensional array in other languages.

Parameters

  • tid - table identifier of ETS process to be accessed

Example

An example file named test.xlsx located in ./test/test_data containing the following:

  • cell 'A1' -> "string one"
  • cell 'B1' -> "string two"
  • cell 'C1' -> integer of 10
  • cell 'D1' -> formula of "4 * 5"
  • cell 'E1' -> date of 1/1/2016 or Excel date serial of 42370

    iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0) iex> mda = Xlsxir.get_mda(tid) %{0 => %{0 => "string one", 1 => "string two", 2 => 10, 3 => 20, 4 => {2016,1,1}}} iex> mda[0][0] "string one" iex> mda[0][2] 10 iex> Xlsxir.close(tid) :ok

    iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0) iex> mda = Xlsxir.get_mda(tid) %{0 => %{0 => "string one", 1 => "string two", 2 => 10, 3 => 20, 4 => {2016,1,1}}} iex> mda[0][0] "string one" iex> mda[0][2] 10 iex> Xlsxir.close(tid) :ok

Link to this function

get_multi_info(tid, num_type \\ :all) View Source

Returns count data based on num_type specified:

  • :rows - Returns number of rows contained in worksheet
  • :cols - Returns number of columns contained in worksheet
  • :cells - Returns number of cells contained in worksheet
  • :name - Returns worksheet name
  • :all - Returns a keyword list containing all of the above

Parameters

  • tid - table identifier of ETS process to be accessed
  • num_type - type of count data to be returned (see above), defaults to :all

Accesses ETS process and returns values of specified row in a list.

Parameters

  • tid - table identifier of ETS process to be accessed
  • row - Reference name of row to be returned in integer format (i.e. 1)

Example

An example file named test.xlsx located in ./test/test_data containing the following:

  • cell 'A1' -> "string one"
  • cell 'B1' -> "string two"
  • cell 'C1' -> integer of 10
  • cell 'D1' -> formula of "4 * 5"
  • cell 'E1' -> date of 1/1/2016 or Excel date serial of 42370

    iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_row(tid, 1) ["string one", "string two", 10, 20, {2016, 1, 1}] iex> Xlsxir.close(tid) :ok

    iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0) iex> Xlsxir.get_row(tid, 1) ["string one", "string two", 10, 20, {2016, 1, 1}] iex> Xlsxir.close(tid) :ok

Link to this function

multi_extract(path, index \\ nil, timer \\ false, excel \\ nil, options \\ []) View Source

Extracts worksheet data contained in the specified .xlsx file to an ETS process. Successful extraction returns {:ok, table_id} with the timer argument set to false and returns a tuple of {:ok, table_id, time} where time is a list containing time elapsed during the extraction process (i.e. [hour, minute, second, microsecond]) when the timer argument is set to true. The table_id is used to access data for that particular ETS process with the various access functions of the Xlsxir module.

Cells containing formulas in the worksheet are extracted as either a string, integer or float depending on the resulting value of the cell. Cells containing an ISO 8601 date format are extracted and converted to Erlang :calendar.date() format (i.e. {year, month, day}).

Parameters

  • path - file path of a .xlsx file type in string format
  • index - index of worksheet from within the Excel workbook to be parsed (zero-based index)
  • timer - boolean flag that tracts extraction process time and returns it when set to true. Defalut value is false.

Options

  • :max_rows - the number of rows to fetch from within the worksheets
  • :extract_to - Specify how the .xlsx content (i.e. sharedStrings.xml, style.xml and worksheets xml files) will be be extracted before being parsed. :memory will extract files to memory, and :file to files in the file system
  • :extract_base_dir - when extracting to file, files will be extracted in a sub directory in the :extract_base_dir directory. Defaults to Application.get_env(:xlsxir, :extract_base_dir) or "temp"

Example

Extract first worksheet in an example file named test.xlsx located in ./test/test_data:

  iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0)
  iex> Enum.member?(:ets.all, tid)
  true
  iex> Xlsxir.close(tid)
  :ok

Example

Extract all worksheets in an example file named test.xlsx located in ./test/test_data:

  iex> results = Xlsxir.multi_extract("./test/test_data/test.xlsx")
  iex> alive_ids = Enum.map(results, fn {:ok, tid} -> Enum.member?(:ets.all, tid) end)
  iex> Enum.all?(alive_ids)
  true
  iex> Enum.map(results, fn {:ok, tid} -> Xlsxir.close(tid) end) |> Enum.all?(fn result -> result == :ok end)
  true

Example

Extract all worksheets in an example file named test.xlsx located in ./test/test_data with timer:

  iex> results = Xlsxir.multi_extract("./test/test_data/test.xlsx", nil, true)
  iex> alive_ids = Enum.map(results, fn {:ok, tid, _timer} -> Enum.member?(:ets.all, tid) end)
  iex> Enum.all?(alive_ids)
  true
  iex> Enum.map(results, fn {:ok, tid, _timer} -> Xlsxir.close(tid) end) |> Enum.all?(fn result -> result == :ok end)
  true

Example (errors)

  iex> Xlsxir.multi_extract("./test/test_data/test.invalidfile", 0)
  {:error, "Invalid file type (expected xlsx)."}

  iex> Xlsxir.multi_extract("./test/test_data/test.xlsx", 100)
  {:error, "Invalid worksheet index."}
Link to this function

peek(path, index, rows, options \\ []) View Source

Extracts the first n number of rows from the specified worksheet contained in the specified .xlsx file to an ETS process. Successful extraction returns {:ok, tid} where tid - is ETS table id.

Parameters

  • path - file path of a .xlsx file type in string format
  • index - index of worksheet from within the Excel workbook to be parsed (zero-based index)
  • rows - the number of rows to fetch from within the specified worksheet

Options

  • :extract_to - Specify how the .xlsx content (i.e. sharedStrings.xml, style.xml and worksheets xml files) will be be extracted before being parsed. :memory will extract files to memory, and :file to files in the file system
  • :extract_base_dir - when extracting to file, files will be extracted in a sub directory in the :extract_base_dir directory. Defaults to Application.get_env(:xlsxir, :extract_base_dir) or "temp"

Example

Peek at the first 10 rows of the 9th worksheet in an example file named test.xlsx located in ./test/test_data:

  iex> {:ok, tid} = Xlsxir.peek("./test/test_data/test.xlsx", 8, 10)
  iex> Enum.member?(:ets.all, tid)
  true
  iex> Xlsxir.close(tid)
  :ok

Called when an application is started.

This function is called when an application is started using Application.start/2 (and functions on top of that, such as Application.ensure_started/2). This function should start the top-level process of the application (which should be the top supervisor of the application's supervision tree if the application follows the OTP design principles around supervision).

start_type defines how the application is started:

  • :normal - used if the startup is a normal startup or if the application is distributed and is started on the current node because of a failover from another node and the application specification key :start_phases is :undefined.
  • {:takeover, node} - used if the application is distributed and is started on the current node because of a failover on the node node.
  • {:failover, node} - used if the application is distributed and is started on the current node because of a failover on node node, and the application specification key :start_phases is not :undefined.

start_args are the arguments passed to the application in the :mod specification key (e.g., mod: {MyApp, [:my_args]}).

This function should either return {:ok, pid} or {:ok, pid, state} if startup is successful. pid should be the PID of the top supervisor. state can be an arbitrary term, and if omitted will default to []; if the application is later stopped, state is passed to the stop/1 callback (see the documentation for the c:stop/1 callback for more information).

use Application provides no default implementation for the start/2 callback.

Callback implementation for Application.start/2.

Link to this function

stream_list(path, index, options \\ []) View Source

Stream worksheet rows contained in the specified .xlsx file.

Cells containing formulas in the worksheet are extracted as either a string, integer or float depending on the resulting value of the cell. Cells containing an ISO 8601 date format are extracted and converted to Erlang :calendar.date() format (i.e. {year, month, day}).

Parameters

  • path - file path of a .xlsx file type in string format
  • index - index of worksheet from within the Excel workbook to be parsed (zero-based index)

Options

  • :extract_to - Specify how the .xlsx content (i.e. sharedStrings.xml, style.xml and worksheets xml files) will be be extracted before being parsed. :memory will extract files to memory, and :file to files in the file system
  • :extract_base_dir - when extracting to file, files will be extracted in a sub directory in the :extract_base_dir directory. Defaults to Application.get_env(:xlsxir, :extract_base_dir) or "temp"

Example

Extract first worksheet in an example file named test.xlsx located in ./test/test_data:

  iex> Xlsxir.stream_list("./test/test_data/test.xlsx", 1) |> Enum.take(1)
  [[1, 2]]
  iex> Xlsxir.stream_list("./test/test_data/test.xlsx", 1) |> Enum.take(3)
  [[1, 2], [3, 4]]