Xlsxir v1.6.4 Xlsxir.SaxParser

Provides SAX (Simple API for XML) parsing of the XML files inside an .xlsx file via the Erlsom Erlang library. SAX is an event-driven approach that parses large XML files in chunks, so the entire DOM never has to be loaded into memory. The current chunk size is set to 10,000.
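To illustrate what "event-driven" means here, the following is a minimal sketch, not Xlsxir's implementation. It swaps Erlsom for OTP's built-in `:xmerl_sax_parser` so that it runs without dependencies, and the module name `SaxSketch` and the sample XML are assumptions made for illustration. The parser calls the handler once per event and threads an accumulator through, so no document tree is ever built.

```elixir
defmodule SaxSketch do
  # Hypothetical module for illustration only; Xlsxir itself uses Erlsom.

  # Called by the parser for every SAX event. For each <c> (cell) start tag,
  # collect the value of its "r" (cell reference) attribute.
  def handle_event({:startElement, _uri, ~c"c", _qname, attrs}, _location, acc) do
    # xmerl passes attributes as {uri, prefix, name, value} tuples of charlists.
    case List.keyfind(attrs, ~c"r", 2) do
      {_attr_uri, _prefix, _name, value} -> [List.to_string(value) | acc]
      nil -> acc
    end
  end

  # Every other event leaves the accumulator unchanged.
  def handle_event(_event, _location, acc), do: acc

  def cell_refs(xml) do
    {:ok, refs, _rest} =
      :xmerl_sax_parser.stream(xml, event_fun: &handle_event/3, event_state: [])

    Enum.reverse(refs)
  end
end

xml = ~s(<sheetData><row r="1"><c r="A1"><v>10</v></c><c r="B1"><v>20</v></c></row></sheetData>)
refs = SaxSketch.cell_refs(xml)
```

Only the accumulator (here a list of cell references) is kept in memory, which is the property that makes SAX suitable for large worksheets.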

Summary

Functions

Parses XmlFile (xl/worksheets/sheet#{n}.xml at index n, xl/styles.xml, xl/workbook.xml or xl/sharedStrings.xml) using SAX parsing. An Erlang Term Storage (ETS) process is started to hold the state of the parsed data. The style and shared string XML files (if they exist) must be parsed first in order for the worksheet parser to complete successfully.

Functions

parse(xml_file, type, excel \\ nil)

Parses XmlFile (xl/worksheets/sheet#{n}.xml at index n, xl/styles.xml, xl/workbook.xml or xl/sharedStrings.xml) using SAX parsing. An Erlang Term Storage (ETS) process is started to hold the state of the parsed data. The style and shared string XML files (if they exist) must be parsed first in order for the worksheet parser to complete successfully.
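The ordering requirement follows from how .xlsx worksheets reference text: a string cell stores only an index into xl/sharedStrings.xml, so that table has to exist before worksheet cells can be resolved. The sketch below shows the idea with a plain ETS table. The module `SharedStringSketch` and the `{:shared, index}` / `{:literal, value}` cell tags are hypothetical names for illustration and are not part of Xlsxir.

```elixir
defmodule SharedStringSketch do
  # Hypothetical helper, not Xlsxir's API.

  # Step 1: store the shared strings keyed by their zero-based index.
  def load_strings(strings) do
    tid = :ets.new(:shared_strings_sketch, [:set])

    strings
    |> Enum.with_index()
    |> Enum.each(fn {string, index} -> :ets.insert(tid, {index, string}) end)

    tid
  end

  # Step 2: resolve a worksheet cell. Shared-string cells need the table
  # built in step 1; literal cells carry their own value.
  def resolve({:shared, index}, tid) do
    [{^index, value}] = :ets.lookup(tid, index)
    value
  end

  def resolve({:literal, value}, _tid), do: value
end

tid = SharedStringSketch.load_strings(["string one", "string two"])
row = [{:shared, 0}, {:shared, 1}, {:literal, 10}]
resolved = Enum.map(row, &SharedStringSketch.resolve(&1, tid))
```

If the worksheet were processed before the shared strings, the lookup in step 2 would have nothing to match against.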

Parameters

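  • xml_file - an %Xlsxir.XmlFile{} struct whose content field holds the XML string to parse
  • type - file type identifier (:worksheet, :style, :string or :workbook) of the XML file to be parsed
  • excel - optional (defaults to nil); when parsing a worksheet, an %Xlsxir.XlsxFile{} struct holding the ETS table ids of the previously parsed shared_strings, styles and workbook files
  • max_rows - the maximum number of rows in this worksheet that should be parsed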

Example

An example file named test.xlsx located in ./test/test_data contains the following in the worksheet at index 0:

  • cell 'A1' -> "string one"
  • cell 'B1' -> "string two"
  • cell 'C1' -> integer of 10
  • cell 'D1' -> formula of =4*5
  • cell 'E1' -> date of 1/1/2016 or Excel date serial of 42370

The .xlsx file contents have been extracted to ./test/test_data/test. For purposes of this example, we call :ets.lookup/2 on the table id returned by each parse to pull a sample of the parsed data. Keep in mind that the worksheet data is stored in ETS as a list of row lists, so the Xlsxir.get_row/2 function will return a full row of values.
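The relationship between the date in cell E1 and its serial number can be checked with a few lines. This is a sketch assuming the workbook uses Excel's default 1900 date system, where serials above 60 count days from 1899-12-30. The module name `SerialDateSketch` is hypothetical, and Xlsxir's own conversion may be implemented differently.

```elixir
defmodule SerialDateSketch do
  # Hypothetical helper for illustration, not Xlsxir's API.
  # Assumes the 1900 date system. Serials of 60 or less are excluded because
  # Excel counts a non-existent 1900-02-29, which shifts those early dates.
  @epoch ~D[1899-12-30]

  def to_erl_date(serial) when is_integer(serial) and serial > 60 do
    @epoch
    |> Date.add(serial)
    |> Date.to_erl()
  end
end

e1 = SerialDateSketch.to_erl_date(42370)
```

The result is an Erlang-style `{year, month, day}` tuple, the same shape as the E1 value in the parsed row.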

    iex> {:ok, %Xlsxir.ParseStyle{tid: tid1}, _} = Xlsxir.SaxParser.parse(%Xlsxir.XmlFile{content: File.read!("./test/test_data/test/xl/styles.xml")}, :style)
    iex> :ets.lookup(tid1, 0)
    [{0, nil}]
    iex> {:ok, %Xlsxir.ParseString{tid: tid2}, _} = Xlsxir.SaxParser.parse(%Xlsxir.XmlFile{content: File.read!("./test/test_data/test/xl/sharedStrings.xml")}, :string)
    iex> :ets.lookup(tid2, 0)
    [{0, "string one"}]
    iex> {:ok, %Xlsxir.ParseWorkbook{tid: tid3}, _} = Xlsxir.SaxParser.parse(%Xlsxir.XmlFile{content: File.read!("./test/test_data/test/xl/workbook.xml")}, :workbook)
    iex> :ets.lookup(tid3, 1)
    [{1, "Sheet1"}]
    iex> {:ok, %Xlsxir.ParseWorksheet{tid: tid4}, _} = Xlsxir.SaxParser.parse(%Xlsxir.XmlFile{name: "sheet1.xml", content: File.read!("./test/test_data/test/xl/worksheets/sheet1.xml")}, :worksheet, %Xlsxir.XlsxFile{shared_strings: tid2, styles: tid1, workbook: tid3})
    iex> :ets.lookup(tid4, 1)
    [{1, [["A1", "string one"], ["B1", "string two"], ["C1", 10], ["D1", 20], ["E1", {2016, 1, 1}]]}]
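The final lookup in the example returns the row number paired with a list of [cell reference, value] pairs. As a small sketch of how that shape might be consumed, the snippet below copies the result as a literal (so it runs without Xlsxir) and extracts either the bare values or a map keyed by cell reference. The variable names are illustrative only.

```elixir
# Literal copy of the worksheet lookup result shown in the example.
lookup_result = [
  {1, [["A1", "string one"], ["B1", "string two"], ["C1", 10], ["D1", 20], ["E1", {2016, 1, 1}]]}
]

# A lookup on a single row key yields a one-element list.
[{row_number, cells}] = lookup_result

# Keep only the values, in column order.
values = Enum.map(cells, fn [_ref, value] -> value end)

# Or index the values by cell reference.
by_ref = Map.new(cells, fn [ref, value] -> {ref, value} end)
```

A map is convenient when cells are addressed by reference, while the plain list preserves column order.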