Changelog

3.0.2 (2022-11-03)

  • Ensure that escaped fields as the last field on the last line without a newline are included in the results

3.0.1 (2022-10-25)

  • Ensure that stray escape quotes and unterminated escape sequences on a last line without a newline produce errors

3.0.0 (2022-10-25)

  • Replaced the parallel parser/lexer with a binary matching parser that has better performance.
  • A new :field_transform option accepts a function that is applied to every field as it is decoded
  • The escape character can now be specified using the :escape_character option. Closes #59
  • The library will now reparse lines that follow e.g. an unterminated escape sequence. This ensures that all possible valid rows will be returned in normal mode
  • Encoding checks have been removed because they can either be done using :field_transform or outside the library
  • Better docs

Upgrading from 2.x

  • Parallelism has been removed, alongside its options :num_workers and :worker_work_ratio. You can safely remove them.
  • StrayQuoteError is now StrayEscapeCharacterError. If you catch this error in your code, you need to rename it.
  • The :strip_fields option needs to be replaced with the :field_transform option:
      File.stream!("data.csv") |> CSV.decode(field_transform: &String.trim/1)
  • :validate_row_length now defaults to false. This option produces an error for rows of differing lengths. Set it to true to get the same behaviour as in 2.x
  • :escape_formulas is now :unescape_formulas for decode and decode!. It is still :escape_formulas for encode. Change :escape_formulas to :unescape_formulas in decode calls to get the same behaviour as in 2.x
  • :escape_max_lines now defaults to 10 instead of 1000. To get the same behaviour as in 2.x, use:
      File.stream!("data.csv") |> CSV.decode(escape_max_lines: 1000)
  • :replace has been removed. CSV will now return fields with incorrect encoding as-is. You can use the new :field_transform option to provide a function that transforms fields while they are being parsed. This makes it possible, for example, to replace incorrectly encoded characters:
      defp replace_bad_encoding(field) do
        if String.valid?(field) do
          field
        else
          field
          |> String.codepoints()
          |> Enum.map(fn codepoint -> if String.valid?(codepoint), do: codepoint, else: "?" end)
          |> Enum.join()
        end
      end
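
The helper above is a private function, so it needs a module around it before it can be passed to :field_transform. The following is a minimal sketch of one way to wrap it, not the library's own code. The module name EncodingFix and the sample inputs are assumptions made for illustration. The function is made public so it can be captured with &.

```elixir
# Hypothetical wrapper module; the name EncodingFix is illustrative only.
defmodule EncodingFix do
  # Returns valid UTF-8 fields unchanged and replaces every invalid
  # byte in other fields with "?".
  def replace_bad_encoding(field) do
    if String.valid?(field) do
      field
    else
      field
      |> String.codepoints()
      |> Enum.map_join(fn codepoint ->
        if String.valid?(codepoint), do: codepoint, else: "?"
      end)
    end
  end
end

# Try the helper on its own, without the CSV dependency.
# The second input ends in a lone Latin-1 byte, which is not valid UTF-8.
IO.inspect(EncodingFix.replace_bad_encoding("valid"))
IO.inspect(EncodingFix.replace_bad_encoding(<<"caf", 0xE9>>))

# With the csv package installed, it would be passed to decode like this:
#   File.stream!("data.csv")
#   |> CSV.decode(field_transform: &EncodingFix.replace_bad_encoding/1)
```

Keeping the transform in its own module means it can be tested separately from any file handling.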

2.5.0 (2022-09-17)

  • Optional parameter escape_formulas to prevent CSV injection. Fixes #103 reported by @maennchen. Contributed by @maennchen in PR #104.
  • Optional parameter force_quotes to force quotes when encoding, contributed by @stuart
  • Bugfix to pass non-UTF-8 lines through in normal mode so other lines can be processed. Fixes #107. Contributed by @al2o3cr.
  • Allow encoding keyword lists that specify headers as values, contributed by @michaelchu
  • Better docs, thanks to @kianmeng

2.4.1 (2020-09-12)

2.4.0 (2020-09-12)

  • Fix StrayQuoteError not getting passed the correct arguments in strict mode. Fixes #96.
  • When headers are present multiple times and the :headers option is set to true, parse the values into a list. Contributed by @MrAlexLau in PR #97.

2.3.1 (2019-03-30)

2.3.0 (2019-03-17)

2.2.0 (2019-03-03)

  • Make syntax compatible with latest Elixir releases
  • Add a :validate_row_length option, defaulting to true, that allows disabling row length validation.

2.0.0 (2017-05-29)

  • Make decode return row and error tuples instead of raising errors directly
  • Make old behaviour of raising errors directly available via decode!
  • Improve error messages for escape sequences
  • Rewrite parts of the pipeline to be more modular
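
The difference between decode and decode! described above can be sketched as follows. This is an illustrative script, not taken from the library's documentation. It assumes the csv package can be fetched through Mix.install (pinned here to the 3.x series as an assumption) and uses made-up in-memory lines instead of a file.

```elixir
# Assumption: network access to fetch the csv package from Hex.
Mix.install([{:csv, "~> 3.0"}])

# Made-up sample input: a list of lines stands in for File.stream!/1.
lines = ["a,b\n", "c,d\n"]

# decode yields tagged tuples, so one bad row does not stop the stream.
lines
|> CSV.decode()
|> Enum.each(fn
  {:ok, row} -> IO.inspect(row, label: "row")
  {:error, message} -> IO.puts("skipped: #{message}")
end)

# decode! yields plain rows and raises on the first error instead.
rows = lines |> CSV.decode!() |> Enum.to_list()
IO.inspect(rows, label: "rows")
```

Pattern matching on the tuples suits input where some rows are expected to be malformed, while decode! is shorter when any error should abort processing.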

1.4.4 (2016-11-12)

1.4.3 (2016-08-27)

1.4.2 (2016-06-20)

1.4.1 (2016-05-21)

  • Fix condition where rows would be dropped when decoding from stateful streams. See #39 reported by @moxley

1.4.0 (2016-04-03)

1.3.3 (2016-03-25)

1.3.2 (2016-03-08)

  • Cleanup, removing some unused defaults in function headers to remove compile time warnings

1.3.1 (2016-03-08)

  • Fix :strip_cells not stripping cells when multiple options are specified - #29 by @tomjoro

1.3.0 (2016-03-01)

  • Now supports linebreaks inside escaped fields (#13)
  • Raises an error when row length mismatches across rows
  • Uses parallel_stream for parallelism

1.2.4 (2016-02-06)

  • Fix encoding of double quotes

1.2.3 (2016-01-19)

  • Fix a condition where headers: true would enumerate the whole file once before parsing

1.2.2 (2016-01-02)

  • Fix the default num_pipes argument so that it is evaluated at runtime, depending on the scheduler
  • Test UTF-8 files with BOM
  • Syntax and mix updates for Elixir 1.2

1.2.1 (2015-10-17)

  • Decoder performance optimisations

1.2.0 (2015-10-11)

1.1.5 (2015-10-11)

  • Decoder refactor from Stream.resource/3 to Stream.transform/3 in order to get more predictable stream behaviour
  • Rows now get processed in order
  • Fix a bug where stream would get evaluated before being decoded

1.1.4 (2015-09-13)

  • Fix a bug where headers could be out of order

1.1.3 (2015-09-12)

  • Fix a bug where headers could get parsed as the first row

1.1.2 (2015-09-05)

  • Fix a bug where calls to decode with num_pipes: 1 would yield varying results due to leftover state in the decoder's message queue

1.1.1 (2015-07-14)

  • Rescue from errors in stream producer to get more predictable behaviour in case of failure

1.1.0 (2015-07-12)

  • Better error messages when encountering invalid encodings

1.0.1 (2015-07-11)

  • Indicate consolidate_protocols for better encoding performance

1.0.0 (2015-05-24)

  • Use bytes as separators

0.2.3 (2015-05-24)

  • Add benchmarking

0.2.2 (2015-05-20)

  • Use UTF-8 bytes instead of codepoints for multi-byte parsing

0.2.1 (2015-05-20)

  • Fix handling of multi-byte UTF-8 characters

0.2.0 (2015-03-25)

  • Implement encoder protocol