Changelog
3.0.2 (2022-11-03)
- Ensure that escaped fields as the last field on the last line without a newline are included in the results
3.0.1 (2022-10-25)
- Ensure that stray escape quotes and unterminated escape sequences on a last line without a newline produce errors
3.0.0 (2022-10-25)
- Replace the parallel parser/lexer with a binary matching parser that performs better.
- A new `:field_transform` option takes a function that is applied to every field while decoding.
- Escape characters can now be specified using the `:escape_character` option. Closes #59.
- The library now reparses lines that follow e.g. an unterminated escape sequence. This ensures that all possible valid rows are returned in normal mode.
- Encoding checks have been removed because they can be done either with `:field_transform` or outside the library.
- Better docs.
Upgrading from 2.x
- Parallelism has been removed, along with its options `:num_workers` and `:worker_work_ratio`. You can safely remove them.
- `StrayQuoteError` is now `StrayEscapeCharacterError`. If you catch this error in your code, you need to rename it.
- The `:strip_fields` option needs to be replaced with the `:field_transform` option:

      File.stream!("data.csv") |> CSV.decode(field_transform: &String.trim/1)
- `:validate_row_length` now defaults to `false`. This option produces an error for rows of differing length. Set it to `true` to get the same behaviour as in 2.x.
- `:escape_formulas` is now `:unescape_formulas` for `decode` and `decode!`. It is still `:escape_formulas` for `encode`. Change `:escape_formulas` to `:unescape_formulas` in `decode` calls to get the same behaviour as in 2.x.
- `:escape_max_lines` now defaults to `10` instead of `1000`. To get the same behaviour as in 2.x, use:

      File.stream!("data.csv") |> CSV.decode(escape_max_lines: 1000)
- `:replace` has been removed. `CSV` will now return fields with incorrect encoding as-is. You can use the new `:field_transform` option to provide a function that transforms fields while they are being parsed. This makes it possible to e.g. replace incorrect encoding:

      defp replace_bad_encoding(field) do
        if String.valid?(field) do
          field
        else
          field
          |> String.codepoints()
          |> Enum.map(fn codepoint -> if String.valid?(codepoint), do: codepoint, else: "?" end)
          |> Enum.join()
        end
      end
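To see what the encoding replacement does on its own, the sketch below wraps the same logic in a hypothetical module, `EncodingFix`, and applies it to made-up sample fields without going through the parser. The function is declared with `def` rather than `defp` only so that it can be called from outside the module; the sample data is an assumption for illustration.

```elixir
defmodule EncodingFix do
  # Same logic as replace_bad_encoding/1 above, made public for this sketch.
  def replace_bad_encoding(field) do
    if String.valid?(field) do
      # Valid UTF-8 passes through unchanged.
      field
    else
      field
      |> String.codepoints()
      |> Enum.map(fn codepoint ->
        # Invalid bytes come back as single-byte binaries; swap each for "?".
        if String.valid?(codepoint), do: codepoint, else: "?"
      end)
      |> Enum.join()
    end
  end
end

# Hypothetical sample fields; the second one ends in the invalid byte 0xFF.
fields = ["name", <<"caf", 0xFF>>]

cleaned = Enum.map(fields, &EncodingFix.replace_bad_encoding/1)
IO.inspect(cleaned)
```

When decoding, the same function would be passed as `field_transform: &EncodingFix.replace_bad_encoding/1`, in the same way `&String.trim/1` is passed in the `:strip_fields` replacement above.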
2.5.0 (2022-09-17)
- Optional parameter `escape_formulas` to prevent CSV injection. Fixes #103 reported by @maennchen. Contributed by @maennchen in PR #104.
- Optional parameter `force_quotes` to force quotes when encoding, contributed by @stuart.
- Bugfix to pass non-UTF-8 lines through in normal mode so other lines can be processed. Fixes #107. Contributed by @al2o3cr.
- Allow encoding keyword lists that specify headers as values, contributed by @michaelchu.
- Better docs thanks to @kianmeng.
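For readers unfamiliar with CSV injection, the following is a minimal sketch of the general idea behind formula escaping. It is not the library's implementation. The module name `FormulaEscapeSketch`, the list of trigger characters, and the choice of a leading single quote are assumptions based on common spreadsheet-hardening advice.

```elixir
defmodule FormulaEscapeSketch do
  # Leading characters that spreadsheet applications commonly treat as the
  # start of a formula. This list is an assumption for illustration.
  @formula_prefixes ["=", "+", "-", "@"]

  # Prefix risky fields with a single quote so a spreadsheet reads them as text.
  def escape(field) when is_binary(field) do
    if String.starts_with?(field, @formula_prefixes) do
      "'" <> field
    else
      field
    end
  end
end

# Hypothetical sample row with two fields a spreadsheet could evaluate.
row = ["=SUM(A1:A2)", "plain text", "@cmd"]
escaped = Enum.map(row, &FormulaEscapeSketch.escape/1)
IO.inspect(escaped)
```

With the library itself, passing the `escape_formulas` option when encoding is meant to cover this case, so a hand-written step like this should not be needed.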
2.4.1 (2020-09-12)
- Fix unnecessary escaping of delimiters when encoding. Fixes #70 reported by @karmajunkie.
2.4.0 (2020-09-12)
- Fix StrayQuoteError not getting passed the correct arguments in strict mode. Fixes #96.
- When headers are present multiple times and the `:headers` option is set to `true`, parse the values into a list. Contributed by @MrAlexLau in PR #97.
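The duplicate-header behaviour can be pictured with a small sketch. This is an illustration of the idea, not the library's code: the module `DuplicateHeadersSketch`, its `to_map/2` function, and the ordering of values inside the list are all assumptions.

```elixir
defmodule DuplicateHeadersSketch do
  # Pair each header with its value. When a header repeats, collect the
  # values in a list instead of overwriting the earlier one.
  def to_map(headers, row) do
    headers
    |> Enum.zip(row)
    |> Enum.reduce(%{}, fn {header, value}, acc ->
      Map.update(acc, header, value, fn
        # Third and later occurrences: append to the existing list.
        existing when is_list(existing) -> existing ++ [value]
        # Second occurrence: turn the single value into a two-element list.
        existing -> [existing, value]
      end)
    end)
  end
end

# Hypothetical header line and row; header "a" appears twice.
result = DuplicateHeadersSketch.to_map(["a", "a", "b"], ["1", "2", "3"])
IO.inspect(result)
```

Headers that appear only once keep a plain string value, so code consuming such maps has to handle both shapes.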
2.3.1 (2019-03-30)
- Fix StrayQuoteError incorrectly getting raised when escape sequences end in new lines. Fixes #89. Raised by @rockwood in Issue #96.
2.3.0 (2019-03-17)
- Add StrayQuoteError which gets raised when a row has stray quotes rather than EscapeSequenceError to help with common encoding errors.
2.2.0 (2019-03-03)
- Make syntax compatible with latest Elixir releases
- Add `validate_row_length:` option defaulting to true to allow disabling validation of row length.
2.0.0 (2017-05-29)
- Make `decode` return row and error tuples instead of raising errors directly
- Make old behaviour of raising errors directly available via `decode!`
- Improve error messages for escape sequences
- Rewrite parts of the pipeline to be more modular
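Since `decode` yields tuples rather than raising, callers decide what to do with failed rows. The sketch below shows one way to separate the two kinds of result. The `results` list is a hypothetical stand-in for the decoded stream, and the `{:ok, row}` / `{:error, message}` shapes and the error text are assumptions for illustration.

```elixir
# Hypothetical stand-in for the results produced by decode.
results = [
  {:ok, ["a", "b"]},
  {:error, "hypothetical error message for row 2"},
  {:ok, ["c", "d"]}
]

# Split successful rows from failures in a single pass.
{oks, errors} = Enum.split_with(results, &match?({:ok, _}, &1))

rows = Enum.map(oks, fn {:ok, row} -> row end)
messages = Enum.map(errors, fn {:error, message} -> message end)

IO.inspect(rows, label: "rows")
IO.inspect(messages, label: "errors")
```

Code that prefers the pre-2.0 behaviour of failing on the first bad row can use `decode!` instead.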
1.4.4 (2016-11-12)
- Load `parallel_stream` as an app dependency to avoid load level errors. See issue #56 reported by @luk3thomas
1.4.3 (2016-08-27)
- Fix a case where lines would not be aggregated correctly. See #52 reported by @yury-dimov
1.4.2 (2016-06-20)
- Update dependency on `parallel_stream`
1.4.1 (2016-05-21)
- Fix condition where rows would be dropped when decoding from stateful streams. See #39 reported by @moxley
1.4.0 (2016-04-03)
- Add option to specify headers in encode. Added in #34 by @barruumrex
1.3.3 (2016-03-25)
1.3.2 (2016-03-08)
- Cleanup, removing some unused defaults in function headers to remove compile time warnings
1.3.1 (2016-03-08)
- Fix `:strip_cells` not stripping cells when multiple options are specified - #29 by @tomjoro
1.3.0 (2016-03-01)
- Now supports linebreaks inside escaped fields (#13)
- Raises an error when row length mismatches across rows
- Uses parallel_stream for parallelism
1.2.4 (2016-02-06)
- Fix encoding of double quotes
1.2.3 (2016-01-19)
- Fix a condition where headers: true would enumerate the whole file once before parsing
1.2.2 (2016-01-02)
- Fix default num_pipes argument to evaluate num_pipes dependent on scheduler at runtime
- Test utf-8 files with BOM
- Syntax and mix updates for elixir 1.2
1.2.1 (2015-10-17)
- Decoder performance optimisations
1.2.0 (2015-10-11)
- Use `Stream.transform/4` - incompatible with Elixir <1.1.0
1.1.5 (2015-10-11)
- Decoder refactor from `Stream.resource/3` to `Stream.transform/3` in order to get more predictable stream behaviour
- Rows now get processed in order
- Fix a bug where stream would get evaluated before being decoded
1.1.4 (2015-09-13)
- Fix a bug where headers could be out of order
1.1.3 (2015-09-12)
- Fix a bug where headers could get parsed as the first row
1.1.2 (2015-09-05)
- Fix a bug where calls to decode with num_pipes: 1 would yield varying results due to leftover state in decoder message queue
1.1.1 (2015-07-14)
- Rescue from errors in stream producer to get more predictable behaviour in case of failure
1.1.0 (2015-07-12)
- Better error messages when encountering invalid encodings
1.0.1 (2015-07-11)
- Indicate `consolidate_protocols` for better encoding performance
1.0.0 (2015-05-24)
- Use bytes as separators
0.2.3 (2015-05-24)
- Add benchmarking
0.2.2 (2015-05-20)
- Use utf-8 bytes instead of codepoints for multi-byte parsing
0.2.1 (2015-05-20)
- Fix handling of multi-byte utf-8 characters
0.2.0 (2015-03-25)
- Implement encoder protocol