Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
[v0.11.1] - 2025-08-17
Added
- Explorer.DataFrame.dump_ipc_schema
- Explorer.DataFrame.dump_ipc_record_batch
- Explorer.Series.cumulative_count
- :stable option for Explorer.DataFrame.group_by
Fixed
- Fix printing lazy data frame with new default print options (as of v0.11.0)
- Remove print from dataframe test
- Fix mutate docs formatting
New Contributors
- @WolfDan made their first contribution in https://github.com/elixir-explorer/explorer/pull/1103
Full Changelog
https://github.com/elixir-explorer/explorer/compare/v0.11.0...v0.11.1
[v0.11.0] - 2025-07-12
Added
- Explorer.DataFrame.estimated_size/1 - Estimates memory size of a DataFrame
- Explorer.DataFrame.to_table_string/2 - Represents a DataFrame as a string for printing
- Explorer.Series.degrees/1 - Converts radians to degrees
- Explorer.Series.radians/1 - Converts degrees to radians
- :quote_style option to CSV functions
Fixed
- Fix bug where :region was incorrectly required in %FSS.S3.Entry{}
- Fix trigonometric functions to not raise on f32
- Fix warning from :table_rex dependency when printing
- Fix formatting of Explorer.DataFrame.mutate_with/2 options
- Explorer.Series.fill_missing/2 now works for all integer and float dtypes
- Explorer.Series.frequencies/1 now works for {:list, _} dtype
- Fix inefficiency with categorization
- Fix typespecs
  - Explorer.DataFrame.select/2
  - Explorer.DataFrame.ungroup/1
  - Explorer.Series functions that may return lazy series
Changed
- Printing a DataFrame looks different
  - Adds a row of … to indicate there are hidden rows. Includes a new option limit_dots: :bottom | :split to specify how to do this.
  - Drops the row separators except when composite dtypes are present.
  - Allows you to pass through valid options to TableRex.render!/2. This gives you a little more flexibility in case you don't like the defaults.
- Explorer.DataFrame.print/1 now documents its default :limit of 5 rows
- Explorer.DataFrame.concat_rows/1 has improved error messages
- Accessing a DataFrame with a range now raises if the range is out of bounds
New Contributors
- @szajbus made their first contribution in https://github.com/elixir-explorer/explorer/pull/1030
- @viniciussbs made their first contribution in https://github.com/elixir-explorer/explorer/pull/1037
- @pejrich made their first contribution in https://github.com/elixir-explorer/explorer/pull/1040
- @jdbarillas made their first contribution in https://github.com/elixir-explorer/explorer/pull/1049
- @petrkozorezov made their first contribution in https://github.com/elixir-explorer/explorer/pull/1083
Full Changelog
https://github.com/elixir-explorer/explorer/compare/v0.10.1...v0.11.0
v0.10.1 - 2024-11-28
Fixed
- Fix creation of series of {:list, {:decimal, ...}} containing empty lists.
- Use i128 for :coef field in the Rust code.

  This field is a positive, arbitrary precision integer on the Elixir side. It's convenient to represent it as a signed i128 because that's what the Decimal dtype expects. While you could technically create an ExDecimal struct with a negative coef, it's not a practical concern.
- Fix Explorer.DataFrame.print/1 for empty dataframes.
- Fix datetime encoding overflow.

  Previously we always converted first to a microsecond-based representation and then to the final representation. The intermediate conversion is unnecessary and risks overflows when trying to convert to a different time unit later. This approach converts directly to i64 from the Elixir struct and time unit.
- Encode millisecond precision for time and datetime series.
- Fix list struct print bug.

  Fixes an issue where we can't print columns with a dtype like {:list, {:struct, ...}} where the root of the tree isn't a :struct but it contains a :struct.
Deprecated
- Remove documentation for deprecated functions to_date/1 and to_time/1. These Explorer.Series functions will be removed soon.
v0.10.0 - 2024-10-23
Added
- Add support for the decimals data type.

  Decimals dtypes are represented by the {:decimal, precision, scale} tuple, where precision can be a positive integer from 0 to 38, and is the maximum number of digits that can be represented by the decimal. The scale is the number of digits after the decimal point.

  With this addition, we also added the :decimal package as a new dependency. The Explorer.Series.from_list/2 function accepts decimal numbers from that package as values - %Decimal{}.

  This version has a small number of operations, but is a good foundation.
- Allow the usage of queries and lazy series outside callbacks and macros. This is an improvement to functions that were originally designed to accept callbacks. With this change you can now reuse lazy series across different "queries". See the Explorer.Query docs for details.

  The affected functions are:
- Allow accessing the dataframe inside query.
- Add "lazy read" support for Parquet and NDJSON from HTTP(s).
- Expose more options for Explorer.Series.cut/3 and Explorer.Series.qcut/3. These options were available in Polars, but not in our APIs.
Fixed
- Fix creation of series where a nil value inside a list - for a {:list, any()} dtype - could result in an incompatible dtype. This fix will prevent panics for list of lists with nil entries.
- Fix Explorer.DataFrame.dump_ndjson/2 when date time is in use.
- Fix Explorer.Series.product/1 for lazy series.
- Accept %FSS.HTTP.Entry{} structs in functions like Explorer.DataFrame.from_parquet/2.
- Fix encode of binaries to terms from series of the {:struct, any()} dtype. In case the inner fields of the struct had any binary (:binary dtype), it was causing a panic.
Changed
- Change the defaults of the functions Explorer.Series.cut/3 and Explorer.Series.qcut/3 to not include a "break points" column in the resultant dataframe. The :include_breaks option is now false by default.
v0.9.2 - 2024-08-27
Added
- Add a new :keep option to the mutate_with/3 function and mutate/3 macro. This option allows users to control which columns are retained in the output dataframe after a mutation operation. You can use :all (the default) or :none.
Fixed
- Fix handling of "LazySeries" with remote dataframes.
- Fix typespecs of Explorer.Series.cast/2 by adding a dtype_alias() type.
v0.9.1 - 2024-08-15
Added
- Add support for saving to the cloud using streaming and the IPC format. This will enable saving a lazy frame to the cloud without loading it entirely in memory. It only supports saves to S3-compatible storage services.
Changed
- Force garbage collection on remote gc.
Fixed
- Re-enable support for saving to the cloud using streaming and the Parquet format. It's a fix from the release of v0.9.0 that disabled this feature.
- Fix overwrite of dtypes for Explorer.DataFrame.load_csv/2. This was a regression introduced in v0.9.0.
v0.9.0 - 2024-07-26
Added
- Add initial support for SQL queries.

  Explorer.DataFrame.sql/3 is a function that accepts a dataframe and a SQL query. The SQL is not validated by Explorer, so the queries will be backend dependent. Right now we have only Polars as the backend.
- Add support for remote series and dataframes.

  Automatically transfer data between nodes for remote series and dataframes and perform distributed garbage collection.

  The functions in Explorer.DataFrame and Explorer.Series will automatically move operations on remote dataframes to the nodes they belong to. The Explorer.Remote module provides additional conveniences for manual placement.
- Add FLAME integration, so we automatically track remote series and dataframes returned from FLAME calls when the :track_resources option is enabled. See FLAME for more.
- Add Explorer.DataFrame.transform/3 that applies an Elixir function to each row. This function is similar to Explorer.Series.transform/2, and as such, it's considered an expensive operation. So it's recommended only if there is no similar dataframe or series operation available.
- Improve performance of Explorer.Series.from_list/2 for most of the cases where the :dtype option is given. This is especially true when the dtype is :binary.
Changed
- Stop inference of dtypes if the :dtype option is given by the user. The main goal of this change is to improve performance. We are now delegating the job of decoding the terms as the given :dtype to the backend.
- Explorer.Series.pow/2 no longer casts to float when the exponent is a signed integer. We are following the way Polars works now, which is to try to execute the operation or raise an exception in case the exponent is negative.
- Explorer.DataFrame.pivot_wider/4 no longer includes the names_from column name in the new columns when values_from is a list of columns. This is more consistent with its behaviour when values_from is a single column.
- Explorer.Series.substring/3 no longer cycles to the end of the string if the negative offset surpasses the beginning of that string. In that case, an empty string is returned.
- The Explorer.Series.ewm_* functions no longer replace nil values with the value at the previous index. They now propagate nil values through to the result series.
- Saving a dataframe as a Parquet file to S3 services no longer works when streaming is enabled. This is temporary due to a bug in Polars. An exception should be raised instead.
v0.8.3 - 2024-06-10
Added
- Add new data type for datetimes with timezones: {:datetime, precision, time_zone}. The old dtype is now {:naive_datetime, precision}.
- Add option to rechunk the dataframes when using Explorer.DataFrame.from_parquet/3
Changed
- Change the {:datetime, precision} dtype to {:naive_datetime, precision}. The idea is to mirror Elixir's datetime, and introduce support for time zones. Please note: {:datetime, precision} will work as an alias for {:naive_datetime, precision} for now but will raise a warning. The alias will be removed in a future release.
- Literal %NaiveDateTime{} structs used in expressions will now have :microsecond precision. Previously they defaulted to :nanosecond precision. This was incorrect because %NaiveDateTime{} structs only have :microsecond precision.
Fixed
- Fix regression in Explorer.DataFrame.concat_rows/2. It's possible to concat dataframes that are not aligned again.
- Fix "is_finite" and "is_infinite" from Series to work in the context of an Explorer.Query.
v0.8.2 - 2024-04-22
Added
- Add functions to work with strings and regexes.

  Some of the functions have the prefix "re_", because they accept a string that represents a regular expression.

  There is an important detail: we do not accept Elixir regexes, because we cannot guarantee that the backend supports them. Instead we accept a plain string that is "escaped". This means that you can use the ~S sigil to build that string. Example: ~S/(a|b)/.

  The added functions are the following:

  - Explorer.Series.split_into/3 - split a string series into a struct of string fields. This function accepts a string as a separator.
  - Explorer.Series.re_contains/2 - check if the string series matches the regex pattern. Like the "non regex" counterpart, it returns a boolean series.
  - Explorer.Series.re_replace/3 - replaces all occurrences of a pattern with replacement in string series. The replacement can refer to group captures by using ${x}, where x is the group index (starts with 1) or name.
  - Explorer.Series.count_matches/2 - count how many times a substring appears in a string series.
  - Explorer.Series.re_count_matches/2 - count how many times a pattern matches in a string series.
  - Explorer.Series.re_scan/2 - scan for all matches for the given regex pattern. This is going to result in a series of lists of strings - {:list, :string}.
  - Explorer.Series.re_named_captures/2 - extract all capture groups as a struct for the given regex pattern. In case the groups are not named, their positions are used as names.
- Enable the usage of system certificates if OTP version 25 or above.
- Add support for the :streaming option in Explorer.DataFrame.to_csv/3.
- Support operations with groups in the Lazy Polars backend. This change makes the lazy frame implementation more useful, by supporting the usage of groups in the following functions:
  - Explorer.DataFrame.filter_with/2 and the macro version of it, filter/2.
  - Explorer.DataFrame.sort_with/3, although it ignores "maintain order" and "nulls last" options when used with groups.
  - Explorer.DataFrame.mutate_with/2 and its macro version, mutate/2.
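As a rough illustration of the regex functions added in this release, the sketch below builds a pattern with the ~S sigil and passes it to re_contains/2 and re_replace/3. The sample data, the variable names, and the `~> 0.11` version requirement passed to Mix.install are assumptions made for this example only.

```elixir
# Assumption: running as a standalone script, so the dependency is fetched here.
Mix.install([{:explorer, "~> 0.11"}])

alias Explorer.Series

# Hypothetical sample data, for illustration only.
words = Series.from_list(["apple", "berry", "cherry"])

# ~S builds the pattern as a plain string, with no interpolation or escapes.
pattern = ~S/(a|b)/

# Boolean series: does each string match the pattern somewhere?
matches = Series.re_contains(words, pattern)

# String series: every match of the pattern is replaced with "_".
replaced = Series.re_replace(words, pattern, "_")

IO.inspect(Series.to_list(matches))
IO.inspect(Series.to_list(replaced))
```

Passing an Elixir regex such as ~r/(a|b)/ is not expected to work here, since these functions take the pattern as a string.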
Changed
- We now avoid raising an exception if a non existent column is used in Explorer.DataFrame.discard/2.
- Make the dependency of cacerts optional. This is because people using Erlang/OTP 25 or above can use the certificates provided by the system. So you may need to add the dependency of cacerts if your OTP version is older than that.
- Some precision differences in float operations may appear. This is due to an update in the Polars version to "v0.38.1". Polars is our default backend.
Fixed
- Fix Explorer.Series.split/2 inside the context of Explorer.Query.
- Add optional X-Amz-Security-Token header to S3 request. This is needed in case the user is passing down a token for authentication.
- Fix Explorer.DataFrame.sort_by/3 with groups to respect :nils option. This is considering only the eager implementation.
- Fix inspection of lazy frames in remote nodes.
v0.8.1 - 2024-02-24
Added
- Add Explorer.Series.field/2 to extract a field from a struct series. It returns a new series with the field's dtype.
- Add Explorer.Series.json_decode/2 that can decode a string series containing valid JSON objects according to dtype.
- Add eager count/1 and lazy size/1 to Explorer.Series.
- Add support for maps as expressions inside Explorer.Query. They are "converted" to structs.
- Add json_path_match/2 to extract a string series from a string containing valid JSON objects. See the article JSONPath - XPath for JSON for details about JSON paths.
- Add Explorer.Series.row_index/1 to retrieve the index of rows starting from 0.
- Add support for passing the :on column directly (instead of inside a list) in Explorer.DataFrame.join/3.
Changed
- Remove some deprecated functions from documentation.
- Change internal representation of the :struct dtype to use list of tuples instead of a map to represent the dtypes of each field. This shouldn't break because we normalise maps to lists when a struct dtype is passed in from_list/2 or cast/2.
- Update Rustler minimum version to ~> 0.31. Since Rustler is optional, this shouldn't affect most of the users.
Fixed
- Fix float overflow error to avoid crashing the VM. It now returns an argument error instead.
- Fix Explorer.DataFrame.print/2 for when the DF contains structs.
v0.8.0 - 2024-01-20
Added
- Add explode/2 to Explorer.DataFrame. This function is useful to expand the contents of a {:list, inner_dtype} series into an "inner_dtype" series.
- Add the new series functions all?/1 and any?/1, to work with boolean series.
- Add support for the "struct" dtype. This new dtype represents the struct dtype from Polars/Arrow.
- Add map/2 and map_with/2 to the Explorer.Series module. This change enables the usage of the Explorer.Query features in a series.
- Add sort_by/2 and sort_with/2 to the Explorer.Series module. This change enables the usage of the lazy computations and the Explorer.Query module.
- Add unnest/2 to Explorer.DataFrame. It works by taking the fields of a "struct" - the new dtype - and transforming them into columns.
- Add pairwise correlation - Explorer.DataFrame.correlation/2 - to calculate the correlation between numeric columns inside a data frame.
- Add pairwise covariance - Explorer.DataFrame.covariance/2 - to calculate the covariance between numeric columns inside a data frame.
- Add support for more integer dtypes. This change introduces new signed and unsigned integer dtypes:
  - {:s, 8}, {:s, 16}, {:s, 32}
  - {:u, 8}, {:u, 16}, {:u, 32}, {:u, 64}.

  The existing :integer dtype is now represented as {:s, 64}, and it's still the default dtype for integers. But series and data frames can now work with the new dtypes. Short names for these new dtypes can be used in functions like Explorer.Series.from_list/2. For example, {:u, 32} can be represented with the atom :u32.

  This may bring more interoperability with Nx, and with Arrow related things, like ADBC and Parquet.
- Add ewm_standard_deviation/2 and ewm_variance/2 to Explorer.Series. They calculate the "exponentially weighted moving" variance and standard deviation.
- Add support for :skip_rows_after_header option for the CSV reader functions.
- Support {:list, numeric_dtype} for Explorer.Series.frequencies/1.
- Support pins in cond, inside the context of Explorer.Query.
- Introduce the :null dtype. This is a special dtype from Polars and Apache Arrow to represent "all null" series.
- Add Explorer.DataFrame.transpose/2 to transpose a data frame.
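A minimal sketch of the new integer dtypes described above, assuming a standalone script: it shows the default dtype for integers next to an unsigned 32-bit series created with the short atom name. The values and the `~> 0.11` version requirement are assumptions for illustration.

```elixir
# Assumption: running as a standalone script, so the dependency is fetched here.
Mix.install([{:explorer, "~> 0.11"}])

alias Explorer.Series

# Without a :dtype option, integers use the default signed 64-bit dtype.
default = Series.from_list([1, 2, 3])

# The short atom :u32 and the tuple {:u, 32} name the same dtype.
short_name = Series.from_list([1, 2, 3], dtype: :u32)
tuple_name = Series.from_list([1, 2, 3], dtype: {:u, 32})

IO.inspect(Series.dtype(default))
IO.inspect(Series.dtype(short_name))
IO.inspect(Series.dtype(tuple_name))
```

Code that previously compared a dtype against :integer would need to compare against {:s, 64} instead, since that is what Explorer.Series.dtype/1 returns.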
Changed
- Rename the functions related to sorting/arranging of the Explorer.DataFrame. Now arrange_with is named sort_with, and arrange is sort_by.

  The sort_by/3 is a macro and it is going to work using the Explorer.Query module. On the other side, the sort_with/2 uses a callback function.
- Remove unnecessary casts to {:s, 64} now that we support more integer dtypes. It affects some functions, like the following in the Explorer.Series module:
  - argsort
  - count
  - rank
  - day_of_week, day_of_year, week_of_year, month, year, hour, minute, second
  - abs
  - clip
  - lengths
  - slice
  - n_distinct
  - frequencies

  And also some functions from the Explorer.DataFrame module:
  - mutate - mostly because of series changes
  - summarise - mostly because of series changes
  - slice
Fixed
- Fix inspection of series and data frames between nodes.
- Fix cast of :string series to {:datetime, any()}
- Fix mismatched types in Explorer.Series.pow/2, making it more consistent.
- Normalize sorting options.
- Fix functions with dtype mismatching the result from Polars. This fix is affecting the following functions:
  - quantile/2 in the context of a lazy series
  - mode/1 inside a summarisation
  - strftime/2 in the context of a lazy series
  - mutate_with/2 when creating a column from a NaiveDateTime or Explorer.Duration.
v0.7.2 - 2023-11-30
Added
- Add the functions day_of_year/1 and week_of_year/1 to Explorer.Series.
- Add filter/2 - a macro -, and filter_with/2 to Explorer.Series.

  This change enables the usage of queries - using Explorer.Query - when filtering a series. The main difference is that series does not have a name when used outside a dataframe. So to refer to itself inside the query, we can use the special _ variable.

      iex> s = Explorer.Series.from_list([1, 2, 3])
      iex> Explorer.Series.filter(s, _ > 2)
      #Explorer.Series<
        Polars[1]
        integer [3]
      >
- Add support for the {:list, any()} dtype, where any() can be any other valid dtype. This is a recursive dtype, that can represent nested lists. It's useful to group data together in the same series.
- Add Explorer.Series.mode/2 to get the most common value(s) of the series.
- Add split/2 and join/2 to the Explorer.Series module. These functions are useful to split string series into {:list, :string}, or to join parts of a {:list, :string} and return a :string series.
- Expose ddof option for variance, covariance and standard deviation.
- Add a new {:f, 32} dtype to represent 32 bits float series. It's also possible to use the atom :f32 to create this type of series. The atom :f64 can be used as an alias for {:f, 64}, just like the :float atom.
- Add lengths/1 and member?/2 to Explorer.Series. These functions work with {:list, any()}, where any() is any valid dtype. The idea is to count the members of a "list" series, and check if a given value is member of a list series, respectively.
- Add support for streaming parquet files from a lazy dataframe to AWS S3 compatible services.
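The list-related additions above can be combined in a small sketch: split/2 turns a string series into a {:list, :string} series, lengths/1 counts the members of each list, and join/2 goes back to a string series. The sample strings, the separators, and the `~> 0.11` version requirement are assumptions for illustration.

```elixir
# Assumption: running as a standalone script, so the dependency is fetched here.
Mix.install([{:explorer, "~> 0.11"}])

alias Explorer.Series

# Hypothetical sample data, for illustration only.
phrases = Series.from_list(["a b", "c d e"])

# Split each string on a single space, giving a series of lists of strings.
parts = Series.split(phrases, " ")

# Count how many members each list has.
sizes = Series.lengths(parts)

# Join the members of each list back into one string, using "-" as separator.
joined = Series.join(parts, "-")

IO.inspect(Series.to_list(parts))
IO.inspect(Series.to_list(sizes))
IO.inspect(Series.to_list(joined))
```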
Changed
- Remove restriction on pivot_wider dtypes. In the early days, Polars only supported numeric dtypes for the "first" aggregation. This is not true anymore, and we can lift this restriction.
- Change :float dtype to be represented as {:f, 64}. It's still possible to use the atom :float to create float series, but now Explorer.Series.dtype/1 returns {:f, 64} for float 64 bits series.
Fixed
- Add missing implementation of Explorer.Series.replace/3 for lazy series.
- Fix inspection of DFs and series when limit: :infinity is used.
Removed
- Drop support for the riscv64gc-unknown-linux-gnu target.

  We decided to stop precompiling to this target because it's been hard to maintain it. Ideally we should support it again in the future.
v0.7.1 - 2023-09-25
Added
- Add more temporal arithmetic operations. This change makes it possible to mix some datatypes, like date, duration and scalar types like integers and floats.

  The following operations are possible now:
  - date - date
  - date + duration
  - date - duration
  - duration + date
  - duration * integer
  - duration * float
  - duration / integer
  - duration / float
  - integer * duration
  - float * duration
- Support lazy dataframes on Explorer.DataFrame.print/2.
- Add support for strings as the "indexes" of Explorer.Series.categorise/2. This makes it possible to categorise a string series with a categories series.
- Introduce cond/1 support in queries, which enables multi-clause conditions. Example of usage:

      iex> df = DF.new(a: [10, 4, 6])
      iex> DF.mutate(df,
      ...>   b:
      ...>     cond do
      ...>       a > 9 -> "Exceptional"
      ...>       a > 5 -> "Passed"
      ...>       true -> "Failed"
      ...>     end
      ...> )
      #Explorer.DataFrame<
        Polars[3 x 2]
        a integer [10, 4, 6]
        b string ["Exceptional", "Failed", "Passed"]
      >
- Similar to cond/1, this version also introduces support for the if/2 and unless/2 macros inside queries.
- Allow the usage of scalar booleans inside queries.
- Add Explorer.Series.replace/3 for string series. This enables the replacement of patterns inside string series.
Deprecated
- Deprecate Explorer.DataFrame.to_lazy/1 in favor of just lazy/1.
Fixed
- Fix the Explorer.Series.in/2 function to work with series of the :category dtype.

  Now, if both series share the same categories, we can compare them. To make sure that a categorical series shares the same categories as another series, you must create that series using the Explorer.Series.categorise/2 function.
- Display the dtype of duration series correctly in Explorer.DataFrame.print/2.
v0.7.0 - 2023-08-28
Added
- Enable reads and writes of dataframes from/to external file systems.

  It supports HTTP(s) URLs or AWS S3 locations.

  This feature introduces the FSS abstraction, which is also going to be present in newer versions of Kino. This is going to make the integration of Livebook files with Explorer much easier.

  The implementation is done differently, depending on which file format is used, and if it's a read or write. All the writes to AWS S3 are done in the Rust side - using an abstraction called CloudWriter - and most of the readers are implemented in Elixir, by doing a download of the files, and then loading the dataframe from it. The only exception is the reads of parquet files, which are done in Rust, using Polars' scan_parquet with streaming.

  We want to give a special thanks to Qqwy / Marten for the CloudWriter implementation!
- Add ADBC: Arrow Database Connectivity.

  Continuing with improvements in the IO area, we added support for reading dataframes from databases using ADBC, which is similar in idea to ODBC, but integrates much better with Apache Arrow, which is the backbone of Polars - our backend today.

  The function Explorer.DataFrame.from_query/1 is the entrypoint for this feature, and it allows querying databases like PostgreSQL, SQLite and Snowflake.

  Check the Elixir ADBC bindings docs for more information.

  For this feature, we had a fundamental contribution from Cocoa in the ADBC bindings, so we want to say a special thanks to her!

  We want to thank the people that joined José in his live streamings on Twitch, and helped to build this feature!
- Add the following functions to Explorer.Series:
- Add duration dtypes. This adds the following dtypes:
  - {:duration, :nanosecond}
  - {:duration, :microsecond}
  - {:duration, :millisecond}

  This feature was a great contribution from Billy Lanchantin, and we want to thank him for this!
Changed
- Return exception structs instead of strings for all IO operation errors, and for anything that returns an error from the NIF integration.

  This change makes it easier to define which type of error we want to raise.
- Update Polars to v0.32.

  With that we made some minor API changes, like changing some options for cut/qcut operations in the Explorer.Series module.
- Use nil_values instead of null_character for IO operations.
- Never expect nil for CSV IO dtypes.
- Rename Explorer.DataFrame.table/2 to Explorer.DataFrame.print/2.
- Change :datetime dtype to be {:datetime, time_unit}, where time unit can be the following:
  - :millisecond
  - :microsecond
  - :nanosecond
- Rename the following Series functions:
  - trim/1 to strip/2
  - trim_leading/1 to lstrip/2
  - trim_trailing/1 to rstrip/2

  These functions now support a string argument.
Fixed
- Fix warnings for the upcoming Elixir v1.16.
- Fix Explorer.Series.abs/1 type specs.
- Allow comparison of strings with categories.
- Fix Explorer.Series.is_nan/1 inside the context of Explorer.Query. The NIF function was not being exported.
v0.6.1 - 2023-07-06
Fixed
- Fix summarise without groups for lazy frames.
v0.6.0 - 2023-07-05
Added
- Add support for OTP 26 and Elixir 1.15.
- Allow Explorer.DataFrame.summarise/2 to work without groups. The aggregations can work considering the entire dataframe.
- Add the following series functions: product/1, cumulative_product/1, abs/1, skew/2, window_standard_deviation/3, rank/2, year/1, month/1, day/1, hour/1, minute/1, second/1, strptime/2, strftime/2, argmin/1, argmax/1, cut/3, qcut/3, correlation/3, covariance/2 and clip/3.

  They cover a lot in terms of functionality, so please check the Explorer.Series docs for further details.
- Add Explorer.DataFrame.nil_count/1 that counts the number of nil elements in each column.
- Add Explorer.DataFrame.frequencies/2 that creates a new dataframe with unique rows and the frequencies of each.
- Add Explorer.DataFrame.relocate/3 that enables changing the order of columns in a dataframe.
- Add precompiled NIFs for FreeBSD.
- Support scalar values in the on_true and on_false arguments of Explorer.Series.select/3.
Fixed
- Fix Series.day_of_week/1 and Series.round/2 for operations using a lazy frame.
- Fix upcasted date to datetime for literal expressions. It allows using scalar dates in expressions like this: DF.mutate(a: ~D[2023-01-01]). This also fixes the support for naive datetimes.
- Improve error messages returned from the NIF to be always strings. Now we add more context to the string returned, instead of having {:context, error_message}.
- Fix the :infer_schema_length option of Explorer.DataFrame.from_csv/2 when passing nil. Now it's possible to take into account the entire file to infer the schema.
Deprecated
- Deprecate Explorer.Series.to_date/1 and Explorer.Series.to_time/1 in favor of using Explorer.Series.cast(s, :date) and Explorer.Series.cast(s, :time) respectively.
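A short sketch of the replacement suggested by this deprecation, assuming a series of naive datetimes: cast/2 with :date or :time takes the place of the deprecated to_date/1 and to_time/1. The sample values and the `~> 0.11` version requirement are assumptions for illustration.

```elixir
# Assumption: running as a standalone script, so the dependency is fetched here.
Mix.install([{:explorer, "~> 0.11"}])

alias Explorer.Series

# Hypothetical sample data, for illustration only.
stamps = Series.from_list([~N[2023-01-01 10:30:00], ~N[2023-06-15 23:59:59]])

# Instead of the deprecated Series.to_date/1:
dates = Series.cast(stamps, :date)

# Instead of the deprecated Series.to_time/1:
times = Series.cast(stamps, :time)

IO.inspect(Series.to_list(dates))
IO.inspect(Series.dtype(times))
```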
v0.5.7 - 2023-05-10
Added
- Allow Explorer.Series.select/3 to receive series of size 1 in both sides.
- Add trigonometric functions sin/1, cos/1, tan/1, asin/1, acos/1 and atan/1 to Explorer.Series.
- Add Explorer.DataFrame.to_rows_stream/2 function. This is useful to traverse dataframes with large series, but is not recommended since it can be an expensive operation.
- Add LazyFrame version of Explorer.DataFrame.to_ipc/3.
- Add options to control streaming when writing lazy dataframes. Now users can toggle streaming for the to_ipc/3 and to_parquet/3 functions.
- Add a lazy version of Explorer.DataFrame.from_ipc_stream/2 that uses the eager implementation underneath.
- Add option to control the end of line (EOL) char when reading CSV files. We call this new option :eol_delimiter, and it's available for the from_csv/2 and load_csv/2 functions in the Explorer.DataFrame module.
- Allow Explorer.DataFrame.pivot_wider/4 to use category fields.
Fixed
- Fix nif_not_loaded error when Explorer.Series.ewm_mean/2 is called from query.
- Type check arguments for boolean series operations, only allowing series of the boolean dtype.
- Do not use ../0 in order to keep compatible with Elixir 1.13
Removed
- Temporarily remove support for ARM 32 bits computers in the precompilation workflow.
v0.5.6 - 2023-03-24
Added
- Add the following functions to the Explorer.Series module: log/1, log/2 and exp/1. They compute the logarithm and exponential of a series.
Fixed
- Allow Explorer.Series.select/3 to receive series of size 1 for both the on_true and on_false arguments.
- Fix the encoding of special float values that may return from some series functions. This is going to encode the atoms for NaN and infinity values.
v0.5.5 - 2023-03-13
Added
- Add support for multiple value columns in pivot wider. The resultant dataframe that is created from this type of pivoting is going to have columns with the names prefixed by the original value column, followed by an underscore and the name of the variable.
- Add Explorer.Series.ewm_mean/2 for calculating exponentially weighted moving average.
Changed
- Change the Explorer.Backend.DataFrame's pivot_wider callback to work with multiple columns instead of only one.
- Change the Explorer.Backend.DataFrame's window_* callbacks to work with variables instead of keyword args. This is needed to make explicit when a backend is not implementing an option.
- Change the Explorer.Backend.DataFrame's describe callback and remove the need for an "out df", since we won't have a lazy version of that function.
- This shouldn't affect the API, but we had an update in Polars. It is now using v0.27.2. For further details, see: Rust Polars 0.27.0.
Fixed
- Provide hints when converting string/binary series to tensors.
- Add libatomic as a link to the generated NIF. This is needed to fix the load of the Explorer NIF when running on ARM 32 bits machines like the Pi 3. See the original issue.
v0.5.4 - 2023-03-09
Fixed
- Fix missing "README.md" file in the list of package files. Our readme is now required in compilation, because it contains the moduledoc for the main Explorer module.
v0.5.3 - 2023-03-08
Added
- Add the Explorer.Series.format/1 function that concatenates multiple series together, always returning a string series.
- With the addition of format/1, we also have a new operator for string concatenation inside Explorer.Query. It is the <> operator, that is similar to what the Kernel.<>/2 operator does, but instead of concatenating strings, it concatenates two series, returning a string series - it is using format/1 underneath.
- Add support for slicing by series in dataframes and other series.
- Add support for 2D tensors in Explorer.DataFrame.new/2.
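As a hedged illustration of the <> operator inside Explorer.Query, the sketch below concatenates two string columns into a new one with the mutate macro. The column names, the sample values, and the `~> 0.11` version requirement are assumptions made for this example.

```elixir
# Assumption: running as a standalone script, so the dependency is fetched here.
Mix.install([{:explorer, "~> 0.11"}])

# mutate/2 is a macro, so the module must be required before use.
require Explorer.DataFrame, as: DF
alias Explorer.Series

# Hypothetical sample data, for illustration only.
df = DF.new(prefix: ["ab", "cd"], suffix: ["12", "34"])

# Inside the query, bare names refer to columns and <> concatenates them.
with_code = DF.mutate(df, code: prefix <> suffix)

codes = with_code |> DF.pull("code") |> Series.to_list()
IO.inspect(codes)
```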
Fixed
- Fix Explorer.DataFrame.new/2 to respect the selected dtype when an entire series is nil.
- Improve error message for mismatched dtypes in series operations.
- Fix lazy series operations of binary series and binary values. This is going to wrap binary values in the correct dtype, in order to pass down to Polars.
- Fix two bugs in Explorer.DataFrame.pivot_wider/3: nil values in the series that is used for new column names now correctly create a nil column. We also fixed the problem of a duplicated column created after pivoting, and possibly conflicting with an existing ID column. We add a suffix for these columns.
v0.5.2 - 2023-02-28
Added
- Add across and comprehensions to Explorer.Query. These features allow a more flexible and elegant way to work with multiple columns at once. Example:

      iris = Explorer.Datasets.iris()

      Explorer.DataFrame.mutate(iris,
        for col <- across(["sepal_width", "sepal_length", "petal_length", "petal_width"]) do
          {col.name, (col - mean(col)) / variance(col)}
        end
      )

  See the Explorer.Query documentation for further details.
- Add support for regexes to select columns of a dataframe. Example:

      df = Explorer.Datasets.wine()
      df[~r/(class|hue)/]
- Add the :max_rows and :columns options to Explorer.DataFrame.from_parquet/2. This mirrors the from_csv/2 function.
- Allow Explorer.Series functions that accept floats to work with :nan, :infinity and :neg_infinity values.
- Add Explorer.DataFrame.shuffle/2 and Explorer.Series.shuffle/2.
- Add support for a list of filters in Explorer.DataFrame.filter/2. These filters are joined as and expressions.
Fixed
- Add is_integer/1 guard to Explorer.Series.shift/2.
- Raise if series sizes do not match for binary operations.
Changed
- Rename the option :replacement to :replace for Explorer.DataFrame.sample/3 and Explorer.Series.sample/3.
- Change the default behaviour of sampling to not shuffle. A new option named :shuffle was added to control that.
v0.5.1 - 2023-02-17
Added
- Add boolean dtype to Series.in/2.
- Add binary dtype to Series.in/2.
- Add Series.day_of_week/1.
- Allow Series.fill_missing/2 to:
  - receive :infinity and :neg_infinity values.
  - receive date and datetime values.
  - receive binary values.
- Add support for time dtype.
- Add version of Series.pow/2 that accepts series on both sides.
- Allow Series.from_list/2 to receive :nan, :infinity and :neg_infinity atoms.
- Add Series.to_date/1 and Series.to_time/1 for datetime series.
- Allow casting of string series to category.
- Accept tensors when creating a new dataframe.
- Add compatibility with Nx v0.5.
- Add support for Nx's serialize and deserialize.
- Add the following function implementations for the Polars' Lazy dataframe backend:
  - arrange_with
  - concat_columns
  - concat_rows
  - distinct
  - drop_nil
  - filter_with
  - join
  - mutate_with
  - pivot_longer
  - rename
  - summarise_with
  - to_parquet

  Only summarise_with supports groups for this version.
Changed
- Require version of Rustler to be ~> 0.27.0, which mirrors the NIF requirement.
Fixed
- Casting to an unknown dtype returns a better error message.
v0.5.0 - 2023-01-12
Added
- Add DataFrame.describe/2 to gather some statistics from a dataframe.
- Add Series.nil_count/1 to count nil values.
- Add Series.in/2 to check if a given value is inside a series.
- Add Series float predicates: is_finite/1, is_infinite/1 and is_nan/1.
- Add Series string functions: contains/2, trim/1, trim_leading/1, trim_trailing/1, upcase/1 and downcase/1.
- Enable slicing of lazy frames (LazyFrame).
- Add IO operations "from/load" to the lazy frame implementation.
- Add support for the :lazy option in the DataFrame.new/2 function.
- Add Series float rounding methods: round/2, floor/1 and ceil/1.
- Add support for precompiling to Linux running on RISCV CPUs.
- Add support for precompiling to Linux - with musl - running on AARCH64 computers.
- Allow DataFrame.new/1 to receive the :dtypes option.
- Accept :nan as an option for Series.fill_missing/2 with float series.
- Add basic support for the categorical dtype - the :category dtype.
- Add Series.categories/1 to return categories from a categorical series.
- Add Series.categorise/2 to categorise a series of integers using predefined categories.
- Add Series.replace/2 to replace the contents of a series.
- Support selecting columns with unusual names (like with spaces) inside Explorer.Query with col/1.

  The usage is like this:

      Explorer.DataFrame.filter(df, col("my col") > 42)
Fixed
- Fix DataFrame.mutate/2 using a boolean scalar value.
- Stop leaking UInt32 series to Elixir.
- Cast numeric columns to our supported dtypes after IO read. This fix is only applied for the eager implementation for now.
Changed
- Rename Series.bintype/1 to Series.iotype/1.
v0.4.0 - 2022-11-29
Added
- Add Series.quotient/2 and Series.remainder/2 to work with integer division.
- Add Series.iotype/1 to return the underlying representation type.
- Allow series on both sides of binary operations, like: add(series, 1) and add(1, series).
- Allow comparison, concat and coalesce operations on "(series, lazy series)".
- Add lazy version of Series.sample/3 and Series.size/1.
- Add support for Arrow IPC Stream files.
- Add Explorer.Query and the macros that allow a simplified query API. This is a huge improvement to some of the main functions, and allows referring to columns as if they were variables.

  Before this change we would need to write a filter like this:

      Explorer.DataFrame.filter_with(df, &Explorer.Series.greater(&1["col1"], 42))

  But now it's also possible to write this operation like this:

      Explorer.DataFrame.filter(df, col1 > 42)

  This operation is going to use filter_with/2 underneath, which means that it is going to use lazy series and compute the results at once. Notice that it is mandatory to "require" the DataFrame module, since these operations are implemented as macros.

  The following new macros were added:
  - filter/2
  - mutate/2
  - summarise/2
  - arrange/2

  They substitute older versions that did not accept the new query syntax.
- Add DataFrame.put/3 to enable adding or replacing columns in an eager manner. This works similarly to the previous version of mutate/2.
- Add Series.select/3 operation that enables selecting a value from two series based on a predicate.
- Add "dump" and "load" functions to IO operations. They are useful to load or dump dataframes from/to memory.
- Add Series.to_iovec/2 and Series.to_binary/1. They return the underlying representation of series as binary. The first one returns a list of binaries, possibly with one element if the series is contiguous in memory. The second one returns a single binary representing the series.
- Add Series.shift/2 that shifts the series by an offset with nil values.
- Rename Series.fetch!/2 and Series.take_every/2 to Series.at/2 and Series.at_every/2.
- Add DataFrame.discard/2 to drop columns. This is the opposite of select/2.
- Implement Nx.LazyContainer for Explorer.DataFrame and Explorer.Series so data can be passed into Nx.
- Add Series.not/1 that negates values in a boolean series.
- Add the :binary dtype for Series. This enables the usage of arbitrary binaries.
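To make the Explorer.Query entry in this list concrete, here is a small runnable sketch that compares the callback form with the macro form. It assumes Explorer is installed from Hex in a recent release. The data frame and the col3 column are hypothetical.

```elixir
# Assumption: Explorer is fetched from Hex in a recent release.
Mix.install([:explorer])

# The query macros need the module to be required, not only aliased.
require Explorer.DataFrame, as: DF
alias Explorer.Series

# Hypothetical data, used only for illustration.
df = DF.new(col1: [10, 50, 100], col2: ["a", "b", "c"])

# Callback form: the function receives a lazy view of the dataframe.
with_callback = DF.filter_with(df, &Series.greater(&1["col1"], 42))

# Macro form: columns are referenced as if they were variables.
with_macro = DF.filter(df, col1 > 42)

# mutate/2 accepts the same query syntax for new columns.
doubled = DF.mutate(with_macro, col3: col1 * 2)

IO.inspect(DF.to_columns(with_callback) == DF.to_columns(with_macro))
IO.inspect(DF.to_columns(doubled))
```

Leaving out the require line would be expected to fail at compile time, because col1 would then be treated as an ordinary undefined variable.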
Changed
- Change DataFrame's to_* functions to return only :ok.
- Change series inspect to resemble the dataframe inspect with the backend name.
- Rename Series.var/1 to Series.variance/1.
- Rename Series.std/1 to Series.standard_deviation/1.
- Rename Series.count/2 to Series.frequencies/1 and add a new Series.count/1 that returns the size of an "eager" series, or the count of members in a group for a lazy series. In case there are no groups, it calculates the size of the dataframe.
- Change the option to control direction in Series.sort/2 and Series.argsort/2. Instead of a boolean, now we have a new option called :direction that accepts :asc or :desc.
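A short sketch of the :direction option for sorting follows, assuming Explorer is installed from Hex. The series values are hypothetical and contain no nils, so nil placement does not come into play.

```elixir
# Assumption: Explorer is fetched from Hex in a release that has the
# :direction option described above.
Mix.install([:explorer])

alias Explorer.Series

# Hypothetical values, used only for illustration.
s = Series.from_list([9, 3, 7, 1])

# :direction replaces the former boolean flag.
ascending = Series.sort(s, direction: :asc)
descending = Series.sort(s, direction: :desc)

# argsort/2 returns the positions that would sort the series.
positions = Series.argsort(s, direction: :desc)

IO.inspect(Series.to_list(ascending))
IO.inspect(Series.to_list(descending))
IO.inspect(Series.to_list(positions))
```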
Fixed
- Fix the following DataFrame functions to work with groups:
  - filter_with/2
  - head/2
  - tail/2
  - slice/2
  - slice/3
  - pivot_longer/3
  - pivot_wider/4
  - concat_rows/1
  - concat_columns/1
- Improve the documentation of functions that behave differently with groups.
- Fix arrange_with/2 to use "group by" stable, making results more predictable.
- Add nil as a possible return value of aggregations.
- Fix the behaviour of Series.sort/2 and Series.argsort/2 to add nils at the front when direction is descending, or at the back when the direction is ascending. This also adds an option to control this behaviour.
Removed
- Remove support for NDJSON read and write for ARM 32 bits targets. This is due to a limitation of a dependency of Polars.
v0.3.1 - 2022-09-09
Fixed
- Define multiply inside *_with operations.
- Fix column types in several operations, such as n_distinct.
v0.3.0 - 2022-09-01
Added
- Add DataFrame.concat_columns/1 and DataFrame.concat_columns/2 for horizontally stacking dataframes.
- Add compression as an option to write parquet files.
- Add count metadata to DataFrame table reader.
- Add DataFrame.filter_with/2, DataFrame.summarise_with/2, DataFrame.mutate_with/2 and DataFrame.arrange_with/2. They all accept a DataFrame and a function, and they all work with a new concept called "lazy series".

  Lazy Series is an opaque representation of a series that can be used to perform complex operations without pulling data from the series. This is faster than using masks. There is no big difference from the API perspective compared to the functions that were accepting callbacks before (e.g. filter/2 and the new filter_with/2), with the exception being DataFrame.summarise_with/2 that now accepts a lot more operations.
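The callback style described above can be sketched as follows. This assumes Explorer is installed from Hex in a recent release, where summarise_with/2 also works on ungrouped dataframes. The data and the new column names (c, total) are hypothetical.

```elixir
# Assumption: Explorer is fetched from Hex in a recent release.
Mix.install([:explorer])

alias Explorer.DataFrame, as: DF
alias Explorer.Series

# Hypothetical data, used only for illustration.
df = DF.new(a: [1, 2, 3], b: [10, 20, 30])

# Each callback receives a lazy view of the dataframe; indexing it
# yields lazy series, so no data is pulled while the expression is built.
kept = DF.filter_with(df, fn ldf -> Series.greater(ldf["a"], 1) end)

with_sum = DF.mutate_with(df, fn ldf -> [c: Series.add(ldf["a"], ldf["b"])] end)

totals = DF.summarise_with(df, fn ldf -> [total: Series.sum(ldf["a"])] end)

IO.inspect(DF.to_columns(kept))
IO.inspect(DF.to_columns(with_sum))
IO.inspect(DF.to_columns(totals))
```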
Changed
- Bump version requirement of the table dependency to ~> 0.1.2, and raise for non-tabular values.
- Normalize how columns are handled. This changes some functions to accept one column or a list of columns, ranges, indexes and callbacks selecting columns.
- Rename DataFrame.filter/2 to DataFrame.mask/2.
- Rename Series.filter/2 to Series.mask/2.
- Rename take/2 from both Series and DataFrame to slice/2. slice/2 now accepts ranges as well.
- Raise an error if DataFrame.pivot_wider/4 has float columns as IDs. This is because we can't properly compare floats.
- Change DataFrame.distinct/2 to accept columns as an argument instead of receiving them as an option.
Fixed
- Ensure that we can compare boolean series in functions like Series.equal/2.
- Fix rename of columns after summarise.
- Fix inspect of float series containing NaN or Infinity values. They are represented as atoms.
Deprecated
- Deprecate DataFrame.filter/2 with a callback in favor of DataFrame.filter_with/2.
v0.2.0 - 2022-06-22
Added
- Consistently support ranges throughout the columns API
- Support negative indexes throughout the columns API
- Integrate with the table package
- Add Series.to_enum/1 for lazily traversing the series
- Add Series.coalesce/1 and Series.coalesce/2 for finding the first non-null value in a list of series
Changed
- Series.length/1 is now Series.size/1 in keeping with Elixir idioms
- Nx is now an optional dependency
- Minimum Elixir version is now 1.13
- DataFrame.to_map/2 is now DataFrame.to_columns/2 and DataFrame.to_series/2
- Rustler is now an optional dependency
- read_ and write_ IO functions are now from_ and to_
- to_binary is now dump_csv
- Now uses polars's "simd" feature
- Now uses polars's "performant" feature
- Explorer.default_backend/0 is now Explorer.Backend.get/0
- Explorer.default_backend/1 is now Explorer.Backend.put/1
- Series.cum_* functions are now Series.cumulative_* to mirror Nx
- Series.rolling_* functions are now Series.window_* to mirror Nx
- reverse? is now an option instead of an argument in Series.cumulative_* functions
- DataFrame.from_columns/2 and DataFrame.from_rows/2 is now DataFrame.new/2
- Rename "col" to "column" throughout the API
- Remove "with_" prefix in options throughout the API
- DataFrame.table/2 accepts options with :limit instead of single integer
- rename/2 no longer accepts a function, use rename_with/2 instead
- rename_with/3 now expects the function as the last argument
Fixed
- Explorer now works on Linux with musl
v0.1.1 - 2022-04-27
Security
v0.1.0 - 2022-04-26
First release.