Null-value handling sub-API for DataFrames.
Provides fill/2, drop/1, and replace/3 operations that return
new lazy DataFrames with the corresponding NA plan tuples.
Accessed via SparkEx.DataFrame.fillna/2, SparkEx.DataFrame.dropna/1,
SparkEx.DataFrame.replace/3, or by calling the functions in this module directly.
Functions
@spec drop(SparkEx.DataFrame.t(), keyword()) :: SparkEx.DataFrame.t()
Drops rows containing null values.
Options
:how - :any (default) drops rows with any null; :all drops rows where all values are null.
:thresh - minimum number of non-null values required to keep a row. Overrides :how when provided.
:subset - list of column names to consider.
Examples
DataFrame.NA.drop(df)
DataFrame.NA.drop(df, how: :all)
DataFrame.NA.drop(df, thresh: 2, subset: ["age", "name"])
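To make the interaction between :how, :thresh, and :subset concrete, here is a minimal sketch of the documented rules applied to a plain list of maps. NADropSketch and its drop/2 are hypothetical names used only for illustration. This is not how SparkEx evaluates the plan, since the real filtering happens on the Spark side.

```elixir
defmodule NADropSketch do
  # Hypothetical helper, not part of SparkEx. It mimics the documented
  # :how / :thresh / :subset rules on an in-memory list of maps.
  def drop(rows, opts \\ []) do
    how = Keyword.get(opts, :how, :any)
    thresh = Keyword.get(opts, :thresh)
    subset = Keyword.get(opts, :subset)

    Enum.filter(rows, fn row ->
      # Only the columns in :subset are inspected; default is every column.
      cols = subset || Map.keys(row)
      non_null = Enum.count(cols, fn col -> not is_nil(Map.get(row, col)) end)

      cond do
        # :thresh overrides :how when provided.
        is_integer(thresh) -> non_null >= thresh
        # :all drops a row only when every inspected value is null.
        how == :all -> non_null > 0
        # :any drops a row as soon as one inspected value is null.
        true -> non_null == length(cols)
      end
    end)
  end
end

rows = [
  %{"age" => 30, "name" => "Ann"},
  %{"age" => nil, "name" => "Bob"},
  %{"age" => nil, "name" => nil}
]

NADropSketch.drop(rows)
NADropSketch.drop(rows, how: :all)
NADropSketch.drop(rows, thresh: 1, subset: ["age"])
```

With this sample data the default call keeps only Ann's row, how: :all keeps Ann's and Bob's rows, and thresh: 1 restricted to "age" again keeps only Ann's row.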
@spec fill(SparkEx.DataFrame.t(), term(), keyword()) :: SparkEx.DataFrame.t()
Fills null values with the given replacement.
Parameters
value - scalar (int, float, string, bool) to fill all null values, or a map %{"column_name" => replacement_value} for column-specific fills.
opts - keyword options:
  :subset - list of column names to restrict the fill to.
Examples
DataFrame.NA.fill(df, 0)
DataFrame.NA.fill(df, %{"age" => 0, "name" => "unknown"})
DataFrame.NA.fill(df, 0, subset: ["age", "salary"])
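The difference between the scalar form and the map form can be sketched on in-memory rows. NAFillSketch is a hypothetical module written for this illustration, not SparkEx code. As a simplification it ignores column types, whereas Spark is expected to apply a scalar fill only to columns whose type is compatible with the value.

```elixir
defmodule NAFillSketch do
  # Hypothetical helper, not part of SparkEx. Rows are plain maps
  # keyed by column name; nil stands in for a null value.
  def fill(rows, value, opts \\ [])

  # Map form: each column gets its own replacement; opts are not used here.
  def fill(rows, %{} = per_column, _opts) do
    Enum.map(rows, fn row ->
      Enum.reduce(per_column, row, fn {col, replacement}, acc ->
        fill_col(acc, col, replacement)
      end)
    end)
  end

  # Scalar form: the same value fills every column, or only :subset.
  def fill(rows, value, opts) do
    subset = Keyword.get(opts, :subset)

    Enum.map(rows, fn row ->
      cols = subset || Map.keys(row)
      Enum.reduce(cols, row, &fill_col(&2, &1, value))
    end)
  end

  # Replace the value only when the column exists and is currently nil.
  defp fill_col(row, col, replacement) do
    case row do
      %{^col => nil} -> %{row | col => replacement}
      _ -> row
    end
  end
end

rows = [
  %{"age" => nil, "name" => "Ann"},
  %{"age" => 41, "name" => nil}
]

NAFillSketch.fill(rows, 0, subset: ["age"])
NAFillSketch.fill(rows, %{"age" => 0, "name" => "unknown"})
```

In the first call only the missing "age" becomes 0 and the missing "name" stays nil. In the second call both nulls are filled with their column-specific replacement.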
@spec replace(SparkEx.DataFrame.t(), term(), term(), keyword()) :: SparkEx.DataFrame.t()
Replaces values in the DataFrame.
Forms
replace(df, %{old => new, ...}) - replacement map
replace(df, old_value, new_value) - single replacement
replace(df, [old1, old2], [new1, new2]) - parallel lists
Options
:subset - list of column names to restrict replacements to.
Examples
DataFrame.NA.replace(df, %{0 => 100, -1 => 0})
DataFrame.NA.replace(df, "N/A", nil)
DataFrame.NA.replace(df, [1, 2], [10, 20], subset: ["score"])
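One way to think about the three call forms is that each reduces to a single old => new mapping, which is then applied per column. The sketch below shows that idea on plain maps. NAReplaceSketch, to_mapping, and apply_mapping are hypothetical names, and raising on lists of unequal length is an assumption rather than documented SparkEx behaviour.

```elixir
defmodule NAReplaceSketch do
  # Hypothetical helper, not part of SparkEx.

  # Map form: already an old => new mapping.
  def to_mapping(%{} = mapping), do: mapping

  # Parallel-list form: pair the lists element by element.
  def to_mapping(olds, news) when is_list(olds) and is_list(news) do
    # Assumption: mismatched lengths are treated as an error.
    if length(olds) != length(news) do
      raise ArgumentError, "old and new lists must have the same length"
    end

    olds |> Enum.zip(news) |> Map.new()
  end

  # Single-replacement form.
  def to_mapping(old, new), do: %{old => new}

  # Apply the mapping to every column, or only to those in :subset.
  def apply_mapping(rows, mapping, opts \\ []) do
    subset = Keyword.get(opts, :subset)

    Enum.map(rows, fn row ->
      Map.new(row, fn {col, value} ->
        in_scope? = is_nil(subset) or col in subset

        if in_scope? and Map.has_key?(mapping, value) do
          {col, Map.fetch!(mapping, value)}
        else
          {col, value}
        end
      end)
    end)
  end
end

rows = [
  %{"score" => 1, "rank" => 1},
  %{"score" => 2, "rank" => 5}
]

mapping = NAReplaceSketch.to_mapping([1, 2], [10, 20])
NAReplaceSketch.apply_mapping(rows, mapping, subset: ["score"])
```

Here "score" becomes 10 and 20, while the 1 in "rank" is left alone because "rank" is outside the subset.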