SparkEx.DataFrame.NA (SparkEx v0.1.0)


Null-value handling sub-API for DataFrames.

Provides fill/2, drop/1, and replace/3 operations that return new lazy DataFrames with the corresponding NA plan tuples.

These operations are available through SparkEx.DataFrame.fillna/2, SparkEx.DataFrame.dropna/1, and SparkEx.DataFrame.replace/3, or by calling this module directly.

Functions

drop(df, opts \\ [])

Drops rows containing null values.

Options

  • :how — :any (default) drops rows with any null; :all drops rows where all values are null.
  • :thresh — minimum number of non-null values required to keep a row. Overrides :how when provided.
  • :subset — list of column names to consider.

Examples

DataFrame.NA.drop(df)
DataFrame.NA.drop(df, how: :all)
DataFrame.NA.drop(df, thresh: 2, subset: ["age", "name"])
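
The way :how, :thresh, and :subset interact can be sketched in plain Elixir. The module below is a hypothetical model, not SparkEx code: NaDropSketch, drop/2, and keep_row?/2 are names invented for illustration, rows are plain maps, and nil stands in for a Spark null. It only mirrors the rule described in the options above and says nothing about how SparkEx builds its plan.

```elixir
defmodule NaDropSketch do
  # Hypothetical helper, not part of SparkEx: models the row-keeping rule
  # on plain maps, where nil stands in for a Spark null.

  # Returns the rows that the documented options would keep.
  def drop(rows, opts \\ []) do
    Enum.filter(rows, &keep_row?(&1, opts))
  end

  def keep_row?(row, opts \\ []) do
    how = Keyword.get(opts, :how, :any)
    thresh = Keyword.get(opts, :thresh)
    # Without :subset, every column of the row is considered.
    columns = Keyword.get(opts, :subset, Map.keys(row))

    non_null = Enum.count(columns, fn col -> not is_nil(Map.get(row, col)) end)

    cond do
      # :thresh takes precedence over :how when it is given.
      is_integer(thresh) -> non_null >= thresh
      # :all drops a row only when every considered value is null.
      how == :all -> non_null > 0
      # :any drops a row as soon as one considered value is null.
      how == :any -> non_null == length(columns)
    end
  end
end

rows = [
  %{"name" => "Ada", "age" => 36},
  %{"name" => nil, "age" => 41},
  %{"name" => nil, "age" => nil}
]

# Default (:any): only the fully populated first row survives.
IO.inspect(NaDropSketch.drop(rows))
# :all: only the last row, where every value is nil, is dropped.
IO.inspect(NaDropSketch.drop(rows, how: :all))
# :thresh with :subset: rows need at least one non-nil value in "age".
IO.inspect(NaDropSketch.drop(rows, thresh: 1, subset: ["age"]))
```

In this model a row with one null out of two columns is dropped under the default but kept under how: :all, which is the distinction the :how option draws.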

fill(df, value, opts \\ [])

Fills null values with the given replacement.

Parameters

  • value — scalar (int, float, string, bool) to fill all null values, or a map %{"column_name" => replacement_value} for column-specific fills.
  • opts — keyword options:
    • :subset — list of column names to restrict the fill to.

Examples

DataFrame.NA.fill(df, 0)
DataFrame.NA.fill(df, %{"age" => 0, "name" => "unknown"})
DataFrame.NA.fill(df, 0, subset: ["age", "salary"])
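
The difference between the scalar form and the map form can be illustrated with a small plain-Elixir model. NaFillSketch is a hypothetical module, not part of SparkEx; rows are plain maps and nil stands in for a Spark null. Two simplifications are assumptions of this sketch: it ignores column types (a Spark backend may skip columns whose type does not match the fill value), and it ignores :subset in the map form, since the map keys already name the target columns.

```elixir
defmodule NaFillSketch do
  # Hypothetical helper, not part of SparkEx. nil stands in for a Spark null.
  def fill(rows, value, opts \\ [])

  # Map form: each key names a column, each value is that column's replacement.
  # Assumption of this sketch: :subset is not consulted here.
  def fill(rows, replacements, _opts) when is_map(replacements) do
    Enum.map(rows, fn row ->
      Map.new(row, fn {col, val} ->
        if is_nil(val) and Map.has_key?(replacements, col) do
          {col, Map.fetch!(replacements, col)}
        else
          {col, val}
        end
      end)
    end)
  end

  # Scalar form: one value for every null, optionally limited by :subset.
  def fill(rows, scalar, opts) do
    subset = Keyword.get(opts, :subset)

    Enum.map(rows, fn row ->
      Map.new(row, fn {col, val} ->
        in_scope? = is_nil(subset) or col in subset
        if is_nil(val) and in_scope?, do: {col, scalar}, else: {col, val}
      end)
    end)
  end
end

rows = [%{"age" => nil, "name" => nil, "salary" => 10}]

# Scalar form: both nil values become 0.
IO.inspect(NaFillSketch.fill(rows, 0))
# Map form: "age" and "name" each get their own replacement.
IO.inspect(NaFillSketch.fill(rows, %{"age" => 0, "name" => "unknown"}))
# Scalar form with :subset: "name" stays nil because it is not listed.
IO.inspect(NaFillSketch.fill(rows, 0, subset: ["age", "salary"]))
```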

replace(df, to_replace, value \\ nil, opts \\ [])

@spec replace(SparkEx.DataFrame.t(), term(), term(), keyword()) ::
  SparkEx.DataFrame.t()

Replaces values in the DataFrame.

Forms

  • replace(df, %{old => new, ...}) — replacement map
  • replace(df, old_value, new_value) — single replacement
  • replace(df, [old1, old2], [new1, new2]) — parallel lists

Options

  • :subset — list of column names to restrict replacements to.

Examples

DataFrame.NA.replace(df, %{0 => 100, -1 => 0})
DataFrame.NA.replace(df, "N/A", nil)
DataFrame.NA.replace(df, [1, 2], [10, 20], subset: ["score"])
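
One way to see that the three call forms describe the same thing is to normalize each into a list of {old, new} pairs. The sketch below does that in plain Elixir and then applies the pairs to rows modeled as maps. NaReplaceSketch, pairs/2, and apply_pairs/3 are hypothetical names, not SparkEx functions, and the length check on parallel lists is an assumption of this sketch rather than documented SparkEx behavior.

```elixir
defmodule NaReplaceSketch do
  # Hypothetical helper, not part of SparkEx.

  # Normalizes the three call forms into one list of {old, new} pairs.
  def pairs(to_replace, value \\ nil)

  # Replacement map: %{old => new, ...}
  def pairs(mapping, _value) when is_map(mapping), do: Map.to_list(mapping)

  # Parallel lists: [old1, old2], [new1, new2]
  def pairs(olds, news) when is_list(olds) and is_list(news) do
    # Assumption: mismatched lengths are an error rather than being truncated.
    if length(olds) != length(news) do
      raise ArgumentError, "parallel lists must have the same length"
    end

    Enum.zip(olds, news)
  end

  # Single replacement: old_value, new_value
  def pairs(old, new), do: [{old, new}]

  # Applies the pairs to every value, optionally limited by :subset.
  def apply_pairs(rows, pairs, opts \\ []) do
    lookup = Map.new(pairs)
    subset = Keyword.get(opts, :subset)

    Enum.map(rows, fn row ->
      Map.new(row, fn {col, val} ->
        in_scope? = is_nil(subset) or col in subset

        if in_scope? and Map.has_key?(lookup, val) do
          {col, Map.fetch!(lookup, val)}
        else
          {col, val}
        end
      end)
    end)
  end
end

rows = [%{"score" => 1, "rank" => 1}]
score_pairs = NaReplaceSketch.pairs([1, 2], [10, 20])

# Only "score" is rewritten; "rank" keeps its 1 because of :subset.
IO.inspect(NaReplaceSketch.apply_pairs(rows, score_pairs, subset: ["score"]))
```

Under this model, replace(df, "N/A", nil) is simply the one-pair case [{"N/A", nil}], so a sentinel string is turned into a null.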