Data source reader APIs for creating DataFrames from external sources.
## Examples

```elixir
df = SparkEx.Reader.table(session, "catalog.db.my_table")
df = SparkEx.Reader.parquet(session, "/data/events.parquet")
df = SparkEx.Reader.csv(session, "/data/users.csv", header: true, infer_schema: true)
df = SparkEx.Reader.json(session, "/data/logs.json")
```
## Summary

### Functions

- Creates a DataFrame by reading Avro file(s).
- Creates a DataFrame by reading binary files.
- Creates a DataFrame by reading CSV file(s).
- Sets the source format for a reader builder.
- Creates a DataFrame by reading from JDBC.
- Creates a DataFrame by reading JSON file(s).
- Loads data from the configured source using reader builder state.
- Creates a stateful reader builder.
- Sets a single option on a reader builder.
- Merges options into a reader builder.
- Creates a DataFrame by reading ORC file(s).
- Creates a DataFrame by reading Parquet file(s).
- Sets the schema for a reader builder.
- Reads a table from the catalog using reader builder options.
- Creates a DataFrame from a named table (catalog table).
- Creates a DataFrame by reading text file(s).
- Creates a DataFrame by reading XML file(s).
## Types

```elixir
@type t() :: %SparkEx.Reader{
        format: String.t() | nil,
        options: %{required(String.t()) => String.t()},
        schema: String.t() | nil,
        session: GenServer.server()
      }
```
## Functions
```elixir
@spec avro(GenServer.server(), String.t() | [String.t()], keyword()) :: SparkEx.DataFrame.t()
```

Creates a DataFrame by reading Avro file(s).

#### Options

- `:schema` — optional schema string
- `:options` — map of Avro reader options
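A usage sketch following the signature above; the paths and schema string are illustrative:

```elixir
# Read several Avro files at once with an explicit DDL schema
df =
  SparkEx.Reader.avro(session, ["/data/events1.avro", "/data/events2.avro"],
    schema: "id LONG, payload STRING"
  )
```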
```elixir
@spec binary_file(GenServer.server(), String.t() | [String.t()], keyword()) :: SparkEx.DataFrame.t()
```

Creates a DataFrame by reading binary files.

#### Options

- `:options` — map of BinaryFile reader options
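A sketch of a filtered binary read; the directory is illustrative, and `"pathGlobFilter"` is a standard Spark file-source option:

```elixir
# Read only PNG files from a directory as binary rows
df =
  SparkEx.Reader.binary_file(session, "/data/images",
    options: %{"pathGlobFilter" => "*.png"}
  )
```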
```elixir
@spec csv(GenServer.server(), String.t() | [String.t()], keyword()) :: SparkEx.DataFrame.t()
```

Creates a DataFrame by reading CSV file(s).

#### Options

- `:schema` — optional schema string
- `:header` — whether the CSV has a header row (maps to the `"header"` option)
- `:infer_schema` — whether to infer the schema (maps to the `"inferSchema"` option)
- `:separator` — field separator (maps to the `"sep"` option)
- `:options` — map of additional CSV reader options

#### Examples

```elixir
df = SparkEx.Reader.csv(session, "/data/users.csv", header: true, infer_schema: true)
```
Sets the source format for a reader builder (e.g. `"csv"`, `"parquet"`).
```elixir
@spec jdbc(GenServer.server(), keyword()) :: SparkEx.DataFrame.t()
@spec jdbc(GenServer.server(), String.t(), String.t(), keyword()) :: SparkEx.DataFrame.t()
```

Creates a DataFrame by reading from JDBC.

#### Options

- `:options` — map of JDBC reader options (e.g. `url`, `dbtable`)
- `:predicates` — list of SQL predicate strings for JDBC predicate pushdown
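A sketch of the keyword-only form; the connection details are placeholders, and the option keys follow Spark's standard JDBC source options:

```elixir
# JDBC read; each predicate string produces one partition of the result
df =
  SparkEx.Reader.jdbc(session,
    options: %{
      "url" => "jdbc:postgresql://db-host:5432/shop",
      "dbtable" => "public.orders",
      "user" => "reader",
      "password" => "secret"
    },
    predicates: ["id < 1000000", "id >= 1000000"]
  )
```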
```elixir
@spec json(GenServer.server(), String.t() | [String.t()], keyword()) :: SparkEx.DataFrame.t()
```

Creates a DataFrame by reading JSON file(s).

#### Options

- `:schema` — optional schema string
- `:options` — map of JSON reader options

#### Examples

```elixir
df = SparkEx.Reader.json(session, "/data/logs.json")
```
```elixir
@spec load(t()) :: SparkEx.DataFrame.t()
@spec load(t(), String.t() | [String.t()] | nil | keyword()) :: SparkEx.DataFrame.t()
@spec load(t(), String.t() | [String.t()] | nil | keyword(), keyword()) :: SparkEx.DataFrame.t()
@spec load(GenServer.server(), String.t()) :: SparkEx.DataFrame.t()
@spec load(GenServer.server(), String.t(), String.t() | [String.t()] | keyword()) :: SparkEx.DataFrame.t()
@spec load(GenServer.server(), String.t(), String.t() | [String.t()], keyword()) :: SparkEx.DataFrame.t()
```

Loads data from the configured source using reader builder state.
```elixir
@spec new(GenServer.server()) :: t()
```

Creates a stateful reader builder. Mirrors PySpark's `spark.read` builder style.
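A sketch of a full builder pipeline, assuming `format/2`, `option/3`, and `load/2` accept the builder as their first argument as the specs and descriptions above suggest; the path and option values are illustrative:

```elixir
# Builder-style read, analogous to
# spark.read.format("csv").option("header", "true").load(...)
df =
  session
  |> SparkEx.Reader.new()
  |> SparkEx.Reader.format("csv")
  |> SparkEx.Reader.option("header", "true")
  |> SparkEx.Reader.schema("id LONG, name STRING")
  |> SparkEx.Reader.load("/data/users.csv")
```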
Sets a single option on a reader builder.
Merges options into a reader builder.
```elixir
@spec orc(GenServer.server(), String.t() | [String.t()], keyword()) :: SparkEx.DataFrame.t()
```

Creates a DataFrame by reading ORC file(s).

#### Options

- `:schema` — optional schema string
- `:options` — map of ORC reader options

#### Examples

```elixir
df = SparkEx.Reader.orc(session, "/data/events.orc")
```
```elixir
@spec parquet(GenServer.server(), String.t() | [String.t()], keyword()) :: SparkEx.DataFrame.t()
```

Creates a DataFrame by reading Parquet file(s).

#### Options

- `:schema` — optional schema string (e.g. `"id INT, name STRING"`)
- `:options` — map of Parquet reader options (default: `%{}`)

#### Examples

```elixir
df = SparkEx.Reader.parquet(session, "/data/events.parquet")
df = SparkEx.Reader.parquet(session, ["/data/part1.parquet", "/data/part2.parquet"])
```
```elixir
@spec schema(t(), String.t() | SparkEx.Types.struct_type() | SparkEx.Types.data_type_proto()) :: t()
```

Sets the schema for a reader builder. Accepts a DDL string, a struct type from `SparkEx.Types`, or a Spark Connect `DataType` protobuf struct.

#### Examples

```elixir
reader |> Reader.schema("id LONG, name STRING")

reader |> Reader.schema(SparkEx.Types.struct_type([
  SparkEx.Types.struct_field("id", :long),
  SparkEx.Types.struct_field("name", :string)
]))
```
```elixir
@spec table(t(), String.t()) :: SparkEx.DataFrame.t()
```

Reads a table from the catalog using reader builder options.
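A sketch of the builder form, assuming `options/2` merges a string-keyed map as described above; the `"versionAsOf"` key is illustrative (a Delta Lake time-travel option), not something this module defines:

```elixir
# Catalog-table read through the builder, carrying reader options along
df =
  session
  |> SparkEx.Reader.new()
  |> SparkEx.Reader.options(%{"versionAsOf" => "1"})
  |> SparkEx.Reader.table("catalog.db.my_table")
```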
```elixir
@spec table(GenServer.server(), String.t(), keyword()) :: SparkEx.DataFrame.t()
```

Creates a DataFrame from a named table (catalog table).

#### Options

- `:options` — map of string options (default: `%{}`)

#### Examples

```elixir
df = SparkEx.Reader.table(session, "my_database.my_table")
```
```elixir
@spec text(GenServer.server(), String.t() | [String.t()], keyword()) :: SparkEx.DataFrame.t()
```

Creates a DataFrame by reading text file(s). Each line becomes a row with a single `value` column.

#### Options

- `:options` — map of text reader options

#### Examples

```elixir
df = SparkEx.Reader.text(session, "/data/lines.txt")
```
```elixir
@spec xml(GenServer.server(), String.t() | [String.t()], keyword()) :: SparkEx.DataFrame.t()
```

Creates a DataFrame by reading XML file(s).

#### Options

- `:schema` — optional schema string
- `:options` — map of XML reader options
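A usage sketch; the path is illustrative, and `"rowTag"` follows the Spark XML data source convention for naming the element that delimits each row:

```elixir
# Each <book> element becomes one row of the DataFrame
df =
  SparkEx.Reader.xml(session, "/data/books.xml",
    options: %{"rowTag" => "book"}
  )
```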