View Source ReqAthena (ReqAthena v0.3.0)

Req plugin for AWS Athena.

ReqAthena makes it easy to make Athena queries and save the results into S3 buckets.

By default, ReqAthena will query results and use the default output format, which is CSV. To change that, you can use the :format option documented bellow.

Summary

Functions

Builds a new %Req.Request{} for Athena requests.

Performs a query against the Athena API.

Functions

@spec new(keyword()) :: Req.Request.t()

Builds a new %Req.Request{} for Athena requests.

Request Options

  • :access_key_id - Required. The Access Key ID from AWS credentials.

  • :secret_access_key - Required. The Secret Access Key from AWS credentials.

  • :token - Optional. The Session Token from AWS credentials.

  • :region - Required. The AWS region where AWS Athena is installed.

  • :database - Required. The AWS Athena database name.

  • :output_location - Optional. The S3 URL location to output AWS Athena query results. Results will be saved as Parquet and loaded with Explorer only if this option is given.

  • :workgroup - Conditional. The AWS Athena workgroup.

  • :cache_query - Optional. Forces a non-cached result from AWS Athena.

  • :format - Optional. The output format. Can be one of:

    • :none (default) - return decoded API response from Athena.

    • :csv - return contents of the CSV file.

    • :json - return contents of the JSON file.

      Note: Req by default automatically decodes JSON response body (Req.Steps.decode_body/1 step) and to prevent it from doing so, set decode_body: false.

    • :explorer - return contents in parquet format, lazy loaded into Explorer data frame.

    There are some limitations when using the :json and :explorer format. First, you need to install Explorer in order to use the :explorer format. Second, when using these format, you always need to provide a different output location. See the UNLOAD command docs for more details.

  • :output_compression - Optional. Sets the Parquet compression format and level for the output when using the Explorer output format. This can be a string, like "gzip", or a tuple with {format, level}, like: {"ZSTD", 4}. By default this is nil, which means that for Parquet (the format that Explorer uses) this is going to be "gzip".

There is a limitation of Athena that requires the :output_location to be present for every query that outputs to a format other than "CSV". So we append "results" to the :output_location to make the partition files be saved there.

Conditional fields must always be defined, and can be one of the fields or both.

Link to this function

query(req, sql_query, sql_query_params \\ [], opts \\ [])

View Source
@spec query(Req.Request.t(), binary(), list(), Keyword.t()) ::
  {:ok, Req.Response.t()} | {:error, Exception.t()}

Performs a query against the Athena API.

The SQL query can container ? placeholders and sql_query_params is a list of corresponding values.

This function accepts the same options as new/1.

Examples

With plain query:

iex> opts = [
...>   access_key_id: System.fetch_env!("AWS_ACCESS_KEY_ID"),
...>   secret_access_key: System.fetch_env!("AWS_SECRET_ACCESS_KEY"),
...>   region: System.fetch_env!("AWS_REGION"),
...>   database: "default",
...>   output_location: System.fetch_env!("AWS_ATHENA_OUTPUT_LOCATION")
...> ]
iex> req = ReqAthena.new(opts)
iex> query = "SELECT id, type, tags, members, timestamp, visible FROM planet WHERE id = 470454 and type = 'relation'"
iex> ReqAthena.query!(req, query, [], format: :json).body
%{
  "id" => 470454,
  "members" => [
    %{"ref" => 670007839, "role" => "", "type" => "node"},
    %{"ref" => 670007840, "role" => "", "type" => "node"}
  ],
  "tags" => %{
    "name" => "Mérignac A",
    "network" => "NTF-5",
    "ref" => "17229A",
    "site" => "geodesic",
    "source" => "©IGN 2010 dans le cadre de la cartographie réglementaire",
    "type" => "site",
    "url" => "http://geodesie.ign.fr/fiches/index.php?module=e&action=fichepdf&source=carte&sit_no=17229A"
  },
  "timestamp" => "2017-01-21 12:51:34",
  "type" => "relation",
  "visible" => true
}

With parameterized query:

iex> opts = [
...>   access_key_id: System.fetch_env!("AWS_ACCESS_KEY_ID"),
...>   secret_access_key: System.fetch_env!("AWS_SECRET_ACCESS_KEY"),
...>   region: System.fetch_env!("AWS_REGION"),
...>   database: "default",
...>   output_location: System.fetch_env!("AWS_ATHENA_OUTPUT_LOCATION")
...> ]
iex> req = ReqAthena.new(opts)
iex> query = "SELECT id, type FROM planet WHERE id = ? and type = ?"
iex> ReqAthena.query!(req, query, [239_970_142, "node"], format: :json).body
[%{"id" => 239970142, "type" => "node"}]
Link to this function

query!(req, sql_query, sql_query_params \\ [], opts \\ [])

View Source
@spec query!(Req.Request.t(), binary(), list(), Keyword.t()) :: Req.Response.t()

Same as query/4, but raises in case of error.