Vessel v0.8.0
Vessel
The main interface for interacting with Vessel from within application code.
This module contains many utilities related to interacting with the Vessel Job context, as well as convenience functions for logging and for writing values through to the next Job step.
Every function in this module should take the Job context as its first parameter, in order to future-proof against new configuration options being added.
Summary
Functions
Creates a new Vessel context using the provided pairs
Retrieves a value from the Job configuration
Retrieves a meta key and value from the context
Retrieves a private key and value from the context
Inspects a value and outputs to the Hadoop logs
Outputs a message to the Hadoop logs
Modifies a top level field in the Vessel context
Sets a variable in the Job configuration
Stores a meta key and value inside the context
Stores a private key and value inside the context
Updates a Hadoop Job counter
Updates the status of the Hadoop Job
Writes a key/value Tuple to the Job context
Writes a value to the Job context for a given key
Functions
Creates a new Vessel context using the provided pairs.
The pairs provided overwrite the defaults. Context must be created this way, as defaults can’t be provided at compile time (because things like :conf use runtime values).
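For illustration, a minimal sketch of creating a context - assuming the constructor summarized above is exposed as Vessel.context/1, with an illustrative :conf override:

```elixir
# Hypothetical sketch: Vessel.context/1 is an assumed name, and the
# :conf contents here are illustrative only.
ctx = Vessel.context(conf: %{"job_name" => "word_count"})
```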
Retrieves a value from the Job configuration.
Configuration values are treated as environment variables to conform to Hadoop Streaming. We clone the environment into the context on startup, so that configuration reads and writes go through the Job variables rather than the process environment.
We only allow lowercase variables to enter the Job configuration, as this is the model used by Hadoop Streaming. This also filters out a lot of noise from default shell variables polluting the configuration (e.g. $HOME).
Using environment variables means that there’s a slight chance that you’ll receive a value from the env which isn’t actually a configuration variable, so please validate appropriately.
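As a sketch - assuming the accessor is exposed as Vessel.get_conf/3 with an optional default - reading one of the variables Hadoop Streaming provides might look like:

```elixir
# Assumed Vessel.get_conf/3; Hadoop Streaming exposes Job settings as
# lowercased env variables, e.g. mapreduce_map_input_file.
input = Vessel.get_conf(ctx, "mapreduce_map_input_file", "unknown")
```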
Retrieves a meta key and value from the context.
This should not be used outside of the library modules.
Retrieves a private key and value from the context.
An optional default value can be provided to be returned if the key does not exist in the private context. If not provided, nil will be used.
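A minimal sketch, assuming the accessor is exposed as Vessel.get_private/3:

```elixir
# Assumed Vessel.get_private/3; returns the stored value, or the given
# default (nil when omitted) if the key is absent.
count = Vessel.get_private(ctx, :count, 0)
```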
Inspects a value and outputs to the Hadoop logs.
You can pass your value as either the first or second argument, as long as the other one is a Vessel context - this is to make it easier to chain, in the same way you would with IO.inspect/2.
This function uses :stderr, as Hadoop is listening to all :stdio output as the results of your mapper - so going via :stdio would corrupt the Job values.
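A pipeline sketch, assuming the function is exposed as Vessel.inspect/2 and returns the inspected value as IO.inspect/2 does:

```elixir
# Assumed Vessel.inspect/2; the context can be either argument, so it
# drops into a pipeline. process/1 is a hypothetical next step.
value
|> Vessel.inspect(ctx)
|> process()
```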
Outputs a message to the Hadoop logs.
This function uses :stderr, as Hadoop is listening to all :stdio output as the results of your mapper - so going via :stdio would corrupt the Job values.
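A one-line sketch, assuming the function is exposed as Vessel.log/2:

```elixir
# Assumed Vessel.log/2; the message lands in the Hadoop logs via :stderr,
# keeping :stdio clean for mapper/reducer output.
Vessel.log(ctx, "mapper starting up")
```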
Modifies a top level field in the Vessel context.
This should not be used outside of the library itself, as it can error when used incorrectly (for example, with invalid keys).
Sets a variable in the Job configuration.
This operates in a similar way to put_private/3, except that it should only be used for Job configuration values (as a semantic difference).
This does not set the variable in the environment, as we clone the environment into the Job configuration on startup to avoid polluting the environment.
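A sketch, assuming the function is exposed as Vessel.put_conf/3 by analogy with put_private/3:

```elixir
# Assumed Vessel.put_conf/3; updates only the configuration cloned into
# the context, never the real process environment.
ctx = Vessel.put_conf(ctx, "my_setting", "enabled")
```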
Stores a meta key and value inside the context.
This should not be used outside of the library modules.
Stores a private key and value inside the context.
This is where you can persist values between steps in the Job; you can think of it as the Job state. You should only change things in this Map, rather than placing values at the top level of the Job context.
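A sketch using the put_private/3 arity referenced on this page, with an illustrative key:

```elixir
# put_private/3 as referenced above; :count is an illustrative key used
# here to carry state between Job steps.
ctx = Vessel.put_private(ctx, :count, count + 1)
```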
update_counter(Vessel.t, binary, binary, number) :: :ok
Updates a Hadoop Job counter.
This is a utility function to emit a Job counter in Hadoop Streaming. You may provide a custom amount to increment by, which defaults to 1 if not provided.
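Following the update_counter(Vessel.t, binary, binary, number) :: :ok spec above, a call might look like:

```elixir
# Emits a counter line in the form Hadoop Streaming expects,
# i.e. "reporter:counter:<group>,<counter>,<amount>".
:ok = Vessel.update_counter(ctx, "my_app", "records_read", 1)
```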
Updates the status of the Hadoop Job.
This is a utility function to emit status in Hadoop Streaming.
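A sketch, assuming the function is exposed as Vessel.update_status/2:

```elixir
# Assumed Vessel.update_status/2; Hadoop Streaming recognizes status
# lines of the form "reporter:status:<message>".
Vessel.update_status(ctx, "finished parsing input")
```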
Writes a key/value Tuple to the Job context.
To stay compatible with Hadoop Streaming, this will emit to :stdio in the required format.
Writes a value to the Job context for a given key.
To stay compatible with Hadoop Streaming, this will emit to :stdio in the required format. The separator can be customized by setting custom separators inside the :meta map, and is modified as such by the mapper/reducer phases.
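A sketch of both write forms summarized above - the Tuple form and the key/value form are assumed to be write/2 and write/3 respectively:

```elixir
# Assumed Vessel.write/2 and Vessel.write/3; each emits
# "key<separator>value" on :stdio for Hadoop Streaming to consume.
Vessel.write(ctx, {"word", 1})
Vessel.write(ctx, "word", 1)
```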