cubdb v0.17.0 CubDB
CubDB is an embedded key-value database written in Elixir. It runs locally, it is schema-less, and it is backed by a single file.
Features
Both keys and values can be any arbitrary Elixir (or Erlang) term
Arbitrary selection of ranges of entries sorted by key with select/3
Atomic transactions with put_multi/2, get_and_update_multi/4, etc.
Concurrent read operations, that do not block nor are blocked by writes
Unexpected shutdowns won't corrupt the database or break atomicity
Manual or automatic compaction to optimize space usage
To ensure consistency, performance, and robustness to data corruption, the CubDB database file uses an append-only, immutable B-tree data structure. Entries are never changed in place, and read operations are performed on immutable snapshots.
More information can be found in the following sections:
Usage
Start CubDB by specifying a directory for its database file (if it does not exist, it will be created):
{:ok, db} = CubDB.start_link("my/data/directory")
Alternatively, to specify more options, a keyword list can be passed:
{:ok, db} = CubDB.start_link(data_dir: "my/data/directory", auto_compact: true)
Important: avoid starting multiple CubDB processes on the same data directory. CubDB functions can be called concurrently from different processes, but only one CubDB process should use a specific data directory at any time.
The get/2, put/3, and delete/2 functions work as you probably expect:
CubDB.put(db, :foo, "some value")
#=> :ok
CubDB.get(db, :foo)
#=> "some value"
CubDB.delete(db, :foo)
#=> :ok
CubDB.get(db, :foo)
#=> nil
Both keys and values can be any Elixir (or Erlang) term:
CubDB.put(db, {"some", 'tuple', :key}, %{foo: "a map value"})
#=> :ok
CubDB.get(db, {"some", 'tuple', :key})
#=> %{foo: "a map value"}
Multiple operations can be performed as an atomic transaction with put_multi/2, delete_multi/2, and the other [...]_multi functions:
CubDB.put_multi(db, [a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8])
#=> :ok
Ranges of entries sorted by key are retrieved using select/3:
CubDB.select(db, min_key: :b, max_key: :e)
#=> {:ok, [b: 2, c: 3, d: 4, e: 5]}
But select/3 can do much more than that. It can apply a pipeline of operations (map, filter, take, drop, and more) to the selected entries, it can select the entries in normal or reverse order, and it can reduce the result using an arbitrary function:
# Take the sum of the last 3 even values
# (Integer.is_even/1 is a macro, so Integer must be required first):
require Integer

CubDB.select(db,
  # select entries in reverse order
  reverse: true,
  # apply a pipeline of operations to the entries
  pipe: [
    # map each entry discarding the key and keeping only the value
    map: fn {_key, value} -> value end,
    # filter only even integers
    filter: fn value -> is_integer(value) && Integer.is_even(value) end,
    # take the first 3 values
    take: 3
  ],
  # reduce the result to a sum
  reduce: fn n, sum -> sum + n end
)
#=> {:ok, 18}
Because CubDB uses an immutable data structure, write operations cause the data file to grow. Occasionally, it is advisable to run a compaction to optimize the file size and reclaim disk space. Compaction can be started manually by calling compact/1, and runs in the background without blocking other operations:
CubDB.compact(db)
#=> :ok
Alternatively, automatic compaction can be enabled, either by passing the :auto_compact option to start_link/1, or by calling set_auto_compact/2.
Summary
Functions
Returns a specification to start this module under a supervisor.
Runs a database compaction.
Returns the path of the current database file.
Returns the path of the data directory, as given when the CubDB process was started.
Deletes the entry associated with key from the database.
Deletes multiple entries corresponding to the given keys all at once, atomically.
Returns the dirt factor.
Fetches the value for the given key in the database, or returns :error if key is not present.
Performs an fsync, flushing to disk all data that might be buffered by the OS.
Gets the value associated with key from the database.
Gets the value corresponding to key and updates it, in one atomic transaction.
Gets and updates or deletes multiple entries in an atomic transaction.
Gets multiple entries corresponding to the given keys all at once, atomically.
Returns whether an entry with the given key exists in the database.
Writes an entry in the database, associating key to value.
Writes multiple entries all at once, atomically.
Selects a range of entries from the database, and optionally performs a pipeline of operations on them.
Configures whether to perform automatic compaction, and how.
Configures whether to automatically force file sync upon each write operation.
Returns the number of entries present in the database.
Starts the CubDB database without a link.
Starts the CubDB database process linked to the current process.
Synchronously stops the CubDB database.
Updates the entry corresponding to key using the given function.
Types
option()
option() :: {:auto_compact, {pos_integer(), number()} | boolean()} | {:auto_file_sync, boolean()}
Functions
Returns a specification to start this module under a supervisor.
The default options listed in Supervisor are used.
compact(db)
compact(GenServer.server()) :: :ok | {:error, String.t()}
Runs a database compaction.
As write operations are performed on a database, its file grows. Occasionally, a compaction operation can be run to shrink the file to its optimal size. Compaction runs in the background and does not block operations.
Only one compaction operation can run at any time, so if this function is called while a compaction is already running, it returns {:error, :pending_compaction}.
When compacting, CubDB creates a new data file, and eventually switches to it and removes the old one once the compaction succeeds. For this reason, during a compaction there should be enough disk space for a second copy of the database file.
Compaction can create disk contention, so it should not be performed unnecessarily often.
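As an illustration of when a manual compaction may be worthwhile, the sketch below combines compact/1 with dirt_factor/1. The Maintenance module and its threshold are hypothetical helpers, not part of the CubDB API:

```elixir
# Hypothetical helper: compact only when at least a quarter of the
# file is estimated to be reclaimable overhead.
defmodule Maintenance do
  def maybe_compact(db, threshold \\ 0.25) do
    if CubDB.dirt_factor(db) >= threshold do
      # Runs in the background; returns {:error, _} if a compaction
      # is already pending.
      CubDB.compact(db)
    else
      :ok
    end
  end
end
```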
current_db_file(db)
current_db_file(GenServer.server()) :: String.t()
Returns the path of the current database file.
The current database file will change after a compaction operation.
Example
{:ok, db} = CubDB.start_link("some/data/directory")
CubDB.current_db_file(db)
#=> "some/data/directory/0.cub"
Returns the path of the data directory, as given when the CubDB process was started.
Example
{:ok, db} = CubDB.start_link("some/data/directory")
CubDB.data_dir(db)
#=> "some/data/directory"
Deletes the entry associated with key from the database.
If key was not present in the database, nothing is done.
delete_multi(db, keys)
delete_multi(GenServer.server(), [key()]) :: :ok
Deletes multiple entries corresponding to the given keys all at once, atomically.
The keys to be deleted are passed as a list.
Returns the dirt factor.
The dirt factor is a number, ranging from 0 to 1, giving an indication about the amount of overhead disk space (or "dirt") that can be cleaned up with a compaction operation. A value of 0 means that there is no overhead, so a compaction would have no benefit. The closer to 1 the dirt factor is, the more can be cleaned up in a compaction operation.
fetch(db, key)
fetch(GenServer.server(), key()) :: {:ok, value()} | :error
Fetches the value for the given key in the database, or returns :error if key is not present.
If the database contains an entry with the given key and value value, it returns {:ok, value}. If key is not found, it returns :error.
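Unlike get/2, which returns its default for missing keys, fetch/2 distinguishes a missing key from an entry whose value happens to be nil. A small sketch, assuming db is a running CubDB process:

```elixir
CubDB.put(db, :a, nil)

CubDB.get(db, :a)    #=> nil (the stored value is nil)
CubDB.get(db, :b)    #=> nil (the key is absent)

CubDB.fetch(db, :a)  #=> {:ok, nil}
CubDB.fetch(db, :b)  #=> :error
```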
Performs an fsync, flushing to disk all data that might be buffered by the OS.
Calling this function ensures that all writes up to this point are committed to disk, and will be available after a restart.
If CubDB is started with the option auto_file_sync: true, calling this function is not necessary, as every write operation will be automatically flushed to the storage device.
If this function is NOT called, the operating system controls when the file buffer is flushed to the storage device, which leads to better write performance, but might affect durability of recent writes in case of a sudden shutdown.
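For example, a batch of writes can be flushed explicitly once at the end. A sketch, assuming the function described here is file_sync/1 and that db was started without auto_file_sync:

```elixir
# Perform a batch of writes, then force a single flush to disk.
:ok = CubDB.put_multi(db, a: 1, b: 2, c: 3)
:ok = CubDB.file_sync(db)
```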
get(db, key, default \\ nil)
get(GenServer.server(), key(), value()) :: value()
Gets the value associated with key from the database.
If no value is associated with key, default is returned (which is nil, unless specified otherwise).
Gets the value corresponding to key and updates it, in one atomic transaction.
fun is called with the current value associated with key (or nil if not present), and must return a two-element tuple: the result value to be returned, and the new value to be associated with key. fun may also return :pop, in which case the current value is deleted and returned.
The return value is {:ok, result}, or {:error, reason} in case an error occurs.
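A minimal sketch of an atomic counter built on get_and_update/3, assuming db is a running CubDB process:

```elixir
# Atomically increment a counter, returning its previous value
# (nil is replaced by 0 on first use).
{:ok, previous} =
  CubDB.get_and_update(db, :counter, fn current ->
    current = current || 0
    {current, current + 1}
  end)

# Returning :pop deletes the entry and returns its last value:
{:ok, last} = CubDB.get_and_update(db, :counter, fn _ -> :pop end)
```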
Gets and updates or deletes multiple entries in an atomic transaction.
Gets all values associated with the keys in keys_to_get, and passes them as a map of %{key => value} entries to fun. If a key is not found, it won't be added to the map passed to fun. Updates the database and returns a result according to the return value of fun. Returns {:ok, return_value} in case of success, {:error, reason} otherwise.
The function fun should return a tuple of three elements: {return_value, entries_to_put, keys_to_delete}, where return_value is an arbitrary value to be returned, entries_to_put is a map of %{key => value} entries to be written to the database, and keys_to_delete is a list of keys to be deleted.
The optional timeout argument specifies a timeout in milliseconds, and defaults to 5000 (5 seconds).
The read and write operations are executed as an atomic transaction, so they will either all succeed, or all fail. Note that get_and_update_multi/4 blocks other write operations until it completes.
Example
Assuming a database of names as keys and integer monetary balances as values, suppose we want to transfer 10 units from "Anna" to "Joy", returning their updated balances:
{:ok, {anna, joy}} = CubDB.get_and_update_multi(db, ["Anna", "Joy"], fn entries ->
  anna = Map.get(entries, "Anna", 0)
  joy = Map.get(entries, "Joy", 0)

  if anna < 10, do: raise(RuntimeError, message: "Anna's balance is too low")

  anna = anna - 10
  joy = joy + 10

  {{anna, joy}, %{"Anna" => anna, "Joy" => joy}, []}
end)
Or, if we want to transfer all of the balance from "Anna" to "Joy", deleting "Anna"'s entry, and returning "Joy"'s resulting balance:
{:ok, joy} = CubDB.get_and_update_multi(db, ["Anna", "Joy"], fn entries ->
  anna = Map.get(entries, "Anna", 0)
  joy = Map.get(entries, "Joy", 0)
  joy = joy + anna

  {joy, %{"Joy" => joy}, ["Anna"]}
end)
get_multi(db, keys)
get_multi(GenServer.server(), [key()]) :: %{required(key()) => value()}
Gets multiple entries corresponding to the given keys all at once, atomically.
The keys to get are passed as a list. The result is a map of key/value entries corresponding to the given keys. Keys that are not present in the database won't be in the result map.
Example
CubDB.put_multi(db, a: 1, b: 2, c: nil)
CubDB.get_multi(db, [:a, :b, :c, :x])
# => %{a: 1, b: 2, c: nil}
has_key?(db, key)
has_key?(GenServer.server(), key()) :: boolean()
Returns whether an entry with the given key exists in the database.
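Like fetch/2, has_key?/2 distinguishes an absent key from a stored nil value, which get/2 cannot do on its own. A sketch, assuming db is a running CubDB process:

```elixir
CubDB.put(db, :a, nil)

CubDB.has_key?(db, :a)  #=> true (the entry exists, its value is nil)
CubDB.has_key?(db, :b)  #=> false
```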
put(db, key, value)
put(GenServer.server(), key(), value()) :: :ok
Writes an entry in the database, associating key to value.
If key was already present, it is overwritten.
put_multi(db, entries)
put_multi(GenServer.server(), %{required(key()) => value()} | [entry()]) :: :ok
Writes multiple entries all at once, atomically.
Entries are passed as a map of %{key => value} or a list of {key, value} tuples.
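For example, both forms below write the same two entries atomically:

```elixir
# As a keyword list of {key, value} tuples:
:ok = CubDB.put_multi(db, a: 1, b: 2)

# Or as a map:
:ok = CubDB.put_multi(db, %{a: 1, b: 2})
```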
select(db, options \\ [], timeout \\ 5000)
select(GenServer.server(), Keyword.t(), timeout()) :: {:ok, any()} | {:error, Exception.t()}
Selects a range of entries from the database, and optionally performs a pipeline of operations on them.
It returns {:ok, result} if successful, or {:error, exception} if an exception is raised.
Options
The min_key and max_key options specify the range of entries that are selected. By default, the range is inclusive, so all entries that have a key greater than or equal to min_key and less than or equal to max_key are selected:
# Select all entries where "b" <= key <= "d"
CubDB.select(db, min_key: "b", max_key: "d")
The range boundaries can be excluded by setting min_key_inclusive or max_key_inclusive to false:
# Select all entries where "b" <= key < "d"
CubDB.select(db, min_key: "b", max_key: "d", max_key_inclusive: false)
Either :min_key or :max_key can be omitted, to leave the range open-ended.
# Select entries where key <= "a"
CubDB.select(db, max_key: "a")
As nil is a valid key, setting min_key or max_key to nil does NOT leave the range open-ended:
# Select entries where nil <= key <= "a"
CubDB.select(db, min_key: nil, max_key: "a")
The reverse option, when set to true, causes the entries to be selected and traversed in reverse order.
The pipe option specifies an optional list of operations performed sequentially on the selected entries. The given order of operations is respected. The available operations, specified as tuples, are:
{:filter, fun} filters entries for which fun returns a truthy value
{:map, fun} maps each entry to the value returned by the function fun
{:take, n} takes the first n entries
{:drop, n} skips the first n entries
{:take_while, fun} takes entries while fun returns a truthy value
{:drop_while, fun} skips entries while fun returns a truthy value
Note that, when selecting a key range, specifying min_key and/or max_key is more performant than using {:filter, fun} or {:take_while | :drop_while, fun}, because min_key and max_key avoid loading unnecessary entries from disk entirely.
The reduce option specifies how the selected entries are aggregated. If reduce is omitted, the entries are returned as a list. If reduce is a function, it is used to reduce the collection of entries. If reduce is a tuple, the first element is the starting value of the reduction, and the second is the reducing function.
Examples
To select all entries with keys between :a and :c as a list of {key, value} entries, we can do:
{:ok, entries} = CubDB.select(db, min_key: :a, max_key: :c)
If we want to get all entries with keys between :a and :c, with :c excluded, we can do:
{:ok, entries} = CubDB.select(db,
  min_key: :a, max_key: :c, max_key_inclusive: false)
To select the last 3 entries, we can do:
{:ok, entries} = CubDB.select(db, reverse: true, pipe: [take: 3])
If we want to obtain the sum of the first 10 positive numeric values associated with keys from :a to :f, we can do:
{:ok, sum} = CubDB.select(db,
  min_key: :a,
  max_key: :f,
  pipe: [
    map: fn {_key, value} -> value end,         # map entries to their values
    filter: fn n -> is_number(n) and n > 0 end, # keep only positive numbers
    take: 10                                    # take the first 10 values in the range
  ],
  reduce: fn n, sum -> sum + n end              # reduce to the sum of selected values
)
Configures whether to perform automatic compaction, and how.
If set to false, no automatic compaction is performed. If set to true, auto-compaction is performed after a write operation if at least 100 write operations have occurred since the last compaction, and the dirt factor is at least 0.25. These values can be customized by setting the auto_compact option to {min_writes, min_dirt_factor}.
It returns :ok, or {:error, reason} if setting is invalid.
Compaction is performed in the background and does not block other operations, but can create disk contention, so it should not be performed unnecessarily often. When writing a lot into the database, such as when importing data from an external source, it is advisable to turn off auto compaction, and manually run compaction at the end of the import.
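The accepted settings can be sketched as follows (the 1000 and 0.5 thresholds are arbitrary example values):

```elixir
# Enable auto compaction with the default thresholds
# (100 writes, dirt factor 0.25):
:ok = CubDB.set_auto_compact(db, true)

# Customize the thresholds: compact after at least 1000 writes
# and a dirt factor of at least 0.5:
:ok = CubDB.set_auto_compact(db, {1000, 0.5})

# Disable auto compaction, e.g. before a bulk import:
:ok = CubDB.set_auto_compact(db, false)
```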
set_auto_file_sync(db, bool)
set_auto_file_sync(GenServer.server(), boolean()) :: :ok
Configures whether to automatically force file sync upon each write operation.
If set to false, no automatic file sync is performed. This improves write performance, but leaves to the operating system the decision of when to flush disk buffers, which means that recent writes might not be durable in case of a sudden machine shutdown. In any case, atomicity of multi operations is preserved, and partial writes will not corrupt the database.
If set to true, the file buffer will be forced to flush upon every write operation, ensuring durability even in case of sudden machine shutdowns, but decreasing write performance.
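A common pattern is to trade durability for speed during a bulk import, then restore it. In this sketch, entries is a placeholder for the list of entries to import, and the explicit sync function is assumed to be file_sync/1:

```elixir
# Speed up a bulk import by deferring file sync...
:ok = CubDB.set_auto_file_sync(db, false)
:ok = CubDB.put_multi(db, entries)

# ...then flush once and restore durable writes.
:ok = CubDB.file_sync(db)
:ok = CubDB.set_auto_file_sync(db, true)
```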
Returns the number of entries present in the database.
start(data_dir_or_options)
start(String.t() | [option() | {:data_dir, String.t()} | GenServer.option()]) :: GenServer.on_start()
Starts the CubDB database without a link.
See start_link/2 for more information about options.
start_link(data_dir_or_options)
start_link(String.t() | [option() | {:data_dir, String.t()} | GenServer.option()]) :: GenServer.on_start()
Starts the CubDB database process linked to the current process.
The argument is a keyword list of options:
data_dir: the directory path where the database files will be stored. This option is required. If the directory does not exist, it will be created. Only one CubDB instance can run per directory, so if you run several databases, they should each use their own separate data directory.
auto_compact: whether to perform compaction automatically. It defaults to false. See set_auto_compact/2 for the possible values.
auto_file_sync: whether to force flush the disk buffer on each write. It defaults to false. If set to true, write performance is slower, but durability is strictly guaranteed. See set_auto_file_sync/2 for details.
GenServer options like name and timeout can also be given, and are forwarded to GenServer.start_link/3 as the third argument.
If only the data_dir is specified, it is possible to pass it as a single string argument.
Examples
# Passing only the data dir
{:ok, db} = CubDB.start_link("some/data/dir")
# Passing data dir and other options
{:ok, db} = CubDB.start_link(data_dir: "some/data/dir", auto_compact: true, name: :db)
stop(db, reason \\ :normal, timeout \\ :infinity)
stop(GenServer.server(), term(), timeout()) :: :ok
Synchronously stops the CubDB database.
See GenServer.stop/3 for details.
update(db, key, initial, fun)
update(GenServer.server(), key(), value(), (value() -> value())) :: :ok
Updates the entry corresponding to key using the given function.
If key is present in the database, fun is invoked with the corresponding value, and the result is set as the new value of key. If key is not found, initial is inserted as the value of key.
The return value is :ok, or {:error, reason} in case an error occurs.
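A sketch of a counter built on update/4, assuming db is a running CubDB process:

```elixir
# Insert 1 if :counter is absent, otherwise increment the stored value:
:ok = CubDB.update(db, :counter, 1, fn value -> value + 1 end)
```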