Puid (puid v2.3.0)

Simple, fast, flexible and efficient generation of probably unique identifiers (puid, aka random strings) of intuitively specified entropy using pre-defined or custom characters.

Overview

Puid provides fast and efficient generation of random IDs. For the purposes of Puid, a random ID is considered a random string used in a context of uniqueness, that is, random IDs are a bunch of random strings that are hopefully unique.

Random string generation can be thought of as a transformation of some random source of entropy into a string representation of randomness. A general purpose random string library used for random IDs should therefore provide user specification for each of the following three key aspects:

Entropy source

What source of randomness is being transformed? Puid allows easy specification of the function used for source randomness.

ID characters

What characters are used in the ID? Puid provides 16 pre-defined character sets and also allows custom characters, including Unicode.

ID randomness

What is the resulting “randomness” of the IDs? Note this isn't necessarily the same as the randomness of the entropy source. Puid allows explicit specification of ID randomness in an intuitive manner.

Examples

Creating a random ID generator using Puid is as simple as:

iex> defmodule(RandId, do: use(Puid))
iex> RandId.generate()
"8nGA2UaIfaawX-Og61go5A"

Options allow easy and complete control of ID generation.

Entropy Source

Puid uses :crypto.strong_rand_bytes/1 as the default entropy source. The rand_bytes option can be used to specify any function of the form (non_neg_integer) -> binary as the source:

iex> defmodule(PrngPuid, do: use(Puid, rand_bytes: &:rand.bytes/1))
iex> PrngPuid.generate()
"bIkrSeU6Yr8_1WHGvO0H3M"
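
Any function with that shape can serve as the source. As a minimal sketch, the hypothetical module below (FixedBytes is an illustrative name, not part of Puid) defines a deterministic source that could be handy in tests. It is not suitable for real IDs, since every call returns the same bytes. The wiring in the trailing comment is an assumption that follows the rand_bytes option shown above.

```elixir
defmodule FixedBytes do
  # Hypothetical deterministic entropy source, for tests only.
  # It has the required shape: (non_neg_integer) -> binary.
  @spec rand_bytes(non_neg_integer()) :: binary()
  def rand_bytes(n) when is_integer(n) and n >= 0 do
    # Return n zero bytes.
    :binary.copy(<<0>>, n)
  end
end

# Assumed wiring, mirroring the PrngPuid example:
# defmodule TestPuid, do: use(Puid, rand_bytes: &FixedBytes.rand_bytes/1)
```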

ID Characters

By default, Puid uses the RFC 4648 file system & URL safe characters. The chars option can be used to specify any of 16 pre-defined character sets or custom characters, including Unicode:

iex> defmodule(HexPuid, do: use(Puid, chars: :hex))
iex> HexPuid.generate()
"13fb81e35cb89e5daa5649802ad4bbbd"

iex> defmodule(DingoskyPuid, do: use(Puid, chars: "dingosky"))
iex> DingoskyPuid.generate()
"yiidgidnygkgydkodggysonydodndsnkgksgonisnko"

iex> defmodule(DingoskyUnicodePuid, do: use(Puid, chars: "dîñgø$kyDÎÑGØßK¥", total: 2.5e6, risk: 1.0e15))
iex> DingoskyUnicodePuid.generate()
"øßK$ggKñø$dyGîñdyØøØÎîk"
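
The differing ID lengths above follow from how much entropy each character carries: a set of N characters holds log2(N) bits per character, so fewer characters mean longer IDs for the same entropy. A minimal sketch of that arithmetic, using a hypothetical helper (LengthSketch is not part of Puid) and the default of 128 bits:

```elixir
defmodule LengthSketch do
  # Hypothetical helper, not part of Puid.
  # Number of characters needed to carry at least `bits` of entropy
  # when drawing uniformly from the characters in `chars`.
  def id_length(bits, chars) do
    bits_per_char = chars |> String.graphemes() |> length() |> :math.log2()
    ceil(bits / bits_per_char)
  end
end

LengthSketch.id_length(128, "0123456789abcdef")
# 32, matching the length of the HexPuid example

LengthSketch.id_length(128, "dingosky")
# 43, matching the length of the DingoskyPuid example
```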

ID Randomness

Generated IDs have 128 bits of entropy by default. Puid provides a simple, intuitive way to specify ID randomness by declaring a total number of possible IDs with a specified risk of a repeat in that many IDs.

To generate up to 10 million random IDs with a 1 in a quadrillion chance of a repeat:

iex> defmodule(MyPuid, do: use(Puid, total: 10.0e6, risk: 1.0e15))
iex> MyPuid.generate()
"T0bFZadxBYVKs5lA"
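
To see how a total and a risk translate into entropy bits, the sketch below uses the standard birthday-bound approximation. It is an illustration under that assumption, not necessarily the exact formula Puid uses, and EntropySketch is a hypothetical name.

```elixir
defmodule EntropySketch do
  # Hypothetical helper, not part of Puid.
  # Birthday-bound approximation: the chance of a repeat among `total` IDs
  # of `bits` entropy is about total^2 / 2^(bits + 1). Setting that equal
  # to 1 / risk and solving for bits gives the expression below.
  def bits(total, risk) do
    2 * :math.log2(total) + :math.log2(risk) - 1
  end
end

EntropySketch.bits(10.0e6, 1.0e15)
# roughly 95.3 bits
```

At 6 bits per character for the default character set, 95.3 bits rounds up to 16 characters (96 bits), which is the length of the ID generated above.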

The bits option can be used to directly specify an amount of ID randomness:

iex> defmodule(Token, do: use(Puid, bits: 256, chars: :hex_upper))
iex> Token.generate()
"6E908C2A1AA7BF101E7041338D43B87266AFA73734F423B6C3C3A17599F40F2A"

Module API

Module functions:

  • generate/0: Generate a random puid
  • total/1: Total puids which can be generated at a specified risk
  • risk/1: Risk of a repeat in generating a specified total of puids
  • encode/1: Encode bytes into a puid
  • decode/1: Decode a puid into bytes
  • info/0: Module information

The total/1 and risk/1 functions provide approximations to the risk of a repeat in some total number of generated puids. The mathematical approximations used purposely overestimate risk and underestimate total.
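
As a rough illustration of what such approximations look like, the sketch below rearranges the birthday-bound estimate, risk ≈ 2^(bits + 1) / total^2. The module and function names are hypothetical, and Puid's own functions round conservatively, so their results will differ somewhat from these values.

```elixir
defmodule RepeatRiskSketch do
  # Hypothetical helpers, not part of Puid. Both rearrange the
  # birthday-bound approximation risk ≈ 2^(bits + 1) / total^2.

  # Approximate number of IDs that can be generated while keeping the
  # chance of a repeat at about 1 in `risk`.
  def total(bits, risk), do: :math.sqrt(:math.pow(2, bits + 1) / risk)

  # Approximate N such that the chance of a repeat is 1 in N after
  # generating `total` IDs.
  def risk(bits, total), do: :math.pow(2, bits + 1) / (total * total)
end

RepeatRiskSketch.total(132, 1.0e6)
# roughly 1.04e17

RepeatRiskSketch.risk(132, 1.0e12)
# roughly 1.09e16
```

The 132 bits used here correspond to a 22-character ID at 6 bits per character.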

The encode/1 and decode/1 functions convert puids to and from bytes to facilitate binary data storage, e.g. as an Ecto type. Note that for efficiency Puid operates at the bit level, so decode/1 of a puid produces representative bytes such that encode/1 of those bytes produces the same puid. The bytes are the puid-specific bitstring with zero bits appended up to the next byte boundary.
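
The bit-level packing can be sketched for the default 64-character set, where each character maps to a 6-bit index. This is an illustration that assumes a power-of-two character set and a known ID length; BitPackSketch and its functions are hypothetical, and Puid's actual implementation may differ, particularly for other character counts.

```elixir
defmodule BitPackSketch do
  # Hypothetical helpers, not part of Puid. Assumes the 64-character set
  # below, so each character maps to exactly 6 bits.
  @safe64 "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

  # Map each character to its 6-bit index, then zero-pad to a byte boundary.
  def decode(id) do
    index = @safe64 |> String.graphemes() |> Enum.with_index() |> Map.new()

    bits =
      id
      |> String.graphemes()
      |> Enum.reduce(<<>>, fn ch, acc ->
        i = Map.fetch!(index, ch)
        <<acc::bitstring, i::size(6)>>
      end)

    pad = rem(8 - rem(bit_size(bits), 8), 8)
    <<bits::bitstring, 0::size(pad)>>
  end

  # Drop the trailing pad bits, then read back 6 bits at a time.
  def encode(bytes, len) do
    chars = String.graphemes(@safe64)
    n_bits = 6 * len
    <<bits::bitstring-size(n_bits), _pad::bitstring>> = bytes
    for <<i::size(6) <- bits>>, into: "", do: Enum.at(chars, i)
  end
end

BitPackSketch.decode("CSWEPL3AiethdYFlCbSaVC")
# <<9, 37, 132, 60, 189, 192, 137, 235, 97, 117, 129, 101, 9, 180, 154, 84, 32>>
```

A 22-character ID carries 132 bits, so 4 zero bits are appended to reach 136 bits, or 17 bytes.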

The info/0 function returns a Puid.Info structure consisting of:

  • source characters
  • name of pre-defined Puid.Chars or :custom
  • entropy bits per character
  • total entropy bits (may be larger than the specified bits since it is a multiple of the entropy bits per character)
  • entropy representation efficiency (ratio of the puid entropy to the bits required for puid string representation)
  • entropy source function
  • puid string length
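
The entropy representation efficiency can be illustrated with a small calculation. The sketch below assumes the ratio is taken as entropy bits per character over the average number of bits needed to encode a character of the set as UTF-8; that averaging is an assumption made for illustration, and EreSketch is a hypothetical name.

```elixir
defmodule EreSketch do
  # Hypothetical helper, not part of Puid.
  # Ratio of entropy bits per character to the average number of bits
  # used to store one character of `chars` in a UTF-8 string.
  def ere(chars) do
    count = chars |> String.graphemes() |> length()
    bits_per_char = :math.log2(count)
    avg_encoded_bits = 8 * byte_size(chars) / count
    bits_per_char / avg_encoded_bits
  end
end

EreSketch.ere("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_")
# 0.75, since each 8-bit character carries 6 bits of entropy

EreSketch.ere("0123456789abcdef")
# 0.5, since each 8-bit character carries 4 bits of entropy
```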

Example

iex> defmodule(SafeId, do: use(Puid))

iex> SafeId.generate()
"CSWEPL3AiethdYFlCbSaVC"

iex> SafeId.total(1_000_000)
104350568690606000

iex> SafeId.risk(1.0e12)
9007199254740992

iex> SafeId.decode("CSWEPL3AiethdYFlCbSaVC")
<<9, 37, 132, 60, 189, 192, 137, 235, 97, 117, 129, 101, 9, 180, 154, 84, 32>>

iex> SafeId.encode(<<9, 37, 132, 60, 189, 192, 137, 235, 97, 117, 129, 101, 9, 180, 154, 84, 32>>)
"CSWEPL3AiethdYFlCbSaVC"

iex> SafeId.info()
%Puid.Info{
  characters: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_",
  char_set: :safe64,
  entropy_bits: 132.0,
  entropy_bits_per_char: 6.0,
  ere: 0.75,
  length: 22,
  rand_bytes: &:crypto.strong_rand_bytes/1
}

Summary

Types

@type t() :: binary()