Puid.Chars (puid v2.6.0)

Pre-defined character sets for use when creating Puid modules.

Example

iex> defmodule(AlphanumId, do: use(Puid, chars: :alphanum))

Pre-defined Chars

:alpha

Upper/lower case alphabet

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

bits per character: 5.7

:alpha_lower

Lower case alphabet

abcdefghijklmnopqrstuvwxyz

bits per character: 4.7

:alpha_upper

Upper case alphabet

ABCDEFGHIJKLMNOPQRSTUVWXYZ

bits per character: 4.7

:alphanum

Upper/lower case alphabet and numbers

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789

bits per character: 5.95

:alphanum_lower

Lower case alphabet and numbers

abcdefghijklmnopqrstuvwxyz0123456789

bits per character: 5.17

:alphanum_upper

Upper case alphabet and numbers

ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789

bits per character: 5.17

:base16

RFC 4648 base16 character set

0123456789ABCDEF

bits per character: 4

:base32

RFC 4648 base32 character set

ABCDEFGHIJKLMNOPQRSTUVWXYZ234567

bits per character: 5

:base32_hex

RFC 4648 base32 extended hex character set with lowercase letters

0123456789abcdefghijklmnopqrstuv

bits per character: 5

:base32_hex_upper

RFC 4648 base32 extended hex character set

0123456789ABCDEFGHIJKLMNOPQRSTUV

bits per character: 5

:base36

Case-insensitive alphanumeric (lowercase)

0123456789abcdefghijklmnopqrstuvwxyz

bits per character: 5.17

:base36_upper

Case-insensitive alphanumeric (uppercase)

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ

bits per character: 5.17

:base45

QR code alphanumeric mode (ISO/IEC 18004:2015)

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:

bits per character: 5.49

:base58

Bitcoin Base58 alphabet (no 0, O, I, l)

123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

bits per character: 5.86

:base62

Alphanumeric characters (alias for :alphanum)

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789

bits per character: 5.95

:base85

Ascii85 encoding (Adobe, btoa)

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu

bits per character: 6.41

:bech32

Bitcoin SegWit address encoding (no 1, b, i, o)

023456789acdefghjklmnpqrstuvwxyz

bits per character: 5

:boolean

Boolean/binary representation

TF

bits per character: 1

:crockford32

Crockford's Base32 (no I, L, O, U)

0123456789ABCDEFGHJKMNPQRSTVWXYZ

bits per character: 5

:decimal

Decimal digits

0123456789

bits per character: 3.32

:dna

DNA nucleotide bases

ACGT

bits per character: 2

:geohash

Geohash encoding alphabet (base32 variant excluding 'a', 'i', 'l', 'o')

0123456789bcdefghjkmnpqrstuvwxyz

bits per character: 5

:hex

Lowercase hexadecimal

0123456789abcdef

bits per character: 4

:hex_upper

Uppercase hexadecimal

0123456789ABCDEF

bits per character: 4

:safe_ascii

ASCII characters from ?! to ?~, minus backslash, backtick, single-quote and double-quote

!#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_abcdefghijklmnopqrstuvwxyz{|}~

bits per character: 6.49

:safe32

Strings that don't look like English words and are easier to parse visually

2346789bdfghjmnpqrtBDFGHJLMNPQRT
  • remove all upper and lower case vowels (including y)
  • remove all numbers that look like letters
  • remove all letters that look like numbers
  • remove all letters that have poor distinction between upper and lower case values

bits per character: 5

:safe64

RFC 4648 file system and URL safe character set

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_

bits per character: 6

:symbol

:safe_ascii characters not in :alphanum

!#$%&()*+,-./:;<=>?@[]^_{|}~

bits per character: 4.81

:url_safe

RFC 3986 unreserved characters (URL safe without percent-encoding)

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~

bits per character: 6.04

:word_safe32

Strings that don't look like English words

23456789CFGHJMPQRVWXcfghjmpqrvwx

Origin unknown

bits per character: 5

:z_base32

Zooko's human-oriented base32 (easier to read/transcribe)

ybndrfg8ejkmcpqxot1uwisza345h769

bits per character: 5
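
The bits per character values listed above are consistent with the base-2 logarithm of the number of characters in each set. The following is a minimal sketch of that arithmetic, assuming characters are drawn uniformly; `BitsPerCharSketch` is a hypothetical helper for illustration and is not part of Puid.

```elixir
defmodule BitsPerCharSketch do
  # Hypothetical helper, not part of Puid.
  # Entropy bits carried by one character drawn uniformly from `chars`.
  def bits_per_char(chars) when is_binary(chars) do
    chars
    |> String.graphemes()
    |> length()
    |> :math.log2()
  end
end

# Rounded to two places, as in the listing above.
dna = Float.round(BitsPerCharSketch.bits_per_char("ACGT"), 2)
lower = Float.round(BitsPerCharSketch.bits_per_char("abcdefghijklmnopqrstuvwxyz"), 2)
IO.inspect({dna, lower})
```

A set with 4 characters gives 2 bits per character and a set with 26 characters gives roughly 4.7, matching the :dna and :alpha_lower entries.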

Types

puid_chars()

@type puid_chars() :: atom() | String.t() | charlist()

Chars can be designated by a pre-defined atom, a binary or a charlist

puid_encoding()

@type puid_encoding() :: :ascii | :utf8

Character encoding scheme. :ascii encoding uses cross-product character pairs.

Functions

charlist(chars)

@spec charlist(puid_chars()) :: {:ok, charlist()} | {:error, String.t()}

charlist for a pre-defined Puid.Chars, a String.t() or a charlist.

For a String.t() or charlist argument, the characters must be unique, there must be more than one of them, and none may be an invalid ASCII character.

Example

iex> Puid.Chars.charlist(:safe32)
{:ok, ~c"2346789bdfghjmnpqrtBDFGHJLMNPQRT"}

iex> Puid.Chars.charlist("dingosky")
{:ok, ~c"dingosky"}

iex> Puid.Chars.charlist("unique")
{:error, "Characters not unique"}
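
The uniqueness and minimum-length rules can be illustrated with a small standalone check. This is a sketch under stated assumptions, not Puid's implementation: `CharsetCheckSketch` and the "Need at least 2 characters" message are hypothetical, and the invalid-character rule is omitted because its exact definition is not given here.

```elixir
defmodule CharsetCheckSketch do
  # Hypothetical validator for illustration only; not Puid's code.

  # Accept a binary by converting it to a charlist first.
  def check(chars) when is_binary(chars) do
    chars |> String.to_charlist() |> check()
  end

  def check(chars) when is_list(chars) do
    cond do
      # Hypothetical error message; the real one is not shown in these docs.
      length(chars) < 2 -> {:error, "Need at least 2 characters"}
      # Same message as the charlist/1 example above.
      length(Enum.uniq(chars)) != length(chars) -> {:error, "Characters not unique"}
      true -> {:ok, chars}
    end
  end
end

IO.inspect(CharsetCheckSketch.check("dingosky"))
IO.inspect(CharsetCheckSketch.check("unique"))
```

The second call fails because "unique" contains the letter u twice.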

charlist!(chars)

@spec charlist!(puid_chars()) :: charlist()

Same as charlist/1, but returns the charlist directly or raises a Puid.Error.

Example

iex> Puid.Chars.charlist!(:safe32)
~c"2346789bdfghjmnpqrtBDFGHJLMNPQRT"

iex> Puid.Chars.charlist!("dingosky")
~c"dingosky"

Raises Puid.Error if the characters are not unique, too few, or contain invalid characters.

metrics(chars)

@spec metrics(puid_chars()) :: %{
  avg_bits: float(),
  bit_shifts: [{non_neg_integer(), pos_integer()}, ...],
  ere: float(),
  ete: float()
}

Calculate entropy metrics for a character set.

Return Value

Returns a map with the following keys:

  • :avg_bits - Average bits consumed per character
  • :bit_shifts - Bit shift rules used for character generation
  • :ere - Entropy representation efficiency (0 < ERE ≤ 1.0), measures how efficiently the characters represent entropy in their string form
  • :ete - Entropy transform efficiency (0 < ETE ≤ 1.0), measures how efficiently random bits are transformed into characters during generation

Examples

iex> Puid.Chars.metrics(:safe64)
%{
  avg_bits: 6.0,
  bit_shifts: [{63, 6}],
  ere: 0.75,
  ete: 1.0
}

iex> Puid.Chars.metrics(:alpha)
%{
  avg_bits: 6.769230769230769,
  bit_shifts: [{51, 6}, {55, 4}, {63, 3}],
  ere: 0.7125549647676365,
  ete: 0.8421104129072068
}

Details

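ERE: Entropy representation efficiency (0 < ERE ≤ 1.0), measures how efficiently ID characters represent entropy in their string form. For Puid this is determined entirely by the bits per character: in the examples above, ERE equals the bits per character divided by the 8 bits that each single-byte character occupies (6 / 8 = 0.75 for :safe64).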

ETE: Entropy transform efficiency (0 < ETE ≤ 1.0). Character sets with a power-of-2 number of characters have ETE = 1.0 since bit slicing always creates a proper index into the characters list. Other character sets discard some bits due to bit slicing that creates an out-of-bounds index. Puid uses an algorithm which minimizes the number of bits discarded.

:avg_bits: Theoretical average bits consumed per character.

:bit_shifts: Bit shift values used to determine how many bits are discarded during bit slicing.
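
The relationship between these values can be sketched from the charset size alone. The code below is an illustration that reproduces the numbers in the examples above; it is not confirmed to be Puid's implementation, and `MetricsSketch` is a hypothetical module. It assumes each character occupies 8 bits in the string, and that a rejected bit slice discards only the leading bits that prove the value is out of range.

```elixir
defmodule MetricsSketch do
  import Bitwise

  # {max_value, bits} pairs: a slice value up to the first max_value is a
  # valid index and consumes all n_bits; larger values are rejected and
  # consume only the leading bits that prove they are out of range.
  def bit_shifts(n_chars) when is_integer(n_chars) and n_chars > 1 do
    max_index = n_chars - 1
    n_bits = length(Integer.digits(max_index, 2))

    rejects =
      for k <- 0..(n_bits - 1), (max_index &&& (1 <<< k)) == 0 do
        {max_index ||| ((1 <<< (k + 1)) - 1), n_bits - k}
      end

    [{max_index, n_bits} | rejects]
  end

  def metrics(n_chars) do
    shifts = bit_shifts(n_chars)
    bits_per_char = :math.log2(n_chars)

    # Sum the bits consumed over every possible slice value.
    {total_bits, _prev_max} =
      Enum.reduce(shifts, {0, -1}, fn {max_value, bits}, {sum, prev_max} ->
        {sum + (max_value - prev_max) * bits, max_value}
      end)

    # Expected bits consumed per accepted character.
    avg_bits = total_bits / n_chars

    %{
      avg_bits: avg_bits,
      bit_shifts: shifts,
      # Assumes 8 bits per character in the string form.
      ere: bits_per_char / 8,
      ete: bits_per_char / avg_bits
    }
  end
end

IO.inspect(MetricsSketch.metrics(52))
IO.inspect(MetricsSketch.metrics(64))
```

For 52 characters the sketch yields the bit shifts [{51, 6}, {55, 4}, {63, 3}] and an average of about 6.77 bits per character, in line with the :alpha example. For 64 characters every slice is a valid index, so the average is 6 bits and ETE is 1.0.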

predefined()

@spec predefined() :: [atom()]

List of predefined charsets discovered from compiled module.