Puid.Chars (puid v2.6.0)
Pre-defined character sets for use when creating Puid modules.
Example
iex> defmodule(AlphanumId, do: use(Puid, chars: :alphanum))
Pre-defined Chars
:alpha
Upper/lower case alphabet
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
bits per character: 5.7
:alpha_lower
Lower case alphabet
abcdefghijklmnopqrstuvwxyz
bits per character: 4.7
:alpha_upper
Upper case alphabet
ABCDEFGHIJKLMNOPQRSTUVWXYZ
bits per character: 4.7
:alphanum
Upper/lower case alphabet and numbers
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
bits per character: 5.95
:alphanum_lower
Lower case alphabet and numbers
abcdefghijklmnopqrstuvwxyz0123456789
bits per character: 5.17
:alphanum_upper
Upper case alphabet and numbers
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
bits per character: 5.17
:base16
RFC 4648 base16 character set
0123456789ABCDEF
bits per character: 4
:base32
RFC 4648 base32 character set
ABCDEFGHIJKLMNOPQRSTUVWXYZ234567
bits per character: 5
:base32_hex
RFC 4648 base32 extended hex character set with lowercase letters
0123456789abcdefghijklmnopqrstuv
bits per character: 5
:base32_hex_upper
RFC 4648 base32 extended hex character set
0123456789ABCDEFGHIJKLMNOPQRSTUV
bits per character: 5
:base36
Case-insensitive alphanumeric (lowercase)
0123456789abcdefghijklmnopqrstuvwxyz
bits per character: 5.17
:base36_upper
Case-insensitive alphanumeric (uppercase)
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
bits per character: 5.17
:base45
QR code alphanumeric mode (ISO/IEC 18004:2015)
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:
bits per character: 5.49
:base58
Bitcoin Base58 alphabet (no 0, O, I, l)
123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
bits per character: 5.86
:base62
Alphanumeric characters (alias for :alphanum)
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
bits per character: 5.95
:base85
ASCII85/Ascii85 encoding (Adobe, btoa)
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu
bits per character: 6.41
:bech32
Bitcoin SegWit address encoding (no 1, b, i, o)
023456789acdefghjklmnpqrstuvwxyz
bits per character: 5
:boolean
Boolean/binary representation
TF
bits per character: 1
:crockford32
Crockford's base32 alphabet (no I, L, O, U)
0123456789ABCDEFGHJKMNPQRSTVWXYZ
bits per character: 5
:decimal
Decimal digits
0123456789
bits per character: 3.32
:dna
DNA nucleotide bases
ACGT
bits per character: 2
:geohash
Geohash encoding alphabet (base32 variant excluding 'a', 'i', 'l', 'o')
0123456789bcdefghjkmnpqrstuvwxyz
bits per character: 5
:hex
Lowercase hexadecimal
0123456789abcdef
bits per character: 4
:hex_upper
Uppercase hexadecimal
0123456789ABCDEF
bits per character: 4
:safe_ascii
ASCII characters from ?! to ?~, minus backslash, backtick, single-quote and double-quote
!#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_abcdefghijklmnopqrstuvwxyz{|}~
bits per character: 6.49
:safe32
Strings that don't look like English words and are easier to parse visually
2346789bdfghjmnpqrtBDFGHJLMNPQRT
- remove all upper and lower case vowels (including y)
- remove all numbers that look like letters
- remove all letters that look like numbers
- remove all letters that have poor distinction between upper and lower case values
bits per character: 5
:safe64
RFC 4648 file system and URL safe character set
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_
bits per character: 6
:symbol
:safe_ascii characters not in :alphanum
!#$%&()*+,-./:;<=>?@[]^_{|}~
bits per character: 4.81
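The relationship between :symbol, :safe_ascii and :alphanum can be checked with plain list operations. The sketch below rebuilds the three sets from their descriptions on this page using only the Elixir standard library. The module name SymbolSketch is hypothetical and is not part of Puid.

```elixir
defmodule SymbolSketch do
  # Hypothetical helper module for illustration; not part of Puid.

  # ASCII ?! through ?~, minus backslash, backtick, single quote and double quote.
  def safe_ascii, do: Enum.to_list(?!..?~) -- [?\\, ?`, ?', ?"]

  # Upper/lower case letters followed by digits, as listed for :alphanum.
  def alphanum, do: Enum.concat([?A..?Z, ?a..?z, ?0..?9])

  # :safe_ascii characters that are not in :alphanum.
  def symbol, do: safe_ascii() -- alphanum()
end

IO.puts(List.to_string(SymbolSketch.symbol()))
IO.puts(length(SymbolSketch.symbol()))
```

The difference leaves 28 characters, which agrees with the listed 4.81 bits per character (log2 of 28).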
:url_safe
RFC 3986 unreserved characters (URL safe without percent-encoding)
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~
bits per character: 6.04
:word_safe32
Strings that don't look like English words
23456789CFGHJMPQRVWXcfghjmpqrvwx
Origin unknown
bits per character: 5
:z_base32
Zooko's human-oriented base32 (easier to read/transcribe)
ybndrfg8ejkmcpqxot1uwisza345h769
bits per character: 5
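The "bits per character" figure in each entry above is the base-2 logarithm of the number of characters in the set. As a minimal sketch, assuming nothing beyond the Elixir and Erlang standard libraries, the value can be recomputed from any of the listed strings. The module BitsSketch is a hypothetical name used only for this illustration.

```elixir
defmodule BitsSketch do
  # Hypothetical helper for illustration; not part of Puid.

  # Bits per character is log2 of the number of characters in the set.
  def bits_per_char(chars) when is_binary(chars) do
    chars |> String.length() |> :math.log2()
  end

  # Round to two decimals, matching the precision used in the entries above.
  def rounded(chars), do: Float.round(bits_per_char(chars), 2)
end

# :hex (16 characters)
IO.inspect(BitsSketch.rounded("0123456789abcdef"))
# :decimal (10 characters)
IO.inspect(BitsSketch.rounded("0123456789"))
# :base58 (58 characters)
IO.inspect(BitsSketch.rounded("123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"))
```

Sets whose size is a power of 2 give a whole number of bits; all others give a fractional value.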
Summary
Types
Chars can be designated by a pre-defined atom, a binary or a charlist
Character encoding scheme. :ascii encoding uses cross-product character pairs.
Functions
charlist/1: Returns the charlist for a pre-defined Puid.Chars, a String.t() or a charlist.
charlist!/1: Same as charlist/1 but either returns the charlist or raises a Puid.Error.
metrics/1: Calculate entropy metrics for a character set.
predefined/0: List of predefined charsets discovered from the compiled module.
Types
Functions
@spec charlist(puid_chars()) :: {:ok, charlist()} | {:error, String.t()}
Returns the charlist for a pre-defined Puid.Chars, a String.t() or a charlist.
The characters of a String.t() or charlist must be unique, number more than one, and include no invalid ASCII characters.
Example
iex> Puid.Chars.charlist(:safe32)
{:ok, ~c"2346789bdfghjmnpqrtBDFGHJLMNPQRT"}
iex> Puid.Chars.charlist("dingosky")
{:ok, ~c"dingosky"}
iex> Puid.Chars.charlist("unique")
{:error, "Characters not unique"}
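The three requirements on custom characters can be illustrated with a small stand-alone validator. This is a sketch under stated assumptions, not Puid's implementation: the module CharsCheckSketch is hypothetical, "invalid" is assumed here to mean an ASCII control character, and only the "Characters not unique" message comes from the example above; the other messages are placeholders.

```elixir
defmodule CharsCheckSketch do
  # Hypothetical validator for illustration; not Puid's implementation.
  # Assumption: "invalid" means an ASCII control character (below ?\s, or DEL).

  def check(chars) when is_binary(chars) do
    chars |> String.to_charlist() |> check()
  end

  def check(chars) when is_list(chars) do
    cond do
      # Placeholder message; the real wording is not given on this page.
      length(chars) < 2 -> {:error, "Need at least 2 characters"}
      # Placeholder message; the real wording is not given on this page.
      Enum.any?(chars, &control?/1) -> {:error, "Invalid character"}
      length(Enum.uniq(chars)) != length(chars) -> {:error, "Characters not unique"}
      true -> {:ok, chars}
    end
  end

  defp control?(char), do: char < ?\s or char == 127
end

IO.inspect(CharsCheckSketch.check("dingosky"))
IO.inspect(CharsCheckSketch.check("unique"))
```

Returning tagged tuples mirrors the {:ok, charlist} | {:error, String.t()} shape of the spec, so callers can pattern match on the result.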
@spec charlist!(puid_chars()) :: charlist()
Same as charlist/1 but either returns the charlist or raises a Puid.Error.
Example
iex> Puid.Chars.charlist!(:safe32)
~c"2346789bdfghjmnpqrtBDFGHJLMNPQRT"
iex> Puid.Chars.charlist!("dingosky")
~c"dingosky"
Raises Puid.Error if the characters are not unique, are too few, or include invalid characters.
@spec metrics(puid_chars()) :: %{
  avg_bits: float(),
  bit_shifts: [{non_neg_integer(), pos_integer()}, ...],
  ere: float(),
  ete: float()
}
Calculate entropy metrics for a character set.
Return Value
Returns a map with the following keys:
- :avg_bits: Average bits consumed per character
- :bit_shifts: Bit shift rules used for character generation
- :ere: Entropy representation efficiency (0 < ERE ≤ 1.0), measures how efficiently the characters represent entropy in their string form
- :ete: Entropy transform efficiency (0 < ETE ≤ 1.0), measures how efficiently random bits are transformed into characters during generation
Examples
iex> Puid.Chars.metrics(:safe64)
%{
  avg_bits: 6.0,
  bit_shifts: [{63, 6}],
  ere: 0.75,
  ete: 1.0
}
iex> Puid.Chars.metrics(:alpha)
%{
  avg_bits: 6.769230769230769,
  bit_shifts: [{51, 6}, {55, 4}, {63, 3}],
  ere: 0.7125549647676365,
  ete: 0.8421104129072068
}
Details
ERE: Entropy representation efficiency (0 < ERE ≤ 1.0), measures how efficiently ID characters represent entropy in their string form. For single-byte characters this is the bits per character divided by 8; for example, :safe64 gives 6 / 8 = 0.75.
ETE: Entropy transform efficiency (0 < ETE ≤ 1.0). Character sets with a power-of-2 number of characters have ETE = 1.0 since bit slicing always creates a proper index into the characters list. Other character sets discard some bits due to bit slicing that creates an out-of-bounds index. Puid uses an algorithm which minimizes the number of bits discarded.
avg_bits: Theoretical average bits consumed per character
bit_shifts: Bit shift values used to determine how many bits are discarded during bit slicing.
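The example values above can be reproduced from the character count and the bit shift rules alone. The sketch below is a hypothetical reconstruction, not Puid's implementation. It assumes each {max_value, bits} rule means that a bit slice with a value above the previous rule's maximum and up to max_value consumes that many bits, and it takes ERE as bits per character divided by the 8 bits of a single-byte character. The module name MetricsSketch is invented for the illustration.

```elixir
defmodule MetricsSketch do
  # Hypothetical reconstruction for illustration; not Puid's implementation.
  # Assumption: each {max_value, bits} rule covers slice values from just above
  # the previous rule's max_value up to its own max_value, and a slice in that
  # range consumes `bits` bits.

  def metrics(n_chars, bit_shifts) do
    bits_per_char = :math.log2(n_chars)

    # Sum the bits consumed across every possible slice value.
    {total_bits, _last_max} =
      Enum.reduce(bit_shifts, {0, -1}, fn {max_value, bits}, {sum, prev_max} ->
        {sum + (max_value - prev_max) * bits, max_value}
      end)

    # Only n_chars of the slice values yield a character, so the expected
    # cost per generated character is the total divided by n_chars.
    avg_bits = total_bits / n_chars

    %{
      avg_bits: avg_bits,
      # Assumption: 8 bits of storage per single-byte character.
      ere: bits_per_char / 8,
      ete: bits_per_char / avg_bits
    }
  end
end

IO.inspect(MetricsSketch.metrics(52, [{51, 6}, {55, 4}, {63, 3}]))
IO.inspect(MetricsSketch.metrics(64, [{63, 6}]))
```

For :alpha the 64 possible 6-bit slices cost 52 × 6 + 4 × 4 + 8 × 3 = 352 bits in total, and 52 of them produce a character, so the average is 352 / 52 ≈ 6.77 bits per character. For :safe64 every slice is a valid index, so no bits are discarded and ETE is 1.0.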
@spec predefined() :: [atom()]
List of predefined charsets discovered from the compiled module.