CrockfordBase32

An Elixir implementation of Douglas Crockford's Base32 encoding.

Please see https://www.crockford.com/base32.html.

This library encodes an integer or a bitstring in Crockford's Base32 and decodes the resulting strings back.

Installation

def deps do
  [
    {:crockford_base32, "~> 0.7"}
  ]
end

Usage

Encode

Encode an integer:

iex> CrockfordBase32.encode(1234)
"16J"

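To make the mapping above concrete, here is a minimal sketch of integer encoding written in plain Elixir. It is not the library's implementation; the module name EncodeSketch is hypothetical, and the sketch only assumes the standard Crockford symbol set (digits plus uppercase letters without I, L, O, U).

```elixir
defmodule EncodeSketch do
  # Hypothetical module for illustration only, not part of crockford_base32.
  # The 32 Crockford symbols; the position in the list is the symbol's value.
  @alphabet ~c"0123456789ABCDEFGHJKMNPQRSTVWXYZ"

  def encode(n) when is_integer(n) and n >= 0 do
    n
    |> Integer.digits(32)                  # base-32 digits, most significant first
    |> Enum.map(&Enum.at(@alphabet, &1))   # map each digit to its symbol
    |> List.to_string()
  end
end
```

For 1234 the base-32 digits are 1, 6 and 18, and the symbol with value 18 is J, which matches the result shown above.
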
Encode an integer with checksum: true:

iex> CrockfordBase32.encode(1234, checksum: true)
"16JD"

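Crockford's specification defines the check symbol as the value modulo 37, drawn from the 32 regular symbols plus five extra ones (*, ~, $, =, U). The sketch below illustrates that rule in plain Elixir; CheckSymbolSketch is a hypothetical name and this is not the library's own code.

```elixir
defmodule CheckSymbolSketch do
  # Hypothetical module for illustration only.
  # 37 symbols: the 32 encoding symbols followed by the 5 check-only symbols.
  @symbols "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U"

  def check_symbol(n) when is_integer(n) and n >= 0 do
    String.at(@symbols, rem(n, 37))
  end
end
```

Since rem(1234, 37) is 13 and the symbol with value 13 is D, the check symbol for 1234 is D, consistent with the "16JD" result above.
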
Encode an integer and insert a hyphen (-) every split_size characters in the encoded result:

iex> CrockfordBase32.encode(1234, split_size: 2)
"16-J"
iex> CrockfordBase32.encode(1234, split_size: 1)
"1-6-J"
iex> CrockfordBase32.encode(1234, split_size: 1, checksum: true)
"1-6-J-D"

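The grouping behaviour in these examples can be approximated with a few lines of plain Elixir. This is a sketch under the assumption, inferred from the examples, that hyphens are inserted after any check symbol has been appended; HyphenSketch is a hypothetical name, not a library module.

```elixir
defmodule HyphenSketch do
  # Hypothetical helper for illustration only.
  # Inserts "-" between consecutive groups of `split_size` characters.
  def split(encoded, split_size)
      when is_binary(encoded) and is_integer(split_size) and split_size > 0 do
    encoded
    |> String.graphemes()
    |> Enum.chunk_every(split_size)
    |> Enum.map_join("-", &Enum.join/1)
  end
end
```

For example, HyphenSketch.split("16JD", 1) produces the same grouping as the last example above.
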
Encode a bitstring; the optional split_size and checksum options both work here as well:

iex> CrockfordBase32.encode(<<12345678::size(48)>>)
"00001F319R"
iex> CrockfordBase32.encode("abc")
"C5H66"
iex> CrockfordBase32.encode("abc", checksum: true)
"C5H66C"
iex> CrockfordBase32.encode("abc", checksum: true, split_size: 3)
"C5H-66C"
iex> CrockfordBase32.encode(<<5::size(3)>>)
"M"

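For bitstrings, the examples above suggest that the input is extended with trailing zero bits to a multiple of 5 and then read in 5-bit groups (<<5::size(3)>> becomes the bits 10100, value 20, symbol M). The following is a minimal sketch of that idea, assuming this padding rule; BitstringEncodeSketch is a hypothetical name and the code is not taken from the library.

```elixir
defmodule BitstringEncodeSketch do
  # Hypothetical module for illustration only.
  @alphabet ~c"0123456789ABCDEFGHJKMNPQRSTVWXYZ"

  def encode(bits) when is_bitstring(bits) do
    # Number of trailing zero bits needed to reach a multiple of 5 (0..4).
    pad = rem(5 - rem(bit_size(bits), 5), 5)
    padded = <<bits::bitstring, 0::size(pad)>>

    # Read the padded data 5 bits at a time and look up each symbol.
    chars = for <<chunk::size(5) <- padded>>, do: Enum.at(@alphabet, chunk)
    List.to_string(chars)
  end
end
```
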
Decode

All hyphens are removed internally before decoding.

Decode an encoded string to an integer:

iex> CrockfordBase32.decode_to_integer("16J")
{:ok, 1234}
iex> CrockfordBase32.decode_to_integer("16-J")
{:ok, 1234}
iex> CrockfordBase32.decode_to_integer("16-j")
{:ok, 1234}

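The hyphen and case handling shown above can be sketched in plain Elixir as follows. This is an illustration, not the library's decoder: DecodeSketch is a hypothetical name, and the sketch leaves out the substitutions that Crockford's specification also allows when decoding (I and L read as 1, O read as 0).

```elixir
defmodule DecodeSketch do
  # Hypothetical module for illustration only.
  @alphabet "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

  def decode_to_integer(encoded) when is_binary(encoded) do
    # Drop hyphens and fold to uppercase before looking symbols up.
    normalized = encoded |> String.replace("-", "") |> String.upcase()

    if normalized == "" do
      :error
    else
      normalized
      |> String.graphemes()
      |> Enum.reduce_while({:ok, 0}, fn ch, {:ok, acc} ->
        case :binary.match(@alphabet, ch) do
          # The byte offset in the alphabet is the symbol's value.
          {index, 1} -> {:cont, {:ok, acc * 32 + index}}
          :nomatch -> {:halt, :error}
        end
      end)
    end
  end
end
```
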
Decode an encoded string with a check symbol to an integer:

iex> CrockfordBase32.decode_to_integer("16JD", checksum: true)
{:ok, 1234}
iex> CrockfordBase32.decode_to_integer("16J1", checksum: true)
:error_checksum

Decode an encoded string to a bitstring:

iex> CrockfordBase32.decode_to_bitstring("00001F319R")
{:ok, <<0, 0, 0, 188, 97, 78>>}
iex> CrockfordBase32.decode_to_bitstring("C5H66")
{:ok, "abc"}
iex> CrockfordBase32.decode_to_bitstring("C5H-66")
{:ok, "abc"}
iex> CrockfordBase32.decode_to_bitstring("c5H-66")
{:ok, "abc"}
iex> CrockfordBase32.decode_to_bitstring("c5h-66")
{:ok, "abc"}
iex> CrockfordBase32.decode_to_bitstring("c5h66")
{:ok, "abc"}
iex> CrockfordBase32.decode_to_bitstring("M")
{:ok, <<5::size(3)>>}

Decode an encoded string with a check symbol to a bitstring:

iex> CrockfordBase32.decode_to_bitstring("C5H66C", checksum: true)
{:ok, "abc"}
iex> CrockfordBase32.decode_to_bitstring("C5H66D", checksum: true)
:error_checksum

Some invalid cases:

iex> CrockfordBase32.decode_to_bitstring(<<1, 2, 3>>)
:error
iex> CrockfordBase32.decode_to_bitstring(<<>>)
:error
iex> CrockfordBase32.decode_to_integer(<<1, 2, 3>>)
:error
iex> CrockfordBase32.decode_to_integer(<<>>)
:error

Fixed Size Encoding

In some cases you may want to encode data of a fixed size. This can be done with better performance by leveraging Elixir/Erlang pattern matching. I use this feature to implement a ULID in Elixir.

According to the ULID specification, a ULID concatenates a 48-bit UNIX timestamp in milliseconds and 80 bits of randomness. Because an integer is padded with leading zero bits (<<0::1>>) when needed, a 128-bit ULID encodes to 26 characters (26 × 5 = 130 bits, a multiple of 5). Applying the fixed-size encoding with type: :integer can therefore encode and decode a ULID efficiently, for example:

defmodule ULID do

  defmodule Base32.Bits128 do
    use CrockfordBase32,
      bits_size: 128,
      type: :integer # Optional, defaults to `:bitstring`
  end

end

Then we can use ULID.Base32.Bits128 to encode/decode a 128-bit bitstring that concatenates a 48-bit integer (the UNIX timestamp in milliseconds) and 80 randomly generated bits.
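
As an illustration of the 128-bit layout described here, the sketch below builds such a value with plain Elixir and Erlang's :crypto module. It does not call the library; UlidBitsSketch and its function names are hypothetical, and it assumes the :crypto application is available in your runtime.

```elixir
defmodule UlidBitsSketch do
  # Hypothetical module for illustration only.
  # Builds the raw 128 bits: a 48-bit millisecond timestamp, then 80 random bits.
  def generate(timestamp_ms \\ System.system_time(:millisecond)) do
    <<timestamp_ms::unsigned-size(48), :crypto.strong_rand_bytes(10)::binary>>
  end

  # Reads the timestamp back out of the first 48 bits.
  def timestamp(<<timestamp_ms::unsigned-size(48), _random::bitstring-size(80)>>) do
    timestamp_ms
  end
end
```

The resulting 128-bit value is the kind of input a fixed-size module such as the one above is meant to handle.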

Zero-bit padding when decoding

Crockford's Base32 avoids padding characters by zero-extending the data so that its bit length is a multiple of 5. The additional padding bits (<<0::size(1)>>) are not retained in the decoded result, so a decoded bitstring may be shorter than expected, for example:

The string "01HY3B3HQ5FMEVJN8ME7C4HZDM" is a randomly generated 26-character TypeID suffix. The TypeID specification defines the suffix's base32 encoding with two zero bits prepended to the 128 bits of the UUID, resulting in 130 bits of data.

Note: Ignore the case of the parameter s below. CrockfordBase32 is not case sensitive, but TypeID only uses lowercase.

iex> s = "01HY3B3HQ5FMEVJN8ME7C4HZDM"
iex> {:ok, input} = CrockfordBase32.decode_to_bitstring(s)
{:ok, <<0, 99, 225, 172, 113, 185, 95, 71, 110, 85, 69, 28, 118, 18, 63, 109>>}
iex> bit_size(input)
128
iex> CrockfordBase32.encode(<<input::bitstring, 0::size(2)>>)
"01HY3B3HQ5FMEVJN8ME7C4HZDM"
iex> <<0::size(2), uuid::bitstring>> = <<input::bitstring, 0::size(2)>>
<<0, 99, 225, 172, 113, 185, 95, 71, 110, 85, 69, 28, 118, 18, 63, 109,
  0::size(2)>>
iex> uuid
<<1, 143, 134, 177, 198, 229, 125, 29, 185, 85, 20, 113, 216, 72, 253, 180>>

We must explicitly append two zero bits (<<0::size(2)>>) to <<0, 99, 225, 172, 113, 185, 95, 71, 110, 85, 69, 28, 118, 18, 63, 109>> before we can extract the correct uuid bitstring.

As with TypeID, we can pre-define a fixed-size bitstring encoding:
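
The manual step from the iex session can be wrapped in a small helper. This is a sketch in plain Elixir that only restates the bit manipulation shown above; TypeidSuffixSketch and to_uuid/1 are hypothetical names, not part of the library.

```elixir
defmodule TypeidSuffixSketch do
  # Hypothetical helper for illustration only.
  # Takes the 128 bits returned by a generic decode, restores the two
  # trailing zero bits that were dropped, then strips the two leading
  # zero bits that the TypeID suffix encoding prepends.
  def to_uuid(<<decoded::bitstring-size(128)>>) do
    <<0::size(2), uuid::bitstring-size(128)>> = <<decoded::bitstring, 0::size(2)>>
    uuid
  end
end
```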

defmodule Typeid do

  defmodule Base32.Bits130 do
    use CrockfordBase32,
      bits_size: 130
  end

end

With Typeid.Base32.Bits130 there is no need to pad the zero bits manually; the module uses its fixed size to handle the padding.

iex> Typeid.Base32.Bits130.decode("01HY3B3HQ5FMEVJN8ME7C4HZDM")
{:ok,
 <<0, 99, 225, 172, 113, 185, 95, 71, 110, 85, 69, 28, 118, 18, 63, 109,
   0::size(2)>>}
iex> {:ok, input} = Typeid.Base32.Bits130.decode("01HY3B3HQ5FMEVJN8ME7C4HZDM")
{:ok,
 <<0, 99, 225, 172, 113, 185, 95, 71, 110, 85, 69, 28, 118, 18, 63, 109,
   0::size(2)>>}
iex> bit_size(input)
130
iex> <<0::size(2), uuid::bitstring>> = input
<<0, 99, 225, 172, 113, 185, 95, 71, 110, 85, 69, 28, 118, 18, 63, 109,
  0::size(2)>>
iex> uuid
<<1, 143, 134, 177, 198, 229, 125, 29, 185, 85, 20, 113, 216, 72, 253, 180>>
iex> Typeid.Base32.Bits130.encode(input)
"01HY3B3HQ5FMEVJN8ME7C4HZDM"

Custom alphabet

The encoding alphabet can be customized, for example:

  defmodule Typeid.Base32 do
    use CrockfordBase32,
      bits_size: 130,
      alphabet: '0123456789abcdefghjkmnpqrstvwxyz'
  end

Typeid.Base32 then satisfies the TypeID specification, which uses 0123456789abcdefghjkmnpqrstvwxyz as its alphabet.
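
To show what swapping the alphabet changes, here is a minimal sketch in plain Elixir that takes the alphabet as an argument. It is an illustration only and does not reflect how the library generates its code; AlphabetSketch is a hypothetical name, and trailing zero padding is assumed for inputs whose size is not a multiple of 5.

```elixir
defmodule AlphabetSketch do
  # Hypothetical module for illustration only.
  # `alphabet` is a charlist of exactly 32 symbols; the index is the value.
  def encode(bits, alphabet) when is_bitstring(bits) and length(alphabet) == 32 do
    pad = rem(5 - rem(bit_size(bits), 5), 5)
    padded = <<bits::bitstring, 0::size(pad)>>

    # Collect one symbol per 5-bit group directly into a binary.
    for <<chunk::size(5) <- padded>>, into: "" do
      <<Enum.at(alphabet, chunk)>>
    end
  end
end
```

With the lowercase alphabet, the 130-bit value from the previous section encodes to the same 26 characters in lowercase.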

Credits

These libraries and tools were very helpful for understanding and as references, thanks!