Xray (Xray v1.2.0)

Xray offers utility functions for inspecting string binaries, their code points, and their base2 representations.

This package was the result of my own studying of Elixir strings and binaries. It's unlikely you would actually use this as a dependency, but I offer it up for public use in the hopes that it may be educational.

Summary

Functions

codepoint(arg, opts \\ [])

Reveals the integer code point for the given single character. With the default options, this is equivalent to the question-mark operator (e.g. ?x), but this function also works with variables, whereas the question mark only evaluates literal characters.

codepoints(string, opts \\ [])

Given a string binary, this returns the code points that represent each of the characters in the string. This is what you might expect String.codepoints/1 to return, but where that function returns a list of the component characters, this function returns the numbers (which is what code points are).

inspect(value)

This function prints a report on the provided input string. This may not work especially well when the input contains non-printable characters (YMMV).

Functions

codepoint(arg, opts \\ [])

@spec codepoint(binary(), opts :: keyword()) :: integer() | String.t()

Reveals the integer code point for the given single character. With the default options, this is equivalent to the question-mark operator (e.g. ?x), but this function also works with variables, whereas the question mark only evaluates literal characters.

Options:

:as_hex (boolean) default: false

When true, returns the hexadecimal representation of the code point number. The hexadecimal representation is useful when looking up documentation, e.g. on Wikipedia or on websites like codepoints.net.

Examples

iex> Xray.codepoint("ä")
228
iex> Xray.codepoint("ä", as_hex: true)
"00E4"
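
For comparison, similar values can be obtained with the standard library alone. The following is a minimal sketch, not Xray's implementation: the module name CodepointSketch and its functions are hypothetical, and the four-digit zero padding is an assumption chosen to match the "00E4" output shown above.

```elixir
defmodule CodepointSketch do
  # Hypothetical helper, not part of Xray: returns the integer code point
  # of a single-character string, including one held in a variable.
  def decimal(char) when is_binary(char) do
    # The ::utf8 modifier decodes exactly one UTF-8 encoded code point;
    # the match fails if `char` holds more than one code point.
    <<codepoint::utf8>> = char
    codepoint
  end

  # Formats the code point as uppercase hexadecimal, zero-padded to
  # four digits (padding width is an assumption).
  def hex(char) do
    char
    |> decimal()
    |> Integer.to_string(16)
    |> String.pad_leading(4, "0")
  end
end

letter = "ä"
IO.inspect(CodepointSketch.decimal(letter)) # 228
IO.inspect(CodepointSketch.hex(letter))     # "00E4"
IO.inspect(?ä)                              # 228, but ? only accepts a literal
```

The binary pattern match is what makes this work on variables: the question-mark syntax is resolved when the source is parsed, so it cannot be applied to a value that is only known at runtime.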

codepoints(string, opts \\ [])

@spec codepoints(string :: binary(), opts :: keyword()) :: list()

Given a string binary, this returns the code points that represent each of the characters in the string. This is what you might expect String.codepoints/1 to return, but where that function returns a list of the component characters, this function returns the numbers (which is what code points are).

Note that this function returns a string. If it returned a list, Elixir would usually attempt to format that list as a human-readable string, which defeats the purpose of the inspection.

The output is similar to what IO.inspect/2 produces when its :charlists option is set to :as_lists.

Options

Examples

iex> Xray.codepoints("cät")
"99, 228, 116"

Compare this to inspecting a single-quoted charlist:

iex> IO.inspect('cät', charlists: :as_lists)
[99, 228, 116]

Note, however, that IO.inspect/2 sends its output to STDOUT instead of returning it as a string.
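
A string in the same format can also be built from standard library calls. This is a rough sketch under the assumption that a comma-separated list of decimal values is all that is wanted; the module name CodepointsSketch and its function are hypothetical and do not reflect how Xray does it internally.

```elixir
defmodule CodepointsSketch do
  # Hypothetical helper, not Xray's implementation: returns the decimal
  # code points of a string as one comma-separated string.
  def to_decimal_string(string) when is_binary(string) do
    string
    |> String.to_charlist() # "cät" becomes the integer list [99, 228, 116]
    |> Enum.join(", ")      # each integer is converted to its decimal text
  end
end

IO.puts(CodepointsSketch.to_decimal_string("cät")) # prints 99, 228, 116
```

Because the result is a binary rather than a list of integers, the shell has no opportunity to render it as a charlist.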

inspect(value)

@spec inspect(value :: binary()) :: String.t()

This function prints a report on the provided input string. This may not work especially well when the input contains non-printable characters (YMMV).

For each character in the string, the following information is shown:

  • code point as a decimal, e.g. 228
  • code point in its Elixir Unicode representation, e.g. \u00E4
  • a link to a page containing more information about this Unicode code point
  • count of the number of bytes required to represent this code point using UTF-8 encoding
  • an inspection of the UTF-8 binaries, e.g. <<195, 164>>
  • a Base2 representation (i.e. 1's and 0's) of the encoded code point

The Base2 representation (what we would be tempted to call the "binary" representation) highlights control bits in red to help show how UTF-8 identifies how many bytes are required to encode each character.
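
The control bits mentioned above are the leading bits of each byte: in UTF-8, a byte starting with 0 is a complete one-byte character, a byte starting with 110, 1110, or 11110 opens a two-, three-, or four-byte sequence, and a byte starting with 10 is a continuation byte. The sketch below illustrates this with standard library calls only; the module name Utf8BitsSketch and its functions are hypothetical, not Xray's own code.

```elixir
defmodule Utf8BitsSketch do
  # Hypothetical helper: renders each byte of a binary as eight binary digits.
  def base2(binary) when is_binary(binary) do
    for <<byte <- binary>> do
      byte |> Integer.to_string(2) |> String.pad_leading(8, "0")
    end
  end

  # Classifies a one-byte binary by its leading control bits.
  # {:leading, n} means the byte starts a sequence of n bytes in total.
  def byte_role(<<0::1, _::7>>), do: {:leading, 1}
  def byte_role(<<0b110::3, _::5>>), do: {:leading, 2}
  def byte_role(<<0b1110::4, _::4>>), do: {:leading, 3}
  def byte_role(<<0b11110::5, _::3>>), do: {:leading, 4}
  def byte_role(<<0b10::2, _::6>>), do: :continuation
  # Any other bit pattern cannot appear in valid UTF-8.
  def byte_role(<<_::8>>), do: :invalid
end

IO.inspect(Utf8BitsSketch.base2("ä"))
# ["11000011", "10100100"]

IO.inspect(for <<byte <- "ä">>, do: Utf8BitsSketch.byte_role(<<byte>>))
# [{:leading, 2}, :continuation]
```

The first byte of "ä" begins with 110, announcing a two-byte sequence, and the second begins with 10, marking it as the continuation of that sequence.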

Examples

  iex> Xray.inspect("cät")
  ======================================================
  Input String: cät
  Character Count: 3
  Byte Count: 4
  Is valid? true
  Is printable? true
  ======================================================

  c   Codepoint: 99 (\u0063) https://codepoints.net/U+0063
      Script(s): latin
      Byte Count: 1
      UTF-8: <<99>>
      Base2: 01100011

  ä   Codepoint: 228 (\u00E4) https://codepoints.net/U+00E4
      Script(s): latin
      Byte Count: 2
      UTF-8: <<195, 164>>
      Base2: 11000011 10100100

  t   Codepoint: 116 (\u0074) https://codepoints.net/U+0074
      Script(s): latin
      Byte Count: 1
      UTF-8: <<116>>
      Base2: 01110100