Dicom.CharacterSet (Dicom v0.5.1)


DICOM Specific Character Set handling.

Supports decoding of text values according to the character set specified by tag (0008,0005) SpecificCharacterSet. See DICOM PS3.5 Section 6.1.

Supported Character Sets

  • Default character repertoire (ISO IR 6 / ASCII) — always supported
  • ISO_IR 100 (Latin-1 / ISO 8859-1)
  • ISO_IR 101 (Latin-2 / ISO 8859-2)
  • ISO_IR 109 (Latin-3 / ISO 8859-3)
  • ISO_IR 110 (Latin-4 / ISO 8859-4)
  • ISO_IR 144 (Cyrillic / ISO 8859-5)
  • ISO_IR 127 (Arabic / ISO 8859-6)
  • ISO_IR 126 (Greek / ISO 8859-7)
  • ISO_IR 138 (Hebrew / ISO 8859-8)
  • ISO_IR 148 (Latin-5 / ISO 8859-9)
  • ISO_IR 13 (JIS X 0201 — Roman + half-width Katakana)
  • ISO_IR 192 (UTF-8)

The labels ISO 2022 IR 6 and ISO 2022 IR 100 are accepted only when the value contains no ISO 2022 escape sequences. Actual code-extension switching is not implemented.
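Callers who want to know in advance whether a value falls outside this restriction can look for the ESC control byte (0x1B), which starts every ISO 2022 escape sequence. The following is a minimal sketch; `EscapeCheck` and `contains_escape?/1` are hypothetical names and not part of this module, and the library's own detection may differ.

```elixir
defmodule EscapeCheck do
  # Hypothetical helper, not part of Dicom.CharacterSet.
  # ISO 2022 escape sequences begin with the ESC control byte (0x1B).
  @esc <<0x1B>>

  # Returns true if the binary contains at least one ESC byte.
  def contains_escape?(value) when is_binary(value) do
    :binary.match(value, @esc) != :nomatch
  end
end

# Plain ASCII value: no escape sequence present.
EscapeCheck.contains_escape?("YAMADA^TAROU")

# ESC $ B designates JIS X 0208, so this value would need code-extension switching.
EscapeCheck.contains_escape?(<<"YAMADA", 0x1B, 0x24, 0x42>>)
```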

Multi-valued Specific Character Set declarations can be extracted from a data set, but this module does not implement full repertoire switching across multiple declared character sets.

For any other character set, decode/2 returns {:error, {:unsupported_charset, term}}.
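A caller can pattern match on the unsupported-charset error to separate it from other decoding failures. This is a sketch only: `CharsetErrors.describe/2` is a hypothetical wrapper, and the contents of the second tuple element are not specified here, so the code ignores them.

```elixir
defmodule CharsetErrors do
  # Hypothetical wrapper; only Dicom.CharacterSet.decode/2 comes from the library.
  def describe(binary, charset) do
    case Dicom.CharacterSet.decode(binary, charset) do
      {:ok, string} ->
        {:decoded, string}

      # The shape of the second element is not documented, so it is ignored.
      {:error, {:unsupported_charset, _detail}} ->
        {:unsupported, charset}

      {:error, other} ->
        {:failed, other}
    end
  end
end

# "ISO_IR 166" (Thai) is a DICOM defined term that is absent from the supported list.
CharsetErrors.describe(<<0xA1, 0xA2>>, "ISO_IR 166")
```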

Summary

Functions

decode(binary, charset)

Decodes a binary value according to the given character set.

decode_lossy(binary, charset)

Decodes a binary value, returning the original binary on failure instead of an error.

extract(elements)

Extracts the primary character set from a parsed data set's elements map.

extract_all(elements)

Extracts all Specific Character Set values from a parsed data set's elements map.

supported?(charset)

Returns true if the given character set label is recognized by the decoder.

Types

charset()

@type charset() :: String.t()

Functions

decode(binary, charset)

@spec decode(binary(), charset() | nil) :: {:ok, String.t()} | {:error, term()}

Decodes a binary value according to the given character set.

If charset is nil or empty, the default character repertoire is assumed (ISO IR 6 / ASCII, which is a subset of Latin-1 and UTF-8).

Returns {:ok, string} or {:error, reason}.

Examples

iex> Dicom.CharacterSet.decode("JOHN", nil)
{:ok, "JOHN"}

iex> Dicom.CharacterSet.decode(<<0xC4, 0xD6, 0xDC>>, "ISO_IR 100")
{:ok, "ÄÖÜ"}
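To see why the ISO_IR 100 bytes need transcoding at all, the same conversion can be reproduced with Erlang's standard `:unicode` module. This is an illustration of the Latin-1 to UTF-8 mapping, not a claim about how decode/2 is implemented; `Latin1Sketch` is a hypothetical name.

```elixir
defmodule Latin1Sketch do
  # Hypothetical helper: transcodes ISO 8859-1 bytes to a UTF-8 binary
  # using the Erlang standard library.
  def to_utf8(bytes) when is_binary(bytes) do
    :unicode.characters_to_binary(bytes, :latin1)
  end
end

# 0xC4, 0xD6, 0xDC are the ISO 8859-1 code points for Ä, Ö, Ü.
raw = <<0xC4, 0xD6, 0xDC>>

# The raw bytes are not valid UTF-8, so they cannot be used as an Elixir string directly.
String.valid?(raw)

# After transcoding, each character occupies two bytes in UTF-8.
utf8 = Latin1Sketch.to_utf8(raw)
byte_size(utf8)
```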

decode_lossy(binary, charset)

@spec decode_lossy(binary(), charset() | nil) :: binary()

Decodes a binary value, returning the original binary on failure instead of an error.

This is a convenience function for the parser, which attempts charset decoding but falls back to the undecoded bytes rather than failing. Successful decodes return a UTF-8 Elixir string; failed decodes return the original binary unchanged.
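The described behaviour corresponds roughly to wrapping decode/2 as below. This is a sketch under that assumption, with `LossySketch` as a hypothetical module name; the actual implementation may differ.

```elixir
defmodule LossySketch do
  # Hypothetical re-statement of the documented fallback behaviour.
  def decode_lossy(binary, charset) do
    case Dicom.CharacterSet.decode(binary, charset) do
      # Decoding succeeded: return the UTF-8 string.
      {:ok, string} -> string
      # Decoding failed: return the input bytes untouched.
      {:error, _reason} -> binary
    end
  end
end
```

Because both outcomes are binaries, the return value alone does not reveal whether decoding succeeded. Code that needs to tell the two cases apart is likely better served by decode/2.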

extract(elements)

@spec extract(map()) :: charset() | nil

Extracts the primary character set from a parsed data set's elements map.

Returns the first (or only) character set value, or nil if absent. Use extract_all/1 when you need the full Specific Character Set list.
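Since extract/1 returns either a label or nil, and decode/2 accepts both, the two compose directly. A minimal sketch follows; `DecodeWithDataSet` is a hypothetical module, `elements` stands for whatever elements map the parser produced, and `raw` for an undecoded text value taken from it.

```elixir
defmodule DecodeWithDataSet do
  # Hypothetical helper combining extract/1 and decode/2.
  # No assumption is made about the internal shape of `elements`.
  def decode_value(elements, raw) when is_map(elements) and is_binary(raw) do
    # nil is passed through when the data set declares no character set,
    # in which case decode/2 assumes the default repertoire.
    charset = Dicom.CharacterSet.extract(elements)
    Dicom.CharacterSet.decode(raw, charset)
  end
end
```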

extract_all(elements)

@spec extract_all(map()) :: [charset()]

Extracts all Specific Character Set values from a parsed data set's elements map.
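Because full repertoire switching across multiple declared character sets is not implemented, a caller may want to detect multi-valued declarations before decoding. The sketch below assumes that extract_all/1 returns an empty list when the element is absent, which the spec suggests but the documentation does not state; `CharsetDeclaration` is a hypothetical name.

```elixir
defmodule CharsetDeclaration do
  # Hypothetical classifier for the result of extract_all/1.
  def classify(elements) when is_map(elements) do
    case Dicom.CharacterSet.extract_all(elements) do
      # Assumption: an absent Specific Character Set yields an empty list.
      [] -> :default
      [single] -> {:single, single}
      # Two or more values imply code extensions, which are not fully handled.
      [_ | _] = many -> {:multi_valued, many}
    end
  end
end
```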

supported?(charset)

@spec supported?(charset() | nil) :: boolean()

Returns true if the given character set label is recognized by the decoder.

For ISO 2022 IR 6 and ISO 2022 IR 100, a true result covers only the non-switching single-byte subset: decode/2 still returns an error for values that contain ISO 2022 escape sequences.
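One possible use is to report which declared labels the decoder does not recognize before attempting to decode any values. This is a sketch; `CharsetReport` is a hypothetical module built only on extract_all/1 and supported?/1.

```elixir
defmodule CharsetReport do
  # Hypothetical helper: lists every declared label the decoder does not recognize.
  def unsupported_labels(elements) when is_map(elements) do
    elements
    |> Dicom.CharacterSet.extract_all()
    |> Enum.reject(&Dicom.CharacterSet.supported?/1)
  end
end
```

An empty result means every declared label is recognized. It does not guarantee that each value decodes, since a value under an ISO 2022 label may still contain escape sequences.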