DICOM Specific Character Set handling.
Supports decoding of text values according to the character set specified by tag (0008,0005) SpecificCharacterSet. See DICOM PS3.5 Section 6.1.
Supported Character Sets
- Default character repertoire (ISO IR 6 / ASCII) — always supported
- ISO_IR 100 (Latin-1 / ISO 8859-1)
- ISO_IR 101 (Latin-2 / ISO 8859-2)
- ISO_IR 109 (Latin-3 / ISO 8859-3)
- ISO_IR 110 (Latin-4 / ISO 8859-4)
- ISO_IR 144 (Cyrillic / ISO 8859-5)
- ISO_IR 127 (Arabic / ISO 8859-6)
- ISO_IR 126 (Greek / ISO 8859-7)
- ISO_IR 138 (Hebrew / ISO 8859-8)
- ISO_IR 148 (Latin-5 / ISO 8859-9)
- ISO_IR 13 (JIS X 0201: Roman + half-width Katakana)
- ISO_IR 192 (UTF-8)
The labels ISO 2022 IR 6 and ISO 2022 IR 100 are accepted only when the
value contains no ISO 2022 escape sequences. Actual code-extension switching
is not implemented.
Multi-valued Specific Character Set declarations can be extracted from a data set, but this module does not implement full repertoire switching across multiple declared character sets.
All other character sets return {:error, {:unsupported_charset, term}}.
Types
@type charset() :: String.t()
Functions
Decodes a binary value according to the given character set.
If charset is nil or empty, the default character repertoire is assumed
(ISO IR 6 / ASCII, which is a subset of Latin-1 and UTF-8).
Returns {:ok, string} or {:error, reason}.
Examples
iex> Dicom.CharacterSet.decode("JOHN", nil)
{:ok, "JOHN"}
iex> Dicom.CharacterSet.decode(<<0xC4, 0xD6, 0xDC>>, "ISO_IR 100")
{:ok, "ÄÖÜ"}
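The Latin-1 result above can be reproduced with the Erlang standard library alone. The following is a minimal sketch, not the module's actual implementation: `Latin1Sketch` is a hypothetical name, and it assumes only that every ISO 8859-1 byte maps to the Unicode code point with the same value.

```elixir
defmodule Latin1Sketch do
  # Hypothetical helper, not part of Dicom.CharacterSet.
  # Each Latin-1 byte maps to the Unicode code point with the same value,
  # so converting a binary from :latin1 to :utf8 cannot fail.
  def decode(bytes) when is_binary(bytes) do
    {:ok, :unicode.characters_to_binary(bytes, :latin1, :utf8)}
  end
end

Latin1Sketch.decode(<<0xC4, 0xD6, 0xDC>>)
# => {:ok, "ÄÖÜ"}
```

The other ISO 8859 parts need a per-charset lookup table, because their upper halves do not line up with Unicode code points the way Latin-1 does.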
Decodes a binary value, returning the original binary on failure instead of an error.
This convenience function is intended for the parser, which attempts charset decoding but falls back to the undecoded bytes rather than failing. A successful decode returns a UTF-8 Elixir string; a failed decode returns the original binary unchanged.
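The fallback behaviour can be illustrated with a small wrapper. This is a sketch of the pattern only: `LossySketch`, `decode_or_raw/2`, and the stand-in `utf8_only` decoder are all hypothetical names introduced here, and the real function may be structured differently.

```elixir
defmodule LossySketch do
  # Hypothetical wrapper. `decoder` is any 1-arity function that returns
  # {:ok, string} or {:error, reason}, mirroring the shape of decode/2.
  def decode_or_raw(bytes, decoder)
      when is_binary(bytes) and is_function(decoder, 1) do
    case decoder.(bytes) do
      {:ok, string} -> string
      # On failure, hand back the input bytes untouched.
      {:error, _reason} -> bytes
    end
  end
end

# Stand-in decoder for the demo: accepts only valid UTF-8.
utf8_only = fn bytes ->
  if String.valid?(bytes), do: {:ok, bytes}, else: {:error, :invalid_utf8}
end

LossySketch.decode_or_raw("JOHN", utf8_only)
# => "JOHN"
LossySketch.decode_or_raw(<<0xFF, 0xFE>>, utf8_only)
# => <<255, 254>>
```

Because both outcomes are binaries, a caller that needs to tell them apart can check the result with `String.valid?/1`.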
Extracts the primary character set from a parsed data set's elements map.
Returns the first (or only) character set value, or nil if absent.
Use extract_all/1 when you need the full Specific Character Set list.
Extracts all Specific Character Set values from a parsed data set's elements map.
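On the wire, a multi-valued (0008,0005) element is a single string whose values are separated by backslashes and may carry space padding. How this module looks the element up in the elements map is not shown here, so the sketch below works only on the raw string; `CharsetValueSketch` is a hypothetical name.

```elixir
defmodule CharsetValueSketch do
  # Hypothetical helper. Assumes `raw` is the undecoded value of
  # (0008,0005) as a binary.
  def split(raw) when is_binary(raw) do
    raw
    # DICOM separates multiple values with a backslash.
    |> String.split("\\")
    # Remove padding around each value; spaces inside a label are kept.
    |> Enum.map(&String.trim/1)
  end
end

CharsetValueSketch.split("ISO 2022 IR 6\\ISO 2022 IR 100 ")
# => ["ISO 2022 IR 6", "ISO 2022 IR 100"]
CharsetValueSketch.split("\\ISO 2022 IR 87")
# => ["", "ISO 2022 IR 87"]
```

The sketch keeps an empty first entry as an empty string and leaves its interpretation to the caller; in a multi-valued declaration an empty first value is generally understood to mean the default repertoire.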
Returns true if the given character set label is recognized by the decoder.
For ISO 2022 IR 6 and ISO 2022 IR 100, this means only the non-switching
single-byte subset is accepted. Values containing ISO 2022 escape sequences
still return an error from decode/2.
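A caller can check in advance whether a value is likely to be rejected for this reason. Every ISO 2022 escape sequence begins with the ESC byte (0x1B), so a scan for that byte is enough. This is a sketch under that assumption; `EscapeSketch` is a hypothetical name, and the module's own check may differ.

```elixir
defmodule EscapeSketch do
  @esc 0x1B

  # Hypothetical helper: true if the value contains the ESC byte that
  # introduces every ISO 2022 escape sequence.
  def has_escape?(bytes) when is_binary(bytes) do
    :binary.match(bytes, <<@esc>>) != :nomatch
  end
end

EscapeSketch.has_escape?("YAMADA^TAROU")
# => false
# ESC $ B designates JIS X 0208 (ISO 2022 IR 87).
EscapeSketch.has_escape?(<<0x1B, 0x24, 0x42, 0x3B, 0x33>>)
# => true
```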