# `Dicom.CharacterSet`
[🔗](https://github.com/Balneario-de-Cofrentes/dicom/blob/v0.5.1/lib/dicom/character_set.ex#L1)

DICOM Specific Character Set handling.

Supports decoding of text values according to the character set specified
by tag (0008,0005) SpecificCharacterSet. See DICOM PS3.5 Section 6.1.

## Supported Character Sets

- Default character repertoire (ISO IR 6 / ASCII) — always supported
- `ISO_IR 100` (Latin-1 / ISO 8859-1)
- `ISO_IR 101` (Latin-2 / ISO 8859-2)
- `ISO_IR 109` (Latin-3 / ISO 8859-3)
- `ISO_IR 110` (Latin-4 / ISO 8859-4)
- `ISO_IR 144` (Cyrillic / ISO 8859-5)
- `ISO_IR 127` (Arabic / ISO 8859-6)
- `ISO_IR 126` (Greek / ISO 8859-7)
- `ISO_IR 138` (Hebrew / ISO 8859-8)
- `ISO_IR 148` (Latin-5 / ISO 8859-9)
- `ISO_IR 13` (JIS X 0201 — Roman + half-width Katakana)
- `ISO_IR 192` (UTF-8)

The labels `ISO 2022 IR 6` and `ISO 2022 IR 100` are accepted only when the
value contains no ISO 2022 escape sequences. Actual code-extension switching
is not implemented.

Multi-valued Specific Character Set declarations can be extracted from a
data set, but this module does not implement full repertoire switching across
multiple declared character sets.

All other character sets return `{:error, {:unsupported_charset, term}}`.
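
As a rough illustration of the label list and the error shape above, the sketch below maps each supported defined term to the standard it names. `CharsetTableSketch` and `describe/1` are hypothetical names introduced here; this is not how the library stores its tables, only a minimal stand-in that mirrors the documented `{:error, {:unsupported_charset, term}}` result.

```elixir
defmodule CharsetTableSketch do
  # Hypothetical lookup table built from the list above; it is not the
  # library's internal representation.
  @labels %{
    "ISO_IR 100" => "ISO 8859-1 (Latin-1)",
    "ISO_IR 101" => "ISO 8859-2 (Latin-2)",
    "ISO_IR 109" => "ISO 8859-3 (Latin-3)",
    "ISO_IR 110" => "ISO 8859-4 (Latin-4)",
    "ISO_IR 144" => "ISO 8859-5 (Cyrillic)",
    "ISO_IR 127" => "ISO 8859-6 (Arabic)",
    "ISO_IR 126" => "ISO 8859-7 (Greek)",
    "ISO_IR 138" => "ISO 8859-8 (Hebrew)",
    "ISO_IR 148" => "ISO 8859-9 (Latin-5)",
    "ISO_IR 13" => "JIS X 0201",
    "ISO_IR 192" => "UTF-8"
  }

  @spec describe(String.t() | nil) :: {:ok, String.t()} | {:error, term()}
  # A missing or empty label falls back to the default repertoire.
  def describe(label) when label in [nil, ""], do: {:ok, "ISO IR 6 (default repertoire)"}

  def describe(label) when is_binary(label) do
    case Map.fetch(@labels, label) do
      {:ok, name} -> {:ok, name}
      :error -> {:error, {:unsupported_charset, label}}
    end
  end
end
```

A label outside the table, such as `"GB18030"`, yields `{:error, {:unsupported_charset, "GB18030"}}` in this sketch.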

# `charset`

```elixir
@type charset() :: String.t()
```

# `decode`

```elixir
@spec decode(binary(), charset() | nil) :: {:ok, String.t()} | {:error, term()}
```

Decodes a binary value according to the given character set.

If `charset` is `nil` or empty, the default character repertoire is assumed
(ISO IR 6 / ASCII, which is a subset of Latin-1 and UTF-8).

Returns `{:ok, string}` or `{:error, reason}`.

## Examples

    iex> Dicom.CharacterSet.decode("JOHN", nil)
    {:ok, "JOHN"}

    iex> Dicom.CharacterSet.decode(<<0xC4, 0xD6, 0xDC>>, "ISO_IR 100")
    {:ok, "ÄÖÜ"}
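
To see why the `ISO_IR 100` example produces `"ÄÖÜ"`, it may help to reproduce the conversion with Erlang's standard `:unicode` module. The module `Latin1Sketch` below is a hypothetical stand-in, not the library's implementation: it only shows that each Latin-1 byte maps to the Unicode code point of the same value, which is then encoded as UTF-8.

```elixir
defmodule Latin1Sketch do
  # Hypothetical helper, not part of Dicom.CharacterSet.
  # Converts ISO 8859-1 bytes into a UTF-8 encoded Elixir string.
  @spec decode(binary()) :: {:ok, String.t()} | {:error, term()}
  def decode(bytes) when is_binary(bytes) do
    case :unicode.characters_to_binary(bytes, :latin1) do
      utf8 when is_binary(utf8) -> {:ok, utf8}
      # Every byte is a valid Latin-1 character, so this branch is defensive only.
      other -> {:error, {:invalid_latin1, other}}
    end
  end
end
```

The three input bytes `0xC4, 0xD6, 0xDC` become six bytes of UTF-8, because each of these characters needs two bytes in that encoding.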

# `decode_lossy`

```elixir
@spec decode_lossy(binary(), charset() | nil) :: binary()
```

Decodes a binary value, returning the original binary on failure instead of an error.

This is a convenience function for use in the parser where we want to
attempt charset decoding but fall back to the undecoded bytes rather than
failing. Successful decodes return a UTF-8 Elixir string; failed decodes
return the original binary unchanged.
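
The fallback behaviour described above can be sketched as a thin wrapper around any strict decoder. Everything in this block is hypothetical: `LossySketch.lossy/2` takes the decoder as a function argument, and `strict_utf8/1` is a simple stand-in for a strict decoder such as `decode/2`. It illustrates the pattern rather than the library's actual code.

```elixir
defmodule LossySketch do
  # Hypothetical strict decoder used as a stand-in: accepts only valid UTF-8.
  @spec strict_utf8(binary()) :: {:ok, String.t()} | {:error, term()}
  def strict_utf8(bytes) when is_binary(bytes) do
    if String.valid?(bytes), do: {:ok, bytes}, else: {:error, :invalid_utf8}
  end

  # Applies the decoder and falls back to the untouched input on failure.
  @spec lossy(binary(), (binary() -> {:ok, String.t()} | {:error, term()})) :: binary()
  def lossy(bytes, decoder) when is_binary(bytes) and is_function(decoder, 1) do
    case decoder.(bytes) do
      {:ok, string} -> string
      {:error, _reason} -> bytes
    end
  end
end
```

Because the fallback returns raw bytes, a caller that needs text may want to check the result with `String.valid?/1` before printing or storing it.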

# `extract`

```elixir
@spec extract(map()) :: charset() | nil
```

Extracts the primary character set from a parsed data set's elements map.

Returns the first (or only) character set value, or `nil` if absent.
Use `extract_all/1` when you need the full Specific Character Set list.

# `extract_all`

```elixir
@spec extract_all(map()) :: [charset()]
```

Extracts all Specific Character Set values from a parsed data set's elements map.
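
In DICOM, a multi-valued string element separates its values with a backslash, and values may carry trailing space padding. The sketch below shows how a raw Specific Character Set string could be split into a list and how a "primary" value could be picked from it. `CharsetValuesSketch` is a hypothetical module, and it assumes the raw value is available as a single string; the shape of the elements map accepted by `extract/1` and `extract_all/1` is not shown here.

```elixir
defmodule CharsetValuesSketch do
  # Hypothetical helper: splits a raw multi-valued string on the DICOM
  # value delimiter (backslash) and trims padding around each value.
  @spec split(String.t()) :: [String.t()]
  def split(raw) when is_binary(raw) do
    raw
    |> String.split("\\")
    |> Enum.map(&String.trim/1)
  end

  # Hypothetical helper: the first value of the list.
  @spec primary(String.t()) :: String.t() | nil
  def primary(raw) when is_binary(raw) do
    raw |> split() |> List.first()
  end
end
```

For example, `CharsetValuesSketch.split("ISO 2022 IR 6\\ISO 2022 IR 87")` yields a two-element list, of which only the first label is one this module can decode.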

# `supported?`

```elixir
@spec supported?(charset() | nil) :: boolean()
```

Returns true if the given character set label is recognized by the decoder.

For `ISO 2022 IR 6` and `ISO 2022 IR 100`, this means only the non-switching
single-byte subset is accepted. Values containing ISO 2022 escape sequences
still return an error from `decode/2`.
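
ISO 2022 escape sequences begin with the ESC control byte (`0x1B`), so a caller can pre-check a value before attempting to decode it under an `ISO 2022 IR …` label. The module `EscapeCheckSketch` below is a hypothetical helper; whether the library detects escape sequences in the same way is an assumption.

```elixir
defmodule EscapeCheckSketch do
  # ESC control byte that introduces every ISO 2022 escape sequence.
  @esc 0x1B

  # Hypothetical helper: true when the value contains at least one ESC byte.
  @spec contains_escape?(binary()) :: boolean()
  def contains_escape?(value) when is_binary(value) do
    :binary.match(value, <<@esc>>) != :nomatch
  end
end
```

A plain value such as `"SMITH^JOHN"` passes the check, while a value containing `ESC $ B` (the designation of JIS X 0208) does not and would need real code-extension support.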
