tongue_tied

A BCP47 language tag parser for Gleam.

This module parses and validates language tags according to the RFC 5646 (BCP47) specification. It handles every kind of language tag: well-formed language tags, grandfathered tags, and private use tags.

Examples

import tongue_tied

// Parse a simple language tag
let assert Ok(tag) = tongue_tied.parse_language_tag("en-US")
// Returns: LangTag(language: "en", region: Some("us"), ...)

// Parse a complex tag with script
let assert Ok(complex) = tongue_tied.parse_language_tag("zh-Hant-HK")
// Returns: LangTag(language: "zh", script: Some("hant"), region: Some("hk"), ...)

// Parse grandfathered tags
let assert Ok(grandfathered) = tongue_tied.parse_language_tag("i-klingon")
// Returns: Grandfathered("i-klingon")

// Parse private use tags
let assert Ok(private) = tongue_tied.parse_language_tag("x-custom-lang")
// Returns: PrivateUse("x-custom-lang")

Types

Represents an extension in a language tag.

Extensions provide a mechanism for extending language tags with additional information using single-character singletons.

pub type Extension {
  Extension(singleton: String, subtags: List(String))
}

Constructors

  • Extension(singleton: String, subtags: List(String))

    An extension with its singleton character and subtags.

    Fields

    • singleton: Single character identifier (except ‘x’)
    • subtags: List of 2-8 character alphanumeric subtags
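A minimal sketch of how an Extension value might be turned back into its textual form, assuming the singleton and subtags are simply joined with hyphens. The helper name extension_to_string is hypothetical and not part of tongue_tied.

```gleam
import gleam/string
import tongue_tied.{type Extension}

/// Hypothetical helper (not provided by tongue_tied): joins the singleton
/// and its subtags with hyphens to rebuild the extension's text form.
pub fn extension_to_string(extension: Extension) -> String {
  // Put the singleton first, followed by every subtag in order.
  string.join([extension.singleton, ..extension.subtags], with: "-")
}
```

With this helper, an extension built as Extension("u", ["ca", "gregory"]) would render as "u-ca-gregory".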

Represents a parsed language tag according to the BCP47 specification.

This type covers all possible language tag formats:

  • LangTag: Well-formed language tags with optional components
  • PrivateUse: Private use tags starting with “x-”
  • Grandfathered: Special grandfathered tags preserved for compatibility

pub type LanguageTag {
  LangTag(
    language: String,
    script: option.Option(String),
    region: option.Option(String),
    variants: List(String),
    extensions: List(Extension),
    privateuse: option.Option(String),
  )
  PrivateUse(privateuse: String)
  Grandfathered(tag: String)
}

Constructors

  • LangTag(
      language: String,
      script: option.Option(String),
      region: option.Option(String),
      variants: List(String),
      extensions: List(Extension),
      privateuse: option.Option(String),
    )

    A well-formed language tag with its components.

    Fields

    • language: The primary language subtag, possibly including extended language subtags
    • script: Optional 4-letter script code (e.g., “Latn”, “Hant”)
    • region: Optional 2-letter or 3-digit region code (e.g., “US”, “419”)
    • variants: List of variant subtags
    • extensions: List of extension subtags with their singletons
    • privateuse: Optional private use subtags
  • PrivateUse(privateuse: String)

    A private use language tag starting with “x-”.

    These tags are used for private agreements between parties.

  • Grandfathered(tag: String)

    A grandfathered language tag.

    These are special tags that were registered under earlier specifications (RFC 1766 and RFC 3066) and are maintained for backward compatibility (e.g., “i-klingon”, “en-GB-oed”).
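Because LanguageTag has three constructors, code that consumes a parsed tag usually needs a case expression covering each of them. The sketch below shows one way this might look; the describe function and its output strings are hypothetical and only illustrate the pattern matching.

```gleam
import gleam/option.{Some}
import tongue_tied.{type LanguageTag, Grandfathered, LangTag, PrivateUse}

/// Hypothetical helper (not provided by tongue_tied): builds a short
/// description for each kind of language tag.
pub fn describe(tag: LanguageTag) -> String {
  case tag {
    // A well-formed tag that carries a region subtag.
    LangTag(language: language, region: Some(region), ..) ->
      language <> " (region: " <> region <> ")"
    // Any other well-formed tag: report only the language.
    LangTag(language: language, ..) -> language
    // Private use tags keep their whole text in one field.
    PrivateUse(privateuse) -> "private use: " <> privateuse
    // Grandfathered tags are stored as the complete tag.
    Grandfathered(name) -> "grandfathered: " <> name
  }
}
```

The `..` in the LangTag patterns ignores the fields that are not needed, so the function keeps compiling if only those other fields are of no interest.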

Represents errors that can occur during language tag parsing.

pub type ParseError {
  InvalidFormat(message: String)
  UnexpectedEnd
  InvalidCharacter(char: String, position: Int)
}

Constructors

  • InvalidFormat(message: String)

    Invalid format with descriptive message.

  • UnexpectedEnd

    Unexpected end of input.

  • InvalidCharacter(char: String, position: Int)

    Invalid character at specific position.
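A caller will often want to turn a ParseError into text for logs or user-facing messages. The following is a sketch under the assumption that a plain string is enough; error_message is a hypothetical helper, and the wording of the messages is invented for illustration.

```gleam
import gleam/int
import tongue_tied.{
  type ParseError, InvalidCharacter, InvalidFormat, UnexpectedEnd,
}

/// Hypothetical helper (not provided by tongue_tied): formats a
/// ParseError as a human-readable message.
pub fn error_message(error: ParseError) -> String {
  case error {
    InvalidFormat(message) -> "invalid format: " <> message
    UnexpectedEnd -> "unexpected end of input"
    // The position is printed as reported by the parser, without
    // assuming whether it counts from zero or from one.
    InvalidCharacter(character, position) ->
      "invalid character \""
      <> character
      <> "\" at position "
      <> int.to_string(position)
  }
}
```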

Values

pub fn parse_language_tag(
  input: String,
) -> Result(LanguageTag, ParseError)

Parse a BCP47 language tag string into its structured representation.

This function accepts any valid BCP47 language tag and returns the appropriate LanguageTag variant. The input is normalized to lowercase before parsing (except for script codes, which are title-cased in the output).

Examples

// Simple language
parse_language_tag("en")
// -> Ok(LangTag(language: "en", script: None, region: None, ...))

// Language with region
parse_language_tag("fr-CA")
// -> Ok(LangTag(language: "fr", region: Some("ca"), ...))

// Complex tag with script, region, and variant
parse_language_tag("de-Latn-DE-1901")
// -> Ok(LangTag(language: "de", script: Some("latn"), region: Some("de"), variants: ["1901"], ...))

// Grandfathered tag
parse_language_tag("i-klingon")
// -> Ok(Grandfathered("i-klingon"))

// Private use tag
parse_language_tag("x-my-language")
// -> Ok(PrivateUse("x-my-language"))

// Invalid tag
parse_language_tag("invalid-tag-123456789")
// -> Error(InvalidFormat("..."))
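The examples above show the returned values; in application code the Result usually has to be handled without let assert, which crashes on an Error. Below is a minimal sketch of that, assuming the caller only cares about the primary language. The primary_language function is hypothetical, and it deliberately discards the error detail.

```gleam
import tongue_tied.{LangTag}

/// Hypothetical helper (not provided by tongue_tied): extracts the
/// primary language of a well-formed tag, or Error(Nil) for anything else.
pub fn primary_language(input: String) -> Result(String, Nil) {
  case tongue_tied.parse_language_tag(input) {
    // Well-formed tag: keep the language field and ignore the rest.
    Ok(LangTag(language: language, ..)) -> Ok(language)
    // Private use tags, grandfathered tags, and parse errors have
    // no language field to return.
    _ -> Error(Nil)
  }
}
```

Going by the documented results above, primary_language("fr-CA") would be expected to give Ok("fr"), while "x-my-language" and "invalid-tag-123456789" would both give Error(Nil).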