tongue_tied
A BCP47 language tag parser for Gleam.
This module parses and validates language tags according to the RFC 5646 (BCP47) specification. It handles all three kinds of language tag: well-formed language tags, grandfathered tags, and private use tags.
Examples
import tongue_tied
// Parse a simple language tag
let assert Ok(tag) = tongue_tied.parse_language_tag("en-US")
// Returns: LangTag(language: "en", region: Some("us"), ...)
// Parse a complex tag with script
let assert Ok(complex) = tongue_tied.parse_language_tag("zh-Hant-HK")
// Returns: LangTag(language: "zh", script: Some("hant"), region: Some("hk"), ...)
// Parse grandfathered tags
let assert Ok(grandfathered) = tongue_tied.parse_language_tag("i-klingon")
// Returns: Grandfathered("i-klingon")
// Parse private use tags
let assert Ok(private) = tongue_tied.parse_language_tag("x-custom-lang")
// Returns: PrivateUse("x-custom-lang")
Types
Represents an extension in a language tag.
Extensions provide a mechanism for extending language tags with additional information using single-character singletons.
pub type Extension {
Extension(singleton: String, subtags: List(String))
}
Constructors
-
Extension(singleton: String, subtags: List(String))
An extension with its singleton character and subtags.
Fields
singleton: Single alphanumeric character identifying the extension (any except ‘x’, which introduces private use subtags)
subtags: List of 2-8 character alphanumeric subtags
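The field rules above can be checked with a few small predicates. This is a minimal sketch under the RFC 5646 rules for extensions (a singleton followed by at least one 2-8 character alphanumeric subtag). The helper names `is_valid_singleton`, `is_valid_extension_subtag` and `is_valid_extension` are hypothetical and are not part of the tongue_tied API.

```gleam
import gleam/list
import gleam/string

// Hypothetical helpers for illustration; they are not part of tongue_tied.

const alphanumerics = "abcdefghijklmnopqrstuvwxyz0123456789"

// True when `s` is non-empty and every character is an ASCII letter or digit.
fn is_alphanumeric(s: String) -> Bool {
  s != ""
  && list.all(string.to_graphemes(string.lowercase(s)), fn(c) {
    string.contains(alphanumerics, c)
  })
}

/// A singleton is one letter or digit other than "x" (reserved for private use).
pub fn is_valid_singleton(s: String) -> Bool {
  string.length(s) == 1 && is_alphanumeric(s) && string.lowercase(s) != "x"
}

/// An extension subtag is 2-8 letters or digits.
pub fn is_valid_extension_subtag(s: String) -> Bool {
  let len = string.length(s)
  len >= 2 && len <= 8 && is_alphanumeric(s)
}

/// RFC 5646 requires at least one subtag after the singleton.
pub fn is_valid_extension(singleton: String, subtags: List(String)) -> Bool {
  is_valid_singleton(singleton)
  && subtags != []
  && list.all(subtags, is_valid_extension_subtag)
}
```

For example, `is_valid_extension("u", ["ca", "buddhist"])` accepts the Unicode locale extension from a tag such as `th-u-ca-buddhist`, while `is_valid_singleton("x")` is rejected because "x" starts the private use section instead.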
Represents a parsed language tag according to BCP47 specification.
This type covers all possible language tag formats:
LangTag: Well-formed language tags with optional components
PrivateUse: Private use tags starting with “x-”
Grandfathered: Special grandfathered tags preserved for compatibility
pub type LanguageTag {
LangTag(
language: String,
script: option.Option(String),
region: option.Option(String),
variants: List(String),
extensions: List(Extension),
privateuse: option.Option(String),
)
PrivateUse(privateuse: String)
Grandfathered(tag: String)
}
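Because `LanguageTag` has three constructors, callers typically branch on it with a `case` expression. The sketch below shows one way to do that. It repeats the documented types locally so that it compiles on its own (in real code they would be imported from tongue_tied), and the `describe` function is a hypothetical helper, not part of the library.

```gleam
import gleam/option.{type Option, None, Some}

// Local copies of the documented types so this sketch compiles on its own.
pub type Extension {
  Extension(singleton: String, subtags: List(String))
}

pub type LanguageTag {
  LangTag(
    language: String,
    script: Option(String),
    region: Option(String),
    variants: List(String),
    extensions: List(Extension),
    privateuse: Option(String),
  )
  PrivateUse(privateuse: String)
  Grandfathered(tag: String)
}

/// Hypothetical helper: a short human-readable summary of any tag.
pub fn describe(tag: LanguageTag) -> String {
  case tag {
    // `..` ignores the fields this summary does not use.
    LangTag(language: language, region: Some(region), ..) ->
      "language " <> language <> " in region " <> region
    LangTag(language: language, region: None, ..) -> "language " <> language
    PrivateUse(privateuse: p) -> "private use tag " <> p
    Grandfathered(tag: t) -> "grandfathered tag " <> t
  }
}
```

Matching on labelled fields with `..` keeps the branches short and means they do not need to change if the record later gains fields.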
Constructors
-
LangTag( language: String, script: option.Option(String), region: option.Option(String), variants: List(String), extensions: List(Extension), privateuse: option.Option(String), )
A well-formed language tag with its components.
Fields
language: The primary language subtag, possibly including extended language subtags
script: Optional 4-letter script code (e.g., “Latn”, “Hant”)
region: Optional 2-letter or 3-digit region code (e.g., “US”, “419”)
variants: List of variant subtags
extensions: List of extension subtags with their singletons
privateuse: Optional private use subtags
-
PrivateUse(privateuse: String)
A private use language tag starting with “x-”.
These tags are used for private agreements between parties.
-
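Grandfathered(tag: String)
A grandfathered language tag.
These are whole tags registered under the rules that preceded RFC 4646 (RFC 1766 and RFC 3066) that either do not match the current syntax or contain subtags that are not otherwise valid. They are maintained for backward compatibility (e.g., “i-klingon”, “en-GB-oed”).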
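The `script`, `region` and `variants` fields above are told apart mainly by the shape of each subtag. The following sketch classifies one subtag that follows the language subtag, using the length and character-class rules from the RFC 5646 grammar. It is an illustration only, not tongue_tied's parser: the `SubtagKind` type and `classify` function are hypothetical, and the sketch ignores subtag position and extended language subtags (a 3-letter subtag is reported as `Unknown`).

```gleam
import gleam/list
import gleam/string

/// Hypothetical subtag roles; this type is not part of tongue_tied.
pub type SubtagKind {
  Script
  Region
  Variant
  Unknown
}

const letters = "abcdefghijklmnopqrstuvwxyz"

const digits = "0123456789"

// True when `s` is non-empty and every character appears in `allowed`.
fn all_in(s: String, allowed: String) -> Bool {
  s != ""
  && list.all(string.to_graphemes(string.lowercase(s)), fn(c) {
    string.contains(allowed, c)
  })
}

/// Guess the role of a subtag from its length and character classes.
pub fn classify(subtag: String) -> SubtagKind {
  let len = string.length(subtag)
  // Gleam guards cannot call functions, so the checks are computed first.
  let alpha = all_in(subtag, letters)
  let numeric = all_in(subtag, digits)
  let alnum = all_in(subtag, letters <> digits)
  let starts_with_digit = all_in(string.slice(subtag, 0, 1), digits)
  case len {
    // script = 4ALPHA, e.g. "Latn"
    4 if alpha -> Script
    // region = 2ALPHA, e.g. "US"
    2 if alpha -> Region
    // region = 3DIGIT, e.g. "419"
    3 if numeric -> Region
    // variant = DIGIT 3alphanum, e.g. "1901"
    4 if alnum && starts_with_digit -> Variant
    // variant = 5*8alphanum, e.g. "rozaj"
    _ if len >= 5 && len <= 8 && alnum -> Variant
    _ -> Unknown
  }
}
```

The order of the branches matters: a 4-character subtag is a script only when it is all letters, so "Latn" becomes `Script` while "1901" falls through to the variant rule.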
Represents errors that can occur during language tag parsing.
pub type ParseError {
InvalidFormat(message: String)
UnexpectedEnd
InvalidCharacter(char: String, position: Int)
}
Constructors
-
InvalidFormat(message: String)
Invalid format, with a descriptive message.
-
UnexpectedEnd
Unexpected end of input.
-
InvalidCharacter(char: String, position: Int)
Invalid character at a specific position.
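A common way to surface these errors is to turn each constructor into a message. This sketch repeats the documented `ParseError` type locally so it compiles on its own; `error_to_string` and its message wording are hypothetical, and the sketch makes no assumption about whether `position` counts from 0 or 1.

```gleam
import gleam/int

// Local copy of the documented type so this sketch compiles on its own.
pub type ParseError {
  InvalidFormat(message: String)
  UnexpectedEnd
  InvalidCharacter(char: String, position: Int)
}

/// Hypothetical helper: render a parse error for display or logging.
pub fn error_to_string(error: ParseError) -> String {
  case error {
    InvalidFormat(message) -> "invalid format: " <> message
    UnexpectedEnd -> "unexpected end of input"
    InvalidCharacter(char, position) ->
      "invalid character "
      <> char
      <> " at position "
      <> int.to_string(position)
  }
}
```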
Values
pub fn parse_language_tag(
input: String,
) -> Result(LanguageTag, ParseError)
Parse a BCP47 language tag string into its structured representation.
This function accepts any valid BCP47 language tag and returns the appropriate
LanguageTag variant. The input is normalized to lowercase before parsing
(except for script codes, which are title-cased in the output).
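Choosing which `LanguageTag` variant to return comes down to a top-level check on the lowercased input. The sketch below shows one plausible dispatch, assuming grandfathered tags are recognized by exact match against the 26 tags listed in RFC 5646 section 2.1 and private use tags by an "x-" prefix. It is not tongue_tied's implementation; the `TagShape` type and `shape` function are hypothetical, and the library's own grandfathered list may differ.

```gleam
import gleam/list
import gleam/string

// Hypothetical top-level shapes; not part of tongue_tied.
pub type TagShape {
  GrandfatheredShape(tag: String)
  PrivateUseShape(tag: String)
  // Lowercased subtags still to be parsed as language, script, region, ...
  RegularShape(subtags: List(String))
}

// The grandfathered tags listed in RFC 5646, section 2.1, lowercased.
const grandfathered = [
  "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
  "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao", "i-tay",
  "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de", "art-lojban",
  "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu", "zh-hakka", "zh-min",
  "zh-min-nan", "zh-xiang",
]

pub fn shape(input: String) -> TagShape {
  // Normalize case first, as described above.
  let tag = string.lowercase(input)
  case list.contains(grandfathered, tag), string.starts_with(tag, "x-") {
    True, _ -> GrandfatheredShape(tag)
    False, True -> PrivateUseShape(tag)
    False, False -> RegularShape(string.split(tag, "-"))
  }
}
```

Checking the grandfathered list before anything else matters because some of those tags, such as "zh-min-nan", would otherwise look like ordinary subtag sequences.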
Examples
// Simple language
parse_language_tag("en")
// -> Ok(LangTag(language: "en", script: None, region: None, ...))
// Language with region
parse_language_tag("fr-CA")
// -> Ok(LangTag(language: "fr", region: Some("ca"), ...))
// Complex tag with script, region, and variant
parse_language_tag("de-Latn-DE-1901")
// -> Ok(LangTag(language: "de", script: Some("latn"), region: Some("de"), variants: ["1901"], ...))
// Grandfathered tag
parse_language_tag("i-klingon")
// -> Ok(Grandfathered("i-klingon"))
// Private use tag
parse_language_tag("x-my-language")
// -> Ok(PrivateUse("x-my-language"))
// Invalid tag
parse_language_tag("invalid-tag-123456789")
// -> Error(InvalidFormat("..."))
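The last example is rejected for more than one reason; one of them is that RFC 5646 limits every subtag to 1-8 letters or digits, and "123456789" has nine characters. A cheap pre-check for that rule might look like the sketch below. `subtags_well_sized` is a hypothetical helper rather than part of tongue_tied, and passing it does not make a tag valid, since subtag order and roles still need to be checked.

```gleam
import gleam/list
import gleam/string

const alphanumerics = "abcdefghijklmnopqrstuvwxyz0123456789"

// True when `s` is non-empty and every character is an ASCII letter or digit.
fn is_alphanumeric(s: String) -> Bool {
  s != ""
  && list.all(string.to_graphemes(string.lowercase(s)), fn(c) {
    string.contains(alphanumerics, c)
  })
}

/// Hypothetical pre-check: every hyphen-separated subtag must be
/// 1-8 ASCII letters or digits. Empty subtags ("en--US") fail too.
pub fn subtags_well_sized(input: String) -> Bool {
  input
  |> string.split("-")
  |> list.all(fn(subtag) {
    let len = string.length(subtag)
    len >= 1 && len <= 8 && is_alphanumeric(subtag)
  })
}
```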