Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

0.8.1 - 2026-02-06

Mistral converter now handles real Voxtral API output formats (previously it only worked with synthetic test data)
Support for diarized mode: speaker_id field, sentence-level segments without nested words
Support for word-level mode: individual word segments grouped into utterances by punctuation, pauses, or length
Leading whitespace in Mistral segment text is now trimmed
WebVTT generation no longer produces empty files for Mistral transcripts
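
The word-level grouping mentioned above (by punctuation, pauses, or length) can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the converter's actual code: the module name `UtteranceSketch`, the word map shape (`:text`, `:start`, `:stop`, with times in seconds), and both thresholds are hypothetical and chosen only for illustration.

```elixir
defmodule UtteranceSketch do
  # Hypothetical thresholds; the converter's real values are not stated in this changelog.
  @max_pause 0.8
  @max_words 12

  # Groups word maps (%{text: _, start: _, stop: _}) into utterance maps.
  def group(words) do
    {done, current} =
      Enum.reduce(words, {[], []}, fn word, {done, current} ->
        # Trim leading whitespace from the word text before using it.
        word = %{word | text: String.trim_leading(word.text)}

        # A long silence before this word closes the utterance in progress.
        {done, current} =
          if current != [] and word.start - hd(current).stop > @max_pause do
            {[build(current) | done], []}
          else
            {done, current}
          end

        current = [word | current]

        # Sentence-ending punctuation or too many words closes it after this word.
        if String.match?(word.text, ~r/[.!?]$/) or length(current) >= @max_words do
          {[build(current) | done], []}
        else
          {done, current}
        end
      end)

    # Flush any words left over after the last boundary.
    done = if current == [], do: done, else: [build(current) | done]
    Enum.reverse(done)
  end

  # `words` is accumulated newest-first, so reverse before joining.
  defp build(words) do
    ordered = Enum.reverse(words)

    %{
      text: Enum.map_join(ordered, " ", & &1.text),
      start: hd(ordered).start,
      stop: List.last(ordered).stop
    }
  end
end
```

With these assumed thresholds, the words " Hello", " world.", " Next", and " bit" (the last one following a pause of more than 0.8 seconds) yield three utterances: "Hello world.", "Next", and "bit".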

WebVTT format_subtitle_time/1 now accepts integer timestamps (required for Mistral)
Code formatting in lib/convert/speechmatics.ex
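
To illustrate what accepting integer timestamps involves, here is a minimal sketch of a WebVTT time formatter that treats integers and floats alike. It is not the library's `format_subtitle_time/1`: the module name `VttTimeSketch`, the function name `format/1`, and the assumption that timestamps are expressed in seconds are all hypothetical.

```elixir
defmodule VttTimeSketch do
  # Assumption: timestamps are in seconds, as either integers or floats.
  def format(seconds) when is_integer(seconds) or is_float(seconds) do
    # Work in whole milliseconds so integer and float input share one path.
    total_ms = round(seconds * 1000)

    hours = div(total_ms, 3_600_000)
    minutes = total_ms |> rem(3_600_000) |> div(60_000)
    secs = total_ms |> rem(60_000) |> div(1_000)
    millis = rem(total_ms, 1_000)

    # WebVTT timestamps use the form HH:MM:SS.mmm.
    "#{pad(hours, 2)}:#{pad(minutes, 2)}:#{pad(secs, 2)}.#{pad(millis, 3)}"
  end

  # Left-pads a non-negative integer with zeros to the given width.
  defp pad(value, width) do
    value |> Integer.to_string() |> String.pad_leading(width, "0")
  end
end
```

In this sketch, `VttTimeSketch.format(5)` gives `"00:00:05.000"` and `VttTimeSketch.format(3661.5)` gives `"01:01:01.500"`.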

Language code normalization during transcript conversion
New BoldTranscriptsEx.Convert.Language module with vendor-specific normalization functions
Support for BCP-47 format (Deepgram), underscore format (AssemblyAI), and base language codes (Speechmatics)
Comprehensive test coverage for language normalization

BREAKING: Language codes in Bold format metadata are now normalized to the internal format (en_us, en_uk, de_de, etc.)
Deepgram: BCP-47 codes (e.g., en-US) are converted to underscore format (en_us), with special handling for en-GB → en_uk
AssemblyAI: Language codes are normalized and base languages get default regions (e.g., de → de_de)
Speechmatics: Base language codes are mapped to default regional variants (e.g., en → en_us)
All converters now return normalized language codes in the metadata.language field
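
The mappings listed above can be sketched in a few lines. This is an illustration only and does not reproduce the API of `BoldTranscriptsEx.Convert.Language`: the module name `LanguageSketch`, the single `normalize/1` function (the library is described as having vendor-specific functions), and the pass-through behavior for unknown codes are assumptions. Only the four mappings named in these entries are taken from the changelog.

```elixir
defmodule LanguageSketch do
  # Default regions for base language codes. Only these two mappings are named
  # in the changelog; a real table would be larger.
  @default_regions %{"en" => "en_us", "de" => "de_de"}

  # Cases where the internal code differs from the lowercased input.
  # en-GB -> en_uk is the one special case the changelog names.
  @overrides %{"en_gb" => "en_uk"}

  def normalize(code) when is_binary(code) do
    # Lowercase and switch BCP-47 hyphens to underscores.
    normalized =
      code
      |> String.trim()
      |> String.downcase()
      |> String.replace("-", "_")

    cond do
      Map.has_key?(@overrides, normalized) -> @overrides[normalized]
      # Already has a region part, so keep it as is.
      String.contains?(normalized, "_") -> normalized
      # Assumption: unknown base codes pass through unchanged.
      true -> Map.get(@default_regions, normalized, normalized)
    end
  end
end
```

Under these assumptions, `LanguageSketch.normalize("en-US")` gives `"en_us"`, `"en-GB"` gives `"en_uk"`, and the base codes `"de"` and `"en"` give `"de_de"` and `"en_us"`. Applications that compare against the previous vendor-specific codes would need to update those comparisons, which is why the change is marked as breaking.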

Language normalization now happens at the library level instead of requiring application-side normalization
Consistent language code format across all vendor conversions

For releases before 0.6.0, see the git history.