Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

0.8.1 - 2026-02-06

Fixed

  • Mistral converter now handles real Voxtral API output formats (previously only worked with synthetic test data)
  • Support for diarized mode: speaker_id field, sentence-level segments without nested words
  • Support for word-level mode: individual word segments grouped into utterances by punctuation, pauses, or length
  • Leading whitespace in Mistral segment text is now trimmed
  • WebVTT generation no longer produces empty files for Mistral transcripts
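
The word-level grouping described above can be pictured roughly as follows. This is a minimal sketch, not the library's implementation: the module name `UtteranceSketch`, the map keys `:text`, `:start_time`, and `:end_time`, and both thresholds are assumptions made for illustration.

```elixir
defmodule UtteranceSketch do
  # Hypothetical thresholds; the real converter may use different values.
  @max_gap 1.0
  @max_words 12

  # Groups a list of word maps into utterances (lists of word maps).
  def group(words) do
    chunk_fun = fn word, acc ->
      if acc != [] and break_before?(acc, word) do
        # Close the current utterance and start a new one with this word.
        {:cont, Enum.reverse(acc), [word]}
      else
        {:cont, [word | acc]}
      end
    end

    after_fun = fn
      [] -> {:cont, []}
      acc -> {:cont, Enum.reverse(acc), []}
    end

    Enum.chunk_while(words, [], chunk_fun, after_fun)
  end

  # A break falls before `word` when the previous word ended a sentence,
  # when the pause between the two is long, or when the utterance is full.
  defp break_before?([prev | _] = acc, word) do
    String.ends_with?(prev.text, [".", "?", "!"]) or
      word.start_time - prev.end_time > @max_gap or
      length(acc) >= @max_words
  end
end
```

In this sketch a sentence-ending word closes its utterance, so the following word always starts a new one regardless of timing.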

0.8.0 - 2026-02-06

Added

  • Mistral Voxtral transcript converter (BoldTranscriptsEx.Convert.Mistral)
  • Support for :mistral vendor in BoldTranscriptsEx.convert/3
  • Mistral-specific language normalization in Convert.Language
  • GitHub Actions CI (test + format check on push/PR)
  • GitHub Actions release automation (auto-publish to hex.pm)
  • MIT LICENSE file
  • llms.txt for LLM discoverability

Changed

  • Updated package description to include all supported vendors
  • Added source_url and homepage_url to hex.pm package metadata

Removed

  • lib/test.ex — development scratch file
  • lib/webvtt2.ex — unused experimental module

Fixed

  • WebVTT format_subtitle_time/1 now accepts integer timestamps (required for Mistral)
  • Code formatting in lib/convert/speechmatics.ex
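
To illustrate why integer timestamps matter, here is a hedged sketch of a WebVTT-style time formatter that accepts both integers and floats. It is not the library's `format_subtitle_time/1`; the module name `VttTimeSketch` is hypothetical, and it assumes the timestamp is expressed in seconds.

```elixir
defmodule VttTimeSketch do
  # Accepts seconds as an integer or a float (assumption) and
  # returns a WebVTT timestamp of the form "HH:MM:SS.mmm".
  def format(seconds) when is_number(seconds) do
    total_ms = round(seconds * 1000)
    ms = rem(total_ms, 1000)
    total_s = div(total_ms, 1000)
    s = rem(total_s, 60)
    m = rem(div(total_s, 60), 60)
    h = div(total_s, 3600)

    "#{pad(h, 2)}:#{pad(m, 2)}:#{pad(s, 2)}.#{pad(ms, 3)}"
  end

  # Left-pads a non-negative integer with zeros to the given width.
  defp pad(n, width), do: n |> Integer.to_string() |> String.pad_leading(width, "0")
end
```

Guarding on `is_number/1` rather than `is_float/1` is what lets a whole-second value such as `5` pass through the same code path as `5.0`.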

0.7.0 - 2025-10-08

Added

  • Language code normalization during transcript conversion
  • New BoldTranscriptsEx.Convert.Language module with vendor-specific normalization functions
  • Support for BCP-47 format (Deepgram), underscore format (AssemblyAI), and base language codes (Speechmatics)
  • Comprehensive test coverage for language normalization

Changed

  • BREAKING: Language codes in Bold format metadata are now normalized to internal format (en_us, en_uk, de_de, etc.)
  • Deepgram: BCP-47 codes (e.g., en-US) are converted to underscore format (en_us), with special handling for en-GB → en_uk
  • AssemblyAI: Language codes are normalized and base languages get default regions (e.g., de → de_de)
  • Speechmatics: Base language codes are mapped to default regional variants (e.g., en → en_us)
  • All converters now return normalized language codes in metadata.language field
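
The vendor-specific rules above can be sketched as follows. This is an illustrative approximation only: the module `LanguageSketch`, its `normalize/2` function, and the two-entry default-region table are assumptions, not the actual API of BoldTranscriptsEx.Convert.Language.

```elixir
defmodule LanguageSketch do
  # Hypothetical default regions for base language codes;
  # the library's real table is likely larger and may differ.
  @default_regions %{"en" => "en_us", "de" => "de_de"}

  # Deepgram sends BCP-47 codes such as "en-US".
  def normalize(:deepgram, code), do: code |> underscore() |> fix_british()

  # AssemblyAI sends underscore codes or bare base languages.
  def normalize(:assemblyai, code), do: code |> underscore() |> add_default_region()

  # Speechmatics sends base language codes such as "en".
  def normalize(:speechmatics, code), do: code |> underscore() |> add_default_region()

  # Lowercases the code and replaces hyphens with underscores.
  defp underscore(code), do: code |> String.downcase() |> String.replace("-", "_")

  # Special case from the changelog: en-GB maps to en_uk.
  defp fix_british("en_gb"), do: "en_uk"
  defp fix_british(code), do: code

  # Base languages get a default region; other codes pass through unchanged.
  defp add_default_region(code), do: Map.get(@default_regions, code, code)
end
```

Applications that previously normalized language codes themselves would, under this scheme, receive values such as `en_us` directly in metadata.language.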

Fixed

  • Language normalization now happens at the library level instead of requiring application-side normalization
  • Consistent language code format across all vendor conversions

0.6.0 - 2024-12-29

Added

  • Speechmatics integration
  • Bold Transcript Format v2.0 support

For releases before 0.6.0, see the git history.