Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.8.1 - 2026-02-06
Fixed
- Mistral converter now handles real Voxtral API output formats (previously only worked with synthetic test data)
- Support for diarized mode: speaker_id field, sentence-level segments without nested words
- Support for word-level mode: individual word segments grouped into utterances by punctuation, pauses, or length
- Leading whitespace in Mistral segment text is now trimmed
- WebVTT generation no longer produces empty files for Mistral transcripts
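
The word-level grouping described above can be sketched roughly as follows. This is an illustration only, not the library's code: the module name, the start_time/end_time keys, and both thresholds are hypothetical, since the changelog does not state the actual values.

```elixir
defmodule UtteranceSketch do
  # Hypothetical thresholds; the real converter's values are not documented here.
  @max_pause 1.0
  @max_words 20

  # words: list of maps such as %{text: "Hi", start_time: 0.0, end_time: 0.4}
  # Returns a list of utterances, each a list of word maps in order.
  def group(words) do
    {done, current} = Enum.reduce(words, {[], []}, &step/2)
    Enum.reverse(flush(done, current))
  end

  defp step(word, {done, current}) do
    # A long pause before this word closes the utterance in progress.
    {done, current} =
      if pause_before?(word, current), do: {flush(done, current), []}, else: {done, current}

    current = [word | current]

    # Sentence-ending punctuation or the length cap closes the utterance.
    if String.ends_with?(word.text, [".", "?", "!"]) or length(current) >= @max_words do
      {flush(done, current), []}
    else
      {done, current}
    end
  end

  defp pause_before?(_word, []), do: false
  defp pause_before?(word, [prev | _]), do: word.start_time - prev.end_time > @max_pause

  # Utterances are accumulated in reverse and restored to order on flush.
  defp flush(done, []), do: done
  defp flush(done, current), do: [Enum.reverse(current) | done]
end
```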
0.8.0 - 2026-02-06
Added
- Mistral Voxtral transcript converter (BoldTranscriptsEx.Convert.Mistral)
- Support for :mistral vendor in BoldTranscriptsEx.convert/3
- Mistral-specific language normalization in Convert.Language
- GitHub Actions CI (test + format check on push/PR)
- GitHub Actions release automation (auto-publish to hex.pm)
- MIT LICENSE file
- llms.txt for LLM discoverability
Changed
- Updated package description to include all supported vendors
- Added source_url and homepage_url to hex.pm package metadata
Removed
- lib/test.ex: development scratch file
- lib/webvtt2.ex: unused experimental module
Fixed
- WebVTT format_subtitle_time/1 now accepts integer timestamps (required for Mistral)
- Code formatting in lib/convert/speechmatics.ex
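
As an illustration of the timestamp fix above, here is one way a formatter can accept both integer and float input. The module and function names are hypothetical, and the sketch assumes timestamps are expressed in seconds; it is not the library's format_subtitle_time/1.

```elixir
defmodule SubtitleTimeSketch do
  # Hypothetical helper: formats seconds (integer or float) as "HH:MM:SS.mmm".
  def format(seconds) when is_number(seconds) and seconds >= 0 do
    # round/1 works for integers and floats alike, so both inputs are handled.
    total_ms = round(seconds * 1000)
    ms = rem(total_ms, 1000)
    total_s = div(total_ms, 1000)
    s = rem(total_s, 60)
    m = rem(div(total_s, 60), 60)
    h = div(total_s, 3600)
    "#{pad(h, 2)}:#{pad(m, 2)}:#{pad(s, 2)}.#{pad(ms, 3)}"
  end

  defp pad(n, width), do: n |> Integer.to_string() |> String.pad_leading(width, "0")
end
```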
0.7.0 - 2025-10-08
Added
- Language code normalization during transcript conversion
- New BoldTranscriptsEx.Convert.Language module with vendor-specific normalization functions
- Support for BCP-47 format (Deepgram), underscore format (AssemblyAI), and base language codes (Speechmatics)
- Comprehensive test coverage for language normalization
Changed
- BREAKING: Language codes in Bold format metadata are now normalized to internal format (en_us, en_uk, de_de, etc.)
- Deepgram: BCP-47 codes (e.g., en-US) are converted to underscore format (en_us), with special handling for en-GB → en_uk
- AssemblyAI: Language codes are normalized and base languages get default regions (e.g., de → de_de)
- Speechmatics: Base language codes are mapped to default regional variants (e.g., en → en_us)
- All converters now return normalized language codes in the metadata.language field
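
For applications migrating across this breaking change, the mapping rules above can be sketched as follows. The module, its function names, and the default-region table are hypothetical (only the en and de defaults are named in this changelog); this is not the actual BoldTranscriptsEx.Convert.Language implementation.

```elixir
defmodule LanguageSketch do
  # Assumed default regions for base language codes; only these two appear in the changelog.
  @default_regions %{"en" => "en_us", "de" => "de_de"}

  # BCP-47 input (Deepgram-style): "en-US" becomes "en_us"; en-GB is special-cased to en_uk.
  def from_bcp47(code) when is_binary(code) do
    case String.downcase(code) do
      "en-gb" -> "en_uk"
      other -> other |> String.replace("-", "_") |> with_default_region()
    end
  end

  # Base-code input (Speechmatics-style): "en" becomes "en_us".
  def from_base(code) when is_binary(code) do
    code |> String.downcase() |> with_default_region()
  end

  # Codes without an entry in the table pass through unchanged.
  defp with_default_region(code), do: Map.get(@default_regions, code, code)
end
```

Code that previously compared metadata.language against vendor-native values such as en-US would need to compare against the normalized form instead.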
Fixed
- Language normalization now happens at the library level instead of requiring application-side normalization
- Consistent language code format across all vendor conversions
0.6.0 - 2024-12-29
Added
- Speechmatics integration
- Bold Transcript Format v2.0 support
For releases before 0.6.0, see the git history.