BoldTranscriptsEx.Convert.Mistral (bold_transcripts_ex v0.8.1)
Handles conversion of Mistral Voxtral transcription files to Bold format.
Supports three segment formats from the Voxtral API:
- Legacy: Segments have a nested "words" array and a "speaker" key.
- Diarized: Segments are sentence-level with "speaker_id" but no "words".
- Word-level: Segments are individual words with no speaker info and no "words".
Functions
Converts a Mistral Voxtral transcript to the Bold Transcript format v2.
Parameters
- transcript: The JSON string or decoded map of the transcript data from Voxtral.
- opts: Options for the conversion:
  - :language: The language code of the transcript (e.g., "en", "de"). Defaults to "en".
Returns
- {:ok, merged_data}: A tuple with the :ok atom and the data in Bold Transcript format.
Examples
iex> transcript = ~s({"text": "Hello", "segments": [{"id": 0, "start": 0.0, "end": 1.0, "text": "Hello", "speaker": "speaker_0", "words": [{"word": "Hello", "start": 0.0, "end": 1.0}]}]})
iex> BoldTranscriptsEx.Convert.Mistral.transcript_to_bold(transcript)
{:ok, %{"metadata" => %{"version" => "2.0", "duration" => 1.0, "language" => "en_us", "source_url" => "", "source_vendor" => "mistral", "source_model" => "", "source_version" => "", "transcription_date" => nil, "speakers" => %{"A" => nil}}, "utterances" => [%{"start" => 0.0, "end" => 1.0, "text" => "Hello", "speaker" => "A", "confidence" => 1.0, "words" => [%{"word" => "Hello", "start" => 0.0, "end" => 1.0, "confidence" => 1.0}]}]}}