BoldTranscriptsEx.Convert.Mistral (bold_transcripts_ex v0.8.1)

Handles conversion of Mistral Voxtral transcription files to Bold format.

Supports three segment formats from the Voxtral API:

  • Legacy: Segments have a nested "words" array and a "speaker" key.
  • Diarized: Segments are sentence-level with "speaker_id" but no "words".
  • Word-level: Segments are individual words with no speaker info and no "words".
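
The three formats above can be told apart by the keys present on a segment. The following is a minimal sketch of that idea, not the module's actual implementation: `SegmentFormatSketch` and `detect/1` are hypothetical names, and the sample segments are assumed shapes built only from the keys named in the list (plus "text", "start" and "end", which appear in the example further down).

```elixir
defmodule SegmentFormatSketch do
  # Hypothetical helper for illustration; not part of BoldTranscriptsEx.

  # Legacy: the segment carries a nested "words" list
  # (per the description it also has a "speaker" key).
  def detect(%{"words" => words}) when is_list(words), do: :legacy

  # Diarized: sentence-level segment with "speaker_id" and no "words".
  def detect(%{"speaker_id" => _speaker_id}), do: :diarized

  # Word-level: neither "words" nor speaker information.
  def detect(%{}), do: :word_level
end

# Assumed sample segments, one per format.
legacy = %{
  "text" => "Hello",
  "speaker" => "speaker_0",
  "words" => [%{"word" => "Hello", "start" => 0.0, "end" => 1.0}]
}

diarized = %{"text" => "Hello there.", "speaker_id" => "speaker_0", "start" => 0.0, "end" => 1.2}
word_level = %{"text" => "Hello", "start" => 0.0, "end" => 0.4}

IO.inspect(Enum.map([legacy, diarized, word_level], &SegmentFormatSketch.detect/1))
```

Clause order matters in this sketch: the "words" check runs first, so a segment is only classified as diarized or word-level when it has no nested word list.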

Functions

transcript_to_bold(transcript, opts \\ [])

Converts a Mistral Voxtral transcript to the Bold Transcript format v2.

Parameters

  • transcript: The JSON string or decoded map of the transcript data from Voxtral.
  • opts: Options for the conversion:
    • :language: The language code of the transcript (e.g., "en", "de"). Defaults to "en".

Returns

  • {:ok, merged_data}: A tuple of the :ok atom and the converted transcript, a map in Bold Transcript format v2.

Examples

iex> transcript = ~s({"text": "Hello", "segments": [{"id": 0, "start": 0.0, "end": 1.0, "text": "Hello", "speaker": "speaker_0", "words": [{"word": "Hello", "start": 0.0, "end": 1.0}]}]})
iex> BoldTranscriptsEx.Convert.Mistral.transcript_to_bold(transcript)
{:ok, %{"metadata" => %{"version" => "2.0", "duration" => 1.0, "language" => "en_us", "source_url" => "", "source_vendor" => "mistral", "source_model" => "", "source_version" => "", "transcription_date" => nil, "speakers" => %{"A" => nil}}, "utterances" => [%{"start" => 0.0, "end" => 1.0, "text" => "Hello", "speaker" => "A", "confidence" => 1.0, "words" => [%{"word" => "Hello", "start" => 0.0, "end" => 1.0, "confidence" => 1.0}]}]}}
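
As a sketch of how the returned map might be consumed, the snippet below walks the "utterances" list and prints one line per utterance. `BoldReaderSketch` and `lines/1` are hypothetical names used only for illustration, and the `bold` map is a trimmed copy of the example result above rather than fresh output from the library.

```elixir
defmodule BoldReaderSketch do
  # Hypothetical helper for illustration; not part of BoldTranscriptsEx.

  # Turns each utterance into a "[start] speaker: text" string.
  def lines(%{"utterances" => utterances}) do
    Enum.map(utterances, fn %{"start" => start, "speaker" => speaker, "text" => text} ->
      "[#{start}] #{speaker}: #{text}"
    end)
  end
end

# Trimmed copy of the result map from the example above.
bold = %{
  "metadata" => %{"version" => "2.0", "duration" => 1.0, "language" => "en_us"},
  "utterances" => [
    %{
      "start" => 0.0,
      "end" => 1.0,
      "text" => "Hello",
      "speaker" => "A",
      "confidence" => 1.0,
      "words" => [%{"word" => "Hello", "start" => 0.0, "end" => 1.0, "confidence" => 1.0}]
    }
  ]
}

bold |> BoldReaderSketch.lines() |> Enum.each(&IO.puts/1)
```

With a real call, the same helper would be applied after matching on the result, for example `{:ok, bold} = BoldTranscriptsEx.Convert.Mistral.transcript_to_bold(transcript)`.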