WhisperCt2.Segment (whisper_ct2 v0.5.0)


One <|t_start|> text <|t_end|> segment of a transcription.

start and end are absolute times in seconds within the input audio. tokens is the list of raw text-token IDs with timestamp tokens stripped, which is useful for diarization or custom decoding. no_speech_prob is the no-speech probability of the parent 30 s chunk, so every segment in that chunk carries the same value. avg_logprob is the sequence-level average log probability returned by CTranslate2; dropping segments with, for example, avg_logprob < -1.0 rejects low-confidence hallucinations. words is nil unless :word_timestamps was set on the transcribe call; when present, it holds one %WhisperCt2.Word{} per Whisper word, each with its own time span.
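The avg_logprob filter mentioned above could be sketched as follows. This is a minimal illustration, not part of whisper_ct2: SegmentFilter is a hypothetical helper, and the segments are plain maps with the same keys as the struct, so the snippet runs without the library or a model. The -1.0 default is the example threshold from the text, not a tuned value.

```elixir
defmodule SegmentFilter do
  # Hypothetical helper for illustration; not part of whisper_ct2.

  # Keeps only segments whose avg_logprob is at or above `min_logprob`.
  # Works on any map or struct that has an :avg_logprob key.
  def confident(segments, min_logprob \\ -1.0) do
    Enum.filter(segments, fn %{avg_logprob: logprob} -> logprob >= min_logprob end)
  end
end

# Made-up sample data standing in for %WhisperCt2.Segment{} values.
segments = [
  %{start: 0.0, end: 2.4, text: " Hello there.", tokens: [],
    avg_logprob: -0.21, no_speech_prob: 0.02, words: nil},
  %{start: 2.4, end: 5.0, text: " Thanks for watching!", tokens: [],
    avg_logprob: -1.37, no_speech_prob: 0.02, words: nil}
]

kept = SegmentFilter.confident(segments)
IO.inspect(Enum.map(kept, & &1.text))
```

Because the helper only pattern-matches on :avg_logprob, it should accept real segment structs unchanged. The threshold is passed as an argument so it can be adjusted per recording.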

Summary

Types

t()

@type t() :: %WhisperCt2.Segment{
  avg_logprob: float(),
  end: float(),
  no_speech_prob: float(),
  start: float(),
  text: String.t(),
  tokens: [non_neg_integer()],
  words: [WhisperCt2.Word.t()] | nil
}
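
As a rough sketch of consuming these fields, the snippet below renders a segment as a timestamped line and gathers word entries while treating words: nil as an empty list. SegmentView is a hypothetical helper, not part of whisper_ct2. The segments are plain maps with the same keys as the type above, and the word entries are opaque placeholder atoms because the fields of WhisperCt2.Word are not shown in this section.

```elixir
defmodule SegmentView do
  # Hypothetical helper for illustration; not part of whisper_ct2.

  # Renders "[start - end] text" from the :start, :end and :text fields.
  def line(%{start: start_s, end: end_s, text: text}) do
    "[#{fmt(start_s)} - #{fmt(end_s)}] #{String.trim(text)}"
  end

  # Collects word entries across segments; `words: nil` counts as no words.
  def all_words(segments) do
    Enum.flat_map(segments, fn %{words: words} -> words || [] end)
  end

  # Formats seconds with two decimals; `* 1.0` guards against integer input.
  defp fmt(seconds), do: :erlang.float_to_binary(seconds * 1.0, decimals: 2)
end

# Made-up sample data; :w1 and :w2 stand in for %WhisperCt2.Word{} structs.
segments = [
  %{start: 0.0, end: 2.4, text: " Hello there.", tokens: [],
    avg_logprob: -0.21, no_speech_prob: 0.02, words: [:w1, :w2]},
  %{start: 2.4, end: 5.0, text: " How are you?", tokens: [],
    avg_logprob: -0.35, no_speech_prob: 0.02, words: nil}
]

Enum.each(segments, fn segment -> IO.puts(SegmentView.line(segment)) end)
IO.inspect(SegmentView.all_words(segments))
```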