Google Cloud Speech gRPC API client

An Elixir client for the Google Speech-to-Text V2 streaming API, using gRPC.

Installation

The package can be installed by adding :ex_google_stt to your list of dependencies in mix.exs:

def deps do
  [
    {:ex_google_stt, "~> 0.0.1"}
  ]
end

Configuration

This library uses Goth to obtain authentication tokens, so Google Cloud credentials must be configured. See Goth's README for details.

Tests tagged :integration communicate with Google APIs and require this configuration, so they are excluded by default. Use mix test --include integration to run them.

Usage

Introduction

Here's the basic flow:

Create your GenServer

A GenServer is required so that the StreamingServer can send the transcriptions back to the caller. The transcriptions arrive as messages and are handled in a handle_info callback.
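As a rough illustration of the receiving side, here is a minimal sketch of a GenServer that collects whatever messages it is sent. The module name TranscriptionReceiver and the received/1 function are hypothetical and not part of this library. The sketch accepts any term so that it runs without dependencies; in real use the handle_info clause would match on the response struct shown later in this README.

```elixir
defmodule TranscriptionReceiver do
  # Hypothetical module for illustration; not part of ex_google_stt.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Returns every message received so far, oldest first.
  def received(pid), do: GenServer.call(pid, :received)

  @impl true
  def init(:ok), do: {:ok, []}

  @impl true
  def handle_call(:received, _from, messages) do
    {:reply, Enum.reverse(messages), messages}
  end

  # Assumption: responses arrive as plain process messages.
  # This clause accepts any term; a real handler would match on the response struct.
  @impl true
  def handle_info(message, messages), do: {:noreply, [message | messages]}
end
```

Keeping the messages in the process state is only one option; a real application might instead forward each transcript to a client as soon as it arrives.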

Build the configurations

# project_id is your Google Cloud project ID, bound as a string beforehand
recognizer = "projects/#{project_id}/locations/global/recognizers/_"

cfg = %RecognitionConfig{
  decoding_config:
    {:auto_decoding_config, %Google.Cloud.Speech.V2.AutoDetectDecodingConfig{}},
  model: "long",
  language_codes: ["en-GB"],
  features: %{enable_automatic_punctuation: true}
}

str_cfg = %StreamingRecognitionConfig{
  config: cfg,
  streaming_features: %{interim_results: true}
}

str_cfg_req = %StreamingRecognizeRequest{
  streaming_request: {:streaming_config, str_cfg},
  recognizer: recognizer
}

Start the server

  • Start the StreamingServer with start_link
  • Send the configuration request. This must always be the first request.
    {:ok, transcription_server} = StreamingServer.start_link()
    StreamingServer.send_config(transcription_server, str_cfg_req)

Send Requests

request = %StreamingRecognizeRequest{streaming_request: {:audio, data}, recognizer: recognizer}

StreamingServer.send_request(transcription_server, request)
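Audio is usually sent as a sequence of small pieces rather than one large binary. The following sketch shows one way to split a binary into fixed-size chunks before wrapping each one in a request. AudioChunker is a hypothetical helper, not part of this library, and the chunk size is left to the caller because a suitable value depends on the audio format and the API's limits.

```elixir
defmodule AudioChunker do
  # Hypothetical helper for illustration; not part of ex_google_stt.

  # Splits `data` into a list of binaries of at most `size` bytes each.
  def chunk(data, size) when is_binary(data) and is_integer(size) and size > 0 do
    do_chunk(data, size, [])
  end

  # Nothing left to split.
  defp do_chunk(<<>>, _size, acc), do: Enum.reverse(acc)

  # The remainder fits in one chunk.
  defp do_chunk(data, size, acc) when byte_size(data) <= size do
    Enum.reverse([data | acc])
  end

  # Take `size` bytes off the front and continue with the rest.
  defp do_chunk(data, size, acc) do
    <<chunk::binary-size(size), rest::binary>> = data
    do_chunk(rest, size, [chunk | acc])
  end
end
```

Each resulting chunk would then take the place of data in the {:audio, data} request above and be passed to StreamingServer.send_request in order.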

Receive Responses

Responses are delivered to the original caller, where they are handled in handle_info. The same handler can also live in a Phoenix.Channel.

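def handle_info(%StreamingRecognizeResponse{} = response, state) do
  # Pair each result's single alternative with its finality flag.
  transcripts =
    Enum.map(response.results, fn result ->
      [alternative] = result.alternatives
      %{content: alternative.transcript, is_final: result.is_final}
    end)

  # Do something with the transcripts (here they are only logged).
  IO.inspect(transcripts, label: "transcripts")

  {:noreply, state}
end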
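The match [alternative] = result.alternatives in the handler above raises a MatchError if a result carries zero or several alternatives. A more tolerant variant is sketched below as a pure function that can be tested without a live stream. TranscriptParser is a hypothetical helper, not part of this library; it only assumes the field names already used above (results, alternatives, transcript, is_final), and it works on plain maps as well as structs.

```elixir
defmodule TranscriptParser do
  # Hypothetical helper for illustration; not part of ex_google_stt.

  # Takes the first alternative of each result and skips results with none.
  def parse(%{results: results}) do
    for %{alternatives: [best | _]} = result <- results do
      %{content: best.transcript, is_final: result.is_final}
    end
  end

  # Joins the text of the final results only, ignoring interim ones.
  def final_text(response) do
    response
    |> parse()
    |> Enum.filter(& &1.is_final)
    |> Enum.map_join(" ", & &1.content)
  end
end
```

Taking the first alternative is a simplifying choice made for this sketch, not something the source specifies.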

Infinite stream

Google's STT V2 knows when a sentence finishes, as long as there's some silence after it. When that happens, it'll return the transcription without ending the stream.

Therefore, as long as we keep the stream open, we can keep transcribing realtime speech.

A few points to note:

  • The model must be long or latest_long. Using short ends the stream after the first utterance.
  • The stream must be ended explicitly for the transcription to stop.

Auto-generated modules

This library uses protobuf-elixir and its protoc-gen-elixir plugin to generate Elixir modules from *.proto files for Google's Speech gRPC API. The documentation for the types defined in *.proto files can be found here

Fixture

The recording fragment in test/fixtures comes from the audiobook "The adventures of Sherlock Holmes (version 2)", available on LibriVox.

Status

The current version of the library supports only the Streaming API and has not been tested in production. Treat it as experimental.

License

This project includes modified code from [Original Project or Code Name], which is licensed under the Apache License 2.0 (the "License"). You may not use the files containing modifications from the original project except in compliance with the License. A copy of the License is included in this project in the file named LICENSE.

The original work is available at [link to the original repository or project homepage].

Portions of this project are modifications based on work created by Software Mansion and used according to terms described in the Apache License 2.0. See here for the original repository.

The modifications are also licensed under Apache License 2.0.

Disclaimer

While this project includes modified code from [Original Project or Code Name], it is not endorsed by or affiliated with the original authors or their organizations.