Google Cloud Speech gRPC API client
Elixir client for Google Speech-to-Text V2 streaming API using gRPC
Installation
The package can be installed by adding :ex_google_stt to your list of dependencies in mix.exs:
def deps do
  [
    {:ex_google_stt, "~> 0.0.1"}
  ]
end
Configuration
This library uses Goth to obtain authentication tokens. It requires Google Cloud credentials to be configured. See Goth's README for details.
Tests tagged :integration communicate with Google APIs and require this configuration, so they are excluded by default. Use mix test --include integration to run them.
Usage
Introduction
Here's the basic flow:
Create your GenServer
A GenServer is required so that the StreamingServer can send the transcriptions back to the caller. The caller receives them in a handle_info callback.
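As a rough illustration of the calling process, here is a minimal sketch of a GenServer that stores every message it receives. The module name TranscriptionListener and the received/1 function are hypothetical and not part of this library; a real caller would match on the response struct and act on the transcripts instead of just storing messages.

```elixir
defmodule TranscriptionListener do
  # Hypothetical module: a minimal sketch of a process that receives messages.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Returns every message received so far, oldest first.
  def received(pid), do: GenServer.call(pid, :received)

  @impl true
  def init(:ok), do: {:ok, []}

  @impl true
  def handle_call(:received, _from, messages) do
    {:reply, Enum.reverse(messages), messages}
  end

  # Any plain message sent to this process lands here,
  # which is where transcription responses would arrive.
  @impl true
  def handle_info(message, messages) do
    {:noreply, [message | messages]}
  end
end
```

Messages can be simulated with send/2, which makes the callback easy to exercise without talking to Google APIs.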
Build the configurations
# The generated structs live under Google.Cloud.Speech.V2
alias Google.Cloud.Speech.V2.{
  RecognitionConfig,
  StreamingRecognitionConfig,
  StreamingRecognizeRequest
}

# project_id holds your Google Cloud project ID
recognizer = "projects/#{project_id}/locations/global/recognizers/_"

cfg = %RecognitionConfig{
  decoding_config:
    {:auto_decoding_config, %Google.Cloud.Speech.V2.AutoDetectDecodingConfig{}},
  model: "long",
  language_codes: ["en-GB"],
  features: %{enable_automatic_punctuation: true}
}

str_cfg = %StreamingRecognitionConfig{
  config: cfg,
  streaming_features: %{interim_results: true}
}

str_cfg_req = %StreamingRecognizeRequest{
  streaming_request: {:streaming_config, str_cfg},
  recognizer: recognizer
}
Start the server
- Start the StreamingServer with start_link.
- Send the configuration request. This must always be the first request.
{:ok, transcription_server} = StreamingServer.start_link()
StreamingServer.send_config(transcription_server, str_cfg_req)
Send Requests
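Audio is typically streamed as a series of small binaries rather than as one large binary. As a minimal sketch, the hypothetical AudioChunker helper below (not part of this library) splits a binary into pieces of at most a given size. The chunk size is left as a parameter because the per-request limit is not stated here; check Google's documentation for the actual value. Each chunk would then be wrapped in a request as in the snippet that follows.

```elixir
defmodule AudioChunker do
  # Hypothetical helper: splits a binary into chunks of at most `size` bytes.
  def chunk(binary, size) when is_binary(binary) and is_integer(size) and size > 0 do
    do_chunk(binary, size, [])
  end

  # Nothing left to split.
  defp do_chunk(<<>>, _size, acc), do: Enum.reverse(acc)

  # The remainder fits in a single chunk.
  defp do_chunk(binary, size, acc) when byte_size(binary) <= size do
    Enum.reverse([binary | acc])
  end

  # Take `size` bytes off the front and continue with the rest.
  defp do_chunk(binary, size, acc) do
    <<head::binary-size(size), rest::binary>> = binary
    do_chunk(rest, size, [head | acc])
  end
end
```

For example, AudioChunker.chunk(data, 8_192) returns a list of binaries, each of which can be sent as its own {:audio, chunk} request. The value 8_192 is only an illustrative choice.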
request = %StreamingRecognizeRequest{streaming_request: {:audio, data}, recognizer: recognizer}
StreamingServer.send_request(transcription_server, request)
Receive Responses
Responses are received in the original caller, in its handle_info callback.
The same callback can also live in a Phoenix.Channel.
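def handle_info(%StreamingRecognizeResponse{} = response, state) do
  # This assumes each result carries exactly one alternative.
  transcripts =
    Enum.map(response.results, fn result ->
      [alternative] = result.alternatives
      %{content: alternative.transcript, is_final: result.is_final}
    end)

  # Use the transcripts here (for example, print or forward them).
  IO.inspect(transcripts)
  {:noreply, state}
end
Infinite stream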
Google's STT V2 knows when a sentence finishes, as long as there's some silence after it. When that happens, it'll return the transcription without ending the stream.
Therefore, as long as we keep the stream open, we can keep transcribing realtime speech.
A few points to notice though.
- The model must be long or latest_long. short will result in ending the stream after the first utterance.
- One must end the stream to ensure the transcription stops.
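With interim results enabled on a long-running stream, the caller sees both partial and final transcripts. One possible way to keep a running text is sketched below. TranscriptBuffer is a hypothetical helper, not part of this library, and it assumes transcripts shaped like the %{content: ..., is_final: ...} maps built in the handle_info example: final utterances are appended, while the latest interim result only replaces the previous interim one.

```elixir
defmodule TranscriptBuffer do
  # Hypothetical helper: accumulates final utterances and tracks the latest interim one.
  def new, do: %{finals: [], interim: nil}

  # A final result is stored and clears any pending interim text.
  def add(buffer, %{content: content, is_final: true}) do
    %{buffer | finals: [content | buffer.finals], interim: nil}
  end

  # An interim result replaces the previous interim text.
  def add(buffer, %{content: content, is_final: false}) do
    %{buffer | interim: content}
  end

  # Joins the final utterances in order, followed by the current interim text, if any.
  def text(buffer) do
    (Enum.reverse(buffer.finals) ++ List.wrap(buffer.interim))
    |> Enum.join(" ")
  end
end
```

Keeping interim text separate from final utterances means a later, corrected interim result never duplicates words already in the running text.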
Auto-generated modules
This library uses protobuf-elixir and its protoc-gen-elixir plugin to generate Elixir modules from *.proto files for Google's Speech gRPC API. The documentation for the types defined in *.proto files can be found here
Fixture
A recording fragment in test/fixtures comes from an audiobook
"The adventures of Sherlock Holmes (version 2)" available on LibriVox
Status
The current version of the library supports only the Streaming API and has not been tested in production. Treat it as experimental.
License
This project includes modified code from [Original Project or Code Name], which is licensed under the Apache License 2.0 (the "License"). You may not use the files containing modifications from the original project except in compliance with the License. A copy of the License is included in this project in the file named LICENSE.
The original work is available at [link to the original repository or project homepage].
Portions of this project are modifications based on work created by
and used according to terms described in the Apache License 2.0. See here for the original repository.
The modifications are also licensed under Apache License 2.0.
Disclaimer
While this project includes modified code from [Original Project or Code Name], it is not endorsed by or affiliated with the original authors or their organizations.