Speech Client

Basic client for Google Cloud Speech API.

class google.cloud.speech.client.Client(credentials=None, http=None)

Bases: google.cloud.client.Client

Client to bundle configuration needed for API requests.

Parameters:
  • project (str) – The project which the client acts on behalf of. Will be passed when creating a dataset / job. If not passed, falls back to the default inferred from the environment.
  • credentials (oauth2client.client.OAuth2Credentials or NoneType) – The OAuth2 Credentials to use for the connection owned by this client. If not passed (and if no http object is passed), falls back to the default inferred from the environment.
  • http (httplib2.Http or class that defines request().) – An optional HTTP object to make requests. If not passed, an http object is created that is bound to the credentials for the current object.
async_recognize(content, source_uri, encoding, sample_rate, language_code=None, max_alternatives=None, profanity_filter=None, speech_context=None)

Asynchronous Recognize request to Google Speech API.

See async_recognize.

Parameters:
  • content (bytes) – Byte stream of audio.
  • source_uri (str) – URI that points to a file that contains audio data bytes as specified in RecognitionConfig. Currently, only Google Cloud Storage URIs are supported, which must be specified in the following format: gs://bucket_name/object_name.
  • encoding (str) – Encoding of the audio data sent in all RecognitionAudio messages. Can be one of: LINEAR16, FLAC, MULAW, AMR, AMR_WB.
  • sample_rate (int) – Sample rate in Hertz of the audio data sent in all requests. Valid values are: 8000-48000. For best results, set the sampling rate of the audio source to 16000 Hz. If that’s not possible, use the native sample rate of the audio source (instead of re-sampling).
  • language_code (str) – (Optional) The language of the supplied audio as a BCP-47 language tag. Example: 'en-GB'. If omitted, defaults to 'en-US'.
  • max_alternatives (int) – (Optional) Maximum number of recognition hypotheses to be returned. The server may return fewer than maxAlternatives. Valid values are 0-30. A value of 0 or 1 will return a maximum of 1. Defaults to 1.
  • profanity_filter (bool) – If True, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. 'f***'. If False or omitted, profanities won’t be filtered out.
  • speech_context (list) – A list of strings (max 50) containing word and phrase “hints” so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases. It can also be used to add new words to the vocabulary of the recognizer.
Return type:

google.cloud.speech.operation.Operation

Returns:

Operation for asynchronous request to Google Speech API.
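
The parameter constraints listed above (encoding names, sample rate range, gs:// URI format, alternatives range, and hint count) can be checked locally before a request is issued. The following is a minimal sketch, not part of google.cloud.speech: the helper name check_recognize_args and its error messages are hypothetical, and it only covers the limits stated in this reference.

```python
# Hypothetical helper, not part of google.cloud.speech. It checks the
# parameter limits stated in this reference before a request is sent.
VALID_ENCODINGS = frozenset(["LINEAR16", "FLAC", "MULAW", "AMR", "AMR_WB"])


def check_recognize_args(encoding, sample_rate, source_uri=None,
                         max_alternatives=None, speech_context=None):
    """Raise ValueError if an argument falls outside the documented limits."""
    if encoding not in VALID_ENCODINGS:
        raise ValueError("unsupported encoding: %r" % (encoding,))
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("sample_rate must be within 8000-48000 Hz")
    # Only Google Cloud Storage URIs are documented as supported.
    if source_uri is not None and not source_uri.startswith("gs://"):
        raise ValueError("source_uri must look like gs://bucket_name/object_name")
    if max_alternatives is not None and not 0 <= max_alternatives <= 30:
        raise ValueError("max_alternatives must be within 0-30")
    if speech_context is not None and len(speech_context) > 50:
        raise ValueError("speech_context accepts at most 50 phrases")


# Passes silently because every value is within the documented limits.
check_recognize_args("FLAC", 16000,
                     source_uri="gs://bucket_name/object_name",
                     max_alternatives=2, speech_context=["hello"])
```

Failing fast on the client side avoids a round trip for requests the server would be expected to reject; the server remains the authority on what is accepted.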

sync_recognize(content, source_uri, encoding, sample_rate, language_code=None, max_alternatives=None, profanity_filter=None, speech_context=None)

Synchronous Speech Recognition.

See sync_recognize.

Parameters:
  • content (bytes) – Byte stream of audio.
  • source_uri (str) – URI that points to a file that contains audio data bytes as specified in RecognitionConfig. Currently, only Google Cloud Storage URIs are supported, which must be specified in the following format: gs://bucket_name/object_name.
  • encoding (str) – Encoding of the audio data sent in all RecognitionAudio messages. Can be one of: LINEAR16, FLAC, MULAW, AMR, AMR_WB.
  • sample_rate (int) – Sample rate in Hertz of the audio data sent in all requests. Valid values are: 8000-48000. For best results, set the sampling rate of the audio source to 16000 Hz. If that’s not possible, use the native sample rate of the audio source (instead of re-sampling).
  • language_code (str) – (Optional) The language of the supplied audio as a BCP-47 language tag. Example: 'en-GB'. If omitted, defaults to 'en-US'.
  • max_alternatives (int) – (Optional) Maximum number of recognition hypotheses to be returned. The server may return fewer than maxAlternatives. Valid values are 0-30. A value of 0 or 1 will return a maximum of 1. Defaults to 1.
  • profanity_filter (bool) – If True, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. 'f***'. If False or omitted, profanities won’t be filtered out.
  • speech_context (list) – A list of strings (max 50) containing word and phrase “hints” so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases. It can also be used to add new words to the vocabulary of the recognizer.
Return type:

list

Returns:

A list of dictionaries, one for each alternative. Each dictionary typically contains two keys (though not all will be present in all cases):

  • transcript: The detected text from the audio recording.
  • confidence: The confidence in language detection, a float between 0 and 1.
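
Because either key may be missing from an alternative, code that consumes the returned list should not index the dictionaries blindly. The sketch below picks the transcript with the highest confidence; the function best_transcript and the sample data are hypothetical and only mimic the documented return shape.

```python
# Hypothetical sample shaped like the documented return value of
# sync_recognize; the transcripts and scores are made up for illustration.
results = [
    {"transcript": "hello world", "confidence": 0.93},
    {"transcript": "hollow world"},  # "confidence" may be absent
]


def best_transcript(alternatives):
    """Return the transcript with the highest confidence, or None."""
    best = None
    best_score = -1.0
    for alt in alternatives:
        if "transcript" not in alt:
            continue  # nothing usable in this alternative
        # Treat a missing confidence as 0.0 (an assumption of this sketch).
        score = alt.get("confidence", 0.0)
        if score > best_score:
            best, best_score = alt["transcript"], score
    return best


top = best_transcript(results)
```

Treating a missing confidence as 0.0 is a design choice of this sketch; callers who prefer to keep the server's ordering could simply take the first alternative that has a transcript.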

Connection

Create / interact with Google Cloud Speech connections.

class google.cloud.speech.connection.Connection(credentials=None, http=None)

Bases: google.cloud.connection.JSONConnection

A connection to Google Cloud Speech JSON REST API.

API_BASE_URL = 'https://speech.googleapis.com'

The base of the API call URL.

API_URL_TEMPLATE = '{api_base_url}/{api_version}/{path}'

A template for the URL of a particular API call.

API_VERSION = 'v1beta1'

The version of the API, used in building the API call’s URL.

SCOPE = ('https://www.googleapis.com/auth/cloud-platform',)

The scopes required for authenticating as an API consumer.
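
To illustrate how API_BASE_URL, API_VERSION, and API_URL_TEMPLATE fit together, the sketch below fills the template with plain string formatting. It copies the attribute values listed above rather than importing the library; the function build_url is hypothetical, and the path 'speech:syncrecognize' is an assumed example that does not appear on this page.

```python
# Values copied from the Connection class attributes documented above.
API_BASE_URL = "https://speech.googleapis.com"
API_VERSION = "v1beta1"
API_URL_TEMPLATE = "{api_base_url}/{api_version}/{path}"


def build_url(path):
    """Fill the URL template for one API call (hypothetical helper)."""
    return API_URL_TEMPLATE.format(
        api_base_url=API_BASE_URL,
        api_version=API_VERSION,
        path=path,
    )


# "speech:syncrecognize" is an assumed example path, not taken from this page.
url = build_url("speech:syncrecognize")
```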