Using the API

The Google Speech API enables developers to convert audio to text. The API recognizes over 80 languages and variants, to support your global user base.


This is a Beta release of the Google Speech API. This API is not intended for real-time usage in critical applications.


Client objects provide a means to configure your application. Each instance holds an authenticated connection to the Speech service.

For an overview of authentication in google-cloud-python, see Authentication.

Assuming your environment is set up as described in that document, create an instance of Client.

>>> from google.cloud import speech
>>> client = speech.Client()

Asynchronous Recognition

The async_recognize() method sends audio data to the Speech API and initiates a long-running operation. You can poll this operation periodically for recognition results. Use asynchronous requests for audio data of any duration up to 80 minutes.

See: Speech Asynchronous Recognize

>>> import time
>>> operation = client.async_recognize(
...     None, 'gs://my-bucket/recording.flac',
...     'FLAC', 16000, max_alternatives=2)
>>> retry_count = 100
>>> while retry_count > 0 and not operation.complete:
...     retry_count -= 1
...     time.sleep(10)
...     operation.poll()  # API call
>>> operation.complete
>>> operation.results[0].transcript
'how old is the Brooklyn Bridge'
>>> operation.results[0].confidence
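
The polling loop above can be factored into a small reusable helper. The following is a minimal sketch, not part of the library: `wait_for_operation` and `FakeOperation` are hypothetical names introduced here, and the helper assumes only that the operation exposes a `complete` attribute and a `poll()` method, as the operation in the example above does.

```python
import time


def wait_for_operation(operation, max_retries=100, delay=10.0, sleep=time.sleep):
    """Poll ``operation`` until it completes or the retries run out.

    Hypothetical helper: relies only on ``operation.complete`` and
    ``operation.poll()``. Returns True if the operation completed.
    """
    retries_left = max_retries
    while retries_left > 0 and not operation.complete:
        retries_left -= 1
        sleep(delay)
        operation.poll()  # one API call per iteration
    return operation.complete


class FakeOperation:
    """Hypothetical stand-in that completes after a fixed number of polls."""

    def __init__(self, polls_needed):
        self.polls_needed = polls_needed
        self.complete = False
        self.poll_count = 0

    def poll(self):
        self.poll_count += 1
        if self.poll_count >= self.polls_needed:
            self.complete = True


# Exercise the helper without network access or real delays.
fake = FakeOperation(polls_needed=3)
finished = wait_for_operation(fake, max_retries=5, delay=0)

# An operation that needs more polls than the retry budget allows.
slow = FakeOperation(polls_needed=10)
timed_out = not wait_for_operation(slow, max_retries=2, delay=0)
```

With a real operation, you would pass the object returned by `async_recognize()` and read `operation.results` only when the helper returns True. Bounding the number of retries keeps a stalled operation from blocking the caller indefinitely.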

Synchronous Recognition

The sync_recognize() method converts speech data to text and returns alternative text transcriptions.

>>> alternatives = client.sync_recognize(
...     None, 'gs://my-bucket/recording.flac',
...     'FLAC', 16000, max_alternatives=2)
>>> for alternative in alternatives:
...     print('=' * 20)
...     print('transcript: ' + alternative['transcript'])
...     print('confidence: ' + str(alternative['confidence']))
====================
transcript: Hello, this is a test
confidence: 0.81
====================
transcript: Hello, this is one test
confidence: 0
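
When only the most likely transcription is needed, the alternatives can be ranked by confidence. This is a minimal sketch under the assumption that each alternative is a dictionary with `transcript` and `confidence` keys, as in the example above. `best_alternative` is a hypothetical helper, not a library function, and the sample data is copied from the output shown above.

```python
def best_alternative(alternatives):
    """Return the alternative with the highest confidence, or None if empty.

    Hypothetical helper: assumes each item is a dict with 'transcript'
    and 'confidence' keys.
    """
    if not alternatives:
        return None
    return max(alternatives, key=lambda alt: alt['confidence'])


# Sample data mirroring the output shown above.
sample = [
    {'transcript': 'Hello, this is a test', 'confidence': 0.81},
    {'transcript': 'Hello, this is one test', 'confidence': 0},
]

best = best_alternative(sample)
print('transcript: ' + best['transcript'])
print('confidence: ' + str(best['confidence']))
```

The confidence value is numeric, so it is converted with `str()` before being concatenated with a string.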