Using the API
The Google Speech API enables developers to convert audio to text. The API recognizes over 80 languages and variants, to support your global user base.
This is a Beta release of Google Speech API. This API is not intended for real-time usage in critical applications.
Client objects provide a
means to configure your application. Each instance holds
an authenticated connection to the Speech service.
For an overview of authentication in
Assuming your environment is set up as described in that document,
create an instance of Client:
>>> from google.cloud import speech
>>> client = speech.Client()
async_recognize() sends audio data to the
Speech API and initiates a Long Running Operation. Using this operation, you
can periodically poll for recognition results. Use asynchronous requests for
audio data of any duration up to 80 minutes.
>>> import time
>>> operation = client.async_recognize(
...     None, 'gs://my-bucket/recording.flac',
...     'FLAC', 16000, max_alternatives=2)
>>> retry_count = 100
>>> while retry_count > 0 and not operation.complete:
...     retry_count -= 1
...     time.sleep(10)
...     operation.poll()  # API call
>>> operation.complete
True
>>> operation.results.transcript
'how old is the Brooklyn Bridge'
>>> operation.results.confidence
0.98267895
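The polling loop above can be wrapped in a small reusable helper. The following is a minimal sketch, not part of the library: `wait_for_operation` and `FakeOperation` are hypothetical names introduced here for illustration. The helper assumes only what the example above shows, namely an object with a `complete` attribute and a `poll()` method. The stand-in class lets the sketch run without credentials or network access.

```python
import time


def wait_for_operation(operation, max_polls=100, interval=10.0, sleep=time.sleep):
    """Poll ``operation`` until it completes or ``max_polls`` is used up.

    Hypothetical helper: relies only on ``operation.complete`` and
    ``operation.poll()``, as used in the example above.
    Returns True if the operation completed, False otherwise.
    """
    polls_left = max_polls
    while polls_left > 0 and not operation.complete:
        polls_left -= 1
        sleep(interval)
        operation.poll()  # an API call when used with a real operation
    return operation.complete


class FakeOperation:
    """Hypothetical stand-in that completes after a fixed number of polls."""

    def __init__(self, polls_needed):
        self.polls_needed = polls_needed
        self.complete = False

    def poll(self):
        # Each poll brings the fake operation one step closer to completion.
        self.polls_needed -= 1
        if self.polls_needed <= 0:
            self.complete = True
```

With a real operation, the call would look like `wait_for_operation(operation)`. Passing `sleep` as a parameter keeps the helper easy to exercise in tests without actually waiting.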
The sync_recognize() method converts speech
data to text and returns alternative text transcriptions.
>>> alternatives = client.sync_recognize(
...     None, 'gs://my-bucket/recording.flac',
...     'FLAC', 16000, max_alternatives=2)
>>> for alternative in alternatives:
...     print('=' * 20)
...     print('transcript: ' + alternative['transcript'])
...     # confidence is numeric, so convert it before concatenating
...     print('confidence: ' + str(alternative['confidence']))
====================
transcript: Hello, this is a test
confidence: 0.81
====================
transcript: Hello, this is one test
confidence: 0
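When several alternatives come back, a caller often wants only the most likely one. The sketch below shows one way to pick it, assuming each alternative is a mapping with `'transcript'` and `'confidence'` keys as in the example above. `best_alternative` is a hypothetical helper, not a library function, and the sample data simply mirrors the output shown above.

```python
def best_alternative(alternatives):
    """Return the alternative with the highest confidence, or None if empty.

    Hypothetical helper: assumes each item has a numeric 'confidence' key.
    """
    if not alternatives:
        return None
    return max(alternatives, key=lambda alt: alt['confidence'])


# Sample data mirroring the output of the example above.
sample = [
    {'transcript': 'Hello, this is a test', 'confidence': 0.81},
    {'transcript': 'Hello, this is one test', 'confidence': 0},
]

best = best_alternative(sample)
print(best['transcript'])
```

Returning `None` for an empty result keeps the caller's check explicit instead of letting `max()` raise on an empty sequence.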