Using the API#
The Google Natural Language API can be used to reveal the structure and meaning of text via powerful machine learning models. You can use it to extract information about people, places, events, and much more mentioned in text documents, news articles, or blog posts. You can use it to understand sentiment about your product on social media, or to parse intent from customer conversations happening in a call center or a messaging app. You can analyze text uploaded in your request or integrate with your document storage on Google Cloud Storage.
Warning
This is a Beta release of the Google Cloud Natural Language API. This API is not intended for real-time usage in critical applications.
Client#
Client objects provide a means to configure your application. Each instance holds an authenticated connection to the Natural Language service.

For an overview of authentication in google-cloud-python, see Authentication.
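If you rely on application default credentials, one common setup is to point the environment at a service-account key file before the client is created. A minimal sketch; the key-file path is a hypothetical placeholder:

>>> import os
>>> # Hypothetical path; application default credentials read this variable.
>>> os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/keyfile.json'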
Assuming your environment is set up as described in that document, create an instance of Client:
>>> from google.cloud import language
>>> client = language.Client()
By default the language is 'en-US' and the encoding is UTF-8. To override these values:
>>> client = language.Client(language='es',
...                          encoding=language.Encoding.UTF16)
The encoding can be one of Encoding.UTF8, Encoding.UTF16, or Encoding.UTF32.
Methods#
The Google Natural Language API has three supported methods, and each method uses a Document to represent text. To create a Document:
>>> text_content = (
...     'Google, headquartered in Mountain View, unveiled the '
...     'new Android phone at the Consumer Electronic Show. '
...     'Sundar Pichai said in his keynote that users love '
...     'their new Android phones.')
>>> document = client.document_from_text(text_content)
By using document_from_text(), the document’s type is plain text:
>>> document.doc_type == language.Document.PLAIN_TEXT
True
In addition, the document’s language defaults to the language on the client:
>>> document.language
'en-US'
>>> document.language == client.language
True
In addition, the document_from_html() factory can be used to create an HTML document. In this method and in document_from_text(), the language can be overridden:
>>> html_content = """\ ... <html> ... <head> ... <title>El Tiempo de las Historias</time> ... </head> ... <body> ... <p>La vaca saltó sobre la luna.</p> ... </body> ... </html> ... """ >>> document = client.document_from_html(html_content, ... language='es')
The language argument can be either an ISO-639-1 or BCP-47 language code; at this time, only English, Spanish, and Japanese are supported. However, the analyzeSentiment method only supports English text.
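For example, a Spanish plain-text document can be created by passing the same language override to document_from_text(); a minimal sketch reusing the sentence from the HTML example above:

>>> document = client.document_from_text(
...     'La vaca saltó sobre la luna.', language='es')
>>> document.language
'es'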
The document type (doc_type) value can be one of Document.PLAIN_TEXT or Document.HTML.
In addition to supplying the text or HTML content, a document can refer to content stored in Google Cloud Storage. We can use the document_from_url() method:
>>> gcs_url = 'gs://my-text-bucket/sentiment-me.txt'
>>> document = client.document_from_url(gcs_url)
>>> document.gcs_url == gcs_url
True
>>> document.doc_type == language.Document.PLAIN_TEXT
True
The document type can be specified with the doc_type argument:
>>> document = client.document_from_url(
...     gcs_url, doc_type=language.Document.HTML)
Analyze Entities#
The analyze_entities() method finds named entities (i.e. proper names) in the text and returns them as a list of Entity objects. Each entity has a corresponding type, salience (prominence), associated metadata, and other properties.
>>> text_content = ("Michelangelo Caravaggio, Italian painter, is " ... "known for 'The Calling of Saint Matthew'.") >>> document = client.document(text_content) >>> entities = document.analyze_entities() >>> for entity in entities: ... print('=' * 20) ... print(' name: %s' % (entity.name,)) ... print(' type: %s' % (entity.entity_type,)) ... print('wikipedia_url: %s' % (entity.wikipedia_url,)) ... print(' metadata: %s' % (entity.metadata,)) ... print(' salience: %s' % (entity.salience,)) ==================== name: Michelangelo Caravaggio type: PERSON wikipedia_url: http://en.wikipedia.org/wiki/Caravaggio metadata: {} salience: 0.7615959 ==================== name: Italian type: LOCATION wikipedia_url: http://en.wikipedia.org/wiki/Italy metadata: {} salience: 0.19960518 ==================== name: The Calling of Saint Matthew type: EVENT wikipedia_url: http://en.wikipedia.org/wiki/The_Calling_of_St_Matthew_(Caravaggio) metadata: {} salience: 0.038798928
Analyze Sentiment#
The analyze_sentiment() method analyzes the sentiment of the provided text and returns a Sentiment. Currently, this method only supports English text.
>>> text_content = "Jogging isn't very fun." >>> document = client.document(text_content) >>> sentiment = document.analyze_sentiment() >>> print(sentiment.polarity) -1 >>> print(sentiment.magnitude) 0.8
Annotate Text#
The annotate_text() method analyzes a document and is intended for users who are familiar with machine learning and need in-depth text features to build upon. The method returns a named tuple with four entries:

- sentences: a list of sentences in the text
- tokens: a list of Token objects (e.g. words, punctuation)
- sentiment: the Sentiment of the text (as returned by analyze_sentiment())
- entities: a list of Entity objects extracted from the text (as returned by analyze_entities())
annotate_text() accepts three arguments, include_syntax, include_entities, and include_sentiment, all of which default to True. However, each of these features can be selectively turned off by setting the corresponding argument to False.
When include_syntax=False, sentences and tokens in the response are None. When include_sentiment=False, sentiment in the response is None. When include_entities=False, entities in the response is None.
>>> text_content = 'The cow jumped over the Moon.'
>>> document = client.document_from_text(text_content)
>>> annotations = document.annotate_text()
>>> # Sentences present if include_syntax=True
>>> print(annotations.sentences)
['The cow jumped over the Moon.']
>>> # Tokens present if include_syntax=True
>>> for token in annotations.tokens:
...     msg = '%11s: %s' % (token.part_of_speech, token.text_content)
...     print(msg)
 DETERMINER: The
       NOUN: cow
       VERB: jumped
 ADPOSITION: over
 DETERMINER: the
       NOUN: Moon
PUNCTUATION: .
>>> # Sentiment present if include_sentiment=True
>>> print(annotations.sentiment.polarity)
1
>>> print(annotations.sentiment.magnitude)
0.1
>>> # Entities present if include_entities=True
>>> for entity in annotations.entities:
...     print('=' * 20)
...     print('         name: %s' % (entity.name,))
...     print('         type: %s' % (entity.entity_type,))
...     print('wikipedia_url: %s' % (entity.wikipedia_url,))
...     print('     metadata: %s' % (entity.metadata,))
...     print('     salience: %s' % (entity.salience,))
====================
         name: Moon
         type: LOCATION
wikipedia_url: http://en.wikipedia.org/wiki/Natural_satellite
     metadata: {}
     salience: 0.11793101
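To skip the syntactic and sentiment analysis for the same document, the feature flags described above can be turned off; a minimal sketch:

>>> annotations = document.annotate_text(include_syntax=False,
...                                      include_sentiment=False)
>>> annotations.sentences is None
True
>>> annotations.sentiment is None
True
>>> [entity.name for entity in annotations.entities]
['Moon']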