Document

Definition for Google Cloud Natural Language API documents.

A document is used to hold text to be analyzed and annotated.

class google.cloud.language.document.Annotations(sentences, tokens, sentiment, entities)

Bases: tuple

Annotations for a document.

Parameters:

sentences (list) – List of `Sentence` in a document.
tokens (list) – List of `Token` from a document.
sentiment (`Sentiment`) – The sentiment of a document.
entities (list) – List of `Entity` found in a document.

entities: Alias for field number 3

sentences: Alias for field number 0

sentiment: Alias for field number 2

tokens: Alias for field number 1
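Because Annotations is a tuple, each named field is an alias for a position, so a result can be read by name, by index, or by unpacking. The sketch below uses a stand-in namedtuple with the same field order instead of the library class, and the field values are placeholder data, not real API output.

```python
from collections import namedtuple

# Stand-in with the documented field order (an assumption for illustration;
# the real class lives in google.cloud.language.document).
Annotations = namedtuple(
    "Annotations", ["sentences", "tokens", "sentiment", "entities"]
)

# Placeholder values, not real API output.
annotations = Annotations(
    sentences=["Hello world."],
    tokens=["Hello", "world", "."],
    sentiment=None,
    entities=[],
)

# Named access and positional access refer to the same objects.
first_by_name = annotations.sentences
first_by_index = annotations[0]

# Positional unpacking follows the field numbers listed above.
sentences, tokens, sentiment, entities = annotations
```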

google.cloud.language.document.DEFAULT_LANGUAGE = 'en-US': Default document language, English.

class google.cloud.language.document.Document(client, content=None, gcs_url=None, doc_type='PLAIN_TEXT', language='en-US', encoding='UTF8')

Bases: object

Document to send to Google Cloud Natural Language API.

A document represents either plain text or HTML. Its content is either stored on the document itself or referenced by a Google Cloud Storage object.

Parameters:

client (Client) – A client which holds credentials and other configuration.
content (str) – (Optional) The document text content (either plain text or HTML).
gcs_url (str) – (Optional) The URL of the Google Cloud Storage object holding the content. Of the form gs://{bucket}/{blob-name}.
doc_type (str) – (Optional) The type of text in the document. Defaults to plain text. Can be one of PLAIN_TEXT or HTML.
language (str) – (Optional) The language of the document text. Defaults to DEFAULT_LANGUAGE.
encoding (str) – (Optional) The encoding of the document text. Defaults to UTF-8. Can be one of UTF8, UTF16 or UTF32.

Raises:

ValueError – if both content and gcs_url are specified, or if neither is specified.
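The rule that exactly one of content and gcs_url must be given can be sketched as follows. `build_document_payload` is a hypothetical helper, not part of the library. The dictionary keys (`type`, `language`, `content`, `gcsContentUri`) are assumed from the REST API's document resource, and the `gs://` URL in the usage lines is a made-up example.

```python
def build_document_payload(content=None, gcs_url=None,
                           doc_type="PLAIN_TEXT", language="en-US"):
    """Hypothetical helper: validate the text source and build a request dict."""
    if content is not None and gcs_url is not None:
        raise ValueError("Specify either content or gcs_url, not both.")
    if content is None and gcs_url is None:
        raise ValueError("One of content or gcs_url must be specified.")

    # Key names are an assumption based on the REST document resource.
    payload = {"type": doc_type, "language": language}
    if content is not None:
        payload["content"] = content
    else:
        payload["gcsContentUri"] = gcs_url
    return payload


# Inline text and a (made-up) Cloud Storage reference.
inline = build_document_payload(content="Hello world.")
remote = build_document_payload(gcs_url="gs://example-bucket/example-blob")
```

Checking the source of the text before any request is made means a misconfigured document fails early with a ValueError.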

HTML = 'HTML': HTML document type.

PLAIN_TEXT = 'PLAIN_TEXT': Plain text document type.

TYPE_UNSPECIFIED = 'TYPE_UNSPECIFIED': Unspecified document type.

analyze_entities()

Analyze the entities in the current document.

Finds named entities in the text (as of August 2016, only proper names are found), along with entity types, salience, mentions for each entity, and other properties.

See analyzeEntities.

Return type:	list
Returns:	A list of `Entity` returned from the API.

analyze_sentiment()

Analyze the sentiment in the current document.

See analyzeSentiment.

Return type:	`Sentiment`
Returns:	The sentiment of the current document.

annotate_text(include_syntax=True, include_entities=True, include_sentiment=True)

Advanced natural language API: document syntax and other features.

Includes the full functionality of analyze_entities() and analyze_sentiment(), enabled by the flags include_entities and include_sentiment respectively.

In addition, include_syntax enables a further feature that analyzes the document for semantic and syntactic information.

Note

This API is intended for users who are familiar with machine learning and need in-depth text features to build upon.

See annotateText.

Parameters:

include_syntax (bool) – (Optional) Flag to enable syntax analysis of the current document.
include_entities (bool) – (Optional) Flag to enable entity extraction from the current document.
include_sentiment (bool) – (Optional) Flag to enable sentiment analysis of the current document.

Return type:	`Annotations`
Returns:	A tuple of each of the four values returned from the API: sentences, tokens, sentiment and entities.
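One way to picture the three flags is as switches that select which features are requested. The sketch below is not the library's implementation. `build_features` is a hypothetical helper, and the key names are assumed from the REST API's annotateText features object.

```python
def build_features(include_syntax=True, include_entities=True,
                   include_sentiment=True):
    """Hypothetical helper: translate the three flags into a features dict."""
    features = {}
    # Key names are an assumption based on the REST annotateText request.
    if include_syntax:
        features["extractSyntax"] = True
    if include_entities:
        features["extractEntities"] = True
    if include_sentiment:
        features["extractDocumentSentiment"] = True
    return features


# Default call: all three analyses are requested.
all_features = build_features()

# Sentiment only: the other two analyses are left out of the request.
sentiment_only = build_features(include_syntax=False, include_entities=False)
```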

class google.cloud.language.document.Encoding

Bases: object

Document text encoding types.

NONE = 'NONE': Unspecified encoding type.

UTF16 = 'UTF16': UTF-16 encoding type.

UTF32 = 'UTF32': UTF-32 encoding type.

UTF8 = 'UTF8': UTF-8 encoding type.
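The three named encodings differ in code unit size, so the same text has a different length in each. The sketch below uses only Python's standard codecs. `code_units` and the `CODECS` table are hypothetical names. That the choice of encoding affects positions reported for the text is an assumption not stated on this page.

```python
# Map the documented constant names to Python codecs and code unit sizes in
# bytes. Little-endian codecs are used so that no byte-order mark is emitted.
CODECS = {
    "UTF8": ("utf-8", 1),
    "UTF16": ("utf-16-le", 2),
    "UTF32": ("utf-32-le", 4),
}


def code_units(text, encoding):
    """Hypothetical helper: length of text in code units of the encoding."""
    codec, unit_size = CODECS[encoding]
    return len(text.encode(codec)) // unit_size


# "café" plus a space and an emoji outside the Basic Multilingual Plane.
text = "caf\u00e9 \U0001F600"

lengths = {name: code_units(text, name) for name in CODECS}
```

Here `lengths` is 10 for UTF8, 7 for UTF16 and 6 for UTF32, because the accented letter and the emoji take more than one code unit in the narrower encodings.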