View Source GoogleApi.ContentWarehouse.V1.Model.NlpSaftDocument (google_api_content_warehouse v0.4.0)

A document contains the raw text contents of the document as well as an analysis. The document can be split into tokens which can contain information about POS tags and dependency relations. The document can also contain entities and mentions of these entities in the document. Next available id: 36

Attributes

  • relation (type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftRelation.t), default: nil) - Relations between entities in the document.
  • annotations (type: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t, default: nil) - Generic annotations.
  • contentage (type: String.t, default: nil) - Age of the content of the document. For details, see: quality/historical/shingle/signals/contentage.proto The format has been translated to a canonical timestamp (seconds since epoch).
  • bylineDate (type: String.t, default: nil) - Document's byline date, if available: this is the date that will be shown in the snippets in web search results. It is stored as the number of seconds since epoch. See segindexer/compositedoc.proto
  • date (type: String.t, default: nil) - Document anchor date in YYYYMMDDhhmmss format.
  • entity (type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftEntity.t), default: nil) - Entities in the document.
  • semanticNode (type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftSemanticNode.t), default: nil) - The semantic nodes for the document represent arbitrary types of higher-level abstractions beyond entity mention coreference and binary relations between entities. These may include: n-ary relations, semantic frames or events. The semantic nodes for a document are the nodes in a directed acyclic graph, with an adjacency list representation.
  • lastSignificantUpdate (type: String.t, default: nil) - Last significant update of the page content, in the same format as the contentage field, and also derived from ContentAge.last_significant_update in quality/historical/shingle/signals/contentage.proto.
  • token (type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftToken.t), default: nil) - Tokenization of the document.
  • measure (type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftMeasure.t), default: nil) - Measures in the documents. This covers both time expressions as well as physical quantities.
  • hyperlink (type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftHyperlink.t), default: nil) - The hyperlinks in the document. Multiple hyperlinks are sorted in left-to-right order.
  • annotatedPhrase (type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftAnnotatedPhrase.t), default: nil) - Annotated phrases in the document that are not semantically well-defined mentions of entities.
  • contentFirstseen (type: String.t, default: nil) - Stores minimum of first time google successfully crawled a document, or indexed the document with contents (i.e, not roboted). It is stored as the number of seconds since epoch. See quality/historical/signals/firstseen/firstseen.proto
  • contentType (type: integer(), default: nil) - Optional document content_type (from webutil/http/content-type.proto). Used for setting the content_type when converting the SAFT Document to a CompositeDoc. Will be inferred if not given here.
  • entityLabel (type: list(String.t), default: nil) - Entity labels used in this document. This field is used to define labels for the Entity::entity_type_probability field, which contains corresponding probabilities. WARNING: This field is deprecated. go/saft-replace-deprecated-entity-type
  • httpHeaders (type: String.t, default: nil) - HTTP header for document. If the HTTP headers field is set it should be the complete header including the HTTP status line and the trailing cr/nl. HTTP headers are not required to be valid UTF-8. Per the HTTP/1.1 Syntax (RFC7230) standard, non-ASCII octets should be treated as opaque data.
  • topic (type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftDocumentTopic.t), default: nil) -
  • docid (type: String.t, default: nil) - Identifier for document.
  • language (type: integer(), default: nil) - Document language (default is English). This field's value maps cleanly to the i18n.languages.Language proto enum (i18n::languages::Language in C++).
  • text (type: String.t, default: nil) - Raw text contents of document. (In docjoin attachments from the SAFT goldmine annotator this field will be empty.)
  • trace (type: boolean(), default: nil) - Whether to enable component tracing during analysis of this document. See http://go/saft-tracing for details.
  • labeledSpans (type: %{optional(String.t) => GoogleApi.ContentWarehouse.V1.Model.NlpSaftLabeledSpans.t}, default: nil) - Generic labeled spans (produced by the span labeling framework, go/saft-span-labeling). The map key identifies spans of the same type. By convention, it should be of the form "team_name/span_type_name".
  • golden (type: boolean(), default: nil) - Flag for indicating that the document is a gold-standard document. This can be used for putting additional weight on human-labeled documents in contrast to automatically labeled annotations.
  • focusEntity (type: integer(), default: nil) - Focus entity. For lexicon articles, like Wikipedia pages, a document is often about a certain entity. This is the local entity id of the focus entity for the document.
  • constituencyRoot (type: list(integer()), default: nil) - The root node of the constituency tree for each sentence. If non-empty, the list of roots will be aligned with the sentences in the document. Note that some sentences may not have been parsed for various reasons; these sentences will be annotated with placeholder "stub parses". For details, see //nlp/saft/components/constituents/util/stub-parse.h.
  • author (type: list(String.t), default: nil) - Document author(s).
  • syntacticDate (type: String.t, default: nil) - Document's syntactic date (e.g. date explicitly mentioned in the URL of the document or in the document title). It is stored as the number of seconds since epoch. See quality/timebased/syntacticdate/proto/syntactic-date.proto
  • url (type: String.t, default: nil) - Source document URL.
  • privacySensitive (type: boolean(), default: nil) - True if this document contains privacy sensitive data. When the document is transferred in RPC calls the RPC should use SSL_PRIVACY_AND_INTEGRITY security level.
  • subsection (type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftDocument.t), default: nil) - Sub-sections for document for dividing a document into volumes, parts, chapters, sections, etc.
  • constituencyNode (type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftConstituencyNode.t), default: nil) - Constituency parse tree nodes for the sentences in this document.
  • rpcError (type: boolean(), default: nil) - True if some RPC which touched this document had an error.
  • title (type: String.t, default: nil) - Optional document title.

Summary

Functions

Unwrap a decoded JSON object into its complex fields.

Types

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.NlpSaftDocument{
  annotatedPhrase:
    [GoogleApi.ContentWarehouse.V1.Model.NlpSaftAnnotatedPhrase.t()] | nil,
  annotations:
    GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t() | nil,
  author: [String.t()] | nil,
  bylineDate: String.t() | nil,
  constituencyNode:
    [GoogleApi.ContentWarehouse.V1.Model.NlpSaftConstituencyNode.t()] | nil,
  constituencyRoot: [integer()] | nil,
  contentFirstseen: String.t() | nil,
  contentType: integer() | nil,
  contentage: String.t() | nil,
  date: String.t() | nil,
  docid: String.t() | nil,
  entity: [GoogleApi.ContentWarehouse.V1.Model.NlpSaftEntity.t()] | nil,
  entityLabel: [String.t()] | nil,
  focusEntity: integer() | nil,
  golden: boolean() | nil,
  httpHeaders: String.t() | nil,
  hyperlink: [GoogleApi.ContentWarehouse.V1.Model.NlpSaftHyperlink.t()] | nil,
  labeledSpans:
    %{
      optional(String.t()) =>
        GoogleApi.ContentWarehouse.V1.Model.NlpSaftLabeledSpans.t()
    }
    | nil,
  language: integer() | nil,
  lastSignificantUpdate: String.t() | nil,
  measure: [GoogleApi.ContentWarehouse.V1.Model.NlpSaftMeasure.t()] | nil,
  privacySensitive: boolean() | nil,
  relation: [GoogleApi.ContentWarehouse.V1.Model.NlpSaftRelation.t()] | nil,
  rpcError: boolean() | nil,
  semanticNode:
    [GoogleApi.ContentWarehouse.V1.Model.NlpSaftSemanticNode.t()] | nil,
  subsection: [t()] | nil,
  syntacticDate: String.t() | nil,
  text: String.t() | nil,
  title: String.t() | nil,
  token: [GoogleApi.ContentWarehouse.V1.Model.NlpSaftToken.t()] | nil,
  topic: [GoogleApi.ContentWarehouse.V1.Model.NlpSaftDocumentTopic.t()] | nil,
  trace: boolean() | nil,
  url: String.t() | nil
}

Functions

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.