View Source GoogleApi.ContentWarehouse.V1.Model.NlpSaftToken (google_api_content_warehouse v0.4.0)

A document token marks a span of bytes in the document text as a token or word. Next available index: 16.

Attributes

  • breakLevel (type: String.t, default: nil) -
  • breakSkippedText (type: boolean(), default: nil) - Whether the break skipped over non-tag text (excluding script/style).
  • category (type: String.t, default: nil) - Coarse-grained word category for token. See README.categories for category inventory.
  • end (type: integer(), default: nil) -
  • head (type: integer(), default: nil) - Head of this token in the dependency tree: the id of the token which has an arc going to this one. If it is the root token of a sentence, then it is set to -1.
  • info (type: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t, default: nil) - Annotation for this token.
  • label (type: String.t, default: nil) - Label for dependency relation between this token and its head. See README.labels for label inventory.
  • lemma (type: String.t, default: nil) - Word lemma. This is only filled if the lemma is different from the word form.
  • morph (type: GoogleApi.ContentWarehouse.V1.Model.NlpSaftMorphology.t, default: nil) - Morphology information.
  • scriptCode (type: String.t, default: nil) - A string representation (typically four letters, sometimes longer) of the token's Unicode script code, based on BCP 47/CLDR, capitalized according to ISO 15924. See i18n/identifiers/scriptcode.h for details.
  • start (type: integer(), default: nil) - [start, end] describe the inclusive byte range of the UTF-8 encoded token in document.text. End gives the index of the last byte, which may be a UTF-8 continuation byte, and the length in bytes is end - start + 1. begin/end options are for goldmine AnnotationsFinder to locate the offsets of saft tokens. Start is inclusive by default and end is marked.
  • tag (type: String.t, default: nil) - Part-of-speech tag for token. See README.tags for tag inventory.
  • tagConfidence (type: number(), default: nil) - Confidence score for the tag prediction -- should be interpreted as a probability estimate that the tag is correct.
  • textProperties (type: integer(), default: nil) -
  • word (type: String.t, default: nil) - Token word form. This may not be identical to the original. For example, in goldmine annotation we do UTF-8 normalization and punctuation normalization. The punctuation normalization includes inferring the directionality of straight doublequotes -- that is, we map " to open quote (``) or close quote (''), and sometimes we get it wrong. SAFT processing in other contexts (such as queries in qrewrite) involves different normalizations.

Summary

Functions

Unwrap a decoded JSON object into its complex fields.

Types

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.NlpSaftToken{
  breakLevel: String.t() | nil,
  breakSkippedText: boolean() | nil,
  category: String.t() | nil,
  end: integer() | nil,
  head: integer() | nil,
  info: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t() | nil,
  label: String.t() | nil,
  lemma: String.t() | nil,
  morph: GoogleApi.ContentWarehouse.V1.Model.NlpSaftMorphology.t() | nil,
  scriptCode: String.t() | nil,
  start: integer() | nil,
  tag: String.t() | nil,
  tagConfidence: number() | nil,
  textProperties: integer() | nil,
  word: String.t() | nil
}

Functions

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.