GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabel (google_api_content_warehouse v0.3.0)

Label identifying a logical part of the page content. This applies mostly at Block level or Paragraph level (but can apply to Words or to arbitrary spans if needed).

Attributes

  • AlternateText (type: String.t, default: nil) - Alternate text for a sequence of the Goodoc, just for the element containing this label, or for a sequence starting from this element to the EndOfSpanningLabel. Typically this is inserted by automatic or manual OCR correction. We use text instead of editing the Goodoc directly since we don't usually have accurate symbol-level bboxes for the alternate text. Also, the original values from OCR are preserved. It is up to the application to do anything more intelligent, like mapping words and finding potential symbol/word bboxes.
  • Attribute (type: list(String.t), default: nil) - Page elements can be given Attributes refining meaning/role. We keep this flexible by using strings instead of pre-determined enum values. But it is useful to list all such Attributes in use in ocr/goodoc/goodoc-semantics-attributes.h
  • ChapterStart (type: boolean(), default: nil) - Blocks that are at the beginning of chapters have this set.
  • CleanupAnnotation (type: list(integer()), default: nil) -
  • ContinuesFromPreviousPage (type: boolean(), default: nil) -
  • ContinuesFromPreviousPageHyphenated (type: boolean(), default: nil) - When ContinuesFromPreviousPage=true, this bit can be set to note that the word fragment on the previous page ends in a hyphen.
  • ContinuesOnNextPage (type: boolean(), default: nil) - Paragraphs that span across pages can be identified with the following flags. Note that flows just connect Blocks across pages. These continuation flags imply something more specific -- the case of a single logical paragraph split over pages. Only the last Paragraph in the last Block within a given FlowThread() on a page can have ContinuesOnNextPage set. Similarly, only the first Paragraph in the first Block with a given FlowThread() on a page may have ContinuesFromPreviousPage set.
  • EndOfSpanningLabel (type: GoogleApi.ContentWarehouse.V1.Model.GoodocLogicalEntity.t, default: nil) - Normally, a SemanticLabel applies exactly to the goodoc element that it is contained in (usually Block or Paragraph, sometimes Word). Occasionally, we need a SemanticLabel to span across the boundary or end before the boundary. For example, a URL may just be a few words within a Paragraph. In such cases, the SemanticLabel is added to the first element of the span and contains this LogicalEntity pointing to the last element of the span.
  • ExperimentalData (type: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t, default: nil) - Message set for experimental algorithm data. Use case: we keep a set of features that was computed for the unsupervised caption extraction and store it here. The Agora question producer will consume this message set to be embedded in a question. The experimental feature set can then be used later to pair up with ground truth labels for designing a supervised algorithm. Currently holding: ocean/analysis/content/caption_data.proto's TextualElement.
  • Flow (type: String.t, default: nil) - Flow identifies a single sequential unit of text (or other content). It is only set on Blocks -- a flow identifies a sequence of Blocks. The default, main flow is just the empty string. The "FlowThread" of a block is the flow (if non-empty), suffixed with the block appearance. This is computed by GoodocUtils::FlowThread(). Paragraphs may be split over blocks in the same FlowThread, across pages. The following table shows how FlowThread gets computed:

      Flow      Appearance    FlowThread
      (empty)   UNSPECIFIED   "UNSPECIFIED"
      foo       BODY          "foo:BODY"

    Please use lower-case strings for flows (such as article-33-box). One useful way to think of flows is this: a logical unit of interest in a Document (for example, an article) would be identified by a starting block, an ending block, and a list of flows of interest within the [start, end) span.

      message Article {
        (page#, block#): article_start;
        (page#, block#): article_end;
        repeated string flows;
      }

    The reading order of blocks, paragraphs, etc. within this article would be the same order as present in the goodoc itself. Some applications (such as rendering) may want to process the article by running over all the flows together; others (such as indexing) may want to deal with the FlowThreads one after the other.
  • ModificationRecord (type: String.t, default: nil) - This field can be used to record the steps by which AlternateText for a sequence of the Goodoc is generated.
  • PageNumberOrdinal (type: GoogleApi.ContentWarehouse.V1.Model.GoodocOrdinal.t, default: nil) - Applies if Appearance is PAGE_NUMBER.
  • appearance (type: integer(), default: nil) -
  • columndetails (type: GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabelColumnDetails.t, default: nil) -
  • contentlink (type: GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabelContentLink.t, default: nil) -
  • editcorrectioncandidate (type: list(GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabelEditCorrectionCandidate.t), default: nil) -
  • overrides (type: GoogleApi.ContentWarehouse.V1.Model.GoodocOverrides.t, default: nil) - Structure overrides: typically manual corrections to goodoc renderings.
  • snippetfilter (type: list(GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabelSnippetFilter.t), default: nil) -
  • tablecelldetails (type: GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabelTableCellDetails.t, default: nil) -
  • tabledetails (type: GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabelTableDetails.t, default: nil) -
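
The FlowThread table in the Flow description above can be sketched in Elixir as follows. This is only an illustration under stated assumptions, not the library's API: the module FlowThreadSketch and its flow_thread/2 function are hypothetical, the appearance is passed as its enum name (a string) rather than the integer stored in the appearance field, and the ":" separator is inferred from the "foo:BODY" row of the table.

```elixir
defmodule FlowThreadSketch do
  @moduledoc false

  # Hypothetical helper mirroring the FlowThread table; not part of
  # google_api_content_warehouse or GoodocUtils.
  # `appearance_name` is assumed to be the enum name, e.g. "BODY".

  # Default (main) flow: nil or the empty string. The thread is just
  # the appearance name.
  def flow_thread(flow, appearance_name) when flow in [nil, ""] do
    appearance_name
  end

  # Non-empty flow: the flow suffixed with the appearance name.
  def flow_thread(flow, appearance_name) when is_binary(flow) do
    flow <> ":" <> appearance_name
  end
end

IO.inspect(FlowThreadSketch.flow_thread("", "UNSPECIFIED"))
# prints "UNSPECIFIED"
IO.inspect(FlowThreadSketch.flow_thread("foo", "BODY"))
# prints "foo:BODY"
```

An application that indexes one FlowThread at a time could group Blocks by the value this kind of function returns, while a renderer could ignore it and walk the Blocks in document order.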

Types

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabel{
  AlternateText: String.t() | nil,
  Attribute: [String.t()] | nil,
  ChapterStart: boolean() | nil,
  CleanupAnnotation: [integer()] | nil,
  ContinuesFromPreviousPage: boolean() | nil,
  ContinuesFromPreviousPageHyphenated: boolean() | nil,
  ContinuesOnNextPage: boolean() | nil,
  EndOfSpanningLabel:
    GoogleApi.ContentWarehouse.V1.Model.GoodocLogicalEntity.t() | nil,
  ExperimentalData:
    GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t() | nil,
  Flow: String.t() | nil,
  ModificationRecord: String.t() | nil,
  PageNumberOrdinal:
    GoogleApi.ContentWarehouse.V1.Model.GoodocOrdinal.t() | nil,
  appearance: integer() | nil,
  columndetails:
    GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabelColumnDetails.t()
    | nil,
  contentlink:
    GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabelContentLink.t() | nil,
  editcorrectioncandidate:
    [
      GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabelEditCorrectionCandidate.t()
    ]
    | nil,
  overrides: GoogleApi.ContentWarehouse.V1.Model.GoodocOverrides.t() | nil,
  snippetfilter:
    [GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabelSnippetFilter.t()]
    | nil,
  tablecelldetails:
    GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabelTableCellDetails.t()
    | nil,
  tabledetails:
    GoogleApi.ContentWarehouse.V1.Model.GoodocSemanticLabelTableDetails.t()
    | nil
}
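
One way an application might consume the ContinuesFromPreviousPage and ContinuesFromPreviousPageHyphenated fields of this type when stitching paragraph text across a page break is sketched below. The module ContinuationSketch and its join/3 function are hypothetical, plain maps stand in for the struct so the snippet runs without the package installed, and dropping the trailing hyphen is an assumption about the desired output, not documented behavior.

```elixir
defmodule ContinuationSketch do
  @moduledoc false

  # Hypothetical helper; not part of google_api_content_warehouse.
  # `prev_text`  - text of the last paragraph on the previous page
  # `next_text`  - text of the first paragraph on the next page
  # `next_label` - label of that first paragraph (a map or struct with
  #                the same keys as GoodocSemanticLabel.t())
  def join(prev_text, next_text, next_label) do
    case next_label do
      %{ContinuesFromPreviousPage: true, ContinuesFromPreviousPageHyphenated: true} ->
        # Assumption: the hyphen only marks a word split at the page
        # break, so it is dropped and the fragments are glued together.
        String.trim_trailing(prev_text, "-") <> next_text

      %{ContinuesFromPreviousPage: true} ->
        # Same logical paragraph, split between two whole words.
        prev_text <> " " <> next_text

      _ ->
        # No continuation flag: keep the two paragraphs separate.
        prev_text <> "\n\n" <> next_text
    end
  end
end

hyphenated = %{ContinuesFromPreviousPage: true, ContinuesFromPreviousPageHyphenated: true}
IO.puts(ContinuationSketch.join("an exam-", "ple of text", hyphenated))
# prints: an example of text
```

Because the clauses match on map keys, the same function would accept a decoded GoodocSemanticLabel struct as well as a plain map.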

Functions

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.