GoogleApi.ContentWarehouse.V1.Model.GoodocWord (google_api_content_warehouse v0.3.0)
A word representation
Attributes
- Baseline (type: integer(), default: nil) - The baseline's y-axis offset from the bottom of the word's bounding box, given in pixels. (A value of 2, for instance, indicates the baseline is 2px above the bottom of the box.)
- Box (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t, default: nil)
- Capline (type: integer(), default: nil) - The capline's y-axis offset from the top of the word's bounding box. A positive value n indicates that the capline is n pixels above the top of this word.
- CompactSymbolBoxes (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoxPartitions.t, default: nil) - For space efficiency, we sometimes skip the detailed per-symbol bounding boxes in Symbol.Box and use this coarser representation instead, where we just store Symbol boundaries within the Word box. Most client code should not have to worry directly about this; it should be handled in the deepest layers of writing/reading goodocs (for example, see Compress() and Uncompress() in ocean/goodoc/goovols-bigtable-volume.h). Note(viresh): I experimented with this compression, and here are some numbers for reference. If the zlib-compressed page goodoc string size was 100 to start with, then this compaction makes it 65. As a possible future relaxation to consider: if we add in, for each symbol, a "top" and "bottom" box offset, then the size would be 75 (that's with "repeated int32 top/bottom_offset" fields inside BoxPartitions, instead of inside each symbol).
- Confidence (type: integer(), default: nil) - Word recognition confidence. The range depends on the OCR engine.
- IsFromDictionary (type: boolean(), default: nil) - Specifies whether the word was found in the dictionary.
- IsIdentifier (type: boolean(), default: nil) - True if the word represents an identifier.
- IsLastInSentence (type: boolean(), default: nil) - True if the word is the last word in any sub-paragraph unit that functions at the same level of granularity as a sentence. Examples: "She hit the ball." (regular sentence); "Dewey defeats Truman" (heading); "The more, the merrier." (no verb). Note: not currently used. Code to set this was introduced in CL 7038338 and removed in OCL=10678722.
- IsNumeric (type: boolean(), default: nil) - True if the word represents a number.
- Label (type: GoogleApi.ContentWarehouse.V1.Model.GoodocLabel.t, default: nil)
- Penalty (type: integer(), default: nil) - Penalty for discordance of characters in a word. The meaning and range depend on the OCR engine or subsequent processing.
- RotatedBox (type: GoogleApi.ContentWarehouse.V1.Model.GoodocRotatedBoundingBox.t, default: nil) - If RotatedBox is set, Box must be set as well. See RotatedBoundingBox.
- Symbol (type: list(GoogleApi.ContentWarehouse.V1.Model.GoodocSymbol.t), default: nil) - Word characters, the text may
- alternates (type: GoogleApi.ContentWarehouse.V1.Model.GoodocWordAlternates.t, default: nil)
- text (type: String.t, default: nil) - As a shortcut, the content API provides the text of words instead of individual symbols (NOTE: this is experimental). The text is UTF-8, and the main font for the word is stored in Label.CharLabel.
- writingDirection (type: String.t, default: nil) - Writing direction for this word.
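The Baseline and Capline offsets are relative to the word's bounding box, so turning them into page coordinates takes a little arithmetic. The following is a minimal sketch, not part of the library: the module name WordLinesSketch is hypothetical, and it assumes that the box exposes integer Top and Height fields in pixels and that y grows downward (image coordinates). Plain maps stand in for the structs here; map patterns also match structs with the same keys.

```elixir
defmodule WordLinesSketch do
  # Hypothetical helper, not part of google_api_content_warehouse.
  # Assumption: the box has integer :Top and :Height fields and y grows downward.

  # Baseline is measured upward from the bottom edge of the box.
  def baseline_y(%{Box: %{Top: top, Height: height}, Baseline: offset})
      when is_integer(top) and is_integer(height) and is_integer(offset) do
    {:ok, top + height - offset}
  end

  # Any field may be nil, so fall back to :error when data is missing.
  def baseline_y(_word), do: :error

  # Capline is measured upward from the top edge of the box.
  def capline_y(%{Box: %{Top: top}, Capline: offset})
      when is_integer(top) and is_integer(offset) do
    {:ok, top - offset}
  end

  def capline_y(_word), do: :error
end

word = %{Box: %{Left: 10, Top: 100, Width: 40, Height: 20}, Baseline: 2, Capline: -3}
IO.inspect(WordLinesSketch.baseline_y(word))  # {:ok, 118}
IO.inspect(WordLinesSketch.capline_y(word))   # {:ok, 103}
```

Returning :error rather than raising keeps the helper usable on words where the OCR engine did not populate Box, Baseline, or Capline.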
Types
@type t() :: %GoogleApi.ContentWarehouse.V1.Model.GoodocWord{
  Baseline: integer() | nil,
  Box: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
  Capline: integer() | nil,
  CompactSymbolBoxes: GoogleApi.ContentWarehouse.V1.Model.GoodocBoxPartitions.t() | nil,
  Confidence: integer() | nil,
  IsFromDictionary: boolean() | nil,
  IsIdentifier: boolean() | nil,
  IsLastInSentence: boolean() | nil,
  IsNumeric: boolean() | nil,
  Label: GoogleApi.ContentWarehouse.V1.Model.GoodocLabel.t() | nil,
  Penalty: integer() | nil,
  RotatedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocRotatedBoundingBox.t() | nil,
  Symbol: [GoogleApi.ContentWarehouse.V1.Model.GoodocSymbol.t()] | nil,
  alternates: GoogleApi.ContentWarehouse.V1.Model.GoodocWordAlternates.t() | nil,
  text: String.t() | nil,
  writingDirection: String.t() | nil
}
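Because every field in t() may be nil, code that reads a word's text usually needs a fallback. The sketch below is one possible approach under stated assumptions: it prefers the text shortcut when present and otherwise rebuilds the string from the Symbol list. The module name WordTextSketch is hypothetical, and the sketch assumes each symbol carries an integer Code field holding a Unicode code point, which this page does not confirm.

```elixir
defmodule WordTextSketch do
  # Hypothetical helper, not part of google_api_content_warehouse.
  # Assumption: each symbol has an integer :Code field (a Unicode code point).

  # Prefer the experimental `text` shortcut when it is set.
  def text(%{text: text}) when is_binary(text), do: text

  # Otherwise rebuild the string from per-symbol code points,
  # silently skipping symbols that have no integer code.
  def text(%{Symbol: symbols}) when is_list(symbols) do
    for %{Code: code} <- symbols, is_integer(code), into: "", do: <<code::utf8>>
  end

  # Neither text nor symbols are available.
  def text(_word), do: ""
end

IO.inspect(WordTextSketch.text(%{text: "word", Symbol: nil}))                   # "word"
IO.inspect(WordTextSketch.text(%{text: nil, Symbol: [%{Code: 72}, %{Code: 105}]}))  # "Hi"
```

The clauses are ordered from most to least specific, so a word with both fields populated returns the text shortcut without touching the symbols.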
Functions
Unwrap a decoded JSON object into its complex fields.