View Source GoogleApi.ContentWarehouse.V1.Model.CompositeDocIndexingInfo (google_api_content_warehouse v0.4.0)

Contains information mostly used within indexing (e.g. not used for building the production serving shards). Most of this data is generated only in Alexandria, however there are exceptions.

Attributes

  • cdocBuildInfo (type: GoogleApi.ContentWarehouse.V1.Model.IndexingDocjoinerCDocBuildInfo.t, default: nil) - To hold extra info for building a final cdoc from raw cdoc and goldmine annotations.
  • contentProtected (type: boolean(), default: nil) - Whether current page is under content protection, i.e. a page has been crawled as an error page, but we preserve its last known good content and keep its crawl_status as converter.CrawlStatus::CONTENT.
  • convertToRobotedReason (type: integer(), default: nil) - If set, indicates that the crawl status was converted to ROBOTED for the reason specified by the enum value in converter.RobotedReasons.ConvertToRobotedReasons. See indexing/converter/proto/converter.proto for details. If unset, then the document was not converted to roboted, and if the document crawl status is ROBOTED, then the document is disallowed (at least to Google) in robots.txt.
  • crawlStatus (type: integer(), default: nil) - One of the enum values in converter.CrawlStatus.State (see indexing/converter/proto/converter.proto for details). Default is converter.CrawlStatus::CONTENT. The document is roboted if the value is converter.CrawlStatus::ROBOTED.
  • demotionTags (type: list(String.t), default: nil) -
  • errorType (type: integer(), default: nil) - One of the enum values in converter.ErrorPageType (see indexing/converter/proto/error-page-detector-enum.proto for detail). Default is converter::ERROR_PAGE_NONE.
  • freshdocsCorpora (type: list(String.t), default: nil) -
  • hostid (type: String.t, default: nil) - The host id of the document. Used chiefly to determine whether the document is part of a parked domain.
  • ieIdentifier (type: String.t, default: nil) - A short descriptive string to help identify the IE application or setup where this CDoc is generated. For example: websearch_m3 This field is for debuggability purposes.
  • imageIndexingInfo (type: GoogleApi.ContentWarehouse.V1.Model.ImageSearchImageIndexingInfo.t, default: nil) - Indexing info about images (i.e. image links missing image data, etc).
  • indexingTs (type: String.t, default: nil) - The timestamp (the time since the Epoch, in microseconds) when the docjoin is exported from indexing. The main purpose of this field is to identify different versions of the same document.
  • noLongerCanonicalTimestamp (type: String.t, default: nil) - If set, the timestamp in microseconds when the URL stopped being canonical. This should never be set for exported canonical documents. This field is used by dups during canonical flip, and by webmain when doc selection switched between desktop and mobile. Union respects this timestamp to prevent old doc being deleted until the new doc is picked up
  • normalizedClickScore (type: number(), default: nil) - This score is calculated by re-mapping the back onto the partition's score distribution, such that the score represents the score of the equivalently ranked organically-selected document.
  • primaryVertical (type: String.t, default: nil) - Vertical membership of the document. - primary_vertical is the vertical that initiated indexing of this document (or empty if the vertical was websearch). - verticals is the full list of verticals that contained this document (excluding websearch) at indexing time. primary_vertical may or may not be an element of verticals because of vertical membership skew between the ingestion time and indexing time. See go/one-indexing-for-web for more background.
  • rawNavboost (type: integer(), default: nil) - The raw navboost count for the canonical url without aggregating the navboost from dup urls. This field is used when building forwarding map.
  • rowTimestamp (type: String.t, default: nil) - The timestamp (the time since the Epoch, in microseconds) to represent doc version, which is used in the downstream processing after Raffia. If it's not set, indexing_ts will be used as row_timestamp. The timestamp is generally set by reprocessing to set slightly newer indexing_ts such that the system can respect the reprocessed version to overwrite old data in storage.
  • selectionTierRank (type: number(), default: nil) - Selection tier rank is a language normalized score ranging from 0-1 over the serving tier (Base, Zeppelins, Landfills) for this document.
  • tracingId (type: list(String.t), default: nil) - The tracing ids is to label the version of url for url status tracking. This repeated field will carry at most 10 tracing id. See more details in go/rich-tracing-design There will be less than 2% base+uz cdocs carrying this field. The major sources of tracing ids include: Indexing API pushed urls Index Metrics sampling urls The tracing ids will be written into cdocs by Webmain Ramifier. The consumer of the tracing ids is Union serving notification collector see more at go/serving-notification-from-union
  • urlChangerate (type: GoogleApi.ContentWarehouse.V1.Model.CrawlerChangerateUrlChangerate.t, default: nil) - Changerate information for this doc (see crawler/changerate/changerate.proto for details).
  • urlHistory (type: GoogleApi.ContentWarehouse.V1.Model.CrawlerChangerateUrlHistory.t, default: nil) - Url change history for this doc (see crawler/changerate/changerate.proto for details). Note if a doc has more than 20 changes, we only keep the last 20 changes here to avoid adding to much data in its docjoin.
  • urlPatternSignals (type: GoogleApi.ContentWarehouse.V1.Model.IndexingSignalAggregatorUrlPatternSignals.t, default: nil) - UrlPatternSignals for this doc, used to compute document score in LTG (see indexing/signal_aggregator/proto/signal-aggregator.proto for details).
  • verticals (type: list(String.t), default: nil) -
  • videoIndexingInfo (type: GoogleApi.ContentWarehouse.V1.Model.ImageRepositoryVideoIndexingInfo.t, default: nil) - Indexing info about videos.

Summary

Functions

Unwrap a decoded JSON object into its complex fields.

Types

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.CompositeDocIndexingInfo{
  cdocBuildInfo:
    GoogleApi.ContentWarehouse.V1.Model.IndexingDocjoinerCDocBuildInfo.t() | nil,
  contentProtected: boolean() | nil,
  convertToRobotedReason: integer() | nil,
  crawlStatus: integer() | nil,
  demotionTags: [String.t()] | nil,
  errorType: integer() | nil,
  freshdocsCorpora: [String.t()] | nil,
  hostid: String.t() | nil,
  ieIdentifier: String.t() | nil,
  imageIndexingInfo:
    GoogleApi.ContentWarehouse.V1.Model.ImageSearchImageIndexingInfo.t() | nil,
  indexingTs: String.t() | nil,
  noLongerCanonicalTimestamp: String.t() | nil,
  normalizedClickScore: number() | nil,
  primaryVertical: String.t() | nil,
  rawNavboost: integer() | nil,
  rowTimestamp: String.t() | nil,
  selectionTierRank: number() | nil,
  tracingId: [String.t()] | nil,
  urlChangerate:
    GoogleApi.ContentWarehouse.V1.Model.CrawlerChangerateUrlChangerate.t() | nil,
  urlHistory:
    GoogleApi.ContentWarehouse.V1.Model.CrawlerChangerateUrlHistory.t() | nil,
  urlPatternSignals:
    GoogleApi.ContentWarehouse.V1.Model.IndexingSignalAggregatorUrlPatternSignals.t()
    | nil,
  verticals: [String.t()] | nil,
  videoIndexingInfo:
    GoogleApi.ContentWarehouse.V1.Model.ImageRepositoryVideoIndexingInfo.t()
    | nil
}

Functions

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.