Search results for WebMirror
GoogleApi.ContentWarehouse.V1.Model.CompositeDocAlternateName (module)
...ternatenames instead of canonicals. Alternatenames in CompositeDoc should come from the WebMirror pipeline.
Attributes - GoogleApi.ContentWarehouse.V1.Model.CompositeDocExtraDup (module)
* `ecnFp` (*type:* `String.t`, *default:* `nil`) - Fp96 of webmirror equivalence class as of last time this was exported. * `url` (*type:* `String...
Recommended way of reading: const string& doc_key = cdoc.doc().id().key(); CHECK(!doc_key.empty()); More background information can be found in google3/indexing/crawler_id/servingdocumentidentifier.proto. The ServingDocumentIdentifier (SDI) uniquely identifies a document in serving and also distinguishes between experimental and production documents. The SDI is also used as an input for the union/muppet key generation in serving. - GoogleApi.ContentWarehouse.V1.Model.GDocumentBase (module)
... `String.t`, *default:* `nil`) - The user agent name used to crawl the URL. See //crawler/engine/webmirror_user_agents.h for the list of user-agents (e.g. crawler::WebmirrorUserAgents::kGoogleBot). NO...
Attributes - GoogleApi.ContentWarehouse.V1.Model.CompositeDocAlternateName (module)
...ebutil/urlencoding * `ecnFp` (*type:* `String.t`, *default:* `nil`) - Fp96 of webmirror equivalence class as of last time this was exported.
GoogleApi.ContentWarehouse.V1.Model.LogsProtoIndexingCrawlerIdCrawlerIdProto (module)
... at //depot/google3/indexing/crawler_id Used within the following components: - WebMirror: To understand the parsed crawler-ID and apply attributes within their own table...
Attributes - GoogleApi.ContentWarehouse.V1.Model.CompositeDocForwardingDup (module)
* `ecn` (*type:* `String.t`, *default:* `nil`) - The name of the url's webmirror equivalence class. * `ecnFp` (*type:* `String.t`, *default:* `nil`) - * `p...
GoogleApi.ContentWarehouse.V1.Model.TrawlerClientServiceInfo (module)
...rve multiple other clients. In this case they can store their client name here. Webmirror may also store the feed name here even though a feed is technically not a servi...
Attributes - GoogleApi.ContentWarehouse.V1.Model.GDocumentBase (module)
... teams should use. * `Repid` (*type:* `String.t`, *default:* `nil`) - is the webmirror representative id of the canonical url. Urls with the same repid are considered...
Attributes - GoogleApi.ContentWarehouse.V1.Model.IndexingConverterLocalizedAlternateName (module)
...ly by URL pattern. * `ecnFp` (*type:* `String.t`, *default:* `nil`) - Fp96 of webmirror ECN as of the last time the canonical was processed. * `feedUrl` (*type:* `St...
here. - GoogleApi.ContentWarehouse.V1.Model.TrawlerFetchReplyDataRedirects (module)
...rawler does not fill this in; this is intended as a placeholder for crawls like webmirror that fill in and want to track this across redirect hops. * `RawTargetUrl` (*...
Attributes - GoogleApi.ContentWarehouse.V1.Model.TrawlerFetchReplyData (module)
...rawler does not fill this in; this is intended as a placeholder for crawls like webmirror that fill in and want to track this across redirect hops. * `RedirectSourceFe...