View Source GoogleApi.ContentWarehouse.V1.Model.GDocumentBase (google_api_content_warehouse v0.3.0)

Next id: 127

Attributes

  • ContentExpiryTime (type: integer(), default: nil) - unix secs from epoch
  • DisplayUrl (type: String.t, default: nil) - Sometimes the URL displayed in search results should be different from what gets indexed (e.g. in enterprise, content management systems). If this value is not set, we default to the regular URL.
  • DocId (type: String.t, default: nil) - 64-bit docid of the document (usually fingerprint of URL, but not always). WARNING: This does NOT uniquely identify a document ANYMORE. For a unique identifier across all documents in production please refer to the field 'id().key()' listed above.
  • ExternalFeedMetadata (type: String.t, default: nil) -
  • ExternalHttpMetadata (type: String.t, default: nil) - Enterprise-specific external metadata. See http://engdoc/eng/designdocs/enterprise/enterprise_indexing_metadata.html
  • FilterForSafeSearch (type: integer(), default: nil) - Deprecated, do not use, this field is not populated since 2012.
  • IPAddr (type: String.t, default: nil) - IP addr in binary (allows for IPv6)
  • NoArchiveReason (type: integer(), default: nil) -
  • NoFollowReason (type: integer(), default: nil) -
  • NoImageIndexReason (type: integer(), default: nil) -
  • NoImageframeOverlayReason (type: integer(), default: nil) -
  • NoIndexReason (type: integer(), default: nil) - When these reasons are set to a non zero value, the document should not be indexed, or show a snippet, or show a cache, etc. These reasons are bit maps of indexing.converter.RobotsInfo.RobotedReasons enum values reflecting the places where the restriction was found.
  • NoPreviewReason (type: integer(), default: nil) -
  • NoSnippetReason (type: integer(), default: nil) -
  • NoTranslateReason (type: integer(), default: nil) -
  • Pagerank (type: integer(), default: nil) - This field is long-deprecated in favour of Pagerank_NS, it is no longer maintained and can break at any moment.
  • PagerankNS (type: integer(), default: nil) - Pagerank-NearestSeeds is a pagerank score for the doc, calculated using NearestSeeds method. This is the production PageRank value teams should use.
  • Repid (type: String.t, default: nil) - is the webmirror representative id of the canonical url. Urls with the same repid are considered as dups in webmirror. WARNING: use this field with caution! The webmirror duprules change frequently, so this value only reflects the duprules at the time when the canonical's docjoin is built.
  • ScienceMetadata (type: GoogleApi.ContentWarehouse.V1.Model.ScienceCitation.t, default: nil) - Citation data for science articles.
  • URL (type: String.t, default: nil) - WARNING: the URL does NOT uniquely identify a document ANYMORE. For a unique identifier across all documents in production please refer to the field 'id().key()' listed above. Reason: foo.bar:/http and foo.bar:/http:SMARTPHONE share the same URL, but the body of the two documents might differ because of different crawl-context (desktop vs. smartphone in this example).
  • URLAfterRedirects (type: String.t, default: nil) -
  • URLEncoding (type: integer(), default: nil) - See webutil/urlencoding
  • content (type: GoogleApi.ContentWarehouse.V1.Model.GDocumentBaseContent.t, default: nil) -
  • directory (type: list(GoogleApi.ContentWarehouse.V1.Model.GDocumentBaseDirectory.t), default: nil) -
  • ecnFp (type: String.t, default: nil) - 96-bit fingerprint of the canonical url's webmirror equivalence class name as of when this cdoc was exported.
  • id (type: GoogleApi.ContentWarehouse.V1.Model.IndexingCrawlerIdServingDocumentIdentifier.t, default: nil) - The primary identifier of a production document is the document key given in the ServingDocumentIdentifier, which is the same as the row-key in Alexandria, and represents a URL and its crawling context. In your production code, please always assume that the document key is the only way to uniquely identify a document. ## Recommended way of reading: const string& doc_key = cdoc.doc().id().key(); ## CHECK(!doc_key.empty()); More background information can be found in google3/indexing/crawler_id/servingdocumentidentifier.proto The ServingDocumentIdentifier uniquely identifies a document in serving and also distinguishes between experimental vs. production documents. The SDI is also used as an input for the union/muppet key generation in serving.
  • localsearchDocInfo (type: GoogleApi.ContentWarehouse.V1.Model.LocalsearchDocInfo.t, default: nil) - Localsearch-specific data.
  • oceanDocInfo (type: GoogleApi.ContentWarehouse.V1.Model.OceanDocInfo.t, default: nil) - Ocean-specific data.
  • originalcontent (type: GoogleApi.ContentWarehouse.V1.Model.GDocumentBaseOriginalContent.t, default: nil) -
  • userAgentName (type: String.t, default: nil) - The user agent name used to crawl the URL. See //crawler/engine/webmirror_user_agents.h for the list of user-agents (e.g. crawler::WebmirrorUserAgents::kGoogleBot). NOTE: This field is copied from the first WEBMIRROR FetchReplyClientInfo in trawler_fetch_info column. We leave this field unpopulated if no WEBMIRROR FecthReplyClientInfo is found. As the submission of cl/51488336, Alexandria starts to populate this field. However, docjoins from freshdocs (or any other source), won't have this field populated, because we believe no one needs to read this field from freshdocs docjoins.

Summary

Functions

Unwrap a decoded JSON object into its complex fields.

Types

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.GDocumentBase{
  ContentExpiryTime: integer() | nil,
  DisplayUrl: String.t() | nil,
  DocId: String.t() | nil,
  ExternalFeedMetadata: String.t() | nil,
  ExternalHttpMetadata: String.t() | nil,
  FilterForSafeSearch: integer() | nil,
  IPAddr: String.t() | nil,
  NoArchiveReason: integer() | nil,
  NoFollowReason: integer() | nil,
  NoImageIndexReason: integer() | nil,
  NoImageframeOverlayReason: integer() | nil,
  NoIndexReason: integer() | nil,
  NoPreviewReason: integer() | nil,
  NoSnippetReason: integer() | nil,
  NoTranslateReason: integer() | nil,
  Pagerank: integer() | nil,
  PagerankNS: integer() | nil,
  Repid: String.t() | nil,
  ScienceMetadata:
    GoogleApi.ContentWarehouse.V1.Model.ScienceCitation.t() | nil,
  URL: String.t() | nil,
  URLAfterRedirects: String.t() | nil,
  URLEncoding: integer() | nil,
  content: GoogleApi.ContentWarehouse.V1.Model.GDocumentBaseContent.t() | nil,
  directory:
    [GoogleApi.ContentWarehouse.V1.Model.GDocumentBaseDirectory.t()] | nil,
  ecnFp: String.t() | nil,
  id:
    GoogleApi.ContentWarehouse.V1.Model.IndexingCrawlerIdServingDocumentIdentifier.t()
    | nil,
  localsearchDocInfo:
    GoogleApi.ContentWarehouse.V1.Model.LocalsearchDocInfo.t() | nil,
  oceanDocInfo: GoogleApi.ContentWarehouse.V1.Model.OceanDocInfo.t() | nil,
  originalcontent:
    GoogleApi.ContentWarehouse.V1.Model.GDocumentBaseOriginalContent.t() | nil,
  userAgentName: String.t() | nil
}

Functions

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.