GoogleApi.ContentWarehouse.V1.Model.PerDocData (google_api_content_warehouse v0.4.0)

Make sure you read the comments at the bottom before you add any new field. NB: As noted in the comments, this protocol buffer is used in both indexing and serving. In mustang serving implementations we only decode perdocdata during the search phase, and so this protocol should only contain data used during search. See mustang/repos_www/attachments.proto:{MustangBasicInfo,MustangContentInfo} for protocols used during search and/or docinfo.

The "next available tag" note is deprecated; use this command instead (and look for commented-out fields): blaze-bin/net/proto_compiler/protocol-compiler --freetags indexer/perdocdata/perdocdata.proto

Next tag: 225

Attributes

  • scienceDoctype (type: integer(), default: nil) - Scholar/Science document type: <0 == not a Science document (default); 0 == Science doc, fully visible; >0 == Science doc but limited visibility, the number is the visible terms.
  • ScaledExptIndyRank2 (type: integer(), default: nil) - experimental
  • videoLanguage (type: GoogleApi.ContentWarehouse.V1.Model.QualityVidyaVideoLanguageVideoLanguage.t, default: nil) - Audio-based language classified by Automatic Language Identification (only for watch pages).
  • phildata (type: GoogleApi.ContentWarehouse.V1.Model.PhilPerDocData.t, default: nil) -
  • uacSpamScore (type: integer(), default: nil) - The uac spam score is represented in 7 bits, going from 0 to 127. Threshold is 64. Score >= 64 is considered as uac spam.
  • DEPRECATEDAuthorObfuscatedGaia (type: list(String.t), default: nil) - The obfuscated google profile gaia id(s) of the author(s) of the document. This field is deprecated, use the string version.
  • spamtokensContentScore (type: number(), default: nil) - For SpamTokens content scores. Used in SiteBoostTwiddler to determine whether a page is UGC Spam. See go/spamtokens-dd for details.
  • webrefEntities (type: GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefWebrefMustangAttachment.t, default: nil) - WebRef entities associated to the document. See go/webref for details.
  • PremiumData (type: GoogleApi.ContentWarehouse.V1.Model.PremiumPerDocData.t, default: nil) - Additional metadata for Premium document in the Google index.
  • spamMuppetSignals (type: GoogleApi.ContentWarehouse.V1.Model.SpamMuppetjoinsMuppetSignals.t, default: nil) - Contains hacked site signals which will be used in query time joins. As of Oct'19, the field is stored in a separate corpus. It'll only be populated for in-flight requests between retrieve and full-score in perdocdata. So no extra storage is needed on muppet side.
  • knexAnnotation (type: GoogleApi.ContentWarehouse.V1.Model.SocialPersonalizationKnexAnnotation.t, default: nil) - For indexing k'nex annotations for FreshDocs.
  • smartphoneData (type: GoogleApi.ContentWarehouse.V1.Model.SmartphonePerDocData.t, default: nil) - Additional metadata for smartphone documents in the Google index.
  • semanticDateConfidence (type: integer(), default: nil) - DEPRECATED: semantic_date_confidence replaced by semantic_date_info.
  • trendspamScore (type: integer(), default: nil) - For now, the count of matching trendspam queries.
  • ScaledSpamScoreYoram (type: integer(), default: nil) - Spamscores are represented as a 7-bit integer, going from 0 to 127.
  • numUrls (type: integer(), default: nil) - Total number of urls encoded in the url section = # of alternate urls + 1
  • datesInfo (type: String.t, default: nil) - Stores dates-related info (e.g. page is old based on its date annotations). Used in FreshnessTwiddler. Use encode/decode functions from quality/timebased/utils/dates-info-helper-inl.h
  • pagerank2 (type: number(), default: nil) -
  • nsrDataProto (type: GoogleApi.ContentWarehouse.V1.Model.QualityNsrNsrData.t, default: nil) - Stripped site-level signals, not present in the explicit nsr_* fields, nor compressed_quality_signals.
  • fringeQueryPrior (type: GoogleApi.ContentWarehouse.V1.Model.QualityFringeFringeQueryPriorPerDocData.t, default: nil) - Contains encoded FringeQueryPrior information. Unlikely to be meaningful for anyone other than fringe-ranking team. Contact fringe-ranking team if any questions, but do NOT use directly without consulting them.
  • kaltixdata (type: GoogleApi.ContentWarehouse.V1.Model.KaltixPerDocData.t, default: nil) -
  • ymylHealthScore (type: integer(), default: nil) - Stores scores of ymyl health classifier as defined at go/ymyl-classifier-dd. To use this field, you MUST join g/pq-classifiers-announce and add your use case at http://shortn/_nfg9oAldou.
  • authorObfuscatedGaiaStr (type: list(String.t), default: nil) -
  • lastSignificantUpdate (type: String.t, default: nil) - Last significant update of the document. This is sourced from the quality_timebased.LastSignificantUpdate proto as computed by the LSUSelector from various signals. The value is a UNIX timestamp in seconds.
  • spambrainData (type: GoogleApi.ContentWarehouse.V1.Model.SpamBrainData.t, default: nil) - Host-v1 sitechunk level scores coming from spambrain.
  • DEPRECATEDQuarantineWhitelist (type: boolean(), default: nil) -
  • tundraClusterId (type: integer(), default: nil) - This field is propagated to shards. Stores clustering information on a site level for the Tundra project. This field is deprecated - use the equivalent field inside nsr_data_proto instead.
  • bodyWordsToTokensRatioTotal (type: number(), default: nil) -
  • homepagePagerankNs (type: integer(), default: nil) - The page-rank of the homepage of the site. Copied from the cdoc.doc().pagerank_ns() of the homepage.
  • topPetacatTaxId (type: integer(), default: nil) - Top petacat of the site. Used in SiteboostTwiddler to determine result/query matching.
  • OriginalContentScore (type: integer(), default: nil) - The original content score is represented in 7 bits, going from 0 to 127. Only pages with little content have this field. The actual original content score ranges from 0 to 512. It is encoded with quality_q2::OriginalContentUtil::EncodeOriginalContentScore(). To decode the value, use quality_q2::OriginalContentUtil::DecodeOriginalContentScore().
  • contentAttributions (type: GoogleApi.ContentWarehouse.V1.Model.ContentAttributions.t, default: nil) -
  • webmirrorEcnFp (type: String.t, default: nil) -
  • DocLevelSpamScore (type: integer(), default: nil) - The document spam score is represented in 7 bits, going from 0 to 127.
  • urlPoisoningData (type: GoogleApi.ContentWarehouse.V1.Model.UrlPoisoningData.t, default: nil) - Contains url poisoning data for suppressing spam documents.
  • Event (type: list(GoogleApi.ContentWarehouse.V1.Model.PerDocDebugEvent.t), default: nil) - Free form debug info. NB2: consider carefully what to save here. It's easy to eat lots of gfs space with debug info that nobody needs...
  • mediaOrPeopleEntities (type: GoogleApi.ContentWarehouse.V1.Model.ImageQualitySensitiveMediaOrPeopleEntities.t, default: nil) - Contains the mids of the 5 most topical entities annotated with selected KG collections. This information is currently used on Image Search to detect cases where results converged to mostly a single person or media entity. More details: go/result-set-convergence.
  • scaledSelectionTierRank (type: integer(), default: nil) - Selection tier rank is a language normalized score ranging from 0-32767 over the serving tier (Base, Zeppelins, Landfills) for this document. This is converted back to fractional position within the index tier by scaled_selection_tier_rank/32767.
  • pageTags (type: list(integer()), default: nil) -
  • smearingMaxTotalOffdomainAnchors (type: integer(), default: nil) -
  • pagerank (type: number(), default: nil) - Experimental pageranks (DEPRECATED; only pagerank in MustangBasicInfo is used).
  • QuarantineInfo (type: integer(), default: nil) - Bitmask of QuarantineBits (OR'd together) used to store quarantine related information. For example: QUARANTINE_WHITELIST | QUARANTINE_URLINURL.
  • rosettaLanguages (type: list(String.t), default: nil) - Top two document language BCP-47 codes as generated by the RosettaLanguageAnnotator in the decreasing order of probability.
  • freshnessEncodedSignals (type: String.t, default: nil) - Stores freshness and aging related data, such as time-related quality metrics predicted from url-pattern level signals. Use the encoding/decoding API in quality/freshness/docclassifier/aging/encoded-pattern-signals.h. This field is deprecated.
  • imagedata (type: GoogleApi.ContentWarehouse.V1.Model.ImagePerDocData.t, default: nil) -
  • videoCorpusDocid (type: String.t, default: nil) -
  • queriesForWhichOfficial (type: GoogleApi.ContentWarehouse.V1.Model.OfficialPagesQuerySet.t, default: nil) - The set of (query, country, language) triples for which this document is considered to be the official page. For example, www.britneyspears.com would be official for ("britney spears", "us", 0) and others (0 is English).
  • nsrIsCovidLocalAuthority (type: boolean(), default: nil) - This field is propagated to shards. In addition, it is populated at serving time by go/web-signal-joins. This field is deprecated - use the equivalent field inside nsr_data_proto instead.
  • crawlerIdProto (type: GoogleApi.ContentWarehouse.V1.Model.LogsProtoIndexingCrawlerIdCrawlerIdProto.t, default: nil) - For crawler-ID variations, the crawling context applied to the document. See go/url, and the description in google3/indexing/crawler_id
  • ScaledSpamScoreEric (type: integer(), default: nil) -
  • biasingdata (type: GoogleApi.ContentWarehouse.V1.Model.BiasingPerDocData.t, default: nil) -
  • ScaledExptSpamScoreEric (type: integer(), default: nil) -
  • v2KnexAnnotation (type: GoogleApi.ContentWarehouse.V1.Model.QualitySherlockKnexAnnotation.t, default: nil) - For indexing v2 k'nex, see go/knex-v2-doc-annotation for details.
  • MobileData (type: GoogleApi.ContentWarehouse.V1.Model.MobilePerDocData.t, default: nil) - Additional metadata for lowend mobile documents in the Google index.
  • BookCitationData (type: GoogleApi.ContentWarehouse.V1.Model.BookCitationPerDocData.t, default: nil) - the book citation data for each web page, the average size is about 10 bytes
  • semanticDate (type: integer(), default: nil) - SemanticDate, the estimated date of the content of a document, based on the contents of the document (via parsing), anchors and related documents. Date is encoded as a 32-bit UNIX date (1970 Jan 1 epoch). Confidence is encoded using a SemanticDate specific format. For details of encoding, please refer to quality/freshness/docclassifier/semanticdate/public/semantic_date.proto
  • biasingdata2 (type: GoogleApi.ContentWarehouse.V1.Model.BiasingPerDocData2.t, default: nil) - A replacement for BiasingPerDocData that is more space efficient. Once this is live everywhere, biasingdata will be deprecated.
  • ymylNewsScore (type: integer(), default: nil) - Stores scores of ymyl news classifier as defined at go/ymyl-classifier-dd. To use this field, you MUST join g/pq-classifiers-announce and add your use case at http://shortn/_nfg9oAldou.
  • saftLanguageInt (type: list(integer()), default: nil) - Top document language as generated by SAFT LangID. For now we store bare minimum: just the top 1 language value, converted to the language enum, and only when different from the first value in 'languages'.
  • rsApplication (type: GoogleApi.ContentWarehouse.V1.Model.RepositoryAnnotationsRdfaRdfaRichSnippetsApplication.t, default: nil) - Application information associated to the document.
  • domainAge (type: integer(), default: nil) - 16-bit
  • lastSignificantUpdateInfo (type: String.t, default: nil) - Metadata about last significant update. Currently this only encodes the quality_timebased.LastSignificantUpdate.source field which contains the info on the source of the signal. NOTE: Please do not read the value directly. Use helpers from quality/timebased/lastsignificantupdate/lsu-helper.h instead.
  • pagerank1 (type: number(), default: nil) -
  • spamCookbookAction (type: GoogleApi.ContentWarehouse.V1.Model.SpamCookbookAction.t, default: nil) - Actions based on Cookbook recipes that match the page.
  • compressedUrl (type: String.t, default: nil) - Compressed URL string used for SETI.
  • extraData (type: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t, default: nil) - This field is available only in the docjoins: it is cleared before building per-doc data in both Mustang and Teragoogle. (MessageSet is inefficient in space for serving data.) Use this for all new fields that aren't needed during serving. Currently this field contains: UrlSignals for the document level spam classifier (when the doclevelspamscore is set); PerDocLangidData and realtimespam::ClassifierResult for the document level fresh spam classifier (when the doc-level fresh spam score is generated); MicroblogDocQualitySignals for document-level microblog spam classifier (this only exists in Firebird for now); spam_buckets::BucketsData for a document-structure hash. This field is non-personal since the personal fields in MessageSet are not populated in production.
  • socialgraphNodeNameFp (type: String.t, default: nil) - For Social Search we store the fingerprint of the SG node name. This is used in one of the superroot's PRE_DOC twiddlers as a lookup key for the full Social Search data. PRE_DOC = twiddlers firing before the DocInfo request is sent to the mustang backend.
  • urlAfterRedirectsFp (type: String.t, default: nil) - These two fingerprints are used for de-duping results in a twiddler. They should only be populated by freshdocs, and will only be present for documents that are chosen to be canonicals in a cluster whose previous canonical is also in the index. Additionally, url_after_redirects_fp is only present if it is different from a fingerprint of the URL.
  • localizedCluster (type: GoogleApi.ContentWarehouse.V1.Model.IndexingDupsLocalizedLocalizedCluster.t, default: nil) - Information on localized clusters, which is the relationship of translated and/or localized pages.
  • pageregions (type: String.t, default: nil) - String that encodes the position ranges for different regions of the document. See "indexer/pageregion.h" for an explanation, and how to decode the string
  • KeywordStuffingScore (type: integer(), default: nil) - The keyword stuffing score is represented in 7 bits, going from 0 to 127.
  • spambrainTotalDocSpamScore (type: number(), default: nil) - The document total spam score identified by spambrain, going from 0 to 1.
  • noimageframeoverlayreason (type: integer(), default: nil) - If not 0, we should not show the image in overlay mode in image snippets
  • scienceHoldingsIds (type: list(String.t), default: nil) - Deprecated 2016/01/14.
  • crawlPagerank (type: integer(), default: nil) - This field is used internally by the docjoiner to forward the crawl pageranks from original canonicals to canonicals we actually chose; outside sources should not set it, and it should not be present in actual docjoins or the index.
  • BlogData (type: GoogleApi.ContentWarehouse.V1.Model.BlogPerDocData.t, default: nil) -
  • nsrIsVideoFocusedSite (type: boolean(), default: nil) - This field is propagated to shards. It will also be populated at serving time by go/web-signal-joins (see b/170607253). Bit indicating whether this site is video-focused, but not hosted on any major known video hosting domains. This field is deprecated - use the equivalent field inside nsr_data_proto instead.
  • ScaledExptSpamScoreYoram (type: integer(), default: nil) -
  • spamrank (type: integer(), default: nil) - The spamrank measures the likelihood that this document links to known spammers. Its value is between 0 and 65535.
  • compressedQualitySignals (type: GoogleApi.ContentWarehouse.V1.Model.CompressedQualitySignals.t, default: nil) -
  • videodata (type: GoogleApi.ContentWarehouse.V1.Model.VideoPerDocData.t, default: nil) -
  • s3AudioLanguage (type: GoogleApi.ContentWarehouse.V1.Model.S3AudioLanguageS3AudioLanguage.t, default: nil) - Primary video's audio language classified by S3 based Automatic Language Identification (only for watch pages).
  • watchpageLanguageResult (type: GoogleApi.ContentWarehouse.V1.Model.WatchpageLanguageWatchPageLanguageResult.t, default: nil) - Language classified by the WatchPageLanguage Model (go/watchpage-language). Only present for watch pages.
  • appsLink (type: GoogleApi.ContentWarehouse.V1.Model.QualityCalypsoAppsLink.t, default: nil) - AppsLink contains Android application IDs in outlinks. It is used to improve results ranking within applications universal. See http://go/apps-universal for the project details.
  • desktopInterstitials (type: GoogleApi.ContentWarehouse.V1.Model.IndexingMobileInterstitialsProtoDesktopInterstitials.t, default: nil) - Contains desktop interstitials signal for VOLT ranking change.
  • liveResultsData (type: GoogleApi.ContentWarehouse.V1.Model.WeboftrustLiveResultsDocAttachments.t, default: nil) -
  • crowdingdata (type: GoogleApi.ContentWarehouse.V1.Model.CrowdingPerDocData.t, default: nil) -
  • nsrSitechunk (type: String.t, default: nil) - SiteChunk computed for nsr. In some cases it can use more information than just url (e.g. youtube channels). See NsrAnnotator for details. If sitechunk is longer than --populate_nsr_sitechunk_max_length (default=100), it will not get populated. This field might be compressed and needs to be decoded with quality_nsr::util::DecodeNsrSitechunk. See go/nsr-chunks for more details. This field contains only nontrivial primary chunks.
  • originalTitleHardTokenCount (type: integer(), default: nil) - The number of hard tokens in the title.
  • hostAge (type: integer(), default: nil) - The earliest firstseen date of all pages in this host/domain. This data is used in a twiddler to sandbox fresh spam at serving time. The value is 16 bits and holds the day number after 2005-12-31; all earlier times are set to 0. If this url's host_age == domain_age, then domain_age is omitted. Please use //spam/content/siteage-util.h to convert between the day number and epoch seconds. Regarding usage of sentinel values: we would like to check if a value exists in the scoring bundle while using it in Ranklab AST. Having a sentinel value helps us know whether the field exists or has a sentinel value (in the case it does not exist). 16-bit
  • inNewsstand (type: boolean(), default: nil) - This field indicates whether the document is in the newsstand corpus.
  • origin (type: integer(), default: nil) -
  • launchAppInfo (type: GoogleApi.ContentWarehouse.V1.Model.QualityRichsnippetsAppsProtosLaunchAppInfoPerDocData.t, default: nil) - Info on how to launch a mobile app to consume this document's content, if applicable (see go/calypso).
  • eventsDate (type: list(String.t), default: nil) - Date for Events. A web page might list multiple events with different dates. We only take one date (start date) per event.
  • homePageInfo (type: integer(), default: nil) -
  • GibberishScore (type: integer(), default: nil) - The gibberish score is represented in 7 bits, going from 0 to 127.
  • toolbarPagerank (type: integer(), default: nil) - A copy of the value stored in /namespace/indexing/wwwglobal//fakepr/* for this document. A value of quality_bakery::FakeprUtils::kUnknownToolbarPagerank indicates that we don't have toolbar pagerank for this document. A value between 0 and 10 (inclusive) means that this is the toolbar pagerank of the page. Finally, if this value is not set it means that the toolbar pagerank is equivalent to: quality_bakery::FakeprUtils::EstimatePreDemotionFromPagerankNearestSeeds( basic_info.pagerank_ns()) called on the MustangBasicInfo attachment for the same document.
  • freshboxArticleScores (type: integer(), default: nil) - Stores scores of freshness-related classifiers: freshbox article score, live blog score and host-level article score. The encoding/decoding API is in quality/freshness/freshbox/goldmine/freshbox_annotation_encoder.h. To use this field, you MUST join g/pq-classifiers-announce and add your use case at http://shortn/_RYXS2lX2IV.
  • WhirlpoolDiscount (type: number(), default: nil) -
  • ScaledExptIndyRank3 (type: integer(), default: nil) - experimental
  • ToolBarData (type: GoogleApi.ContentWarehouse.V1.Model.ToolBarPerDocData.t, default: nil) -
  • nsrIsElectionAuthority (type: boolean(), default: nil) - This field is propagated to shards. It will also be populated at serving time by go/web-signal-joins (see b/168114815). This field is deprecated - use the equivalent field inside nsr_data_proto instead.
  • onsiteProminence (type: integer(), default: nil) - Onsite prominence measures the importance of the document within its site. It is computed by propagating simulated traffic from the homepage and high craps click pages. It is a 13-bit int.
  • travelGoodSitesInfo (type: GoogleApi.ContentWarehouse.V1.Model.QualityTravelGoodSitesData.t, default: nil) - This field stores information about good travel sites.
  • IsAnchorBayesSpam (type: boolean(), default: nil) - Is this document considered spam by the anchor bayes classifier?
  • isHotdoc (type: boolean(), default: nil) - Set by the FreshDocs instant doc joiner. See //indexing/instant/hotdocs/README and http://go/freshdocs-hotdocs.
  • commercialScore (type: number(), default: nil) - A measure of commerciality of the document. Score > 0 indicates the document is commercial (i.e. sells something). Computed by repository/pageclassifiers/parsehandler-commercial.cc
  • asteroidBeltIntents (type: GoogleApi.ContentWarehouse.V1.Model.QualityOrbitAsteroidBeltDocumentIntentScores.t, default: nil) - For indexing Asteroid Belt intent scores. See go/asteroid-belt for details.
  • TagPageScore (type: integer(), default: nil) - Tag-site-ness of a page, represented in 7 bits, ranging from 0 to 100. Smaller value means worse tag page.
  • geodata (type: String.t, default: nil) - geo data; approx 24 bytes for 23M U.S. pages
  • oceandata (type: GoogleApi.ContentWarehouse.V1.Model.OceanPerDocData.t, default: nil) - 28 bytes per page, only in the Ocean index
  • pagerank0 (type: number(), default: nil) -
  • SpamWordScore (type: integer(), default: nil) - The spamword score is represented in 7 bits, going from 0 to 127.
  • ScaledIndyRank (type: integer(), default: nil) - The independence rank is represented as a 16-bit integer, which is multiplied by (max_indy_rank / 65536) to produce actual independence rank values. max_indy_rank is typically 0.84.
  • bodyWordsToTokensRatioBegin (type: number(), default: nil) - The body words over tokens ratios for the beginning part and whole doc. NB: To save space, field body_words_to_tokens_ratio_total is not set if it has the same value as body_words_to_tokens_ratio_begin (e.g., short docs).
  • topPetacatWeight (type: number(), default: nil) -
  • fireflySiteSignal (type: GoogleApi.ContentWarehouse.V1.Model.QualityCopiaFireflySiteSignal.t, default: nil) - Contains Site signal information for Firefly ranking change. See http://ariane/313938 for more details.
  • titleHardTokenCountWithoutStopwords (type: integer(), default: nil) - Number of hard tokens originally in title without counting the stopwords.
  • hostNsr (type: integer(), default: nil) - Site rank computed for host-level sitechunks. This value encodes nsr, site_pr and new_nsr. See quality_nsr::util::ConvertNsrDataToHostNsr and go/nsr. This field is deprecated - use the equivalent field inside nsr_data_proto instead.
  • semanticDateInfo (type: integer(), default: nil) - Info is encoded using a SemanticDate specific format. Contains confidence scores for day/month/year components as well as various meta data required by the freshness twiddlers.
  • languages (type: list(integer()), default: nil) - Plausible languages in order of decreasing plausibility. Language values are small, i.e. < 127, so this should compress to one byte each.
  • GroupsData (type: GoogleApi.ContentWarehouse.V1.Model.GroupsPerDocData.t, default: nil) - 16 bytes of groups2 data: used only in groups2 index
  • countryInfo (type: GoogleApi.ContentWarehouse.V1.Model.CountryCountryAttachment.t, default: nil) - This field stores the country information for the document in the form of CountryAttachment.
  • brainloc (type: GoogleApi.ContentWarehouse.V1.Model.QualityGeoBrainlocBrainlocAttachment.t, default: nil) - Brainloc contains location information for the document. See ariane/273189 for details.
  • ScaledLinkAgeSpamScore (type: integer(), default: nil) - Link age score is represented as a 7-bit integer, going from 0 to 127.
  • ScaledExptIndyRank (type: integer(), default: nil) - DEPRECATED: please do not use this field in any new code. Experimental.
  • shingleInfo (type: GoogleApi.ContentWarehouse.V1.Model.ShingleInfoPerDocData.t, default: nil) -
  • productSitesInfo (type: GoogleApi.ContentWarehouse.V1.Model.QualityProductProductSiteData.t, default: nil) - This field stores information about product sites.
  • spambrainDomainSitechunkData (type: GoogleApi.ContentWarehouse.V1.Model.SpamBrainData.t, default: nil) - Domain sitechunk level scores coming from spambrain.
  • voltData (type: GoogleApi.ContentWarehouse.V1.Model.IndexingMobileVoltVoltPerDocData.t, default: nil) - Contains page UX signals for VOLT ranking change. See http://ariane/4025970 for more details.
  • timeSensitivity (type: integer(), default: nil) - Encoded Document Time Sensitivity signal.
  • servingTimeClusterIds (type: GoogleApi.ContentWarehouse.V1.Model.IndexingDocjoinerServingTimeClusterIds.t, default: nil) - A set of cluster ids which are generated in Alexandria and used to de-dup results at serving time.
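
Several of the attributes above describe a numeric encoding in prose. The following is a minimal sketch of how those descriptions might translate into Elixir. The module and function names are hypothetical and are not part of google_api_content_warehouse. The day-number conversion assumes that day 1 corresponds to 2006-01-01, which is only one reading of "day number after 2005-12-31"; the source points to //spam/content/siteage-util.h as the authoritative helper.

```elixir
defmodule PerDocScoreSketch do
  # Hypothetical helpers written from the attribute descriptions only.
  # None of these functions exist in google_api_content_warehouse.

  @uac_spam_threshold 64
  @selection_tier_max 32_767
  @indy_rank_divisor 65_536
  # Assumption: the day count for hostAge/domainAge starts the day after this date.
  @site_age_epoch ~D[2005-12-31]

  # uacSpamScore: 7-bit value (0..127); a score >= 64 is considered uac spam.
  def uac_spam?(score) when score in 0..127, do: score >= @uac_spam_threshold

  # scaledSelectionTierRank: 0..32767, converted back to a fractional
  # position within the index tier by dividing by 32767.
  def selection_tier_fraction(rank) when rank in 0..32_767 do
    rank / @selection_tier_max
  end

  # ScaledIndyRank: 16-bit value multiplied by (max_indy_rank / 65536).
  # The description says max_indy_rank is "typically 0.84", so it is a default here.
  def indy_rank(scaled, max_indy_rank \\ 0.84) when scaled in 0..65_535 do
    scaled * (max_indy_rank / @indy_rank_divisor)
  end

  # hostAge / domainAge: 16-bit day number after 2005-12-31.
  # 0 stands for any earlier time, so no exact date can be recovered.
  def site_age_to_date(0), do: :on_or_before_2005_12_31

  def site_age_to_date(days) when days in 1..65_535 do
    Date.add(@site_age_epoch, days)
  end
end
```

Under these assumptions, `PerDocScoreSketch.uac_spam?(64)` returns `true` and `PerDocScoreSketch.site_age_to_date(1)` returns `~D[2006-01-01]`. The guards reject values outside the documented bit widths rather than silently clamping them, which is a design choice of this sketch and not something the source prescribes.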

Types

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.PerDocData{
  BlogData: GoogleApi.ContentWarehouse.V1.Model.BlogPerDocData.t() | nil,
  BookCitationData:
    GoogleApi.ContentWarehouse.V1.Model.BookCitationPerDocData.t() | nil,
  DEPRECATEDAuthorObfuscatedGaia: [String.t()] | nil,
  DEPRECATEDQuarantineWhitelist: boolean() | nil,
  DocLevelSpamScore: integer() | nil,
  Event: [GoogleApi.ContentWarehouse.V1.Model.PerDocDebugEvent.t()] | nil,
  GibberishScore: integer() | nil,
  GroupsData: GoogleApi.ContentWarehouse.V1.Model.GroupsPerDocData.t() | nil,
  IsAnchorBayesSpam: boolean() | nil,
  KeywordStuffingScore: integer() | nil,
  MobileData: GoogleApi.ContentWarehouse.V1.Model.MobilePerDocData.t() | nil,
  OriginalContentScore: integer() | nil,
  PremiumData: GoogleApi.ContentWarehouse.V1.Model.PremiumPerDocData.t() | nil,
  QuarantineInfo: integer() | nil,
  ScaledExptIndyRank: integer() | nil,
  ScaledExptIndyRank2: integer() | nil,
  ScaledExptIndyRank3: integer() | nil,
  ScaledExptSpamScoreEric: integer() | nil,
  ScaledExptSpamScoreYoram: integer() | nil,
  ScaledIndyRank: integer() | nil,
  ScaledLinkAgeSpamScore: integer() | nil,
  ScaledSpamScoreEric: integer() | nil,
  ScaledSpamScoreYoram: integer() | nil,
  SpamWordScore: integer() | nil,
  TagPageScore: integer() | nil,
  ToolBarData: GoogleApi.ContentWarehouse.V1.Model.ToolBarPerDocData.t() | nil,
  WhirlpoolDiscount: number() | nil,
  appsLink:
    GoogleApi.ContentWarehouse.V1.Model.QualityCalypsoAppsLink.t() | nil,
  asteroidBeltIntents:
    GoogleApi.ContentWarehouse.V1.Model.QualityOrbitAsteroidBeltDocumentIntentScores.t()
    | nil,
  authorObfuscatedGaiaStr: [String.t()] | nil,
  biasingdata: GoogleApi.ContentWarehouse.V1.Model.BiasingPerDocData.t() | nil,
  biasingdata2:
    GoogleApi.ContentWarehouse.V1.Model.BiasingPerDocData2.t() | nil,
  bodyWordsToTokensRatioBegin: number() | nil,
  bodyWordsToTokensRatioTotal: number() | nil,
  brainloc:
    GoogleApi.ContentWarehouse.V1.Model.QualityGeoBrainlocBrainlocAttachment.t()
    | nil,
  commercialScore: number() | nil,
  compressedQualitySignals:
    GoogleApi.ContentWarehouse.V1.Model.CompressedQualitySignals.t() | nil,
  compressedUrl: String.t() | nil,
  contentAttributions:
    GoogleApi.ContentWarehouse.V1.Model.ContentAttributions.t() | nil,
  countryInfo:
    GoogleApi.ContentWarehouse.V1.Model.CountryCountryAttachment.t() | nil,
  crawlPagerank: integer() | nil,
  crawlerIdProto:
    GoogleApi.ContentWarehouse.V1.Model.LogsProtoIndexingCrawlerIdCrawlerIdProto.t()
    | nil,
  crowdingdata:
    GoogleApi.ContentWarehouse.V1.Model.CrowdingPerDocData.t() | nil,
  datesInfo: String.t() | nil,
  desktopInterstitials:
    GoogleApi.ContentWarehouse.V1.Model.IndexingMobileInterstitialsProtoDesktopInterstitials.t()
    | nil,
  domainAge: integer() | nil,
  eventsDate: [String.t()] | nil,
  extraData:
    GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t() | nil,
  fireflySiteSignal:
    GoogleApi.ContentWarehouse.V1.Model.QualityCopiaFireflySiteSignal.t() | nil,
  freshboxArticleScores: integer() | nil,
  freshnessEncodedSignals: String.t() | nil,
  fringeQueryPrior:
    GoogleApi.ContentWarehouse.V1.Model.QualityFringeFringeQueryPriorPerDocData.t()
    | nil,
  geodata: String.t() | nil,
  homePageInfo: integer() | nil,
  homepagePagerankNs: integer() | nil,
  hostAge: integer() | nil,
  hostNsr: integer() | nil,
  imagedata: GoogleApi.ContentWarehouse.V1.Model.ImagePerDocData.t() | nil,
  inNewsstand: boolean() | nil,
  isHotdoc: boolean() | nil,
  kaltixdata: GoogleApi.ContentWarehouse.V1.Model.KaltixPerDocData.t() | nil,
  knexAnnotation:
    GoogleApi.ContentWarehouse.V1.Model.SocialPersonalizationKnexAnnotation.t()
    | nil,
  languages: [integer()] | nil,
  lastSignificantUpdate: String.t() | nil,
  lastSignificantUpdateInfo: String.t() | nil,
  launchAppInfo:
    GoogleApi.ContentWarehouse.V1.Model.QualityRichsnippetsAppsProtosLaunchAppInfoPerDocData.t()
    | nil,
  liveResultsData:
    GoogleApi.ContentWarehouse.V1.Model.WeboftrustLiveResultsDocAttachments.t()
    | nil,
  localizedCluster:
    GoogleApi.ContentWarehouse.V1.Model.IndexingDupsLocalizedLocalizedCluster.t()
    | nil,
  mediaOrPeopleEntities:
    GoogleApi.ContentWarehouse.V1.Model.ImageQualitySensitiveMediaOrPeopleEntities.t()
    | nil,
  noimageframeoverlayreason: integer() | nil,
  nsrDataProto: GoogleApi.ContentWarehouse.V1.Model.QualityNsrNsrData.t() | nil,
  nsrIsCovidLocalAuthority: boolean() | nil,
  nsrIsElectionAuthority: boolean() | nil,
  nsrIsVideoFocusedSite: boolean() | nil,
  nsrSitechunk: String.t() | nil,
  numUrls: integer() | nil,
  oceandata: GoogleApi.ContentWarehouse.V1.Model.OceanPerDocData.t() | nil,
  onsiteProminence: integer() | nil,
  origin: integer() | nil,
  originalTitleHardTokenCount: integer() | nil,
  pageTags: [integer()] | nil,
  pagerank: number() | nil,
  pagerank0: number() | nil,
  pagerank1: number() | nil,
  pagerank2: number() | nil,
  pageregions: String.t() | nil,
  phildata: GoogleApi.ContentWarehouse.V1.Model.PhilPerDocData.t() | nil,
  productSitesInfo:
    GoogleApi.ContentWarehouse.V1.Model.QualityProductProductSiteData.t() | nil,
  queriesForWhichOfficial:
    GoogleApi.ContentWarehouse.V1.Model.OfficialPagesQuerySet.t() | nil,
  rosettaLanguages: [String.t()] | nil,
  rsApplication:
    GoogleApi.ContentWarehouse.V1.Model.RepositoryAnnotationsRdfaRdfaRichSnippetsApplication.t()
    | nil,
  s3AudioLanguage:
    GoogleApi.ContentWarehouse.V1.Model.S3AudioLanguageS3AudioLanguage.t() | nil,
  saftLanguageInt: [integer()] | nil,
  scaledSelectionTierRank: integer() | nil,
  scienceDoctype: integer() | nil,
  scienceHoldingsIds: [String.t()] | nil,
  semanticDate: integer() | nil,
  semanticDateConfidence: integer() | nil,
  semanticDateInfo: integer() | nil,
  servingTimeClusterIds:
    GoogleApi.ContentWarehouse.V1.Model.IndexingDocjoinerServingTimeClusterIds.t()
    | nil,
  shingleInfo:
    GoogleApi.ContentWarehouse.V1.Model.ShingleInfoPerDocData.t() | nil,
  smartphoneData:
    GoogleApi.ContentWarehouse.V1.Model.SmartphonePerDocData.t() | nil,
  smearingMaxTotalOffdomainAnchors: integer() | nil,
  socialgraphNodeNameFp: String.t() | nil,
  spamCookbookAction:
    GoogleApi.ContentWarehouse.V1.Model.SpamCookbookAction.t() | nil,
  spamMuppetSignals:
    GoogleApi.ContentWarehouse.V1.Model.SpamMuppetjoinsMuppetSignals.t() | nil,
  spambrainData: GoogleApi.ContentWarehouse.V1.Model.SpamBrainData.t() | nil,
  spambrainDomainSitechunkData:
    GoogleApi.ContentWarehouse.V1.Model.SpamBrainData.t() | nil,
  spambrainTotalDocSpamScore: number() | nil,
  spamrank: integer() | nil,
  spamtokensContentScore: number() | nil,
  timeSensitivity: integer() | nil,
  titleHardTokenCountWithoutStopwords: integer() | nil,
  toolbarPagerank: integer() | nil,
  topPetacatTaxId: integer() | nil,
  topPetacatWeight: number() | nil,
  travelGoodSitesInfo:
    GoogleApi.ContentWarehouse.V1.Model.QualityTravelGoodSitesData.t() | nil,
  trendspamScore: integer() | nil,
  tundraClusterId: integer() | nil,
  uacSpamScore: integer() | nil,
  urlAfterRedirectsFp: String.t() | nil,
  urlPoisoningData:
    GoogleApi.ContentWarehouse.V1.Model.UrlPoisoningData.t() | nil,
  v2KnexAnnotation:
    GoogleApi.ContentWarehouse.V1.Model.QualitySherlockKnexAnnotation.t() | nil,
  videoCorpusDocid: String.t() | nil,
  videoLanguage:
    GoogleApi.ContentWarehouse.V1.Model.QualityVidyaVideoLanguageVideoLanguage.t()
    | nil,
  videodata: GoogleApi.ContentWarehouse.V1.Model.VideoPerDocData.t() | nil,
  voltData:
    GoogleApi.ContentWarehouse.V1.Model.IndexingMobileVoltVoltPerDocData.t()
    | nil,
  watchpageLanguageResult:
    GoogleApi.ContentWarehouse.V1.Model.WatchpageLanguageWatchPageLanguageResult.t()
    | nil,
  webmirrorEcnFp: String.t() | nil,
  webrefEntities:
    GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefWebrefMustangAttachment.t()
    | nil,
  ymylHealthScore: integer() | nil,
  ymylNewsScore: integer() | nil
}
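
Every field in the type above may be nil, and lastSignificantUpdate is typed as String.t() even though its description calls it a UNIX timestamp in seconds. The sketch below shows one way a caller might read such fields defensively. The module and function names are hypothetical, and the helpers accept any map with these keys (a plain map or a struct) rather than relying on the library itself.

```elixir
defmodule PerDocAccessSketch do
  # Hypothetical helpers; not part of google_api_content_warehouse.

  # lastSignificantUpdate: a UNIX timestamp in seconds, carried as a string.
  # Returns a DateTime, or nil when the field is missing or not a valid integer.
  def last_significant_update(per_doc) do
    with value when is_binary(value) <- Map.get(per_doc, :lastSignificantUpdate),
         {seconds, ""} <- Integer.parse(value),
         {:ok, datetime} <- DateTime.from_unix(seconds) do
      datetime
    else
      _ -> nil
    end
  end

  # bodyWordsToTokensRatioTotal is not set when it has the same value as
  # bodyWordsToTokensRatioBegin, so fall back to the "begin" value.
  def body_words_to_tokens_ratio_total(per_doc) do
    Map.get(per_doc, :bodyWordsToTokensRatioTotal) ||
      Map.get(per_doc, :bodyWordsToTokensRatioBegin)
  end
end
```

For example, `PerDocAccessSketch.body_words_to_tokens_ratio_total(%{bodyWordsToTokensRatioBegin: 0.5})` returns `0.5`, because the total is absent and the begin value stands in for it.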

Functions

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.