View Source GoogleApi.ContentWarehouse.V1.Model.CrawlerChangerateUrlVersion (google_api_content_warehouse v0.3.0)

NEXT_TAG: 15

Attributes

  • additionalChangesMerged (type: integer(), default: nil) - Same as the field in UrlChange. This allows us to merge identical UrlVersions into a single UrlVersion.
  • contentType (type: integer(), default: nil) - The content type of the page.
  • isImsNotModified (type: boolean(), default: nil) - Whether this is an IMS response (a 304, not modified).
  • lastModified (type: integer(), default: nil) - The date from the LastModified header, if present.
  • offDomainLinksChecksum (type: integer(), default: nil) - The checksum of all the off-domain links on the page.
  • offDomainLinksCount (type: integer(), default: nil) - The count of all the off-domain links on the page.
  • onDomainLinksCount (type: integer(), default: nil) - The count of all the on-domain links on the page. We aren't worried about the contents themselves, since they might often change (e.g., session ids). We assume that a change in the number of links is significant, however.
  • shingleSimhash (type: GoogleApi.ContentWarehouse.V1.Model.IndexingConverterShingleFingerprint.t, default: nil) - The simhash value obtained from shingles.
  • simhash (type: String.t, default: nil) - The simhash-v1 value. The simhash-v1 is now deprecated and new UrlVersions should only populate simhash-v2. During migration phase from using simhash-v1 to simhash-v2, it is possible that previous UrlChange only contain simhash-v1 and the next UrlChange / UrlVersion could only contain simhash-v2. In this case, we skip that interval in our changerate computation. [go/changerate-simhash-v2-migration]
  • simhashIsTrusted (type: boolean(), default: nil) - Whether the simhash-v1 should be trusted.
  • simhashV2 (type: String.t, default: nil) - The simhash-v2 value.
  • simhashV2IsTrusted (type: boolean(), default: nil) - Whether the simhash-v2 value should be trusted.
  • tile (type: list(integer()), default: nil) - The tiles of the document body. We use int32s instead of int64s (the norm) in order to save space. Since rare inaccuracy doesn't really matter, we've decided this is an okay tradeoff.
  • timestamp (type: integer(), default: nil) - The timestamp we crawled the page.

Summary

Functions

Unwrap a decoded JSON object into its complex fields.

Types

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.CrawlerChangerateUrlVersion{
  additionalChangesMerged: integer() | nil,
  contentType: integer() | nil,
  isImsNotModified: boolean() | nil,
  lastModified: integer() | nil,
  offDomainLinksChecksum: integer() | nil,
  offDomainLinksCount: integer() | nil,
  onDomainLinksCount: integer() | nil,
  shingleSimhash:
    GoogleApi.ContentWarehouse.V1.Model.IndexingConverterShingleFingerprint.t()
    | nil,
  simhash: String.t() | nil,
  simhashIsTrusted: boolean() | nil,
  simhashV2: String.t() | nil,
  simhashV2IsTrusted: boolean() | nil,
  tile: [integer()] | nil,
  timestamp: integer() | nil
}

Functions

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.