View Source GoogleApi.ContentWarehouse.V1.Model.AnchorsAnchor (google_api_content_warehouse v0.3.0)
Attributes
-
creationDate
(type:integer()
, default:nil
) - used for history - the first and last time we have seen this anchor. creation_date also used for Freshdocs Twitter indexing, a retweet is an anchor of the original tweet. This field records the time when a retweet is created. -
origText
(type:String.t
, default:nil
) - Original text, including capitalization and punctuation. Runs of whitespace are collapsed into a single space. -
context2
(type:integer()
, default:nil
) - This is a hash of terms near the anchor. (This is a second-generation hash replacing the value stored in the 'context' field.) -
fontsize
(type:integer()
, default:nil
) - -
experimental
(type:boolean()
, default:nil
) - If true, the anchor is for experimental purposes and should not be used in serving. -
fragment
(type:String.t
, default:nil
) - The URL fragment for this anchor (the foo in http://www.google.com#foo) -
sourceType
(type:integer()
, default:nil
) - is to record the quality of the anchor's source page and is correlated with but not identical to the index tier of the source page. In the docjoins built by the indexing pipeline (Alexandria), - Anchors marked TYPE_HIGH_QUALITY are from base documents. - Anchors marked TYPE_MEDIUM_QUALITY are from documents of medium quality (roughly but not exactly supplemental tier documents). - Anchors marked TYPE_LOW_QUALITY are from documents of low quality (roughly but not exactly blackhole documents). Note that the source_type can also be used as an importance indicator of an anchor (a lower source_type value indicates a more important anchor), so it is important to enforce that TYPE_HIGH_QUALITY < TYPE_MEDIUM_QUALITY < TYPE_LOW_QUALITY To add a new source type in future, please maintain the proper relationship among the types as well. TYPE_FRESHDOCS, only available in freshdocs indexing, is a special case and is considered the same type as TYPE_HIGH_QUALITY for the purpose of anchor importance in duplicate anchor removal. -
pagerankWeight
(type:number()
, default:nil
) - Weight to be stored in linkmaps for pageranker -
isLocal
(type:boolean()
, default:nil
) - The bit ~roughly~ indicates whether an anchor's source and target pages are on the same domain. Note: this plays no role in determining whether an anchor is onsite, ondomain, or offdomain in mustang (i.e., the bit above). -
originalTargetDocid
(type:String.t
, default:nil
) - The docid of the anchor's original target. This field is available if and only if the anchor is forwarded. -
fullLeftContext
(type:list(String.t)
, default:nil
) - The full context. These are not written out in the linklogs. -
expired
(type:boolean()
, default:nil
) - true iff exp domain -
catfishTags
(type:list(integer())
, default:nil
) - CATfish tags attached to a link. These are similar to link tags, except the values are created on the fly within Cookbook. See: http://sites/cookbook/exporting/indexing -
deletionDate
(type:integer()
, default:nil
) - -
linkTags
(type:list(integer())
, default:nil
) - Contains info on link type, source page, etc. -
forwardingTypes
(type:integer()
, default:nil
) - How the anchor is forwarded to the canonical, available only for forwarded anchors (i.e., the field is set). The forwarding types are defined in URLForwardingUtil (segindexer/segment-indexer-util.h). Always use URLForwardingUtil to access this field and use URLForwardingUtil::GetAnchorForwardingReason to get the explanation how the anchor is forwarded to the canonical. NOTE: Use with caution as it is only set for docjoins generated using the urlmap from repository/updater. -
possiblyOldFirstseenDate
(type:boolean()
, default:nil
) - DEPRECATED. It used to be set if firstseen_date is not set. It's to indicate that the anchor is possibly old, but we don't have enough information to tell until the linkage map is updated. TODO(hxu) rename it to possibly_old_firstseen_date_DEPRECATED after clean up other dependencies. -
locality
(type:integer()
, default:nil
) - For ranking purposes, the quality of an anchor is measured by its "locality" and "bucket". See quality/anchors/definitions.h for more information. -
demotionreason
(type:integer()
, default:nil
) - DEPRECATED -
parallelLinks
(type:integer()
, default:nil
) - The number of additional links from the same source page to the same target domain. Not populated if is_local is true. -
text
(type:String.t
, default:nil
) - Space-delimited anchor words. Text that needs segmentation (like CJK or Thai) is unsegmented, since we set FLAGS_segment_during_lexing to false in mr-linkextractor.cc . -
source
(type:GoogleApi.ContentWarehouse.V1.Model.AnchorsAnchorSource.t
, default:nil
) - -
bucket
(type:integer()
, default:nil
) - -
fullRightContext
(type:list(String.t)
, default:nil
) - -
targetUrlEncoding
(type:integer()
, default:nil
) - A given target URL may be found in different encodings in different documents. We store the URL encoding with each source anchor so that we can count them later to find the encoding most likely to be expected by the Web site. Around 0.7% of target URLs are expected to require a non-default value here. The default value 0 is referenced in C++ as webutil::kDefaultUrlEncoding. See also webutil/urlencoding. -
compressedOriginalTargetUrl
(type:String.t
, default:nil
) - The anchor's original target url, compressed. Available only in Alexandria docjoins when the anchor is forwarded. -
firstseenDate
(type:integer()
, default:nil
) - # days past Dec 31, 1994, 23:00:00 UTC (Unix time @788914800) that this link was first seen. Should never occupy more than 15 bits. NOTE: this is NOT the same as creation_date; firstseen_date is filled during link extraction -
setiPagerankWeight
(type:number()
, default:nil
) - TEMPORARY -
context
(type:integer()
, default:nil
) - -
linkAdditionalInfo
(type:GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t
, default:nil
) - Additional information related to the anchor, such as additional anchor text or scores. -
type
(type:integer()
, default:nil
) - DEPRECATED: Now in link_tags -
firstseenNearCreation
(type:boolean()
, default:nil
) - true if we think 'firstseen_date' is an accurate estimate of when the link was actually added to the source page. false if it may have existed for some time before we saw it. -
lastUpdateTimestamp
(type:integer()
, default:nil
) - Used for history and freshness tracking - the timestamp this anchor is updated in indexing. -
offset
(type:integer()
, default:nil
) - This is the offset for the first term in the anchor - it can be used as a unique ID for the anchor within the document and compared against all per-tag data. This is measured in bytes from the start of the document. We write this out to the linklogs to recover the original order of links after source/target forwarding. This is necessary for computing the global related data. -
weight
(type:integer()
, default:nil
) - weights are 0-127 -
deleted
(type:boolean()
, default:nil
) - -
encodedNewsAnchorData
(type:integer()
, default:nil
) - Encoded data containing information about newsiness of anchor. Populated only if anchor is classified as coming from a newsy, high quality site. Encoded data for anchor sources are being stored in googledata/quality/freshness/news_anchors/encoded_news_anchors_data.txt Scores are being computed with quality/freshness/news_anchors/ routines. -
compressedImageUrls
(type:list(String.t)
, default:nil
) - If the anchor contained images, these image urls are stored here in compressed form. -
timestamp
(type:String.t
, default:nil
) - This field is DEPRECATED and no longer filled. For source page crawl timestamp, use Source.crawl_timestamp. Next tag id should be 62.
Summary
Functions
Unwrap a decoded JSON object into its complex fields.
Types
@type t() :: %GoogleApi.ContentWarehouse.V1.Model.AnchorsAnchor{ bucket: integer() | nil, catfishTags: [integer()] | nil, compressedImageUrls: [String.t()] | nil, compressedOriginalTargetUrl: String.t() | nil, context: integer() | nil, context2: integer() | nil, creationDate: integer() | nil, deleted: boolean() | nil, deletionDate: integer() | nil, demotionreason: integer() | nil, encodedNewsAnchorData: integer() | nil, experimental: boolean() | nil, expired: boolean() | nil, firstseenDate: integer() | nil, firstseenNearCreation: boolean() | nil, fontsize: integer() | nil, forwardingTypes: integer() | nil, fragment: String.t() | nil, fullLeftContext: [String.t()] | nil, fullRightContext: [String.t()] | nil, isLocal: boolean() | nil, lastUpdateTimestamp: integer() | nil, linkAdditionalInfo: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t() | nil, linkTags: [integer()] | nil, locality: integer() | nil, offset: integer() | nil, origText: String.t() | nil, originalTargetDocid: String.t() | nil, pagerankWeight: number() | nil, parallelLinks: integer() | nil, possiblyOldFirstseenDate: boolean() | nil, setiPagerankWeight: number() | nil, source: GoogleApi.ContentWarehouse.V1.Model.AnchorsAnchorSource.t() | nil, sourceType: integer() | nil, targetUrlEncoding: integer() | nil, text: String.t() | nil, timestamp: String.t() | nil, type: integer() | nil, weight: integer() | nil }
Functions
Unwrap a decoded JSON object into its complex fields.