GoogleApi.AIPlatform.V1.Model.GoogleCloudAiplatformV1SchemaModelevaluationMetricsPairwiseTextGenerationEvaluationMetrics (google_api_ai_platform v0.13.0)

Metrics for general pairwise text generation evaluation results.

Attributes

accuracy (type: number(), default: nil) - Fraction of cases where the autorater agreed with the human raters.
baselineModelWinRate (type: number(), default: nil) - Percentage of time the autorater decided the baseline model had the better response.
cohensKappa (type: number(), default: nil) - A measurement of agreement between the autorater and human raters that takes the likelihood of random agreement into account.
f1Score (type: number(), default: nil) - Harmonic mean of precision and recall.
falseNegativeCount (type: String.t(), default: nil) - Number of examples where the autorater chose the baseline model, but humans preferred the model.
falsePositiveCount (type: String.t(), default: nil) - Number of examples where the autorater chose the model, but humans preferred the baseline model.
humanPreferenceBaselineModelWinRate (type: number(), default: nil) - Percentage of time humans decided the baseline model had the better response.
humanPreferenceModelWinRate (type: number(), default: nil) - Percentage of time humans decided the model had the better response.
modelWinRate (type: number(), default: nil) - Percentage of time the autorater decided the model had the better response.
precision (type: number(), default: nil) - Fraction of cases where the autorater and humans thought the model had a better response out of all cases where the autorater thought the model had a better response. True positives divided by all predicted positives (true positives plus false positives).
recall (type: number(), default: nil) - Fraction of cases where the autorater and humans thought the model had a better response out of all cases where the humans thought the model had a better response. True positives divided by the sum of true positives and false negatives.
trueNegativeCount (type: String.t(), default: nil) - Number of examples where both the autorater and humans decided that the model had the worse response.
truePositiveCount (type: String.t(), default: nil) - Number of examples where both the autorater and humans decided that the model had the better response.
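
The ratio attributes above can be related to the four count attributes with a short sketch. The module name PairwiseMetricsSketch and its derive/1 function are hypothetical and are not part of google_api_ai_platform. The sketch assumes the counts arrive as decimal strings (as the String.t() types suggest), treats "the autorater chose the model" as the positive class, and uses the standard two-category form of Cohen's kappa. Whether the service computes its reported values in exactly this way is not stated in this reference.

```elixir
defmodule PairwiseMetricsSketch do
  # Hypothetical helper, not part of google_api_ai_platform.
  # Recomputes ratio metrics from the four confusion-matrix counts.

  @count_keys [
    :truePositiveCount,
    :falsePositiveCount,
    :falseNegativeCount,
    :trueNegativeCount
  ]

  # Accepts the metrics struct or any map with the same keys.
  def derive(metrics) do
    [tp, fp, fneg, tn] = Enum.map(@count_keys, &parse_count(Map.get(metrics, &1)))
    total = tp + fp + fneg + tn

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fneg)

    %{
      accuracy: ratio(tp + tn, total),
      precision: precision,
      recall: recall,
      f1Score: f1(precision, recall),
      cohensKappa: kappa(tp, fp, fneg, tn, total)
    }
  end

  # Counts are strings in the struct; a missing count is treated as 0 (assumption).
  defp parse_count(nil), do: 0
  defp parse_count(value) when is_binary(value), do: String.to_integer(value)

  # Returns nil instead of dividing by zero.
  defp ratio(_num, 0), do: nil
  defp ratio(num, den), do: num / den

  # Harmonic mean of precision and recall.
  defp f1(nil, _recall), do: nil
  defp f1(_precision, nil), do: nil
  defp f1(p, r) when p + r == 0, do: nil
  defp f1(p, r), do: 2 * p * r / (p + r)

  # Standard two-category Cohen's kappa: (observed - expected) / (1 - expected).
  defp kappa(_tp, _fp, _fneg, _tn, 0), do: nil

  defp kappa(tp, fp, fneg, tn, total) do
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fneg) + (fneg + tn) * (fp + tn)) / (total * total)
    if expected == 1.0, do: nil, else: (observed - expected) / (1 - expected)
  end
end
```

For example, calling PairwiseMetricsSketch.derive/1 with the illustrative counts "40", "10", "5" and "45" (true positive, false positive, false negative, true negative) gives an accuracy of 0.85 and a precision of 0.8.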

Types

t()

@type t() ::
  %GoogleApi.AIPlatform.V1.Model.GoogleCloudAiplatformV1SchemaModelevaluationMetricsPairwiseTextGenerationEvaluationMetrics{
    accuracy: number() | nil,
    baselineModelWinRate: number() | nil,
    cohensKappa: number() | nil,
    f1Score: number() | nil,
    falseNegativeCount: String.t() | nil,
    falsePositiveCount: String.t() | nil,
    humanPreferenceBaselineModelWinRate: number() | nil,
    humanPreferenceModelWinRate: number() | nil,
    modelWinRate: number() | nil,
    precision: number() | nil,
    recall: number() | nil,
    trueNegativeCount: String.t() | nil,
    truePositiveCount: String.t() | nil
  }

Functions

decode(value, options)

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.
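
In practice this struct is usually produced by decoding a JSON response rather than by calling decode/2 directly. The following is a minimal sketch, assuming the :google_api_ai_platform and :poison packages are available in the project and that the struct can be used as a Poison decode target, as is usual for these generated clients. The PairwiseMetrics alias and the JSON payload are made up for illustration and are not real evaluation output.

```elixir
# Assumes :google_api_ai_platform and :poison are dependencies of the project.
alias GoogleApi.AIPlatform.V1.Model.GoogleCloudAiplatformV1SchemaModelevaluationMetricsPairwiseTextGenerationEvaluationMetrics, as: PairwiseMetrics

# Made-up payload for illustration only.
json = ~s({"accuracy": 0.85, "truePositiveCount": "40"})

# Decode the JSON object into the metrics struct; absent fields stay nil.
metrics = Poison.decode!(json, as: %PairwiseMetrics{})

# Count fields are typed String.t(), so convert them before doing arithmetic.
true_positives = String.to_integer(metrics.truePositiveCount)

IO.inspect({metrics.accuracy, true_positives})
```

Every attribute defaults to nil, so code that reads these fields should handle missing values.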