GoogleApi.AIPlatform.V1.Model.GoogleCloudAiplatformV1SchemaModelevaluationMetricsPairwiseTextGenerationEvaluationMetrics (google_api_ai_platform v0.13.0)

Metrics for general pairwise text generation evaluation results.

Attributes

  • accuracy (type: number(), default: nil) - Fraction of cases where the autorater agreed with the human raters.
  • baselineModelWinRate (type: number(), default: nil) - Percentage of time the autorater decided the baseline model had the better response.
  • cohensKappa (type: number(), default: nil) - A measurement of agreement between the autorater and human raters that takes the likelihood of random agreement into account.
  • f1Score (type: number(), default: nil) - Harmonic mean of precision and recall.
  • falseNegativeCount (type: String.t(), default: nil) - Number of examples where the autorater chose the baseline model, but humans preferred the model.
  • falsePositiveCount (type: String.t(), default: nil) - Number of examples where the autorater chose the model, but humans preferred the baseline model.
  • humanPreferenceBaselineModelWinRate (type: number(), default: nil) - Percentage of time humans decided the baseline model had the better response.
  • humanPreferenceModelWinRate (type: number(), default: nil) - Percentage of time humans decided the model had the better response.
  • modelWinRate (type: number(), default: nil) - Percentage of time the autorater decided the model had the better response.
  • precision (type: number(), default: nil) - Fraction of cases where both the autorater and humans thought the model had the better response, out of all cases where the autorater thought the model had the better response. True positives divided by all autorater positives: TP / (TP + FP).
  • recall (type: number(), default: nil) - Fraction of cases where both the autorater and humans thought the model had the better response, out of all cases where humans thought the model had the better response. True positives divided by all human positives: TP / (TP + FN).
  • trueNegativeCount (type: String.t(), default: nil) - Number of examples where both the autorater and humans decided that the model had the worse response.
  • truePositiveCount (type: String.t(), default: nil) - Number of examples where both the autorater and humans decided that the model had the better response.
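
The ratio attributes above can be related to the four count attributes with a short sketch. It is illustrative only: the module name `PairwiseMetricsSketch` and the function `derive/1` are hypothetical and not part of `google_api_ai_platform`, the input is a plain map standing in for the struct, and the Cohen's kappa formula is the standard two-rater, two-category one, which is assumed (not confirmed by this page) to match what the service reports. Ties and zero denominators are not handled.

```elixir
defmodule PairwiseMetricsSketch do
  @moduledoc false
  # Hypothetical helper, not part of google_api_ai_platform.

  # Accepts any map or struct exposing the four *Count fields as strings.
  # Assumes every denominator below is non-zero.
  def derive(metrics) do
    # The count fields are typed String.t(), so parse them before arithmetic.
    tp = String.to_integer(metrics.truePositiveCount)
    tn = String.to_integer(metrics.trueNegativeCount)
    fp = String.to_integer(metrics.falsePositiveCount)
    fneg = String.to_integer(metrics.falseNegativeCount)
    total = tp + tn + fp + fneg

    # "/" always yields a float in Elixir.
    precision = tp / (tp + fp)
    recall = tp / (tp + fneg)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / total

    # Chance agreement: probability that both raters pick the model,
    # plus the probability that both pick the baseline, if independent.
    p_both_model = (tp + fp) / total * ((tp + fneg) / total)
    p_both_baseline = (tn + fneg) / total * ((tn + fp) / total)
    p_chance = p_both_model + p_both_baseline
    kappa = (accuracy - p_chance) / (1 - p_chance)

    %{
      precision: precision,
      recall: recall,
      f1Score: f1,
      accuracy: accuracy,
      cohensKappa: kappa
    }
  end
end

# Usage with made-up counts (not real evaluation results):
derived =
  PairwiseMetricsSketch.derive(%{
    truePositiveCount: "40",
    trueNegativeCount: "30",
    falsePositiveCount: "10",
    falseNegativeCount: "20"
  })

IO.inspect(derived)
```

With these made-up counts, precision is 40 / 50 and recall is 40 / 60, which makes it easy to check the definitions by hand against the attribute descriptions.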

Types

@type t() ::
  %GoogleApi.AIPlatform.V1.Model.GoogleCloudAiplatformV1SchemaModelevaluationMetricsPairwiseTextGenerationEvaluationMetrics{
    accuracy: number() | nil,
    baselineModelWinRate: number() | nil,
    cohensKappa: number() | nil,
    f1Score: number() | nil,
    falseNegativeCount: String.t() | nil,
    falsePositiveCount: String.t() | nil,
    humanPreferenceBaselineModelWinRate: number() | nil,
    humanPreferenceModelWinRate: number() | nil,
    modelWinRate: number() | nil,
    precision: number() | nil,
    recall: number() | nil,
    trueNegativeCount: String.t() | nil,
    truePositiveCount: String.t() | nil
  }
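
Because every field in this type may be `nil` and the count fields are `String.t()` rather than integers, callers usually need a defensive conversion step. The following is a minimal sketch of one way to do that; `CountFieldSketch`, `parse_count/1`, and `total_examples/1` are hypothetical names introduced here for illustration, and a plain map is used in place of the struct so the example runs without the library.

```elixir
defmodule CountFieldSketch do
  @moduledoc false
  # Hypothetical helper, not part of google_api_ai_platform.

  @count_fields [
    :truePositiveCount,
    :trueNegativeCount,
    :falsePositiveCount,
    :falseNegativeCount
  ]

  # Returns {:ok, integer} only when the whole string is an integer.
  def parse_count(value) when is_binary(value) do
    case Integer.parse(value) do
      {n, ""} -> {:ok, n}
      _ -> :error
    end
  end

  # nil (field absent) and any other non-string value are treated as missing.
  def parse_count(_), do: :error

  # Sums the four counts, or returns nil if any of them is missing or malformed.
  def total_examples(metrics) do
    parsed = Enum.map(@count_fields, fn field -> parse_count(Map.get(metrics, field)) end)

    if Enum.all?(parsed, &match?({:ok, _}, &1)) do
      parsed |> Enum.map(fn {:ok, n} -> n end) |> Enum.sum()
    else
      nil
    end
  end
end

# Usage with made-up values:
IO.inspect(CountFieldSketch.parse_count("42"))
IO.inspect(CountFieldSketch.parse_count(nil))
```

Returning `nil` from `total_examples/1` instead of raising mirrors the `| nil` convention of the type itself; raising on malformed input would be an equally reasonable design.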

Functions

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.