GoogleApi.AIPlatform.V1.Model.GoogleCloudAiplatformV1SchemaModelevaluationMetricsPairwiseTextGenerationEvaluationMetrics (google_api_ai_platform v0.13.0)
Metrics for general pairwise text generation evaluation results.
Attributes
- accuracy (type: number(), default: nil) - Fraction of cases where the autorater agreed with the human raters.
- baselineModelWinRate (type: number(), default: nil) - Percentage of time the autorater decided the baseline model had the better response.
- cohensKappa (type: number(), default: nil) - A measurement of agreement between the autorater and human raters that takes the likelihood of random agreement into account.
- f1Score (type: number(), default: nil) - Harmonic mean of precision and recall.
- falseNegativeCount (type: String.t(), default: nil) - Number of examples where the autorater chose the baseline model, but humans preferred the model.
- falsePositiveCount (type: String.t(), default: nil) - Number of examples where the autorater chose the model, but humans preferred the baseline model.
- humanPreferenceBaselineModelWinRate (type: number(), default: nil) - Percentage of time humans decided the baseline model had the better response.
- humanPreferenceModelWinRate (type: number(), default: nil) - Percentage of time humans decided the model had the better response.
- modelWinRate (type: number(), default: nil) - Percentage of time the autorater decided the model had the better response.
- precision (type: number(), default: nil) - Fraction of cases where the autorater and humans thought the model had a better response out of all cases where the autorater thought the model had a better response. True positives divided by all positives predicted by the autorater.
- recall (type: number(), default: nil) - Fraction of cases where the autorater and humans thought the model had a better response out of all cases where the humans thought the model had a better response.
- trueNegativeCount (type: String.t(), default: nil) - Number of examples where both the autorater and humans decided that the model had the worse response.
- truePositiveCount (type: String.t(), default: nil) - Number of examples where both the autorater and humans decided that the model had the better response.
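The attribute descriptions above define precision, recall, F1, and accuracy in terms of the four count fields, so the ratios can be cross-checked from the counts. The following is a minimal sketch, not part of google_api_ai_platform: `PairwiseMetricsSketch` and `derive/1` are hypothetical names, the count fields are assumed to be base-10 integer strings, a `nil` count is assumed to mean zero, and the four counts are assumed to cover every evaluated example. The Cohen's kappa shown is the standard two-rater, two-category formula, which may not match exactly how the service computes `cohensKappa`.

```elixir
defmodule PairwiseMetricsSketch do
  # Hypothetical helper for illustration; not part of google_api_ai_platform.
  # Accepts any map (or struct) that carries the four *Count fields.
  def derive(metrics) do
    true_pos = to_count(Map.get(metrics, :truePositiveCount))
    false_pos = to_count(Map.get(metrics, :falsePositiveCount))
    false_neg = to_count(Map.get(metrics, :falseNegativeCount))
    true_neg = to_count(Map.get(metrics, :trueNegativeCount))
    total = true_pos + false_pos + false_neg + true_neg

    %{
      # Autorater and humans both chose the model / autorater chose the model.
      precision: ratio(true_pos, true_pos + false_pos),
      # Autorater and humans both chose the model / humans chose the model.
      recall: ratio(true_pos, true_pos + false_neg),
      # Count form of the harmonic mean of precision and recall.
      f1_score: ratio(2 * true_pos, 2 * true_pos + false_pos + false_neg),
      # Assumption: agreement is exactly the true positives plus true negatives.
      accuracy: ratio(true_pos + true_neg, total),
      cohens_kappa: kappa(true_pos, false_pos, false_neg, true_neg)
    }
  end

  # Assumption: counts arrive as integer strings; a missing count is treated as 0.
  defp to_count(nil), do: 0
  defp to_count(value) when is_binary(value), do: String.to_integer(value)

  # Returns nil rather than dividing by zero.
  defp ratio(_numerator, 0), do: nil
  defp ratio(numerator, denominator), do: numerator / denominator

  # Standard 2x2 Cohen's kappa: (observed - expected) / (1 - expected).
  defp kappa(tp, fp, fneg, tn) do
    total = tp + fp + fneg + tn

    if total == 0 do
      nil
    else
      observed = (tp + tn) / total
      expected = ((tp + fp) * (tp + fneg) + (fneg + tn) * (fp + tn)) / (total * total)
      if expected == 1.0, do: nil, else: (observed - expected) / (1 - expected)
    end
  end
end

derived =
  PairwiseMetricsSketch.derive(%{
    truePositiveCount: "40",
    falsePositiveCount: "10",
    falseNegativeCount: "5",
    trueNegativeCount: "45"
  })

IO.inspect(derived.precision)
```

Because `derive/1` only reads the four count keys with `Map.get/2`, it should also accept a decoded struct of this model, since structs are maps underneath. Comparing the derived values with the reported `precision`, `recall`, `f1Score`, and `accuracy` fields is a cheap sanity check, though small differences are possible if the service rounds or handles ties differently.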
Types
@type t() ::
  %GoogleApi.AIPlatform.V1.Model.GoogleCloudAiplatformV1SchemaModelevaluationMetricsPairwiseTextGenerationEvaluationMetrics{
    accuracy: number() | nil,
    baselineModelWinRate: number() | nil,
    cohensKappa: number() | nil,
    f1Score: number() | nil,
    falseNegativeCount: String.t() | nil,
    falsePositiveCount: String.t() | nil,
    humanPreferenceBaselineModelWinRate: number() | nil,
    humanPreferenceModelWinRate: number() | nil,
    modelWinRate: number() | nil,
    precision: number() | nil,
    recall: number() | nil,
    trueNegativeCount: String.t() | nil,
    truePositiveCount: String.t() | nil
  }
Functions
Unwrap a decoded JSON object into its complex fields.