Context for managing evaluation suites, test cases, and runs.
Functions
@spec change_suite(Aludel.Evals.Suite.t(), map()) :: Ecto.Changeset.t()
Returns a changeset for tracking suite changes.
@spec change_test_case(Aludel.Evals.TestCase.t(), map()) :: Ecto.Changeset.t()
Returns a changeset for tracking test case changes.
@spec create_suite(map()) :: {:ok, Aludel.Evals.Suite.t()} | {:error, Ecto.Changeset.t()}
Creates a new suite.
@spec create_suite_run(map()) :: {:ok, Aludel.Evals.SuiteRun.t()} | {:error, Ecto.Changeset.t()}
Creates a new suite run.
@spec create_test_case(map()) :: {:ok, Aludel.Evals.TestCase.t()} | {:error, Ecto.Changeset.t()}
Creates a new test case.
@spec create_test_case_document(map()) :: {:ok, Aludel.Evals.TestCaseDocument.t()} | {:error, Ecto.Changeset.t()}
Creates a test case document.
@spec delete_suite(Aludel.Evals.Suite.t()) :: {:ok, Aludel.Evals.Suite.t()} | {:error, Ecto.Changeset.t()}
Deletes a suite.
@spec delete_suite_run(Aludel.Evals.SuiteRun.t()) :: {:ok, Aludel.Evals.SuiteRun.t()} | {:error, Ecto.Changeset.t()}
Deletes a suite run.
@spec delete_test_case(Aludel.Evals.TestCase.t()) :: {:ok, Aludel.Evals.TestCase.t()} | {:error, Ecto.Changeset.t()}
Deletes a test case.
@spec delete_test_case_document(Aludel.Evals.TestCaseDocument.t()) :: {:ok, Aludel.Evals.TestCaseDocument.t()} | {:error, Ecto.Changeset.t()}
Deletes a test case document.
@spec execute_suite(Aludel.Evals.Suite.t(), Aludel.Prompts.PromptVersion.t(), Aludel.Providers.Provider.t()) :: {:ok, Aludel.Evals.SuiteRun.t()} | {:error, term()}
Executes a test suite against a prompt version and provider.
Runs all test cases for the suite, evaluating their assertions against the LLM output and creating a suite_run with results.
Parameters
- suite: The test suite to execute
- prompt_version: The prompt version to use
- provider: The LLM provider to call
Returns
- {:ok, suite_run} with execution results
- {:error, reason} if execution fails
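The run-and-evaluate flow described above can be sketched in plain Elixir. This is a rough illustration under stated assumptions, not Aludel's implementation: the `ExecuteSketch` module, the `:contains` assertion shape, the test case keys (`id`, `input`, `assertions`), and the result map layout are all hypothetical, and the LLM provider is replaced by a stub function.

```elixir
defmodule ExecuteSketch do
  # Hypothetical assertion shape: %{type: :contains, value: binary}.
  # Real assertion types are not described in this reference.
  def assertion_passes?(%{type: :contains, value: value}, output) do
    String.contains?(output, value)
  end

  # `call_llm` stands in for the provider call: it takes the test case
  # input and returns the model output as a string.
  def run(test_cases, call_llm) when is_function(call_llm, 1) do
    results =
      Enum.map(test_cases, fn test_case ->
        output = call_llm.(test_case.input)
        passed = Enum.all?(test_case.assertions, &assertion_passes?(&1, output))
        %{test_case_id: test_case.id, output: output, passed: passed}
      end)

    passed = Enum.count(results, & &1.passed)
    {:ok, %{results: results, passed: passed, failed: length(results) - passed}}
  end
end

# Usage with a stub provider that echoes its input.
test_cases = [
  %{id: "tc-1", input: "hello", assertions: [%{type: :contains, value: "hello"}]},
  %{id: "tc-2", input: "hello", assertions: [%{type: :contains, value: "goodbye"}]}
]

{:ok, run} = ExecuteSketch.run(test_cases, fn input -> "echo: " <> input end)
IO.inspect(run, label: "sketch run")
```

A caller of the real function would pattern-match on the same two tuple shapes, {:ok, suite_run} and {:error, reason}.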
@spec get_suite!(binary()) :: Aludel.Evals.Suite.t()
Gets a suite by ID, raising if not found.
@spec get_suite_run!(binary()) :: Aludel.Evals.SuiteRun.t()
Gets a suite run by ID, raising if not found.
@spec get_suite_with_prompt!(binary()) :: Aludel.Evals.Suite.t()
Gets a suite by ID with prompt preloaded, raising if not found.
@spec get_suite_with_test_cases!(binary()) :: Aludel.Evals.Suite.t()
Gets a suite with all test cases preloaded.
@spec get_suite_with_test_cases_and_prompt!(binary()) :: Aludel.Evals.Suite.t()
Gets a suite with test cases and prompt preloaded.
@spec get_test_case!(binary()) :: Aludel.Evals.TestCase.t()
Gets a test case by ID, raising if not found.
@spec get_test_case_document!(binary()) :: Aludel.Evals.TestCaseDocument.t()
Gets a single test case document.
Raises Ecto.NoResultsError if the document does not exist.
@spec get_test_case_with_documents!(binary()) :: Aludel.Evals.TestCase.t()
Gets a test case with documents preloaded.
@spec launch_suite_execution(pid(), binary(), binary(), binary()) :: {:ok, reference()} | {:error, term()}
Launches suite execution in a supervised task and reports completion back to the given recipient process.
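One way such a launch could work is with `Task.Supervisor` from the Elixir standard library. The sketch below is an assumption about the general pattern, not the actual implementation: the `LaunchSketch` module, the `{:suite_execution_finished, ref, result}` message shape, and the zero-arity `work` function are hypothetical.

```elixir
defmodule LaunchSketch do
  # Starts `work` under the given Task.Supervisor and sends the outcome
  # to `recipient`, tagged with a reference the caller can match on.
  def launch(supervisor, recipient, work)
      when is_pid(recipient) and is_function(work, 0) do
    ref = make_ref()

    case Task.Supervisor.start_child(supervisor, fn ->
           send(recipient, {:suite_execution_finished, ref, work.()})
         end) do
      {:ok, _pid} -> {:ok, ref}
      other -> {:error, other}
    end
  end
end

# Usage: the calling process is the recipient.
{:ok, supervisor} = Task.Supervisor.start_link()
{:ok, ref} = LaunchSketch.launch(supervisor, self(), fn -> {:ok, :fake_suite_run} end)

receive do
  {:suite_execution_finished, ^ref, result} -> IO.inspect(result, label: "finished")
after
  1_000 -> IO.puts("no completion message received")
end
```

Returning a reference lets the recipient tell apart completion messages from several concurrent launches.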
@spec list_suite_runs() :: [Aludel.Evals.SuiteRun.t()]
Lists all suite runs in the system.
Gets suite runs for a specific suite.
Gets suite runs for a specific suite with prompt_version and provider preloaded.
@spec list_suites() :: [Aludel.Evals.Suite.t()]
Lists all suites in the system.
@spec list_suites_with_prompt() :: [Aludel.Evals.Suite.t()]
Lists all suites with their associated prompt preloaded.
@spec list_test_cases() :: [Aludel.Evals.TestCase.t()]
Lists all test cases in the system.
@spec pass_rates_by_prompt() :: [map()]
Calculates pass rates grouped by prompt.
Returns a list of maps with prompt info and pass rate statistics.
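The grouping idea can be illustrated on plain maps. This is a minimal sketch, assuming each run carries a prompt identifier plus passed and total counts; the `PassRateSketch` module and the keys `prompt_id`, `passed`, `total`, and `pass_rate` are hypothetical, and the real function presumably aggregates from the database instead.

```elixir
defmodule PassRateSketch do
  # Groups runs by prompt and computes a percentage pass rate per group.
  def by_prompt(runs) do
    runs
    |> Enum.group_by(& &1.prompt_id)
    |> Enum.map(fn {prompt_id, group} ->
      passed = group |> Enum.map(& &1.passed) |> Enum.sum()
      total = group |> Enum.map(& &1.total) |> Enum.sum()

      # Guard against division by zero for prompts with no test cases.
      rate = if total > 0, do: passed / total * 100, else: 0.0

      %{prompt_id: prompt_id, passed: passed, total: total, pass_rate: Float.round(rate, 1)}
    end)
    |> Enum.sort_by(& &1.prompt_id)
  end
end

runs = [
  %{prompt_id: "a", passed: 3, total: 4},
  %{prompt_id: "a", passed: 3, total: 4},
  %{prompt_id: "b", passed: 0, total: 2}
]

IO.inspect(PassRateSketch.by_prompt(runs), label: "pass rates")
```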
@spec reload_suite_run_with_associations(Aludel.Evals.SuiteRun.t()) :: Aludel.Evals.SuiteRun.t()
Reloads a suite run with associations preloaded.
@spec retry_suite_run_test_case(Aludel.Evals.SuiteRun.t(), binary()) :: {:ok, Aludel.Evals.SuiteRun.t()} | {:error, term()}
Retries a single test case result within an existing suite run.
The existing embedded result is replaced in-place and the suite run aggregates are recalculated from the updated result set.
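The replace-then-recalculate step can be sketched on plain maps. The field names here (`results`, `test_case_id`, `passed`, `failed`) and the `RetrySketch` module are assumptions for illustration; the actual schema fields of a suite run are not shown in this reference.

```elixir
defmodule RetrySketch do
  # Swaps the result for the retried test case, keeping list order,
  # then recomputes the aggregates from the updated result set.
  def replace_result(run, new_result) do
    results =
      Enum.map(run.results, fn result ->
        if result.test_case_id == new_result.test_case_id, do: new_result, else: result
      end)

    passed = Enum.count(results, & &1.passed)
    %{run | results: results, passed: passed, failed: length(results) - passed}
  end
end

run = %{
  results: [
    %{test_case_id: "tc-1", passed: true},
    %{test_case_id: "tc-2", passed: false}
  ],
  passed: 1,
  failed: 1
}

# Pretend the retry of "tc-2" now passes.
updated = RetrySketch.replace_result(run, %{test_case_id: "tc-2", passed: true})
IO.inspect(updated, label: "after retry")
```

Recomputing the counts from the full list, rather than adjusting them incrementally, keeps the aggregates consistent even if the previous result was already stale.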
@spec update_suite(Aludel.Evals.Suite.t(), map()) :: {:ok, Aludel.Evals.Suite.t()} | {:error, Ecto.Changeset.t()}
Updates an existing suite.
@spec update_test_case(Aludel.Evals.TestCase.t(), map()) :: {:ok, Aludel.Evals.TestCase.t()} | {:error, Ecto.Changeset.t()}
Updates an existing test case.