GPT Evaluation

GPT evaluation allows you to test questions using a specific configuration to identify issues and compare results across different settings.

Evaluation set

An evaluation set is a collection of questions you want to test. You can add questions manually or import them from a CSV file.

Question: The message you want to test with the LLM.

Expected answer: The answer you expect from the LLM. This field is optional.

The expected answer is currently not automatically compared to the actual output. Use it as a reference to remind yourself of what answer you expect from the model.
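If you prepare a CSV file for import, a simple two-column layout with one question per row is typically all that is needed. The exact header names the import expects are not specified here, so treat the sample below as an illustrative sketch rather than the required format; the column names follow the field names described above, and the expected answer column can be left empty.

```csv
Question,Expected answer
"What are your opening hours?","We are open Monday to Friday, 9:00–17:00."
"How do I reset my password?",""
```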

Runs

A run is the process of testing questions against the LLM using your defined settings. You can compare different runs to see how changes, such as using a different GPT model or persona, affect the answers.

Runs may take some time to complete, depending on the time of day they are started. They are processed with lower priority and placed in a queue to avoid impacting the speed and performance of the LLMs.

Export a run

You can export a run as a CSV file from the Settings view or from the Run view. In the Settings view, you can choose to export only the selected questions or all questions.

Use cases

Testing a new LLM

You want to switch to a newly released GPT model and ensure your existing configuration settings still work as expected.

Changing the persona

You’ve just made changes to the persona to solve a specific issue, and now you want to ensure those changes haven’t negatively affected the output for other questions.

New sources

You’ve added or updated sources and want to see how those changes affect the model’s output.
