GPT Evaluation
GPT evaluation allows you to test questions using a specific configuration to identify issues and compare results across different settings.
Evaluation set
An evaluation set is a collection of questions you want to test. You can add questions manually or import them from a CSV or XLSX file.
Question: The prompt or input you want to test with the LLM.
Expected answer: The answer you're expecting from the LLM. This field is optional and can be left empty.
The expected answer is currently not automatically compared to the actual output. Use it as a reference to remind yourself of what you expected from the model.
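If you prefer to prepare questions in a file rather than entering them manually, the import file only needs the two fields described above. The column names below ("question", "expected_answer") are assumptions for illustration; check the import dialog for the exact headers your environment expects. A minimal sketch in Python:

```python
import csv

# Minimal sketch of an evaluation-set import file.
# Column names ("question", "expected_answer") are assumed for illustration;
# the actual headers may differ in your environment.
rows = [
    {"question": "How do I reset my password?",
     "expected_answer": "A short summary of the password-reset steps."},
    {"question": "What is our travel expense policy?",
     "expected_answer": ""},  # the expected answer is optional and may stay empty
]

with open("evaluation_set.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "expected_answer"])
    writer.writeheader()
    writer.writerows(rows)
```

The same layout works for an XLSX file: one row per question, with the expected answer left blank where you have none.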
Runs
A run is the process of testing questions against the LLM using your defined settings. You can compare different runs to see how changes, such as using a different GPT model or persona, affect the answers.
Runs may take some time to complete, depending on the time of day they are started. They are placed in a queue and processed with lower priority to avoid impacting the speed and performance of the LLMs.
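If you want to compare two runs outside the interface, one option is to collect the answers of each run into a simple file and diff them. The sketch below is hypothetical: it assumes you have saved each run's results as a CSV with "question" and "answer" columns, which is not a format the tool necessarily provides.

```python
import csv

def load_answers(path):
    """Read question -> answer pairs from a CSV file.

    The layout (columns "question" and "answer") is a hypothetical
    export format used here for illustration only.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return {row["question"]: row["answer"] for row in csv.DictReader(f)}

# Example: compare a baseline run with a run using a new model or persona.
baseline = load_answers("run_baseline.csv")
candidate = load_answers("run_new_model.csv")

for question, old_answer in baseline.items():
    new_answer = candidate.get(question)
    if new_answer is not None and new_answer != old_answer:
        print(f"Changed answer for: {question}")
```

This only flags questions whose answers changed between the two runs; judging whether a change is an improvement remains a manual step.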
Use cases
Testing a new LLM
You want to switch to a newly released GPT model and ensure your existing configuration settings still work as expected.
Changing the persona
You’ve just made changes to the persona to solve a specific issue, and now you want to ensure those changes haven’t negatively affected the output for other questions.
New sources
You’ve added or updated sources and want to see how those changes affect the model’s output.