The GPT evaluation test lets you evaluate text using an LLM. You can write descriptive criteria such as “Make sure the outputs are in Portuguese,” and Openlayer uses an LLM to grade your agent's or model's outputs against that criterion. The LLM also explains the reasoning behind each grade.

To use this test, you must provide your OpenAI API key. You can do so on your workspace settings page.
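The grading approach described above is a form of the general "LLM-as-a-judge" pattern. The Python sketch below illustrates that pattern only and makes several assumptions: the prompt template, the `Grade` type, and the `grade_output` and `pass_rate` helpers are hypothetical and do not reflect Openlayer's internal implementation. `fake_llm` is a stub standing in for a real model call (in practice, a request to OpenAI's API made with your key), so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical grading prompt; Openlayer's actual prompt is not documented here.
GRADER_TEMPLATE = """You are grading the output of an AI system.
Criterion: {criterion}
Output to grade:
{output}

Respond only with JSON of the form {{"passed": true or false, "explanation": "<why>"}}."""


@dataclass
class Grade:
    passed: bool
    explanation: str


def grade_output(criterion: str, output: str, llm: Callable[[str], str]) -> Grade:
    """Ask an LLM (any prompt -> text callable) to judge `output` against `criterion`."""
    prompt = GRADER_TEMPLATE.format(criterion=criterion, output=output)
    data = json.loads(llm(prompt))  # assumes the judge returns valid JSON
    return Grade(passed=bool(data["passed"]), explanation=str(data["explanation"]))


def pass_rate(criterion: str, outputs: List[str], llm: Callable[[str], str]) -> float:
    """Fraction of outputs the judge marks as satisfying the criterion."""
    if not outputs:
        raise ValueError("outputs must not be empty")
    grades = [grade_output(criterion, o, llm) for o in outputs]
    return sum(g.passed for g in grades) / len(grades)


def fake_llm(prompt: str) -> str:
    """Hypothetical stub judge: 'passes' an output if it contains the word 'obrigado'."""
    graded_text = prompt.split("Output to grade:\n", 1)[1]
    passed = "obrigado" in graded_text.lower()
    return json.dumps({"passed": passed, "explanation": "stub judgement"})


if __name__ == "__main__":
    criterion = "Make sure the outputs are in Portuguese."
    print(grade_output(criterion, "Muito obrigado!", fake_llm))
    print(pass_rate(criterion, ["Muito obrigado!", "Thank you!"], fake_llm))
```

Passing the model in as a plain callable keeps the grading logic independent of any particular LLM client, which is also what makes it easy to swap in the offline stub.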


  • Category: Performance.
  • Task types: LLM.
  • Availability: and .

Why it matters

  • Some aspects of a model’s performance are hard to capture with a quantitative metric alone. For example, if you are building a chatbot, you might want to make sure that the bot never uses profanity. The GPT evaluation test lets you encode these more subjective criteria in natural language.