
Using OpenAI but not the Agents SDK? Check out the OpenAI
integration page.
Evaluating OpenAI Agents SDK Applications
You can set up Openlayer tests to evaluate your OpenAI Agents SDK applications in monitoring and development.

Monitoring
To use the monitoring mode, you must instrument your code to publish the requests your AI system receives to the Openlayer platform. To set it up, follow the steps in the Python snippet below.
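Here is a minimal sketch of the instrumentation. It assumes the OpenlayerTracerProcessor from the openlayer Python package's OpenAI Agents integration and that OPENLAYER_API_KEY and OPENLAYER_INFERENCE_PIPELINE_ID are set in your environment; check the full Python example in Openlayer's examples gallery for the authoritative version.

```python
# pip install openlayer openai-agents
from agents import Agent, Runner, set_trace_processors

# Assumed import path -- verify against the openlayer package version you use.
from openlayer.lib.integrations.openai_agents import OpenlayerTracerProcessor

# Route every trace the Agents SDK produces to Openlayer.
# Requires OPENLAYER_API_KEY and OPENLAYER_INFERENCE_PIPELINE_ID
# to be set in the environment.
set_trace_processors([OpenlayerTracerProcessor()])

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
)

# From here on, every run is traced and published to Openlayer.
result = Runner.run_sync(agent, "Write a haiku about observability.")
print(result.final_output)
```

Once instrumented, the integration captures: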
- Agent conversations and message exchanges
- Function tool calls and their outputs
- Agent handoffs between different specialized agents
- Context sharing across agent interactions
- Metadata such as latency, token usage, and cost estimates

The OpenAI Agents SDK integration automatically captures the full conversation
flow, including agent handoffs and tool usage. You can use this together with
tracing to monitor complex multi-agent systems as part
of larger AI workflows.
Development
In development mode, Openlayer becomes a step in your CI/CD pipeline, and your tests are evaluated automatically when triggered by events in your repository (e.g., a new commit). Openlayer tests often rely on your AI system’s outputs on a validation dataset. As discussed in the Configuring output generation guide, you have two options:

- either provide a way for Openlayer to run your AI system on your datasets, or
- before pushing, generate the model outputs yourself and push them alongside your artifacts (see the sketch after this list).
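For the second option, a minimal sketch of generating outputs ahead of time with the Agents SDK follows. The file names and column names (validation.csv, input, output) are hypothetical; match them to whatever your Openlayer project configuration expects.

```python
import pandas as pd
from agents import Agent, Runner

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
)

# Hypothetical validation dataset with an "input" column.
df = pd.read_csv("validation.csv")

# Run the agent on every input row and record its final output.
df["output"] = [
    Runner.run_sync(agent, user_input).final_output
    for user_input in df["input"]
]

# Push this file alongside your artifacts so Openlayer can use it.
df.to_csv("validation_with_outputs.csv", index=False)
```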
If you choose the first option, you must make your OpenAI API key available to Openlayer as an environment variable named OPENAI_API_KEY. If you don’t add the required OpenAI API key, you’ll encounter a “Missing API key” error when Openlayer tries to run your AI system to get its outputs.