Traces help you understand your system, particularly when it contains multiple steps, such as in RAG, LLM chains, and agents.

In the monitoring mode of an Openlayer project, you can view the traces for the live requests your AI system receives. This allows you to log the inputs, outputs, latency, and other metadata such as cost and number of tokens associated with every step of your system.

This guide shows how you can set up tracing with Openlayer’s SDKs to achieve a result similar to the one below.

If you prefer, you can follow along with a notebook example. Our templates gallery also has complete sample projects that show how tracing works for development and monitoring.

How to set up tracing

You must use one of Openlayer's SDKs to trace your system. After installing the SDK in your language of choice, follow these steps:


Set environment variables

Openlayer needs to know where to upload the traces. Provide this information via the following environment variables:
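For example, in a shell (the variable names below follow Openlayer's documented configuration; the values are placeholders you should replace with your own):

```shell
# API key from your Openlayer workspace settings
export OPENLAYER_API_KEY="your-openlayer-api-key"

# Name of the Openlayer project to upload traces to
export OPENLAYER_PROJECT_NAME="your-project-name"
```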


You can also set OPENLAYER_INFERENCE_PIPELINE_NAME or OPENLAYER_INFERENCE_PIPELINE_ID to specify which inference pipeline within the project to use. If neither is specified, traces are uploaded to the default inference pipeline, named production.
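For instance, to route traces to a hypothetical pipeline named "staging" instead of the default:

```shell
# Optional: target a specific inference pipeline ("staging" is an example name)
export OPENLAYER_INFERENCE_PIPELINE_NAME="staging"
```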


Annotate the code you want to trace

Annotate all the functions you want to trace with Openlayer’s SDK.

import openai

from openlayer import llm_monitors
from openlayer.tracing import tracer

# Wrap the OpenAI client with Openlayer's OpenAIMonitor so that chat
# completion calls made with it are captured as steps in the trace.
# Openlayer also has wrappers for other LLM providers.
openai_client = openai.OpenAI(api_key="sk-...")
llm_monitors.OpenAIMonitor(client=openai_client)

# Decorate all the functions you want to trace
@tracer.trace()
def main(user_query: str) -> str:
    context = retrieve_context(user_query)
    answer = generate_answer(user_query, context)
    return answer

@tracer.trace()
def retrieve_context(user_query: str) -> str:
    return "Some context"

@tracer.trace()
def generate_answer(user_query: str, context: str) -> str:
    result = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": user_query + " " + context}],
    )
    return result.choices[0].message.content

The traced generate_answer function in the example above uses an OpenAI LLM. However, tracing also works for other LLM providers. If you use any of the streamlined approaches described in the Publishing data guide, those calls get added to the trace as well.


Use the annotated code

All data that goes through the decorated code is automatically streamed to the Openlayer platform, where your tests and alerts are defined.

In the example above, if we call main:

main("What's the meaning of life?")

the resulting trace would be:

The main function has two nested steps: retrieve_context and generate_answer. The generate_answer step contains a chat completion call within it. The cost, number of tokens, latency, and other metadata are all computed automatically behind the scenes.