To use Openlayer’s monitoring mode, you must set up a way to publish the requests your AI system receives to the Openlayer platform. This guide covers how to use Openlayer’s SDKs to publish data to the Openlayer platform. Alternatively, you can use Openlayer’s REST API, as discussed in the Monitoring overview.

Data publishing methods

The data publishing methods fall into two categories: streamlined approaches and the manual approach. Streamlined approaches exist for common AI patterns and frameworks. To use them, you wrap or decorate your code in a certain way, and Openlayer automatically captures relevant data and metadata, such as the number of tokens, cost, and latency. This data is then published to the Openlayer platform. The manual approach is system-agnostic: it is equivalent to calling the relevant endpoint of Openlayer’s REST API, but via Openlayer’s SDKs.

Streamlined approaches

There is a streamlined approach for each of the frameworks below:

OpenAI

To monitor chat completions and completion calls to OpenAI LLMs, you need to:
# 1. Set the environment variables
import os
import openai

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"
os.environ["OPENLAYER_API_KEY"] = "YOUR_OPENLAYER_API_KEY_HERE"
os.environ["OPENLAYER_INFERENCE_PIPELINE_ID"] = "YOUR_OPENLAYER_INFERENCE_PIPELINE_ID_HERE"

# 2. Import the `trace_openai` function and wrap the OpenAI client with it
from openlayer.lib import trace_openai

openai_client = trace_openai(openai.OpenAI())

# 3. From now on, every chat completion/completion call with
# the `openai_client` is traced and published to Openlayer. E.g.,
completion = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "How are you doing today?"},
    ]
)
That’s it! Now, your calls are being published to Openlayer, along with metadata, such as latency, number of tokens, cost estimate, and more. Refer to the OpenAI integration page for more details.

Azure OpenAI

To monitor chat completions and completion calls to Azure OpenAI LLMs, you need to:
# 1. Set the environment variables
import os

os.environ["AZURE_OPENAI_ENDPOINT"] = "YOUR_AZURE_OPENAI_ENDPOINT_HERE"
os.environ["AZURE_OPENAI_API_KEY"] = "YOUR_AZURE_OPENAI_API_KEY_HERE"
os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"] = "YOUR_AZURE_OPENAI_DEPLOYMENT_NAME_HERE"

os.environ["OPENLAYER_API_KEY"] = "YOUR_OPENLAYER_API_KEY_HERE"
os.environ["OPENLAYER_INFERENCE_PIPELINE_ID"] = "YOUR_OPENLAYER_INFERENCE_PIPELINE_ID_HERE"

# 2. Import the `trace_openai` function
from openai import AzureOpenAI
from openlayer.lib import trace_openai

azure_client = trace_openai(
    AzureOpenAI(
        api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
        api_version="2024-02-01",
        azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    )
)

# 3. From now on, every chat completion/completion call with
# the `azure_client` is traced by Openlayer. E.g.,
completion = azure_client.chat.completions.create(
    model=os.environ.get("AZURE_OPENAI_DEPLOYMENT_NAME"),
    messages=[
        {"role": "user", "content": "How are you doing today?"},
    ]
)
That’s it! Now, your calls are being published to Openlayer, along with metadata, such as latency, number of tokens, and more. Refer to the Azure OpenAI integration page for more details.

See full Python example

LangChain

To monitor chat models and chains built with LangChain, you need to:
# 1. Set the environment variables
import os

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"
os.environ["OPENLAYER_API_KEY"] = "YOUR_OPENLAYER_API_KEY_HERE"
os.environ["OPENLAYER_INFERENCE_PIPELINE_ID"] = "YOUR_OPENLAYER_INFERENCE_PIPELINE_ID_HERE"

# 2. Instantiate the `OpenlayerHandler`
from openlayer.lib.integrations import langchain_callback

openlayer_handler = langchain_callback.OpenlayerHandler()

# 3. Pass the handler to your LLM/chain invocations
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(max_tokens=25, callbacks=[openlayer_handler])
chat.invoke("What's the meaning of life?")

That’s it! Now, your calls are being published to Openlayer, along with metadata, such as latency, number of tokens, cost estimate, and more.
The code snippet above uses LangChain’s ChatOpenAI. However, the Openlayer Callback Handler works with all LangChain chat models and LLMs, as sketched below.
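For instance, here is a minimal sketch with a different chat model, assuming the langchain_anthropic package is installed, ANTHROPIC_API_KEY is set, and the model name shown is available to you:
from langchain_anthropic import ChatAnthropic

# Reuse the `openlayer_handler` instantiated above
chat = ChatAnthropic(model="claude-3-5-sonnet-20240620", callbacks=[openlayer_handler])
chat.invoke("What's the meaning of life?")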
Refer to the LangChain integration page for more details.

See full Python example

LiteLLM

To monitor completions across 100+ LLM APIs using LiteLLM’s unified interface, you need to:
# 1. Set the environment variables
import os
import litellm

os.environ["OPENLAYER_API_KEY"] = "YOUR_OPENLAYER_API_KEY_HERE"
os.environ["OPENLAYER_INFERENCE_PIPELINE_ID"] = "YOUR_OPENLAYER_INFERENCE_PIPELINE_ID_HERE"

# Set API keys for the providers you plan to use
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"
os.environ["ANTHROPIC_API_KEY"] = "YOUR_ANTHROPIC_API_KEY_HERE"
# ... other provider keys as needed

# 2. Import the `trace_litellm` function and enable tracing
from openlayer.lib import trace_litellm

trace_litellm()

# 3. From now on, every completion call with LiteLLM
# is traced and published to Openlayer. E.g.,

# OpenAI
response1 = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    inference_id="openai-example-1"
)

# Anthropic
response2 = litellm.completion(
    model="claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "Hello!"}],
    inference_id="anthropic-example-1"
)

# Cohere
response3 = litellm.completion(
    model="command-r",
    messages=[{"role": "user", "content": "Hello!"}],
    inference_id="cohere-example-1"
)
That’s it! Now, your LiteLLM completions across all supported providers are being published to Openlayer, along with metadata such as latency, number of tokens, cost estimate, and more. Refer to the LiteLLM integration page for more details.

See full Python example

Multi-step LLM systems

To trace a multi-step LLM system (such as a RAG system or an LLM chain), you just need to decorate all the functions you want to include in the trace with Openlayer’s decorator. For example:
import os
from openlayer.lib import trace, update_current_trace

# Set the environment variables
os.environ["OPENLAYER_API_KEY"] = "YOUR_OPENLAYER_API_KEY_HERE"
os.environ["OPENLAYER_INFERENCE_PIPELINE_ID"] = "YOUR_OPENLAYER_INFERENCE_PIPELINE_ID_HERE"

# Decorate all the functions you want to trace
@trace()
def main(user_query: str) -> str:
    context = retrieve_context(user_query)
    answer = generate_answer(user_query, context)
    return answer

@trace()
def retrieve_context(user_query: str) -> str:
    return "Some context"

@trace()
def generate_answer(user_query: str, context: str) -> str:
    return "Some answer"

# Every time the main function is called, the data is automatically
# streamed to your Openlayer project. E.g.:
main("What is the meaning of life?")
Dynamic Trace Updates: You can enhance your traces with metadata and custom inference IDs using the update_current_trace() and update_current_step() functions (see the sketch after this list). This enables:
  • Custom Inference IDs: Set custom IDs using update_current_trace(inferenceId="your_id") for request correlation and future data updates
  • Trace Metadata: Add context like update_current_trace(user_id="123", session="abc") for user tracking
  • Step Metadata: Add step-specific data using update_current_step(model="gpt-4", tokens=150) for detailed observability
Key Benefit: Custom inference IDs enable you to collect user feedback, ratings, and business signals after requests are completed. See the Tracing guide and Updating Data guide for comprehensive examples.
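As a minimal sketch of these calls inside a decorated function (the function name and all IDs and metadata values below are illustrative, and the import path for update_current_step is assumed to mirror update_current_trace):
from openlayer.lib import trace, update_current_trace, update_current_step

@trace()
def answer_query(user_query: str) -> str:
    # Attach a custom inference ID and user-level metadata to the whole trace
    update_current_trace(inferenceId="request-123", user_id="123", session="abc")
    # Attach step-specific metadata to the current step
    update_current_step(model="gpt-4", tokens=150)
    return "Some answer"

answer_query("What is the meaning of life?")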
You can use the decorator together with the other streamlined methods. For example, if your generate_answer function uses a wrapped version of the OpenAI client, the chat completion calls are added to the trace under the generate_answer function step, as sketched below.
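Here is a sketch of that combination, reusing the trace_openai wrapper from the OpenAI example above (the prompt formatting is illustrative):
import openai
from openlayer.lib import trace, trace_openai

openai_client = trace_openai(openai.OpenAI())

@trace()
def generate_answer(user_query: str, context: str) -> str:
    # This chat completion call is added to the trace as a child of the
    # `generate_answer` step
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {user_query}"}],
    )
    return completion.choices[0].message.content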

See full Python example

Anthropic

To monitor Anthropic LLMs, you need to:
# 1. Set the environment variables
import anthropic
import os

os.environ["ANTHROPIC_API_KEY"] = "YOUR_ANTHROPIC_API_KEY_HERE"

os.environ["OPENLAYER_API_KEY"] = "YOUR_OPENLAYER_API_KEY_HERE"
os.environ["OPENLAYER_INFERENCE_PIPELINE_ID"] = "YOUR_OPENLAYER_INFERENCE_PIPELINE_ID_HERE"

# 2. Import the `trace_anthropic` function
from openlayer.lib import trace_anthropic

anthropic_client = trace_anthropic(anthropic.Anthropic())

# 3. From now on, every message creation call with
# the `anthropic_client` is traced by Openlayer. E.g.,
completion = anthropic_client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "How are you doing today?"}
    ],
)
That’s it! Now, your calls are being published to Openlayer, along with metadata, such as latency, number of tokens, cost estimate, and more. Refer to the Anthropic integration page for more details.

See full Python example

Mistral AI

To monitor Mistral AI LLMs, you need to:
# 1. Set the environment variables
import os

os.environ["MISTRAL_API_KEY"] = "YOUR_MISTRAL_API_KEY_HERE"

os.environ["OPENLAYER_API_KEY"] = "YOUR_OPENLAYER_API_KEY_HERE"
os.environ["OPENLAYER_INFERENCE_PIPELINE_ID"] = "YOUR_OPENLAYER_INFERENCE_PIPELINE_ID_HERE"

# 2. Import the `trace_mistral` function and wrap the Mistral client
from mistralai import Mistral
from openlayer.lib import trace_mistral

mistral_client = trace_mistral(Mistral(api_key=os.environ["MISTRAL_API_KEY"]))

# 3. From now on, every chat completion or streaming call with
# the `mistral_client` is traced by Openlayer. E.g.,
completion = mistral_client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "user", "content": "What is the best French cheese?"},
    ]
)
That’s it! Now, your calls are being published to Openlayer, along with metadata, such as latency, number of tokens, cost estimate, and more. Refer to the Mistral AI integration page for more details.

See full Python example

Groq

To monitor Groq LLMs, you need to:
# 1. Set the environment variables
import os

os.environ["GROQ_API_KEY"] = "YOUR_GROQ_API_KEY_HERE"
os.environ["OPENLAYER_API_KEY"] = "YOUR_OPENLAYER_API_KEY_HERE"
os.environ["OPENLAYER_INFERENCE_PIPELINE_ID"] = "YOUR_OPENLAYER_INFERENCE_PIPELINE_ID_HERE"

# 2. Import the `trace_groq` function and wrap the Groq client
import groq
from openlayer.lib import trace_groq

groq_client = trace_groq(groq.Groq())

# 3. From now on, every chat completion call with
# the `groq_client` is traced by Openlayer. E.g.,
completion = groq_client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],
    model="llama3-8b-8192",
)
That’s it! Now, your calls are being published to Openlayer, along with metadata, such as latency, number of tokens, cost estimate, and more. Refer to the Groq integration page for more details.

See full Python example

OpenAI Assistants

To monitor runs from OpenAI Assistants, you need to:
  1. Set the environment variables:
import os
import openai

# OpenAI env variables
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"

# Openlayer env variables
os.environ["OPENLAYER_API_KEY"] = "YOUR_OPENLAYER_API_KEY_HERE"
os.environ["OPENLAYER_INFERENCE_PIPELINE_ID"] = "YOUR_OPENLAYER_INFERENCE_PIPELINE_ID_HERE"
  2. Instantiate the OpenAI client:
openai_client = openai.OpenAI()
  3. Create an assistant and a thread, then run it:
# Create the assistant
assistant = openai_client.beta.assistants.create(
    name="Data visualizer",
    description="You are great at creating and explaining beautiful data visualizations.",
    model="gpt-4",
    tools=[{"type": "code_interpreter"}],
)

# Create a thread
thread = openai_client.beta.threads.create(
    messages=[
        {
            "role": "user",
            "content": "Create a data visualization of the American GDP.",
        }
    ]
)

# Run assistant on thread
run = openai_client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

from openlayer.lib import trace_openai_assistant_thread_run
import time

# Keep polling the run results
while run.status != "completed":
    run = openai_client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

    # Trace the run with Openlayer's `trace_openai_assistant_thread_run`.
    # Once the run is complete, the thread is sent to Openlayer.
    trace_openai_assistant_thread_run(openai_client, run)

    time.sleep(5)
That’s it! Now, your calls are being published to Openlayer, along with metadata, such as latency, number of tokens, cost estimate, and more.

Manual approach

To manually stream data to Openlayer, you can use the stream method, which hits the /data-stream endpoint of the Openlayer REST API.
# Let's say we want to stream the following row, which represents a model prediction:
rows = [
    {
        "user_query": "what's the meaning of life?",
        "output": "42",
        "tokens": 7,
        "cost": 0.02,
        "timestamp": 1620000000,
    }
]

# Instantiate the Openlayer client
import os
from openlayer import Openlayer

client = Openlayer(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENLAYER_API_KEY"),
)

# Prepare the config for the data, which depends on your project's task type. In this
# case, we have an LLM project:
from openlayer.types.inference_pipelines import data_stream_params

config = data_stream_params.ConfigLlmData(
    input_variable_names=["user_query"],
    output_column_name="output",
    num_of_token_column_name="tokens",
    cost_column_name="cost",
    timestamp_column_name="timestamp",
    prompt=[{"role": "user", "content": "{{ user_query }}"}],
)

# Use the `stream` method
data_stream_response = client.inference_pipelines.data.stream(
    id="YOUR_INFERENCE_PIPELINE_ID",
    rows=rows,
    config=config,
)
