Evaluation and delay windows are part of every monitoring test. In this guide, we explain their use and highlight their differences.

Evaluation windows

Evaluating Openlayer tests always requires a dataset. In development, the training and validation sets are the natural choices; in monitoring, the dataset used by tests is defined by an evaluation window.

What is the evaluation window?

The evaluation window defines the period over which data is accumulated to form the dataset used to evaluate a test.

Therefore, every time a monitoring test is created, you are asked to provide an evaluation window, which can range from 1 hour to 4 weeks, with a default of 1 hour.

Monitoring tests are then evaluated at a regular cadence using the data published within their evaluation windows. For example, if the evaluation window is 72 hours, then every 72 hours Openlayer computes the metric of interest, and the test status can change.
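To make the cadence concrete, here is a minimal Python sketch, purely illustrative rather than Openlayer's implementation, that computes the span of data used by one evaluation of a test with a 72-hour evaluation window:

```python
from datetime import datetime, timedelta, timezone

# Illustrative only: each evaluation of the test looks back over the
# previous 72 hours of published production data.
EVALUATION_WINDOW = timedelta(hours=72)

def window_bounds(evaluation_time: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) of the evaluation window ending at evaluation_time."""
    return evaluation_time - EVALUATION_WINDOW, evaluation_time

now = datetime.now(timezone.utc)
start, end = window_bounds(now)
print(f"Data published between {start} and {end} is used for this evaluation.")
```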

Let’s look at a concrete example to clarify the process:

  1. Imagine we want to monitor null values in the feature Age for our production data. We would navigate to the test creation page and select the Missing values test.
  2. As part of the test creation flow, we are asked for an evaluation window. Let’s say we choose 24 hours.
  3. After the test is successfully created, the platform starts accumulating the production data being published.
  4. Then, once the first 24 hours pass, it uses the data accumulated over those 24 hours, computes the number of missing values, and updates the test status (see the sketch below).
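Here is a hedged sketch of what step 4 amounts to, assuming a pandas DataFrame of production rows with a `timestamp` column and an `Age` feature. The column names and the 5% threshold are assumptions for illustration, not Openlayer's actual logic:

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Hypothetical sketch: compute the missing-value rate on the feature "Age"
# over the production rows published in the last 24 hours.
EVALUATION_WINDOW = timedelta(hours=24)
THRESHOLD = 0.05  # assumed example: fail if more than 5% of values are missing

def evaluate_missing_values(production_df: pd.DataFrame, now: datetime) -> str:
    """Return 'passing' or 'failing' based on nulls in Age over the window."""
    window_start = now - EVALUATION_WINDOW
    in_window = production_df[production_df["timestamp"] >= window_start]
    if in_window.empty:
        return "no data in window"
    missing_rate = in_window["Age"].isna().mean()
    return "passing" if missing_rate <= THRESHOLD else "failing"

# Example usage with a tiny in-memory dataset:
now = datetime.now(timezone.utc)
df = pd.DataFrame({
    "timestamp": [now - timedelta(hours=2), now - timedelta(hours=30)],
    "Age": [34, None],
})
print(evaluate_missing_values(df, now))  # only the first row is in the window -> "passing"
```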

Each test can have its own evaluation window. This is important because each monitored value has its own characteristics. For instance, longer evaluation windows can smooth out seasonal data, while shorter windows can be appropriate when quick reactions to sudden changes are needed.

Delay windows

We have seen that evaluation windows determine the data accumulated for tests. Now, let's explore delay windows.

What is the delay window?

The delay window defines the gap between the test evaluation time and the end of the evaluation window.

Delay windows default to 0, because most tests can be evaluated at a regular cadence defined solely by the evaluation window. However, in some cases a delay is also needed. For example, performance tests usually require ground truths to be computed, and the labels are often not available at the same time as the data is published to the platform; the delay window accounts for this gap.
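The following sketch, again an illustration rather than Openlayer's implementation, shows how a non-zero delay window shifts the evaluation window back from the evaluation time, leaving room for ground-truth labels to arrive. The 48-hour delay is an assumed value:

```python
from datetime import datetime, timedelta, timezone

# Illustrative only: with a non-zero delay window, the evaluation window
# ends `DELAY_WINDOW` before the evaluation time, so labels published late
# can still be included when a performance metric is computed.
EVALUATION_WINDOW = timedelta(hours=24)
DELAY_WINDOW = timedelta(hours=48)  # assumed: labels take roughly 2 days to arrive

def delayed_window_bounds(evaluation_time: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) of the evaluation window, shifted back by the delay."""
    end = evaluation_time - DELAY_WINDOW
    start = end - EVALUATION_WINDOW
    return start, end

now = datetime.now(timezone.utc)
start, end = delayed_window_bounds(now)
print(f"Performance metrics are computed on data published between {start} and {end}.")
```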