Openlayer integrates with Databricks so you can run data quality tests directly on your Databricks tables. The integration uses a personal access token (PAT) tied to a secure Databricks connection, which gives you auditable, token-based access without requiring usernames or passwords.

Prerequisites

To follow this guide, you need:
  • A Databricks account and workspace with SQL warehouses enabled
  • Permissions to create and use a personal access token (PAT)
  • A Databricks table you want to monitor (a timestamp column and a unique ID column are recommended)
  • An Openlayer project with monitoring mode enabled

Setup Guide

Step 1: Generate a personal access token

In your Databricks workspace:
  1. Go to User Settings → Developer → Access Tokens.
  2. Click Generate new token.
  3. Copy and store the PAT securely; you will provide it when connecting Openlayer to Databricks in step 3.
See the Databricks documentation on personal access tokens for details.
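
If you prefer to script this step, the Databricks SDK for Python exposes the same token-creation operation. A minimal sketch, assuming the databricks-sdk package is installed and your environment is already authenticated to the workspace; the comment and lifetime values are illustrative:

```python
# pip install databricks-sdk
from databricks.sdk import WorkspaceClient

# Picks up your existing Databricks authentication (e.g. ~/.databrickscfg or env vars).
w = WorkspaceClient()

# Create a PAT for the Openlayer integration; the lifetime here is illustrative.
token = w.tokens.create(
    comment="openlayer-integration",
    lifetime_seconds=90 * 24 * 60 * 60,  # 90 days
)

# The token value is only shown once -- store it securely.
print(token.token_value)
```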

Step 2: Collect connection details

You will need:
  • Hostname: your workspace URL (e.g. https://dbc-247310bd-93fc.cloud.databricks.com)
  • Port: typically 443
  • SQL Warehouse endpoint: path to the warehouse, e.g. /sql/1.0/warehouses/<warehouse-id>
  • Personal access token (PAT): generated in step 1
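
Before entering these values in Openlayer, you can optionally sanity-check them from your own machine with the databricks-sql-connector Python package. A minimal sketch, assuming the PAT is exported as the environment variable DATABRICKS_TOKEN; the hostname and warehouse path are the placeholders from above:

```python
# pip install databricks-sql-connector
import os

from databricks import sql

connection = sql.connect(
    server_hostname="dbc-247310bd-93fc.cloud.databricks.com",  # hostname, without https://
    http_path="/sql/1.0/warehouses/<warehouse-id>",            # SQL warehouse endpoint
    access_token=os.environ["DATABRICKS_TOKEN"],               # PAT from step 1
)

with connection.cursor() as cursor:
    cursor.execute("SELECT 1")
    print(cursor.fetchone())  # (1,) confirms the warehouse is reachable

connection.close()
```

If this succeeds, the same hostname, port, endpoint, and token will work in Openlayer.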

Step 3: Connect inside Openlayer

In your Openlayer workspace:
  1. Go to Data sources and select Databricks.
  2. Click Connect.
  3. Fill in the fields:
  • Hostname: your workspace hostname (e.g. https://dbc-247310bd-93fc.cloud.databricks.com)
  • Port: usually 443
  • SQL Warehouse endpoint: path to your warehouse
  • Personal access token: PAT you generated
  • Name: a descriptive label for this connection

Step 4: Configure your table

After the connection is created, select the table to monitor:
  • Catalog: Databricks catalog containing the table
  • Schema: schema containing the table
  • Table: table name (e.g. workspace.openlayer_demo.landing_inferences)
  • Timestamp column: column used to order/filter rows (e.g. timestamp)
  • Unique ID column: column identifying unique rows (e.g. inference_id)
  • Data source name: a descriptive label in Openlayer
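
Once the table is selected, it can help to confirm that the timestamp and unique ID columns are actually populated, since Openlayer uses them to order and deduplicate rows. A sketch reusing the connection from step 2; the table and column names match the examples above and should be replaced with your own:

```python
# Reuses the `connection` object from the step 2 sketch.
# Table and column names below match the examples in this guide; swap in your own.
query = """
    SELECT
        COUNT(*)                      AS row_count,
        COUNT(DISTINCT inference_id)  AS distinct_ids,  -- unique ID column
        MIN(`timestamp`)              AS oldest_row,    -- timestamp column
        MAX(`timestamp`)              AS newest_row
    FROM workspace.openlayer_demo.landing_inferences
"""

with connection.cursor() as cursor:
    cursor.execute(query)
    print(cursor.fetchone())
```

A row_count of zero, or distinct_ids lower than row_count, usually explains empty or surprising results later on.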

Optional: ML-specific settings

If the table contains ML outputs, you can provide additional context:
  • Class names
  • Feature names
  • Categorical feature names
  • Predictions column
This enables Openlayer to run ML-aware tests such as drift detection and performance monitoring.
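
As a concrete, hypothetical example, a binary churn classifier logged to the table above might map to these settings. The values are entered in the Openlayer UI fields listed above; they are shown here as Python only for illustration:

```python
# Hypothetical values for a binary churn classifier; these correspond to the
# Openlayer UI fields listed above rather than to any code-level API.
ml_settings = {
    "class_names": ["retained", "churned"],
    "feature_names": ["age", "plan", "monthly_spend", "support_tickets"],
    "categorical_feature_names": ["plan"],
    "predictions_column": "prediction",
}
```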

Troubleshooting

  • Authentication errors → verify that your PAT is valid and not expired.
  • Connection errors → confirm the hostname, port, and SQL warehouse endpoint are correct.
  • Empty results → check that the timestamp column is populated and you’ve selected the correct table.
  • Permission errors → ensure your PAT user has access to the warehouse and the target tables.
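
A quick way to tell these cases apart is to attempt a connection outside of Openlayer and read the error message directly. A minimal sketch using the same placeholder values as the step 2 check:

```python
import os

from databricks import sql

try:
    with sql.connect(
        server_hostname="dbc-247310bd-93fc.cloud.databricks.com",  # no https:// prefix
        http_path="/sql/1.0/warehouses/<warehouse-id>",
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT CURRENT_TIMESTAMP()")
            print("Connection OK:", cursor.fetchone())
except Exception as err:
    # The connector's error message typically indicates whether the token,
    # hostname, or warehouse path is at fault.
    print("Connection failed:", err)
```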