Openlayer integrates with Databricks so you can run data quality tests directly on your Databricks tables.
The integration authenticates with a Databricks personal access token (PAT) over a secure Databricks
connection. This gives you auditable, token-based access without requiring usernames
or passwords.
Prerequisites
To follow this guide, you need:
A Databricks account and workspace with SQL warehouses enabled
Permissions to create and use a personal access token (PAT)
A table in Databricks you want to monitor (with timestamp and unique ID columns recommended)
An Openlayer project with monitoring mode enabled
Setup Guide
Step 1: Generate a personal access token
In your Databricks workspace:
Go to User Settings → Developer → Access Tokens.
Click Generate new token.
Copy the PAT and store it securely. You will provide it when connecting Openlayer.
See Databricks documentation for details.
Step 2: Collect connection details
You will need:
Hostname: your workspace URL (e.g. https://dbc-247310bd-93fc.cloud.databricks.com)
Port: typically 443
SQL Warehouse endpoint: path to the warehouse, e.g. /sql/1.0/warehouses/<warehouse-id>
Personal access token (PAT): generated in step 1
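Before pasting these values into Openlayer, it can help to sanity-check them locally. The following is a minimal sketch, not part of Openlayer or Databricks: the `DatabricksConnection` class and `build_connection` function are hypothetical names, and the warehouse ID pattern (alphanumeric characters only) is an assumption.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Assumption: the warehouse ID is a plain alphanumeric string.
WAREHOUSE_PATH = re.compile(r"^/sql/1\.0/warehouses/[A-Za-z0-9]+$")


@dataclass(frozen=True)
class DatabricksConnection:
    """Hypothetical container mirroring the four fields listed above."""
    hostname: str
    port: int
    warehouse_endpoint: str
    token: str


def build_connection(hostname, warehouse_endpoint, token, port=443):
    """Normalize and validate the connection details; raise ValueError if malformed."""
    hostname = hostname.strip()
    # Accept the value with or without a scheme, then keep only the host part.
    parsed = urlparse(hostname if "://" in hostname else f"https://{hostname}")
    if not parsed.hostname:
        raise ValueError(f"cannot parse hostname from {hostname!r}")
    if not WAREHOUSE_PATH.match(warehouse_endpoint):
        raise ValueError("endpoint should look like /sql/1.0/warehouses/<warehouse-id>")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    if not token:
        raise ValueError("personal access token is empty")
    return DatabricksConnection(
        hostname=f"https://{parsed.hostname}",
        port=port,
        warehouse_endpoint=warehouse_endpoint,
        token=token,
    )
```

In practice you would read the token from an environment variable or a secret manager rather than hard-coding it in a script.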
Step 3: Connect inside Openlayer
In your Openlayer workspace:
Go to Data sources and select Databricks.
Click Connect.
Fill in the fields:
Hostname: your workspace hostname (e.g. https://dbc-247310bd-93fc.cloud.databricks.com)
Port: usually 443
SQL Warehouse endpoint: path to your warehouse
Personal access token: PAT you generated
Name: a descriptive label for this connection
After the connection is created, select the table to monitor:
Catalog: Databricks catalog containing the table
Schema: schema containing the table
Table: table name (e.g. workspace.openlayer_demo.landing_inferences)
Timestamp column: column used to order/filter rows (e.g. timestamp)
Unique ID column: column identifying unique rows (e.g. inference_id)
Data source name: a descriptive label in Openlayer
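Because monitoring relies on the timestamp and unique ID columns, you may want to confirm they are populated before selecting the table. The sketch below builds a Databricks SQL query you could run in the SQL editor. The `build_sanity_query` helper is hypothetical and is not something Openlayer provides; it assumes plain identifiers made of letters, digits, and underscores.

```python
import re

# Assumption: identifiers are simple names; anything else is rejected
# rather than escaped, to keep the sketch short and safe.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote(name):
    """Wrap an identifier in backticks, the Databricks SQL quoting style."""
    if not IDENTIFIER.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def build_sanity_query(catalog, schema, table, timestamp_column, id_column):
    """Return a query summarizing timestamp coverage and ID uniqueness."""
    full_name = ".".join(quote(part) for part in (catalog, schema, table))
    ts = quote(timestamp_column)
    uid = quote(id_column)
    return (
        "SELECT COUNT(*) AS total_rows, "
        f"COUNT({ts}) AS rows_with_timestamp, "    # COUNT(col) skips NULLs
        f"COUNT(DISTINCT {uid}) AS distinct_ids, "  # also skips NULLs
        f"MIN({ts}) AS earliest, MAX({ts}) AS latest "
        f"FROM {full_name}"
    )


print(build_sanity_query(
    "workspace", "openlayer_demo", "landing_inferences",
    "timestamp", "inference_id",
))
```

If `rows_with_timestamp` is lower than `total_rows`, some rows have no timestamp; if `distinct_ids` is lower than `total_rows`, the ID column contains duplicates or NULLs.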
Optional: ML-specific settings
If the table contains ML outputs, you can provide additional context:
Class names
Feature names
Categorical feature names
Predictions column
This enables Openlayer to run ML-aware tests such as drift detection and performance monitoring.
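A quick consistency check on these settings can catch typos before you save the data source. The sketch below is illustrative only: `check_ml_settings` is a hypothetical helper, and it assumes that feature names and the predictions column refer to table columns and that categorical feature names are a subset of the feature names. The column names in the usage example are invented.

```python
def check_ml_settings(table_columns, feature_names, categorical_feature_names,
                      predictions_column, class_names):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    columns = set(table_columns)

    # Assumption: every feature corresponds to a column in the table.
    for name in feature_names:
        if name not in columns:
            problems.append(f"feature '{name}' is not a table column")

    # Assumption: categorical features are a subset of the features.
    for name in categorical_feature_names:
        if name not in feature_names:
            problems.append(f"categorical feature '{name}' is not in feature names")

    if predictions_column not in columns:
        problems.append(f"predictions column '{predictions_column}' is not a table column")

    if len(set(class_names)) != len(class_names):
        problems.append("class names contain duplicates")

    return problems


# Hypothetical table layout, for illustration only.
columns = ["inference_id", "timestamp", "age", "country", "prediction"]
issues = check_ml_settings(
    columns,
    feature_names=["age", "country"],
    categorical_feature_names=["country"],
    predictions_column="prediction",
    class_names=["churn", "retain"],
)
print(issues or "settings look consistent")
```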
Troubleshooting
Authentication errors → verify that your PAT is valid and not expired.
Connection errors → confirm the hostname, port, and SQL warehouse endpoint are correct.
Empty results → check that the timestamp column is populated and you’ve selected the correct table.
Permission errors → ensure your PAT user has access to the warehouse and the target tables.