
# Databricks

> Connect your Databricks tables to Openlayer for data quality monitoring

<img width="700" style={{ borderRadius: "0.5rem" }} src="https://mintcdn.com/openlayer-44/cC8k7Pv7ov5BbNmV/images/integrations/databricks_hero.png?fit=max&auto=format&n=cC8k7Pv7ov5BbNmV&q=85&s=3bc3f307cc4467bfdf23c95ac6a8ca24" alt="Databricks hero" data-path="images/integrations/databricks_hero.png" />

Openlayer integrates with [Databricks](https://www.databricks.com/) so you
can run data quality tests directly on your Databricks tables.

The integration authenticates with a Databricks **personal access token (PAT)**.
Access is token-based and auditable, and no username or password is required.

## Prerequisites

To follow this guide, you need:

* A Databricks account and workspace with [SQL warehouses](https://docs.databricks.com/aws/en/compute/sql-warehouse/) enabled
* Permissions to create and use a **personal access token (PAT)**
* A table in Databricks you want to monitor (with timestamp and unique ID columns recommended)
* An [Openlayer project](/workspace-and-projects/creating-and-loading-projects) with monitoring mode enabled

## Setup Guide

### Step 1: Generate a personal access token

In your Databricks workspace:

1. Go to **User Settings → Developer → Access Tokens**.
2. Click **Generate new token**.
3. Copy the PAT and store it securely. Databricks displays it only once, and you will need it when connecting Openlayer.

See [Databricks documentation](https://docs.databricks.com/en/dev-tools/auth/pat.html) for details.

### Step 2: Collect connection details

You will need:

* **Hostname**: your workspace URL (e.g. `https://dbc-247310bd-93fc.cloud.databricks.com`)
* **Port**: typically `443`
* **SQL Warehouse endpoint**: the warehouse's HTTP path, e.g. `/sql/1.0/warehouses/<warehouse-id>` (shown under the warehouse's **Connection details** tab in Databricks)
* **Personal access token (PAT)**: generated in step 1
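
As a sanity check before entering these values, the sketch below normalizes them using only the Python standard library. The function name `normalize_connection_details`, the returned dictionary keys, and the warehouse ID `abc123` are hypothetical, and the path pattern is an assumption based on the example format above; Openlayer may validate these fields differently.

```python
import re
from urllib.parse import urlparse

# Assumed shape of the SQL warehouse path, based on the example format above.
WAREHOUSE_PATH = re.compile(r"^/sql/1\.0/warehouses/[A-Za-z0-9]+$")


def normalize_connection_details(workspace_url, port, warehouse_path):
    """Return cleaned connection details, raising ValueError on obvious mistakes."""
    url = workspace_url.strip()
    # urlparse only fills in the host when a scheme is present.
    parsed = urlparse(url if "://" in url else "https://" + url)
    if not parsed.hostname:
        raise ValueError(f"could not find a host in {workspace_url!r}")

    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")

    path = warehouse_path.strip()
    if not WAREHOUSE_PATH.match(path):
        raise ValueError(f"unexpected warehouse path: {path!r}")

    return {
        "host": parsed.hostname,
        "url": f"https://{parsed.hostname}",
        "port": port,
        "warehouse_path": path,
        "warehouse_id": path.rsplit("/", 1)[-1],
    }


# "abc123" is a placeholder warehouse ID, not a real one.
details = normalize_connection_details(
    "https://dbc-247310bd-93fc.cloud.databricks.com/", 443, "/sql/1.0/warehouses/abc123"
)
print(details["host"], details["port"], details["warehouse_id"])
```

A typo such as a missing `/sql/1.0/` prefix then surfaces as a `ValueError` before you reach the connection form.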

### Step 3: Connect inside Openlayer

In your Openlayer workspace:

1. Go to **Data sources** and select **Databricks**.
2. Click **Connect**.
3. Fill in the fields:
   * Hostname: your workspace hostname (e.g. `https://dbc-247310bd-93fc.cloud.databricks.com`)
   * Port: usually `443`
   * SQL Warehouse endpoint: path to your warehouse
   * Personal access token: PAT you generated
   * Name: a descriptive label for this connection

### Step 4: Configure your table

After the connection is created, select the table to monitor:

* Catalog: Databricks catalog containing the table
* Schema: schema containing the table
* Table: table name (e.g. `workspace.openlayer_demo.landing_inferences`, where `workspace` is the catalog and `openlayer_demo` is the schema)
* Timestamp column: column used to order/filter rows (e.g. `timestamp`)
* Unique ID column: column identifying unique rows (e.g. `inference_id`)
* Data source name: a descriptive label in Openlayer
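
Before selecting a table, it can help to confirm that the timestamp and unique ID columns are populated and that the IDs are in fact unique. The sketch below builds a summary query you could run in the Databricks SQL editor. The helper names `quote_identifier` and `build_column_check_query` are hypothetical, and the identifiers are the examples used above.

```python
def quote_identifier(name):
    """Backtick-quote one identifier part, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_column_check_query(catalog, schema, table, timestamp_col, id_col):
    """Build a query that summarizes the timestamp and unique ID columns."""
    full_name = ".".join(quote_identifier(p) for p in (catalog, schema, table))
    ts = quote_identifier(timestamp_col)
    uid = quote_identifier(id_col)
    return (
        "SELECT\n"
        "  COUNT(*) AS total_rows,\n"
        # COUNT(column) skips NULLs, so the difference is the NULL count.
        f"  COUNT(*) - COUNT({ts}) AS null_timestamps,\n"
        f"  COUNT(*) - COUNT({uid}) AS null_ids,\n"
        # Non-NULL IDs minus distinct IDs gives the number of surplus rows.
        f"  COUNT({uid}) - COUNT(DISTINCT {uid}) AS duplicate_ids,\n"
        f"  MIN({ts}) AS earliest_row,\n"
        f"  MAX({ts}) AS latest_row\n"
        f"FROM {full_name}"
    )


query = build_column_check_query(
    "workspace", "openlayer_demo", "landing_inferences", "timestamp", "inference_id"
)
print(query)
```

If `null_timestamps`, `null_ids`, or `duplicate_ids` come back non-zero, the chosen columns may not be good candidates for the timestamp or unique ID fields.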

#### Optional: ML-specific settings

If the table contains ML outputs, you can provide additional context:

* Class names
* Feature names
* Categorical feature names
* Predictions column

This enables Openlayer to run ML-aware tests such as drift detection and performance monitoring.

## Troubleshooting

* **Authentication errors** → verify that your PAT is valid and not expired.
* **Connection errors** → confirm the hostname, port, and SQL warehouse endpoint are correct.
* **Empty results** → check that the timestamp column is populated and you’ve selected the correct table.
* **Permission errors** → ensure your PAT user has access to the warehouse and the target tables.
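
To narrow down which of these applies, one option is to call the Databricks REST API directly with the same hostname, warehouse ID, and PAT. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the mapping from HTTP status codes to causes is an assumption that you should confirm against the error body Databricks returns.

```python
import json
import urllib.error
import urllib.request


def build_warehouse_request(host, warehouse_id, token):
    """Build (but do not send) an authenticated GET request for one SQL warehouse."""
    url = f"https://{host}/api/2.0/sql/warehouses/{warehouse_id}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def check_warehouse(host, warehouse_id, token, timeout=10):
    """Send the request and return a short, human-readable diagnosis."""
    request = build_warehouse_request(host, warehouse_id, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
            return f"reachable, warehouse state: {body.get('state', 'unknown')}"
    except urllib.error.HTTPError as err:
        # Assumed mapping from status code to likely cause.
        if err.code in (401, 403):
            return "token rejected or lacking permission"
        if err.code == 404:
            return "warehouse not found; check the warehouse ID"
        return f"unexpected HTTP status {err.code}"
    except urllib.error.URLError as err:
        # Raised for DNS failures, refused connections, and timeouts.
        return f"could not reach host: {err.reason}"
```

Calling `check_warehouse("<workspace-host>", "<warehouse-id>", "<your-PAT>")` with your own values returns one of the messages above. Pass the bare host without the `https://` prefix, since the function adds the scheme itself.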
