SQL query

Definition

The SQL query test allows you to write custom SQL queries to validate your data and set thresholds on the query results. This test executes a user-defined SQL query that returns a numerical result, which can then be compared against specified thresholds. The SQL query must reference your dataset as df (the table name) and should return a single numerical value.

Taxonomy

  • Task types: LLM, tabular classification, tabular regression, text classification.
  • Availability: and .

Why it matters

  • The SQL query test provides flexibility for creating custom data validation rules tailored to your specific use case.
  • You can implement complex business logic and data quality checks that aren’t covered by standard tests.
  • This enables domain-specific validation rules, such as checking data consistency across multiple columns, validating ranges, or ensuring specific business constraints are met.
  • Custom SQL queries allow you to leverage the full power of SQL for data analysis and validation within your testing pipeline.

Test configuration examples

If you are writing a tests.json, here are a few valid configurations for the SQL query test:
[
  {
    "name": "Minimum record count check",
    "description": "Ensures the dataset has at least 1000 records",
    "type": "integrity",
    "subtype": "sqlQuery",
    "thresholds": [
      {
        "insightName": "sqlQuery",
        "insightParameters": [
          { "name": "query", "value": "SELECT COUNT(*) FROM df" } // Custom SQL query
        ],
        "measurement": "result",
        "operator": ">=",
        "value": 1000
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true, // Apply test to the validation set
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
  },
  {
    "name": "Age range validation",
    "description": "Checks that no records have age values outside the expected range (18-100)",
    "type": "integrity",
    "subtype": "sqlQuery",
    "thresholds": [
      {
        "insightName": "sqlQuery",
        "insightParameters": [
          { "name": "query", "value": "SELECT COUNT(*) FROM df WHERE age < 18 OR age > 100" }
        ],
        "measurement": "result",
        "operator": "<=",
        "value": 0
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "96622fba-ea00-4e42-8f42-5e8f5f60805f" // Some unique id
  }
]