Column contains string

Definition

Let A be a dataset column containing strings, and let B be a column in the same dataset containing lists of strings. The column contains string test asserts, row by row, that the list of strings in B contains the string in A. For example:

A	B	Result
"a"	["a", "b", "c"]	✓ Passed
"b"	["a", "b", "c"]	✓ Passed
"c"	["a", "b", "c"]	✓ Passed
"d"	["a", "b", "c"]	✗ Failed

Since "d" is not in the list ["a", "b", "c"], the test fails for that row.
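The per-row check can be sketched in Python as shown below. This is a minimal illustration, not the platform's implementation. It assumes that column A holds one string per row, that column B holds one list of strings per row, and that matching is exact and case-sensitive, which the definition above does not specify. The function name `column_contains_string` is hypothetical.

```python
from typing import List, Sequence


def column_contains_string(
    column_a: Sequence[str], column_b: Sequence[Sequence[str]]
) -> List[bool]:
    """Return one flag per row: True if the string in A appears in the list in B."""
    if len(column_a) != len(column_b):
        raise ValueError("columns A and B must have the same number of rows")
    # Assumption: exact, case-sensitive membership test on each row
    return [a in b for a, b in zip(column_a, column_b)]


# The four rows from the table above
column_a = ["a", "b", "c", "d"]
column_b = [["a", "b", "c"]] * 4

for a, passed in zip(column_a, column_contains_string(column_a, column_b)):
    print(a, "Passed" if passed else "Failed")
```

Running it prints `Passed` for "a", "b" and "c" and `Failed` for "d", which matches the table.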

Taxonomy

Category: Integrity.
Task types: LLM, tabular classification, tabular regression, text classification.
Availability: and .

Why it matters

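This test is particularly useful for retrieval-augmented generation (RAG) LLM projects, where the context retriever returns a list of the top K contexts for each query. If the correct context is in column A and the retrieved contexts are in column B, the column contains string test can check that the correct context appears among the retriever's top K results.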
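For a RAG dataset, the same check produces a list of failing rows, meaning the queries whose retrieved contexts miss the correct one. The sketch below assumes each row is a dictionary with the `correct_context` and `top_k_contexts` columns used in the configuration examples. The sample rows and the `failing_rows` helper are made up for illustration.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, List[str]]]


def failing_rows(rows: List[Row]) -> List[int]:
    """Return the indices of rows whose top-K contexts miss the correct context."""
    return [
        i
        for i, row in enumerate(rows)
        if row["correct_context"] not in row["top_k_contexts"]
    ]


# Hypothetical retriever output (K = 2) for three queries
rows: List[Row] = [
    {"correct_context": "ctx-1", "top_k_contexts": ["ctx-1", "ctx-7"]},
    {"correct_context": "ctx-2", "top_k_contexts": ["ctx-5", "ctx-2"]},
    {"correct_context": "ctx-3", "top_k_contexts": ["ctx-4", "ctx-9"]},
]

print(failing_rows(rows))  # [2]
```

Only the third query fails, because "ctx-3" is not among its two retrieved contexts.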

Test configuration examples

If you are writing a tests.json, here are a few valid configurations for the column contains string test:

[
  {
    "name": "Values in 'correct_context' should be in 'top_k_contexts' for every row",
    "description": "Asserts that the list of strings in 'top_k_contexts' contains the string in 'correct_context' on a per-row basis.",
    "type": "integrity",
    "subtype": "expectColumnAToBeInColumnB",
    "thresholds": [
      {
        "insightName": "expectColumnAToBeInColumnB",
        "insightParameters": [
          {
            "name": "column_a_name",
            "value": "correct_context" // Selects column A (`correct_context`)
          },
          {
            "name": "column_b_name",
            "value": "top_k_contexts" // Selects column B (`top_k_contexts`)
          }
        ],
        "measurement": "failingRowCount",  // Use the absolute row count
        "operator": "<=",
        "value": 0
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true, // Apply test to the validation set
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
  },
  {
    "name": "Values in 'correct_context' should be in 'top_k_contexts' for more than 80% of the rows",
    "description": "Asserts that the list of strings in 'top_k_contexts' contains the string in 'correct_context' on a per-row basis.",
    "type": "integrity",
    "subtype": "expectColumnAToBeInColumnB",
    "thresholds": [
      {
        "insightName": "expectColumnAToBeInColumnB",
        "insightParameters": [
          {
            "name": "column_a_name",
            "value": "correct_context" // Selects column A (`correct_context`)
          },
          {
            "name": "column_b_name",
            "value": "top_k_contexts" // Selects column B (`top_k_contexts`)
          }
        ],
        "measurement": "failingRowPercentage", // Use the row percentage
        "operator": "<",
        "value": 0.2
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true, // Apply test to the validation set
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "96622fba-ea00-4e42-8f42-5e8f5f60805f" // Some unique id
  }
]
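The sketch below evaluates the same hypothetical per-row results against both thresholds to show how they differ. `failingRowCount <= 0` passes only when no row fails. `failingRowPercentage < 0.2` tolerates some failing rows. The sketch assumes the percentage is a fraction between 0 and 1, as the value `0.2` in the second example suggests. The helpers `measure` and `threshold_passes` are illustrative and are not the platform's implementation.

```python
import operator
from typing import Callable, Dict, List

# Map the operator strings used in tests.json to Python comparisons
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def measure(row_passed: List[bool], measurement: str) -> float:
    """Compute a threshold's measurement from per-row pass/fail flags."""
    failing = sum(1 for passed in row_passed if not passed)
    if measurement == "failingRowCount":
        return failing
    if measurement == "failingRowPercentage":
        # Assumption: expressed as a fraction of rows (0.2 == 20%)
        return failing / len(row_passed) if row_passed else 0.0
    raise ValueError(f"unsupported measurement: {measurement}")


def threshold_passes(row_passed: List[bool], measurement: str, op: str, value: float) -> bool:
    """Apply the threshold's operator to the measured value and the threshold value."""
    return OPERATORS[op](measure(row_passed, measurement), value)


# 10 rows, 1 of which fails
results = [True] * 9 + [False]
print(threshold_passes(results, "failingRowCount", "<=", 0))        # False
print(threshold_passes(results, "failingRowPercentage", "<", 0.2))  # True
```

With one failing row out of ten, the first configuration fails and the second passes, because 10% of rows failing is below the 20% limit.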

Related

Great Expectations test.
