# Special characters ratio

> Learn how to use the special characters ratio test

## Definition

The special characters ratio test lets you set a threshold on the ratio of rows that contain
**only** special characters to rows that contain alphanumeric characters, for
a given column.
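
As a rough illustration of the definition above, the following Python sketch computes the ratio for a list of column values. It assumes that a "special character" is any character that is neither alphanumeric nor whitespace, and that empty rows count toward neither group. The function name `special_characters_ratio` is hypothetical, and this is not necessarily how Openlayer computes the measurement.

```python
def special_characters_ratio(values):
    """Ratio of special-character-only rows to rows containing alphanumerics.

    Assumption: empty or whitespace-only rows are skipped entirely.
    """
    special_only = 0
    with_alnum = 0
    for value in values:
        text = str(value).strip()
        if not text:
            continue  # assumption: empty rows belong to neither group
        if any(ch.isalnum() for ch in text):
            with_alnum += 1
        else:
            special_only += 1
    if with_alnum == 0:
        # No alphanumeric rows: the ratio is undefined, so report infinity
        # when special-only rows exist and 0.0 when there is no data at all.
        return float("inf") if special_only else 0.0
    return special_only / with_alnum


outputs = ["Hello world", "???", "42", "a-b", "ok"]
ratio = special_characters_ratio(outputs)
# One special-only row against four alphanumeric rows gives 0.25,
# so a "< 0.01" threshold would fail on this sample.
print(ratio, ratio < 0.01)
```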

## Taxonomy

* **Task types**: LLM, text classification.
* **Availability**: <Tooltip tip="Continuously evaluate your models and datasets as you iterate on their versions.">development</Tooltip>
  and <Tooltip tip="Monitor a model in production, measure its health, check for drifts and set up alerts.">monitoring</Tooltip>.

## Why it matters

* Often, entries that only contain special characters are a sign of data quality issues.
* Knowing how many rows contain only special characters helps you design models that are robust to such anomalies. If your model is expected to encounter similar data in production, you might want to train it with some level of noise tolerance.

## Test configuration examples

If you are writing a `tests.json`, here are a few valid configurations for the special characters ratio test:

<CodeGroup>
  ```json Development theme={null}
  [
    {
      "name": "Less than 1% of outputs with only special characters",
      "description": "Asserts that the percentage of rows with only special characters is less than 1%",
      "type": "integrity",
      "subtype": "specialCharactersRatio",
      "thresholds": [
        {
          "insightName": "specialCharacters",
          "insightParameters": [
            { "name": "column_name", "value": "output" } // Selects the column `output`
          ],
          "measurement": "specialCharactersRatio",
          "operator": "<",
          "value": 0.01
        }
      ],
      "subpopulationFilters": null,
      "mode": "development",
      "usesValidationDataset": true, // Apply test to the validation set
      "usesTrainingDataset": false,
      "usesMlModel": false,
      "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
    }
  ]
  ```

  ```json Monitoring theme={null}
  [
    {
      "name": "Less than 1% of outputs with only special characters",
      "description": "Asserts that the percentage of rows with only special characters is less than 1%",
      "type": "integrity",
      "subtype": "specialCharactersRatio",
      "thresholds": [
        {
          "insightName": "specialCharacters",
          "insightParameters": [
            { "name": "column_name", "value": "output" } // Selects the column `output`
          ],
          "measurement": "specialCharactersRatio",
          "operator": "<",
          "value": 0.01
        }
      ],
      "subpopulationFilters": null,
      "mode": "monitoring",
      "usesProductionData": true,
      "evaluationWindow": 3600, // 1 hour
      "delayWindow": 0,
      "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
    }
  ]
  ```
</CodeGroup>
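
If you would rather generate the configuration than write it by hand, a minimal Python sketch follows. It reuses only the fields shown in the development example above. The helper name `build_special_characters_test` and its parameters are hypothetical, and whether your workflow needs additional fields is not covered here. Standard JSON has no comment syntax, so the generated output omits the inline `//` annotations.

```python
import json
import uuid


def build_special_characters_test(column_name="output", max_ratio=0.01):
    """Build one development-mode test entry mirroring the example above."""
    return {
        "name": f"Less than {max_ratio:.0%} of {column_name} rows with only special characters",
        "description": (
            "Asserts that the percentage of rows with only special characters "
            f"is less than {max_ratio:.0%}"
        ),
        "type": "integrity",
        "subtype": "specialCharactersRatio",
        "thresholds": [
            {
                "insightName": "specialCharacters",
                "insightParameters": [
                    {"name": "column_name", "value": column_name}
                ],
                "measurement": "specialCharactersRatio",
                "operator": "<",
                "value": max_ratio,
            }
        ],
        "subpopulationFilters": None,
        "mode": "development",
        "usesValidationDataset": True,
        "usesTrainingDataset": False,
        "usesMlModel": False,
        "syncId": str(uuid.uuid4()),  # fresh unique id for each generated test
    }


# Serialize a one-test list; write this string to tests.json as needed.
tests_json = json.dumps([build_special_characters_test()], indent=2)
print(tests_json)
```

Generating the `syncId` with `uuid.uuid4()` avoids accidentally reusing the placeholder id from the examples across different tests.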

## Related

* [Ill-formed rows test](/tests/integrity/ill-formed-count).
