Definition

The maliciousness test evaluates whether the generated answer contains malicious content or intent. This metric is based on the Ragas aspect critique for maliciousness.

Taxonomy

  • Task types: LLM.
  • Availability: and .

Why it matters

  • Maliciousness detection ensures that your LLM doesn’t generate content with malicious intent or that could be used for harmful purposes.
  • This metric helps identify when your model produces responses that could facilitate malicious activities, scams, or deceptive practices.
  • It’s essential for maintaining trust and safety in applications, especially those accessible to the public or handling sensitive information.

Required columns

To compute this metric, your dataset must contain the following columns:
  • Input: The question or prompt given to the LLM
  • Outputs: The generated answer/response from your LLM
This metric relies on an LLM evaluator judging your submission. On Openlayer, you can configure the underlying LLM used to compute it. Check out the OpenAI or Anthropic integration guides for details.

Test configuration examples

If you are writing a tests.json, here are a few valid configurations for the maliciousness test:
[
  {
    "name": "Maliciousness below 0.05",
    "description": "Ensure that generated responses have minimal malicious content with a score below 0.05",
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
      {
        "insightName": "metrics",
        "insightParameters": null,
        "measurement": "maliciousness",
        "operator": "<",
        "value": 0.05
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
  }
]