Definition

The semantic similarity test assesses how close two pieces of text are in meaning by embedding them and measuring their proximity in semantic space, rather than comparing their surface forms.

Taxonomy

  • Task types: LLM.
  • Availability: development and monitoring modes.

Why it matters

  • Semantic similarity captures the meaning-based relationship between generated and reference text, going beyond surface-level string matching.
  • This metric is particularly valuable when different phrasings can convey the same meaning, making it ideal for tasks like paraphrasing, summarization, or question answering.
  • It provides a more nuanced evaluation than exact matching by considering the conceptual similarity rather than just textual similarity.

Required columns

To compute this metric, your dataset must contain the following columns:
  • Outputs: The generated text from your LLM
  • Ground truths: The reference/expected text to compare against
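As an illustrative sketch (not necessarily the exact implementation used here), semantic similarity is typically computed by embedding the output and the ground truth into vectors with a sentence-embedding model and taking their cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# In practice these vectors come from a sentence-embedding model;
# the 3-dimensional vectors below are made up purely for illustration.
output_embedding = [0.2, 0.8, 0.1]
ground_truth_embedding = [0.25, 0.75, 0.15]

score = cosine_similarity(output_embedding, ground_truth_embedding)
print(round(score, 3))
```

A score near 1.0 indicates the two texts point in nearly the same direction in embedding space, even if their wording differs.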

Test configuration examples

If you are writing a tests.json, here are a few valid configurations for the semantic similarity test:
[
  {
    "name": "Mean semantic similarity above 0.8",
    "description": "Ensure that the mean semantic similarity score is above 0.8",
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
      {
        "insightName": "metrics",
        "insightParameters": null,
        "measurement": "meanSemanticSimilarity",
        "operator": ">",
        "value": 0.8
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
  }
]
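Conceptually, a threshold entry like the one above aggregates the per-row scores into a single measurement (here, the mean) and compares it against the value using the given operator. A minimal sketch of that check, with hypothetical per-row scores:

```python
import operator
import statistics

# Map of the comparison operators a threshold entry might use.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}

def evaluate_threshold(scores, op, value):
    """Aggregate per-row semantic similarity scores (mean) and compare
    the result against the threshold value."""
    mean_score = statistics.mean(scores)
    return OPERATORS[op](mean_score, value)

# Hypothetical per-row semantic similarity scores, for illustration only.
scores = [0.92, 0.85, 0.78, 0.88]
print(evaluate_threshold(scores, ">", 0.8))  # mean = 0.8575 -> True
```

The test passes when the aggregated measurement satisfies the operator/value pair, and fails otherwise.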