Duplicate rows
Definition
The duplicate rows test checks whether the dataset contains rows that are identical to one another.
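As a rough illustration of what "identical rows" means, the sketch below counts exact repeats in a small in-memory dataset. It is not the platform's implementation: the helper `find_duplicate_rows` and the sample rows are hypothetical, and it assumes two rows are duplicates only when every column value matches exactly.

```python
from collections import Counter


def find_duplicate_rows(rows):
    """Return a dict mapping each repeated row (as a tuple) to its count.

    Hypothetical helper: a row counts as a duplicate only if all of its
    column values match another row exactly.
    """
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Made-up sample data: the first and third rows are identical.
rows = [
    ("alice", 34, "yes"),
    ("bob", 51, "no"),
    ("alice", 34, "yes"),
    ("carol", 29, "no"),
]

duplicates = find_duplicate_rows(rows)
# Number of redundant rows, i.e. copies beyond the first occurrence.
num_extra = sum(n - 1 for n in duplicates.values())
print(duplicates, num_extra)
```

For tabular data already held in a pandas DataFrame, `DataFrame.duplicated()` offers a comparable check.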
Taxonomy
- Category: Integrity.
- Task types: LLM, tabular classification, tabular regression, text classification.
- Availability: and .
Why it matters
- Duplicate rows in the training set can lead the model to overfit to the duplicated examples.
- Duplicate rows in the validation set can distort the aggregate metrics, making them overly optimistic or pessimistic, because each repeated row is counted more than once.
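The effect on aggregate metrics can be seen in a small worked example. The sketch below uses made-up validation records of the form `(row_id, label, prediction)` and a hypothetical `accuracy` helper; it compares accuracy computed with and without the repeated rows.

```python
def accuracy(records):
    """Fraction of records whose prediction equals the label."""
    correct = sum(1 for _, label, pred in records if label == pred)
    return correct / len(records)


# Made-up validation set: row "c" is correctly predicted and appears three times.
validation = [
    ("a", 1, 1),
    ("b", 0, 1),
    ("c", 1, 1),
    ("c", 1, 1),
    ("c", 1, 1),
]

# dict.fromkeys keeps the first occurrence of each record, in order.
deduplicated = list(dict.fromkeys(validation))

acc_with_duplicates = accuracy(validation)   # 4 correct out of 5
acc_deduplicated = accuracy(deduplicated)    # 2 correct out of 3
print(acc_with_duplicates, acc_deduplicated)
```

Here the duplicated row happens to be one the model gets right, so the metric looks better than it should. Had the repeated row been a misclassified one, the same distortion would push accuracy down instead.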