It is important that your model is validated on a sufficient amount of unseen data. If the validation set is too small compared to the training set, it may not adequately represent the variety of data the model will encounter in the real world, and overfitting can go undetected.
A sufficient size ratio also helps ensure the statistical significance of the validation results.
If you are writing a tests.json, here is a valid configuration for the size ratio test:
```json
[
  {
    "name": "Size ratio between validation and training datasets of at least 0.2",
    "description": "Asserts that the size of the validation dataset is at least 20% of the size of the training dataset",
    "type": "consistency",
    "subtype": "sizeRatio",
    "thresholds": [
      {
        "insightName": "sizeRatio",
        "insightParameters": null,
        "measurement": "sizeRatio",
        "operator": ">=",
        "value": 0.2
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": true,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
  }
]
```
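For intuition, the check this configuration expresses can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation; the dataset sizes are hypothetical examples.

```python
# Hypothetical dataset sizes for illustration.
n_training = 5000    # rows in the training dataset
n_validation = 1250  # rows in the validation dataset

# The "sizeRatio" measurement: validation size relative to training size.
size_ratio = n_validation / n_training

# The threshold from the config: operator ">=" with value 0.2.
threshold = 0.2
passes = size_ratio >= threshold

print(f"sizeRatio = {size_ratio:.2f} -> {'pass' if passes else 'fail'}")
```

With these example sizes the ratio is 0.25, so the test passes; shrinking the validation set below 1000 rows would drop the ratio under 0.2 and fail it.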