Definition
The F1 score test measures the harmonic mean of precision and recall, calculated as:

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

Taxonomy
- Task types: Tabular classification, text classification.
- Availability:
Why it matters
- F1 score provides a balanced measure that considers both precision and recall, making it ideal when you need to balance false positives and false negatives.
- It’s particularly useful for imbalanced datasets, where accuracy alone can be misleading (see the sketch after this list).
- Higher F1 scores indicate better model performance, with 1.0 representing perfect precision and recall.
- F1 score is especially valuable when the cost of false positives and false negatives is roughly equal.
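As a quick illustration of the imbalanced-dataset point above, here is a minimal sketch using scikit-learn (an assumption; this page does not prescribe a particular library). Accuracy looks healthy on a skewed dataset while the F1 score exposes the failure:

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy imbalanced dataset: 9 negatives, 1 positive.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# A degenerate model that always predicts the majority class.
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(accuracy_score(y_true, y_pred))  # 0.9 -- looks strong
# With no predicted positives, precision is undefined; scikit-learn
# warns and reports an F1 of 0.0, revealing the model finds no positives.
print(f1_score(y_true, y_pred))        # 0.0
```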
Required columns
To compute this metric, your dataset must contain the following columns:
- Predictions: The predicted class labels from your classification model.
- Ground truths: The actual/true class labels.
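For example, a dataset with these two columns might look like the hypothetical sketch below (the column names are illustrative; use whatever names your project maps to predictions and ground truths):

```python
import pandas as pd

# Hypothetical column names -- map them to your own schema.
dataset = pd.DataFrame({
    "predictions":   ["spam", "ham", "spam", "ham"],
    "ground_truths": ["spam", "ham", "ham",  "ham"],
})
```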
Test configuration examples
If you are writing a tests.json file, here are a few valid configurations for the F1 score test:
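Below is a minimal sketch of what one such configuration could look like, assuming a schema in which each test entry names the metric being checked and a passing threshold. The field names and values (`name`, `metric`, `threshold`) are illustrative assumptions, not a confirmed tests.json schema, and JSON does not permit inline comments to mark them as such:

```json
[
  {
    "name": "F1 score above threshold",
    "metric": "f1",
    "threshold": 0.8
  }
]
```

Consult the tests.json reference for the exact fields your version expects; the threshold of 0.8 here is an arbitrary placeholder.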
Related
- Precision test - Measure positive prediction accuracy.
- Recall test - Measure ability to find all positive instances.
- Accuracy test - Overall classification correctness.
- Geometric mean test - Alternative balanced metric.
- Aggregate metrics - Overview of all available metrics.