Library built by SEA.AI to help measure and improve the performance of AI projects.
Install
pip install git+https://github.com/SEA-AI/seametrics
To test a specific branch:
pip install git+https://github.com/SEA-AI/seametrics@branch-name
To install additional dependencies:
pip install "seametrics[fiftyone] @ git+https://github.com/SEA-AI/seametrics"
For more information about the optional dependencies, have a look at the [project.optional-dependencies] section of the pyproject.toml.
Hugging Face
Have a look at our Hugging Face organisation to browse through the available metrics.
PrecisionRecallF1Support
A modified cocoeval.py wrapped inside torchmetrics' mAP metric, but operating on numpy arrays instead of torch tensors.
import numpy as np
from seametrics.detection import PrecisionRecallF1Support
predictions = [
    {
        "boxes": np.array(
            [
                [449.3, 197.75390625, 6.25, 7.03125],
                [334.3, 181.58203125, 11.5625, 6.85546875],
            ]
        ),
        "labels": np.array([0, 0]),
        "scores": np.array([0.153076171875, 0.72314453125]),
    }
]
ground_truth = [
    {
        "boxes": np.array(
            [
                [449.3, 197.75390625, 6.25, 7.03125],
                [334.3, 181.58203125, 11.5625, 6.85546875],
            ]
        ),
        "labels": np.array([0, 0]),
        "area": np.array([132.2, 83.8]),
    }
]
metric = PrecisionRecallF1Support() # default settings
metric.update(preds=predictions, target=ground_truth)
metric.compute()['metrics']
Will output:
{'all': {'range': [0, 10000000000.0],
         'iouThr': '0.50',
         'maxDets': 100,
         'tp': 0,
         'fp': 2,
         'fn': 2,
         'duplicates': 0,
         'precision': 0.0,
         'recall': 0.0,
         'f1': 0,
         'support': 2,
         'fpi': 0,
         'nImgs': 1}}
Where:
- all is the area range label
- range is the area range
- iouThr is the IoU threshold in string format
- maxDets is the maximum number of detections
- tp, fp, fn are the true positives, false positives and false negatives
- duplicates is the number of duplicates; a duplicate is a prediction that matches an already matched ground truth
- precision, recall, f1 are ... well, the precision, recall and F1 score
- support is the number of ground truth boxes
- fpi is the false positive index
- nImgs is the number of images
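As a rough illustration of how the count fields relate to the score fields, here is a minimal sketch using the textbook definitions of precision, recall and F1. The helper name `precision_recall_f1` is hypothetical and not part of seametrics. Returning 0.0 when a denominator is zero, and deriving support as tp + fn, are assumptions chosen to match the example output above; the library may handle these cases differently.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> dict:
    """Textbook precision/recall/F1 from raw counts (hypothetical helper).

    Assumption: a score is reported as 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    # Assumption: every ground truth box is either matched (tp) or missed (fn).
    support = tp + fn
    return {"precision": precision, "recall": recall, "f1": f1, "support": support}


# Counts taken from the example output above: tp=0, fp=2, fn=2.
print(precision_recall_f1(tp=0, fp=2, fn=2))
# {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'support': 2}
```

With these counts, both predictions count as false positives and both ground truth boxes as misses, so all three scores are zero while support stays at 2.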
Tracking Metrics
TrackingMetrics wraps motmetrics to compute standard MOT scores (MOTA, MOTP, IDF1, …). HOTAMetrics implements HOTA (Higher Order Tracking Accuracy), which jointly evaluates detection and association quality.
Both classes share the same interface and can be evaluated together in a single dataset pass using compute_all_metrics_by_sequence.
import fiftyone as fo
from seametrics.tracking import TrackingMetrics, HOTAMetrics
from seametrics.tracking.utils import compute_all_metrics_by_sequence, results_to_df
dataset = fo.load_dataset("my_dataset")
view = dataset.load_saved_view("my_view")
results = compute_all_metrics_by_sequence(
    view=view,
    gt_field="ground_truth",
    pred_fields=["model_a", "model_b"],
    metrics=[
        (TrackingMetrics, {"max_iou": 0.5}),
        (HOTAMetrics, {}),
    ],
)
Returns a nested dict {pred_field: {metric_class_name: metric_instance}}. Convert any entry to a per-sequence DataFrame with results_to_df:
mot_df = results_to_df(results["model_a"]["TrackingMetrics"])
hota_df = results_to_df(results["model_a"]["HOTAMetrics"])
TrackingMetrics DataFrame columns: sequence, num_frames, num_unique_objects, mota, motp, idf1, idp, idr, mostly_tracked, partially_tracked, mostly_lost, num_switches, num_false_positives, num_misses, num_fragmentations, precision, recall.
HOTAMetrics DataFrame columns: sequence, hota, deta, assa, loca, num_unique_objects. Scores are expressed as percentages (0–100).
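For intuition about how the hota, deta and assa columns relate, the sketch below follows the general HOTA definition: at a single localization threshold, HOTA is the geometric mean of detection accuracy and association accuracy, and the reported score averages that per-threshold value over a range of thresholds. The function names and the toy numbers are hypothetical, and this is not a description of how HOTAMetrics is implemented internally.

```python
import math


def hota_at_threshold(det_a: float, ass_a: float) -> float:
    """Geometric mean of detection and association accuracy at one threshold."""
    return math.sqrt(det_a * ass_a)


def mean(values: list) -> float:
    return sum(values) / len(values)


# Toy per-threshold values on a 0-100 scale (made up for illustration).
det_a_per_threshold = [64.0, 36.0]
ass_a_per_threshold = [25.0, 49.0]

# Combine DetA and AssA at each threshold, then average over thresholds.
hota_per_threshold = [
    hota_at_threshold(d, a)
    for d, a in zip(det_a_per_threshold, ass_a_per_threshold)
]
hota = mean(hota_per_threshold)

# Combining the already-averaged DetA and AssA gives a different number.
naive = hota_at_threshold(mean(det_a_per_threshold), mean(ass_a_per_threshold))

print(hota_per_threshold, hota, naive)
```

Because the averaging happens after the per-threshold combination, the hota column will generally not equal the square root of deta times assa computed from the same row.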
num_unique_objects is included in both DataFrames so you can compute a track-count-weighted global score:
weighted_hota = (
    (hota_df["hota"] * hota_df["num_unique_objects"]).sum()
    / hota_df["num_unique_objects"].sum()
)
Failed sequences (empty GT, empty predictions, or unexpected errors) are logged rather than raising, and are accessible via metric_instance.failed_sequences.
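To see why the track-count weighting matters, the sketch below repeats the same arithmetic on plain Python records, such as those obtained from a DataFrame via `to_dict("records")`. The sequence names and scores are made up for illustration, and `weighted_mean` is a hypothetical helper, not part of seametrics.

```python
# Toy per-sequence rows mirroring the HOTAMetrics columns (values are invented).
rows = [
    {"sequence": "seq_a", "hota": 80.0, "num_unique_objects": 1},
    {"sequence": "seq_b", "hota": 40.0, "num_unique_objects": 9},
]


def weighted_mean(rows: list, value_key: str, weight_key: str) -> float:
    """Mean of rows[value_key], weighted by rows[weight_key]."""
    total_weight = sum(r[weight_key] for r in rows)
    if total_weight == 0:
        raise ValueError("weights sum to zero; no sequences with objects")
    return sum(r[value_key] * r[weight_key] for r in rows) / total_weight


# A plain mean treats every sequence equally, regardless of track count.
unweighted = sum(r["hota"] for r in rows) / len(rows)
# The weighted mean lets the nine-track sequence dominate the one-track one.
weighted = weighted_mean(rows, "hota", "num_unique_objects")

print(unweighted)  # 60.0
print(weighted)    # 44.0
```

A sequence with a single, easy track would otherwise pull the global score up as much as a crowded sequence, which is the effect the weighting corrects for.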
