Releases: scaleapi/nucleus-python-client

Add async dataset deduplication support

06 May 20:09
3321deb

Summary

This release adds async deduplication support for Nucleus datasets.

Users can now start a background deduplication job with Dataset.deduplicate() or Dataset.deduplicate_by_ids(), then collect structured results via DeduplicationJob.result(). The result includes kept dataset item IDs, kept reference IDs, and deduplication stats such as threshold, original item count, and deduplicated item count.

The SDK now also validates completed deduplication job responses and fails clearly if the server returns a malformed result payload, avoiding misleading fallback stats.
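The validation behavior described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual implementation: the payload keys (`unique_item_ids`, `unique_reference_ids`, `stats`), the class names, and the exception type are all assumptions chosen for the example. The point is that a missing or mistyped field raises immediately instead of being replaced by a default value.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


class MalformedDeduplicationResult(ValueError):
    """Hypothetical error raised when a completed job's payload is malformed."""


@dataclass(frozen=True)
class DedupStats:
    threshold: int
    original_count: int
    deduplicated_count: int


@dataclass(frozen=True)
class DedupResult:
    unique_item_ids: List[str]
    unique_reference_ids: List[str]
    stats: DedupStats


def _require(payload: Mapping[str, Any], key: str, expected_type: type) -> Any:
    """Return payload[key], failing loudly if it is absent or mistyped."""
    if key not in payload:
        raise MalformedDeduplicationResult(f"missing field: {key!r}")
    value = payload[key]
    # bool is a subclass of int, so reject it explicitly for integer fields.
    if isinstance(value, bool) or not isinstance(value, expected_type):
        raise MalformedDeduplicationResult(
            f"field {key!r} should be {expected_type.__name__}, "
            f"got {type(value).__name__}"
        )
    return value


def parse_dedup_payload(payload: Mapping[str, Any]) -> DedupResult:
    """Build a DedupResult from a raw payload without inventing fallback stats."""
    stats_raw = _require(payload, "stats", dict)
    stats = DedupStats(
        threshold=_require(stats_raw, "threshold", int),
        original_count=_require(stats_raw, "original_count", int),
        deduplicated_count=_require(stats_raw, "deduplicated_count", int),
    )
    return DedupResult(
        unique_item_ids=_require(payload, "unique_item_ids", list),
        unique_reference_ids=_require(payload, "unique_reference_ids", list),
        stats=stats,
    )
```

Raising on a bad payload keeps a server-side problem visible to the caller, whereas silently substituting zero counts would look like a successful run that found nothing.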

Enforce mutually exclusive auth connections

16 Apr 19:48
93cb518

Changed

  • api_key and limited_access_key are now mutually exclusive in NucleusClient. Passing both (or setting NUCLEUS_API_KEY while also passing limited_access_key) raises a ValueError.
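The rule above can be illustrated with a small standalone sketch. The helper name `resolve_credentials`, its return shape, and the error raised when no credential is supplied are hypothetical and are not taken from the SDK; only the parameter names, the NUCLEUS_API_KEY variable, and the ValueError on conflicting credentials come from the note above.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    api_key: Optional[str] = None,
    limited_access_key: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (kind, value) for exactly one credential, or raise ValueError."""
    env = os.environ if environ is None else environ
    env_api_key = env.get("NUCLEUS_API_KEY")

    if limited_access_key is not None:
        # A limited access key conflicts with an API key from either source.
        if api_key is not None or env_api_key:
            raise ValueError(
                "api_key and limited_access_key are mutually exclusive; "
                "pass only one (and unset NUCLEUS_API_KEY when using "
                "limited_access_key)."
            )
        return ("limited_access_key", limited_access_key)

    # An explicit argument takes precedence over the environment variable.
    effective = api_key if api_key is not None else env_api_key
    if not effective:
        # Assumption: the real client may signal this case differently.
        raise ValueError("No credential supplied.")
    return ("api_key", effective)
```

Accepting `environ` as a parameter is only a convenience for testing the sketch without touching the real process environment.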

Fixed

  • Docstring improvements across NucleusClient: fixed copy-paste errors (get_job, get_slice, delete_slice), removed phantom stats_only parameter from list_jobs, corrected make_request parameter name, and restructured create_launch_model/create_launch_model_from_dir docs for proper rendering.
  • Suppressed Sphinx warnings from inherited pydantic BaseModel methods by removing inherited-members from autoapi options.

Fix Sphinx builds and prune deprecated packages (Latest)

06 Mar 16:48
400dfd8

Removed the deprecated pkg_resources package and replaced it with importlib-metadata. The pkg_resources package was causing two main problems:

  1. It prevented the new auto-generated SDK docs from building successfully.
  2. Whenever the SDK raised an error, users also saw this confusing warning: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30.

Additionally, this update resolves ~79 errors and warnings in the Sphinx auto-generated docs build.
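For readers making the same migration away from pkg_resources, a version lookup can be sketched as below. This uses the standard-library `importlib.metadata` module (Python 3.8+), which offers the same `version` API as the importlib-metadata backport named above. The helper name `installed_version` and the fallback string are hypothetical, and this is not necessarily how the SDK performs its own lookup.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str, default: str = "unknown") -> str:
    """Return the installed version of a distribution, or a default if absent."""
    try:
        # Replaces pkg_resources.get_distribution(distribution).version
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return default


if __name__ == "__main__":
    # "scale-nucleus" is assumed here to be the SDK's distribution name.
    print(installed_version("scale-nucleus"))
```

Importing `importlib.metadata` does not emit the setuptools deprecation warning quoted above.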

v0.17.12

02 Mar 20:09
878ca05

Add dataset-scoped image deduplication support.

Supports deduplication on image and video datasets for an entire dataset, selected reference IDs, or selected dataset item IDs.

Example usage:

import nucleus

client = nucleus.NucleusClient(api_key="<API_KEY>")
dataset = client.get_dataset("ds_...")

# Deduplicate entire dataset
result = dataset.deduplicate(threshold=10)

# Deduplicate specific items by reference IDs
result = dataset.deduplicate(threshold=10, reference_ids=["ref_1", "ref_2", "ref_3"])

# Deduplicate by internal item IDs (more efficient if you have them)
result = dataset.deduplicate_by_ids(threshold=10, dataset_item_ids=["item_1", "item_2"])

# Access results
print(f"Threshold: {result.stats.threshold}")
print(f"Original: {result.stats.original_count}, Unique: {result.stats.deduplicated_count}")
print(result.unique_reference_ids)

v0.17.11

17 Nov 15:43
671f475

Added support for limited access keys, which can be used alongside or in place of api_key for NucleusClient auth.

Example usage:

import nucleus

c = nucleus.NucleusClient(limited_access_key="<LIMITED_ACCESS_KEY>")
c = nucleus.NucleusClient(api_key="<API_KEY>", limited_access_key="<LIMITED_ACCESS_KEY>")
c = nucleus.NucleusClient(api_key="<API_KEY>")

v0.17.9

13 Mar 16:49
eb389ce

Added export_class_labels for slices and datasets.

Example usage:

dataset = client.get_dataset(DATASET_ID)
class_labels = dataset.export_class_labels()
slice = dataset.get_slices()[0]
class_labels = slice.export_class_labels()

v0.17.5

15 Apr 16:36
f5d8b2d

Added

  • Method for uploading lidar semantic segmentation predictions, via dataset.upload_lidar_semseg_predictions

Example usage:

dataset = client.get_dataset("ds_...")
model = client.get_model("prj_...")
pointcloud_ref_id = 'pc_ref_1'
predictions_s3 = "s3://temp/predictions.json"

dataset.upload_lidar_semseg_predictions(model, pointcloud_ref_id, predictions_s3)

v0.17.3

29 Feb 16:26
4139951

Added

  • Added the environment variable S3_ENDPOINT to accommodate nonstandard S3 endpoint URLs when requesting presigned URLs
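One way an endpoint override like this is commonly wired up is sketched below. The helper name `s3_client_kwargs` is hypothetical, and whether the SDK reads the variable in this way (or uses boto3 at all) is an assumption. The sketch only shows the general pattern: use the override when it is set, otherwise fall back to the client library's default endpoint.

```python
import os
from typing import Any, Dict, Mapping, Optional


def s3_client_kwargs(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for an S3 client, honoring S3_ENDPOINT if set."""
    env = os.environ if environ is None else environ
    endpoint = env.get("S3_ENDPOINT", "").strip()
    # With no override, return no endpoint_url so the library default applies.
    return {"endpoint_url": endpoint} if endpoint else {}


# Assumed usage with boto3, which accepts an endpoint_url argument:
#   s3 = boto3.client("s3", **s3_client_kwargs())
```

For example, with S3_ENDPOINT set to the URL of an S3-compatible store, the client built this way would sign presigned URLs against that host.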

v0.17.2

29 Feb 10:14
db9d5f2

Modified

In Dataset.create_slice, the reference_ids parameter is now optional. If it is left unspecified, an empty slice is created.
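The two call styles can be sketched as follows. To keep the example runnable without network access, it uses a stand-in class, `FakeDataset`, which is hypothetical; its return value is illustrative and does not reflect the SDK's Slice object. The `name` keyword is assumed to match the real method's first parameter.

```python
from typing import Any, Dict, List, Optional, Tuple


class FakeDataset:
    """Stand-in for a Nucleus dataset so the sketch runs offline."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, List[str]]] = []

    def create_slice(
        self, name: str, reference_ids: Optional[List[str]] = None
    ) -> Dict[str, Any]:
        # Mirrors the behavior described above: no reference_ids, empty slice.
        ids = list(reference_ids) if reference_ids is not None else []
        self.calls.append((name, ids))
        return {"name": name, "reference_ids": ids}


def create_review_slices(dataset: Any, flagged_ids: List[str]):
    """Create an empty placeholder slice and a populated one."""
    empty = dataset.create_slice(name="to-review")
    populated = dataset.create_slice(name="flagged", reference_ids=flagged_ids)
    return empty, populated


if __name__ == "__main__":
    print(create_review_slices(FakeDataset(), ["ref_1", "ref_2"]))
```

With the real client, `dataset` would be the object returned by client.get_dataset.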

v0.17.0

07 Feb 10:24
05435fc

Added

  • Added dataset.add_items_from_dir
  • Added pytest-xdist for test parallelization

Fixes

  • Fixed the test test_models.test_remove_invalid_tag_from_model