diff --git a/docs/rfc/0005-native-image-metadata.md b/docs/rfc/0005-native-image-metadata.md
index a05b658..02cfa0a 100644
--- a/docs/rfc/0005-native-image-metadata.md
+++ b/docs/rfc/0005-native-image-metadata.md
@@ -94,6 +94,11 @@ supported_formats() -> tuple[str, ...]
 * `Image.from_file`/`from_bytes` read embedded metadata back via `imagemeta.extract`
   (Pillow-free) and merge it (`update_from_usermetadata`).
 * `Image.embedded_metadata()` returns the embedded `Metadata` (or `None`).
+* **Sidecar fallback (same API):** formats **without** a native carrier (e.g. BMP)
+  are written via Pillow, and the metadata is stored in a lossless
+  `.meta.json` sidecar (same payload as the embedded form);
+  `from_file` merges an existing sidecar. `save(sidecar=True|False|None)` controls
+  the policy (always / never / only without a native carrier).
 
 ## 4. Design decisions
 
@@ -106,8 +111,11 @@ supported_formats() -> tuple[str, ...]
 * **Hash/identity.** Embedding changes the file bytes (and therefore their hash).
   Anyone who needs a stable content hash hashes **before** embedding, or hashes the raw
   pixels, analogous to the data-vs-metadata hash of `DataFrame` (RFC 0004).
-* **Sidecar remains complementary.** For formats without a handler (or deliberately
-  external metadata), the JSON-LD sidecar (`semantic.write_sidecar`) remains available.
+* **Sidecar as fallback and complement.** Formats without a native carrier are covered
+  uniformly by an automatic, lossless `.meta.json` sidecar
+  (same payload as the embedded form). For machine-readable Linked Data,
+  the JSON-LD sidecar (`semantic.write_sidecar`) remains available as well; both
+  share the same metadata model.
 
 ## 5. Tests / Coverage
 
@@ -128,7 +136,8 @@ supported_formats() -> tuple[str, ...]
 
 ## 7. Open points / future work
 
-* Further containers via the registry: **BMP** (no native carrier → sidecar),
-  **BigTIFF** (magic 43, 8-byte offsets).
+* Further **native** carriers via the registry, where a container offers one, e.g.
+  **BigTIFF** (magic 43, 8-byte offsets). Formats without a native carrier (BMP, …) are
+  already covered by the sidecar fallback.
 * Optional: WebP **VP8X+XMP** for strict interop; PNG **`zTXt`** (compressed) for
   very large payloads; JPEG **multi-segment APP1** beyond 64 KiB.
diff --git a/docs/usage/image-metadata.md b/docs/usage/image-metadata.md
index 07cf952..912fc19 100644
--- a/docs/usage/image-metadata.md
+++ b/docs/usage/image-metadata.md
@@ -10,6 +10,11 @@ The embedding layer [`sdata.imagemeta`][sdata.imagemeta] is **pure Python**:
 **no Pillow** to read or write the metadata. Pillow is only used to *decode* pixels
 (`img.pil` / `img.to_numpy`) or to *transcode* between formats on `save`.
 
+Any **other** Pillow-writable format without a native metadata container (e.g. BMP)
+is handled through the **same API**: `save` writes a lossless
+`.meta.json` sidecar and `from_file` reads it back, so metadata is never
+lost regardless of the container.
+
 | Format | Native carrier of the sdata payload        | Marker          |
 | ------ | ------------------------------------------ | --------------- |
 | PNG    | `iTXt` chunk before `IEND`                 | keyword `sdata` |
@@ -108,12 +113,28 @@ imagemeta.supported_formats() # ('png', 'jpeg', 'jp2', 'gif', 'webp', 'ti
 * **Extensible registry:** further containers (e.g. BMP, BigTIFF) plug in as two
   small functions plus one registry entry.
 
-## When to use a sidecar instead
+## Sidecars
+
+For a container **without** a native metadata slot, `save` automatically writes a
+lossless `.meta.json` sidecar (same payload as the embedded form), and
+`from_file` merges it back; the API is identical to the embedded case:
+
+```python
+img = Image.from_bytes("scan.bmp", bmp_bytes)
+img.metadata.add("station", "lab-3")
+img.save("scan.bmp")  # writes scan.bmp + scan.bmp.meta.json
+Image.from_file("scan.bmp").metadata.get("station").value  # 'lab-3'
+```
+
+The sidecar policy is controllable: `save(..., sidecar=True)` always writes one
+(in addition to embedding, e.g. for tooling that only reads sidecars),
+`sidecar=False` never does, and the default (`None`) writes one only when the format
+has no native container.
 
-For containers without a native handler, or when metadata must stay external (e.g.
-read-only originals), the JSON-LD **sidecar** remains the complement; see
-[Machine-readable metadata](metadata-jsonld.md). Both approaches share the same
-metadata model; embedding and a sidecar are not mutually exclusive.
+When metadata must stay external (read-only originals) or machine-readable as Linked
+Data, the JSON-LD **sidecar** remains the complement; see
+[Machine-readable metadata](metadata-jsonld.md). Embedding and sidecars share the
+same metadata model and are not mutually exclusive.
 
 The design and the per-format details are specified in
 [RFC 0005: Native image metadata](../rfc/0005-native-image-metadata.md).
diff --git a/sdata/sclass/image.py b/sdata/sclass/image.py
index 45c2f6e..ec4f4ef 100644
--- a/sdata/sclass/image.py
+++ b/sdata/sclass/image.py
@@ -3,10 +3,13 @@
 
 The image content is stored as blob content (``uri`` for files, ``bytes`` for
 in-memory data). sdata metadata is embedded **natively across formats** in the
-image file (PNG/JPEG/JP2/GIF/WebP) via :mod:`sdata.imagemeta`, which works
-without Pillow. Pillow is only used lazily to **decode/transcode** the
-pixels (:attr:`Image.pil`/:meth:`Image.to_numpy`/:meth:`Image.save` on a
-format change) and is optional (``pip install pillow``).
+image file (PNG/JPEG/JP2/GIF/WebP/TIFF) via :mod:`sdata.imagemeta`,
+which works without Pillow. Formats **without** a native metadata carrier (e.g. BMP)
+get a lossless ``.meta.json`` **sidecar**; the ``save``/
+``from_file`` API is identical for all formats. Pillow is only used lazily to
+**decode/transcode** the pixels (:attr:`Image.pil`/
+:meth:`Image.to_numpy`/:meth:`Image.save` on a format change) and is optional
+(``pip install pillow``).
 """
 import io
 import os
@@ -48,8 +51,9 @@ class Image(Blob):
     def from_file(cls, filepath, project=None, ns_name=None, **kwargs):
         """Create an Image referencing an image file (kept as ``uri`` content).
 
-        Any sdata metadata embedded in the file (PNG/JPEG/JP2/GIF/WebP) is read
-        back and merged, independent of Pillow.
+        Any sdata metadata is read back and merged: natively embedded
+        (PNG/JPEG/JP2/GIF/WebP/TIFF, Pillow-free) and/or from an adjacent
+        ``.meta.json`` sidecar (for formats without a native container).
 
         :param filepath: path to the image file.
         :param project: namespace for the deterministic SUUID (alias of ``ns_name``).
@@ -62,6 +66,7 @@ def from_file(cls, filepath, project=None, ns_name=None, **kwargs):
         img = cls(content_type="uri", value=filepath, filetype=suffix,
                   name=os.path.basename(filepath), suuid=suuid, **kwargs)
         img._load_embedded_metadata()
+        img._load_sidecar_metadata(filepath)
         return img
 
     @classmethod
@@ -125,31 +130,71 @@ def _load_embedded_metadata(self) -> None:
         if embedded is not None:
             self.metadata.update_from_usermetadata(embedded)
 
-    def save(self, filepath, **kwargs):
-        """Save the image to ``filepath``; sdata metadata is embedded natively.
+    @staticmethod
+    def sidecar_path(filepath) -> str:
+        """Path of the metadata sidecar for ``filepath`` (``.meta.json``)."""
+        return str(filepath) + ".meta.json"
 
-        The container is chosen from the file suffix. If the stored bytes already
-        use that container, the metadata is embedded **without** re-encoding
-        (lossless, Pillow-free); otherwise Pillow transcodes to the target format
-        first. Formats without a native handler are written via Pillow without
-        embedded metadata (a warning is logged).
+    def write_sidecar(self, filepath) -> str:
+        """Write the sdata metadata next to ``filepath`` as a lossless JSON sidecar.
+
+        The sidecar carries the **same** payload as the embedded form
+        (``metadata.to_json()``), so a round-trip is lossless regardless of whether
+        the format has a native metadata container.
+
+        :param filepath: the image path the sidecar belongs to.
+        :return: the sidecar path (``.meta.json``).
+        """
+        sidecar = self.sidecar_path(filepath)
+        with open(sidecar, "w", encoding="utf-8") as fh:
+            fh.write(self.metadata.to_json())
+        logger.info(f"Image metadata sidecar written to {sidecar}")
+        return sidecar
+
+    def _load_sidecar_metadata(self, filepath) -> None:
+        """Merge metadata from an adjacent ``.meta.json`` sidecar, if present."""
+        sidecar = self.sidecar_path(filepath)
+        if not os.path.exists(sidecar):
+            return
+        try:
+            with open(sidecar, "r", encoding="utf-8") as fh:
+                md = Metadata.from_json(fh.read())
+            self.metadata.update_from_usermetadata(md)
+            logger.debug(f"merged sidecar metadata from {sidecar}")
+        except Exception as exp:
+            logger.debug(f"sidecar metadata not loadable: {exp}")
+
+    def save(self, filepath, sidecar=None, **kwargs):
+        """Save the image to ``filepath`` with sdata metadata (one API for all formats).
+
+        The container is chosen from the file suffix. For a format with a native
+        metadata container (PNG/JPEG/JP2/GIF/WebP/TIFF) the metadata is **embedded**:
+        without re-encoding if the stored bytes already use that container (lossless,
+        Pillow-free), otherwise Pillow transcodes first. For **any other** format
+        (e.g. BMP) the image is written via Pillow and the metadata travels in a
+        lossless ``.meta.json`` **sidecar**, so metadata is never lost.
 
         :param filepath: destination path (its suffix selects the format).
+        :param sidecar: sidecar policy: ``None`` (default) writes a sidecar only when
+            the format has no native container; ``True`` always writes one (in addition
+            to embedding); ``False`` never writes one.
         :param kwargs: forwarded to ``PIL.Image.save`` when transcoding.
-        :raises ImportError: if Pillow is required (transcode / unsupported format)
-            but not installed.
+        :raises ImportError: if Pillow is required (transcode / non-native format) but
+            not installed.
         :return: the destination ``filepath``.
         """
         suffix = Path(filepath).suffix.lstrip(".").lower()
         target = self._SUFFIX_FORMATS.get(suffix)
         if target is None:
-            # No native handler: let Pillow write the file, without embedding.
+            # No native metadata carrier → Pillow writes the image + lossless sidecar.
             if PIL is None:
                 raise ImportError("Pillow is required for Image.save (pip install pillow).")
             self.pil.save(filepath, **kwargs)
-            logger.warning("Image.save: no native sdata-metadata handler for "
-                           "'%s'; saved without embedded metadata", filepath)
+            if sidecar is not False:
+                self.write_sidecar(filepath)
+                logger.info("Image.save: '%s' has no native metadata container; "
+                            "metadata written to a sidecar", filepath)
             return filepath
 
         pil_format, meta_fmt = target
@@ -167,5 +212,7 @@ def save(self, filepath, **kwargs):
         img_bytes = imagemeta.embed(img_bytes, self.metadata.to_json(), meta_fmt)
         with open(filepath, "wb") as fh:
             fh.write(img_bytes)
+        if sidecar is True:
+            self.write_sidecar(filepath)
         logger.info(f"Image saved to {filepath}")
         return filepath
diff --git a/tests/test_image.py b/tests/test_image.py
index f1f08e7..c6ee704 100644
--- a/tests/test_image.py
+++ b/tests/test_image.py
@@ -88,6 +88,43 @@ def test_image_metadata_roundtrip_all_formats(tmp_path, ext, pil_format, kwargs)
     assert reloaded.embedded_metadata() is not None
 
 
+def test_image_sidecar_fallback_for_bmp(tmp_path):
+    """Formats without a native carrier (BMP) get a lossless sidecar."""
+    import io
+    import os
+    import PIL.Image
+
+    buf = io.BytesIO()
+    PIL.Image.new("RGB", (5, 4), (7, 8, 9)).save(buf, "BMP")
+    img = Image.from_bytes("pic.bmp", buf.getvalue())
+    img.metadata.add("station", "lab-3")
+
+    out = str(tmp_path / "out.bmp")
+    img.save(out)  # same API; writes a sidecar
+    assert os.path.exists(out + ".meta.json")  # sidecar sits next to the file
+
+    reloaded = Image.from_file(out)
+    assert reloaded.metadata.get("station").value == "lab-3"
+    assert reloaded.embedded_metadata() is None  # BMP carries nothing embedded
+
+
+def test_image_sidecar_opt_in_for_native_format(tmp_path):
+    """sidecar=True additionally writes a sidecar, even for native formats."""
+    import io
+    import os
+    import PIL.Image
+
+    buf = io.BytesIO()
+    PIL.Image.new("RGB", (4, 4), (1, 2, 3)).save(buf, "PNG")
+    img = Image.from_bytes("pic.png", buf.getvalue())
+    img.metadata.add("k", "v")
+
+    out = str(tmp_path / "out.png")
+    img.save(out, sidecar=True)
+    assert os.path.exists(out + ".meta.json")  # sidecar in addition to embedding
+    assert Image.from_file(out).metadata.get("k").value == "v"
+
+
 def test_image_save_transcodes_between_formats(tmp_path):
     """save() to a different format transcodes via Pillow and embeds."""
     import io