Merged
17 changes: 13 additions & 4 deletions docs/rfc/0005-native-image-metadata.md
@@ -94,6 +94,11 @@ supported_formats() -> tuple[str, ...]
* `Image.from_file`/`from_bytes` read embedded metadata back via `imagemeta.extract`
(Pillow-free) and merge it (`update_from_usermetadata`).
* `Image.embedded_metadata()` returns the embedded `Metadata` (or `None`).
* **Sidecar fallback (same API):** formats **without** a native carrier (e.g. BMP)
are written via Pillow, and the metadata is stored in a lossless
`<filepath>.meta.json` sidecar (same payload as the embedded form);
`from_file` merges an existing sidecar. `save(sidecar=True|False|None)` controls
the policy (always / never / only without a native carrier).

## 4. Design decisions

@@ -106,8 +111,11 @@ supported_formats() -> tuple[str, ...]
* **Hash/identity.** Embedding changes the file bytes (and therefore the file hash).
If you need a stable content hash, hash **before** embedding or hash the raw
pixels, analogous to the data-vs-metadata hash for `DataFrame` (RFC 0004).
* **Sidecar remains complementary.** For formats without a handler (or deliberately
external metadata), the JSON-LD sidecar (`semantic.write_sidecar`) remains available.
* **Sidecar as fallback and complement.** Formats without a native carrier are
covered uniformly by an automatic, lossless `<filepath>.meta.json` sidecar
(same payload as the embedded form). For machine-readable Linked Data, the
JSON-LD sidecar (`semantic.write_sidecar`) additionally remains available; both
share the same metadata model.

## 5. Tests / Coverage

@@ -128,7 +136,8 @@ supported_formats() -> tuple[str, ...]

## 7. Open points / future

* Further containers via the registry: **BMP** (no native carrier → sidecar),
**BigTIFF** (magic 43, 8-byte offsets).
* Further **native** carriers via the registry, where a container offers one, e.g.
**BigTIFF** (magic 43, 8-byte offsets). Formats without a native carrier (BMP, …) are
already covered by the sidecar fallback.
* Optional: WebP **VP8X+XMP** for strict interop; PNG **`zTXt`** (compressed) for very
large payloads; JPEG **multi-segment APP1** beyond 64 KiB.
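
The hash/identity note in section 4 of this RFC can be illustrated with a minimal sketch. It is not the sdata implementation: the byte strings and the `sha256_hex` helper are hypothetical stand-ins, and no real image container is parsed. It only shows that adding a metadata payload changes the file hash, while a hash over the raw pixel bytes stays the same.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of arbitrary bytes (hypothetical helper, not part of sdata)."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for decoded pixel data; real code would hash the actual pixel buffer.
pixels = bytes([7, 8, 9] * 20)

# Stand-in for a container file: a fake header plus the pixels,
# once without and once with an appended metadata payload.
plain_file = b"HDR" + pixels
embedded_file = plain_file + b'{"station": "lab-3"}'

# The file hash changes as soon as metadata is embedded ...
file_hash_changed = sha256_hex(plain_file) != sha256_hex(embedded_file)

# ... while a hash over the pixels alone does not depend on the metadata.
pixel_hash_before = sha256_hex(pixels)
pixel_hash_after = sha256_hex(embedded_file[3:3 + len(pixels)])
```

Hashing before embedding, or hashing only the pixels, keeps the identity independent of later metadata edits.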
31 changes: 26 additions & 5 deletions docs/usage/image-metadata.md
@@ -10,6 +10,11 @@ The embedding layer [`sdata.imagemeta`][sdata.imagemeta] is **pure Python**
— **no Pillow** to read or write the metadata. Pillow is only used to *decode* pixels
(`img.pil` / `img.to_numpy`) or to *transcode* between formats on `save`.

Any **other** Pillow-writable format without a native metadata container (e.g. BMP)
is handled through the **same API**: `save` writes a lossless
`<filepath>.meta.json` sidecar and `from_file` reads it back — so metadata is never
lost regardless of the container.

| Format | Native carrier of the sdata payload | Marker |
| ------ | ------------------------------------------ | --------------- |
| PNG | `iTXt` chunk before `IEND` | keyword `sdata` |
@@ -108,12 +113,28 @@ imagemeta.supported_formats() # ('png', 'jpeg', 'jp2', 'gif', 'webp', 'ti
* **Extensible registry:** further containers (e.g. BMP, BigTIFF) plug in as two
small functions plus one registry entry.

## When to use a sidecar instead
## Sidecars

For a container **without** a native metadata slot, `save` automatically writes a
lossless `<filepath>.meta.json` sidecar (same payload as the embedded form), and
`from_file` merges it back — the API is identical to the embedded case:

```python
img = Image.from_bytes("scan.bmp", bmp_bytes)
img.metadata.add("station", "lab-3")
img.save("scan.bmp") # writes scan.bmp + scan.bmp.meta.json
Image.from_file("scan.bmp").metadata.get("station").value # 'lab-3'
```

The sidecar policy is controllable: `save(..., sidecar=True)` always writes one
(in addition to embedding, e.g. for tooling that only reads sidecars),
`sidecar=False` never does, and the default (`None`) writes one only when the format
has no native container.

For containers without a native handler, or when metadata must stay external (e.g.
read-only originals), the JSON-LD **sidecar** remains the complement — see
[Machine-readable metadata](metadata-jsonld.md). Both approaches share the same
metadata model; embedding and a sidecar are not mutually exclusive.
When metadata must stay external (read-only originals) or machine-readable as Linked
Data, the JSON-LD **sidecar** remains the complement — see
[Machine-readable metadata](metadata-jsonld.md). Embedding and sidecars share the
same metadata model and are not mutually exclusive.

The design and the per-format details are specified in
[RFC 0005 — Native image metadata](../rfc/0005-native-image-metadata.md).
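
The tri-state `sidecar` policy described above can be summarized as a small decision function. This is a sketch derived from the documented behavior; `writes_sidecar` and `has_native_container` are illustrative names, not part of the sdata API.

```python
from typing import Optional


def writes_sidecar(has_native_container: bool, sidecar: Optional[bool] = None) -> bool:
    """Return True if `save` would write a `<filepath>.meta.json` sidecar.

    Hypothetical helper that mirrors the documented policy:
    - sidecar=True  -> always write one (in addition to embedding)
    - sidecar=False -> never write one
    - sidecar=None  -> write one only when the format has no native container
    """
    if has_native_container:
        # Native formats embed the metadata; a sidecar is strictly opt-in.
        return sidecar is True
    # Non-native formats get a sidecar unless it is explicitly disabled.
    return sidecar is not False


# Default policy: BMP-like formats get a sidecar, PNG-like formats do not.
bmp_default = writes_sidecar(has_native_container=False)
png_default = writes_sidecar(has_native_container=True)
```

Note that `sidecar=False` on a format without a native container means the metadata is not persisted at all, so that combination should be a deliberate choice.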
83 changes: 65 additions & 18 deletions sdata/sclass/image.py
@@ -3,10 +3,13 @@

The image content is stored as blob content (``uri`` for files, ``bytes`` for
in-memory data). sdata metadata is embedded **natively across formats** in the
image file (PNG/JPEG/JP2/GIF/WebP) via :mod:`sdata.imagemeta`, which
works without Pillow. Pillow is only used lazily to **decode/transcode** the
pixels (:attr:`Image.pil`/:meth:`Image.to_numpy`/:meth:`Image.save` on a
format change) and is optional (``pip install pillow``).
image file (PNG/JPEG/JP2/GIF/WebP/TIFF) via :mod:`sdata.imagemeta`,
which works without Pillow. Formats **without** a native metadata carrier (e.g. BMP)
get a lossless ``<filepath>.meta.json`` **sidecar**; the ``save``/
``from_file`` API is identical for all formats. Pillow is only used lazily to
**decode/transcode** the pixels (:attr:`Image.pil`/
:meth:`Image.to_numpy`/:meth:`Image.save` on a format change) and is optional
(``pip install pillow``).
"""
import io
import os
@@ -48,8 +51,9 @@ class Image(Blob):
def from_file(cls, filepath, project=None, ns_name=None, **kwargs):
"""Create an Image referencing an image file (kept as ``uri`` content).

Any sdata metadata embedded in the file (PNG/JPEG/JP2/GIF/WebP) is read
back and merged — independent of Pillow.
Any sdata metadata is read back and merged: natively embedded
(PNG/JPEG/JP2/GIF/WebP/TIFF, Pillow-free) and/or from an adjacent
``<filepath>.meta.json`` sidecar (for formats without a native container).

:param filepath: path to the image file.
:param project: namespace for the deterministic SUUID (alias of ``ns_name``).
@@ -62,6 +66,7 @@ def from_file(cls, filepath, project=None, ns_name=None, **kwargs):
img = cls(content_type="uri", value=filepath, filetype=suffix,
name=os.path.basename(filepath), suuid=suuid, **kwargs)
img._load_embedded_metadata()
img._load_sidecar_metadata(filepath)
return img

@classmethod
@@ -125,31 +130,71 @@ def _load_embedded_metadata(self) -> None:
if embedded is not None:
self.metadata.update_from_usermetadata(embedded)

def save(self, filepath, **kwargs):
"""Save the image to ``filepath``; sdata metadata is embedded natively.
@staticmethod
def sidecar_path(filepath) -> str:
"""Path of the metadata sidecar for ``filepath`` (``<filepath>.meta.json``)."""
return str(filepath) + ".meta.json"

The container is chosen from the file suffix. If the stored bytes already
use that container, the metadata is embedded **without** re-encoding
(lossless, Pillow-free); otherwise Pillow transcodes to the target format
first. Formats without a native handler are written via Pillow without
embedded metadata (a warning is logged).
def write_sidecar(self, filepath) -> str:
"""Write the sdata metadata next to ``filepath`` as a lossless JSON sidecar.

The sidecar carries the **same** payload as the embedded form
(``metadata.to_json()``), so a round-trip is lossless regardless of whether
the format has a native metadata container.

:param filepath: the image path the sidecar belongs to.
:return: the sidecar path (``<filepath>.meta.json``).
"""
sidecar = self.sidecar_path(filepath)
with open(sidecar, "w", encoding="utf-8") as fh:
fh.write(self.metadata.to_json())
logger.info(f"Image metadata sidecar written to {sidecar}")
return sidecar

def _load_sidecar_metadata(self, filepath) -> None:
"""Merge metadata from an adjacent ``<filepath>.meta.json`` sidecar, if present."""
sidecar = self.sidecar_path(filepath)
if not os.path.exists(sidecar):
return
try:
with open(sidecar, "r", encoding="utf-8") as fh:
md = Metadata.from_json(fh.read())
self.metadata.update_from_usermetadata(md)
logger.debug(f"merged sidecar metadata from {sidecar}")
except Exception as exp:
logger.debug(f"sidecar metadata not loadable: {exp}")

def save(self, filepath, sidecar=None, **kwargs):
"""Save the image to ``filepath`` with sdata metadata — one API for all formats.

The container is chosen from the file suffix. For a format with a native
metadata container (PNG/JPEG/JP2/GIF/WebP/TIFF) the metadata is **embedded**:
without re-encoding if the stored bytes already use that container (lossless,
Pillow-free), otherwise Pillow transcodes first. For **any other** format
(e.g. BMP) the image is written via Pillow and the metadata travels in a
lossless ``<filepath>.meta.json`` **sidecar** — so metadata is never lost.

:param filepath: destination path (its suffix selects the format).
:param sidecar: sidecar policy — ``None`` (default) writes a sidecar only when
the format has no native container; ``True`` always writes one (in addition
to embedding); ``False`` never writes one.
:param kwargs: forwarded to ``PIL.Image.save`` when transcoding.
:raises ImportError: if Pillow is required (transcode / unsupported format)
but not installed.
:raises ImportError: if Pillow is required (transcode / non-native format) but
not installed.
:return: the destination ``filepath``.
"""
suffix = Path(filepath).suffix.lstrip(".").lower()
target = self._SUFFIX_FORMATS.get(suffix)

if target is None:
# No native handler: let Pillow write the file, without embedding.
# No native metadata carrier → write via Pillow + lossless sidecar.
if PIL is None:
raise ImportError("Pillow is required for Image.save (pip install pillow).")
self.pil.save(filepath, **kwargs)
logger.warning("Image.save: no native sdata-metadata handler for "
"'%s'; saved without embedded metadata", filepath)
if sidecar is not False:
self.write_sidecar(filepath)
logger.info("Image.save: '%s' has no native metadata container; "
"metadata written to a sidecar", filepath)
return filepath

pil_format, meta_fmt = target
@@ -167,5 +212,7 @@ def save(self, filepath, **kwargs):
img_bytes = imagemeta.embed(img_bytes, self.metadata.to_json(), meta_fmt)
with open(filepath, "wb") as fh:
fh.write(img_bytes)
if sidecar is True:
self.write_sidecar(filepath)
logger.info(f"Image saved to {filepath}")
return filepath
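
A minimal, standalone sketch of the sidecar round-trip introduced in this diff. It makes two assumptions: a plain `dict` payload stands in for sdata's `Metadata`, and module-level helpers stand in for the `Image` methods. It also differs from the diff in one respect: the diff catches `Exception`, whereas this sketch tolerates only I/O and JSON decoding errors.

```python
import json
import os
import tempfile
from typing import Optional


def sidecar_path(filepath) -> str:
    """Same naming rule as in the diff: <filepath>.meta.json."""
    return str(filepath) + ".meta.json"


def write_sidecar(filepath, payload: dict) -> str:
    """Write the payload next to `filepath` and return the sidecar path."""
    path = sidecar_path(filepath)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
    return path


def load_sidecar(filepath) -> Optional[dict]:
    """Return the sidecar payload, or None if it is missing or unreadable."""
    path = sidecar_path(filepath)
    if not os.path.exists(path):
        return None
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except (OSError, json.JSONDecodeError):
        # A broken sidecar must not prevent loading the image itself.
        return None


with tempfile.TemporaryDirectory() as tmp:
    image_path = os.path.join(tmp, "scan.bmp")
    write_sidecar(image_path, {"station": "lab-3"})
    roundtrip = load_sidecar(image_path)
    missing = load_sidecar(os.path.join(tmp, "other.bmp"))
```

Tolerating an unreadable sidecar keeps `from_file` usable even when the adjacent JSON is corrupt, at the cost of silently dropping that metadata. Logging such a failure at a level above debug would make that loss visible.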
37 changes: 37 additions & 0 deletions tests/test_image.py
@@ -88,6 +88,43 @@
    assert reloaded.embedded_metadata() is not None


def test_image_sidecar_fallback_for_bmp(tmp_path):
    """Formats without a native carrier (BMP) get a lossless sidecar."""
    import io
    import os
    import PIL.Image

    buf = io.BytesIO()
    PIL.Image.new("RGB", (5, 4), (7, 8, 9)).save(buf, "BMP")
    img = Image.from_bytes("pic.bmp", buf.getvalue())
    img.metadata.add("station", "lab-3")

    out = str(tmp_path / "out.bmp")
    img.save(out)  # same API; writes a sidecar
    assert os.path.exists(out + ".meta.json")  # sidecar sits next to the image

    reloaded = Image.from_file(out)
    assert reloaded.metadata.get("station").value == "lab-3"
    assert reloaded.embedded_metadata() is None  # BMP carries nothing embedded


def test_image_sidecar_opt_in_for_native_format(tmp_path):
    """sidecar=True additionally writes a sidecar, even for native formats."""
    import io
    import os
    import PIL.Image

    buf = io.BytesIO()
    PIL.Image.new("RGB", (4, 4), (1, 2, 3)).save(buf, "PNG")
    img = Image.from_bytes("pic.png", buf.getvalue())
    img.metadata.add("k", "v")

    out = str(tmp_path / "out.png")
    img.save(out, sidecar=True)
    assert os.path.exists(out + ".meta.json")  # sidecar in addition to embedding
    assert Image.from_file(out).metadata.get("k").value == "v"


def test_image_save_transcodes_between_formats(tmp_path):
    """save() to a different format transcodes via Pillow and embeds the metadata."""
    import io