20 changes: 14 additions & 6 deletions docs/rfc/0005-native-image-metadata.md
@@ -2,12 +2,12 @@

 | Field      | Value                                                        |
 |------------|--------------------------------------------------------------|
-| Status     | Accepted, implemented (PNG/JPEG/JP2/GIF/WebP)                |
+| Status     | Accepted, implemented (PNG/JPEG/JP2/GIF/WebP/TIFF)           |
 | Date       | 2026-06-29                                                   |
 | Author     | lepy <lepy@tuta.io>                                          |
 | Component  | `sdata/imagemeta.py`, `sdata/sclass/image.py`                |
 | Affects    | Embedding of sdata metadata directly in image files          |
-| Validation | `imagemeta.py` 100 % coverage; Pillow round trips, 5 formats |
+| Validation | `imagemeta.py` 100 % coverage; Pillow round trips, 6 formats |

 > **Implementation status.** Implemented. `sdata/imagemeta.py` embeds sdata metadata
 > **natively and without Pillow** in PNG, JPEG, JP2, GIF and WebP; `Image` uses it
@@ -31,6 +31,7 @@ the payload into the **native** metadata carrier of the respective container:
 | JP2  | `uuid` box (ISO BMFF) before the `jp2c` codestream box | fixed sdata UUID |
 | GIF  | comment extension after the header                     | `sdata\0` prefix |
 | WebP | dedicated RIFF chunk `sdAT`                            | FourCC `sdAT`    |
+| TIFF | private IFD tag (65000), original bytes unchanged      | tag `65000`      |

## 2. Motivation

@@ -76,6 +77,13 @@ supported_formats() -> tuple[str, ...]
 correct.
 * **WebP**: RIFF container. A dedicated chunk `sdAT` is appended and the RIFF size is
   updated. See below for the rationale behind this choice.
+* **TIFF**: offset-based IFDs. Instead of error-prone offset surgery, the
+  **original bytes stay unchanged** apart from the IFD offset in the header (all existing
+  offsets, incl. `StripOffsets`, remain valid): a **copy** of the first IFD, extended by a
+  private tag (65000) carrying the payload, is appended to the end of the file and the
+  header is redirected to this new IFD. Little- and big-endian (`II`/`MM`, classic TIFF,
+  magic 42) are handled; BigTIFF (magic 43) is not covered. Embedding again logically
+  replaces the payload; orphaned predecessor bytes stay unused in the file (no repack).

 ### 3.3 `Image` integration

@@ -107,8 +115,8 @@ supported_formats() -> tuple[str, ...]
 `imagemeta.py` to **100 %**, incl. replace semantics, missing payload, JPEG
 standalone/non-FF markers, JP2 XLBox/`LBox==0`/malformed guard, GIF with/without (local)
 color table and non-comment extensions, WebP padding. In addition, Pillow round trips
-over PNG/JPEG/JP2/GIF/WebP (decoding integrity).
-* `tests/test_image.py`: uniform `Image` API across all five formats + transcoding.
+over PNG/JPEG/JP2/GIF/WebP/TIFF (decoding integrity).
+* `tests/test_image.py`: uniform `Image` API across all six formats + transcoding.

 ## 6. Compatibility / Migration

@@ -120,7 +128,7 @@ supported_formats() -> tuple[str, ...]

 ## 7. Open points / Future

-* Further containers via the registry: **TIFF** (IFD tag), **BMP** (no native carrier
-  → sidecar).
+* Further containers via the registry: **BMP** (no native carrier → sidecar),
+  **BigTIFF** (magic 43, 8-byte offsets).
 * Optional: WebP **VP8X+XMP** for strict interop; PNG **`zTXt`** (compressed) for very
   large payloads; JPEG **multi-segment APP1** beyond 64 KiB.
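
The classic-TIFF versus BigTIFF distinction that the RFC draws (magic 42 with 4-byte offsets, magic 43 with 8-byte offsets) can be sketched as a small header classifier. This is a standalone illustration and not part of `sdata/imagemeta.py`; the name `classify_tiff_header` and its return shape are assumptions made for the example.

```python
import struct
from typing import Optional, Tuple


def classify_tiff_header(data: bytes) -> Optional[Tuple[str, str, int]]:
    """Return (variant, struct_prefix, first_ifd_offset), or None.

    Hypothetical helper, for illustration only.
    """
    if len(data) < 8 or data[:2] not in (b"II", b"MM"):
        return None
    e = "<" if data[:2] == b"II" else ">"  # byte order for all later fields
    (magic,) = struct.unpack(e + "H", data[2:4])
    if magic == 42:
        # Classic TIFF: bytes 4..8 hold a 4-byte offset to the first IFD.
        (ifd0,) = struct.unpack(e + "I", data[4:8])
        return ("classic", e, ifd0)
    if magic == 43:
        # BigTIFF: offset size (must be 8), a reserved zero, then an
        # 8-byte offset to the first IFD.
        if len(data) < 16:
            return None
        offsize, reserved = struct.unpack(e + "HH", data[4:8])
        if offsize != 8 or reserved != 0:
            return None
        (ifd0,) = struct.unpack(e + "Q", data[8:16])
        return ("bigtiff", e, ifd0)
    return None
```

Under the `detect_format` check in this PR, BigTIFF input simply falls through to `None`, because only the two magic-42 prefixes are matched.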
11 changes: 6 additions & 5 deletions docs/usage/image-metadata.md
@@ -2,8 +2,8 @@

 [`sdata.sclass.image.Image`][sdata.sclass.image.Image] is a
 [`Blob`][sdata.sclass.blob.Blob] over image content. sdata can write its metadata
-**natively into the image file** (and read it back) across **five containers with
-one API**: PNG, JPEG, JPEG 2000 (`jp2`), GIF and WebP.
+**natively into the image file** (and read it back) across **six containers with
+one API**: PNG, JPEG, JPEG 2000 (`jp2`), GIF, WebP and TIFF.

The embedding layer [`sdata.imagemeta`][sdata.imagemeta] is **pure Python**
(standard library only): it needs no third-party tool (no `exiftool`) and — crucially
@@ -17,6 +17,7 @@ The embedding layer [`sdata.imagemeta`][sdata.imagemeta] is **pure Python**
 | JP2  | `uuid` box (ISO BMFF) before `jp2c`        | fixed sdata UUID |
 | GIF  | comment extension after the header         | `sdata\0` prefix |
 | WebP | dedicated RIFF chunk `sdAT`                | FourCC `sdAT`    |
+| TIFF | private IFD tag (original bytes untouched) | tag `65000`      |

```bash
pip install pillow # optional: only needed to decode/transcode pixels
@@ -93,7 +94,7 @@ from sdata import imagemeta
 imagemeta.detect_format(data)            # 'png' | 'jpeg' | 'jp2' | 'gif' | 'webp' | None
 out = imagemeta.embed(data, '{"k": 1}')  # format auto-detected; replace semantics
 imagemeta.extract(out)                   # '{"k": 1}' (None if absent/unknown format)
-imagemeta.supported_formats()            # ('png', 'jpeg', 'jp2', 'gif', 'webp')
+imagemeta.supported_formats()            # ('png', 'jpeg', 'jp2', 'gif', 'webp', 'tiff')
```

* **Replace semantics:** embedding again **replaces** the previous sdata payload
@@ -104,8 +105,8 @@ imagemeta.supported_formats() # ('png', 'jpeg', 'jp2', 'gif', 'webp')
   unsupported format and
   [`PayloadTooLargeError`][sdata.imagemeta.PayloadTooLargeError] when a JPEG payload
   exceeds the single-`APP1` limit (~64 KiB).
-* **Extensible registry:** further containers (e.g. TIFF) plug in as two small
-  functions plus one registry entry.
+* **Extensible registry:** further containers (e.g. BMP, BigTIFF) plug in as two
+  small functions plus one registry entry.

## When to use a sidecar instead

82 changes: 82 additions & 0 deletions sdata/imagemeta.py
@@ -12,6 +12,7 @@
 * **JP2**: ``uuid`` box (JPEG 2000, ISO BMFF) with a fixed sdata UUID
 * **GIF**: comment extension with the prefix ``sdata\\0``
 * **WebP**: dedicated RIFF chunk ``sdAT`` (ignored by decoders as unknown)
+* **TIFF**: private IFD tag (65000); the original bytes remain unchanged

 The format is detected from the magic bytes (:func:`detect_format`); :func:`embed`
 and :func:`extract` select the matching handler. The write semantics are
@@ -380,6 +381,84 @@
     return None


+# ====================================================================== TIFF
+_TIFF_LE = b"II\x2a\x00"  # little-endian, classic TIFF (magic 42)
+_TIFF_BE = b"MM\x00\x2a"  # big-endian, classic TIFF (magic 42)
+#: Private TIFF tag (range 32768–65535) for the sdata payload.
+_TIFF_TAG = 65000
+_TIFF_TYPE_UNDEFINED = 7  # 7 = UNDEFINED (raw bytes)
+
+
+def _tiff_endian(data: bytes) -> str:
+    """``struct`` prefix for the byte order of the TIFF (``<`` for ``II``)."""
+    return "<" if data[:2] == b"II" else ">"
+
+
+def _tiff_entries(data: bytes, e: str, ifd_off: int):
+    """Read ``(entries, next_ifd_offset)`` of an IFD.
+
+    ``entries`` is a list of ``(tag, type, count, value_field)`` (value_field =
+    the 4 raw value/offset bytes, carried over unchanged).
+    """
+    (count,) = struct.unpack(e + "H", data[ifd_off:ifd_off + 2])
+    entries = []
+    pos = ifd_off + 2
+    for _ in range(count):
+        tag, typ, cnt = struct.unpack(e + "HHI", data[pos:pos + 8])
+        entries.append((tag, typ, cnt, data[pos + 8:pos + 12]))
+        pos += 12
+    (next_off,) = struct.unpack(e + "I", data[pos:pos + 4])
+    return entries, next_off


+def _tiff_embed(data: bytes, payload: str) -> bytes:
+    """Embed ``payload`` without offset surgery (replace semantics).
+
+    The original bytes remain **unchanged** (all existing offsets, incl.
+    ``StripOffsets``, stay valid). A **copy** of the first IFD, extended by a
+    private tag carrying the payload, is appended to the end of the file and the
+    header is redirected to this new IFD. Embedding repeatedly logically replaces
+    the payload; orphaned predecessor bytes stay unused in the file (no repack).
+    """
+    e = _tiff_endian(data)
+    (ifd0_off,) = struct.unpack(e + "I", data[4:8])
+    entries, next_off = _tiff_entries(data, e, ifd0_off)
+    entries = [en for en in entries if en[0] != _TIFF_TAG]  # drop an existing tag
+    body = payload.encode("utf-8")
+    blob = bytearray(data)
+    if len(body) <= 4:
+        value_field = body + b"\x00" * (4 - len(body))  # inline (≤4 bytes)
+    else:
+        blob += b"\x00" * (len(blob) & 1)  # word-align the value
+        payload_off = len(blob)
+        blob += body
+        value_field = struct.pack(e + "I", payload_off)
+    entries.append((_TIFF_TAG, _TIFF_TYPE_UNDEFINED, len(body), value_field))
+    entries.sort(key=lambda en: en[0])  # IFD entries in ascending tag order
+    blob += b"\x00" * (len(blob) & 1)  # word-align the IFD
+    new_ifd_off = len(blob)
+    blob += struct.pack(e + "H", len(entries))
+    for tag, typ, count, value_field in entries:
+        blob += struct.pack(e + "HHI", tag, typ, count) + value_field
+    blob += struct.pack(e + "I", next_off)
+    struct.pack_into(e + "I", blob, 4, new_ifd_off)  # header → new IFD
+    return bytes(blob)


+def _tiff_extract(data: bytes) -> Optional[str]:
+    """Read the sdata payload from the private TIFF tag (inline or via offset)."""
+    e = _tiff_endian(data)
+    (ifd0_off,) = struct.unpack(e + "I", data[4:8])
+    entries, _next = _tiff_entries(data, e, ifd0_off)
+    for tag, _typ, count, value_field in entries:
+        if tag == _TIFF_TAG:
+            if count <= 4:
+                return value_field[:count].decode("utf-8")
+            (off,) = struct.unpack(e + "I", value_field)
+            return data[off:off + count].decode("utf-8")
+    return None


 # ================================================================== Facade
 #: Registry: ``fmt -> (embed, extract)``.
 _HANDLERS = {
@@ -388,6 +467,7 @@
     "jp2": (_jp2_embed, _jp2_extract),
     "gif": (_gif_embed, _gif_extract),
     "webp": (_webp_embed, _webp_extract),
+    "tiff": (_tiff_embed, _tiff_extract),
 }


@@ -407,6 +487,8 @@
         return "gif"
     if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
         return "webp"
+    if data[:4] in (_TIFF_LE, _TIFF_BE):
+        return "tiff"
     return None
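
`_tiff_entries` above slices the input without length checks, so a truncated file would surface as a `struct.error`. One possible hardening, sketched here under the assumption that returning `None` is an acceptable failure mode, is a bounds-checked reader. The name `read_ifd_checked` is hypothetical and not part of this PR.

```python
import struct
from typing import List, Optional, Tuple

Entry = Tuple[int, int, int, bytes]  # (tag, type, count, raw 4-byte value field)


def read_ifd_checked(
    data: bytes, e: str, ifd_off: int
) -> Optional[Tuple[List[Entry], int]]:
    """Read one classic-TIFF IFD; return None instead of raising on truncation."""
    # A valid IFD cannot start inside the 8-byte header or past the end.
    if ifd_off < 8 or ifd_off + 2 > len(data):
        return None
    (count,) = struct.unpack_from(e + "H", data, ifd_off)
    # 2-byte count, 12 bytes per entry, 4-byte offset of the next IFD.
    end = ifd_off + 2 + 12 * count + 4
    if end > len(data):
        return None
    entries: List[Entry] = []
    pos = ifd_off + 2
    for _ in range(count):
        tag, typ, cnt = struct.unpack_from(e + "HHI", data, pos)
        entries.append((tag, typ, cnt, data[pos + 8:pos + 12]))
        pos += 12
    (next_off,) = struct.unpack_from(e + "I", data, pos)
    return entries, next_off
```

Whether a lenient `None` or a dedicated exception fits better depends on how the facade is expected to treat malformed input; the sketch only shows where the length checks would sit.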


1 change: 1 addition & 0 deletions sdata/sclass/image.py
@@ -41,6 +41,7 @@ class Image(Blob):
         "jpf": ("JPEG2000", "jp2"), "jpx": ("JPEG2000", "jp2"),
         "gif": ("GIF", "gif"),
         "webp": ("WEBP", "webp"),
+        "tif": ("TIFF", "tiff"), "tiff": ("TIFF", "tiff"),
     }

     @classmethod
1 change: 1 addition & 0 deletions tests/test_image.py
@@ -65,6 +65,7 @@ def test_image_from_bytes_and_png_metadata_roundtrip(tmp_path):
     ("jp2", "JPEG2000", {}),
     ("gif", "GIF", {}),
     ("webp", "WEBP", {}),
+    ("tiff", "TIFF", {}),
 ])
 def test_image_metadata_roundtrip_all_formats(tmp_path, ext, pil_format, kwargs):
     """Uniform API: write→read metadata across PNG/JPEG/JP2/GIF/WebP."""
46 changes: 43 additions & 3 deletions tests/test_imagemeta.py
@@ -271,20 +271,60 @@
     assert im.extract(_riff_webp((b"VP8 ", b"pixeldata"))) is None


+# ====================================================================== TIFF
+def _tiff(e="<", extra_entries=()):
+    """A minimal classic TIFF: header + IFD0 (ImageWidth) + optional entries."""
+    entries = [(256, 3, 1, struct.pack(e + "H", 4) + b"\x00\x00")]  # ImageWidth=4
+    entries = sorted(entries + list(extra_entries), key=lambda en: en[0])
+    hdr = (b"II\x2a\x00" if e == "<" else b"MM\x00\x2a") + struct.pack(e + "I", 8)
+    ifd = struct.pack(e + "H", len(entries))
+    for tag, typ, cnt, vf in entries:
+        ifd += struct.pack(e + "HHI", tag, typ, cnt) + vf
+    ifd += struct.pack(e + "I", 0)
+    return hdr + ifd


+def test_tiff_detect_both_endians():
+    assert im.detect_format(_tiff("<")) == "tiff"
+    assert im.detect_format(_tiff(">")) == "tiff"
+
+
+def test_tiff_embed_extract_le_and_be():
+    for e in ("<", ">"):
+        out = im.embed(_tiff(e), PAYLOAD)
+        assert im.detect_format(out) == "tiff"
+        assert im.extract(out) == PAYLOAD


+def test_tiff_replace_existing():
+    once = im.embed(_tiff("<"), PAYLOAD)
+    twice = im.embed(once, "second")
+    assert im.extract(twice) == "second"
+
+
+def test_tiff_tiny_payload_inline():
+    out = im.embed(_tiff("<"), "hi")  # ≤4 bytes → inline value (no offset)
+    assert im.extract(out) == "hi"
+
+
+def test_tiff_extract_none_when_absent():
+    assert im.extract(_tiff("<")) is None


 # ================================================================== Facade
 def test_detect_unknown_returns_none():
     assert im.detect_format(b"not an image") is None


 def test_supported_formats():
-    assert im.supported_formats() == ("png", "jpeg", "jp2", "gif", "webp")
+    assert im.supported_formats() == ("png", "jpeg", "jp2", "gif", "webp", "tiff")


 def test_embed_unsupported_format_raises():
     with pytest.raises(im.UnsupportedImageFormatError):
         im.embed(b"not an image", PAYLOAD)
     with pytest.raises(im.UnsupportedImageFormatError):
-        im.embed(b"\x89PNG\r\n\x1a\n...", PAYLOAD, fmt="tiff")
+        im.embed(b"\x89PNG\r\n\x1a\n...", PAYLOAD, fmt="bmp")  # not in the registry


 def test_extract_unknown_format_is_lenient():
@@ -311,7 +351,7 @@

 @pytest.mark.parametrize("fmt,kwargs", [
     ("PNG", {}), ("JPEG", {}), ("JPEG2000", {}), ("GIF", {}),
-    ("WEBP", {}), ("WEBP", {"lossless": True}),
+    ("WEBP", {}), ("WEBP", {"lossless": True}), ("TIFF", {}),
 ])
 def test_real_image_roundtrip_keeps_pixels(_pil, fmt, kwargs):
     import io