From 023dfe9966bc8b570618358097f5512eaa4f781d Mon Sep 17 00:00:00 2001
From: lepy
Date: Mon, 29 Jun 2026 16:28:17 +0200
Subject: [PATCH] docs: end-to-end cookbook (Blob/Image -> DataFrame -> Schema -> JSON-LD -> VC)

A single continuous usage page that ties the now feature-rich library together
on ONE realistic scenario (tensile test):

1. Image as Blob: content + provenance + sha256/verify, native PNG embedding
2. DataFrame: table + column_metadata (units/ontology)
3. Schema: apply_schema (complete) + validate (ValidationReport)
4. Machine-readable: JSON-LD (csvw columns), sidecar, Turtle (rdflib optional)
5. Verifiable Credential: EdDSA signature (sdata.did) + verify + reconstruction
6. DataFrame.as_blob: table as a binary asset

All snippets were run as a script beforehand and verified (API-correct, real
example outputs).

Nav: Usage -> "Cookbook (end to end)" (first entry); home page link added.
mkdocs build --strict is green (all cross-refs/anchors resolved).
Docs-only change; tests/coverage unchanged (100%).
---
 docs/index.md          |   2 +
 docs/usage/cookbook.md | 157 +++++++++++++++++++++++++++++++++++++++++
 mkdocs.yml             |   1 +
 3 files changed, 160 insertions(+)
 create mode 100644 docs/usage/cookbook.md

diff --git a/docs/index.md b/docs/index.md
index 6c5310a..ec900a0 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -39,6 +39,8 @@ print(data.metadata.df[["value", "unit", "dtype", "ontology"]])
 
 ## Where to next
 
+- **[Cookbook (end to end)](usage/cookbook.md)** — one tensile-test dataset through
+  image → table → schema → JSON-LD/RDF → Verifiable Credential.
 - **[Tabular data (DataFrame)](usage/dataframe.md)** — the self-describing table
   container: column metadata, Parquet/CSV/Arrow/Feather I/O, table-schema validation.
 - **[Image metadata](usage/image-metadata.md)** — embed sdata metadata natively into
diff --git a/docs/usage/cookbook.md b/docs/usage/cookbook.md
new file mode 100644
index 0000000..e01ae0e
--- /dev/null
+++ b/docs/usage/cookbook.md
@@ -0,0 +1,157 @@
+# Cookbook: a self-describing dataset end to end
+
+This walkthrough threads the whole library together on one realistic scenario — a
+**tensile test**: an image of the specimen, the measured stress–strain curve, a
+declarative schema, machine-readable export, and a cryptographic signature. Each step
+builds on the previous one; the snippets run as shown.
+
+```bash
+pip install sdata pillow        # core + Pillow (for images)
+pip install "sdata[parquet]"    # pyarrow -> DataFrame Parquet/Arrow
+pip install "sdata[rdf]"        # rdflib -> real Turtle/RDF (optional)
+```
+
+## 1. An image with provenance and integrity
+
+[`Image`][sdata.sclass.image.Image] is a [`Blob`][sdata.sclass.blob.Blob]: binary
+content plus provenance and integrity. Annotate it, record the checksum, and save; the
+sdata metadata is embedded **natively** into the file (here a PNG `iTXt` chunk).
+
+```python
+import io
+import PIL.Image
+from sdata.sclass.image import Image
+
+buf = io.BytesIO()
+PIL.Image.new("RGB", (64, 48), (90, 90, 90)).save(buf, "PNG")
+
+img = Image.from_bytes("specimen_01.png", buf.getvalue(), project="TensileTest")
+img.metadata.add("operator", "ada", description="who acquired the image")
+img.metadata.add("magnification", 200, unit="-", dtype="int")
+
+img.update_checksum()  # store the SHA-256 in the checksum metadata
+img.sha256[:12], img.size, img.verify()  # ('d354ab02639d', 138, True)
+img.metadata.get("mime_type").value  # 'image/png' (autofilled)
+
+img.save("specimen_01.png")  # embeds the sdata metadata into the PNG
+Image.from_file("specimen_01.png").metadata.get("operator").value  # 'ada'
+```
+
+See [Image metadata](image-metadata.md) for the six native containers and the sidecar
+fallback.
+
+## 2. A table with column metadata
+
+[`DataFrame`][sdata.sclass.dataframe.DataFrame] wraps a pandas frame with **per-column**
+metadata and **dataset-level** metadata.
+
+```python
+import pandas as pd
+from sdata.sclass.dataframe import DataFrame
+
+df = pd.DataFrame({"strain": [0.0, 0.01, 0.02], "stress": [0.0, 210.0, 250.0]})
+sdf = DataFrame(df=df, name="tensile_curve_01",
+                description="engineering stress-strain curve",
+                column_metadata={"strain": {"unit": "-", "label": "strain"},
+                                 "stress": {"unit": "MPa", "label": "engineering stress"}})
+sdf.set_column("stress", ontology="qudt:Stress")
+
+sdf.column_units  # {'strain': '-', 'stress': 'MPa'}
+sdf.shape  # (3, 2)
+```
+
+More table I/O (Parquet/CSV/Arrow/Feather/Data Package/HDF5) in
+[Tabular data](dataframe.md).
+
+## 3. A declarative schema: validate and complete
+
+A [`MetadataSchema`][sdata.schema.MetadataSchema] declares the expected attributes of a
+data class. `apply_schema` **completes** the metadata (defaults, units, ontology);
+`validate` returns a [`ValidationReport`][sdata.schema.ValidationReport] that is truthy only when the metadata is valid.
+
+```python
+from sdata.schema import MetadataSchema, AttrSpec
+
+schema = MetadataSchema("TensileTest", [
+    AttrSpec("max_force", dtype="float", unit="kN", required=True, ontology="bfo:Quality"),
+    AttrSpec("temperature", dtype="float", unit="degC", default=23.0),
+])
+
+sdf.metadata.apply_schema(schema)
+report = sdf.metadata.validate(schema)
+bool(report), report.missing  # (False, ['max_force'])
+sdf.metadata.get("temperature").value  # 23.0 (default filled in)
+
+sdf.metadata.add("max_force", 18.2, dtype="float", unit="kN")
+bool(sdf.metadata.validate(schema))  # True
+```
+
+A schema can also be wired onto a class via the `SDATA_SCHEMA` hook so that every
+instance is auto-completed and `obj.validate()` works — see
+[Machine-readable metadata](metadata-jsonld.md#schema-templates).
+
+## 4. Machine-readable export: JSON-LD, RDF, sidecar
+
+The metadata (including the per-column annotations as csvw columns) serializes to
+JSON-LD, and to an independent `.meta.jsonld` sidecar.
+
+```python
+doc = sdf.to_jsonld()
+doc["@id"]  # 'did:suuid:DataFrame__tensile_curve_01__…:sdata'
+doc["columns"]  # [{'name': 'strain', 'datatype': 'xsd:double'},
+                #  {'name': 'stress', 'datatype': 'xsd:double',
+                #   'unitRef': 'unit:MegaPA', 'symbol': 'MPa', 'label': 'engineering stress'}]
+
+sdf.write_sidecar(".")  # writes .meta.jsonld next to your data
+sdf.to_turtle()  # real Turtle with sdata[rdf]; JSON-LD fallback otherwise
+```
+
+The stable `@id` is the dataset's `did:suuid::sdata` DID. Details on the vocab,
+units (QUDT/UCUM) and ontology classes (BFO) are in
+[Machine-readable metadata](metadata-jsonld.md).
+
+## 5. Sign it: a Verifiable Credential
+
+Finally, sign the (fully-qualified) metadata as a **W3C Verifiable Credential** — a
+compact JWS over the JSON-LD, using the pure-Python EdDSA stack in `sdata.did` (no
+external crypto dependency).
+
+```python
+from sdata.metadata import Metadata
+from sdata import semantic
+from sdata.did import keys, pub_from_priv_jwk
+
+priv = keys.gen_ed25519_jwk()  # issuer key (Ed25519, as a JWK)
+pub = pub_from_priv_jwk(priv)
+
+jws = sdf.metadata.to_verifiable_credential("did:example:lab", priv)
+jws.count(".")  # 2 -> compact JWS (header.payload.sig)
+
+subject = semantic.verify_credential(jws, pub)  # raises on tampering
+subject["@id"]  # the dataset DID
+
+restored = Metadata.from_verifiable_credential(jws, pub)
+restored.get("max_force").value  # 18.2
+```
+
+Tampering with the payload makes `verify_credential` raise, so the signed metadata is
+**tamper-evident** for anyone holding the issuer's public key. See
+[Verifiable Credentials](metadata-jsonld.md#verifiable-credentials).
+
+## 6. Bundle a table as a binary asset
+
+To hand a table around as a single hashable/signable file, render it to a `Blob` in a
+chosen format ([`as_blob`][sdata.sclass.dataframe.DataFrame.as_blob]):
+
+```python
+blob = sdf.as_blob("parquet")  # or "csv" / "arrow" / "feather"
+blob.filetype, blob.verify(), blob.size  # ('parquet', True, 14453)
+blob.write("s3://bucket/tensile_curve_01.parquet")  # fsspec target (sdata[blob])
+```
+
+## Where to go next
+
+- [Tabular data (DataFrame)](dataframe.md) — the full table I/O portfolio.
+- [Image metadata](image-metadata.md) — native embedding across six formats + sidecar.
+- [Machine-readable metadata](metadata-jsonld.md) — JSON-LD/RDF, schema, units, VC.
+- [API reference](../api.md) — the complete Python API.
diff --git a/mkdocs.yml b/mkdocs.yml
index 63f4d0d..4812d4b 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -62,6 +62,7 @@ markdown_extensions:
 nav:
   - Home: index.md
   - Usage:
+    - Cookbook (end to end): usage/cookbook.md
     - Tabular data (DataFrame): usage/dataframe.md
     - Image metadata: usage/image-metadata.md
     - Machine-readable metadata: usage/metadata-jsonld.md
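
Step 5 of the cookbook describes the credential as a compact JWS (`header.payload.sig`) whose verification fails once the payload is altered. The following is a minimal standard-library sketch of that property, not sdata's implementation: it substitutes HMAC-SHA256 (HS256) for the Ed25519/EdDSA signature used by `sdata.did`, and `sign_compact`, `verify_compact`, and the demo secret are hypothetical names introduced only for illustration.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as compact JWS segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode an unpadded base64url segment by restoring the padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_compact(payload: dict, secret: bytes) -> str:
    """Return header.payload.signature, signed with HMAC-SHA256 (assumption: HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, sort_keys=True).encode("utf-8"))
        for part in (header, payload)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"


def verify_compact(token: str, secret: bytes) -> dict:
    """Return the payload if the signature matches, else raise ValueError."""
    signing_input, _, sig_part = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_part)):
        raise ValueError("signature mismatch: token was altered")
    return json.loads(b64url_decode(signing_input.split(".")[1]))


secret = b"demo-secret"  # hypothetical key, for illustration only
token = sign_compact({"@id": "did:example:dataset", "max_force": 18.2}, secret)
print(token.count("."))  # 2: three dot-separated segments
print(verify_compact(token, secret)["max_force"])  # 18.2

# Swap in a different payload while keeping the original signature.
head, _, sig = token.split(".")
forged_body = b64url(json.dumps({"max_force": 99.9}).encode("utf-8"))
forged = f"{head}.{forged_body}.{sig}"
try:
    verify_compact(forged, secret)
except ValueError as err:
    print(err)  # signature mismatch: token was altered
```

The signature covers the encoded header and payload together, so editing either segment invalidates it. With an asymmetric scheme such as EdDSA the same check needs only the issuer's public key, which is why the cookbook verifies with `pub` rather than `priv`.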