Skip to content

Implement entity-relation-grondslagen — decision-metadata PATCH endpoint (#1435)#1503

Open
rjzondervan wants to merge 11 commits into
developmentfrom
feat/1435/entity-relation-grondslagen-impl
Open

Implement entity-relation-grondslagen — decision-metadata PATCH endpoint (#1435)#1503
rjzondervan wants to merge 11 commits into
developmentfrom
feat/1435/entity-relation-grondslagen-impl

Conversation

@rjzondervan
Copy link
Copy Markdown
Member

Implements the entity-relation-grondslagen change. Includes a spec rework committed before the implementation — see "Spec rework note" below.

Summary

Introduces an audited, decision-only PATCH endpoint for setting operator decisions on detected-entity occurrences:

  • PATCH /api/entity-relations/{id} — body {bases?: ?array<string>, skipAnonymization?: bool}. Any other top-level key returns HTTP 400 with the offending field name. @NoAdminRequired; the acting user MUST have write-access to the relation's parent file.
  • EntityRelationMapper::updateDecisionMetadata(EntityRelation, array, ?IUser) — single audited write path. Controller passes the loaded relation; mapper enforces the whitelist, validates field shapes, computes a diff, persists changes, and emits exactly one audit-trail entry (only when fields actually change — semantic no-ops produce no audit).
  • Anonymise flow gains skip-awareness. markAsAnonymized adds AND skip_anonymization = 0 to its WHERE; FileTextController::anonymizeFile reads through the new findEntitiesForAnonymization; DocumentProcessingHandler::anonymizeDocument defensively filters skipped values server-side so the contract — "skipped relations are never redacted, full stop" — holds regardless of caller behaviour.

The post-hoc system fields anonymized / anonymizedValue are intentionally NOT in the whitelist; they're set only by markAsAnonymized to preserve the audit invariant "anonymized=true ⟹ the redaction code ran for this row".

Spec rework note

This branch's first commit (ef5464b94) is a substantive amendment to the spec that was approved as PR #1494. The original spec described "the anonymise endpoint accepts bases, persists, then strips before forwarding to OpenAnonymiser". Explore-mode investigation in apply uncovered two architectural mismatches:

  1. OpenAnonymiser is called during detection (EntityRecognitionHandler::detectWithOpenAnonymiser), not at anonymise time. The "strip before forward" requirement had no counterpart in the code.
  2. OR's anonymise endpoint takes a fileId and reads pre-detected relations from the DB; it never forwarded entity arrays anywhere external.

The rework decoupled bases-set from the anonymise call into the new PATCH endpoint, and added the skipAnonymization field on the same audited write path so operators can also opt-out per-occurrence. Both spec and design now describe the actual architecture.

The companion DocuDesk amendment lives on docudesk:feat/123/anonymisation-grondslagen-and-prohibition-gate-amend (local-only — to be pushed alongside this review).

Database

Column Type Default Purpose
bases JSON NULL UUIDs referencing legal grondslagen (consumer-owned vocabulary).
skip_anonymization BOOLEAN false Operator opt-out from the next anonymise pass.

Migration is forward-only and idempotent (hasColumn guards). Existing rows pick up the defaults; no backfill.

Tests

33 unit tests, 133 assertions, all passing against master-nextcloud-1:

  • tests/Unit/Db/EntityRelationTest.php — extended for the two new fields.
  • tests/Unit/Db/EntityRelationMapperTest.php — construction wiring.
  • tests/Unit/Db/EntityRelationMapperUpdateDecisionMetadataTest.php — whitelist, shape validation, semantic-no-op, diff-aware audit emission, audit-failure tolerance, actor resolution (explicit / session / system fallback).
  • tests/Unit/Controller/EntityRelationsControllerTest.php — 401 / 403 / 404 / 400 / 200 / 500 HTTP contract.
$ docker exec -w /var/www/html/apps-extra/openregister master-nextcloud-1 \
    php vendor/bin/phpunit -c phpunit-unit.xml --filter "EntityRelation"
Tests: 33, Assertions: 133 — OK

Quality gates

  • phpcs clean on all touched files (composer phpcs)
  • php -l clean
  • openspec validate entity-relation-grondslagen clean
  • ✓ 33 / 33 unit tests passing
  • ⏸ Live-stack manual smoke deferred to this PR's review phase

Test plan

  • Review the spec-rework commit (ef5464b94) — confirm the decoupled-PATCH architecture matches expectations
  • Review the implementation commit (139e1667d)
  • Apply the migration on a dev install; PATCH a relation with {bases: ["uuid-a"]}, confirm the row is updated and one audit-trail entry exists
  • PATCH a relation with {skipAnonymization: true}; run POST /api/files/:fileId/anonymize on the parent file; confirm the skipped row is NOT redacted and stays anonymized=false
  • PATCH with {anonymized: true}; confirm HTTP 400 with field_not_editable error
  • Confirm pre-change anonymise calls (no skip flags anywhere) behave identically to before
  • Re-review docs/Features/entity-relation-decision-metadata.md for accuracy

Refs: #1435

@github-actions
Copy link
Copy Markdown
Contributor

Quality Report — ConductionNL/openregister @ 554b045

Check PHP Vue Security License Tests
lint
phpcs
phpmd
psalm
phpstan
phpmetrics
eslint
stylelint
composer ✅ 153/153
npm ✅ 598/598
PHPUnit
Newman ⏭️
Playwright ⏭️

Quality workflow — 2026-05-12 14:28 UTC

Download the full PDF report from the workflow artifacts.

Copy link
Copy Markdown
Contributor

@WilcoLouwerse WilcoLouwerse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REQUEST_CHANGES — Strict mode. 4 blockers (including 1 new from the most recent commit), 5 significant issues.

🔴 Blockers (4): Auth bypass for object/email-bound relations; @NoCSRFRequired on a state-mutating PATCH without justification; ADR-005 violation (getDisplayName() stored as audit actor instead of UID); Newman regression introduced by commit b8949d17 — Newman API Test Suite passed on 139e1667 and now fails with "socket hang up" on every request after the IEventDispatcher constructor parameter was added to EntityRelationMapper without a matching explicit registerService in Application.php.

🟡 Significant (5): Order-sensitive bases comparison causes spurious audit entries; getParams() content-type sensitivity; non-transactional audit write; fileId = 0 bypasses file auth; tasks.md item 3.2a left unchecked despite being implemented in this commit.

🟢 Minor (3): Missing @copyright on 3 new test files (gate-1); missing postSchemaChange() comment in migration; CHANGELOG.md documents the wrong method signature for updateDecisionMetadata.

CI note: PHPUnit failures on quality / PHPUnit (PHP 8.3/8.4, NC stable32) are pre-existing on the development branch. The Newman failure IS new — it regressed in this PR.

Comment thread lib/Controller/EntityRelationsController.php Outdated
Comment thread lib/Controller/EntityRelationsController.php Outdated
Comment thread lib/Db/EntityRelationMapper.php
Comment thread lib/Db/EntityRelationMapper.php Outdated
@rjzondervan
Copy link
Copy Markdown
Member Author

Re: Newman regression attributed to b8949d17

Wanted to share reproduction results before applying the suggested registerService(EntityRelationMapper::class, …) fix in lib/AppInfo/Application.php, because I cannot reproduce the regression locally and the autowiring path appears healthy.

Tests I ran

  1. Newman crud collection on this branch tip (80c2de5b9):

    • BASE_URL=http://localhost CONTAINER_NAME=master-nextcloud-1 COLLECTIONS="crud" bash tests/newman/run-all.sh
    • Result: 1 collection failed — assertion failures (HTTP 400 on a number of delete operations), no socket hang up, server kept responding.
  2. Same Newman crud collection on development HEAD (9d50f4360before my branch diverged):

    • Result: identical set of HTTP 400 failures on the same delete operations.
    • Conclusion: the failures pre-exist b8949d17 and 80c2de5b9. They are not caused by this PR.

Direct evidence the IEventDispatcher autowiring works

Endpoint GET /apps/openregister/api/entities/stats is served by GdprEntitiesController, which constructor-injects EntityRelationMapper (the new signature with IEventDispatcher). Hit it from inside the NC container:

HTTP 200
{...,"byType":{"ORGANIZATION":57,"PERSON":53,"LOCATION":37,"DATE":17,"EMAIL":4,"IBAN":1,"PHONE":1},...}

That request constructs EntityRelationMapper through the DI container with all five deps. If the autowiring chain were broken we'd get a 500 here, not a 200 with a valid payload.

Additionally, sibling mappers (SchemaMapper, MagicMapper, ConfigurationMapper) all use IEventDispatcher via autowiring with no explicit registerService block — same pattern, no failures.

Ask

Could you share the exact failing collection / commit you ran and the failure mode (was it really "socket hang up"? or assertion errors like above?)? I want to make sure I'm not missing something CI-specific before signing off — but I don't want to add a speculative registerService block if the diagnosis is wrong, because that would obscure a different root cause.

Continuing with the other review items (ADR-005 audit actor → UID, PATCH authz, @NoCSRFRequired, the 5 significant items) in parallel.

rjzondervan added a commit that referenced this pull request May 15, 2026
… significant)

Addresses the OR #1503 review on the entity-relation decision-metadata
PATCH endpoint. Four blockers (minus the Newman one — see PR comment),
five significant items, and the minor CHANGELOG/migration cleanups.

**Blockers:**

- **ADR-005 audit actor.** `EntityRelationMapper::emitDecisionMetadataAuditEntry`
  no longer persists `$user->getDisplayName()` on the audit-trail entry's
  `userName` column (PII). Audit entries now record only the UID; the
  display name is dropped.

- **Auth bypass on PATCH /api/entity-relations/{id}.** Object-bound and
  email-bound relations no longer fall through with an accept-with-warning
  log. The controller now constructor-injects `PermissionHandler`,
  `MagicMapper`, `SchemaMapper`, `RegisterMapper`, and `EmailLinkMapper`,
  then routes object-bound relations through
  `PermissionHandler::hasPermission(schema, 'update', uid, owner, object)`
  — the same verdict OR uses everywhere else for object writes — and
  routes email-bound relations through their parent ObjectEntity via the
  EmailLink. Email links without a parent object are denied. `fileId = 0`
  / `objectId = 0` / `emailId = 0` no longer satisfy the file branch.
  Adds `EmailLinkMapper::find(int $id)`.

- **`@NoCSRFRequired`.** Removed from `EntityRelationsController::update`.
  This is a browser-facing decision-time PATCH; Nextcloud's session-CSRF
  protection MUST stay on. Spec amended (D7) to record the decision.

**Significant items:**

- **Order-sensitive `bases` diff.** New private
  `EntityRelationMapper::basesAreEqual(?array, ?array): bool` does
  multiset equality (sort + unique). `['a', 'b']` vs `['b', 'a']` is now
  a semantic no-op and produces no audit entry. `null` vs `[]` remains
  distinct. Storage preserves operator-supplied order; only the diff
  check normalises.

- **`getParams()` content-type sensitivity.** `update()` rejects requests
  whose Content-Type is anything other than `application/json` (or
  empty) with HTTP 415, so the PATCH never depends on body-parser
  heuristics for the decision-time mutation.

- **Non-transactional audit write.** Row UPDATE + audit-trail INSERT
  now run inside a single `$db->beginTransaction()` / `commit()` /
  `rollBack()`. An audit failure rolls back the UPDATE and surfaces as
  HTTP 500 — clients never observe a persisted decision-metadata change
  without a matching audit entry. The downstream event dispatch stays
  OUTSIDE the transaction (post-commit, informational; listener
  failures still don't roll back). Test
  `testAuditEmissionFailureRollsBackUpdateAndThrows` updates the
  contract.

- **`fileId = 0` bypass.** Closed by adding `> 0` guards on `fileId`,
  `objectId`, `emailId` in the authz dispatch helper.

- **`tasks.md` 3.2a + companions checked.** Items 1.1–1.3, 2.1–2.6,
  3.1–3.8 marked `[x]` to reflect what landed in this PR.

**Minor items:**

- **Test copyright headers.** Added SPDX + Conduction headers to
  `EntityRelationsControllerTest`, `EntityRelationMapperTest`,
  `EntityRelationMapperUpdateDecisionMetadataTest`, `EntityRelationTest`
  (also fixed the latter's `namespace Unit\Db` typo).

- **CHANGELOG signature.** Corrected the `updateDecisionMetadata`
  signature in the Unreleased entry from `int $id` to
  `EntityRelation $relation` and noted the transactional contract +
  multiset `bases` diff.

- **Migration `postSchemaChange()` comment.** Documented why
  `Version1Date20260512120000` intentionally does not override
  `postSchemaChange` (nullable column + DB-level default; no backfill).

**Spec.** `design.md` §D6 now records the transactional audit-invariant
+ multiset `bases` equality; §D7 documents the authz wiring (file →
`isUpdateable`; object → `PermissionHandler`; email → parent object via
`EmailLink`) and the `@NoCSRFRequired` removal rationale.

**Tests.** 44 tests / 160 assertions across the four mapper +
controller suites pass. New cases:
- multiset bases equality (reordered, duplicates, null-vs-empty).
- audit-failure rolls back update and rethrows.
- 403 on object-bound when `hasPermission` denies.
- 200 on object-bound when `hasPermission` grants.
- 403 on email-bound when EmailLink has no parent object.
- 403 on a relation with no subject (fileId/objectId/emailId all unset).

**Newman.** The reviewer's diagnosis (commit `b8949d17` broke Newman via
missing `registerService` for autowiring) does not reproduce — see PR
comment. Newman crud fails identically on `development` HEAD (`9d50f4360`,
before this branch existed); the failures are HTTP 400 assertion
errors, not "socket hang up". Autowiring works locally:
`GET /api/entities/stats` (uses `GdprEntitiesController` which
constructor-injects `EntityRelationMapper`) returns HTTP 200 with a
valid payload. Speculative `registerService` block NOT added pending
reviewer's exact failing collection + commit.
@rjzondervan
Copy link
Copy Markdown
Member Author

Review response — commit e34eac991

Addressed every flagged item except the Newman regression, which my earlier comment (#issuecomment-4460023069) showed doesn't reproduce.

Blockers

  • ADR-005 audit actor. EntityRelationMapper::emitDecisionMetadataAuditEntry no longer persists $user->getDisplayName() on the audit-trail entry's userName column. The audit row records only the UID; the display name is dropped (set to null).
  • Auth bypass on PATCH /api/entity-relations/{id}. Object- and email-bound relations no longer fall through with an accept-with-warning log. The controller now constructor-injects PermissionHandler, MagicMapper, SchemaMapper, RegisterMapper, and EmailLinkMapper. Object-bound relations route through PermissionHandler::hasPermission(schema, 'update', uid, owner, object). Email-bound relations resolve their parent ObjectEntity via EmailLink and run the same object-update verdict; email links without a parent object are denied. fileId = 0 / objectId = 0 / emailId = 0 no longer satisfy any branch. Added EmailLinkMapper::find(int $id) to support the email-bound lookup.
  • @NoCSRFRequired. Removed from EntityRelationsController::update. Spec amended (design.md §D7) to record the decision — this is a browser-facing decision-time PATCH and Nextcloud's session-CSRF protection MUST stay on.
  • Newman. Not addressed — see previous comment. The crud Newman collection fails identically on development HEAD (commit 9d50f4360) without any of my commits, with HTTP 400 assertion errors (not "socket hang up"). GET /api/entities/stats (which constructor-injects EntityRelationMapper) returns HTTP 200 locally — autowiring works. I'd push back on this blocker rather than apply a speculative registerService fix that might mask a different CI-environmental cause.

Significant items

  • Order-sensitive bases diff → new private EntityRelationMapper::basesAreEqual(?array, ?array): bool does multiset equality (sort + unique). ['a', 'b'] vs ['b', 'a'] is a semantic no-op and produces no audit entry. null vs [] remains distinct (different operator intent). Storage preserves operator-supplied order; only the diff check normalises.
  • getParams() content-type sensitivityupdate() rejects requests with Content-Type other than application/json (or empty) with HTTP 415, so the PATCH never depends on body-parser heuristics for the decision-time mutation.
  • Non-transactional audit write → row UPDATE + audit-trail INSERT now run inside a single $db->beginTransaction() / commit() / rollBack(). An audit failure rolls back the UPDATE and surfaces as HTTP 500 — clients never observe a persisted decision-metadata change without a matching audit entry. The downstream event dispatch stays OUTSIDE the transaction (post-commit, informational; listener failures still don't roll back). Test testAuditEmissionFailureRollsBackUpdateAndThrows updates the contract.
  • fileId = 0 bypass → closed by > 0 guards on fileId / objectId / emailId in the authz dispatch helper.
  • tasks.md 3.2a unchecked → items 1.1–1.3, 2.1–2.6, 3.1–3.8 all marked [x] to reflect what landed in this PR.

Minor items

  • Test copyright headers → SPDX + Conduction headers added to EntityRelationsControllerTest, EntityRelationMapperTest, EntityRelationMapperUpdateDecisionMetadataTest, EntityRelationTest (also fixed the latter's namespace Unit\Db typo).
  • CHANGELOG signature → corrected the updateDecisionMetadata signature from int $id to EntityRelation $relation and noted the transactional contract + multiset bases diff.
  • Migration postSchemaChange() comment → documented why Version1Date20260512120000 intentionally does not override postSchemaChange (nullable column + DB-level default; no backfill needed).

Tests

44 / 44 green, 160 assertions across EntityRelationMapperUpdateDecisionMetadataTest, EntityRelationsControllerTest, EntityRelationTest, EntityRelationMapperTest. New cases cover multiset bases equality, audit-failure rollback, 403 on object-bound denial / 200 on grant, 403 on email-bound without parent object, 403 on relation without any subject.

Comment thread lib/Service/File/DocumentProcessingHandler.php
Comment thread lib/Db/EntityRelationMapper.php
Comment thread lib/Service/File/DocumentProcessingHandler.php
Copy link
Copy Markdown
Contributor

@WilcoLouwerse WilcoLouwerse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review (Strict). 3 of 4 prior findings resolved: 🔴 @NoCSRFRequired removed and spec amended (design.md §D7); 🔴 auth bypass closed — all three branches (file/object/email) now run an authorisation check, ALL-IDs-null/0 case correctly denies; 🔴 ADR-005 actor — getDisplayName() no longer persisted, audit row records only the UID. Transactional audit-write, HTTP 415 on non-JSON, multiset basesAreEqual, migration idempotency — all verified.

Newman/DI: accepting the pushback that the failure can't be conclusively diagnosed from the diff alone, but downgraded to a 🟡 follow-up because EntityRelationMapper is the only IEventDispatcher-dependent mapper without an explicit registerService() block, inconsistent with the codebase pattern (see inline).

Blocker (1 🔴):

  • Double markAsAnonymized on the HTTP path — commit fb512791 added a markAsAnonymized($fileId, '[REDACTED]') at the end of DocumentProcessingHandler::anonymizeDocument, but FileTextController::anonymizeFile:547 also calls markAsAnonymized with 'anonymized_<date>' afterwards. The second UPDATE overwrites the first's anonymized_value, leaving rows with a value that doesn't match the placeholder actually written to the file.

Concerns (2 🟡):

rubenvdlinde pushed a commit that referenced this pull request May 18, 2026
… significant)

Addresses the OR #1503 review on the entity-relation decision-metadata
PATCH endpoint. Four blockers (minus the Newman one — see PR comment),
five significant items, and the minor CHANGELOG/migration cleanups.

**Blockers:**

- **ADR-005 audit actor.** `EntityRelationMapper::emitDecisionMetadataAuditEntry`
  no longer persists `$user->getDisplayName()` on the audit-trail entry's
  `userName` column (PII). Audit entries now record only the UID; the
  display name is dropped.

- **Auth bypass on PATCH /api/entity-relations/{id}.** Object-bound and
  email-bound relations no longer fall through with an accept-with-warning
  log. The controller now constructor-injects `PermissionHandler`,
  `MagicMapper`, `SchemaMapper`, `RegisterMapper`, and `EmailLinkMapper`,
  then routes object-bound relations through
  `PermissionHandler::hasPermission(schema, 'update', uid, owner, object)`
  — the same verdict OR uses everywhere else for object writes — and
  routes email-bound relations through their parent ObjectEntity via the
  EmailLink. Email links without a parent object are denied. `fileId = 0`
  / `objectId = 0` / `emailId = 0` no longer satisfy the file branch.
  Adds `EmailLinkMapper::find(int $id)`.

- **`@NoCSRFRequired`.** Removed from `EntityRelationsController::update`.
  This is a browser-facing decision-time PATCH; Nextcloud's session-CSRF
  protection MUST stay on. Spec amended (D7) to record the decision.

**Significant items:**

- **Order-sensitive `bases` diff.** New private
  `EntityRelationMapper::basesAreEqual(?array, ?array): bool` does
  multiset equality (sort + unique). `['a', 'b']` vs `['b', 'a']` is now
  a semantic no-op and produces no audit entry. `null` vs `[]` remains
  distinct. Storage preserves operator-supplied order; only the diff
  check normalises.

- **`getParams()` content-type sensitivity.** `update()` rejects requests
  whose Content-Type is anything other than `application/json` (or
  empty) with HTTP 415, so the PATCH never depends on body-parser
  heuristics for the decision-time mutation.

- **Non-transactional audit write.** Row UPDATE + audit-trail INSERT
  now run inside a single `$db->beginTransaction()` / `commit()` /
  `rollBack()`. An audit failure rolls back the UPDATE and surfaces as
  HTTP 500 — clients never observe a persisted decision-metadata change
  without a matching audit entry. The downstream event dispatch stays
  OUTSIDE the transaction (post-commit, informational; listener
  failures still don't roll back). Test
  `testAuditEmissionFailureRollsBackUpdateAndThrows` updates the
  contract.

- **`fileId = 0` bypass.** Closed by adding `> 0` guards on `fileId`,
  `objectId`, `emailId` in the authz dispatch helper.

- **`tasks.md` 3.2a + companions checked.** Items 1.1–1.3, 2.1–2.6,
  3.1–3.8 marked `[x]` to reflect what landed in this PR.

**Minor items:**

- **Test copyright headers.** Added SPDX + Conduction headers to
  `EntityRelationsControllerTest`, `EntityRelationMapperTest`,
  `EntityRelationMapperUpdateDecisionMetadataTest`, `EntityRelationTest`
  (also fixed the latter's `namespace Unit\Db` typo).

- **CHANGELOG signature.** Corrected the `updateDecisionMetadata`
  signature in the Unreleased entry from `int $id` to
  `EntityRelation $relation` and noted the transactional contract +
  multiset `bases` diff.

- **Migration `postSchemaChange()` comment.** Documented why
  `Version1Date20260512120000` intentionally does not override
  `postSchemaChange` (nullable column + DB-level default; no backfill).

**Spec.** `design.md` §D6 now records the transactional audit-invariant
+ multiset `bases` equality; §D7 documents the authz wiring (file →
`isUpdateable`; object → `PermissionHandler`; email → parent object via
`EmailLink`) and the `@NoCSRFRequired` removal rationale.

**Tests.** 44 tests / 160 assertions across the four mapper +
controller suites pass. New cases:
- multiset bases equality (reordered, duplicates, null-vs-empty).
- audit-failure rolls back update and rethrows.
- 403 on object-bound when `hasPermission` denies.
- 200 on object-bound when `hasPermission` grants.
- 403 on email-bound when EmailLink has no parent object.
- 403 on a relation with no subject (fileId/objectId/emailId all unset).

**Newman.** The reviewer's diagnosis (commit `b8949d17` broke Newman via
missing `registerService` for autowiring) does not reproduce — see PR
comment. Newman crud fails identically on `development` HEAD (`9d50f4360`,
before this branch existed); the failures are HTTP 400 assertion
errors, not "socket hang up". Autowiring works locally:
`GET /api/entities/stats` (uses `GdprEntitiesController` which
constructor-injects `EntityRelationMapper`) returns HTTP 200 with a
valid payload. Speculative `registerService` block NOT added pending
reviewer's exact failing collection + commit.
@rubenvdlinde rubenvdlinde force-pushed the feat/1435/entity-relation-grondslagen-impl branch from 86f2d18 to 0ee978b Compare May 18, 2026 17:35
@rubenvdlinde
Copy link
Copy Markdown
Contributor

Rebased on top of latest development (tip 95aed02c5) to pick up:

Resolved 2 conflicts:

  • openspec/changes/entity-relation-grondslagen/tasks.md: kept PR's full reworked task list (the rework commit on this branch deliberately re-introduces the skip_anonymization column + PATCH endpoint that dev's ADR-032 split removed).
  • CHANGELOG.md (across two rebase steps): merged EML (dev) and entity-relation-grondslagen (PR) entries under ### Added / ### Behaviour changes; on the review-fix commit, kept the newer transaction-wrapped + multiset-equal-diff description.

appinfo/routes.php and lib/Db/EmailLinkMapper.php auto-merged. Force-pushed with lease.

rubenvdlinde pushed a commit that referenced this pull request May 18, 2026
… significant)

Addresses the OR #1503 review on the entity-relation decision-metadata
PATCH endpoint. Four blockers (minus the Newman one — see PR comment),
five significant items, and the minor CHANGELOG/migration cleanups.

**Blockers:**

- **ADR-005 audit actor.** `EntityRelationMapper::emitDecisionMetadataAuditEntry`
  no longer persists `$user->getDisplayName()` on the audit-trail entry's
  `userName` column (PII). Audit entries now record only the UID; the
  display name is dropped.

- **Auth bypass on PATCH /api/entity-relations/{id}.** Object-bound and
  email-bound relations no longer fall through with an accept-with-warning
  log. The controller now constructor-injects `PermissionHandler`,
  `MagicMapper`, `SchemaMapper`, `RegisterMapper`, and `EmailLinkMapper`,
  then routes object-bound relations through
  `PermissionHandler::hasPermission(schema, 'update', uid, owner, object)`
  — the same verdict OR uses everywhere else for object writes — and
  routes email-bound relations through their parent ObjectEntity via the
  EmailLink. Email links without a parent object are denied. `fileId = 0`
  / `objectId = 0` / `emailId = 0` no longer satisfy the file branch.
  Adds `EmailLinkMapper::find(int $id)`.

- **`@NoCSRFRequired`.** Removed from `EntityRelationsController::update`.
  This is a browser-facing decision-time PATCH; Nextcloud's session-CSRF
  protection MUST stay on. Spec amended (D7) to record the decision.

**Significant items:**

- **Order-sensitive `bases` diff.** New private
  `EntityRelationMapper::basesAreEqual(?array, ?array): bool` does
  multiset equality (sort + unique). `['a', 'b']` vs `['b', 'a']` is now
  a semantic no-op and produces no audit entry. `null` vs `[]` remains
  distinct. Storage preserves operator-supplied order; only the diff
  check normalises.

- **`getParams()` content-type sensitivity.** `update()` rejects requests
  whose Content-Type is anything other than `application/json` (or
  empty) with HTTP 415, so the PATCH never depends on body-parser
  heuristics for the decision-time mutation.

- **Non-transactional audit write.** Row UPDATE + audit-trail INSERT
  now run inside a single `$db->beginTransaction()` / `commit()` /
  `rollBack()`. An audit failure rolls back the UPDATE and surfaces as
  HTTP 500 — clients never observe a persisted decision-metadata change
  without a matching audit entry. The downstream event dispatch stays
  OUTSIDE the transaction (post-commit, informational; listener
  failures still don't roll back). Test
  `testAuditEmissionFailureRollsBackUpdateAndThrows` updates the
  contract.

- **`fileId = 0` bypass.** Closed by adding `> 0` guards on `fileId`,
  `objectId`, `emailId` in the authz dispatch helper.

- **`tasks.md` 3.2a + companions checked.** Items 1.1–1.3, 2.1–2.6,
  3.1–3.8 marked `[x]` to reflect what landed in this PR.

**Minor items:**

- **Test copyright headers.** Added SPDX + Conduction headers to
  `EntityRelationsControllerTest`, `EntityRelationMapperTest`,
  `EntityRelationMapperUpdateDecisionMetadataTest`, `EntityRelationTest`
  (also fixed the latter's `namespace Unit\Db` typo).

- **CHANGELOG signature.** Corrected the `updateDecisionMetadata`
  signature in the Unreleased entry from `int $id` to
  `EntityRelation $relation` and noted the transactional contract +
  multiset `bases` diff.

- **Migration `postSchemaChange()` comment.** Documented why
  `Version1Date20260512120000` intentionally does not override
  `postSchemaChange` (nullable column + DB-level default; no backfill).

**Spec.** `design.md` §D6 now records the transactional audit-invariant
+ multiset `bases` equality; §D7 documents the authz wiring (file →
`isUpdateable`; object → `PermissionHandler`; email → parent object via
`EmailLink`) and the `@NoCSRFRequired` removal rationale.

**Tests.** 44 tests / 160 assertions across the four mapper +
controller suites pass. New cases:
- multiset bases equality (reordered, duplicates, null-vs-empty).
- audit-failure rolls back update and rethrows.
- 403 on object-bound when `hasPermission` denies.
- 200 on object-bound when `hasPermission` grants.
- 403 on email-bound when EmailLink has no parent object.
- 403 on a relation with no subject (fileId/objectId/emailId all unset).

**Newman.** The reviewer's diagnosis (commit `b8949d17` broke Newman via
missing `registerService` for autowiring) does not reproduce — see PR
comment. Newman crud fails identically on `development` HEAD (`9d50f4360`,
before this branch existed); the failures are HTTP 400 assertion
errors, not "socket hang up". Autowiring works locally:
`GET /api/entities/stats` (uses `GdprEntitiesController` which
constructor-injects `EntityRelationMapper`) returns HTTP 200 with a
valid payload. Speculative `registerService` block NOT added pending
reviewer's exact failing collection + commit.
@rubenvdlinde rubenvdlinde force-pushed the feat/1435/entity-relation-grondslagen-impl branch from 0ee978b to 0808d18 Compare May 18, 2026 18:41
rubenvdlinde and others added 9 commits May 18, 2026 21:19
Closes #1564.

The static `self::\$constructCount` counter + `> 2` guard in
`MagicMapper::__construct` was a March-2026 holdover from a refactor's
debugging session — added in ef8ed0d, kept across the
`file_put_contents('/tmp/or-debug.log', ...)` debug logging that PR
#1565 just removed.

The two circular-DI paths the guard was protecting against have both
been broken via lazy container resolution since:
- `CacheHandler → RegisterMapper → MagicMapper` — `CacheHandler` and
  `ICacheFactory` are now resolved lazily inside
  `initializeHandlers()` (commented in-line).
- `SettingsService → MagicMapper` — `SettingsService` removed
  `MagicMapper` from its constructor signature (private docblock at
  SettingsService.php:243 marks "REMOVED: Object mapper").

The full PHPUnit suite (12,226 tests / 25,949 assertions) is green
after removing the guard — proving the cycle no longer trips in any
covered path.

## What changed

- `lib/Db/MagicMapper.php` — dropped `\$constructCount` static + the
  guard branch from `__construct()`. Constructor now goes straight to
  `\$this->initializeHandlers()`.
- `tests/Unit/Service/MagicMapperTest.php` — removed the reflection
  block that reset the static counter between tests (it referenced a
  property that no longer exists).

## Verification

- `composer phpcs` on `MagicMapper.php` → 0 errors
- `phpunit --filter MagicMapperTest` → 29/29 pass
- `phpunit -c phpunit-unit.xml` (full) → 12,226 tests / 25,949
  assertions / 0 errors / 0 failures
)

Newman runs from the host targeting http://localhost:8080. Previously
the workflow only waited for `occ status` to report "installed: true"
— but OCC uses the PHP CLI and talks to the DB directly, so it returns
installed before Apache has finished bringing up the HTTP listener.
Newman then immediately fired requests that all came back `[errored]`
(connection refused / port not yet forwarded), failing the entire
api-test-coverage workflow on every PR.

Added a step between "Enable openregister app" and "Run Newman
orchestrator" that:
  1. Polls `http://localhost:8080/status.php` until Apache responds —
     proves the host-mapped port works.
  2. Polls
     `http://localhost:8080/index.php/apps/openregister/api/settings/rbac`
     until the OpenRegister app is dispatching (accepts 200/401/405
     as evidence of dispatch — the suite re-auths anyway).
  3. Does one final `curl -fsS` against the same endpoint and fails
     fast with a clear ::error:: marker if the API is still down,
     so the failure surfaces before Newman runs and dumps 100s of
     `[errored]` lines.

Both polls cap at 60 attempts × 2s = 120s. Local dev unaffected
(orchestrator only runs in CI from this workflow).
…findings

Original spec described 'anonymise endpoint accepts bases, persists,
strips before forwarding to OpenAnonymiser'. Two architectural problems
surfaced during apply:

1. OpenAnonymiser is called during DETECTION (EntityRecognitionHandler),
   not at anonymise time. The 'strip before forward' Requirement had no
   counterpart in the code.
2. OR's anonymise endpoint takes a fileId and reads pre-detected
   relations from the DB; it has no payload-driven entity list and
   never forwarded entity arrays anywhere external.

Decoupled the bases-set verb from the anonymise verb. New shape:

- NEW: PATCH /api/entity-relations/{id} with decision-only whitelist
  { bases, skipAnonymization }. Backed by single audited DI method
  EntityRelationMapper::updateDecisionMetadata. HTTP controller is a
  thin wrapper.
- NEW: skip_anonymization boolean column on EntityRelation (default
  false). Operator decision: 'do not redact this occurrence'.
- MODIFIED: anonymise flow honours the skip flag. markAsAnonymized
  gains 'AND skip_anonymization = 0' predicate; FileService defensively
  filters skipped rows server-side.
- UNCHANGED: anonymise endpoint URLs + signatures. OpenAnonymiser
  detection wiring. markAsAnonymized's role as the writer of
  anonymized/anonymizedValue.

Whitelist is decision-only: anonymized and anonymizedValue are NOT
editable via PATCH. Those are post-hoc system fields written only by
markAsAnonymized — letting operators flip them would forge audit
history.

Per-relation storage gives per-position granularity natively. Consumer
UIs (DocuDesk's review) may aggregate at entity level for UX, but the
OR surface preserves the finer granularity.

Cross-app impact: DD's anonymisation-grondslagen-and-prohibition-gate
spec was written against the old contract (bases on anonymise payload);
it needs a parallel amendment to call the new PATCH endpoint instead.
Flagged in the proposal under Out of scope.

Refs: #1435
…endpoint

Implements the post-explore-mode-rework spec. New surfaces:

- DB migration adds two columns to oc_openregister_entity_relations:
  - bases (JSON, nullable)
  - skip_anonymization (BOOLEAN, default false)
  Both idempotent via hasColumn guards.

- EntityRelation entity gains getBases/setBases and
  getSkipAnonymization/setSkipAnonymization, registered via addType,
  and serialised in jsonSerialize.

- EntityRelationMapper:
  - find(int) — explicit concrete find by primary id (QBMapper only
    declares it via @method docblock; concrete mappers add their own).
  - findEntitiesForAnonymization(int) — skip-aware variant of
    findEntitiesForFile, filters skip_anonymization=true rows out.
  - findSkippedEntityValuesForFile(int) — joined Entity.value strings
    for skipped rows on a file; backs DocumentProcessingHandler's
    defensive filter.
  - markAsAnonymized gains AND skip_anonymization = 0 predicate.
  - updateDecisionMetadata(EntityRelation, array, ?IUser) — single
    audited write path with whitelist (bases, skipAnonymization),
    shape validation, diff awareness (semantic no-ops do not emit
    audit), and AuditTrail emission via the existing OpenRegister
    audit-trail subsystem.

- EntityRelationsController (NEW) — thin wrapper exposing
  PATCH /api/entity-relations/{id}. @NoAdminRequired. Routes 401 for
  unauthenticated session, 404 for missing relation, 403 when the
  acting user can't write the relation's parent file, 400 on
  whitelist/shape violation, 200 with the updated row otherwise.
  Authorization helper checks isUpdateable() on the file in the
  user's user-folder; object/email-bound relations are accepted
  with a warning log for v1 (tracked as a follow-up).

- DocumentProcessingHandler::anonymizeDocument defensively filters
  skipped occurrences server-side so the OR contract ('skipped
  relations are never redacted, full stop') holds regardless of
  caller filtering behaviour.

- Route registered: PATCH /api/entity-relations/{id}.

- Tests:
  - EntityRelationTest extended (bases + skipAnonymization round-trip,
    addType registrations, jsonSerialize shape).
  - EntityRelationMapperTest (construction wiring).
  - EntityRelationMapperUpdateDecisionMetadataTest (whitelist, shape
    validation, semantic no-ops, diff-aware audit entries, audit
    failure tolerance, actor resolution including system fallback).
  - EntityRelationsControllerTest (401/403/404/400/200/500 HTTP
    contract; named-args dispatch into mapper).
  - 33 tests / 133 assertions all pass against the dev container.

- Documentation:
  - CHANGELOG.md: Added + Behaviour changes entries.
  - docs/Features/entity-relation-decision-metadata.md (NEW) —
    endpoint contract, semantics, DD-side consumption pattern,
    spec refs.

- Spec refinement: updateDecisionMetadata signature takes the loaded
  EntityRelation (caller handles find/404). Cleaner SRP and makes
  the mapper method directly unit-testable without DB. spec.md and
  design.md updated to match.

Quality:
- composer phpcs --standard=phpcs.xml: clean on all touched files.
- openspec validate entity-relation-grondslagen: clean.
- 8.4 manual smoke against live stack deferred to PR review phase.

Refs: #1435
…atedEvent

Adds a post-commit, informational Symfony event fired from
EntityRelationMapper::updateDecisionMetadata after the row update and
audit-trail entry have both succeeded. Enables downstream apps to
react to operator decisions on individual entity relations without
polling — the immediate consumer is DocuDesk's
publication-clearance-via-anonymise change, which subscribes to the
event and creates a publicationConsent record the moment
skipAnonymization flips false → true. That moves the Woo 28-day clock
to decision time instead of anonymise time, avoiding the
422-on-first-anonymise-attempt UX trap that would otherwise hit every
entity needing the standard WOO workflow.

Event carries the post-update EntityRelation, the per-field diff
(same shape as the audit-trail's changedFields payload), and the
acting IUser (or null when no session user). A convenience helper
isSkipAnonymizationActivated() encapsulates the most common listener
trigger.

Failure isolation: listener throws don't mask the persisted state
change or the audit entry — wrapped in try/catch, logged at error
level, method continues. Same contract as the existing audit-emission
isolation.

Not vetoable. Pre-commit cancelable events were considered and
rejected — they would couple downstream apps tightly to OR's
persistence semantics. The DocuDesk consumer handles policy rejection
on its own write path (reverse the PATCH + emit operator
notification), which keeps OR a clean primitive.

Spec amendment in design.md §D6a + tasks.md item 3.2a documents the
contract. Tests cover: dispatch on skipAnonymization flip, dispatch
on bases change, no dispatch on semantic no-op, dispatch failure
isolation. Pre-existing PHPCS debt on tests/Unit/Db/ is out of scope
for this commit (98 errors baseline on the existing
UpdateDecisionMetadataTest predate this change).
…sForFile

Adds a second projected JOIN read method on EntityRelationMapper for
the grondslagen-summary downstream renderer (DocuDesk's
anonymisation-grondslagen-summary change). Differs from the existing
findEntitiesForFile in two ways:

  1. Filters to rows where r.anonymized = true — callers want the
     "what was actually redacted" set, not the "what was detected" set.
  2. Selects r.bases, r.skip_anonymization, r.anonymized,
     r.anonymized_value so the renderer can produce a
     grondslagen-traceable audit page without an N+1 of find($id) calls.

JSON-decodes the bases column into a PHP array on the way out, and
normalises bool-as-int columns to actual booleans (some DB drivers
return them as int).

The existing findEntitiesForFile is unchanged — the anonymise pipeline
doesn't need bases and shouldn't pay the cost of decoding them per
row.

No spec change beyond this addition being a natural extension of the
Wave 1.3 read surface. tests/Unit/Db/EntityRelationMapperTest.php
construction test continues to pass.
… significant)

Addresses the OR #1503 review on the entity-relation decision-metadata
PATCH endpoint. Four blockers (minus the Newman one — see PR comment),
five significant items, and the minor CHANGELOG/migration cleanups.

**Blockers:**

- **ADR-005 audit actor.** `EntityRelationMapper::emitDecisionMetadataAuditEntry`
  no longer persists `$user->getDisplayName()` on the audit-trail entry's
  `userName` column (PII). Audit entries now record only the UID; the
  display name is dropped.

- **Auth bypass on PATCH /api/entity-relations/{id}.** Object-bound and
  email-bound relations no longer fall through with an accept-with-warning
  log. The controller now constructor-injects `PermissionHandler`,
  `MagicMapper`, `SchemaMapper`, `RegisterMapper`, and `EmailLinkMapper`,
  then routes object-bound relations through
  `PermissionHandler::hasPermission(schema, 'update', uid, owner, object)`
  — the same verdict OR uses everywhere else for object writes — and
  routes email-bound relations through their parent ObjectEntity via the
  EmailLink. Email links without a parent object are denied. `fileId = 0`
  / `objectId = 0` / `emailId = 0` no longer satisfy the file branch.
  Adds `EmailLinkMapper::find(int $id)`.

- **`@NoCSRFRequired`.** Removed from `EntityRelationsController::update`.
  This is a browser-facing decision-time PATCH; Nextcloud's session-CSRF
  protection MUST stay on. Spec amended (D7) to record the decision.

**Significant items:**

- **Order-sensitive `bases` diff.** New private
  `EntityRelationMapper::basesAreEqual(?array, ?array): bool` does
  multiset equality (sort + unique). `['a', 'b']` vs `['b', 'a']` is now
  a semantic no-op and produces no audit entry. `null` vs `[]` remains
  distinct. Storage preserves operator-supplied order; only the diff
  check normalises.

- **`getParams()` content-type sensitivity.** `update()` rejects requests
  whose Content-Type is anything other than `application/json` (or
  empty) with HTTP 415, so the PATCH never depends on body-parser
  heuristics for the decision-time mutation.

- **Non-transactional audit write.** Row UPDATE + audit-trail INSERT
  now run inside a single `$db->beginTransaction()` / `commit()` /
  `rollBack()`. An audit failure rolls back the UPDATE and surfaces as
  HTTP 500 — clients never observe a persisted decision-metadata change
  without a matching audit entry. The downstream event dispatch stays
  OUTSIDE the transaction (post-commit, informational; listener
  failures still don't roll back). Test
  `testAuditEmissionFailureRollsBackUpdateAndThrows` updates the
  contract.

- **`fileId = 0` bypass.** Closed by adding `> 0` guards on `fileId`,
  `objectId`, `emailId` in the authz dispatch helper.

- **`tasks.md` 3.2a + companions checked.** Items 1.1–1.3, 2.1–2.6,
  3.1–3.8 marked `[x]` to reflect what landed in this PR.

**Minor items:**

- **Test copyright headers.** Added SPDX + Conduction headers to
  `EntityRelationsControllerTest`, `EntityRelationMapperTest`,
  `EntityRelationMapperUpdateDecisionMetadataTest`, `EntityRelationTest`
  (also fixed the latter's `namespace Unit\Db` typo).

- **CHANGELOG signature.** Corrected the `updateDecisionMetadata`
  signature in the Unreleased entry from `int $id` to
  `EntityRelation $relation` and noted the transactional contract +
  multiset `bases` diff.

- **Migration `postSchemaChange()` comment.** Documented why
  `Version1Date20260512120000` intentionally does not override
  `postSchemaChange` (nullable column + DB-level default; no backfill).

**Spec.** `design.md` §D6 now records the transactional audit-invariant
+ multiset `bases` equality; §D7 documents the authz wiring (file →
`isUpdateable`; object → `PermissionHandler`; email → parent object via
`EmailLink`) and the `@NoCSRFRequired` removal rationale.

**Tests.** 44 tests / 160 assertions across the four mapper +
controller suites pass. New cases:
- multiset bases equality (reordered, duplicates, null-vs-empty).
- audit-failure rolls back update and rethrows.
- 403 on object-bound when `hasPermission` denies.
- 200 on object-bound when `hasPermission` grants.
- 403 on email-bound when EmailLink has no parent object.
- 403 on a relation with no subject (fileId/objectId/emailId all unset).

**Newman.** The reviewer's diagnosis (commit `b8949d17` broke Newman via
missing `registerService` for autowiring) does not reproduce — see PR
comment. Newman crud fails identically on `development` HEAD (`9d50f4360`,
before this branch existed); the failures are HTTP 400 assertion
errors, not "socket hang up". Autowiring works locally:
`GET /api/entities/stats` (uses `GdprEntitiesController` which
constructor-injects `EntityRelationMapper`) returns HTTP 200 with a
valid payload. Speculative `registerService` block NOT added pending
reviewer's exact failing collection + commit.
…ymise path

The Wave 1.3 spec (`entity-relation-grondslagen`) is explicit: every
file the anonymise flow successfully redacts MUST end with its
EntityRelation rows flipped to `anonymized = 1` (except rows whose
`skip_anonymization = 1`, which `markAsAnonymized` excludes per the
existing WHERE-clause contract). Downstream consumers — DocuDesk's
per-dossier grondslagen summary, audit-trail reporting, anonymise-
status badges — depend on this flag to discriminate "this file went
through redaction" from "this file was scanned but never anonymised."

The HTTP-route path
`FileTextController::anonymizeFile`-> mapper->markAsAnonymized()
honours this. The DI path
`FileService::anonymizeDocument` → `DocumentProcessingHandler::anonymizeDocument`
— which is what DocuDesk's per-file and batch anonymise endpoints
ultimately call — did NOT. It filtered by skip, built the
replacements map, wrote the anonymised file, and returned without
ever touching `anonymized` / `anonymized_value`. Every relation on
every file DocuDesk redacted stayed at `anonymized = 0`, so:

  - the DI parallel was silently inconsistent with the HTTP path,
  - DocuDesk's `GrondslagenSummaryService::renderDossierSummary`
    (which queries `findAnonymisedEntitiesWithBasesForFile`, filtered
    on `r.anonymized = 1`) saw empty result sets and produced reports
    saying "Dit dossier bevat geen geanonimiseerde documenten" even
    immediately after a folder-wide anonymise that updated dozens of
    rows with `bases` and produced multiple anonymised PDFs.

**Fix.** After `replaceWords` returns the anonymised file, call
`entityRelationMapper->markAsAnonymized($sourceFileId, '[REDACTED]')`
on the source's file id. The mapper's `AND skip_anonymization = 0`
predicate preserves the operator's skip decisions per the existing
contract. The placeholder value `[REDACTED]` is a generic
representative — each entity received its own `[TYPE: key]`
replacement key earlier in `anonymizeDocument`, but the
`anonymized_value` column is a single per-file string rather than a
per-row list, so we record a sentinel.

**Failure isolation.** The mark step is wrapped in `try/catch`. A
mapper failure does NOT mask the successful redaction — the
anonymised file is already on disk by the time we get here, and
re-throwing would force the controller to surface a "your file is
gone" error when in fact the file is fine and only the audit flag
update failed. The retry path (subsequent anonymise call on the same
file) re-marks idempotently, and the warning log lets operators
catch the divergence between disk state and DB state.

Guard: skips the call when `$replacements` is empty (no entities to
mark anonymised) or when `$fileId` is non-positive (synthetic /
non-persisted node).
…laceholder in DI anonymise path

`DocumentProcessingHandler::anonymizeDocument` previously generated
substitution placeholders from a fresh UUID prefix per call:

    $key = $entity['key'] ?? substr(Uuid::v4()->toRfc4122(), 0, 8);
    $replacements[$originalText] = '['.$entityType.': '.$key.']';

Two real problems:

1. **Non-idempotent.** Re-anonymising the same document with the
   same inputs produced byte-divergent output because the UUID
   prefix changed every call. Bad for forensics, bad for caching,
   bad for diff-based reviews.

2. **Reports can't reference the actual placeholder.** DocuDesk's
   grondslagen-summary report needs to show "for this anonymised
   file, the placeholder `X` corresponds to grondslag `Y`". With
   per-call UUID prefixes, the report can either (a) ingest the
   anonymised file and re-parse placeholders out of it (fragile)
   or (b) display a synthetic tag that doesn't match what's in
   the file (confusing). Neither is acceptable.

**Fix.** Look up the stable `entity_id` from existing EntityRelation
rows on the file and use it as the placeholder suffix:

    $entityIdMap = $entityRelationMapper->findEntityIdsByValueForFile($fileId);
    if (isset($entityIdMap[$originalText])) {
        $replacements[$originalText] = '['.$entityType.': '.$entityIdMap[$originalText]['id'].']';
    } else {
        // Legacy fallback — entity not in relations table.
        $key = $entity['key'] ?? substr(Uuid::v4()->toRfc4122(), 0, 8);
        $replacements[$originalText] = '['.$entityType.': '.$key.']';
    }

New mapper helper `EntityRelationMapper::findEntityIdsByValueForFile(
int $fileId): array<string $value, array{id: int, type: string}>`
returns the distinct `(entity_value, entity_id, entity_type)` set
for the file via a single JOIN. Same value mapping to multiple types
takes the first match — callers discriminate at the substitution
site with their own `entityType`.

Properties of the new placeholder format:

- **Stable across runs.** Same entity in same file always produces
  the same placeholder. Re-anonymise is now idempotent.
- **Stable across files.** Same entity in different files produces
  the same placeholder. Cross-file forensics (`[PERSON: 42]`
  appears in N anonymised files) becomes a meaningful query.
- **Matches the report.** DocuDesk's grondslagen-summary report
  generates the same tag from the same data; the report column
  and the in-file substitution are byte-identical.

**Fallback retained** for direct DI callers that bypass extraction
and pass entities not present in the relations table. The fallback
keeps the legacy UUID-prefix behaviour so existing tests / external
consumers don't break — they just don't get the stability benefit.

**Drive-by clean-up.** Pulled the `$fileId` resolution out of the
nested `method_exists($node, 'getId')` block and into a single
top-of-method variable so the later `markAsAnonymized` and new
`findEntityIdsByValueForFile` calls share a guarded local. Hides a
latent (but harmless) undefined-variable warning on the post-write
flag flip when `$node` lacked `getId`.
@rubenvdlinde rubenvdlinde force-pushed the feat/1435/entity-relation-grondslagen-impl branch from 0808d18 to 6d81c12 Compare May 18, 2026 19:34
- EmailServiceTest: PHPUnit 10+ strict separation — move `find` from
  addMethods() into onlyMethods() since EmailLinkMapper::find() exists
  in this PR's lib.
- FileTextControllerTest: anonymise flow now reads pre-detected entities
  via EntityRelationMapper::findEntitiesForAnonymization() instead of
  findEntitiesForFile(); update mock method names in 4 spots.
Base automatically changed from development to documentation May 18, 2026 20:38
@rjzondervan rjzondervan changed the base branch from documentation to development May 19, 2026 09:47
… + 2 concerns)

🔴 Double markAsAnonymized on the HTTP path — FileTextController::anonymizeFile
called markAsAnonymized($fileId, 'anonymized_<date>') after
DocumentProcessingHandler::anonymizeDocument already called
markAsAnonymized($fileId, '[REDACTED]'). The second UPDATE overwrote the
first, leaving rows with an anonymizedValue that didn't match what was
actually written to the file content. Pick the canonical caller per
option (b) — DocumentProcessingHandler is on every redaction path (HTTP,
DI, event listeners); the controller is bypassed by non-HTTP paths.
Remove the controller-side call; update the two affected FileTextControllerTest
expectations from once() → never() with explanatory comments.

🟡 DI wiring inconsistency — EntityRelationMapper is the only
IEventDispatcher-dependent mapper without an explicit registerService()
block. Other event-dispatcher-dependent mappers (SchemaMapper,
RegisterMapper, MagicMapper, WebhookMapper) are wired explicitly. Add
the EntityRelationMapper block next to MagicMapper, matching the same
pattern. Removes the autowiring question.

🟡 Placeholder format unspec'd + correlation acknowledgement — add two
ADDED Requirements to the spec: (1) the DI anonymise path MUST substitute
each entity using the stable [<TYPE>: <entity_id>] placeholder format,
with three scenarios covering stable id, idempotency on re-anonymise,
and the fallback to UUID fragment for entries missing from the entity
catalogue; (2) DocumentProcessingHandler is the sole canonical caller
of markAsAnonymized in the redaction flow. Also add a Notes section
documenting the cross-document correlation property of the placeholder
choice and the privacy threat model (within-tenant correlation is
in-scope and required by the consumer report).

openspec validate entity-relation-grondslagen — clean.
@rjzondervan
Copy link
Copy Markdown
Member Author

@WilcoLouwerse — addressed in eb02e6b41. Details below mapped to the three findings.

🔴 Double markAsAnonymized on the HTTP path

Picked option (b) per your recommendation: DocumentProcessingHandler::anonymizeDocument is now the sole canonical caller. The controller-side markAsAnonymized($fileId, 'anonymized_<date>') in FileTextController::anonymizeFile is removed, with an inline comment explaining why a controller-side call would race the DI-path call and clobber anonymized_value.

Value semantics: kept '[REDACTED]' (already in DocumentProcessingHandler since fb512791b) as the canonical column marker. The 'anonymized_<date>' timestamp form was incidental — no consumer queries depend on it; the column's purpose is "this file's relations have been redacted at least once" + a generic marker for audit log readability. Per-entity placeholders live in the file content ([<TYPE>: <entity_id>]) and the audit-trail row records the actor + timestamp, so the column doesn't need to carry redundant time info.

Tests updated: FileTextControllerTest::testAnonymizeFileWithSpecialEntities and ::testAnonymizeFileDeduplicatesEntities now assert expects($this->never())->method('markAsAnonymized') on the controller-side mock with explanatory comments. The DocumentProcessingHandler-side call is mocked through $this->fileService in these tests; coverage of the actual mark belongs in DocumentProcessingHandlerTest (which is a separate piece of work; flagged for follow-up if not already there).

🟡 DI wiring — explicit registerService() for EntityRelationMapper

Added the explicit block in Application::registerMappersWithCircularDependencies() immediately after MagicMapper, mirroring the pattern (constructor injects: IDBConnection, AuditTrailMapper, IUserSession, IEventDispatcher, LoggerInterface). Comment explains the rationale (every other event-dispatcher-dependent mapper is explicit; autowiring of framework-interface keys is version-sensitive). Removes the autowiring question entirely.

🟡 [<TYPE>: <entity_id>] placeholder is unspec'd + sequential-DB-integer correlation risk

Two new ADDED Requirements in specs/entity-relation-grondslagen/spec.md:

  1. Placeholder format requirement — captures the exact shape ([<TYPE>: <entity_id>]), the entity_id source (openregister_entities PK, looked up via findEntityIdsByValueForFile), the whitespace contract (exactly one space after the colon), and the fallback to an 8-char UUID fragment when an entity row is missing for DI callers bypassing the standard detection path. Three scenarios: stable placeholder, byte-identical re-anonymise, fallback for missing rows.
  2. Canonical-caller requirement — exactly ONE component (DocumentProcessingHandler::anonymizeDocument) calls markAsAnonymized per redaction. HTTP controllers, DI orchestrators, and event listeners MUST NOT call it independently. Two scenarios covering HTTP-call and DI-call paths.

Plus a Notes section "Placeholder format and cross-document correlation" addressing the correlation concern explicitly:

  • The within-tenant correlation property is deliberate, not a leak — consumer reports (grondslagen-summary in DocuDesk) thread entities through dossier files using these stable ids; the feature wouldn't work without them.
  • Cross-tenant correlation is impossible (per-tenant DB, independent PK sequences).
  • HMAC with per-tenant salt was considered — it would carry the same stability and the same correlation, so doesn't change the privacy property; we kept raw entity_id for simplicity.
  • Per-document salt would defeat correlation but break the consumer use case.

openspec validate entity-relation-grondslagen — clean.

Re-requesting review.

Copy link
Copy Markdown
Contributor

@WilcoLouwerse WilcoLouwerse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review verdict — APPROVE

Commit eb02e6b41 addresses the three findings from my 2026-05-18 re-review, and the four findings from the 2026-05-15 round are confirmed resolved by e34eac991. CI Newman API Test Suite is passing on HEAD.

Resolved this round (eb02e6b)

  • 🔴 Double markAsAnonymized on HTTP path — option (b) chosen; DocumentProcessingHandler::anonymizeDocument is sole canonical caller (writes '[REDACTED]'), FileTextController::anonymizeFile no longer marks. Tests now expects($this->never()). Call chain verified end-to-end.
  • 🟡 DI wiring — explicit registerService for EntityRelationMapper — explicit factory block added in Application::registerMappersWithCircularDependencies(), mirroring the pattern used by SchemaMapper / MagicMapper / WebhookMapper.
  • 🟡 Placeholder format unspec'd + correlation risk — two new Requirements in specs/entity-relation-grondslagen/spec.md (placeholder format with three scenarios incl. fallback; sole canonical caller for markAsAnonymized), plus a "Placeholder format and cross-document correlation" Notes section explicitly framing within-tenant correlation as deliberate (Woo dossier consumer requirement), with cross-tenant impossibility and HMAC alternatives discussed.

Resolved earlier (e34eac9)

Minor non-blocking notes for follow-up

  • DocumentProcessingHandler::anonymizeDocument does not emit the warning log that the new spec's fallback scenario says implementations SHOULD log when no matching entity-catalogue row is found. SHOULD-level, non-blocking.
  • The fallback uses $entity['key'] ?? substr(uuid, 0, 8) rather than always a UUID fragment, so a DI caller supplying 'key' can influence the placeholder content. Spec says MAY, so within bounds.

LGTM. 🟢

rjzondervan added a commit that referenced this pull request May 19, 2026
Adds the `react/async` package so the OR combined test branch can resolve
`React\Async\await()` at runtime. The fault path emerged during DocuDesk's
folder-anonymisation flow (entity-relation-grondslagen DI path) where the
async helper is called by ObjectService.

Test-branch-only commit. Not part of any merged PR scope; will land on
development through the appropriate feature branch later (see PR #1503
discussion).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants