Implement entity-relation-grondslagen — decision-metadata PATCH endpoint (#1435)#1503
Implement entity-relation-grondslagen — decision-metadata PATCH endpoint (#1435)#1503rjzondervan wants to merge 11 commits into
Conversation
Quality Report — ConductionNL/openregister @
|
| Check | PHP | Vue | Security | License | Tests |
|---|---|---|---|---|---|
| lint | ✅ | ||||
| phpcs | ✅ | ||||
| phpmd | ✅ | ||||
| psalm | ✅ | ||||
| phpstan | ✅ | ||||
| phpmetrics | ✅ | ||||
| eslint | ❌ | ||||
| stylelint | ✅ | ||||
| composer | ✅ | ✅ 153/153 | |||
| npm | ✅ | ✅ 598/598 | |||
| PHPUnit | ❌ | ||||
| Newman | ⏭️ | ||||
| Playwright | ⏭️ |
Quality workflow — 2026-05-12 14:28 UTC
Download the full PDF report from the workflow artifacts.
WilcoLouwerse
left a comment
There was a problem hiding this comment.
REQUEST_CHANGES — Strict mode. 4 blockers (including 1 new from the most recent commit), 5 significant issues.
🔴 Blockers (4): Auth bypass for object/email-bound relations; @NoCSRFRequired on a state-mutating PATCH without justification; ADR-005 violation (getDisplayName() stored as audit actor instead of UID); Newman regression introduced by commit b8949d17 — Newman API Test Suite passed on 139e1667 and now fails with "socket hang up" on every request after the IEventDispatcher constructor parameter was added to EntityRelationMapper without a matching explicit registerService in Application.php.
🟡 Significant (5): Order-sensitive bases comparison causes spurious audit entries; getParams() content-type sensitivity; non-transactional audit write; fileId = 0 bypasses file auth; tasks.md item 3.2a left unchecked despite being implemented in this commit.
🟢 Minor (3): Missing @copyright on 3 new test files (gate-1); missing postSchemaChange() comment in migration; CHANGELOG.md documents the wrong method signature for updateDecisionMetadata.
CI note: PHPUnit failures on quality / PHPUnit (PHP 8.3/8.4, NC stable32) are pre-existing on the development branch. The Newman failure IS new — it regressed in this PR.
Re: Newman regression attributed to
|
… significant) Addresses the OR #1503 review on the entity-relation decision-metadata PATCH endpoint. Four blockers (minus the Newman one — see PR comment), five significant items, and the minor CHANGELOG/migration cleanups. **Blockers:** - **ADR-005 audit actor.** `EntityRelationMapper::emitDecisionMetadataAuditEntry` no longer persists `$user->getDisplayName()` on the audit-trail entry's `userName` column (PII). Audit entries now record only the UID; the display name is dropped. - **Auth bypass on PATCH /api/entity-relations/{id}.** Object-bound and email-bound relations no longer fall through with an accept-with-warning log. The controller now constructor-injects `PermissionHandler`, `MagicMapper`, `SchemaMapper`, `RegisterMapper`, and `EmailLinkMapper`, then routes object-bound relations through `PermissionHandler::hasPermission(schema, 'update', uid, owner, object)` — the same verdict OR uses everywhere else for object writes — and routes email-bound relations through their parent ObjectEntity via the EmailLink. Email links without a parent object are denied. `fileId = 0` / `objectId = 0` / `emailId = 0` no longer satisfy the file branch. Adds `EmailLinkMapper::find(int $id)`. - **`@NoCSRFRequired`.** Removed from `EntityRelationsController::update`. This is a browser-facing decision-time PATCH; Nextcloud's session-CSRF protection MUST stay on. Spec amended (D7) to record the decision. **Significant items:** - **Order-sensitive `bases` diff.** New private `EntityRelationMapper::basesAreEqual(?array, ?array): bool` does multiset equality (sort + unique). `['a', 'b']` vs `['b', 'a']` is now a semantic no-op and produces no audit entry. `null` vs `[]` remains distinct. Storage preserves operator-supplied order; only the diff check normalises. - **`getParams()` content-type sensitivity.** `update()` rejects requests whose Content-Type is anything other than `application/json` (or empty) with HTTP 415, so the PATCH never depends on body-parser heuristics for the decision-time mutation. - **Non-transactional audit write.** Row UPDATE + audit-trail INSERT now run inside a single `$db->beginTransaction()` / `commit()` / `rollBack()`. An audit failure rolls back the UPDATE and surfaces as HTTP 500 — clients never observe a persisted decision-metadata change without a matching audit entry. The downstream event dispatch stays OUTSIDE the transaction (post-commit, informational; listener failures still don't roll back). Test `testAuditEmissionFailureRollsBackUpdateAndThrows` updates the contract. - **`fileId = 0` bypass.** Closed by adding `> 0` guards on `fileId`, `objectId`, `emailId` in the authz dispatch helper. - **`tasks.md` 3.2a + companions checked.** Items 1.1–1.3, 2.1–2.6, 3.1–3.8 marked `[x]` to reflect what landed in this PR. **Minor items:** - **Test copyright headers.** Added SPDX + Conduction headers to `EntityRelationsControllerTest`, `EntityRelationMapperTest`, `EntityRelationMapperUpdateDecisionMetadataTest`, `EntityRelationTest` (also fixed the latter's `namespace Unit\Db` typo). - **CHANGELOG signature.** Corrected the `updateDecisionMetadata` signature in the Unreleased entry from `int $id` to `EntityRelation $relation` and noted the transactional contract + multiset `bases` diff. - **Migration `postSchemaChange()` comment.** Documented why `Version1Date20260512120000` intentionally does not override `postSchemaChange` (nullable column + DB-level default; no backfill). **Spec.** `design.md` §D6 now records the transactional audit-invariant + multiset `bases` equality; §D7 documents the authz wiring (file → `isUpdateable`; object → `PermissionHandler`; email → parent object via `EmailLink`) and the `@NoCSRFRequired` removal rationale. **Tests.** 44 tests / 160 assertions across the four mapper + controller suites pass. New cases: - multiset bases equality (reordered, duplicates, null-vs-empty). - audit-failure rolls back update and rethrows. - 403 on object-bound when `hasPermission` denies. - 200 on object-bound when `hasPermission` grants. - 403 on email-bound when EmailLink has no parent object. - 403 on a relation with no subject (fileId/objectId/emailId all unset). **Newman.** The reviewer's diagnosis (commit `b8949d17` broke Newman via missing `registerService` for autowiring) does not reproduce — see PR comment. Newman crud fails identically on `development` HEAD (`9d50f4360`, before this branch existed); the failures are HTTP 400 assertion errors, not "socket hang up". Autowiring works locally: `GET /api/entities/stats` (uses `GdprEntitiesController` which constructor-injects `EntityRelationMapper`) returns HTTP 200 with a valid payload. Speculative `registerService` block NOT added pending reviewer's exact failing collection + commit.
Review response — commit
|
WilcoLouwerse
left a comment
There was a problem hiding this comment.
Re-review (Strict). 3 of 4 prior findings resolved: 🔴 @NoCSRFRequired removed and spec amended (design.md §D7); 🔴 auth bypass closed — all three branches (file/object/email) now run an authorisation check, ALL-IDs-null/0 case correctly denies; 🔴 ADR-005 actor — getDisplayName() no longer persisted, audit row records only the UID. Transactional audit-write, HTTP 415 on non-JSON, multiset basesAreEqual, migration idempotency — all verified.
Newman/DI: accepting the pushback that the failure can't be conclusively diagnosed from the diff alone, but downgraded to a 🟡 follow-up because EntityRelationMapper is the only IEventDispatcher-dependent mapper without an explicit registerService() block, inconsistent with the codebase pattern (see inline).
Blocker (1 🔴):
- Double
markAsAnonymizedon the HTTP path — commitfb512791added amarkAsAnonymized($fileId, '[REDACTED]')at the end ofDocumentProcessingHandler::anonymizeDocument, butFileTextController::anonymizeFile:547also callsmarkAsAnonymizedwith'anonymized_<date>'afterwards. The second UPDATE overwrites the first'sanonymized_value, leaving rows with a value that doesn't match the placeholder actually written to the file.
Concerns (2 🟡):
- DI wiring inconsistency — see Newman note above.
[<TYPE>: <entity_id>]placeholder format is unspec'd — landed today in86f2d18803post-author's response; needs a spec entry, and the sequential-DB-integer choice has mild cross-document correlation risk.
… significant) Addresses the OR #1503 review on the entity-relation decision-metadata PATCH endpoint. Four blockers (minus the Newman one — see PR comment), five significant items, and the minor CHANGELOG/migration cleanups. **Blockers:** - **ADR-005 audit actor.** `EntityRelationMapper::emitDecisionMetadataAuditEntry` no longer persists `$user->getDisplayName()` on the audit-trail entry's `userName` column (PII). Audit entries now record only the UID; the display name is dropped. - **Auth bypass on PATCH /api/entity-relations/{id}.** Object-bound and email-bound relations no longer fall through with an accept-with-warning log. The controller now constructor-injects `PermissionHandler`, `MagicMapper`, `SchemaMapper`, `RegisterMapper`, and `EmailLinkMapper`, then routes object-bound relations through `PermissionHandler::hasPermission(schema, 'update', uid, owner, object)` — the same verdict OR uses everywhere else for object writes — and routes email-bound relations through their parent ObjectEntity via the EmailLink. Email links without a parent object are denied. `fileId = 0` / `objectId = 0` / `emailId = 0` no longer satisfy the file branch. Adds `EmailLinkMapper::find(int $id)`. - **`@NoCSRFRequired`.** Removed from `EntityRelationsController::update`. This is a browser-facing decision-time PATCH; Nextcloud's session-CSRF protection MUST stay on. Spec amended (D7) to record the decision. **Significant items:** - **Order-sensitive `bases` diff.** New private `EntityRelationMapper::basesAreEqual(?array, ?array): bool` does multiset equality (sort + unique). `['a', 'b']` vs `['b', 'a']` is now a semantic no-op and produces no audit entry. `null` vs `[]` remains distinct. Storage preserves operator-supplied order; only the diff check normalises. - **`getParams()` content-type sensitivity.** `update()` rejects requests whose Content-Type is anything other than `application/json` (or empty) with HTTP 415, so the PATCH never depends on body-parser heuristics for the decision-time mutation. - **Non-transactional audit write.** Row UPDATE + audit-trail INSERT now run inside a single `$db->beginTransaction()` / `commit()` / `rollBack()`. An audit failure rolls back the UPDATE and surfaces as HTTP 500 — clients never observe a persisted decision-metadata change without a matching audit entry. The downstream event dispatch stays OUTSIDE the transaction (post-commit, informational; listener failures still don't roll back). Test `testAuditEmissionFailureRollsBackUpdateAndThrows` updates the contract. - **`fileId = 0` bypass.** Closed by adding `> 0` guards on `fileId`, `objectId`, `emailId` in the authz dispatch helper. - **`tasks.md` 3.2a + companions checked.** Items 1.1–1.3, 2.1–2.6, 3.1–3.8 marked `[x]` to reflect what landed in this PR. **Minor items:** - **Test copyright headers.** Added SPDX + Conduction headers to `EntityRelationsControllerTest`, `EntityRelationMapperTest`, `EntityRelationMapperUpdateDecisionMetadataTest`, `EntityRelationTest` (also fixed the latter's `namespace Unit\Db` typo). - **CHANGELOG signature.** Corrected the `updateDecisionMetadata` signature in the Unreleased entry from `int $id` to `EntityRelation $relation` and noted the transactional contract + multiset `bases` diff. - **Migration `postSchemaChange()` comment.** Documented why `Version1Date20260512120000` intentionally does not override `postSchemaChange` (nullable column + DB-level default; no backfill). **Spec.** `design.md` §D6 now records the transactional audit-invariant + multiset `bases` equality; §D7 documents the authz wiring (file → `isUpdateable`; object → `PermissionHandler`; email → parent object via `EmailLink`) and the `@NoCSRFRequired` removal rationale. **Tests.** 44 tests / 160 assertions across the four mapper + controller suites pass. New cases: - multiset bases equality (reordered, duplicates, null-vs-empty). - audit-failure rolls back update and rethrows. - 403 on object-bound when `hasPermission` denies. - 200 on object-bound when `hasPermission` grants. - 403 on email-bound when EmailLink has no parent object. - 403 on a relation with no subject (fileId/objectId/emailId all unset). **Newman.** The reviewer's diagnosis (commit `b8949d17` broke Newman via missing `registerService` for autowiring) does not reproduce — see PR comment. Newman crud fails identically on `development` HEAD (`9d50f4360`, before this branch existed); the failures are HTTP 400 assertion errors, not "socket hang up". Autowiring works locally: `GET /api/entities/stats` (uses `GdprEntitiesController` which constructor-injects `EntityRelationMapper`) returns HTTP 200 with a valid payload. Speculative `registerService` block NOT added pending reviewer's exact failing collection + commit.
86f2d18 to
0ee978b
Compare
|
Rebased on top of latest
Resolved 2 conflicts:
|
… significant) Addresses the OR #1503 review on the entity-relation decision-metadata PATCH endpoint. Four blockers (minus the Newman one — see PR comment), five significant items, and the minor CHANGELOG/migration cleanups. **Blockers:** - **ADR-005 audit actor.** `EntityRelationMapper::emitDecisionMetadataAuditEntry` no longer persists `$user->getDisplayName()` on the audit-trail entry's `userName` column (PII). Audit entries now record only the UID; the display name is dropped. - **Auth bypass on PATCH /api/entity-relations/{id}.** Object-bound and email-bound relations no longer fall through with an accept-with-warning log. The controller now constructor-injects `PermissionHandler`, `MagicMapper`, `SchemaMapper`, `RegisterMapper`, and `EmailLinkMapper`, then routes object-bound relations through `PermissionHandler::hasPermission(schema, 'update', uid, owner, object)` — the same verdict OR uses everywhere else for object writes — and routes email-bound relations through their parent ObjectEntity via the EmailLink. Email links without a parent object are denied. `fileId = 0` / `objectId = 0` / `emailId = 0` no longer satisfy the file branch. Adds `EmailLinkMapper::find(int $id)`. - **`@NoCSRFRequired`.** Removed from `EntityRelationsController::update`. This is a browser-facing decision-time PATCH; Nextcloud's session-CSRF protection MUST stay on. Spec amended (D7) to record the decision. **Significant items:** - **Order-sensitive `bases` diff.** New private `EntityRelationMapper::basesAreEqual(?array, ?array): bool` does multiset equality (sort + unique). `['a', 'b']` vs `['b', 'a']` is now a semantic no-op and produces no audit entry. `null` vs `[]` remains distinct. Storage preserves operator-supplied order; only the diff check normalises. - **`getParams()` content-type sensitivity.** `update()` rejects requests whose Content-Type is anything other than `application/json` (or empty) with HTTP 415, so the PATCH never depends on body-parser heuristics for the decision-time mutation. - **Non-transactional audit write.** Row UPDATE + audit-trail INSERT now run inside a single `$db->beginTransaction()` / `commit()` / `rollBack()`. An audit failure rolls back the UPDATE and surfaces as HTTP 500 — clients never observe a persisted decision-metadata change without a matching audit entry. The downstream event dispatch stays OUTSIDE the transaction (post-commit, informational; listener failures still don't roll back). Test `testAuditEmissionFailureRollsBackUpdateAndThrows` updates the contract. - **`fileId = 0` bypass.** Closed by adding `> 0` guards on `fileId`, `objectId`, `emailId` in the authz dispatch helper. - **`tasks.md` 3.2a + companions checked.** Items 1.1–1.3, 2.1–2.6, 3.1–3.8 marked `[x]` to reflect what landed in this PR. **Minor items:** - **Test copyright headers.** Added SPDX + Conduction headers to `EntityRelationsControllerTest`, `EntityRelationMapperTest`, `EntityRelationMapperUpdateDecisionMetadataTest`, `EntityRelationTest` (also fixed the latter's `namespace Unit\Db` typo). - **CHANGELOG signature.** Corrected the `updateDecisionMetadata` signature in the Unreleased entry from `int $id` to `EntityRelation $relation` and noted the transactional contract + multiset `bases` diff. - **Migration `postSchemaChange()` comment.** Documented why `Version1Date20260512120000` intentionally does not override `postSchemaChange` (nullable column + DB-level default; no backfill). **Spec.** `design.md` §D6 now records the transactional audit-invariant + multiset `bases` equality; §D7 documents the authz wiring (file → `isUpdateable`; object → `PermissionHandler`; email → parent object via `EmailLink`) and the `@NoCSRFRequired` removal rationale. **Tests.** 44 tests / 160 assertions across the four mapper + controller suites pass. New cases: - multiset bases equality (reordered, duplicates, null-vs-empty). - audit-failure rolls back update and rethrows. - 403 on object-bound when `hasPermission` denies. - 200 on object-bound when `hasPermission` grants. - 403 on email-bound when EmailLink has no parent object. - 403 on a relation with no subject (fileId/objectId/emailId all unset). **Newman.** The reviewer's diagnosis (commit `b8949d17` broke Newman via missing `registerService` for autowiring) does not reproduce — see PR comment. Newman crud fails identically on `development` HEAD (`9d50f4360`, before this branch existed); the failures are HTTP 400 assertion errors, not "socket hang up". Autowiring works locally: `GET /api/entities/stats` (uses `GdprEntitiesController` which constructor-injects `EntityRelationMapper`) returns HTTP 200 with a valid payload. Speculative `registerService` block NOT added pending reviewer's exact failing collection + commit.
0ee978b to
0808d18
Compare
Closes #1564. The static `self::\$constructCount` counter + `> 2` guard in `MagicMapper::__construct` was a March-2026 holdover from a refactor's debugging session — added in ef8ed0d, kept across the `file_put_contents('/tmp/or-debug.log', ...)` debug logging that PR #1565 just removed. The two circular-DI paths the guard was protecting against have both been broken via lazy container resolution since: - `CacheHandler → RegisterMapper → MagicMapper` — `CacheHandler` and `ICacheFactory` are now resolved lazily inside `initializeHandlers()` (commented in-line). - `SettingsService → MagicMapper` — `SettingsService` removed `MagicMapper` from its constructor signature (private docblock at SettingsService.php:243 marks "REMOVED: Object mapper"). The full PHPUnit suite (12,226 tests / 25,949 assertions) is green after removing the guard — proving the cycle no longer trips in any covered path. ## What changed - `lib/Db/MagicMapper.php` — dropped `\$constructCount` static + the guard branch from `__construct()`. Constructor now goes straight to `\$this->initializeHandlers()`. - `tests/Unit/Service/MagicMapperTest.php` — removed the reflection block that reset the static counter between tests (it referenced a property that no longer exists). ## Verification - `composer phpcs` on `MagicMapper.php` → 0 errors - `phpunit --filter MagicMapperTest` → 29/29 pass - `phpunit -c phpunit-unit.xml` (full) → 12,226 tests / 25,949 assertions / 0 errors / 0 failures
) Newman runs from the host targeting http://localhost:8080. Previously the workflow only waited for `occ status` to report "installed: true" — but OCC uses the PHP CLI and talks to the DB directly, so it returns installed before Apache has finished bringing up the HTTP listener. Newman then immediately fired requests that all came back `[errored]` (connection refused / port not yet forwarded), failing the entire api-test-coverage workflow on every PR. Added a step between "Enable openregister app" and "Run Newman orchestrator" that: 1. Polls `http://localhost:8080/status.php` until Apache responds — proves the host-mapped port works. 2. Polls `http://localhost:8080/index.php/apps/openregister/api/settings/rbac` until the OpenRegister app is dispatching (accepts 200/401/405 as evidence of dispatch — the suite re-auths anyway). 3. Does one final `curl -fsS` against the same endpoint and fails fast with a clear ::error:: marker if the API is still down, so the failure surfaces before Newman runs and dumps 100s of `[errored]` lines. Both polls cap at 60 attempts × 2s = 120s. Local dev unaffected (orchestrator only runs in CI from this workflow).
…findings
Original spec described 'anonymise endpoint accepts bases, persists,
strips before forwarding to OpenAnonymiser'. Two architectural problems
surfaced during apply:
1. OpenAnonymiser is called during DETECTION (EntityRecognitionHandler),
not at anonymise time. The 'strip before forward' Requirement had no
counterpart in the code.
2. OR's anonymise endpoint takes a fileId and reads pre-detected
relations from the DB; it has no payload-driven entity list and
never forwarded entity arrays anywhere external.
Decoupled the bases-set verb from the anonymise verb. New shape:
- NEW: PATCH /api/entity-relations/{id} with decision-only whitelist
{ bases, skipAnonymization }. Backed by single audited DI method
EntityRelationMapper::updateDecisionMetadata. HTTP controller is a
thin wrapper.
- NEW: skip_anonymization boolean column on EntityRelation (default
false). Operator decision: 'do not redact this occurrence'.
- MODIFIED: anonymise flow honours the skip flag. markAsAnonymized
gains 'AND skip_anonymization = 0' predicate; FileService defensively
filters skipped rows server-side.
- UNCHANGED: anonymise endpoint URLs + signatures. OpenAnonymiser
detection wiring. markAsAnonymized's role as the writer of
anonymized/anonymizedValue.
Whitelist is decision-only: anonymized and anonymizedValue are NOT
editable via PATCH. Those are post-hoc system fields written only by
markAsAnonymized — letting operators flip them would forge audit
history.
Per-relation storage gives per-position granularity natively. Consumer
UIs (DocuDesk's review) may aggregate at entity level for UX, but the
OR surface preserves the finer granularity.
Cross-app impact: DD's anonymisation-grondslagen-and-prohibition-gate
spec was written against the old contract (bases on anonymise payload);
it needs a parallel amendment to call the new PATCH endpoint instead.
Flagged in the proposal under Out of scope.
Refs: #1435
…endpoint
Implements the post-explore-mode-rework spec. New surfaces:
- DB migration adds two columns to oc_openregister_entity_relations:
- bases (JSON, nullable)
- skip_anonymization (BOOLEAN, default false)
Both idempotent via hasColumn guards.
- EntityRelation entity gains getBases/setBases and
getSkipAnonymization/setSkipAnonymization, registered via addType,
and serialised in jsonSerialize.
- EntityRelationMapper:
- find(int) — explicit concrete find by primary id (QBMapper only
declares it via @method docblock; concrete mappers add their own).
- findEntitiesForAnonymization(int) — skip-aware variant of
findEntitiesForFile, filters skip_anonymization=true rows out.
- findSkippedEntityValuesForFile(int) — joined Entity.value strings
for skipped rows on a file; backs DocumentProcessingHandler's
defensive filter.
- markAsAnonymized gains AND skip_anonymization = 0 predicate.
- updateDecisionMetadata(EntityRelation, array, ?IUser) — single
audited write path with whitelist (bases, skipAnonymization),
shape validation, diff awareness (semantic no-ops do not emit
audit), and AuditTrail emission via the existing OpenRegister
audit-trail subsystem.
- EntityRelationsController (NEW) — thin wrapper exposing
PATCH /api/entity-relations/{id}. @NoAdminRequired. Routes 401 for
unauthenticated session, 404 for missing relation, 403 when the
acting user can't write the relation's parent file, 400 on
whitelist/shape violation, 200 with the updated row otherwise.
Authorization helper checks isUpdateable() on the file in the
user's user-folder; object/email-bound relations are accepted
with a warning log for v1 (tracked as a follow-up).
- DocumentProcessingHandler::anonymizeDocument defensively filters
skipped occurrences server-side so the OR contract ('skipped
relations are never redacted, full stop') holds regardless of
caller filtering behaviour.
- Route registered: PATCH /api/entity-relations/{id}.
- Tests:
- EntityRelationTest extended (bases + skipAnonymization round-trip,
addType registrations, jsonSerialize shape).
- EntityRelationMapperTest (construction wiring).
- EntityRelationMapperUpdateDecisionMetadataTest (whitelist, shape
validation, semantic no-ops, diff-aware audit entries, audit
failure tolerance, actor resolution including system fallback).
- EntityRelationsControllerTest (401/403/404/400/200/500 HTTP
contract; named-args dispatch into mapper).
- 33 tests / 133 assertions all pass against the dev container.
- Documentation:
- CHANGELOG.md: Added + Behaviour changes entries.
- docs/Features/entity-relation-decision-metadata.md (NEW) —
endpoint contract, semantics, DD-side consumption pattern,
spec refs.
- Spec refinement: updateDecisionMetadata signature takes the loaded
EntityRelation (caller handles find/404). Cleaner SRP and makes
the mapper method directly unit-testable without DB. spec.md and
design.md updated to match.
Quality:
- composer phpcs --standard=phpcs.xml: clean on all touched files.
- openspec validate entity-relation-grondslagen: clean.
- 8.4 manual smoke against live stack deferred to PR review phase.
Refs: #1435
…atedEvent Adds a post-commit, informational Symfony event fired from EntityRelationMapper::updateDecisionMetadata after the row update and audit-trail entry have both succeeded. Enables downstream apps to react to operator decisions on individual entity relations without polling — the immediate consumer is DocuDesk's publication-clearance-via-anonymise change, which subscribes to the event and creates a publicationConsent record the moment skipAnonymization flips false → true. That moves the Woo 28-day clock to decision time instead of anonymise time, avoiding the 422-on-first-anonymise-attempt UX trap that would otherwise hit every entity needing the standard WOO workflow. Event carries the post-update EntityRelation, the per-field diff (same shape as the audit-trail's changedFields payload), and the acting IUser (or null when no session user). A convenience helper isSkipAnonymizationActivated() encapsulates the most common listener trigger. Failure isolation: listener throws don't mask the persisted state change or the audit entry — wrapped in try/catch, logged at error level, method continues. Same contract as the existing audit-emission isolation. Not vetoable. Pre-commit cancelable events were considered and rejected — they would couple downstream apps tightly to OR's persistence semantics. The DocuDesk consumer handles policy rejection on its own write path (reverse the PATCH + emit operator notification), which keeps OR a clean primitive. Spec amendment in design.md §D6a + tasks.md item 3.2a documents the contract. Tests cover: dispatch on skipAnonymization flip, dispatch on bases change, no dispatch on semantic no-op, dispatch failure isolation. Pre-existing PHPCS debt on tests/Unit/Db/ is out of scope for this commit (98 errors baseline on the existing UpdateDecisionMetadataTest predate this change).
…sForFile
Adds a second projected JOIN read method on EntityRelationMapper for
the grondslagen-summary downstream renderer (DocuDesk's
anonymisation-grondslagen-summary change). Differs from the existing
findEntitiesForFile in two ways:
1. Filters to rows where r.anonymized = true — callers want the
"what was actually redacted" set, not the "what was detected" set.
2. Selects r.bases, r.skip_anonymization, r.anonymized,
r.anonymized_value so the renderer can produce a
grondslagen-traceable audit page without an N+1 of find($id) calls.
JSON-decodes the bases column into a PHP array on the way out, and
normalises bool-as-int columns to actual booleans (some DB drivers
return them as int).
The existing findEntitiesForFile is unchanged — the anonymise pipeline
doesn't need bases and shouldn't pay the cost of decoding them per
row.
No spec change beyond this addition being a natural extension of the
Wave 1.3 read surface. tests/Unit/Db/EntityRelationMapperTest.php
construction test continues to pass.
… significant) Addresses the OR #1503 review on the entity-relation decision-metadata PATCH endpoint. Four blockers (minus the Newman one — see PR comment), five significant items, and the minor CHANGELOG/migration cleanups. **Blockers:** - **ADR-005 audit actor.** `EntityRelationMapper::emitDecisionMetadataAuditEntry` no longer persists `$user->getDisplayName()` on the audit-trail entry's `userName` column (PII). Audit entries now record only the UID; the display name is dropped. - **Auth bypass on PATCH /api/entity-relations/{id}.** Object-bound and email-bound relations no longer fall through with an accept-with-warning log. The controller now constructor-injects `PermissionHandler`, `MagicMapper`, `SchemaMapper`, `RegisterMapper`, and `EmailLinkMapper`, then routes object-bound relations through `PermissionHandler::hasPermission(schema, 'update', uid, owner, object)` — the same verdict OR uses everywhere else for object writes — and routes email-bound relations through their parent ObjectEntity via the EmailLink. Email links without a parent object are denied. `fileId = 0` / `objectId = 0` / `emailId = 0` no longer satisfy the file branch. Adds `EmailLinkMapper::find(int $id)`. - **`@NoCSRFRequired`.** Removed from `EntityRelationsController::update`. This is a browser-facing decision-time PATCH; Nextcloud's session-CSRF protection MUST stay on. Spec amended (D7) to record the decision. **Significant items:** - **Order-sensitive `bases` diff.** New private `EntityRelationMapper::basesAreEqual(?array, ?array): bool` does multiset equality (sort + unique). `['a', 'b']` vs `['b', 'a']` is now a semantic no-op and produces no audit entry. `null` vs `[]` remains distinct. Storage preserves operator-supplied order; only the diff check normalises. - **`getParams()` content-type sensitivity.** `update()` rejects requests whose Content-Type is anything other than `application/json` (or empty) with HTTP 415, so the PATCH never depends on body-parser heuristics for the decision-time mutation. - **Non-transactional audit write.** Row UPDATE + audit-trail INSERT now run inside a single `$db->beginTransaction()` / `commit()` / `rollBack()`. An audit failure rolls back the UPDATE and surfaces as HTTP 500 — clients never observe a persisted decision-metadata change without a matching audit entry. The downstream event dispatch stays OUTSIDE the transaction (post-commit, informational; listener failures still don't roll back). Test `testAuditEmissionFailureRollsBackUpdateAndThrows` updates the contract. - **`fileId = 0` bypass.** Closed by adding `> 0` guards on `fileId`, `objectId`, `emailId` in the authz dispatch helper. - **`tasks.md` 3.2a + companions checked.** Items 1.1–1.3, 2.1–2.6, 3.1–3.8 marked `[x]` to reflect what landed in this PR. **Minor items:** - **Test copyright headers.** Added SPDX + Conduction headers to `EntityRelationsControllerTest`, `EntityRelationMapperTest`, `EntityRelationMapperUpdateDecisionMetadataTest`, `EntityRelationTest` (also fixed the latter's `namespace Unit\Db` typo). - **CHANGELOG signature.** Corrected the `updateDecisionMetadata` signature in the Unreleased entry from `int $id` to `EntityRelation $relation` and noted the transactional contract + multiset `bases` diff. - **Migration `postSchemaChange()` comment.** Documented why `Version1Date20260512120000` intentionally does not override `postSchemaChange` (nullable column + DB-level default; no backfill). **Spec.** `design.md` §D6 now records the transactional audit-invariant + multiset `bases` equality; §D7 documents the authz wiring (file → `isUpdateable`; object → `PermissionHandler`; email → parent object via `EmailLink`) and the `@NoCSRFRequired` removal rationale. **Tests.** 44 tests / 160 assertions across the four mapper + controller suites pass. New cases: - multiset bases equality (reordered, duplicates, null-vs-empty). - audit-failure rolls back update and rethrows. - 403 on object-bound when `hasPermission` denies. - 200 on object-bound when `hasPermission` grants. - 403 on email-bound when EmailLink has no parent object. - 403 on a relation with no subject (fileId/objectId/emailId all unset). **Newman.** The reviewer's diagnosis (commit `b8949d17` broke Newman via missing `registerService` for autowiring) does not reproduce — see PR comment. Newman crud fails identically on `development` HEAD (`9d50f4360`, before this branch existed); the failures are HTTP 400 assertion errors, not "socket hang up". Autowiring works locally: `GET /api/entities/stats` (uses `GdprEntitiesController` which constructor-injects `EntityRelationMapper`) returns HTTP 200 with a valid payload. Speculative `registerService` block NOT added pending reviewer's exact failing collection + commit.
…ymise path
The Wave 1.3 spec (`entity-relation-grondslagen`) is explicit: every
file the anonymise flow successfully redacts MUST end with its
EntityRelation rows flipped to `anonymized = 1` (except rows whose
`skip_anonymization = 1`, which `markAsAnonymized` excludes per the
existing WHERE-clause contract). Downstream consumers — DocuDesk's
per-dossier grondslagen summary, audit-trail reporting, anonymise-
status badges — depend on this flag to discriminate "this file went
through redaction" from "this file was scanned but never anonymised."
The HTTP-route path
`FileTextController::anonymizeFile`-> mapper->markAsAnonymized()
honours this. The DI path
`FileService::anonymizeDocument` → `DocumentProcessingHandler::anonymizeDocument`
— which is what DocuDesk's per-file and batch anonymise endpoints
ultimately call — did NOT. It filtered by skip, built the
replacements map, wrote the anonymised file, and returned without
ever touching `anonymized` / `anonymized_value`. Every relation on
every file DocuDesk redacted stayed at `anonymized = 0`, so:
- the DI parallel was silently inconsistent with the HTTP path,
- DocuDesk's `GrondslagenSummaryService::renderDossierSummary`
(which queries `findAnonymisedEntitiesWithBasesForFile`, filtered
on `r.anonymized = 1`) saw empty result sets and produced reports
saying "Dit dossier bevat geen geanonimiseerde documenten" even
immediately after a folder-wide anonymise that updated dozens of
rows with `bases` and produced multiple anonymised PDFs.
**Fix.** After `replaceWords` returns the anonymised file, call
`entityRelationMapper->markAsAnonymized($sourceFileId, '[REDACTED]')`
on the source's file id. The mapper's `AND skip_anonymization = 0`
predicate preserves the operator's skip decisions per the existing
contract. The placeholder value `[REDACTED]` is a generic
representative — each entity received its own `[TYPE: key]`
replacement key earlier in `anonymizeDocument`, but the
`anonymized_value` column is a single per-file string rather than a
per-row list, so we record a sentinel.
**Failure isolation.** The mark step is wrapped in `try/catch`. A
mapper failure does NOT mask the successful redaction — the
anonymised file is already on disk by the time we get here, and
re-throwing would force the controller to surface a "your file is
gone" error when in fact the file is fine and only the audit flag
update failed. The retry path (subsequent anonymise call on the same
file) re-marks idempotently, and the warning log lets operators
catch the divergence between disk state and DB state.
Guard: skips the call when `$replacements` is empty (no entities to
mark anonymised) or when `$fileId` is non-positive (synthetic /
non-persisted node).
…laceholder in DI anonymise path
`DocumentProcessingHandler::anonymizeDocument` previously generated
substitution placeholders from a fresh UUID prefix per call:
$key = $entity['key'] ?? substr(Uuid::v4()->toRfc4122(), 0, 8);
$replacements[$originalText] = '['.$entityType.': '.$key.']';
Two real problems:
1. **Non-idempotent.** Re-anonymising the same document with the
same inputs produced byte-divergent output because the UUID
prefix changed every call. Bad for forensics, bad for caching,
bad for diff-based reviews.
2. **Reports can't reference the actual placeholder.** DocuDesk's
grondslagen-summary report needs to show "for this anonymised
file, the placeholder `X` corresponds to grondslag `Y`". With
per-call UUID prefixes, the report can either (a) ingest the
anonymised file and re-parse placeholders out of it (fragile)
or (b) display a synthetic tag that doesn't match what's in
the file (confusing). Neither is acceptable.
**Fix.** Look up the stable `entity_id` from existing EntityRelation
rows on the file and use it as the placeholder suffix:
$entityIdMap = $entityRelationMapper->findEntityIdsByValueForFile($fileId);
if (isset($entityIdMap[$originalText])) {
$replacements[$originalText] = '['.$entityType.': '.$entityIdMap[$originalText]['id'].']';
} else {
// Legacy fallback — entity not in relations table.
$key = $entity['key'] ?? substr(Uuid::v4()->toRfc4122(), 0, 8);
$replacements[$originalText] = '['.$entityType.': '.$key.']';
}
New mapper helper `EntityRelationMapper::findEntityIdsByValueForFile(
int $fileId): array<string $value, array{id: int, type: string}>`
returns the distinct `(entity_value, entity_id, entity_type)` set
for the file via a single JOIN. Same value mapping to multiple types
takes the first match — callers discriminate at the substitution
site with their own `entityType`.
Properties of the new placeholder format:
- **Stable across runs.** Same entity in same file always produces
the same placeholder. Re-anonymise is now idempotent.
- **Stable across files.** Same entity in different files produces
the same placeholder. Cross-file forensics (`[PERSON: 42]`
appears in N anonymised files) becomes a meaningful query.
- **Matches the report.** DocuDesk's grondslagen-summary report
generates the same tag from the same data; the report column
and the in-file substitution are byte-identical.
**Fallback retained** for direct DI callers that bypass extraction
and pass entities not present in the relations table. The fallback
keeps the legacy UUID-prefix behaviour so existing tests / external
consumers don't break — they just don't get the stability benefit.
**Drive-by clean-up.** Pulled the `$fileId` resolution out of the
nested `method_exists($node, 'getId')` block and into a single
top-of-method variable so the later `markAsAnonymized` and new
`findEntityIdsByValueForFile` calls share a guarded local. Hides a
latent (but harmless) undefined-variable warning on the post-write
flag flip when `$node` lacked `getId`.
0808d18 to
6d81c12
Compare
- EmailServiceTest: PHPUnit 10+ strict separation — move `find` from addMethods() into onlyMethods() since EmailLinkMapper::find() exists in this PR's lib. - FileTextControllerTest: anonymise flow now reads pre-detected entities via EntityRelationMapper::findEntitiesForAnonymization() instead of findEntitiesForFile(); update mock method names in 4 spots.
… + 2 concerns) 🔴 Double markAsAnonymized on the HTTP path — FileTextController::anonymizeFile called markAsAnonymized($fileId, 'anonymized_<date>') after DocumentProcessingHandler::anonymizeDocument already called markAsAnonymized($fileId, '[REDACTED]'). The second UPDATE overwrote the first, leaving rows with an anonymizedValue that didn't match what was actually written to the file content. Pick the canonical caller per option (b) — DocumentProcessingHandler is on every redaction path (HTTP, DI, event listeners); the controller is bypassed by non-HTTP paths. Remove the controller-side call; update the two affected FileTextControllerTest expectations from once() → never() with explanatory comments. 🟡 DI wiring inconsistency — EntityRelationMapper is the only IEventDispatcher-dependent mapper without an explicit registerService() block. Other event-dispatcher-dependent mappers (SchemaMapper, RegisterMapper, MagicMapper, WebhookMapper) are wired explicitly. Add the EntityRelationMapper block next to MagicMapper, matching the same pattern. Removes the autowiring question. 🟡 Placeholder format unspec'd + correlation acknowledgement — add two ADDED Requirements to the spec: (1) the DI anonymise path MUST substitute each entity using the stable [<TYPE>: <entity_id>] placeholder format, with three scenarios covering stable id, idempotency on re-anonymise, and the fallback to UUID fragment for entries missing from the entity catalogue; (2) DocumentProcessingHandler is the sole canonical caller of markAsAnonymized in the redaction flow. Also add a Notes section documenting the cross-document correlation property of the placeholder choice and the privacy threat model (within-tenant correlation is in-scope and required by the consumer report). openspec validate entity-relation-grondslagen — clean.
|
@WilcoLouwerse — addressed in 🔴 Double
|
WilcoLouwerse
left a comment
There was a problem hiding this comment.
Re-review verdict — APPROVE
Commit eb02e6b41 addresses the three findings from my 2026-05-18 re-review, and the four findings from the 2026-05-15 round are confirmed resolved by e34eac991. CI Newman API Test Suite is passing on HEAD.
Resolved this round (eb02e6b)
- 🔴 Double
markAsAnonymizedon HTTP path — option (b) chosen;DocumentProcessingHandler::anonymizeDocumentis sole canonical caller (writes'[REDACTED]'),FileTextController::anonymizeFileno longer marks. Tests nowexpects($this->never()). Call chain verified end-to-end. - 🟡 DI wiring — explicit
registerServiceforEntityRelationMapper— explicit factory block added inApplication::registerMappersWithCircularDependencies(), mirroring the pattern used bySchemaMapper/MagicMapper/WebhookMapper. - 🟡 Placeholder format unspec'd + correlation risk — two new Requirements in
specs/entity-relation-grondslagen/spec.md(placeholder format with three scenarios incl. fallback; sole canonical caller formarkAsAnonymized), plus a "Placeholder format and cross-document correlation" Notes section explicitly framing within-tenant correlation as deliberate (Woo dossier consumer requirement), with cross-tenant impossibility and HMAC alternatives discussed.
Resolved earlier (e34eac9)
- 🔴
@NoCSRFRequiredon state-mutating PATCH — annotation removed; design §D7 records the rationale. - 🔴 Auth bypass for object/email-bound relations —
PermissionHandlerwired, object-bound relations now route throughhasPermission(schema, 'update', uid, owner, object); email-bound relations resolve parentObjectEntityviaEmailLinkand run the same verdict;>0guards close thefileId=0/objectId=0/emailId=0bypass. - 🔴 Newman regression /
IEventDispatcherDI wiring — although the author initially pushed back on the diagnosis, the explicitregisterServiceblock from the follow-up effectively closes this; Newman now passes on HEAD. - 🔴 ADR-005 violation — display name stored as audit actor —
userNameno longer persisted; only the stable UID is stored in the audit-trail row.
Minor non-blocking notes for follow-up
DocumentProcessingHandler::anonymizeDocumentdoes not emit thewarninglog that the new spec's fallback scenario says implementations SHOULD log when no matching entity-catalogue row is found. SHOULD-level, non-blocking.- The fallback uses
$entity['key'] ?? substr(uuid, 0, 8)rather than always a UUID fragment, so a DI caller supplying'key'can influence the placeholder content. Spec says MAY, so within bounds.
LGTM. 🟢
Adds the `react/async` package so the OR combined test branch can resolve `React\Async\await()` at runtime. The fault path emerged during DocuDesk's folder-anonymisation flow (entity-relation-grondslagen DI path) where the async helper is called by ObjectService. Test-branch-only commit. Not part of any merged PR scope; will land on development through the appropriate feature branch later (see PR #1503 discussion).
Implements the
entity-relation-grondslagenchange. Includes a spec rework committed before the implementation — see "Spec rework note" below.Summary
Introduces an audited, decision-only PATCH endpoint for setting operator decisions on detected-entity occurrences:
PATCH /api/entity-relations/{id}— body{bases?: ?array<string>, skipAnonymization?: bool}. Any other top-level key returns HTTP 400 with the offending field name.@NoAdminRequired; the acting user MUST have write-access to the relation's parent file.EntityRelationMapper::updateDecisionMetadata(EntityRelation, array, ?IUser)— single audited write path. Controller passes the loaded relation; mapper enforces the whitelist, validates field shapes, computes a diff, persists changes, and emits exactly one audit-trail entry (only when fields actually change — semantic no-ops produce no audit).markAsAnonymizedaddsAND skip_anonymization = 0to its WHERE;FileTextController::anonymizeFilereads through the newfindEntitiesForAnonymization;DocumentProcessingHandler::anonymizeDocumentdefensively filters skipped values server-side so the contract — "skipped relations are never redacted, full stop" — holds regardless of caller behaviour.The post-hoc system fields
anonymized/anonymizedValueare intentionally NOT in the whitelist; they're set only bymarkAsAnonymizedto preserve the audit invariant "anonymized=true⟹ the redaction code ran for this row".Spec rework note
This branch's first commit (
ef5464b94) is a substantive amendment to the spec that was approved as PR #1494. The original spec described "the anonymise endpoint acceptsbases, persists, then strips before forwarding to OpenAnonymiser". Explore-mode investigation in apply uncovered two architectural mismatches:EntityRecognitionHandler::detectWithOpenAnonymiser), not at anonymise time. The "strip before forward" requirement had no counterpart in the code.fileIdand reads pre-detected relations from the DB; it never forwarded entity arrays anywhere external.The rework decoupled bases-set from the anonymise call into the new PATCH endpoint, and added the
skipAnonymizationfield on the same audited write path so operators can also opt-out per-occurrence. Both spec and design now describe the actual architecture.The companion DocuDesk amendment lives on
docudesk:feat/123/anonymisation-grondslagen-and-prohibition-gate-amend(local-only — to be pushed alongside this review).Database
basesskip_anonymizationMigration is forward-only and idempotent (
hasColumnguards). Existing rows pick up the defaults; no backfill.Tests
33 unit tests, 133 assertions, all passing against
master-nextcloud-1:tests/Unit/Db/EntityRelationTest.php— extended for the two new fields.tests/Unit/Db/EntityRelationMapperTest.php— construction wiring.tests/Unit/Db/EntityRelationMapperUpdateDecisionMetadataTest.php— whitelist, shape validation, semantic-no-op, diff-aware audit emission, audit-failure tolerance, actor resolution (explicit / session / system fallback).tests/Unit/Controller/EntityRelationsControllerTest.php— 401 / 403 / 404 / 400 / 200 / 500 HTTP contract.Quality gates
phpcsclean on all touched files (composer phpcs)php -lcleanopenspec validate entity-relation-grondslagencleanTest plan
ef5464b94) — confirm the decoupled-PATCH architecture matches expectations139e1667d){bases: ["uuid-a"]}, confirm the row is updated and one audit-trail entry exists{skipAnonymization: true}; runPOST /api/files/:fileId/anonymizeon the parent file; confirm the skipped row is NOT redacted and staysanonymized=false{anonymized: true}; confirm HTTP 400 withfield_not_editableerrordocs/Features/entity-relation-decision-metadata.mdfor accuracyRefs: #1435