Skip to content

Various fixes after e2e testing#52

Merged
donaldgray merged 21 commits into
mainfrom
feature/e2e_run
Jun 2, 2026
Merged

Various fixes after e2e testing#52
donaldgray merged 21 commits into
mainfrom
feature/e2e_run

Conversation

@donaldgray
Copy link
Copy Markdown
Member

  • Remove TODO comments in favour of issues.
  • Logging improvements, this can be added to as we go.
    • Add correlation-id to hangfire job implementation.
  • Wire up S3TextStore to search-api, previously this was only done for builder-api.
  • Update readmes to show full level of nesting for appsettings
  • Extract user-agent to constant and use consistent format.
  • Validate id route variables - required and cannot start with /
  • Have single place for introducing all paths, shared by builder + search api.
  • Set @context on Manifest when search service added.

donaldgray and others added 21 commits June 1, 2026 10:40
Deduplication: when injecting search services, rendering links, canvas
annotation refs, manifest annotations, and figures into an existing
manifest, skip any entry whose id is already present. Consolidates
the check into a single AddIfNew(array, item, prepend) helper.

Refactor: extract the five functional sections of Handle into named
private methods (InjectSearchServices, InjectRenderingLinks,
InjectCanvasAnnotationRefs, InjectManifestAnnotationsRefAsync,
InjectFiguresRefAsync), leaving Handle as a slim orchestrator.

Tests: adds 8 deduplication tests covering service, rendering, and
annotations blocks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously ~90% of files had no logging despite Serilog being fully
wired up. Storage I/O failures were silent, and parse events in Core
providers (hyphenation merges, missing PrintSpace, timestamp errors)
produced no observable output.

* Add ILogger to FileSystemTextStore and S3TextStore; log every
  Save/Load/Delete/Exists at Debug, and unexpected S3 errors at Warning
* Add ILoggerFactory? to TextBuilder (default NullLoggerFactory) and
  thread loggers through to all four format providers
* AltoTextFormatProvider: logs namespace, missing PrintSpace (Warning),
  hyphenation merges (Debug), unmatched HypPart1 (Warning)
* HocrTextFormatProvider: logs missing ocr_page, unparseable bbox
* VttTextFormatProvider: logs timing-line parse failures (Warning)
* W3cAnnotationTextFormatProvider: logs skipped/unparseable annotations
* TextBuilder logs Warning when no provider matches a page
* Use ActivatorUtilities.CreateInstance for FileSystemTextStore
  production registrations, consistent with S3TextStore, so future
  DI-resolvable constructor params are auto-resolved

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Reject IDs with a leading slash at creation time; downstream
  endpoints return 404 naturally since no such job can exist
* Use hasData (non-null, non-empty) consistently for the per-page
  iteration guard, replacing the looser SourceData != null check
* Add unit tests covering all JobInstruction validation branches
* Add E2E tests confirming 400 on leading slash and 202 on internal slashes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Search API previously hardcoded FileSystemTextStore — any deployment
using S3 in the Builder API would write artefacts the Search API could
not read.

* Add AWSSDK.Extensions.NETCore.Setup to Search API; wire S3TextStore
  with the same conditional pattern as the Builder API
* Rename Storage:RootPath -> Storage:FileSystem:RootPath in both APIs
  so filesystem and S3 are symmetrically namespaced under Storage:<type>
* Add typed S3StorageOptions to Builder API's StorageOptions (was
  previously read only as raw config strings)
* Add missing settings to builder-api.md: RunMigrations,
  AllowFileImageProxy, Notifications:TopicArn
* Use full config key paths (TextServices:...) in both doc tables to
  make nesting unambiguous
* Remove dead CorsAllowedOrigins from Search API example appsettings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Add UserAgents.Builder / UserAgents.Search constants to
  TextServices.Infrastructure so both APIs reference one source
* Builder was TextServices/1.0, Search had the GitHub URL suffix —
  now both are TextServices-{Role}/1.0 (+github URL)
* Log at Debug when S3 storage or SNS notifications are activated,
  making startup config easier to diagnose

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Required for local development against AWS
Replace {**id} (optional catch-all) with {*id:minlength(1)} across all
endpoints in both APIs. {**id} matched empty strings, meaning routes like
DELETE /textbuilder/ would reach the handler with an empty id. The
minlength(1) constraint causes the route not to match at all, returning
404 cleanly.

Add E2E tests verifying that mutation and resource endpoints without an
id segment return 404.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* InjectSearchServices now appends the v2 and v1 Search API context
  URLs to @context (promoting a string value to an array as needed,
  with deduplication). Context belongs at the manifest level alongside
  the service descriptors it enables.
* All three inject methods (search services, rendering links, canvas
  annotation refs) are now skipped when text is null. If there is no
  indexed text there is nothing to link to; previously search service
  descriptors were always written regardless.
* InjectRenderingLinks and InjectCanvasAnnotationRefs tightened to
  non-nullable Text parameter now that the null guard lives at the
  call site.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hardcoded route-segment interpolations in TextAugmentedHandler and
JobResponse are replaced with calls to SearchApiRoutes, a new static
class in TextServices.Infrastructure. ResolvedRequest gains extension
methods that delegate to SearchApiRoutes, enabling resolved.SearchV1Url()
etc. TextAugmentedRequest now accepts ResolvedRequest directly rather
than individual URL strings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@donaldgray donaldgray merged commit ea40d0e into main Jun 2, 2026
4 checks passed
@donaldgray donaldgray deleted the feature/e2e_run branch June 2, 2026 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant