Various fixes after e2e testing#52
Merged
Merged
Conversation
Deduplication: when injecting search services, rendering links, canvas annotation refs, manifest annotations, and figures into an existing manifest, skip any entry whose id is already present. Consolidates the check into a single AddIfNew(array, item, prepend) helper. Refactor: extract the five functional sections of Handle into named private methods (InjectSearchServices, InjectRenderingLinks, InjectCanvasAnnotationRefs, InjectManifestAnnotationsRefAsync, InjectFiguresRefAsync), leaving Handle as a slim orchestrator. Tests: adds 8 deduplication tests covering service, rendering, and annotations blocks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously ~90% of files had no logging despite Serilog being fully wired up. Storage I/O failures were silent, and parse events in Core providers (hyphenation merges, missing PrintSpace, timestamp errors) produced no observable output. * Add ILogger to FileSystemTextStore and S3TextStore; log every Save/Load/Delete/Exists at Debug, and unexpected S3 errors at Warning * Add ILoggerFactory? to TextBuilder (default NullLoggerFactory) and thread loggers through to all four format providers * AltoTextFormatProvider: logs namespace, missing PrintSpace (Warning), hyphenation merges (Debug), unmatched HypPart1 (Warning) * HocrTextFormatProvider: logs missing ocr_page, unparseable bbox * VttTextFormatProvider: logs timing-line parse failures (Warning) * W3cAnnotationTextFormatProvider: logs skipped/unparseable annotations * TextBuilder logs Warning when no provider matches a page * Use ActivatorUtilities.CreateInstance for FileSystemTextStore production registrations, consistent with S3TextStore, so future DI-resolvable constructor params are auto-resolved Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Reject IDs with a leading slash at creation time; downstream endpoints return 404 naturally since no such job can exist * Use hasData (non-null, non-empty) consistently for the per-page iteration guard, replacing the looser SourceData != null check * Add unit tests covering all JobInstruction validation branches * Add E2E tests confirming 400 on leading slash and 202 on internal slashes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Search API previously hardcoded FileSystemTextStore — any deployment using S3 in the Builder API would write artefacts the Search API could not read. * Add AWSSDK.Extensions.NETCore.Setup to Search API; wire S3TextStore with the same conditional pattern as the Builder API * Rename Storage:RootPath -> Storage:FileSystem:RootPath in both APIs so filesystem and S3 are symmetrically namespaced under Storage:<type> * Add typed S3StorageOptions to Builder API's StorageOptions (was previously read only as raw config strings) * Add missing settings to builder-api.md: RunMigrations, AllowFileImageProxy, Notifications:TopicArn * Use full config key paths (TextServices:...) in both doc tables to make nesting unambiguous * Remove dead CorsAllowedOrigins from Search API example appsettings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Add UserAgents.Builder / UserAgents.Search constants to
TextServices.Infrastructure so both APIs reference one source
* Builder was TextServices/1.0, Search had the GitHub URL suffix —
now both are TextServices-{Role}/1.0 (+github URL)
* Log at Debug when S3 storage or SNS notifications are activated,
making startup config easier to diagnose
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Required for local development against AWS
Replace {**id} (optional catch-all) with {*id:minlength(1)} across all
endpoints in both APIs. {**id} matched empty strings, meaning routes like
DELETE /textbuilder/ would reach the handler with an empty id. The
minlength(1) constraint causes the route not to match at all, returning
404 cleanly.
Add E2E tests verifying that mutation and resource endpoints without an
id segment return 404.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* InjectSearchServices now appends the v2 and v1 Search API context URLs to @context (promoting a string value to an array as needed, with deduplication). Context belongs at the manifest level alongside the service descriptors it enables. * All three inject methods (search services, rendering links, canvas annotation refs) are now skipped when text is null. If there is no indexed text there is nothing to link to; previously search service descriptors were always written regardless. * InjectRenderingLinks and InjectCanvasAnnotationRefs tightened to non-nullable Text parameter now that the null guard lives at the call site. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hardcoded route-segment interpolations in TextAugmentedHandler and JobResponse are replaced with calls to SearchApiRoutes, a new static class in TextServices.Infrastructure. ResolvedRequest gains extension methods that delegate to SearchApiRoutes, enabling resolved.SearchV1Url() etc. TextAugmentedRequest now accepts ResolvedRequest directly rather than individual URL strings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
idroute variables - required and cannot start with/@contexton Manifest when search service added.