[ENHANCEMENT] Added provider, resource, and version parameters to sources#535
Open
Eric Godwin (ericgodwin) wants to merge 2 commits into
…yItem Signed-off-by: ericgodwin <eric@overturemaps.org>
🗺️ Schema reference docs preview is live!
Note ♻️ This preview updates automatically with each push to this PR.
Pull request overview
Adds additional provenance fields to sources schema items so that a source can be identified with finer granularity than the current dataset string (as groundwork for future deprecation of dataset). All changed files in the PR were reviewed.
Changes:
- Extend `sourcePropertyItem` with optional `provider`, `resource`, and `version` string fields (with `minLength` constraints).
- Update schema documentation text around `sourcePropertyItem` and `sources` to reflect the intended future direction.
- Add one new example and one new counterexample illustrating populated vs. invalid empty values.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `schema/defs.yaml` | Adds `provider`/`resource`/`version` fields to the common `sourcePropertyItem` definition and updates related descriptions. |
| `examples/buildings/sources-with-version.yaml` | New example demonstrating populated `provider`/`resource`/`version` fields in `sources`. |
| `counterexamples/buildings/bad-sources-empty-provider.yaml` | New counterexample validating that empty strings for the new fields are rejected by the schema constraints. |
Agent-Logs-Url: https://github.com/OvertureMaps/schema/sessions/2997187e-b460-4134-a7d3-bd9fab7b2c22 Co-authored-by: ericgodwin <1336911+ericgodwin@users.noreply.github.com>
Background
The intent of this change is to update our source item field to include the information necessary for data provenance: `provider`, `resource`, and `version`.
Together with the version_id, these values allow a user to uniquely identify which raw input data was used to construct Overture data. Our current system of providing only a `dataset` lacks dataset version information and is also inconsistently constructed. All three new fields will be nullable and optional to start, because this is the first step: it makes it possible for the pipeline to populate these fields.
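As a rough illustration of the idea above, the three new fields can be thought of as a composite key for a data snapshot. This is only a sketch: the field names `dataset`, `provider`, `resource`, and `version` come from this PR, while the `SourceItem` class, the `provenance_key` helper, and all sample values are hypothetical and are not part of the schema or of any Overture tooling.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourceItem:
    """Hypothetical in-memory view of one entry in `sources`."""
    dataset: str
    # The new fields are optional and nullable in this first step.
    provider: Optional[str] = None
    resource: Optional[str] = None
    version: Optional[str] = None


def provenance_key(item: SourceItem) -> Optional[Tuple[str, str, str]]:
    """Return (provider, resource, version) if all three are set, else None."""
    if item.provider is None or item.resource is None or item.version is None:
        return None
    return (item.provider, item.resource, item.version)


# Sample values below are invented for illustration only.
legacy = SourceItem(dataset="example-dataset")
traced = SourceItem(
    dataset="example-dataset",
    provider="example-provider",
    resource="example-buildings",
    version="2025-01-01",
)
```

Under these assumptions, an item that only carries `dataset` yields no key, while an item with all three new fields populated yields a tuple that could be compared across releases.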
Major change release plan
While this change in itself is not a breaking change, it is part of a larger plan with major impact. The rough timeline for these changes is:
1. Add `provider` + `resource` + `version` details as optional fields (this PR - June / July)
2. A) Make `provider` + `resource` + `version` required fields and B) mark `dataset` as deprecated and make it an optional field. (BREAKING - September)
3. Remove the `dataset` field. (BREAKING - March 2027 or later)

Messaging around this change is that the current method of providing provenance is not sufficient to ensure traceability. Besides documenting the deprecation of `dataset`, we will want to provide details on how `provider`, `resource`, and `version` work together to identify a data snapshot.

Closes #530
Testing
A new example and a new counterexample have been added: the counterexample checks that each of the provided fields must have a length of at least 1, and the example shows what properly populated fields look like.
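The length check described above can be sketched in plain Python. This is not the project's test harness and does not use the actual schema file; `check_new_source_fields` and the sample items are hypothetical, and the sketch assumes that a missing or null field is accepted (per the "nullable and optional" wording in the Background section) while an empty string is rejected.

```python
from typing import Any, Dict, List

# Field names taken from this PR; everything else here is illustrative.
NEW_FIELDS = ("provider", "resource", "version")


def check_new_source_fields(item: Dict[str, Any]) -> List[str]:
    """Return a list of error messages for the new provenance fields."""
    errors: List[str] = []
    for name in NEW_FIELDS:
        # Assumption: absent or null values are allowed in this first step.
        if name not in item or item[name] is None:
            continue
        value = item[name]
        if not isinstance(value, str):
            errors.append(f"{name}: expected a string")
        elif len(value) < 1:
            errors.append(f"{name}: must have length at least 1")
    return errors
```

For instance, an item with all three fields populated produces no errors, whereas an item with `provider` set to an empty string produces one error for that field.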
The tests were then run with the following results:
Documentation website
Docs preview for this PR.