Fix name collisions between service-defined shapes and framework types#682
Fix name collisions between service-defined shapes and framework types#682Alan4506 wants to merge 4 commits into
Conversation
| "identity", | ||
| "StaticCredentialsResolver", | ||
| SmithyPythonDependency.SMITHY_AWS_CORE); | ||
|
|
There was a problem hiding this comment.
I noticed REST_JSON_CLIENT_PROTOCOL and STATIC_CREDENTIALS_RESOLVER are defined in both RuntimeTypes.java and AwsRuntimeTypes.java. Was the duplication intentional? Do we need them in both files?
There was a problem hiding this comment.
Good catch! The duplication was unintentional. These two symbols are only used by generators in the core module (RestJsonProtocolGenerator and HttpProtocolTestGenerator), which should not depend on the AWS module. I've removed them from AwsRuntimeTypes.java and kept them only in RuntimeTypes.java.
jonathan343
left a comment
There was a problem hiding this comment.
Thanks @Alan4506!
I left some comments on potential gaps we should look into. Can you look into these and address if appropriate? It would also be ideal to add some related tests to prevent regressions
| */ | ||
| void aliasImport(String namespace, String name, String alias) { | ||
| aliasImportInMap(namespace, name, alias, externalImports); | ||
| aliasImportInMap(namespace, name, alias, localImports); |
There was a problem hiding this comment.
This collision handling also needs to update stdlib imports when it rewrites symbol references.
Example: if a Smithy model defines a structure named Decimal and another shape uses bigDecimal, the generated Python can end up like this:
from decimal import Decimal
class Decimal:
...
@dataclass
class Foo:
value: _DecimalThe body was rewritten to use _Decimal, but the import still exposes Decimal, so importing the module fails with NameError: name '_Decimal' is not defined.
Could we either include stdlibImports in the alias rewrite path or avoid aliasing stdlib symbols here? If you considered this and there is an issue with it, let me know
| if (symbol.getNamespace().isEmpty() || !symbol.getDefinitionFile().isEmpty()) { | ||
| // No namespace means builtin. Has definition file means locally-defined. | ||
| // Neither case needs collision detection. | ||
| return symbol.getName(); |
There was a problem hiding this comment.
This early return seems too broad. A symbol with a definition file may still be imported into a different generated file, so skipping it here can hide real name collisions from the import resolver.
Example: config.py may need both a runtime resolver and a generated model with the same name:
from smithy_aws_core.endpoints import StandardRegionalEndpointsResolver
from .models import StandardRegionalEndpointsResolver
class Config:
def __init__(self):
self.endpoint_resolver = StandardRegionalEndpointsResolver(
endpoint_prefix="..."
)The .models import shadows the runtime resolver, so this calls the generated model class with resolver arguments. The expected output would alias one of them, for example:
from smithy_aws_core.endpoints import StandardRegionalEndpointsResolver as _RegionalResolver
from .models import StandardRegionalEndpointsResolverCould we only skip symbols that are actually defined in the current writer, while still registering generated symbols that are imported from other generated files?
Problem
When generating a Python client for services like Q Business, the codegen produces invalid Python code due to two categories of name collisions:
1. schemas.py — import collision
The Q Business model defines
com.amazonaws.qbusiness#Blob. The codegen generates a localBLOB = Schema(...)inschemas.py. A member elsewhere targetssmithy.api#Blob, causingfrom smithy_core.prelude import BLOBto be added inschemas.py. The local definition shadows the import.2. models.py — import collision
The Q Business model defines
com.amazonaws.qbusiness#Document. The codegen generatesclass Document:inmodels.py. The model also definesdocumenttype shapes (e.g.,ActionPayloadFieldValue), causingfrom smithy_core.documents import Documentto be added inmodels.py. The local class definition shadows the import.3. models.py — union member variant name collision
The Q Business model defines:
The codegen synthesizes variant class name
Principal+capitalize("user")=PrincipalUser, which collides with the target structurePrincipalUserthat is also generated as a separate class in the same file.Additionally, the framework defines
AuthOptionin bothsmithy_core.authandsmithy_core.interfaces.auth. When both are imported into the same file (e.g.,auth.py), one shadows the other. This was previously handled with manual aliases (e.g.,addImport("smithy_core.interfaces.auth", "AuthOption", "AuthOptionProtocol")), along with similar manual aliases forEndpointResolverandHTTPResponse. These worked but were ad-hoc — each case required a hardcoded alias in the generator code. Our change replaces all manual aliases with a unified automatic collision detection system.Description of Changes
Import collisions — deferred collision detection in PythonWriter
Inspired by smithy-java's
DeferredSymbolWriterpattern:Created
RuntimeTypes.javaandAwsRuntimeTypes.javawith pre-definedSymbolconstants for all framework types used across generators. This replaces hardcodedwriter.addImport("smithy_core.X", "Name")+ string literal"Name"patterns with$Tusing RuntimeTypes symbols, making all framework type references visible to the collision detection system.Modified
PythonWriterto track a symbol table (all symbols referenced via$T, keyed by simple name) and a locally-defined names set (class names and top-level variables registered by generators viaaddLocallyDefinedSymbol()). The$Tformatter now returns a placeholder token («namespace.Name») instead of the final name. AttoString()time,resolveCollisions()compares the two sets and:_Name_smithy_core_auth_AuthOptionand_smithy_core_interfaces_auth_AuthOption)Added
aliasImport()toImportDeclarationsto retroactively rewrite an import's alias attoString()time.All generators that define classes or top-level names now call
addLocallyDefinedSymbol().Union member variant collisions — name check in PythonSymbolProvider
Added
allShapeNamesset toPythonSymbolProvider, populated at construction time with the PascalCase names of all shapes that generate classes (structure, union, enum, intEnum).In
memberShape(), after synthesizing the variant name (UnionName + capitalize(MemberName)), checks if it collides with any shape name. If so, usesUnionName_MemberName(e.g.,Principal_User) instead. Smithy shape names cannot contain underscores, so this format is guaranteed collision-free.Same check in
unionShape()for theUnknownvariant (UnionName + "Unknown"→UnionName_Unknownif collision detected).Testing
We successfully generated Q Business code with no ruff errors and no pyright errors. We inspected the generated code:
Import collisions resolved —
models.py:Import collisions resolved —
_private/schemas.py:Import-vs-import collision resolved —
auth.py:Import-vs-import collision resolved —
tests/test_protocol.py:Union member variant collision resolved —
models.py:We also locally tested that all modules import successfully, and integration tests pass.
We also regenerated all three existing clients (Bedrock Runtime, Sagemaker Runtime, Transcribe Streaming). All build successfully and tests pass.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.