Fix expand_unknowns: only expand wildcards if name is in source scope#135
Conversation
Previously, wildcards were expanded to any defined node with the same name, even if the name wasn't actually used in the source. This caused false edges between unrelated functions. Now we check that the name appears in the source's defs before expanding. Includes unit tests for defines_edges and a regression test for uses_edges. Co-authored-by: DeepSeek <noreply@deepseek.com>
00bb400 to
f639ae6
Compare
|
Thanks for digging into this and for the thorough write-up — the over-expansion is a real bug, and the scope-membership guard is the right idea. We'd like to merge it once we tighten a couple of things, mostly around the tests. I verified all of the below locally against your branch. 1. The root cause is slightly different from the description (worth a note for future archaeology). import othermod # imported, but not in the analyzed source set
def func_a():
othermod.cache() # → falls back to *.cache, then over-expands
def cache(): pass # ← master wrongly draws func_a → cacheYour fix handles this correctly (the wildcard is left unexpanded and thus suppressed), and it preserves the legitimate cases — bare-name shadowing, and the two-module case where 2. The regression test currently passes on Fixture — import othermod # imported, but NOT part of the analyzed source set
def func_a():
othermod.cache() # call on the imported module; must NOT bind to local cache()
def cache():
pass
def caller():
helper() # genuine intra-module call (positive control)
def helper():
passTest — append to # --- Issue #134: wildcard over-expansion creates false edges to same-named, same-module defs ---
ISSUE134_DIR = os.path.join(TESTS_DIR, "test_code/issue134")
def test_issue134_attribute_call_does_not_bind_to_same_module_def():
"""Issue #134: ``imported_module.cache()`` must not bind to a local ``cache()``.
``othermod`` is imported but not part of the analyzed source set, so
``othermod.cache()`` falls back to the wildcard ``*.cache``. ``expand_unknowns``
used to expand that wildcard to every same-named def whose module is imported,
and since intra-module is always "imported", it bound to the local ``cache()`` —
a false edge. The leaf ``cache`` is an attribute, never a bare name in
``func_a``'s scope, so the wildcard must be left unexpanded.
"""
filenames = [os.path.join(ISSUE134_DIR, "mod_a.py")]
v = CallGraphVisitor(filenames, logger=logging.getLogger())
func_a_uses = get_in_dict(v.uses_edges, "mod_a.func_a")
targets = {n.get_name() for n in func_a_uses}
assert "mod_a.cache" not in targets, "attribute call wrongly bound to same-module def"
# Positive control: a genuine intra-module call is still detected.
caller_uses = get_in_dict(v.uses_edges, "mod_a.caller")
get_node(caller_uses, "mod_a.helper")(One note: this covers the 3. The "why" belongs in a comment at the guard site. The mechanism is subtle enough to be worth capturing — ideally factored so the rationale lives in one place rather than duplicated across both branches, e.g.: def _name_referenced_in_scope(visitor, from_node, name):
"""Whether `name` occurs as a bare name in from_node's own scope.
symtable records bare-name references — including globals/frees such as an
imported `foo` used as `foo()` — but never attribute leaves: `othermod.cache()`
never puts `cache` in the scope's identifiers. So this distinguishes a genuine
name reference (may legitimately resolve to a module-level `name`) from an
attribute access on something else (must not). No scope entry → default True,
keeping the previous expand behaviour.
"""
src_scope = visitor.scopes.get(from_node.get_name())
return src_scope is None or name in src_scope.defsthen 4. Minor cleanups: the new files are missing trailing newlines ( Thanks again — this is a good catch, and with the tightened test it'll be a clean addition. — Claude (Opus 4.8) |
|
Thanks on my part, too — for both the bug report and the PR. — Technologicat |
Observed while reviewing PR #135 / issue #134: expand_unknowns adds the resolved edges but leaves the originating *.name wildcard edge in place, merely flagging it defined=False and relying on every consumer to honour that flag, rather than rewriting the edge. Logged as future cleanup. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Thanks for the review and for maintaining pyan. I found this while using pyan on one of my own projects. A few unexpected edges in the generated graphs led me to trace the analysis pipeline, and eventually to the wildcard expansion behavior. After reducing it to a minimal reproducer, the bug became fairly clear. I've pushed the requested updates to the PR. Thanks again for the feedback. |
- CHANGELOG: add 2.7.0 Fixed entry for the wildcard over-expansion fix merged in #135. The bug shipped in v2.6.0, so it is user-facing. - tests: pass root=ISSUE134_DIR to the #134 regression test, matching the suite-wide convention that keeps infer_root quiet (the merged test omitted it and re-introduced one warning). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks! Merged! — Technologicat |
Summary
Fixes a bug in
expand_unknownswhere wildcards (*.name) were expanded to every defined node with the same name, even when the name wasn't actually used in the source scope. This caused false uses edges between unrelated functions in the same module.Reproducible Example
Consider a file
test_module.py:Before the fix:
pyan3 test_module.py --dotproduced an edge fromfunc_atocache(false positive).After the fix: No edge between them.
Root Cause
In
postprocessor.py,expand_unknownschecks:_has_import_toonly verifies that the source module imports the target module. It does not check that the actual name (e.g.cache) is present in the source’s lexical scope (visitor.scopes[n.get_name()].defs). Because bothfunc_aandcacheare defined in the same module, the wildcard was expanded incorrectly.The Fix
Adds a guard that expands only if the name appears in the source’s
defs:The same logic is applied to the
defines_edgesbranch.Tests Added
defines_edgesbranch (test_analyzer.py)uses_edgesbranch (test_regressions.py)tests/test_code/issue_wildcard/test_module.pyAll existing tests pass with the patch applied.
Impact
Closes #134