Skip to content

Fix expand_unknowns: only expand wildcards if name is in source scope#135

Merged
Technologicat merged 2 commits into
Technologicat:masterfrom
Morketh:fix-wildcard-expansion
Jun 28, 2026
Merged

Fix expand_unknowns: only expand wildcards if name is in source scope#135
Technologicat merged 2 commits into
Technologicat:masterfrom
Morketh:fix-wildcard-expansion

Conversation

@Morketh

@Morketh Morketh commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes a bug in expand_unknowns where wildcards (*.name) were expanded to every defined node with the same name, even when the name wasn't actually used in the source scope. This caused false uses edges between unrelated functions in the same module.

Reproducible Example

Consider a file test_module.py:

def func_a():
    app.state.cache   # attribute access -> wildcard *.cache

def cache():
    pass

Before the fix: pyan3 test_module.py --dot produced an edge from func_a to cache (false positive).
After the fix: No edge between them.

Root Cause

In postprocessor.py, expand_unknowns checks:

if n3.namespace is not None and n3.defined and _has_import_to(visitor, n, n3.namespace):
    new_uses_edges.append((n, n3))

_has_import_to only verifies that the source module imports the target module. It does not check that the actual name (e.g. cache) is present in the source’s lexical scope (visitor.scopes[n.get_name()].defs). Because both func_a and cache are defined in the same module, the wildcard was expanded incorrectly.

The Fix

Adds a guard that expands only if the name appears in the source’s defs:

if n3.namespace is not None and n3.defined:
    src_scope = visitor.scopes.get(n.get_name())
    if src_scope and n2.name not in src_scope.defs:
        continue
    if _has_import_to(visitor, n, n3.namespace):
        new_uses_edges.append((n, n3))

The same logic is applied to the defines_edges branch.

Tests Added

  • Unit tests for the defines_edges branch (test_analyzer.py)
  • Regression test for the uses_edges branch (test_regressions.py)
  • Fixture file: tests/test_code/issue_wildcard/test_module.py

All existing tests pass with the patch applied.

Impact

  • Eliminates false positives in call graphs.
  • Improves accuracy of dependency visualisations.
  • No regression in legitimate edge detection.

Closes #134

Previously, wildcards were expanded to any defined node with the same name,
even if the name wasn't actually used in the source. This caused false edges
between unrelated functions.

Now we check that the name appears in the source's defs before expanding.

Includes unit tests for defines_edges and a regression test for uses_edges.

Co-authored-by: DeepSeek <noreply@deepseek.com>
@Morketh Morketh force-pushed the fix-wildcard-expansion branch from 00bb400 to f639ae6 Compare June 18, 2026 19:42
@Technologicat

Copy link
Copy Markdown
Owner

Thanks for digging into this and for the thorough write-up — the over-expansion is a real bug, and the scope-membership guard is the right idea. We'd like to merge it once we tighten a couple of things, mostly around the tests. I verified all of the below locally against your branch.

1. The root cause is slightly different from the description (worth a note for future archaeology).
The minimal example doesn't actually reproduce: an attribute chain on an unknown root creates a wildcard for the root, not the leaf. app.state.cache yields *.app, not *.cache, so no false edge to cache() is produced on current master. The condition that does trigger it is an attribute call on an imported-but-not-analyzed module whose leaf name collides with a same-module def:

import othermod          # imported, but not in the analyzed source set
def func_a():
    othermod.cache()     # → falls back to *.cache, then over-expands
def cache(): pass        # ← master wrongly draws func_a → cache

Your fix handles this correctly (the wildcard is left unexpanded and thus suppressed), and it preserves the legitimate cases — bare-name shadowing, and the two-module case where othermod is analyzed.

2. The regression test currently passes on master, so it doesn't guard the fix.
Because the issue_wildcard fixture uses the non-reproducing form, test_wildcard_expansion_does_not_create_false_edges is green with or without the patch. Could you swap it for the version below? It fails on master and passes with your fix, and it runs through the real analysis pipeline (no hand-built Node/Scope state), which also lets it replace the two white-box test_analyzer.py unit tests — those are functionally covered here and are brittle against internal refactors.

Fixture — tests/test_code/issue134/mod_a.py:

import othermod  # imported, but NOT part of the analyzed source set


def func_a():
    othermod.cache()   # call on the imported module; must NOT bind to local cache()


def cache():
    pass


def caller():
    helper()           # genuine intra-module call (positive control)


def helper():
    pass

Test — append to tests/test_regressions.py:

# --- Issue #134: wildcard over-expansion creates false edges to same-named, same-module defs ---


ISSUE134_DIR = os.path.join(TESTS_DIR, "test_code/issue134")


def test_issue134_attribute_call_does_not_bind_to_same_module_def():
    """Issue #134: ``imported_module.cache()`` must not bind to a local ``cache()``.

    ``othermod`` is imported but not part of the analyzed source set, so
    ``othermod.cache()`` falls back to the wildcard ``*.cache``. ``expand_unknowns``
    used to expand that wildcard to every same-named def whose module is imported,
    and since intra-module is always "imported", it bound to the local ``cache()`` —
    a false edge. The leaf ``cache`` is an attribute, never a bare name in
    ``func_a``'s scope, so the wildcard must be left unexpanded.
    """
    filenames = [os.path.join(ISSUE134_DIR, "mod_a.py")]
    v = CallGraphVisitor(filenames, logger=logging.getLogger())

    func_a_uses = get_in_dict(v.uses_edges, "mod_a.func_a")
    targets = {n.get_name() for n in func_a_uses}
    assert "mod_a.cache" not in targets, "attribute call wrongly bound to same-module def"

    # Positive control: a genuine intra-module call is still detected.
    caller_uses = get_in_dict(v.uses_edges, "mod_a.caller")
    get_node(caller_uses, "mod_a.helper")

(One note: this covers the uses_edges branch — the real-world one. The symmetric defines_edges guard is hard to trigger through the pipeline; we're happy to leave it covered-by-inspection.)

3. The "why" belongs in a comment at the guard site. The mechanism is subtle enough to be worth capturing — ideally factored so the rationale lives in one place rather than duplicated across both branches, e.g.:

def _name_referenced_in_scope(visitor, from_node, name):
    """Whether `name` occurs as a bare name in from_node's own scope.

    symtable records bare-name references — including globals/frees such as an
    imported `foo` used as `foo()` — but never attribute leaves: `othermod.cache()`
    never puts `cache` in the scope's identifiers. So this distinguishes a genuine
    name reference (may legitimately resolve to a module-level `name`) from an
    attribute access on something else (must not). No scope entry → default True,
    keeping the previous expand behaviour.
    """
    src_scope = visitor.scopes.get(from_node.get_name())
    return src_scope is None or name in src_scope.defs

then ... and _name_referenced_in_scope(visitor, n, n2.name) and _has_import_to(...) in both branches. (Optional shape — the key ask is just that the rationale is written down somewhere at the call site.)

4. Minor cleanups: the new files are missing trailing newlines (test_module.py, test_analyzer.py, test_regressions.py); the in-function imports in the unit tests go away if those tests are dropped per (2); and issue_wildcard/test_module.pyissue134/mod_a.py reads better and matches the issueNN fixture convention.

Thanks again — this is a good catch, and with the tightened test it'll be a clean addition.

— Claude (Opus 4.8)

@Technologicat

Copy link
Copy Markdown
Owner

Thanks on my part, too — for both the bug report and the PR.

— Technologicat

Technologicat added a commit that referenced this pull request Jun 20, 2026
Observed while reviewing PR #135 / issue #134: expand_unknowns adds the
resolved edges but leaves the originating *.name wildcard edge in place,
merely flagging it defined=False and relying on every consumer to honour
that flag, rather than rewriting the edge. Logged as future cleanup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Morketh

Morketh commented Jun 21, 2026

Copy link
Copy Markdown
Contributor Author

Thanks on my part, too — for both the bug report and the PR.

— Technologicat

Thanks for the review and for maintaining pyan.

I found this while using pyan on one of my own projects. A few unexpected edges in the generated graphs led me to trace the analysis pipeline, and eventually to the wildcard expansion behavior. After reducing it to a minimal reproducer, the bug became fairly clear.

I've pushed the requested updates to the PR. Thanks again for the feedback.

@Technologicat Technologicat merged commit 3da6b23 into Technologicat:master Jun 28, 2026
Technologicat added a commit that referenced this pull request Jun 28, 2026
- CHANGELOG: add 2.7.0 Fixed entry for the wildcard over-expansion fix
  merged in #135. The bug shipped in v2.6.0, so it is user-facing.
- tests: pass root=ISSUE134_DIR to the #134 regression test, matching the
  suite-wide convention that keeps infer_root quiet (the merged test
  omitted it and re-introduced one warning).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Technologicat

Copy link
Copy Markdown
Owner

Thanks! Merged!

— Technologicat

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wildcard expansion creates false uses edges to unrelated functions in the same module

2 participants