Skip to content

BUG: walk_toposort spuriously raises 'graph contains cycles' on acyclic int-keyed graphs > 256 nodes #2170

@jaanerik

Description

@jaanerik

Describe the issue:

pytensor/graph/traversal.py line 570:

d = deps_cache[client] = [
    a for a in deps_cache[client] if a is not node
]

a is not node is identity comparison. CPython caches int singletons only in [-5, 256]; for larger values, two int objects with the same value are equal but not identical.

When FusionOptimizer.apply (post-#2150) calls walk_toposort(range(N), deps=direct_ancestors), the integers reaching the Kahn loop come from two distinct sources:

  • range(N) iteration → the node popped from sources.
  • sg_idx_of_out.get(inp) → the a stored in deps_cache[client].

For N above the cache ceiling, those two int objects have non-matching identity, so a is not node is always True — resolved dependencies are never removed, Kahn's gets stuck, and walk_toposort reports a spurious cycle.

Reproducable code example:

from pytensor.graph.traversal import walk_toposort

N = 258  # trip-point; any N >= 258 fails on stock 3.0.3
deps = lambda i: [i + 1] if i < N - 1 else []  # arithmetic produces fresh int objects
tuple(walk_toposort(range(N), deps=deps))
# ValueError: graph contains cycles
# (the chain n-1 -> ... -> 0 is plainly acyclic)

Proposed fix:

-    a for a in deps_cache[client] if a is not node
+    a for a in deps_cache[client] if a != node

Error message:

<details>
File "pytensor/tensor/rewriting/elemwise.py", line 870, in apply
  for inputs, outputs in find_fuseable_subgraphs(fgraph):
File "pytensor/tensor/rewriting/elemwise.py", line 862, in find_fuseable_subgraphs
  tuple(walk_toposort(subgraph_indices, deps=direct_ancestors))
File "pytensor/graph/traversal.py", line 579, in walk_toposort
  raise ValueError("graph contains cycles")
ValueError: graph contains cycles
</details>

PyTensor version information:

Details floatX ({'float64', 'float16', 'float32'}) Doc: Default floating-point precision for python casts.

Note: float16 support is experimental, use at your own risk.
Value: float64

warn_float64 ({'warn', 'raise', 'ignore', 'pdb'})
Doc: Do an action when a tensor variable with float64 dtype is created.
Value: ignore

cast_policy ({'custom', 'numpy+floatX'})
Doc: Rules for implicit type casting
Value: custom

print_global_stats (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8230>>)
Doc: Print some global statistics (time spent) at the end
Value: False

unpickle_function (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061ab080>>)
Doc: Replace unpickled PyTensor functions with None. This is useful to unpickle old graphs that pickled them when it shouldn't
Value: True

<pytensor.configparser.ConfigParam object at 0x1061f8260>
Doc: Default compilation mode
Value: Mode

linker ({'cvm', 'vm', 'numba', 'auto', 'c|py', 'py', 'c', 'c|py_nogc', 'vm_nogc', 'jax', 'cvm_nogc'})
Doc: Default linker used if the pytensor flags mode is Mode or FAST_RUN
Value: auto

allow_gc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105adb0b0>>)
Doc: Do we default to delete intermediate results during PyTensor function calls? Doing so lowers the memory requirement, but asks that we reallocate memory at the next function call. This is implemented for the default linker, but may not work for all linkers.
Value: True

optimizer ({'fast_run', 'fast_compile', 'o4', 'merge', 'o1', 'o2', 'unsafe', 'o3', 'None'})
Doc: Default optimizer. If not None, will use this optimizer with the Mode
Value: o4

optimizer_verbose (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f82c0>>)
Doc: Print information about rewrites that are applied during a graph transformation.
Value: False

optimizer_verbose_ignore (<class 'str'>)
Doc: Do not print information for rewrites with these names when optimizer_verbose is True. Separate names with ','
Value:

compiler_verbose (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1060e4f20>>)
Doc: Print information about compilation steps.
Value: False

on_opt_error ({'warn', 'raise', 'ignore', 'pdb'})
Doc: What to do when an optimization crashes: warn and skip it, raise the exception, or fall into the pdb debugger.
Value: warn

on_unused_input ({'warn', 'raise', 'ignore'})
Doc: What to do if a variable in the 'inputs' list of pytensor.function() is not used in the graph.
Value: raise

cxx (<class 'str'>)
Doc: The C++ compiler to use. Currently only g++ is supported, but supporting additional compilers should not be too difficult. If it is empty, no C++ code is compiled.
Value: /usr/bin/clang++

gcc_version_str (<class 'str'>)
Doc:
Value: 17.0.0

gcc__cxxflags (<class 'str'>)
Doc: Extra compiler flags for gcc
Value:

nocleanup (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8350>>)
Doc: Suppress the deletion of code files that did not compile cleanly
Value: False

cmodule__warn_no_version (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8380>>)
Doc: If True, will print a warning when compiling one or more Op with C code that can't be cached because there is no c_code_cache_version() function associated to at least one of those Ops.
Value: False

cmodule__remove_gxx_opt (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f83b0>>)
Doc: If True, will remove the -O* parameter passed to g++.This is useful to debug in gdb modules compiled by PyTensor.The parameter -g is passed by default to g++
Value: False

cmodule__compilation_warning (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f83e0>>)
Doc: If True, will print compilation warnings.
Value: False

cmodule__preload_cache (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8410>>)
Doc: If set to True, will preload the C module cache at import time
Value: False

cmodule__age_thresh_use (<class 'int'>)
Doc: In seconds. The time after which PyTensor won't reuse a compile c module.
Value: 2073600

cmodule__debug (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x10581d670>>)
Doc: If True, define a DEBUG macro (if not exists) for any compiled C code.
Value: False

compile__wait (<class 'int'>)
Doc: Time to wait before retrying to acquire the compile lock.
Value: 5

compile__timeout (<class 'int'>)
Doc: In seconds, time that a process will wait before deciding to
override an existing lock. An override only happens when the existing
lock is held by the same owner and has not been 'refreshed' by this
owner for more than this period. Refreshes are done every half timeout
period for running processes.
Value: 120

tensor__cmp_sloppy (<class 'int'>)
Doc: Relax pytensor.tensor.math._allclose (0) not at all, (1) a bit, (2) more
Value: 0

lib__amdlibm (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f84d0>>)
Doc: Use amd's amdlibm numerical library
Value: False

tensor__insert_inplace_optimizer_validate_nb (<class 'int'>)
Doc: -1: auto, if graph have less then 500 nodes 1, else 10
Value: -1

traceback__limit (<class 'int'>)
Doc: The number of stack to trace. -1 mean all.
Value: 8

traceback__compile_limit (<class 'int'>)
Doc: The number of stack to trace to keep during compilation. -1 mean all. If greater then 0, will also make us save PyTensor internal stack trace.
Value: 0

warn__ignore_bug_before ({'1.0.1', '0.10', '1.0.5', '1.0', '1.0.2', '0.9', '1.0.3', '0.5', '0.4', '0.3', '0.8', '0.7', '0.8.2', '0.4.1', '1.0.4', '0.6', 'None', '0.8.1', 'all'})
Doc: If 'None', we warn about all PyTensor bugs found by default. If 'all', we don't warn about PyTensor bugs found by default. If a version, we print only the warnings relative to PyTensor bugs found after that version. Warning for specific bugs can be configured with specific [warn] flags.
Value: 0.9

exception_verbosity ({'high', 'low', 'medium'})
Doc: Verbosity of exceptions generated by PyTensor functions.
Value: low

check_input (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f85c0>>)
Doc: Specify if types should check their input in their C code. It can be used to speed up compilation, reduce overhead (particularly for scalars) and reduce the number of generated C files.
Value: True

NanGuardMode__nan_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f85f0>>)
Doc: Default value for nan_is_error
Value: True

NanGuardMode__inf_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8620>>)
Doc: Default value for inf_is_error
Value: True

NanGuardMode__big_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8650>>)
Doc: Default value for big_is_error
Value: True

NanGuardMode__action ({'warn', 'raise', 'pdb'})
Doc: What NanGuardMode does when it finds a problem
Value: raise

DebugMode__patience (<class 'int'>)
Doc: Optimize graph this many times to detect inconsistency
Value: 10

DebugMode__check_c (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f86b0>>)
Doc: Run C implementations where possible
Value: True

DebugMode__check_py (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f86e0>>)
Doc: Run Python implementations where possible
Value: True

DebugMode__check_finite (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8710>>)
Doc: True -> complain about NaN/Inf results
Value: True

DebugMode__check_strides (<class 'int'>)
Doc: Check that Python- and C-produced ndarrays have same strides. On difference: (0) - ignore, (1) warn, or (2) raise error
Value: 0

DebugMode__warn_input_not_reused (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8770>>)
Doc: Generate a warning when destroy_map or view_map says that an op works inplace, but the op did not reuse the input for its output.
Value: True

DebugMode__check_preallocated_output (<class 'str'>)
Doc: Test thunks with pre-allocated memory as output storage. This is a list of strings separated by ":". Valid values are: "initial" (initial storage in storage map, happens with Scan),"previous" (previously-returned memory), "c_contiguous", "f_contiguous", "strided" (positive and negative strides), "wrong_size" (larger and smaller dimensions), and "ALL" (all of the above).
Value:

DebugMode__check_preallocated_output_ndim (<class 'int'>)
Doc: When testing with "strided" preallocated output memory, test all combinations of strides over that number of (inner-most) dimensions. You may want to reduce that number to reduce memory or time usage, but it is advised to keep a minimum of 2.
Value: 4

profiling__time_thunks (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8800>>)
Doc: Time individual thunks when profiling
Value: True

profiling__n_apply (<class 'int'>)
Doc: Number of Apply instances to print by default
Value: 20

profiling__n_ops (<class 'int'>)
Doc: Number of Ops to print by default
Value: 20

profiling__output_line_width (<class 'int'>)
Doc: Max line width for the profiling output
Value: 512

profiling__min_memory_size (<class 'int'>)
Doc: For the memory profile, do not print Apply nodes if the size
of their outputs (in bytes) is lower than this threshold
Value: 1024

profiling__min_peak_memory (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f88f0>>)
Doc: The min peak memory usage of the order
Value: False

profiling__destination (<class 'str'>)
Doc: File destination of the profiling output
Value: stderr

profiling__debugprint (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8950>>)
Doc: Do a debugprint of the profiled functions
Value: False

profiling__ignore_first_call (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8980>>)
Doc: Do we ignore the first call of an PyTensor function.
Value: False

on_shape_error ({'warn', 'raise'})
Doc: warn: print a warning and use the default value. raise: raise an error
Value: warn

openmp (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f89e0>>)
Doc: Allow (or not) parallel computation on the CPU with OpenMP. This is the default value used when creating an Op that supports OpenMP parallelization. It is preferable to define it via the PyTensor configuration file ~/.pytensorrc or with the environment variable PYTENSOR_FLAGS. Parallelization is only done for some operations that implement it, and even for operations that implement parallelism, each operation is free to respect this flag or not. You can control the number of threads used with the environment variable OMP_NUM_THREADS. If it is set to 1, we disable openmp in PyTensor by default.
Value: False

openmp_elemwise_minsize (<class 'int'>)
Doc: If OpenMP is enabled, this is the minimum size of vectors for which the openmp parallelization is enabled in element wise ops.
Value: 200000

optimizer_excluding (<class 'str'>)
Doc: When using the default mode, we will remove optimizer with these tags. Separate tags with ':'.
Value:

optimizer_including (<class 'str'>)
Doc: When using the default mode, we will add optimizer with these tags. Separate tags with ':'.
Value:

optimizer_requiring (<class 'str'>)
Doc: When using the default mode, we will require optimizer with these tags. Separate tags with ':'.
Value:

optdb__position_cutoff (<class 'float'>)
Doc: Where to stop earlier during optimization. It represent the position of the optimizer where to stop.
Value: inf

optdb__max_use_ratio (<class 'float'>)
Doc: A ratio that prevent infinite loop in EquilibriumGraphRewriter.
Value: 8.0

cycle_detection ({'regular', 'fast'})
Doc: If cycle_detection is set to regular, most inplaces are allowed,but it is slower. If cycle_detection is set to faster, less inplacesare allowed, but it makes the compilation faster.The interaction of which one give the lower peak memory usage iscomplicated and not predictable, so if you are close to the peakmemory usage, triyng both could give you a small gain.
Value: regular

check_stack_trace ({'log', 'raise', 'off', 'warn'})
Doc: A flag for checking the stack trace during the optimization process. default (off): does not check the stack trace of any optimization log: inserts a dummy stack trace that identifies the optimizationthat inserted the variable that had an empty stack trace.warn: prints a warning if a stack trace is missing and also a dummystack trace is inserted that indicates which optimization insertedthe variable that had an empty stack trace.raise: raises an exception if a stack trace is missing
Value: off

profile (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8b60>>)
Doc: If VM should collect profile information
Value: False

profile_optimizer (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8b90>>)
Doc: If VM should collect optimizer profile information
Value: False

profile_memory (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8bc0>>)
Doc: If VM should collect memory profile information and print it
Value: False

<pytensor.configparser.ConfigParam object at 0x105adb7d0>
Doc: Useful only for the VM Linkers. When lazy is None, auto detect if lazy evaluation is needed and use the appropriate version. If the C loop isn't being used and lazy is True, use the Stack VM; otherwise, use the Loop VM.
Value: None

numba__fastmath (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8bf0>>)
Doc: If True, use Numba's fastmath mode.
Value: True

numba__cache (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1061f8c20>>)
Doc: If True, use Numba's file based caching.
Value: True

compiledir_format (<class 'str'>)
Doc: Format string for platform-dependent compiled module subdirectory
(relative to base_compiledir). Available keys: gxx_version, hostname,
numpy_version, platform, processor, pytensor_version, python_bitwidth,
python_int_bitwidth, python_version, short_platform.
Value: compiledir_%(short_platform)s-%(processor)s-%(python_version)s-%(python_bitwidth)s

<pytensor.configparser.ConfigParam object at 0x1061f8d40>
Doc: platform-independent root directory for compiled modules
Value: /Users/erik/.pytensor

<pytensor.configparser.ConfigParam object at 0x10580db20>
Doc: platform-dependent cache directory for compiled modules
Value: /Users/erik/.pytensor/compiledir_macOS-15.6.1-arm64-arm-64bit-arm-3.12.13-64

blas__ldflags (<class 'str'>)
Doc: lib[s] to include for [Fortran] level-3 blas implementation
Value: -framework Accelerate -Wl,-rpath,/opt/anaconda3/envs/salk-pymc6/lib

blas__check_openmp (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x106758dd0>>)
Doc: Check for openmp library conflict.
WARNING: Setting this to False leaves you open to wrong results in blas-related operations.
Value: True

scan__allow_gc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x10743f8f0>>)
Doc: Allow/disallow gc inside of Scan (default: False)
Value: False

scan__allow_output_prealloc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x10743f920>>)
Doc: Allow/disallow memory preallocation for outputs inside of scan (default: True)
Value: True

Context for the issue:

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions