feat: cythonise all sources#176
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #176 +/- ##
==========================================
- Coverage 95.43% 95.17% -0.26%
==========================================
Files 7 7
Lines 2126 2157 +31
Branches 458 461 +3
==========================================
+ Hits 2029 2053 +24
- Misses 54 60 +6
- Partials 43 44 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Thanks for opening this. I am not sure when I will be able to follow up. Among the failures I noticed a comparison of instruction name which could be optimized as an integer comparison. |
86db41f to
217b80c
Compare
|
I tried rebasing but now all tests are failing... |
|
@MatthieuDartiailh sorry for taking this long, this should now be good for review 🤞 |
|
Nice ! Do you have an estimate of the speed up ? |
I have added some benchmarks. CI might be too noisy to get reliable numbers, so probably best running locally.
I have added some output to the |
|
I will merge after getting a chance to run benchmarks locally. |
Add Cython .pxd declaration files and @cython.cclass decorators so that BaseInstr, Instr, ConcreteInstr and InstrLocation are compiled as C extension types (cdef class) instead of regular Python classes. Without typed declarations, every attribute access on these objects went through _PyObject_GenericGetAttrWithDict (Python's generic slot/dict lookup), which dominated the native CPU profile at ~19% combined. With cdef public attributes declared in .pxd files, Cython generates direct C struct field access, eliminating the dict lookup chain for the declared fields. Changes: - src/bytecode/instr.pxd: declares InstrLocation, BaseInstr, Instr as cdef classes with public C-level attributes - src/bytecode/concrete.pxd: declares ConcreteInstr(BaseInstr) with _extended_args and _size as cdef public fields - @cython.cclass added to InstrLocation, BaseInstr, Instr, ConcreteInstr in their .py files (no-op when Cython not installed) - BaseInstr drops Generic[A] base (incompatible with cdef class); adds __class_getitem__ so BaseInstr[int] syntax still works for annotations - object.__new__ replaced with cls.__new__ throughout fast-path constructors (copy, _from_trusted, _from_opcode, _from_tuple) - InstrLocation.__init__ and _from_tuple branch on cython.compiled to use direct assignment in compiled mode vs object.__setattr__ in pure Python (where @DataClass(frozen=True) is still in effect) - .pxd files are included in Cython wheel builds only (package_data) - setup.py: cython added as unconditional setup_requires so import cython is always available; annotation_typing left False to avoid treating function parameter annotations as C types - pyproject.toml: mypy ignores errors in the affected modules since dropping Generic[A] from BaseInstr cascades type errors on this branch Native CPU profile (~4.2 kHz sampling, ~6k samples, Python 3.14.4): | Hotspot | Before | After | |---|---|---| | `_PyObject_GenericGetAttrWithDict` own | 5.47% | 4.28% | | `_PyObject_GenericGetAttrWithDict` total | 19.74% | 13.20% | | `PyMember_GetOne` (slot access) | 1.77% | eliminated | Throughput analysis: | Build | r/s range | median | |---|---|---| | Pure Python 3.14 | 125–130 | ~129 | | Cythonized 3.14 (before) | 130–134 | ~133 | | Cythonized 3.14 (after) | 153–163 | ~161 | The Cython speedup over pure Python went from ~3% to ~25%.
In Cython cdef class, a property without a deleter raises NotImplementedError instead of Python's AttributeError. Add an explicit deleter to BaseInstr.arg that raises AttributeError to keep consistent behaviour across pure Python and Cython builds.
|
@MatthieuDartiailh i think this is as good as it gets for now. To push things a bit further we would have to refactor the code to avoid multiple inheritance, since that prevents us from turning more classes into more efficient c structs with Cython |
| path: dist | ||
| merge-multiple: true | ||
|
|
||
| - uses: pypa/gh-action-pypi-publish@release/v1 |
There was a problem hiding this comment.
This will fail if cythonized and pure python wheels have the same name.
There was a problem hiding this comment.
I believe they can't possibly have the same name because cythonised wheels will have at the very least a target CPython version tag, whereas the pure Python ones will have a generic tag.
MatthieuDartiailh
left a comment
There was a problem hiding this comment.
We are getting close. I will have to check manually what wheel pip will choose when confronted with both universal and cython compiled wheels.
|
|
||
| _arg: A | ||
|
|
||
| def _set(self, name: str, arg: A) -> None: |
There was a problem hiding this comment.
This typing looks wrong now.
|
|
||
| # Include .pxd declaration files only in Cython builds so they are available | ||
| # to downstream Cython users who want to cimport from bytecode. | ||
| _package_data = {} if _pure_python else {"bytecode": ["*.pxd"]} |
There was a problem hiding this comment.
I have not dealt with that in a long time but since you specify files manually we may need to add .pyi now.
|
|
||
| setup( | ||
| name="bytecode", | ||
| setup_requires=["setuptools_scm[toml]>=4", "cython"] + ([] if _pure_python else ["cmake>=3.24.2,<3.28"]), |
There was a problem hiding this comment.
Why is cmake required ?
Summary
Compile all bytecode sources as Cython extension modules to improve runtime performance. With plain Cython compilation (no type annotations), the speedup over pure Python was only ~3% (with 3.14, much higher with older versions), because every attribute access on the hot
BaseInstr/InstrLocationobjects still went through Python's generic attribute lookup (_PyObject_GenericGetAttrWithDict).This PR adds
.pxddeclaration files that tell Cython these classes arecdef classextension types with C-level struct fields, eliminating the attribute lookup overhead for the hot paths.Key changes
src/bytecode/instr.pxd— declaresInstrLocation,BaseInstr,Instrascdef classwithpublicC-level attributessrc/bytecode/concrete.pxd— declaresConcreteInstr(BaseInstr)with_extended_argsand_sizeascdef publicfields@cython.cclassadded toInstrLocation,BaseInstr,Instr,ConcreteInstrin their.pyfiles (no-op when Cython is not installed, so pure Python mode is unaffected)BaseInstrdropsGeneric[A]base (incompatible withcdef class); adds__class_getitem__soBaseInstr[int]syntax still works in annotationsobject.__new__replaced withcls.__new__throughout fast-path constructors (copy,_from_trusted,_from_opcode,_from_tuple)InstrLocation.__init__and_from_tuplebranch oncython.compiledto use direct assignment in compiled mode vsobject.__setattr__in pure Python (where@dataclass(frozen=True)is still in effect).pxdfiles are included in Cython wheel builds only (package_data)setup.py:cythonadded as unconditionalsetup_requires;annotation_typingleftFalseto avoid treating function parameter annotations as C typespyproject.toml:mypyignores type errors in the affected modules (droppingGeneric[A]fromBaseInstrcascades type errors on this branch that don't affect runtime correctness)Native CPU profile comparison
PyMember_GetOne disappears entirely — slot-based member access is replaced by direct C struct field access. The remaining _PyObject_GenericGetAttrWithDict total comes from other classes not yet declared as cdef class (e.g. ConcreteBytecode, Bytecode, ControlFlowGraph).
Throughput (Bytecode.from_code().to_code() round-trip, Python 3.14.4)
The Cython speedup over pure Python is ~25%.