Skip to content

feat: cythonise all sources#176

Open
P403n1x87 wants to merge 16 commits into
MatthieuDartiailh:mainfrom
P403n1x87:cythonize
Open

feat: cythonise all sources#176
P403n1x87 wants to merge 16 commits into
MatthieuDartiailh:mainfrom
P403n1x87:cythonize

Conversation

@P403n1x87
Copy link
Copy Markdown
Contributor

@P403n1x87 P403n1x87 commented Nov 21, 2025

Summary

Compile all bytecode sources as Cython extension modules to improve runtime performance. With plain Cython compilation (no type annotations), the speedup over pure Python was only ~3% (with 3.14, much higher with older versions), because every attribute access on the hot BaseInstr/InstrLocation objects still went through Python's generic attribute lookup (_PyObject_GenericGetAttrWithDict).

This PR adds .pxd declaration files that tell Cython these classes are cdef class extension types with C-level struct fields, eliminating the attribute lookup overhead for the hot paths.

Key changes

  • src/bytecode/instr.pxd — declares InstrLocation, BaseInstr, Instr as cdef class with public C-level attributes
  • src/bytecode/concrete.pxd — declares ConcreteInstr(BaseInstr) with _extended_args and _size as cdef public fields
  • @cython.cclass added to InstrLocation, BaseInstr, Instr, ConcreteInstr in their .py files (no-op when Cython is not installed, so pure Python mode is unaffected)
  • BaseInstr drops Generic[A] base (incompatible with cdef class); adds __class_getitem__ so BaseInstr[int] syntax still works in annotations
  • object.__new__ replaced with cls.__new__ throughout fast-path constructors (copy, _from_trusted, _from_opcode, _from_tuple)
  • InstrLocation.__init__ and _from_tuple branch on cython.compiled to use direct assignment in compiled mode vs object.__setattr__ in pure Python (where @dataclass(frozen=True) is still in effect)
  • .pxd files are included in Cython wheel builds only (package_data)
  • setup.py: cython added as unconditional setup_requires; annotation_typing left False to avoid treating function parameter annotations as C types
  • pyproject.toml: mypy ignores type errors in the affected modules (dropping Generic[A] from BaseInstr cascades type errors on this branch that don't affect runtime correctness)

Native CPU profile comparison

Hotspot Before After
_PyObject_GenericGetAttrWithDict own 5.47% 4.28%
_PyObject_GenericGetAttrWithDict total 19.74% 13.20%
PyMember_GetOne (slot access) 1.77% eliminated

PyMember_GetOne disappears entirely — slot-based member access is replaced by direct C struct field access. The remaining _PyObject_GenericGetAttrWithDict total comes from other classes not yet declared as cdef class (e.g. ConcreteBytecode, Bytecode, ControlFlowGraph).

Throughput (Bytecode.from_code().to_code() round-trip, Python 3.14.4)

Build r/s range median
Pure Python 3.14 125-130 ~129
Cythonized 3.14 153-163 ~161

The Cython speedup over pure Python is ~25%.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Nov 21, 2025

Codecov Report

❌ Patch coverage is 83.63636% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.17%. Comparing base (41c679b) to head (127d6eb).

Files with missing lines Patch % Lines
src/bytecode/instr.py 78.57% 7 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #176      +/-   ##
==========================================
- Coverage   95.43%   95.17%   -0.26%     
==========================================
  Files           7        7              
  Lines        2126     2157      +31     
  Branches      458      461       +3     
==========================================
+ Hits         2029     2053      +24     
- Misses         54       60       +6     
- Partials       43       44       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@MatthieuDartiailh
Copy link
Copy Markdown
Owner

Thanks for opening this. I am not sure when I will be able to follow up. Among the failures I noticed a comparison of instruction name which could be optimized as an integer comparison.

@MatthieuDartiailh
Copy link
Copy Markdown
Owner

I tried rebasing but now all tests are failing...

@P403n1x87 P403n1x87 changed the title feat!: cythonise all sources feat: cythonise all sources May 7, 2026
@P403n1x87 P403n1x87 marked this pull request as ready for review May 7, 2026 15:10
@P403n1x87
Copy link
Copy Markdown
Contributor Author

@MatthieuDartiailh sorry for taking this long, this should now be good for review 🤞

@MatthieuDartiailh
Copy link
Copy Markdown
Owner

Nice ! Do you have an estimate of the speed up ?
Also is there any way to distinguish between the cython compiled wheel and the pure python wheels when installing ?

@P403n1x87
Copy link
Copy Markdown
Contributor Author

Do you have an estimate of the speed up ?

I have added some benchmarks. CI might be too noisy to get reliable numbers, so probably best running locally.

Also is there any way to distinguish between the cython compiled wheel and the pure python wheels when installing ?

I have added some output to the setup.py, but this won't be visible in general with pip. For tests, I have added a conftest.py to report the build type as part of the pytest headers. There are also test runs for both the cythonised and pure-Python versions.

@MatthieuDartiailh
Copy link
Copy Markdown
Owner

I will merge after getting a chance to run benchmarks locally.

P403n1x87 and others added 3 commits May 8, 2026 16:19
Add Cython .pxd declaration files and @cython.cclass decorators so that
BaseInstr, Instr, ConcreteInstr and InstrLocation are compiled as C
extension types (cdef class) instead of regular Python classes.

Without typed declarations, every attribute access on these objects went
through _PyObject_GenericGetAttrWithDict (Python's generic slot/dict
lookup), which dominated the native CPU profile at ~19% combined.
With cdef public attributes declared in .pxd files, Cython generates
direct C struct field access, eliminating the dict lookup chain for
the declared fields.

Changes:
- src/bytecode/instr.pxd: declares InstrLocation, BaseInstr, Instr as
  cdef classes with public C-level attributes
- src/bytecode/concrete.pxd: declares ConcreteInstr(BaseInstr) with
  _extended_args and _size as cdef public fields
- @cython.cclass added to InstrLocation, BaseInstr, Instr, ConcreteInstr
  in their .py files (no-op when Cython not installed)
- BaseInstr drops Generic[A] base (incompatible with cdef class); adds
  __class_getitem__ so BaseInstr[int] syntax still works for annotations
- object.__new__ replaced with cls.__new__ throughout fast-path
  constructors (copy, _from_trusted, _from_opcode, _from_tuple)
- InstrLocation.__init__ and _from_tuple branch on cython.compiled to
  use direct assignment in compiled mode vs object.__setattr__ in pure
  Python (where @DataClass(frozen=True) is still in effect)
- .pxd files are included in Cython wheel builds only (package_data)
- setup.py: cython added as unconditional setup_requires so import cython
  is always available; annotation_typing left False to avoid treating
  function parameter annotations as C types
- pyproject.toml: mypy ignores errors in the affected modules since
  dropping Generic[A] from BaseInstr cascades type errors on this branch

Native CPU profile (~4.2 kHz sampling, ~6k samples, Python 3.14.4):

| Hotspot | Before | After |
|---|---|---|
| `_PyObject_GenericGetAttrWithDict` own | 5.47% | 4.28% |
| `_PyObject_GenericGetAttrWithDict` total | 19.74% | 13.20% |
| `PyMember_GetOne` (slot access) | 1.77% | eliminated |

Throughput analysis:

| Build | r/s range | median |
|---|---|---|
| Pure Python 3.14 | 125–130 | ~129 |
| Cythonized 3.14 (before) | 130–134 | ~133 |
| Cythonized 3.14 (after) | 153–163 | ~161 |

The Cython speedup over pure Python went from ~3% to ~25%.
In Cython cdef class, a property without a deleter raises
NotImplementedError instead of Python's AttributeError. Add an explicit
deleter to BaseInstr.arg that raises AttributeError to keep consistent
behaviour across pure Python and Cython builds.
@P403n1x87
Copy link
Copy Markdown
Contributor Author

@MatthieuDartiailh i think this is as good as it gets for now. To push things a bit further we would have to refactor the code to avoid multiple inheritance, since that prevents us from turning more classes into more efficient c structs with Cython

Comment thread src/bytecode/instr.py Outdated
path: dist
merge-multiple: true

- uses: pypa/gh-action-pypi-publish@release/v1
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail if cythonized and pure python wheels have the same name.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe they can't possibly have the same name because cythonised wheels will have at the very least a target CPython version tag, whereas the pure Python ones will have a generic tag.

Comment thread src/bytecode/instr.pxd Outdated
Comment thread src/bytecode/instr.py
Comment thread src/bytecode/instr.py
Copy link
Copy Markdown
Owner

@MatthieuDartiailh MatthieuDartiailh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are getting close. I will have to check manually what wheel pip will choose when confronted with both universal and cython compiled wheels.

Comment thread src/bytecode/instr.pxd Outdated
Comment thread src/bytecode/instr.py

_arg: A

def _set(self, name: str, arg: A) -> None:
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This typing looks wrong now.

Comment thread src/bytecode/instr.pyi
Copy link
Copy Markdown
Owner

@MatthieuDartiailh MatthieuDartiailh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some last questions but we are getting there.

Comment thread setup.py

# Include .pxd declaration files only in Cython builds so they are available
# to downstream Cython users who want to cimport from bytecode.
_package_data = {} if _pure_python else {"bytecode": ["*.pxd"]}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not dealt with that in a long time but since you specify files manually we may need to add .pyi now.

Comment thread setup.py

setup(
name="bytecode",
setup_requires=["setuptools_scm[toml]>=4", "cython"] + ([] if _pure_python else ["cmake>=3.24.2,<3.28"]),
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is cmake required ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants