PERF: Optimize fetch API performance #558

Draft
jahnvi480 wants to merge 2 commits into main from jahnvi/perf-fetch-optimization

Contributor

jahnvi480 commented May 7, 2026

Work Item / Issue Reference

AB#44921

GitHub Issue: #554


Summary

This pull request introduces significant performance optimizations to the mssql_python driver's row fetching and construction logic, particularly for the common case where no output converters or UUID processing are required. The changes focus on reducing Python overhead by caching encoding settings, using fast paths for row construction, and moving bulk row creation to C++ for improved speed. Additionally, some code refactoring and simplification have been applied.

Key improvements and optimizations:

Row Fetching and Construction Performance:

  • Added a C++ implementation (construct_rows) for building Row objects in bulk, bypassing Python list comprehensions and per-row initialization overhead, and exposed it to Python via the ddbc_bindings module. This is used as a fast path in fetchall and fetchmany when no converters or UUID processing are needed.
  • Introduced a static method Row._fast_create and added __slots__ to Row for memory and speed improvements, enabling direct, zero-copy assignment of row data in the fast path.
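The fast path described above can be sketched in pure Python. This is only an illustration: the slot names, the `_fast_create` signature, and the `construct_rows_py` helper are assumptions, and the actual driver builds rows in C++ through `construct_rows`, which may differ in detail.

```python
class Row:
    # __slots__ removes the per-instance __dict__, so each Row is smaller
    # and attribute access goes through fixed slot descriptors.
    __slots__ = ("_values", "_column_map", "_cursor")

    def __init__(self, values, column_map, cursor=None, converters=None):
        # Slow path: optionally run a converter per column, which copies the list.
        if converters:
            values = [conv(v) if conv else v for v, conv in zip(values, converters)]
        self._values = values
        self._column_map = column_map
        self._cursor = cursor

    @staticmethod
    def _fast_create(values, column_map, cursor=None):
        # Fast path: skip __init__ entirely and store the caller's list
        # by reference (no copy, no converter checks).
        row = Row.__new__(Row)
        row._values = values
        row._column_map = column_map
        row._cursor = cursor
        return row

    def __getitem__(self, index):
        return self._values[index]

    def __len__(self):
        return len(self._values)


def construct_rows_py(rows_data, column_map, cursor=None):
    # Hypothetical pure-Python analogue of the C++ bulk constructor.
    return [Row._fast_create(values, column_map, cursor) for values in rows_data]
```

Because `_fast_create` keeps a reference to the list it is given, it is only safe when nothing else mutates that list afterwards, which is presumably why the PR restricts it to the case with no converters or UUID processing.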

Encoding and Decoding Optimization:

  • Cached character and wide character encoding strings in the cursor, eliminating repeated method calls and dictionary lookups during row fetching. All fetch methods now use these cached values.
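As a rough sketch of the caching idea, the example below resolves both encodings once in the cursor constructor and reuses them on every fetch. `FakeConnection`, `CachingCursor`, the `getdecoding` stand-in, and the default encodings are all hypothetical names chosen for illustration, not the driver's actual classes.

```python
class FakeConnection:
    """Stand-in connection that counts how often decoding settings are read."""

    def __init__(self):
        self._decoding = {
            "SQL_CHAR": {"encoding": "utf-8"},
            "SQL_WCHAR": {"encoding": "utf-16-le"},
        }
        self.lookups = 0

    def getdecoding(self, sqltype):
        self.lookups += 1
        return self._decoding.get(sqltype, {})


class CachingCursor:
    def __init__(self, connection):
        self.connection = connection
        # Resolve both encodings once, at cursor creation time.
        self._char_encoding = connection.getdecoding("SQL_CHAR").get("encoding", "utf-8")
        self._wchar_encoding = connection.getdecoding("SQL_WCHAR").get("encoding", "utf-16-le")

    def fetch_encodings(self):
        # Every fetch reuses the cached strings: no method calls, no dict lookups.
        return self._char_encoding, self._wchar_encoding
```

One trade-off of this design is that a cursor created before the connection's decoding settings change would keep using the old values, unless the cache is refreshed.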

Internal Logic Improvements:

  • Simplified the _is_unicode_string check by using the built-in str.isascii() method for efficiency.
  • Optimized the SQL-to-C type mapping by moving the lookup table to a class-level cache, avoiding repeated construction of the mapping dictionary.
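The two changes above can be illustrated with a short sketch. It assumes that `_is_unicode_string` returns `True` when a string contains non-ASCII characters, and the `TypeMapper` class with its three-entry table is a hypothetical stand-in for the driver's real mapping.

```python
def is_unicode_string_old(param: str) -> bool:
    # Previous approach: attempt an ASCII encode and catch the failure.
    try:
        param.encode("ascii")
        return False
    except UnicodeEncodeError:
        return True


def is_unicode_string(param: str) -> bool:
    # New approach: str.isascii() (Python 3.7+) checks in C without
    # allocating a bytes object or raising an exception.
    return not param.isascii()


class TypeMapper:
    # Class-level table: built once at class creation and shared by all
    # instances, instead of being rebuilt inside the method on each call.
    _SQL_TO_C_TYPE = {
        "SQL_INTEGER": "SQL_C_LONG",
        "SQL_VARCHAR": "SQL_C_CHAR",
        "SQL_WVARCHAR": "SQL_C_WCHAR",
    }

    def map_type(self, sql_type: str) -> str:
        return self._SQL_TO_C_TYPE.get(sql_type, "SQL_C_DEFAULT")
```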

These changes collectively reduce per-row overhead, improve memory usage, and make row fetching significantly faster for the most common query scenarios.

Benchmark Results (5-run average, richbench repeat=5 number=5)

Tested back-to-back on the same machine, both branches freshly built from source:

| Operation | main (avg) | This PR (avg) | Improvement |
|---|---|---|---|
| Fetch one (mssql vs pyodbc) | -1.7x | -1.4x | 18% faster |
| Fetch many (mssql vs pyodbc) | -1.7x | -1.3x | 24% faster |
| 100 inserts (mssql vs pyodbc) | 4.9x | 5.6x | 14% faster |
| SELECT (mssql vs pyodbc) | -1.1x | -1.0x | On par with pyodbc |

Profiler Wall Clock (50K rows, single run)

| Scenario | main | This PR | Improvement |
|---|---|---|---|
| fetchall (50K rows) | 176.7ms | 158.1ms | 11% faster |
| fetchmany (50K rows, batch=1000) | 166.6ms | 138.6ms | 17% faster |
| fetchone (1K rows) | 6.7ms | 6.2ms | 7% faster |
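The improvement column in the wall-clock table appears to be the relative reduction in elapsed time. A quick check, assuming that definition and using only the timings reported above:

```python
def percent_faster(before_ms: float, after_ms: float) -> float:
    # Relative reduction in wall-clock time, as a percentage of the baseline.
    return (before_ms - after_ms) / before_ms * 100


timings = {
    "fetchall": (176.7, 158.1),
    "fetchmany": (166.6, 138.6),
    "fetchone": (6.7, 6.2),
}

# Round to whole percentages for comparison with the reported figures.
improvements = {name: round(percent_faster(b, a)) for name, (b, a) in timings.items()}
```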

Profiler Breakdown: row_wrap phase (50K rows)

| Metric | Before | After | Improvement |
|---|---|---|---|
| row_wrap total | 39ms | 10.5ms | 73% faster (3.7x) |
| Per-row cost | 0.78µs | 0.21µs | |
| % of fetchall wall time | 22% | 6% | |
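The per-row cost and speedup figures follow directly from the row_wrap totals and the 50K row count. A small worked check, using only the numbers reported above:

```python
ROWS = 50_000
before_ms, after_ms = 39.0, 10.5

# Convert the phase total from milliseconds to microseconds, then divide by rows.
per_row_before_us = before_ms * 1000 / ROWS
per_row_after_us = after_ms * 1000 / ROWS

# Speedup factor and relative time reduction for the row_wrap phase.
speedup = before_ms / after_ms
reduction_pct = (before_ms - after_ms) / before_ms * 100
```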

… on SUCCESS, __slots__ Row, and C++ Row construction

- Cache decoding encoding strings in cursor __init__ to avoid 2 method calls + 2 dict.get() per fetch
- Skip DDBCSQLGetAllDiagRecords on SQL_SUCCESS (ODBC spec: zero records on SUCCESS)
- Replace param.encode('ascii') try/except with str.isascii() (C-level check)
- Class-level _SQL_TO_C_TYPE lookup table (built once, shared across cursors)
- Add __slots__ to Row class (eliminates per-instance __dict__, ~232 bytes/row savings)
- Add Row._fast_create static method (bypasses __init__ for common case)
- Add C++ construct_rows function (builds Row objects in tight C loop, avoiding Python loop overhead)
- Zero-copy Row fast path when no converters/UUID processing needed

Benchmark results (5-run average, richbench repeat=5 number=5):
- Fetch one: -1.7x -> -1.4x (18% improvement)
- Fetch many: -1.7x -> -1.3x (24% improvement)
- 100 inserts: 4.9x -> 5.6x (14% faster)
- SELECT: -1.1x -> -1.0x (on par with pyodbc)

Profiler wall clock (50K rows):
- fetchall: 176.7ms -> 158.1ms (11% faster)
- fetchmany: 166.6ms -> 138.6ms (17% faster)

No overlap with PR #549 (execute fast path) or PR #526 (simdutf).
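The diagnostics shortcut mentioned in the commit message can be sketched as follows. `SQL_SUCCESS` and `SQL_SUCCESS_WITH_INFO` are the standard ODBC return codes; `collect_diagnostics` and its `fetch_diag_records` callback are hypothetical names standing in for the driver's call to `DDBCSQLGetAllDiagRecords`.

```python
SQL_SUCCESS = 0
SQL_SUCCESS_WITH_INFO = 1


def collect_diagnostics(ret_code, fetch_diag_records):
    # On plain SQL_SUCCESS no diagnostic records are expected, so the
    # comparatively expensive diagnostics call is skipped.
    if ret_code == SQL_SUCCESS:
        return []
    # Any other return code (warnings, errors) still fetches the records.
    return fetch_diag_records()
```

The saving comes from avoiding one extra driver round trip per successful call, which adds up when a statement is executed or fetched many times.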
github-actions bot added the "pr-size: medium" (Moderate update size) label May 7, 2026
github-actions Bot commented May 7, 2026

📊 Code Coverage Report

🔥 Diff Coverage

91%


🎯 Overall Coverage

79%


📈 Total Lines Covered: 6898 out of 8680
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

  • mssql_python/cursor.py (100%)
  • mssql_python/pybind/ddbc_bindings.cpp (88.0%): Missing lines 5892-5894
  • mssql_python/row.py (82.4%): Missing lines 63,69,72

Summary

  • Total: 67 lines
  • Missing: 6 lines
  • Coverage: 91%

mssql_python/pybind/ddbc_bindings.cpp

Lines 5888-5898

  5888         // Set __slots__ via GenericSetAttr (uses descriptor offsets — fast path)
  5889         if (PyObject_GenericSetAttr(row, attr_values, row_data) < 0 ||
  5890             PyObject_GenericSetAttr(row, attr_column_map, column_map.ptr()) < 0 ||
  5891             PyObject_GenericSetAttr(row, attr_cursor, cursor_obj.ptr()) < 0) {
! 5892             Py_DECREF(row);
! 5893             throw py::error_already_set();
! 5894         }
  5895 
  5896         // PyList_SET_ITEM steals the reference — don't Py_DECREF row
  5897         PyList_SET_ITEM(result.ptr(), i, row);
  5898     }

mssql_python/row.py

Lines 59-67

  59         """
  60         # Fast path: no converters and no UUID stringification (common case).
  61         # Avoids the converter_map iteration and list copy entirely.
  62         if not converter_map and not uuid_str_indices:
! 63             if (
  64                 cursor
  65                 and hasattr(cursor.connection, "_output_converters")
  66                 and cursor.connection._output_converters
  67             ):

Lines 65-76

  65                 and hasattr(cursor.connection, "_output_converters")
  66                 and cursor.connection._output_converters
  67             ):
  68                 # Fallback to original method for backward compatibility
! 69                 self._values = self._apply_output_converters(values, cursor)
  70             else:
  71                 # Zero-copy: just store the reference directly
! 72                 self._values = values
  73         else:
  74             # Apply output converters if available using pre-computed converter map
  75             if converter_map:
  76                 self._values = self._apply_output_converters_optimized(values, converter_map)


📋 Files Needing Attention

📉 Files with overall lowest coverage
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 67.9%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.row.py: 72.3%
mssql_python.pybind.ddbc_bindings.cpp: 74.7%
mssql_python.pybind.connection.connection.cpp: 76.2%
mssql_python.__init__.py: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.connection.py: 85.3%
