PERF: Optimize fetch API performance by jahnvi480 · Pull Request #558 · microsoft/mssql-python

jahnvi480 · 2026-05-07T06:49:02Z

Work Item / Issue Reference

AB#44921

GitHub Issue: #554

Summary

This pull request introduces significant performance optimizations to the mssql_python driver's row fetching and construction logic, particularly for the common case where no output converters or UUID processing are required. The changes focus on reducing Python overhead by caching encoding settings, using fast paths for row construction, and moving bulk row creation to C++ for improved speed. Additionally, some code refactoring and simplification have been applied.

Key improvements and optimizations:

Row Fetching and Construction Performance:

Added a C++ implementation (construct_rows) for building Row objects in bulk, bypassing Python list comprehensions and per-row initialization overhead, and exposed it to Python via the ddbc_bindings module. This is used as a fast path in fetchall and fetchmany when no converters or UUID processing are needed.
Introduced a static method Row._fast_create and added __slots__ to Row for memory and speed improvements, enabling direct, zero-copy assignment of row data in the fast path.
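
To make the fast path concrete, here is a minimal pure-Python sketch of a slotted row with a constructor that bypasses `__init__`. The slot names (`_values`, `_column_map`, `_cursor`) and the signatures are assumptions for illustration, not the driver's confirmed API, and the `construct_rows` below is only a Python analogue of the loop the PR moves into C++.

```python
class Row:
    """Simplified stand-in for the driver's Row; attribute names are assumed."""

    # __slots__ removes the per-instance __dict__ and fixes the attribute set.
    __slots__ = ("_values", "_column_map", "_cursor")

    def __init__(self, values, column_map, cursor=None):
        # Regular path: take a copy, as a converter-aware path might.
        self._values = list(values)
        self._column_map = column_map
        self._cursor = cursor

    @staticmethod
    def _fast_create(values, column_map, cursor=None):
        # Allocate the instance without running __init__, then fill the slots.
        row = object.__new__(Row)
        row._values = values  # stores the reference, no copy
        row._column_map = column_map
        row._cursor = cursor
        return row

    def __getitem__(self, index):
        return self._values[index]


def construct_rows(rows_data, column_map, cursor=None):
    # Pure-Python analogue of a bulk builder (hypothetical signature).
    return [Row._fast_create(values, column_map, cursor) for values in rows_data]
```

In this sketch the caller is responsible for choosing the fast path only when no converters or UUID handling apply, since `_fast_create` performs no per-value processing.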

Encoding and Decoding Optimization:

Cached character and wide character encoding strings in the cursor, eliminating repeated method calls and dictionary lookups during row fetching. All fetch methods now use these cached values.
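
The caching idea can be sketched as follows. The `Connection` class, its `get_decoding` method, and the `SQL_CHAR`/`SQL_WCHAR` string keys are hypothetical stand-ins; only the pattern of resolving the encodings once at cursor creation and reusing them on every fetch is taken from the description above.

```python
class Connection:
    """Hypothetical connection holding per-type decoding settings."""

    def __init__(self):
        self._decoding = {
            "SQL_CHAR": {"encoding": "utf-8"},
            "SQL_WCHAR": {"encoding": "utf-16le"},
        }
        self.lookups = 0  # counts how often the settings are consulted

    def get_decoding(self, sql_type):
        self.lookups += 1
        return self._decoding.get(sql_type, {})


class Cursor:
    def __init__(self, connection):
        self.connection = connection
        # Resolve both encodings once; fetch calls reuse the cached strings.
        self._char_encoding = connection.get_decoding("SQL_CHAR").get("encoding", "utf-8")
        self._wchar_encoding = connection.get_decoding("SQL_WCHAR").get("encoding", "utf-16le")

    def decode_char(self, raw):
        # No method call or dictionary lookup on the connection here.
        return raw.decode(self._char_encoding)
```

A cache like this trades freshness for speed: in the sketch, a setting changed on the connection after the cursor is created would not be seen unless the cursor refreshes its cached values.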

Internal Logic Improvements:

Simplified the _is_unicode_string check by using the built-in str.isascii() method for efficiency.
Optimized the SQL-to-C type mapping by moving the lookup table to a class-level cache, avoiding repeated construction of the mapping dictionary.
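
Both changes can be illustrated with a short sketch. The function names and the contents of the mapping are placeholders; only `str.isascii()` (available since Python 3.7) and the idea of a class-level table built once are taken from the description.

```python
def is_unicode_string_old(param):
    # Previous style: attempt an ASCII encode and catch the failure.
    try:
        param.encode("ascii")
        return False
    except UnicodeEncodeError:
        return True


def is_unicode_string_new(param):
    # Single C-level scan, no exception handling and no bytes object created.
    return not param.isascii()


class Cursor:
    # Built once when the class is defined and shared by all instances.
    # Keys and values are illustrative placeholders, not the real table.
    _SQL_TO_C_TYPE = {
        "SQL_INTEGER": "SQL_C_LONG",
        "SQL_VARCHAR": "SQL_C_CHAR",
    }

    def map_sql_to_c(self, sql_type):
        return self._SQL_TO_C_TYPE.get(sql_type)
```

Both versions of the check agree on empty, ASCII, and non-ASCII strings; the second avoids allocating the encoded bytes only to throw them away.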

Together, these changes reduce per-row overhead and memory use, and speed up row fetching by 7% to 17% in the profiler runs below for the common case where no converters or UUID processing are needed.

Benchmark Results (5-run average, richbench repeat=5 number=5)

Tested back-to-back on the same machine, both branches freshly built from source:

| Operation | main (avg) | This PR (avg) | Improvement |
|---|---|---|---|
| Fetch one (mssql vs pyodbc) | -1.7x | -1.4x | 18% faster |
| Fetch many (mssql vs pyodbc) | -1.7x | -1.3x | 24% faster |
| 100 inserts (mssql vs pyodbc) | 4.9x | 5.6x | 14% faster |
| SELECT (mssql vs pyodbc) | -1.1x | -1.0x | On par with pyodbc |

Profiler Wall Clock (50K rows, single run)

| Scenario | main | This PR | Improvement |
|---|---|---|---|
| fetchall (50K rows) | 176.7ms | 158.1ms | 11% faster |
| fetchmany (50K rows, batch=1000) | 166.6ms | 138.6ms | 17% faster |
| fetchone (1K rows) | 6.7ms | 6.2ms | 7% faster |

Profiler Breakdown: `row_wrap` phase (50K rows)

| Metric | Before | After | Improvement |
|---|---|---|---|
| row_wrap total | 39ms | 10.5ms | 73% faster (3.7x) |
| Per-row cost | 0.78µs | 0.21µs | |
| % of fetchall wall time | 22% | 6% | |
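
The derived figures in the `row_wrap` breakdown follow from the two totals and the 50K row count. A small sketch of that arithmetic, with helper names chosen here for illustration:

```python
ROWS = 50_000


def per_row_us(total_ms, rows=ROWS):
    # Convert a phase total in milliseconds to a per-row cost in microseconds.
    return total_ms * 1000.0 / rows


def reduction_pct(before, after):
    # Percentage of the original time that was removed.
    return (before - after) / before * 100.0


before_ms, after_ms = 39.0, 10.5

print(f"per-row before: {per_row_us(before_ms):.2f} us")
print(f"per-row after:  {per_row_us(after_ms):.2f} us")
print(f"time removed:   {reduction_pct(before_ms, after_ms):.0f}%")
print(f"speedup:        {before_ms / after_ms:.1f}x")
```

Rounded to the precision used in the table, this reproduces 0.78µs and 0.21µs per row, a 73% reduction, and a 3.7x speedup.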

… on SUCCESS, __slots__ Row, and C++ Row construction

- Cache decoding encoding strings in cursor __init__ to avoid 2 method calls + 2 dict.get() per fetch
- Skip DDBCSQLGetAllDiagRecords on SQL_SUCCESS (ODBC spec: zero records on SUCCESS)
- Replace param.encode('ascii') try/except with str.isascii() (C-level check)
- Class-level _SQL_TO_C_TYPE lookup table (built once, shared across cursors)
- Add __slots__ to Row class (eliminates per-instance __dict__, ~232 bytes/row savings)
- Add Row._fast_create static method (bypasses __init__ for common case)
- Add C++ construct_rows function (builds Row objects in tight C loop, avoiding Python loop overhead)
- Zero-copy Row fast path when no converters/UUID processing needed

Benchmark results (5-run average, richbench repeat=5 number=5):

- Fetch one: -1.7x -> -1.4x (18% improvement)
- Fetch many: -1.7x -> -1.3x (24% improvement)
- 100 inserts: 4.9x -> 5.6x (14% faster)
- SELECT: -1.1x -> -1.0x (on par with pyodbc)

Profiler wall clock (50K rows):

- fetchall: 176.7ms -> 158.1ms (11% faster)
- fetchmany: 166.6ms -> 138.6ms (17% faster)

No overlap with PR #549 (execute fast path) or PR #526 (simdutf).
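
The diagnostic-record shortcut mentioned in the commit message can be sketched like this. `SQL_SUCCESS` (0) and `SQL_SUCCESS_WITH_INFO` (1) are the standard ODBC return codes; `collect_messages` and the `fetch_diag_records` callback are hypothetical names standing in for whatever wraps `DDBCSQLGetAllDiagRecords` in the driver.

```python
SQL_SUCCESS = 0
SQL_SUCCESS_WITH_INFO = 1


def collect_messages(ret, fetch_diag_records):
    # On plain success there are no diagnostic records to read,
    # so the (comparatively expensive) native call is skipped entirely.
    if ret == SQL_SUCCESS:
        return []
    return fetch_diag_records()


calls = []


def fake_fetch_diag_records():
    # Stand-in for the native diagnostics call; records each invocation.
    calls.append(1)
    return [("01000", "informational message")]


messages_ok = collect_messages(SQL_SUCCESS, fake_fetch_diag_records)
messages_info = collect_messages(SQL_SUCCESS_WITH_INFO, fake_fetch_diag_records)
```

In this sketch the diagnostics callback runs once, for the `SQL_SUCCESS_WITH_INFO` call only.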

github-actions · 2026-05-07T07:01:23Z

📊 Code Coverage Report

🔥 Diff Coverage 91%	🎯 Overall Coverage 79%	📈 Total Lines Covered: `6898` out of `8680` 📁 Project: `mssql-python`

Diff Coverage

Diff: main...HEAD, staged and unstaged changes

mssql_python/cursor.py (100%)
mssql_python/pybind/ddbc_bindings.cpp (88.0%): Missing lines 5892-5894
mssql_python/row.py (82.4%): Missing lines 63,69,72

Summary

Total: 67 lines
Missing: 6 lines
Coverage: 91%

mssql_python/pybind/ddbc_bindings.cpp

Lines 5888-5898

  5888         // Set __slots__ via GenericSetAttr (uses descriptor offsets — fast path)
  5889         if (PyObject_GenericSetAttr(row, attr_values, row_data) < 0 ||
  5890             PyObject_GenericSetAttr(row, attr_column_map, column_map.ptr()) < 0 ||
  5891             PyObject_GenericSetAttr(row, attr_cursor, cursor_obj.ptr()) < 0) {
! 5892             Py_DECREF(row);
! 5893             throw py::error_already_set();
! 5894         }
  5895 
  5896         // PyList_SET_ITEM steals the reference — don't Py_DECREF row
  5897         PyList_SET_ITEM(result.ptr(), i, row);
  5898     }

mssql_python/row.py

Lines 59-67

  59         """
  60         # Fast path: no converters and no UUID stringification (common case).
  61         # Avoids the converter_map iteration and list copy entirely.
  62         if not converter_map and not uuid_str_indices:
! 63             if (
  64                 cursor
  65                 and hasattr(cursor.connection, "_output_converters")
  66                 and cursor.connection._output_converters
  67             ):

Lines 65-76

  65                 and hasattr(cursor.connection, "_output_converters")
  66                 and cursor.connection._output_converters
  67             ):
  68                 # Fallback to original method for backward compatibility
! 69                 self._values = self._apply_output_converters(values, cursor)
  70             else:
  71                 # Zero-copy: just store the reference directly
! 72                 self._values = values
  73         else:
  74             # Apply output converters if available using pre-computed converter map
  75             if converter_map:
  76                 self._values = self._apply_output_converters_optimized(values, converter_map)
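
One consequence of the zero-copy branch is that the row and the caller share a single list. The following sketch, using hypothetical helper names, contrasts storing the reference with storing a copy:

```python
def store_zero_copy(values):
    # Mirrors `self._values = values`: keeps the same list object.
    return values


def store_copy(values):
    # Mirrors a path that builds a new list before storing it.
    return list(values)


fetched = [1, "alice"]
aliased = store_zero_copy(fetched)
copied = store_copy(fetched)

# Mutating the source list is visible through the alias, not through the copy.
fetched[1] = "bob"
```

After the mutation, `aliased[1]` is `"bob"` while `copied[1]` is still `"alice"`, so a zero-copy path is only safe when the producer does not reuse or modify the list after handing it over.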

📋 Files Needing Attention

📉 Files with overall lowest coverage

mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 67.9%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.row.py: 72.3%
mssql_python.pybind.ddbc_bindings.cpp: 74.7%
mssql_python.pybind.connection.connection.cpp: 76.2%
mssql_python.__init__.py: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.connection.py: 85.3%

github-actions bot added the `pr-size: medium` label (moderate update size) on May 7, 2026

Applying python linting changes

5b91325

jahnvi480 wants to merge 2 commits into main from jahnvi/perf-fetch-optimization