Skip to content

PERF: Add C++ DetectParamTypes + SQLExecuteFast pipeline#549

Draft
bewithgaurav wants to merge 8 commits intomainfrom
bewithgaurav/insertmany-perf-detect-types
Draft

PERF: Add C++ DetectParamTypes + SQLExecuteFast pipeline#549
bewithgaurav wants to merge 8 commits intomainfrom
bewithgaurav/insertmany-perf-detect-types

Conversation

@bewithgaurav
Copy link
Copy Markdown
Collaborator

@bewithgaurav bewithgaurav commented Apr 29, 2026

Work Item / Issue Reference

AB#44979

GitHub Issue: #500


Summary

This pull request refactors the parameter handling logic in the execute method of mssql_python/cursor.py to introduce a more efficient "fast path" for parameter binding and execution. The new approach allows the code to skip Python-side type detection and binding when there are no inputsizes overrides, delegating the entire process to the C++ layer for better performance. The slow path is retained for cases where inputsizes are set.

Performance improvements and code simplification:

  • Added a "fast path" execution branch that uses DDBCSQLExecuteFast to handle parameter type detection, binding, and execution entirely in C++ when there are no inputsizes overrides, improving performance for common cases. (F783d0c6L1471)
  • Removed the unconditional creation of the Python-side parameter type list and its associated logic from the main execution path, now only used in the "slow path" when inputsizes are present. [1] [2]
  • Retained the existing Python-side type detection and parameter binding logic as a fallback for cases where inputsizes overrides are specified, ensuring compatibility with advanced usage. (F783d0c6L1471)

Move parameter type detection from Python into C++ using raw CPython
type checks (PyLong_CheckExact, PyFloat_CheckExact, etc.). Merge the
DetectParamTypes → BindParameters → SQLExecute pipeline into a single
DDBCSQLExecuteFast call so ParamInfo never crosses the pybind11 boundary.

- DetectParamTypes: handles int (range-detected), float, bool, str
  (unicode + geometry sniffing), bytes, datetime/date/time, Decimal
  (MONEY range + generic numeric), UUID, None, with fallback to string
- SQLExecuteFast_wrap: single pipeline with GIL release, always uses
  SQLPrepare for parameterized queries
- cursor.py: fast path routing when no setinputsizes overrides present;
  old DDBCSQLExecute path preserved for setinputsizes callers
- Named constants: MAX_INLINE_CHAR, MAX_INLINE_BINARY, MAX_NUMERIC_PRECISION,
  MONEY/SMALLMONEY ranges, PARAM_C_TYPE_TEXT platform macro
Comment thread mssql_python/pybind/ddbc_bindings.cpp Fixed
- Add complete DAE (Data-At-Execution) loop to SQLExecuteFast_wrap:
  SQL_NEED_DATA → SQLParamData/SQLPutData for large str/bytes/binary,
  matching the existing SQLExecute_wrap logic exactly
- Fix DAE type assignment: non-unicode DAE strings use SQL_C_CHAR
  (not PARAM_C_TYPE_TEXT which maps to SQL_C_WCHAR on macOS/Linux)
- Fix MONEY range lower bound: use MONEY_MIN not SMALLMONEY_MIN so
  negative decimals in MONEY range bind as VARCHAR (matches Python path)
- Raise TypeError for unknown param types instead of silent str conversion
- Add SQLFreeStmt(SQL_RESET_PARAMS) to unbind after execute
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 29, 2026

📊 Code Coverage Report

🔥 Diff Coverage

91%


🎯 Overall Coverage

79%


📈 Total Lines Covered: 7147 out of 9037
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

  • mssql_python/cursor.py (100%)
  • mssql_python/pybind/ddbc_bindings.cpp (91.5%): Missing lines 553-557,576,2492-2493,2577-2578,2581-2584,2610-2627,2645-2646

Summary

  • Total: 410 lines
  • Missing: 34 lines
  • Coverage: 91%

mssql_python/pybind/ddbc_bindings.cpp

Lines 549-561

  549                     info.paramCType = SQL_C_SBIGINT;
  550                     info.columnSize = 19;
  551                 }
  552             } else {
! 553                 PyErr_Clear();
! 554                 info.paramSQLType = SQL_BIGINT;
! 555                 info.paramCType = SQL_C_SBIGINT;
! 556                 info.columnSize = 19;
! 557             }
  558             info.decimalDigits = 0;
  559             continue;
  560         }

Lines 572-580

  572         if (PyUnicode_Check(obj)) {
  573             Py_ssize_t length = PyUnicode_GET_LENGTH(obj);
  574             unsigned int kind = PyUnicode_KIND(obj);
  575 
! 576             Py_ssize_t utf16_len;
  577             if (kind <= PyUnicode_2BYTE_KIND) {
  578                 utf16_len = length;
  579             } else {
  580                 utf16_len = 0;

Lines 2488-2497

  2488                               py::list is_stmt_prepared,
  2489                               bool /*use_prepare*/,
  2490                               const py::dict& encoding_settings) {
  2491     if (!statementHandle || !statementHandle->get()) {
! 2492         return SQL_INVALID_HANDLE;
! 2493     }
  2494 
  2495     SQLHANDLE hStmt = statementHandle->get();
  2496 
  2497     // Configure forward-only / read-only cursor (matches slow path semantics).

Lines 2573-2588

  2573                     break;
  2574                 }
  2575             }
  2576             if (!matchedInfo) {
! 2577                 ThrowStdException("SQLExecuteFast: unrecognized paramToken from SQLParamData");
! 2578             }
  2579             const py::object& pyObj = matchedInfo->dataPtr;
  2580             if (pyObj.is_none()) {
! 2581                 py::gil_scoped_release release;
! 2582                 SQLPutData_ptr(hStmt, nullptr, 0);
! 2583                 continue;
! 2584             }
  2585 
  2586             if (py::isinstance<py::str>(pyObj)) {
  2587                 if (matchedInfo->paramCType == SQL_C_WCHAR) {
  2588                     std::wstring wstr = pyObj.cast<std::wstring>();

Lines 2606-2631

  2606                         }
  2607                         if (!SQL_SUCCEEDED(rc)) return rc;
  2608                     }
  2609                 } else if (matchedInfo->paramCType == SQL_C_CHAR) {
! 2610                     std::string encodedStr;
! 2611                     py::object encoded = pyObj.attr("encode")(charEncoding, "strict");
! 2612                     encodedStr = encoded.cast<std::string>();
! 2613                     const char* dataPtr = encodedStr.data();
! 2614                     size_t totalBytes = encodedStr.size();
! 2615                     for (size_t offset = 0; offset < totalBytes; offset += DAE_CHUNK_SIZE) {
! 2616                         size_t len = std::min(static_cast<size_t>(DAE_CHUNK_SIZE),
! 2617                                               totalBytes - offset);
! 2618                         {
! 2619                             py::gil_scoped_release release;
! 2620                             rc = SQLPutData_ptr(hStmt, (SQLPOINTER)(dataPtr + offset),
! 2621                                                 static_cast<SQLLEN>(len));
! 2622                         }
! 2623                         if (!SQL_SUCCEEDED(rc)) return rc;
! 2624                     }
! 2625                 } else {
! 2626                     ThrowStdException("SQLExecuteFast: unsupported C type for str in DAE");
! 2627                 }
  2628             } else if (py::isinstance<py::bytes>(pyObj) ||
  2629                        py::isinstance<py::bytearray>(pyObj)) {
  2630                 py::bytes b = pyObj.cast<py::bytes>();
  2631                 std::string s = b;

Lines 2641-2650

  2641                     }
  2642                     if (!SQL_SUCCEEDED(rc)) return rc;
  2643                 }
  2644             } else {
! 2645                 ThrowStdException("SQLExecuteFast: DAE only supported for str or bytes");
! 2646             }
  2647         }
  2648         if (!SQL_SUCCEEDED(rc) && rc != SQL_NO_DATA) return rc;
  2649     }


📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 67.9%
mssql_python.row.py: 70.5%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 74.6%
mssql_python.pybind.connection.connection.cpp: 76.2%
mssql_python.__init__.py: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.connection.py: 85.3%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

View Azure DevOps Build

Browse Full Coverage Report

- Comment out use_prepare parameter name (C4100: unreferenced parameter)
- Remove unused catch variable name (C4101: unreferenced local variable)
Add explicit null pointer and zero-length guards before memcpy in
build_numeric_data to satisfy DevSkim code scanning rule DS121708.
std::memset(&nd.val[0], 0, SQL_MAX_NUMERIC_LEN);
size_t copy_len = std::min(val_str.size(), static_cast<size_t>(SQL_MAX_NUMERIC_LEN));
if (copy_len > 0 && val_str.data() != nullptr) {
std::memcpy(&nd.val[0], val_str.data(), copy_len);
@github-actions github-actions Bot added the pr-size: large Substantial code update label May 7, 2026
…or attrs, parity test

Six review fixes for SQLExecuteFast_wrap and DetectParamTypes:

1. Encoding key: read 'encoding' from settings dict (was 'charEncoding'
   which never matched). Only honor when ctype==SQL_C_CHAR so the default
   utf-16le doesn't corrupt SQL_C_CHAR DAE/inline byte paths.
2. Subclass support: PyLong_Check/PyFloat_Check/PyUnicode_Check/PyBytes_Check
   instead of *_CheckExact. Fixes user-defined int/str/bytes/float
   subclasses that were silently rejected with TypeError. Switched
   PyBytes_GET_SIZE to PyBytes_Size for subclass-safe length.
3. GIL release in DAE loop: SQLParamData and SQLPutData now release the
   GIL during each ODBC call, matching slow-path concurrency for large
   blobs/strings.
4. Preserve exec_rc: stash the SQLExecute return code before SQLFreeStmt
   so SUCCESS_WITH_INFO and other non-success-non-error codes are not
   clobbered by the unbind call.
5. Shallow-copy params: params = py::list(params) at function entry so
   DetectParamTypes' in-place PyList_SET_ITEM cannot mutate the caller's
   list under any future code path that might pass it directly.
6. Cursor attrs: SQLSetStmtAttr(SQL_ATTR_CURSOR_TYPE/CONCURRENCY) at
   entry to match slow-path semantics regardless of prior hstmt state.

Also adds tests/test_023_fast_path_parity.py covering int/str/bytes/float
subclasses, caller-list non-mutation, and unsupported-type TypeError.
Comment thread tests/test_023_fast_path_parity.py Fixed
Eight follow-up fixes after review feedback on c5a827f.

1. Refcount leak (BLOCKER): replace PyList_SET_ITEM (uppercase, no decref of
   old slot) with PyList_SetItem (decrefs old slot before stealing the new
   reference) in DetectParamTypes time/Decimal/UUID branches. The previous
   shallow-copy defense via py::list(params) was a no-op because pybind11s
   list constructor only inc_refs an already-list argument.
2. Geometry + DAE conflict: gate the geometry-prefix override on the not-DAE
   branch so a long POLYGON/POINT/LINESTRING string does not end up with
   isDAE=true, dataPtr set, AND a non-zero columnSize.
3. Decimal NaN/Infinity: throw ValueError instead of silently binding 0 via
   build_numeric_data on an empty digits tuple.
4. Time format: always emit microseconds (HH:MM:SS.ffffff), matching slow
   path isoformat(timespec=microseconds).
5. PyObject_IsInstance: explicit equality check so a custom __instancecheck__
   that raises (returns -1) does not fall through with a Python error set.
6. Dead code: removed unused SMALLMONEY_MIN/SMALLMONEY_MAX constants and the
   unused utf16Len assignments in DetectParamTypes.
7. Encoding-key contract: only honor encoding_settings encoding when the
   user explicitly opted in via setencoding(..., ctype=SQL_C_CHAR=1). The
   Python layer SQL_C_CHAR constant is numerically -8 (real ODBC SQL_C_WCHAR),
   so by default the wide-char path is taken and encoding is irrelevant.
8. Parity test rewrite: drop the dead _force_slow_path_roundtrip helper, use
   the project cursor fixture instead of a hard-coded conn string, and add
   (a) a real fast-vs-slow parity check via setinputsizes-forced slow path,
   (b) a refcount-leak regression test using a Decimal subclass + weakref,
   (c) explicit NaN-rejection coverage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-size: large Substantial code update

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants