PERF: Add C++ DetectParamTypes + SQLExecuteFast pipeline#549
Draft
bewithgaurav wants to merge 8 commits intomainfrom
Draft
PERF: Add C++ DetectParamTypes + SQLExecuteFast pipeline#549bewithgaurav wants to merge 8 commits intomainfrom
bewithgaurav wants to merge 8 commits intomainfrom
Conversation
Move parameter type detection from Python into C++ using raw CPython type checks (PyLong_CheckExact, PyFloat_CheckExact, etc.). Merge the DetectParamTypes → BindParameters → SQLExecute pipeline into a single DDBCSQLExecuteFast call so ParamInfo never crosses the pybind11 boundary. - DetectParamTypes: handles int (range-detected), float, bool, str (unicode + geometry sniffing), bytes, datetime/date/time, Decimal (MONEY range + generic numeric), UUID, None, with fallback to string - SQLExecuteFast_wrap: single pipeline with GIL release, always uses SQLPrepare for parameterized queries - cursor.py: fast path routing when no setinputsizes overrides present; old DDBCSQLExecute path preserved for setinputsizes callers - Named constants: MAX_INLINE_CHAR, MAX_INLINE_BINARY, MAX_NUMERIC_PRECISION, MONEY/SMALLMONEY ranges, PARAM_C_TYPE_TEXT platform macro
- Add complete DAE (Data-At-Execution) loop to SQLExecuteFast_wrap: SQL_NEED_DATA → SQLParamData/SQLPutData for large str/bytes/binary, matching the existing SQLExecute_wrap logic exactly - Fix DAE type assignment: non-unicode DAE strings use SQL_C_CHAR (not PARAM_C_TYPE_TEXT which maps to SQL_C_WCHAR on macOS/Linux) - Fix MONEY range lower bound: use MONEY_MIN not SMALLMONEY_MIN so negative decimals in MONEY range bind as VARCHAR (matches Python path) - Raise TypeError for unknown param types instead of silent str conversion - Add SQLFreeStmt(SQL_RESET_PARAMS) to unbind after execute
📊 Code Coverage Report
Diff CoverageDiff: main...HEAD, staged and unstaged changes
Summary
mssql_python/pybind/ddbc_bindings.cppLines 549-561 549 info.paramCType = SQL_C_SBIGINT;
550 info.columnSize = 19;
551 }
552 } else {
! 553 PyErr_Clear();
! 554 info.paramSQLType = SQL_BIGINT;
! 555 info.paramCType = SQL_C_SBIGINT;
! 556 info.columnSize = 19;
! 557 }
558 info.decimalDigits = 0;
559 continue;
560 }Lines 572-580 572 if (PyUnicode_Check(obj)) {
573 Py_ssize_t length = PyUnicode_GET_LENGTH(obj);
574 unsigned int kind = PyUnicode_KIND(obj);
575
! 576 Py_ssize_t utf16_len;
577 if (kind <= PyUnicode_2BYTE_KIND) {
578 utf16_len = length;
579 } else {
580 utf16_len = 0;Lines 2488-2497 2488 py::list is_stmt_prepared,
2489 bool /*use_prepare*/,
2490 const py::dict& encoding_settings) {
2491 if (!statementHandle || !statementHandle->get()) {
! 2492 return SQL_INVALID_HANDLE;
! 2493 }
2494
2495 SQLHANDLE hStmt = statementHandle->get();
2496
2497 // Configure forward-only / read-only cursor (matches slow path semantics).Lines 2573-2588 2573 break;
2574 }
2575 }
2576 if (!matchedInfo) {
! 2577 ThrowStdException("SQLExecuteFast: unrecognized paramToken from SQLParamData");
! 2578 }
2579 const py::object& pyObj = matchedInfo->dataPtr;
2580 if (pyObj.is_none()) {
! 2581 py::gil_scoped_release release;
! 2582 SQLPutData_ptr(hStmt, nullptr, 0);
! 2583 continue;
! 2584 }
2585
2586 if (py::isinstance<py::str>(pyObj)) {
2587 if (matchedInfo->paramCType == SQL_C_WCHAR) {
2588 std::wstring wstr = pyObj.cast<std::wstring>();Lines 2606-2631 2606 }
2607 if (!SQL_SUCCEEDED(rc)) return rc;
2608 }
2609 } else if (matchedInfo->paramCType == SQL_C_CHAR) {
! 2610 std::string encodedStr;
! 2611 py::object encoded = pyObj.attr("encode")(charEncoding, "strict");
! 2612 encodedStr = encoded.cast<std::string>();
! 2613 const char* dataPtr = encodedStr.data();
! 2614 size_t totalBytes = encodedStr.size();
! 2615 for (size_t offset = 0; offset < totalBytes; offset += DAE_CHUNK_SIZE) {
! 2616 size_t len = std::min(static_cast<size_t>(DAE_CHUNK_SIZE),
! 2617 totalBytes - offset);
! 2618 {
! 2619 py::gil_scoped_release release;
! 2620 rc = SQLPutData_ptr(hStmt, (SQLPOINTER)(dataPtr + offset),
! 2621 static_cast<SQLLEN>(len));
! 2622 }
! 2623 if (!SQL_SUCCEEDED(rc)) return rc;
! 2624 }
! 2625 } else {
! 2626 ThrowStdException("SQLExecuteFast: unsupported C type for str in DAE");
! 2627 }
2628 } else if (py::isinstance<py::bytes>(pyObj) ||
2629 py::isinstance<py::bytearray>(pyObj)) {
2630 py::bytes b = pyObj.cast<py::bytes>();
2631 std::string s = b;Lines 2641-2650 2641 }
2642 if (!SQL_SUCCEEDED(rc)) return rc;
2643 }
2644 } else {
! 2645 ThrowStdException("SQLExecuteFast: DAE only supported for str or bytes");
! 2646 }
2647 }
2648 if (!SQL_SUCCEEDED(rc) && rc != SQL_NO_DATA) return rc;
2649 }📋 Files Needing Attention📉 Files with overall lowest coverage (click to expand)mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 67.9%
mssql_python.row.py: 70.5%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 74.6%
mssql_python.pybind.connection.connection.cpp: 76.2%
mssql_python.__init__.py: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.connection.py: 85.3%🔗 Quick Links
|
- Comment out use_prepare parameter name (C4100: unreferenced parameter) - Remove unused catch variable name (C4101: unreferenced local variable)
Add explicit null pointer and zero-length guards before memcpy in build_numeric_data to satisfy DevSkim code scanning rule DS121708.
| std::memset(&nd.val[0], 0, SQL_MAX_NUMERIC_LEN); | ||
| size_t copy_len = std::min(val_str.size(), static_cast<size_t>(SQL_MAX_NUMERIC_LEN)); | ||
| if (copy_len > 0 && val_str.data() != nullptr) { | ||
| std::memcpy(&nd.val[0], val_str.data(), copy_len); |
…or attrs, parity test Six review fixes for SQLExecuteFast_wrap and DetectParamTypes: 1. Encoding key: read 'encoding' from settings dict (was 'charEncoding' which never matched). Only honor when ctype==SQL_C_CHAR so the default utf-16le doesn't corrupt SQL_C_CHAR DAE/inline byte paths. 2. Subclass support: PyLong_Check/PyFloat_Check/PyUnicode_Check/PyBytes_Check instead of *_CheckExact. Fixes user-defined int/str/bytes/float subclasses that were silently rejected with TypeError. Switched PyBytes_GET_SIZE to PyBytes_Size for subclass-safe length. 3. GIL release in DAE loop: SQLParamData and SQLPutData now release the GIL during each ODBC call, matching slow-path concurrency for large blobs/strings. 4. Preserve exec_rc: stash the SQLExecute return code before SQLFreeStmt so SUCCESS_WITH_INFO and other non-success-non-error codes are not clobbered by the unbind call. 5. Shallow-copy params: params = py::list(params) at function entry so DetectParamTypes' in-place PyList_SET_ITEM cannot mutate the caller's list under any future code path that might pass it directly. 6. Cursor attrs: SQLSetStmtAttr(SQL_ATTR_CURSOR_TYPE/CONCURRENCY) at entry to match slow-path semantics regardless of prior hstmt state. Also adds tests/test_023_fast_path_parity.py covering int/str/bytes/float subclasses, caller-list non-mutation, and unsupported-type TypeError.
Eight follow-up fixes after review feedback on c5a827f. 1. Refcount leak (BLOCKER): replace PyList_SET_ITEM (uppercase, no decref of old slot) with PyList_SetItem (decrefs old slot before stealing the new reference) in DetectParamTypes time/Decimal/UUID branches. The previous shallow-copy defense via py::list(params) was a no-op because pybind11s list constructor only inc_refs an already-list argument. 2. Geometry + DAE conflict: gate the geometry-prefix override on the not-DAE branch so a long POLYGON/POINT/LINESTRING string does not end up with isDAE=true, dataPtr set, AND a non-zero columnSize. 3. Decimal NaN/Infinity: throw ValueError instead of silently binding 0 via build_numeric_data on an empty digits tuple. 4. Time format: always emit microseconds (HH:MM:SS.ffffff), matching slow path isoformat(timespec=microseconds). 5. PyObject_IsInstance: explicit equality check so a custom __instancecheck__ that raises (returns -1) does not fall through with a Python error set. 6. Dead code: removed unused SMALLMONEY_MIN/SMALLMONEY_MAX constants and the unused utf16Len assignments in DetectParamTypes. 7. Encoding-key contract: only honor encoding_settings encoding when the user explicitly opted in via setencoding(..., ctype=SQL_C_CHAR=1). The Python layer SQL_C_CHAR constant is numerically -8 (real ODBC SQL_C_WCHAR), so by default the wide-char path is taken and encoding is irrelevant. 8. Parity test rewrite: drop the dead _force_slow_path_roundtrip helper, use the project cursor fixture instead of a hard-coded conn string, and add (a) a real fast-vs-slow parity check via setinputsizes-forced slow path, (b) a refcount-leak regression test using a Decimal subclass + weakref, (c) explicit NaN-rejection coverage.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Work Item / Issue Reference
Summary
This pull request refactors the parameter handling logic in the
executemethod ofmssql_python/cursor.pyto introduce a more efficient "fast path" for parameter binding and execution. The new approach allows the code to skip Python-side type detection and binding when there are noinputsizesoverrides, delegating the entire process to the C++ layer for better performance. The slow path is retained for cases whereinputsizesare set.Performance improvements and code simplification:
DDBCSQLExecuteFastto handle parameter type detection, binding, and execution entirely in C++ when there are noinputsizesoverrides, improving performance for common cases. (F783d0c6L1471)inputsizesare present. [1] [2]inputsizesoverrides are specified, ensuring compatibility with advanced usage. (F783d0c6L1471)