Skip to content

Add support for maintaining different connector lists for different connection string #9

Merged
HarshDaryani896 merged 13 commits into
9.0.2.xfrom
9.0.2.x-pooling-bug-fix
Apr 13, 2026
Merged

Add support for maintaining different connector lists for different connection string #9
HarshDaryani896 merged 13 commits into
9.0.2.xfrom
9.0.2.x-pooling-bug-fix

Conversation

@Sfurti-yb
Copy link
Copy Markdown
Collaborator

@Sfurti-yb Sfurti-yb commented Dec 16, 2025

Problem statement:
Since the _pools variable (list of pools. Each pool corresponds to one node) is static, it only has one connectors list per host. These connectors are created with the connection configuration with which each of the pools were created.
In this case, when the connection string changes, the connector created from previous conn string is given if it is idle.

Solution:
Create a wrapper around PoolingDatasource which maintains two new maps, connection string to connectors and connection string to idle connectors map. When an idle connector is available, check the connection string and give a connector corresponding to that. Same with when a connector is to be returned.

Tests:
Added Tests in YBPoolingWrapperTests:

  • Test with two connection strings, and check user for each connection string
  • Same for connection strings with topology keys and Multithreaded
  • Performance test App - Driver-Examples PR
    - With pooling, the average for all iterations (minus the first one) is 0ms for both the upstream and our driver
    - Without pooling, the average time for all iterations (minus the first one) is 138ms for upstream driver and 147ms for smart driver.

Problem statement 2:
In case of a 6 node cluster where the configuration was :
127.0.0.1 (primary) -> datacenter1
127.0.0.2 (primary), 127.0.0.4(read replica) -> datacenter2
127.0.0.3 (primary), 127.0.0.5(read replica) -> datacenter3
127.0.0.6(read replica) -> datacenter 4

Topology key priority: datacenter2:1, datacenter3:2

In case of prefer primary, when 127.0.0.2 and 127.0.0.3 are down, the connection should go to 127.0.0.1 (last primary node). But instead it was going to 127.0.0.4.

Solution:
This was since prioritytopoolindex was a single map with both primary and read replica nodes. So this created an intermittent issue where for priority 1, it could either store 127.0.0.2 or 127.0.0.4. If it stored 127.0.0.1, when betterNodeAvailable was called, it would see the node is down and move on to the next node and the test would pass, but when 127.0.0.4 was stored, it would see that the node is higher up in priority even though it was a read replica node and the connections would go to that node and fail.

The priority map is separated now to primary and RR.

7f01b63

Source Code Changes

  • src/Npgsql/ClusterAwareDataSource.cs

    • New _savedConnectionCounts dictionary — a Dictionary<string, int> keyed by host name that preserves connection counts when the pool-to-connection maps are cleared during a Refresh() cycle as well as when a higher priority nodes comes back up.
    • GetLoad() fallback to saved counts — when a host is not found in either poolToNumConnMapPrimary or poolToNumConnMapRR, the method now checks _savedConnectionCounts before returning 0. This ensures the load balancer still sees the correct load for fallback hosts whose pool entries were temporarily removed.
    • UpdateConnectionMap() — handle pools missing from both maps — after a refresh, a pool object may no longer exist in either connection map because the maps were rebuilt with new pool instances. The method now:
      • On increment: re-inserts the pool into the correct map (primary vs. read-replica) by consulting _hostsToNodeTypeMap.
      • On decrement: if the pool isn't in either map, decrements the count in _savedConnectionCounts instead, and removes the entry once it reaches zero.
    • Prevent duplicate entries in unreachable host lists — added Contains() guard before Add() for both unreachableHostsIndices and unreachableHosts in two code paths (primary and round-robin connection loops).
  • src/Npgsql/TopologyAwareDataSource.cs

    • Refresh() — save connection counts before clearing maps — before calling Clear() on poolToNumConnMapPrimary and poolToNumConnMapRR, the method now copies each pool's current connection count into _savedConnectionCounts keyed by host name.
    • CreatePool() — restore connection counts for existing hosts — when CreatePool() encounters a host that already has a pool (the flag == 1 / continue path), it now restores the saved connection count from _savedConnectionCounts into the appropriate map (primary or RR) and removes the entry from the saved map.

Test Changes

  • Cluster setup reliability (6 test files)
    • Added yb-ctl destroy before every yb-ctl create/start in CreateCluster() / CreateRRCluster() across YBClusterAwareRRSupportTests, YBFallBackTopologyExtended, YBFallbackTopolgyTests, YBLoadBalancerTests, YBPoolingWrapperTests, and YBTopologyAwareRRSupportTests. Ensures a clean cluster state even if a previous test run crashed without cleanup.
  • YBFallbackTopolgyTests.cs
    • CloseConnections() now calls NpgsqlConnection.ClearAllPools() + 1s sleep after closing connections. With pooling enabled, conn.Close() returns the connection to the pool rather than closing the physical TCP connection; ClearAllPools() forces physical disconnection so rpcz-based verification sees 0.
  • YBFallBackTopologyExtended.cs
    • Increased numConnections from 12 to 18; added sleeps after topology changes for refresh to fire; fixed expected counts to match corrected load-balancing behavior; changed checkNodeDownPrimary to public; uses a single-host connection string with save/restore.
  • YBTopologyAwareRRSupportTests.cs
    • Fixed connection string: preferrr → preferprimary in TestPreferPrimaryAllPrimaryNodesDown.
    • Both prefer-primary and prefer-RR tests now establish initial connections before stopping nodes to exercise the saved-connection-count mechanism.
    • Fixed conns.Concat(...) → conns.AddRange(...) (LINQ Concat returns a new enumerable without mutating the list, so connections were silently discarded).
    • Added YB Servers Refresh Interval=10 to prefer-RR test.
  • YBPreparedStatementsTest.cs
    • Now extends YBTestUtils; added [OneTimeSetUp]/[OneTimeTearDown] for cluster and northwind DB lifecycle; added DROP TABLE IF EXISTS for idempotency.

@Sfurti-yb Sfurti-yb requested a review from ashetkar December 16, 2025 15:49
@Sfurti-yb Sfurti-yb self-assigned this Dec 16, 2025
@ashetkar
Copy link
Copy Markdown
Collaborator

What happens when the number of unique connection strings are more than the number of connectors (i.e. number of connections in a pool)?
e.g. Let's say the pool size is set to 2

  1. For connstring1, connector1 is created
  2. For connstring2, connector2 is created
  3. For connstring3, it cannot create 3rd connector since the limit is 2. What happens in this case?

Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs
Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs
Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
@Sfurti-yb
Copy link
Copy Markdown
Collaborator Author

Sfurti-yb commented Dec 24, 2025

What happens when the number of unique connection strings are more than the number of connectors (i.e. number of connections in a pool)?

Since we call the OpenNewConnector of the Pooling datasource first from inside our OpenNewConnector, it has a check in there for max connections to a pool, if that is false, it just throws an error there itself.

@ashetkar
Copy link
Copy Markdown
Collaborator

What happens when the number of unique connection strings are more than the number of connectors (i.e. number of connections in a pool)?

Since we call the OpenNewConnector of the Pooling datasource first from inside our OpenNewConnector, it has a check in there for max connections to a pool, if that is false, it just throws an error there itself.

So this behaviour would be different from that of the upstream driver.
We'll have to figure out what is the effort involved to ensure the behaviour is the same and take it up either as part of this PR or separately. Let's discuss this further offline.

Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
Comment thread src/Npgsql/PublicAPI.Unshipped.txt Outdated
Comment thread src/Npgsql/YBPoolingWrapperDataSource.cs
Copy link
Copy Markdown
Collaborator

@ashetkar ashetkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could not run the tests on my system due to different reasons related to my env, but the changes look fine with some minor comments.

Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
@ashetkar ashetkar self-requested a review January 8, 2026 14:34
@ashetkar
Copy link
Copy Markdown
Collaborator

ashetkar commented Jan 8, 2026

Please hold, I'm now trying these on my dev-server, will update once completed.
Meanwhile, you can address the above comments.

Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
Copy link
Copy Markdown
Collaborator

@ashetkar ashetkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the tests and one test case failed for me. Need to run this from the Npgsql.Tests directory.

dotnet test --filter "FullyQualifiedName=YBNpgsql.Tests.YBPoolingWrapperTests.TestPoolingForMultipleConnStringsMultiThread" --framework net9.0

Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs
Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs Outdated
Comment thread test/Npgsql.Tests/YBPoolingWrapperTests.cs
Copy link
Copy Markdown
Collaborator

@ashetkar ashetkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing some test failures, please look into those as well.

Comment thread test/Npgsql.Tests/YBPreparedStatementsTest.cs
{
if (ex.Message.Equals("No suitable host was found", StringComparison.OrdinalIgnoreCase))
Console.WriteLine("Expected Failure:" + ex.Message);
Console.WriteLine("Caught expected exception:" + e.Message);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there no need to check for the exact message string now?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was initially just printing errors for NoSuitableHostException, instead of all errors

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this mean all NpgsqlExceptions were and are expected?

{
if (ex.Message.Equals("No suitable host was found", StringComparison.OrdinalIgnoreCase))
Console.WriteLine("Expected Failure:" + ex.Message);
Console.WriteLine("Caught expected Exception:" + e.Message);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above.

Comment thread test/Npgsql.Tests/YBClusterAwareRRSupportTests.cs Outdated
Comment thread test/Npgsql.Tests/YBTestUtils.cs Outdated
Comment thread test/Npgsql.Tests/YBTestUtils.cs Outdated
Comment thread test/Npgsql.Tests/YBTestUtils.cs Outdated
@HarshDaryani896 HarshDaryani896 self-assigned this Apr 9, 2026
Copy link
Copy Markdown
Collaborator

@ashetkar ashetkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, subject to perf numbers being fine

@HarshDaryani896
Copy link
Copy Markdown

Perf tests results:

  • Without Pooling

    • Single Threaded:
      • Upstream driver: Average connection creation time: 1806 ms
      • YB driver:  Average connection creation time: 1825 ms
    • Multi Threaded:
      • Upstream driver: All 10 threads completed their tasks in 533 ms
      • YB driver: All 10 threads completed their tasks in 528 ms
  • With Pooling:

    • Single Threaded:
      • Upstream driver: Average connection creation time: 0 ms
      • YB driver:  Average connection creation time: 0 ms
    • Multi Threaded:
      • Upstream driver: All 10 threads completed their tasks in 2 ms
      • YB driver: All 10 threads completed their tasks in 2 ms.

@HarshDaryani896 HarshDaryani896 merged commit 8b0883a into 9.0.2.x Apr 13, 2026
2 of 12 checks passed
@HarshDaryani896 HarshDaryani896 deleted the 9.0.2.x-pooling-bug-fix branch April 13, 2026 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants