Connection Storm Prevention with Jitter in Connection Pool #1495

Open
priyanshu-d11 opened this issue Mar 6, 2025 · 6 comments · May be fixed by #1496

Comments

@priyanshu-d11

priyanshu-d11 commented Mar 6, 2025

Problem

When running multiple instances of an application that uses PgPool, we observed a connection storm issue that leads to uneven load distribution across database readers.

Scenario

  • 10 application instances
  • Each instance: 2x large with 8 verticles
  • Database setup: Multiple read replicas behind PgPool
  • Issue timing: Application startup

Current Behavior

When multiple application instances start simultaneously:

  1. All verticles attempt to establish database connections at nearly the same time
  2. This creates a "connection storm" where most connections are directed to the same reader
  3. Results in skewed connection distribution across available readers
  4. Some readers become overloaded, while others are underutilized

Example from Load Test

In our load test with 10 instances (8 verticles each):

  • Most connections were established to Reader-1
  • Created uneven load distribution
  • Impacted application performance
  • Reduced the effectiveness of having multiple readers

Attaching a screenshot of connections made on readers. One of the readers has 227 connections, and the other has 72.

[Image: screenshot of connection counts per reader]

Proposed Solution

Introduce connection jitter to randomize connection timing:

  1. Add jitter parameter to PoolOptions
  2. When setting maxLifetime, apply random jitter within the specified range
  3. This spreads out connection creation/renewal across a time window
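The steps above can be sketched in plain Java. This is a minimal illustration, not the code from the pull request: the class name `JitteredLifetime`, the method `jitteredLifetimeMillis`, and the choice to add a uniform random offset in the range [0, jitter] on top of the configured max lifetime are all assumptions made for the example. The issue does not say whether the jitter is added, subtracted, or applied symmetrically.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of the Vert.x API.
// Assumption: the effective lifetime is maxLifetime plus a uniform
// random offset in [0, jitter], chosen independently per connection.
final class JitteredLifetime {
    private JitteredLifetime() {}

    static long jitteredLifetimeMillis(long maxLifetimeMillis, long jitterMillis) {
        if (maxLifetimeMillis <= 0) {
            return 0; // treat a non-positive lifetime as "no lifetime limit"
        }
        if (jitterMillis <= 0) {
            return maxLifetimeMillis; // jitter disabled: fixed lifetime
        }
        // nextLong(bound) returns a value in [0, bound), so bound = jitter + 1
        long offset = ThreadLocalRandom.current().nextLong(jitterMillis + 1);
        return maxLifetimeMillis + offset;
    }
}
```

With a max lifetime of 60 seconds and a jitter of 5 seconds, each connection would get a lifetime somewhere between 60 and 65 seconds, so connections created together would no longer all expire together.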

Implementation

Contribution

#1496

@vietj
Member

vietj commented Mar 6, 2025

Are you using a single shared pool or multiple connection pools?

@priyanshu-d11
Author

Multiple connection pools. We are using multiple clients, each with its own connection pool.
In our current setup, each Vert.x application runs N instances of a verticle, and each verticle has one Postgres client object.

@vietj
Member

vietj commented Mar 6, 2025

Have you tried using a single shared pool?

@chirag-manwani

I would say that even with one pool sized N times larger (N = number of verticles), the issue would remain.
Connections would still open at the same time under the same conditions, and would also close at the same time, since the max lifetime is the same for all of them.
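To illustrate why a fixed max lifetime keeps connections in lockstep, here is a small simulation sketch. It assumes a simplified model that is not taken from the issue: every connection is created at a random moment inside a short startup window and is recreated immediately when its lifetime ends. The class `LifecycleDrift` and all of its parameters are hypothetical names for this example.

```java
import java.util.Random;

// Hypothetical simulation, not Vert.x code.
// Assumption: each connection is replaced the instant its lifetime ends.
final class LifecycleDrift {
    private LifecycleDrift() {}

    // Returns (latest - earliest) recreation time after `cycles` renewals.
    static long renewalSpreadMillis(int connections, long startupWindowMillis,
                                    long maxLifetimeMillis, long jitterMillis,
                                    int cycles, long seed) {
        Random rnd = new Random(seed);
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (int i = 0; i < connections; i++) {
            // creation time inside the startup window [0, startupWindowMillis)
            long t = (long) (rnd.nextDouble() * startupWindowMillis);
            for (int c = 0; c < cycles; c++) {
                // uniform jitter in [0, jitterMillis], or 0 when disabled
                long jitter = jitterMillis <= 0
                        ? 0
                        : (long) (rnd.nextDouble() * (jitterMillis + 1));
                t += maxLifetimeMillis + jitter;
            }
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        return max - min;
    }
}
```

In this model, with no jitter the renewal times never spread beyond the original startup window, no matter how many cycles pass. With jitter, each cycle adds an independent random offset, so the renewals drift apart over time.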

@chirag-manwani

This is not much of a problem when load is low: connections are used less often, and there is some randomness in when they are created. Under substantial load, however, connections are almost always in use, so their lifecycles become synchronized.

Due to the nature of how AWS reader endpoints work (DNS load balancing), it poses an issue if all connections are created around the same time. Unless there is a DB Proxy in front of the DB, the connection skew issue will persist.
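The effect of DNS-based balancing on connection skew can be shown with a toy model. The sketch below assumes, purely for illustration, that the reader endpoint resolves to exactly one reader for the length of a TTL window and then rotates to the next reader. Real resolver caching and endpoint behavior are more complicated and are not modeled here. The class `ReaderSkewSimulation` and its parameters are hypothetical.

```java
import java.util.Random;

// Hypothetical toy model, not a description of actual AWS behavior.
// Assumption: during each TTL window the endpoint name maps to one reader,
// and the mapping rotates round-robin from one window to the next.
final class ReaderSkewSimulation {
    private ReaderSkewSimulation() {}

    // Returns the number of connections that landed on each reader.
    static int[] distribute(int connections, int readers, long ttlMillis,
                            long spreadMillis, long seed) {
        Random rnd = new Random(seed);
        int[] counts = new int[readers];
        for (int i = 0; i < connections; i++) {
            // spreadMillis <= 0 models every connection opening at the same instant
            long openedAt = spreadMillis <= 0
                    ? 0
                    : (long) (rnd.nextDouble() * spreadMillis);
            int reader = (int) ((openedAt / ttlMillis) % readers);
            counts[reader]++;
        }
        return counts;
    }
}
```

Under these assumptions, `distribute(80, 2, 1_000, 0, 42L)` puts all 80 connections on the first reader, because they all open inside the same TTL window. Spreading the same 80 connections over many TTL windows (for example `spreadMillis = 60_000`) lets them land on both readers, which is the effect that jitter aims for.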

@vietj
Member

vietj commented Mar 6, 2025

I see, thanks for your explanations. Looking forward to reviewing the pull request.
