Monday, May 9, 2016

What is Connection Pooling and why it is Used?

The server hosts more than 50 databases that serve an ASP.NET application hosted on some servers in a web farm. These servers issue connections to the databases in a distributed manner to balance the web application load. I tried to discover what these connections were doing and to what databases they were connected. Connections grouped by database:
select  db_name(dbid) , count(*) 'connections count'
from master..sysprocesses
where spid > 50 and spid != @@spid
group by  db_name(dbid)
order by count(*) desc
This showed some databases having more than 300 connections associated with them.
What about logins used?
select  loginame , nt_username, count(*) 'Connections count'
from master..sysprocesses
where spid > 50 and spid != @@spid
group by  loginame , nt_username
order by count(*) desc
This showed a mix of windows domain accounts (those with values in column nt_username, e.g: domain\user) beside SQL authentication accounts (those with column nt_username empty, e.g: “sa”).
In order to reduce the number of times that new connections must be opened, applications may use connection pooling. This was clearly not the case here, and all these connections resulted in what is known as “pool fragmentation”.
So what’s connection pooling? how does it work, what can cause pool fragmentation and how can we avoid/reduce it?

Connection pooling defined

A connection pool is a set of idle, open, and reusable database connections maintained by the database server so that the connections can be reused when the database receives future requests for data, instead of exclusively opening a new connection. In our case, this connection pool is maintained by ADO.NET and is used to communicate with SQL Server.
The benefit of connection pooling, is that connections placed in the pool and are used over again so that a new connection does not have to be established, reducing amount of time needed to establish connections.
Also, opening and closing of connections to the database becomes less resource-expensive in terms of CPU and memory.

How connection pooling works

When a connection is first opened, a connection pool is created based on matching criteria that associates the pool with the connection string in the connection. Each connection pool is associated with a distinct connection string. If the connection string is not an exact match to an existing pool when a new connection is opened, a new pool is created. Connections are pooled per process, per application domain, per connection string, and, when integrated security is used, per Windows identity.
Whenever a user calls “open” on a connection with a matched connection string, the “pooler” looks for an available connection in the pool that corresponds to that connection string. If a pooled connection is available, it returns it to the caller, otherwise it will add a new connection to the pool, up to the maximum pool size specified (100 is the default). When the application calls “close” on the connection, instead of closing the connection, the pooler returns it to the pooled set of active connections. Once the connection is returned to the pool, it is ready to be reused on the next “open” call.
When the connection is closed or disposed, it is returned to the pool and remains idle until a request for a new connection comes in. The connection pooler removes a connection from the pool after it has been idle for a long time.
So how does it go in ADO.net? In the following C# code, we will open five connections to SQL Server, but only three connection pools are required to manage them.
SqlConnection sqlcon = new SqlConnection();

    string constring = "Data Source=.\\SQL2k8x;Connect Timeout=5;";

    try
    {
        sqlcon.ConnectionString = constring + "database = DB1;Integrated Security=true";
        sqlcon.Open();   // Pool 1 is created.
        sqlcon.Close();  // connection is closed , returned to pool 1

        sqlcon.ConnectionString = constring + "database = DB2;Integrated Security=true";
        sqlcon.Open();   // Pool 2 is created , another database is used
        sqlcon.Close();  // connection is closed , returned to pool 2

        sqlcon.ConnectionString = constring + "database = DB1;Integrated Security=true";
        sqlcon.Open();   // Pool 1 is Used , same connection string as when pool 1 was created.
        sqlcon.Close();  // connection is closed , returned to pool 1

        sqlcon.ConnectionString = constring + "database = DB1;Uid=sa ;Pwd=password";

        sqlcon.Open();   // Pool 3 is created , SQL authentication account is used even if same database as Pool 1.
        sqlcon.Close();  // connection is closed , returned to pool 3

        sqlcon.ConnectionString = constring + "database = DB2;Integrated Security=true";
        sqlcon.Open();   // Pool 2 is Used , same connection string as when pool 2 was created.
        sqlcon.Close();  // connection is closed , returned to pool 2

    }
    catch (Exception ex)
    {
        MessageBox.Show("Error connecting to SQL server.Message : " + Environment.NewLine +      
                         ex.Message.ToString(), "Error connecting to SQl instance");
        return;
    }
If we query the SQL Server instance to get the number of connections opened, the result returned will be three connections where the program_name column is “.Net SqlClient Data provider”
select * from master..sysprocesses where spid > 50 and spid  @@spid
But, if we are connecting to the same instance and closing the connection directly after using it, why there isn’t one connection pool created?

What causes pool fragmentation?

Since connection pooling is created for each distinct connection string, there will be as many different connection pools as the number of connection strings used, increasing the number of connections opened and number of pools managed.
This can happen under two conditions:
  1. Integrated security is used.Connections are pooled according to the connection string plus the user identity. Thus, if integrated security is used to connect to SQL server, you get one pool per user. Although this improves the performance of subsequent database requests for a single user, that user cannot take advantage of connections made by other users. It also results in at leastone connection per user to the database server.
    This is a trade-off between manageability and logging/auditing, since you may like to use different accounts to connect to the database server to audit activity on the sever based on the account used in the connection.
  2. The instance has many databases.This is significantly the case here. If the application extracts data from many databases and makes explicit calls to them, then there will be a separate pool of connections to each database, thus increasing the number of connections to the server. Fortunately, there is a relatively simple way to avoid this side effect. I’ll get to that soon.
Now I will show a demo of an application that causes pool fragmentation and possible ways to avoid it. I have written it in C# using Visual Studio 2008.
In order to use the application, we need to create ten databases on the instance, each database containing once table. Here’s the code:
DeclARE @i int , @sql nvarchar (500)

select @i=0 

while @i < 10

begin

exec ( 'CREATE database pooling' + @i )

select @sql = ' 
 use pooling'+cast (@i as varchar(20))

select @sql = @sql + '

create table pooltest'+cast (@i as varchar(20))+' ( col1 int) 

'
--print @sql
exec sp_executesql @sql

set @i= @i + 1

end
The application connects to the databases and returns list of tables in each database, besides returning ADO.net connection pooling counters. These counters can also be captured with performance monitor (perfmon) using object “.net data provider for Sqlserver”. The old ADO.NET 1.1 object is called “.NET CLR Data”. Note that counters for the object are created only after making at least one connection, so if you attempted to add them without making any connections, you will find them dimmed.
For more information about these counters, look here: Using ADO.NET Performance Counters at MSDN’s Visual Studio 2005 Developer Center. I have used some of the code at this URL inside the application to display counters.
The application loops and makes exclusive connections to each database
Here is the code that makes connections attempts:
string strSQL = "select db_name() + '  -------  ' + name 'table' from sys.tables"; 

for (int i = 0; i < 10; i++)
{

 sqlcon.ConnectionString = "Data Source=" + server.Text.Trim() + ";DataBase=pooling" + i.ToString() + ";" + "Integrated Security=true;Connect Timeout=5";

 if (checkBox1.Checked == true) // SQL autentication used
    {
        sqlcon.ConnectionString = "Data Source=" + server.Text.Trim() + ";DataBase=pooling" + i.ToString() + ";" + "Uid=" + username.Text.Trim() + ";Pwd=" + password.Text.Trim() + ";Connect Timeout=5";
    }

    try
    {
        sqlcon.Open();
        SqlCommand DBCmd = new SqlCommand(strSQL, sqlcon);
    }
We will connect to SQL server once using integrated security and once using SQL authentication:
Running using integrated security
Here we are going to use integrated security to connect to the instance and make an explicit connection to each database, creating 10 connection pools and 10 pooled connections. See the following image.
conn_pooling_win_user.JPG
Running using SQL authentication
Here application produces a similar result, except for NumberOfActiveConnectionPoolGroups because it is cumulative. We get 10 connection pools and 10 pooled connections, but 20 connection pool groups.
Running . . .
select program_name , hostprocess , nt_username  from master..sysprocesses where spid >50 and spid @@spid
. . . returns 20 connections where the program_name column is “.Net SqlClient Data provider” for which each instance of the application is responsible for 10 connections.
conn_pooling_sql_user.JPG
Note that you can get different counter based on the connections already established on the instance.
This data can also be shown from performance monitor, but you must add the counters after making the connections and the application should be left open. The performance monitor distinguishes among different instances of the same application by using process Id (pid in task manager processes tab), The process id is logged in the application main window.
Here’s how it looks in performance monitor:
perfmon_data1.JPG
Explanation
Running the first instance of the application with windows security produced 10 pooled connections, 10 active connection pools, and 10 active pool groups. (This happens at point 1 in the above image.)
Running the second instance of the application with SQL authentication (and then adding the counter) shows 10 pooled connections, 10 active pools, but 20 active pool groups. (Point 2.)
Changing the first instance of the application to use SQL authentication while the application is still open sent the number of connection pools and pooled connection for this application with process id [5504] to rise to 20. They are running side to side that you can’t see the yellow line for activeconnectionpools. (Point 3.)
The same happened when changing the second instance of the application to use integrated security while the application is still open (both instances exchanged the connection method). This also sent the number of connection pools and pooled connections for this application instance (804) to rise to 20. (Point 4.)
Note that number of active connections is always zero, as we close the connection directly after retrieving data. This is a best practice.
Leaving the application open for a while but without activity results in connection pools and connections being closed because they are inactive for a time. When I closed both application instances, all counters dropped to zero and shortly the red perfmon cursor stopped, point 1 in following image.
perfmon_data2.JPG
Now we know what pool fragmentation is and demonstrated it.

How can we reduce or prevent pool fragmentation?

We can address pool fragmentation for the two previously outlined causes as follows.
Pool fragmentation due to integrated security
If it is possible to reduce the number of integrated security accounts used with applications, that in turn will reduce the number of connection pools and thus the overhead of managing them, increasing the performance of SQL Server.
It is also to be preferred when possible to use Windows Authentication mode, as it uses the Kerberos security protocol, which provides password policy enforcement with regard to complexity validation for strong passwords, and provides support for account lockout and password expiration.
Pool fragmentation due to many databases
There is a tiny and easy to apply trick here: instead of connecting to a separate database for each user or group, connect to the same database on the server and then execute the Transact-SQL USEstatement to change the context to the desired database.
For example, instead of this code:
string strSQL = "select db_name() + '  -------  ' + name 'table' from sys.tables"; 
sqlcon.ConnectionString = "Data Source=" + server.Text.Trim() + ";DataBase=pooling" + i.ToString() + ";" + "Integrated Security=true;Connect Timeout=5";                 sqlcon.Open();
SqlCommand DBCmd = new SqlCommand(strSQL, sqlcon);
we can use this:
string strSQL = " use pooling" + i.ToString() + " ; select db_name() + '  -------  ' + name 'table' from sys.tables";

sqlcon.ConnectionString = "Data Source=" + server.Text.Trim() + ";DataBase=master;" + "Integrated Security=true;Connect Timeout=5";

SqlCommand DBCmd = new SqlCommand(strSQL, sqlcon);
We will connect always to the master database and then use the USE keyword to change context to the user databases.
Now let’s test this. From the demo application, we will just check the combo-box with the label Use one connection and “USE” keyword to query databases. This will set the connection string to connect to the master database and later to branch to other databases using the USE keyword.
Data from the performance monitor shows that the number of connection pools and the number of pooled connections didn’t exceed 1 at initial run, and increased to 2 only when changing connection credentials and the “open connections” button is pressed again. You can verify the number of connections by querying “sysprocesses” in master database.
Here is how it looks in performance monitor:
perfmon_data3.JPG
Explanation
When running the first instance of the application with windows security, we had 1 connection pool, 1 pooled connection, and 10 active pool groups. (This happens at point 1 in the above image.)
Running the second instance of the application with SQL authentication (and then adding the counter in perfmon) shows 1 pooled connection, 1 active pools, but 20 active pool groups (they are cumulative). (Point 2.)
Changing the first instance of the application to use SQL authentication while the application is still open sent number of connection pools and pooled connection for this application instance (pid:4808) only to 2. (Point 3.)
The same happened when changing the second instance of the application to use integrated security while the application was still open (both instances exchanged connection method). This also sent number of connection pools and pooled connection for this application instance (pid:5800) only to 2. (Point 4.)
Obviously there is improvement, and with environments with heavy connection loads and more databases, this improvement can be very significant — reduced time to fetch data and fewer resources to manage connections. Opening a connection from scratch is an expensive process.
To cut a long story short, we can summarize this post in few points:
  1. Connection pooling is a great feature to increase your application performance and scalability.
  2. Open connections as late as possible and close them as early as possible to release resources as fast as possible.
  3. In order to reduce overhead of managing connections to server, avoid connection pooling fragmentation as much as you can.
  4. Ensure that there is at least one connection open in the pool to maintain the connection pool, this can be done using the “Min Pool Size” property in ConnectionString. For more info, seeSqlConnection.ConnectionString Property at msdn’s .NET Framework Developer Center.

What is a Connection Pool?

A connection pool is a group of database connections (with same connection properties) maintained in the application server so that these connections can be reused when future requests to the database are required.Creating a connection to the database servers is a  time consuming process.To establish a connection to the database server , a physical  channel such as socket or named pipe must be established , the connection string information to be parsed, it should be authenticated by the server and so on.Connection pool helps to reuse the established connection to serve multiple database request and hence improve the response time. In a practical world, most of the application use only one or a few different  connection configuration.During the application execution, many identical connections will be opened and closed. To minimize the cost of opening connections each time,ADO.NET uses the technique called Connection Pooling. Many people has the misunderstanding that, connection pool is managed in the database server.
Connection pooling reduces the number of times that new connection must be opened.The pooler maintains ownership of the physical connection. Whenever a user calls open on a connection, the pooler looks for an available connection in the pool. If a pooled connection is available, it returns it to the caller instead of opening a new connection. When the application calls close on the connection, the pooler returns it to the pool instead of closing it. Once the connection is returned to the pool, it is ready to reused on the next open call.

How the Connection Pool Works ?

Many people are not very familiar with the  connection pool and how it works. This is because connection pooling is a default property of ADO.NET.We do not have to do anything special to enable the connection pooling.It can be enabled or disabled  by setting the value true or false for connection string  property Pooling  (;pooling =true). Connection pool is tightly coupled with connection string . A slight change in the connection string (change in the case / change in the order of connection property ) will force the ADO.NET to open a new connection pool.Every pool is associated with distinct connection string. ADO.NET maintain separate connection pool for each distinct application ,process and connection string. When  first time application request for a connection , ADO.NET look for any associated connection pool. As it is a first request, there will not be any connection pool and ADO.NET negotiate with the database server to create fresh connection.When application close/dispose this connection after completing the process, the connection will not get closed instead it will be moved to connection pool.When application request for next connection using the same connection string, ADO.NET return the context of the the open connection which is available in the pool.If  second request from application comes in before the first request closed/disposes the connection , ADO.NET create a fresh new connection and assigned to the second request.


The behavior of connection pooling is controlled by the connection string parameters. Below are the list  of parameters that controls the behavior of connection pooling.
  • Connection Timeout : Control the wait period in seconds when a new connection is requested,if this period expires, an exception will be thrown. Default value for connection timeout is 15 seconds.
  • Max Pool Size: This specify the maximum number of connection in the pool.Default is 100.
  • Min Pool Size : Define the initial number of connections that will be added to the pool on opening/creating the first connection.Default is 1
  • Pooling : Controls the connection pooling on or off. Default is true.
  • Connection Lifetime : When a connection is returned to the pool, its creation time is compared with the current time, and the connection destroyed if that time span (in seconds) exceed the value specified by connection lifetime  else added to the pool. This parameter does not control the lifetime of connection in the pool.It is basically decides whether the connection to be added to pool or not once the it got closed by the caller application.A lower value 1 may be equivalent to a state of pooling is off. A value zero cause pooled connection to have the maximum lifetime. 

Connection Leakage

At times, connections are not closed/disposed explicitly, these connections will not go to the pool immediately. We can explicitly close the connection by using Close()  or Dispose() methods of connection object or by using the using statement in C# to instantiate the connection object. It is highly recommended that we close or dispose the connection once it has served the purpose.

Connection leakage will happen in the scenarios where the application is not closing the connection once it is served the request. As it is not closed , these connections can not be used to serve other request from the application.and pooler forced to open new connection to serve the connection requests. Once the total number of connection reaches the Max Pool Size,new connection request wait for a period of Connection Timeout and throw below error.

"Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached"

Pool Fragmentation

Pool fragmentation is a common problem in many applications where the application can create a large number of pools that are not freed until the process exits. This leaves a large number of connections open and consuming memory, which results in poor performance.

Pool Fragmentation due to Integrated Security

Connections are pooled according to the connection string plus the user identity. Therefore, if application  use  Windows Authentication  and an integrated security to login, you get one pool per user. Although this improves the performance of subsequent database requests for a single user, that user cannot take advantage of connections made by other users. It also results in at least one connection per user to the database server. This is a side effect of a particular application architecture that developers must weigh against security and auditing requirements.

Pool Fragmentation due to Many Databases used by same application

Many application may use a single database to authenticate the application login and then open a connection to a specific database based on the the user role / region. The connection to the authentication database is pooled and used by everyone. However, there is a separate pool of connections to each database, which increase the number of connection to the database server.This is also a side effect of the application design.The simple way to get rid of this issue with out compromising the security is to connect to the same database always (may be master database) and run USEdatabasename statement  to change the database context to desired database.

Clearing the Pool

ADO.NET have two methods to clear the pool: ClearAllPool() and ClearPool()ClearAllPools clears the connection pools for a given provider, and ClearPool clears the connection pool that is associated with a specific connection. If there are connections being used at the time of the call, they are marked appropriately. When they are closed, they are discarded instead of being returned to the pool.

How to Monitor the connection Pool ?

The connection pool can be monitored using the performance counters in the server where the ADO.NET is initiating the connections.While selecting the counter ,make sure to select the right instance based on your application name and process id which is there in the bracket. Process id of your application can easily get from the task manager. Find below a snapshot of perfmon counters.





The below code snippet will help you to understand the connection pooling in much better way. Do not comment on the slandered of the code snippet !  I am not an expert in writing vb/.net code

Imports System
Imports System.Data.SqlClient

Module Module1
    
Private myConn As SqlConnection
    
Private myCmd As SqlCommand
    
Private myReader As SqlDataReader

    
Private myConn1 As SqlConnection
    
Private myCmd1 As SqlCommand
    
Private myReader1 As SqlDataReader

    
Private myConn2 As SqlConnection
    
Private myCmd2 As SqlCommand
    
Private myReader2 As SqlDataReader

    
Private StrConnectionString_1 As String
    Private 
StrConnectionString_2 As String
    Private 
query As String
 
    Sub 
Main()


        
'Two connction string which help us to create two different pool
        'The Application Name is mentioned as ConnectionPool_1 and ConnectionPool_2 to identify the connection in sql server
        
StrConnectionString_1 "Server=XX.XX.XX.XX;user id=" "connectionpool" ";password=" "connectionpool" +";database=master;packet size=4096;application name=ConnectionPool_1"
        
StrConnectionString_2 "Server= XX.XX.XX.XX ;user id=" "connectionpoo2" ";password=" "connectionpool" +";database=master;packet size=4096;application name=ConnectionPool_2"


        
query "select * from sys.objects"


        
'STEP :1
        'Opening a connection first connection string and excuting the query after it served closing the connection
        
myConn = New SqlConnection(StrConnectionString_1)
        
myCmd myConn.CreateCommand
        myCmd.CommandText 
query
        myConn.
Open()
        
myReader myCmd.ExecuteReader()
        
myReader.Close()
        
myConn.Close()

        
'Now look at the perfmon counters. Numberofactiveconnectionpoll will be 1 and Numberofpooledconenction will be 1
        'In sql server you can see connection is still open even after closing the connetion.You can verify this by querying the sys.dm_exec_connections

        'STEP :2
        'Opening a connection using the second connection string.This will force the pooler to open one more connection pool
        
myConn1 = New SqlConnection(StrConnectionString_2)
        
myCmd1 myConn1.CreateCommand
        myCmd1.CommandText 
query
        myConn1.
Open()
        
myReader1 myCmd1.ExecuteReader()
        
myReader1.Close()
        
myConn1.Close()
        
'Now look at the perfmon counters. Numberofactiveconnectionpoll will be 2 and Numberofpooledconenction will be 2
        'In sql server you can see two active connection one from ConnectionPool_1 and ConnectionPool_2

        'STEP :3
        'Opening a connection again using first connection string. This will be servered by the existing connection created as part of step 1
        
myConn = New SqlConnection(StrConnectionString_1)
        
myCmd myConn.CreateCommand
        myCmd.CommandText 
query
        myConn.
Open()
        
myReader myCmd.ExecuteReader()

        
'Now look at the perfmon counters. Numberofactiveconnectionpoll will be 2 and Numberofpooledconenction will be 2
        'In sql server you can still see only two active connections. one from ConnectionPool_1 and ConnectionPool_2
        'Please note that the connection is not closed


        'STEP :4
        'Opening a connection again using first connection string. This will be forsed to open a new connection as the connection is not closed in Step3 (connection leakage)

        
myConn2 = New SqlConnection(StrConnectionString_1)
        
myCmd2 myConn2.CreateCommand
        myCmd2.CommandText 
query
        myConn2.
Open()
        
myReader myCmd2.ExecuteReader()
        
myConn2.Close()

        
'Now look at the perfmon counters. Numberofactiveconnectionpoll will be 2 and Numberofpooledconenction will be 3
        'In sql server you can see three active connections. two from ConnectionPool_1 and one from ConnectionPool_2
    

        'Closing the connection created as part of Step 3
        
myConn.Close()
        
'Now look at the perfmon counters. Numberofactiveconnectionpolll will be 2 and Numberofpooledconenction will be 3
        'In sql server you can see three active connections. two from ConnectionPool_1 and one from ConnectionPool_2


        'clearing the pool
        
SqlConnection.ClearAllPools()
        
'Now look at the perfmon counters. Numberofactiveconnectionpoll will be 0 and Numberofpooledconenction will be 0. Number of inactiveconnectionpoll will be 2
        'In sql server you can't see any connection from ConnectionPool_1 or ConnectionPool_2


    
End SubEnd Module
 

select * from sys.dm_os_performance_counters
where counter_name ='User Connections'

No comments:

Post a Comment