IOCP Server 1.1 Released

While stress-testing a TCP server application, I found a nasty bug in the IOCP server library.

After handling 100,000 connections or so, the TCP server stops accepting connections. The output from TCPView shows that clients are still trying to connect to the server, but the connections are never established.

I was able to verify that all existing connections are unaffected, so the IO completion port is still functional. I concluded that this is not a non-paged pool issue and that it has something to do with the handling of the accept completion status.

The Cause

The bug is simple, but it takes half a day to reproduce. Here’s the code snippet that causes the problem.

void CWorkerThread::HandleAccept( CIocpContext &acceptContext, DWORD bytesTransferred )
{
	// Update the socket option with SO_UPDATE_ACCEPT_CONTEXT so that
	// getpeername will work on the accept socket.
	if(setsockopt(
		acceptContext.m_socket,
		SOL_SOCKET,
		SO_UPDATE_ACCEPT_CONTEXT,
		(char *)&m_iocpData.m_listenSocket,
		sizeof(m_iocpData.m_listenSocket)
		) != 0)
	{
		if(m_iocpData.m_iocpHandler != NULL)
		{
			// This shouldn't happen, but if it does, report the error.
			// Since the connection has not been established, it is not
			// necessary to notify the client to remove any connections.
			m_iocpData.m_iocpHandler->OnServerError(WSAGetLastError());
		}
		return;
	}
	... // more code here
	acceptContext.m_socket = CreateOverlappedSocket();
	if(INVALID_SOCKET != acceptContext.m_socket)
	{
		PostAccept(m_iocpData);
	}
	... // more code here

See that innocent little “return” statement when setsockopt() fails? I foolishly concluded that “This shouldn’t happen”. And naturally, since it should never happen, I never thought about properly handling the error case.

Apparently, in the real world, some connections come and go so quickly that a connection has already been disconnected by the time its accept completes. setsockopt() then fails with error 10057 (WSAENOTCONN), and the return statement breaks the “accept chain”: the function exits before PostAccept() is called, so no new accept is ever posted.

The fix is to remove the “return” statement and move on with life.
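
To make the broken chain easier to see, here is a minimal, portable sketch that models it without any Winsock calls. It is not the library's code: AcceptChainSim, RunConnections and the bool standing in for the setsockopt() result are all hypothetical names invented for this illustration, and it assumes a single outstanding accept that the handler re-posts after each completion.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the accept path (not the library's real classes).
// Assumption: one accept is outstanding and the handler re-posts it.
struct AcceptChainSim {
    bool returnOnFailure;     // true models the early "return" described above
    int  pendingAccepts = 1;  // accepts currently posted on the listen socket
    int  handled = 0;         // accept completions processed so far
    int  errorsReported = 0;  // times OnServerError would have been called

    explicit AcceptChainSim(bool r) : returnOnFailure(r) {}

    // Models HandleAccept(); updateOk is the simulated setsockopt() outcome.
    void HandleAccept(bool updateOk) {
        assert(pendingAccepts > 0);  // a completion needs a posted accept
        --pendingAccepts;            // this completion consumed it
        ++handled;
        if (!updateOk) {
            ++errorsReported;
            if (returnOnFailure) {
                return;              // bug: nothing re-posts the accept
            }
        }
        ++pendingAccepts;            // models PostAccept() on a fresh socket
    }
};

// Feeds a sequence of simulated setsockopt() outcomes to the handler and
// returns how many completions were handled before the server went deaf.
int RunConnections(AcceptChainSim &sim, const std::vector<bool> &outcomes) {
    for (bool ok : outcomes) {
        if (sim.pendingAccepts == 0) {
            break;                   // no accept posted: clients can't connect
        }
        sim.HandleAccept(ok);
    }
    return sim.handled;
}
```

With AcceptChainSim(true), a single failed outcome leaves pendingAccepts at zero and every later connection is ignored. With AcceptChainSim(false), the error is still counted but the chain keeps going, which is the behaviour the fix restores.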

Others

Along with this fix, I also removed an unnecessary event per Len Holgate’s suggestion. However, I have not yet removed the mutex in ConnectionManager. This requires a slight redesign and a bit more thought.

I can see myself maintaining this library for a while, so I created a Projects page to host the different versions.

Download

For the latest version, please see the Projects page.

2 thoughts on “IOCP Server 1.1 Released”

  1. If it had been a non-paged pool issue then you’d likely have seen lots of reads and writes failing with WSAENOBUFS… Assuming you’re using a modern OS (Vista or later) with a decent amount of memory, you’re unlikely to see non-paged pool exhaustion, as the amount of non-paged pool available is considerably larger than on pre-Vista OSes…

    However, what happens if an overlapped accept returns an error via the IOCP? You need to cater for this. Since this usually only happens in low-resource situations (WSAENOBUFS – non-paged pool or locked I/O pages limit), it’s best not to post a new accept straight away, as that will likely fail too; but if you stop posting accepts you won’t accept any more connections…

    I talk about my solution to this on my blog, here: http://www.lenholgate.com/archives/000559.html

    • Len,

      Although I developed the library under Windows 7 64-bit, the network application I am developing is only being used under Windows XP 32-bit. Therefore, I suspect that I will run into the infamous non-paged pool issue sooner or later. Honestly, I have not pushed the library hard enough to see it yet. The network application I am developing has the following characteristics.

      1. Low number of concurrent connections (fewer than 1000)
      2. High number of short-lived connections (~200 new connections a second)
      3. High receive throughput (sustained, greater than 100 Mbps)

      Thanks for your suggestion. I will investigate the non-paged pool limitation over the Christmas break.

      … Alan
