IOCP Server 1.1 Released

While stressing a TCP server application, I found a nasty bug with the IOCP server library.

After handling 100,000 connections or so, the TCP server stops accepting connections. The output from TCPView shows that clients are still trying to connect to the server, but the connection was never established.

I was able to verify that all existing connections are unaffected. Therefore, the IO completion port is still functional. So I concluded that it is not a non-page pool issue, and has something to do with the handling of the accept completion status.

The Cause

The bug is simple, but it takes half a day to reproduce. Here’s the code snippet that causes the problem.

void CWorkerThread::HandleAccept( CIocpContext &acceptContext, DWORD bytesTransferred )
{
	// Update the socket option with SO_UPDATE_ACCEPT_CONTEXT so that
	// getpeername will work on the accept socket.
	if(setsockopt(
		acceptContext.m_socket,
		SOL_SOCKET,
		SO_UPDATE_ACCEPT_CONTEXT,
		(char *)&m_iocpData.m_listenSocket,
		sizeof(m_iocpData.m_listenSocket)
		) != 0)
	{
		if(m_iocpData.m_iocpHandler != NULL)
		{
			// This shouldn't happen, but if it does, report the error.
			// Since the connection has not been established, it is not
			// necessary to notify the client to remove any connections.
			m_iocpData.m_iocpHandler->OnServerError(WSAGetLastError());
		}
		return;
	}
	... // more code here
	acceptContext.m_socket = CreateOverlappedSocket();
	if(INVALID_SOCKET != acceptContext.m_socket)
	{
		PostAccept(m_iocpData);
	}
	... // more code here

See that innocent little “return” statement when setsockopt() fails, I foolishly concluded that “This shouldn’t happen”. And naturally, since it should never happen, I never thought about properly handling the error case.

Apparently in the real world, some connections comes and goes so quickly that immediately after accepting the connection, it has already been disconnected. setsockopt() would fail with error 10057, and the return statement causes the “accept chain” to break.

The fix is to remove the “return” statement and move on with life.

Others

Along with this fix, I also removed an unnecessary event per Len Holgate’s suggestion. However, I have not yet removed the mutex in ConnectionManager. This require a slight redesign, and a bit more thoughts.

I can see myself maintaining this library for awhile, so I created a Projects page to host the different versions.

Download

For latest version, please see the Projects page.

IOCP Server Library

So I wrote C++ library that provides a scalable TCP server using Windows I/O Completion Port (IOCP).

Couple weeks ago, I started studying IOCP to improve the scalability of a C++ application that may handle thousands of TCP/IP data stream.

It didn’t take long for me to realize why IOCP has the reputation of being difficult to learn. IOCP tutorial online usually fall into the category of difficult to read, overly simplified, or just plain wrong.

Worse yet, Winsock2 contains a mix of confusing APIs that perform very similar functions with subtle differences. I spent a few days just to decide whether I should use WSAAccept, accept or AcceptEx to accept a connection.

Eventually, I stumbled onto two books that helped me out a great deal – Windows Via C++ and Network Programming For Windows.

The Library

The library interface is rather simple. It follows the Proactor design pattern where user supplies a completion handler and event notifications flow through the completion handler asynchronously.

Everyone uses echo server as tutorial. So what the heck, here’s mine. 🙂

class CEchoHandler : public CIocpHandler
{
public:
	virtual void OnReceiveData(
        uint64_t clientId,
        std::vector<uint8_t> const &data)
	{
        // echo data back directly to the connected client
		std::vector<uint8_t> d(data);
		GetIocpServer().Send(clientId, d);
	}
}
void main()
{
    // create a handler that echos data back
	boost::shared_ptr h(new CEchoHandler());
    try
    {
        // bind to port 50000 with the server
        CIocpServer *echoServer = new CIocpServer(50000,h);

        char c;
        std::cin >> c; // enter a key to exit

        delete echoServer;
    }
    // RAII constructor that throws different exceptions upon failure
    catch(...)
    {
    }
}

[10/27/2010 10:34AM EST]
Update: Moved “delete echoServer;” to within the try block per co-worker’s suggestion.

Focus

Of course, there are more to the IOCP server than the code snippet above.

Here are my area of focus when designing the library.

  1. Scalability – By ensuring that there are minimum number of critical section in the library.
  2. TCP Graceful shutdown – Allow user to perform TCP graceful shutdown and simplify the TCP half-closed state.
  3. RAII – A WYSIWYG constructor and a lenient destructor that allows you to do ungraceful shutdown.

Here is a screenshot of the CPU utilization of the echo server at 300 concurrent loopback file transfer.

IOCP Server scalability upon Intel I5-750 (quad-core)

 

License

IOCPServer is released under the Boost Software License 1.0.

Download

For latest version, please see the Projects page.

IOCPServer is tested under the following configurations.

OS: Window XP, Window 7.

Compiler: Visual Studio 2009 with Boost 1.40

Build Type: ANSI, Unicode.