Recently, I have been playing around with reader-writer (RW) locks. I have never encountered RW locks in practice, but I have read that they can be inefficient and often do more harm than good.
Recall that a traditional mutex ensures that only one thread at a time may enter a critical region. But if the data in the critical region is written infrequently, an RW lock can exploit the available concurrency by letting multiple readers hold the lock at once, while a writer still gets exclusive access.
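The two locking modes described above can be sketched with C++17's std::shared_mutex, the standard counterpart of boost::shared_mutex, which offers the same split between shared (lock_shared) and exclusive (lock) acquisition. The Config class is a hypothetical example of read-mostly data, not code from the benchmark.

```cpp
#include <map>
#include <mutex>         // std::unique_lock
#include <shared_mutex>  // std::shared_mutex, std::shared_lock (C++17)
#include <string>

// Hypothetical read-mostly table guarded by a reader-writer lock.
class Config {
public:
    // Readers take the lock in shared mode: any number of threads
    // can be inside get() at the same time.
    std::string get(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = values_.find(key);
        return it == values_.end() ? std::string() : it->second;
    }

    // Writers take the lock in exclusive mode: set() waits until all
    // readers and any other writer have left, and blocks new arrivals.
    void set(const std::string& key, const std::string& value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        values_[key] = value;
    }

private:
    mutable std::shared_mutex mutex_;  // mutable so const get() can lock it
    std::map<std::string, std::string> values_;
};
```

Readers never block each other here, so the lock only pays off when set() is rare compared to get().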
So when exactly should an RW lock be used in place of a traditional mutex? To answer this question, I wrote a benchmark program to understand the scalability of RW locks.
Boost shared_mutex Benchmark
Since C++ is my primary programming language at work, I started by picking on the shared_mutex in the Boost threading library.
In my benchmark, I focus primarily on two variables: the write frequency and the hold time of the mutex.
For the implementation, four worker threads (one per core of my quad-core CPU) repeatedly enter a critical region that approximates e. At each iteration, a thread has a certain probability of becoming a writer; otherwise it acts as a reader. My goal is to see how performance changes as the write frequency increases.
To control the hold time of the mutex, each thread performs a certain number of iterations of the approximation, called E, inside the critical region. As E grows, the hold time of the mutex increases.
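The benchmark source isn't reproduced in this post, so the following is only a sketch of the setup described above. It uses std::shared_mutex instead of boost::shared_mutex so it builds with a plain C++17 compiler, and it assumes e is approximated with the series 1/0! + 1/1! + 1/2! + ..., with E taken as the number of terms. SharedState, approximate_e, worker, BenchResult and run_benchmark are hypothetical names.

```cpp
#include <chrono>
#include <functional>    // std::ref
#include <mutex>         // std::unique_lock
#include <random>
#include <shared_mutex>  // std::shared_mutex, std::shared_lock
#include <thread>
#include <vector>

// Hypothetical shared state guarded by the lock under test.
struct SharedState {
    std::shared_mutex mutex;
    double e_estimate = 0.0;
};

// Assumed workload: sums the first `terms` terms of e = sum of 1/k!.
// Larger `terms` (the post's E) means more work inside the critical
// region, i.e. a longer hold time.
double approximate_e(int terms) {
    double sum = 0.0;
    double term = 1.0;  // starts at 1/0!
    for (int k = 0; k < terms; ++k) {
        sum += term;
        term /= (k + 1);  // 1/k! becomes 1/(k+1)!
    }
    return sum;
}

// One worker thread: on every iteration it acts as a writer with
// probability write_prob and as a reader otherwise.
void worker(SharedState& state, int iterations, int terms,
            double write_prob, unsigned seed, double& checksum) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution is_writer(write_prob);
    double local = 0.0;
    for (int i = 0; i < iterations; ++i) {
        if (is_writer(rng)) {
            // Exclusive (write) lock: recompute and publish the estimate.
            std::unique_lock<std::shared_mutex> lock(state.mutex);
            state.e_estimate = approximate_e(terms);
        } else {
            // Shared (read) lock: recompute locally and read the estimate.
            std::shared_lock<std::shared_mutex> lock(state.mutex);
            local += approximate_e(terms) - state.e_estimate;
        }
    }
    checksum = local;  // keeps the reader work observable
}

struct BenchResult {
    double elapsed_ms;
    double final_estimate;
};

// Runs `threads` workers (4 in the post) and measures wall-clock time.
BenchResult run_benchmark(int threads, int iterations, int terms,
                          double write_prob) {
    SharedState state;
    std::vector<double> checksums(threads, 0.0);
    std::vector<std::thread> pool;
    auto start = std::chrono::steady_clock::now();
    for (int t = 0; t < threads; ++t)
        pool.emplace_back(worker, std::ref(state), iterations, terms,
                          write_prob, static_cast<unsigned>(t + 1),
                          std::ref(checksums[t]));
    for (auto& th : pool) th.join();
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    // join() has synchronized with every worker, so no lock is needed here.
    return {elapsed.count(), state.e_estimate};
}
```

Sweeping write_prob from 0 upward at a fixed `terms` gives one curve per hold time, which is the shape of the experiment described above; swapping the lock type only requires changing the mutex in SharedState.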
At E = 1, even with zero contention, the locking overhead completely wipes out any performance gain from concurrent readers.
E=1 shows the overhead of shared_mutex
At E = 50, the longer hold time pays off slightly under low contention. However, the performance degrades rapidly as contention increases.
E=50, longer hold times allow shared_mutex to scale slightly better.
As you can see, the results are very disappointing. Boost shared_mutex offers a performance gain only under extremely low contention combined with a long hold time. Such long hold times are unrealistic in practice, because most programmers are taught to keep their critical regions as short as possible.
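To make the last point concrete, here is a hypothetical sketch (assumed names, plain std::mutex for brevity) of what shrinking a critical region usually means: do the expensive work outside the lock, and hold the lock only long enough to publish the result.

```cpp
#include <mutex>
#include <vector>

// Hypothetical shared results list.
std::mutex results_mutex;
std::vector<double> results;

// Stand-in for real work: the n-th harmonic number.
double expensive_computation(int n) {
    double sum = 0.0;
    for (int i = 1; i <= n; ++i) sum += 1.0 / i;
    return sum;
}

// Long hold time: the computation runs while the lock is held.
void publish_slow(int n) {
    std::lock_guard<std::mutex> lock(results_mutex);
    results.push_back(expensive_computation(n));
}

// Short hold time: compute first, then lock only to publish.
void publish_fast(int n) {
    double value = expensive_computation(n);
    std::lock_guard<std::mutex> lock(results_mutex);
    results.push_back(value);
}
```

Both functions produce the same result, but publish_fast keeps the lock for a single push_back, which is the short-hold-time regime where, per the charts above, shared_mutex shows no benefit.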
SRW Lock Benchmark
Starting with Windows Vista, Microsoft provides a new synchronization API called Slim Reader/Writer (SRW) Locks. These locks are heavily optimized for performance, but they can’t be acquired recursively (I hate recursive locks anyway), and a lock held in shared mode can’t be upgraded to exclusive mode.
I was curious to see whether SRW performs any better, so I added it to my benchmark.
SRW outperforms Boost mutex and shared_mutex even under the shortest hold time.
At longer hold times, SRW degrades similarly to shared_mutex, but with lower overhead.
Although SRW offers scalability similar to Boost shared_mutex, its lower overhead lets it outperform Boost shared_mutex in almost all cases.
After looking into the implementation of Boost shared_mutex, I realized that its lock-free algorithm is complex and tracks a lot of state. The resulting overhead is so high that it makes the implementation impractical.
SRW has far lower overhead and can be useful under low contention. Unfortunately, it is only available on Vista and later.
Neither lock type offers a real performance advantage once contention goes beyond 2%. I speculate that Amdahl’s Law is playing a part here: the chart looks very much like the inverse of the speedup graph I plotted last year.
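For reference, Amdahl’s Law gives the speedup S of a program on N processors when a fraction P of its work can run in parallel and the rest is serialized. Treating the exclusive (writer) sections as the serialized part is only a rough analogy, not something the benchmark measures directly.

```latex
S(N) = \frac{1}{(1 - P) + \dfrac{P}{N}}

% Worked example for a quad-core CPU (N = 4) with P = 0.9:
S(4) = \frac{1}{0.1 + 0.225} = \frac{1}{0.325} \approx 3.08

% Upper bound as N grows without limit:
\lim_{N \to \infty} S(N) = \frac{1}{1 - P} = 10
```

Even a 10% serial fraction costs about a quarter of the ideal 4x speedup on four cores, which is why a small amount of serialization can flatten a scalability curve so quickly.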
The source and datasheet can be downloaded here.
Tools: Visual Studio 2008 (VC9), Boost 1.45
Machine Specification: Intel i5-750 with 4GB of RAM. Windows 7 64-bit.