After I learned that Compare-And-Swap (CAS) has an infinite consensus number, I felt like I can rule the world with it. Well, maybe not quite, but I felt pretty darn good.
I got the urge to utilize the power of CAS, so I spent some time looking into its usage in the real-world of mutual exclusion.
Just so happens, I bumped into a chapter in the textbook that describes a spin lock algorithm called the Test-And-Test-And-Set (TTAS) lock.
The algorithm is succinct and it is elegantly solved by CAS.
TTASLock
In a mutual exclusion protocol, only one thread can acquire the lock.
If it acquired the lock, it may access the critical region, get its job done, and get out of critical region.
If it can’t acquire the lock, there are two choices. It can repeatedly test the lock – spinning, or it yields its slice of time to another thread and come back later – blocking.
If the contention is low, and spin locks might make sense.
In the test-and-test-and-set algorithm, the mutex is first tested to see if it is free. If it appears to be free, we perform test-and-set (CAS) on mutex.
The two tests idea is similar to the double-checked locking optimization to reduce the CAS overhead.
Here is my implementation of the TTAS lock in C++. It is a scoped lock that utilizes the Win32 Interlocked Variable Access API to perform atomic synchronization.
// atomic read is just a volatile read in C inline LONG atomicRead(volatile LONG *mem) { return *mem; } class CTtasLock { public: // mutex class that holds the lock state class CTtasMutex { public: CTtasMutex() : m_state(0) {}; LONG m_state; LONG const static Locked = 1; LONG const static NotLocked = 0; }; // RAII constructor that acquire lock upon construction explicit CTtasLock(CTtasMutex&m) : m_mutex(m) { lock(); } // RAII destructor that performs the "scoped" lock behavior ~CTtasLock() { unlock(); } // spin lock that performs the test-test-and-set algorithm void lock() { while(true) { // test until state is cleared while(atomicRead(&m_mutex.m_state) == CTtasMutex::Locked){}; // test-and-set if(InterlockedCompareExchange( &m_mutex.m_state, CTtasMutex::Locked, CTtasMutex::NotLocked) == CTtasMutex::NotLocked) { return; } } } // unlock the mutex with atomic set void unlock() { InterlockedExchange(&m_mutex.m_state, CTtasMutex::NotLocked); } private: CTtasMutex &m_mutex; };
Performance Under Contention
So how well does TTASLock work under contention?
Unfortunately, not very well at all. 😦
As the number of threads increases, there are more contentions and more cycles are wasted spinning. And since the TTASLock spins on a shared mutex, each spinning thread encounters a cache miss whenever a thread acquires the lock and writes to the mutex. Thus, the increase of contention also leads to the increase in cache misses.
And worse yet, the first test in TTAS has almost no benefit in practice because the performance is dominated by cache misses.
[Update: See ttas-spinlock-with-yield for a more scalable solution]
Thoughts and Conclusions
Although TTAS lock has performance issues in practice, I enjoy its simplicity.
The cache misses issue can be relieved if each thread observes a different counter, and ensure that the counter are far enough that they are on different cache lines.
Source
The source and the data sheet can be downloaded here.
Tools: Visual Studio 2008, Boost 1.41, Window 7