Cache Miss

Here is what I was taught in my data structure class.

Vector runs in O(n) for find, insert, and erase. O(1) if you only insert and erase at the end. O(1) for random access.
Map uses a balanced tree structure and runs in O(log n) for find, insert, and erase.
There is a constant k in front of the big-O notation. Since k is a constant, O(log n) will beat O(n) after some value of n.

But is it really that simple?

Not So Constant

In Cliff Click’s video on modern hardware, he showed that latency from cache misses is the dominating factor in today’s software performance.

To show this, we can write up a simple function that performs linear search on different sizes of arrays. To keep it fair, we will loop back to the beginning of the array to normalize the number of iterations.

struct OneKBChunk // 1KB, no padding
{
	int key;
	char buffer[1020];
};
bool LinearFind(unsigned int kBytes)
{
	OneKBChunk * chunks = new OneKBChunk[kBytes]; //array of 1KB chunks
	bool found = false;
	int index = 0;
	for(unsigned int i=0; i<10000000; ++i) //arbitrary value
	{
		// prevent compiler from optimizing away the loop
		if(chunks[index].key == 0xdeadbeef)
			found = true;
		// loop back to the beginning of the array
		++index;
		if(index >= kBytes)
			index = 0;
	}
	delete [] chunks;
	return found;
}

Here are the timing on various array sizes.

Red dots mark the different layers of cache.

The processor on my machine is an Intel I5-750. It is a quad-core processor with 32KB of L1, 256 KB of L2, and 8MB of L3.

The graph demonstrates the penalty on different level of cache misses.

We know that linear search runs in k*O(n). In reality, k really depends on n.

Map and Cache Misses

As Raymond Chen said, vector is about the most cache friendly data structure out there. Trees like data structure uses pointers heavily, and have horrible cache locality.

To see the impact of poor cache locality, here is an example that searches the map.

The program is run twice, once without padding, and another with 2KB padding between each insertion. The padding is used to create some distance between each item inserted into the map.

void MapFind(unsigned int kBytes, int timesToRun)
{
	std::map<int, OneKBChunk *> chunks;
	//std::vector<OneKBChunk*> padding;

	for(unsigned int n=0; n<kBytes; ++n)
	{
		chunks.insert(std::make_pair(n, new OneKBChunk));

		//2KB padding in between each insert to the tree
		//for(unsigned int p=0; p<2; ++p)
		//	padding.push_back(new OneKBChunk);
	}
	// run find() arbitrary number of times
	for(unsigned int i=0; i<timesToRun; ++i)
		chunks.find(0xdeadbeef);

	// .. delete padding here
}

The elements of the maps in both run are identical, and were run in same number of iterations.

The graph shows the impact of cache misses on map.

Again, similar to the example before, the constant k isn’t really constant.

In reality, the cache misses with map are far more unpredictable than my simple example. As the program runs longer, the items in the map could end up very far apart in memory.

Conclusion

The constant k includes cache effects, which is not nearly as constant as as how I was taught in my data structure class.

On a side note, my Computer Architecture professor once said “The understanding of cache locality will determine whether you are a $50K programmer or a $100K programmer (2002 dollar)”.

Source

The source and the spreadsheet can be downloaded here.

ASKLDJD

recording a programmer's thought

Vector, Map, and the Constant k

Not So Constant

Map and Cache Misses

Conclusion

Source