Using Visual Studio PGO as a Profiler

Last year, when our software was running into performance issues, I was desperately looking for a profiler for a large native C++ application. In the past, I’ve tried Rational Purify and DevPartner, and they just could not handle our application (or our machine could not handle the profiler).

So I came across Visual Studio’s Profile Guided Optimization (PGO). In a nutshell, the VS compiler uses PGO to optimize the software based on data collected from real-world usage scenarios, as opposed to traditional static analysis of the source files. As you would expect, it consists of three phases: Instrumentation, Training, and PG Optimization.

It turns out that PGO generates useful profile data during the Training phase. With this profile data, PGO can be used as a lightweight native C++ profiler that provides pretty good code coverage.

The Instructions

PGO is supported from VC8.0 and up. I have tried it on VC9.0 and VC10.0, and the instructions were identical.

These steps assume that your software is written in native C/C++ and can be compiled with Visual Studio.

1. Click Build -> Profile Guided Optimization -> Instrument.

2. Click Build -> Profile Guided Optimization -> Run Instrumented/Optimized Application. Exercise the regions of the software that you would like to profile. The longer you run it, the more accurate the profile data will be, as the startup overhead averages out.

3. Exit your software. In the folder of your executable (release folder), you should see a xxx.pgd file, and a xxx.pgc file. The pgd file is your profile database that holds all your methods, and the pgc file is the profiling data recorded during the software run.

4. Now open up your Visual Studio Command Prompt. You will probably find it in Start -> Programs -> Microsoft Visual Studio (version) -> Visual Studio Tools.

5. Go to the release folder of your executable. In this step, you need to merge the software run with the profile database. Type pgomgr /merge xxx.pgc xxx.pgd.

6. Once you have merged it, you can use pgomgr to generate a summary of your software run. To do this, type pgomgr /summary xxx.pgd. I recommend redirecting the output to a text file.

7. The summary file should include the code coverage analysis from your software run.

The summary provides simple yet very powerful data on the behavior of your software. It gives you an idea of where the hotspots are and what to optimize.

To find out more about the summary (including the /detail summary), see Kang Su’s blog on “Cracking Profile-Guided Optimization profile data with PGOMGR”.
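Before pointing PGO at a large application, it may help to walk through the steps above with a toy program that has an obvious hotspot. The following is a minimal sketch and not code from our product; the names hot_sum_squares, cold_sum and run_workload are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical hot path: run_workload() calls this once per iteration.
unsigned long long hot_sum_squares(const std::vector<unsigned>& values)
{
	unsigned long long total = 0;
	for (std::size_t i = 0; i < values.size(); ++i)
		total += static_cast<unsigned long long>(values[i]) * values[i];
	return total;
}

// Hypothetical cold path: run_workload() calls this exactly once.
unsigned long long cold_sum(const std::vector<unsigned>& values)
{
	unsigned long long total = 0;
	for (std::size_t i = 0; i < values.size(); ++i)
		total += values[i];
	return total;
}

// Exercises the hot path 'iterations' times and the cold path once.
// Call this from main() in the instrumented build.
unsigned long long run_workload(unsigned iterations)
{
	std::vector<unsigned> values;
	for (unsigned i = 1; i <= 100; ++i)
		values.push_back(i);

	unsigned long long checksum = cold_sum(values);
	for (unsigned n = 0; n < iterations; ++n)
		checksum += hot_sum_squares(values);
	return checksum;
}
```

With a large iteration count, hot_sum_squares should stand out in the summary relative to cold_sum, which gives you a known answer to compare the report against.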

Thoughts

Keep in mind that the optimization level of the instrumented build is toned down dramatically. Therefore, the results might not reflect the actual performance in the release build.

In my experience, the instrumented build runs faster than a debug build.

PGO can only instrument DLLs and executables. It cannot instrument static libraries.

I have attempted to use PGO to optimize our software. It didn’t turn out too well. Either my machine ran out of memory (4 GB), or the PGO’ed executable didn’t behave properly.

STL String Crashes When HID = 0

A while ago, we upgraded our compiler to VC9.0 (Visual Studio 2008) and found out that Has Iterator Debugging (HID) and Secure SCL in VC9.0 were severely degrading our product’s performance. So we took the time to rebuild everything with those features disabled.

This is done by defining the preprocessor macros _SECURE_SCL=0 and _HAS_ITERATOR_DEBUGGING=0.

Soon after, we experienced some strange crashes in the Debug build that made no sense to us.

Crashes at std::_Container_base_secure::_Orphan_all()

Here’s one found by my co-worker. [Note: The code was simplified for demonstration purposes]

#define _HAS_ITERATOR_DEBUGGING 0 //turn off Has Iterator Debugging
#include <string>
#include <algorithm>
using namespace std;
int main()
{
	string abc = "abc";

	// Method 1: Crashes upon exit with an access violation
	string dot_abc = "." + abc;

	// Method 2: Works
	//string dot_abc = string(".") + abc;

	string buffer = ".abc";

	// Works without the search call here
	search(buffer.begin(), buffer.end(), dot_abc.begin(), dot_abc.end());

	return 0;
}

If you choose Method 1, you will get an access violation upon the destruction of the string class.

msvcp90d.dll!std::_Container_base_secure::_Orphan_all()  Line 223 + 0x5 bytes    C++
msvcp90d.dll!std::_Container_base_secure::~_Container_base_secure()  Line 115    C++
msvcp90d.dll!std::_String_base::~_String_base()  + 0x11 bytes    C++
msvcp90d.dll!std::_String_val<unsigned short, std::allocator<unsigned short> >::~_String_val<unsigned short,std::allocator<unsigned short> >()  + 0x11 bytes    C++
msvcp90d.dll!std::basic_string<char, std::char_traits<char>,std::allocator<char> >::~basic_string<char,std::char_traits<char>,std::allocator<char> >()  Line 917 + 0xf bytes    C++

However, if you choose Method 2, it will exit gracefully. And both methods work under the Release build.

The first alarming thing from the call stack is the fact that we are calling into msvcp90d.dll. Strings, unlike other STL containers, have been separately compiled into another DLL since VC8.0.

Remember, to turn off HID and Secure SCL, all binaries linked into the program must be built with the same HID and Secure SCL settings. After some online searching, it became clear that msvcp90d.dll is built with HID = 1.

Yikes! Since we can’t rebuild msvcp90d.dll, there isn’t much we can do. But Microsoft isn’t stupid; they have clearly worked around some of the problems, because Method 2 does work.
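One way to catch a mismatch among your own binaries is to have each module report the values it was compiled with. The following is a minimal sketch under a few assumptions: the function names are hypothetical, the value -1 is just my convention for "macro not defined", and it only tells you about the translation unit it is compiled in. It cannot inspect a prebuilt DLL such as msvcp90d.dll.

```cpp
#include <cstdio>
#include <string>	// any standard header; on Visual C++ it pulls in the macro defaults

// Value of _HAS_ITERATOR_DEBUGGING seen by this translation unit,
// or -1 (hypothetical convention) if the macro is not defined.
int has_iterator_debugging_setting()
{
#ifdef _HAS_ITERATOR_DEBUGGING
	return _HAS_ITERATOR_DEBUGGING;
#else
	return -1;
#endif
}

// Value of _SECURE_SCL seen by this translation unit,
// or -1 (hypothetical convention) if the macro is not defined.
int secure_scl_setting()
{
#ifdef _SECURE_SCL
	return _SECURE_SCL;
#else
	return -1;
#endif
}

// Prints both settings so that the output of different modules can be compared.
void print_iterator_settings(const char* module_name)
{
	std::printf("%s: _HAS_ITERATOR_DEBUGGING=%d _SECURE_SCL=%d\n",
		module_name, has_iterator_debugging_setting(), secure_scl_setting());
}
```

Calling print_iterator_settings from an exported function in each DLL and from the executable, then comparing the lines, should reveal any module that was built with different settings.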

Stepping In std::string

In C++, the devil is in the details. Method 1 and Method 2 appear to be functionally equivalent, but they are vastly different underneath.

// Method 1
string dot_abc = "." + abc;

At a glance, Method 1 should invoke the operator+ with const char * as the left argument, and std::string as the right argument. After stepping into the function call, it calls into an operator+ in <string> that constructs a basic_string object.

//string L27 operator +(const char *, std::string)
template<class _Elem,
	class _Traits,
	class _Alloc> inline
	basic_string<_Elem, _Traits, _Alloc> __CLRCALL_OR_CDECL operator+(
		const _Elem *_Left,
		const basic_string<_Elem, _Traits, _Alloc>& _Right)
	{	// return NTCS + string
	return (basic_string<_Elem, _Traits, _Alloc>(_Left) += _Right);
	}

It calls a constructor that takes in _Left (which is “.” in this case), and performs operator+= with _Right (which is std::string abc).

// xstring L661 cctor(const char*)
__CLR_OR_THIS_CALL basic_string(const _Elem *_Ptr)
	: _Mybase()
	{	// construct from [_Ptr, <null>)
	_Tidy();
	assign(_Ptr);
	}

In Method 2, the operation is different. First, a constructor is invoked to create a temp string.

// xstring L798 cctor(const char *)
__CLR_OR_THIS_CALL basic_string(const _Elem *_Ptr, _NO_DEBUG_PLACEHOLDER)
	: _Mybase()
	{	// construct from [_Ptr, <null>)
	this->_Myfirstiter = _IGNORE_MYITERLIST;
	_Tidy();
	assign(_Ptr);
	}

Then it will invoke operator+ with std::string as the left and right argument.

// string L17 operator +(std::string const &, std::string const &)
template<class _Elem,
	class _Traits,
	class _Alloc> inline
	basic_string<_Elem, _Traits, _Alloc> __CLRCALL_OR_CDECL operator+(
		const basic_string<_Elem, _Traits, _Alloc>& _Left,
		const basic_string<_Elem, _Traits, _Alloc>& _Right)
	{	// return string + string
	return (basic_string<_Elem, _Traits, _Alloc>(_Left) += _Right);
	}

Notice anything strange?

For the operation where “.” is copied into a std::string, the constructors invoked by Method 1 and Method 2 are different! In Method 2, the constructor has a different signature, and there is an extra line in its body: this->_Myfirstiter = _IGNORE_MYITERLIST.

This is probably one of Visual Studio’s workarounds to allow programs compiled with HID=0 to safely invoke the std::string library in msvcp90d.dll. Unfortunately, there are loopholes in their patch, and Method 1 falls through one of them.
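If you are stuck on VC9.0, one option is to route such concatenations through a small helper that always takes the Method 2 form. The following is only a sketch: the name concat is hypothetical, and all that was actually observed is that Method 2 works in the example above. Whether the helper avoids the crash in every configuration is an assumption.

```cpp
#include <string>

// Hypothetical helper: concatenates a C string and a std::string the way
// Method 2 does, by converting the left operand to std::string explicitly so
// that operator+(const std::string&, const std::string&) is selected.
inline std::string concat(const char* left, const std::string& right)
{
	return std::string(left) + right;
}
```

With this in place, string dot_abc = concat(".", abc); would replace the Method 1 line.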

Conclusion

If you want to turn off HID and Secure SCL for performance reasons, be careful with the string library. There are quite a few bugs in VC9.0 that cause crashes on perfectly legal C++ code. The example above is one of several scenarios that we’ve found. We have also seen similar crashes with certain usages of stringstream.

On a side note, a co-worker of mine filed this bug on Microsoft Connect. They closed the bug shortly afterwards, and told him that it has been fixed in VC10 Beta 2. Basically, they are suggesting that we should upgrade our compiler to a beta version, and pay them more money when it officially comes out. Great customer support, M$.

Some Relief in Debugging Boost Functions

F11 Hell

F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11 F11. Finally!!

That’s the pain of debugging a boost::function in Visual Studio.

Consider the following code:

class CTest { public: void func(int i) {} };
CTest t;
boost::function<void (int)> f = boost::bind(&CTest::func, &t, _1);
f(1);

Stepping from f(1) into the target function CTest::func requires 30 F11 keystrokes. Since our product uses (or overuses) boost::function, debugging can be a nightmare.

Counter the Counter Arguments

Before I get too far, let’s answer some counter arguments.

1. Who presses F11 30 times? I know exactly what to step over, so I use a combination of F10 and F11 to navigate my way through.

Answer: The exact F10/F11 combination is tricky. Countless times I have pressed F10 once too often and skipped over the crucial function.

2. There isn’t that much code to step through.

Answer: If I have to step into 5 boost functions, that’s 30 * 5 = 150 F11’s. If I debug this code 30 times a day, that’s 150 * 30 = 4500 keystrokes.

Just admit it. Debugging boost::function with Visual Studio sucks.

Some Relief

In Visual Studio, there is a hidden feature that allows you to step over certain functions. Basically, it is a bunch of regular expressions that the debugger looks into. If the function signature matches the specified string, the debugger can either step into or step over the matched function.

So I spent some time crafting some regular expressions to relieve the pain of boost functions. They bring down the number of F11 presses from 30 to 16. It’s not a complete solution, but it does help tremendously.

Add the following four string values under the key [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\VisualStudio\9.0\NativeDE\StepOver]:

boost_function_step_into = boost\:\:_bi\:\:list[0-9]\<boost\:\:_bi\:\:value.*=StepInto
boost_function_list_no_step_into = boost\:\:_bi\:\:list[0-9].*
boost_function_function_base_no_step_into = boost\:\:function_base\:\:.*
boost_function_unwrapper_no_step_into = boost\:\:_bi\:\:unwrapper.*
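To get a rough idea of which function names these rules catch, the patterns can be tried out in a small program. The following is a sketch under loud assumptions: it uses C++11 std::regex (not available in VC9.0), the patterns are my ECMAScript translations of the registry values (the debugger has its own regex dialect), the rules are tried top to bottom as listed, and a rule must match from the start of the name. None of that is confirmed debugger behavior, and the names StepAction and classify are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for the sketch.
enum StepAction { StepIntoFunction, SkipFunction, NoRuleMatched };

// Classifies a function name using ECMAScript translations of the four
// registry patterns, tried in the order they are listed (an assumption).
StepAction classify(const std::string& function_name)
{
	static const std::regex step_into("boost::_bi::list[0-9]<boost::_bi::value.*");
	static const std::regex skip_list("boost::_bi::list[0-9].*");
	static const std::regex skip_function_base("boost::function_base::.*");
	static const std::regex skip_unwrapper("boost::_bi::unwrapper.*");

	// The one StepInto rule is checked first so that it wins over the
	// broader list[0-9] rule below it.
	if (std::regex_match(function_name, step_into))
		return StepIntoFunction;
	if (std::regex_match(function_name, skip_list) ||
		std::regex_match(function_name, skip_function_base) ||
		std::regex_match(function_name, skip_unwrapper))
		return SkipFunction;
	return NoRuleMatched;
}
```

For example, under these assumptions classify("boost::function_base::empty") is SkipFunction, while a name such as "CTest::func" matches no rule and is stepped into as usual.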

More Information

I have tested this on Boost library version 1.36, 1.37 and 1.39.

The .reg file that automatically updates your Visual Studio 9.0 path can be downloaded here.

For older versions of Visual Studio, this trick also works. Click here for the registry location.