Saturday, November 15, 2014

The auto keyword versus portability

Last week I attended the Code::Dive conference, which took place in Wrocław, Poland. The conference featured names like Scott Meyers, Venkat Subramaniam and Andrzej Krzemieński. Hopefully I'll find some time to write a few words about the event soon.

But while I'm still excited about the conference and my mind processes the thoughts that came to me that day, I'd like to refer to Scott Meyers' talk titled "Support for embedded programming in C++11 and C++14". During his talk Scott presented the auto keyword and constexpr variables and functions.

On one of his slides he showed a typical bug related to the assignment of the return value of std::vector<T>::size() to unsigned int, for example:

std::vector<int> v;
unsigned int size = v.size();

I've seen such code in C++98/03 code bases plenty of times. Some strive to improve the situation by using std::size_t:

std::vector<int> v;
std::size_t size = v.size();

Unfortunately, strictly speaking this does not make the code fully portable either, because the actual return type of std::vector<T>::size() is std::vector<T>::size_type, which the standard only describes as some unsigned integral type.
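To make the distinction concrete, here is a minimal sketch (C++11, since it uses static_assert and <type_traits>). The helper names size_type_is_size_t and store_size_portably are made up for this illustration. With the default allocator on common implementations (libstdc++, libc++, MSVC) the first helper is expected to return true, but the standard does not require that.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// The standard only promises that size_type is an unsigned integer type.
static_assert(std::is_unsigned<std::vector<int>::size_type>::value,
              "size_type must be unsigned");

// Hypothetical helper: reports whether size_type happens to be std::size_t
// on the implementation this is compiled with.
bool size_type_is_size_t()
{
    return std::is_same<std::vector<int>::size_type, std::size_t>::value;
}

// Hypothetical helper: the pre-C++11 way to avoid any conversion is to
// spell out the container's own nested type.
void store_size_portably()
{
    std::vector<int> v(3);
    std::vector<int>::size_type size = v.size();
    assert(size == 3u);
}
```

Spelling out the nested type is verbose, which is a large part of why auto is attractive here.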

Now back to Scott's talk on C++11 and its auto keyword. Scott argued that the auto keyword helps in writing portable (and in some cases more efficient) code, such as:

std::vector<int> v;
auto size = v.size();

in which the type of the size variable is automatically deduced by the compiler to whatever type std::vector<T>::size() returns. I have no doubts - the code looks cleaner and is free of any issues.

But in practice, the very first example, which used unsigned int as the type of the size variable, was not that bad. Problems start only when the vector grows to a size larger than UINT_MAX. What's the big deal then? How often do we operate on vectors with more than 4 billion elements (assuming a 32-bit unsigned int)? Not that often. Hence the example with the vector's size might not convince everyone.

My motivation for this blog post was to show a more common case in which using the wrong type might result in a serious bug. Consider the following code:

std::string str = "this is my first C++ related post on this blog";
unsigned int pos = str.find("Java");
if (pos == std::string::npos)
  std::cout << "Hurray!\n";
else
  std::cout << "We're out of luck.\n";

As some of you may remember, std::string::find() returns std::size_t (spelled std::string::size_type), not unsigned int. On some platforms, such as 32-bit x86, the two types have the same size, so the code works as expected and prints "Hurray!". On x86_64, however, std::size_t is wider than unsigned int. In this case the if condition in the example code evaluates to false, regardless of the input and search strings! This is because std::string::npos is defined to be the greatest value representable in std::size_t. Assigning it to an unsigned int loses the upper bits, so the stored value no longer compares equal to npos, and the code is simply buggy.
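The effect can be checked with a small sketch like the one below. The function names are hypothetical, and the static_cast only makes the implicit conversion from the snippet above explicit. The outcome depends on the type widths of the target, which is exactly the point.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mimicking the buggy snippet: the result of find() is
// squeezed into an unsigned int before being compared against npos.
bool not_found_is_detected(const std::string& text, const std::string& needle)
{
    unsigned int pos = static_cast<unsigned int>(text.find(needle));
    return pos == std::string::npos;
}

// Hypothetical helper: true when unsigned int is narrower than the type
// find() returns, i.e. when the conversion above drops bits.
bool unsigned_int_is_too_narrow()
{
    return sizeof(unsigned int) < sizeof(std::string::size_type);
}

// Usage sketch: the missing substring is detected only where no bits are lost.
void demonstrate_truncated_npos()
{
    bool detected = not_found_is_detected("my first C++ related post", "Java");
    assert(detected == !unsigned_int_is_too_narrow());
}
```

On a typical x86_64 build not_found_is_detected() should return false even though "Java" is absent, whereas on a typical 32-bit x86 build it should return true.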

The issue becomes even more serious when std::string::find() is put inside a loop to parse a string in an iterative manner:

std::string str = "this is my first C++ related post on this blog";
unsigned int pos = str.find(" ");
while (pos != std::string::npos)
  pos = str.find(" ", pos + 1);

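Yes, the code isn't very useful, but the point is that it might result in an endless loop or a crash, and it does so only on certain platforms. On x86_64, for instance, pos can never equal npos, and once find() fails, pos + 1 wraps around to 0, so the search starts over from the beginning of the string. I discovered this portability issue some time ago while I was porting a large code base to a 64-bit platform.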
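To observe this without hanging the program, the loop can be run with an iteration cap. This is only a sketch: run_buggy_loop and its max_iterations parameter are made up for the illustration and are not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: runs the buggy loop, but stops after max_iterations
// (a safety net added for this sketch). Returns the iterations executed.
std::size_t run_buggy_loop(const std::string& str, std::size_t max_iterations)
{
    std::size_t iterations = 0;
    unsigned int pos = static_cast<unsigned int>(str.find(" "));
    while (pos != std::string::npos && iterations < max_iterations) {
        // When pos holds a truncated npos, pos + 1 wraps around to 0.
        pos = static_cast<unsigned int>(str.find(" ", pos + 1));
        ++iterations;
    }
    return iterations;
}

// Usage sketch: the string from the post contains 9 spaces.
void demonstrate_buggy_loop()
{
    const std::string str = "this is my first C++ related post on this blog";
    const bool too_narrow = sizeof(unsigned int) < sizeof(std::string::size_type);
    const std::size_t cap = 1000;
    // Where bits are lost the loop only stops because of the cap;
    // otherwise it ends after 9 iterations.
    assert(run_buggy_loop(str, cap) == (too_narrow ? cap : 9u));
}
```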

Obviously, as in the case of std::vector<T>::size(), the solution is to use the auto keyword:

std::string str = "this is my first C++ related post on this blog";
auto pos = str.find(" ");
while (pos != std::string::npos)
  pos = str.find(" ", pos + 1);

and the problem is gone.

The conclusion is that, among a few other things, the auto keyword helps in writing more portable code. Now everyone should be convinced of that.

But C++11 is a luxury not everyone can afford... Those who, like me, cannot use C++11 for whatever reason have to spell out the full type (std::string::size_type, for instance), since they cannot assume that unsigned int fits everywhere. And the above examples with std::string::find() show that the problem is much more common than one might expect.
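For completeness, here is a sketch of the same loop written without auto, using the nested std::string::size_type. The function name count_spaces is invented for the example. The function itself uses nothing from C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts spaces with the loop from the post, with pos
// declared as the exact type that find() returns.
std::size_t count_spaces(const std::string& str)
{
    std::size_t count = 0;
    std::string::size_type pos = str.find(' ');
    while (pos != std::string::npos) {
        ++count;
        pos = str.find(' ', pos + 1);
    }
    return count;
}

// Usage sketch: the string from the post contains 9 spaces.
void demonstrate_count_spaces()
{
    assert(count_spaces("this is my first C++ related post on this blog") == 9u);
}
```

Because pos has the same type as npos, the comparison behaves identically on 32-bit and 64-bit targets.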

Tuesday, November 11, 2014

The very first post - late, but still...

Back in 2011 I decided to start my own blog. It was supposed to be mainly programming-related. So I registered on blogger.com and wrote a post about one of the common pitfalls in Python. It's a shame it has never seen the light of day. Maybe I will get back to it one day.

Now 2014 is coming to a close and my blog still has no posts. I decided that it's now or never. I chose the former...

I'm a full-time software engineer, a husband, a father, a PhD student, a huge fan of progressive metal and an advocate of C++ and Python, so my spare time is limited and my posts will be rather short. But hopefully I will come up with something interesting and useful every month or two.

Here it is - my first blog post. Done. Published. Finally.