Saturday, November 15, 2014

The auto keyword versus portability

Last week I attended the Code::Dive conference, which took place in Wrocław, Poland. The conference featured names like Scott Meyers, Venkat Subramaniam and Andrzej Krzemieński. Hopefully I'll soon find some time to write a few words about the event.

But while I'm still excited about the conference and my mind processes the thoughts that came to me that day, I'd like to refer to Scott Meyers' talk titled "Support for embedded programming in C++11 and C++14". During his talk Scott presented the auto keyword and constexpr variables and functions.

On one of his slides he showed a typical bug related to the assignment of the return value of std::vector<T>::size() to unsigned int, for example:

std::vector<int> v;
unsigned int size = v.size();

I've seen such code in C++98/03 code bases plenty of times. Some strive to improve the situation by using std::size_t:

std::vector<int> v;
std::size_t size = v.size();

Unfortunately, strictly speaking this is not guaranteed to be right either: the actual return type of std::vector<T>::size() is std::vector<T>::size_type, which is only said to be some unsigned integral type. With the default allocator it is std::size_t in practice, but generic code should not rely on that.

Now back to Scott's talk on C++11 and its auto keyword. Scott argued that the auto keyword helps in writing portable (and in some cases more effective) code, such as:

std::vector<int> v;
auto size = v.size();

in which the type of the size variable is deduced by the compiler as whatever type std::vector<T>::size() returns. I have no doubts here: the code looks cleaner and avoids the type mismatch altogether.
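To check what auto really deduces in the snippet above, here is a minimal sketch. The helper name deduced_type_matches is made up for illustration; everything else is standard library.

```cpp
#include <cassert>      // only needed for the assert-based usage shown below
#include <type_traits>
#include <vector>

// Hypothetical helper: reports whether `auto` deduced exactly the
// container's own size_type for the value returned by size().
bool deduced_type_matches() {
    std::vector<int> v;
    auto size = v.size();
    (void)size;  // the value itself is irrelevant, only its type matters
    return std::is_same<decltype(size), std::vector<int>::size_type>::value;
}
```

Calling assert(deduced_type_matches()); should pass on any conforming implementation, since size() is declared to return size_type.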

But in practice the very first example, which used unsigned int for the size variable, was not that bad. Problems start only when the vector grows beyond UINT_MAX elements. What's the big deal then? How often do we operate on vectors with more than 4 billion elements (assuming a 32-bit unsigned int)? Not that often. Hence the example with the vector's size might not convince everyone.

My motivation for this blog post was to show a more common case in which using the wrong type might result in a serious bug. Consider the following code:

std::string str = "this is my first C++ related post on this blog";
unsigned int pos = str.find("Java");
if (pos == std::string::npos)
  std::cout << "Hurray!\n";
else
  std::cout << "We're out of luck.\n";

As some of you remember, std::string::find() returns std::string::size_type (std::size_t in practice), not unsigned int. On some platforms, such as 32-bit x86, the two types have the same size, so the code works as expected and prints "Hurray!". On x86_64, however, std::size_t is 64 bits wide while unsigned int is only 32 bits. There the if condition in the example evaluates to false regardless of the input and search strings! This is because std::string::npos is defined as the greatest value representable in std::string::size_type. Storing it in an unsigned int loses the upper half of the value, and when the truncated value is widened again for the comparison it no longer equals npos. The code is simply buggy.
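The effect can be observed with a small sketch like the one below. The two helper names are hypothetical, and the static_cast only makes explicit the narrowing that the original snippet performs implicitly.

```cpp
#include <cassert>   // only needed for the assert-based usage shown below
#include <string>

// Hypothetical helper: stores the result of find() in an unsigned int,
// as the buggy snippet does, and reports whether "not found" is detected.
bool not_found_via_unsigned_int(const std::string& haystack,
                                const std::string& needle) {
    unsigned int pos = static_cast<unsigned int>(haystack.find(needle));
    // pos is widened back to size_type here; a truncated npos stays truncated.
    return pos == std::string::npos;
}

// Hypothetical helper: same check, but with the correct type.
bool not_found_via_size_type(const std::string& haystack,
                             const std::string& needle) {
    std::string::size_type pos = haystack.find(needle);
    return pos == std::string::npos;
}
```

On a platform where unsigned int is narrower than std::string::size_type, the first helper returns false even though "Java" does not occur in the string, while the second one returns true everywhere.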

The issue becomes even more serious when std::string::find() is put inside a loop to parse a string in an iterative manner:

std::string str = "this is my first C++ related post on this blog";
unsigned int pos = str.find(" ");
while (pos != std::string::npos)
  pos = str.find(" ", pos + 1);

Yes, the code isn't very useful, but the point is that it might result in an endless loop or a crash, and it does so only on certain platforms. I discovered this portability issue some time ago while I was porting a large code base to a 64-bit platform.
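To see the endless loop without actually hanging the program, the loop above can be wrapped in a sketch with an iteration cap. The function template count_iterations and its cap parameter are assumptions added for illustration; they are not part of the original code.

```cpp
#include <cassert>   // only needed for the assert-based usage shown below
#include <cstddef>
#include <string>

// Hypothetical helper: runs the find() loop with the position stored in Pos
// and returns how many times the loop body ran, giving up after `cap` rounds.
template <typename Pos>
std::size_t count_iterations(const std::string& str, std::size_t cap) {
    std::size_t iterations = 0;
    Pos pos = static_cast<Pos>(str.find(' '));
    while (pos != std::string::npos && iterations < cap) {
        ++iterations;
        // With a too-narrow Pos, a truncated npos plus one wraps around to 0,
        // so the search silently starts over from the beginning of the string.
        pos = static_cast<Pos>(str.find(' ', pos + 1));
    }
    return iterations;
}
```

With Pos = std::size_t the loop runs once per space and stops. With Pos = unsigned int on a platform where that type is narrower than std::size_t, the loop only ends because of the cap.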

Obviously, as in the case of std::vector<T>::size(), the solution is to use the auto keyword:

std::string str = "this is my first C++ related post on this blog";
auto pos = str.find(" ");
while (pos != std::string::npos)
  pos = str.find(" ", pos + 1);

and the problem is gone.

The conclusion is that, among a few other things, the auto keyword helps in writing more portable code. Now everyone should be convinced of that.

But C++11 is a luxury not everyone can afford... Those who, like me, cannot use C++11 for whatever reason have to spell out the full type (for example std::string::size_type), as they must not assume that unsigned int fits everywhere. And the above examples with std::string::find() show that the problem is much more common than one might expect.
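For those stuck with C++98/03, a minimal sketch of the same loop with the type spelled out could look like this. The function name count_spaces_cxx03 is made up for illustration, and the code deliberately avoids any C++11 feature.

```cpp
#include <cassert>   // only needed for the assert-based usage shown below
#include <string>

// Hypothetical helper: counts spaces using the string's own size_type,
// so the comparison against npos is correct on every platform.
std::string::size_type count_spaces_cxx03(const std::string& str) {
    std::string::size_type count = 0;
    std::string::size_type pos = str.find(' ');
    while (pos != std::string::npos) {
        ++count;
        pos = str.find(' ', pos + 1);
    }
    return count;
}
```

It is more typing than auto, but the position variable always has exactly the type that find() returns and that npos is defined in.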

10 comments:

  1. +10 for an actual real life example. Send it to Scott so that he can use it the next time instead of the vector one!

    1. I sent it to Scott the day I wrote this article. He appreciated the example very much and promised to use it next time.

  2. But this is only a problem when compiling with a warning level that doesn't scream on assignment to a smaller type, right?

    1. Do you refer to -Wconversion in gcc? Honestly you made me google for it and I just found out about it.

  3. C++11 and 64-bit Issues: http://www.viva64.com/en/b/0253/

  4. Note that both gcc and clang catch the npos conversion issue when using -Wconversion; see it live on godbolt: https://godbolt.org/g/RpcXqJ
