The author discusses how compilers go to great lengths to make poorly written C/C++ code work as intended, despite issues like passing non-POD types like std::string to variable argument functions. He provides examples of code that shouldn't work but does, due to efforts by compiler developers. The author suspects compiler optimizations are sometimes designed to produce practical rather than theoretically correct behavior for simple programs. Overall the document praises the unseen work of compiler developers in supporting legacy code.
Big Brother helps you understand CString issues
Author: Andrey Karpov
Date: 13.07.2010
I have been convinced once again that programmers write programs with utter carelessness, so that their
programs work not because of their skill but by chance and thanks to the care of Microsoft or Intel
compiler developers. Indeed, it is they who really care and prop up our lopsided programs with crutches
when necessary.
What follows is a byte-rending story of the CString class and its daughter, the Format function.
Pray, pray for compilers and their developers! They are spending so much effort to make our programs
work despite many drawbacks and even errors. At the same time, their work is hard and invisible. They
are noble knights of coding and guardian angels of us all.
I already knew that Microsoft has a department responsible for providing maximum compatibility of new
versions of operating systems with old applications. Their database contains more than 10,000 of the
most popular obsolete programs that must work in new versions of Windows. It is thanks to these efforts
that I recently managed to play Heroes of Might and Magic II (a 1996 game) under 64-bit Windows Vista
without problems. I think the game can be successfully launched under Windows 7 as well. Here are
interesting notes (in Russian) by Alexey Pahunov on the topic of compatibility [1, 2, 3].
However, it seems that there are also other departments whose business is to help our horrible C/C++
code keep working. But let me start this story from the very beginning.
I am involved in the development of PVS-Studio, a tool for analyzing application source code.
Quiet, friends, this is not an ad. This time it really is a work of mercy, since we have started creating a
free general-purpose static analyzer. It is still far from an alpha version, but the work is going on
and I will write a post about this analyzer some day. I mention it because we have begun
collecting the most interesting typical errors and learning to diagnose them.
Many errors are related to using ellipses in programs. Here is a theoretical reference:
There are functions for which it is impossible to specify the number and types of all acceptable
parameters in the definition. In this case the list of formal parameters ends with an ellipsis (...),
which means "and perhaps some more arguments". For instance:
int printf(const char* ...);
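To make the reference concrete, here is a minimal sketch of how such a function reads its extra arguments. The helper name sum_ints and its calling convention (a count followed by ints) are my own assumptions for illustration. Note that the callee has no way to verify either the number or the types of what it receives.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical helper: sums `count` int arguments passed through the ellipsis.
// The callee cannot check the count or the types; it simply trusts the caller.
int sum_ints(int count, ...)
{
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);  // assumes every extra argument is an int
    }
    va_end(args);
    return total;
}

// Small self-check showing the intended usage.
void sum_ints_demo()
{
    assert(sum_ints(3, 10, 20, 30) == 60);
}
```

If the caller passes a different count or a different type than va_arg expects, nothing in the language stops it, which is exactly the class of errors discussed here.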
One such unpleasant yet easily diagnosed error is passing an object of a class type, instead of a
pointer to a string, into a function with a variable number of arguments. Here is an example of this error:
wchar_t buf[100];
std::wstring ws(L"12345");
swprintf(buf, L"%s", ws);
This code will produce total rubbish in the buffer or crash the program. Certainly, in a real
program the code will be more complicated, so please do not write comments on my post telling me
that the GCC compiler, unlike Visual C++, will check the arguments and warn you. Format strings might
come from resources or other functions, and then the compiler cannot check anything. But the diagnosis
is simple in this case: a class object is passed into a string formatting function, and this causes an error.
The correct version of this code looks as follows:
wchar_t buf[100];
std::wstring ws(L"12345");
swprintf(buf, L"%s", ws.c_str());
It is precisely because you may pass anything at all into functions with a variable number of arguments
that almost every book on C++ programming recommends against using them. Instead, they
suggest safe mechanisms such as boost::format. These recommendations are all very well,
but there is a great deal of code with various printfs, sprintfs and CString::Formats in the world, and we
will have to live with it for a long time. That is why we implemented a diagnostic rule to detect such
dangerous constructs.
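As an illustration of what a "safe mechanism" means, here is a rough sketch that avoids the ellipsis altogether. It does not use boost::format; instead it assumes a C++11 compiler (newer than this post) and two hypothetical helpers of my own, append_all and concat, which push every argument through operator<< so that each type is checked at compile time.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helpers, not part of any library.
// Base case: nothing left to append.
inline void append_all(std::wostringstream&) {}

// Recursive case: stream the first argument, then the rest.
template <typename First, typename... Rest>
void append_all(std::wostringstream& out, const First& first, const Rest&... rest)
{
    out << first;  // fails to compile if First has no suitable operator<<
    append_all(out, rest...);
}

// Concatenates all arguments into one wide string.
template <typename... Args>
std::wstring concat(const Args&... args)
{
    std::wostringstream out;
    append_all(out, args...);
    return out.str();
}

// Small self-check: a std::wstring object is passed directly and handled correctly.
void concat_demo()
{
    std::wstring ws(L"12345");
    assert(concat(L"Value: ", ws) == L"Value: 12345");
}
```

Here passing ws itself is fine, because the compiler picks the right overload for std::wstring instead of blindly copying bytes.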
Let's carry out a theoretical investigation and see what is incorrect about the code above. Well, it is
incorrect twice over.
1. The argument does not correspond to the specified format. Since we specify "%s", we must pass a
pointer to a string into the function. In theory, we could write our own sprintf
function that knows it was passed an object of the std::wstring class and prints it
correctly. However, that is also impossible because of the second reason.
2. Only a POD type can be an argument for the ellipsis "...", while std::wstring is not a POD type.
Theoretical reference on POD types:
POD is an abbreviation of "Plain Old Data". The following types are POD types:
1. all predefined arithmetic types (including wchar_t and bool);
2. types defined with the enum keyword;
3. pointers;
4. POD-structures (struct or class) and POD-unions which meet the following requirements:
   a. do not contain user-defined constructors, destructors or a copy assignment operator;
   b. do not have base classes;
   c. do not contain virtual functions;
   d. do not contain protected or private non-static data members;
   e. do not contain non-static data members of non-POD types (or arrays of such types), nor
      references.
Correspondingly, the std::wstring class is not a POD type, since it has constructors, a base class
and so on.
If you pass an object that is not a POD type to an ellipsis, the result is undefined behavior. Thus, at
least theoretically, we cannot correctly pass an object of the std::wstring type as an ellipsis
argument in any way.
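The classification above can be checked by the compiler itself. This is a sketch under assumptions: it needs C++11 and the type_traits header, it spells "POD" as "trivial and standard-layout" (the C++11 wording, which differs slightly from the older list quoted above), and Point and is_pod_like are hypothetical names introduced only for this example.

```cpp
#include <string>
#include <type_traits>

// Hypothetical example type: public data members only, no constructors,
// no base classes, no virtual functions.
struct Point
{
    int x;
    int y;
};

// Hypothetical helper: "POD" expressed as trivial + standard-layout.
template <typename T>
constexpr bool is_pod_like()
{
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

static_assert(is_pod_like<int>(), "arithmetic types are POD");
static_assert(is_pod_like<const wchar_t*>(), "pointers are POD");
static_assert(is_pod_like<Point>(), "a plain struct is POD");
static_assert(!is_pod_like<std::wstring>(), "std::wstring is not POD");
```

The last assertion is the one that matters for this story: std::wstring has non-trivial constructors and a destructor, so it fails the check.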
The same must be true of the Format function of the CString class. This is an incorrect version of
the code:
CString s;
CString arg(L"OK");
s.Format(L"Test CString: %s\n", arg);
This is the correct version of the code:
s.Format(L"Test CString: %s\n", arg.GetString());
Or, as MSDN suggests [4], we may use the explicit cast operator LPCTSTR implemented in the
CString class to get a pointer to the string. Here is an example of correct code from MSDN:
CString kindOfFruit = "bananas";
int howmany = 25;
printf("You have %d %s\n", howmany, (LPCTSTR)kindOfFruit);
So, everything seems clear and transparent. It is also clear how to write the rule: we will detect
misprints made when using functions with a variable number of arguments.
We did this, and I was shocked by the result. It turned out that most developers never think about these
issues and write code like the following with a clear conscience:
class CRuleDesc
{
CString GetProtocol();
CString GetSrcIp();
CString GetDestIp();
CString GetSrcPort();
CString GetIpDesc(CString strIp);
...
CString CRuleDesc::GetRuleDesc()
{
CString strDesc;
strDesc.Format(
// I think you understand
// that we may give you such examples endlessly.
Some developers do think about it but then forget. That is why code like this looks so touching:
CString sAddr;
CString m_sName;
CString sTo = GetNick( hContact );
sAddr.Format(_T("\\\\%s\\mailslot\\%s"),
sTo, (LPCTSTR)m_sName);
We collected so many such examples in the projects we test PVS-Studio on that I cannot understand
how it can all be. And still everything works: I made sure of it by writing a test program and trying
various ways of using CString.
What is the reason? It seems to me that the compiler developers could no longer stand the endless
questions about why sloppy programs using CString do not work, and the accusations that the compiler is
bad and cannot handle strings. So they secretly held a sacred rite of exorcism, driving the evil out of
CString. They made the impossible possible: they implemented the CString class in such a crafty
way that you may pass it to functions like printf and Format.
It was done quite intricately, and those who want to know how can read the source code of the CStringT
class and the detailed discussion "Pass CString to printf?" [5]. I will not go into details and will
stress only one important thing. A special implementation of CString is not enough, since passing a
non-POD type theoretically causes undefined behavior. So the Visual C++ developers, together with the
Intel C++ developers, made it so that the undefined behavior always yields a correct result :) After all,
correct program operation may well be a subset of undefined behavior. :)
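For those who do not want to dig through the sources, here is a rough model of the idea as I understand it. It is a simplified sketch, not the real CStringT code: TinyString and bytes_as_pointer are hypothetical names, and the assumption being illustrated is that the class's only data member is a pointer to the characters. If the compiler then copies the object bit for bit into the argument area, the callee finds exactly what "%s" expects.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Simplified, hypothetical model of the layout trick; NOT the real CStringT.
// The sole data member is a pointer to the characters, so the object is
// exactly as wide as the pointer that "%s" expects.
class TinyString
{
public:
    explicit TinyString(const wchar_t* text)
    {
        const std::size_t len = std::wcslen(text);
        m_data = new wchar_t[len + 1];
        std::wmemcpy(m_data, text, len + 1);  // copies the terminator too
    }
    ~TinyString() { delete[] m_data; }
    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;

    const wchar_t* c_str() const { return m_data; }

private:
    wchar_t* m_data;  // no vtable pointer, no length field, nothing else
};

static_assert(sizeof(TinyString) == sizeof(const wchar_t*),
              "the object is exactly one pointer wide");

// Reads the object's raw bytes the way a varargs callee would see them
// if the compiler copied the object bit for bit.
const wchar_t* bytes_as_pointer(const TinyString& s)
{
    const wchar_t* p = nullptr;
    std::memcpy(&p, static_cast<const void*>(&s), sizeof p);
    return p;
}

// Small self-check: the raw bytes of the object are the string pointer itself.
void tiny_string_demo()
{
    TinyString s(L"OK");
    assert(bytes_as_pointer(s) == s.c_str());
}
```

This only shows why the bytes happen to line up. Whether a given compiler really copies a non-POD object that way is outside what the language promises, which is the whole point of the paragraph above.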
I have also started thinking about some strange things in the compiler's behavior when it builds 64-bit
programs. I suspect that compiler developers deliberately make the program's behavior not theoretical
but practical (i.e. working) in those simple cases where they recognize a pattern. The clearest example
is the loop pattern. Here is an example of incorrect code:
size_t n = BigValue;
for (unsigned i = 0; i < n; i++) { ... }
Theoretically, if n > UINT_MAX, an infinite loop must occur. But it does not occur in
the Release version, since a 64-bit register is used for the variable "i". Of course, if the code is a bit
more complicated, the infinite loop will occur, but at least in some cases the program will be lucky. I
wrote about this in the article "A 64-bit horse that can count" [6].
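The theoretical side of this can be shown without waiting for four billion iterations. The sketch below uses two hypothetical helpers of my own, next_counter and unsigned_counter_can_finish. It only models the arithmetic the language guarantees, not what an optimizer does with registers.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Returns the value the loop counter takes after i++.
unsigned next_counter(unsigned i)
{
    return i + 1u;  // unsigned arithmetic wraps around modulo UINT_MAX + 1
}

// True if the loop "for (unsigned i = 0; i < n; i++)" can ever stop,
// i.e. if the counter is able to reach n before wrapping back to zero.
bool unsigned_counter_can_finish(std::size_t n)
{
    return n <= UINT_MAX;
}

// Small self-check of the two facts the argument relies on.
void counter_demo()
{
    assert(next_counter(UINT_MAX) == 0u);       // the counter wraps to zero
    assert(unsigned_counter_can_finish(1000));  // small bounds are fine
}
```

The reliable fix is to give the counter the same type as the bound, for example declaring i as size_t, so that correctness does not depend on luck.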
I used to think that this unexpectedly lucky behavior of a program is determined only by the specifics
of optimization in Release versions. But now I am not so sure. Perhaps it is a conscious attempt
to make a non-working program work at least sometimes. Certainly, I do not know whether the cause lies
in optimization or in the care of Big Brother, but it is a good occasion to philosophize, isn't it? :) Well,
and the one who knows will hardly tell us. :)
I am sure there are also other cases when the compiler stretches out a helping hand to crippled
programs. If I encounter something interesting, I will tell you.
May your code never glitch!
References
1. Alexey Pahunov's Russian blog. Backward compatibility is serious.
http://www.viva64.com/go.php?url=390
2. Alexey Pahunov's Russian blog. AppCompat. http://www.viva64.com/go.php?url=391
3. Alexey Pahunov's Russian blog. Is Windows 3.x live? http://www.viva64.com/go.php?url=392
4. MSDN. CString Operations Relating to C-Style Strings. Topic: Using CString Objects with Variable
Argument Functions. http://www.viva64.com/go.php?url=393
5. Discussion at eggheadcafe.com. Pass CString to printf? http://www.viva64.com/go.php?url=394
6. Andrey Karpov. A 64-bit horse that can count. http://www.viva64.com/art-1-2-377673569.html