I'm going on to tell you about how programmers walk on thin ice without even noticing it. Let's speak on shift operators <<,>>. The working principles of the shift operators are evident and many programmers even don't know that using them according to the C/C++ standard might cause undefined or unspecified behavior.
1. Wade not in unknown waters. Part three.
Author: Andrey Karpov
Date: 11.04.2012
I'm going on to tell you about how programmers walk on thin ice without even noticing it. Let's speak on
shift operators <<, >>. The working principles of the shift operators are evident and many programmers
even don't know that using them according to the C/C++ standard might cause undefined or unspecified
behavior.
You can read the previous articles here: [1], [2].
Excursus to the history
A bit of history first. The necessity of bit shifting operations is evident to any programmer. Anyone
sooner or later faces the need to handle individual bits and bit masks. However, shift operators are
much more popular among programmers than they should. The reason is that you can multiply and
divide numbers by powers of two. For example, the "X << 3" operation will multiply X by 8. In the past,
the advantage of this number multiplication/division method lied in the speed of its work.
I've just got a book from the dusty shelf with a description of assembler commands for processors from
8086 to 80486. I've found a table with the number of clock cycles necessary to perform various
instructions.
Multiplying a 16-bit register by a memory cell using the MUL instruction takes about 124-139 clock
cycles on the 8086 processor!
A shift of a 16-bit register by N digits using the SHL instruction takes 8+4*N clock cycles on the 8086
processor. That is, it will take 72 clock cycles at worst.
You could get a noticeable speed gain by using various tricks handling bitwise operations when
calculating arithmetic expressions. This is what became the reason for massively using shifts - first in
assembler, and then in C and C++. The first C/C++ compilers were simple. You could get a performance
gain by explicitly prompting the compiler to use a shift instead of multiplication or division instructions
in certain places.
As processors were developing, shift operators were of use for a long time. On the 80486 processor,
multiplication now took about 26 clock cycles. Seems like it became much better, doesn't it? But a shift
operator took just 3 clock cycles at that time and again appeared to be better than multiplication.
Fortunately, most of these forced optimizations have been forgotten by now. First, compilers have
become smarter and now use an optimal instruction set to calculate arithmetic expressions. Second,
processors have undergone great changes too. Pipelines, branch predictions, register renaming and
many other things have appeared. That's why an ordinary programmer of nowadays cannot tell for sure
how much time will take the execution of a certain instruction. But it's clear that if some fragments of
code are not ideal, you may not even notice it. The processor will split instructions into micro-
instructions and start executing them in parallel. To be honest, I don't make out now how it all goes on
there. I've come to understanding that it's no longer reasonable to know all the subtleties starting with
the Intel Pentium processor. So, I've concluded that one should not think that one knows better how to
2. write an optimized code and use shifts and bitwise operations wherever possible. It's not necessarily
true that you can make the code faster than the compiler's optimizer can. But you can tell for sure that
the program will become complicated and difficult to understand in that case.
Note. Everything said above doesn't mean that you cannot benefit from bitwise operations anymore.
There are many interesting and useful tricks [3]; just don't get too fond of them.
Undefined behavior
It all began when I decided to create more diagnostics related to undefined behavior [4] and unspecified
behavior [5] in PVS-Studio. It took me rather little time and effort to create a rule to detect incorrect use
of shift operators. And after that I had to stop and think it over.
It turned out that programmers are very fond of shifts. They use them in every way they could, which
often leads to undefined behavior from the viewpoint of the coding standard. But theory is one thing
and practice is another. Is there sense in persecuting code that has been faithfully serving you for many
decades and gone through many compilers? That's a difficult question. Despite the code being incorrect,
compilers adhere to some secret agreement and process it uniformly.
After pondering over it for a long time, I finally decided to leave this diagnostic rule in PVS-Studio
without making any exceptions to it. If there are too many complaints from users, maybe I will change
my mind. However, perhaps users will be satisfied by the capability of disabling this diagnostic or use
other methods of warning suppression.
By the way, it is these painful thoughts that made me write the article. I hope that you will find the
information I'm going to show you interesting and useful.
So, let's see what the C++11 standard has to say about shift operators:
The shift operators << and >> group left-to-right.
shift-expression << additive-expression
shift-expression >> additive-expression
The operands shall be of integral or unscoped enumeration type and integral promotions are performed.
1. The type of the result is that of the promoted left operand. The behavior is undefined if the right
operand is negative, or greater than or equal to the length in bits of the promoted left operand.
2. The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an
unsigned type, the value of the result is E1 * 2^E2, reduced modulo one more than the maximum value
representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1*2^E2
is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
3. The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a
signed type and a non-negative value, the value of the result is the integral part of the quotient of
E1/2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
It is sad to read such texts. But don't worry - now we will study various issues by examples.
The simplest case leading to undefined behavior is the situation when the right operand has a negative
value. For example:
3. int A = 10;
int B = A << -5;
Thanks God, nobody does it that way. Well, at least we haven't seen such errors after analyzing more
than 70 open-source projects.
The next case is much more interesting. This is a shift by N bits where N is larger than the number of bits
in the left operand. Here is a simple example:
int A = 10;
int B = A << 100;
Let's see what such an error looks like in practice. The next code fragment was found in the Lib7z library:
SZ_RESULT
SafeReadDirectUInt64(ISzInStream *inStream, UInt64 *value)
{
int i;
*value = 0;
for (i = 0; i < 8; i++)
{
Byte b;
RINOK(SafeReadDirectByte(inStream, &b));
*value |= ((UInt32)b << (8 * i));
}
return SZ_OK;
}
PVS-Studio's diagnostic message: V610 Undefined behavior. Check the shift operator '<<. The right
operand ('(8 * i)' = [0..56]) is greater than or equal to the length in bits of the promoted left operand.
lib7z 7zin.c 233
The function tries to read the 64-bit value byte-by-byte. Unfortunately, it will fail if the number was
larger than 0x00000000FFFFFFFF. Note the "(UInt32)b << (8 * i)" shift. The size of the left operand is 32
bits. The shift takes from 0 to 56 bits. In practice, it will cause the high-order part of the 64-bit value to
remain filled with zeroes. Theoretically, it is undefined behavior here and the result cannot be predicted.
This is the correct code:
*value |= ((UInt64)b << (8 * i));
Readers may ask if the code below is correct:
4. char A = 1;
int B = A << 20;
Yes, it is. To the left of the << operator is the A variable consisting of only 8 bits. But the left part will be
extended to the int type before the shift. Therefore, a value of the 'int' type can be shifted by 20 bits.
And now for the most interesting thing - shifting of negative values. Here is a simple example:
int A = (-1) << 5; // undefined behavior
int B = (-1) >> 5; // unspecified behavior
We can see undefined or unspecified behavior in this code. There's no difference between them from a
practical point of view. Only one conclusion is to be drawn from this case - you should not write such
code.
We could finish at this point and cite a couple of examples. But unfortunately, there are two
peculiarities that spoil this idealistic picture.
The peculiarities that spoil the idealistic picture
Peculiarity N1. In the old C++ language standard of 1998, cases with undefined behavior are avoided. It
says only how the << operator behaves when unsigned values are shifted, but it doesn't say anything
about signed values. So, it is that very case when reading the standard doesn't make the point any
clearer to you: this case is simply not considered, and that's it.
So, from the viewpoint of C++ of 1998, the "(-1) << 5" construct doesn't cause undefined behavior.
However, it doesn't describe how it should work either.
Peculiarity N2. Programmers feel safe to shift negative values in many programs. It's hard to argue with
them, as the code does work.
Let's try to find out if we should refuse implementing the new diagnostic because of the above
mentioned peculiarities. We believe that we shouldn't.
The old C++ standard doesn't say anything about undefined behavior. But the new one does. It turns out
that the old standard simply was not precise enough. By the way, the new C language standard (I
checked the rough copy of 25-th June, 2010) also says that shifts of negative values cause undefined
behavior. The conclusion is you should eliminate incorrect code.
Now to the subject of a widespread use of dangerous shifts. They are really numerous. For example, in
the JPEG library you need to fill an array with the following values:
11...11111111111111b
11...11111111111101b
11...11111111111001b
11...11111111110001b
5. ....
This is how it is written:
/* entry n is (-1 << n) + 1 */
static const int extend_offset[16] = { 0,
((-1)<<1) + 1, ((-1)<<2) + 1, ((-1)<<3) + 1,
((-1)<<4) + 1, ((-1)<<5) + 1, ((-1)<<6) + 1,
((-1)<<7) + 1, ((-1)<<8) + 1, ((-1)<<9) + 1,
((-1)<<10) + 1, ((-1)<<11) + 1, ((-1)<<12) + 1,
((-1)<<13) + 1, ((-1)<<14) + 1, ((-1)<<15) + 1
};
We cannot tell that the JPEG library is a bad one. This code is time-proved and has gone through various
compilers.
From the standard's viewpoint, it should be rewritten in the following way:
static const int extend_offset[16] =
{ 0,
((~0u)<<1) | 1, ((~0u)<<2) | 1, ((~0u)<<3) | 1,
((~0u)<<4) | 1, ((~0u)<<5) | 1, ((~0u)<<6) | 1,
((~0u)<<7) | 1, ((~0u)<<8) | 1, ((~0u)<<9) | 1,
((~0u)<<10) | 1, ((~0u)<<11) | 1, ((~0u)<<12) | 1,
((~0u)<<13) | 1, ((~0u)<<14) | 1, ((~0u)<<15) | 1
};
But it's up to you to decide whether or not you need such corrections. I can only advise that you should
do this: you don't know when and to what consequences it may lead.
We could give you other examples of negative value shifts, but they are all alike and won't be interesting
to read about.
Conclusions
1. Using bitwise operations and shifts was earlier considered as a token of programmer's skill and
allowed you to write fast code. Now it has almost no relevance. It's much more important that
the code is understandable. I advise that you play with bits only when it is really necessary.
2. Expressions of the "(-1) << N" kind are now declared as incorrect and leading to an undefined
behavior.
3. Expressions of the "(-1) << N" kind have been used for a long time and quite often. That's why
we cannot give strong arguments against using such constructs. The only arguments are the new
C and C++ language standards.
6. 4. It is up to you to decide if you should fix negative value shifts. But I do recommend doing this.
Just in case, at least.
5. Diagnostic messages covering dangerous shifts will be available in PVS-Studio starting with
version 4.60 which is to be released soon.
References
1. Wade not in unknown waters. Part one. http://www.viva64.com/en/b/0127/
2. Wade not in unknown waters. Part two. http://www.viva64.com/en/b/0129/
3. Sean Eron Anderson. Bit Twiddling Hacks. http://www.viva64.com/go.php?url=837
4. Wikipedia. Undefined behavior. http://www.viva64.com/go.php?url=668
5. Wikipedia. Unspecified behavior. http://www.viva64.com/go.php?url=747