'Premature optimisation is the root of all evil', Donald Knuth told us in 1974. He was talking about the perceived trade-off between optimising performance and keeping code readable and maintainable. And we all know that we shouldn't even try optimising anything without measuring if there's an actual bottleneck in our product first. Or do we?
Making something faster, even at the micro-level, doesn't always mean that readability suffers. There are a lot of improvements that make code both more readable and better-performing, turning what might be called premature optimisation into just another healthy refactoring. And although some developers like to label almost all forms of optimisation as premature, this depends on a lot of factors, and designing for performance might actually be important for the product you are building. Let us also not forget that optimising, even prematurely, can be a lot of fun!
In this workshop we take a look at examples of optimisations that make sense to both the reader and the end-user of code. We also let ourselves go and prematurely optimise the heck out of some (C++ and Python) code, so make sure to bring your laptop if you have one!
7. “Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimisation is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
8. “In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering.”
13. C++
Close to the metal
Object model well-defined [Lippman96]
Efficiency has been a major design goal for C++ from the beginning
“You don’t pay for what you don’t use”
Benefits from years of C optimisation experience
15. if-else-if
void GetAndProcessResult() {
    if (GetResult() == DOWNLOADED)
        return ProcessDownloadedFile();
    else if (GetResult() == NEEDS_DOWNLOAD)
        return DownloadFile();
    else if (GetResult() == NOT_AVAILABLE)
        return ReportNotAvailable();
    else if (GetResult() == ERROR)
        return ReportError();
}
16. if-else-if
void GetAndProcessResult() {
    const int result = GetResult();
    if (result == DOWNLOADED)
        return ProcessDownloadedFile();
    else if (result == NEEDS_DOWNLOAD)
        return DownloadFile();
    else if (result == NOT_AVAILABLE)
        return ReportNotAvailable();
    else if (result == ERROR)
        return ReportError();
}
17. From if-else-if to switch!
void GetAndProcessResult() {
    switch (GetResult()) {
        case DOWNLOADED:
            return ProcessDownloadedFile();
        case NEEDS_DOWNLOAD:
            return DownloadFile();
        case NOT_AVAILABLE:
            return ReportNotAvailable();
        case ERROR:
            return ReportError();
    }
}
18. The joys of switch
Clarifies intention
Clearer warnings / error messages
Allows the compiler to create a jump table or do a binary search
O(1) lookups with a jump table, O(log n) with a binary search
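The "clearer warnings" point can be sketched in a self-contained form. The `Result` enum and the `Describe` function below are assumptions made for illustration; the slides never show how the result codes are defined. With gcc or clang and `-Wall` (which enables `-Wswitch`), deleting one of the `case` labels should produce a warning that names the unhandled enumerator, something an if-else-if chain cannot offer.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type; the slides do not show its definition.
enum class Result { Downloaded, NeedsDownload, NotAvailable, Error };

// No default label on purpose: that lets gcc and clang warn (-Wswitch)
// when an enumerator is added to Result but not handled here.
std::string Describe(Result result) {
    switch (result) {
        case Result::Downloaded:    return "process downloaded file";
        case Result::NeedsDownload: return "download file";
        case Result::NotAvailable:  return "report not available";
        case Result::Error:         return "report error";
    }
    // Only reachable if an out-of-range value was cast to Result.
    assert(false && "unhandled Result value");
    return "unknown";
}
```

The enumerators take the consecutive values 0 to 3, which is the dense layout that makes a jump table attractive to the compiler.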
19. Jump table
void GetAndProcessResult() {
    switch (GetResult()) {
        case DOWNLOADED:
            return ProcessDownloadedFile();
        case NEEDS_DOWNLOAD:
            return DownloadFile();
        case NOT_AVAILABLE:
            return ReportNotAvailable();
        case ERROR:
            return ReportError();
    }
}
20. Jump table
void GetAndProcessResult() {
    switch (GetResult()) {
        case 0:
            return ProcessDownloadedFile();
        case 1:
            return DownloadFile();
        case 2:
            return ReportNotAvailable();
        case 3:
            return ReportError();
    }
}
21. Jump table
case 0: return ProcessDownloadedFile();
case 1: return DownloadFile();
case 2: return ReportNotAvailable();
case 3: return ReportError();
23. Jump table
void GetAndProcessResult() {
    switch (GetResult()) {
        case 102:
            return ProcessDownloadedFile();
        case 103:
            return DownloadFile();
        case 104:
            return ReportNotAvailable();
        case 105:
            return ReportError();
    }
}
24. Jump table
void GetAndProcessResult() {
    switch (GetResult()) {
        case 102+0:
            return ProcessDownloadedFile();
        case 102+1:
            return DownloadFile();
        case 102+2:
            return ReportNotAvailable();
        case 102+3:
            return ReportError();
    }
}
25. Jump table?
void GetAndProcessResult() {
    switch (GetResult()) {
        case 1:
            return ProcessDownloadedFile();
        case 16:
            return DownloadFile();
        case 88:
            return ReportNotAvailable();
        case 65536:
            return ReportError();
    }
}
26. Binary search, not a jump table
void GetAndProcessResult() {
    switch (GetResult()) {
        case 1:
            return ProcessDownloadedFile();
        case 16:
            return DownloadFile();
        case 88:
            return ReportNotAvailable();
        case 65536:
            return ReportError();
    }
}
Compilers are smart
27. Predicting branches
Predicting branches is hard
Automated mechanisms (profile-guided optimisations) can offer big gains at the cost of having to profile your build
If you’re very certain of your case, some compilers offer built-in functions such as __builtin_expect (gcc, clang)
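A minimal sketch of how `__builtin_expect` is typically wrapped. The `UNLIKELY` macro and the `SumValid` function are hypothetical names chosen for this example, and the claim that negative values are rare is an assumption you would have to confirm by measuring. The hint only influences how the compiler lays out the branches; the result is the same with or without it.

```cpp
#include <cassert>
#include <cstddef>

// __builtin_expect is a gcc/clang built-in; fall back to a no-op elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(condition) __builtin_expect(!!(condition), 0)
#else
#define UNLIKELY(condition) (condition)
#endif

// Hypothetical example: sums the non-negative values and counts the
// negative ones, which we tell the compiler to treat as the rare case.
long SumValid(const int* values, std::size_t count, std::size_t* rejected) {
    assert(values != nullptr && rejected != nullptr);
    long sum = 0;
    *rejected = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (UNLIKELY(values[i] < 0)) {
            ++*rejected;  // cold path
            continue;
        }
        sum += values[i];  // hot path
    }
    return sum;
}
```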
28. Strings
Most used and mis-used type in programming
Mutable strings are the root of all evil
29. Strings misuse
String is not a basic type
A mutable string is a dynamic array of characters
Almost anything you can do with a string is a
function of the characters in that string
Think about what will happen with long strings
30. Using std::string
Be careful with modifying operations such as append()
Avoid building a string out of many parts; it is better to create it at once
Look into when alternative string types are useful
31. Growing strings
std::string CopyString(
        const char* to_copy, size_t length) {
    std::string copied;
    for (size_t i = 0; i < length; i += BLOCKSIZE)
        copied.append(to_copy + i,
                      std::min(BLOCKSIZE, length - i));
    return copied;
}
32. Growing strings
std::string CopyString(
        const char* to_copy, size_t length) {
    std::stringstream copied;
    for (size_t i = 0; i < length; i += BLOCKSIZE)
        copied.write(to_copy + i,
                     std::min(BLOCKSIZE, length - i));
    return copied.str();
}
33. Growing strings
std::string CopyString(
        const char* to_copy, size_t length) {
    std::string copied;
    copied.reserve(length);
    for (size_t i = 0; i < length; i += BLOCKSIZE)
        copied.append(to_copy + i,
                      std::min(BLOCKSIZE, length - i));
    return copied;
}
34. Growing strings
Method                                              Time spent, 3 run average (ms)
std::string::append()                               1399
std::stringstream                                   5102
std::string::append() with std::string::reserve()   851
35. Converting numbers to strings and vice versa
Can be a major source of slowness
Often more features than needed
Investigate alternative libraries (boost::spirit)
Writing specialised functions is a possibility (but with its own maintainability issues)
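To make "specialised functions" concrete, here is a rough sketch of a converter pair that supports only base 10 and `unsigned` values, with no locale handling, padding or sign. `UnsignedToString` and `StringToUnsigned` are hypothetical names, not taken from any library. Whether this beats your standard library is something to measure, and the maintainability cost mentioned above is visible in the hand-written overflow check.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical specialised converter: base 10, unsigned only.
std::string UnsignedToString(unsigned value) {
    // digits10 + 1 is the maximum number of decimal digits of an unsigned.
    char buffer[std::numeric_limits<unsigned>::digits10 + 1];
    char* end = buffer + sizeof(buffer);
    char* pos = end;
    do {  // fill the buffer from the back, least significant digit first
        *--pos = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    return std::string(pos, end);  // string is created at once, sized exactly
}

// Parses a non-empty string of decimal digits.
// Returns false on any other character or on overflow.
bool StringToUnsigned(const std::string& text, unsigned* value) {
    assert(value != nullptr);
    if (text.empty())
        return false;
    unsigned result = 0;
    for (char ch : text) {
        if (ch < '0' || ch > '9')
            return false;
        const unsigned digit = static_cast<unsigned>(ch - '0');
        // result * 10 + digit would exceed the maximum unsigned value.
        if (result > (std::numeric_limits<unsigned>::max() - digit) / 10)
            return false;
        result = result * 10 + digit;
    }
    *value = result;
    return true;
}
```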
44. Avoiding virtual functions or virtual function calls
Only declare functions (this includes destructors) virtual when it’s actually needed
Don’t use virtual functions for types that are handled by value
If type is known, no lookup is needed
Sometimes compile-time polymorphism offers an alternative
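One possible shape of the compile-time alternative is a function template over types that share a member function by convention rather than through a base class. `Square`, `Rectangle` and `TotalArea` are hypothetical names for this sketch. It only applies when each container holds a single concrete type; mixed collections still need run-time dispatch of some kind.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shapes with no common base class and no virtual functions.
struct Square {
    double side;
    double Area() const { return side * side; }
};

struct Rectangle {
    double width;
    double height;
    double Area() const { return width * height; }
};

// Shape is resolved at compile time, so Area() is an ordinary call that
// the compiler can inline; there is no vtable lookup.
template <typename Shape>
double TotalArea(const Shape* shapes, std::size_t count) {
    assert(shapes != nullptr || count == 0);
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i)
        total += shapes[i].Area();
    return total;
}
```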
45. Avoiding function calls
For small functions called in tight loops, inlining helps
Allow the compiler to inline functions where it makes sense (have definition available)
If the compiler doesn’t co-operate and you’re sure it makes sense (measure this), force it
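"Forcing it" is compiler-specific. The sketch below assumes gcc/clang's `always_inline` attribute and MSVC's `__forceinline` keyword; the `FORCE_INLINE` macro, `SquareOf` and `SumOfSquares` are hypothetical names. A function this small would most likely be inlined anyway once optimisation is enabled, so treat it as an illustration of the syntax rather than a recommended use.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical portability macro around the compiler-specific spellings.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

// The definition is visible at the call site, which inlining requires.
FORCE_INLINE long SquareOf(int value) {
    return static_cast<long>(value) * value;
}

long SumOfSquares(const int* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    long sum = 0;
    for (std::size_t i = 0; i < count; ++i)
        sum += SquareOf(values[i]);  // small function called in a tight loop
    return sum;
}
```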
46. Tail calls
A tail call is a function call that is the final action of another function
Tail calls can be eliminated, so that they end up being a jump
Eliminates call overhead
Be aware of this and create tail calls where possible
Also allows efficient recursive functions
47. Facilitating tail calls
unsigned djb_hash(const char* string) {
    int c = *string;
    if (!c)
        return 5381;
    return djb_hash(string + 1) * 33 + c;
}
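In the function above the multiplication and addition happen after the recursive call returns, so that call is not in tail position. One way to restructure it, sketched here under the assumption that the hash value must stay identical, is to carry the running sum and the current power of 33 as extra parameters. `djb_hash_tail` and its `sum` and `weight` parameters are hypothetical names for this example. Whether the compiler then turns the call into a jump depends on the compiler and the optimisation level.

```cpp
#include <cassert>

// The version from the slide: work remains after the recursive call,
// so the call cannot be replaced by a jump.
unsigned djb_hash(const char* string) {
    int c = *string;
    if (!c)
        return 5381;
    return djb_hash(string + 1) * 33 + c;
}

// Hypothetical tail-recursive variant producing the same value.
// sum accumulates c0 + c1 * 33 + c2 * 33^2 + ..., weight is the current
// power of 33; the seed 5381 is applied once, at the end of the string.
unsigned djb_hash_tail(const char* string,
                       unsigned sum = 0, unsigned weight = 1) {
    assert(string != nullptr);
    int c = *string;
    if (!c)
        return sum + 5381u * weight;
    // Nothing is left to do after this call: it is a tail call.
    return djb_hash_tail(string + 1,
                         sum + static_cast<unsigned>(c) * weight,
                         weight * 33u);
}
```

Unsigned arithmetic wraps around in the same way in both versions, so the results agree even when the intermediate values overflow.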
49. Facilitating tail calls
Method Time spent, 3 run average (ms)
Tail call elimination not possible 2274
Tail call elimination possible 1097
50. Use lambda functions
Each C++11 lambda has its own unique type, so calls to it are usually easy for the compiler to inline, unlike calls through function pointers
Offers an elegant and fast way of processing data
Combines well with aggregate functions
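A small sketch of the difference, using `std::count_if` as the aggregate function. `IsEven`, `CountWithPointer` and `CountEvenWithLambda` are hypothetical names. With the function pointer the algorithm is instantiated for the type `bool (*)(int)`, which says nothing about which function is called; with the lambda it is instantiated for a closure type that has exactly one call operator.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

bool IsEven(int value) { return value % 2 == 0; }

// Function pointer: the callee is a run-time value, so the call inside
// std::count_if can only be inlined if the compiler sees through the pointer.
long CountWithPointer(const std::vector<int>& data, bool (*predicate)(int)) {
    assert(predicate != nullptr);
    return std::count_if(data.begin(), data.end(), predicate);
}

// Lambda: the closure type is part of the template instantiation, so the
// compiler knows the exact call target at compile time.
long CountEvenWithLambda(const std::vector<int>& data) {
    return std::count_if(data.begin(), data.end(),
                         [](int value) { return value % 2 == 0; });
}
```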
53. Use lambda functions
Method Time spent, 3 run average (ms)
Function pointer (not inlined) 1684
Lambda function (inlined) 220
54. Return-value optimisation
Allows the compiler to avoid copy construction on temporaries
Typically performed by compilers when a function returns one named variable
Be aware of where it could be possible, and allow the compiler to help you
But sometimes it’s more helpful to implement…
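The "one named variable" pattern can be sketched as follows. `Tracked`, `MakeTracked` and `CheckNoCopyOnReturn` are hypothetical names; the copy counter exists only to make the effect observable. If the compiler does not elide the copy, C++11 still treats the returned local as an rvalue and uses the move constructor, so the copy constructor is not called either way.

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied.
struct Tracked {
    static int copies;
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked(Tracked&& other) noexcept : value(other.value) {}
};
int Tracked::copies = 0;

// A single named local is returned from the only return statement,
// which makes the function a candidate for return-value optimisation.
Tracked MakeTracked(int value) {
    Tracked result(value);
    result.value += 1;
    return result;
}

void CheckNoCopyOnReturn() {
    Tracked::copies = 0;
    Tracked made = MakeTracked(41);
    assert(made.value == 42);
    assert(Tracked::copies == 0);  // elided or moved, never copied
}
```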
55. Move semantics
User defines for movable types how they can be moved correctly
‘Guaranteed’ way of getting return value optimisation
Helpful in combination with std::vector (to keep data local)
Can come for free using “Rule of zero”
56. Move semantics
class Typical {
public:
    Typical()
        : content_("this is a typical string") {}
    Typical(const Typical& other)
        : content_(other.content_) {}
private:
    std::string content_;
};
57. Move semantics
class TypicalMove {
public:
    TypicalMove()
        : content_("this is a typical string") {}
private:
    std::string content_;
};
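Why the second class is cheaper to move can be shown with a member that records how it was transferred. `Probe`, `WithCopyConstructor`, `RuleOfZero` and the two check functions are hypothetical names standing in for the classes above. Declaring a copy constructor suppresses the implicitly generated move constructor, so `std::move` on such an object quietly falls back to copying.

```cpp
#include <cassert>
#include <utility>

// Hypothetical member that records whether it was copied or moved.
struct Probe {
    static int copies;
    static int moves;
    Probe() {}
    Probe(const Probe&) { ++copies; }
    Probe(Probe&&) noexcept { ++moves; }
};
int Probe::copies = 0;
int Probe::moves = 0;

// User-declared copy constructor: no implicit move constructor exists.
class WithCopyConstructor {
public:
    WithCopyConstructor() {}
    WithCopyConstructor(const WithCopyConstructor& other)
        : probe_(other.probe_) {}
private:
    Probe probe_;
};

// Rule of zero: no special members declared, so the compiler generates
// both copy and move operations from the members.
class RuleOfZero {
private:
    Probe probe_;
};

void CheckCopyConstructorBlocksMove() {
    Probe::copies = Probe::moves = 0;
    WithCopyConstructor source;
    WithCopyConstructor target(std::move(source));  // binds to the copy constructor
    (void)target;
    assert(Probe::copies == 1 && Probe::moves == 0);
}

void CheckRuleOfZeroMoves() {
    Probe::copies = Probe::moves = 0;
    RuleOfZero source;
    RuleOfZero target(std::move(source));  // uses the implicit move constructor
    (void)target;
    assert(Probe::copies == 0 && Probe::moves == 1);
}
```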
59. Move semantics
Method Time spent, 3 run average (ms)
With copy constructor 2617
Following “Rule of zero” 1002
60. Data
Make sure that all data you need in a loop is physically as close together as possible
Allows CPU to use its cache efficiently
Use contiguous memory arrays where possible
Avoid data structures that rely on pointers (e.g. linked lists)
61. Data
int sum() {
    std::forward_list<int> data(1024, 5);
    int result = 0;
    for (int i = 0; i < 1000000; ++i) {
        result = std::accumulate(
            data.begin(), data.end(), 0);
    }
    return result;
}
62. Data
int sum() {
    std::vector<int> data(1024, 5);
    int result = 0;
    for (int i = 0; i < 1000000; ++i) {
        result = std::accumulate(
            data.begin(), data.end(), 0);
    }
    return result;
}
65. Python
Emphasises readability
Dynamic type system, automatic memory management
Several projects dedicated to improving performance
Always try to avoid calling functions many times
71. Prefer slice notation over “copy constructor”
Method Time spent, 3 run minimum (ms)
Copy via list() 2671
Slice notation 1679
72. All functions have overhead
Function call overhead in Python is substantial
All functions can be redefined - even built-ins need to be looked up first
Try to avoid function calls (even more so than in C++)
Using literals or other built-in constructs can help avoid function calls
73. String formatting
Python has a built-in function str() to convert other types to string
In most cases this offers enough features for conversions of types to strings
Faster than formatting
80. Prefer aggregate functions
Method Time spent, 3 run minimum (ms)
Summing manually 1728
Using sum() 587
81. Prefer aggregate functions
Python has a number of built-in functions for aggregates: all(), min(), max(), sum(), etc.
Using them brings big speed advantages
Usually preferable to manually iterating
92. Conclusions
Optimising is fun!
Knowledge about optimisations can help you help your compiler or interpreter
Not all optimisations worsen maintainability
Micro-optimisations can differ between languages, compilers, architectures… Measuring works!
Test your assumptions