ACCU 2015 Premature Optimisation Workshop

PROJECT: PREMATURE OPTIMISATION WORKSHOP
DATE: 23 APRIL
CONFERENCE: ACCU 2015
ARJAN VAN LEEUWEN

WWW.OPERA.COM
JOIN THE COOL KIDS ON THE INFORMATION SUPERHIGHWAY

OPTIMISING IS FUN
AND KNOWING HOW TO DO IT CAN BE USEFUL

PREMATURE
OPTIMISATION
IS THE ROOT OF
ALL EVIL
DONALD KNUTH,
“STRUCTURED
PROGRAMMING WITH GOTO
STATEMENTS”

“PROGRAMMERS WASTE ENORMOUS AMOUNTS OF
TIME THINKING ABOUT, OR WORRYING ABOUT, THE
SPEED OF NONCRITICAL PARTS OF THEIR
PROGRAMS, AND THESE ATTEMPTS AT
EFFICIENCY ACTUALLY HAVE A STRONG NEGATIVE
IMPACT WHEN DEBUGGING AND MAINTENANCE
ARE CONSIDERED.
WE SHOULD FORGET ABOUT SMALL EFFICIENCIES,
SAY ABOUT 97% OF THE TIME: PREMATURE
OPTIMISATION IS THE ROOT OF ALL EVIL.
YET WE SHOULD NOT PASS UP OUR
OPPORTUNITIES IN THAT CRITICAL 3%.”

“IN ESTABLISHED ENGINEERING
DISCIPLINES A 12%
IMPROVEMENT, EASILY OBTAINED,
IS NEVER CONSIDERED MARGINAL
AND I BELIEVE THE SAME
VIEWPOINT SHOULD PREVAIL IN
SOFTWARE ENGINEERING.”

SMALL THINGS CAN MAKE A DIFFERENCE
AND ARE WORTH STUDYING

Goals
Find small changes that can
make a difference

Don’t sacrifice elegance for
speed

Give ideas on how to
optimise

In the toolbox
Common sense (doing
nothing is always faster)

Disassembler

Time measurement

Profiling tools

C++
Close to the metal

Object model well-defined [Lippman96]

Efficiency has been a major design goal for C++
from the beginning

“You don’t pay for what you don’t use”

Benefits from years of C optimisation experience

Branches
Basis of much we do in
imperative languages

Compare and branch

if-else-if
void GetAndProcessResult() {
  if (GetResult() == DOWNLOADED)
    return ProcessDownloadedFile();
  else if (GetResult() == NEEDS_DOWNLOAD)
    return DownloadFile();
  else if (GetResult() == NOT_AVAILABLE)
    return ReportNotAvailable();
  else if (GetResult() == ERROR)
    return ReportError();
}

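if-else-if
void GetAndProcessResult() {
  // Call GetResult() once instead of once per comparison
  const int result = GetResult();
  if (result == DOWNLOADED)
    return ProcessDownloadedFile();
  else if (result == NEEDS_DOWNLOAD)
    return DownloadFile();
  else if (result == NOT_AVAILABLE)
    return ReportNotAvailable();
  else if (result == ERROR)
    return ReportError();
}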

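Not if-else-if: switch!
void GetAndProcessResult() {
  switch (GetResult()) {
    case DOWNLOADED:     return ProcessDownloadedFile();
    case NEEDS_DOWNLOAD: return DownloadFile();
    case NOT_AVAILABLE:  return ReportNotAvailable();
    case ERROR:          return ReportError();
  }
}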

The joys of switch
Clarifies intention

Clearer warnings / error messages

Always allows the compiler to create a jump table or do a
binary search

O(1) lookups with a jump table, O(log n) with a binary search

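Jump table
void GetAndProcessResult() {
  switch (GetResult()) {
    case DOWNLOADED:     return ProcessDownloadedFile();
    case NEEDS_DOWNLOAD: return DownloadFile();
    case NOT_AVAILABLE:  return ReportNotAvailable();
    case ERROR:          return ReportError();
  }
}

Jump table
With the enumerators numbered 0 to 3, the case value indexes a table of jump targets directly:
    case 0: return ProcessDownloadedFile();
    case 1: return DownloadFile();
    case 2: return ReportNotAvailable();
    case 3: return ReportError();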

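Jump table
Case values that are consecutive but do not start at zero:
    case 102:
    case 103:
    case 104:
    case 105:

Jump table
Subtracting the lowest value gives a table index again:
    case 102+0:
    case 102+1:
    case 102+2:
    case 102+3: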

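Jump table?
    case 1:
    case 16:
    case 88:
    case 65536:
Values this sparse would need a very large, mostly empty table

Binary search, not a jump table
    case 1:
    case 16:
    case 88:
    case 65536:
The compiler can search the sorted case values instead

Compilers are smart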

Predicting branches
Predicting branches is hard

Automated mechanisms (profile-guided
optimisations) can offer big gains at the cost of
having to profile your build

If you’re very certain of your case, some compilers
offer hints such as __builtin_expect
(gcc, clang)

Strings
Most used and mis-used
type in programming

Mutable strings are the root
of all evil

Strings misuse
String is not a basic type

A mutable string is a dynamic array of characters

Almost anything you can do with a string is a
function of the characters in that string

Think about what will happen with long strings

Using std::string
Be careful with modifying operations such as
append()

Avoid creating a string out of many parts, better to
create at once

Look into when alternative string types are useful

Growing strings
std::string CopyString(
    const char* to_copy, size_t length) {
  std::string copied;
  for (size_t i = 0; i < length; i += BLOCKSIZE)
    copied.append(to_copy + i,
                  std::min(BLOCKSIZE, length - i));
  return copied;
}

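Growing strings
std::string CopyString(
    const char* to_copy, size_t length) {
  std::stringstream copied;
  for (size_t i = 0; i < length; i += BLOCKSIZE)
    copied.write(to_copy + i,
                 std::min(BLOCKSIZE, length - i));
  return copied.str();
}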

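Growing strings
std::string CopyString(
    const char* to_copy, size_t length) {
  std::string copied;
  copied.reserve(length);  // allocate once, up front
  for (size_t i = 0; i < length; i += BLOCKSIZE)
    copied.append(to_copy + i,
                  std::min(BLOCKSIZE, length - i));
  return copied;
}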

Growing strings
Method                                              Time spent, 3 run average (ms)
std::string::append()                               1399
std::stringstream                                   5102
std::string::append() with std::string::reserve()   851

Converting numbers to
strings and vice versa
Can be a major source of slowness

Often more features than needed

Investigate alternative libraries (boost::spirit)

Writing specialised functions a possibility (but with
its own maintainability issues)

Integer-to-string conversion
// Variant 1: std::stringstream
std::string Convert(int i) {
  std::stringstream stream;
  stream << i;
  return stream.str();
}

// Variant 2: std::to_string
std::string Convert(int i) {
  return std::to_string(i);
}

// Variant 3: boost::spirit::karma
std::string Convert(int i) {
  namespace karma = boost::spirit::karma;
  std::string converted;
  std::back_insert_iterator<std::string>
      sink(converted);
  karma::generate(sink, karma::int_, i);
  return converted;
}

Method                 Time spent (ms)
std::stringstream      2959
std::to_string         1012
boost::spirit::karma   332

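String-to-integer conversion
int Convert(const std::string& str) {
  return std::stoi(str);
}

String-to-integer conversion
int Convert(const std::string& str) {
  namespace qi = boost::spirit::qi;
  int converted = 0;
  // qi::parse takes its first iterator by reference and advances it,
  // so it needs a named variable rather than a temporary
  auto first = str.begin();
  qi::parse(first, str.end(),
            qi::int_, converted);
  return converted;
}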

String-to-integer conversion
Method              Time spent (ms)
std::stoi           3920
boost::spirit::qi   1276

Function calls
Function calls have
overhead

Lookup in virtual function
table

Setting up stack, restoring
stack

Avoiding virtual functions or
virtual function calls
Only declare functions (this includes destructors)
virtual when it’s actually needed

Don’t use virtual functions for types that are
handled by value

If type is known, no lookup is needed

Sometimes compile-time polymorphism offers an
alternative

Avoiding function calls
For small functions called in tight loops, inlining
helps

Allow the compiler to inline functions where it
makes sense (have definition available)

If the compiler doesn’t co-operate and you’re sure
it makes sense (measure this), force it

Tail calls
A tail call happens when a function call is the final
action of another function

Tail calls can be eliminated, so that they end up
being a jump construction

Eliminates call overhead

Be aware of this and create tail calls where possible

Also allows efficient recursive functions

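Facilitating tail calls
// Not a tail call: the multiply and add happen after the recursive call returns
unsigned djb_hash(const char* string) {
  int c = *string;
  if (!c)
    return 5381;
  return djb_hash(string + 1) * 33 + c;
}

// Tail call: the recursive call is the last thing the function does
unsigned djb_hash(
    const char* string, unsigned seed) {
  int c = *string;
  if (!c)
    return seed;
  return djb_hash(
      string + 1, seed * 33 + c);
}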

Method                               Time spent (ms)
Tail call elimination not possible   2274
Tail call elimination possible       1097

Use lambda functions
A C++11 lambda has its own type, so the compiler knows the call
target and can readily inline it, unlike a call through a function pointer

Offers an elegant and fast way of processing data

Combines well with aggregate functions

void twice(int& value) {
  value *= 2;
}
std::vector<int> EverythingTwice(
    const std::vector<int>& original) {
  std::vector<int> result(original);
  std::for_each(result.begin(), result.end(),
                &twice);
  return result;
}

std::vector<int> EverythingTwice2(
    const std::vector<int>& original) {
  std::vector<int> result(original);
  std::for_each(result.begin(), result.end(),
                [](int& value){ value *= 2; });
  return result;
}

Method                           Time spent (ms)
Function pointer (not inlined)   1684
Lambda function (inlined)        220

Return-value optimisation
Allows the compiler to avoid copy construction on
temporaries

Executed by compilers when function returns one
named variable

Be aware of where it could be possible, allow the
compiler to help you

But sometimes it’s more helpful to implement…

Move semantics
User defines for movable types how they can be
moved correctly

‘Guaranteed’ way of getting return value
optimisation

Helpful in combination with std::vector (to keep
data local)

Can come for free using “Rule of zero”

Move semantics
class Typical {
 public:
  Typical()
      : content_("this is a typical string") {}
  // A user-declared copy constructor suppresses the implicit move constructor
  Typical(const Typical& other)
      : content_(other.content_) {}
 private:
  std::string content_;
};

Move semantics
// Rule of zero: copy and move operations are generated by the compiler
class TypicalMove {
 public:
  TypicalMove()
      : content_("this is a typical string") {}
 private:
  std::string content_;
};

Move semantics
std::vector<Typical> CreateTypical() {
  std::vector<Typical> new_content;
  for (int i = 0; i < 1024; ++i)
    new_content.push_back(Typical());
  return new_content;
}

Move semantics
Method                     Time spent (ms)
With copy constructor      2617
Following “Rule of zero”   1002

Data
Make sure that all data you
need in a loop is physically
as close together as possible

Allows CPU to use its cache
efficiently

Use contiguous memory
arrays where possible

Avoid data structures that
rely on pointers (e.g. linked
lists)

Data
int sum() {
  std::forward_list<int> data(1024, 5);
  int result = 0;
  for (int i = 0; i < 1000000; ++i) {
    result = std::accumulate(
        data.begin(), data.end(), 0);
  }
  return result;
}

Data
int sum() {
  std::vector<int> data(1024, 5);
  int result = 0;
  for (int i = 0; i < 1000000; ++i) {
    result = std::accumulate(
        data.begin(), data.end(), 0);
  }
  return result;
}

Data
Method              Time spent (ms)
std::forward_list   1115
std::vector         61

Python
Emphasises readability

Dynamic type system, automatic memory
management

Several projects dedicated to improving
performance

Always try to avoid calling functions many times

Prefer literals over “constructors”
def a():
    return dict(firstkey=1, secondkey=2)

def b():
    return { 'firstkey': 1, 'secondkey': 2 }

Prefer literals over “constructors”
Method                Time spent, 3 run minimum (ms)
dict()                376
Dictionary literals   135
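
The slides do not say how the timings were taken. A minimal sketch of one way to reproduce a "3 run minimum" comparison, assuming the standard timeit module; the function names, the helper best_of and the iteration count are hypothetical, and the numbers will differ per machine and interpreter:

```python
import timeit


def make_with_constructor():
    return dict(firstkey=1, secondkey=2)


def make_with_literal():
    return {'firstkey': 1, 'secondkey': 2}


def best_of(func, repeat=3, number=100_000):
    """Return the fastest of `repeat` timing runs, in milliseconds."""
    # timeit.repeat returns one total (in seconds) per run of `number` calls.
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) * 1000.0


if __name__ == "__main__":
    for func in (make_with_constructor, make_with_literal):
        print(f"{func.__name__}: {best_of(func):.1f} ms")
```

Taking the minimum rather than the average is a common choice for micro-benchmarks, because slower runs mostly reflect interference from other processes.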

Prefer slice notation over “copy constructor”
l = [ 'a', 'b', 'c', 'd', 'e', 'f' ]

def a():
    return list(l)

def b():
    return l[:]

Method            Time spent (ms)
Copy via list()   2671
Slice notation    1679

All functions have overhead
Function call overhead in Python is substantial

All functions can be redefined - even built-ins need
to be looked up first

Try to avoid function calls (even more so than in
C++)

Using literals or other built-in constructs can help
avoid function calls
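
The point that built-ins are looked up by name on every call can be illustrated with a small sketch. Both function names are hypothetical; the helper temporarily binds a module-level name len to show that the caller picks up whatever the name refers to at call time:

```python
def count_items(values):
    # `len` is not bound when this function is compiled. Each call looks the
    # name up in the module globals first, then in the builtins module.
    return len(values)


def count_with_shadowed_len(values):
    # Hypothetical helper: temporarily bind a module-level `len` so that
    # count_items finds it before it reaches the built-in.
    globals()["len"] = lambda _values: -1
    try:
        return count_items(values)
    finally:
        del globals()["len"]  # the built-in is found again afterwards
```

Because the name can be rebound at any time, the interpreter cannot assume that len means the built-in, which is part of the cost that a literal or other built-in construct avoids.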

String formatting
Python has a built-in function str() to convert
other types to string

In most cases this offers enough features for
conversions of types to strings

Faster than formatting

String formatting
def a():
    a = 5
    b = 2
    c = 3
    return "%d" % (a*(b+c))

def b():
    a = 5
    b = 2
    c = 3
    return str(a*(b+c))

def c():
    a = 5
    b = 2
    c = 3
    return "%s" % (a*(b+c))

String formatting
Method   Time spent (ms)
"%d"     514
str()    260
"%s"     233
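
The three spellings are interchangeable for integers but not for every type, which is worth checking before swapping one for another. A small sketch (the function name is hypothetical) that returns all three renderings side by side:

```python
def format_three_ways(value):
    """Return the "%d", str() and "%s" renderings of value."""
    # "%d" converts the value to an integer first; str() and "%s" do not.
    return "%d" % value, str(value), "%s" % value
```

For an integer such as 40 all three results are the same string, while for 3.7 the "%d" form truncates to "3" and the other two give "3.7".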

Prefer aggregate functions
def a():
    s = 0
    for i in range(50000):
        s += i
    return s

def b():
    return sum(range(50000))

Method             Time spent (ms)
Summing manually   1728
Using sum()        587

Python has a number of built-in functions for
aggregates: all(), min(), max(), sum(), etc

Using them brings big speed advantages

Always preferred over manually iterating
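
The same idea applies to the other aggregates listed above. A sketch comparing a hand-written loop with min(), max() and sum(); the function names are hypothetical, and the relative speed should be measured rather than assumed:

```python
def summary_manual(values):
    """Smallest, largest and total of a non-empty list, in one explicit loop."""
    smallest = largest = values[0]
    total = 0
    for value in values:
        if value < smallest:
            smallest = value
        if value > largest:
            largest = value
        total += value
    return smallest, largest, total


def summary_builtin(values):
    """The same result using the built-in aggregate functions."""
    # Three passes over the data, but each loop runs inside the interpreter
    # rather than as Python bytecode.
    return min(values), max(values), sum(values)
```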

Use list comprehensions
def a():
    l = []
    for i in range(1000):
        l.append(i)
    return l

def b():
    return [i for i in range(1000)]

Method               Time spent (ms)
Append to list       701
List comprehension   321

List comprehensions offer a concise way of
creating lists

Speed as well as readability advantages

Can be nested as well!
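
As an illustration of nesting, a sketch with two hypothetical helpers: one comprehension with two for clauses, and one comprehension inside another. The second assumes all rows have the same length:

```python
def flatten(rows):
    # Two `for` clauses in one comprehension: outer loop first, inner second.
    return [value for row in rows for value in row]


def transpose(rows):
    # A comprehension nested inside another: one new row per column index.
    return [[row[i] for row in rows] for i in range(len(rows[0]))]
```

For rows = [[1, 2, 3], [4, 5, 6]], flatten(rows) gives [1, 2, 3, 4, 5, 6] and transpose(rows) gives [[1, 4], [2, 5], [3, 6]].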

Don’t use optimisations from other languages
def a():
    x = 1
    x = x + x
    return x

def b():
    x = 1
    x = x * 2
    return x

def c():
    x = 1
    x = x << 1
    return x

Method   Time spent (ms)
x + x    736
x * 2    1001
x << 1   1342

LET’S TRY IT
PREPARE YOUR LAPTOPS!

PYTHON: WWW.CYBER-DOJO.ORG
E94905
C++:
git clone https://github.com/avl7771/premature_optimization.git

Conclusions
Optimising is fun!

Knowledge about optimisations can help you help
your compiler or interpreter

Not all optimisations worsen maintainability

Micro-optimisations can differ between languages,
compilers, architectures… Measuring works!

Test your assumptions
