Optimizing Performance in Qt-Based Applications

11/16/09

Introduction

• Bjørn Erik Nilsen
– Software Engineer / Qt Widget Team

– The architect behind Alien Widgets

– Rewrote the Backing Store for Qt 4.5

– One of the guys implementing WidgetsOnGraphicsView

– Author of QMdiArea/QMdiSubWindow

– Author of QGraphicsEffect/QGraphicsEffectSource

2

Agenda

• Why Performance Matters

• Performance Improvements in Qt 4.6

• How You Can Improve Performance

3

Why Performance Matters

• Attractive to users

• Looks more professional

• Helps you get things done more efficiently

• Keeps the flow

4

• An example explains more than a thousand words

5


• Performance is more important than ever before
– Dynamic user interfaces

• Qt Everywhere
– Desktop
– Embedded platforms with limited hardware

• We cannot just buy better hardware anymore

• Clock speed vs. number of cores

6


• Not all applications can take advantage of multiple cores

• And some will actually run slower:
– Each core in the processor is slower

– Most applications are not programmed to be multi-threaded

• Multi-core crisis?

7


Performance Improvements in Qt 4.6
• We continuously strive to optimize the performance

– QWidget painting performance, for example:

– Qt 4.6 is no exception!
9


[Diagram of Qt modules: QtCore, QtGui, QtNetwork, QtScript, QtSvg, QtWebKit, QtOpenGL, QtOpenVG]

10

QtGui

• Graphics View

– New update mechanism

– New painting algorithm

– New scene indexing

– Reduced QTransform/QVariant/floating point overhead

• QPixmapCache

– Extended with an int based API

11

QtGui

• Item Views
– Item selection
– Drag 'n' drop
– QTableView and QHeaderView

• QTransform
– fromTranslate/fromScale
– mapRect for projective transforms

• QRegion
– No longer a GDI object on Windows

12

QtCore

• QObject
– Destruction
– Connect and disconnect
– Signal emission

• QVariant
– Construction from float and pointers

• QIODevice
– Less (re)allocations in readAll()

13

QtNetwork

• QNetworkAccessManager
– HTTP back-end

• QHttpNetworkConnectionChannel
– Pipelining HTTP requests (off by default)

• QHttpNetworkConnection
– Increased the number of concurrent connections

• QLocalSocket
– New Windows implementation
– Major performance improvements

14


QtScript

• QtScript now uses JavaScriptCore as the back-end!
– Still the same API, but with JSC performance

15

QtOpenGL

• New OpenGL 2.x paint engine

• General improvements

– Clipping
– Text drawing

16

QtOpenVG (new module!)

• New OpenVG paint engine

– Uses Khronos EGL API
– Configure Qt with “-openvg”

• Support for hardware-accelerated 2D vector graphics on:
– Embedded, mobile and consumer electronic devices
– Desktop

• More info: http://labs.trolltech.com/blogs

17


Embedded

• Improved support for DirectFB
– Enabling hardware graphics acceleration on embedded platforms

• Maemo Harmattan optimizations

18


How You Can Improve Performance
• Theory of Constraints (TOC) by Eliyahu M. Goldratt
• The theory is based on the idea that in any complex
system, there is usually one aspect of that system that
limits its ability to achieve its goal or optimal
functioning. To achieve any significant improvement of
the system, the constraint must be identified and
resolved.

• An application only performs as fast as its bottleneck allows

20

Theory of Constraints
• Define a goal:
– For example: This application must run at 30 FPS

• Then:
1) Identify the constraint
2) Decide how to exploit the constraint
3) Improve
4) If goal not reached, go back to 1)
5) Done

21

Identifying hot spots (1)
• The number one and most important task

• Make sure you have plausible data

• Don't randomly start looking for slow code paths!
– An O(n²) algorithm isn't necessarily bad
– Don't spend time on making it O(n log n) just for fun

• Don't spend time on optimizing bubble sort

22

• “Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that is where the bottleneck is” -- Rob Pike

23

• The right approach for identifying hot spots:

– Any profiler suitable for your platform
• Shark (Mac OS X)
• Valgrind (X11)
• Visual Studio Profiler (Windows)
• Embedded Trace Macrocell (ETM) (ARM devices)
• NB! Always profile in release mode

24

• Run application: “valgrind --tool=callgrind ./application”

• This will collect data and information about the program

• Data saved to file: callgrind.out.<pid>

• Beware:
– I/O costs won't show up
– Cache misses won't show up either, unless you run with --simulate-cache=yes
• The next step is to analyze the data/profile
• Example

25

• Profiling a section of code (run with “--instr-atstart=no”):

#include <valgrind/callgrind.h>

int myFunction()
{
    // Start collecting data here instead of at program start
    CALLGRIND_START_INSTRUMENTATION;

    int number = 10;
    ...

    // Stop collecting and write out the data gathered so far
    CALLGRIND_STOP_INSTRUMENTATION;
    CALLGRIND_DUMP_STATS;

    return number;
}

26

• When a hot-spot is identified:
– Look at the code and ask yourself: Is this the right
algorithm for this task?

• Once the best algorithm is selected, you can exploit the
constraint

27

How to exploit the constraint (2)
• Optimize
– Design level
– Source code level
– Compile level

• Optimization trade-offs:
– Memory consumption, cache misses
– Code clarity and conciseness

28

• “Any intelligent fool can
make things bigger,
more complex, and more
violent. It takes a touch
of genius – and a lot of
courage – to move in the
opposite direction.”
--Einstein

29

• Wouldn't it be great to have a cross-platform tool to
measure performance?

30

QTestLib
• Say hello to QBENCHMARK

• Extension to the QTestLib framework

• Cross-platform

• Straightforward: QBENCHMARK { <code here> }

• Code will then be measured based on
– Walltime (default)
– CPU tick counter (-tickcounter)
– Valgrind/Callgrind (-callgrind)
– Event counter (-eventcounter)
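The default wall-time mode boils down to running the measured code repeatedly and dividing the elapsed time by the number of iterations. A minimal sketch of that idea in standard C++ follows. It is not QTestLib's implementation, and `measureWallTime` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of QTestLib): runs `code` `iterations`
// times and returns the average wall time per iteration in nanoseconds.
double measureWallTime(const std::function<void()> &code, int iterations)
{
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        code();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Wall time includes everything else the machine is doing, which is one reason the other measurement back-ends exist.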

31

QTestLib
• Let's create a benchmark

• Run with ./mytest -xml -o results.xml

• git clone git://gitorious.org/qt-labs/qtestlib-tools.git

• Visualize with
– Graph (generatereport results.xml)
– BMCompare (bmcompare results1.xml results2.xml)

• Now that we have a tool, it is easier to measure and decide which algorithm to use

32

• General tricks:
– Caching
– Delay a computation until the result is required
– Reduce computation in tight loops
– Compiler optimizations

• Optimization Techniques for Qt:
– Choose the right container
– Use implicit data sharing efficiently
– Discover the magic flags

33

Implicit data sharing in Qt
• Maximize resource usage and minimize copying

Object obj0; // Creates ObjectData

// Copies (share the same data)
Object obj1 = obj0, obj2 = obj0, obj3 = obj0;

[Diagram: Object 0, Object 1, Object 2 and Object 3 all point to the same ObjectData (shallow copies)]

34

• Data is only copied if someone modifies it:

[Diagram: Object 1 has been modified and now points to its own ObjectData (deep copy); Object 0, Object 2 and Object 3 still share the original ObjectData (shallow copies)]
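The copy-on-write behaviour can be sketched in standard C++ as follows. This is an illustration only: the `Object` class is hypothetical, and Qt's own classes use an atomic reference count rather than `std::shared_ptr`.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical copy-on-write class. Copying an Object copies only the
// pointer, so all copies share one std::string until one is modified.
class Object
{
public:
    Object() : d(std::make_shared<std::string>()) {}

    // Const access never copies.
    const std::string &text() const { return *d; }

    // Non-const access detaches first if the data is shared.
    void setText(const std::string &text)
    {
        detach();
        *d = text;
    }

    bool sharesDataWith(const Object &other) const { return d == other.d; }

private:
    void detach()
    {
        if (d.use_count() > 1)
            d = std::make_shared<std::string>(*d);   // deep copy
    }

    std::shared_ptr<std::string> d;
};
```

Only the object that is written to pays for the deep copy; the other copies keep sharing the original data.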

35

• How to avoid deep copies:
– Only use const operators and functions if possible
– Be careful with the foreach keyword
• For classes that are not implicitly shared:
– Always pass them around as const references
– Passing const references is a good habit in any case
• Examples

36

Original                          Optimized

T *readOnly = list[index];        T *readOnly = list.at(index);

QList<T>::iterator i;             QList<T>::const_iterator i;
i = list.begin();                 i = list.constBegin();

foreach (QString s, strings)      foreach (const QString &s, strings)

NB! QTransform is not implicitly shared!

void foo(QTransform t);           void foo(const QTransform &t);

37

• See the “Implicitly Shared Classes” documentation for a
complete list of implicitly shared classes in Qt

• http://doc.trolltech.com/4.6-snapshot/shared.html

• Note: All Qt containers are implicitly shared

38

Qt Containers

[Diagram of the Qt containers: QLinkedList, QList, QVector, QSet, QStack, QQueue, QHash, QMultiHash, QMap, QMultiMap]

39

Qt Containers

[Diagram: associative containers highlighted: QHash, QMap, QMultiHash, QMultiMap]

40

Qt Containers
[Diagram: sequential containers highlighted: QList, QVector, QLinkedList, QQueue, QStack (QSet also shown)]

41

Qt Containers

[Diagram: sequential containers, QList vs. QVector highlighted; also shown: QLinkedList, QQueue, QStack]

42

QVector<T>
• Items are stored contiguously in memory

• One block of memory is allocated:

QVector<T>::d points to a QVectorTypedData<T>:

Field:  ref              alloc  size  flags  array[0]  ...  array[alloc - 1]
Type:   QBasicAtomicInt  int    int   uint   T         ...  T

43

QVector<T>
• Reserves space at the end

• Growth strategy depends on the type T
– Movable types: realloc by increments of 4096 bytes
– Non-movable types: 50% increments
• What is a movable type?
– Primitive types: bool, int, char, enums, pointers, …
– Plain Old Data (POD) with no constructor/destructor
– Basically everything that can be moved around in
memory using memcpy() or memmove()
– Good article: http://www.ddj.com/cpp/184401508

44

Movable types
• User-defined classes are treated as non-movable by
default

• Oh no!

• Have no fear, Q_DECLARE_TYPEINFO is here

• You can tell Qt that your class is a:
– Q_PRIMITIVE_TYPE: POD with no constr./destr.
– Q_MOVABLE_TYPE: has constr./destr., but can be
moved in memory using memcpy()/memmove()

45

Movable types (Q_PRIMITIVE_TYPE)

struct Point2D
{
    int x;
    int y;
};

Q_DECLARE_TYPEINFO(Point2D, Q_PRIMITIVE_TYPE);

46

Movable types (Q_MOVABLE_TYPE)

class Point2D
{
public:
    Point2D() { data = new int[2]; }
    Point2D(const Point2D &other) { … }
    ~Point2D() { delete [] data; }

    Point2D &operator=(const Point2D &other) { … }

    int x() const { return data[0]; }
    int y() const { return data[1]; }

private:
    int *data;
};

Q_DECLARE_TYPEINFO(Point2D, Q_MOVABLE_TYPE);

47

QVector<T>
• Insertion in the middle:
– Movable type: memmove()
– Non-movable type: operator=()

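[Diagram: inserting an item into a 7-item array (slots 0 to 6) shifts the following items one slot towards the end, giving 8 slots (0 to 7)]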
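For a movable type, the shift needed by an insertion in the middle is a single memmove() call. A minimal sketch in standard C++, assuming a hypothetical `insertAt` helper; it uses `std::is_trivially_copyable` as a stricter stand-in for Qt's notion of "movable", which also covers some types with constructors and destructors:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

struct Point2D
{
    int x;
    int y;
};

// Hypothetical helper: inserts `value` at `index` in a raw array that
// currently holds `size` items. The caller guarantees room for size + 1.
template <typename T>
void insertAt(T *array, std::size_t size, std::size_t index, const T &value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memmove() is only valid for trivially copyable types");
    assert(index <= size);
    // Shift the tail one slot towards the end in a single call.
    std::memmove(array + index + 1, array + index, (size - index) * sizeof(T));
    array[index] = value;
}
```

For a non-movable type the same shift has to go through operator=() once per item, which is why the type information matters.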

48

QList<T>
• Two representations

• Array of pointers to items on the heap (general case)

QList<T>::d points to a QListData::Data:

Field:  ref              alloc  begin  end  flags  array[0]  ...  array[alloc - 1]
Type:   QBasicAtomicInt  int    int    int  uint   void *    ...  void *

Each void * entry points to a T allocated on the heap.

49

QList<T>
• Special case: T is movable and sizeof(T) <= sizeof(void *)

• Items are stored directly (same as QVector)

QList<T>::d points to a QListData::Data:

Field:  ref              alloc  begin  end  flags  array[0]  ...  array[alloc - 1]
Type:   QBasicAtomicInt  int    int    int  uint   T         ...  T

50

QList<T>
• Reserves space at the beginning and at the end

• Benefits of reserving space at the beginning
– Prepending an item usually takes constant time
– Removing the first item usually takes constant time
– Faster insertion
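Why free space at the beginning makes prepending cheap can be sketched with a small buffer that tracks a begin and an end index. This is a simplified illustration with a fixed capacity, not QList's actual code; `GapBuffer` and its members are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical fixed-capacity buffer with free slots on both sides of
// the used range [begin, end).
class GapBuffer
{
public:
    explicit GapBuffer(int capacity)
        : slots(capacity), begin(capacity / 2), end(capacity / 2) {}

    bool prepend(int value)
    {
        if (begin == 0)
            return false;            // front is full (a real list would grow)
        slots[--begin] = value;      // constant time: nothing is shifted
        return true;
    }

    bool append(int value)
    {
        if (end == static_cast<int>(slots.size()))
            return false;            // back is full
        slots[end++] = value;
        return true;
    }

    void removeFirst()
    {
        assert(begin < end);
        ++begin;                     // constant time as well
    }

    int size() const { return end - begin; }
    int at(int i) const { return slots[begin + i]; }

private:
    std::vector<int> slots;
    int begin;
    int end;
};
```

With space reserved only at the end, as in QVector, every prepend has to shift all existing items first.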

51

QVector<T> vs. QList<T>
• QList expands to less code in the executable

• For most purposes, QList is the right class to use

• If all you do is append(), use QVector
– Use reserve() if you know the size in advance
– Also consider QVarLengthArray or plain C array

• When T is movable and sizeof(T) <= sizeof(void *)
– Almost no difference, except that QList provides faster
insertions/removals in the first half of the list

• (Constant time insertions in the middle: Use QLinkedList)

52

General Qt Container Advice
• Avoid deep copies, e.g.:
– Use at() rather than operator[]
– constData()/constBegin()/constEnd()
– Basically: limit usage of non-const functions

• When you know the size in advance:
– Use reserve()

• Let Qt know whether your class is movable or not
– Q_DECLARE_TYPEINFO

• Choose the right container for the right circumstance
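The effect of reserve() can be made visible by counting reallocations while appending. The sketch below uses `std::vector` as a stand-in for QVector so that it runs without Qt; `countReallocations` is a hypothetical helper, and the exact count without reserve() depends on the library's growth strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends n items and counts how often the storage was reallocated,
// detected through a change in capacity.
int countReallocations(int n, bool reserveFirst)
{
    assert(n >= 0);
    std::vector<int> v;
    if (reserveFirst)
        v.reserve(static_cast<std::size_t>(n));   // one allocation up front
    int reallocations = 0;
    std::size_t capacity = v.capacity();
    for (int i = 0; i < n; ++i) {
        v.push_back(i);
        if (v.capacity() != capacity) {
            capacity = v.capacity();
            ++reallocations;
        }
    }
    return reallocations;
}
```

With the size reserved in advance, the loop itself never reallocates.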

53

General Painting Optimizations
• Prefer QPixmap over QImage (if possible)
– QPixmap is accelerated
– QPixmap caches information about the pixels

• Avoid QPixmap/QImage::setAlphaChannel()
– Use QPainter::setCompositionMode instead

• Avoid QPixmap/QImage::transformed()
– Use QPainter::setWorldTransform instead
• If you know for sure that the image has alpha:
– Qt::NoOpaqueDetection (QPixmap::fromImage)

54

Original

int width = image.width();
int height = image.height();

for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
        QRgb pixel = image.pixel(x, y);
        …
    }
}

NB! Image is 32 bit

55

Optimized

int width = image.width();
int height = image.height();

for (int y = 0; y < height; ++y) {
    QRgb *line = reinterpret_cast<QRgb *>(image.scanLine(y));
    for (int x = 0; x < width; ++x) {
        QRgb pixel = line[x];
        …
    }
}

56

Even more optimized

int numPixels = image.width() * image.height();
QRgb *pixels = reinterpret_cast<QRgb *>(image.bits());

for (int i = 0; i < numPixels; ++i) {
    QRgb pixel = pixels[i];
    …
}
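The difference between per-pixel and per-row access can be sketched without Qt. The `Image` struct below is a hypothetical stand-in for a 32-bit image, with a row stride that may be larger than the width. It also shows why the flat loop over all pixels is only valid when rows carry no padding, which is the case for 32-bit images.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 32-bit image: `stride` pixels are stored
// per row, of which the first `width` are visible.
struct Image
{
    int width;
    int height;
    int stride;
    std::vector<std::uint32_t> data;

    Image(int w, int h, int s)
        : width(w), height(h), stride(s),
          data(static_cast<std::size_t>(s) * h)
    {
        assert(w >= 0 && h >= 0 && s >= w);
    }

    // Per-pixel accessor: recomputes the offset on every call.
    std::uint32_t pixel(int x, int y) const
    {
        return data[static_cast<std::size_t>(y) * stride + x];
    }

    // Row accessor: the offset is computed once per row.
    const std::uint32_t *scanLine(int y) const
    {
        return data.data() + static_cast<std::size_t>(y) * stride;
    }
};

std::uint64_t sumPerPixel(const Image &image)
{
    std::uint64_t sum = 0;
    for (int y = 0; y < image.height; ++y)
        for (int x = 0; x < image.width; ++x)
            sum += image.pixel(x, y);
    return sum;
}

std::uint64_t sumPerScanLine(const Image &image)
{
    std::uint64_t sum = 0;
    for (int y = 0; y < image.height; ++y) {
        const std::uint32_t *line = image.scanLine(y);
        for (int x = 0; x < image.width; ++x)
            sum += line[x];
    }
    return sum;
}
```

Both functions visit the same pixels; the second one hoists the row offset out of the inner loop.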

57

Original

void MyWidget::paintEvent(...)
{
    QPainter painter(this);
    painter.fillRect(rect(), Qt::red);
}

int main(int argc, char **argv)
{
    ...
    MyWidget widget;
    ...
}

Optimized

void MyWidget::paintEvent(...)
{
    QPainter painter(this);
    painter.fillRect(rect(), Qt::red);
}

int main(int argc, char **argv)
{
    ...
    MyWidget widget;
    widget.setAttribute(Qt::WA_OpaquePaintEvent);
    ...
}

58

Original                          Optimized

painter.drawLine(line1);          QLine lines[3];
painter.drawLine(line2);          ...
painter.drawLine(line3);          painter.drawLines(lines, 3);

painter.drawPoint(point1);        QPoint points[3];
painter.drawPoint(point2);        ...
painter.drawPoint(point3);        painter.drawPoints(points, 3);

QString key("abcd");              QPixmapCache::Key key;
QPixmapCache::insert(key, pm);    key = QPixmapCache::insert(pm);
QPixmapCache::find(key, pm);      QPixmapCache::find(key, &pm);

59

Other Optimizations
Original                               Optimized

const QString s = s1 + s2 + s3;        #include <QStringBuilder>
                                       ...
                                       const QString s = s1 % s2 % s3;

Or make % available everywhere and let operator+ use it as well:
#define QT_USE_FAST_CONCATENATION
#define QT_USE_FAST_OPERATOR_PLUS

QTransform xform = a.inverted();       QTransform xform = b;
xform *= b.inverted();                 xform *= a;
                                       xform = xform.inverted();

foreach (const QString &s, slist) {    foreach (const QString &s, slist) {
    if (s.size() < 5)                      if (s.size() < 5)
        continue;                              continue;
    const QString m = s.mid(2, 3);         QStringRef m(&s, 2, 3);
    if (m == magicString)                  if (m == magicString)
        doMagicStuff();                        doMagicStuff();
}                                      }
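The QTransform rewrite relies on the identity for the inverse of a product, which trades two matrix inversions for one. Writing A and B for the matrices of a and b (symbols introduced here for illustration):

```latex
\[
  A^{-1} B^{-1} = (B A)^{-1}
  \quad\text{because}\quad
  (B A)\,(A^{-1} B^{-1}) = B\,(A A^{-1})\,B^{-1} = B B^{-1} = I .
\]
```

The original code computes the left-hand side, the optimized code the right-hand side.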
60

Other Optimizations
Original                               Optimized

qFuzzyCompare(opacity + 1, 1)          qFuzzyIsNull(opacity)

int nRects = qregion.rects().size();   int nRects = qregion.numRects();

if (expensive() && cheap())            if (cheap() && expensive())

#button1 { background:red }            #button1,
#button2 { background:red }            #button2 { background:red }

*[readOnly="1"] { color:blue }         /* Only QLineEdit can possibly be
                                          read-only in my application */
                                       QLineEdit[readOnly="1"]
                                       { color:blue }
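The reordering of `expensive() && cheap()` works because `&&` short-circuits: the right operand is never evaluated when the left one is false. A minimal sketch in standard C++, with hypothetical call counters added only to make the evaluation order visible:

```cpp
#include <cassert>

// Hypothetical counters, not part of the original slide.
struct CallCounts
{
    int cheap = 0;
    int expensive = 0;
};

// Stand-ins for the two checks; cheap() fails in this sketch.
bool cheap(CallCounts &c) { ++c.cheap; return false; }
bool expensive(CallCounts &c) { ++c.expensive; return true; }

bool originalOrder(CallCounts &c) { return expensive(c) && cheap(c); }
bool optimizedOrder(CallCounts &c) { return cheap(c) && expensive(c); }
```

Both orders give the same result, but the optimized one skips the expensive call whenever the cheap check fails. The swap is only safe when neither function has side effects the other depends on.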

61

Graphics View Optimizations
• Viewport update modes

• Scene index
– BSP tree index
– No index

• Avoid QGraphicsScene::changed signal

• QGraphicsScene::setSceneRect

• Cache modes
– Device coordinates
– Item coordinates
• OpenGL viewport
62

Platform Specific Optimizations
• Link-time optimization (LTCG, Windows only)
– Approx. 10%-15% speedup
– Configure Qt with “-ltcg”

• Don't use explicit double arithmetic
– qreal is float on embedded (QWS)
– 100 / 2.54 → 100 / qreal(2.54)
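Why the bare literal matters can be checked at compile time. The sketch below assumes that qreal is float, as on the embedded builds mentioned above, and defines its own typedef so that it compiles without Qt; `inchesFromCentimeters` is a hypothetical function.

```cpp
#include <cassert>
#include <type_traits>

// Assumption for this sketch: qreal is float (embedded/QWS build).
typedef float qreal;

// A double literal promotes the whole expression to double ...
static_assert(std::is_same<decltype(100 / 2.54), double>::value,
              "100 / 2.54 is computed in double precision");

// ... while wrapping the literal keeps the arithmetic in qreal.
static_assert(std::is_same<decltype(100 / qreal(2.54)), qreal>::value,
              "100 / qreal(2.54) is computed in qreal precision");

// Hypothetical example function that stays in qreal throughout.
qreal inchesFromCentimeters(qreal cm)
{
    return cm / qreal(2.54);
}
```

On hardware without fast double support, the accidental promotion to double is the cost the slide warns about.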

• It's time to take advantage of what we have learned

• Let's do some real optimizations!

63

Theory of Constraints
• Define a goal:
– For example: This application must run at 30 FPS

• Then:
1) Identify the constraint
2) Decide how to exploit the constraint
3) Improve
4) If goal not reached, go back to 1)
5) Done

65

