Performance is a key component of usability and crucial to the user experience, especially in modern user interfaces where graphical elements are animated and transitioned. Bringing Qt Everywhere means a significant need for speed across desktop and embedded platforms. This presentation gives a brief overview of the performance improvements made in Qt and is highly interactive, with hands-on sessions on how to squeeze every last drop of performance out of your Qt application.
Presentation by Bjørn Erik Nilsen held during Qt Developer Days 2009.
http://qt.nokia.com/whatsnew
2. Introduction
• Bjørn Erik Nilsen
– Software Engineer / Qt Widget Team
– The architect behind Alien Widgets
– Rewrote the Backing Store for Qt 4.5
– One of the guys implementing WidgetsOnGraphicsView
– Author of QMdiArea/QMdiSubWindow
– Author of QGraphicsEffect/QGraphicsEffectSource
3. Agenda
• Why Performance Matters
• Performance Improvements in Qt 4.6
• How You Can Improve Performance
4. Why Performance Matters
• Attractive to users
• Looks more professional
• Helps you get things done more efficiently
• Keeps the flow
6. Why Performance Matters
• Performance is more important than ever before
– Dynamic user interfaces
• Qt Everywhere
– Desktop
– Embedded platforms with limited hardware
• We cannot just buy better hardware anymore
• Clock speed vs. number of cores
7. Why Performance Matters
• Not all applications can take advantage of multiple cores
• And some will actually run slower:
– Each core in the processor is slower
– Most applications are not programmed to be multithreaded
• Multi-core crisis?
8. Agenda
• Why Performance Matters
• Performance Improvements in Qt 4.6
• How You Can Improve Performance
9. Performance Improvements in Qt 4.6
• We continuously strive to optimize performance
– For example, QWidget painting performance
– Qt 4.6 is no exception!
11. Performance Improvements in Qt 4.6
QtGui
• Graphics View
– New update mechanism
– New painting algorithm
– New scene indexing
– Reduced QTransform/QVariant/floating point overhead
• QPixmapCache
– Extended with an int-based API
12. Performance Improvements in Qt 4.6
QtGui
• Item Views
– Item selection
– Drag 'n' drop
– QTableView and QHeaderView
• QTransform
– fromTranslate/fromScale
– mapRect for projective transforms
• QRegion
– No longer a GDI object on Windows
13. Performance Improvements in Qt 4.6
QtCore
• QObject
– Destruction
– Connect and disconnect
– Signal emission
• QVariant
– Construction from float and pointers
• QIODevice
– Fewer (re)allocations in readAll()
14. Performance Improvements in Qt 4.6
QtNetwork
• QNetworkAccessManager
– HTTP back-end
• QHttpNetworkConnectionChannel
– Pipelining HTTP requests (off by default)
• QHttpNetworkConnection
– Increased the number of concurrent connections
• QLocalSocket
– New Windows implementation
– Major performance improvements
15. Performance Improvements in Qt 4.6
QtScript
• QtScript now uses JavaScriptCore as the back-end!
– Still the same API, but with JSC performance
16. Performance Improvements in Qt 4.6
QtOpenGL
• New OpenGL 2.x paint engine
• General improvements
– Clipping
– Text drawing
17. Performance Improvements in Qt 4.6
QtOpenVG (new module!)
• New OpenVG paint engine
– Uses Khronos EGL API
– Configure Qt with “-openvg”
• Support for hardware-accelerated 2D vector graphics on:
– Embedded, mobile and consumer electronic devices
– Desktop
• More info: http://labs.trolltech.com/blogs
18. Performance Improvements in Qt 4.6
Embedded
• Improved support for DirectFB
– Enabling hardware graphics acceleration on embedded platforms
• Maemo Harmattan optimizations
19. Agenda
• Why Performance Matters
• Performance Improvements in Qt 4.6
• How You Can Improve Performance
20. How You Can Improve Performance
• Theory of Constraints (TOC) by Eliyahu M. Goldratt
• The theory is based on the idea that in any complex system, there is usually one aspect of that system that limits its ability to achieve its goal or optimal functioning. To achieve any significant improvement of the system, the constraint must be identified and resolved.
• An application can only perform as fast as its bottleneck allows
21. Theory of Constraints
• Define a goal:
– For example: This application must run at 30 FPS
• Then:
1) Identify the constraint
2) Decide how to exploit the constraint
3) Improve
4) If goal not reached, go back to 1)
5) Done
22. Identifying hot spots (1)
• The number one and most important task
• Make sure you have plausible data
• Don't randomly start looking for slow code paths!
– An O(n²) algorithm isn't necessarily bad
– Don't spend time on making it O(n log n) just for fun
• Don't spend time on optimizing bubble sort
23. Identifying hot spots (1)
• “Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that is where the bottleneck is” -- Rob Pike
24. Identifying hot spots (1)
• The right approach for identifying hot spots:
– Any profiler suitable for your platform
• Shark (Mac OS X)
• Valgrind (X11)
• Visual Studio Profiler (Windows)
• Embedded Trace Macrocell (ETM) (ARM devices)
• NB! Always profile in release mode
25. Identifying hot spots (1)
• Run application: “valgrind --tool=callgrind ./application”
• This will collect data and information about the program
• Data saved to file: callgrind.out.<pid>
• Beware:
– I/O costs won't show up
– Cache misses won't show up either, unless you pass --simulate-cache=yes
• The next step is to analyze the data/profile
• Example
26. Identifying hot spots (1)
• Profiling a section of code (run with “--instr-atstart=no”):
#include <valgrind/callgrind.h>

int myFunction()
{
    // Start collecting data only from this point on
    CALLGRIND_START_INSTRUMENTATION;
    int number = 10;
    ...
    CALLGRIND_STOP_INSTRUMENTATION;
    // Write the data collected so far to the callgrind output file
    CALLGRIND_DUMP_STATS;
    return number;
}
27. Identifying hot spots (1)
• When a hot-spot is identified:
– Look at the code and ask yourself: Is this the right algorithm for this task?
• Once the best algorithm is selected, you can exploit the constraint
28. How to exploit the constraint (2)
• Optimize
– Design level
– Source code level
– Compile level
• Optimization trade-offs:
– Memory consumption, cache misses
– Code clarity and conciseness
29. How to exploit the constraint (2)
• “Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius – and a lot of courage – to move in the opposite direction.” -- Einstein
30. How to exploit the constraint (2)
• Wouldn't it be great to have a cross-platform tool to measure performance?
31. QTestLib
• Say hello to QBENCHMARK
• Extension to the QTestLib framework
• Cross-platform
• Straightforward: QBENCHMARK { <code here> }
• Code will then be measured based on
– Walltime (default)
– CPU tick counter (-tickcounter)
– Valgrind/Callgrind (-callgrind)
– Event counter (-eventcounter)
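To make the wall-time mode concrete without requiring a Qt installation, here is a minimal sketch in plain standard C++ of what such a measurement does: run the code under test repeatedly and divide the elapsed time by the number of iterations. It is not QTestLib itself; `measureWallTimeMs` and `sumUpTo` are hypothetical names, and QBENCHMARK picks the iteration count and reports the result on its own.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical workload: sums the integers 0..n-1.
std::uint64_t sumUpTo(std::uint64_t n)
{
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i)
        total += i;
    return total;
}

// Runs 'code' 'iterations' times and returns the mean wall time per
// iteration in milliseconds. Only a stand-in for what QBENCHMARK measures.
template <typename Callable>
double measureWallTimeMs(Callable code, int iterations)
{
    typedef std::chrono::steady_clock Clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; ++i)
        code();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

A call such as `measureWallTimeMs([] { sumUpTo(1000000); }, 100)` gives a rough per-iteration figure. Keep in mind that an optimizing compiler may remove work whose result is unused, which is one reason a dedicated benchmarking tool is preferable.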
32. QTestLib
• Let's create a benchmark
• Run with ./mytest -xml -o results.xml
• git clone git://gitorious.org/qt-labs/qtestlib-tools.git
• Visualize with
– Graph (generatereport results.xml)
– BMCompare (bmcompare results1.xml results2.xml)
• Now that we have a tool, it is easier to measure and decide which algorithm to use
33. How to exploit the constraint (2)
• General tricks:
– Caching
– Delay a computation until the result is required
– Reduce computation in tight loops
– Compiler optimizations
• Optimization Techniques for Qt:
– Choose the right container
– Use implicit data sharing efficiently
– Discover the magic flags
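The first two general tricks, caching and delaying a computation until its result is required, often go together. The following is a minimal sketch in plain C++; the `Shape` class and its members are hypothetical and only illustrate the pattern of marking a value dirty on change and recomputing it on first read.

```cpp
#include <cmath>

// Hypothetical class: the bounding radius is computed on first request
// (lazy) and reused until an input changes (cache).
class Shape
{
public:
    Shape(double w, double h)
        : m_w(w), m_h(h), m_dirty(true), m_radius(0), m_computations(0) {}

    void setSize(double w, double h)
    {
        m_w = w;
        m_h = h;
        m_dirty = true;   // invalidate only; do not recompute yet
    }

    double boundingRadius() const
    {
        if (m_dirty) {
            m_radius = std::sqrt(m_w * m_w + m_h * m_h) / 2;
            m_dirty = false;
            ++m_computations;
        }
        return m_radius;
    }

    // Number of times the value was actually recomputed.
    int computations() const { return m_computations; }

private:
    double m_w, m_h;
    mutable bool m_dirty;
    mutable double m_radius;
    mutable int m_computations;
};
```

Calling `setSize()` many times in a row costs almost nothing here, because the square root is only evaluated when `boundingRadius()` is next read.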
34. Implicit data sharing in Qt
• Maximize resource usage and minimize copying
Object obj0; // Creates ObjectData
// Copies (share the same data)
Object obj1 = obj0, obj2 = obj0, obj3 = obj0;
[Diagram: Object 0, Object 1, Object 2 and Object 3 all reference the same ObjectData (shallow copies)]
35. Implicit data sharing in Qt
• Data is only copied if someone modifies it:
[Diagram: after being modified, Object 1 references its own ObjectData (deep copy); Object 0, Object 2 and Object 3 still share the original ObjectData (shallow copies)]
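The copy-on-write behaviour described above can be sketched in plain C++. This is a simplified, single-threaded illustration and not Qt's actual implementation: `SharedIntList` is a hypothetical class, and `std::shared_ptr` stands in for Qt's atomic reference counting.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write list of ints. Copying the object copies only
// the pointer (shallow copy); the data is duplicated on the first write.
class SharedIntList
{
public:
    SharedIntList() : d(std::make_shared<std::vector<int> >()) {}

    // Const access never detaches.
    int at(std::size_t i) const { return (*d)[i]; }
    std::size_t size() const { return d->size(); }

    // Non-const access has to assume a write, so it detaches first.
    int &operator[](std::size_t i) { detach(); return (*d)[i]; }
    void append(int value) { detach(); d->push_back(value); }

    bool sharesDataWith(const SharedIntList &other) const { return d == other.d; }

private:
    // Deep-copies the data if another object still references it.
    void detach()
    {
        if (d.use_count() > 1)
            d = std::make_shared<std::vector<int> >(*d);
    }

    std::shared_ptr<std::vector<int> > d;
};
```

Note that `operator[]` on a non-const object triggers the deep copy even when the caller only reads the value, whereas `at()` never does.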
36. Implicit data sharing in Qt
• How to avoid deep-copy:
– Only use const operators and functions if possible
– Be careful with the foreach keyword
• For classes that are not implicitly shared:
– Always pass them around as const references
– Passing const references is a good habit in any case
• Examples
37. Implicit data sharing in Qt
Original                          Optimized
T *readOnly = list[index];        T *readOnly = list.at(index);
QList<T>::iterator i;             QList<T>::const_iterator i;
i = list.begin();                 i = list.constBegin();
foreach (QString s, strings)      foreach (const QString &s, strings)
NB! QTransform is not implicitly shared!
void foo(QTransform t);           void foo(const QTransform &t);
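For a class that is not implicitly shared, every by-value loop variable or parameter is a real copy. The sketch below makes that visible in plain C++ with a hypothetical `Tracked` type that counts its copies; a range-based for loop stands in for the foreach keyword.

```cpp
#include <vector>

// Hypothetical type that counts how often it is copied. It stands in for
// a class that is not implicitly shared.
struct Tracked
{
    static int copies;
    int value;

    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked &other) : value(other.value) { ++copies; }
    Tracked &operator=(const Tracked &other)
    {
        value = other.value;
        ++copies;
        return *this;
    }
};
int Tracked::copies = 0;

int sumByValue(const std::vector<Tracked> &items)
{
    int sum = 0;
    for (Tracked item : items)          // copies every element
        sum += item.value;
    return sum;
}

int sumByConstRef(const std::vector<Tracked> &items)
{
    int sum = 0;
    for (const Tracked &item : items)   // no copies
        sum += item.value;
    return sum;
}
```

Both functions return the same sum; only the by-value version increments `Tracked::copies`, once per element.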
38. Implicit data sharing in Qt
• See the “Implicitly Shared Classes” documentation for a complete list of implicitly shared classes in Qt
• http://doc.trolltech.com/4.6-snapshot/shared.html
• Note: All Qt containers are implicitly shared
43. QVector<T>
• Items are stored contiguously in memory
• One block of memory is allocated:
[Diagram: QVector<T> holds a pointer d to a single QVectorTypedData<T> block with the following layout]
ref: QBasicAtomicInt | alloc: int | size: int | flags: uint | array[0] ... array[alloc - 1]: T
44. QVector<T>
• Reserves space at the end
• Growth strategy depends on the type T
– Movable types: realloc by increments of 4096 bytes
– Non-movable types: 50% increments
• What is a movable type?
– Primitive types: bool, int, char, enums, pointers, …
– Plain Old Data (POD) with no constructor/destructor
– Basically everything that can be moved around in memory using memcpy() or memmove()
– Good article: http://www.ddj.com/cpp/184401508
45. Movable types
• User-defined classes are treated as non-movable by default
• Oh no!
• Have no fear, Q_DECLARE_TYPEINFO is here
• You can tell Qt that your class is a:
– Q_PRIMITIVE_TYPE: POD with no constructor/destructor
– Q_MOVABLE_TYPE: has a constructor/destructor, but can be moved in memory using memcpy()/memmove()
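Why a container cares about this distinction can be sketched in plain C++. The example below is an analogy rather than Qt code: it uses the standard trait `std::is_trivially_copyable`, which is stricter than Qt's notion of a movable type, to choose between one memcpy() and an element-by-element copy. `Point`, `copyItems` and `copyItemsImpl` are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

struct Point { int x; int y; };   // hypothetical POD-like type

// Fast path: the whole range is copied with a single memcpy().
template <typename T>
void copyItemsImpl(T *dst, const T *src, std::size_t count, std::true_type)
{
    std::memcpy(dst, src, count * sizeof(T));
}

// Slow path: each element is copied through its assignment operator.
template <typename T>
void copyItemsImpl(T *dst, const T *src, std::size_t count, std::false_type)
{
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = src[i];
}

// Copies 'count' items and returns true if the memcpy() path was taken.
template <typename T>
bool copyItems(T *dst, const T *src, std::size_t count)
{
    typedef std::is_trivially_copyable<T> Trivial;
    copyItemsImpl(dst, src, count, std::integral_constant<bool, Trivial::value>());
    return Trivial::value;
}
```

Here `copyItems` takes the fast path for `Point` and the slow path for `std::string`. Declaring a class with Q_DECLARE_TYPEINFO gives Qt's containers the equivalent information for types the compiler cannot classify on its own.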
49. QList<T>
• Two representations
• Array of pointers to items on the heap (general case)
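[Diagram: QList<T> holds a pointer d to a QListData::Data block with the following layout; each array entry points to a T allocated on the heap]
ref: QBasicAtomicInt | alloc: int | begin: int | end: int | flags: uint | array[0] ... array[alloc - 1]: void *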
50. QList<T>
• Special case: T is movable and sizeof(T) <= sizeof(void *)
• Items are stored directly (same as QVector)
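[Diagram: QList<T> holds a pointer d to a QListData::Data block with the following layout; the items are stored in the array itself]
ref: QBasicAtomicInt | alloc: int | begin: int | end: int | flags: uint | array[0] ... array[alloc - 1]: T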
51. QList<T>
• Reserves space at the beginning and at the end
• Benefits of reserving space at the beginning
– Prepending an item usually takes constant time
– Removing the first item usually takes constant time
– Faster insertion
52. QVector<T> vs. QList<T>
• QList expands to less code in the executable
• For most purposes, QList is the right class to use
• If all you do is append(), use QVector
– Use reserve() if you know the size in advance
– Also consider QVarLengthArray or plain C array
• When T is movable and sizeof(T) <= sizeof(void *)
– Almost no difference, except that QList provides faster insertions/removals in the first half of the list
• (Constant time insertions in the middle: Use QLinkedList)
53. General Qt Container Advice
• Avoid deep copies, e.g.:
– Use at() rather than operator[]
– constData()/constBegin()/constEnd()
– Basically: limit usage of non-const functions
• When you know the size in advance:
– Use reserve()
• Let Qt know whether your class is movable or not
– Q_DECLARE_TYPEINFO
• Choose the right container for the right circumstance
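The effect of reserve() can be observed with any contiguous container. The sketch below uses `std::vector` as a stand-in for QVector (an assumption made so the example needs no Qt); `countReallocations` is a hypothetical helper that counts how often the capacity changes while appending.

```cpp
#include <cstddef>
#include <vector>

// Appends 'count' ints and returns how many times the vector reallocated
// (observed as a change in capacity) while doing so.
int countReallocations(std::size_t count, bool reserveFirst)
{
    std::vector<int> values;
    if (reserveFirst)
        values.reserve(count);   // one allocation up front

    int reallocations = 0;
    std::size_t lastCapacity = values.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(static_cast<int>(i));
        if (values.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = values.capacity();
        }
    }
    return reallocations;
}
```

With `reserveFirst` set, appending never reallocates; without it, the container grows several times, and each growth step may copy or move all existing items.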
54. General Painting Optimizations
• Prefer QPixmap over QImage (if possible)
– QPixmap is accelerated
– QPixmap caches information about the pixels
• Avoid QPixmap/QImage::setAlphaChannel()
– Use QPainter::setCompositionMode instead
• Avoid QPixmap/QImage::transformed()
– Use QPainter::setWorldTransform instead
• If you know for sure that the image has alpha:
– Pass Qt::NoOpaqueDetection to QPixmap::fromImage()
55. General Painting Optimizations
Original
int width = image.width();
int height = image.height();
for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
        QRgb pixel = image.pixel(x, y);
        …
    }
}
NB! The image is 32-bit
56. General Painting Optimizations
Optimized
int width = image.width();
int height = image.height();
for (int y = 0; y < height; ++y) {
    QRgb *line = reinterpret_cast<QRgb *>(image.scanLine(y));
    for (int x = 0; x < width; ++x) {
        QRgb pixel = line[x];
        …
    }
}
57. General Painting Optimizations
Even more optimized
int numPixels = image.width() * image.height();
QRgb *pixels = reinterpret_cast<QRgb *>(image.bits());
for (int i = 0; i < numPixels; ++i) {
    QRgb pixel = pixels[i];
    …
}
61. Other Optimizations
Original                                Optimized
qFuzzyCompare(opacity + 1, 1)           qFuzzyIsNull(opacity)
int nRects = qregion.rects().size();    int nRects = qregion.numRects();
if (expensive() && cheap())             if (cheap() && expensive())
#button1 { background:red }             #button1,
#button2 { background:red }             #button2 { background:red }
*[readOnly="1"] { color:blue }          /* Only QLineEdit can possibly be
                                           read-only in my application */
                                        QLineEdit[readOnly="1"] { color:blue }
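The `cheap() && expensive()` reordering relies on the fact that `&&` evaluates left to right and stops at the first false operand. A minimal sketch, with hypothetical predicates and a call counter added only to make the effect observable:

```cpp
// Counts calls to expensive() so the difference can be observed.
int expensiveCalls = 0;

// Hypothetical cheap test.
bool cheap(int x) { return x > 0; }

// Hypothetical costly test; the loop only simulates heavy work.
bool expensive(int x)
{
    ++expensiveCalls;
    long sum = 0;
    for (int i = 0; i < 100000; ++i)
        sum += (i ^ x) & 1;
    return sum >= 0;
}

// Always pays for expensive(), even when cheap() would have said no.
bool slowOrder(int x) { return expensive(x) && cheap(x); }

// Skips expensive() entirely whenever cheap() returns false.
bool fastOrder(int x) { return cheap(x) && expensive(x); }
```

Both orderings return the same result for every input. The reordering is only valid when neither function has side effects that the other depends on.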
62. Graphics View Optimizations
• Viewport update modes
• Scene index
– BSP tree index
– No index
• Avoid QGraphicsScene::changed signal
• QGraphicsScene::setSceneRect
• Cache modes
– Device coordinates
– Item coordinates
• OpenGL viewport
63. Platform Specific Optimizations
• Link-time code generation, LTCG (Windows only)
– Approx. 10%-15% speedup
– Configure Qt with “-ltcg”
• Don't use explicit double arithmetic
– qreal is float on embedded (QWS)
– 100 / 2.54 → 100 / qreal(2.54)
• It's time to take advantage of what we have learned
• Let's do some real optimizations!
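The point about explicit double arithmetic can be checked at compile time. In the sketch below, `Real` is a hypothetical typedef that plays the role of qreal on a build where qreal is float, and `inchesPerMeter` is an illustrative name for the 100 / 2.54 expression from the slide.

```cpp
#include <type_traits>

// Assumption for illustration: stands in for qreal where qreal is float.
typedef float Real;

// A bare 2.54 is a double literal, so the division is done in double.
typedef decltype(100 / 2.54) ImplicitType;

// Wrapping the literal keeps the division in single precision.
typedef decltype(100 / Real(2.54)) ExplicitType;

static_assert(std::is_same<ImplicitType, double>::value,
              "int / double is computed as double");
static_assert(std::is_same<ExplicitType, float>::value,
              "int / float is computed as float");

// Computed entirely in Real; no double arithmetic is involved.
Real inchesPerMeter()
{
    return 100 / Real(2.54);
}
```

On hardware without fast double-precision support, keeping such expressions in float avoids slower double operations and conversions.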
65. Theory of Constraints
• Define a goal:
– For example: This application must run at 30 FPS
• Then:
1) Identify the constraint
2) Decide how to exploit the constraint
3) Improve
4) If goal not reached, go back to 1)
5) Done
66. Agenda
• Why Performance Matters
• Performance Improvements in Qt 4.6
• How You Can Improve Performance