Designing and coding Series 40 Java apps for high performance



In this presentation, you’ll learn preferred architecture patterns and practical steps for building your apps to be fast and responsive. Michael Samarin, Paul Houghton, and Timo Saarinen of Futurice show measured results for various Series 40 code fragments and reveal where to spend your time when making improvements. They show you the differences that best-practice micro-optimisations make, where these optimisations are useful, and where you should let the tools and processor optimise for you.

Published in: Technology, Art & Photos
  • @alanls You need to be registered on Adobe Connect, simplest way is to sign up for one of our future webinars, such as this one on LWUIT
  • i cant get access to :
  • I cant get access to this. What do I have to be a member of to access it?
  • There's still time to sign up for session 2 of this webinar: Coding Series 40 Java apps for performance (26Sep 3pm UTC):
  • If you missed it, the recording for session 1 is now live:



  1. Series 40 Developer Training
     Designing and coding Series 40 Java apps for high performance
     Michael Samarin, Paul Houghton, Timo Saarinen
     Futurice Ltd
  2. Today’s Topics
     » Performance Basics on Series 40
     » Mobile Front End Architecture Patterns
     » Choosing GUI, Caching, Threading
     » Low Level “Micro” Performance Optimization
  3. Miller, R. B. (1968). Response time in man-computer conversational transactions. Proc. AFIPS Fall Joint Computer Conference, Vol. 33, 267-277.
     » 0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.
  4. » 1.0 second is about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.
     » 10 seconds is about the limit for keeping the user’s attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.
  5. Choosing GUI Strategy
     » LCDUI Forms
     » Canvas
     » GameCanvas
     » LWUIT
  6. LCDUI Forms
     » A fast, simple and standard way of making a UI.
     » On full-touch Asha devices, Forms look very attractive and bring huge UX improvements.
     » Not as fast as Canvas. Animation on a Form is much slower than on a Canvas, and there is no way to influence the vertical scroll position, animate transitions between screens, or shift to a full screen view. You can, however, slightly increase the performance when changing screens by using just one Form and repopulating it with new Items.
  7. Canvas
     » A highly customizable way of making a UI.
     » You have to take care of render timing yourself, or you can use Nokia’s FrameAnimator class to quickly create effects such as kinetic scrolling.
     » Any part of your code can call Canvas.repaint() to signal that painting should occur soon.
     » The most important performance tip for navigating through a Canvas-based UI is to implement your own View class to represent each screen, and paint all Views on one Canvas rather than switching from one Canvas to another, which can be slow and does not let you animate the transition for a smooth effect.
  8. GameCanvas
     » GameCanvas is double buffered, with more control over the painting cycle and threading.
     » Unlike Canvas, you should create your own Thread, which calls GameCanvas.paint() directly to fill the graphics buffer, and then GameCanvas.flushGraphics() to instantly blit the graphics buffer onto the screen.
  9. LWUIT
     » LWUIT (Lightweight User Interface Toolkit) is a toolkit for creating Swing-like applications without some of the complexity of Swing.
     » Like Form, it offers basic components, but it adds better layouts, styles and theming, bundling of your own fonts into your application, and animated screen transitions.
     » LWUIT is implemented on top of a Canvas, but it is a large and complex library written to be a general purpose replacement for the default UI on many different phones.
     » LWUIT, the associated themes, and any fonts you include quickly make your JAR file grow quite large.
  10. Heap Memory
      » On Series 40, the heap is only 2 to 4 MB.
      » Instances of classes (objects) and primitive types are created in the heap.
      » The total number of methods in classes loaded by the JVM has a direct impact on how much heap space is left for other data. These memory allocations are permanent for the runtime of the application and are not dynamically unloaded by the JVM once a class is no longer in use.
  11. Recursive Algorithms and Stack Memory
      » Variables passed as arguments to a method are passed on the current thread’s stack. Method variables of primitive types are also allocated on the stack.
      » Recursive algorithms are algorithms where a method calls itself repeatedly to complete a task. As a result, they create multiple stack frames.
        › They use a lot of stack memory. The same method is called repeatedly, and only when the algorithm completes are the queued stack frames unwound. This extra stack memory is often not useful. Stack memory per thread is limited, so such heavy stack use may well cause an OutOfMemoryError well before you are actually out of heap memory.
        › Recursive algorithms can be slow. Each method call includes a certain amount of overhead, which is not really necessary since a recursive algorithm can be unwound into a non-recursive equivalent loop that does not include the relatively heavy method call.
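The unwinding described on this slide can be illustrated with a minimal sketch. The class and method names (`DepthSum`, `sumRecursive`, `sumIterative`) are hypothetical and chosen only for this example; the task (summing an array) is an assumed stand-in for whatever the real algorithm does.

```java
final class DepthSum {
    // Recursive version: one stack frame per element, so a long input
    // can exhaust the limited per-thread stack.
    static int sumRecursive(final int[] values, final int index) {
        if (index >= values.length) {
            return 0;
        }
        return values[index] + sumRecursive(values, index + 1);
    }

    // Iterative equivalent: constant stack use and no per-element method call.
    static int sumIterative(final int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }
}
```

Both methods return the same result for small inputs; only the iterative one stays safe as the input grows.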
  12. Compile Time Optimization and Obfuscation
      › Provides basic “free” optimization
      › Fixes code redundancy and pre-calculates things whenever possible
      › Minimizes memory usage
      › Should be the last step in building apps: it takes time and makes debugging difficult
      › Doesn’t fix wrong architecture
  13. Compile Time Optimization
  14. Proguard Bytecode Obfuscator
  15. Obfuscation Example: Battle Tank
      › JAR file size decreased by 4% (889 -> 852 kB)
      › RAM usage decreased by 14% (161 -> 138 kB)
  16. Architecture changes
      » Carefully consider the architecture of your drawing loop and input loops, and decouple them whenever possible.
      » Example: panorama drawing and sensor driving loop.
      » Original example: (figure not reproduced here)
      » After optimization: (figure not reproduced here)
  17. WeakReference object Caching
      » The best pattern for using all available heap memory without ever running into the dreaded OutOfMemoryError.
      » CLDC 1.1 WeakReference
      » When an object is referenced by a WeakReference, and not by traditional Object pointers, this is a signal to the garbage collector that it has permission to collect the object if memory is running low.
      » You have to maintain your own Hashtable of Objects.
      » To understand this pattern better, look at Tantalum 3.
  18. import java.lang.ref.WeakReference;
      import java.util.Hashtable;

      public class WeakHashCache {
          // Maps each key to a WeakReference wrapping the cached value
          protected final Hashtable hash = new Hashtable();

          // Returns the cached value, or null if absent or already collected
          public Object get(final Object key) {
              final WeakReference reference = (WeakReference) hash.get(key);
              if (reference != null) {
                  return reference.get();
              }
              return null;
          }

          public void put(final Object key, final Object value) {
              synchronized (hash) {
                  if (key == null) {
                      return;
                  }
                  if (value == null) {
                      hash.remove(key);
                      return;
                  }
                  hash.put(key, new WeakReference(value));
              }
          }

          public void remove(final Object key) {
              if (key != null) {
                  hash.remove(key);
              }
          }

          public boolean containsKey(final Object key) {
              if (key != null) {
                  return hash.containsKey(key);
              }
              return false;
          }

          public int size() {
              return hash.size();
          }

          public void clear() {
              hash.clear();
          }
      }
  19. Render Caching
      » One of the common performance needs is to make your application paint, and in particular scroll, smoothly and quickly.
      » You can paint each item into its own Image, keep that pre-painted Image in a cache, and reuse it as the object moves around the screen. Essentially, this is a WeakReference cache of pre-painted Images.
      » This can achieve a dramatic FPS increase: in the example shown, from 3 to 12 FPS on the Asha 305.
      » To understand this pattern better, look at Tantalum 3.
  20. File System (Flash Memory) Caching
      » Flash memory is slow, but faster than the Web.
      » Cache downloaded data from the previous session. Improve the startup time of the app by loading from the disk cache instead of making new Web requests.
      » RMS and File System (JSR-75) have the same speed, but with RMS there are no security prompts.
      » This can achieve a dramatic startup time decrease: in the example shown, from 10 to 2 seconds on the Asha 305.
  21. File System (Flash Memory) Caching
      » Pitfall: remember that Flash memory is still slow.
      » Architect your application to load and save data from and to the disk cache asynchronously.
      » In the Battle Tank example, it was possible to save 28 ms in each rendered frame by removing synchronous references to flash memory in the loop.
      » To understand this pattern better, look at Tantalum 3.
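One way to keep flash access out of the render loop is a background worker that drains a queue of pending writes. The following is only a sketch under stated assumptions: `SlowStore` is a hypothetical interface standing in for RMS or JSR-75 access, and `AsyncSaver` is an illustrative name, not a class from Tantalum or the Nokia SDK.

```java
import java.util.Vector;

// Hypothetical stand-in for a slow flash write (RMS or JSR-75 in a real MIDlet).
interface SlowStore {
    void write(String key, byte[] data);
}

final class AsyncSaver implements Runnable {
    private final Vector queue = new Vector(); // pending {key, data} pairs
    private final SlowStore store;
    private boolean stopped = false;           // guarded by the queue lock

    AsyncSaver(final SlowStore store) {
        this.store = store;
    }

    // Called from the render or UI thread: enqueues and returns immediately.
    void save(final String key, final byte[] data) {
        synchronized (queue) {
            queue.addElement(new Object[] { key, data });
            queue.notify();
        }
    }

    // Asks the worker to finish the queued writes and then exit.
    void stop() {
        synchronized (queue) {
            stopped = true;
            queue.notify();
        }
    }

    // Worker thread body: the only place that touches flash.
    public void run() {
        while (true) {
            final Object[] job;
            synchronized (queue) {
                while (queue.isEmpty() && !stopped) {
                    try {
                        queue.wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                if (queue.isEmpty()) {
                    return; // stopped and fully drained
                }
                job = (Object[]) queue.elementAt(0);
                queue.removeElementAt(0);
            }
            // The slow call happens outside the lock, so save() never waits for flash.
            store.write((String) job[0], (byte[]) job[1]);
        }
    }
}
```

The saver would be started once with `new Thread(saver).start()`; the drawing loop then only calls `save()`, which costs a queue insertion rather than a flash write.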
  22. Hash Acceleration
      » Some iterative algorithms are slow. Proper use of collection data structures can increase performance.
      » Vector.contains() is very slow, but Hashtable.containsKey() is very fast. Reconsider your algorithms to use Hashtables.
      » Uses can be found in very surprising places. For example, Font.stringWidth() is slow, but necessary for drawing multiline text on a Canvas. Creating a Hashtable with the width of each character you have used in the Font can transform this into a fast operation and increase Canvas.paint() speed.
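The character width idea can be sketched as follows. `CharMeasurer` is a hypothetical interface standing in for the slow font call, and `WidthCache` is an illustrative name. The sketch assumes the font has no kerning, so that a string's width is the sum of its character widths. It uses `Integer.valueOf(int)` and `Character.valueOf(char)` so that it runs on Java SE; on CLDC 1.1, where these may not be available, `new Integer(...)` and `new Character(...)` would be the assumed equivalents.

```java
import java.util.Hashtable;

// Hypothetical stand-in for the slow per-character font measurement.
interface CharMeasurer {
    int charWidth(char c);
}

final class WidthCache {
    private final Hashtable widths = new Hashtable(); // Character -> Integer
    private final CharMeasurer measurer;

    WidthCache(final CharMeasurer measurer) {
        this.measurer = measurer;
    }

    // Width of one character; the slow measurer is asked at most once per character.
    int charWidth(final char c) {
        final Character key = Character.valueOf(c);
        Integer width = (Integer) widths.get(key);
        if (width == null) {
            width = Integer.valueOf(measurer.charWidth(c));
            widths.put(key, width);
        }
        return width.intValue();
    }

    // Sum of cached character widths (assumes no kerning).
    int stringWidth(final String text) {
        int total = 0;
        for (int i = 0; i < text.length(); i++) {
            total += charWidth(text.charAt(i));
        }
        return total;
    }
}
```

After the first few lines of text have been measured, most lookups hit the Hashtable and the slow font call is rarely needed during painting.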
  23. Synchronized vs. Volatile Variables
      » Both apply when a variable or Object needs to be accessed from more than one Thread.
      » Marking a variable as volatile is the least restrictive approach and can have very high performance because no Thread is blocked.
      » Only one Thread may enter the synchronized sections at any one time.
      » Consider atomic operations on two variables. For example, when updating firstName and lastName from “John Smith” to “Jane Marceau”, do so within a synchronized block to avoid briefly exposing the transitional state “Jane Smith” to other threads.
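The two-variable case from this slide might look like the sketch below. The `Person` class, its `dirty` flag and the private `lock` object are assumptions made for illustration: the flag shows a single value where volatile is sufficient, while the two name fields are updated and read together under one lock.

```java
final class Person {
    private final Object lock = new Object();
    private String firstName = "John";
    private String lastName = "Smith";

    // A single independent flag shared between threads: volatile is enough.
    private volatile boolean dirty = false;

    // Both fields change inside one synchronized block, so no reader
    // can observe the transitional state "Jane Smith".
    void rename(final String first, final String last) {
        synchronized (lock) {
            firstName = first;
            lastName = last;
        }
        dirty = true;
    }

    // Readers take the same lock so they always see a consistent pair.
    String fullName() {
        synchronized (lock) {
            return firstName + " " + lastName;
        }
    }

    boolean isDirty() {
        return dirty;
    }
}
```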
  24. Constants
      » We can give the compiler and Proguard more opportunities to optimize the code at the compile step, and this will also give the ARM processor opportunities for handling these variables with more efficient byte codes.
          private static int loopCount = 10;
          private static long startTime = System.currentTimeMillis();
          private static boolean enableImages = true;
        Should be
          private static final int LOOP_COUNT = 10;
          private static final long START_TIME = System.currentTimeMillis();
          private static final boolean ENABLE_IMAGES = true;
  25. Primitives
      » Use int instead of short, byte or long.
          for (int i = 0; i < 3000000; i++) {
              short/int/long a = 123;
              short/int/long b = -44;
              short/int/long c = 12;
              a += c;
              b += a;
              c *= b;
          }
        Average times spent in loops on Nokia Asha 305 (obfuscated):
          int:   710 (580) ms
          short: 900 (850) ms, 50% slower
          long:  1450 (1150) ms, 100% slower
  26. Final in methods
          for (int i = 0; i < 1000000; i++) {
              a = finalMethod(1, 2, 3);
          }
          for (int i = 0; i < 1000000; i++) {
              a = nonFinalMethod(1, 2, 3);
          }

          public final int finalMethod(final int a, final int b, final int c) {
              final float x = 1.23f, y = 0.05f;
              final float z = x * y;
              final int d = a + b + c;
              return d;
          }

          public int nonFinalMethod(int a, int b, int c) {
              float x = 1.23f, y = 0.05f;
              float z = x * y;
              int d = a + b + c;
              return d;
          }
  27. Final in methods
      Average times on a Nokia Asha 305:
          finalMethod: 650 ms
          nonFinalMethod: 940 ms, 45% slower
      In this case, the time difference comes from the final keyword before x and y. This is logical, because the value of z can then be precalculated. The final keywords on parameters a, b and c do not allow anything to be precalculated. And because z is never used, its being final does not help us.
  28. Static
      » Generally static methods and variables should be faster. Oddly, with some combinations of ARM and JVM, instance accesses are slightly faster.
          for (int i = 0; i < 1000000; i++) {
              staticMethod();
          }
          for (int i = 0; i < 1000000; i++) {
              nonStaticMethod();
          }

          private static void staticMethod() {
              b++; // static variable
          }

          private void nonStaticMethod() {
              a++; // instance variable
          }
        Average times spent in loops on Nokia Asha 305 (obfuscated):
          nonStaticMethod: 570 ms
          staticMethod: 680 ms, 20% slower
  29. String Concatenation
      If you are going to concatenate a large number of small Strings, use StringBuffer.append() instead of the String += operator. String is much slower because every time you concatenate a string to another with the += operator, a new StringBuffer is created under the hood. Depending on the number of concatenations, a single explicit StringBuffer can be many times faster than multiple implicit StringBuffers created by String addition.
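A minimal side-by-side sketch of the two approaches described on this slide. The `Joiner` class and its method names are hypothetical; both methods produce the same String, and only the number of temporary objects differs.

```java
final class Joiner {
    // Slow on the toolchain the slide describes: each += creates a
    // temporary buffer and a new String holding everything so far.
    static String joinWithPlus(final String[] parts) {
        String result = "";
        for (int i = 0; i < parts.length; i++) {
            result += parts[i];
        }
        return result;
    }

    // One explicit StringBuffer is reused for every append.
    static String joinWithBuffer(final String[] parts) {
        final StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < parts.length; i++) {
            buffer.append(parts[i]);
        }
        return buffer.toString();
    }
}
```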
  30. Addition vs. Multiplication vs. Division
          for (int i = 0; i < 500000; i++) {
              a = 1.23f;
              b = 1.45f;
              c = 0.004523f;
              c += a;
              a = b + c;
          }
          for (int i = 0; i < 500000; i++) {
              a = 1.23f;
              b = 1.45f;
              c = 0.004523f;
              c *= a;
              a = b * c;
          }
          for (int i = 0; i < 500000; i++) {
              a = 1.23f;
              b = 1.45f;
              c = 0.004523f;
              c /= a;
              a = b / c;
          }
      Average times spent in loops on Nokia Asha 305:
          Multiplying: 330 ms
          Addition: 360 ms, 9% slower
          Division: 560 ms, 70% slower
  31. Switch vs. If
      The switch statement in C is implemented as a direct jump, which is extremely fast. In Java on Nokia Series 40 phones, switches are implemented at the bytecode level as a series of if statements. Therefore in many cases a switch statement is less efficient than a manually created series of if..else statements in which the first case tested is the one that occurs most frequently. If you prefer to use switch statements for code clarity, then arrange them so that the most frequent cases appear first.
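The ordering advice on this slide can be sketched as an if..else chain that tests the most frequent case first. The key codes and their relative frequencies are hypothetical and would need to be measured for a real application.

```java
final class KeyDispatcher {
    // Hypothetical key codes, listed from most to least frequent in an imagined game.
    static final int KEY_FIRE = 1;
    static final int KEY_LEFT = 2;
    static final int KEY_RIGHT = 3;
    static final int KEY_MENU = 4;

    // The most frequent case is tested first, so it costs a single comparison.
    static String handle(final int key) {
        if (key == KEY_FIRE) {
            return "fire";
        } else if (key == KEY_LEFT) {
            return "left";
        } else if (key == KEY_RIGHT) {
            return "right";
        } else if (key == KEY_MENU) {
            return "menu";
        }
        return "ignored";
    }
}
```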
  32. Hidden References
      » All non-static inner classes contain a reference to the parent class. Even if your code does not take advantage of this, if you pass an inner class to an execution queue such as the event dispatch thread (EDT), the parent class cannot be garbage collected until the inner class instance has been executed and can itself be garbage collected.
        MyCanvas:
          midlet.getDisplay().callSerially(new Runnable() {
              public void run() {
                  System.out.println("Canvas width: " + MyCanvas.this.getWidth());
              }
          });
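One common way to avoid the hidden reference is a static nested class that copies only the values it needs. This is a sketch under assumptions: `Screen` is a hypothetical stand-in for the slide's MyCanvas, and `WidthLogger` stores its message in a field instead of printing so the result can be inspected.

```java
final class Screen {
    private final int width;

    Screen(final int width) {
        this.width = width;
    }

    int getWidth() {
        return width;
    }

    // Static nested class: no hidden reference to the enclosing Screen,
    // so a queued WidthLogger does not keep the Screen alive.
    static final class WidthLogger implements Runnable {
        private final int width; // copied value, not a reference to Screen
        String message;

        WidthLogger(final int width) {
            this.width = width;
        }

        public void run() {
            message = "Canvas width: " + width;
        }
    }

    // Hands the task only the data it needs.
    Runnable createLogger() {
        return new WidthLogger(width);
    }
}
```

The trade-off is that the task sees a snapshot of the width taken when it was created, not the value at the time it runs.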
  33. Performance summary
      » Compare Algorithms
        › Talk to colleagues and pick the best algorithm; having the best possible algorithm is the most effective way to optimize performance.
      » Simple Architecture
        › Keep your architecture simple and to the point, without extra layers of method calls or objects for artificial abstraction. Mobile front end code does not last forever, so over-engineering and excessive abstraction into multiple classes will slow you down compared to simple use of variables.
  34. Performance summary
      » Manage Memory with WeakReference Caching
        › Avoid memory problems by always accessing image data in memory using a WeakReference cache.
        › Create a type of virtual memory by duplicating the WeakReference cache contents in Flash memory (Record Management System) so that you can quickly recover items which are no longer available in RAM.
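The "virtual memory" idea from this slide can be sketched as a RAM cache of WeakReferences backed by a persistent copy. `FlashStore` is a hypothetical interface standing in for RMS, `TwoLevelCache` is an illustrative name, and `clearRam()` exists only to simulate the garbage collector reclaiming the RAM entries.

```java
import java.lang.ref.WeakReference;
import java.util.Hashtable;

// Hypothetical stand-in for flash storage (RMS in a real MIDlet).
interface FlashStore {
    void save(String key, byte[] data);
    byte[] load(String key); // returns null when the key was never saved
}

final class TwoLevelCache {
    private final Hashtable ram = new Hashtable(); // String -> WeakReference
    private final FlashStore flash;

    TwoLevelCache(final FlashStore flash) {
        this.flash = flash;
    }

    // Keeps a weakly referenced copy in RAM and a durable copy in flash.
    void put(final String key, final byte[] data) {
        ram.put(key, new WeakReference(data));
        flash.save(key, data);
    }

    // Fast path: RAM. Slow path: reload from flash and re-cache.
    byte[] get(final String key) {
        final WeakReference ref = (WeakReference) ram.get(key);
        byte[] data = (ref == null) ? null : (byte[]) ref.get();
        if (data == null) {
            data = flash.load(key);
            if (data != null) {
                ram.put(key, new WeakReference(data));
            }
        }
        return data;
    }

    // Simulates the garbage collector reclaiming all RAM entries.
    void clearRam() {
        ram.clear();
    }
}
```

In line with the flash caching slides, the slow path would normally be called from a background thread rather than from the paint loop.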
  35. Performance summary
      » Use micro-optimizations of the code as a habit
        › Know the rules of micro-optimisation for memory performance, logic and calculations. Include those as you develop, but trust Proguard to add the finishing touches.
        › Help Proguard by making everything possible final or static final. Avoid static variables in high performance loops as they are slower than instance variables.
  36. Performance summary
      » Profile your app towards the end of the project
        › Profile your application in an emulator.
        › Also test the actual run-time of critical code sections on the phone using System.currentTimeMillis() to see and carefully measure the effects of your code changes.
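On-device timing with System.currentTimeMillis() can be wrapped in a small helper. `Stopwatch` and `time` are hypothetical names for this sketch; repeating the task many times is an assumed way to get a measurable duration from the coarse millisecond clock.

```java
final class Stopwatch {
    // Runs the task the given number of times and returns the elapsed
    // wall-clock time in milliseconds.
    static long time(final Runnable task, final int iterations) {
        final long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.currentTimeMillis() - start;
    }
}
```

Running the same measurement before and after a code change, several times each, gives a rough but practical comparison on the actual phone.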
  37. JavaOne 2012 San Francisco
      › Extreme Mobile Java Performance Tuning, User Experience, and Architecture Patterns
        › Wednesday, Oct 3, 11:30AM
        › Hotel Nikko, Monterey I/II
      › Java for Mobile Devices: New Horizons with Fantastic New Devices
        › Monday, Oct 1, 8:30AM
        › Hotel Nikko, Monterey I/II
  38. Thank you!
      @MichaelSamarin