Strategies to improve embedded Linux application performance beyond ordinary techniques

The common recipe for performance improvement is to profile an application, identify the most time-consuming routines, and select them for optimization. Sometimes that is not enough. Developers may have to look inside the OS for performance improvement opportunities, or they may need to optimize code inside a third-party library whose source they cannot access. In those cases, other strategies must be used. This presentation reports the experience of Motorola's Brazilian developers in reducing the startup time of an application on Motorola's MOTOMAGX embedded Linux platform. Most of the optimization was performed in the binary loading stage, before the entry point function executes. This endeavor required knowledge of the Linux ABI and the Linux loader, going beyond typical bottleneck searching. The presentation covers prelink, dynamic library loading, tuning of shared objects, and enhancing the user experience. A live demo will show the use of prelink and other tools to improve the performance of general Linux platforms when libraries are used.




Presentation Transcript

  • Strategies to improve embedded Linux applications’ performance beyond ordinary techniques. Anderson Medeiros, Software Engineer, Motorola; André Oriani, Software Engineer, Motorola
  • Agenda
      • The performance problem faced by Motorola’s IM team
      • Linux’s Dynamic Loader
      • Prelink
      • Library Tools
      • Dynamically Loading Libraries
      • Tuning Shared Libraries
      • UI Time Perception
      • Q & A
  • Motivation
  • Our Problem
  • The basic recipe: Measure, Analyze, Optimize
  • Our Discovery [Timeline along the time axis t: user clicks, fork(), dynamic loader, main()]
  • Linux’s Dynamic Loader
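  • Loading a dynamically linked program [Diagram. Stages: load the dynamic linker, load the dependency libraries, relocation, program’s entry point. ELF sections and data shown: .interp, .dynamic, .rel.text, .init, symbol tables.]
  • A closer look at relocation [Flowchart. Relative relocation: add the load address. Symbol-based relocation: compute the symbol’s hash, then, for each object in the lookup scope, inspect the hash bucket and walk its chain until an element matches. On a match, adjust the address using the symbol’s offset. If the bucket or chain is empty, move on to the next scope object; if none is left, the lookup fails.]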
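The bucket-and-chain walk in the flowchart can be sketched for a single object as follows. This is a toy model, not the dynamic linker's code: `ToySymbolTable` and its members are hypothetical names, and only `elfHash` follows the classic System V ELF hash function. Many current toolchains emit the GNU-style hash section instead, which uses a different hash.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Classic System V ELF symbol hash.
std::uint32_t elfHash(const std::string& name)
{
    std::uint32_t h = 0;
    for (unsigned char c : name) {
        h = (h << 4) + c;
        std::uint32_t g = h & 0xf0000000u;
        if (g != 0) h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

// Hypothetical toy model of one object's hash section.
// Index 0 is reserved and means "undefined", as in a real symbol table.
class ToySymbolTable {
public:
    // bucketCount must be greater than zero.
    ToySymbolTable(const std::vector<std::string>& names, std::size_t bucketCount)
        : names_(1), buckets_(bucketCount, 0), chains_(1, 0)
    {
        for (const std::string& name : names) {
            std::uint32_t index = static_cast<std::uint32_t>(names_.size());
            names_.push_back(name);
            std::uint32_t& head = buckets_[elfHash(name) % buckets_.size()];
            chains_.push_back(head);  // new symbol points at the old chain head
            head = index;             // and becomes the new head of its bucket
        }
    }

    // Returns the symbol index, or 0 when the bucket or chain runs out.
    std::uint32_t lookup(const std::string& name) const
    {
        std::uint32_t index = buckets_[elfHash(name) % buckets_.size()];
        while (index != 0 && names_[index] != name)
            index = chains_[index];
        return index;
    }

private:
    std::vector<std::string> names_;
    std::vector<std::uint32_t> buckets_;
    std::vector<std::uint32_t> chains_;
};
```

Every failed string comparison along a chain costs time, and the whole walk repeats for each object in the lookup scope, which is why fewer libraries, fewer exported symbols, and shorter symbol names all shorten startup.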
  • prelink
  • Motivation
  • How does prelink work? I
      • Collects the ELF binaries that should be prelinked and all the ELF shared libraries they depend on
      • Assigns a unique virtual address space slot to each library and relinks the shared library to that base address
      • Resolves all relocations in the binary or library against its dependent libraries and stores the relocations in the ELF object
      • Stores a list of all dependent libraries, together with their checksums, in the binary or library
      • For binaries, it also computes a list of conflicts and stores it in a special ELF section
      Note: Libraries must be compiled with the GCC option -fPIC
  • How does prelink work? II
      • At runtime, the dynamic linker first checks whether it is prelinked itself
      • Just before starting an application, the dynamic linker checks that:
          • There is a library list section created by prelink
          • The libraries in that list are present in the symbol search scope in the same order
          • None of them has been modified since prelinking
          • No new shared libraries are loaded either
      • If all conditions are satisfied, prelinking is used:
          • The dynamic linker processes the fixup section and skips all normal relocation handling
      • If at least one condition fails:
          • The dynamic linker continues with normal relocation processing in the executable and all shared libraries
  • Results t
  • Library tools
  • How to use prelink?
      • prelink -avf --ld-library-path=PATH --dynamic-linker=LDSO
      • -a --all
          • Prelink all binaries and dependent libraries found in the directory hierarchies specified in /etc/prelink.conf
      • -v --verbose
          • Verbose mode. Print the virtual address slot assignment to libraries
      • -f --force
          • Force re-prelinking even for already prelinked objects for which no dependencies changed
      • --ld-library-path=PATH
          • Specify a special LD_LIBRARY_PATH to be used when prelink queries the dynamic linker about symbol resolution details
      • --dynamic-linker=LDSO
          • Specify an alternate dynamic linker instead of the default
  • Dynamically Loading Libraries
  • Motivation
  • Motivation II: If there are libraries you are going to use only on special occasions, it is better to load them only when they are really needed.
  • The Basics
        #include <dlfcn.h>

        void* dlopen  (const char* filename, int flags);
        void* dlsym   (void* handle, const char* symbol);
        char* dlerror (void);
        int   dlclose (void* handle);

        # Although you don’t have to link against the library,
        # you still have to link against libdl
        g++ main.cpp -ldl -o program
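As a minimal sketch of these four calls working together, the helper below resolves a function from a library the first time it is needed. The names `load_unary` and `lazy_cos` are hypothetical, and the library name `libm.so.6` assumes a glibc-based Linux system; on other systems the file name will differ.

```cpp
#include <dlfcn.h>    // dlopen, dlsym, dlerror, dlclose
#include <stdexcept>
#include <string>

typedef double (*unary_fn)(double);

// Hypothetical helper: resolves `symbol` from `library`,
// throwing std::runtime_error if either step fails.
// The handle is deliberately kept open for the life of the process.
unary_fn load_unary(const char* library, const char* symbol)
{
    void* handle = dlopen(library, RTLD_LAZY);
    if (!handle) {
        const char* reason = dlerror();
        throw std::runtime_error(reason ? reason : "dlopen failed");
    }

    dlerror();  // clear any stale error before calling dlsym
    void* address = dlsym(handle, symbol);
    const char* reason = dlerror();
    if (reason) {
        std::string message(reason);  // copy before dlclose can overwrite it
        dlclose(handle);
        throw std::runtime_error(message);
    }
    return reinterpret_cast<unary_fn>(address);
}

// The library is only opened on the first call (assumes glibc's libm.so.6).
double lazy_cos(double x)
{
    static unary_fn fn = load_unary("libm.so.6", "cos");
    return fn(x);
}
```

Checking `dlerror()` after `dlsym` rather than testing the returned pointer follows the documented convention, because a symbol's value may legitimately be null.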
  • Loading C++ Libraries: C++ uses mangling! (math.cpp compiled to math.o)
        int   sum (int a, int b);       →  _Z3sumii
        float sum (float a, float b);   →  _Z3sumff
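To see what a mangled name encodes, GCC and Clang expose a demangler through `<cxxabi.h>`. The wrapper below is a small sketch; `demangle` is a hypothetical helper name, while `abi::__cxa_demangle` is the real entry point on toolchains that follow the Itanium C++ ABI.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang)
#include <cstdlib>    // std::free
#include <string>

// Returns the human-readable form of a mangled name,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled)
{
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr)
        return mangled;
    std::string result(text);
    std::free(text);  // the buffer is allocated with malloc
    return result;
}
```

Because the mangled string depends on the parameter types, `dlsym` cannot be handed a plain C++ function name; this is what motivates the `extern "C"` factory functions used later.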
  • The example
        class Foo
        {
        public:
            Foo() {}
            ~Foo() {}
            void bar(const char* msg)
            {
                std::cout << "Msg:" << msg << std::endl;
            }
        };
  • The solution. Step 1: Define an interface for your class. [Class diagram: Foo with + Foo(), + ~Foo(), + void bar(const char*)]
  • The solution. Step 1: Define an interface for your class. [Class diagram: <<interface>> Foo with + virtual void bar(const char*) = 0, implemented by FooImpl with + FooImpl(), + ~FooImpl(), + void bar(const char*)]
  • The solution - Lib’s Header file. Step 1: Define an interface for your class
        #ifndef FOO_H__
        #define FOO_H__

        class Foo
        {
        public:
            virtual void bar(const char*) = 0;
        };
  • The solution - Lib’s Header file. Step 2: Create “C functions” to create and destroy instances of your class. Step 3: You might want to create typedefs
        extern "C" Foo* createFoo();
        extern "C" void destroyFoo(Foo*);

        typedef Foo* (*createFoo_t)();
        typedef void (*destroyFoo_t)(Foo*);

        #endif
  • The solution - Lib’s Implementation file. Step 4: Implement your interface and “C functions”
        #include "foo.h"
        #include <iostream>

        class FooImpl : public Foo
        {
        public:
            FooImpl() {}
            virtual ~FooImpl() {}
            virtual void bar(const char* msg)
            {
                std::cout << "Msg: " << msg << std::endl;
            }
        };

        // Declared extern "C" in foo.h, so the names are not mangled
        Foo* createFoo()
        {
            return new FooImpl();
        }

        void destroyFoo(Foo* foo)
        {
            // Foo has no virtual destructor, so delete through the concrete type
            FooImpl* fooImpl = static_cast<FooImpl*>(foo);
            delete fooImpl;
        }
  • The solution - The program
        #include <foo.h>
        #include <assert.h>
        #include <dlfcn.h>

        int main()
        {
            void* handle = dlopen("./", RTLD_LAZY);
            assert(handle);

            // Look up the factory function by its unmangled name
            createFoo_t dyn_createFoo = (createFoo_t)dlsym(handle, "createFoo");
            assert(!dlerror());

            Foo* foo = dyn_createFoo();
            if (foo)
                foo->bar("The method bar is being called");

            // Objects created by the library must be destroyed by the library
            destroyFoo_t dyn_destroyFoo = (destroyFoo_t)dlsym(handle, "destroyFoo");
            assert(!dlerror());
            dyn_destroyFoo(foo);

            dlclose(handle);
            return 0;
        }
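Pairing every `createFoo` with a `destroyFoo` by hand is easy to get wrong once the program has early returns or exceptions. One possible refinement, not part of the original slides, is to hand the pair to `std::unique_ptr` as a custom deleter. In the sketch below, `FooPtr`, `makeFoo`, and everything in namespace `demo` are hypothetical; the `demo` functions stand in for the pointers that `dlsym` would return so that the example is self-contained.

```cpp
#include <memory>

// Interface and typedefs as in the library header from the slides.
class Foo {
public:
    virtual void bar(const char*) = 0;
};
typedef Foo* (*createFoo_t)();
typedef void (*destroyFoo_t)(Foo*);

// Hypothetical: an owning pointer that releases through the library's destroyer.
typedef std::unique_ptr<Foo, destroyFoo_t> FooPtr;

FooPtr makeFoo(createFoo_t create, destroyFoo_t destroy)
{
    return FooPtr(create(), destroy);
}

// Stand-ins for the functions a real program would obtain via dlsym.
namespace demo {
int liveObjects = 0;

class FooImpl : public Foo {
public:
    FooImpl() { ++liveObjects; }
    virtual ~FooImpl() { --liveObjects; }
    virtual void bar(const char*) {}
};

Foo* createFoo() { return new FooImpl(); }
void destroyFoo(Foo* foo) { delete static_cast<FooImpl*>(foo); }
}  // namespace demo
```

With this in place, the object is destroyed through the library's own function whenever the `FooPtr` goes out of scope. The `FooPtr` must still be released before `dlclose` is called on the handle, since the deleter's code lives in the library.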
  • Tuning Shared Libraries
  • Inspiration: “How To Write Shared Libraries”, Ulrich Drepper, Red Hat
  • Less is always better. Keep to a minimum…
      • The number of libraries you depend on, directly or indirectly
      • The size of the libraries you link against
      • The number of search directories for libraries, ideally one directory
      • The number of exported symbols
      • The length of symbol strings
      • The number of relocations
  • Search directories for libs
  • Reducing search space
      Step 1: Set LD_LIBRARY_PATH to empty
      Step 2: When linking, use the options:
          -rpath-link <dir> to specify your system’s directory for libraries
          -z nodeflib to avoid searching /lib, /usr/lib and other places specified by /etc/ and /etc/
        #export LD_LIBRARY_PATH=""
        #g++ main.cpp -Wl,-z,nodeflib -Wl,-rpath-link,/lib -lfoo -o program
  • Reducing exported symbols: using GCC’s attribute feature
        int localVar __attribute__((visibility("hidden")));
        int localFunction() __attribute__((visibility("hidden")));

        class Someclass
        {
        private:
            static int a __attribute__((visibility("hidden")));
            int b;
            int doSomething(int d) __attribute__((visibility("hidden")));
        public:
            Someclass(int c);
            int doSomethingImportant();
        };
  • Reducing exported symbols II: You can tell the linker which symbols shall be exported using export maps
        {
          global:
            cFunction*;
            extern "C++" {
              cppFunction*;
              *Someclass;
              Someclass::Someclass*;
              Someclass::?Someclass*;
              Someclass::method*;
            };
          local: *;
        };

        #g++ -shared example.cpp -o -Wl, -Wl,-
  • Pros and Cons
      • Visibility attribute
          • Pros: the compiler can generate optimal code
          • Cons: it is a GCC-specific feature; the code becomes less readable
      • Export maps
          • Pros: more practical; centralizes the definition of the library’s API
          • Cons: the compiler cannot optimize, because from its point of view any symbol may be exported
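A common way to soften the readability cost of the visibility attribute is to invert the default: build the library with GCC's `-fvisibility=hidden` option and mark only the public API as visible. The sketch below assumes that build flag; the macro names `MYLIB_API` and `MYLIB_LOCAL` and both functions are hypothetical.

```cpp
// Hypothetical export macros; they expand to nothing on other compilers.
#if defined(__GNUC__)
#  define MYLIB_API   __attribute__((visibility("default")))
#  define MYLIB_LOCAL __attribute__((visibility("hidden")))
#else
#  define MYLIB_API
#  define MYLIB_LOCAL
#endif

// Internal helper: callable across the library's own files,
// but absent from the dynamic symbol table.
MYLIB_LOCAL int scaleValue(int value)
{
    return value * 2;
}

// Public entry point: the only symbol of the two that clients can resolve.
MYLIB_API int doSomethingImportant(int value)
{
    return scaleValue(value);
}
```

Built with something like `g++ -shared -fPIC -fvisibility=hidden`, only the functions tagged `MYLIB_API` remain exported, so the annotations mark the API rather than every internal detail.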
  • Restricting symbol string’s length
        namespace java
        {
            namespace lang
            {
                class Math
                {
                    static const int PI;
                    static double sin(double d);
                    static double cos(double d);
                    static double FastFourierTransform(double a, int b, const int** const c);
                };
            }
        }

        _ZN4java4lang4Math2PIE
        _ZN4java4lang4Math3sinEd
        _ZN4java4lang4Math3cosEd
        _ZN4java4lang4Math20FastFourierTransformEdiPPKi
  • Avoiding relocations
        char* a = "ABC";          // A B C 0 stored in .data
        const char a[] = "ABC";   // A B C 0 stored in .rodata
  • UI Time perception
  • Motivation [Comparison of two delivery services: both take X hours to deliver and cost $ to ship; one offers package tracking, the other offers no tracking]
  • Motivation II
  • Improving responsiveness. It is not always possible to optimize code because:
      • You might not have access to the problematic code
      • It demands too much effort, or it is too risky to change
      • There is nothing you can do (I/O latency, etc.)
      • Other reasons ...
  • Can I postpone? [Screen: loading plug-ins …]
  • Can I postpone? [Screen: loading plug-ins]
  • Can I parallelize?
  • Can I parallelize? [Screen: sending file…]
  • Can I remove it ?
  • In conclusion …
      • You learned that libraries may play an important role in the startup performance of your application
      • You saw how dynamic linking works on Linux
      • You were introduced to prelink and became aware of its potential to speed up startup
      • You learned how to load a shared object on demand, so that rarely used libraries are not a burden at startup
      • You got some tips on how to write libraries to get the best performance
      • You understood that a UI that provides quick user feedback is more important than performance
  • Q&A