ELC-E 2010: The Right Approach to Minimal Boot Times


This was presented at ELC-E 2010 in Cambridge and describes an approach to cold boot time reduction. It also demonstrates the approach through a case study with an MS7724 reference board.

Published in: Technology
  • I deleted Transmission from my Mac after attempting to deal with their team. I see nothing has changed.
  • I wish my desktop would boot in 1-2 seconds :)
    The presentation is now on reddit

    http://www.reddit.com/r/linux/comments/fekhf/linux_gui_boot_in_077sec_the_right_approach_to/
  • Absolutely amazing and major kudos. Having done similar optimizations in the past (though never w/a logic analyzer), I appreciate the effort that went into this. It's really a shame that such professionalism and thoroughness aren't appreciated and held as the standard in the programming world. If they were, we'd have SW today that runs 10 times faster in a fraction of the memory -- but the reality:

    ALL companies I worked for placed optimizations like this at the bottoms of the importance chain, with the attitude being that if it 'worked', that was good enough, and often, if you spent time on optimization, it was counted as you wasting your time.

    Finally learned to squeeze in some of this work to 'optimize' run-time efficiency in the name of 'stress testing'. Obviously the faster your application runs, the more stress it puts on all parts of the application and infrastructure. It really does uncover a load of bugs, and it is amazing how many software programs out today have ridiculously low, unpublished limits on how much data they will work with before they break.

    Try reporting them and you get a look like you are from 'mars'

    Most recent example was on a bt-client that failed w/500 torrents and one of the lead developers called my usage 'unrealistic and unreasonable' (http://trac.transmissionbt.com/ticket/3829#comment:25). Others piped up and mentioned that others were running w/over 3 times that many, so the lead dev obviously had his head in the sand when it came to real-life usage mentioned in his own forums, but the idea of 'stress testing' rarely occurs to anyone.

    Note -- by stress testing, it's not just about creating 'load', but also about optimizing 'cycles' in doing 'whatever' it is you are doing, since the tighter the cycles, the more likely it is you are to hit timing bugs. That's the *important* part of this -- stress testing w/o optimization is useful, but not nearly so much as with optimization.
  • Has any source code been shared for testing? I worked on the s3c2440 and hope to port your modifications to it.
  • Why can I only see pages 1-18?


  1. The Right Approach to Minimal Boot Times
     Andrew Murray, Senior Software Engineer
     CELF Embedded Linux Conference Europe 2010
     Delivering Software Innovation. MPC Data Limited is a company registered in England and Wales with company number 05507446. Copyright © MPC Data Limited 2010. All trademarks are hereby acknowledged.
  2. Andrew Murray
     • Senior Software Engineer, MPC Data
       – Driver and kernel development
       – Embedded applications development
       – Windows driver development
     • Work Experience
       – 4 years of experience working with embedded Linux devices
       – Good track record in dramatically reducing customers’ boot time through MPC Data’s boot time reduction service
     • Tight timescales often don’t permit nice, elegant and generic solutions; however, this frustration has provided me with many ideas I wish to share today
     • I also wish to share my observations and experiences in boot time reduction
  3. Agenda
     • Principles behind boot time reduction
     • My approach to boot time reduction
     • Case Study: MS7724 ‘Ecovec’
     • Optimizing user space and function reordering
     • Video Demonstration
     • Conclusion and Q&A
  4. Principles
     • The problem:
       – Getting an embedded Linux based product from power-on to a useful state of functionality in an acceptable amount of time
     • Many innovative solutions exist: Suspend / Hibernate / etc.
     • This presentation focuses on cold-boot optimisation
       – Specialising software for the specific needs of a product
     • This works because prior to optimisation the software will be:
       – More general purpose: likely to contain functionality your device doesn’t require, which results in more initialisation and a larger image
       – More convenient and flexible: likely to probe and detect hardware which you know will always be there, which contributes to boot delay
     • There is no silver bullet here. All that is required is: Disciplined Analysis + Common Sense + Pragmatic Approach
  5. The Approach
     • Identify boot time functionality
     • Measure boot time across the board
     • Remove unnecessary functionality
     • Optimise required functionality
     • Re-order initialisation
  6. The Approach: Identify boot time functionality
     • Understand what functionality is required:
       – Immediately after boot
       – Sometime after
     • The better your understanding, the more able you are to specialise Linux and thus improve boot time
  7. The Approach: Measure boot time across the board
     • It is important to visualise what contributes to boot time
     • Measuring boot time across the entire software stack is essential
     • Without tools, gauging small boot delays can be impossible
     • Being able to accurately measure boot time across the board will allow you to measure the effect of any changes you make...
     • ...otherwise you’ll be lost in the dark
  8. The Approach: Remove unnecessary functionality
     • Unnecessary functionality will increase boot time due to:
       – Increased image size (flash transfer time)
       – Time spent on initialisation during start-up
     • “But I might use this feature in the future, it’s nice to have”
       – Be strict and stick to the brief
  9. The Approach: Optimise required functionality
     • Functionality you require can be optimised
     • It may already be optimised in a later version of the sources
     • This may involve:
       – Optimising flash timings
       – Removing unnecessary probing / delays
       – Refactoring code
       – Taking a new approach to problems
  10. The Approach: Re-order initialisation
      • Further improvements can be gained by doing things at different times:
        – Parallelisation
        – Using Arjan’s async framework (kernel/async.c)
        – Deferred loading of less important features
        – Loadable kernel modules
  11. Case Study
      • Use the MS7724 ‘EcoVec’ as a case study for a home automation system
      • Boot time functionality:
        – Responsive QT user interface
      • Additional functionality:
        – Video capture/render (representing a security camera)
      • Will describe tools, techniques and lessons along the way
  12. MS7724
  13. SH7724
  14. Case Study
      • Use the MS7724 ‘EcoVec’ as a case study for a home automation system
      • Boot time functionality:
        – Responsive QT user interface
      • Additional functionality:
        – Video capture/render (representing a security camera)
      • Will describe tools, techniques and lessons along the way
  15. Case Study
  16. Case Study
      • Use the MS7724 ‘EcoVec’ as a case study for a home automation system
      • Boot time functionality:
        – Responsive QT user interface
      • Additional functionality:
        – Video capture/render (representing a security camera)
      • Will describe tools and techniques along the way
  17. Case Study
      • Typical starting point:
        – Boot loader: UBoot (2009-01)
        – OS: Linux (2.6.31-rc7)
        – Filesystem: Buildroot (2010.05), JFFS2, NOR Flash
        – Application: QT Embedded Opensource 4.6.2
  18. MS7724 Boot Process
      • Component times mostly measured using GPIO and a logic analyser
      • UBoot time measured between reset and a GPIO line being asserted
      • Sources modified to toggle GPIO at key points:
        – UBoot: UBoot to kernel handover (common/cmd_bootm.c:do_bootm)
        – Kernel: Mount FS (init/do_mounts.c:do_mount_root)
        – Kernel: Init (init/main.c:init_post)
      • Used printk timings for the rest
      • Time to required boot time functionality: > 19 seconds!
      • Baseline (seconds):
        – UBoot: 2.58 s
        – Kernel: 1.30 s
        – Filesystem Mount: ~6.83 s
        – Init Scripts: 1.30 s
        – GUI Application: 7.43 s
        – Power-to-UI: 19.44 seconds
  19. UBoot
      • Functionality Removal
        – User Boot Delay: 1000 ms
        – Image Verification: 374 ms
        – Image Decompression
        – USB, ROMImage, Filesystems: 195 ms
      • Functionality Optimisation
        – Improve ‘memcpy’ code: 342 ms
        – Eliminate use of console: 103 ms
        – Reduced kernel size: 60 ms
      • Functionality Re-ordering
        – Read MAC from EEPROM: 124 ms
        – Ethernet setup: 98 ms
      • Reduction: 2577 ms -> 280 ms (89%)
      • [Chart: Time (s) per Boot Task. Init 0.580, User Boot Delay 1.000, Verify Image 0.374, Extract Image 0.622]
  20. UBoot (Before and After)
      • [Charts: Time (s) per Boot Task, axis from 0 to 3. Before (136 kB): Init 0.580, User Boot Delay 1.000, Verify Image 0.374, Extract Image 0.622. After (94 kB): Initialisation, Decompress]
  21. Compressed Kernel Images
      • The purpose of the boot loader is to jump to an uncompressed kernel image in RAM
      • A number of factors should be considered
      • If Flash throughput is greater than decompression throughput then an uncompressed image is quicker
      • Measured on this board:
        – Flash Throughput: 10.50 MB/s
        – Decompression Throughput: 2.67 MB/s
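The trade-off on this slide can be put into numbers. The following is a minimal sketch rather than the presenter's own calculation: the two throughput figures are the ones quoted on the slide, while the image sizes (`UNCOMPRESSED_MB`, `COMPRESSED_MB`) are hypothetical. It also assumes that decompression throughput is measured in decompressed output bytes and that flash reads and decompression do not overlap.

```python
# Throughputs quoted on the slide (MB/s).
FLASH_MB_S = 10.50
DECOMPRESS_MB_S = 2.67

# Hypothetical image sizes, chosen only for illustration.
UNCOMPRESSED_MB = 3.0
COMPRESSED_MB = 1.5


def load_uncompressed(size_mb, flash_mb_s):
    """Seconds to copy an uncompressed image from flash to RAM."""
    return size_mb / flash_mb_s


def load_compressed(compressed_mb, uncompressed_mb, flash_mb_s, decompress_mb_s):
    """Seconds to read a compressed image and then decompress it.

    Assumes throughput is measured in decompressed output bytes and that
    reading and decompression happen one after the other.
    """
    return compressed_mb / flash_mb_s + uncompressed_mb / decompress_mb_s


t_raw = load_uncompressed(UNCOMPRESSED_MB, FLASH_MB_S)
t_zip = load_compressed(COMPRESSED_MB, UNCOMPRESSED_MB, FLASH_MB_S, DECOMPRESS_MB_S)
print(f"uncompressed: {t_raw:.3f} s, compressed: {t_zip:.3f} s")
```

With these assumed sizes the decompression term dominates, which is consistent with the slide's point that a slow decompressor can outweigh the smaller flash transfer.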
  22. Linux Kernel
      • Functionality Removal
        – Remove USB: 88 kB, 144 ms
        – Remove keyboard driver: 4 ms
        – Remove Filesystems: 300 kB, 0.8 ms
        – Remove console output
      • Functionality Optimisation
        – Remove delays in driver initialization: 400 ms
        – Removing delays: 252 ms
        – Prevent probing disconnected cameras: 200 ms
        – Limiting memset size: 90 ms
        – Improve performance of memset: 71 ms
      • Functionality Re-ordering
        – Defer networking initialization: 166 kB, 20 ms
      • Reduction: 1301 ms -> 113 ms (91%)
  23. Linux Kernel (Before and After)
  24. Userspace (Mount and Init scripts)
      • Functionality Removal
        – Remove all init scripts and use a single init process: 1.32 s
      • Functionality Optimisation
        – Statically link application with uClibc libraries
        – Use SquashFS instead of JFFS2: ~6.81 s
        – Improve performance of NOR memory driver
      • Functionality Re-ordering
        – Start QT then later start video
      • Reduction: 8130 ms -> 64 ms (99%)
      • [Figure: Example of a bootchart graphic]
  25. QT
      • Reducing the boot time of the QT application was the biggest challenge and very time consuming
      • Un-optimized QT application was large and took 7.4 seconds to reach its main function!
      • Improvements reduce time to 0.3 seconds
      • Measurements show incremental effects against original binary (from left to right):
        – Optimise flash access (arch/sh/kernel/io.c): -2.89 s
        – Removed features: -2.16 s?
        – Used statically linked uClibc: -0.9 s?
        – Stripped: -0.16 s
        – Optimising executable: -0.35 s
        – Read-ahead and block size: -0.63 s
  26. Why does QT take so long to start?
      • Only a portion of the QT application is required to display a UI to the user
        – Event handling, additional forms, etc. come later
      • As Linux uses demand paging, only the parts of an executable that are actually used are read from flash when it is run
        – This reduces unnecessary flash accesses and decreases application start-up time
      • However, the application is on a block filesystem, so an entire block is retrieved at a time...
      • ...which results in unnecessary flash access time if the required executable code is spread over the entire image
  27. Function Reordering and Block Sizes
      • Sections highlighted in red represent parts of the executable required at start-up
      • Most of these parts could fit in a single file-system block
      • I.e. we could optimise the application such that only 2 blocks of flash are accessed rather than 4
      • Thus the executable can be optimised by:
        – Reducing block size
        – Eliminating FS readahead
        – Reordering the executable
      • [Diagram: executable spread over filesystem Blocks 1 to 7, above the MTD layer and NOR flash]
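The effect of layout on flash reads can be illustrated by counting how many filesystem blocks a set of byte ranges touches. This is a rough sketch under stated assumptions: the block size and the offsets of the start-up code are invented for illustration and are not measurements from the talk, and readahead is ignored.

```python
def blocks_touched(ranges, block_size):
    """Return the set of block indices covered by (offset, length) byte ranges."""
    touched = set()
    for offset, length in ranges:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return touched


BLOCK = 128 * 1024  # hypothetical filesystem block size in bytes

# Hypothetical start-up code: four 20 KiB regions scattered through the file...
scattered = [(10_000, 20_480), (200_000, 20_480), (450_000, 20_480), (700_000, 20_480)]
# ...and the same four regions placed back to back at the start of the file.
contiguous = [(i * 20_480, 20_480) for i in range(4)]

print(len(blocks_touched(scattered, BLOCK)), "blocks when scattered")
print(len(blocks_touched(contiguous, BLOCK)), "blocks when contiguous")
```

Rerunning the same function with a smaller `BLOCK` shows the second lever named on the slide: with smaller blocks, less unneeded data is fetched around each start-up region.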
  28. Function Reordering
      • GCC compiler features can be used to assist:
        – -finstrument-functions
        – -ffunction-sections
      • When -finstrument-functions is used, GCC inserts calls to two extra functions, one just after the entry and one just before the exit of every function:
        – void __cyg_profile_func_enter (...)
        – void __cyg_profile_func_exit (...)
      • These calls can be implemented to find out which functions are called when
      • This information can be used to generate a custom linker script: when -ffunction-sections is used, each function lives in its own section
      • This way we can ensure all the required sections for start-up are contained contiguously in flash
      • (--gc-sections can also be helpful)
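As a sketch of the last step, the snippet below turns a call trace into the body of a `.text` output section for a GNU ld linker script. It is an illustration, not the tooling used in the talk: the trace and its function names are hypothetical, and it assumes the instrumentation hooks have already logged resolved function names in call order. It relies on two known behaviours: with -ffunction-sections GCC places a function `foo` in an input section named `.text.foo`, and ld assigns an input section to the first pattern that matches it.

```python
def first_call_order(trace):
    """Return function names in order of first appearance, without duplicates."""
    seen = set()
    ordered = []
    for name in trace:
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered


def text_section_fragment(startup_functions):
    """Build the body of a .text output section for a GNU ld linker script.

    Start-up functions are listed first so that they end up contiguous;
    the final wildcard line then collects every remaining function.
    """
    lines = ["  .text : {"]
    for name in startup_functions:
        lines.append(f"    *(.text.{name})")
    lines.append("    *(.text .text.*)")
    lines.append("  }")
    return "\n".join(lines)


# Hypothetical trace, as might be logged by __cyg_profile_func_enter.
trace = ["main", "init_display", "load_font", "init_display", "draw_ui"]
fragment = text_section_fragment(first_call_order(trace))
print(fragment)
```

The fragment would still need to be merged into the complete linker script for the target, and for a C++ application such as QT the section names would carry mangled symbol names.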
  29. Function Reordering and Block Sizes
      • [Diagrams, Before and After: the QT application within the FS, laid out across Blocks 1 to 7, above the MTD layer and NOR flash]
  30. Essential tools
      • Discrete events can be measured by toggling GPIO outputs and utilising a logic analyser
      • Kernel events can be measured with:
        – Printk timings
        – Initcall_debug and bootchart scripts
      • Userspace events can be measured with ubootchart
        – http://code.google.com/p/ubootchart/
        – http://www.bootchart.org/
      • These are just some of the many tools available
  31. Case Study: Before and After
      • A reduction of 96%!
      • Baseline (seconds):
        – UBoot: 2.58 s
        – Kernel: 1.30 s
        – Filesystem Mount: ~6.83 s
        – Init Scripts: 1.30 s
        – GUI Application: 7.43 s
        – Power-to-UI: 19.44 seconds
      • After swiftBoot Modifications (seconds):
        – UBoot: 0.28 s
        – Kernel: 0.11 s
        – Filesystem Mount: 0.02 s
        – Init Scripts: 0.06 s
        – GUI Application: 0.30 s
        – Power-to-UI: 0.77 seconds
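The totals and the headline percentage can be checked directly from the per-stage figures on this slide. The sketch below uses only the stage times quoted above; the helper name `reduction_percent` is introduced here for illustration.

```python
# Stage times in seconds, as quoted on the before/after slide.
BASELINE = {
    "UBoot": 2.58,
    "Kernel": 1.30,
    "Filesystem Mount": 6.83,
    "Init Scripts": 1.30,
    "GUI Application": 7.43,
}
OPTIMISED = {
    "UBoot": 0.28,
    "Kernel": 0.11,
    "Filesystem Mount": 0.02,
    "Init Scripts": 0.06,
    "GUI Application": 0.30,
}


def reduction_percent(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


total_before = sum(BASELINE.values())
total_after = sum(OPTIMISED.values())

for stage in BASELINE:
    pct = reduction_percent(BASELINE[stage], OPTIMISED[stage])
    print(f"{stage:18s} {BASELINE[stage]:5.2f} s -> {OPTIMISED[stage]:4.2f} s ({pct:.0f}% less)")
print(f"{'Power-to-UI':18s} {total_before:5.2f} s -> {total_after:4.2f} s "
      f"({reduction_percent(total_before, total_after):.0f}% less)")
```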
  32. Summary
      • Modification in UBoot, gain (ms):
        – Remove boot delay: 1000
        – Remove image verification: 374
        – Optimise memcpy code: 342
        – Remove USB, ROMImage, filesystems: 195
        – Defer reading MAC address: 124
        – Reduction due to kernel size: 60
        – Remove delays in Ethernet init: 98
        – Eliminate use of console: 103
        – Total gain: 2.2 s
      • Modification in Kernel, gain (ms):
        – Remove driver delays: 652
        – Prevent probing disconnected cameras: 200
        – Remove USB: 144
        – Don’t allocate memory for unused camera components: 90
        – Improve memset: 71
        – Defer network initialisation: 20
        – Remove keyboard driver: 4
        – Remove filesystems: 0.8
        – Total gain: 1.2 s
      • Modification in Userspace, gain (ms):
        – Use squashfs: 6830
        – Optimise flash accesses: 2890
        – Remove unused features from QT: 2160
        – Remove all init scripts: 1300
        – Statically link QT with uclibc: 900
        – Reduce readahead and block size: 630
        – Re-order QT application: 350
        – Strip QT application: 160
        – Total gain: 15.2 s
  33. Guiding Principles
      • Observe and Record
        – Measuring boot times is the only way to form a clear picture of what is contributing to boot time
        – Keep copious notes
      • Tackle the biggest delays in the system first
        – Identify the largest delays and remove them to be most effective
      • Be aware of varying boot times and try to understand them
      • Remember the uncertainty principle
      • Don’t forget testing
  34. Conclusion & Call to action
      • Reducing cold boot time is like removing the longest links of a chain until you have just short links
        – As a result, boot time is a product of the system design, and long links can easily be added
        – Effort will always be required to remove and shorten links for a given system
      • The holy grail is to reduce this amount of effort to nothing. Some ideas towards this:
        – [Idealism] Asynchronous initialisation in the kernel by default. Many challenges here. This would reduce the effect of delays in drivers
        – [Realism] Simple caching framework for device probes. To eliminate probes for known hardware (generic device tree). Could encompass LPJ, etc.
  35. Thank You. Any Questions?
  36. Appendix
  37. Initcall Debug
      • Add the following to your kernel command line:
        – initcall_debug (to add debug)
        – loglevel=0 (to reduce the impact this has on boot time)
      • Ensure the following are set in your kernel configuration:
        – CONFIG_PRINTK_TIME (add timings to printks)
        – CONFIG_KALLSYMS (ensure symbols are there)
        – CONFIG_LOGBUF_SHIFT=18 (ensure there is room in the log buffer)
      • Copy the output of ‘dmesg’
      • Type ‘cat output | ./scripts/bootgraph.pl > graph.svg’
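Besides the SVG graph, the same log can be ranked as plain text to find the slowest initcalls. The sketch below is an illustration only: the sample log lines are made up, and the message format ("initcall NAME+OFFSET/SIZE returned N after M usecs") is assumed from what kernels of this era print with initcall_debug, so the pattern should be checked against real `dmesg` output.

```python
import re

# Assumed format of the initcall_debug completion message.
INITCALL_RE = re.compile(
    r"initcall (?P<name>\S+?)(?:\+0x[0-9a-f]+/0x[0-9a-f]+)? "
    r"returned (?P<ret>-?\d+) after (?P<usecs>\d+) usecs"
)


def slowest_initcalls(log_text, top=10):
    """Return up to `top` (name, usecs) pairs, slowest first."""
    calls = []
    for line in log_text.splitlines():
        match = INITCALL_RE.search(line)
        if match:
            calls.append((match.group("name"), int(match.group("usecs"))))
    calls.sort(key=lambda item: item[1], reverse=True)
    return calls[:top]


# Made-up sample in the assumed format; real input would be saved dmesg output.
SAMPLE_LOG = """\
[    0.412345] calling  usb_init+0x0/0x120 @ 1
[    0.556789] initcall usb_init+0x0/0x120 returned 0 after 144000 usecs
[    0.556900] calling  kbd_init+0x0/0x80 @ 1
[    0.560950] initcall kbd_init+0x0/0x80 returned 0 after 4000 usecs
"""

ranked = slowest_initcalls(SAMPLE_LOG)
for name, usecs in ranked:
    print(f"{usecs / 1000:8.1f} ms  {name}")
```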
