Improving MeeGo Boot-Up Time

Hiroshi DOYU <Hiroshi.DOYU@nokia.com>

      September 2010, LinuxCon Japan
Preface
Background
Handset Boot-Up status
My experiment
Further optimization idea
Q&A
Preface
Inspired by QuickBoot
  Ubiquitous QuickBoot




http://www.ubiquitous.co.jp/En/products/middleware/quickboot
Embedded Linux Wiki
  Boot Time - eLinux.org




http://elinux.org/Boot_Time
Tribute to Tim Bird
 Improving Android Boot-Up Time
Background
Impact of boot-up time
For consumer client devices

  User experience
    TV, IVI, Camera
     Immediate response is preferable right after power-on.
    Tablet, netbook, handset
     Is a cold start really necessary?
    More complicated S/W stacks consume more memory.
  Mass-production testing
    The longer a device spends on the production line, the more it costs.
Boot-Up time definition
Until when?

  When the login prompt appears.
  When the desktop shows up.
  When the network is available.
  When the browser is ready.
  When it can take a picture.
  When the CPU goes idle.
This depends on:
     Your H/W configuration.
     Your S/W configuration.
     Your system requirements.
The shortest isn’t always the best.
Measurement methods (kernel)
   printk timestamps (CONFIG_PRINTK_TIME, or boot with printk.time=1)
    show_delta: linux-2.6/scripts/show_delta, a Python script
   initcall debugging (boot with initcall_debug)
dmesg -s 256000 | grep "initcall" |
     sed "s/\(.*\)after\(.*\)/\2 \1/g" | sort -r -n
   bootgraph (boot with initcall_debug printk.time=1)
dmesg |
 linux-2.6/scripts/bootgraph.pl > output.svg
   ftrace
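The two log post-processing steps above (show_delta on printk timestamps, and the grep | sed | sort ranking of initcalls) can be sketched in a few lines of Python. This is a minimal sketch, not the actual show_delta script: it assumes the usual `[    1.234567]` printk prefix and initcall_debug lines of the form `initcall <fn> returned <ret> after <n> usecs`. The function names `deltas` and `slowest_initcalls` are hypothetical.

```python
import re

# printk timestamp prefix, e.g. "[    1.234567] message"
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")
# initcall_debug result line, e.g.
# "initcall foo_init+0x0/0x28 returned 0 after 1234 usecs"
INITCALL_RE = re.compile(r"initcall (\S+) returned (-?\d+) after (\d+) usecs")


def deltas(lines):
    """Return (timestamp, delta_from_previous, message) for each timestamped
    line, in the spirit of scripts/show_delta."""
    out = []
    prev = None
    for line in lines:
        m = TS_RE.match(line)
        if m is None:
            continue  # skip lines without a printk timestamp
        ts = float(m.group(1))
        out.append((ts, 0.0 if prev is None else ts - prev, m.group(2)))
        prev = ts
    return out


def slowest_initcalls(lines, top=10):
    """Return [(usecs, function)] sorted slowest first, like the
    grep | sed | sort -r -n pipeline."""
    calls = []
    for line in lines:
        m = INITCALL_RE.search(line)
        if m:
            calls.append((int(m.group(3)), m.group(1)))
    calls.sort(reverse=True)
    return calls[:top]
```

Feed it the lines of `dmesg` output, for example captured with `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()`. The kernel ring buffer must be large enough to still hold the early boot messages.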
Measurement methods (userland)
   uptime
/ # cat /proc/uptime
18.73 14.24
/ # cat /proc/uptime
20.55 16.05
   bootchart
    A newer version ships with MeeGo
     Creates the SVG directly; no additional tool needed.

  Entire-system measurement
    Including bootloader, kernel and userland
  grabserial
    show_delta, again
  oprofile
    ETM (Embedded Trace Macrocell): hardware-assisted
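The two `/proc/uptime` samples above can be turned into a number. The first field is seconds since boot and the second is accumulated idle time, so the change in idle time over the change in uptime is the fraction of the interval the CPU spent idle. That is one way to detect the "CPU goes idle" end point of boot. A minimal sketch with hypothetical helper names, assuming the idle field is summed over all CPUs (hence the `ncpus` divisor):

```python
def read_uptime(path="/proc/uptime"):
    """Return the raw text of /proc/uptime: '<uptime_s> <idle_s>'."""
    with open(path) as f:
        return f.read()


def idle_fraction(sample_a, sample_b, ncpus=1):
    """Fraction of wall-clock time spent idle between two /proc/uptime samples.

    The idle field is assumed to be summed over all CPUs, so it is divided
    by ncpus (1 on a single-core OMAP3).
    """
    up_a, idle_a = (float(x) for x in sample_a.split()[:2])
    up_b, idle_b = (float(x) for x in sample_b.split()[:2])
    elapsed = up_b - up_a
    if elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    return (idle_b - idle_a) / (elapsed * ncpus)
```

With the samples shown above and assuming a single CPU, `idle_fraction("18.73 14.24", "20.55 16.05")` gives 1.81 / 1.82, about 0.995. In other words, the system was idle for over 99% of that 1.82 s interval.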
Existing optimization techniques
   kernel optimization
     asynchronous initcalls
     asynchronous suspend/resume
     misc: preset lpj (skip delay-loop calibration), no probing, no console, deferred module loading
   userland optimization
     init scripts: run them in parallel with upstart or systemd
     readahead
     prelink
   hibernation based optimization
     snapshot boot
     InstantBoot
     Warp2
     QuickBoot


BIOS/bootloader assisted.
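"Preset lpj" means skipping the boot-time delay-loop calibration by passing the previously measured loops-per-jiffy value on the kernel command line as `lpj=`. The sketch below harvests that value from an earlier boot log. The helper name is hypothetical, and the example log line is illustrative, not taken from the N900. The value is only valid for the same CPU clock and HZ setting.

```python
import re

# Delay-loop calibration line printed at boot, e.g.
# "Calibrating delay loop... 499.71 BogoMIPS (lpj=2498560)"
LPJ_RE = re.compile(r"BogoMIPS \(lpj=(\d+)\)")


def preset_lpj_arg(dmesg_text):
    """Return an 'lpj=N' kernel command-line argument from a previous boot log.

    Passing it on later boots lets the kernel skip the delay-loop calibration.
    """
    m = LPJ_RE.search(dmesg_text)
    if m is None:
        raise ValueError("no delay-loop calibration line found")
    return "lpj=" + m.group(1)
```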
Is cold start still necessary?
 Do we really need cold starts so often?
 Flashing a hibernation image in advance could reduce the time spent
 on the production line.
 Optimization may depend on product-specific parts:
  S/W configuration
  H/W configuration
  System requirements
 Wouldn’t hibernation be OK in most cases?
Handset Boot-Up status
Handset requirements
 Responsiveness of the device and applications
  Quick response improves UX, especially on handsets.
   One touch picks a friend from the "contact list".
   One touch starts the camera, just like a digital camera.
   One touch starts web browsing.
   A call has to be processed within a short time, per operator specifications.


 Resolving dynamic libraries takes more time than swapping in pages.
 So all major applications can be started in advance but kept invisible,
  then made visible upon request.
 As a result, RAM is occupied by already-started applications/daemons.
Handset Boot-Up time
 N900 boot-up takes ~40 sec
  until the desktop shows up.
 Number of applications: 137
 Swap status
Handset Bootgraph
Handset BootChart
Handset memuse
N900 boot time breakdown
 Bootloader: 0.44 sec
 Kernel: 2.68 sec
  Measured with the serial console enabled.
  Could be shorter without the serial console.
 Desktop: 39.03 sec
My experiment
Target spec
 OMAP3-based reference board
  Similar to N900
 512MB RAM
 MeeGo Handset
  Number of applications: ~161
  ~120 sec until all applications have started
  Swap status
No hibernation support for ARM
   There was no hibernation support for ARM.
   Picked up an old patch and forward-ported it to v2.6.35.
   Rejected by RMK (Russell King) because:
     It needs to be kept in sync with suspend-to-RAM
     It lacks PXA support
     CP15 coprocessor access differs between ARM versions, e.g. reading TTBR0:
mrc p15, 0, %0, c2, c0, 0


   Still, it works!
     Let’s proceed.
Which hibernation method to use?
Three implementations of hibernation

  1. swsusp
    Included in the mainline kernel; the default.
  2. uswsusp
    Userland implementation
  3. tuxonice
    Out of tree, but with many features:
     Compression of images
     Multithreaded I/O
     Readahead
     LVM support
Start with swsusp
  To start hibernation
echo disk > /sys/power/state
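The same trigger can be written as a slightly more defensive Python helper that first checks that the kernel actually lists `disk` in `/sys/power/state`. This is a sketch under stated assumptions: it needs root, a configured swap device and a matching `resume=` kernel parameter. `power_dir` is a hypothetical parameter that exists only so the function can be pointed at a scratch directory for testing. On a real target, the write returns only after the system has resumed.

```python
import os


def hibernate(power_dir="/sys/power"):
    """Start swsusp hibernation by writing 'disk' to <power_dir>/state.

    Needs root, a swap device and a matching resume= boot parameter.
    On a real system the write returns only after the system has resumed.
    """
    state_path = os.path.join(power_dir, "state")
    with open(state_path) as f:
        supported = f.read().split()
    if "disk" not in supported:
        raise RuntimeError(f"hibernation unsupported: {state_path} lists {supported}")
    with open(state_path, "w") as f:
        f.write("disk")
```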
swsusp/eMMC
swsusp/eMMC
Use mtdblock rather than eMMC
 mtdblock is faster than eMMC here.
  mtdblock
   ~23 MB/s read
  eMMC
   ~20 MB/s read
   ~15 MB/s read

 This is a HACK because:
  mtdblock itself is unsafe without wear-leveling support.
  mtdswap is *volatile*.
   Good performance
   But it cannot be used for hibernation.
  We need a non-volatile mtdswap!!
swsusp/MTD
swsusp/MTD
Port TuxOnIce to ARM
TuxOnIce has many optimization features:

  Compression of images
  Multithreaded I/O
  Readahead
  LVM support
  To drop the page cache:
echo -2 > /sys/power/tuxonice/image_size_limit
  To start hibernation:
echo disk > /sys/power/tuxonice/do_hibernation
TuxOnIce/MTD
TuxOnIce/MTD
Shrink memory before hibernation
  Reclaim as much memory as possible right before hibernating.
echo 10000 > /sys/power/shrink_mem
TuxOnIce/MTD/shrink_mem
TuxOnIce/MTD/shrink_mem
What is the bottleneck?
 The less RAM consumed, the shorter the boot time.
  But memory cannot be squeezed any further below a certain size.
 In our case:
  Size: ~110 MB
  ~70% of boot time is spent on (compressed) image restoration.
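A back-of-the-envelope calculation shows why storage read speed dominates. Assume the ~110 MB above had to be read uncompressed at the read rates measured earlier (~23 MB/s for mtdblock, ~20 and ~15 MB/s for eMMC). This is a pessimistic simplification, since compression shrinks the read at the cost of CPU time. Under that assumption, pure sequential read time is sketched below (the helper name is hypothetical):

```python
def read_time_s(image_mb, read_mb_per_s, compression_ratio=1.0):
    """Sequential read time for a hibernation image, ignoring decompression
    CPU time and seek costs.

    compression_ratio is compressed size / uncompressed size (1.0 = none).
    """
    if read_mb_per_s <= 0:
        raise ValueError("read throughput must be positive")
    if not 0 < compression_ratio <= 1.0:
        raise ValueError("compression_ratio must be in (0, 1]")
    return image_mb * compression_ratio / read_mb_per_s


# Read rates from the mtdblock vs. eMMC comparison, in MB/s.
RATES = {"mtdblock": 23.0, "eMMC (~20)": 20.0, "eMMC (~15)": 15.0}

if __name__ == "__main__":
    for name, rate in RATES.items():
        print(f"{name:11s} {read_time_s(110.0, rate):.1f} s")
```

This prints roughly 4.8 s, 5.5 s and 7.3 s respectively. At a fixed image size, every MB/s of extra read throughput translates directly into restore time.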
meminfo/shrink_mem
What occupies RAM?
 Who uses lots of memory?
  MeeGo's "memuse" tool can identify them.
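memuse's own implementation is not reproduced here. As a rough stand-in, the sketch below ranks processes by proportional set size (PSS), summed from the `Pss:` lines of `/proc/<pid>/smaps`. PSS splits each shared page among the processes mapping it, so a heavily shared library is not counted in full once per process. Function names are hypothetical. Reading other users' smaps needs root, and unreadable processes are skipped.

```python
import os


def pss_kb(smaps_text):
    """Sum the 'Pss:' fields (in kB) of one /proc/<pid>/smaps file."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Pss:"):
            total += int(line.split()[1])
    return total


def top_by_pss(proc="/proc", n=10):
    """Return [(pss_kb, pid, comm)] for the n processes with the largest PSS."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "smaps")) as f:
                pss = pss_kb(f.read())
            with open(os.path.join(proc, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:  # process exited, or no permission
            continue
        rows.append((pss, int(entry), comm))
    rows.sort(reverse=True)
    return rows[:n]
```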
Why unevictable?
  Recent SoCs have smart coprocessors:
    GPU, DSP and H/W accelerators.
  They may have an IOMMU.
  More memory may be shared with the coprocessors.




http://en.wikipedia.org/wiki/IOMMU
Why does IOMMU have an effect?
 Pages have to be DMA-able.
 Shared pages have to be pinned.
  They must not be swapped out,
   i.e. they are unevictable.
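`/proc/meminfo` gives one view of how much memory falls into this category. Treat it as a partial view: the `Unevictable` and `Mlocked` counters only cover pages on the kernel's unevictable LRU (for example, mlock()ed pages). Memory that a driver allocates or pins for a coprocessor may not appear there at all. The parser below is a hypothetical helper, shown only as a sketch.

```python
def meminfo_kb(text, fields=("MemTotal", "MemFree", "Cached", "Unevictable", "Mlocked")):
    """Parse /proc/meminfo text into {field: value} for the given fields.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in fields and rest.split():
            values[key] = int(rest.split()[0])
    return values
```

Typical use is `meminfo_kb(open("/proc/meminfo").read())`, taken once before and once after the shrink step, to see which counters actually went down.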
Further optimization idea
Linearity of hibernation method
  The Linux VM tries to use as much RAM as possible (e.g. page cache).
  RAM consumption can only be squeezed down to a certain point.
  The boot time increases in proportion to the size of unevictable
  memory.


For further optimization, we need something more!
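You can check the linearity claim, and estimate the cost per megabyte, by fitting a straight line to your own (image size, boot time) measurements. Below is a minimal ordinary least-squares sketch in plain Python. The helper name is hypothetical, and it implies no measured data from this talk. The slope can be read as seconds per MB, roughly the inverse of end-to-end restore throughput. The intercept can be read as the fixed cost that does not scale with image size.

```python
def fit_line(points):
    """Ordinary least-squares fit of boot_time = intercept + slope * size.

    points: iterable of (size_mb, boot_time_s) pairs from your own measurements.
    Returns (intercept, slope).
    """
    pts = list(points)
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```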
Proposals
  1. Increase storage read performance
   Faster storage?
     mtd gives shorter boot-up time than eMMC
     faster mtd gives shorter boot-up time than slower mtd
     non-volatile mtdswap driver
   LVM swap striped across devices (RAID-0) to improve disk throughput
  2. Decrease the image size further
   Kill & restart bloated apps if possible.
     Maybe a bit brutal, but it certainly works.
   Swap out unevictable pages
     How can we ensure those pages are present when they are needed?
   Page coloring
     Memory cgroups to control whose pages can be swapped out

  3. Lazy image/page loading


Aren’t we forgetting about system responsiveness?
Example: Ubiquitous QuickBoot
  Can be considered as "Lazy image/page loading":




http://www.ubiquitous.co.jp/En/products/middleware/quickboot
Q&A
Thank you!




Please send comments to Hiroshi.DOYU@nokia.com
