LCA14: LCA14-301: AArch64: Media, libs and GUI plans & status

Resource: LCA14
Name: LCA14-301: AArch64: Media, libs and GUI plans & status
Date: 05-03-2014
Speaker: Ragesh Radhakrishnan
Linaro Connect:




Presentation Transcript

  • LCA14-301: AArch64: Media, libs & GUI plans & status
    Wed-5-Mar, 10:05am
    Ragesh Radhakrishnan, James Yu and Tom Gall
  • Porting Strategy for AARCH64
    1) Pick important libraries that have existing ARMv7 (32-bit) NEON optimizations
    2) Avoid creating more hand-coded NEON assembler; use NEON intrinsics instead
    3) Set expectations
       - We have to run in the model
       - Model is not cycle accurate
    4) Push results upstream to development versions of the libraries
    5) As appropriate, create versions against stable library versions for use in product if requested
  • Porting Strategy for AARCH64
    ● libpng - James Yu
    ● libvpx - James Yu (VP8, VP9)
    ● libjpeg-turbo - Ragesh Radhakrishnan
    ● pixman - Ragesh Radhakrishnan
    ● xfce image - Tom Gall
    ● chromium browser - Tom Gall
  • libpng
    Source code: git://
    AArch64 supported since version 1.6.7 (Nov. 2013). Has been tested on iOS 7.
    Benchmark result:
    Version: 1.6.10beta01 [February 9, 2014]
    Toolchain: gcc version 4.8.3 20140106 (prerelease) (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC 2013.11)
    CPU: Cortex-A8 800 MHz, single core.
    Test image: goldhill.png, 720x576

                       Total time*   Performance vs. no NEON   Performance vs. NEON assembly
    No NEON            50.519 s      100.00%
    NEON assembly      42.899 s      117.76%                   100.00%
    NEON intrinsics    44.081 s      114.61%                   97.32%

    * Total time = decode 100 times.
    Result: little performance loss.
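The percentage columns in the libpng table can be reproduced from the total times alone. The following is a minimal sketch, assuming that "performance" means the ratio of total times (reference time divided by measured time); the slide does not state the formula, and the names `times_s` and `relative_performance` are illustrative only. The results agree with the slide to within rounding.

```python
# Total decode times (seconds for 100 decodes) as reported on the libpng slide.
times_s = {
    "No NEON": 50.519,
    "NEON assembly": 42.899,
    "NEON intrinsics": 44.081,
}


def relative_performance(reference_s: float, measured_s: float) -> float:
    """Speed of the measured run relative to the reference run, in percent.

    Assumption (not stated on the slide): performance is inversely
    proportional to total decode time.
    """
    return 100.0 * reference_s / measured_s


# Column 2: every build compared against the build without NEON.
vs_no_neon = {
    name: relative_performance(times_s["No NEON"], t)
    for name, t in times_s.items()
}

# Column 3: the NEON builds compared against the hand-written assembly.
vs_assembly = {
    name: relative_performance(times_s["NEON assembly"], t)
    for name, t in times_s.items()
    if name != "No NEON"
}

for name, t in times_s.items():
    line = f"{name:16s} {t:7.3f} s  {vs_no_neon[name]:7.2f}%"
    if name in vs_assembly:
        line += f"  {vs_assembly[name]:7.2f}%"
    print(line)
```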
  • libvpx - VP8/VP9
    1. Part of the Google WebM project.
    2. Source code - https://chromium.googlesource.com/webm/libvpx
    3. Status -
       * Completely rewrote the NEON assembly as intrinsics.
       * Optimized performance on ARMv7.
       * Posted a total of 49 patches for VP8/VP9.
         - VP8: review in progress.
         - VP9: posted, waiting for review. (27/Feb/2014)
       * Work in progress to run on the ARMv8 architecture.
  • libvpx - VP8/VP9
    Benchmark result:
    Version: 1.3.0 [February 26, 2014]
    Toolchain: gcc version 4.8.3 20140106 (prerelease) (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC 2013.11)
    CPU: Cortex-A8 800 MHz, single core.
    Test video: Tears of Steel, 1080p, 12:15 min, in VP8 and VP9 format versions.

                                    FPS     Performance vs. no NEON   Performance vs. NEON assembly
    VP8 decode   No NEON            2.82    100.00%
                 NEON assembly      13.23   469.14%                   100.00%
                 NEON intrinsics    11.97   424.46%                   90.48%
    VP9 decode   No NEON            2.22    100.00%
                 NEON assembly      8.37    377.03%                   100.00%
                 NEON intrinsics    7.56    340.54%                   90.32%

    Result: about 9.5% performance loss from using intrinsics instead of assembly.
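The "9.5% performance loss" figure for libvpx follows from the FPS numbers. This is a small sketch, assuming the loss is defined as 100% minus the intrinsics frame rate expressed as a percentage of the assembly frame rate; the helper names `percent_of` and `intrinsics_loss` are hypothetical and not from the slides.

```python
# Decode frame rates (FPS) as reported on the libvpx benchmark slide.
fps = {
    "VP8": {"No NEON": 2.82, "NEON assembly": 13.23, "NEON intrinsics": 11.97},
    "VP9": {"No NEON": 2.22, "NEON assembly": 8.37, "NEON intrinsics": 7.56},
}


def percent_of(reference_fps: float, measured_fps: float) -> float:
    """Measured frame rate as a percentage of the reference frame rate."""
    return 100.0 * measured_fps / reference_fps


def intrinsics_loss(codec: str) -> float:
    """Percentage of frame rate lost by intrinsics relative to assembly.

    Assumption: loss = 100% - (intrinsics FPS / assembly FPS).
    """
    row = fps[codec]
    return 100.0 - percent_of(row["NEON assembly"], row["NEON intrinsics"])


for codec, row in fps.items():
    speedup = percent_of(row["No NEON"], row["NEON intrinsics"])
    print(f"{codec}: intrinsics at {speedup:.2f}% of the no-NEON rate, "
          f"{intrinsics_loss(codec):.2f}% slower than assembly")
```

Under this definition the loss is close to 9.5% for VP8 and slightly higher for VP9, which is consistent with the 90.48% and 90.32% entries in the table.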
  • libjpeg-turbo
    ARMv7 Android refresh: features from AOSP integrated into libjpeg-turbo version 1.3:
    tile decode, color conversion (rgb565 and rgb8888), Ashmem backing store.
    Status: upstreaming to libjpeg-turbo in progress.
    Source: git://
    jpeglib decompression benchmark on PandaBoard using tjbench:

    Library                 Image resolution   Performance (fps)   Throughput (MP/s)
    Linaro libjpeg-turbo    3008*2000          3.8                 22.8563
                            227*149            829.1839            28.04
    AOSP jpeglib            3008*2000          1.454               8.474
                            227*149            285.302             9.64
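The throughput column relates to the other two columns through the pixel count. The sketch below assumes throughput in megapixels per second is width × height × fps / 10^6; this formula and the function name `throughput_mp_per_s` are assumptions, not something the slide or tjbench output confirms here. Three of the four rows agree with the slide to within rounding. The AOSP 3008*2000 row computes to roughly 8.75 MP/s rather than the 8.474 shown, so either the slide value or the fps value for that row may have been mistyped.

```python
def throughput_mp_per_s(width: int, height: int, fps: float) -> float:
    """Decoded megapixels per second (assumed: pixels per frame * fps / 1e6)."""
    return width * height * fps / 1e6


# (library, width, height, fps, throughput as printed on the slide)
rows = [
    ("Linaro libjpeg-turbo", 3008, 2000, 3.8, 22.8563),
    ("Linaro libjpeg-turbo", 227, 149, 829.1839, 28.04),
    ("AOSP jpeglib", 3008, 2000, 1.454, 8.474),
    ("AOSP jpeglib", 227, 149, 285.302, 9.64),
]

for library, width, height, fps, reported in rows:
    computed = throughput_mp_per_s(width, height, fps)
    print(f"{library:22s} {width}*{height:<5d} "
          f"computed {computed:8.4f} MP/s, slide {reported:8.4f} MP/s")
```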
  • libjpeg-turbo
    ARMv8 port: hand-coded ARMv8 port of the JPEG decoder routines listed below. The port is tested using ARM RTSM.
    Status: decoder routines upstreamed to libjpeg-turbo.
    Source: git://
    Branch: libjpeg-turbo-armv8

    #   JPEG functions ported       Remarks
    1   IDCT_Slow                   IDCT integer version
    2   IDCT_Fast                   IDCT non-accurate version
    3   IDCT_2x2                    IDCT 2x2 size reduction
    4   IDCT_4x4                    IDCT 4x4 size reduction
    5   Color conversion routines   yuv to rgb, yuv to bgr, yuv to grayscale, etc.
  • Pixman
    Pixman ARMv8 port: rewriting ARMv7 functions for ARMv8.
    Approach: using intrinsics.
    Status: rewriting of the bilinear scanline function in progress.
    Test environment: ARMv8 xfce stack on ARM RTSM.
    List of functionalities and progress:

    #   Main functions to be ported                               Remarks
    1   Bilinear scanline functions                               80% ported
    2   Pixman composite functions (pixel processing functions)   Not started
  • xfce image
    - OE based
    - Works in the model
    - Patches need to flow to respective upstreams
  • Chromium Porting to AARCH64
    Status:
    - Chromium-24 src + 32 patches
    - binary built
    - tests built (most run without problem)
    Model:
    - Networking broken (VFP “upgrade”)
    - 2 Gig RAM limit
    - dual core
    - slow
  • Chromium on AARCH64
    Plan:
    - libv8 ToT enables ToT Chromium
    - Forward port
    - Push upstream to Chromium community
  • Discussion
    - Any input on next libraries?
    - Any libraries you’d like to see Linaro optimize?