SlideShare a Scribd company logo
1 of 24
Download to read offline
ATOS introduction
ST/Linaro Collaboration Context
Presenter: Christian Bertin
Development team: Rémi Duraffort, Christophe Guillon, François de Ferrière, Hervé
Knochel, Antoine Moynault
Consumer Product Division, Compilation team
2016/3/8
Project Rationale
• Optimizing large open source applications by hand is a dead-end
Þ Sources and makefiles must be optimized as is
• Compilers are tuned through heuristic settings based on standard
benchmark average results (typically SPECS)
Þ There is room for improvement if one takes application- and target-specific
heuristics tuning (specialization) into account
• Advanced compilation features (e.g Feedback-Driven or Inter-
Procedural Optimization) are neither straightforward to use, nor easy
to automate, thus in practice never used by developers
Þ Compilers are under-exploited
Develop a tool, which automatically tunes the compiler for a specific set
of use cases, seamlessly applying advanced compilation optimization
2
Let the machine do the hard work
(HiPEAC Roadmap)
Auto Tuning Optimization System
ATOS stands for:
“Auto Tuning Optimization System”
• Functional requirements:
• Automatically find best C/C++ compiler configuration for a given objective.
• Explore on top of any build system with a given set of executable use cases
• Preserve original sources, makefiles and build scripts (they are not modified)
• Application:
• Find best performance / code size tradeoffs search for:
• a given set of executables/libraries[/kernel modules] and
• a given set of benchmarking use cases
3
Optimization Challenges
4
WebCore
V8 JITGtk
Cairo Jpeg
Pixman
Directfb
Identify
Hot Code
Enlarge Optimization scope
Advanced
Profiling Tools
Inter Procedural and
Cross Library
Find best
compiler tuning
Iterative
Optimization
ATOS High Level Usage
unmodified
compile.sh
unmodified
compile.sh
atos-explore
Local or global
Configuration
Database
Local or global
Configuration
Database
exec.shexec.sh
executable
name
executable
name
atos-play Optimized
executable
Optimized
executableunmodified
compile.sh
unmodified
compile.sh
Extract of
database
Extract of
database
Developer or batch environment
Release environment
Unmodified compile.sh: actually any build system command:
• make all
• rpmbuild
• build.sh
exec.sh: actually a script running the application
• make run
• direct or remote execution
• run.sh
• emulated or on board
Example of Preferred Configuration 6
Best trade-off
Best size
Best speed
Get and replay
chosen
configuration
ATOS Features
• Whole executable optimization sequences
• File by file exploration of optimization sequences
• Function by function exploration with provided GCC plugin
• Support for 'perf' and 'oprofile‘ tools
• Seamless profiling feedback and link time optimizations support
• Support in GCC 4.5 to 6.0 / LLVM 3.4 and 3.6 / ARM RVCT
• Simple command line interface
• Native or cross build and run
7
ATOS Features (continued)
Latest features/improvements (ATOS v4.0)
• Kernel build support (*.ko)
• Various exploration schemes: random, staged or genetic
• Parallel build and execution (native or remote) capabilities
• Parameters fine-tuning and flags pruning exploration
8
Example of Exploration Usage 9
Initialization of session
$ atos-init –b ./build.sh –r ./run.sh –p ./run-
oprofile.sh
First basic exploration O(12)
$ atos-explore
Whole program flags exploration O(M.G.3)[def: O(3000)]
$ atos-explore-genetic [-M100] [--generations 10]
File by file exploration O(H.N.36) [def: O(H.3600)]
$ atos-explore-acf –file-by-file [-N100]
Funct by funct exploration O(H.N.36) [def: O(H.3600)]
$ atos-explore-acf [-N100]
Show exploration graph
$ atos-graph
Where: O(…): complexity in number of build+run, H: number of determined Hot
file (resp. function)
Example of Replay Usage 10
Output best perf tradeoff
$ atos-play –T
speedup | sizered | target | variant_id
+30.56% | -13.44% | sha-shatest-shacmp | bbc60eb968fb803987f4df19304e6a1e
Output best size tradeoff
$ atos-play –T –f size
speedup | sizered | target | variant_id
+13.57% | +3.58% | sha-shatest-shacmp | 7b9351c456fa303f8e3ef446e2028aff
Compile for best perf tradeoff
$ atos-play
Building Variant [bbc60eb968fb803987f4df19304e6a1e]…
Compile a selected variant
$ atos-play –l 7b9351c456fa303f8e3ef446e2028aff
Building Variant [7b9351c456fa303f8e3ef446e2028aff]…
Video of exploration using parallel
build/run features
11
ATOS Outcomes
ATOS Optimization Results 13
• JPEG ST40 / HDK7108 results
• 26.39% speedup / 13.37% size increase
• ZLIB ST40 / HDK7108 results
• 12.54% speedup / 1.41% size increase
• Stagecraft ST40 / HDK7108 results
• 5-28% speedup (30 benchs) / 14% size reduction
• HEVC ARMv7 / Orly results
• 9.22% speedup / 21.21% size reduction
• SPEC2000 ARMv7 / U9540 results
• 18.7% speedup SPECINT / 10.2% speedup SPECFP / size increase n/a
JPEG ST40 / HDK7108
14
Best perf v2.0:
26.39% speedup
13.37% size increase
-O3
-O2
-Os
ZLIB ST40 / HDK7108
15
Best perf v2.0:
12.54% speedup
1.41% size increase
-O3
-Os
-O2
StageCraft ST40 / HDK7108
16
0.9
0.95
1
1.05
1.1
1.15
1.2
1.25
1.3
REF
BEST variant
• Results:
• Performance: 5-15 % speedup range in FPS, one at 28%
• Size: 14% footprint reduction (out of 35Mb initial footprint)
• Reference: build from CPD –O2/-O3 mix, compiler gcc 4.5.2 (no lto, no plugins)
• Benchmark: Sagem provided benchmarks
• Profile: 58 use cases, 30 of which not fully HW accelerated
• Optimization scope Stagecraft binary and 21 accompanying shared objects
• Outcome: Customer interested but actually the packages are provided by ST/CPD
ATOS* HEVC** results
Summary and robustness to source updates 17
Experiment
Performance first
speed-up size reduction
1. Initial exploration (HEVC version 1)
(4618 runs)
6.18% 22.41%
2. Replay best after update (HEVC version 2)
(0 more run)
7.55% 20.16%
3. More advanced fine-tuning on top of 2.
(1267 additional runs)
9.22% 21.21%
Speed-up and code size compared to reference configuration : -O3 build
Performance measurement unit: CPU time (process user time)
Benchmark: decoding of the 300 first frames of a typical stream***
* ATOS version: atos v2.0
** HEVC version: HEVC full SW ARM Cortex, Optimized, 2 distinct versions
*** stream: RA_LC_TotalRecallHotelTransylvania_1280x720_25_QP26.bin
Source update without additional exploration
Some more explorations
SPECINT2000* ARMv7 / U9540 18
Time
-OS
Speedup
-Os/-O3
Speedup
-Os/ATOS
Speedup
-O3/ATOS
164.gzip 40.9 7.4 24.8 16.1
175.vpr 37.6 11.7 16.8 4.6
176.gcc 4.0 6.7 25.4 17.6
181.mcf 45.6 2.2 5.0 2.8
186.crafty 31.7 8.5 21.1 11.5
197.parser 12.1 5.3 29.3 22.8
252.eon 12.4 54.7 80.1 16.4
253.perlbmk 68.6 4.2 22.2 17.3
254.gap 7.0 0.7 8.6 7.8
255.vortex 15.4 2.5 85.6 81.2
256.bzip2 48.2 3.1 17.7 14.1
300.twolf 17.8 0.3 28.3 27.8
SPECint +8.2% +28.4% +18.7%
* SPECINT2000 results and improvements on train dataset
(Speedup -O3/ATOS: +13.97% when re-executed on REF dataset)
SPECFP2000* ARMv7 / U9540 19
Time
-OS
Speedup
-Os/-O3
Speedup
-Os/ATOS
Speedup
-O3/ATOS
177.mesa 47.1 27.9 54.4 20.7
179.art 22.3 0.3 9.2 8.9
183.equake 40.4 3.3 6.4 2.9
188.ammp 165.0 66.9 82.1 9.1
SPECfp +22.0% +34.4% +10.2%
* SPECINT2000 results and improvements on TRAI dataset
(Speedup -O3/ATOS: +6.53% when re-executed on REF dataset)
Compilation Technologies for ATOS
20
Combine and Auto Tune
Plugins
Fine Grain
Optimizations
FDO
Feedback
Directed
Optimizations
LTO
Link Time
OptimizationsOSE
Optimization
Space
Exploration
ISE
Inlining
Space
Exploration
SDO
Sample
Directed
Optimizations
LIPO
Light Weight
IPO
CLO
Cross Library
Optimizations
Use application training run to
drive optimizations
Perform optimizations across
object files
Perform optimizations across
libraries
Explore the space of possible
compiler optimizations mix
Explore the space of possible
inlining decisions
Continuously monitor user apps to
drive optimizations
Group related object files for
improving optimization search
Fully automated build/run optimizer (ATOS)
Compiler techno.
under improvement
or development
Well proven
technologies
ML
Machine
Learning Reuse past results for
accelerating search
Fine grain optimizations from
whole to per file to per function
21
3/16/16Presentation Title
21
Best size tradeoff
Used for cold functions
Improvements
while hot functions are discovered
Reference (-O3)
Current best perf tradeoff
8.44% speedup 15.89% reduction
ATOS function-by-function exploration
ATOS Status 22
• ATOS has been in production since 2 years
• ATOS FOSS-OUT is done (approval e/o February)
• ATOS to be made open source by March 2016 on GitHub
• License GPL v2 and later
• STMicroelectronics initial gatekeeper
ATOS Delivery Script 23
ATOS surdriver script to install in compiler directory:
#! /bin/bash
CC_PATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )“
ATOS_OPTS="-Os --param inline-unit-growth=90 --param large-
function-insns=4000 --param large-stack-frame-growth=3000 --param
max-inline-insns-auto=20 --param max-inline-insns-recursive-
auto=1000 --param max-inline-insns-recursive=300 --param max-
inline-insns-single=1200 --param max-inline-recursive-depth=8
-fearly-inlining -findirect-inlining -fno-inline-functions -fno-
inline-functions-called-once --param max-early-inliner-
iterations=1 --param iv-consider-all-candidates-bound=10 -fno-
aggressive-loop-optimizations -falign-loops -fno-move-loop-
invariants -fno-rerun-cse-after-loop -ftree-dce -fno-tree-loop-
distribution -fno-tree-loop-im -ftree-loop-optimize -fno-tree-
scev-cprop -ftree-slp-vectorize -fno-unroll-loops"
${CC_PATH}/sh4-linux-uclibc-gcc ${1+"$@"} ${ATOS_OPTS}
Thanks for your attention

More Related Content

What's hot

LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)Linaro
 
BKK16-502 Suspend to Idle
BKK16-502 Suspend to IdleBKK16-502 Suspend to Idle
BKK16-502 Suspend to IdleLinaro
 
LAS16-101: Efficient kernel backporting
LAS16-101: Efficient kernel backportingLAS16-101: Efficient kernel backporting
LAS16-101: Efficient kernel backportingLinaro
 
LAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoSLAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoSLinaro
 
5 p9 pnor and open bmc overview - final
5 p9 pnor and open bmc overview - final5 p9 pnor and open bmc overview - final
5 p9 pnor and open bmc overview - finalYutaka Kawai
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!Linaro
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_finalYutaka Kawai
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] IO Visor Project
 
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSDLAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSDLinaro
 
BKK16-312 Integrating and controlling embedded devices in LAVA
BKK16-312 Integrating and controlling embedded devices in LAVABKK16-312 Integrating and controlling embedded devices in LAVA
BKK16-312 Integrating and controlling embedded devices in LAVALinaro
 
LAS16-TR02: Upstreaming 101
LAS16-TR02: Upstreaming 101LAS16-TR02: Upstreaming 101
LAS16-TR02: Upstreaming 101Linaro
 
SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016Koan-Sin Tan
 
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to EmbeddedLAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to EmbeddedLinaro
 
LAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMGLAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMGLinaro
 
ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!Affan Syed
 
LAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96BoardsLAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96BoardsLinaro
 
Las16 200 - firmware summit - ras what is it- why do we need it
Las16 200 - firmware summit - ras what is it- why do we need itLas16 200 - firmware summit - ras what is it- why do we need it
Las16 200 - firmware summit - ras what is it- why do we need itLinaro
 
BPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveBPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveNetronome
 
BUD17-218: Scheduler Load tracking update and improvement
BUD17-218: Scheduler Load tracking update and improvement BUD17-218: Scheduler Load tracking update and improvement
BUD17-218: Scheduler Load tracking update and improvement Linaro
 

What's hot (20)

LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)
 
BKK16-502 Suspend to Idle
BKK16-502 Suspend to IdleBKK16-502 Suspend to Idle
BKK16-502 Suspend to Idle
 
LAS16-101: Efficient kernel backporting
LAS16-101: Efficient kernel backportingLAS16-101: Efficient kernel backporting
LAS16-101: Efficient kernel backporting
 
LAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoSLAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoS
 
5 p9 pnor and open bmc overview - final
5 p9 pnor and open bmc overview - final5 p9 pnor and open bmc overview - final
5 p9 pnor and open bmc overview - final
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016]
 
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSDLAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
 
Linux Kernel Live Patching
Linux Kernel Live PatchingLinux Kernel Live Patching
Linux Kernel Live Patching
 
BKK16-312 Integrating and controlling embedded devices in LAVA
BKK16-312 Integrating and controlling embedded devices in LAVABKK16-312 Integrating and controlling embedded devices in LAVA
BKK16-312 Integrating and controlling embedded devices in LAVA
 
LAS16-TR02: Upstreaming 101
LAS16-TR02: Upstreaming 101LAS16-TR02: Upstreaming 101
LAS16-TR02: Upstreaming 101
 
SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016
 
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to EmbeddedLAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
 
LAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMGLAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMG
 
ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!
 
LAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96BoardsLAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96Boards
 
Las16 200 - firmware summit - ras what is it- why do we need it
Las16 200 - firmware summit - ras what is it- why do we need itLas16 200 - firmware summit - ras what is it- why do we need it
Las16 200 - firmware summit - ras what is it- why do we need it
 
BPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveBPF Hardware Offload Deep Dive
BPF Hardware Offload Deep Dive
 
BUD17-218: Scheduler Load tracking update and improvement
BUD17-218: Scheduler Load tracking update and improvement BUD17-218: Scheduler Load tracking update and improvement
BUD17-218: Scheduler Load tracking update and improvement
 

Similar to BKK16-308 The tool called Auto-Tuned Optimization System (ATOS)

Best Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing ClustersBest Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing ClustersIntel® Software
 
Porting and Optimization of Numerical Libraries for ARM SVE
Porting and Optimization of Numerical Libraries for ARM SVEPorting and Optimization of Numerical Libraries for ARM SVE
Porting and Optimization of Numerical Libraries for ARM SVELinaro
 
JVM and OS Tuning for accelerating Spark application
JVM and OS Tuning for accelerating Spark applicationJVM and OS Tuning for accelerating Spark application
JVM and OS Tuning for accelerating Spark applicationTatsuhiro Chiba
 
Containerizing HPC and AI applications using E4S and Performance Monitor tool
Containerizing HPC and AI applications using E4S and Performance Monitor toolContainerizing HPC and AI applications using E4S and Performance Monitor tool
Containerizing HPC and AI applications using E4S and Performance Monitor toolGanesan Narayanasamy
 
Web Template Mechanisms in SOC Verification - DVCon.pdf
Web Template Mechanisms in SOC Verification - DVCon.pdfWeb Template Mechanisms in SOC Verification - DVCon.pdf
Web Template Mechanisms in SOC Verification - DVCon.pdfSamHoney6
 
TAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformTAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformGanesan Narayanasamy
 
Track A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBMTrack A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBMchiportal
 
Summit 16: The Hitchhiker/Hacker's Guide to NFV Benchmarking
Summit 16: The Hitchhiker/Hacker's Guide to NFV BenchmarkingSummit 16: The Hitchhiker/Hacker's Guide to NFV Benchmarking
Summit 16: The Hitchhiker/Hacker's Guide to NFV BenchmarkingOPNFV
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)Yuuki Takano
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Intel® Software
 
Continuous Go Profiling & Observability
Continuous Go Profiling & ObservabilityContinuous Go Profiling & Observability
Continuous Go Profiling & ObservabilityScyllaDB
 
from ai.backend import python @ pycontw2018
from ai.backend import python @ pycontw2018from ai.backend import python @ pycontw2018
from ai.backend import python @ pycontw2018Chun-Yu Tseng
 
Performance Evaluation using TAU Performance System and E4S
Performance Evaluation using TAU Performance System and E4SPerformance Evaluation using TAU Performance System and E4S
Performance Evaluation using TAU Performance System and E4SGanesan Narayanasamy
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloadsinside-BigData.com
 
Performance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android DevicesPerformance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android DevicesIntel® Software
 
Threading Successes 03 Gamebryo
Threading Successes 03   GamebryoThreading Successes 03   Gamebryo
Threading Successes 03 Gamebryoguest40fc7cd
 
Application Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance CenterApplication Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance Centerinside-BigData.com
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Michelle Holley
 

Similar to BKK16-308 The tool called Auto-Tuned Optimization System (ATOS) (20)

Best Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing ClustersBest Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing Clusters
 
Porting and Optimization of Numerical Libraries for ARM SVE
Porting and Optimization of Numerical Libraries for ARM SVEPorting and Optimization of Numerical Libraries for ARM SVE
Porting and Optimization of Numerical Libraries for ARM SVE
 
JVM and OS Tuning for accelerating Spark application
JVM and OS Tuning for accelerating Spark applicationJVM and OS Tuning for accelerating Spark application
JVM and OS Tuning for accelerating Spark application
 
SDAccel Design Contest: Vivado HLS
SDAccel Design Contest: Vivado HLSSDAccel Design Contest: Vivado HLS
SDAccel Design Contest: Vivado HLS
 
Callgraph analysis
Callgraph analysisCallgraph analysis
Callgraph analysis
 
Containerizing HPC and AI applications using E4S and Performance Monitor tool
Containerizing HPC and AI applications using E4S and Performance Monitor toolContainerizing HPC and AI applications using E4S and Performance Monitor tool
Containerizing HPC and AI applications using E4S and Performance Monitor tool
 
Web Template Mechanisms in SOC Verification - DVCon.pdf
Web Template Mechanisms in SOC Verification - DVCon.pdfWeb Template Mechanisms in SOC Verification - DVCon.pdf
Web Template Mechanisms in SOC Verification - DVCon.pdf
 
TAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformTAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platform
 
Track A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBMTrack A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBM
 
Summit 16: The Hitchhiker/Hacker's Guide to NFV Benchmarking
Summit 16: The Hitchhiker/Hacker's Guide to NFV BenchmarkingSummit 16: The Hitchhiker/Hacker's Guide to NFV Benchmarking
Summit 16: The Hitchhiker/Hacker's Guide to NFV Benchmarking
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
 
Continuous Go Profiling & Observability
Continuous Go Profiling & ObservabilityContinuous Go Profiling & Observability
Continuous Go Profiling & Observability
 
from ai.backend import python @ pycontw2018
from ai.backend import python @ pycontw2018from ai.backend import python @ pycontw2018
from ai.backend import python @ pycontw2018
 
Performance Evaluation using TAU Performance System and E4S
Performance Evaluation using TAU Performance System and E4SPerformance Evaluation using TAU Performance System and E4S
Performance Evaluation using TAU Performance System and E4S
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloads
 
Performance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android DevicesPerformance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android Devices
 
Threading Successes 03 Gamebryo
Threading Successes 03   GamebryoThreading Successes 03   Gamebryo
Threading Successes 03 Gamebryo
 
Application Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance CenterApplication Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance Center
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
 

More from Linaro

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloLinaro
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaLinaro
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraLinaro
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaLinaro
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018Linaro
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018Linaro
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Linaro
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteLinaro
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopLinaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allLinaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorLinaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMULinaro
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MLinaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootLinaro
 

More from Linaro (20)

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 

Recently uploaded

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

BKK16-308 The tool called Auto-Tuned Optimization System (ATOS)

  • 1. ATOS introduction ST/Linaro Collaboration Context Presenter: Christian Bertin Development team: Rémi Duraffort, Christophe Guillon, François de Ferrière, Hervé Knochel, Antoine Moynault Consumer Product Division, Compilation team 2016/3/8
  • 2. Project Rationale • Optimizing large open source applications by hand is a dead-end Þ Sources and makefiles must be optimized as is • Compilers are tuned through heuristic settings based on standard benchmark average results (typically SPECS) Þ There is room for improvement if one takes application- and target-specific heuristics tuning (specialization) into account • Advanced compilation features (e.g Feedback-Driven or Inter- Procedural Optimization) are neither straightforward to use, nor easy to automate, thus in practice never used by developers Þ Compilers are under-exploited Develop a tool, which automatically tunes the compiler for a specific set of use cases, seamlessly applying advanced compilation optimization 2 Let the machine do the hard work (HiPEAC Roadmap)
  • 3. Auto Tuning Optimization System ATOS stands for: “Auto Tuning Optimization System” • Functional requirements: • Automatically find best C/C++ compiler configuration for a given objective. • Explore on top of any build system with a given set of executable use cases • Preserve original sources, makefiles and build scripts (they are not modified) • Application: • Find best performance / code size tradeoffs search for: • a given set of executables/libraries[/kernel modules] and • a given set of benchmarking use cases 3
  • 4. Optimization Challenges 4 WebCore V8 JITGtk Cairo Jpeg Pixman Directfb Identify Hot Code Enlarge Optimization scope Advanced Profiling Tools Inter Procedural and Cross Library Find best compiler tuning Iterative Optimization
  • 5. ATOS High Level Usage unmodified compile.sh unmodified compile.sh atos-explore Local or global Configuration Database Local or global Configuration Database exec.shexec.sh executable name executable name atos-play Optimized executable Optimized executableunmodified compile.sh unmodified compile.sh Extract of database Extract of database Developer or batch environment Release environment Unmodified compile.sh: actually any build system command: • make all • rpmbuild • build.sh exec.sh: actually a script running the application • make run • direct or remote execution • run.sh • emulated or on board
  • 6. Example of Preferred Configuration 6 Best trade-off Best size Best speed Get and replay chosen configuration
  • 7. ATOS Features • Whole executable optimization sequences • File by file exploration of optimization sequences • Function by function exploration with provided GCC plugin • Support for 'perf' and 'oprofile‘ tools • Seamless profiling feedback and link time optimizations support • Support in GCC 4.5 to 6.0 / LLVM 3.4 and 3.6 / ARM RVCT • Simple command line interface • Native or cross build and run 7
  • 8. ATOS Features (continued) Latest features/improvements (ATOS v4.0) • Kernel build support (*.ko) • Various exploration schemes: random, staged or genetic • Parallel build and execution (native or remote) capabilities • Parameters fine-tuning and flags pruning exploration 8
  • 9. Example of Exploration Usage 9 Initialization of session $ atos-init –b ./build.sh –r ./run.sh –p ./run- oprofile.sh First basic exploration O(12) $ atos-explore Whole program flags exploration O(M.G.3)[def: O(3000)] $ atos-explore-genetic [-M100] [--generations 10] File by file exploration O(H.N.36) [def: O(H.3600)] $ atos-explore-acf –file-by-file [-N100] Funct by funct exploration O(H.N.36) [def: O(H.3600)] $ atos-explore-acf [-N100] Show exploration graph $ atos-graph Where: O(…): complexity in number of build+run, H: number of determined Hot file (resp. function)
  • 10. Example of Replay Usage 10 Output best perf tradeoff $ atos-play –T speedup | sizered | target | variant_id +30.56% | -13.44% | sha-shatest-shacmp | bbc60eb968fb803987f4df19304e6a1e Output best size tradeoff $ atos-play –T –f size speedup | sizered | target | variant_id +13.57% | +3.58% | sha-shatest-shacmp | 7b9351c456fa303f8e3ef446e2028aff Compile for best perf tradeoff $ atos-play Building Variant [bbc60eb968fb803987f4df19304e6a1e]… Compile a selected variant $ atos-play –l 7b9351c456fa303f8e3ef446e2028aff Building Variant [7b9351c456fa303f8e3ef446e2028aff]…
  • 11. Video of exploration using parallel build/run features 11
  • 13. ATOS Optimization Results 13 • JPEG ST40 / HDK7108 results • 26.39% speedup / 13.37% size increase • ZLIB ST40 / HDK7108 results • 12.54% speedup / 1.41% size increase • Stagecraft ST40 / HDK7108 results • 5-28% speedup (30 benchs) / 14% size reduction • HEVC ARMv7 / Orly results • 9.22% speedup / 21.21% size reduction • SPEC2000 ARMv7 / U9540 results • 18.7% speedup SPECINT / 10.2% speedup SPECFP / size increase n/a
  • 14. JPEG ST40 / HDK7108 14 Best perf v2.0: 26.39% speedup 13.37% size increase -O3 -O2 -Os
  • 15. ZLIB ST40 / HDK7108 15 Best perf v2.0: 12.54% speedup 1.41% size increase -O3 -Os -O2
  • 16. StageCraft ST40 / HDK7108 16 0.9 0.95 1 1.05 1.1 1.15 1.2 1.25 1.3 REF BEST variant • Results: • Performance: 5-15 % speedup range in FPS, one at 28% • Size: 14% footprint reduction (out of 35Mb initial footprint) • Reference: build from CPD –O2/-O3 mix, compiler gcc 4.5.2 (no lto, no plugins) • Benchmark: Sagem provided benchmarks • Profile: 58 use cases, 30 of which not fully HW accelerated • Optimization scope Stagecraft binary and 21 accompanying shared objects • Outcome: Customer interested but actually the packages are provided by ST/CPD
  • 17. ATOS* HEVC** results Summary and robustness to source updates 17 Experiment Performance first speed-up size reduction 1. Initial exploration (HEVC version 1) (4618 runs) 6.18% 22.41% 2. Replay best after update (HEVC version 2) (0 more run) 7.55% 20.16% 3. More advanced fine-tuning on top of 2. (1267 additional runs) 9.22% 21.21% Speed-up and code size compared to reference configuration : -O3 build Performance measurement unit: CPU time (process user time) Benchmark: decoding of the 300 first frames of a typical stream*** * ATOS version: atos v2.0 ** HEVC version: HEVC full SW ARM Cortex, Optimized, 2 distinct versions *** stream: RA_LC_TotalRecallHotelTransylvania_1280x720_25_QP26.bin Source update without additional exploration Some more explorations
  • 18. SPECINT2000* ARMv7 / U9540 18 Time -OS Speedup -Os/-O3 Speedup -Os/ATOS Speedup -O3/ATOS 164.gzip 40.9 7.4 24.8 16.1 175.vpr 37.6 11.7 16.8 4.6 176.gcc 4.0 6.7 25.4 17.6 181.mcf 45.6 2.2 5.0 2.8 186.crafty 31.7 8.5 21.1 11.5 197.parser 12.1 5.3 29.3 22.8 252.eon 12.4 54.7 80.1 16.4 253.perlbmk 68.6 4.2 22.2 17.3 254.gap 7.0 0.7 8.6 7.8 255.vortex 15.4 2.5 85.6 81.2 256.bzip2 48.2 3.1 17.7 14.1 300.twolf 17.8 0.3 28.3 27.8 SPECint +8.2% +28.4% +18.7% * SPECINT2000 results and improvements on train dataset (Speedup -O3/ATOS: +13.97% when re-executed on REF dataset)
  • 19. SPECFP2000* ARMv7 / U9540 19 Time -OS Speedup -Os/-O3 Speedup -Os/ATOS Speedup -O3/ATOS 177.mesa 47.1 27.9 54.4 20.7 179.art 22.3 0.3 9.2 8.9 183.equake 40.4 3.3 6.4 2.9 188.ammp 165.0 66.9 82.1 9.1 SPECfp +22.0% +34.4% +10.2% * SPECINT2000 results and improvements on TRAI dataset (Speedup -O3/ATOS: +6.53% when re-executed on REF dataset)
  • 20. Compilation Technologies for ATOS 20 Combine and Auto Tune Plugins Fine Grain Optimizations FDO Feedback Directed Optimizations LTO Link Time OptimizationsOSE Optimization Space Exploration ISE Inlining Space Exploration SDO Sample Directed Optimizations LIPO Light Weight IPO CLO Cross Library Optimizations Use application training run to drive optimizations Perform optimizations across object files Perform optimizations across libraries Explore the space of possible compiler optimizations mix Explore the space of possible inlining decisions Continuously monitor user apps to drive optimizations Group related object files for improving optimization search Fully automated build/run optimizer (ATOS) Compiler techno. under improvement or development Well proven technologies ML Machine Learning Reuse past results for accelerating search Fine grain optimizations from whole to per file to per function
  • 21. 21 3/16/16Presentation Title 21 Best size tradeoff Used for cold functions Improvements while hot functions are discovered Reference (-O3) Current best perf tradeoff 8.44% speedup 15.89% reduction ATOS function-by-function exploration
  • 22. ATOS Status 22 • ATOS has been in production since 2 years • ATOS FOSS-OUT is done (approval e/o February) • ATOS to be made open source by March 2016 on GitHub • License GPL v2 and later • STMicroelectronics initial gatekeeper
  • 23. ATOS Delivery Script 23 ATOS surdriver script to install in compiler directory: #! /bin/bash CC_PATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )“ ATOS_OPTS="-Os --param inline-unit-growth=90 --param large- function-insns=4000 --param large-stack-frame-growth=3000 --param max-inline-insns-auto=20 --param max-inline-insns-recursive- auto=1000 --param max-inline-insns-recursive=300 --param max- inline-insns-single=1200 --param max-inline-recursive-depth=8 -fearly-inlining -findirect-inlining -fno-inline-functions -fno- inline-functions-called-once --param max-early-inliner- iterations=1 --param iv-consider-all-candidates-bound=10 -fno- aggressive-loop-optimizations -falign-loops -fno-move-loop- invariants -fno-rerun-cse-after-loop -ftree-dce -fno-tree-loop- distribution -fno-tree-loop-im -ftree-loop-optimize -fno-tree- scev-cprop -ftree-slp-vectorize -fno-unroll-loops" ${CC_PATH}/sh4-linux-uclibc-gcc ${1+"$@"} ${ATOS_OPTS}
  • 24. Thanks for your attention