1
Part 1: Tools
2
3
-Rpass
-Rpass sample output
4
5
-Rpass
llvm-opt-report
llvm-opt-report
• https://reviews.llvm.org/D25262
• https://github.com/llvm/llvm-project/tree/main/llvm/tools/llvm-opt-report
6
7
-Rpass
llvm-opt-report
opt-viewer
opt-viewer sample output
8
opt-viewer
• 2016 work led by Adam Nemet (Apple)
https://www.youtube.com/watch?v=qq0q1hfzidg
• Part of LLVM master:
https://github.com/llvm/llvm-project/tree/main/llvm/tools/opt-viewer
• Downloadable via deb pkg: llvm-14-tools
9
opt-viewer Usage
• Build with an extra clang switch:
-fsave-optimization-record
*.opt.yaml files are generated, by default in the obj folder.
• Generate htmls:
$ opt-viewer.py --output-dir <htmls folder> --source-dir <repo> <yamls folder>
10
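A small translation unit is enough to try these switches. The sketch below is an illustration, not material from the talk: the file name `demo.cpp` and the function `dot` are hypothetical, and the remarks actually emitted depend on the clang version and optimization level.

```cpp
// demo.cpp (hypothetical file name): a tiny input for optimization remarks.
//
// Print missed-optimization remarks on the terminal:
//   clang++ -O2 -c demo.cpp '-Rpass-missed=.*'
// Save remarks to demo.opt.yaml for opt-viewer:
//   clang++ -O2 -c demo.cpp -fsave-optimization-record
#include <cstddef>

// A floating-point reduction. Without relaxed floating-point flags the
// compiler may decline to vectorize this loop and may say so in a remark.
float dot(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Compiling with the second command should leave a `.opt.yaml` file next to the object file, which is the input that opt-viewer.py consumes.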
opt-viewer additions over -Rpass
• Inlining context
• Hotness (PGO)
Great work, but
• Heavy
• High I/O
• High memory
• More than 1 GB of generated htmls
• Designed (and presented) for compiler authors
• Mostly not actionable for developers
12
13
-Rpass
llvm-opt-report
opt-viewer
OptView2
Introducing OptView2
• https://github.com/OfekShilon/optview2
14
Target Developers, Not Compiler Authors
• Denoise:
• Collect only optimization failures
• By default no system headers
• Remove duplicates
• Filter comment types via config file/command line
• ~1.5M lines ==> 22K lines
• Include column info (location within line)
• split-to-subfolders
• Sortable, resizable & pageable index
• ...
15
Example OptView2 outputs
• https://ofekshilon.github.io/optview2-opencv/
• https://ofekshilon.github.io/optview2-cpython/
• https://ofekshilon.github.io/optview2_mujoco/
16
Example OptView2 outputs
17
(Mostly) available in godbolt!
18
Part 2: Usage
19
1. Inlining
https://ofekshilon.github.io/optview2-opencv/core/modules_core_include_opencv2_core_dualquaternion.inl.hpp.html#L80
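The usual responses to an inlining-failure remark are to make the callee's definition visible to the caller (for example by moving it into a header) or to force inlining. A minimal sketch of both, assuming a hypothetical `FORCE_INLINE` macro and hypothetical function names that do not come from the talk or from OptView2:

```cpp
// FORCE_INLINE is a hypothetical helper macro for this sketch.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

// Defining the callee in a header (or in the same translation unit) makes
// its body visible to the caller, which non-LTO inlining needs.
FORCE_INLINE int squared(int x) { return x * x; }

// The call to squared() in this loop is the inlining candidate.
int sum_of_squares(const int* v, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += squared(v[i]);
    }
    return total;
}
```

Forcing inlining is a blunt tool; it is usually worth checking the remark again afterwards to confirm that the call site really changed.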
20
2. "Clobbered by store"
https://godbolt.org/z/T7h4nK3G7
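A typical shape of code that draws a "clobbered by store" remark, together with the `__restrict` remedy, can be sketched as follows. This is an assumed illustration and not the code behind the godbolt link; the function names are hypothetical.

```cpp
#include <cstddef>

// dst and count have the same pointee type, so a store to dst[i] might
// modify *count. The compiler may have to reload *count on every iteration
// and may report the load as clobbered by the store.
void add_count(int* dst, const int* count, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += *count;
    }
}

// __restrict is a promise by the programmer that the two pointers do not
// alias, which lets the compiler keep *count in a register. Calling this
// with overlapping pointers is undefined behavior.
void add_count_restrict(int* __restrict dst, const int* __restrict count,
                        std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += *count;
    }
}
```

For non-overlapping arguments both versions compute the same result; only the generated code is expected to differ.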
21
2. "Clobbered by store"
22
2. "Clobbered by store"
23
2. "Clobbered by store"
• “Strict aliasing is an assumption made by the compiler, that
objects of different types will never refer to the same
memory location (i.e. alias each other.)”
Mike Acton https://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
• Perhaps this can be ‘weaponized’ to communicate non-aliasing to the compiler?
24
2. "Clobbered by store"
• Maybe we can force artificial type-diff?
Say, through some implementation of strong-typedef?
• In practice, compilers are struggling.
• Clang issue: https://github.com/llvm/llvm-project/issues/54646
25
3. “Clobbered by call”
https://godbolt.org/z/jG5jq7c9a
26
3. “Clobbered by call”
27
• Cheating? A pure function that returns void, like somefunc(), does nothing, so the call is removed entirely.
• If it returned non-void, this would not work (Clang issue: https://github.com/llvm/llvm-project/issues/53102)
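A sketch of how a call can clobber a load, and how a `pure` attribute communicates that it does not. The names `PURE_FN`, `lookup`, `Counter` and `accumulate` are hypothetical and this is not the code from the godbolt link; the attribute spelling is the GCC/Clang one.

```cpp
// PURE_FN is a hypothetical helper macro for this sketch.
#if defined(__GNUC__) || defined(__clang__)
#  define PURE_FN __attribute__((pure))
#else
#  define PURE_FN
#endif

// pure: the result depends only on the arguments and on readable memory,
// and the function has no side effects.
PURE_FN int lookup(const int* table, int index);

struct Counter {
    int total;
    int step;
};

void accumulate(Counter& c, const int* table, int n) {
    for (int i = 0; i < n; ++i) {
        // Without the attribute, an opaque lookup() could modify c.step,
        // so c.step may have to be reloaded after every call.
        c.total += lookup(table, i) * c.step;
    }
}

// Defined here only to keep the sketch self-contained. The attribute matters
// when the definition lives in another translation unit.
int lookup(const int* table, int index) { return table[index]; }
```

The attribute is a promise, not a checked property: marking a function `pure` when it does write memory can silently miscompile the caller.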
3. “Clobbered by call”
28
• Whateva() is called only once, and its result is copied to 2 other places
3. “Clobbered by call”
29
3. “Clobbered by call”
30
3. “Clobbered by call”
Sometimes the offending call is standard! https://godbolt.org/z/81319zq1E
31
4. “Failed to move load with loop invariant address”
https://godbolt.org/z/YGc83TMnj
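When the attributes cannot be applied, copying the loop-invariant value into a local before the loop is a portable fallback. A minimal sketch, assuming a hypothetical `Image` type; it is not the code behind the godbolt link.

```cpp
#include <cstddef>
#include <vector>

struct Image {
    std::vector<int> pixels;
    int gain;
};

// img.gain is loop invariant, but stores to the int pixels might alias it,
// so the compiler may fail to hoist the load out of the loop.
void apply_gain(Image& img) {
    for (std::size_t i = 0; i < img.pixels.size(); ++i) {
        img.pixels[i] *= img.gain;
    }
}

// Copying the invariant values to locals makes the hoisting explicit:
// nothing in the loop can modify a local whose address is never taken.
void apply_gain_local(Image& img) {
    const int gain = img.gain;
    const std::size_t n = img.pixels.size();
    int* p = img.pixels.data();
    for (std::size_t i = 0; i < n; ++i) {
        p[i] *= gain;
    }
}
```

Both functions produce the same pixels; the second simply states in source form what the optimizer could not prove on its own.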
32
Cheat Sheet
Symptom                                           | Probable cause    | Action
Inlining failure                                  |                   | Add header / forceinline / increase threshold
"Clobbered by store"                              | Aliasing          | restrict / force type diff
"Clobbered by call"                               | Escape            | Attributes pure / const / noescape (typically before the remark site)
"Failed to move load with loop invariant address" | Aliasing / Escape | All the above + copy to local
*                                                 | Don’t understand? | Reduce to bare minimum in godbolt. Might be a compiler limitation.
33
Part 3: Other Compilers
34
GCC work
• https://gcc.gnu.org/legacy-ml/gcc-patches/2018-05/msg01675.html
• https://github.com/davidmalcolm/gcc-opt-viewer
35
GCC work
• https://dmalcolm.fedorapeople.org/gcc/2018-05-18/pgo-demo-test/pgo-demo-test/
36
GCC work
• Active only during 2018
• Still at prototype quality
• Compilation might consume 10G+ RAM per single file
• Python scripts often break
• Opened two bugs, one solved in my fork
37
Decorations across compilers
clang                     | gcc | icc | msvc
__restrict                | V   | V   | __restrict *, __declspec(restrict) **
__attribute__((pure))     | V   | -   | -
__attribute__((const))    | V   | V   | __declspec(noalias)
__attribute__((noescape)) | -   | -   | -
* Pertains also to locals
** Decorates a function return value
38
Decorations across compilers
• `Hedley` (https://github.com/nemequ/Hedley) is a single header including
cross-compiler wrappers like:
#if HEDLEY_HAS_ATTRIBUTE(noescape)
# define HEDLEY_NO_ESCAPE __attribute__((__noescape__))
#else
# define HEDLEY_NO_ESCAPE
#endif
• Known limitation: noalias (check if still applicable:
https://github.com/nemequ/hedley/issues/54)
• Can also look there for analogues in other compilers (Sun pragmas etc.)
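Without Hedley, the same idea can be hand-rolled for a single attribute. The sketch below wraps the `const` attribute, using `__declspec(noalias)` as the MSVC analogue listed in the table above; `MY_CONST_FN` and `clamp_to_byte` are hypothetical names, and the two decorations are analogues rather than exact equivalents.

```cpp
// MY_CONST_FN is a hypothetical macro for this sketch, not a Hedley name.
#if defined(_MSC_VER) && !defined(__clang__)
#  define MY_CONST_FN __declspec(noalias)
#elif defined(__GNUC__) || defined(__clang__)
#  define MY_CONST_FN __attribute__((const))
#else
#  define MY_CONST_FN
#endif

// const: the result depends only on the argument values, with no reads of
// mutable memory and no side effects.
MY_CONST_FN int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}
```

On compilers that match neither branch the macro expands to nothing, so the code still builds and only the optimization hint is lost.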
39
OptView2 with LTO
• Different Usage:
• Build with LTO, use -v to dump the list of obj files used (containing only IR)
• LLVM includes the tool llvm-lto. Use like this:
$ llvm-lto -lto-pass-remarks-output=<yaml outputpath> -j=10 -O=3 <obj files list>
• Creates a single huge yaml. No parallelism in creation or consumption by optview2.
• Can somewhat reduce remarks volume with -lto-pass-remarks-filter=<regex>
• Hard to get meaningful results for a large project.
40
OptView2 with LTO
• Inlining -> non-issue.
• Escape & Aliasing – still very much an issue.
• “inter-procedural analyses are often less precise, due to uncertainty stemming from unknown outside callers… In LLVM, intra-procedural analyses are dominating in numbers and potential. The existing inter-procedural analyses mostly try to limit the possible effects of function calls and simplify the caller-callee interface through propagation of constants..”
(Doerfert, Homerding, Finkel 2019)
41
Impact?
• Personal experience: 6 µs -> 4.6 µs
• PETOSPA: …. Optimistic Static Program Annotations (Doerfert, Homerding,
Finkel 2019)
https://github.com/jdoerfert/PETOSPA/blob/master/ISC19.pdf
• ~15%-20% speedup
• ORAQL: Optimistic Responses to Alias Queries in LLVM (Hückelheim,
Doerfert 2021)
https://www.youtube.com/watch?v=7UVB5AFJM1w
• No impact
• HTO: ... Optimization via Annotated Headers (Moses, Doerfert 2019)
https://www.youtube.com/watch?v=elmio6AoyK0
• ~50% of full LTO gains
42
Recommendations
• Concentrate on known bottlenecks,
• Invest when you
• work at sub-millisecond scale, or
• in very tight loops.
43
Bottom line
• The compiler can talk to you.
• You can learn to listen.
• And even answer.
• Sometimes.
44
Come join the party!
• https://github.com/OfekShilon/optview2
ofekshilon@gmail.com
45
@OfekShilon
46
Weaponizing Strict Aliasing:
Forcing type difference
• Strong-Typedefs
• Typical motivation: enhancing overload resolution and type safety
• Despite some attempts (eg ‘opaque typedefs’), no standard solution
• A handful of libraries exist, all using wrappers:
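A wrapper-based strong typedef can be as small as the sketch below. The `Strong` template and the tag names are hypothetical and far simpler than what the existing libraries provide. Whether a given compiler actually uses the type difference for alias analysis varies.

```cpp
// Minimal hypothetical strong typedef: same representation, distinct type.
template <typename T, typename Tag>
struct Strong {
    T value;
};

struct WidthTag {};
struct HeightTag {};
using Width = Strong<int, WidthTag>;
using Height = Strong<int, HeightTag>;

// Width and Height are unrelated types. Under strict aliasing a Width
// object and a Height object cannot overlap, so in principle the load of
// h->value need not be repeated after each store to w[i].value.
void scale(Width* w, const Height* h, int n) {
    for (int i = 0; i < n; ++i) {
        w[i].value *= h->value;
    }
}
```

The same wrapper also gives the type-safety benefit that usually motivates strong typedefs: passing a `Height*` where a `Width*` is expected no longer compiles.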
47
Strong-typedefs impact
• Improves optimization:
• https://godbolt.org/z/r8aWfMGfx
• Degrades optimization:
• https://godbolt.org/z/fe95sdrnx
• Clang issue: https://github.com/llvm/llvm-project/issues/54646
• Improves again: (enum classes hack, for integer-likes only)
• https://godbolt.org/z/4nejY3dKs
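The enum class hack mentioned above can be sketched like this, for integer-like values only. The enum names and the function are hypothetical, this is not the code behind the godbolt links, and the effect on optimization depends on the compiler.

```cpp
// Two scoped enums with the same underlying type are still distinct types,
// so they act as integer-only strong typedefs without a wrapper struct.
enum class Value : int {};
enum class Offset : int {};

void add_offset(Value* values, const Offset* offset, int n) {
    for (int i = 0; i < n; ++i) {
        // Explicit casts are the price of the hack: scoped enums have no
        // built-in arithmetic.
        values[i] = static_cast<Value>(static_cast<int>(values[i]) +
                                       static_cast<int>(*offset));
    }
}
```

Because `Value` and `Offset` are different types, a store to `values[i]` is not allowed to modify `*offset` under strict aliasing, which is the property the hack tries to expose.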
48
4. “Failed to move load with loop invariant address”
• Foreach or other <algorithm>s?
• In this toy example – identical code.
• https://godbolt.org/z/jYWhG6zWc
• Occasionally different, not always better.
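The two forms being compared look roughly like this. It is a sketch with a hypothetical `Mixer` type, not the code behind the godbolt link. In both forms the loop body reads the member `gain` through `this`, so the same loop-invariant load is at stake.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Mixer {
    std::vector<int> samples;
    int gain;

    // Hand-written index loop.
    void apply_loop() {
        for (std::size_t i = 0; i < samples.size(); ++i) {
            samples[i] *= gain;
        }
    }

    // Same work expressed with an <algorithm>; the lambda captures this,
    // so gain is still read through a pointer inside the loop body.
    void apply_for_each() {
        std::for_each(samples.begin(), samples.end(),
                      [this](int& s) { s *= gain; });
    }
};
```

Switching to an algorithm changes how the loop is written, not what the compiler can prove about aliasing, which fits the observation that the generated code is often identical.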
49
Impact?
• Distribution in a real C++ project:
50
OptView2 wish list
• Reduce run time & memory
• Run on Windows
• Consume binary optimization remarks
• Revive (possibly integrate) gcc-opt-viewer
51
Compiler wish list
• -opt-remarks filtering, as in -Rpass (just failures, pass filter)
• Pass to LTO linking phase
• Enable opt-remarks for other languages?
• Curious about Rust aliasing behavior
• Enhance ways to communicate non-aliasing
• Accept ‘restrict’ on local variables, as in MSVC?
• Report names where available, not just types (“i32”)
• Generate remarks during escape analysis passes
• When reporting clobbering (==aliasing), differentiate “concrete
potential flow with aliasing found” and “couldn’t prove anything”.
52

OptView2 - C++ on Sea 2022

  • 1. 1
  • 9. opt-viewer • 2016 work led by Adam Nemet (Apple) https://www.youtube.com/watch?v=qq0q1hfzidg • Part of LLVM master: https://github.com/llvm/llvm- project/tree/main/llvm/tools/opt-viewer • Downloadable via deb pkg: llvm-14-tools 9
  • 10. opt-viewer Usage • Build with an extra clang switch: -fsave-optimization-record *.opt.yaml files are generated, by default in the obj folder. • Generate htmls: $ opt-viewer.py --output-dir <htmls folder> --source-dir <repo> <yamls folder> 10
  • 11. opt-viewer additions over -Rpass Inlining context Hotness (PGO)
  • 12. Great work, but • Heavy • High I/O • High memory • >1G htmls • Designed (and presented) for compiler authors • Mostly non actionable to developers 12
  • 15. Target Developers, Not Compiler Authors • Denoise: • Collect only optimization failures • By default no system headers • Remove duplicities, • Filter comment types via config file/command line • ~1.5M lines ==> 22K lines • Include column info (location within line) • split-to-subfolders • Sortable, resizable & pageable index • ... 15
  • 16. Example OptView2 outputs • https://ofekshilon.github.io/optview2-opencv/ • https://ofekshilon.github.io/optview2-cpython/ • https://ofekshilon.github.io/optview2_mujoco/ 16
  • 18. (Mostly) available in godbolt! 18
  • 21. 2. "Clobbered by store" https://godbolt.org/z/T7h4nK3G7 21
  • 22. 2. "Clobbered by store" 22
  • 23. 2. "Clobbered by store" 23
  • 24. 2. "Clobbered by store" • “Strict aliasing is an assumption made by the compiler, that objects of different types will never refer to the same memory location (i.e. alias each other.)” Mike Acton https://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html • Perhaps can be ‘weaponized’ to communicate non-aliasing to the compiler? 24
  • 25. 2. "Clobbered by store" • Maybe we can force artificial type-diff? Say, through some implementation of strong-typedef? • In practice, compilers are struggling. • Clang issue: https://github.com/llvm/llvm-project/issues/54646 25
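The aliasing problem behind "Clobbered by store" can be sketched with a small example. This is a hypothetical illustration, not code from the talk: the function names are invented, and whether a given compiler version emits the remark for exactly this loop is an assumption. Both pointers have the same type, so strict aliasing cannot help, and the compiler has to assume each store to `data[i]` may change `*scale`.

```cpp
#include <cstddef>

// Hypothetical example: 'scale' may point into 'data', so the value of
// *scale cannot safely be kept in a register across the stores to data[i].
void scale_all(float* data, const float* scale, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= *scale;  // this store may clobber *scale
}

// Same loop, with the __restrict extension promising that the two
// pointers never overlap. Breaking that promise is undefined behavior.
void scale_all_restrict(float* __restrict data,
                        const float* __restrict scale,
                        std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= *scale;  // *scale can now be loaded once
}
```

Both functions compute the same result for non-overlapping arguments; the only difference is what the optimizer is allowed to assume.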
  • 26. 3. “Clobbered by call” https://godbolt.org/z/jG5jq7c9a 26
  • 27. 3. “Clobbered by call” • Cheating? A pure somefunc() that returns void does nothing, so it is removed entirely. • If it returned non-void, this wouldn’t work (clang issue: https://github.com/llvm/llvm-project/issues/53102)
  • 28. 3. “Clobbered by call” 28 • Whateva() called only once, result copied to 2 other places
  • 29. 3. “Clobbered by call” 29
  • 30. 3. “Clobbered by call” 30
  • 31. 3. “Clobbered by call” Sometimes the offending call is standard! https://godbolt.org/z/81319zq1E 31
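A minimal sketch of the "Clobbered by call" situation follows. All names here (`Accumulator`, `trace`, `weight`, `SKETCH_PURE`) are hypothetical and chosen for illustration; in real code the two helpers would live in another translation unit, so the optimizer would see only their declarations. They are defined at the bottom only to keep the sketch self-contained.

```cpp
// The pure attribute is a clang/gcc extension; expand to nothing elsewhere.
#if defined(__clang__) || defined(__GNUC__)
#  define SKETCH_PURE __attribute__((pure))
#else
#  define SKETCH_PURE
#endif

struct Accumulator {
    int limit;
    int total;
};

void trace(int i);              // opaque: may write any reachable memory
SKETCH_PURE int weight(int i);  // promised to have no side effects

int sum_with_opaque_call(Accumulator& a) {
    for (int i = 0; i < a.limit; ++i) {
        trace(i);       // as far as the compiler knows, this may change a.limit
        a.total += i;
    }
    return a.total;
}

int sum_with_pure_call(Accumulator& a) {
    for (int i = 0; i < a.limit; ++i)
        a.total += weight(i);  // a pure call cannot clobber a.limit
    return a.total;
}

// Definitions, placed here only so the sketch links on its own.
static int g_trace_calls = 0;
void trace(int) { ++g_trace_calls; }
int weight(int i) { return 2 * i; }
```

The attribute is a promise the compiler does not verify, so it belongs only on functions that really have no side effects.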
  • 32. 4. “Failed to move load with loop invariant address” https://godbolt.org/z/YGc83TMnj 32
  • 33. Cheat Sheet
      Symptom | Probable cause | Action
      Inlining Failure | | Add header / forceinline / increase threshold
      "Clobbered by store" | Aliasing | restrict / force type diff
      "Clobbered by load" | Escape | Attributes pure / const / noescape (typically before the remark site)
      "Failed to move load loop invariant" | Aliasing / Escape | All the above + copy to local
      * Don’t understand? Reduce to bare minimum in godbolt. Might be a compiler limitation.
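The cheat sheet's "copy to local" action can be sketched as follows. The `Filter` type and its members are hypothetical, and the claim that the first version triggers a loop-invariant remark on any particular compiler is an assumption. In `apply`, `gain` is read through `this` and has the same type as the vector elements, so the compiler may be unable to prove that the stores leave it unchanged.

```cpp
#include <cstddef>
#include <vector>

struct Filter {
    std::vector<float> samples;
    float gain;

    // 'gain' and the vector's size are re-read through 'this'; the stores
    // to samples[i] might alias them as far as the optimizer can tell.
    void apply() {
        for (std::size_t i = 0; i < samples.size(); ++i)
            samples[i] *= gain;
    }

    // Copying loop-invariant state into locals whose address never
    // escapes makes the invariance obvious to the optimizer.
    void apply_local() {
        const float g = gain;
        const std::size_t n = samples.size();
        float* p = samples.data();
        for (std::size_t i = 0; i < n; ++i)
            p[i] *= g;
    }
};
```

The local-copy version is noisier to read, so it is probably worth reserving for loops that profiling has already flagged.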
  • 34. Part 3: Other Compilers 34
  • 37. GCC work • Active only during 2018 • Still at prototype quality • Compilation might consume 10G+ RAM per single file • Python scripts often break • Opened two bugs, one solved in my fork 37
  • 38. Decorations across compilers
      clang | gcc | icc | msvc
      __restrict | V | V | __restrict *, __declspec(restrict) **
      __attribute__((pure)) | V | - | -
      __attribute__((const)) | V | V | __declspec(noalias)
      __attribute__((noescape)) | - | - | -
      * Pertains also to locals
      ** Decorates a function return value
  • 39. Decorations across compilers • `Hedley` (https://github.com/nemequ/Hedley) is a single header including cross-compiler wrappers like:
      #if HEDLEY_HAS_ATTRIBUTE(noescape)
      #  define HEDLEY_NO_ESCAPE __attribute__((__noescape__))
      #else
      #  define HEDLEY_NO_ESCAPE
      #endif
    • Known limitation: noalias (check if still applicable: https://github.com/nemequ/hedley/issues/54) • Can also look there for analogues in other compilers (Sun pragmas etc.)
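For projects that prefer not to add a dependency, the table's mapping can be hand-rolled in a few lines. The sketch below is an assumption-laden illustration: the `SKETCH_*` macro names and both functions are hypothetical, and pairing `__attribute__((const))` with `__declspec(noalias)` follows the table above rather than an exact semantic equivalence.

```cpp
#include <cstddef>

// __restrict is accepted by clang, gcc, icc and msvc.
#if defined(__clang__) || defined(__GNUC__) || defined(_MSC_VER)
#  define SKETCH_RESTRICT __restrict
#else
#  define SKETCH_RESTRICT
#endif

// Closest available "no side effects" decoration per compiler.
#if defined(__clang__) || defined(__GNUC__)
#  define SKETCH_CONST_FN __attribute__((const))
#elif defined(_MSC_VER)
#  define SKETCH_CONST_FN __declspec(noalias)
#else
#  define SKETCH_CONST_FN
#endif

// Result depends only on the argument; no memory is read or written.
SKETCH_CONST_FN int square(int x);

void square_all(int* SKETCH_RESTRICT out,
                const int* SKETCH_RESTRICT in,
                std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = square(in[i]);
}

int square(int x) { return x * x; }
```

On compilers that match none of the branches the macros expand to nothing, so the code still builds, just without the extra hints.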
  • 40. OptView2 with LTO • Different usage: • Build with LTO, use -v to dump the list of obj files used (containing only IR) • LLVM includes the tool llvm-lto. Use like this: $ llvm-lto -lto-pass-remarks-output=<yaml outputpath> -j=10 -O=3 <obj files list> • Creates a single huge yaml. No parallelism in creation or consumption by optview2. • Can somewhat reduce remarks volume with -lto-pass-remarks-filter=<regex> • Hard to get meaningful results for a large project.
  • 41. OptView2 with LTO • Inlining -> non-issue. • Escape & Aliasing – still very much an issue. • “inter-procedural analyses are often less precise, due to uncertainty stemming from unknown outside callers… In LLVM, intra-procedural analyses are dominating in numbers and potential. The existing inter-procedural analyses mostly try to limit the possible effects of function calls and simplify the caller-callee interface through propagation of constants..” (Doerfert, Homerding, Finkel 2019)
  • 42. Impact? • Personal experience: 6 µs -> 4.6 µs • PETOSPA: …. Optimistic Static Program Annotations (Doerfert, Homerding, Finkel 2019) https://github.com/jdoerfert/PETOSPA/blob/master/ISC19.pdf • ~15%-20% speedup • ORAQL: Optimistic Responses to Alias Queries in LLVM (Hückelheim, Doerfert 2021) https://www.youtube.com/watch?v=7UVB5AFJM1w • No impact • HTO: ... Optimization via Annotated Headers (Moses, Doerfert 2019) https://www.youtube.com/watch?v=elmio6AoyK0 • ~50% of full LTO gains 42
  • 43. Recommendations • Concentrate on known bottlenecks. • Invest when you work • at sub-millisecond scale, or • in very tight loops.
  • 44. Bottom line • The compiler can talk to you. • You can learn to listen. • And even answer. • Sometimes. 44
  • 45. Come join the party! • https://github.com/OfekShilon/optview2 • ofekshilon@gmail.com • @OfekShilon
  • 47. Weaponizing Strict Aliasing: Forcing type difference • Strong-Typedefs • Typical motivation: enhancing overload resolution and type safety • Despite some attempts (eg ‘opaque typedefs’), no standard solution • A handful of libraries exist, all using wrappers: 47
  • 48. Strong-typedefs impact • Improves optimization: • https://godbolt.org/z/r8aWfMGfx • Degrades optimization: • https://godbolt.org/z/fe95sdrnx • Clang issue: https://github.com/llvm/llvm-project/issues/54646 • Improves again: (enum classes hack, for integer-likes only) • https://godbolt.org/z/4nejY3dKs 48
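A wrapper-based strong typedef of the kind these slides discuss might look like the sketch below. The `Strong` template, the tag types and `advance` are hypothetical and not taken from any of the libraries the talk refers to. The idea is that `Meters` and `Seconds` are distinct struct types even though both wrap a `float`, which gives type-based alias analysis a chance to separate the store from the load. As the links above show, whether this actually helps varies by case and by compiler.

```cpp
#include <cstddef>

// Minimal strong-typedef wrapper: the Tag parameter only exists to make
// otherwise identical instantiations into different types.
template <typename T, typename Tag>
struct Strong {
    T value;
};

struct MetersTag {};
struct SecondsTag {};
using Meters = Strong<float, MetersTag>;
using Seconds = Strong<float, SecondsTag>;

// The store goes through a Meters object and the load through a Seconds
// object, so strict aliasing says they cannot refer to the same memory.
void advance(Meters* positions, const Seconds* dt, float speed, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        positions[i].value += speed * dt->value;
}
```

With plain `float*` parameters the same loop would carry no type difference for the compiler to exploit.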
  • 49. 4. “Failed to move load with loop invariant address” • Foreach or other <algorithm>s? • In this toy example – identical code. • https://godbolt.org/z/jYWhG6zWc • Occasionally different, not always better. 49
  • 50. Impact? • Distribution in a real C++ project: 50
  • 51. OptView2 wish list • Reduce run time & memory • Run on windows • Consume binary optimization remarks • Revive (possibly integrate) gcc-opt-viewer 51
  • 52. Compiler wish list • -opt-remarks filtering, as in -Rpass (just failures, pass filter) • Pass to LTO linking phase • Enable opt-remarks for other languages? • Curious about Rust aliasing behavior • Enhance ways to communicate non-aliasing • Accept ‘restrict’ on local variables, as in msvc? • Report names where available, not just types (“i32”) • Generate remarks during escape analysis passes • When reporting clobbering (==aliasing), differentiate “concrete potential flow with aliasing found” and “couldn’t prove anything”.

Editor's Notes

  1. Clang-only 
  2. Replacing members with locals is NOT good C++ code!
  3. Definitely needs some lovin'. Hopefully it can still be resurrected.
  4. Report LLVM bugs: alias analysis bugs had no observable symptoms until now. No diagnostics are emitted for them and they don’t result in bad codegen, so I suspect there are plenty of them.
  5. At least 4 committee papers try to engage with aliasing in the language. Alias-set, provenance