SlideShare a Scribd company logo
OptView2
View and improve
compiler optimizations
Ofek Shilon
Istra Research
Jan 2022
@OfekShilon
1
Things I’d like the compiler to tell me:
• I couldn’t inline that function
…because I had only its declaration.
• I’m re-evaluating these expressions on every loop iteration
…because I couldn’t prove they stay fixed.
• I’m reloading this variable from memory many times
…because I don’t know whether unrelated code lines change it.
2
Some raw optimization data is exposed
• Clang/gcc: -Rpass
code.cc:4:25: remark: foo inlined into bar [-Rpass=inline]
• Intel: -qopt-report=[0..5]
• Microsoft (very partial):
/Qpar-report, /Qvec-report
3
4
Presentation Matters.
• opt-viewer, 2016 work by Adam Nemet (Apple) and others
• Focus on Clang, over Linux
5
Opt-Viewer Usage
• Build with an extra clang switch:
-fsave-optimization-record
*.opt.yaml files are generated, by
default in the obj folder.
• Generate htmls:
$ opt-viewer.py
--output-dir <htmls folder>
--source-dir <repo>
<yamls folder>
6
opt-viewer.py Sample Output
Inlining
context
Hotness
(PGO)
7
Opt-Viewer Usage
• Speed up yaml generation ~x50 by installing libyaml:
$ sudo apt install libyaml-dev
$ pip --no-cache-dir install --verbose --force-reinstall -I pyyaml
• The script would detect missing libyaml and suggest installing it:
“For faster parsing, you may want to install libyaml for PyYAL”
8
9
Great work, still not much traction
https://www.youtube.com/watch?v=qq0q1hfzidg
10
Great work, still not much traction
11
Why?
• Heavy
• High I/O
• High memory
• >1G htmls
• Designed (and presented) for compiler authors
• Mostly non actionable to developers
12
Introducing optview2
• https://github.com/OfekShilon/optview2
13
Target Developers, Not Compiler Authors
• Denoise:
• Collect only optimization failures
• Ignore system headers
• Display a single entry per type/source loc (in index)
• …
• ~1.5M lines ==> 22K lines
• Don’t mention ‘passes’
• Make the index sortable, resizable & pageable
• Abridged function names
• ...
14
Example - OpenCV
• file://wsl.localhost/Ubuntu-20.04/home/ofek/html/opencv-
clean/modules/photo/index.html
15
“Definition is unavailable”
16
!
17
“Clobbered by store”
• https://godbolt.org/z/K9a79MhcE
• Aliasing
• Forces reloading redistBatch on every loop iteration
18
Aliasing – the silent killer
• Distribution of 22K remarks in a C++ project:
19
20
“Clobbered by call”
• https://godbolt.org/z/bb9fTKj9d
• https://github.com/llvm/llvm-project/issues/53102
21
22
Aliasing: Counter-measures
• LTO ?
• __restrict__
• Non standard, tells the compiler an argument doesn’t alias with anything else
• __attribute__((pure)), __attribute__((const))
• Non standard, tell the compiler a function doesn’t write/read global state.
• Potentially resolve ‘clobbered by call’ opt failures,
• Strict-Aliasing
• The only C++ standard compliant way.
23
Strict Aliasing
• “Strict aliasing is an assumption made by the compiler, that
objects of different types will never refer to the same
memory location (i.e. alias each other.)”
Mike Acton https://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
• Allowed optimization. on by default for -O2 +
• char* is an exception – always allowed to alias with any type.
• Can be disabled with –fno-strict-aliasing (pessimization!)
24
Strict Aliasing: Forcing type difference
• typedef / using ?
• Inherit int?
• “I considered …to allow derivation from built-in classes … However, I
restrained myself. … the C conversion rules are so chaotic that pretending that
int, short, etc., are well-behaved ordinary classes is not going to work. They
are either C compatible, or they obey the relatively well-behaved C++ rules
for classes, but not both.”
Bjarne Stroustrup, The Design and Evolution of C++, §15.11.3.
• Strong-Typedefs
• Wrappers
25
Strict Aliasing demos
• https://godbolt.org/z/qGdj4qoG1
• https://godbolt.org/z/zWfYdsorE
• https://godbolt.org/z/oxq1K3n8W
26
Demo 3: loop hoisting
27
• Check foreach!!
• Try to form a toy example
28
Demo 3: loop hoisting
29
30
GCC work
• https://gcc.gnu.org/legacy-ml/gcc-patches/2018-
05/msg01675.html
• https://github.com/davidmalcolm/gcc-opt-viewer
• YAML -> JSON.gz
• Actually a better choice
https://stackoverflow.com/questions/27743711/can-i-speedup-yaml
• Active only during 2018
• Currently seems unusable
https://github.com/davidmalcolm/gcc-opt-viewer/issues/3
31
Conclusions applicable to other compilers?
• In many cases – yes.
• Shows where to look
32
OptView2 Future work
• Improve filtering,
• Reduce run time & memory,
• Run on windows,
• Consume binary optimization remarks,
• Revive (possibly integrate) gcc-opt-viewer,
• Support code annotations,
• Report LLVM bugs.
33
Lots of help needed 
• https://github.com/Of
ekShilon/optview2
• ofekshilon@gmail.com
34
Bkp
35
llvm-opt-report
$ clang -O3 -o /tmp/v.o -c /tmp/v.c -fsave-optimization-record
$ llvm-opt-report /tmp/v.yaml > /tmp/v.lst
$ cat /tmp/v.lst
• https://reviews.llvm.org/D25262
• https://github.com/llvm/llvm-
project/tree/main/llvm/tools/llvm-opt-
report
36
Impact
https://github.com/opencv/opencv/wiki/HowToUsePerfTests
$ cd ~/src/opencv/modules/ts/misc/
$ python3 ./run.py ~/src/opencv-clean/build/ -w ~/logs-opencv-clean
$ python3 ./run.py ~/src/opencv-opt/build/ -w ~/logs-opencv-opt
$ python3 ./summary.py ~/logs-opencv-clean/core* ~/logs-opencv-opt/core*
-u mks -m median
• Seems substantial.
• Will publish more detailed steps in the optview2 github page (and
possibly fork opencv)
Detour
37
38

More Related Content

What's hot

Canopy clustering algorithm
Canopy clustering algorithmCanopy clustering algorithm
Canopy clustering algorithmAshish Karki
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
Databricks
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySpark
Spark Summit
 
Streaming sql and druid
Streaming sql and druid Streaming sql and druid
Streaming sql and druid
arupmalakar
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
colorant
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
Databricks
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
Databricks
 
Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorial
Jia-Bin Huang
 
Hadoop MapReduce joins
Hadoop MapReduce joinsHadoop MapReduce joins
Hadoop MapReduce joins
Shalish VJ
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
Girish Khanzode
 
PostgreSQL query planner's internals
PostgreSQL query planner's internalsPostgreSQL query planner's internals
PostgreSQL query planner's internals
Alexey Ermakov
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta Lake
Databricks
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
datamantra
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta Lake
Databricks
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Bo Yang
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Databricks
 
Multivariate decision tree
Multivariate decision treeMultivariate decision tree
Multivariate decision treePrafulla Shukla
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
spark-project
 
An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)
Dataspora
 
Dynamic Allocation in Spark
Dynamic Allocation in SparkDynamic Allocation in Spark
Dynamic Allocation in SparkDatabricks
 

What's hot (20)

Canopy clustering algorithm
Canopy clustering algorithmCanopy clustering algorithm
Canopy clustering algorithm
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySpark
 
Streaming sql and druid
Streaming sql and druid Streaming sql and druid
Streaming sql and druid
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorial
 
Hadoop MapReduce joins
Hadoop MapReduce joinsHadoop MapReduce joins
Hadoop MapReduce joins
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
PostgreSQL query planner's internals
PostgreSQL query planner's internalsPostgreSQL query planner's internals
PostgreSQL query planner's internals
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta Lake
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta Lake
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
 
Multivariate decision tree
Multivariate decision treeMultivariate decision tree
Multivariate decision tree
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
 
An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)
 
Dynamic Allocation in Spark
Dynamic Allocation in SparkDynamic Allocation in Spark
Dynamic Allocation in Spark
 

Similar to OptView2 MUC meetup slides

OptView2 - C++ on Sea 2022
OptView2 - C++ on Sea 2022OptView2 - C++ on Sea 2022
OptView2 - C++ on Sea 2022
Ofek Shilon
 
AzovDevMeetup 2016 | Angular 2: обзор | Александр Шевнин
AzovDevMeetup 2016 | Angular 2: обзор | Александр ШевнинAzovDevMeetup 2016 | Angular 2: обзор | Александр Шевнин
AzovDevMeetup 2016 | Angular 2: обзор | Александр Шевнин
JSC “Arcadia Inc”
 
Code quality par Simone Civetta
Code quality par Simone CivettaCode quality par Simone Civetta
Code quality par Simone Civetta
CocoaHeads France
 
The Hitchhiker's Guide to Faster Builds. Viktor Kirilov. CoreHard Spring 2019
The Hitchhiker's Guide to Faster Builds. Viktor Kirilov. CoreHard Spring 2019The Hitchhiker's Guide to Faster Builds. Viktor Kirilov. CoreHard Spring 2019
The Hitchhiker's Guide to Faster Builds. Viktor Kirilov. CoreHard Spring 2019
corehard_by
 
Effective C++
Effective C++Effective C++
Effective C++
Andrey Karpov
 
Использование AzureDevOps при разработке микросервисных приложений
Использование AzureDevOps при разработке микросервисных приложенийИспользование AzureDevOps при разработке микросервисных приложений
Использование AzureDevOps при разработке микросервисных приложений
Vitebsk Miniq
 
Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...
Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...
Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...
Ihor Banadiga
 
Steamlining your puppet development workflow
Steamlining your puppet development workflowSteamlining your puppet development workflow
Steamlining your puppet development workflow
Tomas Doran
 
Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet Camp New York 2014: Streamlining Puppet Development Workflow Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet
 
SPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic librarySPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic library
AdaCore
 
NovaProva, a new generation unit test framework for C programs
NovaProva, a new generation unit test framework for C programsNovaProva, a new generation unit test framework for C programs
NovaProva, a new generation unit test framework for C programs
Greg Banks
 
What to expect from Java 9
What to expect from Java 9What to expect from Java 9
What to expect from Java 9
Ivan Krylov
 
Our Puppet Story (GUUG FFG 2015)
Our Puppet Story (GUUG FFG 2015)Our Puppet Story (GUUG FFG 2015)
Our Puppet Story (GUUG FFG 2015)
DECK36
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Tim Callaghan
 
Writing Better Haskell
Writing Better HaskellWriting Better Haskell
Writing Better Haskell
nkpart
 
C++ in kernel mode
C++ in kernel modeC++ in kernel mode
C++ in kernel mode
corehard_by
 
Exciting JavaScript - Part II
Exciting JavaScript - Part IIExciting JavaScript - Part II
Exciting JavaScript - Part II
Eugene Lazutkin
 
Two C++ Tools: Compiler Explorer and Cpp Insights
Two C++ Tools: Compiler Explorer and Cpp InsightsTwo C++ Tools: Compiler Explorer and Cpp Insights
Two C++ Tools: Compiler Explorer and Cpp Insights
Alison Chaiken
 

Similar to OptView2 MUC meetup slides (20)

OptView2 - C++ on Sea 2022
OptView2 - C++ on Sea 2022OptView2 - C++ on Sea 2022
OptView2 - C++ on Sea 2022
 
AzovDevMeetup 2016 | Angular 2: обзор | Александр Шевнин
AzovDevMeetup 2016 | Angular 2: обзор | Александр ШевнинAzovDevMeetup 2016 | Angular 2: обзор | Александр Шевнин
AzovDevMeetup 2016 | Angular 2: обзор | Александр Шевнин
 
Code quality par Simone Civetta
Code quality par Simone CivettaCode quality par Simone Civetta
Code quality par Simone Civetta
 
The Hitchhiker's Guide to Faster Builds. Viktor Kirilov. CoreHard Spring 2019
The Hitchhiker's Guide to Faster Builds. Viktor Kirilov. CoreHard Spring 2019The Hitchhiker's Guide to Faster Builds. Viktor Kirilov. CoreHard Spring 2019
The Hitchhiker's Guide to Faster Builds. Viktor Kirilov. CoreHard Spring 2019
 
Effective C++
Effective C++Effective C++
Effective C++
 
Использование AzureDevOps при разработке микросервисных приложений
Использование AzureDevOps при разработке микросервисных приложенийИспользование AzureDevOps при разработке микросервисных приложений
Использование AzureDevOps при разработке микросервисных приложений
 
Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...
Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...
Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...
 
Callgraph analysis
Callgraph analysisCallgraph analysis
Callgraph analysis
 
Steamlining your puppet development workflow
Steamlining your puppet development workflowSteamlining your puppet development workflow
Steamlining your puppet development workflow
 
Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet Camp New York 2014: Streamlining Puppet Development Workflow Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet Camp New York 2014: Streamlining Puppet Development Workflow
 
SPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic librarySPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic library
 
Linux programming
Linux programmingLinux programming
Linux programming
 
NovaProva, a new generation unit test framework for C programs
NovaProva, a new generation unit test framework for C programsNovaProva, a new generation unit test framework for C programs
NovaProva, a new generation unit test framework for C programs
 
What to expect from Java 9
What to expect from Java 9What to expect from Java 9
What to expect from Java 9
 
Our Puppet Story (GUUG FFG 2015)
Our Puppet Story (GUUG FFG 2015)Our Puppet Story (GUUG FFG 2015)
Our Puppet Story (GUUG FFG 2015)
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons Learned
 
Writing Better Haskell
Writing Better HaskellWriting Better Haskell
Writing Better Haskell
 
C++ in kernel mode
C++ in kernel modeC++ in kernel mode
C++ in kernel mode
 
Exciting JavaScript - Part II
Exciting JavaScript - Part IIExciting JavaScript - Part II
Exciting JavaScript - Part II
 
Two C++ Tools: Compiler Explorer and Cpp Insights
Two C++ Tools: Compiler Explorer and Cpp InsightsTwo C++ Tools: Compiler Explorer and Cpp Insights
Two C++ Tools: Compiler Explorer and Cpp Insights
 

Recently uploaded

Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
ShamsuddeenMuhammadA
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
Alina Yurenko
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 

Recently uploaded (20)

Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 

OptView2 MUC meetup slides

  • 1. OptView2 View and improve compiler optimizations Ofek Shilon Istra Research Jan 2022 @OfekShilon 1
  • 2. Things I’d like the compiler to tell me: • I couldn’t inline that function …because I had only its declaration. • I’m re-evaluating these expressions on every loop iteration …because I couldn’t prove they stay fixed. • I’m reloading this variable from memory many times …because I don’t know whether unrelated code lines change it. 2
  • 3. Some raw optimization data is exposed • Clang/gcc: -Rpass code.cc:4:25: remark: foo inlined into bar [-Rpass=inline] • Intel: -qopt-report=[0..5] • Microsoft (very partial): /Qpar-report, /Qvec-report 3
  • 4. 4
  • 5. Presentation Matters. • opt-viewer, 2016 work by Adam Nemet (Apple) and others • Focus on Clang, over Linux 5
  • 6. Opt-Viewer Usage • Build with an extra clang switch: -fsave-optimization-record *.opt.yaml files are generated, by default in the obj folder. • Generate htmls: $ opt-viewer.py --output-dir <htmls folder> --source-dir <repo> <yamls folder> 6
  • 8. Opt-Viewer Usage • Speed up yaml generation ~x50 by installing libyaml: $ sudo apt install libyaml-dev $ pip --no-cache-dir install --verbose --force-reinstall -I pyyaml • The script would detect missing libyaml and suggest installing it: “For faster parsing, you may want to install libyaml for PyYAL” 8
  • 9. 9
  • 10. Great work, still not much traction https://www.youtube.com/watch?v=qq0q1hfzidg 10
  • 11. Great work, still not much traction 11
  • 12. Why? • Heavy • High I/O • High memory • >1G htmls • Designed (and presented) for compiler authors • Mostly non actionable to developers 12
  • 14. Target Developers, Not Compiler Authors • Denoise: • Collect only optimization failures • Ignore system headers • Display a single entry per type/source loc (in index) • … • ~1.5M lines ==> 22K lines • Don’t mention ‘passes’ • Make the index sortable, resizable & pageable • Abridged function names • ... 14
  • 15. Example - OpenCV • file://wsl.localhost/Ubuntu-20.04/home/ofek/html/opencv- clean/modules/photo/index.html 15
  • 17. ! 17
  • 18. “Clobbered by store” • https://godbolt.org/z/K9a79MhcE • Aliasing • Forces reloading redistBatch on every loop iteration 18
  • 19. Aliasing – the silent killer • Distribution of 22K remarks in a C++ project: 19
  • 20. 20
  • 21. “Clobbered by call” • https://godbolt.org/z/bb9fTKj9d • https://github.com/llvm/llvm-project/issues/53102 21
  • 22. 22
  • 23. Aliasing: Counter-measures • LTO ? • __restrict__ • Non standard, tells the compiler an argument doesn’t alias with anything else • __attribute__((pure)), __attribute__((const)) • Non standard, tell the compiler a function doesn’t write/read global state. • Potentially resolve ‘clobbered by call’ opt failures, • Strict-Aliasing • The only C++ standard compliant way. 23
  • 24. Strict Aliasing • “Strict aliasing is an assumption made by the compiler, that objects of different types will never refer to the same memory location (i.e. alias each other.)” Mike Acton https://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html • Allowed optimization. on by default for -O2 + • char* is an exception – always allowed to alias with any type. • Can be disabled with –fno-strict-aliasing (pessimization!) 24
  • 25. Strict Aliasing: Forcing type difference • typedef / using ? • Inherit int? • “I considered …to allow derivation from built-in classes … However, I restrained myself. … the C conversion rules are so chaotic that pretending that int, short, etc., are well-behaved ordinary classes is not going to work. They are either C compatible, or they obey the relatively well-behaved C++ rules for classes, but not both.” Bjarne Stroustrup, The Design and Evolution of C++, §15.11.3. • Strong-Typedefs • Wrappers 25
  • 26. Strict Aliasing demos • https://godbolt.org/z/qGdj4qoG1 • https://godbolt.org/z/zWfYdsorE • https://godbolt.org/z/oxq1K3n8W 26
  • 27. Demo 3: loop hoisting 27
  • 28. • Check foreach!! • Try to form a toy example 28
  • 29. Demo 3: loop hoisting 29
  • 30. 30
  • 31. GCC work • https://gcc.gnu.org/legacy-ml/gcc-patches/2018- 05/msg01675.html • https://github.com/davidmalcolm/gcc-opt-viewer • YAML -> JSON.gz • Actually a better choice https://stackoverflow.com/questions/27743711/can-i-speedup-yaml • Active only during 2018 • Currently seems unusable https://github.com/davidmalcolm/gcc-opt-viewer/issues/3 31
  • 32. Conclusions applicable to other compilers? • In many cases – yes. • Shows where to look 32
  • 33. OptView2 Future work • Improve filtering, • Reduce run time & memory, • Run on windows, • Consume binary optimization remarks, • Revive (possibly integrate) gcc-opt-viewer, • Support code annotations, • Report LLVM bugs. 33
  • 34. Lots of help needed  • https://github.com/Of ekShilon/optview2 • ofekshilon@gmail.com 34
  • 36. llvm-opt-report $ clang -O3 -o /tmp/v.o -c /tmp/v.c -fsave-optimization-record $ llvm-opt-report /tmp/v.yaml > /tmp/v.lst $ cat /tmp/v.lst • https://reviews.llvm.org/D25262 • https://github.com/llvm/llvm- project/tree/main/llvm/tools/llvm-opt- report 36
  • 37. Impact https://github.com/opencv/opencv/wiki/HowToUsePerfTests $ cd ~/src/opencv/modules/ts/misc/ $ python3 ./run.py ~/src/opencv-clean/build/ -w ~/logs-opencv-clean $ python3 ./run.py ~/src/opencv-opt/build/ -w ~/logs-opencv-opt $ python3 ./summary.py ~/logs-opencv-clean/core* ~/logs-opencv-opt/core* -u mks -m median • Seems substantial. • Will publish more detailed steps in the optview2 github page (and possibly fork opencv) Detour 37
  • 38. 38

Editor's Notes

  1. Very little known LLVM feature, I really think can add value to your day to day work. Certainly if you’re working in clang I’ll present first the LLVM existing opt-viewer tool, then optview2 which is derived work by me, And about half the time would be dedicated to demonstrations of this tool in use. With specific focus on aliasing issues.
  2. I want to gain visibility into compiler optimizations. More specifically, I want to see where the compiler tried optimizations and failed – there might be something I could do about it.
  3. LLVM was already instrumented to show optimization choices (for –Rpass), Hal Finkel wrapped the data in yaml format for machine consumption. Adam Nemet wrote the opt-viewer python scripts
  4. A few small python scripts that process the compiler-emitted remarks into annotated source htmls, and an index page – which is an optimization dashboard of sorts.
  5. Don’t know, guessing I stopped it at 60G mem Ran it on separate subsets of a real project An extra step is needed to make this usable to us Let’s take that extra step.
  6. The demos are of results of the optview2 version, but let it be said that the entire heavy lifting is done by the original code Credit to Ilan Ben Hagai for javascript wizardry
  7. Full opt-viewer, not just failures
  8. file:///C:/optview2/html/opencv4/modules_imgproc_src_clahe.cpp.html#L201 One of the darker corners corners of C++, already full of dark corners Some guesswork in interpreting the optview2 reports
  9. Restrict –compiles on non arguments, didn’t see any impact
  10. Turning on strict-aliasing means allowing the compiler to assume exactly that. Turning it off is a pessimization What can be done? Force type difference
  11. file:///C:/optview2/html/opencv2/_home_ofek_src_opencv_modules_imgproc_src_canny.cpp.html#L419
  12. This is not good C++ code Don’t do this for fun. Do this in code where you *really* care about cycle-level performance. Hopefully compilers would get good enough to not have to do this.
  13. Probably If your project *can* be built by clang – I suggest you give it a go.
  14. Report LLVM bugs: alias analysis bugs had no observable symptoms until now. There are no diagnostics emitted on them and they don’t result in bad codegen So, I suspect there are plenty of them.
  15. Tests done on WSL2, sys calls have overhead – but in both repos