©2013-2014 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Using Qualcomm® Snapdragon™ LLVM compiler to optimize apps for 32 and 64 bit
Zino Benaissa
Engineer, Principal/Manager
Qualcomm Innovation Center, Inc.
Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc.
Outline
• Introduction
• Coding guidelines for performance
• LLVM optimization pragmas
• LLVM internal flags
• Summary
Introduction
Software engineering
Software applications are growing exponentially
• Software quality and security
− Many tools exist to fight bugs and scrutinize source code for security holes. The LLVM community is developing such tools:
− Static analyzer
− Sanitizers:
− Address
− Undefined behavior
− Loop coverage tools
• Performance
− Hardware and compilers are smart, and they really are!
− But performance goals are often not met. In that case programmers are on their own
− Costly analysis is required
− Ad hoc methods are used
− Inspection of assembly code and code rewrites
Compilers
Compilers are formidable tools
• They have evolved along with the hardware evolution
− Superscalar, SIMD, multi-core, 64 bits
• A typical industrial compiler includes over a hundred optimizations
• Many powerful optimizations have been actively researched and developed to target hardware features
− Loop auto-vectorization targeting SIMD execution units
− Loop auto-parallelization targeting multi-core processors
Programmer expectations
• Work correctly on any program
• Produce fast code
• Maximize utilization of hardware capabilities
Compilers
Compilers are just programs. Programmers should be aware that compilers:
• Contain thousands of bugs, like any other large piece of software
• Have optimizations with limitations
− An optimization can fail to apply to a legitimate piece of code
• May lack an “expected” optimization
− Make no assumptions about what the compiler will do
• Are systematic but typically unable to infer the critical knowledge of domain experts
Compilers
The good news: minor rewrites of source code often trigger optimizations
• Following simple coding guidelines can significantly increase compiler effectiveness
• Compiler knows why an optimization did not apply
− The LLVM community is actively developing an optimization reporting feature targeted for release 3.6
− The Snapdragon LLVM team is extending this feature
− An early preview of this feature is possible
Coding Guidelines for Performance
Sample code included in this presentation is made available subject to The Clear BSD License
Copyright (c) 2014 Qualcomm Innovation Center, Inc.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted (subject to the limitations in the
disclaimer below) provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the distribution.
* Neither the name of Qualcomm Innovation Center, Inc. nor the names of its contributors may be used to endorse or promote
products derived from this software without specific prior written permission.
NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY THIS LICENSE. THIS SOFTWARE
IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Guideline 1: Make the loop trip count known
Loop:
void foo(int *A) {
  for (int i = 0; i < computeN(); i++)
    A[i] += 1;
}
computeN() needs to be evaluated on every loop iteration.
Rewrite to:
void foo(int *A) {
  int n = computeN();
  for (int i = 0; i < n; i++)
    A[i] += 1;
}
computeN() is evaluated only once.
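One way to see the difference is to count how often the bound is evaluated. The sketch below is illustrative only: this computeN is a hypothetical stand-in that bumps a global counter, and the names foo_before, foo_after and count_calls are introduced for this example. Because the stand-in has a visible side effect, a compiler is not allowed to hoist the call on its own.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the slide's computeN(): it has a visible side
// effect (the call counter), so the compiler cannot hoist it out of a loop.
static int g_calls = 0;
static int computeN() {
    ++g_calls;
    return 8;
}

// Original form: the bound is re-evaluated before every iteration.
void foo_before(int *A) {
    for (int i = 0; i < computeN(); i++)
        A[i] += 1;
}

// Rewritten form: the bound is evaluated once, so the trip count is known
// before the loop starts.
void foo_after(int *A) {
    int n = computeN();
    for (int i = 0; i < n; i++)
        A[i] += 1;
}

// Returns how many times computeN() ran while executing `f` on a fresh array.
int count_calls(void (*f)(int *)) {
    std::vector<int> a(8, 0);
    g_calls = 0;
    f(a.data());
    for (int v : a)
        assert(v == 1);  // both forms must produce the same data
    return g_calls;
}
```

With eight elements, the original form evaluates the bound once per loop test (eight passing tests plus the final failing one), while the rewritten form evaluates it once.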
Guideline 2: Use signed type
Loop:
void foo(int *myArray, unsigned n) {
  for (unsigned i = 0; i < n; i += 2)
    myArray[i] += 1;
}
Unsigned types have modulo (wrap-around) semantics. Because the variable i can wrap around, the compiler cannot assume the loop runs a fixed, computable number of iterations.
Rewrite to:
void foo(int *myArray, unsigned n) {
  for (int i = 0; i < n; i += 2)
    myArray[i] += 1;
}
Overflow of a signed type is undefined behavior, so the compiler assumes the loop counter never overflows.
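The wrap-around case is easier to observe with a narrow counter. The sketch below is a scaled-down illustration, not the slide's code: it uses an 8-bit unsigned counter so the wrap is reached quickly, and the function name trips_u8 and its cap parameter are assumptions added so the demo always returns.

```cpp
#include <cassert>
#include <cstdint>

// Counts iterations of `for (i = 0; i < n; i += 2)` with an 8-bit unsigned
// counter. `cap` is a safety limit added for this demo so that a loop that
// never terminates on its own still returns.
int trips_u8(std::uint8_t n, int cap) {
    assert(cap > 0);
    int trips = 0;
    for (std::uint8_t i = 0; i < n; i += 2) {
        if (++trips == cap)
            break;  // the counter wrapped past n and started over
    }
    return trips;
}
```

For n = 254 the counter reaches 254 exactly and the loop stops after 127 iterations. For n = 255 the counter steps from 254 back to 0, never reaches the bound, and only the cap ends the loop. A compiler must allow for that second behavior whenever the counter is unsigned.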
Guideline 3: Beware of pointer aliasing
Loop:
void foo(MyStruct *s) {
  for (int i = 0; i < s->NumElm; i++)
    s->MyArray[i] += 1;
}
The programmer should not assume that the compiler will be able to hoist s->NumElm: a store to s->MyArray[i] may alias it.
Rewrite to:
void foo(MyStruct *s) {
  int n = s->NumElm;
  for (int i = 0; i < n; i++)
    s->MyArray[i] += 1;
}
The compiler knows the number of loop iterations.
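The aliasing hazard can be made concrete with a small experiment. The sketch below does not use the slide's MyStruct, whose definition is not shown; it assumes a hypothetical View struct in which the count is reached through a pointer, so that the count can legally live inside the array being updated. All names here are introduced for illustration.

```cpp
#include <cassert>

// Hypothetical layout for this demo: the element count is reached through a
// pointer so that it can legally live inside the array being updated.
struct View {
    int *count;  // plays the role of s->NumElm
    int *data;   // plays the role of s->MyArray
};

// Bound re-read on every iteration, as in the original loop.
int bump_reload(View *s) {
    int i = 0;
    for (; i < *s->count; i++)
        s->data[i] += 1;
    return i;  // number of elements updated
}

// Bound read once before the loop, as in the rewritten loop.
int bump_hoisted(View *s) {
    int n = *s->count;
    int i = 0;
    for (; i < n; i++)
        s->data[i] += 1;
    return i;
}

// Runs one variant on a buffer whose element 2 doubles as the count (4).
int run_aliased(int (*f)(View *)) {
    int buf[8] = {0, 0, 4, 0, 0, 0, 0, 0};
    View v{&buf[2], buf};
    int updated = f(&v);
    assert(updated <= 8);
    return updated;
}
```

When the count overlaps the data, the reloading loop updates five elements (the store at index 2 raises the bound from 4 to 5) while the hoisted loop updates four. Since the two forms can differ, the compiler cannot hoist the load by itself unless it proves there is no overlap; the programmer, who knows the data layout, can.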
Guideline 4: Hoist complex pointer indirections
Loop:
typedef struct {
  int **b;
} S;
void foo(S *A) {
  for (int i = 0; i < 100; i++)
    A->b[i] = nullptr;
}
A->b is evaluated on every iteration.
Rewrite to:
typedef struct {
  int **b;
} S;
void foo(S *A) {
  int **ptr = A->b;
  for (int i = 0; i < 100; i++)
    ptr[i] = nullptr;
}
If there are more than two levels of pointer/struct indirection, hoist them outside the loop.
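The same rewrite applies to deeper chains. The sketch below assumes a hypothetical Scene/Layer structure with one more level of indirection than the slide's example; the type and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical three-level structure: Scene -> Layer -> pixel buffer.
struct Layer { int *pixels; std::size_t count; };
struct Scene { Layer *layer; };

// Every iteration walks scene->layer->pixels again unless the compiler can
// prove the stores do not modify those pointers.
void clear_direct(Scene *scene) {
    for (std::size_t i = 0; i < scene->layer->count; i++)
        scene->layer->pixels[i] = 0;
}

// The pointer chain is resolved once; the loop only sees two locals.
void clear_hoisted(Scene *scene) {
    assert(scene != nullptr && scene->layer != nullptr);
    int *pixels = scene->layer->pixels;
    const std::size_t n = scene->layer->count;
    for (std::size_t i = 0; i < n; i++)
        pixels[i] = 0;
}

// Helper for the usage check: sums a buffer after clearing it with `f`.
int sum_after(void (*f)(Scene *)) {
    int buf[4] = {1, 2, 3, 4};
    Layer layer{buf, 4};
    Scene scene{&layer};
    f(&scene);
    int sum = 0;
    for (int v : buf)
        sum += v;
    return sum;
}
```

Both forms produce the same result here; the hoisted form simply makes the loop body and its trip count independent of the pointer chain.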
Guideline 5: Use restrict keyword
Loop:
void foo(int *A, int *B) {
  for (int i = 0; i < 100; i++)
    A[i] += B[i];
}
The loop cannot be parallelized because the compiler has to worry about one case: A[i] referring to the same memory as B[i+1].
Rewrite to:
void foo(int *__restrict A,
         int *__restrict B) {
  for (int i = 0; i < 100; i++)
    A[i] += B[i];
}
Tells the compiler that A and B point to separate arrays.
LLVM vectorizes this loop even without restrict: it generates run-time checks to verify that A and B do not overlap.
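The overlap case can be simulated in plain code. The sketch below is an emulation, not what the compiler emits: add_chunked mimics a 4-wide vector loop by loading four B values before storing four A values, and the names add_scalar, add_chunked and run_overlapped, as well as the shortened length of 8, are assumptions for this example.

```cpp
#include <cassert>
#include <vector>

// Scalar semantics of the slide's loop (shortened to `n` elements here).
void add_scalar(int *A, const int *B, int n) {
    for (int i = 0; i < n; i++)
        A[i] += B[i];
}

// Emulates a 4-wide vector loop: load four B values, then store four A
// values. This is only equivalent to add_scalar when A and B do not overlap.
void add_chunked(int *A, const int *B, int n) {
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        int b[4] = {B[i], B[i + 1], B[i + 2], B[i + 3]};
        for (int k = 0; k < 4; k++)
            A[i + k] += b[k];
    }
}

// Runs `f` with A = buf + 1 and B = buf, i.e. A[i] is the same object as
// B[i + 1], and returns the final buffer.
std::vector<int> run_overlapped(void (*f)(int *, const int *, int)) {
    std::vector<int> buf(9, 1);
    f(buf.data() + 1, buf.data(), 8);
    return buf;
}
```

With this overlap the scalar loop behaves like a running sum and ends with 9 in the last element, whereas the chunked order ends with 2. That mismatch is what the run-time overlap checks guard against, and what __restrict promises cannot happen.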
Guideline 6
void foo(int *A, int n, int m) {
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
      if (j != m - 1)
        *A |= 1;
      if (i != n - 1)
        *A |= 2;
      if (j != 0)
        *A |= 4;
      if (i != 0)
        *A |= 8;
      A++;
    }
  }
}
Most elements of A will be set with *A | 15.
The tests against m - 1 and n - 1 exclude the last iteration; the tests against 0 exclude the first iteration.
Guideline 6: Avoid complex control-flow
Loop:
void foo(int *A, int n, int m) {
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
      if (j != m - 1)
        *A |= 1;
      if (i != n - 1)
        *A |= 2;
      if (j != 0)
        *A |= 4;
      if (i != 0)
        *A |= 8;
      A++;
    }
  }
}
Rewrite to:
void foo(int *A, int n, int m) {
  // Handle cases n == 1 and m == 1
  // Peel iteration when i is 0
  // Most executed loop
  for (int i = 1; i < n - 1; i++) {
    *A++ |= 11; /* iter j = 0 */
    for (int j = 1; j < m - 1; j++)
      *A++ |= 15;
    *A++ |= 14; /* iter j = m - 1 */
  }
  // Peel iteration i = n - 1
}
The first and last iterations are peeled.
The inner loop over j is the most commonly executed code.
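The rewritten loop above leaves the peeled iterations as comments. The sketch below is one possible way to fill them in, checked against the original loop; it is not the author's implementation. The names foo_reference, or_row, foo_peeled and same_result are introduced here, and falling back to the original loop when n or m is below 2 is an assumption about how the degenerate cases might be handled.

```cpp
#include <cassert>
#include <vector>

// Reference version: the loop from the slide, tests and all.
void foo_reference(int *A, int n, int m) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            if (j != m - 1) *A |= 1;
            if (i != n - 1) *A |= 2;
            if (j != 0)     *A |= 4;
            if (i != 0)     *A |= 8;
            A++;
        }
    }
}

// ORs one row of m >= 2 elements: `first` for j == 0, `mid` for the interior,
// `last` for j == m - 1. Returns the pointer just past the row.
static int *or_row(int *A, int m, int first, int mid, int last) {
    *A++ |= first;
    for (int j = 1; j < m - 1; j++)
        *A++ |= mid;  // branch-free inner loop
    *A++ |= last;
    return A;
}

// Peeled version. Degenerate shapes fall back to the reference loop.
void foo_peeled(int *A, int n, int m) {
    if (n < 2 || m < 2) {
        foo_reference(A, n, m);
        return;
    }
    A = or_row(A, m, 3, 7, 6);         // peeled row i == 0 (bit 8 never set)
    for (int i = 1; i < n - 1; i++)
        A = or_row(A, m, 11, 15, 14);  // most executed rows
    or_row(A, m, 9, 13, 12);           // peeled row i == n - 1 (bit 2 never set)
}

// Compares both versions on an n x m grid of zeros.
bool same_result(int n, int m) {
    assert(n >= 0 && m >= 0);
    std::vector<int> a(n * m, 0), b(n * m, 0);
    foo_reference(a.data(), n, m);
    foo_peeled(b.data(), n, m);
    return a == b;
}
```

The constants follow from the four tests: an interior element receives 1 | 2 | 4 | 8 = 15, and each border drops exactly the bit whose test fails there.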
LLVM Optimization Pragmas
Guideline 7: Use pragma vectorize
Loop:
void foo(int *A, int n) {
  for (int i = 0; i < n % 4; i++)
    A[i] += 1;
}
The loop has too few iterations.
Guideline 7: Use pragma vectorize
Loop:
void foo(int *A, int n) {
  for (int i = 0; i < n % 4; i++)
    A[i] += 1;
}
The compiler often has no way to know that the trip count, n % 4, is at most three. The programmer cannot assume the compiler will figure out that the loop has fewer than four iterations.
Rewrite to:
void foo(int *A, int n) {
  #pragma clang loop vectorize(disable)
  for (int i = 0; i < n % 4; i++)
    A[i] += 1;
}
Beware: pragmas are often target dependent. Apply them only to the intended target.
Pragmas override command-line flags.
Pragmas will be supported in the upcoming Snapdragon LLVM release.
Guideline 7: Use pragma vectorize
Example 2
Loop:
void foo(char *A, int n) {
  n = min(14, n);
  for (int i = 0; i < n; i++)
    A[i] += 1;
}
The compiler is unaware that there are at most 14 iterations. It will attempt to vectorize using a factor of 16 to fill the ARM/NEON registers (128 bits).
Rewrite to:
void foo(char *A, int n) {
  n = min(14, n);
  #pragma clang loop vectorize_width(8)
  for (int i = 0; i < n; i++)
    A[i] += 1;
}
The compiler will vectorize using a factor of 8. When n >= 8, vector instructions are used.
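Both pragma forms can be tried in a small self-contained file. The sketch below is a minimal illustration under a few assumptions: the function names are invented, std::min stands in for the slide's min, and the helper functions exist only to make the effect checkable. The pragmas are hints and do not change results; compilers other than clang typically ignore them, possibly with a warning.

```cpp
#include <algorithm>
#include <cassert>

// Short remainder loop: n % 4 is at most 3 for non-negative n, so vector code
// cannot pay off. The pragma asks clang not to vectorize it.
void bump_tail(int *A, int n) {
    assert(n >= 0);
#pragma clang loop vectorize(disable)
    for (int i = 0; i < n % 4; i++)
        A[i] += 1;
}

// At most 14 char elements: request 8 lanes instead of the 16 that would
// fill a 128-bit register.
void bump_short(char *A, int n) {
    n = std::min(14, n);
#pragma clang loop vectorize_width(8)
    for (int i = 0; i < n; i++)
        A[i] += 1;
}

// Usage helpers: number of elements each function modified.
int tail_updates(int n) {
    int a[4] = {0, 0, 0, 0};
    bump_tail(a, n);
    return a[0] + a[1] + a[2] + a[3];
}

int short_updates(int n) {
    char a[16] = {0};
    bump_short(a, n);
    return static_cast<int>(std::count(a, a + 16, 1));
}
```

With upstream clang, the remark flags -Rpass=loop-vectorize and -Rpass-missed=loop-vectorize can be used to check whether each loop was vectorized; whether a given Snapdragon LLVM release accepts them is not stated in the slides.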
LLVM Internal Flags
LLVM hidden optimization flags
• The compiler uses various heuristics and optimization thresholds
− Preset depending on optimization level
• Many optimizations are experimental and remain turned off
• Controlled by command-line compiler flags
− “clang --help-hidden” displays all available flags
• They are difficult to use
− Can significantly accelerate specific pieces of code
− Unsafe to use in general
• Typically reserved for advanced programmers and compiler developers
− In the future, compiler reporting is expected to suggest the use of a subset of these flags
Summary
• Coding guidelines can make compilers significantly more effective
− Significant speedup
• Guidelines are only useful while the code remains readable
− Avoid obscure and complex source changes
• Use domain expert knowledge
− LLVM supported pragmas
• Snapdragon LLVM compiler available at Qualcomm Developer Network
For more information on Qualcomm, visit us at:
www.qualcomm.com & www.qualcomm.com/blog
©2013-2014 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm and Snapdragon are trademarks of Qualcomm Incorporated, registered in the United
States and other countries, used with permission. Uplinq is a trademark of Qualcomm
Incorporated, used with permission. Other products and brand names may be trademarks or
registered trademarks of their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm
Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate
structure, as applicable.
Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of
its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm
Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering,
research and development functions, and substantially all of its product and services businesses,
including its semiconductor business, QCT.
Thank you

DanBrown980551
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 

Recently uploaded (20)

Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 

App Optimizations Using Qualcomm Snapdragon LLVM Compiler for Android

  • 1. ©2013-2014 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
  • 2. Using Qualcomm® Snapdragon™ LLVM compiler to optimize apps for 32 and 64 Bit
    Zino Benaissa, Engineer, Principal/Manager, Qualcomm Innovation Center, Inc.
    Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc.
  • 3. Outline
    • Introduction
    • Coding guidelines for performance
    • LLVM optimization pragmas
    • LLVM internal flags
    • Summary
  • 4. Introduction
  • 5. Software engineering
    Software applications are growing exponentially
    • Software quality and security
      − Many tools exist to fight bugs and scrutinize source code for security holes. The LLVM community is developing such tools:
        − Static analyzer
        − Sanitizers:
          − Address
          − Undefined behavior
        − Loop coverage tools
    • Performance
      − Well, hardware/compilers are smart and they are!
      − But performance goals are often not met. In this case programmers are on their own:
        − Costly analysis is required
        − Ad hoc methods are used
        − Inspection of assembly code and code rewrite
  • 6. Compilers
    Compilers are formidable tools
    • They have evolved along with the hardware
      − Superscalar, SIMD, multi-core, 64 bits
    • A typical industrial compiler includes over a hundred optimizations
    • Many powerful optimizations have been actively researched and developed to target hardware features
      − Loop auto-vectorization targeting SIMD execution units
      − Loop auto-parallelization targeting multi-cores
    Programmer expectations
    • Work correctly on any program
    • Produce fast code
    • Maximize utilization of hardware capabilities
  • 7. Compilers
    Compilers are just programs. Programmers should be aware that they:
    • Contain thousands of bugs, like any other large piece of software
    • Have optimizations with limitations
      − An optimization can fail to apply to a legitimate piece of code
    • May lack an “expected” optimization
      − Make no assumption about what the compiler will do
    • Are systematic but typically unable to infer the critical knowledge of domain experts
  • 8. Compilers
    The good news: minor rewrites of source code often trigger optimizations
    • Following simple coding guidelines can significantly increase compiler effectiveness
    • The compiler knows why an optimization did not apply
      − The LLVM community is actively developing an optimization reporting feature targeted for release 3.6
      − The Snapdragon LLVM team is extending this feature
      − An early preview of this feature is possible
  • 9. Coding Guidelines for Performance
  • 10. Sample code included in this presentation is made available subject to The Clear BSD License Copyright (c) 2014 Qualcomm Innovation Center, Inc. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted (subject to the limitations in the disclaimer below) provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Qualcomm Innovation Center, Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY THIS LICENSE. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  • 11. Guideline 1
        void foo(int *A) {
          for (int i = 0; i < computeN(); i++)
            A[i] += 1;
        }
  • 12. Guideline 1: Make the loop trip count known
    Loop:
        void foo(int *A) {
          for (int i = 0; i < computeN(); i++)
            A[i] += 1;
        }
    computeN() needs to be evaluated on every loop iteration.
    Rewrite to:
        void foo(int *A) {
          int n = computeN();
          for (int i = 0; i < n; i++)
            A[i] += 1;
        }
    computeN() is evaluated only once.
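The difference between the two forms of Guideline 1 can be made visible by counting calls. The sketch below is only an illustration: the slides do not define `computeN()`, so its body, the counter `g_calls`, and the helper `calls_made_by` are all hypothetical names introduced here.

```cpp
#include <cassert>

static int g_calls = 0;  // hypothetical counter, not part of the slides

// Stand-in for the slide's computeN(); the real body is not shown in the deck.
int computeN() {
  ++g_calls;
  return 4;
}

// Original form: the loop condition calls computeN() before every iteration.
void foo_original(int *A) {
  for (int i = 0; i < computeN(); i++)
    A[i] += 1;
}

// Rewritten form: the trip count is read once, before the loop.
void foo_hoisted(int *A) {
  int n = computeN();
  for (int i = 0; i < n; i++)
    A[i] += 1;
}

// Runs one variant on a fresh 4-element array and reports how many
// times computeN() was called.
int calls_made_by(void (*fn)(int *)) {
  assert(fn != nullptr);
  int A[4] = {0, 0, 0, 0};
  g_calls = 0;
  fn(A);
  return g_calls;
}
```

With a trip count of 4, the original form evaluates the condition five times (four passing checks plus the final failing one), while the hoisted form calls `computeN()` once. Beyond the saved calls, the hoisted form gives the compiler a loop bound it can see is invariant.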
  • 13. Guideline 2
        void foo(int *myArray, unsigned n) {
          for (unsigned i = 0; i < n; i += 2)
            myArray[i] += 1;
        }
  • 14. Guideline 2: Use signed type
    Loop:
        void foo(int *myArray, unsigned n) {
          for (unsigned i = 0; i < n; i += 2)
            myArray[i] += 1;
        }
    Unsigned types have modulo (wraparound) semantics. Because variable i can wrap around, the compiler cannot assume the loop executes a fixed number of iterations.
    Rewrite to:
        void foo(int *myArray, unsigned n) {
          for (int i = 0; i < n; i += 2)
            myArray[i] += 1;
        }
    Overflow of a signed type is undefined behavior, so the compiler assumes the loop counter never overflows.
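To see why wraparound blocks the compiler from computing a trip count in Guideline 2, the following sketch runs the unsigned stride-2 loop shape near the top of the `unsigned` range. The helper names and the iteration cap are assumptions added for the demonstration; the cap exists only so the example terminates.

```cpp
#include <cassert>
#include <climits>

// Unsigned arithmetic is defined to wrap modulo 2^N.
unsigned next_index(unsigned i) {
  return i + 2;
}

// Same loop shape as the slide (unsigned counter, stride 2), but it gives up
// after `cap` iterations so that a non-terminating case can be observed.
unsigned long count_iterations(unsigned start, unsigned n, unsigned long cap) {
  assert(cap > 0);
  unsigned long iters = 0;
  for (unsigned i = start; i < n && iters < cap; i += 2)
    ++iters;
  return iters;
}
```

For small bounds the loop behaves as expected: `count_iterations(0, 10, 1000)` performs 5 iterations. With `n == UINT_MAX`, however, an even counter steps from `UINT_MAX - 1` to 0 and never reaches the bound, so only the cap stops the loop. A compiler has to allow for this case when the counter is unsigned, which is what prevents it from assuming a simple trip count.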
  • 15. Guideline 3
        void foo(MyStruct *s) {
          for (int i = 0; i < s->NumElm; i++)
            s->MyArray[i] += 1;
        }
  • 16. Guideline 3: Beware of pointer aliasing
    Loop:
        void foo(MyStruct *s) {
          for (int i = 0; i < s->NumElm; i++)
            s->MyArray[i] += 1;
        }
    The programmer should not assume that the compiler will be able to hoist s->NumElm: a store to s->MyArray[i] may alias s->NumElm, so the bound may have to be reloaded on every iteration.
    Rewrite to:
        void foo(MyStruct *s) {
          int n = s->NumElm;
          for (int i = 0; i < n; i++)
            s->MyArray[i] += 1;
        }
    The compiler knows the number of loop iterations.
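The aliasing concern in Guideline 3 is not hypothetical: if the loop bound lives in memory that the loop also writes, re-reading the bound and hoisting it produce different results. The sketch below uses plain pointers instead of the slide's `MyStruct` (whose definition is not shown), and all function names are made up for illustration.

```cpp
#include <cassert>

// Re-reads *count on every iteration, like `i < s->NumElm` in the slide.
// Returns the number of iterations executed.
int bump_reloading(int *arr, const int *count) {
  assert(arr != nullptr && count != nullptr);
  int iters = 0;
  for (int i = 0; i < *count; i++) {
    arr[i] += 1;
    ++iters;
  }
  return iters;
}

// Reads the bound once, like the rewritten loop in the slide.
int bump_hoisted(int *arr, const int *count) {
  assert(arr != nullptr && count != nullptr);
  int n = *count;
  int iters = 0;
  for (int i = 0; i < n; i++) {
    arr[i] += 1;
    ++iters;
  }
  return iters;
}

// Sets up a buffer in which the bound is stored inside the array itself
// (count aliases buf[2]) and runs one of the two variants.
int iterations_when_aliased(bool hoist) {
  int buf[6] = {0, 0, 3, 0, 0, 0};
  const int *count = &buf[2];
  return hoist ? bump_hoisted(buf, count) : bump_reloading(buf, count);
}
```

When the bound aliases `buf[2]`, the reloading loop increments its own bound from 3 to 4 and runs four times, while the hoisted loop runs three times. Because the two forms are not equivalent in general, the compiler may not hoist the load on its own; writing `int n = s->NumElm;` states that the programmer intends the bound to be fixed.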
  • 17. Guideline 4
        typedef struct { int **b; } S;
        void foo(S *A) {
          for (int i = 0; i < 100; i++)
            A->b[i] = nullptr;
        }
  • 18. Guideline 4: Hoist complex pointer indirections
    Loop:
        typedef struct { int **b; } S;
        void foo(S *A) {
          for (int i = 0; i < 100; i++)
            A->b[i] = nullptr;
        }
    A->b is evaluated on every iteration.
    Rewrite to:
        typedef struct { int **b; } S;
        void foo(S *A) {
          int **ptr = A->b;
          for (int i = 0; i < 100; i++)
            ptr[i] = nullptr;
        }
    If there are more than two levels of pointer/struct indirection, hoist them outside the loop.
  • 19. Guideline 5
        void foo(int *A, int *B) {
          for (int i = 0; i < 100; i++)
            A[i] += B[i];
        }
  • 20. Guideline 5: Use restrict keyword
    Loop:
        void foo(int *A, int *B) {
          for (int i = 0; i < 100; i++)
            A[i] += B[i];
        }
    The loop cannot be parallelized because the compiler has to worry about one case: A pointing to B[i+1].
    LLVM vectorizes this loop even without restrict: it generates run-time checks to verify that A and B do not overlap.
    Rewrite to:
        void foo(int *__restrict A, int *__restrict B) {
          for (int i = 0; i < 100; i++)
            A[i] += B[i];
        }
    This tells the compiler that A and B point to separate arrays.
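The run-time check mentioned in Guideline 5 can be pictured as a guard in front of two versions of the loop. The code below is a hand-written sketch of that idea, not the check LLVM actually emits; `ranges_overlap` and `add_arrays` are hypothetical names, and `__restrict` is used as the compiler extension shown on the slide.

```cpp
#include <cassert>
#include <cstdint>

// True when [a, a+n) and [b, b+n) share at least one element.
// Addresses are compared as integers so the comparison is well defined
// even for pointers into unrelated arrays.
bool ranges_overlap(const int *a, const int *b, int n) {
  const auto pa = reinterpret_cast<std::uintptr_t>(a);
  const auto pb = reinterpret_cast<std::uintptr_t>(b);
  const std::uintptr_t bytes = static_cast<std::uintptr_t>(n) * sizeof(int);
  return pa < pb + bytes && pb < pa + bytes;
}

// Picks a no-alias loop when the check passes and a conservative loop otherwise.
void add_arrays(int *A, const int *B, int n) {
  assert(n >= 0);
  if (!ranges_overlap(A, B, n)) {
    // Safe to promise no aliasing on this path, so the loop can be vectorized.
    int *__restrict a = A;
    const int *__restrict b = B;
    for (int i = 0; i < n; i++)
      a[i] += b[i];
  } else {
    // Possible loop-carried dependence (e.g. A == B + 1): keep element order.
    for (int i = 0; i < n; i++)
      A[i] += B[i];
  }
}
```

Calling `add_arrays(buf + 1, buf, 3)` on `{1, 1, 1, 1}` takes the conservative path and yields the running sum `{1, 2, 3, 4}`, which is the dependence the slide warns about. Declaring the parameters `__restrict` in the first place removes the need for the guard, but it also makes overlapping calls undefined, so it should only be used when the caller can guarantee separate arrays.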
  • 21. Guideline 6
        void foo(int *A, int n, int m) {
          for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
              if (j != m - 1) *A |= 1;
              if (i != n - 1) *A |= 2;
              if (j != 0) *A |= 4;
              if (i != 0) *A |= 8;
              A++;
            }
          }
        }
    Most elements of A will be set with *A | 15.
    The tests against m - 1 and n - 1 exclude the last iteration; the tests against 0 exclude the first iteration.
  • 22. Guideline 6: Avoid complex control-flow
    Loop:
        void foo(int *A, int n, int m) {
          for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
              if (j != m - 1) *A |= 1;
              if (i != n - 1) *A |= 2;
              if (j != 0) *A |= 4;
              if (i != 0) *A |= 8;
              A++;
            }
          }
        }
    Rewrite to:
        void foo(int *A, int n, int m) {
          // Handle cases n == 1 and m == 1
          // Peel iteration when i is 0
          // Most executed loop
          for (int i = 1; i < n - 1; i++) {
            *A++ |= 11; /* iter j = 0 */
            for (int j = 1; j < m - 1; j++)
              *A++ |= 15;
            *A++ |= 14; /* iter j = m - 1 */
          }
          // Peel iteration i = n - 1
        }
    The last and first iterations are peeled, leaving the most commonly executed code in a branch-free loop.
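The rewritten loop in Guideline 6 leaves the peeled iterations and the n == 1 / m == 1 cases as comments. One possible way to fill them in is sketched below and checked against the original branchy loop. The row helper `fill_row`, the function names, and the decision to return early for non-positive sizes are assumptions of this sketch, not the author's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference: the branchy loop nest from the slide.
void foo_reference(int *A, int n, int m) {
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
      if (j != m - 1) *A |= 1;
      if (i != n - 1) *A |= 2;
      if (j != 0)     *A |= 4;
      if (i != 0)     *A |= 8;
      A++;
    }
  }
}

// Hypothetical helper: processes one row of m elements. row_bits carries the
// bits that depend only on i (2 and/or 8); the first and last columns are
// peeled so the middle loop has no branches. Returns the advanced pointer.
static int *fill_row(int *A, int m, int row_bits) {
  assert(m >= 1);
  if (m == 1) {                  // single column: j is both first and last
    *A++ |= row_bits;
    return A;
  }
  *A++ |= row_bits | 1;          // j = 0: bit 4 excluded
  for (int j = 1; j < m - 1; j++)
    *A++ |= row_bits | 1 | 4;    // interior columns
  *A++ |= row_bits | 4;          // j = m - 1: bit 1 excluded
  return A;
}

// Peeled version: first and last rows are handled outside the main loop.
void foo_peeled(int *A, int n, int m) {
  if (n <= 0 || m <= 0) return;
  if (n == 1) {                  // single row: i is both first and last
    fill_row(A, m, 0);
    return;
  }
  A = fill_row(A, m, 2);         // i = 0: bit 8 excluded
  for (int i = 1; i < n - 1; i++)
    A = fill_row(A, m, 2 | 8);   // most executed rows: 11, 15..., 14
  fill_row(A, m, 8);             // i = n - 1: bit 2 excluded
}

// Runs both versions on zeroed n-by-m buffers and compares the results.
bool same_result(int n, int m) {
  std::vector<int> ref(static_cast<std::size_t>(n) * m, 0);
  std::vector<int> peeled(ref);
  foo_reference(ref.data(), n, m);
  foo_peeled(peeled.data(), n, m);
  return ref == peeled;
}
```

For interior rows the helper produces exactly the constants on the slide: 11 for the first column, 15 for the middle, and 14 for the last. Whether the helper is inlined is left to the compiler; writing the three row cases out by hand, as the slide does, is an equally valid choice.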
  • 23. LLVM Optimization Pragmas
  • 24. Guideline 7: Use pragma vectorize
    Loop:
        void foo(int *A, int n) {
          for (int i = 0; i < n % 4; i++)
            A[i] += 1;
        }
    The loop has too few iterations.
  • 25. Guideline 7: Use pragma vectorize
    Loop:
        void foo(int *A, int n) {
          for (int i = 0; i < n % 4; i++)
            A[i] += 1;
        }
    The compiler often has no way to know that the trip count (n % 4) is at most three.
    Rewrite to:
        void foo(int *A, int n) {
          #pragma clang loop vectorize(disable)
          for (int i = 0; i < n % 4; i++)
            A[i] += 1;
        }
    The programmer cannot assume the compiler will figure out that the loop has fewer than four iterations.
    Beware: pragmas are often target dependent. Apply them only to the intended target.
    Pragmas override command-line flags.
    Pragmas will be supported in the upcoming Snapdragon LLVM release.
  • 26. Guideline 7: Use pragma vectorize, example 2
    Loop:
        void foo(char *A, int n) {
          n = min(14, n);
          for (int i = 0; i < n; i++)
            A[i] += 1;
        }
    The compiler is unaware that there are at most 14 iterations. It will attempt to vectorize using a factor of 16 to fill ARM/NEON registers (128 bits).
    Rewrite to:
        void foo(char *A, int n) {
          n = min(14, n);
          #pragma clang loop vectorize_width(8)
          for (int i = 0; i < n; i++)
            A[i] += 1;
        }
    The compiler will vectorize using a factor of 8. When n >= 8, vector instructions are used.
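Because the slides warn that pragmas are target dependent, one cautious way to apply both Guideline 7 pragmas is to guard them so that other compilers simply see a plain loop. The sketch below assumes a Clang-based toolchain that defines `__clang__`; the function names `bump_tail` and `bump_short` are made up, and `std::min` stands in for the slide's `min`.

```cpp
#include <algorithm>
#include <cassert>

// At most three iterations (n % 4), so vectorization is switched off.
void bump_tail(int *A, int n) {
  assert(n >= 0);  // keeps n % 4 in the range 0..3
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
  for (int i = 0; i < n % 4; i++)
    A[i] += 1;
}

// At most 14 iterations, so a vector width of 8 is requested instead of
// leaving the choice to the compiler's default.
void bump_short(char *A, int n) {
  n = std::min(14, n);
#if defined(__clang__)
#pragma clang loop vectorize_width(8)
#endif
  for (int i = 0; i < n; i++)
    A[i] += 1;
}
```

The pragmas change only how the loop is compiled, not what it computes, so the functions behave identically with or without them. That makes it straightforward to benchmark both builds on the intended target before committing to the hint.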
  • 27. LLVM Internal Flags
  • 28. LLVM hidden optimization flags
    • The compiler utilizes various heuristics and optimization thresholds
      − Preset depending on optimization level
    • Many optimizations are experimental and remain turned off
    • Controlled by command-line compiler flags
      − “clang --help-hidden” displays all available flags
    • Difficult to utilize
      − Can significantly accelerate specific pieces of code
      − Unsafe to use in general
    • Typically reserved for advanced programmers and compiler developers
      − In future, compiler reporting is expected to suggest usage of a subset of these flags
  • 29. Summary
    • Coding guidelines can make compilers significantly more effective
      − Significant speed up
    • Guidelines are only useful while the code remains readable
      − Avoid obscure and complex source changes
    • Use domain expert knowledge
      − LLVM supported pragmas
    • Snapdragon LLVM compiler available at Qualcomm Developer Network
  • 30. Thank you
    For more information on Qualcomm, visit us at: www.qualcomm.com & www.qualcomm.com/blog
    Qualcomm and Snapdragon are trademarks of Qualcomm Incorporated, registered in the United States and other countries, used with permission. Uplinq is a trademark of Qualcomm Incorporated, used with permission. Other products and brand names may be trademarks or registered trademarks of their respective owners. References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.