OpenCL Kernel
Optimization Tips
Champ Yen (champ.yen@gmail.com)
http://champyen.blogspot.com
ver.20140820
Optimization - a form of balance
Device/Platform
Features
Runtime
Toolchain
Problem
Algorithm
Optimization
Optimization is not just a greedy
search in a single direction. It is
more like finding a good balance
point between the device, the toolchain
and the problem.
Device - Computation
● device type
○ CPU - strong single-thread performance
○ GPU - many threads, great total throughput
● ISA design
○ scalar-based
○ vector-based
● number of compute units / processing elements
● estimate the impact of branch divergence and barriers
● capability of asynchronous data transfer
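
One rough way to estimate the impact of divergence is a toy cost model. The sketch below is plain C rather than kernel code, and it assumes (as a simplification, not a vendor formula) that work-items running in lockstep pay for every branch path that at least one of them takes. The function name and the cost units are hypothetical.

```c
#include <stddef.h>

/* Simplified model (an assumption, not a vendor formula): work-items that
 * execute in lockstep pay for every branch path that at least one of them
 * takes. takes_then[i] is nonzero when work-item i takes the "then" path. */
unsigned lockstep_branch_cost(const int *takes_then, size_t n_items,
                              unsigned cost_then, unsigned cost_else)
{
    int any_then = 0, any_else = 0;
    for (size_t i = 0; i < n_items; ++i) {
        if (takes_then[i])
            any_then = 1;
        else
            any_else = 1;
    }
    return (any_then ? cost_then : 0u) + (any_else ? cost_else : 0u);
}
```

When every work-item agrees on the branch, only one path is charged; as soon as a single work-item disagrees, the group pays for both. On a CPU device, where work-items may not run in lockstep, this model likely overestimates the cost.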
Device - Memory
● get basic memory characteristics:
○ size
○ latency
○ throughput
○ coalescing effect
○ addressing mode
● global memory - unified or not
● local memory - dedicated on-chip memory, or emulated in global memory
● penalty of exceeding the capacity of a memory level
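
The coalescing effect can be illustrated by counting how many aligned memory segments a group of work-items touches for a given access stride. This is a minimal sketch in plain C; the function name is hypothetical and `segment_bytes` stands in for a device-specific transaction size that has to be taken from the vendor's documentation.

```c
#include <stddef.h>

/* Counts how many aligned memory segments a group of work-items touches when
 * work-item i reads element i * stride_elems. Fewer segments means better
 * coalescing. segment_bytes is a hypothetical transaction size (must be
 * nonzero); the real value is device specific. */
size_t segments_touched(size_t n_items, size_t stride_elems,
                        size_t elem_bytes, size_t segment_bytes)
{
    size_t count = 0;
    size_t prev_segment = 0;
    for (size_t i = 0; i < n_items; ++i) {
        size_t segment = (i * stride_elems * elem_bytes) / segment_bytes;
        /* Addresses never decrease, so a new segment index is a new segment. */
        if (i == 0 || segment != prev_segment) {
            ++count;
            prev_segment = segment;
        }
    }
    return count;
}
```

With 16 work-items reading 4-byte elements and an assumed 64-byte segment, a stride of 1 touches a single segment, while a stride of 16 touches 16 segments for the same amount of useful data.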
Toolchain/Runtime
● read the vendor's documents, tutorials and guides for debugging, profiling and optimization.
● there is no perfect runtime/toolchain
● learn the profiling/debugging tools.
● it is not always a good idea to debug or optimize on a platform other
than the target.
● automatic optimization MAY NOT HELP you reason about optimization
● tricky forms of computation/memory operations.
○ MAD operations
○ memory access mode
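
As an illustration of a MAD-friendly form of computation, the sketch below evaluates a polynomial with Horner's rule, so that every step has the a * b + c shape. It is written in plain C with a hypothetical function name; in an OpenCL C kernel the loop body could be written with the built-in `mad(acc, x, coeffs[i])`. Whether a compiler actually emits a fused or combined instruction depends on the device and toolchain.

```c
#include <stddef.h>

/* Evaluates coeffs[0]*x^(n-1) + ... + coeffs[n-1] with Horner's rule.
 * Each loop step is one multiply followed by one add, the a * b + c shape
 * that OpenCL C exposes as mad(a, b, c). */
float horner(const float *coeffs, size_t n, float x)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc = acc * x + coeffs[i];
    return acc;
}
```

Compared with computing each power of x separately, this form needs n multiply-add steps and no extra temporaries.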
Problem/Algorithms
● DATA PARALLEL!
● a multi-stage design is not always bad.
○ doing everything in one kernel uses more memory resources per work-item.
● vectorizing is not always a good idea
● use an appropriate work-group size; a poor choice can cause:
○ a bad memory access pattern, less coalescing
○ a lower cache hit rate
○ less local memory for each work-item
○ possibly less private memory for each work-item.
● try different forms of implementation
● do optimizations manually.
○ DO NOT rely on automatic features.
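
Two small pieces of arithmetic come up when trying different work-group sizes. The sketch below is plain C with hypothetical helper names. It assumes a runtime that requires the global size to be a multiple of the work-group size, which is the case for OpenCL 1.x, and it assumes one work-group has the whole local memory to itself.

```c
#include <stddef.h>

/* Rounds the item count up to a multiple of the work-group size, for
 * runtimes that require the global size to divide evenly. The kernel must
 * then skip the padding work-items (ids >= n_items). */
size_t padded_global_size(size_t n_items, size_t group_size)
{
    return (n_items + group_size - 1) / group_size * group_size;
}

/* Local memory budget per work-item if one work-group gets all of it.
 * Doubling group_size halves the per-item share. */
size_t local_bytes_per_item(size_t local_mem_bytes, size_t group_size)
{
    return local_mem_bytes / group_size;
}
```

For example, with an assumed 32768 bytes of local memory, a work-group of 64 leaves 512 bytes per work-item and a work-group of 256 leaves 128 bytes, which is the trade-off the list above points at.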
Q & A
