SlideShare a Scribd company logo
Kevin O’Leary – Intel Developer Products Division Technical Consulting Engineer
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
2
Programming Languages by Popularity
Pythonremains #1
programming language in
hiring demand
followed by Java and C++
Go and Scala
demonstrate strong
growth for last 2 years
* Source: CodeEval, Feb 2015
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
3
Programming Languages Productivity
0
1
2
3
4
5
Python Java C++ C
PROGRAMMING COMPLEXITY
(HOURS)
Prechelt* Berkholz**
0
1
2
3
4
Python Java C++ C
LANGUAGE VERBOSITY
(LOC/FEATURE)
Prechelt* Berkholz**
* L.Prechelt, An empirical comparison of seven programming languages, IEEE Computer,
2000, Vol. 33, Issue 10, pp. 23-29
** RedMonk - D.Berkholz, Programming languages ranked by expressiveness
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
4
Low Overhead Sampling
 Accurate performance data without high
overhead instrumentation
 Launch application or attach to a running
process
Precise Line Level Details
 No guessing, see source line level detail
Mixed Python / native C, C++, Fortran…
 Optimize native code driven by Python
Profile Python & Go!
And Mixed Python / C++ / Fortran
New!
Gettingyourpythoncodetorunfaster
Usingintel®vtune™amplifierxe
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Accurate Data - Low Overhead
 CPU, GPU, FPU, threading, bandwidth…
Meaningful Analysis
 Threading, OpenMP region efficiency
 Memory access, storage device
Easy
 Data displayed on the source code
 Easy set-up, no special compiles
Faster, Scalable Code, Faster
Intel® VTune™ Amplifier Performance Profiler
http://intel.ly/vtune-amplifier-xe
For Windows* and Linux* From $899
(UI only now available on OS X*)
Claire Cates
Principal Developer
SAS Institute Inc.
“Last week, Intel® VTune™ Amplifier
helped us find almost 3X
performance improvement. This
week it helped us improve the
performance another 3X.”
6
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Intel® VTune™ Amplifier
Tune Applications for Scalable Multicore Performance
Agenda
 Why python optimization
is important
 How do you find places
that need optimization
 Overview of profilers
 Profiling Python using
VTune Amplifier
 Mixed mode profiling
 Summary
7
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Why do you need Python optimization?
Python is used to power a wide range of software, including those where
application performance matters.
• web server code
• complex automation scripts (even build systems)
• scientific calculations, etc.
Python allows you to quickly write code that may not scale well, but you won’t
know it unless you give it enough workload to process.
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
9
How do you find the places that need
optimization?
Code examination
 Easiest in terms of you don’t need any tools except code editor
 Difficult in practice, also assumes you know how certain code constructs
perform
 This might not work for even moderately large code base because there is
just too much information to grasp.
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
10
How do you find the places that need
optimization? (continued)
Logging
 Done by making special version of source code augmented with timestamp
logging
 Involves looking at the logs trying to find the part of your code that is slow.
 This analysis can be tedious and it also involves changing your source.
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
11
How do you find the places that need
optimization? (continued)
Profiling
 Profiling is gathering some metrics on how your application works under
certain workloads
 In this paper we will be focused on CPU hotspot profiling. Finding places in
your code that consume a lot of CPU cycles.
 In theory you could also profile other interesting cases such as waiting on a
lock, memory consumption, etc. (not currently implemented in VTune™
Amplifier)
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
12
Overview of existing profilers
There are three basic types of profilers:
• Event
Event based profilers collect data when certain events occur. For example on
function entry/exit or when classes are loaded/unloaded, etc. The built-in
Python profiler cProfile is an example of an event based profiler.
• Instrumentation
In an instrumentation based profiler the target application is modified and
basically the application profiles itself. This can be done by manually modifying
the application or by support built inside the compiler.
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
13
Overview of existing profilers (contd)
• Statistical
A statistically based profiler samples data at regular intervals. The hottest
functions should be at the top of the sample distribution. This type of profiling
provides approximate results but are much less intrusive on the target
application. Profiling overhead is also much less workload dependent. Intel®
VTune™ Amplifier is an example of a statistically based profiler.
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Tool Description Platforms Profile level Avg. overhead *
Intel® VTune™
Amplifier
• Rich GUI viewer
• Mixed C/C++/Python
code
Windows
Linux
Line ~1.1-1.6x
cProfile (built-in) • Text interactive mode:
“pstats” (built-in)
• GUI viewer:
RunSnakeRun (Open
Source)
• PyCharm
Any Function 1.3x-5x
Python Tools • Visual Studio (2010+)
• Open Source
Windows Function ~2x
line_profiler • Pure Python
• Open Source
• Text-only viewer
Any Line Up to
10x or more
14
Short overview of Python profilers
* Measured against Grand Unified Python Benchmark
Machine specs: HP EliteBook 850 G1; Intel® Core™ i5-4300U @1.90 Ghz (4 cores with HT on) CPU; 16 GB RAM; Windows 8.1 x86_64
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
15
Profiling Python code using Intel® VTune™
Amplifier XE
Intel® VTune™ Amplifier XE now has the ability to profile Python code. It can
give you line level profiling information with very low overhead. Some key
features are:
Both Python 32- and 64-bit are supported, 2.7.x and 3.4.x-3.5.x versions
Remote collection via SSH supported
Rich user interface with multithreaded support; zoom & filter; source drill-down
 Supported workflows
– Start application, wait for it to finish
– Attach to application, profile, detach
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
16
Profiling Python code using Intel® VTune™
Amplifier XE
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
17
Steps to analyze
Using Intel VTune™ Amplifier
Create a VTune™ Amplifier project.
Run basic hotspot analysis
Interpret result data
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
18
Steps to analyze – Create a Project
To analyze your target with VTune™ Amplifier, you need to create a project,
which is a container for an analysis target configuration and data collection
results.
• Run the amplxe-gui that launches the VTune Amplifier GUI
• Click on the menu button select New->Project
• Specify the project name test_python
VTune Amplifier creates the test_python project directory under
the $HOME/intel/ampl/projects directory and opens the Choose Target and
Analysis Type window with the Analysis Target tab active.
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
19
Steps to analyze – Create a Project
• From the left pane, select the local target system and from the right pane
select the Application to Launch target type
• The configuration pane on the right is updated with the settings applicable
to the selected target type.
• Specify and configure your target as follows:
• For the Application field, browse to your python executable
• For the Application parameters field, enter your python script
• Specify the working directory for your program to run
• Use the Managed code profiling mode pull down to specify Mixed
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
20
Steps to analyze – Create a Project
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
21
Steps to analyze – Run Basic Hotspot analysis
• Click the Choose Analysis button on the right to switch to the Analysis
Type tab
• In the Choose Target and Analysis Type window, switch to the Analysis
Type tab.
• From the analysis tree on the left, select Algorithm Analysis > Basic
Hotspots
• The right pane is updated with the default options for the Hotspots analysis
• Click the Start button on the right command bar to run the analysis.
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
22
Steps to analyze – Run Basic Hotspot analysis
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
23
Steps to analyze – Interpret Result Data
When the sample application exits, Intel® VTune™ Amplifier finalizes the results
and opens the Hotspots viewpoint where each window or pane is configured to
display code regions that consumed a lot of CPU time. To interpret the data on
the sample code performance, do the following:
Start analysis with the Summary window. To interpret the data, hover over the
question mark to read the pop-up and better understand what the metric
means.
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
24
Steps to analyze – Interpret Result Data
Note that CPU Time for the sample application is equal to 7.004 seconds. It is
the sum of CPU time for all application threads. Total Thread Count is 3, so the
sample application is multi-threaded.
The Top Hotspots section provides data on the most time-consuming
functions (hotspot functions) sorted by CPU time spent on their execution.
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
25
Steps to analyze – Interpret Result Data
The CPU Usage Histogram shows you how well you are utilizing the different
cores of your system. It indicates the Target Utilization which is the maximum
cares available and also the Average Utilization. You can use the sliders at the
bottom of the graph to set which values you would like to be Ideal, Ok, Poor
and Idle.
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
26
Steps to analyze – Interpret Result Data
Click on the Bottom-up tab. You can see the hottest functions in your
application. You can also see the threads in your application and how much
CPU time was spent in each thread, you can easily see if your workload is
balanced between your threads. In addition, VTune™ Amplifier color codes your
effective time by utilization to indicate whether you are utilizing your CPU
efficiently.
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
27
Steps to analyze – Interpret Result Data
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
28
Steps to analyze – Interpret Result Data
Double click on one of your
functions, this brings up the
source code for your
application, You can view how
much time you are spending on
each line of your application
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
29
Running mixed mode analysis
Python is an interpreted language and it does not use a compiler to generate
binary execution code. Cython is also an interpreted language but a C-
extension, it can be used to generate native code. The VTune™ Amplifier XE
2017 fully supports of Python and Cython code.
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
import math
cdef class SlowpokeCore:
cdef public object N
def __init__(self, N):
self.N = N
cdef double doWork(self, int N) except *:
cdef int i, j, k
cdef double res
res = 0
for j in range(N):
k = 0
for i in range(N):
k += 1
res += k
return math.log(res)
def __str__(self):
return 'SlowpokeCore: %f' % self.doWork(self.N)
30
Mixed C/Python example to profile: core.pyx (Cython-based)
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
31
Mixed C/Python example to profile: main.py
from slowpoke import SlowpokeCore
import logging
import time
def makeParams():
objects = tuple(SlowpokeCore(50000) for _ in xrange(50))
template = ''.join('{%d}' % i for i in xrange(len(objects)))
return template, objects
def calc_pi():
# removed for readability; pure-Python function was here
def doLog():
template, objects = makeParams()
for _ in xrange(1000):
calc_pi()
logging.info(template.format(*objects))
def main():
logging.basicConfig()
start = time.time()
doLog()
stop = time.time()
print('run took: %.3f' % (stop - start))
if __name__ == '__main__':
main()
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
32
Intel® VTune™ Amplifier example
Machine specs: HP EliteBook 850 G1; Intel® Core™ i5-4300U @1.90 Ghz (4 cores with HT on) CPU; 16
GB RAM; Windows 8.1 x86_64
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
33
Intel® VTune™ Amplifier – source view (main.py)
Machine specs: HP EliteBook 850 G1; Intel® Core™ i5-4300U @1.90 Ghz (4 cores with HT on) CPU; 16
GB RAM; Windows 8.1 x86_64
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
34
Intel® VTune™ Amplifier – source view (core.c)
Machine specs: HP EliteBook 850 G1; Intel® Core™ i5-4300U @1.90 Ghz (4 cores with HT on) CPU; 16
GB RAM; Windows 8.1 x86_64
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
35
Summary
Tuning can dramatically increase the performance of your code. Intel® VTune™
Amplifier XE now has the ability to profile Python code. You can also analyze
mixed Python/C code as well as pure Python. VTune™ Amplifier has a powerful
set of features that will allow you to quickly identify your performance
bottlenecks.
Call to action
Get Intel® Parallel Studio XE 2017 and start profiling your Python code today!
https://software.intel.com/en-us/articles/intel-parallel-studio-xe-2017
Optimization Notice
Copyright © 2016, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
36
Legal Disclaimer & Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO
ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND
INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information
and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product
when combined with other products.
Copyright © 2016, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are
trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors.
These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for
use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the
applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part 1: Intel® VTune™ Amplifier

More Related Content

What's hot

The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecture
hugo lu
 
Register of 80386
Register of 80386Register of 80386
Register of 80386
aviban
 
Computer organiztion5
Computer organiztion5Computer organiztion5
Computer organiztion5
Umang Gupta
 
Glibc内存管理ptmalloc源代码分析4
Glibc内存管理ptmalloc源代码分析4Glibc内存管理ptmalloc源代码分析4
Glibc内存管理ptmalloc源代码分析4
hans511002
 
Intel core presentation mnk
Intel core presentation mnkIntel core presentation mnk
Intel core presentation mnk
kondalarao7
 
Linux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF SuperpowersLinux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF Superpowers
Brendan Gregg
 

What's hot (20)

Embedded C - Lecture 3
Embedded C - Lecture 3Embedded C - Lecture 3
Embedded C - Lecture 3
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecture
 
Java Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsJava Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame Graphs
 
Register of 80386
Register of 80386Register of 80386
Register of 80386
 
bca data structure
bca data structurebca data structure
bca data structure
 
Computer organiztion5
Computer organiztion5Computer organiztion5
Computer organiztion5
 
Glibc内存管理ptmalloc源代码分析4
Glibc内存管理ptmalloc源代码分析4Glibc内存管理ptmalloc源代码分析4
Glibc内存管理ptmalloc源代码分析4
 
Generation of computer processors
Generation of computer processorsGeneration of computer processors
Generation of computer processors
 
Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet
 
I3,i5,i7 ppt
I3,i5,i7 pptI3,i5,i7 ppt
I3,i5,i7 ppt
 
Assembly language part I
Assembly language part IAssembly language part I
Assembly language part I
 
What Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versaWhat Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versa
 
Queue implementation
Queue implementationQueue implementation
Queue implementation
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETL
 
Go and the garbage collection
Go and the garbage collectionGo and the garbage collection
Go and the garbage collection
 
Intel core presentation mnk
Intel core presentation mnkIntel core presentation mnk
Intel core presentation mnk
 
Linux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF SuperpowersLinux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF Superpowers
 
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
 
Ixgbe internals
Ixgbe internalsIxgbe internals
Ixgbe internals
 

Similar to Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part 1: Intel® VTune™ Amplifier

Кирилл Мавродиев, Intel – Обзор современных возможностей по распараллеливанию...
Кирилл Мавродиев, Intel – Обзор современных возможностей по распараллеливанию...Кирилл Мавродиев, Intel – Обзор современных возможностей по распараллеливанию...
Кирилл Мавродиев, Intel – Обзор современных возможностей по распараллеливанию...
Media Gorod
 

Similar to Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part 1: Intel® VTune™ Amplifier (20)

Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018
 
Introduction ciot workshop premeetup
Introduction ciot workshop premeetupIntroduction ciot workshop premeetup
Introduction ciot workshop premeetup
 
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
 
Software Development Tools for Intel® IoT Platforms
Software Development Tools for Intel® IoT PlatformsSoftware Development Tools for Intel® IoT Platforms
Software Development Tools for Intel® IoT Platforms
 
NFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkNFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function Framework
 
Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...
Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...
Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...
 
TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...
TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...
TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...
 
Getting the maximum performance in distributed clusters Intel Cluster Studio XE
Getting the maximum performance in distributed clusters Intel Cluster Studio XEGetting the maximum performance in distributed clusters Intel Cluster Studio XE
Getting the maximum performance in distributed clusters Intel Cluster Studio XE
 
Intel® VTune™ Amplifier - Intel Software Conference 2013
Intel® VTune™ Amplifier - Intel Software Conference 2013Intel® VTune™ Amplifier - Intel Software Conference 2013
Intel® VTune™ Amplifier - Intel Software Conference 2013
 
Intel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabIntel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/Lab
 
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
 
Make your unity game faster, faster
Make your unity game faster, fasterMake your unity game faster, faster
Make your unity game faster, faster
 
QATCodec: past, present and future
QATCodec: past, present and futureQATCodec: past, present and future
QATCodec: past, present and future
 
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
 
Intel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Technologies for High Performance Computing
Intel Technologies for High Performance Computing
 
More explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff upMore explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff up
 
Кирилл Мавродиев, Intel – Обзор современных возможностей по распараллеливанию...
Кирилл Мавродиев, Intel – Обзор современных возможностей по распараллеливанию...Кирилл Мавродиев, Intel – Обзор современных возможностей по распараллеливанию...
Кирилл Мавродиев, Intel – Обзор современных возможностей по распараллеливанию...
 
Vasiliy Litvinov - Python Profiling
Vasiliy Litvinov - Python ProfilingVasiliy Litvinov - Python Profiling
Vasiliy Litvinov - Python Profiling
 
TDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura Intel
TDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura IntelTDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura Intel
TDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura Intel
 
Intel python 2017
Intel python 2017Intel python 2017
Intel python 2017
 

More from Intel® Software

More from Intel® Software (20)

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and Anaconda
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI Research
 
Intel Developer Program
Intel Developer ProgramIntel Developer Program
Intel Developer Program
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview Slides
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
 
AIDC India - AI on IA
AIDC India  - AI on IAAIDC India  - AI on IA
AIDC India - AI on IA
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino Slides
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision Slides
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
 

Recently uploaded

Recently uploaded (20)

Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
The architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfThe architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdf
 

Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part 1: Intel® VTune™ Amplifier

  • 1. Kevin O’Leary – Intel Developer Products Division Technical Consulting Engineer
  • 2. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 2 Programming Languages by Popularity Pythonremains #1 programming language in hiring demand followed by Java and C++ Go and Scala demonstrate strong growth for last 2 years * Source: CodeEval, Feb 2015
  • 3. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 3 Programming Languages Productivity 0 1 2 3 4 5 Python Java C++ C PROGRAMMING COMPLEXITY (HOURS) Prechelt* Berkholz** 0 1 2 3 4 Python Java C++ C LANGUAGE VERBOSITY (LOC/FEATURE) Prechelt* Berkholz** * L.Prechelt, An empirical comparison of seven programming languages, IEEE Computer, 2000, Vol. 33, Issue 10, pp. 23-29 ** RedMonk - D.Berkholz, Programming languages ranked by expressiveness
  • 4. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 4 Low Overhead Sampling  Accurate performance data without high overhead instrumentation  Launch application or attach to a running process Precise Line Level Details  No guessing, see source line level detail Mixed Python / native C, C++, Fortran…  Optimize native code driven by Python Profile Python & Go! And Mixed Python / C++ / Fortran New!
  • 6. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Accurate Data - Low Overhead  CPU, GPU, FPU, threading, bandwidth… Meaningful Analysis  Threading, OpenMP region efficiency  Memory access, storage device Easy  Data displayed on the source code  Easy set-up, no special compiles Faster, Scalable Code, Faster Intel® VTune™ Amplifier Performance Profiler http://intel.ly/vtune-amplifier-xe For Windows* and Linux* From $899 (UI only now available on OS X*) Claire Cates Principal Developer SAS Institute Inc. “Last week, Intel® VTune™ Amplifier helped us find almost 3X performance improvement. This week it helped us improve the performance another 3X.” 6
  • 7. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Intel® VTune™ Amplifier Tune Applications for Scalable Multicore Performance Agenda  Why python optimization is important  How do you find places that need optimization  Overview of profilers  Profiling Python using VTune Amplifier  Mixed mode profiling  Summary 7
  • 8. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Why do you need Python optimization? Python is used to power a wide range of software, including those where application performance matters. • web server code • complex automation scripts (even build systems) • scientific calculations, etc. Python allows you to quickly write code that may not scale well, but you won’t know it unless you give it enough workload to process.
  • 9. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 9 How do you find the places that need optimization? Code examination  Easiest in terms of you don’t need any tools except code editor  Difficult in practice, also assumes you know how certain code constructs perform  This might not work for even moderately large code base because there is just too much information to grasp.
  • 10. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 10 How do you find the places that need optimization? (continued) Logging  Done by making special version of source code augmented with timestamp logging  Involves looking at the logs trying to find the part of your code that is slow.  This analysis can be tedious and it also involves changing your source.
  • 11. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 11 How do you find the places that need optimization? (continued) Profiling  Profiling is gathering some metrics on how your application works under certain workloads  In this paper we will be focused on CPU hotspot profiling. Finding places in your code that consume a lot of CPU cycles.  In theory you could also profile other interesting cases such as waiting on a lock, memory consumption, etc. (not currently implemented in VTune™ Amplifier)
  • 12. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 12 Overview of existing profilers There are three basic types of profilers: • Event Event based profilers collect data when certain events occur. For example on function entry/exit or when classes are loaded/unloaded, etc. The built-in Python profiler cProfile is an example of an event based profiler. • Instrumentation In an instrumentation based profiler the target application is modified and basically the application profiles itself. This can be done by manually modifying the application or by support built inside the compiler.
  • 13. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 13 Overview of existing profilers (contd) • Statistical A statistically based profiler samples data at regular intervals. The hottest functions should be at the top of the sample distribution. This type of profiling provides approximate results but are much less intrusive on the target application. Profiling overhead is also much less workload dependent. Intel® VTune™ Amplifier is an example of a statistically based profiler.
  • 14. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Tool Description Platforms Profile level Avg. overhead * Intel® VTune™ Amplifier • Rich GUI viewer • Mixed C/C++/Python code Windows Linux Line ~1.1-1.6x cProfile (built-in) • Text interactive mode: “pstats” (built-in) • GUI viewer: RunSnakeRun (Open Source) • PyCharm Any Function 1.3x-5x Python Tools • Visual Studio (2010+) • Open Source Windows Function ~2x line_profiler • Pure Python • Open Source • Text-only viewer Any Line Up to 10x or more 14 Short overview of Python profilers * Measured against Grand Unified Python Benchmark Machine specs: HP EliteBook 850 G1; Intel® Core™ i5-4300U @1.90 Ghz (4 cores with HT on) CPU; 16 GB RAM; Windows 8.1 x86_64
  • 15. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 15 Profiling Python code using Intel® VTune™ Amplifier XE Intel® VTune™ Amplifier XE now has the ability to profile Python code. It can give you line level profiling information with very low overhead. Some key features are: Both Python 32- and 64-bit are supported, 2.7.x and 3.4.x-3.5.x versions Remote collection via SSH supported Rich user interface with multithreaded support; zoom & filter; source drill-down  Supported workflows – Start application, wait for it to finish – Attach to application, profile, detach
  • 16. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 16 Profiling Python code using Intel® VTune™ Amplifier XE
  • 17. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 17 Steps to analyze Using Intel VTune™ Amplifier Create a VTune™ Amplifier project. Run basic hotspot analysis Interpret result data
  • 18. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 18 Steps to analyze – Create a Project To analyze your target with VTune™ Amplifier, you need to create a project, which is a container for an analysis target configuration and data collection results. • Run the amplxe-gui that launches the VTune Amplifier GUI • Click on the menu button select New->Project • Specify the project name test_python VTune Amplifier creates the test_python project directory under the $HOME/intel/ampl/projects directory and opens the Choose Target and Analysis Type window with the Analysis Target tab active.
  • 19. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 19 Steps to analyze – Create a Project • From the left pane, select the local target system and from the right pane select the Application to Launch target type • The configuration pane on the right is updated with the settings applicable to the selected target type. • Specify and configure your target as follows: • For the Application field, browse to your python executable • For the Application parameters field, enter your python script • Specify the working directory for your program to run • Use the Managed code profiling mode pull down to specify Mixed
  • 20. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 20 Steps to analyze – Create a Project
  • 21. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 21 Steps to analyze – Run Basic Hotspot analysis • Click the Choose Analysis button on the right to switch to the Analysis Type tab • In the Choose Target and Analysis Type window, switch to the Analysis Type tab. • From the analysis tree on the left, select Algorithm Analysis > Basic Hotspots • The right pane is updated with the default options for the Hotspots analysis • Click the Start button on the right command bar to run the analysis.
  • 22. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 22 Steps to analyze – Run Basic Hotspot analysis
  • 23. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 23 Steps to analyze – Interpret Result Data When the sample application exits, Intel® VTune™ Amplifier finalizes the results and opens the Hotspots viewpoint where each window or pane is configured to display code regions that consumed a lot of CPU time. To interpret the data on the sample code performance, do the following: Start analysis with the Summary window. To interpret the data, hover over the question mark to read the pop-up and better understand what the metric means.
  • 24. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 24 Steps to analyze – Interpret Result Data Note that CPU Time for the sample application is equal to 7.004 seconds. It is the sum of CPU time for all application threads. Total Thread Count is 3, so the sample application is multi-threaded. The Top Hotspots section provides data on the most time-consuming functions (hotspot functions) sorted by CPU time spent on their execution.
  • 25. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 25 Steps to analyze – Interpret Result Data The CPU Usage Histogram shows you how well you are utilizing the different cores of your system. It indicates the Target Utilization which is the maximum cares available and also the Average Utilization. You can use the sliders at the bottom of the graph to set which values you would like to be Ideal, Ok, Poor and Idle.
  • 26. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 26 Steps to analyze – Interpret Result Data Click on the Bottom-up tab. You can see the hottest functions in your application. You can also see the threads in your application and how much CPU time was spent in each thread, you can easily see if your workload is balanced between your threads. In addition, VTune™ Amplifier color codes your effective time by utilization to indicate whether you are utilizing your CPU efficiently.
  • 27. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 27 Steps to analyze – Interpret Result Data
  • 28. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 28 Steps to analyze – Interpret Result Data Double click on one of your functions, this brings up the source code for your application, You can view how much time you are spending on each line of your application
  • 29. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 29 Running mixed mode analysis Python is an interpreted language and it does not use a compiler to generate binary execution code. Cython is also an interpreted language but a C- extension, it can be used to generate native code. The VTune™ Amplifier XE 2017 fully supports of Python and Cython code.
  • 30. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. import math cdef class SlowpokeCore: cdef public object N def __init__(self, N): self.N = N cdef double doWork(self, int N) except *: cdef int i, j, k cdef double res res = 0 for j in range(N): k = 0 for i in range(N): k += 1 res += k return math.log(res) def __str__(self): return 'SlowpokeCore: %f' % self.doWork(self.N) 30 Mixed C/Python example to profile: core.pyx (Cython-based)
  • 31. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 31 Mixed C/Python example to profile: main.py from slowpoke import SlowpokeCore import logging import time def makeParams(): objects = tuple(SlowpokeCore(50000) for _ in xrange(50)) template = ''.join('{%d}' % i for i in xrange(len(objects))) return template, objects def calc_pi(): # removed for readability; pure-Python function was here def doLog(): template, objects = makeParams() for _ in xrange(1000): calc_pi() logging.info(template.format(*objects)) def main(): logging.basicConfig() start = time.time() doLog() stop = time.time() print('run took: %.3f' % (stop - start)) if __name__ == '__main__': main()
  • 32. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 32 Intel® VTune™ Amplifier example Machine specs: HP EliteBook 850 G1; Intel® Core™ i5-4300U @1.90 Ghz (4 cores with HT on) CPU; 16 GB RAM; Windows 8.1 x86_64
  • 33. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 33 Intel® VTune™ Amplifier – source view (main.py) Machine specs: HP EliteBook 850 G1; Intel® Core™ i5-4300U @1.90 Ghz (4 cores with HT on) CPU; 16 GB RAM; Windows 8.1 x86_64
  • 34. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 34 Intel® VTune™ Amplifier – source view (core.c) Machine specs: HP EliteBook 850 G1; Intel® Core™ i5-4300U @1.90 Ghz (4 cores with HT on) CPU; 16 GB RAM; Windows 8.1 x86_64
  • 35. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 35 Summary Tuning can dramatically increase the performance of your code. Intel® VTune™ Amplifier XE now has the ability to profile Python code. You can also analyze mixed Python/C code as well as pure Python. VTune™ Amplifier has a powerful set of features that will allow you to quickly identify your performance bottlenecks. Call to action Get Intel® Parallel Studio XE 2017 and start profiling your Python code today! https://software.intel.com/en-us/articles/intel-parallel-studio-xe-2017
  • 36. Optimization Notice Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 36 Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright © 2016, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804