SlideShare a Scribd company logo
Yung-Yu Chen (@yungyuc)
On the necessity and
inapplicability of Python
Help us develop numerical software
Whom I am
• I am a mechanical engineer by training, focusing on
applications of continuum mechanics. A computational
scientist / engineer rather than a computer scientist.

• In my day job, I write high-performance code for
semiconductor applications of computational geometry
and lithography.

• In my spare time, I am teaching a course ‘numerical
software development’ in the dept. of computer science
in NCTU.
2
You can contact me through twitter: https://twitter.com/yungyuc
or linkedin: https://www.linkedin.com/in/yungyuc/.
PyHUG
• Python Hsinchu User Group (established in late
2011)

• The first group of staff of PyCon Taiwan (2012)

• Weekly meetups at a pub for 3 years, not
stopped by COVID-19

• 7+ active user groups in Taiwan 

• I have been in PyConJP in 2012, 2013 (APAC),
2015, 2019

• Last year I led a visit group to PyConJP (thank
you Terada san for the sharing the know-
how!)

• I hope we can do more
3
PyCon
Taiwan
5-6 Sep, 2020, Tainan, Taiwan

• It is planned to be an on-site conference
(unless something incredibly bad
happens again)

• Speakers may choose to speak online

• We still need to wear a face mask

• Appreciate the Taiwan citizens and
government, who work hard to
counter COVID-19

• https://g0v.hackmd.io/@kiang/
mask-info 

• We hope to see you again in Taiwan!
4
https://tw.pycon.org/2020/
Numerical software
• Numerical software: Computer programs to solve scientific or
mathematic problems.

• Other names: Mathematical software, scientific software, technical
software.

• Python is a popular language for application experts to describe the
problems and solutions, because it is easy to use.

• Most of the computing systems (the numerical software) are designed in
a hybrid architecture.

• The computing kernel uses C++.

• Python is chosen for the user-level API.
5
Example: OPC
6
photoresist
silicon substrate
photomask
light source
Photolithography in semiconductor fabrication
wave length is only
hundreds of nm
image I want to
project on the PR
shape I need
on the mask
Optical proximity correction (OPC)
(smaller than the
wave length)
write code to
make it happen
Example: PDEs
7
Numerical simulations of
conservation laws:

∂u
∂t
+
3
∑
k=1
∂F(k)
(u)
∂xk
= 0
Use case: stress waves in 

anisotropic solids
Use case: compressible flows
Example: What others do
• Machine learning

• Examples: TensorFlow, PyTorch

• Also:

• Computer aided design and engineering (CAD/CAE)

• Computer graphics and visualization

• Hybrid architecture provides both speed and flexibility

• C++ makes it possible to do the huge amount of calculations, e.g.,
distributed computing of thousands of computers

• Python helps describe the complex problems of mathematics or sciences
8
Crunch real numbers
• Simple example: solve the Laplace equation

• 

• 

• 

• Use a two-dimensional array as the spatial grid

• Point-Jacobi method: 3-level nested loop
∂2
u
∂x2
+
∂2
u
∂y2
= 0 (0 < x < 1; 0 < y < 1)
u(0,y) = 0, u(1,y) = sin(πy) (0 ≤ y ≤ 1)
u(x,0) = 0, u(x,1) = 0 (0 ≤ x ≤ 1)
def solve_python_loop():
u = uoriginal.copy()
un = u.copy()
converged = False
step = 0
# Outer loop.
while not converged:
step += 1
# Inner loops. One for x and the other for y.
for it in range(1, nx-1):
for jt in range(1, nx-1):
un[it,jt] = (u[it+1,jt] + u[it-1,jt]
+ u[it,jt+1] + u[it,jt-1]) / 4
norm = np.abs(un-u).max()
u[...] = un[...]
converged = True if norm < 1.e-5 else False
return u, step, norm
9
Non-trivial boundary condition
Power of Numpy C++
def solve_numpy_array():
u = uoriginal.copy()
un = u.copy()
converged = False
step = 0
while not converged:
step += 1
un[1:nx-1,1:nx-1] = (u[2:nx,1:nx-1] + u[0:nx-2,1:nx-1] +
u[1:nx-1,2:nx] + u[1:nx-1,0:nx-2]) / 4
norm = np.abs(un-u).max()
u[...] = un[...]
converged = True if norm < 1.e-5 else False
return u, step, norm
def solve_python_loop():
u = uoriginal.copy()
un = u.copy()
converged = False
step = 0
# Outer loop.
while not converged:
step += 1
# Inner loops. One for x and the other for y.
for it in range(1, nx-1):
for jt in range(1, nx-1):
un[it,jt] = (u[it+1,jt] + u[it-1,jt] + u[it,jt+1] + u[it,jt-1]) / 4
norm = np.abs(un-u).max()
u[...] = un[...]
converged = True if norm < 1.e-5 else False
return u, step, norm
CPU times: user 62.1 ms, sys: 1.6 ms, total: 63.7 ms
Wall time: 63.1 ms: Pretty good!
CPU times: user 5.24 s, sys: 22.5 ms, total: 5.26 s
Wall time: 5280 ms: Poor speed
10
std::tuple<xt::xarray<double>, size_t, double>
solve_cpp(xt::xarray<double> u)
{
const size_t nx = u.shape(0);
xt::xarray<double> un = u;
bool converged = false;
size_t step = 0;
double norm;
while (!converged)
{
++step;
for (size_t it=1; it<nx-1; ++it)
{
for (size_t jt=1; jt<nx-1; ++jt)
{
un(it,jt) = (u(it+1,jt) + u(it-1,jt) + u(it,jt+1) + u(it,jt-1)) / 4;
}
}
norm = xt::amax(xt::abs(un-u))();
if (norm < 1.e-5) { converged = true; }
u = un;
}
return std::make_tuple(u, step, norm);
}
CPU times: user 29.7 ms, sys: 506 µs, total: 30.2 ms
Wall time: 29.9 ms: Definitely good!
Pure Python 5280 ms
Numpy 63.1 ms
C++ 29.9 ms
83.7x
2.1x 176.6x
Pure Python Numpy
C++
The speed is the reason

1000 computers → 5.67

Save a lot of $
Recap: Why Python?
• Python is slow, but numpy may be reasonably fast.

• Coding in C++ is time-consuming.

• C++ is only needed in the computing kernel.

• Most code is supportive code, but it must not slow down the
computing kernel.

• Python makes it easier to organize structure the code.

This is why high-performance system usually uses a hybrid
architecture (C++ with Python or another scripting language).
11
Let’s go hybrid, but …
• A dilemma:

• Engineers (domain experts) know the problems but
don’t know C++ and software engineering.

• Computer scientists (programmers) know about C++
and software engineering but not the problems.

• Either side takes years of practices and study.

• Not a lot of people want to play both roles.
12
NSD: attempt to improve
• Numerical software development: a graduate-level
course

• Train computer scientists the hybrid architecture
for numerical software

• https://github.com/yungyuc/nsd

• Runnable Jupyter notebooks
13
• Part 1: Start with Python
• Lecture 1: Introduction

• Lecture 2: Fundamental engineering practices

• Lecture 3: Python and numpy

• Part 2: Computer architecture for performance
• Lecture 4: C++ and computer architecture
• Lecture 5: Matrix operations

• Lecture 6: Cache optimization

• Lecture 7: SIMD

• Part 3: Resource management
• Lecture 8: Memory management

• Lecture 9: Ownership and smart pointers

• Part 4: How to write C++ for Python
• Lecture 10: Modern C++

• Lecture 11: C++ and C for Python

• Lecture 12: Array code in C++

• Lecture 13: Array-oriented design

• Part 5: Conclude with Python
• Lecture 14: Advanced Python

• Term project presentation
Memory hierarchy
• We go to C++ to make it easier to access hardware

• Modern computer has faster CPU than memory

• High performance comes with hiding the memory-access latency
registers (0 cycle)
L1 cache (4 cycles)
L2 cache (10 cycles)
L3 cache (50 cycles)
Main memory (200 cycles)
Disk (storage) (100,000 cycles)
14
Data object
• Numerical software processes
huge amount of data. Copying
them is expensive.

• Use a pipeline to process the
same block of data

• Use an object to manage the
data: data object

• Data objects may not always be a
good idea in other fields.

• Here we do what it takes for
uncompromisable
performance.
Field initialization
Interior time-marching
Boundary condition
Parallel data sync
Finalization
Data
15
Data access at all phases
Zero-copy: do it where it fits
Python app C++ app
C++
container
Ndarray
manage
access
Python app C++ app
C++
container
Ndarray
manage
accessa11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn
memory buffer shared across language memory buffer shared across language
Top (Python) - down (C++) Bottom (C++) - up (Python)
Python app C++ app
a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn
memory buffer shared across language
Ndarray
C++
container
16
More detail …
Notes about moving from Python to C++ 

• Python frame object

• Building Python extensions using pybind11
and cmake

• Inspecting assembly code

• x86 intrinsics

• PyObject, CPython API and pybind11 API

• Shared pointer, unique pointer, raw pointer,
and ownership

• Template generic programming

https://tw.pycon.org/2020/en-us/events/talk/
1164539411870777736/
17
How to learn
• Work on a real project.

• Keep in mind that Python is 100x slower than C/C++.

• Always profile (time).

• Don’t treat Python as simply Python.

• View Python as an interpreter library written in C.

• Use tools to call C/C++: Cython, pybind11, etc.
18
What we want
19
See problems
Formulate the
problems
Get something
working
Automate PrototypeReusable
software
? ?
One-time programs may happen
Thanks!
Questions?

More Related Content

What's hot

TensorFlow example for AI Ukraine2016
TensorFlow example  for AI Ukraine2016TensorFlow example  for AI Ukraine2016
TensorFlow example for AI Ukraine2016
Andrii Babii
 
Tensorflow - Intro (2017)
Tensorflow - Intro (2017)Tensorflow - Intro (2017)
Tensorflow - Intro (2017)
Alessio Tonioni
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An Overview
Poo Kuan Hoong
 
Multithreading to Construct Neural Networks
Multithreading to Construct Neural NetworksMultithreading to Construct Neural Networks
Multithreading to Construct Neural Networks
Altoros
 
Tensorflow windows installation
Tensorflow windows installationTensorflow windows installation
Tensorflow windows installation
marwa Ayad Mohamed
 
Introduction to Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning, Keras, and TensorFlowIntroduction to Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning, Keras, and TensorFlow
Sri Ambati
 
Intro to Python
Intro to PythonIntro to Python
Introduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowIntroduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlow
Paolo Tomeo
 
An Introduction to TensorFlow architecture
An Introduction to TensorFlow architectureAn Introduction to TensorFlow architecture
An Introduction to TensorFlow architecture
Mani Goswami
 
Introduction to TensorFlow
Introduction to TensorFlowIntroduction to TensorFlow
Introduction to TensorFlow
Ralph Vincent Regalado
 
Intro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular LabsIntro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular Labs
Kendall
 
Getting started with TensorFlow
Getting started with TensorFlowGetting started with TensorFlow
Getting started with TensorFlow
ElifTech
 
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
Big Data Spain
 
Tensor flow
Tensor flowTensor flow
Tensor flow
Nikhil Krishna Nair
 
TensorFlow Dev Summit 2017 요약
TensorFlow Dev Summit 2017 요약TensorFlow Dev Summit 2017 요약
TensorFlow Dev Summit 2017 요약
Jin Joong Kim
 
Tensorflow internal
Tensorflow internalTensorflow internal
Tensorflow internal
Hyunghun Cho
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
Darshan Patel
 
Towards Machine Learning in Pharo with TensorFlow
Towards Machine Learning in Pharo with TensorFlowTowards Machine Learning in Pharo with TensorFlow
Towards Machine Learning in Pharo with TensorFlow
ESUG
 
PAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and BasicsPAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and Basics
Atsunori Kanemura
 

What's hot (19)

TensorFlow example for AI Ukraine2016
TensorFlow example  for AI Ukraine2016TensorFlow example  for AI Ukraine2016
TensorFlow example for AI Ukraine2016
 
Tensorflow - Intro (2017)
Tensorflow - Intro (2017)Tensorflow - Intro (2017)
Tensorflow - Intro (2017)
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An Overview
 
Multithreading to Construct Neural Networks
Multithreading to Construct Neural NetworksMultithreading to Construct Neural Networks
Multithreading to Construct Neural Networks
 
Tensorflow windows installation
Tensorflow windows installationTensorflow windows installation
Tensorflow windows installation
 
Introduction to Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning, Keras, and TensorFlowIntroduction to Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning, Keras, and TensorFlow
 
Intro to Python
Intro to PythonIntro to Python
Intro to Python
 
Introduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowIntroduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlow
 
An Introduction to TensorFlow architecture
An Introduction to TensorFlow architectureAn Introduction to TensorFlow architecture
An Introduction to TensorFlow architecture
 
Introduction to TensorFlow
Introduction to TensorFlowIntroduction to TensorFlow
Introduction to TensorFlow
 
Intro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular LabsIntro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular Labs
 
Getting started with TensorFlow
Getting started with TensorFlowGetting started with TensorFlow
Getting started with TensorFlow
 
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
 
Tensor flow
Tensor flowTensor flow
Tensor flow
 
TensorFlow Dev Summit 2017 요약
TensorFlow Dev Summit 2017 요약TensorFlow Dev Summit 2017 요약
TensorFlow Dev Summit 2017 요약
 
Tensorflow internal
Tensorflow internalTensorflow internal
Tensorflow internal
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
 
Towards Machine Learning in Pharo with TensorFlow
Towards Machine Learning in Pharo with TensorFlowTowards Machine Learning in Pharo with TensorFlow
Towards Machine Learning in Pharo with TensorFlow
 
PAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and BasicsPAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and Basics
 

Similar to On the necessity and inapplicability of python

Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"
Fwdays
 
Online learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and HadoopOnline learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and Hadoop
Héloïse Nonne
 
Intro to Multitasking
Intro to MultitaskingIntro to Multitasking
Intro to Multitasking
Brian Schrader
 
Python_intro.ppt
Python_intro.pptPython_intro.ppt
Python_intro.ppt
Mariela Gamarra Paredes
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKenna
openseesdays
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
Piotr Przymus
 
2017 arab wic marwa ayad machine learning
2017 arab wic marwa ayad machine learning2017 arab wic marwa ayad machine learning
2017 arab wic marwa ayad machine learning
marwa Ayad Mohamed
 
Parallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxParallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptx
krnaween
 
Pregel
PregelPregel
Pregel
Weiru Dai
 
Return of c++
Return of c++Return of c++
Return of c++
Yongwei Wu
 
[html5jロボット部 第7回勉強会] Microsoft Cognitive Toolkit (CNTK) Overview
[html5jロボット部 第7回勉強会] Microsoft Cognitive Toolkit (CNTK) Overview[html5jロボット部 第7回勉強会] Microsoft Cognitive Toolkit (CNTK) Overview
[html5jロボット部 第7回勉強会] Microsoft Cognitive Toolkit (CNTK) Overview
Naoki (Neo) SATO
 
01 introduction to cpp
01   introduction to cpp01   introduction to cpp
01 introduction to cpp
Manzoor ALam
 
Harnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern CoprocessorsHarnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern Coprocessors
Unai Lopez-Novoa
 
Role of python in hpc
Role of python in hpcRole of python in hpc
Role of python in hpc
Dr Reeja S R
 
Algorithms 101 for Data Scientists (Part 2)
Algorithms 101 for Data Scientists (Part 2)Algorithms 101 for Data Scientists (Part 2)
Algorithms 101 for Data Scientists (Part 2)
Christopher Conlan
 
Pysense: wireless sensor computing in Python?
Pysense: wireless sensor computing in Python?Pysense: wireless sensor computing in Python?
Pysense: wireless sensor computing in Python?
Davide Carboni
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
Qiangning Hong
 
Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyData
Travis Oliphant
 
OpenCL & the Future of Desktop High Performance Computing in CAD
OpenCL & the Future of Desktop High Performance Computing in CADOpenCL & the Future of Desktop High Performance Computing in CAD
OpenCL & the Future of Desktop High Performance Computing in CAD
Design World
 

Similar to On the necessity and inapplicability of python (20)

Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"
 
Online learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and HadoopOnline learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and Hadoop
 
Intro to Multitasking
Intro to MultitaskingIntro to Multitasking
Intro to Multitasking
 
Python_intro.ppt
Python_intro.pptPython_intro.ppt
Python_intro.ppt
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKenna
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
 
2017 arab wic marwa ayad machine learning
2017 arab wic marwa ayad machine learning2017 arab wic marwa ayad machine learning
2017 arab wic marwa ayad machine learning
 
Parallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxParallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptx
 
Pregel
PregelPregel
Pregel
 
Return of c++
Return of c++Return of c++
Return of c++
 
[html5jロボット部 第7回勉強会] Microsoft Cognitive Toolkit (CNTK) Overview
[html5jロボット部 第7回勉強会] Microsoft Cognitive Toolkit (CNTK) Overview[html5jロボット部 第7回勉強会] Microsoft Cognitive Toolkit (CNTK) Overview
[html5jロボット部 第7回勉強会] Microsoft Cognitive Toolkit (CNTK) Overview
 
01 introduction to cpp
01   introduction to cpp01   introduction to cpp
01 introduction to cpp
 
Harnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern CoprocessorsHarnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern Coprocessors
 
Role of python in hpc
Role of python in hpcRole of python in hpc
Role of python in hpc
 
Algorithms 101 for Data Scientists (Part 2)
Algorithms 101 for Data Scientists (Part 2)Algorithms 101 for Data Scientists (Part 2)
Algorithms 101 for Data Scientists (Part 2)
 
Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
 
Pysense: wireless sensor computing in Python?
Pysense: wireless sensor computing in Python?Pysense: wireless sensor computing in Python?
Pysense: wireless sensor computing in Python?
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 
Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyData
 
OpenCL & the Future of Desktop High Performance Computing in CAD
OpenCL & the Future of Desktop High Performance Computing in CADOpenCL & the Future of Desktop High Performance Computing in CAD
OpenCL & the Future of Desktop High Performance Computing in CAD
 

More from Yung-Yu Chen

Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for Speed
Yung-Yu Chen
 
SimpleArray between Python and C++
SimpleArray between Python and C++SimpleArray between Python and C++
SimpleArray between Python and C++
Yung-Yu Chen
 
Write code and find a job
Write code and find a jobWrite code and find a job
Write code and find a job
Yung-Yu Chen
 
Notes about moving from python to c++ py contw 2020
Notes about moving from python to c++ py contw 2020Notes about moving from python to c++ py contw 2020
Notes about moving from python to c++ py contw 2020
Yung-Yu Chen
 
Take advantage of C++ from Python
Take advantage of C++ from PythonTake advantage of C++ from Python
Take advantage of C++ from Python
Yung-Yu Chen
 
Start Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New RopeStart Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New Rope
Yung-Yu Chen
 
Harmonic Stack for Speed
Harmonic Stack for SpeedHarmonic Stack for Speed
Harmonic Stack for Speed
Yung-Yu Chen
 
Your interactive computing
Your interactive computingYour interactive computing
Your interactive computing
Yung-Yu Chen
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering Software
Yung-Yu Chen
 
Craftsmanship in Computational Work
Craftsmanship in Computational WorkCraftsmanship in Computational Work
Craftsmanship in Computational Work
Yung-Yu Chen
 

More from Yung-Yu Chen (10)

Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for Speed
 
SimpleArray between Python and C++
SimpleArray between Python and C++SimpleArray between Python and C++
SimpleArray between Python and C++
 
Write code and find a job
Write code and find a jobWrite code and find a job
Write code and find a job
 
Notes about moving from python to c++ py contw 2020
Notes about moving from python to c++ py contw 2020Notes about moving from python to c++ py contw 2020
Notes about moving from python to c++ py contw 2020
 
Take advantage of C++ from Python
Take advantage of C++ from PythonTake advantage of C++ from Python
Take advantage of C++ from Python
 
Start Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New RopeStart Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New Rope
 
Harmonic Stack for Speed
Harmonic Stack for SpeedHarmonic Stack for Speed
Harmonic Stack for Speed
 
Your interactive computing
Your interactive computingYour interactive computing
Your interactive computing
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering Software
 
Craftsmanship in Computational Work
Craftsmanship in Computational WorkCraftsmanship in Computational Work
Craftsmanship in Computational Work
 

Recently uploaded

Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 

Recently uploaded (20)

Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 

On the necessity and inapplicability of python

  • 1. Yung-Yu Chen (@yungyuc) On the necessity and inapplicability of Python Help us develop numerical software
  • 2. Whom I am • I am a mechanical engineer by training, focusing on applications of continuum mechanics. A computational scientist / engineer rather than a computer scientist. • In my day job, I write high-performance code for semiconductor applications of computational geometry and lithography. • In my spare time, I am teaching a course ‘numerical software development’ in the dept. of computer science in NCTU. 2 You can contact me through twitter: https://twitter.com/yungyuc or linkedin: https://www.linkedin.com/in/yungyuc/.
  • 3. PyHUG • Python Hsinchu User Group (established in late 2011) • The first group of staff of PyCon Taiwan (2012) • Weekly meetups at a pub for 3 years, not stopped by COVID-19 • 7+ active user groups in Taiwan • I have been in PyConJP in 2012, 2013 (APAC), 2015, 2019 • Last year I led a visit group to PyConJP (thank you Terada san for the sharing the know- how!) • I hope we can do more 3
  • 4. PyCon Taiwan 5-6 Sep, 2020, Tainan, Taiwan • It is planned to be an on-site conference (unless something incredibly bad happens again) • Speakers may choose to speak online • We still need to wear a face mask • Appreciate the Taiwan citizens and government, who work hard to counter COVID-19 • https://g0v.hackmd.io/@kiang/ mask-info • We hope to see you again in Taiwan! 4 https://tw.pycon.org/2020/
  • 5. Numerical software • Numerical software: Computer programs to solve scientific or mathematic problems. • Other names: Mathematical software, scientific software, technical software. • Python is a popular language for application experts to describe the problems and solutions, because it is easy to use. • Most of the computing systems (the numerical software) are designed in a hybrid architecture. • The computing kernel uses C++. • Python is chosen for the user-level API. 5
  • 6. Example: OPC 6 photoresist silicon substrate photomask light source Photolithography in semiconductor fabrication wave length is only hundreds of nm image I want to project on the PR shape I need on the mask Optical proximity correction (OPC) (smaller than the wave length) write code to make it happen
  • 7. Example: PDEs 7 Numerical simulations of conservation laws: ∂u ∂t + 3 ∑ k=1 ∂F(k) (u) ∂xk = 0 Use case: stress waves in 
 anisotropic solids Use case: compressible flows
  • 8. Example: What others do • Machine learning • Examples: TensorFlow, PyTorch • Also: • Computer aided design and engineering (CAD/CAE) • Computer graphics and visualization • Hybrid architecture provides both speed and flexibility • C++ makes it possible to do the huge amount of calculations, e.g., distributed computing of thousands of computers • Python helps describe the complex problems of mathematics or sciences 8
  • 9. Crunch real numbers • Simple example: solve the Laplace equation • • • • Use a two-dimensional array as the spatial grid • Point-Jacobi method: 3-level nested loop ∂2 u ∂x2 + ∂2 u ∂y2 = 0 (0 < x < 1; 0 < y < 1) u(0,y) = 0, u(1,y) = sin(πy) (0 ≤ y ≤ 1) u(x,0) = 0, u(x,1) = 0 (0 ≤ x ≤ 1) def solve_python_loop(): u = uoriginal.copy() un = u.copy() converged = False step = 0 # Outer loop. while not converged: step += 1 # Inner loops. One for x and the other for y. for it in range(1, nx-1): for jt in range(1, nx-1): un[it,jt] = (u[it+1,jt] + u[it-1,jt] + u[it,jt+1] + u[it,jt-1]) / 4 norm = np.abs(un-u).max() u[...] = un[...] converged = True if norm < 1.e-5 else False return u, step, norm 9 Non-trivial boundary condition
  • 10. Power of Numpy C++ def solve_numpy_array(): u = uoriginal.copy() un = u.copy() converged = False step = 0 while not converged: step += 1 un[1:nx-1,1:nx-1] = (u[2:nx,1:nx-1] + u[0:nx-2,1:nx-1] + u[1:nx-1,2:nx] + u[1:nx-1,0:nx-2]) / 4 norm = np.abs(un-u).max() u[...] = un[...] converged = True if norm < 1.e-5 else False return u, step, norm def solve_python_loop(): u = uoriginal.copy() un = u.copy() converged = False step = 0 # Outer loop. while not converged: step += 1 # Inner loops. One for x and the other for y. for it in range(1, nx-1): for jt in range(1, nx-1): un[it,jt] = (u[it+1,jt] + u[it-1,jt] + u[it,jt+1] + u[it,jt-1]) / 4 norm = np.abs(un-u).max() u[...] = un[...] converged = True if norm < 1.e-5 else False return u, step, norm CPU times: user 62.1 ms, sys: 1.6 ms, total: 63.7 ms Wall time: 63.1 ms: Pretty good! CPU times: user 5.24 s, sys: 22.5 ms, total: 5.26 s Wall time: 5280 ms: Poor speed 10 std::tuple<xt::xarray<double>, size_t, double> solve_cpp(xt::xarray<double> u) { const size_t nx = u.shape(0); xt::xarray<double> un = u; bool converged = false; size_t step = 0; double norm; while (!converged) { ++step; for (size_t it=1; it<nx-1; ++it) { for (size_t jt=1; jt<nx-1; ++jt) { un(it,jt) = (u(it+1,jt) + u(it-1,jt) + u(it,jt+1) + u(it,jt-1)) / 4; } } norm = xt::amax(xt::abs(un-u))(); if (norm < 1.e-5) { converged = true; } u = un; } return std::make_tuple(u, step, norm); } CPU times: user 29.7 ms, sys: 506 µs, total: 30.2 ms Wall time: 29.9 ms: Definitely good! Pure Python 5280 ms Numpy 63.1 ms C++ 29.9 ms 83.7x 2.1x 176.6x Pure Python Numpy C++ The speed is the reason 1000 computers → 5.67 Save a lot of $
  • 11. Recap: Why Python? • Python is slow, but numpy may be reasonably fast. • Coding in C++ is time-consuming. • C++ is only needed in the computing kernel. • Most code is supportive code, but it must not slow down the computing kernel. • Python makes it easier to organize structure the code. This is why high-performance system usually uses a hybrid architecture (C++ with Python or another scripting language). 11
  • 12. Let’s go hybrid, but … • A dilemma: • Engineers (domain experts) know the problems but don’t know C++ and software engineering. • Computer scientists (programmers) know about C++ and software engineering but not the problems. • Either side takes years of practices and study. • Not a lot of people want to play both roles. 12
  • 13. NSD: attempt to improve • Numerical software development: a graduate-level course • Train computer scientists the hybrid architecture for numerical software • https://github.com/yungyuc/nsd • Runnable Jupyter notebooks 13 • Part 1: Start with Python • Lecture 1: Introduction • Lecture 2: Fundamental engineering practices • Lecture 3: Python and numpy • Part 2: Computer architecture for performance • Lecture 4: C++ and computer architecture • Lecture 5: Matrix operations • Lecture 6: Cache optimization • Lecture 7: SIMD • Part 3: Resource management • Lecture 8: Memory management • Lecture 9: Ownership and smart pointers • Part 4: How to write C++ for Python • Lecture 10: Modern C++ • Lecture 11: C++ and C for Python • Lecture 12: Array code in C++ • Lecture 13: Array-oriented design • Part 5: Conclude with Python • Lecture 14: Advanced Python • Term project presentation
  • 14. Memory hierarchy • We go to C++ to make it easier to access hardware • Modern computer has faster CPU than memory • High performance comes with hiding the memory-access latency registers (0 cycle) L1 cache (4 cycles) L2 cache (10 cycles) L3 cache (50 cycles) Main memory (200 cycles) Disk (storage) (100,000 cycles) 14
  • 15. Data object • Numerical software processes huge amount of data. Copying them is expensive. • Use a pipeline to process the same block of data • Use an object to manage the data: data object • Data objects may not always be a good idea in other fields. • Here we do what it takes for uncompromisable performance. Field initialization Interior time-marching Boundary condition Parallel data sync Finalization Data 15 Data access at all phases
  • 16. Zero-copy: do it where it fits Python app C++ app C++ container Ndarray manage access Python app C++ app C++ container Ndarray manage accessa11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn memory buffer shared across language memory buffer shared across language Top (Python) - down (C++) Bottom (C++) - up (Python) Python app C++ app a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn memory buffer shared across language Ndarray C++ container 16
  • 17. More detail … Notes about moving from Python to C++ • Python frame object • Building Python extensions using pybind11 and cmake • Inspecting assembly code • x86 intrinsics • PyObject, CPython API and pybind11 API • Shared pointer, unique pointer, raw pointer, and ownership • Template generic programming https://tw.pycon.org/2020/en-us/events/talk/ 1164539411870777736/ 17
  • 18. How to learn • Work on a real project. • Keep in mind that Python is 100x slower than C/C++. • Always profile (time). • Don’t treat Python as simply Python. • View Python as an interpreter library written in C. • Use tools to call C/C++: Cython, pybind11, etc. 18
  • 19. What we want 19 See problems Formulate the problems Get something working Automate PrototypeReusable software ? ? One-time programs may happen