On the Necessity and Inapplicability of Python

Start Python Club, Global Meetup
Speaker: Yung-Yu Chen
Date: July 9th, 2020

  1. Yung-Yu Chen (@yungyuc): On the necessity and inapplicability of Python. Help us develop numerical software
  2. Who I am
     • I am a mechanical engineer by training, focusing on applications of continuum mechanics: a computational scientist / engineer rather than a computer scientist.
     • In my day job, I write high-performance code for semiconductor applications of computational geometry and lithography.
     • In my spare time, I teach the course ‘numerical software development’ in the Department of Computer Science at NCTU.
     • You can contact me through Twitter: https://twitter.com/yungyuc or LinkedIn: https://www.linkedin.com/in/yungyuc/.
  3. PyHUG
     • Python Hsinchu User Group (established in late 2011)
     • The first group of staff of PyCon Taiwan (2012)
     • Weekly meetups at a pub for 3 years, not stopped by COVID-19
     • 7+ active user groups in Taiwan
     • I attended PyConJP in 2012, 2013 (APAC), 2015, and 2019
     • Last year I led a group visit to PyConJP (thank you, Terada-san, for sharing the know-how!)
     • I hope we can do more
  4. PyCon Taiwan: 5-6 Sep, 2020, Tainan, Taiwan
     • It is planned to be an on-site conference (unless something incredibly bad happens again)
     • Speakers may choose to speak online
     • We still need to wear face masks
     • We appreciate the citizens and government of Taiwan, who work hard to counter COVID-19
       • https://g0v.hackmd.io/@kiang/mask-info
     • We hope to see you again in Taiwan!
     • https://tw.pycon.org/2020/
  5. Numerical software
     • Numerical software: computer programs that solve scientific or mathematical problems.
       • Other names: mathematical software, scientific software, technical software.
     • Python is a popular language for application experts to describe problems and solutions, because it is easy to use.
     • Most computing systems (the numerical software) are designed with a hybrid architecture.
       • The computing kernel uses C++.
       • Python is chosen for the user-level API.
  6. Example: OPC
     • Photolithography in semiconductor fabrication
       • [Figure labels: light source, photomask, photoresist (PR), silicon substrate]
       • The wavelength is only hundreds of nm
     • Optical proximity correction (OPC)
       • [Figure labels: the image I want to project on the PR (smaller than the wavelength); the shape I need on the mask]
       • Write code to make it happen
  7. Example: PDEs
     • Numerical simulations of conservation laws:
       $$\frac{\partial u}{\partial t} + \sum_{k=1}^{3} \frac{\partial F^{(k)}(u)}{\partial x_k} = 0$$
     • Use case: stress waves in anisotropic solids
     • Use case: compressible flows
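To make the conservation form concrete, here is a minimal sketch for the simplest one-dimensional scalar case, $\partial u/\partial t + \partial (a u)/\partial x = 0$ with constant $a > 0$. It uses a first-order upwind update on a periodic grid. The scheme, the function name `upwind_step`, the initial pulse, and all parameters are illustrative assumptions; the slides do not say which numerical method the speaker's solvers use.

```python
import numpy as np

def upwind_step(u, a, dx, dt):
    """Advance u_t + (a*u)_x = 0 by one step (a > 0, periodic boundary)."""
    flux = a * u                      # F(u) = a*u evaluated in each cell
    # Each cell loses its own flux and gains the flux of its left neighbor.
    return u - dt / dx * (flux - np.roll(flux, 1))

nx = 100
dx = 1.0 / nx
a = 1.0
dt = 0.5 * dx / a                     # CFL number 0.5 keeps the scheme stable
x = (np.arange(nx) + 0.5) * dx
u0 = np.exp(-200.0 * (x - 0.3) ** 2)  # hypothetical initial pulse

u = u0.copy()
for _ in range(50):
    u = upwind_step(u, a, dx, dt)

# The total amount of u is conserved up to round-off.
print(abs(u.sum() - u0.sum()))
```

Because every flux that leaves one cell enters its neighbor, the sum of `u` over the grid stays constant, which is the discrete counterpart of the conservation law.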
  8. Example: What others do
     • Machine learning
       • Examples: TensorFlow, PyTorch
     • Also:
       • Computer aided design and engineering (CAD/CAE)
       • Computer graphics and visualization
     • Hybrid architecture provides both speed and flexibility
       • C++ makes it possible to do the huge amount of calculation, e.g., distributed computing across thousands of computers
       • Python helps describe complex problems in mathematics or science
  9. Crunch real numbers
     • Simple example: solve the Laplace equation
       • $\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \quad (0 < x < 1;\ 0 < y < 1)$
       • $u(0, y) = 0,\ u(1, y) = \sin(\pi y) \quad (0 \le y \le 1)$ (the non-trivial boundary condition)
       • $u(x, 0) = 0,\ u(x, 1) = 0 \quad (0 \le x \le 1)$
     • Use a two-dimensional array as the spatial grid
     • Point-Jacobi method: 3-level nested loop

         import numpy as np

         # uoriginal (the initial grid with boundary values) and nx (the number
         # of grid points per side) are defined outside this slide.
         def solve_python_loop():
             u = uoriginal.copy()
             un = u.copy()
             converged = False
             step = 0
             # Outer loop.
             while not converged:
                 step += 1
                 # Inner loops. One for x and the other for y.
                 for it in range(1, nx-1):
                     for jt in range(1, nx-1):
                         un[it,jt] = (u[it+1,jt] + u[it-1,jt] + u[it,jt+1] + u[it,jt-1]) / 4
                 norm = np.abs(un-u).max()
                 u[...] = un[...]
                 converged = norm < 1.e-5
             return u, step, norm
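The slide leaves the grid setup out. The following is a minimal sketch of how the grid and boundary values might be built, plus a check against the analytical solution of this boundary-value problem, $u = \sinh(\pi x)\sin(\pi y)/\sinh(\pi)$. The helper names `make_grid` and `jacobi`, the grid size of 21, and the choice of the first array index for $x$ are assumptions made for illustration.

```python
import numpy as np

def make_grid(nx):
    """Build an nx-by-nx grid on the unit square with the slide's boundary values."""
    x = np.linspace(0.0, 1.0, nx)
    u = np.zeros((nx, nx), dtype="float64")
    u[-1, :] = np.sin(np.pi * x)   # u(1, y) = sin(pi*y); the other edges stay 0
    return x, u

def jacobi(u, tol=1.e-5):
    """Point-Jacobi iteration with explicit loops, following the slide."""
    nx = u.shape[0]
    u = u.copy()
    un = u.copy()
    step = 0
    norm = tol
    while norm >= tol:
        step += 1
        for it in range(1, nx - 1):
            for jt in range(1, nx - 1):
                un[it, jt] = (u[it+1, jt] + u[it-1, jt]
                              + u[it, jt+1] + u[it, jt-1]) / 4
        norm = np.abs(un - u).max()
        u[...] = un[...]
    return u, step, norm

x, uoriginal = make_grid(21)
u, step, norm = jacobi(uoriginal)

# Compare with the analytical solution on the same grid points.
X, Y = np.meshgrid(x, x, indexing="ij")
exact = np.sinh(np.pi * X) * np.sin(np.pi * Y) / np.sinh(np.pi)
print(step, np.abs(u - exact).max())
```

The remaining difference from the analytical solution combines the discretization error of the five-point stencil with the error left by stopping the iteration at the tolerance.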
  10. Power of Numpy and C++
      • Numpy (array slicing):

          def solve_numpy_array():
              u = uoriginal.copy()
              un = u.copy()
              converged = False
              step = 0
              while not converged:
                  step += 1
                  un[1:nx-1,1:nx-1] = (u[2:nx,1:nx-1] + u[0:nx-2,1:nx-1] +
                                       u[1:nx-1,2:nx] + u[1:nx-1,0:nx-2]) / 4
                  norm = np.abs(un-u).max()
                  u[...] = un[...]
                  converged = norm < 1.e-5
              return u, step, norm

        CPU times: user 62.1 ms, sys: 1.6 ms, total: 63.7 ms
        Wall time: 63.1 ms: Pretty good!
      • Pure Python (solve_python_loop(), the same code as on the previous slide):
        CPU times: user 5.24 s, sys: 22.5 ms, total: 5.26 s
        Wall time: 5280 ms: Poor speed
      • C++ (with xtensor):

          // Header paths may differ between xtensor versions.
          #include <tuple>
          #include <xtensor/xarray.hpp>
          #include <xtensor/xmath.hpp>

          std::tuple<xt::xarray<double>, size_t, double>
          solve_cpp(xt::xarray<double> u)
          {
              const size_t nx = u.shape(0);
              xt::xarray<double> un = u;
              bool converged = false;
              size_t step = 0;
              double norm;
              while (!converged)
              {
                  ++step;
                  for (size_t it=1; it<nx-1; ++it)
                  {
                      for (size_t jt=1; jt<nx-1; ++jt)
                      {
                          un(it,jt) = (u(it+1,jt) + u(it-1,jt) + u(it,jt+1) + u(it,jt-1)) / 4;
                      }
                  }
                  norm = xt::amax(xt::abs(un-u))();
                  if (norm < 1.e-5) { converged = true; }
                  u = un;
              }
              return std::make_tuple(u, step, norm);
          }

        CPU times: user 29.7 ms, sys: 506 µs, total: 30.2 ms
        Wall time: 29.9 ms: Definitely good!
      • The speed is the reason

          Implementation   Wall time
          Pure Python      5280 ms
          Numpy            63.1 ms
          C++              29.9 ms

        Numpy is 83.7x faster than pure Python. C++ is 2.1x faster than Numpy and 176.6x faster than pure Python.
      • 1000 computers → 5.67: save a lot of $
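The numpy version replaces the two inner loops with shifted array slices. As a small check that the two forms compute the same thing, the sketch below runs a single Jacobi sweep both ways on arbitrary data and compares the results. The function names `sweep_loop` and `sweep_slice` and the random test grid are assumptions for illustration, not part of the slides.

```python
import numpy as np

def sweep_loop(u):
    """One Jacobi sweep over interior points with explicit Python loops."""
    nx = u.shape[0]
    un = u.copy()
    for it in range(1, nx - 1):
        for jt in range(1, nx - 1):
            un[it, jt] = (u[it+1, jt] + u[it-1, jt]
                          + u[it, jt+1] + u[it, jt-1]) / 4
    return un

def sweep_slice(u):
    """The same sweep written with shifted slices; the loops run inside numpy."""
    nx = u.shape[0]
    un = u.copy()
    un[1:nx-1, 1:nx-1] = (u[2:nx, 1:nx-1] + u[0:nx-2, 1:nx-1]
                          + u[1:nx-1, 2:nx] + u[1:nx-1, 0:nx-2]) / 4
    return un

rng = np.random.default_rng(0)       # arbitrary test data, not from the slides
u = rng.random((16, 16))
same = np.allclose(sweep_loop(u), sweep_slice(u))
print(same)
```

Each slice such as `u[2:nx, 1:nx-1]` is the interior block shifted by one grid point, so adding the four shifted blocks evaluates the stencil for all interior points at once.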
  11. Recap: Why Python?
      • Python is slow, but numpy may be reasonably fast.
      • Coding in C++ is time-consuming.
      • C++ is only needed in the computing kernel.
      • Most code is supporting code, but it must not slow down the computing kernel.
      • Python makes it easier to organize and structure the code.
      • This is why high-performance systems usually use a hybrid architecture (C++ with Python or another scripting language).
  12. Let’s go hybrid, but …
      • A dilemma:
        • Engineers (domain experts) know the problems but don’t know C++ and software engineering.
        • Computer scientists (programmers) know C++ and software engineering but not the problems.
      • Either side takes years of practice and study.
      • Not a lot of people want to play both roles.
  13. NSD: attempt to improve
      • Numerical software development: a graduate-level course
      • Teach computer scientists the hybrid architecture for numerical software
      • https://github.com/yungyuc/nsd
      • Runnable Jupyter notebooks
      • Course outline:
        • Part 1: Start with Python
          • Lecture 1: Introduction
          • Lecture 2: Fundamental engineering practices
          • Lecture 3: Python and numpy
        • Part 2: Computer architecture for performance
          • Lecture 4: C++ and computer architecture
          • Lecture 5: Matrix operations
          • Lecture 6: Cache optimization
          • Lecture 7: SIMD
        • Part 3: Resource management
          • Lecture 8: Memory management
          • Lecture 9: Ownership and smart pointers
        • Part 4: How to write C++ for Python
          • Lecture 10: Modern C++
          • Lecture 11: C++ and C for Python
          • Lecture 12: Array code in C++
          • Lecture 13: Array-oriented design
        • Part 5: Conclude with Python
          • Lecture 14: Advanced Python
          • Term project presentation
  14. Memory hierarchy
      • We go to C++ to make it easier to access hardware
      • Modern computers have much faster CPUs than memory
      • High performance comes from hiding the memory-access latency
      • Access latency at each level:
        • Registers: 0 cycles
        • L1 cache: 4 cycles
        • L2 cache: 10 cycles
        • L3 cache: 50 cycles
        • Main memory: 200 cycles
        • Disk (storage): 100,000 cycles
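The effect of the memory hierarchy can be glimpsed even from Python. The sketch below sums a C-ordered numpy array once row by row (contiguous memory) and once column by column (strided memory). The array size and the function names `sum_by_rows` and `sum_by_columns` are assumptions for illustration, and the measured difference depends on the machine and its caches, so no particular ratio is claimed.

```python
import timeit
import numpy as np

n = 1000
a = np.ones((n, n), dtype="float64")   # C order: each row is contiguous

def sum_by_rows(arr):
    """Walk memory contiguously: one full row at a time."""
    total = 0.0
    for i in range(arr.shape[0]):
        total += arr[i, :].sum()
    return total

def sum_by_columns(arr):
    """Walk memory with a stride of one full row between elements."""
    total = 0.0
    for j in range(arr.shape[1]):
        total += arr[:, j].sum()
    return total

print(a.strides)                       # bytes to step along each axis
t_row = timeit.timeit(lambda: sum_by_rows(a), number=3)
t_col = timeit.timeit(lambda: sum_by_columns(a), number=3)
print(t_row, t_col)                    # timings depend on the machine
```

Both functions return the same sum; only the order in which memory is touched differs, which is the kind of detail that C++ lets the kernel author control directly.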
  15. Data object
      • Numerical software processes huge amounts of data. Copying it is expensive.
        • Use a pipeline to process the same block of data
        • Use an object to manage the data: the data object
      • Data objects may not always be a good idea in other fields.
        • Here we do what it takes for uncompromising performance.
      • [Figure: pipeline phases "Field initialization", "Interior time-marching", "Boundary condition", "Parallel data sync", and "Finalization", with data access at all phases]
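The data-object idea can be sketched in Python: one object owns the buffers, and every phase of the pipeline works on them in place. The class `Field`, the phase functions, and the grid size are hypothetical names chosen for this illustration. The numpy expression in `march` still creates temporaries, so this shows only that the managed buffer itself is never replaced or copied between phases.

```python
import numpy as np

class Field:
    """Hypothetical data object: owns the buffers used by every phase."""

    def __init__(self, nx):
        self.u = np.zeros((nx, nx), dtype="float64")
        self.un = np.zeros_like(self.u)   # scratch buffer, allocated once

def initialize(field):
    field.u[...] = 0.0

def apply_boundary(field):
    y = np.linspace(0.0, 1.0, field.u.shape[1])
    field.u[-1, :] = np.sin(np.pi * y)

def march(field):
    u, un = field.u, field.un
    un[...] = u
    un[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1]
                      + u[1:-1, 2:] + u[1:-1, :-2]) / 4
    u[...] = un                           # write back in place, no rebinding

field = Field(8)
buffer_before = field.u
for phase in (initialize, apply_boundary, march, march):
    phase(field)
print(field.u is buffer_before)
```

Every phase receives the same `Field` instance, so the data stays where it is while the processing steps change.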
  16. Zero-copy: do it where it fits
      • [Figure: a Python app and a C++ app share one memory buffer holding the matrix $a_{11}, a_{12}, \dots, a_{mn}$ across languages, through an ndarray and a C++ container; the arrows are labeled "manage" and "access"]
      • Two directions: top (Python) - down (C++), and bottom (C++) - up (Python)
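A rough analogy for both directions can be written with the Python buffer protocol alone. In the sketch below, an `array.array` stands in for the C++ container (an assumption made so the example runs without a compiled extension), and `memoryview` stands in for foreign code that accesses memory owned by numpy. In both cases the two sides see the same bytes and nothing is copied.

```python
import array
import numpy as np

# Bottom-up: a non-numpy container owns the memory
# (a stand-in for a C++ container; this is an assumption of the sketch).
container = array.array("d", [0.0] * 6)
view = np.frombuffer(container, dtype="float64").reshape(2, 3)
view[1, 2] = 7.0                  # write through the ndarray ...
print(container[5])               # ... and the owning container sees it

# Top-down: numpy owns the memory; other code accesses it
# through the buffer protocol.
owner = np.arange(6, dtype="float64")
mv = memoryview(owner)
mv[0] = 42.0                      # write through the foreign view ...
print(owner[0])                   # ... and the ndarray sees it

shared = np.shares_memory(view, np.frombuffer(container, dtype="float64"))
print(shared)
```

In a real hybrid system the same sharing is arranged between an ndarray and a C++ container, with one side managing the lifetime of the buffer and the other side only accessing it.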
  17. More detail …
      • Notes about moving from Python to C++
        • Python frame object
        • Building Python extensions using pybind11 and cmake
        • Inspecting assembly code
        • x86 intrinsics
        • PyObject, CPython API and pybind11 API
        • Shared pointer, unique pointer, raw pointer, and ownership
        • Template generic programming
      • https://tw.pycon.org/2020/en-us/events/talk/1164539411870777736/
  18. How to learn
      • Work on a real project.
      • Keep in mind that Python is 100x slower than C/C++.
        • Always profile (time).
      • Don’t treat Python as simply Python.
        • View Python as an interpreter library written in C.
        • Use tools to call C/C++: Cython, pybind11, etc.
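As a minimal sketch of "always profile", the example below times a pure-Python loop against an equivalent numpy call and then collects a profile with the standard-library `cProfile` module. The functions `slow_sum` and `fast_sum` and the problem size are assumptions for illustration; the timings vary between machines.

```python
import cProfile
import io
import pstats
import timeit

import numpy as np

def slow_sum(n):
    """Pure-Python loop: every iteration goes through the interpreter."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def fast_sum(n):
    """Same result, but the loop runs in compiled code inside numpy."""
    a = np.arange(n, dtype="int64")
    return int(np.dot(a, a))

# Wall-clock timing of each version.
t_slow = timeit.timeit(lambda: slow_sum(100_000), number=5)
t_fast = timeit.timeit(lambda: fast_sum(100_000), number=5)
print(t_slow, t_fast)

# Function-level profile of one run of each version.
profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
fast_sum(100_000)
profiler.disable()
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats()
report = out.getvalue()
print(report)
```

The profile report lists where the time goes per function, which is the information needed before deciding what to move into C or C++.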
  19. What we want
      • [Figure labels: "See problems", "Formulate the problems", "Get something working", "Automate"; "Prototype", "Reusable software", "?", "?"; "One-time programs may happen"]
  20. Thanks! Questions?
