SlideShare a Scribd company logo
1 of 27
Download to read offline
Write Python for Speed
Yung-Yu Chen
(yyc@sciwork.dev)
October 2023
1
speed is the king
リニア中央新幹線 (Chuo Shinkansen)
https://linear-chuo-shinkansen.jr-central.co.jp/about/design/
500km/h high speed rail by 2038 "can't wait!" 🙂
what we are talking about is unparalleled speed
2
numerical calculation needs speed
u = 0
u = 0
u = 0
u(y) = sin(πy)
i, j
i, j + 1
i, j − 1
i + 1,j
i − 1,j
i + 1,j + 1
i + 1,j − 1
i − 1,j + 1
i − 1,j − 1
grid points
computing domain
(and solution)
∂2
u
∂x2
+
∂2
u
∂y2
= 0 (0 < x < 1; 0 < y < 1)
u(0,y) = 0, u(1,y) = sin(πy) (0 ≤ y ≤ 1)
u(x,0) = 0, u(x,1) = 0 (0 ≤ x ≤ 1)
Equation for domain interior
Boundary conditions
un+1
(xi, yi) =
un
(xi+1, yj) + un
(xi−1, yj) + un
(xi, yj+1) + un
(xi, yj−1)
4
Use the point-Jacobi method to march
the Laplace equation
3
Python is slow
for it in range(1, nx-1):
for jt in range(1, nx-1):
un[it,jt] = (u[it+1,jt] + u[it-1,jt] + u[it,jt+1] + u[it,jt-1]) / 4
un[1:nx-1,1:nx-1] = (u[2:nx,1:nx-1] + u[0:nx-2,1:nx-1] +
u[1:nx-1,2:nx] + u[1:nx-1,0:nx-2]) / 4
for (size_t it=1; it<nx-1; ++it)
{
for (size_t jt=1; jt<nx-1; ++jt)
{
un(it,jt) = (u(it+1,jt) + u(it-1,jt) + u(it,jt+1) + u(it,jt-1)) / 4;
}
}
Point-Jacobi method
Python nested loop
Numpy array
C++ nested loop
4.797s (1x)
0.055s (87x)
0.025s (192x)
un+1
(xi, yi) =
un
(xi+1, yj) + un
(xi−1, yj) + un
(xi, yj+1) + un
(xi, yj−1)
4
4
example: hyperbolic PDEs
5
numerical simulations of conservation laws:
∂u
∂t
+
3
∑
k=1
∂F(k)
(u)
∂xk
= 0
use case: stress waves in anisotropic solids
use case: compressible
fl
ows
6
observe
theorize
simple setup
software package
wall of
complexity
prototype
numerical
analysis
analytical
solution
mass production run
a lot of code development
u(x, y) =
sinh(πx)
sinh(π)
sin(πy)
development procedure
always keep
the problem
(math or
physics) in
mind
always make
the code run
hybrid architecture
• everyone wants the simplicity of Python
• HPC programmers included
• but we must use C++ for speed
• to be honest, machine code
• write Python to drive the C++ code / the machine
7
machine determines speed
void calc_distance(
size_t const n
, double const * x
, double const * y
, double * r)
{
for (size_t i = 0 ; i < n ; ++i)
{
r[i] = std::sqrt(x[i]*x[i] + y[i]*y[i]);
}
}
vmovupd ymm0, ymmword [rsi + r9*8]
vmulpd ymm0, ymm0, ymm0
vmovupd ymm1, ymmword [rdx + r9*8]
vmulpd ymm1, ymm1, ymm1
vaddpd ymm0, ymm0, ymm1
vsqrtpd ymm0, ymm0
movupd xmm0, xmmword [rsi + r8*8]
mulpd xmm0, xmm0
movupd xmm1, xmmword [rdx + r8*8]
mulpd xmm1, xmm1
addpd xmm1, xmm0
sqrtpd xmm0, xmm1
AVX: 256-bit-wide vectorization SSE: 128-bit-wide vectorization
C++ code
8
array: HPC fundamental
...
...
...
regularly allocated bu
ff
er
regularly allocated bu
ff
er
regularly allocated bu
ff
er
regularly accessing algorithm
randomly accessing algorithm
quoted from SemiAnalysis
https://www.semianalysis.com/p/apple-m2-die-shot-and-architecture
die shots of popular chips
the data structure for cache optimization, data parallelism, SIMD, GPU, all the
techniques of high-performance computing (HPC)
9
array for zero-copy across Python and C++
Python app C++ app
a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn
share memory bu
ff
er across language
ndarray
C++
container
Python app C++ app
C++
container
ndarray
manage
access a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn
memory bu
ff
er shared across language
👍 bottom (C++) - up (Python) design
Python app C++ app
C++
container
ndarray
manage
access
a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn
memory bu
ff
er shared across language
top (Python) - down (C++) design 👎
Python-controlling
lifecycle is
bad for long-running
process
C++ allows
fi
ne-grained
control of all resources
10
quick prototype in Python:
something new and fragile
• thread pool for I/O? certainly
Python
• certainly not want to write
complicated C++ to prototype
• 64 threads on 32-core servers?
• consider to move it to C++
• data-parallel code on top of the
thread pool?
• time to go with TBB (thread-
building block) C++ library
from _thread import allocate_lock, start_new_thread
class ThreadPool(object):
"""Python prototype for I/O thread pool"""
def __init__(self, nthread):
# Worker callback
self.func = None
# Placeholders for managing data
self.__threadids = [None] * nthread
self.__threads = [None] * nthread
self.__returns = [None] * nthread
# Initialize thread managing data
for it in range(nthread):
mlck = allocate_lock(); mlck.acquire()
wlck = allocate_lock(); wlck.acquire()
tdata = [mlck, wlck, None, None]
self.__threads[it] = tdata
tid = start_new_thread(self.eventloop, (tdata,))
self.__threadids[it] = tid
11
quick prototype in Python:
something complex
i, j
i, j + 1
i, j − 1
i + 1,j
i − 1,j
i + 1,j + 1
i + 1,j − 1
i − 1,j + 1
i − 1,j − 1
di
ff
erence equation
def solve_python_loop():
u = uoriginal.copy() # Input from outer scope
un = u.copy() # Create the buffer for the next time step
converged = False
step = 0
# Outer loop.
while not converged:
step += 1
# Inner loops. One for x and the other for y.
for it in range(1, nx-1):
for jt in range(1, nx-1):
un[it,jt] = (u[it+1,jt] + u[it-1,jt] + u[it,jt+1] + u[it,jt-1]) / 4
norm = np.abs(un-u).max()
u[...] = un[...]
converged = True if norm < 1.e-5 else False
return u, step, norm
12
grid points
python nested loop implementing point-Jacobi method
u(xi, yj) =
u(xi+1, yj) + u(xi−1, yj) + u(xi, yj+1) + u(xi, yj−1)
4
point-Jacobi method
un+1
(xi, yi) =
un
(xi+1, yj) + un
(xi−1, yj) + un
(xi, yj+1) + un
(xi, yj−1)
4
production: many data many ways
13
fi
t scattered points to a polynomial
f(x) = a3x3
+ a2x2
+ a1x + a0
test many (like 1,000+) such data sets:
def plot_poly_fitted(i):
slct = (xdata>=i)&(xdata<(i+1))
sub_x = xdata[slct]
sub_y = ydata[slct]
poly = data_prep.fit_poly(sub_x, sub_y, 3)
print(poly)
poly = np.poly1d(poly)
xp = np.linspace(sub_x.min(), sub_x.max(), 100)
plt.plot(sub_x, sub_y, '.', xp, poly(xp), '-')
plot_poly_fitted(10)
(yeah, the data points seem to be too random to be represented by a polynomial)
// The rank of the linear map is (order+1).
modmesh::SimpleArray<double> matrix(std::vector<size_t>{order+1, order+1});
// Use the x coordinates to build the linear map for least-square
// regression.
for (size_t it=0; it<order+1; ++it)
{
for (size_t jt=0; jt<order+1; ++jt)
{
double & val = matrix(it, jt);
val = 0;
for (size_t kt=start; kt<stop; ++kt)
{
val += pow(xarr[kt], it+jt);
}
}
}
>>> with Timer():
>>> # Do the calculation for the 1000 groups of points.
>>> polygroup = np.empty((1000, 3), dtype='float64')
>>> for i in range(1000):
>>> # Use numpy to build the point group.
>>> slct = (xdata>=i)&(xdata<(i+1))
>>> sub_x = xdata[slct]
>>> sub_y = ydata[slct]
>>> polygroup[i,:] = data_prep.fit_poly(sub_x, sub_y, 2)
>>> with Timer():
>>> # Using numpy to build the point groups takes a lot of time.
>>> data_groups = []
>>> for i in range(1000):
>>> slct = (xdata>=i)&(xdata<(i+1))
>>> data_groups.append((xdata[slct], ydata[slct]))
>>> with Timer():
>>> # Fitting helper runtime is much less than building the point groups.
>>> polygroup = np.empty((1000, 3), dtype='float64')
>>> for it, (sub_x, sub_y) in enumerate(data_groups):
>>> polygroup[it,:] = data_prep.fit_poly(sub_x, sub_y, 2)
Wall time: 1.49671 s
Wall time: 1.24653 s
Wall time: 0.215859 s
prepare data
fi
t polynomials
problem of speedup
14
>>> with Timer():
>>> # Using numpy to build the point groups takes a lot of time.
>>> data_groups = []
>>> for i in range(1000):
>>> slct = (xdata>=i)&(xdata<(i+1))
>>> data_groups.append((xdata[slct], ydata[slct]))
Wall time: 1.24653 s
>>> with Timer():
>>> # Fitting helper runtime is much less than building the point groups.
>>> polygroup = np.empty((1000, 3), dtype='float64')
>>> for it, (sub_x, sub_y) in enumerate(data_groups):
>>> polygroup[it,:] = data_prep.fit_poly(sub_x, sub_y, 2)
Wall time: 0.215859 s
>>> with Timer():
>>> rbatch = data_prep.fit_polys(xdata, ydata, 2)
Wall time: 0.21058 s
/**
* This function calculates the least-square regression of multiple sets of
* point clouds to the corresponding polynomial functions of a given order.
*/
modmesh::SimpleArray<double> fit_polys
(
modmesh::SimpleArray<double> const & xarr
, modmesh::SimpleArray<double> const & yarr
, size_t order
)
{
size_t xmin = std::floor(*std::min_element(xarr.begin(), xarr.end()));
size_t xmax = std::ceil(*std::max_element(xarr.begin(), xarr.end()));
size_t ninterval = xmax - xmin;
modmesh::SimpleArray<double> lhs(std::vector<size_t>{ninterval, order+1});
std::fill(lhs.begin(), lhs.end(), 0); // sentinel.
size_t start=0;
for (size_t it=0; it<xmax; ++it)
{
// NOTE: We take advantage of the intrinsic features of the input data
// to determine the grouping. This is ad hoc and hard to maintain. We
// play this trick to demonstrate a hackish way of performing numerical
// calculation.
size_t stop;
for (stop=start; stop<xarr.size(); ++stop)
{
if (xarr[stop]>=it+1) { break; }
}
// Use the single polynomial helper function.
auto sub_lhs = fit_poly(xarr, yarr, start, stop, order);
for (size_t jt=0; jt<order+1; ++jt)
{
lhs(it, jt) = sub_lhs[jt];
}
start = stop;
}
return lhs;
}
prepare data and loop in Python:
prepare data and loop in C++:
C++ wins so much, but we lose
fl
exibility!
// NOTE: We take advantage of the intrinsic features of the input data
// to determine the grouping. This is ad hoc and hard to maintain. We
// play this trick to demonstrate a hackish way of performing numerical
// calculation.
data object:
fl
exible and fast
15
public:
std::vector<Data> inner1(size_t start, size_t len)
{
std::vector<Data> ret;
ret.reserve(len);
inner2(start, len, ret);
return ret;
}
private:
void inner2(size_t start, size_t len,
std::vector<Data> & ret)
{
for (size_t it=0; it < len; ++it)
{
Data data(start+it);
ret.emplace_back(std::move(data));
}
}
public:
void outer(size_t len)
{
result.reserve(len*(len+1)/2);
for (size_t it=0; it < len; ++it)
{
// The output argument passed into
// the private helper is a private
// member datum.
inner2(result.size(), it+1, result);
}
}
Called when consumers want the sub-operation
one by one, and make the code more testable.
Called when batch operation is
demanded.
struct Accumulator
{
public:
std::vector<Data> result;
};
Caller does not see this private
helper that takes an output argument.
only use public data
member for simple
purposes
class to pack data and logic
data object: use case
class StaticMesh
{
// shape data
uint8_t m_ndim = 0;
uint_type m_nnode = 0; ///< Number of nodes (interior).
uint_type m_nface = 0; ///< Number of faces (interior).
uint_type m_ncell = 0; ///< Number of cells (interior).
uint_type m_nbound = 0; ///< Number of boundary faces.
uint_type m_ngstnode = 0; ///< Number of ghost nodes.
uint_type m_ngstface = 0; ///< Number of ghost faces.
uint_type m_ngstcell = 0; ///< Number of ghost cells.
// geometry arrays
MM_DECL_StaticMesh_ARRAY(real_type, ndcrd);
MM_DECL_StaticMesh_ARRAY(real_type, fccnd);
MM_DECL_StaticMesh_ARRAY(real_type, fcnml);
MM_DECL_StaticMesh_ARRAY(real_type, fcara);
MM_DECL_StaticMesh_ARRAY(real_type, clcnd);
MM_DECL_StaticMesh_ARRAY(real_type, clvol);
// meta arrays
MM_DECL_StaticMesh_ARRAY(int_type, fctpn);
MM_DECL_StaticMesh_ARRAY(int_type, cltpn);
MM_DECL_StaticMesh_ARRAY(int_type, clgrp);
// connectivity arrays
MM_DECL_StaticMesh_ARRAY(int_type, fcnds);
MM_DECL_StaticMesh_ARRAY(int_type, fccls);
MM_DECL_StaticMesh_ARRAY(int_type, clnds);
MM_DECL_StaticMesh_ARRAY(int_type, clfcs);
MM_DECL_StaticMesh_ARRAY(int_type, ednds);
}; /* end class StaticMesh */
#define MM_DECL_StaticMesh_ARRAY(TYPE, NAME) 
public: 
SimpleArray<TYPE> const & NAME() const { return m_##NAME; } 
SimpleArray<TYPE> & NAME() { return m_##NAME; } 
template <typename... Args> 
TYPE const & NAME(Args... args) const { return m_##NAME(args...); } 
template <typename... Args> 
TYPE & NAME(Args... args) { return m_##NAME(args...); } 

private: 
SimpleArray<TYPE> m_##NAME
namespace py = pybind11;
(*this)
// shape data
.def_property_readonly("ndim", &wrapped_type::ndim)
.def_property_readonly("nnode", &wrapped_type::nnode)
.def_property_readonly("nface", &wrapped_type::nface)
.def_property_readonly("ncell", &wrapped_type::ncell)
.def_property_readonly("nbound", &wrapped_type::nbound)
.def_property_readonly("ngstnode", &wrapped_type::ngstnode)
.def_property_readonly("ngstface", &wrapped_type::ngstface)
.def_property_readonly("ngstcell", &wrapped_type::ngstcell)
#define MM_DECL_ARRAY(NAME) 
.expose_SimpleArray( 
#NAME, 
[](wrapped_type & self) -> decltype(auto) 
{ return self.NAME(); })
(*this)
// geometry arrays
MM_DECL_ARRAY(ndcrd)
MM_DECL_ARRAY(fccnd)
MM_DECL_ARRAY(fcnml)
MM_DECL_ARRAY(fcara)
MM_DECL_ARRAY(clcnd)
MM_DECL_ARRAY(clvol)
// meta arrays
MM_DECL_ARRAY(fctpn)
MM_DECL_ARRAY(cltpn)
MM_DECL_ARRAY(clgrp)
// connectivity arrays
MM_DECL_ARRAY(fcnds)
MM_DECL_ARRAY(fccls)
MM_DECL_ARRAY(clnds)
MM_DECL_ARRAY(clfcs)
MM_DECL_ARRAY(ednds);
#undef MM_DECL_ARRAY
# Construct the data object
mh = modmesh.StaticMesh(
ndim=3, nnode=4, nface=4, ncell=1)
# Set the data
mh.ndcrd.ndarray[:, :] = 
(0, 0, 0), (0, 1, 0), (-1, 1, 0), (0, 1, 1)
mh.cltpn.ndarray[:] = modmesh.StaticMesh.TETRAHEDRON
mh.clnds.ndarray[:, :5] = [(4, 0, 1, 2, 3)]
# Calculate internal by the input data
# to build up the object
mh.build_interior()
np.testing.assert_almost_equal(
mh.fccnd,
[[-0.3333333, 0.6666667, 0. ],
[ 0. , 0.6666667, 0.3333333],
[-0.3333333, 0.6666667, 0.3333333],
[-0.3333333, 1. , 0.3333333]])
mesh shape information
data arrays
data arrays will be very large: gigabytes in memory
use data in python
C++ library pybind11 wrapper
16
test in Python
def test_2d_trivial_triangles(self):
mh = modmesh.StaticMesh(ndim=2, nnode=4, nface=0, ncell=3)
mh.ndcrd.ndarray[:, :] = (0, 0), (-1, -1), (1, -1), (0, 1)
mh.cltpn.ndarray[:] = modmesh.StaticMesh.TRIANGLE
mh.clnds.ndarray[:, :4] = (3, 0, 1, 2), (3, 0, 2, 3), (3, 0, 3, 1)
self._check_shape(mh, ndim=2, nnode=4, nface=0, ncell=3,
nbound=0, ngstnode=0, ngstface=0, ngstcell=0,
nedge=0)
# Test build interior data.
mh.build_interior(_do_metric=False, _build_edge=False)
self._check_shape(mh, ndim=2, nnode=4, nface=6, ncell=3,
nbound=0, ngstnode=0, ngstface=0, ngstcell=0,
nedge=0)
mh.build_interior() # _do_metric=True, _build_edge=True
self._check_shape(mh, ndim=2, nnode=4, nface=6, ncell=3,
nbound=0, ngstnode=0, ngstface=0, ngstcell=0,
nedge=6)
np.testing.assert_almost_equal(
mh.fccnd,
[[-0.5, -0.5], [0.0, -1.0], [0.5, -0.5],
[0.5, 0.0], [0.0, 0.5], [-0.5, 0.0]])
np.testing.assert_almost_equal(
mh.fcnml,
[[-0.7071068, 0.7071068], [0.0, -1.0], [0.7071068, 0.7071068],
[0.8944272, 0.4472136], [-1.0, -0.0], [-0.8944272, 0.4472136]])
np.testing.assert_almost_equal(
mh.fcara, [1.4142136, 2.0, 1.4142136, 2.236068, 1.0, 2.236068])
np.testing.assert_almost_equal(
mh.clcnd, [[0.0, -0.6666667], [0.3333333, 0.0], [-0.3333333, 0.0]])
np.testing.assert_almost_equal(
mh.clvol, [1.0, 0.5, 0.5])
17
namespace py = pybind11;
(*this)
// shape data
.def_property_readonly("ndim", &wrapped_type::ndim)
.def_property_readonly("nnode", &wrapped_type::nnode)
.def_property_readonly("nface", &wrapped_type::nface)
.def_property_readonly("ncell", &wrapped_type::ncell)
.def_property_readonly("nbound", &wrapped_type::nbound)
.def_property_readonly("ngstnode", &wrapped_type::ngstnode)
.def_property_readonly("ngstface", &wrapped_type::ngstface)
.def_property_readonly("ngstcell", &wrapped_type::ngstcell)
#define MM_DECL_ARRAY(NAME) .expose_SimpleArray( 
#NAME, 
[](wrapped_type & self) -> decltype(auto){ return self.NAME(); })
(*this)
// geometry arrays
MM_DECL_ARRAY(ndcrd)
MM_DECL_ARRAY(fccnd)
MM_DECL_ARRAY(fcnml)
MM_DECL_ARRAY(fcara)
MM_DECL_ARRAY(clcnd)
MM_DECL_ARRAY(clvol)
// meta arrays
MM_DECL_ARRAY(fctpn)
MM_DECL_ARRAY(cltpn)
MM_DECL_ARRAY(clgrp)
// connectivity arrays
MM_DECL_ARRAY(fcnds)
MM_DECL_ARRAY(fccls)
MM_DECL_ARRAY(clnds)
MM_DECL_ARRAY(clfcs)
MM_DECL_ARRAY(ednds);
#undef MM_DECL_ARRAY
wrapping to Python enables fast testing development may be done as writing tests!
# Construct the data object
mh = modmesh.StaticMesh(
ndim=3, nnode=4, nface=4, ncell=1)
# Set the data
mh.ndcrd.ndarray[:, :] = 
(0, 0, 0), (0, 1, 0), (-1, 1, 0), (0, 1, 1)
mh.cltpn.ndarray[:] = modmesh.StaticMesh.TETRAHEDRON
mh.clnds.ndarray[:, :5] = [(4, 0, 1, 2, 3)]
# Calculate internal by the input data
# to build up the object
mh.build_interior()
np.testing.assert_almost_equal(
mh.fccnd,
[[-0.3333333, 0.6666667, 0. ],
[ 0. , 0.6666667, 0.3333333],
[-0.3333333, 0.6666667, 0.3333333],
[-0.3333333, 1. , 0.3333333]])
development
fl
ow
Python prototype
write C++ test C++ wrap to Python
Python app test Python
C++ data object wrap to Python
test Python Python app
Python-centric
C++-centric
class
class class class
function
function
class class
need to be
familiar with
C++ 🙁
happy with
Python 🙂
gain from
the pain
18
many C++ helper classes:
• more features
• better performance
note: a lot of low-level data structure
happens here
develop SimpleArray in C++
SimpleArray std::vector
SimpleArray is
fi
xed size 👍
• only allocate memory on
construction
std::vector is variable size 👎
• bu
ff
er may be invalidated
• implicit memory allocation
(reallocation)
multi-dimensional access 👍
operator()
one-dimensional access 👎
operator[]
19
C++
container
ndarray
manage
access a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn
memory bu
ff
er
make it yourself get it free from STL
buffer ownership
20
ownership
know when and where to allocate
and deallocate memory
(and "resources")
buffer ownership
21
SimpleArray a
meta data
buffer ptr
data bu
ff
er
buffer ownership
22
SimpleArray a
meta data
buffer ptr
data bu
ff
er
SimpleArray b
meta data
buffer ptr
copy
buffer ownership
23
SimpleArray a
meta data
buffer ptr
data bu
ff
er
SimpleArray b
meta data
buffer ptr
copy
buffer ownership
24
SimpleArray a
meta data
buffer ptr
data bu
ff
er
SimpleArray b
meta data
buffer ptr
buffer ownership
25
SimpleArray a
meta data
buffer ptr
data bu
ff
er
SimpleArray b
meta data
buffer ptr
buffer interface and numpy ndarray
template <typename S>
std::enable_if_t<is_simple_array_v<S>, pybind11::array>
to_ndarray(S && sarr)
{
namespace py = pybind11;
using T = typename std::remove_reference_t<S>::value_type;
std::vector<size_t> const shape(sarr.shape().begin(),
sarr.shape().end());
std::vector<size_t> stride(sarr.stride().begin(),
sarr.stride().end());
for (size_t & v : stride) { v *= sarr.itemsize(); }
return py::array(
/* Numpy dtype */
py::detail::npy_format_descriptor<T>::dtype(),
/* Buffer dimensions */
shape,
/* Strides (in bytes) for each index */
stride,
/* Pointer to buffer */
sarr.data(),
/* Owning Python object */
py::cast(sarr.buffer().shared_from_this()));
}
template <typename T>
static SimpleArray<T> makeSimpleArray(
pybind11::array_t<T> & ndarr)
{
typename SimpleArray<T>::shape_type shape;
for (ssize_t i = 0; i < ndarr.ndim(); ++i)
{
shape.push_back(ndarr.shape(i));
}
std::shared_ptr<ConcreteBuffer> const buffer =
ConcreteBuffer::construct(
ndarr.nbytes(),
ndarr.mutable_data(),
std::make_unique<ConcreteBufferNdarrayRemover>(
ndarr));
return SimpleArray<T>(shape, buffer);
}
take SimpleArray bu
ff
er to make ndarray take ndarray bu
ff
er to make SimpleArray
26
conclusions
27
• Python-C++ hybrid system: utmost speed and
fl
exibility
• Python for prototyping and testing
• use data object to organize code in C++
• design zero-copy interface
• what you get: e
ffi
cient, data-parallelizable code fully driven by
Python

More Related Content

What's hot

異次元のグラフデータベースNeo4j
異次元のグラフデータベースNeo4j異次元のグラフデータベースNeo4j
異次元のグラフデータベースNeo4j昌桓 李
 
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会Shigeru Hanada
 
Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例
Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例
Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例Yahoo!デベロッパーネットワーク
 
地理分散DBについて
地理分散DBについて地理分散DBについて
地理分散DBについてKumazaki Hiroki
 
Perl ウェブ開発の中世〜CGI と Plack の間〜
Perl ウェブ開発の中世〜CGI と Plack の間〜Perl ウェブ開発の中世〜CGI と Plack の間〜
Perl ウェブ開発の中世〜CGI と Plack の間〜鉄次 尾形
 
普通の人でもわかる Paxos
普通の人でもわかる Paxos普通の人でもわかる Paxos
普通の人でもわかる Paxostyonekura
 
pg_bigmで全文検索するときに気を付けたい5つのポイント(第23回PostgreSQLアンカンファレンス@オンライン 発表資料)
pg_bigmで全文検索するときに気を付けたい5つのポイント(第23回PostgreSQLアンカンファレンス@オンライン 発表資料)pg_bigmで全文検索するときに気を付けたい5つのポイント(第23回PostgreSQLアンカンファレンス@オンライン 発表資料)
pg_bigmで全文検索するときに気を付けたい5つのポイント(第23回PostgreSQLアンカンファレンス@オンライン 発表資料)NTT DATA Technology & Innovation
 
PyConAPAC2023 ワークフローエンジン Apache Airflowを用いた 大規模データパイプライン構築と改善
PyConAPAC2023 ワークフローエンジン  Apache Airflowを用いた 大規模データパイプライン構築と改善PyConAPAC2023 ワークフローエンジン  Apache Airflowを用いた 大規模データパイプライン構築と改善
PyConAPAC2023 ワークフローエンジン Apache Airflowを用いた 大規模データパイプライン構築と改善株式会社MonotaRO Tech Team
 
行ロックと「LOG: process 12345 still waiting for ShareLock on transaction 710 afte...
行ロックと「LOG:  process 12345 still waiting for ShareLock on transaction 710 afte...行ロックと「LOG:  process 12345 still waiting for ShareLock on transaction 710 afte...
行ロックと「LOG: process 12345 still waiting for ShareLock on transaction 710 afte...Masahiko Sawada
 
ビッグデータ処理データベースの全体像と使い分け
ビッグデータ処理データベースの全体像と使い分けビッグデータ処理データベースの全体像と使い分け
ビッグデータ処理データベースの全体像と使い分けRecruit Technologies
 
押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)
押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)
押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)NTT DATA Technology & Innovation
 
図と実装で理解する『木構造入門』
図と実装で理解する『木構造入門』図と実装で理解する『木構造入門』
図と実装で理解する『木構造入門』Proktmr
 
グラフデータの視覚化ツールーTom Sawyer Perspectives
グラフデータの視覚化ツールーTom Sawyer Perspectivesグラフデータの視覚化ツールーTom Sawyer Perspectives
グラフデータの視覚化ツールーTom Sawyer Perspectives昌桓 李
 
[20171019 三木会] データベース・マイグレーションについて by 株式会社シー・エス・イー 藤井 元雄 氏
[20171019 三木会] データベース・マイグレーションについて by 株式会社シー・エス・イー 藤井 元雄 氏[20171019 三木会] データベース・マイグレーションについて by 株式会社シー・エス・イー 藤井 元雄 氏
[20171019 三木会] データベース・マイグレーションについて by 株式会社シー・エス・イー 藤井 元雄 氏Insight Technology, Inc.
 
純粋関数型アルゴリズム入門
純粋関数型アルゴリズム入門純粋関数型アルゴリズム入門
純粋関数型アルゴリズム入門Kimikazu Kato
 
GraalVM を普通の Java VM として使う ~クラウドベンチマークなどでの比較~
GraalVM を普通の Java VM として使う ~クラウドベンチマークなどでの比較~GraalVM を普通の Java VM として使う ~クラウドベンチマークなどでの比較~
GraalVM を普通の Java VM として使う ~クラウドベンチマークなどでの比較~Shinji Takao
 
乱択データ構造の最新事情 -MinHash と HyperLogLog の最近の進歩-
乱択データ構造の最新事情 -MinHash と HyperLogLog の最近の進歩-乱択データ構造の最新事情 -MinHash と HyperLogLog の最近の進歩-
乱択データ構造の最新事情 -MinHash と HyperLogLog の最近の進歩-Takuya Akiba
 
[B23] PostgreSQLのインデックス・チューニング by Tomonari Katsumata
[B23] PostgreSQLのインデックス・チューニング by Tomonari Katsumata[B23] PostgreSQLのインデックス・チューニング by Tomonari Katsumata
[B23] PostgreSQLのインデックス・チューニング by Tomonari KatsumataInsight Technology, Inc.
 

What's hot (20)

異次元のグラフデータベースNeo4j
異次元のグラフデータベースNeo4j異次元のグラフデータベースNeo4j
異次元のグラフデータベースNeo4j
 
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
 
Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例
Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例
Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例
 
地理分散DBについて
地理分散DBについて地理分散DBについて
地理分散DBについて
 
Perl ウェブ開発の中世〜CGI と Plack の間〜
Perl ウェブ開発の中世〜CGI と Plack の間〜Perl ウェブ開発の中世〜CGI と Plack の間〜
Perl ウェブ開発の中世〜CGI と Plack の間〜
 
普通の人でもわかる Paxos
普通の人でもわかる Paxos普通の人でもわかる Paxos
普通の人でもわかる Paxos
 
pg_bigmで全文検索するときに気を付けたい5つのポイント(第23回PostgreSQLアンカンファレンス@オンライン 発表資料)
pg_bigmで全文検索するときに気を付けたい5つのポイント(第23回PostgreSQLアンカンファレンス@オンライン 発表資料)pg_bigmで全文検索するときに気を付けたい5つのポイント(第23回PostgreSQLアンカンファレンス@オンライン 発表資料)
pg_bigmで全文検索するときに気を付けたい5つのポイント(第23回PostgreSQLアンカンファレンス@オンライン 発表資料)
 
PyConAPAC2023 ワークフローエンジン Apache Airflowを用いた 大規模データパイプライン構築と改善
PyConAPAC2023 ワークフローエンジン  Apache Airflowを用いた 大規模データパイプライン構築と改善PyConAPAC2023 ワークフローエンジン  Apache Airflowを用いた 大規模データパイプライン構築と改善
PyConAPAC2023 ワークフローエンジン Apache Airflowを用いた 大規模データパイプライン構築と改善
 
行ロックと「LOG: process 12345 still waiting for ShareLock on transaction 710 afte...
行ロックと「LOG:  process 12345 still waiting for ShareLock on transaction 710 afte...行ロックと「LOG:  process 12345 still waiting for ShareLock on transaction 710 afte...
行ロックと「LOG: process 12345 still waiting for ShareLock on transaction 710 afte...
 
ヤフー社内でやってるMySQLチューニングセミナー大公開
ヤフー社内でやってるMySQLチューニングセミナー大公開ヤフー社内でやってるMySQLチューニングセミナー大公開
ヤフー社内でやってるMySQLチューニングセミナー大公開
 
ビッグデータ処理データベースの全体像と使い分け
ビッグデータ処理データベースの全体像と使い分けビッグデータ処理データベースの全体像と使い分け
ビッグデータ処理データベースの全体像と使い分け
 
押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)
押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)
押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)
 
図と実装で理解する『木構造入門』
図と実装で理解する『木構造入門』図と実装で理解する『木構造入門』
図と実装で理解する『木構造入門』
 
グラフデータの視覚化ツールーTom Sawyer Perspectives
グラフデータの視覚化ツールーTom Sawyer Perspectivesグラフデータの視覚化ツールーTom Sawyer Perspectives
グラフデータの視覚化ツールーTom Sawyer Perspectives
 
[20171019 三木会] データベース・マイグレーションについて by 株式会社シー・エス・イー 藤井 元雄 氏
[20171019 三木会] データベース・マイグレーションについて by 株式会社シー・エス・イー 藤井 元雄 氏[20171019 三木会] データベース・マイグレーションについて by 株式会社シー・エス・イー 藤井 元雄 氏
[20171019 三木会] データベース・マイグレーションについて by 株式会社シー・エス・イー 藤井 元雄 氏
 
純粋関数型アルゴリズム入門
純粋関数型アルゴリズム入門純粋関数型アルゴリズム入門
純粋関数型アルゴリズム入門
 
GraalVM を普通の Java VM として使う ~クラウドベンチマークなどでの比較~
GraalVM を普通の Java VM として使う ~クラウドベンチマークなどでの比較~GraalVM を普通の Java VM として使う ~クラウドベンチマークなどでの比較~
GraalVM を普通の Java VM として使う ~クラウドベンチマークなどでの比較~
 
乱択データ構造の最新事情 -MinHash と HyperLogLog の最近の進歩-
乱択データ構造の最新事情 -MinHash と HyperLogLog の最近の進歩-乱択データ構造の最新事情 -MinHash と HyperLogLog の最近の進歩-
乱択データ構造の最新事情 -MinHash と HyperLogLog の最近の進歩-
 
[B23] PostgreSQLのインデックス・チューニング by Tomonari Katsumata
[B23] PostgreSQLのインデックス・チューニング by Tomonari Katsumata[B23] PostgreSQLのインデックス・チューニング by Tomonari Katsumata
[B23] PostgreSQLのインデックス・チューニング by Tomonari Katsumata
 
Bloom filter
Bloom filterBloom filter
Bloom filter
 

Similar to Write Python for Speed

Advance data structure & algorithm
Advance data structure & algorithmAdvance data structure & algorithm
Advance data structure & algorithmK Hari Shankar
 
A scrupulous code review - 15 bugs in C++ code
A scrupulous code review - 15 bugs in C++ codeA scrupulous code review - 15 bugs in C++ code
A scrupulous code review - 15 bugs in C++ codePVS-Studio LLC
 
Simple, fast, and scalable torch7 tutorial
Simple, fast, and scalable torch7 tutorialSimple, fast, and scalable torch7 tutorial
Simple, fast, and scalable torch7 tutorialJin-Hwa Kim
 
Sparse Matrix and Polynomial
Sparse Matrix and PolynomialSparse Matrix and Polynomial
Sparse Matrix and PolynomialAroosa Rajput
 
I have written the code but cannot complete the assignment please help.pdf
I have written the code but cannot complete the assignment please help.pdfI have written the code but cannot complete the assignment please help.pdf
I have written the code but cannot complete the assignment please help.pdfshreeaadithyaacellso
 
Gentle Introduction to Functional Programming
Gentle Introduction to Functional ProgrammingGentle Introduction to Functional Programming
Gentle Introduction to Functional ProgrammingSaurabh Singh
 
Chainer-Compiler 動かしてみた
Chainer-Compiler 動かしてみたChainer-Compiler 動かしてみた
Chainer-Compiler 動かしてみたAkira Maruoka
 
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizationsEgor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizationsEgor Bogatov
 
Parallel R in snow (english after 2nd slide)
Parallel R in snow (english after 2nd slide)Parallel R in snow (english after 2nd slide)
Parallel R in snow (english after 2nd slide)Cdiscount
 
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...Provectus
 
Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++Satalia
 
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014PyData
 
COMPAPPABCA49085rFunrAP__Practical Number 9 & 10.docx
COMPAPPABCA49085rFunrAP__Practical Number 9 & 10.docxCOMPAPPABCA49085rFunrAP__Practical Number 9 & 10.docx
COMPAPPABCA49085rFunrAP__Practical Number 9 & 10.docxTashiBhutia12
 
C++11 - A Change in Style - v2.0
C++11 - A Change in Style - v2.0C++11 - A Change in Style - v2.0
C++11 - A Change in Style - v2.0Yaser Zhian
 
CS101- Introduction to Computing- Lecture 35
CS101- Introduction to Computing- Lecture 35CS101- Introduction to Computing- Lecture 35
CS101- Introduction to Computing- Lecture 35Bilal Ahmed
 
Frsa
FrsaFrsa
Frsa_111
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyKimikazu Kato
 
C++ Code as Seen by a Hypercritical Reviewer
C++ Code as Seen by a Hypercritical ReviewerC++ Code as Seen by a Hypercritical Reviewer
C++ Code as Seen by a Hypercritical ReviewerAndrey Karpov
 

Similar to Write Python for Speed (20)

Advance data structure & algorithm
Advance data structure & algorithmAdvance data structure & algorithm
Advance data structure & algorithm
 
A scrupulous code review - 15 bugs in C++ code
A scrupulous code review - 15 bugs in C++ codeA scrupulous code review - 15 bugs in C++ code
A scrupulous code review - 15 bugs in C++ code
 
Simple, fast, and scalable torch7 tutorial
Simple, fast, and scalable torch7 tutorialSimple, fast, and scalable torch7 tutorial
Simple, fast, and scalable torch7 tutorial
 
Sparse Matrix and Polynomial
Sparse Matrix and PolynomialSparse Matrix and Polynomial
Sparse Matrix and Polynomial
 
I have written the code but cannot complete the assignment please help.pdf
I have written the code but cannot complete the assignment please help.pdfI have written the code but cannot complete the assignment please help.pdf
I have written the code but cannot complete the assignment please help.pdf
 
Gentle Introduction to Functional Programming
Gentle Introduction to Functional ProgrammingGentle Introduction to Functional Programming
Gentle Introduction to Functional Programming
 
Chainer-Compiler 動かしてみた
Chainer-Compiler 動かしてみたChainer-Compiler 動かしてみた
Chainer-Compiler 動かしてみた
 
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizationsEgor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
 
Parallel R in snow (english after 2nd slide)
Parallel R in snow (english after 2nd slide)Parallel R in snow (english after 2nd slide)
Parallel R in snow (english after 2nd slide)
 
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
 
Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++
 
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
 
COMPAPPABCA49085rFunrAP__Practical Number 9 & 10.docx
COMPAPPABCA49085rFunrAP__Practical Number 9 & 10.docxCOMPAPPABCA49085rFunrAP__Practical Number 9 & 10.docx
COMPAPPABCA49085rFunrAP__Practical Number 9 & 10.docx
 
C++11 - A Change in Style - v2.0
C++11 - A Change in Style - v2.0C++11 - A Change in Style - v2.0
C++11 - A Change in Style - v2.0
 
CS101- Introduction to Computing- Lecture 35
CS101- Introduction to Computing- Lecture 35CS101- Introduction to Computing- Lecture 35
CS101- Introduction to Computing- Lecture 35
 
Frsa
FrsaFrsa
Frsa
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPy
 
SCIPY-SYMPY.pdf
SCIPY-SYMPY.pdfSCIPY-SYMPY.pdf
SCIPY-SYMPY.pdf
 
C++ Code as Seen by a Hypercritical Reviewer
C++ Code as Seen by a Hypercritical ReviewerC++ Code as Seen by a Hypercritical Reviewer
C++ Code as Seen by a Hypercritical Reviewer
 
bobok
bobokbobok
bobok
 

More from Yung-Yu Chen

SimpleArray between Python and C++
SimpleArray between Python and C++SimpleArray between Python and C++
SimpleArray between Python and C++Yung-Yu Chen
 
Write code and find a job
Write code and find a jobWrite code and find a job
Write code and find a jobYung-Yu Chen
 
Notes about moving from python to c++ py contw 2020
Notes about moving from python to c++ py contw 2020Notes about moving from python to c++ py contw 2020
Notes about moving from python to c++ py contw 2020Yung-Yu Chen
 
On the necessity and inapplicability of python
On the necessity and inapplicability of pythonOn the necessity and inapplicability of python
On the necessity and inapplicability of pythonYung-Yu Chen
 
Take advantage of C++ from Python
Take advantage of C++ from PythonTake advantage of C++ from Python
Take advantage of C++ from PythonYung-Yu Chen
 
Start Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New RopeStart Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New RopeYung-Yu Chen
 
Harmonic Stack for Speed
Harmonic Stack for SpeedHarmonic Stack for Speed
Harmonic Stack for SpeedYung-Yu Chen
 
Your interactive computing
Your interactive computingYour interactive computing
Your interactive computingYung-Yu Chen
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering SoftwareYung-Yu Chen
 
Craftsmanship in Computational Work
Craftsmanship in Computational WorkCraftsmanship in Computational Work
Craftsmanship in Computational WorkYung-Yu Chen
 

More from Yung-Yu Chen (10)

SimpleArray between Python and C++
SimpleArray between Python and C++SimpleArray between Python and C++
SimpleArray between Python and C++
 
Write code and find a job
Write code and find a jobWrite code and find a job
Write code and find a job
 
Notes about moving from python to c++ py contw 2020
Notes about moving from python to c++ py contw 2020Notes about moving from python to c++ py contw 2020
Notes about moving from python to c++ py contw 2020
 
On the necessity and inapplicability of python
On the necessity and inapplicability of pythonOn the necessity and inapplicability of python
On the necessity and inapplicability of python
 
Take advantage of C++ from Python
Take advantage of C++ from PythonTake advantage of C++ from Python
Take advantage of C++ from Python
 
Start Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New RopeStart Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New Rope
 
Harmonic Stack for Speed
Harmonic Stack for SpeedHarmonic Stack for Speed
Harmonic Stack for Speed
 
Your interactive computing
Your interactive computingYour interactive computing
Your interactive computing
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering Software
 
Craftsmanship in Computational Work
Craftsmanship in Computational WorkCraftsmanship in Computational Work
Craftsmanship in Computational Work
 

Recently uploaded

Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 

Recently uploaded (20)

Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 

Write Python for Speed

  • 1. Write Python for Speed Yung-Yu Chen (yyc@sciwork.dev) October 2023 1
  • 2. speed is the king リニア中央新幹線 (Chuo Shinkansen) https://linear-chuo-shinkansen.jr-central.co.jp/about/design/ 500km/h high speed rail by 2038 "can't wait!" 🙂 what we are talking about is unparalleled speed 2
  • 3. numerical calculation needs speed u = 0 u = 0 u = 0 u(y) = sin(πy) i, j i, j + 1 i, j − 1 i + 1,j i − 1,j i + 1,j + 1 i + 1,j − 1 i − 1,j + 1 i − 1,j − 1 grid points computing domain (and solution) ∂2 u ∂x2 + ∂2 u ∂y2 = 0 (0 < x < 1; 0 < y < 1) u(0,y) = 0, u(1,y) = sin(πy) (0 ≤ y ≤ 1) u(x,0) = 0, u(x,1) = 0 (0 ≤ x ≤ 1) Equation for domain interior Boundary conditions un+1 (xi, yi) = un (xi+1, yj) + un (xi−1, yj) + un (xi, yj+1) + un (xi, yj−1) 4 Use the point-Jacobi method to march the Laplace equation 3
  • 4. Python is slow for it in range(1, nx-1): for jt in range(1, nx-1): un[it,jt] = (u[it+1,jt] + u[it-1,jt] + u[it,jt+1] + u[it,jt-1]) / 4 un[1:nx-1,1:nx-1] = (u[2:nx,1:nx-1] + u[0:nx-2,1:nx-1] + u[1:nx-1,2:nx] + u[1:nx-1,0:nx-2]) / 4 for (size_t it=1; it<nx-1; ++it) { for (size_t jt=1; jt<nx-1; ++jt) { un(it,jt) = (u(it+1,jt) + u(it-1,jt) + u(it,jt+1) + u(it,jt-1)) / 4; } } Point-Jacobi method Python nested loop Numpy array C++ nested loop 4.797s (1x) 0.055s (87x) 0.025s (192x) un+1 (xi, yi) = un (xi+1, yj) + un (xi−1, yj) + un (xi, yj+1) + un (xi, yj−1) 4 4
  • 5. example: hyperbolic PDEs 5 numerical simulations of conservation laws: ∂u ∂t + 3 ∑ k=1 ∂F(k) (u) ∂xk = 0 use case: stress waves in anisotropic solids use case: compressible fl ows
  • 6. 6 observe theorize simple setup software package wall of complexity prototype numerical analysis analytical solution mass production run a lot of code development u(x, y) = sinh(πx) sinh(π) sin(πy) development procedure always keep the problem (math or physics) in mind always make the code run
  • 7. hybrid architecture • everyone wants the simplicity of Python • HPC programmers included • but we must use C++ for speed • to be honest, machine code • write Python to drive the C++ code / the machine 7
  • 8. machine determines speed void calc_distance( size_t const n , double const * x , double const * y , double * r) { for (size_t i = 0 ; i < n ; ++i) { r[i] = std::sqrt(x[i]*x[i] + y[i]*y[i]); } } vmovupd ymm0, ymmword [rsi + r9*8] vmulpd ymm0, ymm0, ymm0 vmovupd ymm1, ymmword [rdx + r9*8] vmulpd ymm1, ymm1, ymm1 vaddpd ymm0, ymm0, ymm1 vsqrtpd ymm0, ymm0 movupd xmm0, xmmword [rsi + r8*8] mulpd xmm0, xmm0 movupd xmm1, xmmword [rdx + r8*8] mulpd xmm1, xmm1 addpd xmm1, xmm0 sqrtpd xmm0, xmm1 AVX: 256-bit-wide vectorization SSE: 128-bit-wide vectorization C++ code 8
  • 9. array: HPC fundamental ... ... ... regularly allocated bu ff er regularly allocated bu ff er regularly allocated bu ff er regularly accessing algorithm randomly accessing algorithm quoted from SemiAnalysis https://www.semianalysis.com/p/apple-m2-die-shot-and-architecture die shots of popular chips the data structure for cache optimization, data parallelism, SIMD, GPU, all the techniques of high-performance computing (HPC) 9
  • 10. array for zero-copy across Python and C++ Python app C++ app a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn share memory bu ff er across language ndarray C++ container Python app C++ app C++ container ndarray manage access a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn memory bu ff er shared across language 👍 bottom (C++) - up (Python) design Python app C++ app C++ container ndarray manage access a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn memory bu ff er shared across language top (Python) - down (C++) design 👎 Python-controlling lifecycle is bad for long-running process C++ allows fi ne-grained control of all resources 10
  • 11. quick prototype in Python: something new and fragile • thread pool for I/O? certainly Python • certainly not want to write complicated C++ to prototype • 64 threads on 32-core servers? • consider to move it to C++ • data-parallel code on top of the thread pool? • time to go with TBB (thread- building block) C++ library from _thread import allocate_lock, start_new_thread class ThreadPool(object): """Python prototype for I/O thread pool""" def __init__(self, nthread): # Worker callback self.func = None # Placeholders for managing data self.__threadids = [None] * nthread self.__threads = [None] * nthread self.__returns = [None] * nthread # Initialize thread managing data for it in range(nthread): mlck = allocate_lock(); mlck.acquire() wlck = allocate_lock(); wlck.acquire() tdata = [mlck, wlck, None, None] self.__threads[it] = tdata tid = start_new_thread(self.eventloop, (tdata,)) self.__threadids[it] = tid 11
  • 12. quick prototype in Python: something complex i, j i, j + 1 i, j − 1 i + 1,j i − 1,j i + 1,j + 1 i + 1,j − 1 i − 1,j + 1 i − 1,j − 1 di ff erence equation def solve_python_loop(): u = uoriginal.copy() # Input from outer scope un = u.copy() # Create the buffer for the next time step converged = False step = 0 # Outer loop. while not converged: step += 1 # Inner loops. One for x and the other for y. for it in range(1, nx-1): for jt in range(1, nx-1): un[it,jt] = (u[it+1,jt] + u[it-1,jt] + u[it,jt+1] + u[it,jt-1]) / 4 norm = np.abs(un-u).max() u[...] = un[...] converged = True if norm < 1.e-5 else False return u, step, norm 12 grid points python nested loop implementing point-Jacobi method u(xi, yj) = u(xi+1, yj) + u(xi−1, yj) + u(xi, yj+1) + u(xi, yj−1) 4 point-Jacobi method un+1 (xi, yi) = un (xi+1, yj) + un (xi−1, yj) + un (xi, yj+1) + un (xi, yj−1) 4
  • 13. production: many data many ways 13 fi t scattered points to a polynomial f(x) = a3x3 + a2x2 + a1x + a0 test many (like 1,000+) such data sets: def plot_poly_fitted(i): slct = (xdata>=i)&(xdata<(i+1)) sub_x = xdata[slct] sub_y = ydata[slct] poly = data_prep.fit_poly(sub_x, sub_y, 3) print(poly) poly = np.poly1d(poly) xp = np.linspace(sub_x.min(), sub_x.max(), 100) plt.plot(sub_x, sub_y, '.', xp, poly(xp), '-') plot_poly_fitted(10) (yeah, the data points seem to be too random to be represented by a polynomial) // The rank of the linear map is (order+1). modmesh::SimpleArray<double> matrix(std::vector<size_t>{order+1, order+1}); // Use the x coordinates to build the linear map for least-square // regression. for (size_t it=0; it<order+1; ++it) { for (size_t jt=0; jt<order+1; ++jt) { double & val = matrix(it, jt); val = 0; for (size_t kt=start; kt<stop; ++kt) { val += pow(xarr[kt], it+jt); } } } >>> with Timer(): >>> # Do the calculation for the 1000 groups of points. >>> polygroup = np.empty((1000, 3), dtype='float64') >>> for i in range(1000): >>> # Use numpy to build the point group. >>> slct = (xdata>=i)&(xdata<(i+1)) >>> sub_x = xdata[slct] >>> sub_y = ydata[slct] >>> polygroup[i,:] = data_prep.fit_poly(sub_x, sub_y, 2) >>> with Timer(): >>> # Using numpy to build the point groups takes a lot of time. >>> data_groups = [] >>> for i in range(1000): >>> slct = (xdata>=i)&(xdata<(i+1)) >>> data_groups.append((xdata[slct], ydata[slct])) >>> with Timer(): >>> # Fitting helper runtime is much less than building the point groups. >>> polygroup = np.empty((1000, 3), dtype='float64') >>> for it, (sub_x, sub_y) in enumerate(data_groups): >>> polygroup[it,:] = data_prep.fit_poly(sub_x, sub_y, 2) Wall time: 1.49671 s Wall time: 1.24653 s Wall time: 0.215859 s prepare data fi t polynomials
  • 14. problem of speedup 14 >>> with Timer(): >>> # Using numpy to build the point groups takes a lot of time. >>> data_groups = [] >>> for i in range(1000): >>> slct = (xdata>=i)&(xdata<(i+1)) >>> data_groups.append((xdata[slct], ydata[slct])) Wall time: 1.24653 s >>> with Timer(): >>> # Fitting helper runtime is much less than building the point groups. >>> polygroup = np.empty((1000, 3), dtype='float64') >>> for it, (sub_x, sub_y) in enumerate(data_groups): >>> polygroup[it,:] = data_prep.fit_poly(sub_x, sub_y, 2) Wall time: 0.215859 s >>> with Timer(): >>> rbatch = data_prep.fit_polys(xdata, ydata, 2) Wall time: 0.21058 s /** * This function calculates the least-square regression of multiple sets of * point clouds to the corresponding polynomial functions of a given order. */ modmesh::SimpleArray<double> fit_polys ( modmesh::SimpleArray<double> const & xarr , modmesh::SimpleArray<double> const & yarr , size_t order ) { size_t xmin = std::floor(*std::min_element(xarr.begin(), xarr.end())); size_t xmax = std::ceil(*std::max_element(xarr.begin(), xarr.end())); size_t ninterval = xmax - xmin; modmesh::SimpleArray<double> lhs(std::vector<size_t>{ninterval, order+1}); std::fill(lhs.begin(), lhs.end(), 0); // sentinel. size_t start=0; for (size_t it=0; it<xmax; ++it) { // NOTE: We take advantage of the intrinsic features of the input data // to determine the grouping. This is ad hoc and hard to maintain. We // play this trick to demonstrate a hackish way of performing numerical // calculation. size_t stop; for (stop=start; stop<xarr.size(); ++stop) { if (xarr[stop]>=it+1) { break; } } // Use the single polynomial helper function. auto sub_lhs = fit_poly(xarr, yarr, start, stop, order); for (size_t jt=0; jt<order+1; ++jt) { lhs(it, jt) = sub_lhs[jt]; } start = stop; } return lhs; } prepare data and loop in Python: prepare data and loop in C++: C++ wins so much, but we lose fl exibility! // NOTE: We take advantage of the intrinsic features of the input data // to determine the grouping. This is ad hoc and hard to maintain. We // play this trick to demonstrate a hackish way of performing numerical // calculation.
  • 15. data object: fl exible and fast 15 public: std::vector<Data> inner1(size_t start, size_t len) { std::vector<Data> ret; ret.reserve(len); inner2(start, len, ret); return ret; } private: void inner2(size_t start, size_t len, std::vector<Data> & ret) { for (size_t it=0; it < len; ++it) { Data data(start+it); ret.emplace_back(std::move(data)); } } public: void outer(size_t len) { result.reserve(len*(len+1)/2); for (size_t it=0; it < len; ++it) { // The output argument passed into // the private helper is a private // member datum. inner2(result.size(), it+1, result); } } Called when consumers want the sub-operation one by one, and make the code more testable. Called when batch operation is demanded. struct Accumulator { public: std::vector<Data> result; }; Caller does not see this private helper that takes an output argument. only use public data member for simple purposes class to pack data and logic
  • 16. data object: use case class StaticMesh { // shape data uint8_t m_ndim = 0; uint_type m_nnode = 0; ///< Number of nodes (interior). uint_type m_nface = 0; ///< Number of faces (interior). uint_type m_ncell = 0; ///< Number of cells (interior). uint_type m_nbound = 0; ///< Number of boundary faces. uint_type m_ngstnode = 0; ///< Number of ghost nodes. uint_type m_ngstface = 0; ///< Number of ghost faces. uint_type m_ngstcell = 0; ///< Number of ghost cells. // geometry arrays MM_DECL_StaticMesh_ARRAY(real_type, ndcrd); MM_DECL_StaticMesh_ARRAY(real_type, fccnd); MM_DECL_StaticMesh_ARRAY(real_type, fcnml); MM_DECL_StaticMesh_ARRAY(real_type, fcara); MM_DECL_StaticMesh_ARRAY(real_type, clcnd); MM_DECL_StaticMesh_ARRAY(real_type, clvol); // meta arrays MM_DECL_StaticMesh_ARRAY(int_type, fctpn); MM_DECL_StaticMesh_ARRAY(int_type, cltpn); MM_DECL_StaticMesh_ARRAY(int_type, clgrp); // connectivity arrays MM_DECL_StaticMesh_ARRAY(int_type, fcnds); MM_DECL_StaticMesh_ARRAY(int_type, fccls); MM_DECL_StaticMesh_ARRAY(int_type, clnds); MM_DECL_StaticMesh_ARRAY(int_type, clfcs); MM_DECL_StaticMesh_ARRAY(int_type, ednds); }; /* end class StaticMesh */ #define MM_DECL_StaticMesh_ARRAY(TYPE, NAME) public: SimpleArray<TYPE> const & NAME() const { return m_##NAME; } SimpleArray<TYPE> & NAME() { return m_##NAME; } template <typename... Args> TYPE const & NAME(Args... args) const { return m_##NAME(args...); } template <typename... Args> TYPE & NAME(Args... args) { return m_##NAME(args...); } private: SimpleArray<TYPE> m_##NAME namespace py = pybind11; (*this) // shape data .def_property_readonly("ndim", &wrapped_type::ndim) .def_property_readonly("nnode", &wrapped_type::nnode) .def_property_readonly("nface", &wrapped_type::nface) .def_property_readonly("ncell", &wrapped_type::ncell) .def_property_readonly("nbound", &wrapped_type::nbound) .def_property_readonly("ngstnode", &wrapped_type::ngstnode) .def_property_readonly("ngstface", &wrapped_type::ngstface) .def_property_readonly("ngstcell", &wrapped_type::ngstcell) #define MM_DECL_ARRAY(NAME) .expose_SimpleArray( #NAME, [](wrapped_type & self) -> decltype(auto) { return self.NAME(); }) (*this) // geometry arrays MM_DECL_ARRAY(ndcrd) MM_DECL_ARRAY(fccnd) MM_DECL_ARRAY(fcnml) MM_DECL_ARRAY(fcara) MM_DECL_ARRAY(clcnd) MM_DECL_ARRAY(clvol) // meta arrays MM_DECL_ARRAY(fctpn) MM_DECL_ARRAY(cltpn) MM_DECL_ARRAY(clgrp) // connectivity arrays MM_DECL_ARRAY(fcnds) MM_DECL_ARRAY(fccls) MM_DECL_ARRAY(clnds) MM_DECL_ARRAY(clfcs) MM_DECL_ARRAY(ednds); #undef MM_DECL_ARRAY # Construct the data object mh = modmesh.StaticMesh( ndim=3, nnode=4, nface=4, ncell=1) # Set the data mh.ndcrd.ndarray[:, :] = (0, 0, 0), (0, 1, 0), (-1, 1, 0), (0, 1, 1) mh.cltpn.ndarray[:] = modmesh.StaticMesh.TETRAHEDRON mh.clnds.ndarray[:, :5] = [(4, 0, 1, 2, 3)] # Calculate internal by the input data # to build up the object mh.build_interior() np.testing.assert_almost_equal( mh.fccnd, [[-0.3333333, 0.6666667, 0. ], [ 0. , 0.6666667, 0.3333333], [-0.3333333, 0.6666667, 0.3333333], [-0.3333333, 1. , 0.3333333]]) mesh shape information data arrays data arrays will be very large: gigabytes in memory use data in python C++ library pybind11 wrapper 16
  • 17. test in Python def test_2d_trivial_triangles(self): mh = modmesh.StaticMesh(ndim=2, nnode=4, nface=0, ncell=3) mh.ndcrd.ndarray[:, :] = (0, 0), (-1, -1), (1, -1), (0, 1) mh.cltpn.ndarray[:] = modmesh.StaticMesh.TRIANGLE mh.clnds.ndarray[:, :4] = (3, 0, 1, 2), (3, 0, 2, 3), (3, 0, 3, 1) self._check_shape(mh, ndim=2, nnode=4, nface=0, ncell=3, nbound=0, ngstnode=0, ngstface=0, ngstcell=0, nedge=0) # Test build interior data. mh.build_interior(_do_metric=False, _build_edge=False) self._check_shape(mh, ndim=2, nnode=4, nface=6, ncell=3, nbound=0, ngstnode=0, ngstface=0, ngstcell=0, nedge=0) mh.build_interior() # _do_metric=True, _build_edge=True self._check_shape(mh, ndim=2, nnode=4, nface=6, ncell=3, nbound=0, ngstnode=0, ngstface=0, ngstcell=0, nedge=6) np.testing.assert_almost_equal( mh.fccnd, [[-0.5, -0.5], [0.0, -1.0], [0.5, -0.5], [0.5, 0.0], [0.0, 0.5], [-0.5, 0.0]]) np.testing.assert_almost_equal( mh.fcnml, [[-0.7071068, 0.7071068], [0.0, -1.0], [0.7071068, 0.7071068], [0.8944272, 0.4472136], [-1.0, -0.0], [-0.8944272, 0.4472136]]) np.testing.assert_almost_equal( mh.fcara, [1.4142136, 2.0, 1.4142136, 2.236068, 1.0, 2.236068]) np.testing.assert_almost_equal( mh.clcnd, [[0.0, -0.6666667], [0.3333333, 0.0], [-0.3333333, 0.0]]) np.testing.assert_almost_equal( mh.clvol, [1.0, 0.5, 0.5]) 17 namespace py = pybind11; (*this) // shape data .def_property_readonly("ndim", &wrapped_type::ndim) .def_property_readonly("nnode", &wrapped_type::nnode) .def_property_readonly("nface", &wrapped_type::nface) .def_property_readonly("ncell", &wrapped_type::ncell) .def_property_readonly("nbound", &wrapped_type::nbound) .def_property_readonly("ngstnode", &wrapped_type::ngstnode) .def_property_readonly("ngstface", &wrapped_type::ngstface) .def_property_readonly("ngstcell", &wrapped_type::ngstcell) #define MM_DECL_ARRAY(NAME) .expose_SimpleArray( #NAME, [](wrapped_type & self) -> decltype(auto){ return self.NAME(); }) (*this) // geometry arrays MM_DECL_ARRAY(ndcrd) MM_DECL_ARRAY(fccnd) MM_DECL_ARRAY(fcnml) MM_DECL_ARRAY(fcara) MM_DECL_ARRAY(clcnd) MM_DECL_ARRAY(clvol) // meta arrays MM_DECL_ARRAY(fctpn) MM_DECL_ARRAY(cltpn) MM_DECL_ARRAY(clgrp) // connectivity arrays MM_DECL_ARRAY(fcnds) MM_DECL_ARRAY(fccls) MM_DECL_ARRAY(clnds) MM_DECL_ARRAY(clfcs) MM_DECL_ARRAY(ednds); #undef MM_DECL_ARRAY wrapping to Python enables fast testing development may be done as writing tests! # Construct the data object mh = modmesh.StaticMesh( ndim=3, nnode=4, nface=4, ncell=1) # Set the data mh.ndcrd.ndarray[:, :] = (0, 0, 0), (0, 1, 0), (-1, 1, 0), (0, 1, 1) mh.cltpn.ndarray[:] = modmesh.StaticMesh.TETRAHEDRON mh.clnds.ndarray[:, :5] = [(4, 0, 1, 2, 3)] # Calculate internal by the input data # to build up the object mh.build_interior() np.testing.assert_almost_equal( mh.fccnd, [[-0.3333333, 0.6666667, 0. ], [ 0. , 0.6666667, 0.3333333], [-0.3333333, 0.6666667, 0.3333333], [-0.3333333, 1. , 0.3333333]])
  • 18. development fl ow Python prototype write C++ test C++ wrap to Python Python app test Python C++ data object wrap to Python test Python Python app Python-centric C++-centric class class class class function function class class need to be familiar with C++ 🙁 happy with Python 🙂 gain from the pain 18 many C++ helper classes: • more features • better performance note: a lot of low-level data structure happens here
  • 19. develop SimpleArray in C++ SimpleArray std::vector SimpleArray is fi xed size 👍 • only allocate memory on construction std::vector is variable size 👎 • bu ff er may be invalidated • implicit memory allocation (reallocation) multi-dimensional access 👍 operator() one-dimensional access 👎 operator[] 19 C++ container ndarray manage access a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn memory bu ff er make it yourself get it free from STL
  • 20. buffer ownership 20 ownership know when and where to allocate and deallocate memory (and "resources")
  • 21. buffer ownership 21 SimpleArray a meta data buffer ptr data bu ff er
  • 22. buffer ownership 22 SimpleArray a meta data buffer ptr data bu ff er SimpleArray b meta data buffer ptr copy
  • 23. buffer ownership 23 SimpleArray a meta data buffer ptr data bu ff er SimpleArray b meta data buffer ptr copy
  • 24. buffer ownership 24 SimpleArray a meta data buffer ptr data bu ff er SimpleArray b meta data buffer ptr
  • 25. buffer ownership 25 SimpleArray a meta data buffer ptr data bu ff er SimpleArray b meta data buffer ptr
  • 26. buffer interface and numpy ndarray template <typename S> std::enable_if_t<is_simple_array_v<S>, pybind11::array> to_ndarray(S && sarr) { namespace py = pybind11; using T = typename std::remove_reference_t<S>::value_type; std::vector<size_t> const shape(sarr.shape().begin(), sarr.shape().end()); std::vector<size_t> stride(sarr.stride().begin(), sarr.stride().end()); for (size_t & v : stride) { v *= sarr.itemsize(); } return py::array( /* Numpy dtype */ py::detail::npy_format_descriptor<T>::dtype(), /* Buffer dimensions */ shape, /* Strides (in bytes) for each index */ stride, /* Pointer to buffer */ sarr.data(), /* Owning Python object */ py::cast(sarr.buffer().shared_from_this())); } template <typename T> static SimpleArray<T> makeSimpleArray( pybind11::array_t<T> & ndarr) { typename SimpleArray<T>::shape_type shape; for (ssize_t i = 0; i < ndarr.ndim(); ++i) { shape.push_back(ndarr.shape(i)); } std::shared_ptr<ConcreteBuffer> const buffer = ConcreteBuffer::construct( ndarr.nbytes(), ndarr.mutable_data(), std::make_unique<ConcreteBufferNdarrayRemover>( ndarr)); return SimpleArray<T>(shape, buffer); } take SimpleArray bu ff er to make ndarray take ndarray bu ff er to make SimpleArray 26
  • 27. conclusions 27 • Python-C++ hybrid system: utmost speed and fl exibility • Python for prototyping and testing • use data object to organize code in C++ • design zero-copy interface • what you get: e ffi cient, data-parallelizable code fully driven by Python