Masashi Shibata
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems
Three MLOps Case Studies
Case studies of applying ML technologies to our products:
1. How to build a memory-efficient Python binding using Cython and the NumPy C-API
2. Implementing a transfer learning method for hyperparameter optimization
3. Fixing a complex bug in a WebSocket server: understanding green threads and how WSGI works

Case Study 1: Accelerating a prediction server and writing our own memory-efficient Python binding
Dynalyst
● An advertising product (DSP: demand-side platform)
● We use FFM for CVR (conversion rate) prediction.
Field-aware Factorization Machines (FFM)
https://www.csie.ntu.edu.tw/~cjlin/papers/ffm.pdf
● LIBFFM is written in C++ and provides a command-line interface.
● We added some new features to LIBFFM to improve its performance.
● Repository: https://github.com/ycjuan/libffm
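To make the model concrete, here is a minimal NumPy sketch of the FFM prediction for one instance, following the formulation in the FFM paper linked above: for every pair of active features, feature j1 uses its latent vector for the field of j2, and vice versa. The weight layout W[feature, field, latent] with shape (n, m, k) mirrors the (n, m, k) shape used for LIBFFM's weights later in this talk. The function name and the (field, feature, value) input format are assumptions, and LIBFFM's optional instance-wise normalization is omitted.

```python
import math
import numpy as np

def ffm_predict(W: np.ndarray, instance) -> float:
    """Return the predicted probability (e.g. CVR) for one instance.

    W: weights of shape (n, m, k) = (features, fields, latent dimensions).
    instance: list of (field, feature, value) triples for the non-zero features.
    """
    phi = 0.0
    for a in range(len(instance)):
        f1, j1, v1 = instance[a]
        for b in range(a + 1, len(instance)):
            f2, j2, v2 = instance[b]
            # Field-aware interaction: <w_{j1, f2}, w_{j2, f1}> * x_{j1} * x_{j2}
            phi += float(np.dot(W[j1, f2], W[j2, f1])) * v1 * v2
    # The logistic link turns the score into a probability
    return 1.0 / (1.0 + math.exp(-phi))

W = np.ones((3, 2, 2), dtype=np.float32)  # hypothetical tiny model
print(ffm_predict(W, [(0, 0, 1.0), (1, 2, 1.0)]))  # sigmoid(2.0)
```

The double loop over feature pairs is what makes FFM prediction cost grow quadratically with the number of active features, which is one reason a compiled implementation matters for a latency-sensitive server.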
Feedback Shift Correction
● We propose an importance weighting approach to address the feedback shift.
● In an online experiment, our method improved sales by 30%.
● We modified the loss function of LIBFFM to implement it.
Paper: A Feedback Shift Correction in Predicting Conversion Rates under Delayed Feedback
https://dl.acm.org/doi/10.1145/3366423.3380032
[Figure: timeline of clicks, conversions, and model training. Because a conversion can happen long after its click, some instances that are actually positive are still labeled as negative during the training period.]
→ We need to implement our own Python binding for LIBFFM.
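Importance weighting enters training through the loss. The sketch below shows a generic importance-weighted log loss, in which each instance's loss term is multiplied by a weight. How the paper estimates those weights under delayed feedback is not shown on the slide, so here the weights are simply inputs; the function name is hypothetical.

```python
import math

def weighted_logloss(labels, probs, weights, eps=1e-15):
    """Average of w_i * logloss(y_i, p_i) over all instances."""
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# With all weights equal to 1, this reduces to the ordinary log loss.
print(weighted_logloss([1, 0], [0.5, 0.5], [1.0, 1.0]))  # log(2)
```

In a C++ trainer such as LIBFFM, the same weight would also scale the gradient of each instance, since the gradient of w_i * loss_i is w_i times the gradient of loss_i.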
Performance Tuning
Requirements for the high-performance prediction server:
● Throughput: a few hundred thousand requests per second (rps)
● Latency: within 100 ms
Challenges
ML pipeline
● Implement our own Python binding for LIBFFM (C++)
High-performance prediction server
● Throughput: a few hundred thousand rps
● Latency: within 100 ms
Accelerating the prediction server using Cython
[Figure: the ML pipeline and the prediction server (gRPC)]

Cython
An optimising static compiler for both the Python programming language and the extended Cython programming language.

In [1]: %load_ext cython

In [2]: def py_fibonacci(n):
   ...:     a, b = 0.0, 1.0
   ...:     for i in range(n):
   ...:         a, b = a + b, a
   ...:     return a

In [3]: %%cython
   ...: def cy_fibonacci(int n):
   ...:     cdef int i
   ...:     cdef double a = 0.0, b = 1.0
   ...:     for i in range(n):
   ...:         a, b = a + b, a
   ...:     return a

In [4]: %timeit py_fibonacci(10)
582 ns ± 3.72 ns per loop (...)

In [5]: %timeit cy_fibonacci(10)
43.4 ns ± 0.14 ns per loop (...)

In this micro-benchmark, the Cython version is about 13x faster (582 ns vs. 43.4 ns).
Releasing the GIL
GIL (Global Interpreter Lock)
● Only the native thread that holds the GIL can execute Python bytecode.
● Even with multiple threads, Python bytecode is not executed in parallel across processor cores.
● The GIL can be explicitly released while calling a pure C function. ※1

cdef double fibonacci_nogil(int n) nogil:
    # Pure C function: it does not touch any Python objects
    ...

def fibonacci(kwargs):
    cdef double a
    cdef int n
    n = kwargs.get('n')
    # Release the GIL while the pure C function runs
    with nogil:
        a = fibonacci_nogil(n)
    return a

In Cython's annotated output (cython -a), yellow lines interact with the Python/C API, while fibonacci_nogil is a pure C function.
※1 At the C level, this calls the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros.
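The effect of releasing the GIL can be observed from pure Python with a C function that already releases it: time.sleep() is implemented in C and drops the GIL while it waits, so threads calling it overlap. This only demonstrates overlapping waits; CPU-bound parallelism additionally requires the C code itself to run without the GIL, as in the Cython example above. The helper names are hypothetical.

```python
import threading
import time

def blocking_c_call(seconds):
    # time.sleep() is implemented in C and releases the GIL while waiting,
    # much like a Cython `with nogil:` block around a pure C function.
    time.sleep(seconds)

def run_in_threads(n_threads, seconds):
    threads = [threading.Thread(target=blocking_c_call, args=(seconds,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_in_threads(4, 0.2)
# Expected to be close to 0.2 s rather than 4 * 0.2 = 0.8 s
print(f"{elapsed:.2f} s")
```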
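Cython Compiler Directives
These directives control checks that Cython inserts into the generated C code to preserve Python semantics:
● cdivision: ZeroDivisionError check and Python-style division semantics; cdivision=True removes them
● boundscheck: IndexError check on indexing; boundscheck=False removes it
● wraparound: negative indexing support; wraparound=False removes it
Example: if size is zero, Python semantics require raising a ZeroDivisionError, so Cython inserts a zero check unless cdivision is enabled.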
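The semantic difference that cdivision toggles can be reproduced in plain Python. Python's // and % floor toward negative infinity and raise ZeroDivisionError, while C's / and % on integers truncate toward zero; with cdivision=False, Cython emits extra code to keep Python's behavior. The helper names below are hypothetical.

```python
def python_divmod(a: int, b: int):
    """Python semantics (kept by Cython when cdivision=False):
    floor division, and ZeroDivisionError when b == 0."""
    return a // b, a % b

def c_divmod(a: int, b: int):
    """Emulates C semantics (what cdivision=True compiles to):
    the quotient truncates toward zero and the remainder takes the
    sign of the dividend. In real C, b == 0 is undefined behavior."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q, a - b * q

print(python_divmod(-7, 2))  # (-4, 1)
print(c_divmod(-7, 2))       # (-3, -1)
```

The results only differ when an operand is negative, which is why enabling cdivision is usually safe for non-negative indices and sizes, provided the divisor can never be zero.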
Results
Cython improved both latency and throughput:
● FFM prediction takes 10% of the time of the original code.
● Latency is 60% of what it was before.
● The server can handle 1.35x as many requests per second as before.
Building a memory-efficient Python binding using Cython and the NumPy C-API
Wrapping LIBFFM
1. Declare C++ functions and structs in a cdef extern from block
2. Allocate C++ structs with PyMem_Malloc ※1
3. Call the C++ functions
4. Release the memory with PyMem_Free

# cython: language_level=3
from cpython.mem cimport PyMem_Malloc, PyMem_Free

cdef extern from "ffm.h" namespace "ffm" nogil:
    # (declarations of ffm_data, ffm_model, etc. are omitted)
    struct ffm_problem:
        ffm_data* data
    ffm_model *ffm_train_with_validation(...)

# `except NULL` lets the MemoryError propagate to the caller
cdef ffm_problem* make_ffm_prob(...) except NULL:
    cdef ffm_problem* prob
    prob = <ffm_problem *> PyMem_Malloc(sizeof(ffm_problem))
    if prob is NULL:
        raise MemoryError("Insufficient memory for prob")
    prob.data = ...
    return prob

cdef void free_ffm_prob(ffm_problem* prob):
    # Also free prob.data here if it was allocated by us (omitted)
    PyMem_Free(prob)

def train(...):
    # Allocate once; the finally block guarantees the memory is released
    cdef ffm_problem* tr_ptr = make_ffm_prob(tr[0], tr[1])
    try:
        model_ptr = ffm_train_with_validation(tr_ptr, ...)
    finally:
        free_ffm_prob(tr_ptr)
    return weights, best_iteration

※1 malloc (from libc.stdlib cimport malloc) can also be used, but PyMem_Malloc allocates memory from CPython's own allocator (pymalloc for small blocks), which can reduce the number of system calls. It is especially efficient for small allocations.
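Memory Management
[Figure: memory management across the Python, Cython, and C++ (LIBFFM) layers]
1. Python: train an FFM model with model = ffm.train()
2. Cython: call the C++ function ffm_train_with_validation()
3. C++ (LIBFFM): allocate memory for the weights with ptr = malloc(n*m*k*sizeof(float))
4. Cython: wrap the weights array with NumPy (using the NumPy C-API)
5. Python: the Python object is instantiated and returned from model = ffm.train()
6. Python: release the Python object with del model
7. Cython: release the weights memory with free(ptr)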
Reference Counting
● CPython's memory management is based on reference counting.
● The memory of the C++ array should be released at the same time as the NumPy array that wraps it is destroyed.
● Note that the reference count is displayed as 2 because passing the object to sys.getrefcount() temporarily adds a reference.

import ffm
import sys

def main():
    train_data = ffm.Dataset(...)
    valid_data = ffm.Dataset(...)
    # 'model._weights' is the C++ weights array.
    # We need to deallocate it in conjunction
    # with Python's memory management.
    model = ffm.train(train_data, valid_data)
    print(sys.getrefcount(model._weights))
    # -> 2
    del model
    # -> 'model._weights' is deallocated.
    print("Done")
    # -> Done
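In pure Python, the same "release the buffer when the last reference goes away" behavior can be observed with weakref.finalize. This is only an analogy for the Cython finalizer class (_weights_finalizer) used in the NumPy C-API part; the class and function names below are hypothetical.

```python
import weakref

class WeightsBuffer:
    """Hypothetical stand-in for the C++ weights array owned by a model."""
    def __init__(self, size):
        self.data = [0.0] * size

freed = []

def train():
    weights = WeightsBuffer(8)
    # Register a cleanup callback; it runs when `weights` is destroyed,
    # which in CPython happens as soon as its reference count drops to zero.
    weakref.finalize(weights, freed.append, "weights freed")
    return weights

model = train()
print(freed)  # []
del model     # last reference dropped: the finalizer runs immediately in CPython
print(freed)  # ['weights freed']
```

The immediate cleanup relies on CPython's reference counting; other Python implementations may run finalizers later, at garbage-collection time.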
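NumPy C-API
● Release the memory buffer of the C++ array with libc.stdlib.free().
● PyArray_SimpleNewFromData: wraps a C-contiguous array with NumPy, given the data pointer, shape, and type.
● PyArray_SetBaseObject: sets the base object that owns the NumPy array's data (here, a finalizer object that frees model_ptr.W). Cython exposes it as cnp.set_array_base().

cimport numpy as cnp
from libc.stdlib cimport free

cnp.import_array()  # required before calling NumPy C-API functions

cdef class _weights_finalizer:
    cdef void *_data

    def __dealloc__(self):
        # Runs when the NumPy array (the only owner of this object) is destroyed
        if self._data is not NULL:
            free(self._data)

cdef object _train(...):
    cdef:
        cnp.ndarray arr
        cnp.npy_intp shape[3]
        ffm_model* model_ptr
        _weights_finalizer f = _weights_finalizer()
    model_ptr = ffm_train_with_validation(...)
    shape[0] = model_ptr.n
    shape[1] = model_ptr.m
    shape[2] = model_ptr.k
    # Wrap the FFM weights (model_ptr.W) in a NumPy array without copying
    arr = cnp.PyArray_SimpleNewFromData(
        3, shape, cnp.NPY_FLOAT32, <void*> model_ptr.W)
    # Hand ownership of W to the finalizer and make it the array's base object
    f._data = <void*> model_ptr.W
    cnp.set_array_base(arr, f)
    # Free only the model struct; W is now released by the finalizer
    free(model_ptr)
    return arr, best_iteration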
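A pure-Python analogue of wrapping an existing C buffer without copying: np.ctypeslib.as_array exposes a ctypes float buffer as a NumPy array that shares its memory and keeps the buffer alive through the array's base, much like set_array_base above. The sizes are hypothetical, and the ctypes buffer merely stands in for LIBFFM's malloc'ed weights.

```python
import ctypes
import numpy as np

n, m, k = 3, 2, 4  # hypothetical: features, fields, latent dimensions

# A C float buffer standing in for malloc(n * m * k * sizeof(float)).
buf = (ctypes.c_float * (n * m * k))()

# Wrap it as an (n, m, k) float32 array; no data is copied.
weights = np.ctypeslib.as_array(buf).reshape(n, m, k)

weights[0, 0, 0] = 1.5  # write through the NumPy view...
print(buf[0])           # ...and the C buffer sees it: 1.5
```

Avoiding the copy matters here because the weights array holds n * m * k floats, which can be large for a model with many features.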
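Summary
● The accuracy of the ML model directly affects sales
○ Addressed the delayed conversion problem with a causal inference method
○ Safely managed array memory without copying data
● Heavy traffic and strict latency requirements (within 100 ms)
○ Accelerated prediction with Cython
○ 1.35x throughput, latency reduced to 60%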
Case Study 2: Implementing a transfer learning method for hyperparameter optimization
Situation
[Figure: weekly pipeline. Each run fetches the latest training data, runs HPO in the ML pipeline, and outputs the best hyperparameters; one week later, the same steps run again.]
Our ML pipeline is triggered weekly and optimizes the hyperparameters on the new dataset.
Challenges
[Figure: the same weekly pipeline; each run produces HPO results, and the next run one week later starts from scratch.]
How can we exploit the previous optimization history?
Optuna + MLflow
Optuna
A Python library for hyperparameter optimization.
● Define-by-Run style API
● Support for various state-of-the-art algorithms
● Pluggable storage backends
● Easy distributed optimization
● Web dashboard
https://github.com/optuna/optuna

import optuna
import sklearn.metrics
import sklearn.svm
from sklearn.ensemble import RandomForestRegressor

def objective(trial):
    regressor_name = trial.suggest_categorical(
        'regressor', ['SVR', 'RandomForest']
    )
    if regressor_name == 'SVR':
        svr_c = trial.suggest_float('svr_c', 1e-10, 1e10, log=True)
        regressor_obj = sklearn.svm.SVR(C=svr_c)
    else:
        rf_max_depth = trial.suggest_int('rf_max_depth', 2, 32)
        regressor_obj = RandomForestRegressor(max_depth=rf_max_depth)

    X_train, X_val, y_train, y_val = ...
    regressor_obj.fit(X_train, y_train)
    y_pred = regressor_obj.predict(X_val)
    return sklearn.metrics.mean_squared_error(y_val, y_pred)

study = optuna.create_study()  # minimizes the objective by default
study.optimize(objective, n_trials=100)
Choosing an Algorithm
Algorithms that can take dependencies between hyperparameters into account ※1:
● Multivariate TPE
● CMA-ES
● Gaussian-process-based Bayesian optimization

※1 Univariate TPE, Optuna's default algorithm, does not take hyperparameter dependencies into account.
※2 The figure on this slide is taken from http://proceedings.mlr.press/v80/falkner18a/falkner18a-supp.pdf

def objective(trial):
    x = trial.suggest_float('x', -10, 10)
    y = trial.suggest_float('y', -10, 10)
    v1 = (x - 5) ** 2 + (y - 5) ** 2
    v2 = (x + 5) ** 2 + (y + 5) ** 2
    return min(v1, v2)
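Why dependencies matter for this objective: good values of x cluster around both 5 and -5, and so do good values of y. A sampler that models each hyperparameter independently can therefore pair x near 5 with y near -5, which lands between the two modes. Evaluating the same function directly (renamed objective_xy here, with plain arguments instead of a trial) shows the gap.

```python
def objective_xy(x: float, y: float) -> float:
    # Same function as the objective above, without the Optuna trial.
    v1 = (x - 5) ** 2 + (y - 5) ** 2
    v2 = (x + 5) ** 2 + (y + 5) ** 2
    return min(v1, v2)

# Both modes are optimal...
print(objective_xy(5, 5), objective_xy(-5, -5))  # 0 0
# ...but mixing coordinates from different modes is poor.
print(objective_xy(5, -5))                        # 100
```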
CMA-ES (Covariance Matrix Adaptation Evolution Strategy)
● One of the most promising methods for black-box optimization ※1
● I implemented CMA-ES (https://github.com/CyberAgentAILab/cmaes) and its Optuna sampler. See the post on the official Optuna blog: https://medium.com/optuna/introduction-to-cma-es-sampler-ee68194c8f88

※1 N. Hansen, The CMA Evolution Strategy: A Tutorial. arXiv:1604.00772, 2016.
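For readers who want to see the moving parts, below is a compact CMA-ES written from the default parameter settings in Hansen's tutorial cited above: sample candidates from N(m, σ²C), recombine the best half into a new mean, then adapt σ with the conjugate evolution path and C with rank-one and rank-μ updates. It is a simplified textbook sketch (positive weights only, no restarts or bound handling), not the code of the cmaes library or of Optuna's sampler; the function name and signature are hypothetical.

```python
import numpy as np

def cma_es(f, mean, sigma, n_generations=200, seed=0):
    """Minimize f: R^n -> R with a basic CMA-ES; returns (best_x, best_f)."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    n = mean.size
    lam = 4 + int(3 * np.log(n))          # population size (lambda)
    mu = lam // 2                         # number of selected parents
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                          # positive recombination weights
    mu_eff = 1.0 / np.sum(w ** 2)

    # Default learning rates from Hansen's tutorial
    c_sigma = (mu_eff + 2) / (n + mu_eff + 5)
    d_sigma = 1 + 2 * max(0.0, np.sqrt((mu_eff - 1) / (n + 1)) - 1) + c_sigma
    c_c = (4 + mu_eff / n) / (n + 4 + 2 * mu_eff / n)
    c_1 = 2 / ((n + 1.3) ** 2 + mu_eff)
    c_mu = min(1 - c_1, 2 * (mu_eff - 2 + 1 / mu_eff) / ((n + 2) ** 2 + mu_eff))
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n ** 2))  # E||N(0, I)||

    p_sigma, p_c, C = np.zeros(n), np.zeros(n), np.eye(n)
    best_x, best_f = mean.copy(), f(mean)
    for g in range(n_generations):
        # Eigendecomposition C = B diag(D^2) B^T
        eigvals, B = np.linalg.eigh(C)
        D = np.sqrt(np.maximum(eigvals, 1e-30))
        # Sample y_k = B D z_k ~ N(0, C) and x_k = m + sigma * y_k
        y = (rng.standard_normal((lam, n)) * D) @ B.T
        x = mean + sigma * y
        fx = np.array([f(xi) for xi in x])
        order = np.argsort(fx)
        if fx[order[0]] < best_f:
            best_x, best_f = x[order[0]].copy(), fx[order[0]]

        # Recombination: move the mean toward the best mu samples
        y_sel = y[order[:mu]]
        y_w = w @ y_sel
        mean = mean + sigma * y_w

        # Evolution paths; C^{-1/2} y_w = B diag(1/D) B^T y_w
        p_sigma = (1 - c_sigma) * p_sigma + np.sqrt(
            c_sigma * (2 - c_sigma) * mu_eff) * (B @ ((B.T @ y_w) / D))
        norm_ps = np.linalg.norm(p_sigma)
        h_sigma = float(norm_ps / np.sqrt(1 - (1 - c_sigma) ** (2 * (g + 1)))
                        < (1.4 + 2 / (n + 1)) * chi_n)
        p_c = (1 - c_c) * p_c + h_sigma * np.sqrt(c_c * (2 - c_c) * mu_eff) * y_w

        # Covariance update: rank-one (from p_c) plus rank-mu (selected steps)
        rank_one = np.outer(p_c, p_c) + (1 - h_sigma) * c_c * (2 - c_c) * C
        rank_mu = (y_sel.T * w) @ y_sel
        C = (1 - c_1 - c_mu) * C + c_1 * rank_one + c_mu * rank_mu
        C = (C + C.T) / 2                 # keep C numerically symmetric

        # Cumulative step-size adaptation
        sigma *= np.exp((c_sigma / d_sigma) * (norm_ps / chi_n - 1))
    return best_x, best_f

best_x, best_f = cma_es(lambda v: float(np.sum(v ** 2)), mean=[3.0, 3.0], sigma=1.0)
print(best_x, best_f)
```

The covariance matrix C is what lets CMA-ES capture dependencies between parameters, which is why it appears in the list of dependency-aware algorithms.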
Warm Starting CMA-ES
Transfers prior knowledge from similar HPO tasks.
● Proposed by Masahiro Nomura, a member of CyberAgent AI Lab
● Accepted at AAAI 2021
● Supported since Optuna v2.6.0 (https://github.com/optuna/optuna/releases/tag/v2.6.0)

import optuna
from optuna.samplers import CmaEsSampler

# Get the previous optimization history from an SQLite3 DB
source_study = optuna.load_study(
    storage="sqlite:///source-db.sqlite3",
    study_name="...",
)
source_trials = source_study.trials

# Run hyperparameter optimization, warm-started with the source trials
study = optuna.create_study(
    sampler=CmaEsSampler(source_trials=source_trials),
    storage="sqlite:///db.sqlite3",
    study_name="...",
)
study.optimize(objective, n_trials=20)
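The idea behind the warm start can be sketched as follows, under the assumption (based on my reading of the Warm Starting CMA-ES approach, not on Optuna's code) that the promising region of the source task is modeled as a mixture of Gaussians N(x_i, α²I) centered on the top-γ source solutions. The mixture is moment-matched to a single Gaussian, which is then split into CMA-ES's initial mean, step size σ, and covariance C; in this sketch σ is chosen so that det(C) = 1. Parameters are assumed to be scaled to [0, 1], and all names are hypothetical.

```python
import numpy as np

def warm_start_params(source_solutions, gamma=0.1, alpha=0.1):
    """Derive an initial (mean, sigma, cov) for CMA-ES from a source task.

    source_solutions: list of (x, value) pairs; lower values are better.
    gamma: fraction of the best source solutions to keep.
    alpha: std of the Gaussian placed on each kept solution.
    """
    ranked = sorted(source_solutions, key=lambda t: t[1])
    n_top = max(1, int(len(ranked) * gamma))
    top = np.array([x for x, _ in ranked[:n_top]], dtype=float)
    dim = top.shape[1]

    # Moment-match the mixture (1/N) * sum_i N(x_i, alpha^2 I) with one Gaussian.
    mean = top.mean(axis=0)
    centered = top - mean
    Sigma = alpha ** 2 * np.eye(dim) + centered.T @ centered / n_top

    # Split Sigma = sigma^2 * cov so that det(cov) = 1.
    det = np.linalg.det(Sigma)
    sigma = det ** (1.0 / (2 * dim))
    cov = Sigma / det ** (1.0 / dim)
    return mean, sigma, cov

rng = np.random.default_rng(0)
# Hypothetical source history: 10 good trials at (0.2, 0.8), 90 worse random ones.
source = [(np.array([0.2, 0.8]), 0.0) for _ in range(10)]
source += [(rng.uniform(0, 1, size=2), 1.0) for _ in range(90)]
mean, sigma, cov = warm_start_params(source)
print(mean, sigma)  # mean close to (0.2, 0.8), sigma close to 0.1
```

Compared with starting from the center of the search space with a large σ, this concentrates the first generations on the region that worked well last week.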
MLflow
A platform for managing the ML lifecycle.
● Collect metrics, parameters, and artifacts
● Version trained models

import mlflow
from mlflow.tracking import MlflowClient

# Connect to an Experiment
mlflow.set_experiment("train_foo_model")
# Start a new MLflow Run in the Experiment
with mlflow.start_run(run_name="...") as run:
    model = train(...)
    # Register the trained model
    # (assumes the model was logged to this Run under the artifact path "model")
    model_uri = f"runs:/{run.info.run_id}/model"
    mv = mlflow.register_model(model_uri, model_name)
    MlflowClient().transition_model_version_stage(
        name=model_name, version=mv.version,
        stage="Production"
    )
    # Save parameters (key-value style), e.g. a dict of hyperparameters
    mlflow.log_params(params)
    # Save metrics (key-value style)
    mlflow.log_metric("auc", auc)
    mlflow.log_metric("logloss", log_loss)
    # Save artifacts
    mlflow.log_artifacts(dir_name)

Terms of MLflow
1. Run: a single execution
2. Experiment: a group of Runs
Exploiting previous HPO results
[Figure: each week, the ML pipeline fetches the latest data, runs Optuna, and stores the optimization history as an MLflow artifact; one week later, the next run repeats the process and can reuse that stored history.]
Integrating Optuna with MLflow
1. Retrieve the source trials for Warm Starting CMA-ES.
2. Evaluate the default hyperparameters.
3. Collect HPO metrics.
4. Save the Optuna trials (SQLite3 file) as MLflow artifacts.

import mlflow
import optuna
from optuna.samplers import CmaEsSampler

mlflow.set_experiment("train_foo_model")
with mlflow.start_run(run_name="...") as run:
    # Retrieve source trials for Warm Starting CMA-ES
    source_trials = ...
    sampler = CmaEsSampler(source_trials=source_trials)
    study = optuna.create_study(
        sampler=sampler,
        storage=f"sqlite:///{dir_name}/{optuna_storage_filename}",
        direction="maximize",  # the objective returns AUC
    )
    # Enqueue XGBoost's default hyperparameters as the first trial, so the
    # best result is at least as good as the defaults.
    study.enqueue_trial({"alpha": 0.0})  # plus the other defaults (omitted)
    study.optimize(optuna_objective, n_trials=20)

    # Collect metrics of HPO
    mlflow.log_params(study.best_params)
    mlflow.log_metric("default_trial_auc", study.trials[0].value)
    mlflow.log_metric("best_trial_auc", study.best_value)
    # Set a tag to detect search space changes
    mlflow.set_tag("optuna_objective_ver", optuna_objective_ver)
    # Save Optuna trials (SQLite3 file) in MLflow Artifacts
    mlflow.log_artifacts(dir_name)
Retrieving previous executions
1. Get the model information from the MLflow Model Registry.
2. Get the Run ID from the model information.
3. Download the SQLite3 file from the Run's artifacts.

from mlflow import exceptions as mlflow_exceptions
from mlflow.tracking import MlflowClient
from optuna.storages import RDBStorage

def load_optuna_source_storage():
    client = MlflowClient()
    try:
        model_infos = client.get_latest_versions(
            model_name, stages=["Production"])
    except mlflow_exceptions.RestException as e:
        if e.error_code == "RESOURCE_DOES_NOT_EXIST":
            # Reached on the first run, before any model has been registered.
            return None
        raise
    if len(model_infos) == 0:
        return None

    run_id = model_infos[0].run_id
    run = client.get_run(run_id)
    # Skip warm starting if the search space (objective version) has changed
    if run.data.tags.get("optuna_objective_ver") != optuna_objective_ver:
        return None

    filenames = [a.path for a in client.list_artifacts(run_id)]
    if optuna_storage_filename not in filenames:
        return None
    local_path = client.download_artifacts(run_id, path=..., dst_path=...)
    return RDBStorage(f"sqlite:///{local_path}")
Results
[Figure: AUC (private values) versus the number of trials, for univariate TPE and for Warm Starting CMA-ES. The evaluation value of XGBoost's default hyperparameters is shown for reference.]
Warm Starting CMA-ES searches promising regions from an early phase, so it can find hyperparameters that are better than the defaults.
Case Study 3: AI voice bot for phone calls
Green threads and WebSocket

AI Voice Bot
The bot communicates with users via WebSocket.
[Figure: an IP phone call is connected to our product over WebSocket]
Challenge
"Our WebSocket server works when started from the
python command, but it does not work on Gunicorn,
so please fix it."
WSGI and Green Threads
Web Server Gateway Interface (PEP 3333)
● A WSGI application is a callable object (e.g. a function).

Limitations
● It is difficult to implement bidirectional real-time communication such as WebSocket. ※1
● The thread that calls the WSGI application cannot be released until the communication is completed.

※1 In Flask-Sockets (created by Kenneth Reitz), a pre-instantiated WebSocket object is passed via the WSGI environment and used in Flask.

def application(env, start_response):
    start_response('200 OK', [
        ('Content-type', 'text/plain; charset=utf-8')
    ])
    return [b'Hello World']
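A WSGI server is, at its core, a loop that builds an environ dict, calls the application, and consumes the returned iterable. The sketch below calls the Hello World application directly, using the standard library's wsgiref.util.setup_testing_defaults to fill in the environ; call_wsgi_app is a hypothetical helper, not part of wsgiref. It illustrates why a long-lived connection fits poorly: the caller stays occupied until the application returns and its iterable is exhausted.

```python
from wsgiref.util import setup_testing_defaults

def application(env, start_response):
    start_response('200 OK', [
        ('Content-type', 'text/plain; charset=utf-8')
    ])
    return [b'Hello World']

def call_wsgi_app(app, path="/"):
    """Call a WSGI application once, roughly the way a WSGI server does."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fill in the required CGI/WSGI keys
    response = {}
    chunks = []

    def start_response(status, headers, exc_info=None):
        response["status"] = status
        response["headers"] = headers
        return chunks.append  # the legacy write() callable from PEP 3333

    result = app(environ, start_response)
    try:
        # The calling thread is busy until the whole body has been produced.
        chunks.extend(result)
    finally:
        if hasattr(result, "close"):
            result.close()
    return response["status"], response["headers"], b"".join(chunks)

status, headers, body = call_wsgi_app(application)
print(status, body)  # 200 OK b'Hello World'
```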
Green Threads (Micro Threads)
We want to avoid assigning one OS native thread (threading.Thread) to each WebSocket connection:
● Context switches between OS native threads are heavy.
○ The register values (thread state) are saved to memory, and the register values of another thread are loaded from memory before it runs.
● The stack of an OS native thread is large.
○ e.g. a fixed 2 MB stack
We need something like a thread that runs in user land.
→ Flask-Sockets uses Gevent-WebSocket under the hood.
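To illustrate the idea of user-land scheduling, here is a toy round-robin scheduler that runs generator-based tasks cooperatively in a single OS thread. This is not how gevent is implemented (gevent switches greenlet stacks in C on top of an event loop); it only shows that each `yield` is a voluntary context switch handled entirely in user land, without a dedicated OS stack per task. The names are hypothetical.

```python
from collections import deque

def scheduler(tasks):
    """Run generator-based 'green threads' round-robin in one OS thread."""
    ready = deque(tasks)
    log = []
    while ready:
        task = ready.popleft()
        try:
            log.append(next(task))  # run the task until it yields control
        except StopIteration:
            continue                # the task has finished
        ready.append(task)          # re-queue it: a context switch in user land
    return log

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"

print(scheduler([worker("a", 2), worker("b", 3)]))
# ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

The catch, which matters later in this case study, is that cooperative scheduling only works if every task yields: one task that blocks without yielding stalls all the others.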
The internals of Gevent-WebSocket
Gevent
import threading
import time

thread1 = threading.Thread(target=time.sleep, args=(5,))
thread2 = threading.Thread(target=time.sleep, args=(5,))
thread1.start()
thread2.start()
thread1.join()
thread2.join()

This spawns two OS threads whose sleeps run concurrently (about 5 seconds in total).
Gevent
from gevent import monkey
monkey.patch_all()

import threading
import time

thread1 = threading.Thread(target=time.sleep, args=(5,))
thread2 = threading.Thread(target=time.sleep, args=(5,))
thread1.start()
thread2.start()
thread1.join()
thread2.join()

With Gevent, the two time.sleep() calls run concurrently within a single OS thread.
Gevent
Gevent's monkey patching replaces blocking operations in the standard library with cooperative ones:
● threading.Thread → gevent.Greenlet (green thread)
● time.sleep → gevent.sleep

from gevent import monkey
monkey.patch_all()

import threading
import time

thread1 = threading.Thread(target=time.sleep, args=(5,))
# -> gevent.Greenlet(gevent.sleep, 5)
...
WebSocket: the internals of Gevent-WebSocket
● Monkey patches are applied after the worker processes are spawned.
● The WSGI application is called on a gevent.Greenlet (green thread).

from gevent.pool import Pool
from gevent import hub, monkey, socket, pywsgi
from gunicorn.http.wsgi import base_environ
from gunicorn.workers.base_async import AsyncWorker

class GeventWorker(AsyncWorker):
    def init_process(self):
        # Apply monkey patches after the worker process has been spawned
        monkey.patch_all()
        ...

    def run(self):
        servers = []
        for s in self.sockets:
            # Create a pool of Greenlets (green threads)
            pool = Pool(self.worker_connections)
            environ = base_environ(self.cfg)
            environ.update({"wsgi.multithread": True})
            server = self.server_class(
                s, application=self.wsgi, ...
            )
            server.start()
            servers.append(server)

Source: gunicorn/workers/ggevent.py#L37-L38
If a third-party library (e.g. the gRPC library) implements blocking operations in its own native code, gevent cannot replace them by default.
Conclusion
In this talk, I shared our knowledge of MLOps:
● Performance tuning of the prediction server using Cython
● Building a memory-efficient Python binding for a C++ library (LIBFFM)
● Implementing a transfer learning method for hyperparameter optimization using Optuna and MLflow
● The internals of WSGI and Gevent-WebSocket
Acknowledgements / Thank You / Questions
Masashi Shibata
CyberAgent, Inc.

Presented at PyCon APAC 2021.
 
Implementing sobol's quasirandom sequence generator
Implementing sobol's quasirandom sequence generatorImplementing sobol's quasirandom sequence generator
Implementing sobol's quasirandom sequence generator
 
DARTS: Differentiable Architecture Search at 社内論文読み会
DARTS: Differentiable Architecture Search at 社内論文読み会DARTS: Differentiable Architecture Search at 社内論文読み会
DARTS: Differentiable Architecture Search at 社内論文読み会
 
Goptuna Distributed Bayesian Optimization Framework at Go Conference 2019 Autumn
Goptuna Distributed Bayesian Optimization Framework at Go Conference 2019 AutumnGoptuna Distributed Bayesian Optimization Framework at Go Conference 2019 Autumn
Goptuna Distributed Bayesian Optimization Framework at Go Conference 2019 Autumn
 
Djangoアプリのデプロイに関するプラクティス / Deploy django application
Djangoアプリのデプロイに関するプラクティス / Deploy django applicationDjangoアプリのデプロイに関するプラクティス / Deploy django application
Djangoアプリのデプロイに関するプラクティス / Deploy django application
 
Django REST Framework における API 実装プラクティス | PyCon JP 2018
Django REST Framework における API 実装プラクティス | PyCon JP 2018Django REST Framework における API 実装プラクティス | PyCon JP 2018
Django REST Framework における API 実装プラクティス | PyCon JP 2018
 
Django の認証処理実装パターン / Django Authentication Patterns
Django の認証処理実装パターン / Django Authentication PatternsDjango の認証処理実装パターン / Django Authentication Patterns
Django の認証処理実装パターン / Django Authentication Patterns
 
RTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMP
RTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMPRTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMP
RTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMP
 
システムコールトレーサーの動作原理と実装 (Writing system call tracer for Linux/x86)
システムコールトレーサーの動作原理と実装 (Writing system call tracer for Linux/x86)システムコールトレーサーの動作原理と実装 (Writing system call tracer for Linux/x86)
システムコールトレーサーの動作原理と実装 (Writing system call tracer for Linux/x86)
 
Golangにおける端末制御 リッチなターミナルUIの実現方法
Golangにおける端末制御 リッチなターミナルUIの実現方法Golangにおける端末制御 リッチなターミナルUIの実現方法
Golangにおける端末制御 リッチなターミナルUIの実現方法
 
How to develop a rich terminal UI application
How to develop a rich terminal UI applicationHow to develop a rich terminal UI application
How to develop a rich terminal UI application
 
Introduction of Feedy
Introduction of FeedyIntroduction of Feedy
Introduction of Feedy
 
Webフレームワークを作ってる話 #osakapy
Webフレームワークを作ってる話 #osakapyWebフレームワークを作ってる話 #osakapy
Webフレームワークを作ってる話 #osakapy
 
Pythonのすすめ
PythonのすすめPythonのすすめ
Pythonのすすめ
 
pandasによるデータ加工時の注意点やライブラリの話
pandasによるデータ加工時の注意点やライブラリの話pandasによるデータ加工時の注意点やライブラリの話
pandasによるデータ加工時の注意点やライブラリの話
 
Pythonistaのためのデータ分析入門 - C4K Meetup #3
Pythonistaのためのデータ分析入門 - C4K Meetup #3Pythonistaのためのデータ分析入門 - C4K Meetup #3
Pythonistaのためのデータ分析入門 - C4K Meetup #3
 
テスト駆動開発入門 - C4K Meetup#2
テスト駆動開発入門 - C4K Meetup#2テスト駆動開発入門 - C4K Meetup#2
テスト駆動開発入門 - C4K Meetup#2
 
Introduction of PyCon JP 2015 at PyCon APAC/Taiwan 2015
Introduction of PyCon JP 2015 at PyCon APAC/Taiwan 2015Introduction of PyCon JP 2015 at PyCon APAC/Taiwan 2015
Introduction of PyCon JP 2015 at PyCon APAC/Taiwan 2015
 

Recently uploaded

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Recently uploaded (20)

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 

MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at PyCon APAC 2021

  • 1. Masashi Shibata MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems
  • 2. Three MLOps Case Studies
 Case studies in applying ML technologies to our products.
 1. How to build a memory-efficient Python binding using Cython and the NumPy C-API
 2. Implement a transfer learning method for hyperparameter optimization
 3. Fix a complex bug in a WebSocket server: understanding green threads and how WSGI works
  • 3. Part 1: Accelerate a prediction server and write our own memory-efficient Python binding
  • 4. Dynalyst
 An advertising product (DSP, demand-side platform)
 We use FFM for conversion rate (CVR) prediction

  • 5. Field-aware Factorization Machines
 https://www.csie.ntu.edu.tw/~cjlin/papers/ffm.pdf
 We use FFM for CVR prediction.
 ● LIBFFM is written in C++ and provides a command-line interface.
 ● We added some new features to improve performance.
 ● Repository: https://github.com/ycjuan/libffm
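To make the model concrete, here is a minimal pure-Python sketch of how an FFM scores one instance, following the formulation in the FFM paper: every pair of features interacts through the latent vector each feature keeps for the other feature's field. It assumes weights laid out as W[feature][field][k] (matching the (n, m, k) shape used later for the NumPy wrapper), omits LIBFFM's optional instance-wise normalization, and the function names are hypothetical; it is not LIBFFM's code.

```python
import math

def ffm_score(W, instance):
    """Pairwise FFM term: sum over pairs of <W[j1][f2], W[j2][f1]> * x1 * x2.

    W[j][f] is the k-dimensional latent vector of feature j for field f,
    i.e. an n x m x k nested list. `instance` holds (field, feature, value) triples.
    """
    score = 0.0
    for a in range(len(instance)):
        f1, j1, x1 = instance[a]
        for b in range(a + 1, len(instance)):
            f2, j2, x2 = instance[b]
            # Each feature uses the latent vector tied to the *other* feature's field.
            dot = sum(p * q for p, q in zip(W[j1][f2], W[j2][f1]))
            score += dot * x1 * x2
    return score

def ffm_predict(W, instance):
    """Map the FFM score to a probability (e.g. a CVR) with the logistic function."""
    return 1.0 / (1.0 + math.exp(-ffm_score(W, instance)))

# Toy model: n=3 features, m=2 fields, k=2 latent dimensions.
W = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
# Feature 0 in field 0 and feature 2 in field 1, both with value 1.0.
instance = [(0, 0, 1.0), (1, 2, 1.0)]
print(ffm_score(W, instance))  # 1.0 = <W[0][1], W[2][0]> = 0.5*2 + 0.5*0
```

Because each feature keeps one latent vector per field, the weight array grows as n x m x k, which is why the trained weights are large enough that avoiding copies matters.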
  • 6. Feedback Shift Correction
 A Feedback Shift Correction in Predicting Conversion Rates under Delayed Feedback
 https://dl.acm.org/doi/10.1145/3366423.3380032
 ● We propose an importance weighting approach to address the feedback shift.
 ● In an online experiment, our method improved sales by 30%.
 ● We modified the loss function of LIBFFM.
 [Figure: timeline of click, conversion, and ML model training] Some instances that are actually positive are labeled as negative during the training period, because their conversions occur only after it.
 → We need to implement our own Python binding for LIBFFM.
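Importance weighting changes training by scaling each instance's contribution to the loss. The sketch below shows a generic importance-weighted binary log loss in pure Python, under the assumption that the per-instance weights are already given. How the paper estimates the weights under delayed feedback is not reproduced here, this is not necessarily the exact loss the authors implemented in LIBFFM, and the function name is hypothetical.

```python
import math

def weighted_log_loss(labels, probs, weights, eps=1e-15):
    """Importance-weighted binary cross-entropy, averaged over instances.

    labels:  observed 0/1 labels (a conversion may not have been observed yet)
    probs:   predicted probabilities
    weights: per-instance importance weights (assumed given; estimation not shown)
    """
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# With all weights equal to 1 this reduces to the ordinary log loss.
print(weighted_log_loss([1, 0], [0.9, 0.2], [1.0, 1.0]))
# Up-weighting an instance makes its error count more in the gradient.
print(weighted_log_loss([1, 0], [0.9, 0.2], [1.0, 3.0]))
```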
  • 7. Performance Tuning
 High-performance prediction server
 ● Throughput: a few hundred thousand requests per second
 ● Latency: ~100 ms
  • 8. Challenges
 ML pipeline
 ● Implement our own Python binding of LIBFFM (C++)
 High-performance prediction server
 ● Throughput: a few hundred thousand requests per second
 ● Latency: ~100 ms
  • 11. Cython
 Cython: an optimising static compiler for both the Python programming language and the extended Cython programming language.
    In [1]: %load_ext cython

    In [2]: def py_fibonacci(n):
       ...:     a, b = 0.0, 1.0
       ...:     for i in range(n):
       ...:         a, b = a + b, a
       ...:     return a

    In [3]: %%cython
       ...: def cy_fibonacci(int n):
       ...:     cdef int i
       ...:     cdef double a = 0.0, b = 1.0
       ...:     for i in range(n):
       ...:         a, b = a + b, a
       ...:     return a

    In [4]: %timeit py_fibonacci(10)
    582 ns ± 3.72 ns per loop (...)

    In [5]: %timeit cy_fibonacci(10)
    43.4 ns ± 0.14 ns per loop (...)
 With static C types, the Cython version is about 13x faster (582 ns vs. 43.4 ns per call).

  • 12. Releasing the GIL
 GIL (Global Interpreter Lock)
 ● Only the native thread that holds the GIL can execute Python bytecode.
 ● Even with multiple threads, Python bytecode is not executed in parallel across processor cores.
 ● The GIL can be explicitly released when calling a pure C function. ※1
    def fibonacci(kwargs):
        cdef double a
        cdef int n
        n = kwargs.get('n')
        # Release the GIL while the pure C function runs
        with nogil:
            a = fibonacci_nogil(n)
        return a

    # Pure C function: runs without holding the GIL
    cdef double fibonacci_nogil(int n) nogil:
        ...
 In the annotated output, yellow lines interact with the Python/C API; fibonacci_nogil is a pure C function.
 ※1 At the C level, this calls the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros.
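  • 13. Cython Compiler Directives
 ● cdivision: when False (the default), division follows Python semantics, e.g. raising ZeroDivisionError when the divisor is zero; True uses plain C division.
 ● boundscheck: when True (the default), out-of-range indexing raises IndexError.
 ● wraparound: when True (the default), negative indices count from the end, as in Python.
 If the divisor is zero, Python must raise ZeroDivisionError, so Cython inserts a check unless cdivision is enabled.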
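Besides the zero-division check, cdivision also decides how // and % behave when the operands have opposite signs: Python floors the quotient, while C truncates it toward zero. Directives like these can be set per module with a header comment such as `# cython: boundscheck=False, wraparound=False, cdivision=True`. The pure-Python functions below (hypothetical names) emulate the two semantics so the difference is visible without compiling anything.

```python
def python_floordiv(a, b):
    """What `a // b` does with cdivision=False (Python semantics): floor division."""
    if b == 0:
        raise ZeroDivisionError("integer division or modulo by zero")
    return a // b

def c_div(a, b):
    """Emulates C integer division (truncation toward zero), as with cdivision=True.

    Compiled C code performs no zero check here; dividing by zero is undefined behaviour.
    """
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def c_mod(a, b):
    """Emulates C `%`: the remainder takes the sign of the dividend."""
    return a - b * c_div(a, b)

print(python_floordiv(-7, 2), (-7) % 2)  # -4 1  (Python semantics)
print(c_div(-7, 2), c_mod(-7, 2))        # -3 -1 (C semantics)
```

For non-negative operands both semantics agree, so enabling cdivision is safe only when the code never divides by zero and never mixes signs.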
  • 14. Results
 Cython improved both latency and throughput.
 ● FFM prediction takes 10% of the time of the original code.
 ● Latency is 60% of what it was before.
 ● The server handles 1.35 times as many requests per second as before.
  • 15. Build a memory-efficient Python binding using Cython and the NumPy C-API
  • 16. Wrapping LIBFFM
 1. Declare C++ functions and structs with a cdef extern from block.
 2. Allocate C++ structs with PyMem_Malloc. ※1
 3. Call the C++ functions.
 4. Release the memory with PyMem_Free.
    # cython: language_level=3
    from cpython.mem cimport PyMem_Malloc, PyMem_Free

    cdef extern from "ffm.h" namespace "ffm" nogil:
        struct ffm_problem:
            ffm_data* data
        ffm_model *ffm_train_with_validation(...)

    # "except NULL" lets the MemoryError propagate out of this cdef function
    cdef ffm_problem* make_ffm_prob(...) except NULL:
        cdef ffm_problem* prob
        prob = <ffm_problem *> PyMem_Malloc(sizeof(ffm_problem))
        if prob is NULL:
            raise MemoryError("Insufficient memory for prob")
        prob.data = ...
        return prob

    def train(...):
        # Allocate once; the finally block always releases it
        cdef ffm_problem* tr_ptr = make_ffm_prob(tr[0], tr[1])
        try:
            model_ptr = ffm_train_with_validation(tr_ptr, ...)
        finally:
            # Helper not shown on the slide; releases the memory (step 4)
            free_ffm_prob(tr_ptr)
        return weights, best_iteration
 ※1 malloc (from libc.stdlib cimport malloc) can also be used, but PyMem_Malloc allocates from CPython's own heap, which reduces the number of system calls issued. This is especially efficient for small allocations.
  • 17. Memory Management
 [Figure: call flow across Python, Cython, and C++ (LIBFFM)]
 ● Python: model = ffm.train() → Cython: call the C++ function ffm_train_with_validation()
 ● C++ (LIBFFM): allocate memory for the weights, ptr = malloc(n*m*k*sizeof(float)), and train the FFM model
 ● Cython: wrap the weights array with NumPy (using the NumPy C-API)
 ● Python: instantiate the Python object returned by model = ffm.train()
 ● Python: release the Python object with del model → Cython: release the memory with free(ptr)
  • 18. Reference Counting
 ● CPython's memory management is based on reference counting.
 ● Release the memory of the C++ array at the same time the NumPy array is destroyed.
 ● Note that the reference count is displayed as 2 because calling sys.getrefcount() temporarily adds a reference for its argument.
    import sys

    import ffm

    def main():
        train_data = ffm.Dataset(...)
        valid_data = ffm.Dataset(...)
        # 'model._weights' wraps the C++ weights array.
        # It must be deallocated in step with Python's memory management.
        model = ffm.train(train_data, valid_data)
        print(sys.getrefcount(model._weights))  # -> 2
        del model  # -> 'model._weights' is deallocated.
        print("Done")  # -> Done
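  • 19. NumPy C-API
 ● Release the memory buffer of the C++ array with libc.stdlib.free().
 ● PyArray_SimpleNewFromData: wrap a C-contiguous array with NumPy by specifying the array pointer, shape, and type.
 ● PyArray_SetBaseObject (cnp.set_array_base in Cython): set a base object that owns the memory of the NumPy array (here, the finalizer holding model_ptr.W).
    cimport numpy as cnp
    from libc.stdlib cimport free

    cnp.import_array()  # required before using the NumPy C-API

    cdef class _weights_finalizer:
        cdef void *_data

        def __dealloc__(self):
            # Runs when the NumPy array (and thus its base) is destroyed
            if self._data is not NULL:
                free(self._data)

    cdef object _train(...):
        cdef:
            cnp.ndarray arr
            cnp.npy_intp shape[3]
            _weights_finalizer f = _weights_finalizer()
            ffm_model* model_ptr
        model_ptr = ffm_train_with_validation(...)
        # PyArray_SimpleNewFromData takes the shape as a C array of npy_intp
        shape[0] = model_ptr.n
        shape[1] = model_ptr.m
        shape[2] = model_ptr.k
        # Wrap the FFM weights (model_ptr.W) with a NumPy array without copying
        arr = cnp.PyArray_SimpleNewFromData(
            3, shape, cnp.NPY_FLOAT32, model_ptr.W)
        # Hand ownership of the buffer to the finalizer, the array's base object
        f._data = <void*> model_ptr.W
        cnp.set_array_base(arr, f)
        # Free only the model struct; the weights now live as long as arr
        free(model_ptr)
        return arr, best_iteration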
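  • 20. Summary
 ● The accuracy of the ML model directly affects sales
 ○ Addressed the delayed conversion problem with a causal inference method
 ○ Safely managed the memory of arrays without copying data
 ● Heavy traffic and strict latency requirements (within 100 ms)
 ○ Accelerated prediction with Cython
 ○ 1.35x throughput, latency reduced to 60%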
  • 21. Part 2: Implement a transfer learning method for hyperparameter optimization
  • 22. Situation
 Our ML pipeline is triggered weekly and optimizes hyperparameters on the new dataset.
 [Figure: each week, the ML pipeline fetches the latest training data, runs HPO, and outputs the best hyperparameters; 1 week later the same pipeline runs again]
  • 23. Challenges
 [Figure: the weekly ML pipeline fetches the latest training data, runs HPO, and produces HPO results; 1 week later it runs again]
 How can we exploit the previous optimization history?
  • 25. Optuna
 Python library for hyperparameter optimization. https://github.com/optuna/optuna
 ● Define-by-Run style API
 ● Support for various state-of-the-art algorithms
 ● Pluggable storage backends
 ● Easy distributed optimization
 ● Web dashboard
    import optuna
    import sklearn.metrics
    import sklearn.svm
    from sklearn.ensemble import RandomForestRegressor

    def objective(trial):
        # The search space is defined while the objective runs (Define-by-Run)
        regressor_name = trial.suggest_categorical(
            'regressor', ['SVR', 'RandomForest']
        )
        if regressor_name == 'SVR':
            svr_c = trial.suggest_float('svr_c', 1e-10, 1e10, log=True)
            regressor_obj = sklearn.svm.SVR(C=svr_c)
        else:
            rf_max_depth = trial.suggest_int('rf_max_depth', 2, 32)
            regressor_obj = RandomForestRegressor(max_depth=rf_max_depth)

        X_train, X_val, y_train, y_val = ...
        regressor_obj.fit(X_train, y_train)
        y_pred = regressor_obj.predict(X_val)
        return sklearn.metrics.mean_squared_error(y_val, y_pred)

    # The default direction is "minimize", which fits the returned MSE
    study = optuna.create_study()
    study.optimize(objective, n_trials=100)
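Define-by-Run means the search space is not declared up front; it is discovered as the objective calls trial.suggest_*(), so a parameter such as svr_c exists only on the branch that asks for it. The toy random search below illustrates that idea with the standard library only. ToyTrial and toy_optimize are hypothetical names; this is not Optuna's implementation (Optuna's samplers are far more sophisticated than uniform random sampling), and only the method names mirror Optuna's.

```python
import math
import random

class ToyTrial:
    """Hypothetical stand-in for an Optuna trial: samples uniformly at random."""

    def __init__(self, rng):
        self._rng = rng
        self.params = {}  # filled in only for parameters the objective asks for

    def suggest_categorical(self, name, choices):
        value = self._rng.choice(choices)
        self.params[name] = value
        return value

    def suggest_float(self, name, low, high, log=False):
        if log:  # sample uniformly in log space
            value = math.exp(self._rng.uniform(math.log(low), math.log(high)))
        else:
            value = self._rng.uniform(low, high)
        self.params[name] = value
        return value

    def suggest_int(self, name, low, high):
        value = self._rng.randint(low, high)  # inclusive on both ends
        self.params[name] = value
        return value

def toy_optimize(objective, n_trials, seed=0):
    """Random search that minimizes `objective`; returns (best_value, best_params)."""
    rng = random.Random(seed)
    best_value, best_params = math.inf, None
    for _ in range(n_trials):
        trial = ToyTrial(rng)
        value = objective(trial)  # the search space is discovered while this runs
        if value < best_value:
            best_value, best_params = value, trial.params
    return best_value, best_params

def objective(trial):
    kind = trial.suggest_categorical("kind", ["quadratic", "abs"])
    if kind == "quadratic":
        x = trial.suggest_float("x", -10, 10)  # exists only on this branch
        return (x - 2) ** 2
    n = trial.suggest_int("n", -5, 5)  # exists only on this branch
    return abs(n - 3)

print(toy_optimize(objective, n_trials=200))
```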
  • 26. Choosing an algorithm
 Algorithms that can take dependencies between hyperparameters into account ※1:
 ● Multivariate TPE
 ● CMA-ES
 ● Gaussian Process based Bayesian Optimization
 ※1 Univariate TPE, Optuna's default algorithm, does not take dependencies between hyperparameters into account.
 ※2 Figure taken from http://proceedings.mlr.press/v80/falkner18a/falkner18a-supp.pdf
    def objective(trial):
        x = trial.suggest_float('x', -10, 10)
        y = trial.suggest_float('y', -10, 10)
        # Two separate minima, at (5, 5) and (-5, -5)
        v1 = (x - 5)**2 + (y - 5)**2
        v2 = (x + 5)**2 + (y + 5)**2
        return min(v1, v2)
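The objective above has two minima, at (5, 5) and (-5, -5). Evaluating it directly shows why ignoring dependencies hurts: good x values (±5) and good y values (±5) each look promising on their own, but pairing x from one minimum with y from the other gives a poor value. A sampler that models each hyperparameter independently can propose exactly such mixed pairs. The helper objective_xy is a hypothetical plain-function version of the slide's objective.

```python
def objective_xy(x, y):
    """Same function as the slide's objective, with x and y passed directly."""
    v1 = (x - 5) ** 2 + (y - 5) ** 2
    v2 = (x + 5) ** 2 + (y + 5) ** 2
    return min(v1, v2)

# The two global minima, one per "mode".
good_points = [(5.0, 5.0), (-5.0, -5.0)]
# A good x from one mode combined with a good y from the other.
mixed_points = [(5.0, -5.0), (-5.0, 5.0)]

for x, y in good_points + mixed_points:
    print((x, y), objective_xy(x, y))  # 0.0, 0.0, 100.0, 100.0
```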
  • 27. CMA-ES
 ● One of the most promising methods for black-box optimization ※1
 ● I implemented CMA-ES and its Optuna sampler. See the post on the official Optuna blog: https://medium.com/optuna/introduction-to-cma-es-sampler-ee68194c8f88
 ※1 N. Hansen, The CMA Evolution Strategy: A Tutorial. arXiv:1604.00772, 2016.
 Covariance Matrix Adaptation Evolution Strategy: https://github.com/CyberAgentAILab/cmaes
  • 28. Warm Starting CMA-ES
 Transfers prior knowledge from similar HPO tasks
 ● Proposed by Masahiro Nomura, a member of CyberAgent AI Lab
 ● Accepted at AAAI 2021
 ● Supported since Optuna v2.6.0 (https://github.com/optuna/optuna/releases/tag/v2.6.0)
    import optuna
    from optuna.samplers import CmaEsSampler

    # Get the previous optimization history from an SQLite3 DB
    source_study = optuna.load_study(
        storage="sqlite:///source-db.sqlite3", study_name="..."
    )
    source_trials = source_study.trials

    # Run hyperparameter optimization, warm-started with the source trials
    study = optuna.create_study(
        sampler=CmaEsSampler(source_trials=source_trials),
        storage="sqlite:///db.sqlite3", study_name="..."
    )
    study.optimize(objective, n_trials=20)

  • 29. MLflow
 Platform for managing the ML lifecycle.
 ● Collect metrics, params, and artifacts
 ● Version trained models
 MLflow terminology: 1. Run: a single execution 2. Experiment: a group of Runs
    import mlflow
    from mlflow.tracking import MlflowClient

    # Connect to an Experiment
    mlflow.set_experiment("train_foo_model")
    # Start a new MLflow Run in the Experiment
    with mlflow.start_run(run_name="...") as run:
        # Register the trained model and promote it to Production
        model = train(...)
        mv = mlflow.register_model(model_uri, model_name)
        MlflowClient().transition_model_version_stage(
            name=model_name, version=mv.version, stage="Production"
        )
        # Save parameters (key-value); "max_depth" is an illustrative name
        mlflow.log_param("max_depth", max_depth)
        # Save metrics (key-value)
        mlflow.log_metric("auc", auc)
        mlflow.log_metric("logloss", log_loss)
        # Save artifacts (all files in a local directory)
        mlflow.log_artifacts(dir_name)
  • 30. Exploit previous HPO results
 [Figure: each weekly ML pipeline run fetches the latest data, runs Optuna, and stores the optimization history as an MLflow artifact; the run 1 week later uses that history]
  • 31. Integrate Optuna with MLflow
 1. Retrieve source trials for Warm Starting CMA-ES.
 2. Evaluate the default hyperparameters.
 3. Collect HPO metrics.
 4. Save the Optuna trials (SQLite3 file) as MLflow artifacts.
    mlflow.set_experiment("train_foo_model")
    with mlflow.start_run(run_name="...") as run:
        # Retrieve source trials for Warm Starting CMA-ES
        source_trials = ...
        sampler = CmaEsSampler(source_trials=source_trials)
        # AUC is maximized; the storage is the SQLite3 file saved under dir_name
        study = optuna.create_study(
            sampler=sampler, storage=..., direction="maximize"
        )
        # Enqueue XGBoost's default hyperparameters, so the result is
        # at least as good as the defaults.
        study.enqueue_trial({"alpha": 0.0, ...})
        study.optimize(optuna_objective, n_trials=20)
        # Collect HPO metrics
        mlflow.log_params(study.best_params)
        mlflow.log_metric("default_trial_auc", study.trials[0].value)
        mlflow.log_metric("best_trial_auc", study.best_value)
        # Set a tag to detect search space changes
        mlflow.set_tag("optuna_objective_ver", optuna_objective_ver)
        # Save the Optuna trials (SQLite3 file) as MLflow artifacts
        mlflow.log_artifacts(dir_name)
  • 32. Retrieve previous executions
 1. Get the model information from the MLflow Model Registry.
 2. Get the Run ID from the model information.
 3. Download the SQLite3 file from the artifacts.
    from mlflow import exceptions as mlflow_exceptions
    from mlflow.tracking import MlflowClient
    from optuna.storages import RDBStorage

    def load_optuna_source_storage():
        client = MlflowClient()
        try:
            model_infos = client.get_latest_versions(
                model_name, stages=["Production"])
        except mlflow_exceptions.RestException as e:
            if e.error_code == "RESOURCE_DOES_NOT_EXIST":
                # Reached on the first execution (no registered model yet).
                return None
            raise
        if len(model_infos) == 0:
            return None
        run_id = model_infos[0].run_id
        run = client.get_run(run_id)
        # Ignore the history if the search space (objective version) has changed
        if run.data.tags.get("optuna_objective_ver") != optuna_objective_ver:
            return None
        filenames = [a.path for a in client.list_artifacts(run_id)]
        if optuna_storage_filename not in filenames:
            return None
        client.download_artifacts(run_id, path=..., dst_path=...)
        return RDBStorage("sqlite:///path/to/optuna.db")
  • 33. Results
 [Figure: AUC (private) vs. the number of trials, for Univariate TPE and for Warm Starting CMA-ES; a reference line marks the evaluation value of XGBoost's default hyperparameters]
 Warm Starting CMA-ES searches promising regions from an early phase, so it can find hyperparameters better than the defaults.
  • 34. Part 3: AI Voice Bot for phone calls: green threads and WebSocket
  • 35. AI Voice Bot
 Communicates with users via WebSocket.
 [Figure: IP phone call ↔ WebSocket ↔ our product]
  • 36. Challenge
 "Our WebSocket server works when started from the python command, but it does not work on Gunicorn, so please fix it."
  • 37. WSGI and Green Threads
  • 38. Web Server Gateway Interface (PEP 3333)
 ● A WSGI application is a callable object (e.g. a function).
 Limitations
 ● Bidirectional real-time communication such as WebSocket is difficult to implement. ※1
 ● The thread that calls the WSGI application cannot be released until the communication is completed.
 ※1 In Flask-Sockets (created by Kenneth Reitz), a pre-instantiated WebSocket object is passed via the WSGI environment and used from Flask.
    def application(env, start_response):
        start_response('200 OK', [
            ('Content-type', 'text/plain; charset=utf-8')
        ])
        return [b'Hello World']
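To see the server side of the WSGI contract, the sketch below calls the application the way a server would, using only wsgiref from the standard library. The call_wsgi helper is hypothetical. Note that the calling thread is occupied until the application returns and its iterable is consumed; for a long-lived WebSocket connection, that would mean holding the thread for the whole session.

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    start_response('200 OK', [
        ('Content-type', 'text/plain; charset=utf-8')
    ])
    return [b'Hello World']

def call_wsgi(app, path="/"):
    """Invoke a WSGI app as a server would; return (status, headers, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, wsgi.input, etc.
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # legacy write() callable, unused here

    chunks = app(environ, start_response)
    try:
        body = b"".join(chunks)  # the thread stays busy until this finishes
    finally:
        if hasattr(chunks, "close"):
            chunks.close()
    return captured["status"], captured["headers"], body

print(call_wsgi(application))
```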
  • 39. Green Threads (Micro Threads)
 Avoid assigning one OS native thread (threading.Thread) to each WebSocket connection.
 ● Context switches between OS native threads are expensive.
 ○ The register values (thread state) must be saved to memory, and the register values of another thread loaded from memory, before that thread runs.
 ● OS native threads have large stacks.
 ○ e.g. a fixed 2 MB stack
 We need something like a thread that runs in user space.
 → Flask-Sockets uses gevent-websocket under the hood.
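Green threads switch between tasks in user space instead of asking the OS to switch native threads. The toy scheduler below shows that idea with plain generators: each task runs until it voluntarily yields, and a single OS thread interleaves them. It only illustrates cooperative scheduling; gevent itself is built on greenlets and an event loop, not on this code, and worker/run_round_robin are hypothetical names.

```python
from collections import deque

def worker(name, steps, log):
    """A cooperative task: does one unit of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # give control back to the scheduler

def run_round_robin(tasks):
    """Run generator-based tasks on one OS thread, switching at each yield."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)  # not finished; put it back in the queue

log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

A switch here is just a Python function return and resume, with no kernel involvement, and each task's state is a small generator frame rather than a dedicated native stack.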
  • 40. The internals of gevent-websocket
  • 41. Gevent
    import threading
    import time

    thread1 = threading.Thread(target=time.sleep, args=(5,))
    thread2 = threading.Thread(target=time.sleep, args=(5,))
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
 Spawns two OS threads that run concurrently, so both sleeps finish after about 5 seconds.
  • 42. Gevent
    from gevent import monkey
    monkey.patch_all()  # patch as early as possible, before other imports

    import threading
    import time

    thread1 = threading.Thread(target=time.sleep, args=(5,))
    thread2 = threading.Thread(target=time.sleep, args=(5,))
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
 With gevent, the two time.sleep() calls run concurrently within a single OS thread.
  • 43. Gevent
    from gevent import monkey
    monkey.patch_all()

    import threading
    import time

    thread1 = threading.Thread(target=time.sleep, args=(5,))
    # -> gevent.Greenlet(gevent.sleep, 5)
    ...
 gevent replaces blocking operations in the standard library:
 ● threading.Thread → gevent.Greenlet (green thread)
 ● time.sleep → gevent.sleep
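Monkey patching works by replacing attributes on modules at runtime, so it only affects code that looks the attribute up after the patch. Code that bound the original function earlier (for example with from time import sleep) keeps the unpatched version, which is one reason gevent's documentation recommends patching as early as possible. The standard-library sketch below shows this; fake_cooperative_sleep is a hypothetical stand-in and is not gevent.sleep.

```python
import time
from time import sleep as bound_sleep  # binds the original function object now

def nap_via_module(seconds):
    # Looks up time.sleep at call time, so a later patch is picked up.
    return time.sleep(seconds)

def nap_via_bound_name(seconds):
    # Uses the object captured at import time; a later patch is NOT seen.
    return bound_sleep(seconds)

def fake_cooperative_sleep(seconds):
    """Hypothetical replacement, playing the role of gevent.sleep."""
    return f"cooperative sleep({seconds})"

original_sleep = time.sleep
time.sleep = fake_cooperative_sleep  # the monkey patch
try:
    patched_result = nap_via_module(1)        # goes through the replacement
    unpatched_result = nap_via_bound_name(0)  # still the real time.sleep (returns None)
finally:
    time.sleep = original_sleep  # undo the patch

print(patched_result)    # cooperative sleep(1)
print(unpatched_result)  # None
```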
  • 44. WebSocket: the internals of gevent-websocket
 ● Apply monkey patches after the worker processes are spawned.
 ● Call the WSGI application on a gevent.Greenlet (green thread).
    from gevent.pool import Pool
    from gevent import hub, monkey, socket, pywsgi
    from gunicorn.http.wsgi import base_environ
    from gunicorn.workers.base_async import AsyncWorker

    class GeventWorker(AsyncWorker):
        def init_process(self):
            # Apply monkey patches after the worker process is spawned
            monkey.patch_all()
            ...

        def run(self):
            servers = []
            for s in self.sockets:
                # Create a pool of Greenlets (green threads)
                pool = Pool(self.worker_connections)
                environ = base_environ(self.cfg)
                environ.update({"wsgi.multithread": True})
                # Each request is handled on a Greenlet spawned from the pool
                server = self.server_class(
                    s, application=self.wsgi, spawn=pool, ...
                )
                server.start()
                servers.append(server)
 gunicorn/workers/ggevent.py#L37-L38
 If a third-party library (e.g. the gRPC library) implements its own blocking operations, gevent cannot replace them by default.
  • 46. Conclusion
 In this talk, I shared our knowledge of MLOps:
 ● Performance tuning of a prediction server using Cython
 ● Building a memory-efficient Python binding of a C++ library (LIBFFM)
 ● Implementing a transfer learning method for hyperparameter optimization using Optuna and MLflow
 ● The internals of WSGI and gevent-websocket
  • 47. Acknowledgements / Thank You / Questions Masashi Shibata CyberAgent, Inc.