Machine Learning Platform in LINE Fukuoka
Kengo Tateishi, LFK Data Labs Team
LINE Fukuoka DataLabs.
From Fukuoka, Japan.
立石 賢吾
TATEISHI KENGO
@tkengo
@tkengo
Agenda
• Introduction to Data Labs
• Machine learning lifecycle
• Platform for machine learning
Introduction to Data Labs
What is Data Labs?
• Support our various services from the viewpoint of data.
• High-level analysis for the services.
• Machine learning model development.
• Hadoop clusters as a Data Lake storing huge amounts of data.
[Diagram: six services, each connected to data]
• 2 roles in LFK Data Labs Team: Data Scientist and ML Engineer.
Machine learning lifecycle
What is necessary for machine learning?
• Build a model to predict something.
• Learn a new algorithm to improve accuracy.
• Read papers to get more knowledge.
• Understand what happens inside of the machine learning model.
• And so on…
ALL STEPS ARE IMPORTANT
However…
There are so many other things we have to do.
• Collect data used for model training.
• Prepare a datastore to store the collected data.
• Do pre-processing such as cleaning, normalization, completion, and so on.
• Create a training environment (distributed or single-node?).
• Expose our prediction model as an API.
• Build infrastructure to serve the APIs.
• Monitor accuracy online and offline regularly.
• And others…
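To make the pre-processing item above concrete, here is a minimal sketch of cleaning, completion, and normalization on a tiny table. It is an illustration only: the function name `preprocess`, the use of `None` for missing values, mean imputation, and min-max scaling are all assumptions, not what the slides prescribe.

```python
from statistics import mean


def preprocess(rows):
    """Clean, complete, and normalize a list of numeric feature rows.

    Assumptions: a missing value is None, and every column has at
    least one non-missing value among the rows that survive cleaning.
    """
    # Cleaning: drop rows that carry no usable value at all.
    rows = [r for r in rows if any(v is not None for v in r)]
    n_cols = len(rows[0])

    # Completion: fill each missing value with its column mean.
    col_means = [
        mean(r[c] for r in rows if r[c] is not None) for c in range(n_cols)
    ]
    filled = [
        [col_means[c] if r[c] is None else r[c] for c in range(n_cols)]
        for r in rows
    ]

    # Normalization: min-max scale every column into [0, 1].
    lows = [min(r[c] for r in filled) for c in range(n_cols)]
    highs = [max(r[c] for r in filled) for c in range(n_cols)]
    return [
        [
            (r[c] - lows[c]) / (highs[c] - lows[c]) if highs[c] > lows[c] else 0.0
            for c in range(n_cols)
        ]
        for r in filled
    ]


raw = [[1.0, 10.0], [None, 30.0], [3.0, None], [None, None]]
clean = preprocess(raw)
```

The last row is dropped because it is entirely empty; the two remaining gaps are filled with the column means before scaling.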
Basic Flow
Data Collection → Data Storing → Pre-Processing → Model Training → Parameter Tuning → API Development → Deployment → Accuracy Monitoring → Feedback → (repeat)
• Everyone tends to learn these 2 parts: Model Training and Parameter Tuning.
• But the other parts are still important.
• In addition: prepare storage, code review, test, CI, prepare servers, prepare a dashboard.
• We need to iterate this process continuously, as a lifecycle.
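One way to picture this iteration is a loop that monitors accuracy on fresh data and feeds back into training when accuracy drops. The sketch below is a toy under stated assumptions: the threshold "model", the helper names `train`, `accuracy`, and `lifecycle_step`, and the `min_accuracy` cutoff are all hypothetical, and retraining on the same samples used for monitoring is a simplification.

```python
def accuracy(model, samples):
    """Fraction of (feature, label) samples the model predicts correctly."""
    hits = sum(1 for x, y in samples if model(x) == y)
    return hits / len(samples)


def train(samples):
    """Toy training: pick the threshold that best separates the labels."""
    candidates = sorted(x for x, _ in samples)
    best = max(
        candidates,
        key=lambda t: accuracy(lambda x: int(x >= t), samples),
    )
    return lambda x: int(x >= best)


def lifecycle_step(model, fresh_samples, min_accuracy=0.9):
    """Monitor accuracy on fresh data; retrain (feedback) if it degrades."""
    score = accuracy(model, fresh_samples)
    if score < min_accuracy:
        model = train(fresh_samples)
    return model, score


# Initial model, trained on data whose class boundary sits near 6.
old = [(1, 0), (3, 0), (6, 1), (8, 1)]
model = train(old)

# The data drifts: the boundary has moved to around 11.
drifted = [(4, 0), (7, 0), (9, 0), (11, 1), (13, 1)]
model, before = lifecycle_step(model, drifted)
after = accuracy(model, drifted)
```

Here `before` is the accuracy of the stale model on the drifted data, and `after` is the accuracy once the feedback step has retrained it.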
Platform for machine learning
As you may know, to achieve rapid, continuous development in machine learning, we need an environment for machine learning.
Our Platform
[Diagram components: Fluentd, Sqoop, HDFS, Hive, PySpark, Jupyter Notebook, sklearn, TensorFlow, Pre-Processing (ETL), Training Servers, Object Storage, Drone as a CI, Code repository (GitHub), Kubernetes Cluster, Flask App, Application side]
• Importing: Sqoop, Fluentd
• Storing: HDFS
• Data understanding, experimental modeling
• Developing something great…
• Deployment (shown three times in the diagram)
• ETL, Training
• Uploading Trained Model
• Serving our model
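The last two steps, uploading a trained model and serving it, can be sketched end to end. This is a dependency-free illustration, not the platform's actual code. The slides name a Flask app and an object storage; here a plain WSGI callable (the interface Flask applications also implement) and a hypothetical `InMemoryObjectStore` stand in for them. The key layout, the `/predict` route, and the toy linear model are assumptions as well.

```python
import io
import json
import pickle


class InMemoryObjectStore:
    """Hypothetical stand-in for the object storage; keeps blobs in a dict."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


# Training side: serialize the trained model and upload it.
store = InMemoryObjectStore()
trained = {"weights": [0.5, -0.25], "bias": 0.1}  # toy linear model
MODEL_KEY = "models/demo/v1.pkl"  # hypothetical key layout
store.put(MODEL_KEY, pickle.dumps(trained))

# Serving side: load the model once at startup.
# Only unpickle data that comes from storage you trust.
model = pickle.loads(store.get(MODEL_KEY))


def predict(features):
    """Linear score: dot(weights, features) + bias."""
    return sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]


def app(environ, start_response):
    """Minimal WSGI application exposing POST /predict."""
    if environ["REQUEST_METHOD"] != "POST" or environ["PATH_INFO"] != "/predict":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    size = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(size))
    body = json.dumps({"score": predict(payload["features"])}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


def call(method, path, data=b""):
    """Invoke the app in-process, the way a WSGI server would."""
    status = []
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(data)),
        "wsgi.input": io.BytesIO(data),
    }
    chunks = app(environ, lambda s, headers: status.append(s))
    return status[0], json.loads(b"".join(chunks))


status, result = call("POST", "/predict", b'{"features": [2.0, 4.0]}')
```

Keeping the model in storage rather than inside the application image means a newly trained model can be picked up by restarting the serving process, without changing the application code.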
Summary
What is necessary for machine learning?
• Of course, we need knowledge about training algorithms, parameter tuning, and so on.
• In addition, the platform for machine learning is also important.
• It makes our development fast, robust, and easy.
THANK YOU
