ETL & Machine Learning
@ Kudo
FemaleGeek & PHP Indonesia
Kudoplex 2, 18 June 2016
• Software engineer with three years of experience in Java, Android and Python
• Strong interest in Big Data and Data Science
• Currently working as a Data Engineer at Kudo
Luthfi Hariz
Email: luthfi@kudo.co.id
LinkedIn: https://id.linkedin.com/in/luthfihariz
Data Analyst
Predictive Analytics, Reporting Analysis, Fraud Analysis
Data Engineer
ETL, Data Infrastructure, Machine Learning at Scale
Why does Kudo need a Data team?
● Our data is growing, especially in variety
● We partner with many vendors whose data have different characteristics
● Unique user (agent) behavior, unlike a typical e-commerce user
● Specific user (agent) profiles
ETL (Extract Transform Load)
Machine Learning
ETL (Extract Transform Load)
We need an “analytics-friendly” database that is the single source of all data in Kudo
Python packages: petl, pandas
Operational data & logs → Extract → Transform → Load → analytical DB → Business Intelligence (Tableau UI)
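A minimal sketch of the Extract → Transform → Load flow with pandas, one of the packages named above. Everything concrete here is an assumption for illustration: two in-memory SQLite databases stand in for the operational store and the analytical DB, and the `transactions` / `daily_agent_sales` tables and their columns are hypothetical, not Kudo's schema.

```python
import sqlite3

import pandas as pd

# Hypothetical operational store: an in-memory SQLite DB standing in for the
# production database. Table and column names are illustrative only.
source = sqlite3.connect(":memory:")
source.executescript(
    """
    CREATE TABLE transactions (
        id INTEGER, agent_id INTEGER, amount TEXT, created_at TEXT
    );
    INSERT INTO transactions VALUES
        (1, 10, '15000', '2016-06-01 09:00:00'),
        (2, 10, '25000', '2016-06-01 13:30:00'),
        (3, 11, '5000', '2016-06-02 08:15:00');
    """
)

# Hypothetical analytical DB: the single place the BI tool would read from.
warehouse = sqlite3.connect(":memory:")


def extract(conn: sqlite3.Connection) -> pd.DataFrame:
    """Extract: pull raw rows out of the operational store."""
    return pd.read_sql("SELECT * FROM transactions", conn)


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Transform: fix types, then aggregate into an analytics-friendly table."""
    df = raw.assign(
        amount=pd.to_numeric(raw["amount"]),  # stored as text in the source
        day=pd.to_datetime(raw["created_at"]).dt.strftime("%Y-%m-%d"),
    )
    return df.groupby(["agent_id", "day"], as_index=False).agg(
        total_amount=("amount", "sum"),
        n_transactions=("id", "count"),
    )


def load(table: pd.DataFrame, conn: sqlite3.Connection) -> None:
    """Load: (re)write the result into the analytical DB."""
    table.to_sql("daily_agent_sales", conn, if_exists="replace", index=False)


load(transform(extract(source)), warehouse)
result = pd.read_sql(
    "SELECT * FROM daily_agent_sales ORDER BY agent_id, day", warehouse
)
print(result)
```

Using `if_exists="replace"` keeps the job idempotent, which matters once a scheduler re-runs it; an incremental load would append only new rows instead.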
ETL is all about jobs that run periodically, so we need to make sure all jobs run “pretty smoothly”.
Airflow
A platform to programmatically author, schedule and monitor our data pipelines
Supports:
• Retries
• Complex dependencies (DAG)
• Python Operator
• Email on error/retry
• Exchanging messages between tasks (XCom)
• Web UI
• etc.
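In Airflow, retries and error emails are task settings such as `retries`, `retry_delay`, `email_on_retry` and `email_on_failure`. The sketch below is not Airflow code; it is a plain-Python illustration of the same retry idea, where the hypothetical `on_retry` / `on_failure` callbacks stand in for sending an email and `flaky_extract` is a made-up task that fails twice before succeeding.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    retries: int = 5,
    retry_delay: float = 0.0,
    on_retry: Callable[[int, Exception], None] = lambda attempt, exc: None,
    on_failure: Callable[[Exception], None] = lambda exc: None,
) -> T:
    """Run `task`, retrying up to `retries` extra times before giving up.

    Toy illustration only: `on_retry` / `on_failure` stand in for the
    email-on-retry / email-on-failure notifications a scheduler would send.
    """
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception as exc:  # a toy runner treats every error as retryable
            if attempt == retries:
                on_failure(exc)
                raise
            on_retry(attempt + 1, exc)
            time.sleep(retry_delay)
    raise AssertionError("unreachable")


# Hypothetical flaky task: fails twice (e.g. source DB briefly down), then works.
calls = {"n": 0}
notifications: List[str] = []


def flaky_extract() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("source DB not reachable")
    return "extracted"


result = run_with_retries(
    flaky_extract,
    retries=5,
    on_retry=lambda attempt, exc: notifications.append(f"retry {attempt}: {exc}"),
)
print(result, notifications)
```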
DAG (Directed Acyclic Graph)
Example DAG: Task A, Task B, Task C, Task D and Task E, built from Python, HTTP, SQL, Bash and Email operators, with one task set to retry 5 times
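In Airflow the DAG is declared with operators and `>>` dependencies, and the scheduler runs each task only after its upstream tasks succeed. The plain-Python sketch below only illustrates that ordering idea with the standard library's `graphlib` (Python 3.9+); it is not Airflow's API. The edges between Task A to Task E and the task bodies are hypothetical, since the slide's diagram does not survive in the text, and passing upstream results into each task is a toy stand-in for XCom.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical wiring: each task maps to the tasks it depends on (upstream).
dependencies: Dict[str, List[str]] = {
    "task_a": [],
    "task_b": [],
    "task_c": ["task_a", "task_b"],
    "task_d": ["task_c"],
    "task_e": ["task_c", "task_d"],
}

# Hypothetical task bodies; each receives its upstream tasks' outputs.
tasks: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "task_a": lambda up: [1, 2, 3],                        # e.g. query a DB
    "task_b": lambda up: [4, 5],                           # e.g. call an HTTP API
    "task_c": lambda up: up["task_a"] + up["task_b"],      # merge both sources
    "task_d": lambda up: sum(up["task_c"]),                # aggregate
    "task_e": lambda up: f"{len(up['task_c'])} rows, total {up['task_d']}",  # report
}


def run_dag(
    deps: Dict[str, List[str]], bodies: Dict[str, Callable[[Dict[str, Any]], Any]]
) -> Tuple[List[str], Dict[str, Any]]:
    """Run each task once, after all of its upstream tasks have finished.

    TopologicalSorter raises graphlib.CycleError if the graph is not acyclic.
    """
    results: Dict[str, Any] = {}
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        upstream = {dep: results[dep] for dep in deps[name]}
        results[name] = bodies[name](upstream)
    return order, results


order, results = run_dag(dependencies, tasks)
print(order)
print(results["task_e"])
```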
Airflow Web UI
Machine Learning
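Product Classification
Product → Model → Category (with probability)
Example product name: “360 Degree Rotating Quiet Usb Fan”
• Elektronik (Electronics): 0.8
• Fesyen (Fashion): 0.15
• Perhiasan & Emas (Jewelry & Gold): 0.05
Training: product names → train data (binary matrix) → Naive Bayes model
0 1 0 1 0 0 0
1 0 1 0 0 0 0
1 1 1 0 1 0 0
1 0 1 1 1 0 0
1 1 0 0 1 1 0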
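A from-scratch sketch of the Naive Bayes step. The slide does not say which variant Kudo used; because the training matrix is binary, this assumes Bernoulli Naive Bayes over word-presence features with Laplace smoothing. The training examples and the whitespace tokenizer are hypothetical, and the printed probabilities are not meant to reproduce the slide's 0.8 / 0.15 / 0.05.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical labelled product names; the real training set is not shown.
TRAIN: List[Tuple[str, str]] = [
    ("usb fan portable mini", "Elektronik"),
    ("rotating desk fan quiet", "Elektronik"),
    ("usb charger fast", "Elektronik"),
    ("cotton shirt men", "Fesyen"),
    ("leather shoes women", "Fesyen"),
    ("gold ring 24k", "Perhiasan & Emas"),
    ("silver necklace women", "Perhiasan & Emas"),
]


def tokenize(text: str) -> Set[str]:
    """Hypothetical tokenizer: lowercase, split on whitespace, keep unique words."""
    return set(text.lower().split())


class BernoulliNaiveBayes:
    """Naive Bayes over binary word-presence features, with Laplace smoothing."""

    def fit(self, docs: List[str], labels: List[str]) -> "BernoulliNaiveBayes":
        self.vocab = sorted(set().union(*(tokenize(d) for d in docs)))
        self.classes = sorted(set(labels))
        n_docs: Dict[str, int] = defaultdict(int)  # documents per class
        # word_docs[c][w] = number of class-c documents containing word w
        word_docs: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        for doc, label in zip(docs, labels):
            n_docs[label] += 1
            for word in tokenize(doc):
                word_docs[label][word] += 1
        self.log_prior = {c: math.log(n_docs[c] / len(docs)) for c in self.classes}
        # P(word present | class), smoothed so no estimate is exactly 0 or 1.
        self.p_word = {
            c: {w: (word_docs[c][w] + 1) / (n_docs[c] + 2) for w in self.vocab}
            for c in self.classes
        }
        return self

    def predict_proba(self, text: str) -> Dict[str, float]:
        present = tokenize(text)  # words outside the vocabulary are ignored
        log_scores: Dict[str, float] = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in self.vocab:  # Bernoulli NB also scores absent words
                p = self.p_word[c][w]
                score += math.log(p if w in present else 1.0 - p)
            log_scores[c] = score
        # Turn log-scores into probabilities that sum to 1 (log-sum-exp trick).
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        total = sum(weights.values())
        return {c: wt / total for c, wt in weights.items()}


docs, labels = zip(*TRAIN)
model = BernoulliNaiveBayes().fit(list(docs), list(labels))
probs = model.predict_proba("360 Degree Rotating Quiet Usb Fan")
for category, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {p:.2f}")
```

Writing it out shows the arithmetic; in practice a library implementation such as scikit-learn's `BernoulliNB` would usually be used instead.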
Recommendation Engine
Item-to-Item Similarity, Collaborative Filtering
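A minimal sketch of item-to-item collaborative filtering: each product is represented by the set of agents who bought it, products are compared with cosine similarity, and an agent's unseen products are scored by summing their similarity to products the agent already has. The purchase data and product IDs are hypothetical, and the slide does not say which similarity measure Kudo used; cosine is an assumption here.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical purchase history: agent -> set of product IDs they bought.
purchases: Dict[str, Set[str]] = {
    "agent_1": {"phone_credit_10k", "usb_fan", "phone_case"},
    "agent_2": {"phone_credit_10k", "usb_fan"},
    "agent_3": {"phone_credit_10k", "gold_ring"},
    "agent_4": {"usb_fan", "phone_case"},
}


def item_users(history: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert agent -> items into item -> set of agents who bought it."""
    index: Dict[str, Set[str]] = {}
    for user, items in history.items():
        for item in items:
            index.setdefault(item, set()).add(user)
    return index


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two binary (bought / not bought) user vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similar_items(
    item: str, history: Dict[str, Set[str]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Items most similar to `item`, best first; zero-similarity items dropped."""
    index = item_users(history)
    target = index.get(item, set())
    scored = [(o, cosine(target, users)) for o, users in index.items() if o != item]
    scored.sort(key=lambda kv: (-kv[1], kv[0]))
    return [(o, s) for o, s in scored[:top_n] if s > 0]


def recommend(user: str, history: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """Rank items the user has not bought by summed similarity to items they own."""
    index = item_users(history)
    owned = history[user]
    scores: Dict[str, float] = {}
    for item in owned:
        for other, users in index.items():
            if other not in owned:
                scores[other] = scores.get(other, 0.0) + cosine(index[item], users)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [o for o, s in ranked[:top_n] if s > 0]


print(similar_items("usb_fan", purchases))
print(recommend("agent_2", purchases))
```

Item-to-item similarities depend only on the item catalogue and purchase history, so they can be precomputed in a periodic batch job and looked up at request time.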
User Profiling
Labelling Kudo Agents
Thank You!
Luthfi Hariz - luthfi@kudo.co.id
Psst... we are hiring!
