BUILDING A DISTRIBUTED DATA PIPELINE
MACHINE LEARNING AT SCALE
BACKGROUND
DATA
▸Data is everywhere
▸Data, unapplied, is useless
▸How can we turn high-volume, high-velocity data into value?
BACKGROUND
PIPELINE
▸Process the data continuously
▸Apply several processing steps
COLLECT → MODEL → DEPLOY → INTEGRATE
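A minimal sketch of the two bullets above, "process the data continuously" and "apply several processing steps", using plain Python generators. This is not the implementation behind the slides. The helper `build_pipeline`, the stage functions, the `SYMBOL,price` record format and the `DEMO` ticker are all hypothetical names chosen for illustration.

```python
from itertools import islice


def build_pipeline(*stages):
    """Chain generator stages so that records flow through one at a time."""
    def run(source):
        stream = iter(source)
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


def parse(records):
    # Hypothetical input format: "SYMBOL,price"
    for line in records:
        symbol, price = line.split(",")
        yield symbol, float(price)


def drop_invalid(records):
    # Discard records with a non-positive price.
    for symbol, price in records:
        if price > 0:
            yield symbol, price


def to_cents(records):
    # Convert the price to an integer number of cents.
    for symbol, price in records:
        yield symbol, round(price * 100)


pipeline = build_pipeline(parse, drop_invalid, to_cents)


def endless_ticks():
    """Unbounded source standing in for a continuous feed (made-up values)."""
    n = 0
    while True:
        n += 1
        yield f"DEMO,{n}.00"


# Each stage is lazy, so an endless source is fine as long as the
# consumer only pulls what it needs.
first_three = list(islice(pipeline(endless_ticks()), 3))
```

Because every stage is a generator, no stage waits for the full input. In a distributed setting each stage would typically run as its own process, but the composition idea is the same.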
SOLUTION
ANALYSE THE STOCK MARKET
[Architecture diagram; component labels as recovered from the slide]
▸YAHOO.COM
▸YAHOO.COM (PREFETCHED)
▸COLLECTOR
▸MESSAGE BROKER
▸STREAMING
▸STORAGE
▸MODEL
▸MACHINE LEARNING (MLlib)
▸WEBSERVICE
▸USER / CLIENTS
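The component labels above suggest a flow from collector to message broker to streaming job to model to web service. The sketch below imitates that flow in a single process, assuming that reading of the diagram is right. Everything in it is a stand-in: `queue.Queue` plays the message broker, a sliding-window average plays the model (it is not MLlib), a dictionary plays the storage, and the function names and tick values are hypothetical.

```python
import queue
import threading
from collections import defaultdict, deque

SENTINEL = None  # marks the end of the demo stream


def collector(ticks, broker):
    """Push each (symbol, price) tick onto the broker, then signal the end."""
    for tick in ticks:
        broker.put(tick)
    broker.put(SENTINEL)


def streaming_job(broker, predictions, window=3):
    """Consume ticks and keep a per-symbol moving average as the 'model'."""
    history = defaultdict(lambda: deque(maxlen=window))
    while True:
        tick = broker.get()
        if tick is SENTINEL:
            break
        symbol, price = tick
        history[symbol].append(price)
        predictions[symbol] = sum(history[symbol]) / len(history[symbol])


def web_service(predictions, symbol):
    """Stand-in for the web service: return the latest value for a symbol."""
    return predictions.get(symbol)


def run_demo(ticks):
    broker = queue.Queue()   # stand-in for the message broker
    predictions = {}         # stand-in for storage
    producer = threading.Thread(target=collector, args=(ticks, broker))
    consumer = threading.Thread(target=streaming_job, args=(broker, predictions))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return predictions
```

The point of the broker in the middle is decoupling: the collector and the streaming job only share the queue, so either side can be replaced or scaled without the other noticing.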
DEMO
DEMO (FINGERS CROSSED)
DONE
QUESTIONS?

Building a Distributed Data Pipeline