Building a Distributed Data Pipeline

Spark, Akka, MLlib, Kafka, Spray
Presentation & demo for http://www.daysofcode.nl/ @daysofcode

Published in: Software

  1. BUILDING A DISTRIBUTED MACHINE LEARNING AT SCALE
  2. BACKGROUND: DATA ▸ Data is everywhere ▸ Data, unapplied, is useless ▸ How can we turn high volume & velocity data into value?
  3. BACKGROUND: PIPELINE ▸ Process the data continuously ▸ Apply several processing steps: COLLECT, MODEL, DEPLOY, INTEGRATE
  4. SOLUTION: ANALYSE THE STOCK MARKET ▸ YAHOO.COM ▸ YAHOO.COM (PREFETCHED) ▸ COLLECTOR ▸ MESSAGE BROKER ▸ STREAMING ▸ STORAGE ▸ MODEL ▸ MACHINE LEARNING ▸ MLlib ▸ WEBSERVICE ▸ USER / CLIENTS
  5. DEMO (FINGERS CROSSED)
  6. DONE ▸ QUESTIONS?
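
The collect, stream, store, model and serve stages named in the slides can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from the demo: an in-memory `Queue` stands in for the message broker (Kafka in the talk), a list stands in for storage, a least-squares trend line stands in for an MLlib model, and `predict_next` stands in for the web service endpoint. The symbol `DEMO`, the prices, and all function names are hypothetical.

```python
from collections import deque
from queue import Queue
from statistics import mean

# Hypothetical prefetched quotes (symbol, closing price); the values are made up.
PREFETCHED_QUOTES = [("DEMO", 10.0), ("DEMO", 11.0), ("DEMO", 12.0),
                     ("DEMO", 13.0), ("DEMO", 14.0)]


def collect(quotes, broker):
    """Collector: publish each raw quote to the message broker."""
    for quote in quotes:
        broker.put(quote)
    broker.put(None)  # sentinel marking the end of the prefetched batch


def stream(broker, storage, window=3):
    """Streaming step: consume quotes and store a moving average per full window."""
    recent = deque(maxlen=window)
    while (quote := broker.get()) is not None:
        symbol, price = quote
        recent.append(price)
        if len(recent) == window:
            storage.append((symbol, mean(recent)))


def fit_trend(storage):
    """Model step: least-squares slope and intercept over the stored averages."""
    ys = [value for _, value in storage]
    xs = list(range(len(ys)))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def predict_next(model, n_seen):
    """Serving step: what a web service endpoint would return to a client."""
    slope, intercept = model
    return slope * n_seen + intercept


def run_pipeline(quotes):
    """Wire the stages together: collect -> broker -> stream -> storage -> model."""
    broker, storage = Queue(), []
    collect(quotes, broker)
    stream(broker, storage)
    return storage, fit_trend(storage)
```

Calling `storage, model = run_pipeline(PREFETCHED_QUOTES)` and then `predict_next(model, len(storage))` runs the whole chain once. In a real deployment each stage would run continuously as its own process, with the broker decoupling the collector from the streaming job.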
