StreamProcessing.be
Brussels, May 27, 2015
Theme: hosted solutions for Stream Processing and ML
#StreamProcessingBe
Agenda
15’ Intro (Peter)
35’ Azure Stream Analytics and ML (Jan)
5’ short break
35’ Google Cloud DataFlow (Alex)
35’ Amazon AWS ML (Nils)
Many thanks to
Microsoft Belux
Jan, Alex, Nils
@maasg, @svendfx
BigData.be, DataScience.be, AWS Belgium
you!
Next StreamProcessing.be Meetup
Thu, June 25, 2015, near Mechelen station
(looking for a location for ±50 people)
● Introduction to Apache Kafka (Svend)
● Akka Streams and Kinesis (Peter)
● Understanding Spark Streaming (Gerard)
whoami : Peter Vandenabeele @peter_v
All Things Data (my consultancy)
current clients:
Real Impact Analytics
Telecom Analytics (emerging markets)
“Green” start-up (stealth mode)
IoT project (see next Meetup)
Why?
(before anything else)
Why Stream Processing?
(a personal view)
E.g. collaborative research (2013)
UniProt (180 GB) → monthly update → consumer
consumer update cost ≅ freq (1/month) * size (180 GB) * # consumers (5)
(fetch + load + index the FULL data set)
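A worked version of this estimate, as a minimal Python sketch. It assumes cost is counted as gigabytes fetched and loaded per month (the slide only gives the factors), and the function name is hypothetical.

```python
def full_reload_cost_gb_per_month(freq_per_month: float, size_gb: float, consumers: int) -> float:
    """Full-reload model: every consumer fetches, loads and indexes the FULL
    data set each time a new release comes out."""
    return freq_per_month * size_gb * consumers


# Figures from the slide: monthly UniProt release, 180 GB, 5 consumers.
cost = full_reload_cost_gb_per_month(freq_per_month=1, size_gb=180, consumers=5)
print(f"full reload: {cost:.0f} GB/month")  # full reload: 900 GB/month
```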
solution: Stream of updates (CDC)
Users table → continuous updates → stream → consumer
consumer update cost ≅ Rate of Change (10% / month) * size * # consumers
(fetch + load ONLY updates)
3M entries, 300k updates/month
(independent of consumer update frequency)
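A minimal Python sketch of the CDC side, under stated assumptions: a changed entry is on average as large as any other entry, the change log is an in-memory list, and `Change`, `Replica` and the `P000xx` keys are hypothetical names, not UniProt's or any CDC tool's API. It computes the cost estimate from the slide's figures, then shows that a consumer polling after every event ends up with the same replica, having applied the same number of events, as one that polls only once.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SIZE_GB, CONSUMERS = 180, 5
RATE_OF_CHANGE = 300_000 / 3_000_000  # 300k updates/month on 3M entries = 10% / month


def cdc_cost_gb_per_month(rate_per_month: float, size_gb: float, consumers: int) -> float:
    # Assumption: a changed entry is, on average, as large as any other entry.
    return rate_per_month * size_gb * consumers


@dataclass(frozen=True)
class Change:
    """One CDC event: an upsert (value set) or a delete (value is None)."""
    key: str
    value: Optional[str]


@dataclass
class Replica:
    """A consumer's local copy, maintained by applying ONLY the changes."""
    rows: Dict[str, str] = field(default_factory=dict)
    offset: int = 0   # index of the next unread event in the change log
    applied: int = 0  # events fetched + loaded so far

    def poll(self, log: List[Change]) -> None:
        for change in log[self.offset:]:
            if change.value is None:
                self.rows.pop(change.key, None)
            else:
                self.rows[change.key] = change.value
            self.applied += 1
        self.offset = len(log)


print(f"CDC: {cdc_cost_gb_per_month(RATE_OF_CHANGE, SIZE_GB, CONSUMERS):.0f} GB/month")  # CDC: 90 GB/month

# Same change stream, two consumers with different update frequencies.
log: List[Change] = []
eager, lazy = Replica(), Replica()
for i in range(30):
    log.append(Change(f"P{i % 10:05d}", f"v{i}"))  # 10 keys, each updated 3 times
    eager.poll(log)                                 # polls after every event
log.append(Change("P00003", None))                  # one delete
eager.poll(log)
lazy.poll(log)                                      # polls once, at the end

assert eager.rows == lazy.rows
assert eager.applied == lazy.applied == len(log)    # 31 events either way
```

Compared with 1 × 180 GB × 5 = 900 GB/month for full reloads, this is a tenfold reduction in volume under these assumptions.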
Why Stream Processing?
Real-time * Big Data * Distributed processing
(“many collaborators”)
Stream becomes the “master data”
● see stream as the master data (not the DB)
● allows real-time, distributed processing
● allows unification between:
○ operational teams
○ analytics teams
○ security, ...
● e.g. Kafka at LinkedIn (Kappa architecture)
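To make “stream as master data” concrete, here is a toy in-memory sketch in Python, not Kafka's API: an append-only list stands in for the log, and each team derives its own view by reading it at its own offset. The event fields, the team views and the `Consumer` class are all hypothetical. A team that joins later builds its view by replaying the log from offset 0, which is the reprocessing idea behind the Kappa architecture.

```python
from collections import Counter
from typing import Callable, Dict, List

Event = Dict[str, str]

# Hypothetical append-only log: the source of truth, not a database table.
LOG: List[Event] = [
    {"user": "alice", "type": "signup"},
    {"user": "bob", "type": "signup"},
    {"user": "alice", "type": "upgrade", "plan": "pro"},
    {"user": "bob", "type": "upgrade", "plan": "basic"},
    {"user": "alice", "type": "upgrade", "plan": "enterprise"},
]


class Consumer:
    """Reads the shared log at its own pace and tracks its own offset."""

    def __init__(self, apply: Callable[[Event], None]) -> None:
        self.apply = apply
        self.offset = 0

    def poll(self, log: List[Event]) -> None:
        for event in log[self.offset:]:
            self.apply(event)
        self.offset = len(log)


# Operational team: current plan per user (a table derived from the stream).
current_plan: Dict[str, str] = {}

def track_plan(event: Event) -> None:
    if event["type"] == "signup":
        current_plan[event["user"]] = "free"
    elif event["type"] == "upgrade":
        current_plan[event["user"]] = event["plan"]

# Analytics team: number of events per type.
event_counts: Counter = Counter()

def count_event(event: Event) -> None:
    event_counts[event["type"]] += 1

ops, analytics = Consumer(track_plan), Consumer(count_event)
ops.poll(LOG)
analytics.poll(LOG)

# A team that joins later (e.g. security) replays the log from offset 0;
# it needs no export from another team's database.
upgrades_per_user: Counter = Counter()

def count_upgrade(event: Event) -> None:
    if event["type"] == "upgrade":
        upgrades_per_user[event["user"]] += 1

security = Consumer(count_upgrade)
security.poll(LOG)

print(current_plan)             # {'alice': 'enterprise', 'bob': 'basic'}
print(dict(event_counts))       # {'signup': 2, 'upgrade': 3}
print(dict(upgrades_per_user))  # {'alice': 2, 'bob': 1}
```

In a real deployment the log would be durable and partitioned (e.g. Kafka topics), but the design choice is the same: views are disposable and rebuildable, while the stream is kept.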
Kafka (LinkedIn): Martin Kleppmann
source: Martin Kleppmann at Strata Hadoop London
Kafka (LinkedIn): Jay Kreps
source: Jay Kreps on SlideShare, “I ♥ Log: Real-time Data and Apache Kafka”
Why Stream Processing?
Peter: real-time * (big data * distributed proc.)
Nathan Marz: recovery from human error + ...
Jay Kreps: organizational scalability + ...
Martin Kleppmann: data agility + ...
YOU: ??? let’s discuss over a beer ...
Speakers for today
● Jan Tielens (Microsoft) @jantielens
● Alex Van Boxel (Vente-Exclusive.com) @alexvb
● Nils De Moor (Woorank) @ndemoor

Intro stream processing.be meetup #1