SlideShare a Scribd company logo
1 of 62
Download to read offline
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Streaming analytics better
than batch - when and why ?
_Adam Kawa - Dawid Wysakowicz -_
Krzysztof Zarzycki_
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Have you ever
built cool
Big Data
pipelines?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Example Use-Case
■ Can be done in batch and
real-time
■ User session analytics at Spotify
● Simple stats
■ Duration, number of songs,
skips, searches etc.
● Advanced analytics
■ Mood, physical activity, real-time
content, ads
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Example Output
How long do users listen
to a new edition of
Discover Weekly?
_1. Dashboards_
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Example Output
How long do users listen
to a new edition of
Discover Weekly?
Australian users are
listening to Discover
Weekly too short !!!
_1. Dashboards_ _2. Alerts_
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Example Output
How long do users listen
to a new edition of
Discover Weekly?
Australian users are
listening to Discover
Weekly too short !!!
Recommend songs
and ads based on
current activity.
_1. Dashboards_ _2. Alerts_ _3. Content_
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
1st
- Batch Architecture
1h
1h
1h
1h - 1d
1h
User
Events
User
Sessions
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
1st
- Batch Architecture
1h
1h
1h
1d
1h
User
Events
User
Sessions
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
The More Moving Parts …
⬇ The higher learning curve
⬇ The more gluing code
⬇ The larger administrative effort
⬇ The more error-prone solution
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Long Waiting Time
Image source: “Continuous Analytics: Stream Query Processing in Practice”, Michael J Franklin, Professor, UC Berkley, Dec 2009 and
http://www.slideshare.net/JoshBaer/shortening-the-feedback-loop-big-data-spain-external
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
2nd
- Micro-Batch Architecture
1m - 1h
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
♪ ♪
No Built-In Session Windows
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
[10:00 - 11:00) [11:00 - 12:00)
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
♪ ♪
No Built-In Session Windows
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
[10:00 - 11:00) [11:00 - 12:00)
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Late Data …
♪ ♪ ♪ ♪ ♪ ♪ Event Time
14:55 - 16:35
Processing
Time
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
... Included in Current Batch
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪
♪
14:55 - 16:35 16:50 - …
Event Time
Processing
Time
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Out-Of-Order Data …
♪ ♫ ♪ Event Time
Processing
Time
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Out-Of-Order Data …
♪ ♫ ♪ ♪ ♪ ♫
♪ ♪ ♫
Event Time
Processing
Time
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Out-Of-Order Data …
♪ ♫ ♪ ♪ ♪ ♫
♪ ♪ ♫
Event Time
Processing
Time
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
... Breaks Correctness
♪ ♫ ♪ ♪ ♪ ♫ ♪ ♫ ♪ ♬ ♫
♪ ♪ ♫ ♪ ♫ ♪ ♬ ♫
♪
Event Time
Processing
Time
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Problems
FILES,
BATCHES,
DATA LAKES
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Solving Streaming Problem With Batch?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
3rd
- Streaming-First Architecture
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
User Session Windows
♪Case A ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
Case B ♪ ♪ ♪ ♪ ♪ ♪
Session gap
eg. 15
minutes
♪
♪♪
5
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
User Session Windows
♪Case A ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
Case B ♪ ♪ ♪ ♪ ♪ ♪
Session gap
eg. 15
minutes
♪
♪♪
5
[3,2]
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Reading From Kafka
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪♪ ♪ ♪ ♪ ♪ ♪ ♪
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Session Windows With Gap
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
.keyBy(_.userId)
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪
User 1
User 2
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Session Windows With Gap
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
.keyBy(_.userId)
.window(EventTimeSessionWindows.withGap(Time.minutes(15)))
User 1 ♪ ♪ ♪ ♪ ♪ ♪
Session gap
- 15 minutes
♪♪
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Analyzing User Session
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
.keyBy(_.userId)
.window(EventTimeSessionWindows.withGap(Time.minutes(15)))
.apply(new CountSessionStats())
User 1 ♪ ♪ ♪ ♪ ♪ ♪ ♪♪
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Handling Late Events
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
.keyBy(_.userId)
.window(EventTimeSessionWindows.withGap(Time.minutes(15)))
.allowedLateness(Time.minutes(60))
.apply(new CountSessionStats())
User 1 ♪ ♪ ♪ ♪ ♪ ♪ ♪♪ ♪
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Triggering Early Results
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
.keyBy(_.userId)
.window(EventTimeSessionWindows.withGap(Time.minutes(15)))
.trigger(EarlyTriggeringTrigger.every(Time.minutes(10)))
.allowedLateness(Time.minutes(60))
.apply(new CountSessionStats())
User 1 ♪ ♪ ♪ ♪ ♪ ♪ ♪♪
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Sessionization Example
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
.keyBy(_.userId)
.window(EventTimeSessionWindows.withGap(Time.minutes(15)))
.trigger(EarlyTriggeringTrigger.every(Time.minutes(10)))
.allowedLateness(Time.minutes(60))
.apply(new CountSessionStats())
Working example:
https://github.com/getindata/flink-use-case
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Modern Stream Processing Engines
■ Rich stream processing semantic
● Built-in support for event-time windows
● Accurate results for late / out-of-order events and replays
● Early triggers
■ Low latency and high-throughput
■ Exactly-once stateful processing
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Modern Stream Processing Engines
■ Rich stream processing semantic
● Built-in support for event-time windows
● Accurate results for late / out-of-order events and replays
● Early triggers
■ Low latency and high-throughput
■ Exactly-once stateful processing
User survey:
http://data-artisans.com/flink-user-survey-2016-part-1
http://data-artisans.com/flink-user-survey-2016-part-2
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
How can I reprocess data?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Reprocessing Events In Flink
1. Take periodic snapshots of a job
● It stores Kafka offsets, on-flight sessions, application state
2. Restart a job from a savepoint rather than from a beginning
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
What if data is no longer in Kafka?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Consuming Data From HDFS
■ Run your streaming code on HDFS (bounded data)
● You need to read data in event-time based order
● Implement mechanism of proper watermark generation
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
What are usual stream processing
applications?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Stream Analytics
Image source: https://www.slideshare.net/sinisalyh/storm-at-spotify
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Stream 24/7 Applications
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
When is batch processing good?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Batch Processing Use-Cases
■ Ad-hoc analytics and data exploration
● Notebooks, Spark/Flink/Hive, Parquet, complete data sets
■ Technical advantages
● A large swaths of historical data in HDFS
● High-level libraries in mature batch technologies
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Batch Processing Use-Cases
■ Ad-hoc analytics and data exploration
● Notebooks, Spark/Flink/Hive, Parquet, complete data sets
■ Implementation advantages
● Offline experiments over large historical data
■ Historical events are usually stored in HDFS, not Kafka
● High-level libraries in batch processing technologies
■ Spark MLlib, H2O
(when data arrives continuously)
don’t solve
streaming problem
with batch jobs
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Who Are You, actually?
■ At GetInData, we build custom Big Data solutions
● Hadoop, Flink, Spark, Kafka and more
■ Our team is today represented by
Krzysztof
Zarzycki
Dawid
Wysakowicz
Adam
Kawa
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
■ Stream often the natural representation of your data
■ Stream processing is not only about low latency
Summary
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Q&A
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Thanks !
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Log Abstraction
11:00 -
12:00
12:00 -
13:00
…
…
10:00 - …
10:00 - …
10:00 -
11:00
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Spark Structured Streaming
⬇ Operates on top of micro-batches (Spark SQL engine)
■ The ALPHA version and the experimental API until July 11,
2017
⬆ Easy-to-learn API (Dataset/DataFrame)
⬆ Rich ecosystem of tools and libraries e.g. MLlib
⬆ Supports event-time
⬇ Sessionization not yet supported - SPARK-10816
⬇ Queryable state not yet supported - SPARK-16738
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Kafka Streams
⬇ No exactly-once (just at-least-once)
⬇ Kafka as the only data source
⬇ No bounded streams (batch) optimizations
⬆ Simplicity
⬆ Embedded into application
⬆ Supports event-time
⬇ Lack of session windows
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Apache Beam
⬆ Unified API for batch and streaming
⬆ Rich streaming processing semantics
⬆ Complex TriggerDSL
⬆ Multiple runtime environments
⬆ Spark, Flink, Apex, Dataflow
⬆ Side inputs and outputs
⬇ Verbose Java API
⬇ New project - Top level since 01/2017
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Google Dataflow
■ Runtime environment for Apache Beam in Google Cloud
⬇ No support for Iterative Computations
⬆ Supports Side Outputs
⬆ Works with every Google Cloud Service (Pub/Sub, BigTable
etc.)
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
How to join with other data
sets/streams?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Join With Other Datasets / Streams
■ Flink can join windowed streams easily
■ Join of data stream with data set is WIP
● Even with slowly changing data set!
● Even keyed data
Stream 2
Stream 1
Joined Stream Input Stream Joined Stream
+
Id Name
1 John Doe
2 Jane Doe
Dataset
+
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
I like this streaming API.
Can I use it for batch?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Unified batch and streaming API
■ Not with raw Flink API
■ But with Flink Table API
■ Apache Beam
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Is Flink production ready?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Powered By Flink

More Related Content

What's hot

Node Tools For Your Grails Toolbox - Gr8Conf 2013
Node Tools For Your Grails Toolbox - Gr8Conf 2013Node Tools For Your Grails Toolbox - Gr8Conf 2013
Node Tools For Your Grails Toolbox - Gr8Conf 2013zanthrash
 
LetSwift 2017 - 토스 iOS 앱의 개발/배포 환경
LetSwift 2017 - 토스 iOS 앱의 개발/배포 환경LetSwift 2017 - 토스 iOS 앱의 개발/배포 환경
LetSwift 2017 - 토스 iOS 앱의 개발/배포 환경Mintak Son
 
4Developers: Dns vs webapp
4Developers: Dns vs webapp4Developers: Dns vs webapp
4Developers: Dns vs webappPROIDEA
 
Introduction To Git Workshop
Introduction To Git WorkshopIntroduction To Git Workshop
Introduction To Git Workshopthemystic_ca
 
Introducción a git y GitHub
Introducción a git y GitHubIntroducción a git y GitHub
Introducción a git y GitHubLucas Videla
 
Git - Get Ready To Use It
Git - Get Ready To Use ItGit - Get Ready To Use It
Git - Get Ready To Use ItDaniel Kummer
 
Astricon 2017 Superpower of deployment tools
Astricon 2017 Superpower of deployment toolsAstricon 2017 Superpower of deployment tools
Astricon 2017 Superpower of deployment toolsJöran Vinzens
 

What's hot (9)

Node Tools For Your Grails Toolbox - Gr8Conf 2013
Node Tools For Your Grails Toolbox - Gr8Conf 2013Node Tools For Your Grails Toolbox - Gr8Conf 2013
Node Tools For Your Grails Toolbox - Gr8Conf 2013
 
Working with Git
Working with GitWorking with Git
Working with Git
 
LetSwift 2017 - 토스 iOS 앱의 개발/배포 환경
LetSwift 2017 - 토스 iOS 앱의 개발/배포 환경LetSwift 2017 - 토스 iOS 앱의 개발/배포 환경
LetSwift 2017 - 토스 iOS 앱의 개발/배포 환경
 
Gittalk
GittalkGittalk
Gittalk
 
4Developers: Dns vs webapp
4Developers: Dns vs webapp4Developers: Dns vs webapp
4Developers: Dns vs webapp
 
Introduction To Git Workshop
Introduction To Git WorkshopIntroduction To Git Workshop
Introduction To Git Workshop
 
Introducción a git y GitHub
Introducción a git y GitHubIntroducción a git y GitHub
Introducción a git y GitHub
 
Git - Get Ready To Use It
Git - Get Ready To Use ItGit - Get Ready To Use It
Git - Get Ready To Use It
 
Astricon 2017 Superpower of deployment tools
Astricon 2017 Superpower of deployment toolsAstricon 2017 Superpower of deployment tools
Astricon 2017 Superpower of deployment tools
 

Similar to Streaming analytics better than batch – when and why by Dawid Wysakowicz and Adam Kawa at Big Data Spain 2017

Streaming analytics better than batch when and why - (Big Data Tech 2017)
Streaming analytics better than batch   when and why - (Big Data Tech 2017)Streaming analytics better than batch   when and why - (Big Data Tech 2017)
Streaming analytics better than batch when and why - (Big Data Tech 2017)GetInData
 
Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...GetInData
 
Apache Flume
Apache FlumeApache Flume
Apache FlumeGetInData
 
Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Eric Sammer
 
Cache aware-server-push in H2O version 1.5
Cache aware-server-push in H2O version 1.5Cache aware-server-push in H2O version 1.5
Cache aware-server-push in H2O version 1.5Kazuho Oku
 
Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015Eric Sammer
 
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...VirtualTech Japan Inc.
 
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...GetInData
 
Apache httpd 2.4: The Cloud Killer App
Apache httpd 2.4: The Cloud Killer AppApache httpd 2.4: The Cloud Killer App
Apache httpd 2.4: The Cloud Killer AppJim Jagielski
 
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Soroosh Khodami
 
Building a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with RocanaBuilding a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with RocanaTreasure Data, Inc.
 
Spring Framework 5.0による Reactive Web Application #JavaDayTokyo
Spring Framework 5.0による Reactive Web Application #JavaDayTokyoSpring Framework 5.0による Reactive Web Application #JavaDayTokyo
Spring Framework 5.0による Reactive Web Application #JavaDayTokyoToshiaki Maki
 
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)GetInData
 
Lost in Translation:varnishlog, varnishtest(VUG7)
Lost in Translation:varnishlog, varnishtest(VUG7)Lost in Translation:varnishlog, varnishtest(VUG7)
Lost in Translation:varnishlog, varnishtest(VUG7)Iwana Chan
 
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...GetInData
 
Velocity 2015 Amsterdam: Alerts overload
Velocity 2015 Amsterdam: Alerts overloadVelocity 2015 Amsterdam: Alerts overload
Velocity 2015 Amsterdam: Alerts overloadsarahjwells
 

Similar to Streaming analytics better than batch – when and why by Dawid Wysakowicz and Adam Kawa at Big Data Spain 2017 (20)

Streaming analytics better than batch when and why - (Big Data Tech 2017)
Streaming analytics better than batch   when and why - (Big Data Tech 2017)Streaming analytics better than batch   when and why - (Big Data Tech 2017)
Streaming analytics better than batch when and why - (Big Data Tech 2017)
 
Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...
 
Flink. Pure Streaming
Flink. Pure StreamingFlink. Pure Streaming
Flink. Pure Streaming
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...
 
Observability at Spotify
Observability at SpotifyObservability at Spotify
Observability at Spotify
 
Cache aware-server-push in H2O version 1.5
Cache aware-server-push in H2O version 1.5Cache aware-server-push in H2O version 1.5
Cache aware-server-push in H2O version 1.5
 
Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015
 
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...
 
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
 
Apache httpd 2.4: The Cloud Killer App
Apache httpd 2.4: The Cloud Killer AppApache httpd 2.4: The Cloud Killer App
Apache httpd 2.4: The Cloud Killer App
 
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
 
Building a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with RocanaBuilding a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with Rocana
 
Time-State Analytics
Time-State AnalyticsTime-State Analytics
Time-State Analytics
 
Spring Framework 5.0による Reactive Web Application #JavaDayTokyo
Spring Framework 5.0による Reactive Web Application #JavaDayTokyoSpring Framework 5.0による Reactive Web Application #JavaDayTokyo
Spring Framework 5.0による Reactive Web Application #JavaDayTokyo
 
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)
 
Lost in Translation:varnishlog, varnishtest(VUG7)
Lost in Translation:varnishlog, varnishtest(VUG7)Lost in Translation:varnishlog, varnishtest(VUG7)
Lost in Translation:varnishlog, varnishtest(VUG7)
 
About time
About timeAbout time
About time
 
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
 
Velocity 2015 Amsterdam: Alerts overload
Velocity 2015 Amsterdam: Alerts overloadVelocity 2015 Amsterdam: Alerts overload
Velocity 2015 Amsterdam: Alerts overload
 

More from Big Data Spain

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Big Data Spain
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Big Data Spain
 

More from Big Data Spain (20)

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
 

Recently uploaded

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

Streaming analytics better than batch – when and why by Dawid Wysakowicz and Adam Kawa at Big Data Spain 2017

  • 1.
  • 2. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Streaming analytics better than batch - when and why ? _Adam Kawa - Dawid Wysakowicz -_ Krzysztof Zarzycki_
  • 3. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Have you ever built cool Big Data pipelines?
  • 4. © Copyright. All rights reserved. Not to be reproduced without prior written consent.
  • 5. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Example Use-Case ■ Can be done in batch and real-time ■ User session analytics at Spotify ● Simple stats ■ Duration, number of songs, skips, searches etc. ● Advanced analytics ■ Mood, physical activity, real-time content, ads
  • 6. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Example Output How long do users listen to a new edition of Discover Weekly? _1. Dashboards_
  • 7. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Example Output How long do users listen to a new edition of Discover Weekly? Australian users are listening to Discover Weekly too short !!! _1. Dashboards_ _2. Alerts_
  • 8. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Example Output How long do users listen to a new edition of Discover Weekly? Australian users are listening to Discover Weekly too short !!! Recommend songs and ads based on current activity. _1. Dashboards_ _2. Alerts_ _3. Content_
  • 9. © Copyright. All rights reserved. Not to be reproduced without prior written consent. 1st - Batch Architecture 1h 1h 1h 1h - 1d 1h User Events User Sessions
  • 10. © Copyright. All rights reserved. Not to be reproduced without prior written consent. 1st - Batch Architecture 1h 1h 1h 1d 1h User Events User Sessions
  • 11. © Copyright. All rights reserved. Not to be reproduced without prior written consent. The More Moving Parts … ⬇ The higher learning curve ⬇ The more gluing code ⬇ The larger administrative effort ⬇ The more error-prone solution
  • 12. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Long Waiting Time Image source: “Continuous Analytics: Stream Query Processing in Practice”, Michael J Franklin, Professor, UC Berkley, Dec 2009 and http://www.slideshare.net/JoshBaer/shortening-the-feedback-loop-big-data-spain-external
  • 13. © Copyright. All rights reserved. Not to be reproduced without prior written consent. 2nd - Micro-Batch Architecture 1m - 1h
  • 14. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ♪ ♪ No Built-In Session Windows ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ [10:00 - 11:00) [11:00 - 12:00)
  • 15. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ♪ ♪ No Built-In Session Windows ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ [10:00 - 11:00) [11:00 - 12:00)
  • 16. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Late Data … ♪ ♪ ♪ ♪ ♪ ♪ Event Time 14:55 - 16:35 Processing Time
  • 17. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ... Included in Current Batch ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ 14:55 - 16:35 16:50 - … Event Time Processing Time
  • 18. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Out-Of-Order Data … ♪ ♫ ♪ Event Time Processing Time
  • 19. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Out-Of-Order Data … ♪ ♫ ♪ ♪ ♪ ♫ ♪ ♪ ♫ Event Time Processing Time
  • 20. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Out-Of-Order Data … ♪ ♫ ♪ ♪ ♪ ♫ ♪ ♪ ♫ Event Time Processing Time
  • 21. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ... Breaks Correctness ♪ ♫ ♪ ♪ ♪ ♫ ♪ ♫ ♪ ♬ ♫ ♪ ♪ ♫ ♪ ♫ ♪ ♬ ♫ ♪ Event Time Processing Time
  • 22. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Problems FILES, BATCHES, DATA LAKES
  • 23. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Solving Streaming Problem With Batch?
  • 24. © Copyright. All rights reserved. Not to be reproduced without prior written consent. 3rd - Streaming-First Architecture
  • 25. © Copyright. All rights reserved. Not to be reproduced without prior written consent. User Session Windows ♪Case A ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ Case B ♪ ♪ ♪ ♪ ♪ ♪ Session gap eg. 15 minutes ♪ ♪♪ 5
  • 26. © Copyright. All rights reserved. Not to be reproduced without prior written consent. User Session Windows ♪Case A ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ Case B ♪ ♪ ♪ ♪ ♪ ♪ Session gap eg. 15 minutes ♪ ♪♪ 5 [3,2]
  • 27. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Reading From Kafka val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪♪ ♪ ♪ ♪ ♪ ♪ ♪
  • 28. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Session Windows With Gap val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) .keyBy(_.userId) ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ User 1 User 2
  • 29. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Session Windows With Gap val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) .keyBy(_.userId) .window(EventTimeSessionWindows.withGap(Time.minutes(15))) User 1 ♪ ♪ ♪ ♪ ♪ ♪ Session gap - 15 minutes ♪♪
  • 30. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Analyzing User Session val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) .keyBy(_.userId) .window(EventTimeSessionWindows.withGap(Time.minutes(15))) .apply(new CountSessionStats()) User 1 ♪ ♪ ♪ ♪ ♪ ♪ ♪♪
  • 31. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Handling Late Events val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) .keyBy(_.userId) .window(EventTimeSessionWindows.withGap(Time.minutes(15))) .allowedLateness(Time.minutes(60)) .apply(new CountSessionStats()) User 1 ♪ ♪ ♪ ♪ ♪ ♪ ♪♪ ♪
  • 32. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Triggering Early Results val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) .keyBy(_.userId) .window(EventTimeSessionWindows.withGap(Time.minutes(15))) .trigger(EarlyTriggeringTrigger.every(Time.minutes(10))) .allowedLateness(Time.minutes(60)) .apply(new CountSessionStats()) User 1 ♪ ♪ ♪ ♪ ♪ ♪ ♪♪
  • 33. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Sessionization Example val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) .keyBy(_.userId) .window(EventTimeSessionWindows.withGap(Time.minutes(15))) .trigger(EarlyTriggeringTrigger.every(Time.minutes(10))) .allowedLateness(Time.minutes(60)) .apply(new CountSessionStats()) Working example: https://github.com/getindata/flink-use-case
  • 34. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Modern Stream Processing Engines ■ Rich stream processing semantic ● Built-in support for event-time windows ● Accurate results for late / out-of-order events and replays ● Early triggers ■ Low latency and high-throughput ■ Exactly-once stateful processing
  • 35. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Modern Stream Processing Engines ■ Rich stream processing semantic ● Built-in support for event-time windows ● Accurate results for late / out-of-order events and replays ● Early triggers ■ Low latency and high-throughput ■ Exactly-once stateful processing User survey: http://data-artisans.com/flink-user-survey-2016-part-1 http://data-artisans.com/flink-user-survey-2016-part-2
  • 36. © Copyright. All rights reserved. Not to be reproduced without prior written consent.
  • 37. © Copyright. All rights reserved. Not to be reproduced without prior written consent. How can I reprocess data?
  • 38. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Reprocessing Events In Flink 1. Take periodic snapshots of a job ● It stores Kafka offsets, on-flight sessions, application state 2. Restart a job from a savepoint rather than from a beginning
  • 39. © Copyright. All rights reserved. Not to be reproduced without prior written consent. What if data is no longer in Kafka?
  • 40. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Consuming Data From HDFS ■ Run your streaming code on HDFS (bounded data) ● You need to read data in event-time based order ● Implement mechanism of proper watermark generation
  • 41. © Copyright. All rights reserved. Not to be reproduced without prior written consent. What are usual stream processing applications?
  • 42. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Stream Analytics Image source: https://www.slideshare.net/sinisalyh/storm-at-spotify
  • 43. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Stream 24/7 Applications
  • 44. © Copyright. All rights reserved. Not to be reproduced without prior written consent. When is batch processing good?
  • 45. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Batch Processing Use-Cases ■ Ad-hoc analytics and data exploration ● Notebooks, Spark/Flink/Hive, Parquet, complete data sets ■ Technical advantages ● A large swaths of historical data in HDFS ● High-level libraries in mature batch technologies
  • 46. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Batch Processing Use-Cases ■ Ad-hoc analytics and data exploration ● Notebooks, Spark/Flink/Hive, Parquet, complete data sets ■ Implementation advantages ● Offline experiments over large historical data ■ Historical events are usually stored in HDFS, not Kafka ● High-level libraries in batch processing technologies ■ Spark MLlib, H2O (when data arrives continuously) don’t solve streaming problem with batch jobs
  • 47. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Who Are You, actually? ■ At GetInData, we build custom Big Data solutions ● Hadoop, Flink, Spark, Kafka and more ■ Our team is today represented by Krzysztof Zarzycki Dawid Wysakowicz Adam Kawa
  • 48. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ■ Stream often the natural representation of your data ■ Stream processing is not only about low latency Summary
  • 49. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Q&A
  • 50. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Thanks !
  • 51. © Copyright. All rights reserved. Not to be reproduced without prior written consent.
  • 52. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Log Abstraction 11:00 - 12:00 12:00 - 13:00 … … 10:00 - … 10:00 - … 10:00 - 11:00
  • 53. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Spark Structured Streaming ⬇ Operates on top of micro-batches (Spark SQL engine) ■ The ALPHA version and the experimental API until July 11, 2017 ⬆ Easy-to-learn API (Dataset/DataFrame) ⬆ Rich ecosystem of tools and libraries e.g. MLlib ⬆ Supports event-time ⬇ Sessionization not yet supported - SPARK-10816 ⬇ Queryable state not yet supported - SPARK-16738
  • 54. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Kafka Streams ⬇ No exactly-once (just at-least-once) ⬇ Kafka as the only data source ⬇ No bounded streams (batch) optimizations ⬆ Simplicity ⬆ Embedded into application ⬆ Supports event-time ⬇ Lack of session windows
  • 55. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Apache Beam ⬆ Unified API for batch and streaming ⬆ Rich streaming processing semantics ⬆ Complex TriggerDSL ⬆ Multiple runtime environments ⬆ Spark, Flink, Apex, Dataflow ⬆ Side inputs and outputs ⬇ Verbose Java API ⬇ New project - Top level since 01/2017
  • 56. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Google Dataflow ■ Runtime environment for Apache Beam in Google Cloud ⬇ No support for Iterative Computations ⬆ Supports Side Outputs ⬆ Works with every Google Cloud Service (Pub/Sub, BigTable etc.)
  • 57. © Copyright. All rights reserved. Not to be reproduced without prior written consent. How to join with other data sets/streams?
  • 58. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Join With Other Datasets / Streams ■ Flink can join windowed streams easily ■ Join of data stream with data set is WIP ● Even with slowly changing data set! ● Even keyed data Stream 2 Stream 1 Joined Stream Input Stream Joined Stream + Id Name 1 John Doe 2 Jane Doe Dataset +
  • 59. © Copyright. All rights reserved. Not to be reproduced without prior written consent. I like this streaming API. Can I use it for batch?
  • 60. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Unified batch and streaming API ■ Not with raw Flink API ■ But with Flink Table API ■ Apache Beam
  • 61. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Is Flink production ready?
  • 62. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Powered By Flink