SlideShare a Scribd company logo
Approximate queries and
graph streams on Flink
Theodore Vasiloudis, Swedish Institute of Technology
tvas@kth.se
@thvasilo
Hello, I’m Theo!
Motivation
● We want analytics that provide us with immediate answers
○ Hadoop: Yesterday’s insights, tomorrow!
● Data never stops!
○ Infinite memory?
● Solution: windows and approximations
○ Windows give us a snapshot of the world
○ Approximations allow us to continuously measure the world
○ First part of presentation is about approximate streaming algorithms, second focuses on
using windows for graph analytics
Approximate Queries on Flink
Work by Tobias Lindener, KTH
https://github.com/tlindener/ApproximateQueries/
End Goal: Approximate SQL queries on Flink
Source: BlinkDB
First step: Sketches for standing queries
● Num. unique users who visited both Apps and Music over the last hour
● Median and 95%ile Time Spent over the last day
● Most frequently purchased songs
Source: Yahoo Datasketches
Sketch Algorithms for Massive Data
● Research area since the 70s (Knuth, Indyk, Flajolet)
● Goal: Efficient (compute+memory) algorithms for “simple” tasks
○ Frequent items
○ Set cardinality
○ Moments (mean, median, variance etc.)
○ Quantiles and histograms
○ Graph algorithms (triangle count, connected components)
○ Nearest neighbors
● We use them as building blocks for more complex algorithms
○ Databases (joins)
○ Machine learning (decision trees)
Yahoo Datasketches
● Highly optimized sketch library
● Apache Licensed
● Available for Pig, Hive
Big Win #1: Size of the Query Process
Source: Yahoo Datasketches
Big Win #2: Sketch Mergeability Enables
Parallel Processing
Source: Yahoo Datasketches
Big Wins #3 & 4: Query Speed, Architecture
Simplicity
Source: Yahoo Datasketches
Big Wins #5 & 6: Real Time, Late Data
Updates
Source: Yahoo Datasketches
● From PR sketches-core#81: Sketch now implements Serializable (closed)
● Lee Rhodes (package author) writes:
Why Flink?
Sketches are streaming algorithms and are stateful by design. Attempting to
force them into a stateless paradigm will result in orders-of-magnitude poorer
performance. It is like pounding a square peg into a round hole.
Library Design
Source: Tobias
Lindender
Library Design
Source:
Tobias
Lindender
Query API
● Cardinality Estimation Queries
Query API
● Frequent Items & Quantiles
Experiments
Datasets
● Amazon Reviews
○ ~84M reviews of Amazon products
○ User, Item, Rating, Timestamp
● WikiTrace
○ Wikipedia access logs
○ ~80M requests, ~7M URLs
Amazon Reviews per product
Runtime and memory consumption
Runtime Memory
Exact ~700s ~11GB
Sketch ~90s ~2GB
Wikitraces: Accuracy of HyperLogLog sketch compared to exact
Amazon: Accuracy of frequent item sketch for different map sizes (lower is better)
● Built library to allow to easily run approximate queries on top of Flink
● End goal is to enable approximate streaming queries with SQL syntax
Summary
Gelly Stream:
Streaming Graph Processing
Paris Carbone, KTH, Flink Committer
Vasiliki Kalavri, ETH, Flink PMC
Motivation
● Graphs are powerful representations of many interactions
○ Social networks
○ Purchases
○ Media views
● Again, data are massive, constantly arriving, and unbounded
● So we need distributed streaming graph processing
Slides by Paris Carbone
Previous work
● Graph snapshots
○ Pregel, Giraph, GraphX
● Graph streams
○ Summaries, approximate algorithms
○ Semi-streaming (disk)
Load-Compute-Store
1. Load snapshot to memory
2. Compute state/superstep
3. Store updated graph state
4. Goto 1
Load-Compute-Store
● Wide adoption: Pregel, Graphlab, GraphX
● Interfaces well with existing batch systems
● But has model issues:
○ Unnecessary latency for all graph measures.
○ Inefficient for incorporating updates
○ Sensitive to the partitioning method
○ Re-computation across snapshots
Graph Summaries: Intuition
Graph Summary Flavours
● Spanners : distance estimation
● Sparsifiers : cut estimation
● Sketches : homomorphic properties
Engineering benefits of stream processing
● Low latency and high-throughput
● Long-running processes can now pipeline
computation
● Production Ready: end-to-end fault tolerance
Realizations brought by stream processing
1. Duality of input data + computational state
2. Out-of-order processing
Exploiting stream processing for graphs
1. Duality of input data + computational state
a. Define evolving graph properties
b. Graph updates (input) ⇔ properties (state)
2. Out-of-order processing
a. Pre-compute blocking graph operations
b. Multiplex processing per snapshot or window
Gelly-stream overview
➤Static Graphs
➤Multi-Pass Algorithms
➤Single Answer
➤Dynamic Graphs
➤Single-Pass
Properties/Summaries
➤Multi-Pass on Snapshots
Gelly-Stream Data Types
● EdgeStream -> Non-Blocking / Single-Pass Computation
○ A distributed data stream consisting of graph edge
additions.
○ Edges can contain state (e.g. weights).
○ Supports property streams, transformations and
aggregations.
● SnapshotStream -> Blocking / Multi-Pass Computation
• Each Snapshot is bounded~ i.e., static graph window.
• It enables neighborhood aggregations, iterations (e.g.,
BSP)
EdgeStream Operations
.getEdges()
.getVertices()
.numberOfVertices()
.numberOfEdges()
.getDegrees()
.inDegrees()
.outDegrees()
.mapEdges();
.distinct();
.filterVertices();
.filterEdges();
.reverse();
.undirected();
.union();
Property Streams
EdgeStream → DataStream
Transformation Streams
EdgeStream → EdgeStream
EdgeStream Summaries
edgeStream.aggregate
(new Summary(window, fold, combine, lower))
edgeStream.aggregate
(new Summary(window, fold, combine, lower))
edgeStream.aggregate
(new Summary(window, fold, combine, lower))
Controls the output
granularity
edgeStream.aggregate
(new Summary(window, fold, combine, lower))
window-triggered
tree-reduce
edgeStream.aggregate
(new Summary(window, fold, combine, lower))
Provided Aggregates/Summaries
● Connected Components
● Bipartiteness Check (Binary)
● Window Triangle Count
● Rolling Triangle Count (Approximate)
● Continuous Degree Aggregate
Neighborhood Aggregation Example
edgeStream.filterVertices(DataScientists())
.slice(Time.of(10, MINUTE), EdgeDirection.IN)
.applyOnNeighbors(FindPairs())
{wendy, steve}
{steve, sandra}
{wendy, sandra}
{tom, rafa}
Snapshot Iterations
● Most “deep” graph properties require multiple passes
● Sensitivity to synchrony during iterative processing
depends on the algorithm and should be flexible (e.g.,
as in GraphLab).
● Avoiding scheduling delays (e.g. scheduling DataSet
Iterations) is crucial for continuous processing.
Flink Stream Iterations
● A logical+physical loop redesign on Flink
● Introduces scoping and custom progress tracking
● Extends out-of-order dataflow processing
● Fully decentralised iterative execution
Take home message
● Streaming means unbounded
● Input <==> State
● Sketches and summaries let you deal
with the unbounded nature of data
using limited resources
Thank you!
tvas@kth.se
@thvasilo
References
● ApproximateQueries on Flink:
https://github.com/tlindener/ApproximateQueries/
● Gelly Streaming
https://github.com/vasia/gelly-streaming
● Yahoo Datasketches
https://datasketches.github.io/
● Collection of links on streaming algorithms and sketches
https://gist.github.com/debasishg/8172796

More Related Content

What's hot

Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...
Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...
Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...
Ansgar Scherp
 
Mining data streams
Mining data streamsMining data streams
Mining data streams
Akash Gupta
 
Mining Data Streams
Mining Data StreamsMining Data Streams
Mining Data Streams
SujaAldrin
 
Towards Data Operations
Towards Data OperationsTowards Data Operations
Towards Data Operations
Andrea Monacchi
 
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLParikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
MLconf
 
18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
Pier Luca Lanzi
 
Using Elastiknn for exact and approximate nearest neighbor search
Using Elastiknn for exact and approximate nearest neighbor searchUsing Elastiknn for exact and approximate nearest neighbor search
Using Elastiknn for exact and approximate nearest neighbor search
FaithWestdorp
 
Streaming Algorithms
Streaming AlgorithmsStreaming Algorithms
Streaming Algorithms
Joe Kelley
 
Asymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, ExplainedAsymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, Explained
Vasia Kalavri
 
Large scale graph processing
Large scale graph processingLarge scale graph processing
Large scale graph processing
Harisankar H
 
m2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reusem2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reuse
Vasia Kalavri
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By Spark
Spark Summit
 
Prediction of taxi rides ETA
Prediction of taxi rides ETAPrediction of taxi rides ETA
Prediction of taxi rides ETA
Daniel Marcous
 
Pregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph ProcessingPregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph Processing
Riyad Parvez
 
Scalable Machine Learning
Scalable Machine LearningScalable Machine Learning
Scalable Machine Learning
Mikio L. Braun
 
Intro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucIntro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana Goriuc
Fraugster
 
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Databricks
 
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
MLconf
 
A Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document AnnotationA Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document Annotation
Ansgar Scherp
 
polystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfpolystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdf
Rim Moussa
 

What's hot (20)

Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...
Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...
Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...
 
Mining data streams
Mining data streamsMining data streams
Mining data streams
 
Mining Data Streams
Mining Data StreamsMining Data Streams
Mining Data Streams
 
Towards Data Operations
Towards Data OperationsTowards Data Operations
Towards Data Operations
 
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLParikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
 
18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
 
Using Elastiknn for exact and approximate nearest neighbor search
Using Elastiknn for exact and approximate nearest neighbor searchUsing Elastiknn for exact and approximate nearest neighbor search
Using Elastiknn for exact and approximate nearest neighbor search
 
Streaming Algorithms
Streaming AlgorithmsStreaming Algorithms
Streaming Algorithms
 
Asymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, ExplainedAsymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, Explained
 
Large scale graph processing
Large scale graph processingLarge scale graph processing
Large scale graph processing
 
m2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reusem2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reuse
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By Spark
 
Prediction of taxi rides ETA
Prediction of taxi rides ETAPrediction of taxi rides ETA
Prediction of taxi rides ETA
 
Pregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph ProcessingPregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph Processing
 
Scalable Machine Learning
Scalable Machine LearningScalable Machine Learning
Scalable Machine Learning
 
Intro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucIntro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana Goriuc
 
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
 
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
 
A Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document AnnotationA Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document Annotation
 
polystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfpolystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdf
 

Similar to Approximate queries and graph streams on Flink, theodore vasiloudis, seattle flink meetup jun 2018

February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
Yahoo Developer Network
 
Lambda architecture
Lambda architectureLambda architecture
Lambda architecture
Mario Alexandro Santini
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
Prasad Wagle
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
Omid Vahdaty
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
Omid Vahdaty
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
Ahmed Ossama
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big Data
Databricks
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
C4Media
 
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamScio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Neville Li
 
Web-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchWeb-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batch
Edward Capriolo
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"
Demi Ben-Ari
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
Amazon Web Services
 
Big data @ Hootsuite analtyics
Big data @ Hootsuite analtyicsBig data @ Hootsuite analtyics
Big data @ Hootsuite analtyics
Claudiu Coman
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
Raja Chiky
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
Rob Skillington
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
Mukesh Singh
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify Story
Neville Li
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ Indix
Rajesh Muppalla
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Flink Forward
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
Neville Li
 

Similar to Approximate queries and graph streams on Flink, theodore vasiloudis, seattle flink meetup jun 2018 (20)

February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 
Lambda architecture
Lambda architectureLambda architecture
Lambda architecture
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big Data
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamScio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
 
Web-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchWeb-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batch
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
 
Big data @ Hootsuite analtyics
Big data @ Hootsuite analtyicsBig data @ Hootsuite analtyics
Big data @ Hootsuite analtyics
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify Story
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ Indix
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
 

More from Bowen Li

Flink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systemsFlink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systems
Bowen Li
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyond
Bowen Li
 
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiTowards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Bowen Li
 
How to contribute to Apache Flink @ Seattle Flink meetup
How to contribute to Apache Flink @ Seattle Flink meetupHow to contribute to Apache Flink @ Seattle Flink meetup
How to contribute to Apache Flink @ Seattle Flink meetup
Bowen Li
 
Community update on flink 1.9 and How to Contribute to Flink
Community update on flink 1.9 and How to Contribute to FlinkCommunity update on flink 1.9 and How to Contribute to Flink
Community update on flink 1.9 and How to Contribute to Flink
Bowen Li
 
Integrating Flink with Hive - Flink Forward SF 2019
Integrating Flink with Hive - Flink Forward SF 2019Integrating Flink with Hive - Flink Forward SF 2019
Integrating Flink with Hive - Flink Forward SF 2019
Bowen Li
 
Tensorflow data preparation on Apache Beam using Portable Flink Runner, Ankur...
Tensorflow data preparation on Apache Beam using Portable Flink Runner, Ankur...Tensorflow data preparation on Apache Beam using Portable Flink Runner, Ankur...
Tensorflow data preparation on Apache Beam using Portable Flink Runner, Ankur...
Bowen Li
 
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...
Bowen Li
 
Community and Meetup Update, Seattle Flink Meetup, Feb 2019
Community and Meetup Update, Seattle Flink Meetup, Feb 2019Community and Meetup Update, Seattle Flink Meetup, Feb 2019
Community and Meetup Update, Seattle Flink Meetup, Feb 2019
Bowen Li
 
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Bowen Li
 
Status Update of Seattle Flink Meetup, Jun 2018
Status Update of Seattle Flink Meetup, Jun 2018Status Update of Seattle Flink Meetup, Jun 2018
Status Update of Seattle Flink Meetup, Jun 2018
Bowen Li
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Bowen Li
 
Stream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUpStream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUp
Bowen Li
 
Apache Flink @ Alibaba - Seattle Apache Flink Meetup
Apache Flink @ Alibaba - Seattle Apache Flink MeetupApache Flink @ Alibaba - Seattle Apache Flink Meetup
Apache Flink @ Alibaba - Seattle Apache Flink Meetup
Bowen Li
 
Opening - Seattle Apache Flink Meetup
Opening - Seattle Apache Flink MeetupOpening - Seattle Apache Flink Meetup
Opening - Seattle Apache Flink Meetup
Bowen Li
 

More from Bowen Li (15)

Flink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systemsFlink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systems
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyond
 
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiTowards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
 
How to contribute to Apache Flink @ Seattle Flink meetup
How to contribute to Apache Flink @ Seattle Flink meetupHow to contribute to Apache Flink @ Seattle Flink meetup
How to contribute to Apache Flink @ Seattle Flink meetup
 
Community update on flink 1.9 and How to Contribute to Flink
Community update on flink 1.9 and How to Contribute to FlinkCommunity update on flink 1.9 and How to Contribute to Flink
Community update on flink 1.9 and How to Contribute to Flink
 
Integrating Flink with Hive - Flink Forward SF 2019
Integrating Flink with Hive - Flink Forward SF 2019Integrating Flink with Hive - Flink Forward SF 2019
Integrating Flink with Hive - Flink Forward SF 2019
 
Tensorflow data preparation on Apache Beam using Portable Flink Runner, Ankur...
Tensorflow data preparation on Apache Beam using Portable Flink Runner, Ankur...Tensorflow data preparation on Apache Beam using Portable Flink Runner, Ankur...
Tensorflow data preparation on Apache Beam using Portable Flink Runner, Ankur...
 
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...
 
Community and Meetup Update, Seattle Flink Meetup, Feb 2019
Community and Meetup Update, Seattle Flink Meetup, Feb 2019Community and Meetup Update, Seattle Flink Meetup, Feb 2019
Community and Meetup Update, Seattle Flink Meetup, Feb 2019
 
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
 
Status Update of Seattle Flink Meetup, Jun 2018
Status Update of Seattle Flink Meetup, Jun 2018Status Update of Seattle Flink Meetup, Jun 2018
Status Update of Seattle Flink Meetup, Jun 2018
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
 
Stream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUpStream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUp
 
Apache Flink @ Alibaba - Seattle Apache Flink Meetup
Apache Flink @ Alibaba - Seattle Apache Flink MeetupApache Flink @ Alibaba - Seattle Apache Flink Meetup
Apache Flink @ Alibaba - Seattle Apache Flink Meetup
 
Opening - Seattle Apache Flink Meetup
Opening - Seattle Apache Flink MeetupOpening - Seattle Apache Flink Meetup
Opening - Seattle Apache Flink Meetup
 

Recently uploaded

Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 

Recently uploaded (20)

Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 

Approximate queries and graph streams on Flink, theodore vasiloudis, seattle flink meetup jun 2018