SlideShare a Scribd company logo
Seif Haridi KTH/RISE
AI @ RISE
Hopsworks, Apache Flink and Beyond
Big Data Analytics Platforms
By KTH and RISE SICS
Hopsworks: End2End Data Platform for Analytics/ML
Datasources
Applications
API
Dashboards
Hopsworks
Apache Beam
Apache Spark Pip
Conda
Tensorflow
scikit-learn
PyTorch
J upyter
Notebooks
Tensorboard
Apache Beam
Apache Spark
Apache Flink
Kubernetes
Batch Distributed
ML &DL
Model
Serving
Hopsworks
Feature Store
Kafka +
Spark
Streaming
Model
Monitoring
Orchestration in Airflow
Data Preparation
&Ingestion
Experimentation
&Model Training
Deploy
&Productionalize
Streaming
Filesystem and Metadata storage
HopsFS
Apache
Kafka
Datasources
Logical Clocks was founded by
the team that created and
continues to drive
Hopsworks a Data-Intensive AI
platform, and its Feature Store,
a warehouse for machine
learning features.
Logical Clocks’ vision is to
simplify the process of
refining data into intelligence
at scale
25
Continuous Intelligence
A design pattern in which real-time analytics are integrated within a business operation,
processing current and historical data to prescribe actions in response to events.
Business
Tech
https://www.gartner.com/en/newsroom/press-releases/2019-02-18-gartner-identifies-top-10-data-and-analytics-technolo
events actions
Paradigm Shift in Data Processing
Data
lots of
Queries
retrospective
answers
Query
lots of
Data
real-time
answers
• Data Stream Processing as a 24/7 execution paradigm
paradigm
shift
6
Stream SQL, CEP…
Kafka, Pub/Sub, Kinesis,
Pravega…
Flink, Beam, Kafka-Streams,
Apex, Storm, Spark
Streaming…
Storage
Compute
High Level
Models
The Real-Time Analytics Stack
Actors vs Streams
vs
Data Stream ComputingActor Programming
• Declarative Programming
• State Managed by the system
• Robust: Built-in Fault Tolerance
• Scalable Deployments
service
logic
service
logic
state log
ic
log
ic
log
ic
log
ic
log
iclogic logic
logic
log
ic
logic
state
• Low-Level Event-Based Programming
• Manual/External State
• Not Robust: Manual Fault Tolerance
• Not flexible scaling
Declarative
Program
service
Stream SQL, CEP…
Kafka, Pub/Sub, Kinesis,
Pravega…
Flink, Beam, Kafka-Streams,
Apex, Storm, Spark Streaming…
Storage
Compute
High Level
Models
8
The Real-Time Analytics Stack
9
Apache Flink Foundations
commercial
deployments
• Top-level Apache Project
• #1 stream processor (2019)
• Production-Proof
• > 400 contributors
• 100s of deployments
Data Streams, Fault Tolerance,
Window Aggregation
Calcite
stream-SQL
influenced
Structure of a 24/7 Application
Event Logs
Historic
Data
Event Logs
Files
Applications/Services
Stream
Processing
State
Program Hierarchy in Flink
11
Dataflow Engine
• Fault Tolerance
• Scalability
• Monitoring/IO Management
Automates
Program Hierarchy in Flink
12
Dataflow Engine
• Fault Tolerance
• Scalability
• Monitoring/IO Management
• Dynamic program state
• Operations on out-of-order streams
Event Processing API f(input, state, time)
Automates
Program Hierarchy in Flink
13
Dataflow Engine
• Fault Tolerance
• Scalability
• Monitoring/IO Management
• Dynamic program state
• Operations on out-of-order streams
Event Processing API f(input, state, time)
DataStream API window,map,filter etc.
• Higher-Order Streaming Functions
• Event Windowing (sessions, time etc.)
Automates
Program Hierarchy in Flink
14
Dataflow Engine
• Fault Tolerance
• Scalability
• Monitoring/IO Management
• Dynamic program state
• Operations on out-of-order streams
Event Processing API f(input, state, time)
DataStream API window,map,filter etc.
• Higher-Order Streaming Functions
• Event Windowing (sessions, time etc.)
SQL, CEP, Tables, ML
• Fully Declarative Programming
• Event Patterns, Relations etc.
Automates
Domain-Specific APIs
15
Declarative Streaming Examples
Average Tip per Hour
with Stream SQL
16
Declarative Streaming Examples
Completed Taxi Rides within
120min with Complex Event
Processing
Example Use Cases
Real-Time Analytics in Action
https://flink.apache.org/poweredby.html
https://www.flink-forward.org/
18
Marketplace - Dynamic Ride Pricing with Apache Flink (2018)
https://marketplace.uber.com/ Flink Forward 2018
• supply
• demand (taxi orders)
• Trips
• Traffic
Compute Location-Sensitive Trends in Rider Demand and Driver Availability
Prices
• Pricing
• Dispatch
• Promotions
• Driver Positioning
Geo-Sensitive Time-based Aggregations
million events
per sec
Input Streams Output Decisions
19
Flink as an Anomaly-Detection Engine
for the Cloud (2018)
• Activity-Based Threat Protection
• Behavioural model/per cloud user
• Detect outliers/suspicious behavior
• Cross-reference suspicious users
• Alert Admins within seconds
We needed a stateful and scalable stream processing framework. We tested everything (Azure ML/Streams,
MS Orlieans, Apache Storm/Samza/Spark/Ignite/Beam etc.) and chose Flink. - Yonatan Most & Avihai
Berkovitz -https://www.slideshare.net/FlinkForward/flink-forward-berlin-2018-yonatan-most-avihai-berkovitz-anomaly-detection-engine-for-cloud-activities-using-flink
8 data clusters. many TB of state
30k events per second
20
Data Streaming at Mass Scale
https://data-artisans.com/blog/blink-flink-alibaba-search
• Biggest Retailer in the world.
• Entire Product Search, A/B Testing, User
Recommendations and Analytics Services are powered by
Blink (fork of Flink).
• 1000s of nodes actively in production.
Continuous Deep Analytics CDA
knowledge
PROCESSING
∞
Data
REASONING
Decision
Making
The goal of the CDA
• Create a Big Data platform
that can leverage complex
real-time decisions based on
massive live data.
Real-Time and Deep Analytics
for Central & Edge Clouds
Our promise and vision
From Real-Time Analytics to
Continuous Deep Analytics
X
Query
live
data
real-time
answers
Deep
Analytics
Historic
Model
historic
data
CDA
system
all
data
critical
decision
making
Live
Model
online
offline
The Continuous Deep Analytics Paradigm Shift
?
?
?
?
The Bigger Picture
24
Data
Processing
• scalable, fault tolerant analytics
• event-based business logic
• out-of-order computation
• dynamic relational tables (SQL)
• event pattern-matching (CEP)
Data Streams
• tensors
• graph algorithms
• deep learning
• feature learning
• reinforcement learning
• ….
but what about deeper analytics…
Data Pipelines Today
•Many Frameworks/Frontends for different needs
•(ML Training & Serving, SQL, Streams, Tensors, Graphs)
25 ⋈
⋈
⋈
σθ
σθ
σθ
σθ
π
π
Streams
Feature Learning
Tensor Programming Dynamic
Graphs
AI ML
RL
Simulation tasks
Reasoning
Feature Engineering
Model Serving
26
Marketplace - Dynamic Ride Pricing with Apache Flink (2018)
https://marketplace.uber.com/ Flink Forward 2018
• supply
• demand (taxi orders)
• Trips
• Traffic
Compute Location-Sensitive Trends in Rider Demand and Driver Availability
Prices
• Pricing
• Dispatch
• Promotions
• Driver Positioning
Geo-Sensitive Time-based Aggregations
million events
per sec
Input Streams Output Decisions
The Problem & Solution
Problem
Data analytics pipelines build on diverse programming models
with hard abstraction boundaries
Performance deteriorates from context switching, steep data
movement costs and excessive type conversions
Solution
A solution is to raise the level of abstraction through an
intermediate representation (IR). The IR is a programming
language that is able to both express and reason about each of the
programming models.
ArconArcon
Arcon
28
The Arcon Vision
Tensors DataFrames DataStreams Graphs
Unified Declarative Programming
Shared Native Execution
Cross-Compile
Optimize
Generated code
The Arcon Architecture
29
Unified Analytics DSL
Arcon Runtime
Arc IR (Intermediate Representation)
30
Arc IR
Translation
Data
Streams
Linear
Algebra
Relational
Algebra
σθ
σθ
π
⋈
Core
DSL
Unified analytics DSL
• Host language-agnostic core
• Compositional
• First-class citizen support for:
• streams, tensors, relations
Stream
Task
The Arc
Intermediate Representation
Graph
Task
Tensor
Task
λ2 λ3λ1
λ1IR λ2IR λ3IR
λ1 + λ2 + λ3
32
Arcon
Arc (High Level IR)
Logical Dataflow IR
Arcon runner
Hardware
Arcon Compiler Pipeline
Dataflow optimizations
Compiler optimizations
Cross-domain optimizations
Rust based runner
Hardware accelerated
Dynamic task execution
CPU/GPU/FPGA
Local & distributed
Dynamic scaling
Arc an IR for expressing and
optimizing computations that
combine stream, relations and
linear algebra
Arcon a general purpose
distributed runtime written in Rust
Arc IR
33
• A minimal yet feature-complete set of read/write-only types and expressions
Arc Optimisations
• Arc supports both compiler and dataflow optimisations
• Compiler: Loop unrolling, partial evaluation,
• Dataflow: Operator fusion, fission, reordering,
specialization, ...
34
Performance
35
v + 3
v + 1 + 1 + 1
v + 1 v + 1 v + 1
v + 1 v + 1 v + 1
Unoptimised
Fused
Partially Evaluated
Inlined
(Task with function)
Performance
36
x2 orders of
magnitude
faster
Unoptimised
Partially Evaluated
Fused
Inlined
• 10M elements mapped 50
times on Apache Flink
• Arc can boost even existing
frameworks
A Runtime Capable for
Unified Analytics
37
Neptune: Scheduling Suspendable Tasks for Unified Stream/Batch Applications SOCC 2019
Garefalakis, Karanasos, Pietzuch
Hadoop SparkFlink Arcon
Storm
Performance Matters
• Arc Optimizer : ~10x Speedup
• Shared Hardware Acceleration : ~102x Speedup
• Data Parallel Execution : ~103x Speedup
38
Thanks
• To the CDA and HOPS teams and in general to the
distributed computing group at KTH and RISE SICS
• Please Visit
• DC@KTH https://dcatkth.github.io/
• HOPS https://www.hops.io/
• LogicalClocks https://www.logicalclocks.com/

More Related Content

What's hot

Large-scaled telematics analytics
Large-scaled telematics analyticsLarge-scaled telematics analytics
Large-scaled telematics analytics
DataWorks Summit
 
Productionizing Spark ML pipelines with the portable format for analytics
Productionizing Spark ML pipelines with the portable format for analyticsProductionizing Spark ML pipelines with the portable format for analytics
Productionizing Spark ML pipelines with the portable format for analytics
DataWorks Summit
 
Realtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIORealtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIO
Jozo Kovac
 
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
confluent
 
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...
Khai Tran
 
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligenceSpark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Wei Di
 
Flink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaFlink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at Alibaba
DataWorks Summit
 
Zipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering FrameworkZipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering Framework
Databricks
 
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Databricks
 
Tailored for Spark
Tailored for SparkTailored for Spark
Tailored for Spark
DataWorks Summit/Hadoop Summit
 
KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...
KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...
KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...
DataWorks Summit
 
IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning
DataWorks Summit/Hadoop Summit
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of data
confluent
 
Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...
Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...
Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...
Flink Forward
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
confluent
 
Operationalizing Machine Learning at Scale with Sameer Nori
Operationalizing Machine Learning at Scale with Sameer NoriOperationalizing Machine Learning at Scale with Sameer Nori
Operationalizing Machine Learning at Scale with Sameer Nori
Databricks
 
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
Show me the Money! Cost & Resource  Tracking for Hadoop and Storm Show me the Money! Cost & Resource  Tracking for Hadoop and Storm
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
DataWorks Summit/Hadoop Summit
 
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
HostedbyConfluent
 
Stream Processing use cases and applications with Apache Apex by Thomas Weise
Stream Processing use cases and applications with Apache Apex by Thomas WeiseStream Processing use cases and applications with Apache Apex by Thomas Weise
Stream Processing use cases and applications with Apache Apex by Thomas Weise
Big Data Spain
 

What's hot (20)

Large-scaled telematics analytics
Large-scaled telematics analyticsLarge-scaled telematics analytics
Large-scaled telematics analytics
 
Productionizing Spark ML pipelines with the portable format for analytics
Productionizing Spark ML pipelines with the portable format for analyticsProductionizing Spark ML pipelines with the portable format for analytics
Productionizing Spark ML pipelines with the portable format for analytics
 
Realtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIORealtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIO
 
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
 
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...
 
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligenceSpark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligence
 
Flink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaFlink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at Alibaba
 
Zipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering FrameworkZipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering Framework
 
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
 
Tailored for Spark
Tailored for SparkTailored for Spark
Tailored for Spark
 
KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...
KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...
KPN ETL Factory (KETL) - Automated Code generation using Metadata to build Da...
 
IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of data
 
Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...
Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...
Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: R...
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
Operationalizing Machine Learning at Scale with Sameer Nori
Operationalizing Machine Learning at Scale with Sameer NoriOperationalizing Machine Learning at Scale with Sameer Nori
Operationalizing Machine Learning at Scale with Sameer Nori
 
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
Show me the Money! Cost & Resource  Tracking for Hadoop and Storm Show me the Money! Cost & Resource  Tracking for Hadoop and Storm
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
 
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
 
Stream Processing use cases and applications with Apache Apex by Thomas Weise
Stream Processing use cases and applications with Apache Apex by Thomas WeiseStream Processing use cases and applications with Apache Apex by Thomas Weise
Stream Processing use cases and applications with Apache Apex by Thomas Weise
 

Similar to Big Data Analytics Platforms by KTH and RISE SICS

Continuous Intelligence - Intersecting Event-Based Business Logic and ML
Continuous Intelligence - Intersecting Event-Based Business Logic and MLContinuous Intelligence - Intersecting Event-Based Business Logic and ML
Continuous Intelligence - Intersecting Event-Based Business Logic and ML
Paris Carbone
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
Spark Summit
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Data Con LA
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
Jesus Rodriguez
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Dataconomy Media
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
Lars Albertsson
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
Fabian Hueske
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
confluent
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
Aniket Mokashi
 
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformLessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics Platform
Databricks
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
kgshukla
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Luan Moreno Medeiros Maciel
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
Amazon Web Services
 
What's New in IBM Streams V4.1
What's New in IBM Streams V4.1What's New in IBM Streams V4.1
What's New in IBM Streams V4.1
lisanl
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Jason Dai
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward
 
Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices
ZalandoHayley
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices
Zalando Technology
 
I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices
I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices
I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices
Apigee | Google Cloud
 

Similar to Big Data Analytics Platforms by KTH and RISE SICS (20)

Continuous Intelligence - Intersecting Event-Based Business Logic and ML
Continuous Intelligence - Intersecting Event-Based Business Logic and MLContinuous Intelligence - Intersecting Event-Based Business Logic and ML
Continuous Intelligence - Intersecting Event-Based Business Logic and ML
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
 
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformLessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics Platform
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
 
What's New in IBM Streams V4.1
What's New in IBM Streams V4.1What's New in IBM Streams V4.1
What's New in IBM Streams V4.1
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
 
Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices
 
I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices
I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices
I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices
 

More from Big Data Value Association

Data Privacy, Security in personal data sharing
Data Privacy, Security in personal data sharingData Privacy, Security in personal data sharing
Data Privacy, Security in personal data sharing
Big Data Value Association
 
Key Modules for a trsuted and privacy preserving personal data marketplace
Key Modules for a trsuted and privacy preserving personal data marketplaceKey Modules for a trsuted and privacy preserving personal data marketplace
Key Modules for a trsuted and privacy preserving personal data marketplace
Big Data Value Association
 
GDPR and Data Ethics considerations in personal data sharing
GDPR and Data Ethics considerations in personal data sharingGDPR and Data Ethics considerations in personal data sharing
GDPR and Data Ethics considerations in personal data sharing
Big Data Value Association
 
Intro - Three pillars for building a Smart Data Ecosystem: Trust, Security an...
Intro - Three pillars for building a Smart Data Ecosystem: Trust, Security an...Intro - Three pillars for building a Smart Data Ecosystem: Trust, Security an...
Intro - Three pillars for building a Smart Data Ecosystem: Trust, Security an...
Big Data Value Association
 
Three pillars for building a Smart Data Ecosystem: Trust, Security and Privacy
Three pillars for building a Smart Data Ecosystem: Trust, Security and PrivacyThree pillars for building a Smart Data Ecosystem: Trust, Security and Privacy
Three pillars for building a Smart Data Ecosystem: Trust, Security and Privacy
Big Data Value Association
 
Market into context - Three pillars for building a Smart Data Ecosystem: Trus...
Market into context - Three pillars for building a Smart Data Ecosystem: Trus...Market into context - Three pillars for building a Smart Data Ecosystem: Trus...
Market into context - Three pillars for building a Smart Data Ecosystem: Trus...
Big Data Value Association
 
BDV Skills Accreditation - Future of digital skills in Europe reskilling and ...
BDV Skills Accreditation - Future of digital skills in Europe reskilling and ...BDV Skills Accreditation - Future of digital skills in Europe reskilling and ...
BDV Skills Accreditation - Future of digital skills in Europe reskilling and ...
Big Data Value Association
 
BDV Skills Accreditation - Big Data skilling in Emilia-Romagna
BDV Skills Accreditation - Big Data skilling in Emilia-Romagna BDV Skills Accreditation - Big Data skilling in Emilia-Romagna
BDV Skills Accreditation - Big Data skilling in Emilia-Romagna
Big Data Value Association
 
BDV Skills Accreditation - EIT labels for professionals
BDV Skills Accreditation - EIT labels for professionalsBDV Skills Accreditation - EIT labels for professionals
BDV Skills Accreditation - EIT labels for professionals
Big Data Value Association
 
BDV Skills Accreditation - Recognizing Data Science Skills with BDV Data Scie...
BDV Skills Accreditation - Recognizing Data Science Skills with BDV Data Scie...BDV Skills Accreditation - Recognizing Data Science Skills with BDV Data Scie...
BDV Skills Accreditation - Recognizing Data Science Skills with BDV Data Scie...
Big Data Value Association
 
BDV Skills Accreditation - Objectives of the workshop
BDV Skills Accreditation - Objectives of the workshopBDV Skills Accreditation - Objectives of the workshop
BDV Skills Accreditation - Objectives of the workshop
Big Data Value Association
 
BDV Skills Accreditation - Welcome introduction to the workshop
BDV Skills Accreditation - Welcome introduction to the workshopBDV Skills Accreditation - Welcome introduction to the workshop
BDV Skills Accreditation - Welcome introduction to the workshop
Big Data Value Association
 
BDV Skills Accreditation - Definition and ensuring of digital roles and compe...
BDV Skills Accreditation - Definition and ensuring of digital roles and compe...BDV Skills Accreditation - Definition and ensuring of digital roles and compe...
BDV Skills Accreditation - Definition and ensuring of digital roles and compe...
Big Data Value Association
 
BigDataPilotDemoDays - I BiDaaS Application to the Manufacturing Sector Webinar
BigDataPilotDemoDays - I BiDaaS Application to the Manufacturing Sector WebinarBigDataPilotDemoDays - I BiDaaS Application to the Manufacturing Sector Webinar
BigDataPilotDemoDays - I BiDaaS Application to the Manufacturing Sector Webinar
Big Data Value Association
 
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector WebinarBigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
Big Data Value Association
 
Virtual BenchLearning - Data Bench Framework
Virtual BenchLearning - Data Bench FrameworkVirtual BenchLearning - Data Bench Framework
Virtual BenchLearning - Data Bench Framework
Big Data Value Association
 
Virtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Virtual BenchLearning - DeepHealth - Needs & Requirements for BenchmarkingVirtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Virtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Big Data Value Association
 
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...
Big Data Value Association
 
Policy Cloud Data Driven Policies against Radicalisation - Technical Overview
Policy Cloud Data Driven Policies against Radicalisation - Technical OverviewPolicy Cloud Data Driven Policies against Radicalisation - Technical Overview
Policy Cloud Data Driven Policies against Radicalisation - Technical Overview
Big Data Value Association
 
Policy Cloud Data Driven Policies against Radicalisation - Participatory poli...
Policy Cloud Data Driven Policies against Radicalisation - Participatory poli...Policy Cloud Data Driven Policies against Radicalisation - Participatory poli...
Policy Cloud Data Driven Policies against Radicalisation - Participatory poli...
Big Data Value Association
 

More from Big Data Value Association (20)

Data Privacy, Security in personal data sharing
Data Privacy, Security in personal data sharingData Privacy, Security in personal data sharing
Data Privacy, Security in personal data sharing
 
Key Modules for a trsuted and privacy preserving personal data marketplace
Key Modules for a trsuted and privacy preserving personal data marketplaceKey Modules for a trsuted and privacy preserving personal data marketplace
Key Modules for a trsuted and privacy preserving personal data marketplace
 
GDPR and Data Ethics considerations in personal data sharing
GDPR and Data Ethics considerations in personal data sharingGDPR and Data Ethics considerations in personal data sharing
GDPR and Data Ethics considerations in personal data sharing
 
Intro - Three pillars for building a Smart Data Ecosystem: Trust, Security an...
Intro - Three pillars for building a Smart Data Ecosystem: Trust, Security an...Intro - Three pillars for building a Smart Data Ecosystem: Trust, Security an...
Intro - Three pillars for building a Smart Data Ecosystem: Trust, Security an...
 
Three pillars for building a Smart Data Ecosystem: Trust, Security and Privacy
Three pillars for building a Smart Data Ecosystem: Trust, Security and PrivacyThree pillars for building a Smart Data Ecosystem: Trust, Security and Privacy
Three pillars for building a Smart Data Ecosystem: Trust, Security and Privacy
 
Market into context - Three pillars for building a Smart Data Ecosystem: Trus...
Market into context - Three pillars for building a Smart Data Ecosystem: Trus...Market into context - Three pillars for building a Smart Data Ecosystem: Trus...
Market into context - Three pillars for building a Smart Data Ecosystem: Trus...
 
BDV Skills Accreditation - Future of digital skills in Europe reskilling and ...
BDV Skills Accreditation - Future of digital skills in Europe reskilling and ...BDV Skills Accreditation - Future of digital skills in Europe reskilling and ...
BDV Skills Accreditation - Future of digital skills in Europe reskilling and ...
 
BDV Skills Accreditation - Big Data skilling in Emilia-Romagna
BDV Skills Accreditation - Big Data skilling in Emilia-Romagna BDV Skills Accreditation - Big Data skilling in Emilia-Romagna
BDV Skills Accreditation - Big Data skilling in Emilia-Romagna
 
BDV Skills Accreditation - EIT labels for professionals
BDV Skills Accreditation - EIT labels for professionalsBDV Skills Accreditation - EIT labels for professionals
BDV Skills Accreditation - EIT labels for professionals
 
BDV Skills Accreditation - Recognizing Data Science Skills with BDV Data Scie...
BDV Skills Accreditation - Recognizing Data Science Skills with BDV Data Scie...BDV Skills Accreditation - Recognizing Data Science Skills with BDV Data Scie...
BDV Skills Accreditation - Recognizing Data Science Skills with BDV Data Scie...
 
BDV Skills Accreditation - Objectives of the workshop
BDV Skills Accreditation - Objectives of the workshopBDV Skills Accreditation - Objectives of the workshop
BDV Skills Accreditation - Objectives of the workshop
 
BDV Skills Accreditation - Welcome introduction to the workshop
BDV Skills Accreditation - Welcome introduction to the workshopBDV Skills Accreditation - Welcome introduction to the workshop
BDV Skills Accreditation - Welcome introduction to the workshop
 
BDV Skills Accreditation - Definition and ensuring of digital roles and compe...
BDV Skills Accreditation - Definition and ensuring of digital roles and compe...BDV Skills Accreditation - Definition and ensuring of digital roles and compe...
BDV Skills Accreditation - Definition and ensuring of digital roles and compe...
 
BigDataPilotDemoDays - I BiDaaS Application to the Manufacturing Sector Webinar
BigDataPilotDemoDays - I BiDaaS Application to the Manufacturing Sector WebinarBigDataPilotDemoDays - I BiDaaS Application to the Manufacturing Sector Webinar
BigDataPilotDemoDays - I BiDaaS Application to the Manufacturing Sector Webinar
 
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector WebinarBigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
 
Virtual BenchLearning - Data Bench Framework
Virtual BenchLearning - Data Bench FrameworkVirtual BenchLearning - Data Bench Framework
Virtual BenchLearning - Data Bench Framework
 
Virtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Virtual BenchLearning - DeepHealth - Needs & Requirements for BenchmarkingVirtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Virtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
 
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...
 
Policy Cloud Data Driven Policies against Radicalisation - Technical Overview
Policy Cloud Data Driven Policies against Radicalisation - Technical OverviewPolicy Cloud Data Driven Policies against Radicalisation - Technical Overview
Policy Cloud Data Driven Policies against Radicalisation - Technical Overview
 
Policy Cloud Data Driven Policies against Radicalisation - Participatory poli...
Policy Cloud Data Driven Policies against Radicalisation - Participatory poli...Policy Cloud Data Driven Policies against Radicalisation - Participatory poli...
Policy Cloud Data Driven Policies against Radicalisation - Participatory poli...
 

Recently uploaded

一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
zsafxbf
 
Drownings spike from May to August in children
Drownings spike from May to August in childrenDrownings spike from May to August in children
Drownings spike from May to August in children
Bisnar Chase Personal Injury Attorneys
 
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptxREUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
KiriakiENikolaidou
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
dataschool1
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
blueshagoo1
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
actyx
 
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative ClassifiersML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
MastanaihnaiduYasam
 
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
exukyp
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
agdhot
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
vasanthatpuram
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Marlon Dumas
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
uevausa
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
1tyxnjpia
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
slg6lamcq
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 

Recently uploaded (20)

一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
Drownings spike from May to August in children
Drownings spike from May to August in childrenDrownings spike from May to August in children
Drownings spike from May to August in children
 
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptxREUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative ClassifiersML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
 
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 

Big Data Analytics Platforms by KTH and RISE SICS

  • 1. Seif Haridi KTH/RISE AI @ RISE Hopsworks, Apache Flink and Beyond Big Data Analytics Platforms By KTH and RISE SICS
  • 2.
  • 3. Hopsworks: End2End Data Platform for Analytics/ML Datasources Applications API Dashboards Hopsworks Apache Beam Apache Spark Pip Conda Tensorflow scikit-learn PyTorch J upyter Notebooks Tensorboard Apache Beam Apache Spark Apache Flink Kubernetes Batch Distributed ML &DL Model Serving Hopsworks Feature Store Kafka + Spark Streaming Model Monitoring Orchestration in Airflow Data Preparation &Ingestion Experimentation &Model Training Deploy &Productionalize Streaming Filesystem and Metadata storage HopsFS Apache Kafka Datasources
  • 4. Logical Clocks was founded by the team that created and continues to drive Hopsworks a Data-Intensive AI platform, and its Feature Store, a warehouse for machine learning features. Logical Clocks’ vision is to simplify the process of refining data into intelligence at scale
  • 5. 25 Continuous Intelligence A design pattern in which real-time analytics are integrated within a business operation, processing current and historical data to prescribe actions in response to events. Business Tech https://www.gartner.com/en/newsroom/press-releases/2019-02-18-gartner-identifies-top-10-data-and-analytics-technolo events actions
  • 6. Paradigm Shift in Data Processing Data lots of Queries retrospective answers Query lots of Data real-time answers • Data Stream Processing as a 24/7 execution paradigm paradigm shift 6 Stream SQL, CEP… Kafka, Pub/Sub, Kinesis, Pravega… Flink, Beam, Kafka-Streams, Apex, Storm, Spark Streaming… Storage Compute High Level Models The Real-Time Analytics Stack
  • 7. Actors vs Streams vs Data Stream ComputingActor Programming • Declarative Programming • State Managed by the system • Robust: Built-in Fault Tolerance • Scalable Deployments service logic service logic state log ic log ic log ic log ic log iclogic logic logic log ic logic state • Low-Level Event-Based Programming • Manual/External State • Not Robust: Manual Fault Tolerance • Not flexible scaling Declarative Program service
  • 8. Stream SQL, CEP… Kafka, Pub/Sub, Kinesis, Pravega… Flink, Beam, Kafka-Streams, Apex, Storm, Spark Streaming… Storage Compute High Level Models 8 The Real-Time Analytics Stack
  • 9. 9 Apache Flink Foundations commercial deployments • Top-level Apache Project • #1 stream processor (2019) • Production-Proof • > 400 contributors • 100s of deployments Data Streams, Fault Tolerance, Window Aggregation Calcite stream-SQL influenced
  • 10. Structure of a 24/7 Application Event Logs Historic Data Event Logs Files Applications/Services Stream Processing State
  • 11. Program Hierarchy in Flink 11 Dataflow Engine • Fault Tolerance • Scalability • Monitoring/IO Management Automates
  • 12. Program Hierarchy in Flink 12 Dataflow Engine • Fault Tolerance • Scalability • Monitoring/IO Management • Dynamic program state • Operations on out-of-order streams Event Processing API f(input, state, time) Automates
  • 13. Program Hierarchy in Flink 13 Dataflow Engine • Fault Tolerance • Scalability • Monitoring/IO Management • Dynamic program state • Operations on out-of-order streams Event Processing API f(input, state, time) DataStream API window,map,filter etc. • Higher-Order Streaming Functions • Event Windowing (sessions, time etc.) Automates
  • 14. Program Hierarchy in Flink 14 Dataflow Engine • Fault Tolerance • Scalability • Monitoring/IO Management • Dynamic program state • Operations on out-of-order streams Event Processing API f(input, state, time) DataStream API window,map,filter etc. • Higher-Order Streaming Functions • Event Windowing (sessions, time etc.) SQL, CEP, Tables, ML • Fully Declarative Programming • Event Patterns, Relations etc. Automates Domain-Specific APIs
  • 15. 15 Declarative Streaming Examples Average Tip per Hour with Stream SQL
  • 16. 16 Declarative Streaming Examples Completed Taxi Rides within 120min with Complex Event Processing
  • 17. Example Use Cases Real-Time Analytics in Action https://flink.apache.org/poweredby.html https://www.flink-forward.org/
  • 18. 18 Marketplace - Dynamic Ride Pricing with Apache Flink (2018) https://marketplace.uber.com/ Flink Forward 2018 • supply • demand (taxi orders) • Trips • Traffic Compute Location-Sensitive Trends in Rider Demand and Driver Availability Prices • Pricing • Dispatch • Promotions • Driver Positioning Geo-Sensitive Time-based Aggregations million events per sec Input Streams Output Decisions
  • 19. 19 Flink as an Anomaly-Detection Engine for the Cloud (2018) • Activity-Based Threat Protection • Behavioural model/per cloud user • Detect outliers/suspicious behavior • Cross-reference suspicious users • Alert Admins within seconds We needed a stateful and scalable stream processing framework. We tested everything (Azure ML/Streams, MS Orlieans, Apache Storm/Samza/Spark/Ignite/Beam etc.) and chose Flink. - Yonatan Most & Avihai Berkovitz -https://www.slideshare.net/FlinkForward/flink-forward-berlin-2018-yonatan-most-avihai-berkovitz-anomaly-detection-engine-for-cloud-activities-using-flink 8 data clusters. many TB of state 30k events per second
  • 20. 20 Data Streaming at Mass Scale https://data-artisans.com/blog/blink-flink-alibaba-search • Biggest Retailer in the world. • Entire Product Search, A/B Testing, User Recommendations and Analytics Services are powered by Blink (fork of Flink). • 1000s of nodes actively in production.
  • 21. Continuous Deep Analytics CDA knowledge PROCESSING ∞ Data REASONING Decision Making The goal of the CDA • Create a Big Data platform that can leverage complex real-time decisions based on massive live data.
  • 22. Real-Time and Deep Analytics for Central & Edge Clouds Our promise and vision From Real-Time Analytics to Continuous Deep Analytics X Query live data real-time answers Deep Analytics Historic Model historic data CDA system all data critical decision making Live Model online offline The Continuous Deep Analytics Paradigm Shift
  • 23. ? ? ? ? The Bigger Picture 24 Data Processing • scalable, fault tolerant analytics • event-based business logic • out-of-order computation • dynamic relational tables (SQL) • event pattern-matching (CEP) Data Streams • tensors • graph algorithms • deep learning • feature learning • reinforcement learning • …. but what about deeper analytics…
  • 24. Data Pipelines Today •Many Frameworks/Frontends for different needs •(ML Training & Serving, SQL, Streams, Tensors, Graphs) 25 ⋈ ⋈ ⋈ σθ σθ σθ σθ π π Streams Feature Learning Tensor Programming Dynamic Graphs AI ML RL Simulation tasks Reasoning Feature Engineering Model Serving
  • 25. 26 Marketplace - Dynamic Ride Pricing with Apache Flink (2018) https://marketplace.uber.com/ Flink Forward 2018 • supply • demand (taxi orders) • Trips • Traffic Compute Location-Sensitive Trends in Rider Demand and Driver Availability Prices • Pricing • Dispatch • Promotions • Driver Positioning Geo-Sensitive Time-based Aggregations million events per sec Input Streams Output Decisions
  • 26. The Problem & Solution Problem Data analytics pipelines build on diverse programming models with hard abstraction boundaries Performance deteriorates from context switching, steep data movement costs and excessive type conversions Solution A solution is to raise the level of abstraction through an intermediate representation (IR). The IR is a programming language that is able to both express and reason about each of the programming models.
  • 27. ArconArcon Arcon 28 The Arcon Vision Tensors DataFrames DataStreams Graphs Unified Declarative Programming Shared Native Execution Cross-Compile Optimize Generated code
  • 28. The Arcon Architecture 29 Unified Analytics DSL Arcon Runtime Arc IR (Intermediate Representation)
  • 29. 30 Arc IR Translation Data Streams Linear Algebra Relational Algebra σθ σθ π ⋈ Core DSL Unified analytics DSL • Host language-agnostic core • Compositional • First-class citizen support for: • streams, tensors, relations
  • 31. 32 Arcon Arc (High Level IR) Logical Dataflow IR Arcon runner Hardware Arcon Compiler Pipeline Dataflow optimizations Compiler optimizations Cross-domain optimizations Rust based runner Hardware accelerated Dynamic task execution CPU/GPU/FPGA Local & distributed Dynamic scaling Arc an IR for expressing and optimizing computations that combine stream, relations and linear algebra Arcon a general purpose distributed runtime written in Rust
  • 32. Arc IR 33 • A minimal yet feature-complete set of read/write-only types and expressions
  • 33. Arc Optimisations • Arc supports both compiler and dataflow optimisations • Compiler: Loop unrolling, partial evaluation, • Dataflow: Operator fusion, fission, reordering, specialization, ... 34
  • 34. Performance 35 v + 3 v + 1 + 1 + 1 v + 1 v + 1 v + 1 v + 1 v + 1 v + 1 Unoptimised Fused Partially Evaluated Inlined (Task with function)
  • 35. Performance 36 x2 orders of magnitude faster Unoptimised Partially Evaluated Fused Inlined • 10M elements mapped 50 times on Apache Flink • Arc can boost even existing frameworks
  • 36. A Runtime Capable for Unified Analytics 37 Neptune: Scheduling Suspendable Tasks for Unified Stream/Batch Applications SOCC 2019 Garefalakis, Karanasos, Pietzuch Hadoop SparkFlink Arcon Storm
  • 37. Performance Matters • Arc Optimizer : ~10x Speedup • Shared Hardware Acceleration : ~102x Speedup • Data Parallel Execution : ~103x Speedup 38
  • 38. Thanks • To the CDA and HOPS teams and in general to the distributed computing group at KTH and RISE SICS • Please Visit • DC@KTH https://dcatkth.github.io/ • HOPS https://www.hops.io/ • LogicalClocks https://www.logicalclocks.com/