SlideShare a Scribd company logo
Flink for Everyone: Self-Service Data
Analytics with StreamPipes
Patrick Wiener, Philipp Zehnder
Flink Forward Europe 2019, Berlin, 2019-10-08
www.streampipes.org | @streampipes | github.com/streampipes
2
"A self-service IoT toolbox to enable non-technical users
to connect, analyze and explore IoT data streams"
What's StreamPipes?
www.streampipes.org | @streampipes | github.com/streampipes
3
What's StreamPipes?
Big Data / Edge
InfrastructureExecute
Reusable
algorithm toolbox
Install
Model pipelines
www.streampipes.org | @streampipes | github.com/streampipes
About us
4
Dominik Riemer
Senior Research Scientist
Philipp Zehnder
Research Scientist
Patrick Wiener
Research Scientist
FZI Research Center for Information Technology, Karlsruhe, Germany
Stream Processing, Data Management, Machine Learning
Non-profit research center for applied ICT research (250 employees)
Started StreamPipes in 2014, first OSS release 2018
www.streampipes.org | @streampipes | github.com/streampipes
Agenda
The need for self-service IoT data analytics1
StreamPipes: Technical Overview
Demo
2
Lessons Learned w/ Flink & Getting Started3
The need for self-service IoT data analytics
1
www.streampipes.org | @streampipes | github.com/streampipes
Conveyor Belts
Pressure
Oil temperature
Dust particles
Production plans
Environmental Data
Gear box drive
Energy consumption
Telematics
Industrial Internet of Things
Data streams everywhere
Continuous Monitoring Situational Awareness
Continuous Data
Harmonization
Flexible data integration
from heterogeneous
sources and monitoring
of current system states
Detect time-critical
situations, e.g., by
means of rules or ML
approaches
Continuous pre-
processing and
transformation of input
streams for third party
systems
Industrial Internet of Things
Typical application scenarios
www.streampipes.org | @streampipes | github.com/streampipes
StreamPipes
Open Source framework to easily manage IoT data
Data Access
Data analytics &
harmonization
Data exploration &
exploitation
Generic adapters
Specific adapters
Metadata
Data streams & sets
Pre-processing
Filter/Aggregation
Pattern Detection
ML
Situation detection
Harmonized data sets
Visualizations
Third-party systems
9
Technical Overview
2
www.streampipes.org | @streampipes | github.com/streampipes
High-level architecture
Analytics Microservices
Data Integration
Data Sources
Adapter Library
Pipeline Editor
Streaming Engine
11
www.streampipes.org | @streampipes | github.com/streampipes
High-level architecture
Analytics Microservices
Data Integration
Data Sources
Adapter Library
Pipeline Editor
Streaming Engine
12
Data Access
StreamPipes Connect: Easily connect IoT sources
www.streampipes.org | @streampipes | github.com/streampipes
Data Access
Machine-interpretable metadata
100
011
010
001
010
010
100
101
000
111
data stream
{ 
"tstamp": 1453478160,
"machineId": "ID5",
"temperature": 73.5,
"flowRate": 4.2
}
Semantic
metadata
Data type, runtime name,
semantic type
Frequency, latency,
measurement unit
Format, Protocol
Schema
Quality
Grounding
14
www.streampipes.org | @streampipes | github.com/streampipes
Data Access
Machine-interpretable metadata
Example
temperature
schema.org/temperature
schema.org/degreeCelsius
xsd:float
[0,80]
100
011
010
001
010
010
100
101
000
111
data stream
{ 
"tstamp": 1453478160,
"machineId": "ID5",
"temperature": 73.5,
"flowRate": 4.2
}
Semantic
metadata
15
www.streampipes.org | @streampipes | github.com/streampipes
Data Access
StreamPipes Connect: Architecture
Connect Master
Connect Worker 1 Connect Worker 2 Connect Worker n
MySQL
RESTROS
OPC-UAPLC
MQTT
Messaging
Edge Worker Cloud Worker
…
register
capabilities
16
Demo
Introduction to StreamPipes
Connecting and visualizing flow rate measurements of a multi tank system
Demo
Introduction to StreamPipes
Flow
Sensor
Aggregate
data
VisualizeMQTT
StreamPipes Connect
Connecting and visualizing flow rate measurements of a multi tank system
www.streampipes.org | @streampipes | github.com/streampipes
High-level architecture
Analytics Microservices
Data Sources
Adapter Library
Pipeline Editor
Data Integration
19
Streaming Engine
Analytics microservices
Extensible toolbox
www.streampipes.org | @streampipes | github.com/streampipes
• Extensible toolbox for pre-
processing & analytics
• Semantics-based
consistency checking
• Exchangable run-time
wrappers
• Stateful/stateless
• Inclusion of ML-models
possible
Features
Analytics microservices
Extensible toolbox
21
www.streampipes.org | @streampipes | github.com/streampipes
Analytics microservices
Anatomy of a processing element
Aggregation
Controller
output eventsinput events
Runtime
22
www.streampipes.org | @streampipes | github.com/streampipes
Analytics microservices
How to implement a new processing element
Select Wrapper
Implement
runtime
Describe
controller
Build / Install
Maven
Archetype
StreamPipes
SDK
StreamPipes
SDK
SDK, Docker,
UI
Aggregation
Controller
Runtime
23
www.streampipes.org | @streampipes | github.com/streampipes
Analytics microservices
Runtime Wrapper
Standalone/Edge
Wrapper
Kafka Streams
Wrapper
Python Wrapper
Select
Wrapper
Implement
runtime
Describe
controller
Build /
Install
Aggregation
Controller
output eventsinput events
Runtime
24
Flink Wrapper
www.streampipes.org | @streampipes | github.com/streampipes
Analytics microservices
SDK: Runtime
Select
Wrapper
Implement
runtime
Describe
controller
Build /
Install
Aggregation
Controller
Runtime
25
www.streampipes.org | @streampipes | github.com/streampipes
Analytics microservices
Processing Element Description
User Configuration Output StrategyInput Requirements
Schema, Quality, Protocol,
Format
Text Input, Selections, Domain
Knowledge, …
Keep, Custom, Transform,
Append, …
Semantic Metadata
Select
Wrapper
Implement
runtime
Describe
controller
Build /
Install
Aggregation
Controller
Runtime
26
www.streampipes.org | @streampipes | github.com/streampipes
Analytics microservices
Development: Maven Archetypes & SDK
Select
Wrapper
Implement
runtime
Describe
controller
Build /
Install
Aggregation
Controller
Runtime
27
Input
User Config
Output
www.streampipes.org | @streampipes | github.com/streampipes
Flink Cluster
Aggregation Job
28
Select
Wrapper
Implement
runtime
Describe
controller
Build /
InstallAnalytics microservices
Flink Deployment
Pipeline Management
register start
Controller
output eventsinput events
Runtime
Aggregation
RemoteEnvironment
Upload jar
Submit execution graph
Kafka
Source
Kafka
Sink
Demo
Condition monitoring + StreamPipes
Rule-based monitoring of flow rate measurements in a multi tank system
Demo
Condition monitoring + StreamPipes
Rule-based monitoring of flow rate measurements in a multi tank system
Flow
Sensor
Aggregate
data
Detect
Leakage
Notify
MQTT
IoTDB
StreamPipes Connect
Calculate
Statistics
Lessons Learned & Getting Started
3
www.streampipes.org | @streampipes | github.com/streampipes
Potentially huge stream of sensor data needs scalability
Remote Environment eased the implementation of Flink Wrapper
Clean & intuitive Flink API enables fast processor development
Simple setup for development (mini cluster) and deployment
Easy to configure & monitor
Good integration with Apache Kafka
Flink + StreamPipes
Lessons learned






www.streampipes.org | @streampipes | github.com/streampipes
How to start
Setting up StreamPipes
Docker-based installation
streampipes.org/en/download
Download installer from Github1
./streampipes start2
Finish installation in browser3
33
www.streampipes.org | @streampipes | github.com/streampipes
34
What's next?
Data Access
Data analytics &
harmonization
Data exploration &
exploitation
Metadata recognition
PLC4X
Flink fault tolerance
Python wrapper
AutoML
Historical data
explorer
New features: Current work-in-progress
Infrastructure (Edge / Fog)
Let's connect!
…and if you like StreamPipes, star us on Github 
streampipes.org
docs.streampipes.org
github.com/streampipes/streampipes
twitter.com/streampipes
feedback@streampipes.org

More Related Content

What's hot

INTERFACE, by apidays - C* made easy with Stargate APIs by Kirsten Hunter, D...
INTERFACE, by apidays  - C* made easy with Stargate APIs by Kirsten Hunter, D...INTERFACE, by apidays  - C* made easy with Stargate APIs by Kirsten Hunter, D...
INTERFACE, by apidays - C* made easy with Stargate APIs by Kirsten Hunter, D...
apidays
 
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
Nitin Kumar
 
Combining Logs, Metrics, and Traces for Unified Observability
Combining Logs, Metrics, and Traces for Unified ObservabilityCombining Logs, Metrics, and Traces for Unified Observability
Combining Logs, Metrics, and Traces for Unified Observability
Elasticsearch
 
RabbitMQ & Kafka
RabbitMQ & KafkaRabbitMQ & Kafka
RabbitMQ & Kafka
VMware Tanzu
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Kai Wähner
 
Combining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observabilityCombining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observability
Elasticsearch
 
Apache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and ArchitecturesApache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and Architectures
Kai Wähner
 
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes EverythingThe Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
Kai Wähner
 
Kafka Vienna Meetup 020719
Kafka Vienna Meetup 020719Kafka Vienna Meetup 020719
Kafka Vienna Meetup 020719
Patrik Kleindl
 
PaNDA - a platform for Network Data Analytics: an overview
PaNDA - a platform for Network Data Analytics: an overviewPaNDA - a platform for Network Data Analytics: an overview
PaNDA - a platform for Network Data Analytics: an overview
Cisco DevNet
 
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache KafkaIBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
Kai Wähner
 
Combinação de logs, métricas e rastreamentos para observabilidade unificada
Combinação de logs, métricas e rastreamentos para observabilidade unificadaCombinação de logs, métricas e rastreamentos para observabilidade unificada
Combinação de logs, métricas e rastreamentos para observabilidade unificada
Elasticsearch
 
Rethinking Geo-replication for the Cloud | Luke Knepper, Confluent
Rethinking Geo-replication for the Cloud | Luke Knepper, ConfluentRethinking Geo-replication for the Cloud | Luke Knepper, Confluent
Rethinking Geo-replication for the Cloud | Luke Knepper, Confluent
HostedbyConfluent
 
Real-time Analysis of Data Processing Pipelines with Spring Cloud Data Flow a...
Real-time Analysis of Data Processing Pipelines with Spring Cloud Data Flow a...Real-time Analysis of Data Processing Pipelines with Spring Cloud Data Flow a...
Real-time Analysis of Data Processing Pipelines with Spring Cloud Data Flow a...
VMware Tanzu
 
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR ModernizationApache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
Kai Wähner
 
Conclusion Code Cafe - Microcks for Mocking and Testing Async APIs (January 2...
Conclusion Code Cafe - Microcks for Mocking and Testing Async APIs (January 2...Conclusion Code Cafe - Microcks for Mocking and Testing Async APIs (January 2...
Conclusion Code Cafe - Microcks for Mocking and Testing Async APIs (January 2...
Lucas Jellema
 
Application performance monitoring with Elastic APM and the ELK stack
Application performance monitoring with Elastic APM and the ELK stackApplication performance monitoring with Elastic APM and the ELK stack
Application performance monitoring with Elastic APM and the ELK stack
Alain Lompo
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
Kai Wähner
 
Creating an Event Backbone for the Hybrid Cloud
Creating an Event Backbone for the Hybrid CloudCreating an Event Backbone for the Hybrid Cloud
Creating an Event Backbone for the Hybrid Cloud
Solace
 
Flink London meetup 3 March 2016 - Flink basics
Flink London meetup 3 March 2016 - Flink basicsFlink London meetup 3 March 2016 - Flink basics
Flink London meetup 3 March 2016 - Flink basics
Cyrus New
 

What's hot (20)

INTERFACE, by apidays - C* made easy with Stargate APIs by Kirsten Hunter, D...
INTERFACE, by apidays  - C* made easy with Stargate APIs by Kirsten Hunter, D...INTERFACE, by apidays  - C* made easy with Stargate APIs by Kirsten Hunter, D...
INTERFACE, by apidays - C* made easy with Stargate APIs by Kirsten Hunter, D...
 
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
 
Combining Logs, Metrics, and Traces for Unified Observability
Combining Logs, Metrics, and Traces for Unified ObservabilityCombining Logs, Metrics, and Traces for Unified Observability
Combining Logs, Metrics, and Traces for Unified Observability
 
RabbitMQ & Kafka
RabbitMQ & KafkaRabbitMQ & Kafka
RabbitMQ & Kafka
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
 
Combining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observabilityCombining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observability
 
Apache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and ArchitecturesApache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and Architectures
 
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes EverythingThe Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
 
Kafka Vienna Meetup 020719
Kafka Vienna Meetup 020719Kafka Vienna Meetup 020719
Kafka Vienna Meetup 020719
 
PaNDA - a platform for Network Data Analytics: an overview
PaNDA - a platform for Network Data Analytics: an overviewPaNDA - a platform for Network Data Analytics: an overview
PaNDA - a platform for Network Data Analytics: an overview
 
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache KafkaIBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
 
Combinação de logs, métricas e rastreamentos para observabilidade unificada
Combinação de logs, métricas e rastreamentos para observabilidade unificadaCombinação de logs, métricas e rastreamentos para observabilidade unificada
Combinação de logs, métricas e rastreamentos para observabilidade unificada
 
Rethinking Geo-replication for the Cloud | Luke Knepper, Confluent
Rethinking Geo-replication for the Cloud | Luke Knepper, ConfluentRethinking Geo-replication for the Cloud | Luke Knepper, Confluent
Rethinking Geo-replication for the Cloud | Luke Knepper, Confluent
 
Real-time Analysis of Data Processing Pipelines with Spring Cloud Data Flow a...
Real-time Analysis of Data Processing Pipelines with Spring Cloud Data Flow a...Real-time Analysis of Data Processing Pipelines with Spring Cloud Data Flow a...
Real-time Analysis of Data Processing Pipelines with Spring Cloud Data Flow a...
 
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR ModernizationApache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
 
Conclusion Code Cafe - Microcks for Mocking and Testing Async APIs (January 2...
Conclusion Code Cafe - Microcks for Mocking and Testing Async APIs (January 2...Conclusion Code Cafe - Microcks for Mocking and Testing Async APIs (January 2...
Conclusion Code Cafe - Microcks for Mocking and Testing Async APIs (January 2...
 
Application performance monitoring with Elastic APM and the ELK stack
Application performance monitoring with Elastic APM and the ELK stackApplication performance monitoring with Elastic APM and the ELK stack
Application performance monitoring with Elastic APM and the ELK stack
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
 
Creating an Event Backbone for the Hybrid Cloud
Creating an Event Backbone for the Hybrid CloudCreating an Event Backbone for the Hybrid Cloud
Creating an Event Backbone for the Hybrid Cloud
 
Flink London meetup 3 March 2016 - Flink basics
Flink London meetup 3 March 2016 - Flink basicsFlink London meetup 3 March 2016 - Flink basics
Flink London meetup 3 March 2016 - Flink basics
 

Similar to Flink for Everyone: Self-Service Data Analytics with StreamPipes

Flink for Everyone: Self Service Data Analytics with StreamPipes - Philipp Ze...
Flink for Everyone: Self Service Data Analytics with StreamPipes - Philipp Ze...Flink for Everyone: Self Service Data Analytics with StreamPipes - Philipp Ze...
Flink for Everyone: Self Service Data Analytics with StreamPipes - Philipp Ze...
Flink Forward
 
Io t data streaming
Io t data streamingIo t data streaming
Io t data streaming
ratthaslip ranokphanuwat
 
Combining Logs, Metrics, and Traces for Unified Observability
Combining Logs, Metrics, and Traces for Unified ObservabilityCombining Logs, Metrics, and Traces for Unified Observability
Combining Logs, Metrics, and Traces for Unified Observability
Elasticsearch
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
IanFurlong4
 
Getting started with apache flink streaming api
Getting started with apache flink streaming apiGetting started with apache flink streaming api
Getting started with apache flink streaming api
Preetdeep Kumar
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platforms
Guido Schmutz
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
Sion Smith
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
Guido Schmutz
 
Oracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingOracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream Processing
Guido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and Frameworks
Guido Schmutz
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
Márton Kodok
 
DataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data ManagementDataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data Management
Andreas Schreiber
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in Realtime
DataWorks Summit
 
A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...
A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...
A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...
Databricks
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
Data Science Milan
 
Organizing the Data Chaos of Scientists
Organizing the Data Chaos of ScientistsOrganizing the Data Chaos of Scientists
Organizing the Data Chaos of Scientists
Andreas Schreiber
 
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit OrlandoGimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Romit Mehta
 

Similar to Flink for Everyone: Self-Service Data Analytics with StreamPipes (20)

Flink for Everyone: Self Service Data Analytics with StreamPipes - Philipp Ze...
Flink for Everyone: Self Service Data Analytics with StreamPipes - Philipp Ze...Flink for Everyone: Self Service Data Analytics with StreamPipes - Philipp Ze...
Flink for Everyone: Self Service Data Analytics with StreamPipes - Philipp Ze...
 
Io t data streaming
Io t data streamingIo t data streaming
Io t data streaming
 
Combining Logs, Metrics, and Traces for Unified Observability
Combining Logs, Metrics, and Traces for Unified ObservabilityCombining Logs, Metrics, and Traces for Unified Observability
Combining Logs, Metrics, and Traces for Unified Observability
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
Getting started with apache flink streaming api
Getting started with apache flink streaming apiGetting started with apache flink streaming api
Getting started with apache flink streaming api
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platforms
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
 
Oracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingOracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream Processing
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and Frameworks
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
DataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data ManagementDataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data Management
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in Realtime
 
A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...
A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...
A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Organizing the Data Chaos of Scientists
Organizing the Data Chaos of ScientistsOrganizing the Data Chaos of Scientists
Organizing the Data Chaos of Scientists
 
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit OrlandoGimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
 

Recently uploaded

Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
QuickwayInfoSystems3
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
abdulrafaychaudhry
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
abdulrafaychaudhry
 

Recently uploaded (20)

Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
 

Flink for Everyone: Self-Service Data Analytics with StreamPipes

  • 1. Flink for Everyone: Self-Service Data Analytics with StreamPipes Patrick Wiener, Philipp Zehnder Flink Forward Europe 2019, Berlin, 2019-10-08
  • 2. www.streampipes.org | @streampipes | github.com/streampipes 2 "A self-service IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams" What's StreamPipes?
  • 3. www.streampipes.org | @streampipes | github.com/streampipes 3 What's StreamPipes? Big Data / Edge InfrastructureExecute Reusable algorithm toolbox Install Model pipelines
  • 4. www.streampipes.org | @streampipes | github.com/streampipes About us 4 Dominik Riemer Senior Research Scientist Philipp Zehnder Research Scientist Patrick Wiener Research Scientist FZI Research Center for Information Technology, Karlsruhe, Germany Stream Processing, Data Management, Machine Learning Non-profit research center for applied ICT research (250 employees) Started StreamPipes in 2014, first OSS release 2018
  • 5. www.streampipes.org | @streampipes | github.com/streampipes Agenda The need for self-service IoT data analytics1 StreamPipes: Technical Overview Demo 2 Lessons Learned w/ Flink & Getting Started3
  • 6. The need for self-service IoT data analytics 1
  • 7. www.streampipes.org | @streampipes | github.com/streampipes Conveyor Belts Pressure Oil temperature Dust particles Production plans Environmental Data Gear box drive Energy consumption Telematics Industrial Internet of Things Data streams everywhere
  • 8. Continuous Monitoring Situational Awareness Continuous Data Harmonization Flexible data integration from heterogeneous sources and monitoring of current system states Detect time-critical situations, e.g., by means of rules or ML approaches Continuous pre- processing and transformation of input streams for third party systems Industrial Internet of Things Typical application scenarios
  • 9. www.streampipes.org | @streampipes | github.com/streampipes StreamPipes Open Source framework to easily manage IoT data Data Access Data analytics & harmonization Data exploration & exploitation Generic adapters Specific adapters Metadata Data streams & sets Pre-processing Filter/Aggregation Pattern Detection ML Situation detection Harmonized data sets Visualizations Third-party systems 9
  • 11. www.streampipes.org | @streampipes | github.com/streampipes High-level architecture Analytics Microservices Data Integration Data Sources Adapter Library Pipeline Editor Streaming Engine 11
  • 12. www.streampipes.org | @streampipes | github.com/streampipes High-level architecture Analytics Microservices Data Integration Data Sources Adapter Library Pipeline Editor Streaming Engine 12
  • 13. Data Access StreamPipes Connect: Easily connect IoT sources
  • 14. www.streampipes.org | @streampipes | github.com/streampipes Data Access Machine-interpretable metadata 100 011 010 001 010 010 100 101 000 111 data stream {  "tstamp": 1453478160, "machineId": "ID5", "temperature": 73.5, "flowRate": 4.2 } Semantic metadata Data type, runtime name, semantic type Frequency, latency, measurement unit Format, Protocol Schema Quality Grounding 14
  • 15. www.streampipes.org | @streampipes | github.com/streampipes Data Access Machine-interpretable metadata Example temperature schema.org/temperature schema.org/degreeCelsius xsd:float [0,80] 100 011 010 001 010 010 100 101 000 111 data stream {  "tstamp": 1453478160, "machineId": "ID5", "temperature": 73.5, "flowRate": 4.2 } Semantic metadata 15
  • 16. www.streampipes.org | @streampipes | github.com/streampipes Data Access StreamPipes Connect: Architecture Connect Master Connect Worker 1 Connect Worker 2 Connect Worker n MySQL RESTROS OPC-UAPLC MQTT Messaging Edge Worker Cloud Worker … register capabilities 16
  • 17. Demo Introduction to StreamPipes Connecting and visualizing flow rate measurements of a multi tank system
  • 18. Demo Introduction to StreamPipes Flow Sensor Aggregate data VisualizeMQTT StreamPipes Connect Connecting and visualizing flow rate measurements of a multi tank system
  • 19. www.streampipes.org | @streampipes | github.com/streampipes High-level architecture Analytics Microservices Data Sources Adapter Library Pipeline Editor Data Integration 19 Streaming Engine
  • 21. www.streampipes.org | @streampipes | github.com/streampipes • Extensible toolbox for pre- processing & analytics • Semantics-based consistency checking • Exchangable run-time wrappers • Stateful/stateless • Inclusion of ML-models possible Features Analytics microservices Extensible toolbox 21
  • 22. www.streampipes.org | @streampipes | github.com/streampipes Analytics microservices Anatomy of a processing element Aggregation Controller output eventsinput events Runtime 22
  • 23. www.streampipes.org | @streampipes | github.com/streampipes Analytics microservices How to implement a new processing element Select Wrapper Implement runtime Describe controller Build / Install Maven Archetype StreamPipes SDK StreamPipes SDK SDK, Docker, UI Aggregation Controller Runtime 23
  • 24. www.streampipes.org | @streampipes | github.com/streampipes Analytics microservices Runtime Wrapper Standalone/Edge Wrapper Kafka Streams Wrapper Python Wrapper Select Wrapper Implement runtime Describe controller Build / Install Aggregation Controller output eventsinput events Runtime 24 Flink Wrapper
  • 25. www.streampipes.org | @streampipes | github.com/streampipes Analytics microservices SDK: Runtime Select Wrapper Implement runtime Describe controller Build / Install Aggregation Controller Runtime 25
  • 26. www.streampipes.org | @streampipes | github.com/streampipes Analytics microservices Processing Element Description User Configuration Output StrategyInput Requirements Schema, Quality, Protocol, Format Text Input, Selections, Domain Knowledge, … Keep, Custom, Transform, Append, … Semantic Metadata Select Wrapper Implement runtime Describe controller Build / Install Aggregation Controller Runtime 26
  • 27. www.streampipes.org | @streampipes | github.com/streampipes Analytics microservices Development: Maven Archetypes & SDK Select Wrapper Implement runtime Describe controller Build / Install Aggregation Controller Runtime 27 Input User Config Output
  • 28. www.streampipes.org | @streampipes | github.com/streampipes Flink Cluster Aggregation Job 28 Select Wrapper Implement runtime Describe controller Build / InstallAnalytics microservices Flink Deployment Pipeline Management register start Controller output eventsinput events Runtime Aggregation RemoteEnvironment Upload jar Submit execution graph Kafka Source Kafka Sink
  • 29. Demo Condition monitoring + StreamPipes Rule-based monitoring of flow rate measurements in a multi tank system
  • 30. Demo Condition monitoring + StreamPipes Rule-based monitoring of flow rate measurements in a multi tank system Flow Sensor Aggregate data Detect Leakage Notify MQTT IoTDB StreamPipes Connect Calculate Statistics
  • 31. Lessons Learned & Getting Started 3
  • 32. www.streampipes.org | @streampipes | github.com/streampipes Potentially huge stream of sensor data needs scalability Remote Environment eased the implementation of Flink Wrapper Clean & intuitive Flink API enables fast processor development Simple setup for development (mini cluster) and deployment Easy to configure & monitor Good integration with Apache Kafka Flink + StreamPipes Lessons learned      
  • 33. www.streampipes.org | @streampipes | github.com/streampipes How to start Setting up StreamPipes Docker-based installation streampipes.org/en/download Download installer from Github1 ./streampipes start2 Finish installation in browser3 33
  • 34. www.streampipes.org | @streampipes | github.com/streampipes 34 What's next? Data Access Data analytics & harmonization Data exploration & exploitation Metadata recognition PLC4X Flink fault tolerance Python wrapper AutoML Historical data explorer New features: Current work-in-progress Infrastructure (Edge / Fog)
  • 35. Let's connect! …and if you like StreamPipes, star us on Github  streampipes.org docs.streampipes.org github.com/streampipes/streampipes twitter.com/streampipes feedback@streampipes.org