BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF
HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Streaming Visualization
Guido Schmutz
DOAG Big Data 2018 – 20.9.2018
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 21 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Agenda
1. Visualization in Big Data Reference Architecture
2. How to implement „Data-in-Motion“?
3. Blueprints for Streaming Visualization
4. Blueprints for Stream Visualization – Implementation
Visualization in Big Data Reference Architecture
Data Value Chain
Milliseconds
• Place Trade
• Serve ad
• Enrich Stream
• Approve Trans
Hundredths of Seconds
• Calculate Risk
• Leaderboard
• Aggregate
• Count
Second(s)
• Retrieve Click Stream
• Show orders
Minutes
• Backtest algo
• BI
• Daily Reports
Hours
• Algo discovery
• Log analysis
• Fraud pattern match
Architectures of Big Data Applications
Traditional BI Infrastructures
[Diagram: Bulk Source (DB, Extract, File); ETL / Stored Procedures; Enterprise Data Warehouse; consumers: BI Tools, Search / Explore, Enterprise Apps (Logic, API). Annotation: high latency]
Big Data solves Volume and Variety – not Velocity
[Diagram: Bulk Source (DB, Extract, File) loaded via File Import / SQL Import into a Big Data Platform on a Hadoop Cluster (Storage: Raw, Refined, Results; Parallel Processing); consumers: BI Tools, Enterprise Data Warehouse, SQL, Search / Explore, Enterprise Apps (Logic, API). Annotation: high latency]
Introduction to Stream Processing
Big Data solves Volume and Variety – not Velocity
[Diagram: the same batch architecture (Bulk Source, File Import / SQL Import, Big Data Platform on a Hadoop Cluster, BI Tools, Enterprise Data Warehouse, SQL, Search / Explore, Enterprise Apps), now with an Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social) and an Event Stream. Annotation: high latency]
Big Data solves Volume and Variety – not Velocity
[Diagram: the same batch architecture with Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social) and Event Stream; Parallel Processing now lists Machine Learning, Graph Algorithms and Natural Language Processing, and an Event Hub is added for the Event Stream. Annotation: high latency]
"Data at Rest" vs. "Data in Motion"
[Diagram: two panels, Data at Rest and Data in Motion, each built from the steps Store, Analyze and Act]
Stream Processing Architecture solves Velocity
[Diagram: Bulk Source (DB, Extract, File) and Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social); Event Streams and an Event Hub; Stream Analytics Platform on a Hadoop Cluster (Stream Analytics, Reference / Models, Search, Results, Dashboard); consumers: BI Tools, Enterprise Data Warehouse, Search / Explore, Enterprise Apps (Logic, API). Annotation: low(est) latency, no history]
Big Data for all historical data analysis
[Diagram: Bulk Source (DB, Extract, File) and Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social); an Event Hub with Event Streams and a Data Flow; a Stream Analytics Platform (Stream Analytics, Reference / Models, Search, Results, Dashboard) next to a Big Data Platform (Storage: Raw, Refined, Results; Parallel Processing), each on a Hadoop Cluster; File Import / SQL Import for bulk data; consumers: BI Tools, Enterprise Data Warehouse, Search / Explore, Enterprise Apps (Logic, API)]
Integrate existing systems through CDC
• Capture changes directly on the database
• Change Data Capture (CDC): think of it as a global database trigger
• Transforms existing systems into event producers
[Diagram: Traditional Silo-based System (User Interface, Logic, Data Store) with a CDC Connector; Event Streams into the Event Hub; Integration; Consuming Systems (Logic, State)]
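The "global database trigger" idea can be illustrated with a small sketch. It is only an in-memory analogy: real CDC connectors typically read the database's change log rather than intercept writes, and the names `CapturedTable`, `ChangeEvent` and `event_stream` are hypothetical, not part of any product mentioned in the slides.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ChangeEvent:
    table: str
    op: str                   # "insert", "update" or "delete"
    key: Any
    before: Optional[dict]    # row image before the change, if any
    after: Optional[dict]     # row image after the change, if any


class CapturedTable:
    """Hypothetical in-memory table that reports every change it sees."""

    def __init__(self, name: str, publish: Callable[[ChangeEvent], None]) -> None:
        self.name = name
        self._rows: Dict[Any, dict] = {}
        self._publish = publish

    def upsert(self, key: Any, row: dict) -> None:
        before = self._rows.get(key)
        self._rows[key] = dict(row)
        op = "insert" if before is None else "update"
        self._publish(ChangeEvent(self.name, op, key, before, dict(row)))

    def delete(self, key: Any) -> None:
        before = self._rows.pop(key, None)
        if before is not None:
            self._publish(ChangeEvent(self.name, "delete", key, before, None))


# The list stands in for an event hub topic (assumption for illustration).
event_stream: List[ChangeEvent] = []
orders = CapturedTable("orders", event_stream.append)

orders.upsert(1, {"status": "NEW"})
orders.upsert(1, {"status": "SHIPPED"})
orders.delete(1)
```

The existing application keeps writing to its table as before; consuming systems only see the resulting stream of change events.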
Integrate existing systems with lower latency through CDC
[Diagram: Bulk Source (DB, Extract, File) and Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social); Event Streams, a Data Flow and an Event Hub; a Stream Analytics Platform (Stream Analytics, Reference / Models, Search, Results, Dashboard) next to a Big Data Platform (Storage: Raw, Refined, Results; Parallel Processing), each on a Hadoop Cluster; File Import / SQL Import; consumers: BI Tools, Enterprise Data Warehouse, Search / Explore, Enterprise Apps (Logic, API)]
Unified Architecture for Modern Data Analytics Solutions
[Diagram: Bulk Source (DB, Extract, File) and Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social); Edge Node (Rules, Event Hub, Storage); Event Streams and a central Event Hub; Big Data on a Hadoop Cluster (Storage: Raw, Refined, Results; Parallel Processing; SQL, Search) fed by File Import / SQL Import; Stream Analytics with Stream Processor (State, API); Microservices (Service, Microservice State, API); consumers: BI Tools, Enterprise Data Warehouse, Search / Explore, Enterprise Apps (Logic, API)]
Two Types of Stream Processing
(from Gartner)
Stream Data Integration
• primarily focuses on the ingestion and
processing of data sources targeting real-
time extract-transform-load (ETL) and data
integration use cases
• filter and enrich the data
• optionally calculate time-windowed
aggregations before storing the results in a
database or file system
Stream Analytics
• targets analytics use cases
• calculating aggregates and detecting
patterns to generate higher-level, more
relevant summary information (complex
events)
• Complex events may signify threats or
opportunities that require a response from
the business through real-time dashboards,
alerts or decision automation
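The two categories can be contrasted in a short sketch. It runs over a finite list for brevity, whereas a real engine would process events incrementally; the event fields (`ts`, `sensor`, `temp`), the site lookup and the threshold are assumptions made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def integrate(events: Iterable[dict], sites: Dict[str, str]) -> Iterator[dict]:
    """Stream data integration: filter bad readings, enrich with reference data."""
    for e in events:
        if e["temp"] is None:
            continue
        yield {**e, "site": sites.get(e["sensor"], "unknown")}


def windowed_average(events: Iterable[dict], window_s: int) -> Dict[Tuple[int, str], float]:
    """Optional time-windowed aggregation (tumbling windows) per site."""
    acc = defaultdict(lambda: [0.0, 0])
    for e in events:
        window_start = e["ts"] - e["ts"] % window_s
        bucket = acc[(window_start, e["site"])]
        bucket[0] += e["temp"]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in acc.items()}


def detect_overheating(averages: Dict[Tuple[int, str], float], threshold: float) -> List[str]:
    """Stream analytics: turn aggregates into higher-level (complex) events."""
    return [
        f"ALERT site={site} window={start} avg={avg:.1f}"
        for (start, site), avg in sorted(averages.items())
        if avg > threshold
    ]


raw = [
    {"ts": 0, "sensor": "s1", "temp": 20.0},
    {"ts": 5, "sensor": "s1", "temp": None},   # dropped by the filter
    {"ts": 7, "sensor": "s2", "temp": 90.0},
    {"ts": 12, "sensor": "s2", "temp": 100.0},
    {"ts": 15, "sensor": "s1", "temp": 22.0},
]
sites = {"s1": "Basel", "s2": "Bern"}

clean = list(integrate(raw, sites))
averages = windowed_average(clean, window_s=10)
alerts = detect_overheating(averages, threshold=80.0)
```

The first two functions would typically end in a database or file system; the last one produces the kind of summary event a dashboard or alerting channel reacts to.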
How to implement „Data-in-Motion“?
“Data-in-Motion” Ecosystem
[Diagram: product landscape grouped into Edge, Event Hub, Stream Data Integration and Stream Analytics, each split into Open Source and Closed Source. Source: adapted from Tibco]
Apache Kafka – A Streaming Platform
High-Level Architecture
Distributed Log at the Core
Scale-Out Architecture
Logs do not (necessarily) forget
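A minimal sketch of the "distributed log at the core" idea, reduced to a single in-memory log. It is not Kafka's API: the `Log` and `Consumer` classes are hypothetical, and partitions, replication and retention settings are left out.

```python
from typing import Any, List


class Log:
    """Append-only log; a record's offset is its position and never changes."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int = 100) -> List[Any]:
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own offset, so reading does not remove data."""

    def __init__(self, log: Log, offset: int = 0) -> None:
        self.log = log
        self.offset = offset

    def poll(self, max_records: int = 100) -> List[Any]:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = Log()
for reading in ("t=20.1", "t=20.4", "t=21.0"):
    log.append(reading)

dashboard = Consumer(log)
first_batch = dashboard.poll()
log.append("t=21.3")
second_batch = dashboard.poll()

# A consumer that joins later can still replay the log from offset 0.
late_joiner = Consumer(log)
replayed = late_joiner.poll()
```

Because consumers only move their own offset, a new visualization can be attached later and still see earlier events, as long as the log has retained them.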
Blueprints for Stream Visualization
1) Direct Streaming to the Consumer
[Diagram: Data Sources and a Data Flow feed the “Data in Motion” layer (Event Hub, Integration, Stream Analytics), which pushes through a Channel to the Consumer's Streaming Visualization]
2) Use a fast data store and poll it regularly from the consumer
[Diagram: Data Sources and a Data Flow feed the “Data in Motion” layer (Event Hub, Integration, Stream Analytics), which writes to a Data Store; the Consumer's Streaming Visualization reads it through an API]
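A sketch of the polling blueprint, assuming the store can answer "everything newer than the last row I saw". `FastStore` and `PollingConsumer` are hypothetical stand-ins, not the API of any data store named later in the deck.

```python
from typing import Any, List, Tuple


class FastStore:
    """Hypothetical fast data store; rows carry an increasing sequence number."""

    def __init__(self) -> None:
        self._rows: List[Tuple[int, Any]] = []

    def write(self, value: Any) -> int:
        seq = len(self._rows) + 1
        self._rows.append((seq, value))
        return seq

    def query_since(self, last_seq: int) -> List[Tuple[int, Any]]:
        return [row for row in self._rows if row[0] > last_seq]


class PollingConsumer:
    """Remembers the last sequence number seen and fetches only newer rows."""

    def __init__(self, store: FastStore) -> None:
        self.store = store
        self.last_seq = 0
        self.chart: List[Any] = []

    def poll_once(self) -> int:
        new_rows = self.store.query_since(self.last_seq)
        if new_rows:
            self.last_seq = new_rows[-1][0]
            self.chart.extend(value for _, value in new_rows)
        return len(new_rows)


store = FastStore()
consumer = PollingConsumer(store)

store.write(10)
store.write(12)
fetched_first = consumer.poll_once()
fetched_idle = consumer.poll_once()   # nothing new since the last poll
store.write(15)
fetched_second = consumer.poll_once()
```

In a real dashboard `poll_once` would run on a timer, so the polling interval sets the lower bound on how fresh the visualization can be.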
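3) Use stateful Stream Analytics and query its store directly
[Diagram: Data Sources feed the “Data in Motion” layer (Event Hub, Integration, Stream Analytics); the Consumer's Streaming Visualization queries the Stream Analytics state through an API, with no separate data store]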
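A sketch of the third blueprint: the stream processor keeps its aggregate as local state and answers queries from it directly. `CountByKeyProcessor` is a hypothetical class used for illustration; it does not reproduce the API of any particular stream processing framework.

```python
from collections import Counter
from typing import List, Tuple


class CountByKeyProcessor:
    """Stateful stream processor: maintains a running count per key."""

    def __init__(self) -> None:
        self._state: Counter = Counter()

    def process(self, event: dict) -> None:
        self._state[event["key"]] += 1

    # The two methods below are what an API in front of the processor would call.
    def query(self, key: str) -> int:
        return self._state[key]

    def top(self, n: int) -> List[Tuple[str, int]]:
        return self._state.most_common(n)


processor = CountByKeyProcessor()
for key in ("zurich", "bern", "zurich", "basel", "zurich", "bern"):
    processor.process({"key": key})

zurich_count = processor.query("zurich")
unknown_count = processor.query("geneva")
leaders = processor.top(2)
```

The trade-off against the polling blueprint is one less system to operate, at the cost of tying query availability to the stream processor itself.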
Blueprints for Stream Visualization – Implementation
Visualization: many, many options! But do they support Streaming Data?
Oracle Stream Analytics
[Diagram: Data Sources and a Data Flow feed the “Data in Motion” layer (Event Hub, Integration, Stream Analytics), which pushes through a Channel to the Consumer's Streaming Visualization]
Oracle Stream Analytics
• Stream Analytics and Visualization in
one
• offers real-time actionable business
insight on streaming data
• automates action to drive today’s agile
businesses (business user)
• Runs on top of Spark Streaming
• Cloud and on-premises
• Data Sources: Kafka, JMS, GoldenGate,
File
WebSockets / SSE / Custom JavaScript Application
[Diagram: a Data Flow feeds the “Data in Motion” layer (Event Hub, Integration, Stream Analytics), which pushes through a Channel, here Server-Sent Events (SSE), to the Consumer's Streaming Visualization]
Slack / WhatsApp / Twitter / …
[Diagram: a Data Flow feeds the “Data in Motion” layer (Event Hub, Integration, Stream Analytics), which pushes through a Channel to the Consumer's Streaming Visualization]
WebSockets vs. Server-Sent Events (SSE)
WebSockets
• provide a richer protocol for bi-directional, full-duplex communication
• require full-duplex connections and new WebSocket servers to handle the protocol
• a two-way channel is more attractive for things like games, messaging apps, and cases where you need near real-time updates in both directions
SSE
• SSEs are sent over traditional HTTP
• do not require a special protocol or server implementation to get working
• when only one direction is necessary, Server-Sent Events have been designed from the ground up to be efficient
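To show why SSE needs no special protocol, here is a sketch of the text framing an SSE server writes to an HTTP response served as `text/event-stream`. The helper names `format_sse` and `parse_sse` are made up for this example, and the parser covers only the `data`, `event` and `id` fields; in a browser the parsing is done by `EventSource`.

```python
from typing import Dict, List, Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Encode one event; a blank line terminates it."""
    lines: List[str] = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads become several data: lines.
    lines.extend(f"data: {part}" for part in data.split("\n"))
    return "\n".join(lines) + "\n\n"


def parse_sse(stream: str) -> List[Dict[str, Optional[str]]]:
    """Decode a stream of frames, roughly as a client would."""
    events = []
    for block in stream.split("\n\n"):
        data: List[str] = []
        fields: Dict[str, str] = {}
        for line in block.split("\n"):
            if not line or line.startswith(":"):   # blank line or comment
                continue
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if name == "data":
                data.append(value)
            elif name in ("event", "id"):
                fields[name] = value
        if data:
            events.append({
                "event": fields.get("event", "message"),
                "id": fields.get("id"),
                "data": "\n".join(data),
            })
    return events


frame = format_sse('{"speed": 42}', event="position", event_id="7")
decoded = parse_sse(frame + format_sse("line one\nline two"))
```

Since each frame is plain text over a long-lived HTTP response, existing web servers and proxies can usually carry it without changes, which is the point the SSE bullets make.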
KSQL / REST API / Custom App
[Diagram: Data Sources feed the “Data in Motion” layer (Event Hub, Integration, Stream Analytics); the Consumer's Streaming Visualization queries it through an API]
KSQL & Arcadia Data
[Diagram: Data Sources feed the “Data in Motion” layer (Event Hub, Integration, Stream Analytics); the Consumer's Streaming Visualization queries it through an API]
Arcadia Data
• Combines Batch and Streaming
Visualization in one
• Streaming Visualizations based on
Confluent KSQL (Kafka)
• Arcadia Instant and Arcadia Enterprise
Druid & Superset / Imply
[Diagram: Data Sources and a Data Flow feed the “Data in Motion” layer (Event Hub, Integration, Stream Analytics), which writes to a Data Store; the Consumer's Streaming Visualization reads it through an API]
What is Druid?
• Open Source Time Series DB by
Metamarkets
• Apache Incubating
• Column-Oriented Storage
• Streaming and Batch Ingest
• Time optimized partitioning
• SQL Support
• Deep Storage can be HDFS / S3
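The benefit of time-optimized partitioning can be sketched in a few lines. This illustrates the general idea only and is not Druid's implementation or API: `TimePartitionedStore`, the hourly granularity and the sample rows are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

HOUR = 3600  # partition granularity in seconds (assumption for this sketch)


class TimePartitionedStore:
    """Groups rows into hourly partitions keyed by the partition's start time."""

    def __init__(self) -> None:
        self._partitions: Dict[int, List[Tuple[int, Any]]] = defaultdict(list)

    def ingest(self, ts: int, row: Any) -> None:
        self._partitions[ts - ts % HOUR].append((ts, row))

    def query(self, start: int, end: int) -> Tuple[List[Any], int]:
        """Return rows with start <= ts < end and the number of partitions scanned."""
        scanned = 0
        result: List[Any] = []
        for part_start, rows in self._partitions.items():
            if part_start + HOUR <= start or part_start >= end:
                continue  # partition lies entirely outside the interval
            scanned += 1
            result.extend(row for ts, row in rows if start <= ts < end)
        return result, scanned


tsdb = TimePartitionedStore()
tsdb.ingest(10, "a")      # first hour
tsdb.ingest(3700, "b")    # second hour
tsdb.ingest(7300, "c")    # third hour

rows, partitions_scanned = tsdb.query(3600, 7200)
```

A dashboard that mostly asks for "the last N minutes" therefore touches only the newest partitions, however much history is kept.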
Imply
• Commercial offering of Druid
• Built around Apache Druid
• Analytics, search and intelligence for
event-driven data
Superset
• Open source data visualization tool by
Airbnb
• Apache incubator
• Superset supports 30 types of
visualizations
• easy-to-use interface for exploring and
visualizing data
• Create and share dashboards
• Deep integration with Druid
• Integration with most SQL-speaking
RDBMS through SQLAlchemy
Elasticsearch / Kibana
[Diagram: Data Sources and a Data Flow feed the “Data in Motion” layer (Event Hub, Integration, Stream Analytics), which writes to a Data Store; the Consumer's Streaming Visualization reads it through an API]
Elasticsearch / Kibana
Elasticsearch
• NoSQL store
• a distributed, RESTful search and analytics
engine
• centrally stores your data so you can
discover the expected and uncover the
unexpected
• lets you perform and combine many types
of searches — structured, unstructured,
geo, metric
• aggregations let you zoom out to explore
trends and patterns in your data
Kibana
• Window into Elasticsearch
• Enables visual exploration and analysis of
data stored in Elasticsearch
InfluxDB / Grafana or Chronograf
[Diagram: Data Sources and a Data Flow feed the “Data in Motion” layer (Event Hub, Integration, Stream Analytics), which writes to a Data Store; the Consumer's Streaming Visualization reads it through an API]
InfluxDB
InfluxDB
• Popular Time Series Database
• Open source as well as Commercial offering
Chronograf
Grafana
Grafana lets you query, visualize, alert on and understand metrics
regardless of where they are stored
Supports various data sources
• Elasticsearch
• InfluxDB
• Prometheus
• OpenTSDB
• MySQL
• …
Technology on its own won't help you.
You need to know how to use it properly.
