The document discusses three blueprints for streaming visualization:
1. Using a fast datastore with regular polling from consumers, which introduces some delay but allows using data stores' full capabilities. Example technologies are Elasticsearch/Kibana and InfluxDB/Grafana.
2. Directly streaming data to consumers with minimal latency but more complex client-side processing. Examples are Kafka Connect to Slack and WebSockets/SSE apps.
3. Streaming SQL results to consumers, providing SQL query capabilities with minimal latency but limiting historical data access. KSQL and Spark Streaming are discussed.
2. Guido Schmutz
Working at Trivadis for more than 22 years
Consultant, Trainer Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Oracle Groundbreaker Ambassador & Oracle ACE Director
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
167th edition
3. Agenda
1. Motivation / Introduction
2. Stream Data Integration & Stream Analytics Ecosystem
3. Three Blueprints for Streaming Visualization
End-to-End Demo available here:
https://github.com/gschmutz/various-demos/tree/master/streaming-visualization
6. Keep the data in motion …
Data at Rest Data in Motion
Store
(Re)Act
Visualize/
Analyze
StoreAct
Analyze
11101
01010
10110
11101
01010
10110
vs.
Visualize
7. Hadoop Clusterd
Hadoop Cluster
Big Data
Reference Architecture for Data Analytics Solutions
SQL
Search
Service
BI Tools
Enterprise Data
Warehouse
Search / Explore
File Import / SQL Import
Event
HubData
Flow
Data
Flow
Change DataCapture Parallel
Processing
Storage
Storage
RawRefined
Results
SQL
Export
Microservice State
{ }
API
Stream
Processor
State
{ }
API
Event
Stream
Event
Stream
Search
Service
Stream Analytics
Microservices
Enterprise Apps
Logic
{ }
API
Edge Node
Rules
Event Hub
Storage
Bulk Source
Event Source
Location
DB
Extract
File
DB
IoT
Data
Mobile
Apps
Social
Event Stream
Telemetry
8. Two Types of Stream Processing
(by Gartner)
Stream Data Integration
• focuses on the ingestion and processing of
data sources targeting real-time extract-
transform-load (ETL) and data integration
use cases
• filter and enrich the data
Stream Analytics
• targets analytics use cases
• calculating aggregates and detecting
patterns to generate higher-level, more
relevant summary information (complex
events)
• Complex events may signify threats or
opportunities that require a response from
the business
Gartner: Market Guide for Event Stream Processing, Nick Heudecker, W. Roy Schulte
15. Demo: KSQL for Streaming ETL
CREATE STREAM tweet_s
WITH (KAFKA_TOPIC='tweet-v1', VALUE_FORMAT='AVRO', PARTITIONS=8) AS
SELECT id , createdAt , text , user->screenName
FROM tweet_raw_s;
CREATE STREAM tweet_raw_s WITH (KAFKA_TOPIC='tweet-raw-v1',
VALUE_FORMAT='AVRO');
SELECT id, lang, removestopwords(split(LCASE(text), ' ')) AS word
FROM tweet_raw_s
WHERE lang = 'en' or lang = 'de';
SELECT id, LCASE(hashtagentities[0]->text)
FROM tweet_raw_s
WHERE hashtagentities[0] IS NOT NULL;
16. Demo using Kafka Stack for Stream Data Integration
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics
Streaming
Visualization
Data Flow
ConsumerData
Sources
Data Flow
??
Filter: #voxxeddaysbanff,#java,#kafka,….
User: @VoxxedDaysBanff, @gschmutz
19. BP1: Fast datastore with regular polling from
consumer
Storage
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics
API
Data Store
Streaming
Visualization
Data Flow
ConsumerData
Sources
Data In Motion Data at Rest
Data Flow
20. BP1-1: Elasticsearch / Kibana
Storage
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics
API
Data Store
Streaming
Visualization
Data Flow
ConsumerData
Sources
Data In Motion Data at Rest
Data Flow
Alternatives:
SOLR & Banana
21. BP1-2: InfluxDB / Grafana or Chronograf
Storage
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics
API
Data Store
Streaming
Visualization
Data Flow
ConsumerData
Sources
Data In Motion Data at Rest
Data Flow
Alternatives:
Prometheus & Grafana
Druid & Superset
22. BP1-3: NoSQL & Custom Web
Storage
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics
API
Data Store
Streaming
Visualization
Data Flow
ConsumerData
Sources
Data In Motion Data at Rest
Data Flow
23. BP-1: Demo Redis NoSQL & Custom Web
https://opensky-network.org/
24. BP1-4: Kafka Streams Interactive Query & Custom App
Storage
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics
API
Data Store
Streaming
Visualization
Data Flow
ConsumerData
Sources
Data In Motion Data at Rest
Data Flow
Alternatives:
Flink
…
25. BP2: Direct Streaming to the Consumer
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics
Streaming
Visualization
Data Flow
ConsumerData
Sources
Data In Motion
Data Flow
Channel/
Protocol
API
26. BP2-1: Kafka Connect to Slack / WhatsApp
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics
Streaming
Visualization
Data Flow
ConsumerData
Sources
Data In Motion
Data Flow
Channel/
Protocol
API
Alternatives:
Twitter
SMS
…
28. BP2-2: Kafka to Tipboard (Dashboard Solution)
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics
Streaming
Visualization
Data Flow
ConsumerData
Sources
Data In Motion
Data Flow
Channel/
Protocol
API
Alternatives:
Dashing
Geckoboard
…
29. BP2-2: Demo Kafka to Tipboard (Dashboard Solution)
http://allegro.tech/tipboard/
31. BP2-3: Web Sockets / SSE & Custom Modern Web App
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics
Streaming
Visualization
Data Flow
ConsumerData
Sources
Data In Motion
Data Flow
Channel/
Protocol
API
Sever Sent Event (SSE)
32. BP3: Streaming SQL Result to Consumer
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics ConsumerData
Sources
Data In Motion
Data Flow
API Streaming
Visualization
33. BP3-1: KSQL and Arcadia Data
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics ConsumerData
Sources
Data In Motion
Data Flow
API Streaming
Visualization
35. BP3-2: KSQL with REST API to Custom Web App
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics ConsumerData
Sources
Data In Motion
Data Flow
API Streaming
Visualization
36. BP3-2: Demo KSQL with REST API
curl -X POST -H 'Content-Type: application/vnd.ksql.v1+json’
-i http://analyticsplatform:8088/query --data '{
"ksql": "SELECT text FROM tweet_raw_s;",
"streamsProperties": { "ksql.streams.auto.offset.reset": "latest” }
}'
{"row":{"columns":["The latest The Naji Filali Daily! https://t.co/9E6GonrySE Thanks to
@Xavier_Porter1 @ClouMedia #ai #bigdata"]},"errorMessage":null,"finalMessage":null}
{"row":{"columns":["RT @Futurist_Invest: This robot can copy your face! Creepy nn#SaturdayThoughts
#SaturdayMorning #creepy #bots #bot #AI #bigdata #robotics
#…"]},"errorMessage":null,"finalMessage":null}
{"row":{"columns":["She’s back telling us all about why datathons are exciting now :) Catch her
while you can! @ARUKscientist @S_Bauermeister #bigdata #ARUKConf
https://t.co/Br484db5ut"]},"errorMessage":null,"finalMessage":null}
{"row":{"columns":["Blockchain Competitive Innovation
Advantage"]},"errorMessage":null,"finalMessage":null}
37. BP3-3: Spark Streaming & Oracle Stream Analytics
Stream
Analytics
Event
Hub
Stream Data Integration & Stream Analytics ConsumerData
Sources
Data In Motion
Data Flow
API Streaming
Visualization
39. Summary
BP1: Fast Store & Polling
• “classic” pattern
• Not end-to-end “data-in-
motion” -> “Data-at-rest”
before visualization
• Slight delay might not be
acceptable for monitoring
dashboard
• Can use full power of data
store(s) => NoSQL
• In-memory reduces overhead
BP2: Stream to Consumer
• minimal latency
• More difficult on “client side”
• good if stream holds directly
what should be displayed
• More difficult if data in
stream needs to be analyzed
before visualization
• No historical info available
BP3: Streaming SQL
• Minimal latency
• Power of SQL query engine
available for visualization
• possibility for “self-service”
style visualization
• Some analytics are more
difficult on streaming data
• No historical info available