Event Streaming Architecture
for Industry 4.0
Abdelkrim Hadjidj - Sr. Data Streaming Specialist
Jan Kunigk - Principal Architect & Field CTO EMEA
© 2019 Cloudera, Inc. All rights reserved. 2
The Industry 4.0 economics
• Cost of Quality: 20 Percent of Sales (Source: American Society of Quality)
• Plant Downtime Costs: $50 Billion Per Year (Source: Deloitte)
• Quality & Recall Costs: $22B Recall Costs (US/16) (Source: AlixPartners)
• Stopped production cost: $22,000 Per Minute (Source: Nielsen Research)
Big Data, Streaming, IOT-Enabled Analytics
• 10% - 20% cost of quality reduction (Source: McKinsey)
• 5% - 20% equipment cost reduction (Source: Deloitte)
Key Industry 4.0 Use Cases
Use Cases
• Process 360
• Process Monitoring
• Predictive Maintenance
• Quality Event Forensic Analysis
• Quality & Yield Optimization
Use Case Examples
• Harmonization of screw tightening in all plants
• Re-calibrate manufacturing robots
• Saving in-fab sensor points forever (batch data)
• Optimize concentration of cutting fluids
• Cycle time monitoring of CNC machines
Benefits
• Single-point access to critical information and control
• Reduce downtime, scrap and late shipment costs
• Reduce equipment downtime and maintenance costs
• Reduce scope of service campaigns and warranty costs
• Optimize process variables to improve yields and quality
DATA-IN-MOTION REFERENCE ARCHITECTURE
Pipeline: MiNiFi -> Apache Kafka -> Apache NiFi -> Apache Kafka -> Apache Flink
DATA COLLECTION AT THE EDGE
• C++ agent: US-West Fleet
• C++ agent: US-Central Fleet
• C++ agent: US-East Fleet
INGEST GATEWAY POWERED BY KAFKA
• gateway-west-raw-sensors
• gateway-central-raw-sensors
• gateway-east-raw-sensors
DATA FLOW APPS POWERED BY NIFI
DATA SYNDICATION SERVICE BY KAFKA
• Kafka Topic: syndicate-transmission
• Kafka Topic: syndicate-speed
• Kafka Topic: syndicate-temp
• Kafka Topic: syndicate-geo
• Kafka Topic: syndicate-battery
• Kafka Topic: syndicate-start/stop
• Kafka Topic: syndicate-acceleration
• Kafka Topic: syndicate-idle
• Kafka Topic: syndicate-oil
• Kafka Topic: syndicate-breaks
SUBSCRIBING STREAM PROCESSING APPS
• PROCESSING APP 1
• PROCESSING APP 2
• PROCESSING APP 3
• Engines shown: Apache Flink, Structured Streaming
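The fan-out in the middle of this architecture (raw gateway topics in, one syndicate-* topic per signal out) can be sketched in plain Python. This is only an illustration, not the NiFi flow from the talk: the record shape (a dict with a "signal" field), the in-memory lists standing in for Kafka topics, and the "syndicate-unrouted" catch-all topic are all assumptions made for the example.

```python
from collections import defaultdict

# Signal names taken from the syndicate-* topics on the slide.
KNOWN_SIGNALS = {
    "transmission", "speed", "temp", "geo", "battery",
    "start/stop", "acceleration", "idle", "oil", "breaks",
}


def syndicate_topic(record):
    """Return the per-signal topic name for a raw record, or None if unknown."""
    signal = record.get("signal")
    return f"syndicate-{signal}" if signal in KNOWN_SIGNALS else None


def route(raw_records):
    """Fan raw gateway records out into per-signal topics.

    Plain lists stand in for Kafka topics; "syndicate-unrouted" is a
    hypothetical catch-all that does not appear on the slide.
    """
    topics = defaultdict(list)
    for record in raw_records:
        topic = syndicate_topic(record) or "syndicate-unrouted"
        topics[topic].append(record)
    return dict(topics)


raw = [
    {"signal": "speed", "value": 88, "fleet": "west"},
    {"signal": "temp", "value": 71, "fleet": "east"},
    {"signal": "unknown", "value": 0, "fleet": "central"},
]
routed = route(raw)
```

Keeping one topic per signal lets each stream processing app subscribe only to the signals it needs.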
Two main challenges
Traditional solutions are local and limited
● Single process focused analysis
○ Analytics performed at each plant, refinery, etc.
○ Missed opportunity to detect globally
connected (e.g. quality, optimizations)
● No condition-based analysis
○ Existing edge analytics check if a sensor is
within control (quasi hardcoded thresholds)
○ Sensor data should be correlated to
reference data
(productivity data, maintenance schedule)
● Simplicity and Manageability
○ No tooling = embedded coding nightmare
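The difference between a quasi-hardcoded threshold and a condition-based check can be sketched as follows. The machine names, the reference data, and the rule itself (tighten the limit by 5 degrees Celsius when maintenance is more than 60 days overdue) are hypothetical values chosen only to show the idea of correlating a sensor reading with reference data.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    machine_id: str
    temperature_c: float


# Hypothetical reference data: context that a fixed threshold ignores.
REFERENCE = {
    "press-1": {"days_since_maintenance": 3, "base_limit_c": 80.0},
    "press-2": {"days_since_maintenance": 90, "base_limit_c": 80.0},
}


def fixed_threshold_alert(reading, limit_c=80.0):
    """Classic edge check: is the sensor outside a hardcoded limit?"""
    return reading.temperature_c > limit_c


def condition_based_alert(reading, reference=REFERENCE):
    """Same check, but the limit depends on the machine's maintenance state."""
    ctx = reference[reading.machine_id]
    # Assumed rule: tighten the limit by 5 degrees Celsius when maintenance
    # is overdue (more than 60 days since the last service).
    overdue = ctx["days_since_maintenance"] > 60
    limit_c = ctx["base_limit_c"] - (5.0 if overdue else 0.0)
    return reading.temperature_c > limit_c


recent = Reading("press-1", 77.0)
overdue = Reading("press-2", 77.0)
```

With these values the fixed threshold stays silent for both machines, while the condition-based check flags only the machine whose maintenance is overdue.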
Many data sources increase complexity
● Data from different time universes
○ Alignment of timeseries and timestamped
data (Sensor to Quality inspection)
○ Network conditions make time
management even harder (late arrivals)
● Real time analytics & CEP at scale
○ Trends and aggregates are more
meaningful than events
● Prediction and RCA require AA/ML
○ High data volume/cardinality
○ Data calibration is a requirement for ML/AI
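Aligning a sensor time series with timestamped events such as quality inspections is often done with an "as-of" join: for each inspection, take the latest sensor value at or before the inspection time. The sketch below assumes integer timestamps, a sensor series already sorted by time, and a hypothetical `max_age` cut-off beyond which a sensor value is considered too stale to use.

```python
from bisect import bisect_right


def latest_at_or_before(sensor_series, ts):
    """Return the last (timestamp, value) pair with timestamp <= ts, or None.

    sensor_series must be sorted by timestamp.
    """
    timestamps = [t for t, _ in sensor_series]
    i = bisect_right(timestamps, ts)
    return sensor_series[i - 1] if i else None


def align(inspections, sensor_series, max_age):
    """Attach the most recent sensor value to each inspection result.

    If no sensor value exists within max_age before the inspection,
    the sensor field is None.
    """
    aligned = []
    for ts, result in inspections:
        hit = latest_at_or_before(sensor_series, ts)
        if hit is not None and ts - hit[0] <= max_age:
            aligned.append((ts, result, hit[1]))
        else:
            aligned.append((ts, result, None))
    return aligned


sensor = [(0, 20.0), (10, 21.5), (20, 23.0)]
inspections = [(12, "pass"), (45, "fail")]
joined = align(inspections, sensor, max_age=15)
```

The second inspection gets no sensor value because the closest reading is 25 time units old, which exceeds the assumed `max_age` of 15.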
Apache NiFi
• Over 300 Prebuilt Processors
• Easy to build your own
• Parse, Enrich & Apply Schema
• Filter, Split, Merge & Route
• Throttle & Backpressure
• Full data provenance from acquisition to
delivery
• Diverse, Non-Traditional Sources
• Eco-system integration
Advanced tooling to industrialize flow development
(Flow Development Life Cycle)
[Diagram: NiFi data flow]
Protocols and formats shown on both sides of the diagram: FTP, SFTP, HL7, UDP, XML, HTTP, EMAIL, HTML, IMAGE, SYSLOG
Processors shown in the middle: HASH, MERGE, EXTRACT, DUPLICATE, SPLIT, ROUTE TEXT, ROUTE CONTENT, ROUTE CONTEXT, CONTROL RATE, DISTRIBUTE LOAD, GEOENRICH, SCAN, REPLACE, TRANSLATE, CONVERT, ENCRYPT, TAIL, EVALUATE, EXECUTE
How can NiFi help?
Large set of OOTB connectors
MQTT, OPC-UA, AMQP, ..
S3, ADLS, PubSub, ..
FTP, JDBC, NoSQL, Search, ..
UI based fast development
Scalable distributed system
MiNiFi agents
Java / C++ lightweight agents
Edge collection (NiFi connectors)
Edge processing (Filtering,
compression, encryption, etc)
C&C: Central command and control
Security
Hub & Spoke architecture
From Edge to Cloud/DC
Site to Site protocol (S2S)
Backpressure, Latency, Throughput,
queuing
End-to-End lineage and security
How can Flink help?
Time Management
IoT network challenges
Event time management
Late arrival management
State Management
Enrichment
Combining knowledge
State of components
Performance at scale
Industrial Internet to grow 2X faster
than any other data
Data preparation
Filtering, enriching, aggregation,
joining
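Event-time and late-arrival management can be illustrated with a small simulation of tumbling windows driven by a watermark. This is a simplified model written in plain Python, not the Flink API: the watermark is assumed to trail the highest event time seen so far by `max_out_of_orderness`, a window closes once the watermark reaches its end, and events that arrive for an already closed window are collected as late rather than aggregated.

```python
from collections import defaultdict


def tumbling_event_time_sums(events, window_size, max_out_of_orderness):
    """Sum values per event-time window.

    events: iterable of (event_time, value) in ARRIVAL order.
    Returns (results, late): results maps window_start -> sum,
    late lists the events whose window had already been closed.
    """
    open_windows = defaultdict(float)
    results = {}
    late = []
    watermark = float("-inf")

    for event_time, value in events:
        window_start = (event_time // window_size) * window_size
        if window_start + window_size <= watermark:
            late.append((event_time, value))  # window already emitted
        else:
            open_windows[window_start] += value

        # Watermark: "no event older than this is expected any more".
        watermark = max(watermark, event_time - max_out_of_orderness)

        # Emit every window whose end the watermark has reached.
        closable = sorted(w for w in open_windows if w + window_size <= watermark)
        for start in closable:
            results[start] = open_windows.pop(start)

    # End of input: flush whatever is still open.
    for start in sorted(open_windows):
        results[start] = open_windows.pop(start)
    return results, late


arrivals = [(1, 1.0), (4, 2.0), (12, 3.0), (8, 4.0), (17, 5.0), (3, 6.0), (25, 7.0)]
sums, late_events = tumbling_event_time_sums(
    arrivals, window_size=10, max_out_of_orderness=5
)
```

The event with time 8 arrives out of order but is still counted, because the watermark has not yet passed the end of its window. The event with time 3 arrives after that window was emitted and ends up in `late_events`.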
NiFi-Flink integration
• Direct integration via Site to Site
  ○ Simple: NiFi is just a Flink source
  ○ How to handle a data spike or a Flink outage? NiFi is not a data store!
  ○ Point to point: what if the same data is needed by several Flink apps?
• Native integration via Kafka API
  ○ Requires installing/managing another distributed system
  ○ Kafka retention can save your life during a data spike or a Flink outage
  ○ Easy to build pipelines with several steps and intermediate topics
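Why a retained log helps with outages and with several consumers can be shown with a toy model. The `RetainedLog` class below is a hypothetical in-memory stand-in for a single topic partition, not the Kafka client API: records stay in the log after being read, and each consumer group keeps its own offset. Time-based or size-based retention limits are deliberately not modelled.

```python
class RetainedLog:
    """Append-only log with one read offset per consumer group."""

    def __init__(self):
        self._records = []
        self._offsets = {}

    def append(self, record):
        self._records.append(record)

    def poll(self, group, max_records=100):
        """Return the next unread records for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = RetainedLog()
for reading in ("r1", "r2", "r3"):
    log.append(reading)

first_a = log.poll("flink-app-a")   # app A is up and reads immediately

for reading in ("r4", "r5"):        # more data arrives while app B is down
    log.append(reading)

catch_up_b = log.poll("flink-app-b")  # app B restarts and reads everything
second_a = log.poll("flink-app-a")    # app A only sees the new records
```

Both applications read the same data independently, and the one that was down catches up from its own offset instead of losing records. A point-to-point link has no equivalent unless the sender buffers on its side.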
INTELLIGENT EDGE
Event Streaming Edge2AI Architecture for Industry 4.0
Analyze
•Self-Service Business
Intelligence (BI)
•Enterprise Analytics
Learn
• Historical sensor data
• Historical maintenance records
• Historical usage characteristics
• Historical failures
Model Inputs
Enterprise Transaction
Data
MES, ERP, Maintenance, Supply
Chain, Warranty, Design, etc.
ENRICH
Edge Collection/Analytics
Transmission
Connected Process/Plant 1
Sensors
PLCs
Historians
SCADAs
Connected Process/Plant N
Sensors
PLCs
Historians
SCADAs
Feedback
REAL-TIME
ACTION
ACT CDSW
Standard plants solutions
Edge to Cloud
End to End pipeline
Plant 2
Plant 1
Plant 3
Enterprise
sources
IoT
Errors
Aggregates
Alerts
Other data
ETL
Analytics
Cross Plants Enterprise Analysis Real Time Analytics
Complexity Reduction
Why is it important?
And what does it have to do with ML?
Demo
Standard solutions and EdgeToAI architecture
Characteristic | Standard solutions | Cloudera
Corporate Positioning | Real-Time Analytics in the Factory | Streaming Platform for Enterprise Analytics
Market Position | IOT Platform | Event and Streaming Management Platform
Analytics Scope | Edge | Enterprise and cross-factory
Data Ingestion | Factory Edge: Specialized for Machine data | From Edge to Cloud/Data Center, Enterprise flow management (MiNiFi/NiFi)
Data Storage | None | Enterprise Data Lake
Data Processing | Edge | Batch and Real Time, with advanced time and state management capabilities
Conclusion
THANK YOU

Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kunigk, Cloudera
