This talk presents a web application that calculates real-time health scores at high speed using Spark on Kubernetes. A health score represents a machine's lifetime and is commonly used as a reference point when deciding whether to replace the machine with a new one to maintain high productivity. It is therefore very important to observe the health scores of the large number of machines in a factory without delay. To address this, BISTel has applied stream processing with Spark to provide a real-time health score application.
Real-time health score app using Spark on Kubernetes
1. Real-time health score application
using Spark on Kubernetes
Daeyoung Kim - BISTel Research
Seungchul Lee - BISTel Research
2. Agenda
Introduction to BISTel and GrandView APM
Real-time health score application
▪ What is a health score?
▪ Real-time streaming service
▪ Spark on Kubernetes
Conclusions
9. What is a health score in smart manufacturing?
▪ A health score represents a machine's status by analyzing multiple sensor
data records.
▪ It can serve as a core metric of a prognostics and health
management (PHM) system for predicting a machine's lifespan.
▪ Various machine learning algorithms can be used to compute a health
score in the manufacturing industry.
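As one illustration of how several sensor readings might be reduced to a single score, here is a minimal sketch. It is not the algorithm used in the talk: the z-score formula, the `zLimit` parameter, and the baseline means and standard deviations (assumed to be learned offline from healthy data) are all hypothetical.

```java
/** Hypothetical sketch of a z-score based health score in [0, 1]; not the talk's algorithm. */
class HealthScoreSketch {

    /**
     * Averages the absolute z-scores of the readings against per-sensor baselines
     * and maps the result to [0, 1], where 1 means "at baseline" and 0 means
     * "at or beyond zLimit standard deviations on average".
     */
    static double healthScore(double[] readings, double[] means, double[] stds, double zLimit) {
        double sumAbsZ = 0.0;
        for (int i = 0; i < readings.length; i++) {
            sumAbsZ += Math.abs(readings[i] - means[i]) / stds[i];
        }
        double meanAbsZ = sumAbsZ / readings.length;
        return Math.max(0.0, 1.0 - meanAbsZ / zLimit);
    }

    public static void main(String[] args) {
        // Assumed baselines for two sensors (for example temperature and vibration).
        double[] means = {50.0, 1.0};
        double[] stds = {2.0, 0.5};
        System.out.println(healthScore(new double[] {50.0, 1.0}, means, stds, 3.0));
        System.out.println(healthScore(new double[] {52.0, 1.0}, means, stds, 3.0));
        System.out.println(healthScore(new double[] {60.0, 3.0}, means, stds, 3.0));
    }
}
```

A production system would more likely use a trained model per asset; the point here is only that the score is a function of multiple sensor values and a model context.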
10. Defect Point based on Asset Health Score
[Figure: asset health score chart. Labels: "Algorithm + deep asset knowledge", "Defect Identified", "Monotonically increasing section".]
Source: XenonStack
https://www.xenonstack.com/blog/log-analytics-deep-machine-learning/
12. Data flow: Real-time health score application
[Diagram: data flow of the application]
Main data stream: unbounded sensor data from Kafka
Event stream
- Train models offline
- Model change
ETL into time series storage
- Prevent data loss
- Be able to query as needed
Interactive status monitoring
- Summarizing statistics
- Anomaly detection
- Aggregating data records on demand
13. Stateful Operation - updateStateByKey
▪ Model context should be cached for as long as
the application is running
▪ Plain operations on DStreams of key-value pairs
know nothing about the records of previous batches
▪ updateStateByKey can maintain state
across mini batches even if no new
data arrives afterwards.
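// V: incoming model type, S: cached model state
// Optional is org.apache.spark.api.java.Optional
Function2<List<V>, Optional<S>, Optional<S>>
modelStateFunc = (values, state) -> {
    // update or remove logic goes here
    // return the new state; Optional.empty() removes the key
    return state;
};

modelPairStream
    .updateStateByKey(modelStateFunc)
    .join(tracePairStream);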
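To make the state semantics concrete without a Spark cluster, the following sketch imitates one updateStateByKey step with plain Java collections. It is a simulation, not Spark code. The class, the `runBatch` helper, the `latestModel` update function, and the `"REMOVE"` marker are hypothetical names for illustration. It uses `java.util.Optional`, whereas Spark's Java API uses its own `org.apache.spark.api.java.Optional`.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.BiFunction;

/** Plain-Java simulation of the updateStateByKey contract (no Spark dependency). */
class UpdateStateByKeySketch {

    /** Runs the update function once per key that has state or new values in this batch. */
    static <K, V, S> Map<K, S> runBatch(
            Map<K, S> state,
            Map<K, List<V>> batch,
            BiFunction<List<V>, Optional<S>, Optional<S>> updateFunc) {
        Set<K> keys = new LinkedHashSet<>(state.keySet());
        keys.addAll(batch.keySet());
        Map<K, S> next = new LinkedHashMap<>();
        for (K key : keys) {
            List<V> values = batch.getOrDefault(key, List.of());
            Optional<S> updated = updateFunc.apply(values, Optional.ofNullable(state.get(key)));
            // An empty result drops the key from the state.
            updated.ifPresent(s -> next.put(key, s));
        }
        return next;
    }

    /** Hypothetical update logic: keep the cached model unless a newer one (or "REMOVE") arrives. */
    static Optional<String> latestModel(List<String> newModels, Optional<String> cached) {
        if (newModels.isEmpty()) {
            return cached;
        }
        String last = newModels.get(newModels.size() - 1);
        return "REMOVE".equals(last) ? Optional.empty() : Optional.of(last);
    }

    public static void main(String[] args) {
        Map<String, String> state = new LinkedHashMap<>();
        state = runBatch(state, Map.of("asset-1", List.of("model-v1")), UpdateStateByKeySketch::latestModel);
        // An empty batch still visits asset-1 and keeps its cached model.
        state = runBatch(state, Map.of(), UpdateStateByKeySketch::latestModel);
        System.out.println(state);
    }
}
```

Note that `runBatch` visits every key that already has state, even when the batch is empty. That keeps the model cached, but it also means the work per batch grows with the total number of keys.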
14. Stateful Streaming for Operating Models
[Diagram: updateStateByKey carries State from one batch to the next across Batch 1 (RDD @ t), Batch 2 (RDD @ t+1), Batch 3 (RDD @ t+2), and Batch 4 (RDD @ t+3), combining the Event DStream with the Main DStream.]
15. Problem with updateStateByKey
Big data for predictive maintenance
▪ The number of assets is increasing greatly with predictive maintenance
powered by the Internet of Things (IoT).
Performance
▪ updateStateByKey is invoked on every key in every batch in Spark Streaming.
▪ This can degrade performance when dealing with a large amount
of state.
16. Almost empty batches in model stream
▪ In contrast to the main stream, the model stream stays idle unless a
model change occurs.
▪ fullOuterJoin + mapWithState
[Diagram: at times t, t+1, and t+2 the main stream carries (assetId, values) records. The model stream carries (assetId, models) only at t and is absent at t+1 and t+2. In the joined stream, each (assetId, values) record is paired with the model kept in State.]
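The combination of a full outer join and keyed state can be sketched in plain Java as follows. This is a simulation of the idea rather than Spark code. `ModelJoinSketch`, `processBatch`, and the string "model" values are hypothetical. The property it mirrors from mapWithState is that the state function runs only for keys that appear in the current batch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Plain-Java simulation of "fullOuterJoin + mapWithState" for a mostly idle model stream. */
class ModelJoinSketch {

    // Latest model per assetId, kept across batches (the "State" in the diagram).
    private final Map<String, String> modelState = new HashMap<>();

    /** Joins one batch of sensor values with an optional batch of model changes. */
    List<String> processBatch(Map<String, Double> sensorBatch, Map<String, String> modelBatch) {
        // Full outer join on assetId: keep keys that appear on either side.
        Set<String> keys = new TreeSet<>(sensorBatch.keySet());
        keys.addAll(modelBatch.keySet());

        List<String> scored = new ArrayList<>();
        for (String assetId : keys) {
            // Only keys present in this batch are visited, unlike updateStateByKey.
            if (modelBatch.containsKey(assetId)) {
                modelState.put(assetId, modelBatch.get(assetId));
            }
            Double value = sensorBatch.get(assetId);
            String model = modelState.get(assetId);
            if (value != null && model != null) {
                scored.add(assetId + ":" + model);
            }
        }
        return scored;
    }

    public static void main(String[] args) {
        ModelJoinSketch job = new ModelJoinSketch();
        // t: values and a model arrive together.
        System.out.println(job.processBatch(Map.of("a1", 0.9), Map.of("a1", "m1")));
        // t+1: the model side is absent, so the cached model is reused.
        System.out.println(job.processBatch(Map.of("a1", 0.8), Map.of()));
    }
}
```

With this shape, the work per batch is proportional to the number of assets that reported data, not to the total number of assets with cached models.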
18. Is standalone mode sufficient?
Case1 (very common case)
# of assets : up to 10
# of parameters : up to 10
Case2 (Big data analytics)
# of assets : 1,000,000
# of parameters : 10,000
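A rough back-of-the-envelope calculation shows why the two cases differ so much. The sketch below assumes that "# of parameters" means parameters per asset and that each value is an 8-byte double. Both are assumptions for illustration, not figures from the talk.

```java
/** Rough workload estimate for the two cases; all sizing assumptions are hypothetical. */
class WorkloadEstimate {

    /** Number of sensor values in one sample across all assets. */
    static long valuesPerSample(long assets, long parametersPerAsset) {
        return Math.multiplyExact(assets, parametersPerAsset);
    }

    /** Raw payload size of one sample, ignoring keys, timestamps, and serialization overhead. */
    static long bytesPerSample(long assets, long parametersPerAsset, int bytesPerValue) {
        return Math.multiplyExact(valuesPerSample(assets, parametersPerAsset), (long) bytesPerValue);
    }

    public static void main(String[] args) {
        int bytesPerValue = Double.BYTES; // assumption: values stored as 8-byte doubles
        System.out.println("Case 1: " + valuesPerSample(10, 10) + " values, "
                + bytesPerSample(10, 10, bytesPerValue) + " bytes");
        System.out.println("Case 2: " + valuesPerSample(1_000_000L, 10_000L) + " values, "
                + bytesPerSample(1_000_000L, 10_000L, bytesPerValue) + " bytes");
    }
}
```

Under these assumptions, Case 1 is 100 values (800 bytes) per sample, while Case 2 is 10,000,000,000 values (80 GB) per sample, which is far beyond what a single standalone node can hold or process per interval.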
20. Points to consider in a multi-node cluster
Communication between workers
▪ Data needs to be shuffled over the network
▪ No broadcast operation for small data in a DStream
▪ join or groupByKey(): think before using them
Are the sensor data records easily split across the worker nodes?
▪ Time sequence is important for predicting machine failures
▪ Watermarks to discard late sensor data records
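The watermark idea can be sketched in a few lines of plain Java. This is a simplified, per-record illustration with hypothetical names (`WatermarkSketch`, `accept`, `allowedLatenessMs`). In Spark, the built-in equivalent is `withWatermark` in Structured Streaming, which advances the watermark per trigger rather than per record.

```java
/** Simplified per-record watermark filter; a sketch of the concept, not Spark's implementation. */
class WatermarkSketch {

    private final long allowedLatenessMs;
    private long maxEventTimeMs = Long.MIN_VALUE;

    WatermarkSketch(long allowedLatenessMs) {
        this.allowedLatenessMs = allowedLatenessMs;
    }

    /**
     * Returns true if the record should be processed, false if its event time is
     * older than (latest event time seen so far - allowed lateness).
     */
    boolean accept(long eventTimeMs) {
        boolean seenAny = maxEventTimeMs != Long.MIN_VALUE;
        if (seenAny && eventTimeMs < maxEventTimeMs - allowedLatenessMs) {
            return false; // too late: discard
        }
        maxEventTimeMs = Math.max(maxEventTimeMs, eventTimeMs);
        return true;
    }

    public static void main(String[] args) {
        WatermarkSketch watermark = new WatermarkSketch(5_000);
        long[] eventTimes = {10_000, 12_000, 8_000, 6_000};
        for (long t : eventTimes) {
            System.out.println(t + " accepted=" + watermark.accept(t));
        }
    }
}
```

Records that are slightly out of order (within the allowed lateness) are still processed, which matters because the time sequence of sensor readings feeds the failure prediction.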
27. Acknowledgements
▪ This work was supported by the ICT
R&D program of MSIP/IITP
[2020(2020-0-00358),
Development of Knowledge & AI
based decision support system for
Manufacturing full automation]