Application and Challenges of Streaming Analytics and Machine Learning on Multi-Variate Time Series Data for Smart Manufacturing


Pranav Prakash, Quartic.ai
#UnifiedDataAnalytics #SparkAISummit

Pranav Prakash
• Co-Founder, VP Engineering at Quartic.ai
• Ex-LinkedIn SlideShare
• Passionate about
– A.I., Computer Vision, 3D Printing
– Music, Caffeine

What you’ll learn in the next 40 mins
• A cool startup solving some real-life use cases
• Downtime reduction use case of a critical asset in the Pharma world
• And a “secret” to solve such problems
• Challenges in Industrial Stream Processing
• Spark-specific stuff that we learned

We enable Industry 4.0
• AI-powered smart manufacturing platform
• Processing billions of sensor data points every day
• Work with top Pharma companies on multiple use cases
• Team of 22 techies including Engineers & Data Scientists + 4 Domain Veterans

We started by building solutions for pharmaceutical manufacturing
And created a DIY platform
• Increased uptime of sterilization autoclave by 7 days
• Increased yield of protein from fermentation process
• Incubated egg harvester – increase uptime during critical flu season
• Cold-chain monitoring for pharma refrigeration – reduced downtime and waste
• Predictive health monitoring of air handlers for clean rooms in pharma
• Enable continuous validation of biologic production process
• Medical Device Assembly – reduce recalls caused by poor quality.

Case study – an Intelligent Asset Health Monitoring system for an Industrial Autoclave
• Mission - Improve the reliability of a complex asset.
• Details - 13 different modes (cycles)
• Runs 24/7
• Critical asset

Equipment Reliability
• Capture process, condition data
• Establish baseline and measure deviations
• Forecast the future
• Classify errors early
• “Advisory Mode” AI
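
The "establish baseline and measure deviations" step could look roughly like the sketch below. It is only an illustration, assuming a simple mean/standard-deviation baseline; the talk does not say which baseline model was used. The readings, the `advise` helper and the threshold of 3 standard deviations are all hypothetical.

```python
from statistics import mean, stdev

def fit_baseline(values):
    """Summarize known-good readings as (mean, standard deviation)."""
    return mean(values), stdev(values)

def deviation_score(value, baseline):
    """How many standard deviations a reading sits from the baseline mean."""
    mu, sigma = baseline
    return (value - mu) / sigma if sigma else 0.0

def advise(value, baseline, threshold=3.0):
    """Advisory mode: only report a suggestion, never act on the asset."""
    score = deviation_score(value, baseline)
    return "investigate" if abs(score) > threshold else "normal"

# Hypothetical chamber temperatures (deg C) from healthy cycles; not real data.
healthy = [121.0, 121.4, 120.8, 121.2, 121.1, 120.9]
baseline = fit_baseline(healthy)
```

With 13 different cycles on one asset, a separate baseline per cycle would probably be needed; that is left out here for brevity.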

SCADA = Supervisory Control and Data Acquisition
PLC = Programmable Logic Controller

System Design Params
• Data
– Speed: 10 ms to 2 hours between readings
– Volume: a couple of thousand sensors per asset; tens of thousands of assets per enterprise
– Data Type: String, Numeric, Boolean, Array
– Timeseries, Discrete
System Design Params
• Deployment
– Edge (80%)
• Hardware Limit
• Many cloud-only solutions won’t work
• High Uptime, Low Response Time
– Cloud (20%)
System Design Params
• Use Cases
– Automatic Model Param Tuning, Model Training
– Deployment of 1000s of ML Models
– Complex Event Processing (CEP)
– Statistical & Analytical Processing
• Rule Recommendation
• Near Real Time Stream Processing

Challenges
• ML
– Multiple granularities
– Late Data Arrival
– Model Deployment on a heterogeneous data stream
– Flash Flood of Data

Multiple Granularities

Poll Frequency = 1s
TS            Sensor A  Sensor B
12:03:01.198
12:03:02.283
12:03:03.316
12:03:04.572
12:03:05.283
12:03:06.342

Poll Frequency = 5s
TS            Sensor C  Sensor D
12:03:01.230
12:03:06.233
12:03:11.316
12:03:16.520
12:03:21.283

- Both belong to same “Asset”
- Target Feature – C/D or A/B

Multiple Granularities
• Approximation (Roundoff)
• Aggregation
• Filling - Forward or Backward or Average
Late Data Arrival
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking

Late Data Arrival
• Watermarking
– Homogeneous stream: one watermark per stream
– Heterogeneous stream: multiple watermarks per “Usage Condition”

- Watermark time is chosen automatically and dynamically
- Data later than the threshold is discarded
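
The bookkeeping behind this can be sketched in plain Python. In Spark Structured Streaming the watermark itself is declared with `withWatermark` on a streaming DataFrame; the sketch below only illustrates the idea and is not Spark code. It assumes one watermark per usage condition and a fixed lateness threshold per condition. The class `WatermarkTracker`, the condition names and the thresholds are hypothetical, and the automatic, dynamic choice of the threshold mentioned above is not modeled.

```python
from collections import defaultdict

class WatermarkTracker:
    """Keeps one watermark per key, e.g. per usage condition."""

    def __init__(self, thresholds, default_threshold=10.0):
        self.thresholds = thresholds                  # allowed lateness (s) per key
        self.default_threshold = default_threshold    # used for unknown keys
        self.max_event_time = defaultdict(lambda: float("-inf"))

    def watermark(self, key):
        """Watermark = latest event time seen for the key minus its threshold."""
        threshold = self.thresholds.get(key, self.default_threshold)
        return self.max_event_time[key] - threshold

    def accept(self, key, event_time):
        """True if the event is on time; False if it is older than the watermark."""
        if event_time < self.watermark(key):
            return False                              # too late: discard
        self.max_event_time[key] = max(self.max_event_time[key], event_time)
        return True

# Hypothetical conditions: a tight threshold while sterilizing, a loose one when idle.
tracker = WatermarkTracker({"sterilize": 5.0, "idle": 60.0})
```

An event that is 6 seconds late is dropped under "sterilize" but would still be accepted under "idle", which is the point of keeping separate watermarks on a heterogeneous stream.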

Flash Flood of Data
• Backpressure enabled
• Allows ingestion rate to be chosen dynamically and automatically
• PID Controller
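
In DStream-based Spark Streaming, backpressure is switched on with `spark.streaming.backpressure.enabled`, and the rate is then adjusted by a PID controller. The sketch below shows how such a controller can work in principle. It is a simplified illustration, not Spark's internal code; the class name, the gain defaults (`kp`, `ki`, `kd`) and `min_rate` are assumptions chosen for the example.

```python
class PidRateEstimator:
    """Simplified PID controller for an ingestion rate limit (records/second)."""

    def __init__(self, batch_interval_s, kp=1.0, ki=0.2, kd=0.0, min_rate=100.0):
        self.batch_interval_s = batch_interval_s
        self.kp, self.ki, self.kd = kp, ki, kd
        self.min_rate = min_rate
        self.latest_rate = None    # last rate limit issued
        self.latest_error = 0.0
        self.latest_time = None

    def compute(self, time_s, num_records, processing_delay_s, scheduling_delay_s):
        """Return a new rate limit after a batch completes, or None if nothing can be learned."""
        if num_records <= 0 or processing_delay_s <= 0:
            return None
        processing_rate = num_records / processing_delay_s
        if self.latest_rate is None:
            # First batch: start from the observed processing rate.
            new_rate, error = processing_rate, 0.0
        else:
            dt = time_s - self.latest_time
            if dt <= 0:
                return None
            # Proportional term: how far ingestion ran ahead of processing.
            error = self.latest_rate - processing_rate
            # Integral term: backlog implied by the time the batch waited in the queue.
            backlog = scheduling_delay_s * processing_rate / self.batch_interval_s
            # Derivative term: how quickly the error is changing.
            d_error = (error - self.latest_error) / dt
            new_rate = max(
                self.latest_rate - self.kp * error - self.ki * backlog - self.kd * d_error,
                self.min_rate,
            )
        self.latest_rate, self.latest_error, self.latest_time = new_rate, error, time_s
        return new_rate

estimator = PidRateEstimator(batch_interval_s=1.0)
```

When a flood of data makes a batch slow and queued, the next rate limit drops; when processing catches up, the limit rises again, so the ingestion rate follows what the cluster can actually absorb.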

Complex Event Processing
• Insights
– PySpark + yahoo/graphkit
• Rules
– Scala Spark + Drools
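
To give a feel for what a CEP-style rule does, here is a small plain-Python sketch. It uses neither Drools nor graphkit, and it is not the rule set from the talk: the rule ("fire when a reading stays above a threshold for N consecutive events"), the class name and the numbers are hypothetical.

```python
from collections import deque

class ConsecutiveBreachRule:
    """Fires when the last `count` readings were all above `threshold`."""

    def __init__(self, threshold, count):
        self.threshold = threshold
        self.recent = deque(maxlen=count)   # sliding window of breach flags

    def on_event(self, value):
        """Feed one reading; return True if the rule fires on this event."""
        self.recent.append(value > self.threshold)
        return len(self.recent) == self.recent.maxlen and all(self.recent)

# Hypothetical pressure readings (bar) checked against a 2.5 bar limit.
rule = ConsecutiveBreachRule(threshold=2.5, count=3)
fired = [rule.on_event(v) for v in [2.0, 2.6, 2.7, 2.4, 2.6, 2.8, 2.9]]
```

Only the last reading fires the rule, because the dip to 2.4 resets the run of consecutive breaches.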

Summing it up
• Industrial IoT is different
• Context = Process Data + Condition Data
• Techniques for processing a heterogeneous stream

We’re hiring
helloworld@quartic.ai
