
Application and Challenges of Streaming Analytics and Machine Learning on Multi-Variate Time Series Data for Smart Manufacturing



  2. 2. Application and Challenges of Streaming Analytics and Machine Learning on Multi-Variate Time Series Data for Smart Manufacturing • Pranav Prakash, Quartic.ai • #UnifiedDataAnalytics #SparkAISummit
  3. 3. Pranav Prakash • Co-Founder, VP Engineering at Quartic.ai • Ex-LinkedIn SlideShare • Passionate about – A.I., Computer Vision, 3D Printing – Music, Caffeine
  4. 4. What you’ll learn in the next 40 minutes • A cool startup solving some real-life use cases • A downtime reduction use case for a critical asset in the pharma world – And a “secret” to solving such problems • Challenges in industrial stream processing • Spark-specific things that we learned
  5. 5. We enable Industry 4.0 • AI-powered smart manufacturing platform • Processing billions of sensor readings every day • Working with top pharma companies on multiple use cases • Team of 22 techies, including engineers and data scientists, plus 4 domain veterans
  7. 7. We started by building solutions for pharmaceutical manufacturing, and created a DIY platform • Increased uptime of a sterilization autoclave by 7 days • Increased yield of protein from a fermentation process • Incubated egg harvester – increase uptime during the critical flu season • Cold-chain monitoring for pharma refrigeration – reduced downtime and waste • Predictive health monitoring of air handlers for clean rooms in pharma • Enable continuous validation of a biologic production process • Medical device assembly – reduce recalls caused by poor quality
  8. 8. Case study – an intelligent asset health monitoring system for an industrial autoclave • Mission – Improve the reliability of a complex asset • Details – 13 different modes (cycles) – Runs 24/7 – Critical asset
  9. 9. Equipment Reliability • Capture process and condition data • Establish a baseline and measure deviations • Forecast the future • Classify errors early • “Advisory Mode” AI
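
The "establish a baseline and measure deviations" step on slide 9 could be sketched as below. This is a minimal illustration, not the platform's actual method: the z-score approach, the threshold of 3.0, the function names, and the temperature values are all assumptions made for the example.

```python
from statistics import mean, stdev

def fit_baseline(values):
    """Summarize known-good readings as (mean, sample standard deviation)."""
    return mean(values), stdev(values)

def deviation_score(value, baseline):
    """Number of standard deviations a reading sits from the baseline mean."""
    mu, sigma = baseline
    return (value - mu) / sigma

def advise(value, baseline, threshold=3.0):
    """Advisory mode: return a recommendation instead of acting on the asset."""
    score = deviation_score(value, baseline)
    return "investigate" if abs(score) > threshold else "normal"

# Hypothetical chamber temperatures (degrees C) taken from healthy cycles.
healthy = [121.0, 121.4, 120.8, 121.2, 121.1, 120.9]
baseline = fit_baseline(healthy)

print(advise(121.3, baseline))  # close to the baseline
print(advise(123.0, baseline))  # far from the baseline
```

Returning a label rather than triggering a control action mirrors the "Advisory Mode" bullet: the operator stays in the loop.
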
  10. 10. SCADA = Supervisory Control and Data Acquisition • PLC = Programmable Logic Controller
  11. 11. System Design Params • Data – Speed: 10 ms to 2 hours – Volume: a couple of thousand sensors per asset, 10,000s of assets per enterprise – Data Type: String, Numeric, Boolean, Array – Time series, Discrete
  12. 12. System Design Params • Deployment – Edge (80%) • Hardware Limit • Many cloud-only solutions won’t work • High Uptime, Low Response Time – Cloud (20%)
  13. 13. System Design Params • Use Cases – Automatic model parameter tuning, model training – Deployment of 1000s of ML models – Complex Event Processing (CEP) – Statistical & analytical processing • Rule recommendation • Near-real-time stream processing
  14. 14. Challenges • ML – Multiple granularities – Late data arrival – Model deployment on a heterogeneous data stream – Flash flood of data
  15. 15. Multiple Granularities • Sensor A, Sensor B (Poll Frequency = 1s), TS: 12:03:01.198, 12:03:02.283, 12:03:03.316, 12:03:04.572, 12:03:05.283, 12:03:06.342 • Sensor C, Sensor D (Poll Frequency = 5s), TS: 12:03:01.230, 12:03:06.233, 12:03:11.316, 12:03:16.520, 12:03:21.283 • Both belong to the same “Asset” • Target Feature – C/D or A/B
  16. 16. Multiple Granularities • Approximation (round-off) • Aggregation • Filling – forward, backward, or average
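
The three alignment options on slides 15 and 16 can be illustrated with plain Python. This is a sketch under stated assumptions: the timestamps come from slide 15, but the sensor values, the function names, and the choice of flooring as the round-off rule are hypothetical, and only forward filling is shown.

```python
from datetime import datetime

def parse(ts):
    """Parse an HH:MM:SS.mmm timestamp as it appears on the slide."""
    return datetime.strptime(ts, "%H:%M:%S.%f")

def snap(ts, step_s=1):
    """Approximation: floor a timestamp to a multiple of step_s seconds.
    Only valid when step_s divides 60."""
    return ts.replace(second=ts.second - ts.second % step_s, microsecond=0)

def aggregate(readings, window_s):
    """Aggregation: average the readings that fall in each fixed window."""
    buckets = {}
    for ts, value in readings:
        buckets.setdefault(snap(ts, window_s), []).append(value)
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}

def forward_fill(slow, grid):
    """Filling: carry the last known slow-sensor value onto a faster grid."""
    slow = sorted(slow)
    filled, last, i = {}, None, 0
    for t in grid:
        while i < len(slow) and slow[i][0] <= t:
            last = slow[i][1]
            i += 1
        filled[t] = last  # stays None until the first slow reading arrives
    return filled

# Timestamps from slide 15; the values are made up for illustration.
fast = [(snap(parse(t)), v) for t, v in [
    ("12:03:01.198", 1.0), ("12:03:02.283", 2.0), ("12:03:03.316", 3.0),
    ("12:03:04.572", 4.0), ("12:03:05.283", 5.0), ("12:03:06.342", 6.0)]]
slow = [(snap(parse(t)), v) for t, v in [
    ("12:03:01.230", 10.0), ("12:03:06.233", 20.0)]]

upsampled = forward_fill(slow, [t for t, _ in fast])  # 5 s sensor on the 1 s grid
downsampled = aggregate(fast, 5)                      # 1 s sensor in 5 s windows
```

Upsampling the slow sensor keeps every fast reading but repeats stale values, while downsampling the fast sensor loses detail; which trade-off is acceptable depends on whether the target feature is driven by A/B or C/D.
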
  17. 17. Late Data Arrival • https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking
  18. 18. Late Data Arrival • Watermarking – Homogeneous stream: one watermark per stream – Heterogeneous stream: multiple watermarks, one per “Usage Condition”
  19. 19. • Watermark time is chosen automatically and dynamically • Data arriving later than the threshold is discarded
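
The idea of keeping one watermark per "Usage Condition" (slides 18 and 19) can be simulated without Spark. In Structured Streaming itself the watermark is declared with `DataFrame.withWatermark(eventTime, delayThreshold)`; the class below is only a conceptual sketch. The class name, the keys, and the event times are hypothetical, the allowed lateness is fixed rather than chosen dynamically as the slide describes, and the watermark is updated per event rather than per micro-batch.

```python
from collections import defaultdict

class WatermarkTracker:
    """Tracks an independent watermark for each key (e.g. a usage condition)."""

    def __init__(self, allowed_lateness_s):
        self.allowed_lateness_s = allowed_lateness_s
        # Highest event time seen so far, per key.
        self.max_event_time = defaultdict(lambda: float("-inf"))

    def watermark(self, key):
        """Events older than this time are considered too late for the key."""
        return self.max_event_time[key] - self.allowed_lateness_s

    def accept(self, key, event_time):
        """Return True to process the event, False to discard it as too late."""
        if event_time < self.watermark(key):
            return False
        self.max_event_time[key] = max(self.max_event_time[key], event_time)
        return True

tracker = WatermarkTracker(allowed_lateness_s=10)
# (usage condition, event time in seconds); both are made-up sample data.
events = [("sterilize", 100), ("sterilize", 95), ("sterilize", 120),
          ("sterilize", 105), ("idle", 50)]
results = [tracker.accept(key, t) for key, t in events]
```

The event at time 105 is dropped because the "sterilize" watermark has already advanced to 110, while the "idle" event at time 50 is kept because that key has its own, independent watermark.
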
  20. 20. Model Deployment
  21. 21. Flash Flood of Data • Backpressure enabled • Allows the ingestion rate to be chosen dynamically and automatically • PID controller
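
Slide 21 most likely refers to Spark Streaming's backpressure mechanism (enabled with `spark.streaming.backpressure.enabled`), which adjusts the ingestion rate with a PID-style controller. The sketch below shows how such a controller can derive a new rate from batch statistics. It is loosely modeled on that idea and is not Spark's code: the class name, the gains, the minimum rate, and the batch numbers are assumptions for illustration.

```python
class PidRateEstimator:
    """Suggests a new ingestion rate (records/second) after each batch."""

    def __init__(self, kp=1.0, ki=0.2, kd=0.0, min_rate=100.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.min_rate = min_rate
        self.latest_rate = None   # last rate this estimator suggested
        self.latest_error = 0.0
        self.latest_time = None

    def compute(self, time_s, num_records, processing_s,
                scheduling_delay_s, batch_interval_s):
        # Rate at which the last batch was actually processed.
        processing_rate = num_records / processing_s
        if self.latest_rate is None:
            # First batch: seed the controller with the observed rate.
            self.latest_rate, self.latest_time = processing_rate, time_s
            return self.latest_rate

        # Proportional term: gap between the suggested and achieved rates.
        error = self.latest_rate - processing_rate
        # Integral-like term: backlog implied by the scheduling delay.
        backlog_error = scheduling_delay_s * processing_rate / batch_interval_s
        # Derivative term: how fast the error is changing.
        d_error = (error - self.latest_error) / (time_s - self.latest_time)

        new_rate = max(
            self.latest_rate
            - self.kp * error
            - self.ki * backlog_error
            - self.kd * d_error,
            self.min_rate,
        )
        self.latest_rate, self.latest_error, self.latest_time = new_rate, error, time_s
        return new_rate

estimator = PidRateEstimator()
# Hypothetical batches: the second one takes twice as long and queues for 1 s.
first = estimator.compute(time_s=1.0, num_records=1000, processing_s=1.0,
                          scheduling_delay_s=0.0, batch_interval_s=1.0)
second = estimator.compute(time_s=2.0, num_records=1000, processing_s=2.0,
                           scheduling_delay_s=1.0, batch_interval_s=1.0)
```

When a flash flood slows processing, the suggested rate falls below the rate the system last achieved, which gives the backlog room to drain; the `min_rate` floor keeps ingestion from stalling completely.
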
  22. 22. Complex Event Processing • Insights – PySpark + yahoo/graphkit • Rules – Scala Spark + drools
  23. 23. Summing it up • Industrial IoT is different • Context = Process Data + Condition Data • Techniques for processing heterogeneous streams
  24. 24. We’re hiring • helloworld@quartic.ai
