© 2018 MapR TechnologiesMapR Confidential 1
Building Data Pipelines for AI-enhanced Industrial Automation
Ian Downard idownard@mapr.com
Will Ochandarena wochandarena@mapr.com
About Us
Ian Downard Will Ochandarena
• Current: Product Management @ MapR
– Focus: IoT, Cloud, Streaming
• Past: Product Manager @ Cisco
• Schools:
– Engineering @ Rensselaer Polytechnic Institute
– Business @ Santa Clara University
• Current: Software Engineer @ MapR
– Focus: developer enablement
• Past: Software Engineer @ US Navy, Rockwell Automation
• Schools:
– Post-grad Engineering @ Missouri University of Science and Technology
Agenda
1. Trends in Industrial IoT & AI
2. A Practitioner’s Guide to IoT Data Pipelines
3. Demo: HVAC Monitoring and Predictive Maintenance
Industrial IoT Trends
Why You Should Care: AI technologies are yielding real business results for manufacturers
• Manufacturers’ adoption of ML/AI will increase 38% in the next five years.
Source: Digital Factories 2020: Shaping the future of manufacturing (48 pp., PDF, no opt-in), PriceWaterhouseCoopers.
• ML will reduce supply chain forecasting errors by 50% and lost sales by 65%.
Source: Smartening up with Artificial Intelligence (AI) - What’s in it for Germany and its Industrial Sector? (52 pp., PDF, no opt-in), McKinsey & Company.
• Manufacturers are improving semiconductor yields by up to 30%.
Source: Smartening up with Artificial Intelligence (AI) - What’s in it for Germany and its Industrial Sector? (52 pp., PDF, no opt-in), McKinsey & Company.
(Source: https://www.forbes.com/sites/louiscolumbus/2018/03/11/10-ways-machine-learning-is-revolutionizing-manufacturing-in-2018/#3532c8ec23ac)
Data Science is Leaving the Playground
• Data Analysis: backward looking, long-term decision making
• Data Science: forward looking, short-term decision making
• Operationalization of AI: forward looking, in-the-moment decision making
Successful AI requires robust data pipelines
An AI use-case
Models running in production
E.g. Predicting remaining useful life (RUL) of an equipment
Requires
Feature(s) extraction, model dev, algo selection, supervised & unsupervised learning
Requires
Partition the deployment logic between equipment, sensors, cloud, and on-prem
Ability to tap into ALL data silos
Build a system of record from real-time and batch data
Establish data security, governance, stewardship
A flexible, extensible data framework
Requires
Iterative process, piloting with a smaller data sample, explore classification, regression, toolkit selection and more.
Fundamental tenets of a robust data platform
Gartner estimated that 85 percent of big data projects fail due to challenges in integrating with existing business processes and applications, … , and security and governance challenges.
www.techrepublic.com/article/85-of-big-data-projects-fail-but-your-developers-can-help-yours-succeed/
Data-Driven Trends – From Yesterday to Today to Tomorrow
YESTERDAY
• Manual assembly
• Inadequate visibility into the supply chain
• Un-connected equipment. Manual prognosis.
• Manual business processes for selling goods & services
• Handover to the “delivery guys”
• Factories with monolithic apps
TODAY
• Semi-automated assembly
• Partially connected via Supply Chain Management Software (SCMS)
• IoT-ized equipment
• E-commerce & e-retailers enabled via the Internet
• Connected delivery vehicles with accurate delivery alerts to consumers
• Connected factories with connected apps
TOMORROW
• Robots-based fully automated assembly
• Automated supply ordering based on depletion monitoring
• Predictive maintenance and proactive health monitoring
• Quicker and seamless selling using bots, voice-services
• Drones for delivery. Autonomous delivery vehicles.
• Smart factories with data-driven apps
Rise of Edge Computing
• Need for real-time collection drives move to pub/sub data protocols
• Modern vision algorithms spur rise of video as the uber-sensor
• Moore’s law putting unprecedented compute and storage capacity @ the edge
• Remote locations have unreliable, low-bandwidth network connectivity
Architectural Trend #1 - Edge-first Processing
• “Act locally, learn globally”
• Small footprint at the edge for local data services, processing and core connectivity
• Collect, process, classify data @ edge
• Pub-sub streaming of summary data & metadata from edge to core
• Core persists streams as stream-of-record for centralized use cases & learning
• Raw data held at edge, available for recall from core for a period of time
[Diagram labels: Stream/Topic (x3), Edge (x3), Cloud Core, On-prem Core, Core, External Application]
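One way to picture "act locally, learn globally" is an edge node that keeps raw readings on site and publishes only compact summaries to the core. The sketch below is illustrative Python, not MapR code: `EdgeNode`, `summarize`, and the in-memory `core_stream` list are hypothetical stand-ins for a real edge service and a pub-sub stream.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class EdgeNode:
    """Hypothetical edge node: holds raw data locally, publishes summaries."""
    site: str
    raw: list = field(default_factory=list)  # raw readings stay at the edge

    def collect(self, value: float) -> None:
        self.raw.append(value)

    def summarize(self) -> dict:
        # Summary data and metadata are much smaller than the raw readings.
        return {
            "site": self.site,
            "count": len(self.raw),
            "min": min(self.raw),
            "max": max(self.raw),
            "mean": mean(self.raw),
        }


core_stream = []  # stand-in for the stream persisted at the core

edge = EdgeNode("plant-a")
for reading in [20.0, 21.0, 25.0, 22.0]:
    edge.collect(reading)

core_stream.append(edge.summarize())  # only the summary crosses the network
print(core_stream[0])
```

The raw list remains on the edge node, so the core could still request it later if a summary looks suspicious.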
Architectural Trend #2 - Stream of Record
Sensor Readings & Events → Persisted Stream of Record → Materialized Views → Apps
• Development Agility - New apps access new and historical data in one place.
• Compliance - Persistent, tamper-proof record of historical activity.
• Architecture Simplicity - Fewer data copies and pipelines.
[Diagram labels: Yield Mgmt, Remaining Useful Life, Exploration; temperatures, positions, events]
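The stream-of-record idea can be sketched as an append-only log from which each app derives its own view by replaying history. This is a minimal in-memory illustration, not a MapR or Kafka API: the event fields, `materialize_latest`, and `materialize_counts` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical append-only stream of record: events are never updated in place.
stream_of_record = [
    {"offset": 0, "topic": "temperatures", "device": 1, "value": 71.5},
    {"offset": 1, "topic": "positions", "device": 1, "value": 0.431},
    {"offset": 2, "topic": "temperatures", "device": 1, "value": 73.0},
    {"offset": 3, "topic": "events", "device": 1, "value": "door_open"},
]


def materialize_latest(stream, topic):
    """Replay the stream from the start and keep the latest value per device."""
    view = {}
    for event in stream:
        if event["topic"] == topic:
            view[event["device"]] = event["value"]
    return view


def materialize_counts(stream):
    """A second view built from the same history: event counts per topic."""
    counts = defaultdict(int)
    for event in stream:
        counts[event["topic"]] += 1
    return dict(counts)


latest_temperature = materialize_latest(stream_of_record, "temperatures")
counts = materialize_counts(stream_of_record)
print(latest_temperature)
print(counts)
```

Because both views are derived from the same persisted history, a new app can be added later and still see old and new data in one place.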
A Practitioner’s Guide to IoT Data Pipelines
Roadmap to building AI into Factory IoT
Build Data Pipelines
Run AI experiments
Analyze Data
Instrument Machinery
Automate AI utilities
Monitoring
Data Exploration
Applied Machine Learning
Data Acquisition
CONTROL NETWORK
CORPORATE NETWORK
PLC PLC
Gateway to data.
MQTT, REST…
Data Pipelines
Ingest Persist Analyze
Data Flow
IDEs, notebooks, online AI platforms
Files, Tables, Streams
Data Pipelines
Monitoring and Data Exploration
Data Exploration, Feature Engineering, and AI
IDEs, notebooks
platforms
Programming
Libraries
Data
Files, Tables, Streams
What AI techniques are viable?
• Linear Regression
– predict a value (e.g. remaining useful life)
• Logistic Regression
– predict a probability (e.g. chance that failure will occur within 30 seconds)
• Anomaly Detection
– Alert when time-series prediction != actual
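The anomaly-detection bullet can be sketched as follows. In practice a strict `!=` comparison is too sensitive for noisy sensors, so this example alerts only when the actual value differs from the prediction by more than a tolerance. The predictor here is a plain moving average chosen for brevity; it is an assumption, and a trained time-series model would take its place in a real pipeline.

```python
def detect_anomalies(series, window=3, tolerance=1.0):
    """Return indices where the actual value strays from a naive prediction.

    The prediction is the mean of the previous `window` values; this is a
    stand-in for whatever forecasting model is actually deployed.
    """
    alerts = []
    for i in range(window, len(series)):
        predicted = sum(series[i - window:i]) / window
        if abs(series[i] - predicted) > tolerance:
            alerts.append(i)
    return alerts


readings = [1.0, 1.1, 0.9, 1.0, 1.05, 3.0, 1.0, 0.95]
print(detect_anomalies(readings))
```

Note that an outlier also contaminates the next few predictions, so the tolerance and window usually need tuning against real data.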
Sensor Data
Device ID time x y z
1 8:00:00 .431 .123 .145
1 8:00:01 .735 .112 .672
1 8:00:02 .932 .141 .431
1 8:00:03 .988 .241 .625
Device ID time x y z _operator _weekend
1 11:59:58 .431 .123 .145 Joe False
1 11:59:59 .735 .112 .672 Joe False
1 12:00:00 .932 .141 .431 Moe True
1 12:00:01 .988 .241 .625 Moe True
Derived Features
Deriving new properties that correlate to failures requires flexible
schema data storage.
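As a rough illustration of derived features, the sketch below adds `_operator` and `_weekend` fields to records without changing the original schema. The `ROSTER` lookup, the ISO timestamps, and the field names other than `_operator` and `_weekend` are assumptions made for the example; schemaless records (plain dictionaries here) stand in for a flexible-schema data store.

```python
from datetime import datetime

# Hypothetical roster mapping a calendar date to the operator on duty.
ROSTER = {"2018-06-15": "Joe", "2018-06-16": "Moe"}


def add_derived_features(record: dict) -> dict:
    """Return a copy of the record with extra, derived fields."""
    ts = datetime.fromisoformat(record["time"])
    enriched = dict(record)
    enriched["_operator"] = ROSTER.get(ts.date().isoformat(), "unknown")
    enriched["_weekend"] = ts.weekday() >= 5  # Saturday=5, Sunday=6
    return enriched


raw = [
    {"device_id": 1, "time": "2018-06-15T23:59:59", "x": 0.735, "y": 0.112, "z": 0.672},
    {"device_id": 1, "time": "2018-06-16T00:00:00", "x": 0.932, "y": 0.141, "z": 0.431},
]
enriched = [add_derived_features(r) for r in raw]
for row in enriched:
    print(row)
```

New fields can be added per record this way without a schema migration, which is the property the slide asks of the storage layer.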
Sensor Data
Device ID time x y z Remaining Life 30s to failure
1 8:00:00 .431 .123 .145
1 8:00:01 .735 .112 .672
1 8:00:02 .932 .141 .431
1 8:00:03 .988 .241 .625
Lagging Features
Device ID time x y z Remaining Life 30s to failure
1 8:00:00 .431 .123 .145 3 true
1 8:00:01 .735 .112 .672 2 true
1 8:00:02 .932 .141 .431 1 true
1 8:00:03 --- --- --- 0 true
Lagging Features
When a failure happens…
…then lagging features get labeled.
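The labeling step can be sketched as a back-fill: once a failure time is known, earlier records receive their remaining-life and within-30-seconds labels. This is an illustrative sketch, assuming records carry a numeric `time` in seconds; the function and field names are hypothetical.

```python
def label_lagging_features(records, failure_time, horizon=30):
    """Back-fill labels on earlier records once a failure time is known.

    Records at or before the failure get a remaining-life value and a flag
    saying whether the failure falls within `horizon` seconds. Records after
    the failure are left unlabeled.
    """
    for rec in records:
        remaining = failure_time - rec["time"]
        if remaining >= 0:
            rec["remaining_life"] = remaining
            rec["fails_within_horizon"] = remaining <= horizon
    return records


history = [{"device_id": 1, "time": t} for t in range(4)]  # t = 0, 1, 2, 3 seconds
labeled = label_lagging_features(history, failure_time=3)
for row in labeled:
    print(row)
```

With a failure at the fourth record, the remaining-life values count down 3, 2, 1, 0 and every row is flagged, mirroring the table above.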
NoSQL DB
Spark DB Connectors
Operate on data in Spark without data movement.
DB pushdown = fast filtering and sorting.
Objective Solution
“Database pushdown” means you can update feature stores without data movement.
Feature table size
Number of lagging variable records to update.
Alert sent to Grafana
Listening to stream topic for failure events.
Architecting for Fast Data
• Continuous-time signals require high-speed sampling.
• Full resolution is required.
– Aggregation hides important things.
– High fidelity makes machine learning more effective.
• Streams and compute must scale.
Case Study: Vibration analysis
• Vibrations give the first clue that a machine is failing
• Vibration sensors measure physical displacement
• Capturing a 10kHz vibration requires > 20k samples / second
Detecting vibration anomalies requires continuously processing high-speed data streams.
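The 20k samples/second figure follows from the Nyquist criterion: the sampling rate must exceed twice the highest frequency of interest. The sketch below works through that arithmetic and shows what an ideal sampler would report for a pure 10 kHz tone when the rate is too low. The function names are illustrative.

```python
def min_sample_rate(max_frequency_hz: float) -> float:
    """Nyquist bound: the sampling rate must exceed twice the highest frequency."""
    return 2.0 * max_frequency_hz


def apparent_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Frequency an ideal sampler would report for a pure tone (aliasing)."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(min_sample_rate(10_000))             # lower bound for a 10 kHz vibration
print(apparent_frequency(10_000, 25_000))  # sampled fast enough
print(apparent_frequency(10_000, 15_000))  # undersampled: appears as a lower tone
```

At 15k samples/second the 10 kHz vibration is indistinguishable from a 5 kHz one, which is why undersampling cannot be repaired downstream.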
Case Study: Vibration analysis
(1 record / sec)
>20k samples/sec
Anomaly notifications
Feature Store
Spark can consume high-speed streams and persist derived signals to new streams or tables.
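To illustrate reducing a high-rate stream to one derived record per second, the sketch below computes a root-mean-square (RMS) value for each one-second window. It is plain Python rather than Spark, and RMS is just one possible derived signal; the sample rate, the synthetic sine wave, and the record layout are assumptions for the example.

```python
import math


def rms_per_second(samples, sample_rate):
    """Collapse raw samples into one RMS value per full second of data."""
    records = []
    for start in range(0, len(samples) - sample_rate + 1, sample_rate):
        window = samples[start:start + sample_rate]
        rms = math.sqrt(sum(v * v for v in window) / sample_rate)
        records.append({"second": start // sample_rate, "rms": rms})
    return records


SAMPLE_RATE = 20_000  # illustrative rate, in samples per second
# Two seconds of a synthetic 50 Hz sine wave with amplitude 1.0.
signal = [math.sin(2 * math.pi * 50 * n / SAMPLE_RATE) for n in range(2 * SAMPLE_RATE)]
derived = rms_per_second(signal, SAMPLE_RATE)
print(len(signal), "samples ->", len(derived), "records")
```

The derived records are small enough to store in a feature table, while the full-resolution samples remain available in the source stream.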
Case Study: Vibration analysis
Anomaly notifications
Feature Store
Spark can filter streams by device id and compute transformations in parallel.
As long as the ingest stream can keep up, this will scale. What if it doesn’t?
Case Study: Vibration analysis
Anomaly notifications
Feature Store
Affinity of producers to topics can significantly improve throughput in MapR Streams and Kafka.
Spark consumers can subscribe to multiple topics, so just subscribe to all and scale away!
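One simple way to give producers an affinity to topics is to pin each device to a topic with a stable hash of its id. The sketch below shows only that assignment logic; it does not call any Kafka or MapR Streams client, and the `vibration-N` topic naming and `topic_for` helper are hypothetical.

```python
import zlib


def topic_for(device_id: str, num_topics: int, prefix: str = "vibration") -> str:
    """Pin each device to one topic using a stable hash of its id."""
    # crc32 is used because Python's built-in hash() of a string varies between runs.
    index = zlib.crc32(device_id.encode("utf-8")) % num_topics
    return f"{prefix}-{index}"


devices = [f"device-{n}" for n in range(8)]
assignment = {d: topic_for(d, num_topics=4) for d in devices}
all_topics = [f"vibration-{i}" for i in range(4)]  # a consumer can subscribe to all of these

for device, topic in assignment.items():
    print(device, "->", topic)
```

Each producer always writes to the same topic, and a consumer that subscribes to every entry in `all_topics` still sees all devices.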
Machine Learning in Production
A traditional starting point:
Model
Inference Request
Remaining Useful Life
Inferences
Streams ensure requests and responses are saved, replicated, and re-playable.
Machine Learning in Production
Models
Telemetry
Remaining Useful Life
Model Diffs
Inferences
Rendezvous
Models must be monitored, too!
Machine Learning in Production
http://bit.ly/ml-logistics
IoT Data Pipelines in Action
Predictive Maintenance Demonstration
https://github.com/mapr-demos/predictive-maintenance
Predictive Maintenance
Real-Time Monitoring
Running AI experiments
Data Exploration and Feature Engineering
Building Data Pipelines
Streams, JSON DB
Data Pipeline for HVAC Monitoring
Sensor data → Ingest Stream → Stream-to-DB consumer → Time-series storage in OpenTSDB → Interactive monitoring in Grafana
Interfaces: Kafka API, REST API
Data Pipeline for HVAC Analytics and Predictive Maintenance
Sensor data, Failure events → Ingest Stream → Feature Engineering in Spark → Feature storage in NoSQL DB → SQL analytics in Drill
Anomaly detection, failure prediction, and data science tools.
Interfaces: Kafka API, OJAI API

More Related Content

What's hot

Lessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloudLessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloud
DataWorks Summit
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data Warehouse
DataWorks Summit
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and Future
DataWorks Summit
 
The convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopThe convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on Hadoop
DataWorks Summit
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
DataWorks Summit/Hadoop Summit
 
Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...
DataWorks Summit
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
DataWorks Summit
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARN
DataWorks Summit
 
Using LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache ArrowUsing LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache Arrow
DataWorks Summit
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
DataWorks Summit
 
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
DataWorks Summit
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...
DataWorks Summit
 
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
DataWorks Summit
 
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
DataWorks Summit
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
DataWorks Summit/Hadoop Summit
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
DataWorks Summit
 
How big data and AI saved the day: critical IP almost walked out the door
How big data and AI saved the day: critical IP almost walked out the doorHow big data and AI saved the day: critical IP almost walked out the door
How big data and AI saved the day: critical IP almost walked out the door
DataWorks Summit
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
Cécile Poyet
 
Real-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionReal-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in Action
DataWorks Summit
 

What's hot (20)

Lessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloudLessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloud
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data Warehouse
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and Future
 
The convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopThe convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on Hadoop
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
 
Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARN
 
Using LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache ArrowUsing LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache Arrow
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
 
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...
 
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
 
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
How big data and AI saved the day: critical IP almost walked out the door
How big data and AI saved the day: critical IP almost walked out the doorHow big data and AI saved the day: critical IP almost walked out the door
How big data and AI saved the day: critical IP almost walked out the door
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Real-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionReal-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in Action
 

Similar to Designing data pipelines for analytics and machine learning in industrial settings

Predictive Maintenance - Portland Machine Learning Meetup
Predictive Maintenance - Portland Machine Learning MeetupPredictive Maintenance - Portland Machine Learning Meetup
Predictive Maintenance - Portland Machine Learning Meetup
Ian Downard
 
Cloudera - IoT & Smart Cities
Cloudera - IoT & Smart CitiesCloudera - IoT & Smart Cities
Cloudera - IoT & Smart Cities
Cloudera, Inc.
 
Cheryl Wiebe - Advanced Analytics in the Industrial World
Cheryl Wiebe - Advanced Analytics in the Industrial WorldCheryl Wiebe - Advanced Analytics in the Industrial World
Cheryl Wiebe - Advanced Analytics in the Industrial World
Rehgan Avon
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
MapR Technologies
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural Networks
Justin Brandenburg
 
Lessons learned building a big data analytics engine, from proprietary to ope...
Lessons learned building a big data analytics engine, from proprietary to ope...Lessons learned building a big data analytics engine, from proprietary to ope...
Lessons learned building a big data analytics engine, from proprietary to ope...
J On The Beach
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven Organizations
Ellen Friedman
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
Selvaraj Kesavan
 
Expect More from Hadoop
Expect More from Hadoop Expect More from Hadoop
Expect More from Hadoop
MapR Technologies
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
Databricks
 
Solving Cybersecurity at Scale
Solving Cybersecurity at ScaleSolving Cybersecurity at Scale
Solving Cybersecurity at Scale
DataWorks Summit
 
System Support for Internet of Things
System Support for Internet of ThingsSystem Support for Internet of Things
System Support for Internet of Things
HarshitParkar6677
 
7 Habits for Big Data in Production - keynote Big Data London Nov 2018
7 Habits for Big Data in Production - keynote Big Data London Nov 20187 Habits for Big Data in Production - keynote Big Data London Nov 2018
7 Habits for Big Data in Production - keynote Big Data London Nov 2018
Ellen Friedman
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
Databricks
 
LIDAR Magizine 2015: The Birth of 3D Mapping Artificial Intelligence
LIDAR Magizine 2015: The Birth of 3D Mapping Artificial IntelligenceLIDAR Magizine 2015: The Birth of 3D Mapping Artificial Intelligence
LIDAR Magizine 2015: The Birth of 3D Mapping Artificial Intelligence
Jason Creadore 🌐
 
Accelerating Cyber Threat Detection With GPU
Accelerating Cyber Threat Detection With GPUAccelerating Cyber Threat Detection With GPU
Accelerating Cyber Threat Detection With GPU
Joshua Patterson
 
MapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn GloballyMapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn Globally
ridhav
 
Big Data, Physics, and the Industrial Internet: How Modeling & Analytics are ...
Big Data, Physics, and the Industrial Internet: How Modeling & Analytics are ...Big Data, Physics, and the Industrial Internet: How Modeling & Analytics are ...
Big Data, Physics, and the Industrial Internet: How Modeling & Analytics are ...
mattdenesuk
 
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Matt Stubbs
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
Petr Novotný
 

Similar to Designing data pipelines for analytics and machine learning in industrial settings (20)

Predictive Maintenance - Portland Machine Learning Meetup
Predictive Maintenance - Portland Machine Learning MeetupPredictive Maintenance - Portland Machine Learning Meetup
Predictive Maintenance - Portland Machine Learning Meetup
 
Cloudera - IoT & Smart Cities
Cloudera - IoT & Smart CitiesCloudera - IoT & Smart Cities
Cloudera - IoT & Smart Cities
 
Cheryl Wiebe - Advanced Analytics in the Industrial World
Cheryl Wiebe - Advanced Analytics in the Industrial WorldCheryl Wiebe - Advanced Analytics in the Industrial World
Cheryl Wiebe - Advanced Analytics in the Industrial World
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural Networks
 
Lessons learned building a big data analytics engine, from proprietary to ope...
Lessons learned building a big data analytics engine, from proprietary to ope...Lessons learned building a big data analytics engine, from proprietary to ope...
Lessons learned building a big data analytics engine, from proprietary to ope...
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven Organizations
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Expect More from Hadoop
Expect More from Hadoop Expect More from Hadoop
Expect More from Hadoop
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
 
Solving Cybersecurity at Scale
Solving Cybersecurity at ScaleSolving Cybersecurity at Scale
Solving Cybersecurity at Scale
 
System Support for Internet of Things
System Support for Internet of ThingsSystem Support for Internet of Things
System Support for Internet of Things
 
7 Habits for Big Data in Production - keynote Big Data London Nov 2018
7 Habits for Big Data in Production - keynote Big Data London Nov 20187 Habits for Big Data in Production - keynote Big Data London Nov 2018
7 Habits for Big Data in Production - keynote Big Data London Nov 2018
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
 
LIDAR Magizine 2015: The Birth of 3D Mapping Artificial Intelligence
LIDAR Magizine 2015: The Birth of 3D Mapping Artificial IntelligenceLIDAR Magizine 2015: The Birth of 3D Mapping Artificial Intelligence
LIDAR Magizine 2015: The Birth of 3D Mapping Artificial Intelligence
 
Accelerating Cyber Threat Detection With GPU
Accelerating Cyber Threat Detection With GPUAccelerating Cyber Threat Detection With GPU
Accelerating Cyber Threat Detection With GPU
 
MapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn GloballyMapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn Globally
 
Big Data, Physics, and the Industrial Internet: How Modeling & Analytics are ...
Big Data, Physics, and the Industrial Internet: How Modeling & Analytics are ...Big Data, Physics, and the Industrial Internet: How Modeling & Analytics are ...
Big Data, Physics, and the Industrial Internet: How Modeling & Analytics are ...
 
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
 

Designing data pipelines for analytics and machine learning in industrial settings

  • 1. © 2018 MapR Technologies, MapR Confidential. Building Data Pipelines for AI-enhanced Industrial Automation. Ian Downard (idownard@mapr.com), Will Ochandarena (wochandarena@mapr.com)
  • 2. About Us. Will Ochandarena • Current: Product Management @ MapR – Focus: IoT, Cloud, Streaming • Past: Product Manager @ Cisco • Schools: – Engineering @ Rensselaer Polytechnic Institute – Business @ Santa Clara University. Ian Downard • Current: Software Engineer @ MapR – Focus: developer enablement • Past: Software Engineer @ US Navy, Rockwell Automation • Schools: – Post-grad Engineering @ Missouri University of Science and Technology
  • 3. Agenda: 1. Trends in Industrial IoT & AI; 2. A Practitioner’s Guide to IoT Data Pipelines; 3. Demo: HVAC Monitoring and Predictive Maintenance
  • 4. Industrial IoT Trends
  • 5. Why You Should Care: AI technologies are yielding real business results for manufacturers. • Manufacturers' adoption of ML/AI will increase 38% in the next five years. Source: Digital Factories 2020: Shaping the future of manufacturing (48 pp., PDF, no opt-in), PriceWaterhouseCoopers. • ML will reduce supply chain forecasting errors by 50% and lost sales by 65%. Source: Smartening up with Artificial Intelligence (AI) - What’s in it for Germany and its Industrial Sector? (52 pp., PDF, no opt-in), McKinsey & Company. • Manufacturers are improving semiconductor yields by up to 30%. Source: Smartening up with Artificial Intelligence (AI) - What’s in it for Germany and its Industrial Sector? (52 pp., PDF, no opt-in), McKinsey & Company. (Source: https://www.forbes.com/sites/louiscolumbus/2018/03/11/10-ways-machine-learning-is-revolutionizing-manufacturing-in-2018/#3532c8ec23ac)
  • 6. Data Science is Leaving the Playground
      – Data Analysis: backward looking; long-term decision making
      – Data Science: forward looking; short-term decision making
      – Operationalization of AI: forward looking; in-the-moment decision making
  • 7. Successful AI requires robust data pipelines. An AI use-case: models running in production, e.g. predicting the remaining useful life (RUL) of a piece of equipment. Requires: feature extraction, model development, algorithm selection, supervised & unsupervised learning (an iterative process: piloting with a smaller data sample, exploring classification, regression, toolkit selection, and more). Requires: a flexible, extensible data framework. Fundamental tenets of a robust data platform: partition the deployment logic between equipment, sensors, cloud, and on-prem; ability to tap into ALL data silos; build a system of record from real-time and batch data; establish data security, governance, stewardship. Gartner estimated that 85% of big data projects fail due to challenges in integrating with existing business processes and applications, … , and security and governance challenges. www.techrepublic.com/article/85-of-big-data-projects-fail-but-your-developers-can-help-yours-succeed/
  • 8. Data-Driven Trends – From Yesterday to Today to Tomorrow (YESTERDAY → TODAY → TOMORROW)
      – Assembly: Manual assembly → Semi-automated assembly → Robots-based fully automated assembly
      – Supply chain: Inadequate visibility into the supply chain → Partially connected via Supply Chain Management Software (SCMS) → Automated supply ordering based on depletion monitoring
      – Equipment: Un-connected equipment, manual prognosis → IoT-ized equipment → Predictive maintenance and proactive health monitoring
      – Selling: Manual business processes for selling goods & services → E-commerce & e-retailers enabled via the Internet → Quicker and seamless selling using bots, voice-services
      – Delivery: Handover to the “delivery guys” → Connected delivery vehicles with accurate delivery alerts to consumers → Drones for delivery, autonomous delivery vehicles
      – Factories: Factories with monolithic apps → Connected factories with connected apps → Smart factories with data-driven apps
  • 9. Rise of Edge Computing • Need for real-time collection drives move to pub/sub data protocols • Modern vision algorithms spur rise of video as the uber-sensor • Moore’s law putting unprecedented compute and storage capacity @ the edge • Remote locations have unreliable, low-bandwidth network connectivity
  • 10. Architectural Trend #1 - Edge-first Processing • “Act locally, learn globally” • Small footprint at the edge for local data services, processing and core connectivity • Collect, process, classify data @ edge • Pub-sub streaming of summary data & metadata from edge to core • Core persists streams as stream-of-record for centralized use cases & learning • Raw data held at edge, available for recall from core for period of time. Diagram labels: Stream/Topic (three), Edge (three), Cloud Core, On-prem Core, External Application
  • 11. Architectural Trend #2 - Stream of Record • Development Agility - New apps access new and historical data in one place. • Compliance - Persistent, tamper-proof record of historical activity. • Architecture Simplicity - Fewer data copies and pipelines. Diagram labels: Sensor Readings & Events (temperatures, positions, events); Persisted Stream of Record; Materialized Views; Apps (Yield Mgmt, Remaining Useful Life, Exploration)
  • 12. A Practitioner’s Guide to IoT Data Pipelines
  • 13. Roadmap to building AI into Factory IoT: Instrument Machinery → Build Data Pipelines → Analyze Data → Run AI experiments → Automate AI utilities. Phases: Monitoring; Data Exploration; Applied Machine Learning
  • 14. Data Acquisition. Diagram labels: CONTROL NETWORK; CORPORATE NETWORK; PLC; PLC. Gateway to data: MQTT, REST…
  • 15. Data Pipelines. Data flow: Ingest → Persist → Analyze. Diagram labels: Files, Tables, Streams; IDEs, notebooks, online AI platforms
  • 16. Data Pipelines
  • 17. Monitoring and Data Exploration
  • 18. Data Exploration, Feature Engineering, and AI. Diagram labels: IDEs, notebooks, platforms; Programming Libraries; Data (Files, Tables, Streams)
  • 19. What AI techniques are viable? • Linear Regression – predict a value (e.g. remaining useful life) • Logistic Regression – predict a probability (e.g. chance that failure will occur within 30 seconds) • Anomaly Detection – alert when time-series prediction != actual
  • 20. Sensor Data
      Device ID | time    | x    | y    | z
      1         | 8:00:00 | .431 | .123 | .145
      1         | 8:00:01 | .735 | .112 | .672
      1         | 8:00:02 | .932 | .141 | .431
      1         | 8:00:03 | .988 | .241 | .625
  • 21. Derived Features. Deriving new properties that correlate to failures requires flexible-schema data storage.
      Device ID | time     | x    | y    | z    | _operator | _weekend
      1         | 11:59:58 | .431 | .123 | .145 | Joe       | False
      1         | 11:59:59 | .735 | .112 | .672 | Joe       | False
      1         | 12:00:00 | .932 | .141 | .431 | Moe       | True
      1         | 12:00:01 | .988 | .241 | .625 | Moe       | True
  • 22. Lagging Features. Sensor data with the lagging feature columns not yet filled in:
      Device ID | time    | x    | y    | z    | Remaining Life | 30s to failure
      1         | 8:00:00 | .431 | .123 | .145 |                |
      1         | 8:00:01 | .735 | .112 | .672 |                |
      1         | 8:00:02 | .932 | .141 | .431 |                |
      1         | 8:00:03 | .988 | .241 | .625 |                |
  • 23. Lagging Features. When a failure happens, then lagging features get labeled.
      Device ID | time    | x    | y    | z    | Remaining Life | 30s to failure
      1         | 8:00:00 | .431 | .123 | .145 | 3              | true
      1         | 8:00:01 | .735 | .112 | .672 | 2              | true
      1         | 8:00:02 | .932 | .141 | .431 | 1              | true
      1         | 8:00:03 | ---  | ---  | ---  | 0              | true
  • 24. Spark DB Connectors (Spark and NoSQL DB). Objective: operate on data in Spark without data movement. Solution: DB pushdown = fast filtering and sorting. “Database pushdown” means you can update feature stores without data movement.
  • 25. Screenshot annotations: Feature table size; Number of lagging variable records to update; Alert sent to Grafana; Listening to stream topic for failure events.
  • 26. Architecting for Fast Data • Continuous-time signals require high-speed sampling. • Full resolution is required. – Aggregation hides important things. – High fidelity makes machine learning more effective. • Streams and compute must scale.
  • 27. Case Study: Vibration analysis • Vibrations give the first clue that a machine is failing. • Vibration sensors measure physical displacement. • Capturing a 10 kHz vibration requires > 20k samples/second. Detecting vibration anomalies requires continuously processing high-speed data streams.
  • 28. Case Study: Vibration analysis. Diagram labels: >20k samples/sec; (1 record / sec); Anomaly notifications; Feature Store. Spark can consume high-speed streams and persist derived signals to new streams or tables.
  • 29. Case Study: Vibration analysis. Diagram labels: Anomaly notifications; Feature Store. Spark can filter streams by device ID and compute transformations in parallel. As long as the ingest stream can keep up, this will scale. What if it doesn’t?
  • 30. Case Study: Vibration analysis. Diagram labels: Anomaly notifications; Feature Store. Affinity of producers to topics can significantly improve throughput in MapR Streams and Kafka. Spark consumers can subscribe to multiple topics, so just subscribe to all and scale away!
  • 31. Machine Learning in Production. A traditional starting point. Diagram labels: Model Inference; Request; Remaining Useful Life Inferences. Streams ensure requests and responses are saved, replicated, and re-playable.
  • 32. Machine Learning in Production. Models must be monitored, too! Diagram labels: Models; Telemetry; Remaining Useful Life; Model Diffs; Inferences; Rendezvous
  • 33. Machine Learning in Production. http://bit.ly/ml-logistics
  • 34. IoT Data Pipelines in Action
  • 35. Predictive Maintenance Demonstration: https://github.com/mapr-demos/predictive-maintenance (Streams, JSON DB). Topics: Predictive Maintenance; Real-Time Monitoring; Running AI experiments; Data Exploration and Feature Engineering; Building Data Pipelines
  • 36. Data Pipeline for HVAC Monitoring. Diagram labels: Sensor data; REST API; Kafka API; Ingest Stream; Stream-to-DB consumer; Time-series storage in OpenTSDB; Interactive monitoring in Grafana
  • 37. Data Pipeline for HVAC Analytics and Predictive Maintenance. Diagram labels: Sensor data; Failure events; Kafka API; OJAI API; Ingest Stream; Feature Engineering in Spark; Feature storage in NoSQL DB; SQL analytics in Drill; Anomaly detection, failure prediction, and data science tools
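
Slide 19 describes anomaly detection as alerting when a time-series prediction differs from the actual reading. A minimal sketch of that idea follows. The moving-average predictor, the `window` and `tolerance` parameters, and the sample readings are all assumptions made for illustration; the slides do not say which prediction model the speakers used.

```python
from collections import deque


def detect_anomalies(readings, window=3, tolerance=0.5):
    """Return the indices of readings that differ from the prediction.

    The prediction is the mean of the last `window` normal readings
    (an assumed stand-in for a real time-series model).
    """
    history = deque(maxlen=window)
    anomalies = []
    for i, actual in enumerate(readings):
        if len(history) == window:
            predicted = sum(history) / window
            if abs(actual - predicted) > tolerance:
                anomalies.append(i)
                continue  # keep the outlier out of the prediction history
        history.append(actual)
    return anomalies


readings = [1.0, 1.1, 0.9, 1.0, 3.0, 1.0, 1.1]  # hypothetical sensor values
print(detect_anomalies(readings))
```

Anomalous readings are excluded from the history so that one spike does not distort the predictions that follow it.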
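
Slide 21 shows sensor rows gaining derived columns such as `_operator` and `_weekend`. The sketch below illustrates one way such columns might be computed from a timestamp. The shift roster in `operator_on_shift`, the record layout, and the dates are hypothetical; in the demo these features would live in a flexible-schema store rather than in Python dictionaries.

```python
from datetime import datetime


def operator_on_shift(ts):
    """Hypothetical roster: Joe works Monday to Friday, Moe covers weekends."""
    return "Moe" if ts.weekday() >= 5 else "Joe"


def derive_features(record):
    """Return a copy of the record with derived columns added."""
    ts = record["time"]
    return {
        **record,
        "_operator": operator_on_shift(ts),
        "_weekend": ts.weekday() >= 5,  # Saturday is 5, Sunday is 6
    }


# Two hypothetical readings on either side of a Friday/Saturday boundary.
readings = [
    {"device_id": 1, "time": datetime(2018, 6, 15, 23, 59, 59), "x": 0.735},
    {"device_id": 1, "time": datetime(2018, 6, 16, 0, 0, 0), "x": 0.932},
]
enriched = [derive_features(r) for r in readings]
for row in enriched:
    print(row["time"], row["_operator"], row["_weekend"])
```

Because each derived column is just another key on the record, new features can be added later without changing the rows that already exist, which is the point the slide makes about flexible schemas.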
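
Slides 22 and 23 show lagging features being filled in only after a failure is observed. The following sketch back-labels earlier rows once the failure time is known. The field names, the use of seconds since midnight for `time`, and the 30-second `horizon` default are assumptions chosen to match the slide's example; the demo itself performs this update in Spark against a NoSQL table.

```python
def label_lagging_features(records, failure_time, horizon=30):
    """Back-fill remaining life and the failure-within-horizon flag.

    `time` and `failure_time` are seconds since midnight (an assumed
    encoding). Rows recorded after the failure are left unlabeled.
    """
    labeled = []
    for rec in records:
        remaining = failure_time - rec["time"]
        if remaining < 0:
            labeled.append(dict(rec))
            continue
        labeled.append({
            **rec,
            "remaining_life": remaining,
            "fails_within_horizon": remaining <= horizon,
        })
    return labeled


start = 8 * 3600  # 8:00:00
records = [{"device_id": 1, "time": start + s} for s in range(4)]
labeled = label_lagging_features(records, failure_time=start + 3)
for row in labeled:
    print(row["time"] - start, row["remaining_life"], row["fails_within_horizon"])
```

With a failure at 8:00:03 the remaining-life column counts down 3, 2, 1, 0, matching the table on slide 23.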
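
Slide 27's figure of more than 20k samples per second for a 10 kHz vibration follows from the Nyquist criterion (the sampling rate must exceed twice the highest frequency of interest), and slide 28 shows the raw stream being reduced to one derived record per second. The sketch below works through both steps. The choice of RMS as the derived signal, the 25 kHz sample rate, and the synthetic sine wave are assumptions for illustration only.

```python
import math


def nyquist_rate(max_freq_hz):
    """Twice the highest frequency; the sampling rate must exceed this."""
    return 2 * max_freq_hz


def rms_per_second(samples, sample_rate):
    """Collapse a raw sample stream into one RMS value per full second."""
    summaries = []
    for start in range(0, len(samples) - sample_rate + 1, sample_rate):
        chunk = samples[start:start + sample_rate]
        summaries.append(math.sqrt(sum(x * x for x in chunk) / sample_rate))
    return summaries


vibration_hz = 10_000
sample_rate = 25_000  # assumed rate, above the 20k threshold
assert sample_rate > nyquist_rate(vibration_hz)

# Two seconds of a synthetic unit-amplitude 10 kHz vibration.
samples = [
    math.sin(2 * math.pi * vibration_hz * n / sample_rate)
    for n in range(2 * sample_rate)
]
summaries = rms_per_second(samples, sample_rate)
print(len(summaries), [round(v, 4) for v in summaries])
```

Each second of 25,000 raw samples becomes a single number, so the downstream feature store receives one record per second while the full-resolution stream stays available upstream.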