NLP focused applied ML at scale for global fleet analytics at ExxonMobil
Data Driven Guidance for Operations Impact
Deliver insights from text-heavy unstructured data to answer the questions “What happened, when did it happen and why did it happen?”
Technology team‡:
Hans Brende†, Liz Curry-Logan*, Ricardo Ceslinski*, Jijo Jose*, Colby Lopez*, Chris Marchini*, Gaurav Nair*, Harsha
Namburi*, Kevin Pauli†, Sandeep Sihag† and Sumeet Trehan*
‡Team as of Dec. 2020; * ExxonMobil; † Contractor at ExxonMobil
Agenda
Build and ship a product (equipment lifecycle optimization, or ELO) that leverages data to make smart data-driven decisions.
1. Business problem
2. Architecture, tech stack and impact
3. Results (one specific example)
4. Conclusion
Business driver: Can we use the maintenance/service log of each piece of equipment to answer “What, when and why”? This contextual information can provide insights.
Insights - Outlier identification, capacity planning and prioritization of maintenance tasks.
Leveraging global data to enhance maintenance effectiveness and reliability is complicated by several factors.
Challenges
1. Legacy systems limit the ability to do ML at scale
• The equipment maintenance log of our global fleet is maintained using legacy infrastructure and data models.
• Legacy systems limit the ability to extract insights at scale.
2. Ingest and enrich global data
• Analysis at a local level may produce inaccurate results.
• It is critical to ingest and enrich global fleet data.
• “Big data” is needed for honest insights.
3. Data quality
• Inconsistent data quality. Data input is not comparable. Example:
  • Large variability in how we enter information in the maintenance/service logs: “Replace the TX – it is corrorde”.
• Data is disconnected.
Solution
NLP focused applied ML product:
• Ingests batch and streaming data (operational ML pipeline) from legacy systems.
• Sifts through 60 MM+ records (growing nonlinearly) to extract insights using NLP.
• Example: Given a maintenance log entry such as “Replace the TX – it is corrorde”, answer questions such as what happened, why it happened and when it happened.
Architecture
[Architecture diagram. Pipeline stages: Ingest, Store, Prep and train, Serve. Labeled components: Azure Data Factory (batch pipeline orchestration, batch data); Azure Event Hubs (streaming data); Azure Data Explorer (real-time analysis); Azure Databricks (data engineering); Azure Databricks (data science & machine learning); Azure ML; model repository & deployment; model serving; Qlik (frontend).]
ELO architecture
Model development, ML pipeline setup and pipeline runtime.
• Model development
  • Applied ML scientists use notebooks and common utilities to train and publish models to the MLflow model registry.
• ML pipeline development
  • ML engineers create building blocks (discrete steps) that transform source data to target data, utilizing common utilities as well as the models published by the data scientists.
  • ML engineers develop common utilities to perform data and model I/O, to reduce boilerplate and promote standardization and reusability.
• Pipeline runtime
  • The entire ELO pipeline is represented in Azure Data Factory (ADF) as a DAG of pipeline steps.
  • The ADF pipeline is triggered on a daily schedule.
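The idea of a pipeline built from discrete steps arranged as a DAG can be illustrated with a minimal sketch. This is not the ADF pipeline itself: the step names, the `PIPELINE` mapping and the `run_pipeline` helper are all hypothetical, and the standard-library `graphlib` module stands in for the orchestrator.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each key maps to the set of steps it depends on.
PIPELINE = {
    "ingest_logs": set(),
    "clean_and_tokenize": {"ingest_logs"},
    "count_tokens": {"clean_and_tokenize"},
}

# Hypothetical building blocks: each step receives the outputs of its
# dependencies (keyed by step name) and returns its own output.
STEPS = {
    "ingest_logs": lambda inputs: ["Replace the TX – it is corrorde"],
    "clean_and_tokenize": lambda inputs: [
        line.lower().split() for line in inputs["ingest_logs"]
    ],
    "count_tokens": lambda inputs: [
        len(tokens) for tokens in inputs["clean_and_tokenize"]
    ],
}


def run_pipeline(dag, steps):
    """Run every step once, in an order that respects the dependencies."""
    order = list(TopologicalSorter(dag).static_order())
    outputs = {}
    for name in order:
        inputs = {dep: outputs[dep] for dep in dag[name]}
        outputs[name] = steps[name](inputs)
    return order, outputs


order, outputs = run_pipeline(PIPELINE, STEPS)
print(order)
print(outputs["count_tokens"])
```

Keeping each step a plain function of its upstream outputs is one way to make the building blocks reusable across pipelines; a real orchestrator would add scheduling, retries and persistence on top.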
[Figure: Model development]
[Figure: ML pipeline development]
[Figure: Operational ML pipeline at runtime]
Hybrid of unsupervised and supervised learning. The pipeline involves data cleaning, tokenization, feature vector generation (using FastText), followed by a deep learning classifier.

Input data
1. The xyz pump has failed
2. P-1234 to the seal is down
3. Replace the TX – it is corrorde
4. t/s/r old rod
5. Look broke – maybe fix
6. c/o old seal on v/v
7. 2 seal on psv-123 fail
…

REGEX Cleanup & Tokenization
1. [the, xyz, pump, has, failed]
2. [p, to, the, seal, is, down]
3. [replace, the, tx, it, is, corroded]
4. [tsr, old, rod]
5. [look, broke, maybe, fix]
6. [co, old, seal, on, vv]
7. [2, seal, on, psv, fail]
…

FastText

[Diagram stage labels: Ingestion, NLP]
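The cleanup and tokenization step can be sketched with a few regular expressions. The rules below are assumptions inferred from the sample inputs and outputs on the slide (lowercase, join slash abbreviations, drop the numeric part of equipment tags, strip punctuation); the function name `clean_and_tokenize` is hypothetical, and no spelling correction is attempted, so “corrorde” stays as typed.

```python
import re


def clean_and_tokenize(text: str) -> list[str]:
    """Lowercase a maintenance log line and split it into tokens."""
    text = text.lower()
    # Join slash-separated shorthand: "t/s/r" -> "tsr", "v/v" -> "vv".
    text = text.replace("/", "")
    # Drop the numeric part of equipment tags: "p-1234" -> "p".
    text = re.sub(r"-\d+", " ", text)
    # Replace any remaining punctuation (dashes, periods, ...) with spaces.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


for line in [
    "The xyz pump has failed",
    "P-1234 to the seal is down",
    "t/s/r old rod",
    "c/o old seal on v/v",
    "2 seal on psv-123 fail",
]:
    print(clean_and_tokenize(line))
```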
[Figure: Feature vector generation using FastText for a sentence with N n-gram features (x1, x2, x3, ..., xN-1, xN). The features are embedded and averaged to form the hidden variable. Diagram: inputs x1, x2, ..., xN feed the hidden layers, which feed the output.]
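The “embed and average” idea in the figure can be shown with a small pure-Python sketch. It does not use the FastText library: the `embed` function is a deterministic hash-based stand-in for a learned embedding table, and `DIM`, `char_ngrams` and `sentence_vector` are hypothetical names chosen for illustration.

```python
import hashlib

DIM = 8  # hypothetical embedding size


def char_ngrams(token: str, n: int = 3) -> list[str]:
    """Character n-grams of a token wrapped in boundary markers."""
    padded = f"<{token}>"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    return grams or [padded]


def embed(feature: str, dim: int = DIM) -> list[float]:
    """Stand-in for a learned embedding lookup (deterministic, not trained)."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    return [byte / 255.0 - 0.5 for byte in digest[:dim]]


def sentence_vector(tokens: list[str], dim: int = DIM) -> list[float]:
    """Embed every word and n-gram feature, then average them."""
    features = [f for token in tokens for f in [token] + char_ngrams(token)]
    if not features:
        return [0.0] * dim
    vectors = [embed(f, dim) for f in features]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(sentence_vector(["replace", "the", "tx"]))
```

Because subword n-grams contribute to the average, a misspelled token such as “corrorde” still shares most of its features with “corroded”, which is one reason this family of embeddings suits noisy maintenance logs.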
NLP Workflow
Step Overview
1. Generate word embeddings for the input text by appending the feature vectors for each token. Zero padding is used to handle input text of different lengths.
2. Multiclass classification using a deep neural network.
3. Switch to the linguistic (unsupervised) model if the predictions do not have enough confidence.
4. If step 7 of the workflow is initiated, the predictions are used for reinforcement learning to update training steps on the deep neural net.
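Appending per-token vectors and zero-padding them to a fixed length (step 1 of the overview) might look like the sketch below. The function name `pad_embeddings`, the truncation of over-long inputs and the tiny dimensions in the usage lines are assumptions made for illustration.

```python
def pad_embeddings(token_vectors: list[list[float]],
                   max_tokens: int,
                   dim: int) -> list[float]:
    """Concatenate per-token vectors and zero-pad to max_tokens * dim values.

    Inputs longer than max_tokens are truncated (an assumption; the slides
    do not say how over-long text is handled).
    """
    for vector in token_vectors:
        if len(vector) != dim:
            raise ValueError("every token vector must have length dim")
    flat = [value for vector in token_vectors[:max_tokens] for value in vector]
    flat.extend([0.0] * (max_tokens * dim - len(flat)))
    return flat


short = pad_embeddings([[1.0, 2.0], [3.0, 4.0]], max_tokens=3, dim=2)
print(short)
```

A fixed-length input is what lets a dense classifier accept work orders of different lengths without changing its input layer.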
[Flowchart: NLP workflow, Steps 1 to 7. Nodes: Work Order Input; FastText Training; FastText Word Embeddings; Deep Neural Net training; Deep Neural Net for Predictions; decision “Confidence > 95% or Unidentified prediction?”; Display Output from Deep Neural Net; Display Output from Linguistic Model; Update Training.]
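The decision node in the workflow can be sketched as a simple routing function. The reading used here is an assumption: the classifier's answer is shown only when its top confidence exceeds 95% and the label is not “unidentified”; otherwise the linguistic model answers. `route_prediction`, `UNIDENTIFIED` and the stub model in the usage lines are hypothetical names.

```python
from typing import Callable, Sequence

CONFIDENCE_THRESHOLD = 0.95    # threshold shown in the workflow diagram
UNIDENTIFIED = "unidentified"  # hypothetical label for "no known class"


def route_prediction(labels: Sequence[str],
                     probabilities: Sequence[float],
                     linguistic_model: Callable[[str], str],
                     text: str) -> tuple[str, str]:
    """Return (prediction, source), falling back to the linguistic model."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    label, confidence = labels[best], probabilities[best]
    if confidence > CONFIDENCE_THRESHOLD and label != UNIDENTIFIED:
        return label, "deep_neural_net"
    return linguistic_model(text), "linguistic_model"


def stub_linguistic_model(text: str) -> str:
    # Placeholder standing in for the unsupervised linguistic model.
    return "transmitter"


print(route_prediction(["pump", "seal"], [0.97, 0.03],
                       stub_linguistic_model, "the xyz pump has failed"))
print(route_prediction(["pump", "seal"], [0.60, 0.40],
                       stub_linguistic_model, "replace the tx"))
```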
Linguistic (Unsupervised) Model
The linguistic model attempts to understand failure items like a human.
• It learns what words actually mean from seeing them used in the past (such as TX and P-1234).
• It understands the subject of a sentence based on parts of speech (verbs, adjectives, etc.).
• It understands dependencies (how the positions of words in a sentence relate to each other).
• It understands which verbs indicate a failure item; it also understands misspellings and shorthand notation.
Simple Example
Input text: The TX on the P-1234 has failed and so has the motor
Prediction: Pump Transmitter, Motor
1. Semantics: it knows that TX means transmitter because it has seen both words used in similar contexts. It knows that P-1234 means pump for the same reason.
2. Context: the linguistic model identifies nouns, prepositions (which link two parts of speech), verbs (action taken on a noun) and conjunctions, which identify two nouns that are talked about in the same manner.
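The two ideas in the simple example, alias resolution (semantics) and linking nouns through a preposition (context), can be imitated with a toy rule-based sketch. This is not the linguistic model described on the slide, which presumably learns these relations from data; the alias tables, the component vocabulary and the single “X on the Y” rule below are all hand-written assumptions.

```python
import re

# Hypothetical lookups standing in for what the model learns from usage.
ALIASES = {"tx": "transmitter"}
TAG_PREFIXES = {"p": "pump"}  # equipment tags such as P-1234
COMPONENTS = {"transmitter", "motor", "pump", "seal"}


def normalize(token: str) -> str:
    """Map shorthand and equipment tags to plain component names."""
    token = token.lower()
    tag = re.fullmatch(r"([a-z]+)-\d+", token)
    if tag and tag.group(1) in TAG_PREFIXES:
        return TAG_PREFIXES[tag.group(1)]
    return ALIASES.get(token, token)


def failure_items(text: str) -> list[str]:
    """Extract failure items, linking 'X on the Y' into 'Y X'."""
    raw = re.findall(r"[A-Za-z]+-\d+|[A-Za-z]+", text)
    tokens = [normalize(t) for t in raw]
    items, i = [], 0
    while i < len(tokens):
        token = tokens[i]
        if token in COMPONENTS:
            linked = (tokens[i + 1:i + 3] == ["on", "the"]
                      and i + 3 < len(tokens)
                      and tokens[i + 3] in COMPONENTS)
            if linked:
                # The preposition links the part to its parent equipment.
                items.append(f"{tokens[i + 3]} {token}".title())
                i += 4
                continue
            items.append(token.title())
        i += 1
    return items


print(failure_items("The TX on the P-1234 has failed and so has the motor"))
```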
Conclusion
1. Leveraged Databricks to build and ship an operational ML pipeline and overcome the limitations of legacy infrastructure and data models.
• Scaled the application horizontally using Databricks.
• ML model training and serving are done using MLflow.
2. The product extracts contextual information (what, when and why) from structured and unstructured text. Together, this contextual information generates insights.
3. The extracted insights enabled outlier identification, capacity planning, maintenance prioritization, etc. The data-driven guidance is projected to help save millions of dollars on an annual basis.
Abstract/Summary
The equipment maintenance log of the global fleet is traditionally maintained using legacy infrastructure and data models, which limit the ability to extract insights at scale. However, to impact the bottom line, it is critical to ingest and enrich global fleet data to generate data-driven guidance for operations. The impact of such insights is projected to be millions of dollars per annum.
To this end, we leverage Databricks to perform machine learning at scale, including ingesting structured and unstructured data from legacy systems, and then sifting through millions of nonlinearly growing records to extract insights using NLP. The insights enable outlier identification, capacity planning, prioritization of cost reduction opportunities, and the discovery process for cross-functional teams.
Logos
• Python and any related marks are trademarks of the Python Software Foundation.
• PyTorch and any related marks are trademarks of Facebook, Inc.
• TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
• Docker and any related marks are trademarks of Docker, Inc.
• Parquet and any related marks are trademarks of the Apache Software Foundation.
• Snowflake and any related marks are trademarks of Snowflake Inc.
• Databricks and any related marks are trademarks of Databricks.
• Azure and any related marks are trademarks of Microsoft Corporation.
• scikit-learn is a trademark of the scikit-learn consortium.
• NumPy and any related marks are trademarks of the SciPy community.
• pandas is a trademark for the Python pandas package, released under the BSD 3 license.
• Dask and any related marks are trademarks of Anaconda, Inc. and contributors.

Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/managementakshesh doshi
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/management
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 

Leveraging NLP and ML to optimize global fleet maintenance at scale

  • 1. NLP focused applied ML at scale for global fleet analytics at ExxonMobil. Data Driven Guidance for Operations Impact. Deliver insights by using text-heavy unstructured data to answer the questions “What, when and why it happened”.
  • 2. Technology team‡: Hans Brende†, Liz Curry-Logan*, Ricardo Ceslinski*, Jijo Jose*, Colby Lopez*, Chris Marchini*, Gaurav Nair*, Harsha Namburi*, Kevin Pauli†, Sandeep Sihag† and Sumeet Trehan*. ‡Team as of Dec. 2020; * ExxonMobil; † Contractor at ExxonMobil
  • 3. Agenda: Build and ship a product (equipment lifecycle optimization, or ELO) that leverages data to make smart data-driven decisions. 1. Business problem 2. Architecture, tech stack and impact 3. Results (one specific example) 4. Conclusion
  • 4. Business driver: Can we use the maintenance/service log of each piece of equipment to answer “What, when and why”? This contextual information can provide insights. Insights: outlier identification, capacity planning and prioritization of maintenance tasks.
  • 7. Challenges: Leveraging global data to enhance maintenance effectiveness and reliability is complicated by several factors. (1) Legacy systems limit the ability to do ML at scale: • The equipment maintenance log of our global fleet is maintained using legacy infrastructure and data models. • Legacy systems limit the ability to extract insights at scale. (2) Ingest and enrich global data: • Analysis at a local level may produce inaccurate results. • It is critical to ingest and enrich global fleet data. • “Big data” is needed for honest insights. (3) Data quality: • Inconsistent data quality; data input is not comparable. Example: large variability in how we enter information in the maintenance/service logs (“Replace the TX – it is corrorde”). • Data is disconnected.
  • 8. Solution: NLP focused applied ML product that: • Ingests batch and streaming data (operational ML pipeline) from legacy systems. • Sifts through 60 MM+ records (growing nonlinearly) to extract insights using NLP. • Example: Given a maintenance log entry such as “Replace the TX – it is corrorde”, answers questions such as what happened, why it happened and when it happened.
  • 9. Architecture (diagram). Stages: Ingest, Store, Prep and train, Serve. Inputs: batch data and streaming data. Components as labeled: Azure Data Factory (batch pipeline orchestration); Azure Event Hubs (streaming data); Azure Data Explorer (real-time analysis); Azure Databricks (data engineering); Azure Databricks + model repository & deployment (data science & machine learning); Azure ML (model serving); Qlik (frontend).
  • 10. ELO architecture: model development, ML pipeline setup and pipeline runtime. • Model development: applied ML scientists use notebooks and common utilities to train and publish models to the MLflow model registry. • ML pipeline development: ML engineers create building blocks (discrete steps) that transform source data to target data, utilizing common utilities as well as the models published by the data scientists. ML engineers also develop common utilities to perform data and model I/O, to reduce boilerplate and promote standardization and reusability. • Pipeline runtime: the entire ELO pipeline is represented in Azure Data Factory (ADF) as a DAG of pipeline steps. The ADF pipeline is triggered on a daily schedule.
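The idea on slide 10 of discrete steps arranged as a DAG can be sketched in plain Python. This is only an illustration, not the ELO pipeline or the Azure Data Factory API: the step names (`ingest`, `clean`, `classify`), the shared `state` dictionary and the placeholder label are all hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple


# Hypothetical building blocks: each step reads from and writes to a shared state.
def ingest(state: dict) -> dict:
    state["raw"] = ["Replace the TX – it is corrorde"]
    return state


def clean(state: dict) -> dict:
    state["clean"] = [text.lower() for text in state["raw"]]
    return state


def classify(state: dict) -> dict:
    # Placeholder for scoring with a model pulled from a model registry.
    state["labels"] = ["unclassified" for _ in state["clean"]]
    return state


STEPS: Dict[str, Callable[[dict], dict]] = {
    "ingest": ingest,
    "clean": clean,
    "classify": classify,
}

# Each step maps to the set of steps it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "ingest": set(),
    "clean": {"ingest"},
    "classify": {"clean"},
}


def run_pipeline() -> Tuple[List[str], dict]:
    """Run every step once, in an order that respects the dependencies."""
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    state: dict = {}
    for name in order:
        state = STEPS[name](state)
    return order, state
```

Calling `run_pipeline()` returns the execution order together with the final state, which makes it easy to check that a new step was wired in after the steps it depends on.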
  • 14. Agenda: Build and ship a product (equipment lifecycle optimization, or ELO) that leverages data to make smart data-driven decisions. 1. Business problem 2. Architecture, tech stack and impact 3. Results (one specific example) 4. Conclusion
  • 15. NLP: Hybrid of unsupervised and supervised learning. The pipeline involves data cleaning, tokenization, feature vector generation (using FastText), followed by a deep learning classifier. Input data: 1. The xyz pump has failed 2. P-1234 to the seal is down 3. Replace the TX – it is corrorde 4. t/s/r old rod 5. Look broke – maybe fix 6. c/o old seal on v/v 7. 2 seal on psv-123 fail … After REGEX cleanup & tokenization: 1. [the, xyz, pump, has, failed] 2. [p, to, the, seal, is, down] 3. [replace, the, tx, it, is, corroded] 4. [tsr, old, rod] 5. [look, broke, maybe, fix] 6. [co, old, seal, on, vv] 7. [2, seal, on, psv, fail] … FastText ingestion: feature vector generation using FastText for a sentence with N n-gram features (x1, x2, x3, …, xN-1, xN). The features are embedded and averaged to form the hidden variable. (Diagram: inputs x1, x2, …, xN; hidden layers; output.)
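The cleanup and averaging steps on slide 15 can be approximated with a few lines of Python. The regular expressions below are assumptions chosen to reproduce most of the slide's example rows; they are not the team's actual rules. The slide's row 3 also shows the misspelling "corrorde" corrected to "corroded", which this sketch does not attempt. `average_vectors` only illustrates the "embedded and averaged" step with plain lists and does not call FastText.

```python
import re
from typing import List


def clean_and_tokenize(text: str) -> List[str]:
    """Lowercase, collapse slash abbreviations, drop tag numbers, split on non-alphanumerics."""
    text = text.lower()
    text = text.replace("/", "")             # "t/s/r" -> "tsr", "v/v" -> "vv"
    text = re.sub(r"-\d+", " ", text)        # "p-1234" -> "p", "psv-123" -> "psv"
    text = re.sub(r"[^a-z0-9]+", " ", text)  # remaining punctuation becomes whitespace
    return text.split()


def average_vectors(vectors: List[List[float]]) -> List[float]:
    """Average equal-length feature vectors component by component."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]
```

For example, `clean_and_tokenize("c/o old seal on v/v")` yields `["co", "old", "seal", "on", "vv"]`, matching row 6 of the slide.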
  • 16. NLP workflow, step overview: 1. Generate word embeddings for the input text by appending the feature vectors for each token. Zero padding is used to handle input text of different lengths. 2. Multiclass classification using a deep neural network. 3. Switch to the linguistic (unsupervised) model if the predictions do not have enough confidence. 4. If step 7 is initiated, the predictions are used for reinforcement learning to update training steps on the deep neural net. (Flowchart, Steps 1 to 7, with boxes labeled: Work Order Input; FastText Training; FastText Word Embeddings; Deep Neural Net Training; Deep Neural Net for Predictions; decision “Confidence > 95% or Unidentified prediction?”; Display Output from Deep Neural Net; Display Output from Linguistic Model; Update Training.)
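Two parts of the workflow on slide 16 lend themselves to a short sketch: appending per-token vectors with zero padding, and falling back to the linguistic model when confidence is low. The function names, the `"unidentified"` label and the callable fallback are hypothetical; only the 95% threshold comes from the slide, and the exact branching of the decision box is an assumption.

```python
from typing import Callable, List, Sequence, Tuple


def pad_and_concatenate(
    token_vectors: Sequence[Sequence[float]], max_tokens: int, dim: int
) -> List[float]:
    """Append token vectors into one flat input, truncating or zero-padding to max_tokens."""
    flat: List[float] = []
    for vector in token_vectors[:max_tokens]:
        if len(vector) != dim:
            raise ValueError("every token vector must have length dim")
        flat.extend(vector)
    flat.extend([0.0] * (max_tokens * dim - len(flat)))
    return flat


def route_prediction(
    probabilities: Sequence[float],
    labels: Sequence[str],
    linguistic_fallback: Callable[[], str],
    threshold: float = 0.95,
    unidentified_label: str = "unidentified",
) -> Tuple[str, str]:
    """Keep the classifier's top label only if it is confident and identified."""
    best = max(range(len(probabilities)), key=lambda index: probabilities[index])
    if probabilities[best] > threshold and labels[best] != unidentified_label:
        return labels[best], "deep_neural_net"
    # Assumed behaviour: low confidence or an unidentified class defers to the linguistic model.
    return linguistic_fallback(), "linguistic_model"
```

Returning the source of the prediction alongside the label makes it straightforward to collect the fallback cases later, which is what the slide's update-training step would need.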
  • 17. Linguistic (unsupervised) model: The linguistic model attempts to understand failure items like a human. • It learns what words actually mean from seeing them used in the past (such as TX and P-1234). • It understands the subject of a sentence based on parts of speech (verbs, adjectives, etc.). • It understands dependencies (how positions of words in a sentence relate to each other). • It understands which verbs indicate a failure item; it also understands misspellings and shorthand notation. Simple example: input text “The TX on the P-1234 has failed and so has the motor”; prediction: Pump Transmitter, Motor. 1. Semantics: it knows that TX means transmitter as it has seen both words used in similar contexts. It knows P-1234 means pump as it has seen both words used in similar contexts. 2. Context: the linguistic model identifies nouns, prepositions (which link two parts of speech), verbs (action taken on a noun) and conjunctions, which identify two nouns that are talked about in the same manner.
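The "semantics" point on slide 17, that TX and transmitter are related because they appear in similar contexts, is commonly expressed as nearest-neighbor search over word vectors. The sketch below assumes cosine similarity and uses made-up three-dimensional vectors purely for illustration; the slides do not state which similarity measure or embedding size the linguistic model uses.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_term(token: str, vectors: Dict[str, List[float]], candidates: Sequence[str]) -> str:
    """Return the candidate whose vector is most similar to the token's vector."""
    return max(candidates, key=lambda name: cosine_similarity(vectors[token], vectors[name]))


# Hypothetical toy embeddings; real vectors would come from a trained model.
TOY_VECTORS: Dict[str, List[float]] = {
    "tx": [0.9, 0.1, 0.0],
    "p-1234": [0.2, 0.8, 0.1],
    "transmitter": [0.8, 0.2, 0.1],
    "pump": [0.1, 0.9, 0.2],
    "motor": [0.0, 0.2, 0.9],
}
EQUIPMENT_TERMS = ["transmitter", "pump", "motor"]
```

With these toy vectors, `nearest_term("tx", TOY_VECTORS, EQUIPMENT_TERMS)` resolves the shorthand to `"transmitter"`, and the tag `"p-1234"` resolves to `"pump"`.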
  • 18. Conclusion: 1. Leveraged Databricks to build and ship an operational ML pipeline and overcome the limitations of legacy infrastructure and data models. • Scaled the application horizontally using Databricks. • ML model training and serving done using MLflow. 2. The product extracts contextual information (what, when and why) from structured and unstructured text. Together, this contextual information generates insights. 3. The extracted insights enabled outlier identification, capacity planning, maintenance prioritization, etc. The data-driven guidance is projected to help save millions of dollars annually.
  • 19. Abstract/Summary: The equipment maintenance log of the global fleet is traditionally maintained using legacy infrastructure and data models, which limit the ability to extract insights at scale. However, to impact the bottom line, it is critical to ingest and enrich global fleet data to generate data-driven guidance for operations. The impact of such insights is projected to be millions of dollars per annum. To this end, we leverage Databricks to perform machine learning at scale, including ingesting structured and unstructured data from legacy systems, and then sifting through millions of nonlinearly growing records to extract insights using NLP. The insights enable outlier identification, capacity planning, prioritization of cost reduction opportunities, and the discovery process for cross-functional teams.
  • 20. Logos: • Python and any related marks are trademarks of the Python Software Foundation • PyTorch and any related marks are trademarks of Facebook, Inc. • TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc. • Docker and any related marks are trademarks of Docker, Inc. • Parquet and any related marks are trademarks of the Apache Software Foundation • Snowflake and any related marks are trademarks of Snowflake Inc. • Databricks and any related marks are trademarks of Databricks • Azure and any related marks are trademarks of Microsoft Corporation • Scikit-learn is a trademark of the Scikit-learn consortium • NumPy and any related marks are trademarks of The SciPy community • pandas is a trademark for the Python pandas package, released under the BSD 3 license • Dask and any related marks are trademarks of Anaconda, Inc. and contributors. Revision 399c843d.