
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil


The equipment maintenance log of the global fleet is traditionally maintained using legacy infrastructure and data models, which limit the ability to extract insights at scale. However, to impact the bottom line, it is critical to ingest and enrich global fleet data to generate data-driven guidance for operations. The impact of such insights is projected to be millions of dollars per annum.



To this end, we leverage Databricks to perform machine learning at scale. This includes ingesting structured and unstructured data from legacy systems and then sifting through millions of records, whose number grows nonlinearly, to extract insights using NLP. The insights enable outlier identification, capacity planning, prioritization of cost reduction opportunities, and the discovery process for cross-functional teams.


  1. NLP-focused applied ML at scale for global fleet analytics at ExxonMobil. Data-driven guidance for operations. Impact: deliver insights by using text-heavy unstructured data to answer the questions “what happened, when and why it happened”.
  2. Technology team‡: Hans Brende†, Liz Curry-Logan*, Ricardo Ceslinski*, Jijo Jose*, Colby Lopez*, Chris Marchini*, Gaurav Nair*, Harsha Namburi*, Kevin Pauli†, Sandeep Sihag† and Sumeet Trehan*. (‡ Team as of Dec. 2020; * ExxonMobil; † Contractor at ExxonMobil.)
  3. Agenda. Built and shipped a product (equipment lifecycle optimization, or ELO) that leverages data to make smart data-driven decisions.
     1. Business problem
     2. Architecture, tech stack and impact
     3. Results (one specific example)
     4. Conclusion
  4. Business driver: can we use the maintenance/service log of each piece of equipment to answer “what, when and why”? This contextual information can provide insights such as outlier identification, capacity planning and prioritization of maintenance tasks.
  5. Challenges. Leveraging global data to enhance maintenance effectiveness and reliability is complicated by several factors.
     • Challenge 1: legacy systems limit the ability to do ML at scale. The equipment maintenance log of our global fleet is maintained using legacy infrastructure and data models, and legacy systems limit the ability to extract insights at scale.
  6. Challenges (continued).
     • Challenge 1: legacy systems limit the ability to do ML at scale. The equipment maintenance log of our global fleet is maintained using legacy infrastructure and data models, and legacy systems limit the ability to extract insights at scale.
     • Challenge 2: ingest and enrich global data. Analysis at a local level may produce inaccurate results, so it is critical to ingest and enrich global fleet data. “Big data” is needed for honest insights.
  7. Challenges (continued).
     • Challenge 1: legacy systems limit the ability to do ML at scale. The equipment maintenance log of our global fleet is maintained using legacy infrastructure and data models, and legacy systems limit the ability to extract insights at scale.
     • Challenge 2: ingest and enrich global data. Analysis at a local level may produce inaccurate results, so it is critical to ingest and enrich global fleet data. “Big data” is needed for honest insights.
     • Challenge 3: data quality. Data quality is inconsistent and data input is not comparable. For example, there is large variability in how information is entered in the maintenance/service logs (“Replace the TX – it is corrorde”). Data is also disconnected.
  8. Solution. An NLP-focused applied ML product that:
     • Ingests batch and streaming data (operational ML pipeline) from legacy systems.
     • Sifts through 60 MM+ records (growing nonlinearly) to extract insights using NLP.
     • Example: given a maintenance log entry such as “Replace the TX – it is corrorde”, answers questions such as what happened, why it happened and when it happened.
  9. Architecture (diagram). Stages: Ingest, Store, Prep and train, Serve, Frontend. Labeled components: Azure Data Factory (batch pipeline orchestration); Azure Event Hubs (streaming data); Azure Data Explorer (real-time analysis); Azure Databricks (data engineering); Azure Databricks + Azure ML (data science and machine learning, model repository and deployment); model serving; batch data; Qlik (frontend).
  10. ELO architecture: model development, ML pipeline setup and pipeline runtime.
     • Model development: applied ML scientists use notebooks and common utilities to train models and publish them to the MLflow model registry.
     • ML pipeline development: ML engineers create building blocks (discrete steps) that transform source data into target data, using common utilities as well as the models published by the data scientists. ML engineers also develop common utilities for data and model I/O to reduce boilerplate and promote standardization and reusability.
     • Pipeline runtime: the entire ELO pipeline is represented in Azure Data Factory (ADF) as a DAG of pipeline steps. The ADF pipeline is triggered on a daily schedule.
  11. Model development (diagram)
  12. ML pipeline development (diagram)
  13. Operational ML pipeline at runtime (diagram)
  14. Agenda (repeated): 1. Business problem; 2. Architecture, tech stack and impact; 3. Results (one specific example); 4. Conclusion.
  15. Ingestion NLP. A hybrid of unsupervised and supervised learning. The pipeline involves data cleaning, tokenization and feature vector generation (using FastText), followed by a deep learning classifier.
     Input data:
     1. The xyz pump has failed
     2. P-1234 to the seal is down
     3. Replace the TX – it is corrorde
     4. t/s/r old rod
     5. Look broke – maybe fix
     6. c/o old seal on v/v
     7. 2 seal on psv-123 fail
     …
     After regex cleanup and tokenization:
     1. [the, xyz, pump, has, failed]
     2. [p, to, the, seal, is, down]
     3. [replace, the, tx, it, is, corroded]
     4. [tsr, old, rod]
     5. [look, broke, maybe, fix]
     6. [co, old, seal, on, vv]
     7. [2, seal, on, psv, fail]
     …
     FastText (diagram): feature vector generation for a sentence with N n-gram features (x1, x2, x3, …, xN-1, xN). The features are embedded and averaged to form the hidden variable, which feeds the output layer.
  16. NLP workflow: step overview.
     1. Generate word embeddings for the input text by appending the feature vectors of its tokens. Zero padding is used to handle input texts of different lengths.
     2. Perform multiclass classification using a deep neural network.
     3. Switch to the linguistic (unsupervised) model if the predictions do not have enough confidence.
     4. If step 7 is initiated, the predictions are used for reinforcement learning to update the training steps of the deep neural net.
     Flowchart (steps 1 to 7) labels: Work Order Input; FastText Training; FastText Word Embeddings; Deep Neural Net Training; Deep Neural Net for Predictions; decision “Confidence > 95% or unidentified prediction?”; Display Output from Deep Neural Net; Display Output from Linguistic Model; Update Training.
  17. Linguistic (unsupervised) model. The linguistic model attempts to understand failure items like a human.
     • It learns what words actually mean from seeing them used in the past (such as TX and P-1234).
     • It understands the subject of a sentence based on parts of speech (verbs, adjectives, etc.).
     • It understands dependencies (how the positions of words in a sentence relate to each other).
     • It understands which verbs indicate a failure item. It also understands misspellings and shorthand notation.
     Simple example. Input text: “The TX on the P-1234 has failed and so has the motor”. Prediction: Pump Transmitter, Motor.
     1. Semantics: it knows that TX means transmitter because it has seen both words used in similar contexts. It knows that P-1234 means pump for the same reason.
     2. Context: the linguistic model identifies nouns, prepositions (which link two parts of speech), verbs (the action taken on a noun) and conjunctions, which identify two nouns that are talked about in the same manner.
  18. Conclusion.
     1. Leveraged Databricks to build and ship an operational ML pipeline and overcome the limitations of legacy infrastructure and data models.
        • Scaled the application horizontally using Databricks.
        • ML model training and serving are done using MLflow.
     2. The product extracts contextual information (what, when and why) from structured and unstructured text. Together, these pieces of contextual information generate insights.
     3. The extracted insights enabled outlier identification, capacity planning, maintenance prioritization, etc. The data-driven guidance is projected to help save millions of dollars on an annual basis.
  19. Abstract/Summary. The equipment maintenance log of the global fleet is traditionally maintained using legacy infrastructure and data models, which limit the ability to extract insights at scale. However, to impact the bottom line, it is critical to ingest and enrich global fleet data to generate data-driven guidance for operations. The impact of such insights is projected to be millions of dollars per annum. To this end, we leverage Databricks to perform machine learning at scale, including ingesting structured and unstructured data from legacy systems and then sifting through millions of nonlinearly growing records to extract insights using NLP. The insights enable outlier identification, capacity planning, prioritization of cost reduction opportunities, and the discovery process for cross-functional teams.
  20. Logos.
     • Python and any related marks are trademarks of the Python Software Foundation.
     • PyTorch and any related marks are trademarks of Facebook, Inc.
     • TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
     • Docker and any related marks are trademarks of Docker, Inc.
     • Parquet and any related marks are trademarks of the Apache Software Foundation.
     • Snowflake and any related marks are trademarks of Snowflake Inc.
     • Databricks and any related marks are trademarks of Databricks.
     • Azure and any related marks are trademarks of Microsoft Corporation.
     • Scikit-learn is a trademark of the Scikit-learn consortium.
     • NumPy and any related marks are trademarks of The SciPy community.
     • pandas is a trademark of the Python pandas package, released under the BSD 3 license.
     • Dask and any related marks are trademarks of Anaconda, Inc. and contributors.
     Revision 399c843d.
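
Slide 15 shows raw work-order text before and after "REGEX Cleanup & Tokenization" but does not give the rules themselves. The following is a minimal sketch of cleanup rules that reproduce most of the slide's examples. The function name `clean_and_tokenize` and the three regex rules are assumptions inferred from those examples, not the team's actual implementation. The spelling correction visible on the slide ("corrorde" becoming "corroded") is not attempted here.

```python
import re
from typing import List


def clean_and_tokenize(text: str) -> List[str]:
    """Hypothetical cleanup rules inferred from the slide-15 examples."""
    text = text.lower()
    # Join slash-separated shorthand: "t/s/r" -> "tsr", "v/v" -> "vv".
    text = text.replace("/", "")
    # Drop the numeric part of equipment tags: "p-1234" -> "p", "psv-123" -> "psv".
    text = re.sub(r"(?<=[a-z])-\d+", "", text)
    # Replace any remaining punctuation (including dashes) with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


if __name__ == "__main__":
    for line in ["P-1234 to the seal is down", "t/s/r old rod", "2 seal on psv-123 fail"]:
        print(clean_and_tokenize(line))
```

Under these assumed rules, standalone numbers such as the leading "2" are kept, while numbers attached to an equipment tag are dropped, which matches the tokenized output listed on the slide.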

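Slides 15 and 16 describe two ways of turning tokens into model input: averaging the embedded features into a single hidden vector (the FastText diagram) and laying out per-token vectors with zero padding so that texts of different lengths share one shape. The sketch below illustrates both with plain Python lists. The toy embedding table, the dimension of 2 and the function names are illustrative assumptions. Real FastText also builds vectors for unseen words from character n-grams, which this sketch does not do: unknown tokens are simply skipped or mapped to zeros.

```python
from typing import Dict, List

Vector = List[float]


def average_embedding(tokens: List[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of known tokens (zeros if none are known)."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def padded_sequence(tokens: List[str], table: Dict[str, Vector],
                    dim: int, max_len: int) -> List[Vector]:
    """One vector per token, truncated or zero-padded to exactly max_len rows."""
    rows = [list(table.get(t, [0.0] * dim)) for t in tokens[:max_len]]
    rows.extend([0.0] * dim for _ in range(max_len - len(rows)))
    return rows


if __name__ == "__main__":
    # Toy 2-dimensional embedding table (made-up values, for illustration only).
    toy_table = {"old": [1.0, 0.0], "rod": [0.0, 1.0], "seal": [1.0, 1.0]}
    print(average_embedding(["tsr", "old", "rod"], toy_table, dim=2))
    print(padded_sequence(["old", "rod"], toy_table, dim=2, max_len=4))
```

Averaging discards word order but yields a fixed-size vector for any input length, whereas the padded layout preserves order at the cost of choosing a maximum length up front.
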

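The workflow on slide 16 routes each work order through the deep neural net first and falls back to the linguistic model when the prediction is not confident enough. The flowchart's decision box ("Confidence > 95% or Unidentified prediction?") is ambiguous in the transcript, so the sketch below takes one reading: the classifier's answer is shown only when its top softmax probability exceeds 0.95 and the label is not an "unidentified" class. Otherwise the linguistic model answers and the example is queued for a later training update. All names here (`route_prediction`, `UNIDENTIFIED`, the stub linguistic model) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

CONFIDENCE_THRESHOLD = 0.95    # threshold taken from the slide-16 flowchart
UNIDENTIFIED = "unidentified"  # hypothetical name for the catch-all class


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw classifier scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_prediction(
    logits: Sequence[float],
    labels: Sequence[str],
    tokens: List[str],
    linguistic_model: Callable[[List[str]], str],
    feedback: List[Tuple[List[str], str]],
) -> Tuple[str, str]:
    """Return (prediction, source), where source names the model that answered."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] > CONFIDENCE_THRESHOLD and labels[best] != UNIDENTIFIED:
        return labels[best], "deep_neural_net"
    # Low confidence or unidentified: fall back, and keep the example
    # so it can feed a later training update of the neural net.
    prediction = linguistic_model(tokens)
    feedback.append((list(tokens), prediction))
    return prediction, "linguistic_model"


if __name__ == "__main__":
    queue: List[Tuple[List[str], str]] = []
    stub = lambda toks: "transmitter"  # stand-in for the real linguistic model
    classes = ["pump", "motor", UNIDENTIFIED]
    print(route_prediction([10.0, 0.0, 0.0], classes, ["the", "pump"], stub, queue))
    print(route_prediction([1.0, 0.9, 0.0], classes, ["replace", "tx"], stub, queue))
```

The slide refers to the feedback step as reinforcement learning. The queue above only collects the fallback predictions and leaves the actual update procedure unspecified, since the deck does not describe it.
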