AAPG Geoscience Technical Workshop:
Boosting reserves and recovery using ML and Analytics
January 15-17, 2019
Marathon Oil Tower - Houston, TX
Challenges Faced with Processing Petrophysical Big Data for Assessing Viable Opportunities
CJ Ejimuda, MS; Emenike Ejimuda, PhD
Hybrid Data Solutions, Los Angeles, CA, USA
web: https://hybridata.us
A little about us
CJ Ejimuda
Full Stack Data Scientist / Principal, Hybrid Data Solutions
Mining more value by leveraging AI, IIoT, and Big Data
Domain expertise in reservoir and production engineering
ExxonMobil, Aera Energy
AAPG GTW: Boosting Reserves and Recovery Using ML and Analytics
January 16, 2019 - Houston, TX
Outline
● Why process petrophysical Big Data?
● What Big Data processing challenges?
● ETL Workflow
● ETL Automation
● Conclusion
● References
Why process petrophysical Big Data?
● Re-evaluate old well logs for new opportunities
● Conduct pre-drill analysis of offset wells
● Address the inability to effectively assess well / field reserves
● Address challenges in inferring geological features
What Big Data processing challenges?
● For 1 to 10 well log files?
- Copying each link and pasting it into the browser is straightforward
- Log data downloads quickly
- ETL is easy to perform on that amount of data
What Big Data processing challenges?
● For 1 to 10 well log files?
● For 1,000 well log files?
● Links to ~1,000 well log files from 5 fields, listed in Excel sheets
ETL Workflow
● Download each well log file individually from the web
● Read log data from each file
● Enrich the metadata and log data, and save them in Apache Arrow format before loading them to an AWS S3 bucket
● GOAL: Make the data ready for the Apache Spark ML and TensorFlow deep learning pipeline
ETL Automation
● Links to ~1,000 well log files from 5 fields, listed in Excel sheets
● Download each well log file individually from the web
- Get the links to the files
- Append all the extracted links to a list
- Handle errors
- Save each file
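A minimal sketch of these download steps, in Python with only the standard library. It assumes the Excel sheets have already been loaded as rows of cell values (for example with pandas `read_excel` followed by `.values.tolist()`) and that each link points directly at a log file. `extract_links`, `download_all`, and the rule of naming each file after the last segment of its URL path are hypothetical illustrations, not the authors' code.

```python
import os
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple


def extract_links(rows: Iterable[Iterable[object]]) -> List[str]:
    """Collect every cell that looks like an http(s) URL, in sheet order."""
    links: List[str] = []
    for row in rows:
        for cell in row:
            if isinstance(cell, str) and cell.strip().lower().startswith(("http://", "https://")):
                links.append(cell.strip())
    return links


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Default fetcher: download the raw bytes behind one URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(links: Iterable[str], out_dir: str,
                 fetch: Callable[[str], bytes] = fetch_url) -> Tuple[List[str], Dict[str, str]]:
    """Download each link into out_dir; record failures instead of stopping."""
    os.makedirs(out_dir, exist_ok=True)
    saved: List[str] = []
    errors: Dict[str, str] = {}
    for url in links:
        # Hypothetical naming rule: last path segment of the URL.
        # Two links with the same file name would overwrite each other.
        name = os.path.basename(urllib.parse.urlparse(url).path) or "download"
        try:
            data = fetch(url)
        except (OSError, ValueError) as exc:  # URLError/HTTPError/timeouts are OSErrors
            errors[url] = repr(exc)
            continue
        path = os.path.join(out_dir, name)
        with open(path, "wb") as fh:
            fh.write(data)
        saved.append(path)
    return saved, errors
```

Passing `fetch` as a parameter keeps the network call swappable, so the error handling can be exercised without touching the web, and failed URLs are collected rather than aborting a batch of ~1,000 files.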
ETL Automation
● Links to ~1,000 well log files from 5 fields, listed in Excel sheets
● Download each well log file individually from the web
● Read log data from each file
- Extract the log data and the metadata / header data
- Handle errors
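Assuming the well logs are LAS (Log ASCII Standard) 2.0 text files with unwrapped data (`WRAP. NO`), which is an assumption rather than something the slides state, the read step can be sketched as below. It pulls header values from the `~Well` section, curve names from `~Curve`, and numeric rows from `~A`, replacing the declared `NULL` value with `None`; files that fail to parse are collected as errors. `read_las` and `read_all` are hypothetical helpers; a maintained library such as lasio covers far more of the specification.

```python
from typing import Dict, List, Optional, Tuple

Header = Dict[str, str]
Rows = List[List[Optional[float]]]
ParsedLas = Tuple[Header, List[str], Rows]


def parse_header_line(line: str) -> Tuple[str, str, str, str]:
    """Split 'MNEM.UNIT  VALUE : DESCRIPTION' into its four parts."""
    mnem, _, rest = line.partition(".")
    # The unit runs from the period to the first space and may be empty.
    unit, _, rest = rest.partition(" ")
    # The description follows the last colon.
    value, _, desc = rest.rpartition(":")
    return mnem.strip(), unit.strip(), value.strip(), desc.strip()


def read_las(text: str) -> ParsedLas:
    """Return (well header values, curve mnemonics, data rows) for simple LAS 2.0 text."""
    header: Header = {}
    curves: List[str] = []
    rows: Rows = []
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("~"):
            section = line[1:2].upper()  # V, W, C, P, O or A
            continue
        if section == "W":
            mnem, _unit, value, _desc = parse_header_line(line)
            header[mnem] = value
        elif section == "C":
            curves.append(parse_header_line(line)[0])
        elif section == "A":
            rows.append([float(tok) for tok in line.split()])  # ValueError on bad data
    null = header.get("NULL")
    if null is not None:
        null_value = float(null)
        rows = [[None if v == null_value else v for v in row] for row in rows]
    return header, curves, rows


def read_all(files: Dict[str, str]) -> Tuple[Dict[str, ParsedLas], Dict[str, str]]:
    """Parse many files, keeping going when one of them is malformed."""
    parsed: Dict[str, ParsedLas] = {}
    errors: Dict[str, str] = {}
    for name, text in files.items():
        try:
            parsed[name] = read_las(text)
        except ValueError as exc:
            errors[name] = str(exc)
    return parsed, errors
```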
ETL Automation
● Links to ~1,000 well log files from 5 fields, listed in Excel sheets
● Download each well log file individually from the web
● Read log data from each file
● Enrich the metadata and log data, and save them in Apache Arrow format before loading them to an AWS S3 bucket
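One way to read "enrich" is to attach identifiers to every depth sample so that rows from different wells and fields can be combined and grouped later. The sketch below assumes the field name comes from the Excel sheet the link was listed in, and that the well's API number sits in the header under an `API` mnemonic, falling back to `UWI`; both conventions and the `enrich` helper are hypothetical.

```python
from typing import Dict, List, Optional


def enrich(header: Dict[str, str], curves: List[str],
           rows: List[List[Optional[float]]], field: str) -> List[Dict[str, object]]:
    """Turn one well's depth rows into records tagged with field and API number."""
    # Hypothetical convention: API number under "API", else "UWI", else empty.
    api = header.get("API") or header.get("UWI", "")
    records: List[Dict[str, object]] = []
    for row in rows:
        record: Dict[str, object] = {"FIELD": field, "API": api}
        record.update(zip(curves, row))  # one key per curve mnemonic
        records.append(record)
    return records
```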
ETL Automation
Why Apache Arrow?
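Apache Arrow is a columnar in-memory format: each column is stored contiguously rather than row by row, which is what lets Spark exchange batches with pandas (for example in `toPandas()` and pandas UDFs) without serializing one row at a time. The plain-Python sketch below only illustrates the row-to-column pivot; `to_columns` is a hypothetical helper, not part of pyarrow.

```python
from typing import Dict, List, Sequence


def to_columns(records: Sequence[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot row records into one list per column; missing keys become None."""
    names: List[str] = []
    for rec in records:
        for key in rec:
            if key not in names:  # keep first-seen column order
                names.append(key)
    return {name: [rec.get(name) for rec in records] for name in names}
```

With pyarrow installed, a dictionary of equal-length lists like this can be passed to `pyarrow.table()` and written with `pyarrow.feather.write_feather()` (Feather V2 is the Arrow IPC file format); a boto3 S3 client's `upload_file()` could then push the file to the bucket.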
ETL Automation
● Links to ~1,000 well log files from 5 fields, listed in Excel sheets
● Download each well log file individually from the web
● Read log data from each file
● Enrich the metadata and log data, and save them in Apache Arrow format before loading them to an AWS S3 bucket
● Make the data ready for the Apache Spark ML / Keras deep learning pipeline
- Drop columns (152 down to 13), drop duplicates and null / NA values, and account for missing values
- Split-apply-combine on data grouped by field and API, using @pandas_udf
- Cache the DataFrame
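The cleaning and split-apply-combine steps can be sketched in plain Python on a list of row dictionaries. This is a stand-in for the Spark version, not the authors' code: `clean`, `split_apply_combine`, and the example per-well transform `scale_gr` (min-max scaling of a gamma-ray column named `GR`) are hypothetical, and representing missing values as `None` is an assumption.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Record = Dict[str, object]


def clean(records: Iterable[Record], keep: Sequence[str]) -> List[Record]:
    """Select columns, then drop rows with missing values and exact duplicates."""
    seen = set()
    out: List[Record] = []
    for rec in records:
        row = tuple(rec.get(col) for col in keep)
        if any(value is None for value in row) or row in seen:
            continue
        seen.add(row)
        out.append(dict(zip(keep, row)))
    return out


def split_apply_combine(records: Sequence[Record], keys: Sequence[str],
                        func: Callable[[List[Record]], List[Record]]) -> List[Record]:
    """Group by `keys`, apply `func` to each group, and concatenate the results."""
    groups: Dict[Tuple[object, ...], List[Record]] = {}
    for rec in records:
        groups.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    combined: List[Record] = []
    for group in groups.values():  # dicts keep first-seen group order
        combined.extend(func(group))
    return combined


def scale_gr(group: List[Record]) -> List[Record]:
    """Hypothetical per-well step: min-max scale GR within one well."""
    values = [float(rec["GR"]) for rec in group]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero for constant curves
    return [{**rec, "GR_NORM": (float(rec["GR"]) - low) / span} for rec in group]
```

In PySpark the same shape is a grouped-map pandas UDF: `df.groupby("FIELD", "API").applyInPandas(func, schema)` in Spark 3.0+, or a function decorated with `@pandas_udf(schema, PandasUDFType.GROUPED_MAP)` passed to `groupby(...).apply()` in Spark 2.3/2.4. Calling `df.cache()` on the cleaned DataFrame marks it for reuse across repeated ML passes.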
ETL Automation - Potential Next Steps
● Don't Repeat Yourself (DRY)
● Use Apache Airflow to orchestrate the ETL process
- Define a DAG
- Use Dummy, Sensor, and Python operators as needed (with XCom for passing data between tasks)
- Use AWS services (S3, EMR…), Azure, or GCP services
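The orchestration idea can be illustrated without Airflow: the sketch below runs ETL steps in dependency order with the standard library's `graphlib` (Python 3.9+) and hands each task the results of the tasks before it, loosely mirroring XCom. It is not Airflow's API; `run_dag` and the task names in the usage note are hypothetical. In Airflow itself, each step would typically be a `PythonOperator` inside a `DAG`, with dependencies declared using `>>`.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, Set

Task = Callable[[Dict[str, object]], object]


def run_dag(tasks: Dict[str, Task], deps: Dict[str, Set[str]]) -> Dict[str, object]:
    """Run every task after its upstream tasks, sharing results XCom-style."""
    # Every task appears in the graph, even if it has no upstream dependencies.
    graph = {name: set(deps.get(name, set())) for name in tasks}
    results: Dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():  # raises CycleError on cycles
        results[name] = tasks[name](results)  # each task reads upstream outputs
    return results
```

For example, tasks named `download`, `read`, `enrich`, and `load_to_s3`, with `deps = {"read": {"download"}, "enrich": {"read"}, "load_to_s3": {"enrich"}}`, run in exactly that order, and `load_to_s3` can read whatever `enrich` returned.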
Conclusion
● Moving toward real-time data processing:
- WITSML data processing
● Streaming tools: Apache Kafka, Apache Flink, Apache Storm, Apache Spark
References
● https://www.slideshare.net/wesm/high-performance-python-on-apache-spark
● https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html
● https://airflow.apache.org/installation.html
Questions?
Thank you!
email: cj@hybriData.us
web: https://hybriData.us
More Related Content

What's hot

Bigdata 2016- projects list
Bigdata  2016- projects listBigdata  2016- projects list
Bigdata 2016- projects listNEWZEN INFOTECH
 
Leveraging big data to maximize value from rail and power infrastructure assets.
Leveraging big data to maximize value from rail and power infrastructure assets.Leveraging big data to maximize value from rail and power infrastructure assets.
Leveraging big data to maximize value from rail and power infrastructure assets.Chijioke “CJ” Ejimuda
 
Railroad Modeling at Hadoop Scale
Railroad Modeling at Hadoop ScaleRailroad Modeling at Hadoop Scale
Railroad Modeling at Hadoop ScaleDataWorks Summit
 
Alan Crosswell Canarie20090304
Alan Crosswell  Canarie20090304Alan Crosswell  Canarie20090304
Alan Crosswell Canarie20090304Bill St. Arnaud
 
HTML Flight Scraper
HTML Flight Scraper HTML Flight Scraper
HTML Flight Scraper Anthony Kilde
 
SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"
SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"
SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"BigData_Europe
 
Managing Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FMEManaging Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FMESafe Software
 
Final Project Presentation
Final Project PresentationFinal Project Presentation
Final Project PresentationM Zubair Iqbal
 
Weather exploratory data analysis
Weather   exploratory data analysisWeather   exploratory data analysis
Weather exploratory data analysismadhucharis
 

What's hot (20)

Data Sources
Data SourcesData Sources
Data Sources
 
AWS_ac_ra_loganalysis_11
AWS_ac_ra_loganalysis_11AWS_ac_ra_loganalysis_11
AWS_ac_ra_loganalysis_11
 
Bigdata 2016- projects list
Bigdata  2016- projects listBigdata  2016- projects list
Bigdata 2016- projects list
 
Advait kulkarni
Advait kulkarniAdvait kulkarni
Advait kulkarni
 
Airline Analysis of Data Using Hadoop
Airline Analysis of Data Using HadoopAirline Analysis of Data Using Hadoop
Airline Analysis of Data Using Hadoop
 
Leveraging big data to maximize value from rail and power infrastructure assets.
Leveraging big data to maximize value from rail and power infrastructure assets.Leveraging big data to maximize value from rail and power infrastructure assets.
Leveraging big data to maximize value from rail and power infrastructure assets.
 
Railroad Modeling at Hadoop Scale
Railroad Modeling at Hadoop ScaleRailroad Modeling at Hadoop Scale
Railroad Modeling at Hadoop Scale
 
Resume (kaushik shakkari)
Resume (kaushik shakkari)Resume (kaushik shakkari)
Resume (kaushik shakkari)
 
Alan Crosswell Canarie20090304
Alan Crosswell  Canarie20090304Alan Crosswell  Canarie20090304
Alan Crosswell Canarie20090304
 
HTML Flight Scraper
HTML Flight Scraper HTML Flight Scraper
HTML Flight Scraper
 
SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"
SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"
SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"
 
Managing Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FMEManaging Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FME
 
A-ONE consultants
A-ONE consultantsA-ONE consultants
A-ONE consultants
 
Shenoy resume
Shenoy resumeShenoy resume
Shenoy resume
 
DataAnalysis
DataAnalysisDataAnalysis
DataAnalysis
 
GIS #7
GIS #7GIS #7
GIS #7
 
Data Analyst Track
Data Analyst TrackData Analyst Track
Data Analyst Track
 
Time_Series_Assignment
Time_Series_AssignmentTime_Series_Assignment
Time_Series_Assignment
 
Final Project Presentation
Final Project PresentationFinal Project Presentation
Final Project Presentation
 
Weather exploratory data analysis
Weather   exploratory data analysisWeather   exploratory data analysis
Weather exploratory data analysis
 

Similar to AAPG Geoscience Technology Workshop 2019

Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...DataBench
 
Exploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured StreamingExploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured Streamingt_ivanov
 
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Amazon Web Services
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoDB Database
 
Introduction to Apache Spark 2.0
Introduction to Apache Spark 2.0Introduction to Apache Spark 2.0
Introduction to Apache Spark 2.0Knoldus Inc.
 
Big Data Driven At Eway
Big Data Driven At Eway Big Data Driven At Eway
Big Data Driven At Eway Tu Pham
 
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Databricks
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSKimmo Kantojärvi
 
Measuring, Quantifying, & Predicting the Cost-Accuracy Tradeoff
Measuring, Quantifying, & Predicting the Cost-Accuracy TradeoffMeasuring, Quantifying, & Predicting the Cost-Accuracy Tradeoff
Measuring, Quantifying, & Predicting the Cost-Accuracy TradeoffHong-Linh Truong
 
UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...
UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...
UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...Karen Cannell
 
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019javier ramirez
 
Google cloud platform
Google cloud platformGoogle cloud platform
Google cloud platformrajdeep
 
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...Databricks
 
Processing genetic data at scale
Processing genetic data at scaleProcessing genetic data at scale
Processing genetic data at scaleMark Schroering
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesTaro L. Saito
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi
 
Syngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DSyngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DMichael Swanson
 
Evaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJAEvaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJADataWorks Summit
 

Similar to AAPG Geoscience Technology Workshop 2019 (20)

NAPE 2019 Presentation
NAPE 2019 PresentationNAPE 2019 Presentation
NAPE 2019 Presentation
 
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
 
Exploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured StreamingExploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured Streaming
 
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
 
Introduction to Apache Spark 2.0
Introduction to Apache Spark 2.0Introduction to Apache Spark 2.0
Introduction to Apache Spark 2.0
 
Big Data Driven At Eway
Big Data Driven At Eway Big Data Driven At Eway
Big Data Driven At Eway
 
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
 
Os Lonergan
Os LonerganOs Lonergan
Os Lonergan
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 
Measuring, Quantifying, & Predicting the Cost-Accuracy Tradeoff
Measuring, Quantifying, & Predicting the Cost-Accuracy TradeoffMeasuring, Quantifying, & Predicting the Cost-Accuracy Tradeoff
Measuring, Quantifying, & Predicting the Cost-Accuracy Tradeoff
 
UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...
UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...
UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...
 
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
 
Google cloud platform
Google cloud platformGoogle cloud platform
Google cloud platform
 
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
 
Processing genetic data at scale
Processing genetic data at scaleProcessing genetic data at scale
Processing genetic data at scale
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 Updates
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
Syngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DSyngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&D
 
Evaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJAEvaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJA
 

More from Chijioke “CJ” Ejimuda

Revolutionizing Crtitical Infrastructure Connectivity
Revolutionizing Crtitical Infrastructure ConnectivityRevolutionizing Crtitical Infrastructure Connectivity
Revolutionizing Crtitical Infrastructure ConnectivityChijioke “CJ” Ejimuda
 
Internet of Energy: "Can python prevent California wildfires?"
Internet of Energy: "Can python prevent California wildfires?"Internet of Energy: "Can python prevent California wildfires?"
Internet of Energy: "Can python prevent California wildfires?"Chijioke “CJ” Ejimuda
 
Using Deep Learning and Computer Vision to improve Corrosion risk assessments
Using Deep Learning and Computer Vision to improve Corrosion risk assessmentsUsing Deep Learning and Computer Vision to improve Corrosion risk assessments
Using Deep Learning and Computer Vision to improve Corrosion risk assessmentsChijioke “CJ” Ejimuda
 
Could Edge Computing Lead to the End of Real Time Operating Centers?
Could Edge Computing Lead to the End of Real Time Operating Centers?Could Edge Computing Lead to the End of Real Time Operating Centers?
Could Edge Computing Lead to the End of Real Time Operating Centers?Chijioke “CJ” Ejimuda
 
Optimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQLOptimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQLChijioke “CJ” Ejimuda
 
Self Driving Directional Drilling on the Edge
Self Driving Directional Drilling on the EdgeSelf Driving Directional Drilling on the Edge
Self Driving Directional Drilling on the EdgeChijioke “CJ” Ejimuda
 
hybriData IIoT Workshop for AAPG Short Course
hybriData IIoT Workshop for AAPG Short CoursehybriData IIoT Workshop for AAPG Short Course
hybriData IIoT Workshop for AAPG Short CourseChijioke “CJ” Ejimuda
 
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> FacilityIIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> FacilityChijioke “CJ” Ejimuda
 

More from Chijioke “CJ” Ejimuda (11)

Revolutionizing Crtitical Infrastructure Connectivity
Revolutionizing Crtitical Infrastructure ConnectivityRevolutionizing Crtitical Infrastructure Connectivity
Revolutionizing Crtitical Infrastructure Connectivity
 
Internet of Energy: "Can python prevent California wildfires?"
Internet of Energy: "Can python prevent California wildfires?"Internet of Energy: "Can python prevent California wildfires?"
Internet of Energy: "Can python prevent California wildfires?"
 
Using Deep Learning and Computer Vision to improve Corrosion risk assessments
Using Deep Learning and Computer Vision to improve Corrosion risk assessmentsUsing Deep Learning and Computer Vision to improve Corrosion risk assessments
Using Deep Learning and Computer Vision to improve Corrosion risk assessments
 
Learning from Autonomous Vehicle Industry
Learning from Autonomous Vehicle IndustryLearning from Autonomous Vehicle Industry
Learning from Autonomous Vehicle Industry
 
Could Edge Computing Lead to the End of Real Time Operating Centers?
Could Edge Computing Lead to the End of Real Time Operating Centers?Could Edge Computing Lead to the End of Real Time Operating Centers?
Could Edge Computing Lead to the End of Real Time Operating Centers?
 
Optimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQLOptimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQL
 
Self Driving Directional Drilling on the Edge
Self Driving Directional Drilling on the EdgeSelf Driving Directional Drilling on the Edge
Self Driving Directional Drilling on the Edge
 
hybriData Energy Services and Data Products
hybriData Energy Services and Data ProductshybriData Energy Services and Data Products
hybriData Energy Services and Data Products
 
hybriData IIoT Workshop for AAPG Short Course
hybriData IIoT Workshop for AAPG Short CoursehybriData IIoT Workshop for AAPG Short Course
hybriData IIoT Workshop for AAPG Short Course
 
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> FacilityIIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
 
elasticsearch X react
elasticsearch X reactelasticsearch X react
elasticsearch X react
 

Recently uploaded

microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacingjaychoudhary37
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 

Recently uploaded (20)

Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacing
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 

AAPG Geoscience Technology Workshop 2019

  • 8. ETL Automation
    ● Links to ~1000 well log data files from 5 fields, listed in Excel sheets
    ● Download each well log file individually from the web:
      - get the links to the files
      - append all the extracted links to a list
      - account for errors
      - save the file
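A minimal standard-library sketch of these download steps follows. It assumes the spreadsheet rows are already available as lists of cell values (for example exported from the Excel sheets) and that every link is an http(s) URL in a cell; `extract_links`, `fetch_bytes` and `download_all` are hypothetical names, not the authors' code.

```python
import os
import re
import urllib.request
from typing import Callable, Iterable, List, Tuple

# Assumption: a "link" is any http(s) URL found inside a spreadsheet cell.
URL_RE = re.compile(r"https?://[^\s,;\"']+")


def extract_links(rows: Iterable[Iterable[object]]) -> List[str]:
    """Collect every URL in the sheet rows, de-duplicated, in first-seen order."""
    links: List[str] = []
    seen = set()
    for row in rows:
        for cell in row:
            for url in URL_RE.findall(str(cell)):
                if url not in seen:
                    seen.add(url)
                    links.append(url)
    return links


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Default fetcher: a plain HTTP GET using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(
    links: List[str],
    out_dir: str,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Save each link under out_dir; record failures instead of stopping the batch."""
    os.makedirs(out_dir, exist_ok=True)
    saved: List[str] = []
    failed: List[Tuple[str, str]] = []
    for i, url in enumerate(links):
        base = os.path.basename(url.split("?", 1)[0]) or "file"
        path = os.path.join(out_dir, f"{i:04d}_{base}")  # index prefix avoids name clashes
        try:
            data = fetch(url)
        except Exception as exc:  # e.g. HTTP 404, timeouts, DNS failures
            failed.append((url, repr(exc)))
            continue
        with open(path, "wb") as fh:
            fh.write(data)
        saved.append(path)
    return saved, failed
```

Passing `fetch` as a parameter keeps the loop testable without network access, and collecting failures (rather than raising) lets a batch of ~1000 files finish so the failed links can be retried afterwards. Rows could come from, e.g., `pandas.read_excel(path).values.tolist()`.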
  • 9. ETL Automation
  • 10. ETL Automation
  • 11. ETL Automation
    ● Links to ~1000 well log data files from 5 fields, listed in Excel sheets
    ● Download each well log file individually from the web
    ● Read the log data from each file:
      - extract each file's log data and its metadata (header) data
      - account for errors
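The deck does not name the file format. Assuming the well logs are LAS 2.0 (Log ASCII Standard) text files, a minimal reader that separates header metadata from curve data, and records per-file errors instead of aborting, could look like this. `parse_las` and `read_all` are hypothetical helpers that skip many LAS edge cases; an established library such as lasio (`lasio.read`) is the more robust choice.

```python
from typing import Dict, List, Optional, Tuple

Parsed = Tuple[Dict[str, str], List[str], List[List[Optional[float]]]]


def parse_las(text: str) -> Parsed:
    """Minimal LAS 2.0 reader returning (header, curve_mnemonics, rows)."""
    header: Dict[str, str] = {}
    curves: List[str] = []
    values: List[float] = []
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("~"):
            section = line[1:2].upper()  # V, W, C, P, O or A
            continue
        if section == "A":
            # Flatten all numbers; rows are rebuilt below (also tolerates wrapped data).
            values.extend(float(tok) for tok in line.split())
        elif section in ("V", "W", "C", "P"):
            # Header line layout: MNEM.UNIT  VALUE : DESCRIPTION
            mnem, _, rest = line.partition(".")
            _unit, _, rest = rest.partition(" ")
            value = rest.rsplit(":", 1)[0] if ":" in rest else rest
            if section == "C":
                curves.append(mnem.strip())
            else:
                header[mnem.strip()] = value.strip()
    if not curves:
        raise ValueError("no ~Curve section found")
    if len(values) % len(curves):
        raise ValueError("data values do not divide evenly into curves")
    null = float(header.get("NULL", "-999.25"))
    rows = []
    for i in range(0, len(values), len(curves)):
        chunk = values[i:i + len(curves)]
        rows.append([None if v == null else v for v in chunk])  # NULL sentinel -> None
    return header, curves, rows


def read_all(files: Dict[str, str]) -> Tuple[Dict[str, Parsed], Dict[str, str]]:
    """Parse many LAS texts, collecting failures per file instead of stopping."""
    parsed: Dict[str, Parsed] = {}
    errors: Dict[str, str] = {}
    for name, text in files.items():
        try:
            parsed[name] = parse_las(text)
        except ValueError as exc:
            errors[name] = str(exc)
    return parsed, errors
```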
  • 12. ETL Automation
  • 13. ETL Automation
    ● Links to ~1000 well log data files from 5 fields, listed in Excel sheets
    ● Download each well log file individually from the web
    ● Read the log data from each file
    ● Enrich the metadata and log data files, save them in the Apache Arrow data format, then load them to an AWS S3 bucket
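The slide does not spell out what "enrich" involves. One plausible reading, sketched here under that assumption, is to pivot each file's rows into columns (Arrow is a columnar format) and repeat well-level metadata such as field, API number and well name on every row, so that files from different wells can later be combined and grouped. `enrich_to_columns` and the `API`/`WELL` header keys are assumptions.

```python
from typing import Dict, List, Optional


def enrich_to_columns(
    header: Dict[str, str],
    curves: List[str],
    rows: List[List[Optional[float]]],
    field: str,
) -> Dict[str, list]:
    """Pivot rows into columns and repeat well-level metadata on every row."""
    for row in rows:
        if len(row) != len(curves):
            raise ValueError("row length does not match number of curves")
    columns: Dict[str, list] = {
        name: [row[i] for row in rows] for i, name in enumerate(curves)
    }
    n = len(rows)
    # Hypothetical metadata columns: `field` comes from the link spreadsheet,
    # while API and WELL are assumed to be mnemonics in the file header.
    columns["field"] = [field] * n
    columns["api"] = [header.get("API", "")] * n
    columns["well"] = [header.get("WELL", "")] * n
    return columns
```

With pyarrow installed, such a column dict can become an Arrow table via `pyarrow.Table.from_pydict(columns)` and be written as an Arrow IPC (Feather v2) file with `pyarrow.feather.write_feather(table, path)`; `boto3.client("s3").upload_file(path, bucket, key)` would then upload it, with bucket and key depending on the reader's setup.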
  • 14. ETL Automation
    ● Why Apache Arrow?
  • 15. ETL Automation
    ● Links to ~1000 well log data files from 5 fields, listed in Excel sheets
    ● Download each well log file individually from the web
    ● Read the log data from each file
    ● Enrich the metadata and log data files, save them in the Apache Arrow data format, then load them to an AWS S3 bucket
    ● Make the data ready for an Apache Spark ML / Keras deep learning pipeline:
      - drop columns (from 152 down to 13), drop duplicates and null/NA values, and account for missing values
      - split-apply-combine on data grouped by field and API number, using @pandas_udf
      - cache the DataFrame
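These preparation steps can be illustrated without a Spark cluster. The sketch below uses plain Python records to show column selection, null and duplicate removal, and split-apply-combine grouped by field and API number. The per-well step `ffill_by_depth` (sort by depth, forward-fill missing curve values) is a hypothetical choice for "account for missing values", not necessarily what the authors did, and the `DEPT`, `field` and `api` column names are assumptions.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]


def clean(records: Sequence[Record], keep: Sequence[str], required: Sequence[str]) -> List[Record]:
    """Keep selected columns, drop rows missing a required value, then drop exact duplicates."""
    seen = set()
    out: List[Record] = []
    for rec in records:
        if any(rec.get(k) is None for k in required):
            continue
        slim = {k: rec.get(k) for k in keep}
        key = tuple(slim[k] for k in keep)
        if key in seen:
            continue
        seen.add(key)
        out.append(slim)
    return out


def split_apply_combine(
    records: Sequence[Record],
    by: Sequence[str],
    func: Callable[[List[Record]], List[Record]],
) -> List[Record]:
    """Split records into groups by the `by` keys, apply func per group, concatenate results."""
    groups: Dict[tuple, List[Record]] = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in by)].append(rec)
    combined: List[Record] = []
    for key in sorted(groups):  # deterministic group order
        combined.extend(func(groups[key]))
    return combined


def ffill_by_depth(group: List[Record], depth: str = "DEPT") -> List[Record]:
    """Hypothetical per-well step: sort by depth and forward-fill missing values."""
    last: Record = {}
    filled: List[Record] = []
    for rec in sorted(group, key=lambda r: r[depth]):
        new = dict(rec)
        for k, v in rec.items():
            if v is None and k in last:
                new[k] = last[k]
            elif v is not None:
                last[k] = v
        filled.append(new)
    return filled
```

In PySpark the cleaning maps onto `df.select(...)`, `df.na.drop(subset=[...])` and `df.dropDuplicates()`, followed by `df.cache()`. The grouped step is where `@pandas_udf` fits: in Spark 2.3/2.4 a function decorated with `@pandas_udf(schema, PandasUDFType.GROUPED_MAP)` is applied via `df.groupby("field", "api").apply(fn)`, while Spark 3.x offers `df.groupBy("field", "api").applyInPandas(fn, schema)`. Inside such a function, `pdf.sort_values("DEPT").ffill()` is a pandas counterpart of `ffill_by_depth`.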
  • 16. ETL Automation - Potential Next Steps
    ● Don't Repeat Yourself (DRY)
    ● Use Apache Airflow to orchestrate the ETL process:
      - define a DAG
      - you may use Dummy, Sensor and Python operators (* with XCom)
      - use AWS services (S3, EMR…), Azure or GCP services
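As a toy illustration of what an Airflow DAG provides (tasks that run only after their upstream tasks, with small results handed between tasks the way XCom does), here is a dependency-ordered runner built on the standard library's `graphlib` (Python 3.9+). It is not Airflow's API; `run_dag` and `pull` are hypothetical names.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, Iterable

Task = Callable[[Callable[[str], Any]], Any]


def run_dag(tasks: Dict[str, Task], deps: Dict[str, Iterable[str]]) -> Dict[str, Any]:
    """Run every task after its upstream tasks; `pull` returns an upstream task's result."""
    # graphlib expects {node: predecessors}; include tasks with no dependencies too.
    graph = {name: list(deps.get(name, ())) for name in tasks}
    results: Dict[str, Any] = {}

    def pull(task_id: str) -> Any:
        # Loosely mirrors XCom: read the value another task returned.
        return results[task_id]

    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        results[name] = tasks[name](pull)
    return results
```

In Airflow itself the same structure is declared with a `DAG`, `PythonOperator` tasks and `>>` dependencies; `DummyOperator` (replaced by `EmptyOperator` in later Airflow 2.x releases) marks start and end points, sensors such as the Amazon provider's `S3KeySensor` wait for a file to appear in S3, and tasks exchange values through `ti.xcom_push`/`ti.xcom_pull` (a `PythonOperator`'s return value is pushed to XCom by default).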
  • 17. Conclusion
    ● Moving towards real-time data processing:
      - WITSML data processing
    ● Apache Kafka, Apache Flink, Apache Storm, Apache Spark
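The slide gives no detail on WITSML processing. As a rough sketch of the record-at-a-time style that real-time pipelines rely on, the generator below parses log rows incrementally with `xml.etree.ElementTree.iterparse` and yields each row as soon as it is read. The input is assumed to be a simplified fragment modeled loosely on the WITSML 1.4.1 `logData` block (a comma-separated `mnemonicList` plus comma-separated `data` rows); real WITSML documents carry far more structure, `stream_log_rows` is a hypothetical name, and none of the streaming engines named on the slide is used here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, IO, Iterator, List, Optional


def _local(tag: str) -> str:
    """Drop an XML namespace prefix: '{http://...}data' -> 'data'."""
    return tag.rsplit("}", 1)[-1]


def _num(text: str) -> Optional[float]:
    """Convert a cell to float; empty or non-numeric cells become None."""
    text = text.strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def stream_log_rows(source: IO[bytes]) -> Iterator[Dict[str, Optional[float]]]:
    """Yield one dict per <data> row, keyed by the most recent <mnemonicList>."""
    mnemonics: List[str] = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = _local(elem.tag)
        if name == "mnemonicList":
            mnemonics = [m.strip() for m in (elem.text or "").split(",")]
        elif name == "data" and mnemonics:
            cells = (elem.text or "").split(",")
            yield {m: _num(v) for m, v in zip(mnemonics, cells)}
            elem.clear()  # release the row's text once it has been yielded
```

In a streaming setup, each yielded dict could be published to a message broker such as a Kafka topic or consumed by a Spark or Flink job, rather than collected into a list.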
  • 18. References
    ● https://www.slideshare.net/wesm/high-performance-python-on-apache-spark
    ● https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html
    ● https://airflow.apache.org/installation.html
  • 19. Questions?
  • 20. Thank you!
    email: cj@hybriData.us
    web: https://hybriData.us
  • 21. Back Up