E&P organizations are turning more attention to accumulated data to enhance operating efficiency, safety, and recovery. The computing paradigm is shifting, the O&G paradigm is shifting, and the rise of the machine learning paradigm requires careful attention to top-down integrated systems engineering. A system approach will be presented to stimulate out-of-the-box thinking to address the machine learning paradigm.
Machine Learning encompasses data acquisition, transmission, retention, analysis, and reduction. The expected outgrowth of 24x7 data systems and operations centers is Knowledge Engineering and Data Intensive Analytics AKA Machine Learning. This presentation will develop and apply Machine Learning concepts to the Upstream O&G industry. Specific focus will be given to the fundamental concepts and definitions of Machine Learning along with the application of Machine Learning.
• Efficient O&G does not suffice in an industry downturn; effective investment of time and effort is required to rise above the pack.
• Production analysis need not be mystical, but it should not be rote.
• Nuance and subtle variations provide leading indicators of impending production issues.
• Decline curves, certainly crucial, must be analyzed in context (a minimal fitting sketch follows below).
• Case-based analysis, topological analysis, rule inference, and curve-plotting solutions are common, but fall short.
• Application of nuance analysis within an environment of Data-Intensive Scientific Discovery.
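To make the decline-curve point concrete, here is a minimal sketch that fits a hyperbolic Arps decline model to monthly production with SciPy. The rates, noise level, and parameter bounds are illustrative assumptions, not data from this presentation; the point is that the fitted curve gives a baseline against which nuance (residual behavior) can be examined.

```python
# Minimal sketch: fitting an Arps hyperbolic decline curve to monthly production.
# The production values below are synthetic placeholders, not real well data.
import numpy as np
from scipy.optimize import curve_fit

def arps_hyperbolic(t, qi, di, b):
    """Arps hyperbolic decline: q(t) = qi / (1 + b*di*t)^(1/b)."""
    return qi / np.power(1.0 + b * di * t, 1.0 / b)

t = np.arange(24)                                  # months on production
q = 1000.0 / np.power(1.0 + 0.6 * 0.15 * t, 1.0 / 0.6)
q = q * (1.0 + 0.05 * np.random.default_rng(0).standard_normal(t.size))  # noisy "observed" rates

params, _ = curve_fit(
    arps_hyperbolic, t, q,
    p0=[q[0], 0.1, 0.5],                           # initial guesses for qi, di, b
    bounds=([0.0, 1e-4, 0.01], [np.inf, 2.0, 2.0]),
)
qi, di, b = params
print(f"qi={qi:.1f}, di={di:.3f}/month, b={b:.2f}")

# Residuals against the fitted curve are one place to look for the subtle,
# leading indicators of impending production issues mentioned above.
residuals = q - arps_hyperbolic(t, *params)
print("max |residual|:", np.abs(residuals).max())
```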
High Performance Data Analytics and a Java Grande Run Time, by Geoffrey Fox
There is perhaps a broad consensus as to important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development.
However, the same is not so true for data-intensive computing, even though commercial clouds devote many more resources to data analytics than supercomputers devote to simulations.
Here we use a sample of over 50 big data applications to identify characteristics of data intensive applications and to deduce needed runtime and architectures.
We propose a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks.
Our analysis builds on the Apache software stack that is well used in modern cloud computing.
We give some examples including clustering, deep-learning and multi-dimensional scaling.
One suggestion from this work is the value of a high-performance Java (Grande) runtime that supports both simulations and big data.
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C..., by Geoffrey Fox
Describes the relations between Big Data and Big Simulation applications and how these can guide a Big Data - Exascale (Big Simulation) convergence (as in the National Strategic Computing Initiative) and lead to a "complete" set of benchmarks. The basic idea is to view use cases as "Data" + "Model".
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ..., by Geoffrey Fox
Keynote at the Sixth International Workshop on Cloud Data Management (CloudDB 2014), Chicago, March 31, 2014.
Abstract: We introduce the NIST collection of 51 use cases and describe their scope over industry, government and research areas. We look at their structure from several points of view or facets covering problem architecture, analytics kernels, micro-system usage such as flops/bytes, application class (GIS, expectation maximization) and very importantly data source.
We then propose that in many cases it is wise to combine the well known commodity best practice (often Apache) Big Data Stack (with ~120 software subsystems) with high performance computing technologies.
We describe this and give early results based on clustering running with different paradigms.
We identify key layers where HPC Apache integration is particularly important: File systems, Cluster resource management, File and object data management, Inter process and thread communication, Analytics libraries, Workflow and Monitoring.
See
[1] A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures, Shantenu Jha, Judy Qiu, Andre Luckow, Pradeep Mantha and Geoffrey Fox, accepted in IEEE BigData 2014, available at: http://arxiv.org/abs/1403.1528
[2] High Performance High Functionality Big Data Software Stack, G Fox, J Qiu and S Jha, in Big Data and Extreme-scale Computing (BDEC), 2014. Fukuoka, Japan. http://grids.ucs.indiana.edu/ptliupages/publications/HPCandApacheBigDataFinal.pdf
Big Data HPC Convergence and a bunch of other things, by Geoffrey Fox
This talk supports the Ph.D. in Computational & Data Enabled Science & Engineering at Jackson State University. It describes related educational activities at Indiana University, the Big Data phenomena, jobs and HPC and Big Data computations. It then describes how HPC and Big Data can be converged into a single theme.
An introduction to data science: from the very beginning of the data science idea, to the latest designs, changing trends, and technologies, through to applications already in real-world use today.
Classification of Big Data Use Cases by different Facets, by Geoffrey Fox
Ogres classify Big Data applications by multiple facets, each with several exemplars and features. This gives a guide to the breadth and depth of Big Data and allows one to examine which Ogres a particular architecture/software stack supports.
Adding Open Data Value to 'Closed Data' Problems, by Simon Price
Drawing on cutting edge examples from the University of Bristol and the City of Bristol, Simon will discuss innovative applications of data science that derive business value from open data through enriching and integrating with confidential 'closed data'. He also highlights recent technological advances that are enabling open data science on highly sensitive closed data.
Predictive Analytics: Context and Use Cases
Historical context for the successful implementation of predictive analytic techniques, with examples of successful use cases.
Multiple regression, covid mobility, and Covid-19 policy recommendation, by Kan Yuenyong
Multiple regression analysis and Covid-19 policy is the contemporary agenda. It demonstrates how to use Python for data wrangling and R for statistical analysis, in a form suitable for publication in a standard academic journal. The model examines whether lockdown policy is relevant to controlling the Covid-19 outbreak.
Artificial Intelligence, Machine Learning and Deep Learning, by Sujit Pal
Slides for a talk Abhishek Sharma and I gave at the Gennovation tech talks (https://gennovationtalks.com/) at Genesis. The talk was part of outreach for the Deep Learning Enthusiasts meetup group in San Francisco. My part of the talk is covered in slides 19-34.
Machine Learning in Oil and Gas - April 18-19, 2018, by Mark Reynolds
The New IT Paradigm in Data Driven Energy
Three years ago, Machine Learning, the 4th Scientific Paradigm, and eScience were seldom discussed in O&G. Today, these topics and the role of data, data knowledge, and artificial intelligence are found in planning, engineering, and improvement. The New IT Paradigm and the pragmatic realities of Data Driven Energy are explored in this presentation.
Azure Machine Learning is a cloud predictive analytics service that makes it possible to quickly create and deploy predictive models as analytics solutions. Getting started is easy: the first working prototype is an easy evening project. But Azure Machine Learning will scale to extremely complex projects. This session will demonstrate initial projects utilizing multiple data science principles.
Comparing Big Data and Simulation Applications and Implications for Software ..., by Geoffrey Fox
At eScience in the Cloud 2014, Redmond WA, April 30 2014
There is perhaps a broad consensus as to important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development.
However, the same is not so true for data-intensive computing, even though commercial clouds devote many more resources to data analytics than supercomputers devote to simulations.
We look at a sample of over 50 big data applications to identify characteristics of data intensive applications and to deduce needed runtime and architectures.
We suggest a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks.
Our analysis builds on combining HPC and the Apache software stack that is well used in modern cloud computing.
Initial results on Azure and HPC Clusters are presented
How Data Science Can Grow Your Business? by Noam Cohen
What is data science?
How is it used in the industry?
DS methodology and life cycle
Who are the Data-team members?
Limitations and caveats
(**Google slides upload didn't go well)
DataOps @ Scale: A Modern Framework for Data Management in the Public Sector, by TamrMarketing
Within the last 6 months, U.S. agencies have begun defining a “Data Science Occupational Series”.
This means adding the term “(Data Scientist)” at the end of a job title to increase the odds of finding a candidate that understands data.
Watch the full presentation: https://resources.tamr.com/govdataops
Introduction to Data Mining (Chapter 1): Data Mining Concepts and Techniques, by R. Deepa (IT), Batch (2016-2019), published Oct 13, 2018, NS College of Arts and Science, Theni.
Data mining and Machine learning explained in jargon free & lucid language, by q-Maxim
Data mining and machine learning explained in jargon-free and lucid language. By reading it, one can get some intuition about what data mining and machine learning are all about and apply it in their own work.
Big data is data sets that are so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and information privacy.
Hadoop Training in Chennai from BigDataTraining.IN: BigDataTraining.IN is a leading global talent development corporation, building a skilled manpower pool for global industry requirements. BigDataTraining.IN has today grown to be among the world's leading talent development companies, offering learning solutions to individuals, institutions & corporate clients.
Adjusting primitives for graph: SHORT REPORT / NOTES, by Subhajit Sahu
Graph algorithms, like PageRank. Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23..., by John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Opendatabay - Open Data Marketplace.pptx, by Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
2. 1
Introduction to Southwestern Energy
Southwestern Energy Company (NYSE: SWN) is a leading natural gas and oil company with operations predominantly in the United States, engaged in exploration, development and production activities, including related natural gas gathering and marketing.
Source: http://www.swn.com/
3. 2
Machine Learning, Deep Learning, AI
Roadmap to Constructing a Top-Down Machine Learning Paradigm
E&P organizations are turning more attention to accumulated data to enhance operating efficiency, safety, and recovery. The computing paradigm is shifting, the O&G paradigm is shifting, and the rise of the machine learning paradigm requires careful attention to top-down integrated systems engineering. A system approach will be presented to stimulate out-of-the-box thinking to address the machine learning paradigm.
4. 3
The Shifting O&G Paradigm
Past Paradigm Shifts
• Seismic
• Horizontal Drilling
• Off Shore
• Factory Drilling
Paradigm Shifts in Process
• Big Crew Change
• Mobility (anytime, anywhere)
• Big Data
• Machine Learning
Source: Mark Reynolds, compilation
5. 4
Paradigms We Are Discussing Today
Changing Paradigms
• Computing Paradigm (4th Paradigm / eScience)
• O&G Paradigm (Shale 2.0)
New Paradigms
• Machine Learning Paradigm
6. 5
The Structure of Scientific Revolutions
• Normal Science
– Equilibrium, harmony
• Model Drift
– Outliers cease to be outliers
– Ripples turn to discontinuity
• Model Crisis
– Alternate methods permitted
– Out-of-the-box reconsidered
• Model Revolution
– New model becomes the new-normal
• Paradigm Change
– (Textbooks play catch-up)
Source: Thomas Kuhn, (1962) The Structure of Scientific Revolutions. University of Chicago Press
Mark Reynolds, compilation
(Figure: the Kuhn Cycle - Normal Science, Model Drift (Anomaly), Model Crisis, Model Revolution, Paradigm Change)
7. 6
The Shifting Computing Paradigm
(Figure: the progression from Traditional Science to eScience)
• Descriptive and Formulaic
• Hypothetical and Investigative
• Expertise Driven Models and Cases
• Multivariant Differential Modelling
Source: Mark Reynolds, compilation
8. 7
The Shifting Computing Paradigms
• Empirical: O&G is where we found it
• Theoretical: O&G is where we expect it
• Computational: O&G is where we estimate it
• Data Exploration: O&G is where we infer it
Source: Mark Reynolds, compilation
9. 8
The Machine Learning Paradigm
“A computer program is said to learn from experience (E) with respect to some class of tasks (T) and performance measure (P), if its performance at tasks in T, as measured by P, improves with experience E.” ~Tom Mitchell
Source: Mitchell, T. (1997). Machine Learning. McGraw Hill; Mark Reynolds, compilation
Machine Learning is the “Extraction of Wisdom by Understanding the underlying Data”
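As a minimal, hedged illustration of Mitchell's (T, E, P) definition, the sketch below frames an invented prediction task in those terms and shows performance P improving as experience E grows; the features, data, and model choice are assumptions for illustration, not part of this presentation.

```python
# Mitchell's (T, E, P), illustrated with invented data:
# T = predict a target from three inputs, E = training rows seen so far, P = mean absolute error.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(7)
X = rng.uniform(size=(300, 3))
y = 40 * X[:, 0] - 15 * X[:, 1] + 5 * X[:, 2] + rng.normal(scale=1.0, size=300)
X_hold, y_hold = X[200:], y[200:]                 # fixed evaluation set for P

for n in (10, 50, 200):                           # growing experience E
    model = LinearRegression().fit(X[:n], y[:n])
    p = mean_absolute_error(y_hold, model.predict(X_hold))
    print(f"E = {n:3d} examples -> P (MAE) = {p:.2f}")
# Performance at task T, measured by P, improves as experience E grows.
```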
10. 9
Machine Learning in the 4th Paradigm
The Catalyst
• Data captured by instruments
• Data generated by simulations
• Data acquired by sensor networks
The Destination
• Solutions from data analysis
• Solutions from data mining
• Solutions from visualization
• Solutions from drill down
• Solutions for bottom line
• Solutions using eScience
Source: Mark Reynolds, compilation
eScience and the Fourth Paradigm: Data-Intensive Scientific Discovery and Digital Preservation, Tony Hey, Microsoft Research
http://www.alliancepermanentaccess.org/wp-content/uploads/2011/12/apa2011/15_%28Nov11%29TonyHey-APA%20Meeting.pdf
“eScience is the set of tools and technologies to support data federation and collaboration” ~Jim Gray
11. 10
Precursors to Machine Learning
Predictive Analytics
• Focuses on Prediction
– Based on Known Properties
– Learned from Training Data
Data Mining
• Focuses on Discovery
– Unknown Properties in Data
– The Analysis Phase of Knowledge Discovery
Machine Learning is the “Extraction of Wisdom by Understanding the underlying Data” ~Mark Reynolds
Source: Mark Reynolds, compilation
12. 11
The Machine Learning Paradigm
Unsupervised Learning
Supervised Learning
Semi-Supervised Learning
Reinforcement Learning
(Diagram: 24/7 - Predictive Analytics, Data Mining, Machine Learning, AI)
Source: Mark Reynolds, compilation
13. 12
Principal Concepts in Machine Learning
• Unsupervised Learning
– Data is unlabeled
• Supervised Learning
– Teach and train with data that is well labeled with a defined output
• Reinforcement Learning
– The validity of data alignment serves as feedback
• Semi-Supervised Learning
– Some of the data is labeled, some is unlabeled
Source: Mark Reynolds, compilation
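To make the contrast between these learning modes tangible, here is a minimal sketch, assuming synthetic blob data and conventional scikit-learn estimators (none of which are prescribed by the presentation): supervised learning uses the labels, unsupervised learning ignores them.

```python
# Supervised vs. unsupervised learning on the same synthetic data (illustrative only).
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Supervised: the labels y are part of the training signal.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("supervised accuracy:", clf.score(X, y))

# Unsupervised: only X is used; the algorithm discovers its own grouping.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments for first 5 points:", km.labels_[:5])

# Semi-supervised and reinforcement learning fall between/beyond these:
# partial labels, or a reward signal serving as feedback, respectively.
```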
15. 14
The Bridge Into Machine Learning
(Diagram: Integrated Systems Engineering as the bridge from Today to Tomorrow)
16. 15
Integrated Systems Engineering
(Diagram: the Systems & Knowledge Engineer connects O&G Systems, Control Systems, Remote Systems, Information Systems, Embedded Systems, Robotic Systems, Data Fusion, Real-Time Systems, Look-Back Analysis, and Look-Ahead Systems, spanning Land and Regulatory, Geology, Geophysics, Drilling Engineering, Completion Engineering, Production Engineering, Reservoir Engineering, and Systems Engineering)
Source: Mark Reynolds, compilation
17. 16
Integrated Engineering – Top-Down
• Engineering the Source
– Signals, content, and characterizations
• Engineering the Data
– Address errant data
– Address valid spurious data
– Address data quality
• Engineering the Store
– Repository
– Recall and Reporting
– Representations
Data Acquisition
Data Transmission
Data Retention
Data Analysis
Data Reduction
Source: Mark Reynolds, compilation
18. 17
Integrated Engineering – Top-Down
• Engineering the Store
– Data distribution
– Data staging
• Engineering the Recall
– Simple query
– Cube v Matrix
• Engineering the Use Case
– Destination: human
– Destination: machine
Classification
Regression
Clustering
Density Estimation
Dimensional Reduction
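As a hedged reference for the five task families listed on this slide, the sketch below pairs each with one conventional scikit-learn estimator; the pairings and comments are illustrative assumptions, not choices made by this presentation.

```python
# One conventional scikit-learn estimator per task family (illustrative pairings).
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans
from sklearn.neighbors import KernelDensity
from sklearn.decomposition import PCA

task_families = {
    "Classification":        LogisticRegression(),   # discrete labels (e.g. event / no event)
    "Regression":            LinearRegression(),     # continuous targets (e.g. rate, pressure)
    "Clustering":            KMeans(n_clusters=3),   # grouping unlabeled observations
    "Density Estimation":    KernelDensity(),        # modeling the distribution of the data
    "Dimensional Reduction": PCA(n_components=2),    # compressing many sensors to a few factors
}

for family, estimator in task_families.items():
    print(f"{family:22s} -> {type(estimator).__name__}")
```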
19. 18
Integrated Engineering – System Flow
Acquire -> Analyze -> Annunciate -> Archive -> Analyze -> Anticipate -> Apply
(Diagram labels: Data, Information, Visualization, Knowledge, Forensics, Understanding, Analysis & Mining, Wisdom, Anticipating, Application)
Creating Informational Accessibility and Transparency
Discovering Experiential Performance Improvements
Segmenting Processes and Process Results
Replacing Human Decision w/ Automated Algorithms
Innovating New Models, Products, Services
Source: Mark Reynolds, compilation
20. 19
Integrated Engineering – Top-Down
(Diagram: Source Capture and Utilization, Data Modeling, Proactive & Closed-Loop Systems, Mining and Analytics, Forensics, Control, Visualization and Observation)
• Intelligence during operations (Observation and Anticipation)
• Intelligence reviewing operations (Forensic)
• Intelligence planning operations (Historical and Analytical)
Source: Mark Reynolds, compilation
(Diagram: Well Plan, RT Prod, RT Drill, Geo-steer, RT Frac, Daily Rpts, AFE)
21. 20
Applied Machine Learning 101
Learning (Phase 1): Training Data -> Pre-Processing -> Learning -> Error Analysis -> Model
Prediction (Phase 2): New Data -> Model -> Predictable Result
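A minimal sketch of the two phases on this slide, under assumed synthetic data and an arbitrary model choice: Phase 1 trains a preprocessed model and inspects its errors on held-out rows, Phase 2 applies the resulting model to new data.

```python
# Applied ML 101 as code (illustrative, synthetic data): Phase 1 learn, Phase 2 predict.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))                       # training data (placeholder features)
y = X[:, 0] * 3 - X[:, 1] + rng.normal(scale=0.5, size=500)

# Phase 1: Training Data -> Pre-Processing -> Learning -> Error Analysis -> Model
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), RandomForestRegressor(random_state=0))
model.fit(X_train, y_train)
errors = y_test - model.predict(X_test)             # error analysis on held-out data
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))

# Phase 2: New Data -> Model -> Predictable Result
new_data = rng.normal(size=(3, 4))
print("predictions:", model.predict(new_data))
```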
22. 21
Representative Algorithms
• Decision Tree Learning
– Maps observation to conclusions
• Association Rule Learning
– Discovering interesting relations
• Artificial Neural Networks
– Incremental function modules
• Inductive Logic Programming
– Rule based representations for input --> output
• Support Vector Machines
– Classification and regression
• Clustering
– Assignment of observations to clusters
• Bayesian Networks
– Probabilistic models correlating variables
• Reinforcement Learning
– Finds policy to map states to desired outcome
• Representation Learning
– Principal component analysis
• Similarity & Metric Learning
– Pairs of examples train others
• Sparse Dictionary Learning
– Datum as linear combinations
• Genetic Algorithms
– Mimics natural heuristics
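As one concrete instance of the first item in this list, here is a minimal sketch that fits a small decision tree mapping observations to conclusions; the features and labels are invented placeholders, not anything from the deck.

```python
# Decision tree learning: mapping observations to conclusions (invented placeholder data).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# Print the learned if/then rules, i.e. the observation-to-conclusion mapping.
print(export_text(tree, feature_names=[f"feature_{i}" for i in range(4)]))
```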
23. 22
Machine Learning: Data Diversity
• Macro (or field-level)
– Spatial
– Temporal
• Pad (or offset)
– Spatial
– Temporal
• Well (or wellbore)
– Spatial
– Temporal
• External
– Uploads
– Political, Climate, etc
• The 3 Cs of Data Quality
– Consistency
– Correctness
– Completeness
– [#4] Currency
– [#5] Conformity
Source: Mark Reynolds, compilation
Data Diversity - Spatial, Temporal, Referential
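A hedged pandas sketch of the data-quality Cs on a toy sensor table follows; the column names, values, and thresholds are assumptions for illustration only.

```python
# Toy data-quality checks for Completeness, Correctness, Consistency, and Currency.
# Column names and the valid range are assumed for illustration.
import pandas as pd

df = pd.DataFrame({
    "well":      ["A-1", "A-1", "B-2", "B-2"],
    "timestamp": pd.to_datetime(["2018-04-01", "2018-04-02", "2018-04-01", "2018-04-01"]),
    "rate_mcfd": [1200.0, None, 950.0, -10.0],
})

completeness = df["rate_mcfd"].notna().mean()                     # share of non-missing values
correctness  = df["rate_mcfd"].between(0, 50_000).mean()          # share inside a plausible range
consistency  = 1 - df.duplicated(["well", "timestamp"]).mean()    # share of non-duplicate keys
currency     = (pd.Timestamp("2018-04-03") - df["timestamp"].max()).days

print(f"completeness={completeness:.2f}, correctness={correctness:.2f}, "
      f"consistency={consistency:.2f}, days since latest record={currency}")
```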
24. 23
The Fast Data ecosystem in O&G
(Diagram: Land, Drilling, Reservoir, Completion, Water, Production, Steering, Regulatory, Midstream)
Source: Assorted web images
25. 24
Algorithmic Approaches (revisited)
• Decision Tree Learning
– Maps observation to conclusions
• Association Rule Learning
– Discovering interesting relations
• Artificial Neural Networks
– Incremental function modules
• Inductive Logic Programming
– Rule based representations for input --> output
• Support Vector Machines
– Classification and regression
• Clustering
– Assignment of observations to clusters
• Bayesian Networks
– Probabilistic models correlating variables
• Reinforcement Learning
– Finds policy to map states to desired outcome
• Representation Learning
– Principal component analysis
• Similarity & Metric Learning
– Pairs of examples train others
• Sparse Dictionary Learning
– Datum as linear combinations
• Genetic Algorithms
– Mimics natural heuristics
26. 25
Keep Your Eye on the Prize
Data -> Information -> Knowledge -> Understanding -> Wisdom -> Application
The question is NOT “How can we … ?” but instead “What is the objective?” (or “Why?”)
27. 26
Mark Reynolds
Mark Reynolds Vitae
• Southwestern Energy
• Lone Star College
• Intent Driven Designs
• Scan Systems
• Sikorsky Aircraft
• General Dynamics
• Southwestern Energy Email
– Mark_Reynolds@swn.com