Big Data, little data, whatever…
Making the world a little smarter
Matt Denesuk
Manager, Natural Resources Modeling and Social Analytics, IBM Research
Partner, IBM Venture Capital Group
Launch of SPE Technical Section, Petroleum
Data-Driven Analytics (PD2A), October 8, 2012
3 big things
• Physical-meets-Digital
• Data-driven approach
• Heterogeneity & integration (data &
approaches)
Physical-meets-digital is driving highly physical industries toward
being more about moving & manipulating data.
3 key things:
INSTRUMENTED
meters, sensors, actuators, IP enablement, ...
+
INTERCONNECTED
transmitters, networks, taxonomies, ...
+
INTELLIGENT
reporting, visualization, predictive analytics & modeling, decision management, closed-loop automation, ...
=
Physical-meets-Digital, Smarter Planet, Cyber-physical systems, …
Heavy, physical industries are increasingly infusing their operations
with information technology, and this will result in higher growth &
productivity trajectories.
[Charts: IT Spending / Revenue (%) plotted against Operating Margin (%); panels labeled "2009 – 2010" and "2009"]
A 0.5pt increase in IT spend ratio would drive $31B in incremental IT spend.
Industries where value is generated by moving and manipulating data
have high IT-spend ratios (and high productivity growth)
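The callout on this slide implies a revenue base that can be backed out with one line of arithmetic. A minimal sketch, assuming the $31B figure is simply 0.5 percentage points of aggregate revenue; the slide states neither the base nor which industries it covers, and the variable names are illustrative:

```python
# Back out the revenue base implied by the slide's callout.
# Assumption: incremental IT spend = (change in IT-spend ratio) x (aggregate revenue).
incremental_it_spend_usd = 31e9   # $31B, from the slide
ratio_increase_pts = 0.5          # 0.5 percentage points, from the slide

implied_revenue_usd = incremental_it_spend_usd / (ratio_increase_pts / 100)
print(f"Implied revenue base: ${implied_revenue_usd / 1e12:.1f} trillion")
```

Under that assumption the implied base is about $6.2 trillion of revenue.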
Data-driven approach
How Big the data are is just one factor…
[Chart: Analytical &/or Data Complexity vs. Data Size. Plotted examples: Watson, Computer Chess, Search Engines, Statistical Translation, Customer Churn]
But bigger data sets let us use a whole new set of
“dumb” tools that can deliver high value with
remarkable speed.
Example: Google & Statistical Translation
Regular Science approach
Use of language is infinitely complex, but you can teach a computer all the rules and content.
• Employ language experts to codify rules, exceptions, vocabulary mappings, etc.
• Apply transformation to user’s query.
• Costly, hard to scale
• Can translate nearly any statement (but accuracy variable)
• In theory, could be better than human.
Statistical (data-driven) approach
People say the same kind of things over and over. And somebody has already translated it.
• Gather and classify lots of translated docs (websites, UN, books, …)
• Identify & match patterns
• Map to user’s translation query.
• Incrementally low cost, highly scalable.
• Limited in scope to digitized docs that have been translated before
• Limited by skill of human translators
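The statistical approach can be illustrated with a toy phrase lookup. This is only a sketch of the "somebody has already translated it" idea, not Google's actual system; the phrase pairs and the function names `build_phrase_table` and `translate` are made up for illustration:

```python
from collections import Counter, defaultdict

# Hypothetical toy "corpus" of already-translated phrase pairs (English -> French).
# A real system would mine these from large collections of translated documents.
translated_pairs = [
    ("good morning", "bonjour"),
    ("good morning", "bonjour"),
    ("good morning", "bon matin"),
    ("thank you", "merci"),
    ("thank you", "merci"),
]

def build_phrase_table(pairs):
    """Count how often each source phrase was translated to each target phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table

def translate(phrase, table):
    """Return the most frequent previously seen translation, or None if unseen."""
    candidates = table.get(phrase)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

phrase_table = build_phrase_table(translated_pairs)
print(translate("good morning", phrase_table))  # most frequent pairing wins
print(translate("see you soon", phrase_table))  # never translated before: None
```

The second call shows the scope limit noted on the slide: a phrase nobody has translated before has no match.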
Heterogeneity & Integration
Two ways of seeing a data set (and the world)
Computer Scientist – “get the knowledge locked in the data”
• The data set is a record of everything that happened, e.g.,
– All customer transactions last month
– All friendship links between members of a social networking site
• Goal is to find interesting patterns, rules, and/or associations.
Regular Scientist – “get the knowledge”
• The data set is a partial, and often very noisy, reflection of some underlying phenomenon, e.g.,
– Emission spectra from stars
– Battery voltage varying with current, time, and temperature
• Goal is better understanding or ability to predict, often through a mathematical model
(See D. Lambert, or R. Mahoney, e.g.)
But the approaches & skill sets can be joined…
Examples of hybrid, integrated approaches
Computer Chess
• Simple, well-defined rules, but computationally impossible to solve (today)
• Relies on position evaluation function.
– Use human-derived chess theory to set up initially.
– But tune by comparing to the best games humans have played.
• Better than any human (1997)
• Issues
– Saturation, fatigue, psychology, …
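For the computer chess example, the "set up from theory, then tune against human games" idea can be sketched as a linear evaluation function with a simple corrective update. This is a hypothetical illustration, not the method of any particular chess engine; the feature set, the perceptron-style update, and the training pair are all assumptions:

```python
# Hypothetical sketch: a linear position-evaluation function whose weights start
# from textbook piece values and are then nudged toward expert move choices.
PIECES = ["pawn", "knight", "bishop", "rook", "queen"]

def evaluate(features, weights):
    """Score a position from material differences (own count minus opponent count)."""
    return sum(weights[p] * features[p] for p in PIECES)

def tune(weights, examples, learning_rate=0.1, epochs=20):
    """Perceptron-style update: whenever the position an expert chose does not
    outscore the rejected alternative, shift weights toward the expert's choice."""
    weights = dict(weights)  # work on a copy so the initial weights are preserved
    for _ in range(epochs):
        for chosen, rejected in examples:
            if evaluate(chosen, weights) <= evaluate(rejected, weights):
                for p in PIECES:
                    weights[p] += learning_rate * (chosen[p] - rejected[p])
    return weights

# Initial weights from conventional chess theory (pawn=1, knight=3, ...).
initial = {"pawn": 1.0, "knight": 3.0, "bishop": 3.0, "rook": 5.0, "queen": 9.0}

# Made-up training pair: the expert preferred keeping the bishop over the knight.
examples = [
    ({"pawn": 0, "knight": 0, "bishop": 1, "rook": 0, "queen": 0},
     {"pawn": 0, "knight": 1, "bishop": 0, "rook": 0, "queen": 0}),
]

tuned = tune(initial, examples)
print(tuned["bishop"], tuned["knight"])
```

After tuning, the evaluation ranks the expert's choice above the alternative, while the theory-derived starting values remain the baseline.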
Buzz & the CMO
• People’s opinions reflected in many digitized forms
• Articles, blogs, social media, playlists, …
• “Big Data” search & transform capabilities can generate
buzz metrics (“ink”, sentiment, category, …)
• BUT WHAT TO DO WITH THEM? Need to apply traditional,
small-data modeling approaches.
• Examples
• Pre-launch promotion management for albums
• Movie trailer management
Hybrid example: “equipment health” models driving operational
optimization
Oil & Gas Scenario
Gas compressor showing signs of trouble
3 months before a scheduled turnaround.
The system indicates that lowering
pressure by 20% will extend health
enough to make it to turnaround.
–But then production levels will not be
sufficient to fulfill scheduled shipment.
The system identifies that another
platform can be run for 30 days at 115%
throughput without significant risk before
its next scheduled turnaround.
Coordinated actions taken, and $40M
production loss avoided.
Trying to combine 3 different kinds of modeling
• Data-driven / Machine-learning
– Early days, often not enough data
– Bias: limited region of parameter space explored (by
management design)
• Knowledge-based
– Rule capture, experience
– Initial use to generate hypotheses for other approaches.
• Physics-based
– Difficult to scale
– Use for seed models
– Locked-up in OEMs?
Also simulation, for what-if analyses and verification
(See Peng et al.)
Example: Condition-based Management
Example process:
[Diagram: data flow]
Inputs: multiple sensor data streams, outcomes, environmental data, text data, image data
→ Higher-order “Events” & measures
→ Probabilistic Models / Rule Mining, Physical Models
→ Actionable rules, measures, & options
→ Management system
• Maintenance optimization
• Use / output optimization
• Energy / comfort / safety balancing
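One step of the condition-based management process, turning raw readings into events and then into a rule with a measured confidence, can be sketched as follows. The single vibration stream, the threshold, and the history records are hypothetical simplifications; the slide describes multiple data streams and does not specify a method:

```python
# Hypothetical records: (vibration level, failed_within_30_days) per inspection window.
history = [
    (0.2, False), (0.3, False), (0.9, True), (0.8, True),
    (0.85, False), (0.1, False), (0.95, True), (0.4, False),
]

VIBRATION_THRESHOLD = 0.75  # assumed cut-off for a "high vibration" event

def to_event(vibration):
    """Turn a raw sensor reading into a higher-order event flag."""
    return vibration > VIBRATION_THRESHOLD

def rule_confidence(records):
    """Estimate P(failure | high-vibration event) from historical outcomes."""
    outcomes_after_event = [failed for vib, failed in records if to_event(vib)]
    if not outcomes_after_event:
        return None  # the event never occurred, so the rule cannot be scored
    return sum(outcomes_after_event) / len(outcomes_after_event)

confidence = rule_confidence(history)
print(f"P(failure | high vibration) = {confidence:.2f}")
```

A rule such as "high vibration implies likely failure" only becomes actionable once it carries a measure like this, which the management system can weigh against maintenance and output goals.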
Broad range of applications.
Categories: Heavy Infrastructure; Business Equipment / Consumer Products; Human Health?
Examples: Bridges, Water Infrastructure, Railroads, Aircraft, Mining Equipment, Oil Pipelines, Oil Platforms, Steel manufacture, Trucking, Mobile Computers, IT Infrastructure, Home Appliances, Buildings (HVAC, Elevators, Lighting, …), Photocopiers, Refrigeration
Business value requires both Modeling and Process
Integration
• Many organizations are not used to making data-driven decisions.
– Culturally
– Process-wise
• Mathematical proof of business value not initially compelling
• Example: CbM & false positives.
• Initial deployment very risky!
[Diagram: Modeling & Analytics vs. Process Integration, showing capability & value growth: Models developed & tested → 1. Integration pilot & evaluation → 2. Deploy/scale]
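The CbM false-positive point can be made concrete with Bayes' rule: when real faults are rare, even a fairly accurate model raises mostly false alarms. A minimal sketch with hypothetical rates (none of these numbers come from the talk):

```python
def alarm_precision(failure_rate, sensitivity, false_positive_rate):
    """Probability that an alarm corresponds to a real developing fault (Bayes' rule)."""
    true_alarms = failure_rate * sensitivity
    false_alarms = (1 - failure_rate) * false_positive_rate
    return true_alarms / (true_alarms + false_alarms)

# Hypothetical numbers: 1% of monitored assets are actually degrading,
# the model catches 90% of them, and it wrongly flags 5% of healthy assets.
precision = alarm_precision(failure_rate=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(f"Share of alarms that are real: {precision:.0%}")
```

With these assumed rates only about 15% of alarms are real, which is one way a mathematically sound model can still look unconvincing to operators during an early pilot.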
Key points
• Physical-meets-Digital is happening
• This makes data-driven approaches much more
important
• But most real problems require integration of
very different approaches and data types
– Not easy to build these teams
• The realities of current culture & process must be
addressed early.
