Big data, little data, whatever

Big Data, little data, whatever…
Making the world a little smarter
Matt Denesuk
Manager, Natural Resources Modeling and Social Analytics, IBM Research
Partner, IBM Venture Capital Group
Launch of SPE Technical Section, Petroleum
Data-Driven Analytics (PD2A), October 8, 2012

3 big things
• Physical-meets-Digital
• Data-driven approach
• Heterogeneity & integration (data & approaches)

Physical-meets-digital is driving highly physical industries toward
being more about moving & manipulating data.
INSTRUMENTED
meters, sensors, actuators, IP enablement, ...
+ INTERCONNECTED
transmitters, networks, taxonomies, ...
+ INTELLIGENT
reporting, visualization, predictive analytics & modeling, decision management, closed-loop automation, ...
= Physical-meets-Digital, Smarter Planet, Cyber-physical systems, …

Heavy, physical industries are increasingly infusing their operations
with information technology, and this will result in higher growth &
productivity trajectories.
[Charts: IT spending / revenue (%), 2009 and 2009–2010; operating margin (%) vs. IT spending / revenue (%)]
A 0.5pt increase in IT spend ratio would drive $31B in incremental IT spend.
Industries where value is generated by moving and manipulating data have high IT-spend ratios (and high productivity growth).
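The $31B figure can be sanity-checked with simple arithmetic. The sketch below backs out the revenue base implied by the two numbers on the slide. It assumes the 0.5-point increase applies uniformly to a single aggregate revenue figure, which the slide does not state; the variable names are illustrative only.

```python
# Figures quoted on the slide.
ratio_increase_pts = 0.5          # percentage points of revenue
incremental_it_spend_usd = 31e9   # $31B

# Assumption (not from the slide): the increase applies uniformly
# to one aggregate revenue base.
implied_revenue_usd = incremental_it_spend_usd / (ratio_increase_pts / 100)

print(f"Implied revenue base: ${implied_revenue_usd / 1e12:.1f}T")
```

Under that assumption the two figures imply a revenue base of roughly $6.2 trillion.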

How Big the data are is just one factor…
[Chart: analytical &/or data complexity vs. data size. Plotted examples: Watson, computer chess, customer churn, search engines, statistical translation.]
But bigger data sets let us use a whole new set of “dumb” tools that can deliver high value with remarkable speed.

Example: Google & Statistical Translation
Regular Science approach
Premise: Use of language is infinitely complex, but you can teach a computer all the rules and content.
• Employ language experts to codify rules, exceptions, vocabulary mappings, etc.
• Apply transformation to user’s query.
• Costly, hard to scale
• Can translate nearly any statement (but accuracy variable)
• In theory, could be better than human.
Statistical (data-driven) approach
Premise: People say the same kind of things over and over. And somebody has already translated it.
• Gather and classify lots of translated docs (websites, UN, books, …)
• Identify & match patterns
• Map to user’s translation query.
• Incrementally low cost, highly scalable.
• Limited in scope to digitized docs that have been translated before
• Limited by skill of human translators
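To make the data-driven idea concrete, here is a deliberately tiny sketch of "gather translated docs, match patterns, map to the user's query". It is a heavy simplification, not how any production system works: real statistical translation involves alignment and language models. The corpus, `build_phrase_table`, and `translate` are all hypothetical names invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of previously translated phrase pairs (English, French).
PARALLEL_CORPUS = [
    ("good morning", "bonjour"),
    ("good morning", "bonjour"),
    ("good morning", "bon matin"),
    ("thank you", "merci"),
    ("thank you", "merci"),
]


def build_phrase_table(pairs):
    """Count how often each source phrase was translated to each target."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source.lower()][target] += 1
    return table


def translate(phrase, table):
    """Return the most frequent earlier translation, or None if unseen."""
    candidates = table.get(phrase.lower())
    if not candidates:
        return None  # limited in scope to what has been translated before
    return candidates.most_common(1)[0][0]


table = build_phrase_table(PARALLEL_CORPUS)
print(translate("Good morning", table))
print(translate("see you later", table))
```

The second lookup returns nothing, which mirrors the scope limitation listed for the statistical approach: no earlier translation, no answer.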

Two ways of seeing a data set (and the world)
Computer Scientist – “get the knowledge locked in the data”
• The data set is a record of everything that happened, e.g.,
– All customer transactions last month
– All friendship links between members of a social networking site
• Goal is to find interesting patterns, rules, and/or associations.
Regular Scientist – “get the knowledge”
• The data set is a partial, and often very noisy, reflection of some underlying phenomenon, e.g.,
– Emission spectra from stars
– Battery voltage varying with current, time, and temperature
• Goal is better understanding or ability to predict, often through a mathematical model.
(See D. Lambert, or R. Mahoney, e.g.)
But the approaches & skill sets can be joined…

Examples of hybrid, integrated approaches
Computer Chess
• Simple, well-defined rules, but computationally impossible to solve (today)
• Relies on position evaluation function.
– Use human-derived chess theory to set up initially.
– But tune by comparing to the best games humans have played.
• Better than any human (1997)
• Issues
– Saturation, fatigue, psychology, …
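The "seed with theory, tune against expert games" idea for computer chess can be sketched as follows. This is an illustrative perceptron-style update, not the method of any particular chess program. The feature layout (material balance per piece type), the training pair, and the function names are assumptions made for the example; only the seed values are the conventional textbook piece values.

```python
# Human-derived seed: conventional material values for
# pawn, knight, bishop, rook, queen.
SEED_WEIGHTS = [1.0, 3.0, 3.0, 5.0, 9.0]


def evaluate(weights, features):
    """Linear position evaluation: weighted sum of material-balance features."""
    return sum(w * f for w, f in zip(weights, features))


def tune(weights, examples, learning_rate=0.1, epochs=20):
    """Nudge weights until the expert's choice scores above the alternative."""
    weights = list(weights)
    for _ in range(epochs):
        for expert, alternative in examples:
            if evaluate(weights, expert) <= evaluate(weights, alternative):
                for i in range(len(weights)):
                    weights[i] += learning_rate * (expert[i] - alternative[i])
    return weights


# Hypothetical training pairs: features of the position reached by the
# expert's move, and by a move the expert rejected.
EXAMPLES = [
    ([0, 0, 1, 0, 0], [0, 1, 0, 0, 0]),  # expert preferred being up a bishop over a knight
    ([1, 0, 0, 0, 0], [0, 0, 0, 0, 0]),  # expert preferred winning a pawn
]

tuned = tune(SEED_WEIGHTS, EXAMPLES)
print(tuned)
```

With these made-up examples the seed values already rank the pawn case correctly, so only the knight and bishop weights move.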
Buzz & the CMO
• People’s opinions reflected in many digitized forms
– Articles, blogs, social media, playlists, …
• “Big Data” search & transform capabilities can generate buzz metrics (“ink”, sentiment, category, …)
• But what to do with them? Need to apply traditional, small-data modeling approaches.
• Examples
– Pre-launch promotion management for albums
– Movie trailer management
Hybrid example: “equipment health” models driving operational
optimization
Oil & Gas Scenario
Gas compressor showing signs of trouble
3 months before a scheduled turnaround.
The system indicates that lowering
pressure by 20% will extend health
enough to make it to turnaround.
–But then production levels will not be
sufficient to fulfill scheduled shipment.
The system identifies that another
platform can be run for 30 days at 115%
throughput without significant risk before
its next scheduled turnaround.
Coordinated actions taken, and $40M
production loss avoided.
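The coordination check in this scenario can be sketched as a simple balance: does the extra output from the second platform cover what the derated compressor loses? The 90-day horizon, 30-day window, and 115% throughput come from the slide. The production rates and the production impact of the pressure reduction are hypothetical placeholders, and `shortfall_covered` is an illustrative name.

```python
def shortfall_covered(a_nominal, a_loss_fraction, days_to_turnaround,
                      b_nominal, b_boost_fraction, b_boost_days):
    """Compare production lost on platform A with extra production from B."""
    lost = a_nominal * a_loss_fraction * days_to_turnaround
    gained = b_nominal * b_boost_fraction * b_boost_days
    return lost, gained, gained >= lost


lost, gained, covered = shortfall_covered(
    a_nominal=10_000,       # hypothetical units/day on the troubled platform
    a_loss_fraction=0.05,   # hypothetical production impact of lowering pressure
    days_to_turnaround=90,  # about 3 months, per the scenario
    b_nominal=12_000,       # hypothetical units/day on the other platform
    b_boost_fraction=0.15,  # running at 115% throughput
    b_boost_days=30,        # for 30 days
)
print(f"lost={lost:.0f} gained={gained:.0f} covered={covered}")
```

A real system would also weigh the added risk on the second platform, which the scenario describes only as not significant.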

Trying to combine 3 different kinds of modeling
• Data-driven / Machine-learning
– Early days, often not enough data
– Biased: only a limited region of parameter space is explored (by management design)
• Knowledge-based
– Rule capture, experience
– Initial use to generate hypotheses for other approaches.
• Physics-based
– Difficult to scale
– Use for seed models
– Locked-up in OEMs?
Also simulation, for what-if analyses and verification. See Peng et al.
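One way to combine a physics-based seed model with a data-driven correction is to fit the data-driven part only to what the physics model gets wrong. The sketch below does this with an ordinary least-squares line on the residuals. The physics formula, the observations, and all names are hypothetical; this is one possible arrangement, not the approach described in the cited work.

```python
def physics_seed_model(load):
    """Hypothetical first-principles estimate of a response vs. load."""
    return 2.0 * load


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_hybrid_model(loads, observed):
    """Seed with physics, then learn a correction from the residuals."""
    residuals = [y - physics_seed_model(x) for x, y in zip(loads, observed)]
    slope, intercept = fit_line(loads, residuals)

    def hybrid(load):
        return physics_seed_model(load) + slope * load + intercept

    return hybrid


# Hypothetical field measurements that deviate from the physics estimate.
LOADS = [1.0, 2.0, 3.0, 4.0]
OBSERVED = [3.5, 6.0, 8.5, 11.0]

hybrid = make_hybrid_model(LOADS, OBSERVED)
print(hybrid(5.0))
```

Because the correction is learned on residuals, it needs less data than a model learned from scratch, which matters when there is "often not enough data".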

Example: Condition-based Management
Example process:
• Inputs: multiple sensor data streams, environmental data, text data, image data, outcomes
• Higher-order “Events” & measures
• Probabilistic Models / Rule Mining, alongside Physical Models
• Actionable rules, measures, & options
• Management system
– Maintenance optimization
– Use / output optimization
– Energy / comfort / safety balancing
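The rule-mining step in the example process can be illustrated with a minimal support/confidence calculation over higher-order events and observed outcomes. The event names, records, and thresholds below are hypothetical, and `mine_rules` is an illustrative helper rather than a reference to any particular tool.

```python
from collections import Counter

# Hypothetical records: (higher-order events seen in a window, failure observed?)
RECORDS = [
    ({"high_vibration", "temp_spike"}, True),
    ({"high_vibration"}, True),
    ({"temp_spike"}, False),
    ({"high_vibration", "pressure_drop"}, True),
    ({"pressure_drop"}, False),
    (set(), False),
]


def mine_rules(records, min_confidence=0.8, min_support=2):
    """Keep 'event -> failure' rules that are frequent and reliable enough."""
    seen = Counter()
    seen_with_failure = Counter()
    for events, failed in records:
        for event in events:
            seen[event] += 1
            if failed:
                seen_with_failure[event] += 1
    rules = {}
    for event, count in seen.items():
        confidence = seen_with_failure[event] / count
        if count >= min_support and confidence >= min_confidence:
            rules[event] = confidence
    return rules


print(mine_rules(RECORDS))
```

Rules that survive the thresholds are candidates for the "actionable rules" handed to the management system; in practice they would be checked against physical models and domain experience first.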
Broad range of applications.
• Heavy Infrastructure
• Business Equipment / Consumer Products
• Human Health?
Examples shown: Bridges, Water Infrastructure, Railroads, Aircraft, Mining Equipment, Oil Pipelines, Oil Platforms, Steel manufacture, Trucking, Mobile Computers, IT Infrastructure, Home Appliances, Buildings (HVAC, Elevators, Lighting, …), Photocopiers, Refrigeration

Business value requires both Modeling and Process Integration
• Many organizations are not used to making data-driven decisions.
– Culturally
– Process-wise
• Mathematical proof of business value not initially compelling
– Example: CbM & false positives.
• Initial deployment very risky!
[Diagram: capability & value growth along two axes, Modeling & Analytics and Process Integration. Models developed & tested, then 1. Integration pilot & evaluation, then 2. Deploy/scale.]
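The CbM false-positive problem is largely a base-rate effect: when failures are rare, even a fairly accurate alarm is wrong most of the time. The sketch below applies Bayes' rule to rates that are purely hypothetical, chosen only to show the shape of the problem.

```python
def alarm_precision(failure_rate, sensitivity, false_alarm_rate):
    """Fraction of alarms that correspond to a real impending failure."""
    true_alarms = failure_rate * sensitivity
    false_alarms = (1 - failure_rate) * false_alarm_rate
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical rates: 1% of inspected units are actually failing,
# the model catches 90% of them, and it wrongly flags 5% of healthy units.
precision = alarm_precision(failure_rate=0.01, sensitivity=0.9,
                            false_alarm_rate=0.05)
print(f"Share of alarms that are real: {precision:.0%}")
```

Under these assumed rates only about 15% of alarms are real, which helps explain why early deployments struggle to win trust from operations teams.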

Key points
• Physical-meets-Digital is happening
• This makes data-driven approaches much more
important
• But most real problems require integration of
very different approaches and data types
– Not easy to build these teams
• The realities of current culture & process must be
addressed early.
