A Retrospective of
the Future
Trends in the data industry
July, 2017
Mark Madsen
www.ThirdNature.net
@markmadsen
© Third Nature, Inc.
Most BI tech is a commodity, a cost of doing business
Adoption and decline – everything gets old
For most businesses, more than 80% of IT budget is
dedicated to basic infrastructure
…and more than 60% of IT labor cost goes to keep things
running, i.e. basic operations and support.
[Figure: Strategic vs. Commodity]
It Wasn’t Always This Way
As technologies mature and spread to competitors, they
cease to be differentiators. Unfortunately, this is what
packaged software vendors do to your “best practice.”
[Figure: Strategic vs. Commodity]
The old advantages become the new focus of cost reduction.
For example, your data warehouse.
[Figure: market adoption and cumulative adoption over time, labeled “Hard work” and “Tipping point”]
Some Ideas Aren’t That Good
[Figure: product maturity over time, from new innovation to end of life]
Some ideas aren’t that good, like object databases in the 1990s.
These Curves Can Explain a Lot
[Figure: product maturity over time, plotted against analyst revenue predictions and executive interest, with the “Gartner Gap” marked]
The “experts” often have a foreshortened view
“Open source is not worth paying attention to.”
A Gartner analyst talking about the database and analytics market, January, 2006.
Multiple OSS databases existed, and Hadoop became a top-level Apache project in 2008.
Where the analysts are on the adoption curve
The problem with bad framing
Bad framing leads to bad assumptions about use, inappropriate features,
and poor understanding of substitutability and its impacts.
Auto-mobile?
The better framing
leads to a more
intuitive understanding,
and to clearer
reasoning about it.
It took decades to
standardize on a
steering wheel and
brake & gas pedals.
Technology doesn’t just fulfill a need. It generates
new needs and new problems. Business practices
and technology co-evolve. Innovation is change.
Value is not in the product, it’s in the practice
So are the costs
Open source is an example of practice change
It’s a means of production, not a technology
Practices have to catch up to technologies
Practice evolution of computing over time
1930s-1950s: Calculate
1960s-1980s: Automate
1990s-2000s: Informate
2010s+: Analyze and
Actuate
Computing technology has become a tool of observation
[Figure axis: rising organizational complexity]
Evolution of views on data
50s-60s: data as product
70s-80s: data as byproduct
90s-00s: data as asset
2010s +: data as substrate
The real data revolution is in
business structure and
processes and how they use
the information.
Types of innovation
Incremental or “sustaining”
▪ Incremental is based on existing concepts; smaller
changes within the same framework; “improvement”
Disruptive or invention
▪ Based on new concepts, science, principles; requires
new knowledge, skills; over time has significant
consequences to market; “invention”
Architectural – the third path
▪ Changes how the parts are related. It devalues the
advantage of experience and the usefulness of prior
knowledge, but doesn’t invalidate existing knowledge
of the parts themselves. (Christensen missed this one)
We are in another round of infrastructure change
Mainframe → client/server → cloud
Batch → online → event driven
Infrastructure takes a long time.
Value is driven by new capabilities
used to do new things, less by
doing old things better or cheaper
STORING DATA
Data warehouse: centralize, that solves all problems!
Creates bottlenecks
Causes scale problems
Enforces a single model
The data lake solution: no central authority
wtf, it was fully
operational!
The data lake solution?
There’s a problem: as
the lake is envisioned,
it is still a centralized
data architecture, but
this time there is no
single global model.
Instead, it’s files that
aren’t modeled. It can be
operational while
under construction.
It’s still a death star.
Eventually we run into the same problems
Seriously, wtf?
It was agile
and operational
Rising complexity and scale break centralized models
Data isn’t just in tables, it’s inside other things
More important, anything can be treated as data
Data isn’t just inside
things, it’s also the
thing itself. And further
data can be derived
from that thing.
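To make “data can be derived from that thing” concrete, here is a minimal Python sketch that treats an image file as data and derives its width and height from the bytes themselves. It assumes the layout in the PNG specification (an 8-byte signature followed by the IHDR chunk); `make_png_header` and `derive_dimensions` are hypothetical helpers written for illustration, and the generated bytes are only a header, not a complete viewable image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_png_header(width: int, height: int) -> bytes:
    """Hypothetical helper: build only the signature and IHDR chunk of a PNG (not a full image)."""
    # IHDR payload: width, height, bit depth 8, color type 2 (RGB),
    # compression 0, filter 0, interlace 0 (13 bytes total).
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    chunk_type = b"IHDR"
    crc = zlib.crc32(chunk_type + ihdr) & 0xFFFFFFFF
    return PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + chunk_type + ihdr + struct.pack(">I", crc)


def derive_dimensions(png_bytes: bytes) -> tuple[int, int]:
    """Derive (width, height) from the thing itself: the IHDR chunk after the signature."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    # After the 8-byte signature: 4-byte chunk length, 4-byte chunk type, then the payload.
    if png_bytes[12:16] != b"IHDR":
        raise ValueError("IHDR chunk not found where expected")
    width, height = struct.unpack(">II", png_bytes[16:24])
    return width, height


header = make_png_header(640, 480)
print(derive_dimensions(header))  # (640, 480)
```

The same idea scales up: dimensions, timestamps, or text pulled out of documents are new data derived from the original object.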
Data structure and format versus form
Structure: image
Format: bitmap, PNG, base64-encoded PNG
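A small Python sketch of the format distinction, using only the standard `base64` module: the same bytes appear once as raw PNG and once as base64 text, and decoding recovers exactly the original. For brevity the “image” here is just the 8-byte PNG signature, an assumption standing in for a full file.

```python
import base64

# Stand-in for an image file: only the 8-byte PNG signature, not a real image.
png_bytes = b"\x89PNG\r\n\x1a\n"

# Same content, different format: raw PNG bytes vs. base64-encoded PNG text.
encoded = base64.b64encode(png_bytes).decode("ascii")
print(encoded)  # iVBORw0KGgo=

# Decoding returns the identical bytes: only the encoding changed, not the content.
decoded = base64.b64decode(encoded)
assert decoded == png_bytes
```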
Collections of data have a structural form
[Figure: sample records drawn in set, list, and graph form]
ID  Name            Salary    Position
1   Marge Inovera   $150,000  Statistician
2   Anita Bath      $120,000  Sewer inspector
3   Ivan Awfulitch  $160,000  Dermatologist
4   Nadia Geddit    $36,000   DBA
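The sample records can be held in each of the three forms. This minimal Python sketch shows that each form comes with its own natural operations: aggregation over a set, ranking over a list, traversal over a graph. The table has no relationships, so the `works_with` edges are hypothetical, invented only to have something to traverse.

```python
from collections import deque

# The sample records: (id, name, salary, position).
employees = [
    (1, "Marge Inovera", 150_000, "Statistician"),
    (2, "Anita Bath", 120_000, "Sewer inspector"),
    (3, "Ivan Awfulitch", 160_000, "Dermatologist"),
    (4, "Nadia Geddit", 36_000, "DBA"),
]

# Set form: unordered, no duplicates; natural operations are membership and aggregation.
employee_set = frozenset(employees)
total_salary = sum(salary for _, _, salary, _ in employee_set)

# List form: order carries meaning; natural operations are position and ranking.
by_salary = sorted(employees, key=lambda e: e[2], reverse=True)
top_earner = by_salary[0][1]

# Graph form: relationships carry meaning; natural operation is traversal.
# HYPOTHETICAL "works with" edges, invented for illustration; the table has none.
works_with = {1: [2], 2: [1, 4], 3: [], 4: [2]}


def reachable(start: int) -> set[int]:
    """Breadth-first traversal: every employee connected to `start` by any chain of edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in works_with[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


print(total_salary)           # 466000
print(top_earner)             # Ivan Awfulitch
print(sorted(reachable(1)))   # [1, 2, 4]
```

Each operation is cheap in its own form and tends to be awkward in the others (ranking a set needs a sort, traversing a table needs repeated joins), which is the practical reason one engine struggles to serve all three.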
Each form requires a different engine*
Just like freight requires different transportation,
data requires different storage and processing.
[Figure: set, list, and graph forms]
Hard reality: workload incompatibility
As the BI workload increases, the OLTP response time
increases due to asymmetric resource consumption.
Analytics workloads disrupt BI workloads in the same way
[Figure panels: “The problem in the 1990s” and “The problem now”]
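One way to see why added BI load hurts OLTP so much is a textbook single-server queueing approximation (M/M/1), which is an assumption here and not from the slides: mean response time is R = S / (1 - U), where S is service time and U is utilization of the shared resource. The sketch uses made-up numbers for both workloads.

```python
def response_time_ms(service_ms: float, utilization: float) -> float:
    """M/M/1 approximation: mean response time = service time / (1 - utilization)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1 - utilization)


OLTP_SERVICE_MS = 10.0  # hypothetical: 10 ms of work per OLTP transaction
OLTP_LOAD = 0.30        # hypothetical: OLTP alone keeps the server 30% busy

# BI queries consume a growing share of the same server.
for bi_load in (0.0, 0.2, 0.4, 0.6):
    total = OLTP_LOAD + bi_load
    rt = response_time_ms(OLTP_SERVICE_MS, total)
    print(f"BI load {bi_load:.0%}: OLTP response {rt:.1f} ms")
```

With these made-up numbers, OLTP response time rises from about 14 ms to about 100 ms as the BI share goes from 0% to 60%; the closer the shared resource gets to saturation, the more each increment of BI work costs the OLTP users.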
ACQUIRING AND PROCESSING DATA
Events and sensors are a relatively new data source
This data doesn’t fit well with current methods of collection and
storage, or with the technology to process and analyze it.
Old models assumed extraction of data
Old market says: There’s nothing wrong with what
you have, just keep buying new products from us
The emerging big data market has an answer…
The data lake: just dump the data in!
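What “just dump the data in” implies in practice is schema-on-read: raw records land as-is, and every reader supplies its own interpretation at query time. A minimal Python sketch, with hypothetical sensor events whose field names drift from record to record; `read_with_schema` is an illustrative name, not any product’s API.

```python
import json
from typing import Optional

# Raw records dumped in as-is: HYPOTHETICAL sensor events with drifting field names.
raw_lines = [
    '{"device": "a1", "temp_c": 21.5, "ts": "2017-07-01T10:00:00"}',
    '{"device_id": "b2", "temperature": 70.2, "unit": "F", "ts": "2017-07-01T10:00:05"}',
    '{"device": "c3", "ts": "2017-07-01T10:00:09"}',
]


def read_with_schema(line: str) -> Optional[dict]:
    """Schema-on-read: this reader decides, at query time, how to interpret each record."""
    rec = json.loads(line)
    device = rec.get("device") or rec.get("device_id")
    if "temp_c" in rec:
        temp_c = rec["temp_c"]
    elif "temperature" in rec and rec.get("unit") == "F":
        temp_c = round((rec["temperature"] - 32) * 5 / 9, 1)  # convert Fahrenheit to Celsius
    else:
        return None  # this reader's schema cannot interpret the record
    return {"device": device, "temp_c": temp_c, "ts": rec["ts"]}


rows = [r for r in map(read_with_schema, raw_lines) if r is not None]
```

Every team querying the lake has to write, and keep consistent, logic like this, and records a reader can’t interpret quietly drop out. That is the deferred cost behind “we’ll figure it all out later.”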
Combine
with self-
service:
we’ll figure
it all out
later!
Aren’t we
back where
we started?
Data curation
The problem with so many
sources, types, formats and
latencies of data is that it is
now impossible to create
one model for all of it in
advance.
Data modeling is about the
inside of a dataset. Curation
is about the set.
Data curation, rather than
data modeling, is becoming
the more important data
management practice.
The missing ingredient in most big data environments
Specifically,
metadata kept
separate from
the data.
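A minimal sketch of what “metadata kept separate from the data” might look like: a catalog that records facts about each dataset as a whole (where it lives, its source, format, and latency) without holding the data itself. `DatasetEntry` and `Catalog` are hypothetical names, and the locations are placeholder strings, not real systems.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Metadata about a dataset as a whole (curation), stored apart from the data."""
    name: str
    location: str      # where the data lives; the catalog never holds the data itself
    source: str
    data_format: str
    latency: str       # e.g. "batch daily", "streaming"
    tags: set[str] = field(default_factory=set)


class Catalog:
    """HYPOTHETICAL minimal catalog: register datasets, look them up, search by tag."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        if entry.name in self._entries:
            raise ValueError(f"dataset {entry.name!r} already registered")
        self._entries[entry.name] = entry

    def get(self, name: str) -> DatasetEntry:
        return self._entries[name]

    def search(self, tag: str) -> list[str]:
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


catalog = Catalog()
catalog.register(DatasetEntry("orders", "placeholder/warehouse/orders", "ERP", "parquet",
                              "batch daily", {"sales", "finance"}))
catalog.register(DatasetEntry("clicks", "placeholder/stream/clicks", "web app", "json",
                              "streaming", {"sales", "web"}))
print(catalog.search("sales"))  # ['clicks', 'orders']
```

Because the catalog describes the set rather than the inside of each dataset, new sources can be registered without first fitting them into one global model.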
You need a system of record for analytics
The solution to our problems isn’t
technology, it’s architecture.
New materials lead to new architectures
IT reality is multiple data stores and systems
Separate, purpose-built databases and processing systems for
different types of data and query / computing workloads, plus
any access method, are the new norm for information delivery.
[Diagram: source environments (databases, documents, flat files, XML, queues, ERP, applications); data processing and stream processing; many separate data stores, including the data warehouse; BI, reporting, dashboards, and apps]
DATA ARCHITECTURE
We’re so focused on the light switch that we’re not
talking about the light
How about another way of organizing data?
Splitting the architecture addresses three data goals
Production: creation, collection, storage of new data
Distribution: organization and distribution of data to multiple points of use
Consumption: direct support of data use
Separation of concerns, coordination of process
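A toy Python sketch of that separation of concerns, assuming a simple in-process publish/subscribe model (the `Distributor` class, topic name, and records are hypothetical): the producer hands off records without knowing who uses them, the distribution layer delivers each record to every subscribed point of use, and each consumer builds a view shaped for its own purpose.

```python
from collections import defaultdict
from typing import Callable

Record = dict
Consumer = Callable[[Record], None]


class Distributor:
    """HYPOTHETICAL distribution layer: organizes records and delivers them to every point of use."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Consumer]] = defaultdict(list)
        self.log: list[Record] = []  # retained copy so a new consumer could replay history

    def subscribe(self, topic: str, consumer: Consumer) -> None:
        self._subscribers[topic].append(consumer)

    def publish(self, topic: str, record: Record) -> None:
        self.log.append({"topic": topic, **record})
        for consumer in self._subscribers[topic]:
            consumer(record)


# Consumption: each use keeps its own view, shaped for its own purpose.
order_totals: dict[str, float] = defaultdict(float)
recent_orders: list[str] = []


def update_totals(record: Record) -> None:
    order_totals[record["customer"]] += record["amount"]


def track_recent(record: Record) -> None:
    recent_orders.append(record["order_id"])


dist = Distributor()
dist.subscribe("orders", update_totals)
dist.subscribe("orders", track_recent)

# Production: the source creates data and hands it off without knowing its consumers.
for order_id, customer, amount in [("o1", "acme", 100.0), ("o2", "acme", 50.0), ("o3", "zenith", 75.0)]:
    dist.publish("orders", {"order_id": order_id, "customer": customer, "amount": amount})
```

Adding a new point of use means adding a subscriber, with no change to the producer; that independence is what the split buys.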
The full analytic environment subsumes all the functions
of the data warehouse, and extends them
[Diagram: Data Acquisition (Collect & Store: incremental, batch, one-time copy, real time); Data Management (Process & Integrate); Data Access (Deliver & Use); plus Platform Services and Data storage]
The platform has to do more than serve queries; it has to be read-write.
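Two of the acquisition modes above, sketched in Python under the assumption that source rows carry a reliable `updated_at` timestamp (the table and function names are hypothetical): a one-time copy takes everything, while incremental extraction keeps a high-water mark and takes only rows changed since the last run.

```python
from datetime import datetime

# HYPOTHETICAL source table: (id, updated_at, value).
source_rows = [
    (1, datetime(2017, 7, 1, 9, 0), "a"),
    (2, datetime(2017, 7, 1, 9, 30), "b"),
    (3, datetime(2017, 7, 1, 10, 15), "c"),
]


def one_time_copy(rows):
    """One-time copy (or full batch): take everything, every time."""
    return list(rows)


def incremental_extract(rows, high_water_mark):
    """Incremental: take only rows changed since the last run, then advance the mark."""
    changed = [r for r in rows if r[1] > high_water_mark]
    new_mark = max((r[1] for r in changed), default=high_water_mark)
    return changed, new_mark


mark = datetime(2017, 7, 1, 9, 0)
batch1, mark = incremental_extract(source_rows, mark)      # rows 2 and 3
source_rows.append((4, datetime(2017, 7, 1, 11, 0), "d"))  # a new change arrives at the source
batch2, mark = incremental_extract(source_rows, mark)      # only row 4
```

A high-water mark is simple, but it assumes timestamps are always set and never move backwards, and it won’t see rows deleted at the source.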
USING DATA
More data stores = more user complexity
Where to go?
More forms of data = more user complexity
The old problem was access, the new one is analysis
The analysis process at a high level
Diagram: Kate Matsudaira
The nature of analytics problems is researching the
unknown rather than accessing the known.
Repeat for each new problem
Base diagram: Kate Matsudaira
Important: no two analytics projects are entirely alike
Different goals = different data, preparation, algorithm
Different algorithms have different resource consumption
profiles and scaling ability.
Each requires its own custom-engineered data features
BI
Analysis requires more interaction than querying
A BI tool can display this chart type, but it can’t make an
interactive visual of it to enable exploration of the dataset.
This example does un-BI things like show an entire
set/subset at once, with other analytics in it.
More modes of interaction are required from tools
Query Search Browse
Often, searching and browsing are done not just on data,
but on metadata and datasets.
Analytics is often embedded and purpose-built, not a tool
“Real-time” BI: do not confuse the two models
On-demand (persisted)
▪ Use to see current
state, analyze history
▪ Request model
▪ One-time
▪ Human oriented
Streaming (continuous)
▪ Use to see constant
state, monitor, react
▪ Streaming model
▪ Continuous
▪ Machine oriented
Modern tools are able to fuse the two. Older products and
custom coding (common in the big data market) aren’t.
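The difference between the two models, sketched in Python (all names and readings are hypothetical): the on-demand path stores readings and computes an answer only when asked, while the streaming path updates state on every event and reacts immediately. The loop at the end feeds each event to both paths, a toy version of running the two side by side.

```python
from typing import Callable

# On-demand (persisted): events are stored; a person asks a question when needed.
stored_readings: list[tuple[str, float]] = []


def query_average(sensor: str) -> float:
    """Request model: compute the answer once, from history, at request time."""
    values = [v for s, v in stored_readings if s == sensor]
    return sum(values) / len(values)


class ThresholdMonitor:
    """Streaming (continuous): state updates on every event and the system reacts at once."""

    def __init__(self, threshold: float, on_alert: Callable[[str, float], None]) -> None:
        self.threshold = threshold
        self.on_alert = on_alert
        self.latest: dict[str, float] = {}  # constant, current state per sensor

    def on_event(self, sensor: str, value: float) -> None:
        self.latest[sensor] = value
        if value > self.threshold:
            self.on_alert(sensor, value)


alerts: list[tuple[str, float]] = []
monitor = ThresholdMonitor(threshold=80.0, on_alert=lambda s, v: alerts.append((s, v)))

for sensor, value in [("pump1", 70.0), ("pump1", 85.0), ("pump2", 60.0), ("pump1", 75.0)]:
    stored_readings.append((sensor, value))  # persisted for later on-demand queries
    monitor.on_event(sensor, value)          # processed continuously as it arrives
```

The monitor fires once, the moment pump1 crosses the threshold; the average is only known when someone asks for it.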
New tools and tool architectures are required
One tool for all jobs doesn’t work any more*
* it never did, we just had fewer jobs
NAVIGATING THE MARKET CHANGES
Don’t follow the market
Some people can’t resist
getting the next new thing
because it’s new and new is
always better.
Many IT organizations are like
this, promoting a solution and
hunting for the problem that
matches it.
Better to ask “What is the
problem for which this
technology is the answer?”
As a technology moves from emerging to commodity, the
nature of acquiring, using and managing it should change
Innovation: generate options; novel practice; maximize value
Maturation: constrain choices; adaptation; good practice; optimize
Saturation: standardize / minimize choice; acquisition; best practice; minimize costs
For example, BI went from many tools to a few vendors, and is now being
disrupted by new technologies and capabilities.
Should you be a first mover or fast follower?
Innovation: Little product substitution is possible here. Few competitive bids or RFPs.
Maturation: Uncertain tradeoffs here. Competitive bids for unlike products. Early on, it’s less “what feature” and more “how to accomplish my task”; later it’s the opposite.
Saturation: Predictable cost and feature comparison until practices change. That change can take a long time to occur.
[Figure: market growth over time through innovation, maturation, and saturation]
Rule of thumb: when a product is in phase…
Innovation: build
Maturation: integrate
Saturation: buy
[Figure: market growth over time]
We are in a transition from technology to practice change
This is the turbulent
phase of the market
as it goes through
rapid development,
then product and
service changes.
Commodity computing and commodity networking have forced a
new architectural evolution, already well underway.
[Figure axis: innovation, maturation, saturation]
“Now is not the end.
It is not even the
beginning of the end.
But it is, perhaps,
the end of the
beginning.”
Winston Churchill
CC Image Attributions
Thanks to the people who supplied the creative commons licensed images used in this presentation:
shady_puppy_sales.jpg - http://www.flickr.com/photos/brizzlebornandbred/5001120150
cuneiform_proto_3000bc.jpg - http://www.flickr.com/photos/takomabibelot/3124619443/
cuneiform_undo.jpg - http://www.flickr.com/photos/charlestilford/2552654321/
scroll_kerouac.jpg - http://www.flickr.com/photos/ari/93966538/
House on fire - http://flickr.com/photos/oldonliner/1485881035/
Manuscripts on shelf - http://flickr.com/photos/peterkaminski/1688635/
manuscript_illum.jpg - http://www.flickr.com/photos/diorama_sky/2975796332/
manuscript_page.jpg - http://www.flickr.com/photos/calliope/306564541/
subway dc metro - http://flickr.com/photos/musaeum/509899161/
Circos, Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data, Danny Holten
text composition - http://flickr.com/photos/candiedwomanire/60224567/
twitter_network_bw.jpg - http://www.flickr.com/photos/dr/2048034334/
donuts_4_views.jpg - http://www.flickr.com/photos/le_hibou/76718773/
firemen not noticing fire.jpg - http://flickr.com/photos/oldonliner/1485881035/
outdated gumshoe.jpg - http://flickr.com/photos/olivander/372385317/
well town hall.jpg - http://flickr.com/photos/tuinkabouter/1135560976/
pyramid_camel_rider.jpg - http://www.flickr.com/photos/khalid-almasoud/1528054134/
uniform_umbrellas.jpg - http://www.flickr.com/photos/mortimer/221051561/
open air market - http://flickr.com/photos/baboon/309793875/
train_to_sea.jpg - http://www.flickr.com/photos/innoxiuss/457069767/
wheat_field.jpg - http://www.flickr.com/photos/ecstaticist/1120119742/
changing of the guard.jpg - http://flickr.com/photos/mambo1935/160739264/
Gare do Oriente Lisbon airport bridge.jpg - http://flickr.com/photos/higaara/228673603/
winding_road.jpg - http://www.flickr.com/photos/batt_57/4000701633/
Tokyo forum - http://flickr.com/photos/fukagawa/2004106475/
riot police line small - http://flickr.com/photos/73594239@N00/25719098/
About Third Nature
Third Nature is a research and consulting firm focused on new and
emerging technology and practices in analytics, business intelligence,
information strategy and data management. If your question is related to
data, analytics, information strategy and technology infrastructure then
you’re at the right place.
Our goal is to help organizations solve problems using data. We offer
advisory services to help plan and develop data-related strategy and
plans, as well as education, consulting and research services.
About the Presenter
Mark Madsen is president of Third
Nature, a technology research and
consulting firm focused on analytics,
business intelligence, and data
management. Mark is an award-winning
author, architect and CTO whose work
has been featured in numerous industry
publications. Over the past ten years
Mark received awards for his work from
the American Productivity & Quality
Center, TDWI, and the Smithsonian
Institution. He is an international speaker,
a contributor to Forbes Online and
member of the O’Reilly Strata program
committee. For more information or to
contact Mark, follow @markmadsen on
Twitter or visit http://ThirdNature.net

How to understand trends in the data & software market

  • 1.
    A Retrospective of theFuture Trends in the data industry July, 2017 Mark Madsen www.ThirdNature.net @markmadsen
  • 2.
    © Third Nature,Inc. Most BI tech is a commodity, a cost of doing business
  • 3.
    © Third Nature,Inc. Adoption and decline – everything gets old For most businesses, more than 80% of IT budget is dedicated to basic infrastructure …and more than 60% of IT labor cost goes to keep things running, i.e. basic operations and support. Strategic Commodity
  • 4.
    © Third Nature,Inc. It Wasn’t Always This Way As technologies mature and spread to competitors, they cease to be differentiators. Unfortunately, this is what packaged software vendors do to your “best practice.” CommodityCommodity The old advantages becomes the new focus of cost reduction. For example, your data warehouse. Strategic Strategic
  • 5.
    © Third Nature,Inc. Time Cumulative Adoption Market Adoption Hard work Tipping point
  • 6.
    © Third Nature,Inc. Product Maturity Some Ideas Aren’t That Good End of LifeTimeNew innovation Some ideas aren’t that good, like object databases in the 1990s
  • 7.
    © Third Nature,Inc. These Curves Can Explain a Lot Time Product Maturity Analyst revenue predictions Executive interest“Gartner Gap”
  • 8.
    © Third Nature,Inc. The “experts” often have a foreshortened view “Open source is not worth paying attention to.” A Gartner analyst talking about the database and analytics market, January, 2006. Multiple OSS databases exists, Hadoop project is official Apache project in 2008. Where the analysts are on the adoption curve
  • 9.
    © Third Nature,Inc. The problem with bad framing s Leads to bad assumptions about use, inappropriate features, poor understanding of substitutability and the impacts it will have.
  • 10.
    © Third Nature,Inc. Auto-mobile? The better framing leads to a more intuitive understanding, and to more clear reasoning about it. It took decades to standardize on a steering wheel and brake & gas pedals.
  • 11.
    © Third Nature,Inc. Technology doesn’t just fulfill a need. It generates new needs and new problems. Business practices and technology co-evolve. Innovation is change.
  • 12.
    © Third Nature,Inc. Value is not in the product, it’s in the practice So are the costs
  • 13.
    © Third Nature,Inc. Open source is an example of practice change It’s a means of production, not a technology
  • 14.
    © Third Nature,Inc. Practices have to catch up to technologies
  • 15.
    © Third Nature,Inc. Practice evolution of computing over time 1930s-1950s: Calculate 1960s-1980s: Automate 1990s-2000s: Informate 2010s+: Analyze and Actuate Computing technology has become a tool of observation Risingorganizationalcomplexity
  • 16.
    © Third Nature,Inc. Evolution of views on data 50s-60s: data as product 70s-80s: data as byproduct 90s-00s: data as asset 2010s +: data as substrate The real data revolution is in business structure and processes and how they use the information.
  • 17.
    © Third Nature,Inc. Types of innovation Incremental or “sustaining” ▪ Incremental is based on existing concepts; smaller changes within the same framework; “improvement” Disruptive or invention ▪ Based on new concepts, science, principles; requires new knowledge, skills; over time has significant consequences to market; “invention” Architectural – the third path ▪ Changes how the parts are related. It devalues the advantage of experience, knowledge, usefulness of prior knowledge, but doesn’t affect the existing knowledge. (Christensen missed this one)
  • 18.
    © Third Nature,Inc. We are in another round of infrastructure change Mainframe  c/s  cloud Batch  online  event driven Infrastructure takes a long time. Value is driven by new capabilities used to do new things, less by doing old things better or cheaper
  • 19.
    © Third Nature,Inc. STORING DATA
  • 20.
    © Third Nature,Inc. Data warehouse: centralize, that solves all problems! Creates bottlenecks Causes scale problems Enforces a single model
  • 21.
    © Third Nature,Inc. The data lake solution: no central authority wtf, it was fully operational!
  • 22.
    © Third Nature,Inc. The data lake solution? There’s a problem: as the lake is envisioned, it is still a centralized data architecture, but this time there is no single global model. Instead it’s files and not modeled. It can be operational while under construction. It’s still a death star.
  • 23.
    © Third Nature,Inc. Eventually we run into the same problems Seriously, wtf? It was agile and operational Rising complexity and scale break centralized models
  • 24.
    © Third Nature,Inc. Data isn’t just in tables, it’s inside other things
  • 25.
    © Third Nature,Inc. More important anything can be treated as data Data isn’t just inside things, it’s also the thing itself. And further data can be derived from that thing.
  • 26.
    © Third NatureInc.© Third Nature Inc. Data structure and format versus form Structure: image Format: bitmap, PNG, base64-encoded PNG Collections of data have a structural form Set List Graph ID Name Salary Position 1 Marge Inovera $150,000 Statistician 2 Anita Bath $120,000 Sewer inspector 3 Ivan Awfulitch $160,000 Dermatologist 4 Nadia Geddit $36,000 DBA ID Name Salary Position 1 Marge Inovera $150,000 Statistician 2 Anita Bath $120,000 Sewer inspector 3 Ivan Awfulitch $160,000 Dermatologist 4 Nadia Geddit $36,000 DBA
  • 27.
    © Third NatureInc.© Third Nature Inc. Each form requires a different engine* Just like freight requires different transportation, data requires different storage and processing. Set List Graph
  • 28.
    © Third Nature,Inc. Hard reality: workload incompatibility As the BI workload increases, the OLTP response time increases due to asymmetric resource consumption. Analytics workloads disrupt BI workloads in the same way The problem in the 1990s The problem now
  • 29.
    © Third Nature,Inc. ACQUIRING AND PROCESSING DATA
  • 30.
    © Third Nature,Inc. Events and sensors are a relatively new data source This data doesn’t fit well with current methods of collection and storage, or with the technology to process and analyze it.
  • 31.
    © Third Nature,Inc. Old models assumed extraction of data
  • 32.
    © Third Nature,Inc. Old market says: There’s nothing wrong with what you have, just keep buying new products from us
  • 33.
    © Third Nature,Inc. The emerging big data market has an answer…
  • 34.
    © Third Nature,Inc. The data lake: just dump the data in!
  • 35.
    © Third Nature,Inc. Combine with self- service: we’ll figure it all out later! Aren’t we back where we started?
  • 36.
    © Third Nature,Inc. Data curation The problem with so many sources, types, formats and latencies of data is that it is now impossible to create one model for all of it in advance. Data modeling is about the inside of a dataset. Curation is about the set. Data curation, rather than data modeling, is becoming the more important data management practice.
  • 37.
    © Third Nature,Inc. The missing ingredient from most big data Specifically, metadata kept separate from the data.
  • 38.
    © Third Nature,Inc. You need a system of record for analytics
  • 39.
    © Third Nature,Inc. The solution to our problems isn’t technology, it’s architecture.
  • 40.
    © Third Nature,Inc. New materials lead to new architectures
  • 41.
    © Third Nature,Inc. IT reality is multiple data stores and systems Separate, purpose-built databases and processing systems for different types of data and query / computing workloads, plus any access method, is the new norm for information delivery. BI, Reporting, Dashboards, apps 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath 
$120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA 1 MargeInovera $150,000 Statistician 2 AnitaBath $120,000 Sewerinspector 3 IvanAwfulitch $160,000 Dermatologist 4 NadiaGeddit $36,000 DBA Data Warehouse Databases Documents Flat Files XML Queues ERP Applications Source Environments Data processing Stream processing
  • 42.
    © Third Nature,Inc. DATA ARCHITECTURE We’re so focused on the light switch that we’re not talking about the light
  • 43.
    © Third Nature,Inc. How about another way of organizing data?
  • 44.
    © Third Nature,Inc. Splitting the architecture addresses three data goals Production Creation, collection, storage of new data Distribution Organization and distribution of data to multiple points of use Consumption Direct support of data use Separation of concerns, coordination of process
  • 45.
    © Third Nature,Inc. The full analytic environment subsumes all the functions of the data warehouse, and extends them Data Acquisition Collect & Store Incremental Batch One-time copy Real time Platform Services Data Management Process & Integrate Data Access Deliver & Use Data storage The platform has to do more than serve queries; it has to be read-write.
  • 46.
    © Third Nature,Inc. USING DATA
  • 47.
    © Third NatureInc.© Third Nature Inc. More data stores = more user complexity Where to go?
  • 48.
    © Third NatureInc.© Third Nature Inc. More forms of data = more user complexity
  • 49.
    © Third Nature,Inc. The old problem was access, the new one is analysis
  • 50.
    © Third Nature,Inc. The analysis process at a high level Diagram: Kate Matsudaira
  • 51.
    © Third Nature,Inc. The nature of analytics problems is researching the unknown rather than accessing the known. Repeat for each new problem Base diagram: Kate Matsudaira
  • 52.
    © Third Nature,Inc. Important: no two analytics projects are entirely alike Different goals = different data, preparation, algorithm Different algorithms have different resource consumption profiles and scaling ability. Each requires it’s own custom engineered data features
  • 53.
  • 54.
    © Third Nature,Inc. Analysis requires more interaction than querying BI tool can display this chart type. But it can’t make an interactive visual of it to enable exploration of the dataset. This example does un-BI things like show an entire set/subset at once, with other analytics in it.
  • 55.
    © Third Nature,Inc. More modes of interaction are required from tools Query Search Browse Often, searching and browsing is done not just on data, but on metadata and datasets.
  • 56.
    © Third Nature,Inc. Analytics is often embedded and purpose-built, not a tool
  • 57.
    © Third Nature,Inc. “Real-time” BI: do not confuse the two models On-demand (persisted) ▪ Use to see current state, analyze history ▪ Request model ▪ One-time ▪ Human oriented Streaming (continuous ) ▪ Use to see constant state, monitor, react ▪ Streaming model ▪ Continuous ▪ Machine oriented Modern tools are able to fuse the two. Older products and custom coding (common in big data market) aren’t.
  • 58.
    © Third NatureInc.© Third Nature Inc. New tools and tool architectures are required One tool for all jobs doesn’t work any more* * it never did, we just had fewer jobs
  • 59.
    © Third Nature,Inc. NAVIGATING THE MARKET CHANGES
  • 60.
    © Third Nature,Inc. Don’t follow the market Some people can’t resist getting the next new thing because it’s new and new is always better. Many IT organizations are like this, promoting a solution and hunting for the problem that matches it. Better to ask “What is the problem for which this technology is the answer?” Copyright Third Nature, Inc.
  • 61.
    As a technology moves from emerging to commodity, the nature of acquiring, using and managing it should change.
    ▪ Innovation phase: generate options, innovate, follow novel practice, maximize value
    ▪ Maturation phase: constrain choices, adapt, follow good practice, optimize
    ▪ Saturation phase: standardize and minimize choice, acquire, follow best practice, minimize costs
    For example, BI went from many tools to a few vendors, and is now being disrupted by new technologies and capabilities.
  • 62.
    Should you be a first mover or a fast follower? (Chart: market growth over time through the innovation, maturation and saturation phases.)
    ▪ Innovation: little product substitution is possible. Few competitive bids or RFPs.
    ▪ Maturation: uncertain tradeoffs. Competitive bids for unlike products. Early on it’s less “what feature” and more “how do I accomplish my task”; later it’s the opposite.
    ▪ Saturation: predictable cost and feature comparison until practices change. That change can take a long time to occur.
  • 63.
    Rule of thumb: when a product is in the innovation phase, build; in the maturation phase, integrate; in the saturation phase, buy.
  • 64.
    We are in a transition from technology change to practice change. This is the turbulent phase of the market as it goes through rapid development, then product and service changes. Commodity computing and commodity networking have forced a new architectural evolution, already well underway.
  • 65.
    “Now is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.” Winston Churchill
  • 66.
    CC Image Attributions
    Thanks to the people who supplied the Creative Commons licensed images used in this presentation:
    shady_puppy_sales.jpg - http://www.flickr.com/photos/brizzlebornandbred/5001120150
    cuneiform_proto_3000bc.jpg - http://www.flickr.com/photos/takomabibelot/3124619443/
    cuneiform_undo.jpg - http://www.flickr.com/photos/charlestilford/2552654321/
    scroll_kerouac.jpg - http://www.flickr.com/photos/ari/93966538/
    House on fire - http://flickr.com/photos/oldonliner/1485881035/
    Manuscripts on shelf - http://flickr.com/photos/peterkaminski/1688635/
    manuscript_illum.jpg - http://www.flickr.com/photos/diorama_sky/2975796332/
    manuscript_page.jpg - http://www.flickr.com/photos/calliope/306564541/
    subway dc metro - http://flickr.com/photos/musaeum/509899161/
    Circos, Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data, Danny Holten
    text composition - http://flickr.com/photos/candiedwomanire/60224567/
    twitter_network_bw.jpg - http://www.flickr.com/photos/dr/2048034334/
    donuts_4_views.jpg - http://www.flickr.com/photos/le_hibou/76718773/
  • 67.
    CC Image Attributions
    Thanks to the people who supplied the Creative Commons licensed images used in this presentation:
    cuneiform_undo.jpg - http://www.flickr.com/photos/charlestilford/2552654321/
    cuneiform_proto_3000bc.jpg - http://www.flickr.com/photos/takomabibelot/3124619443/
    scroll_kerouac.jpg - http://www.flickr.com/photos/ari/93966538/
    firemen not noticing fire.jpg - http://flickr.com/photos/oldonliner/1485881035/
    outdated gumshoe.jpg - http://flickr.com/photos/olivander/372385317/
    manuscript_page.jpg - http://www.flickr.com/photos/calliope/306564541/
    manuscript_illum.jpg - http://www.flickr.com/photos/diorama_sky/2975796332
    well town hall.jpg - http://flickr.com/photos/tuinkabouter/1135560976/
    pyramid_camel_rider.jpg - http://www.flickr.com/photos/khalid-almasoud/1528054134/
    uniform_umbrellas.jpg - http://www.flickr.com/photos/mortimer/221051561/
    open air market - http://flickr.com/photos/baboon/309793875/
    train_to_sea.jpg - http://www.flickr.com/photos/innoxiuss/457069767/
    wheat_field.jpg - http://www.flickr.com/photos/ecstaticist/1120119742/
    changing of the guard.jpg - http://flickr.com/photos/mambo1935/160739264/
    Gare do Oriente Lisbon airport bridge.jpg - http://flickr.com/photos/higaara/228673603/
    winding_road.jpg - http://www.flickr.com/photos/batt_57/4000701633/
    Tokyo forum - http://flickr.com/photos/fukagawa/2004106475/
    riot police line small - http://flickr.com/photos/73594239@N00/25719098/
  • 68.
    About Third Nature
    Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, information strategy and data management. If your question is related to data, analytics, information strategy or technology infrastructure, then you’re in the right place. Our goal is to help organizations solve problems using data. We offer advisory services to help plan and develop data-related strategies, as well as education, consulting and research services.
  • 69.
    About the Presenter
    Mark Madsen is president of Third Nature, a technology research and consulting firm focused on analytics, business intelligence and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years, Mark has received awards for his work from the American Productivity & Quality Center, TDWI and the Smithsonian Institution. He is an international speaker, a contributor to Forbes Online and a member of the O’Reilly Strata program committee. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net