SlideShare a Scribd company logo
1 of 44
Download to read offline
Beer, diapers, and
correlation: A tale
of ambiguity
Strata, September 29, 2015
Mark Madsen
www.ThirdNature.net
@markmadsen
Copyright Third Nature, Inc.
"I will set down a tale... it may be
history, it may be only a legend, a
tradition. It may have happened,
it may not have happened: but it
could have happened.“
– Mark Twain
Copyright Third Nature, Inc.
Analysis, art and science all play a role
For example, analytical
cubism paints a picture
in a way similar to how
we see and remember
a subject.
Although cubism seems
abstract it tries to
capture the truth - it
may be a more realistic
way to paint an image.
“Portrait of Dora Maar” Pablo Picasso (1937)
Copyright Third Nature, Inc.
Like art at the turn of the
last century, analytics,
particularly how people
think they should be
used, can get stuck in
absolutist thinking.
Models abstract reality
by capturing dominant
features, the essence of
a situation.
Like cubist paintings.
“Portrait de Ambroise Vollard” Pablo Picasso (1910)
Copyright Third Nature, Inc.
People analyzing market basket data found an unexpected
correlation between the sales of beer and diapers in the
same transaction. They changed their merchandising to put
them next to each other. They made lots of money.
a) How many of you have heard this story?
b) How many of you believe it’s true?
The story of beer and diapers
+ =
Copyright Third Nature, Inc.
Why do this sort of analysis?
Increase co-purchases of products
bought together, e.g. office desk
supplies and the forgotten item
Cross-promotion using directional
relationships, e.g. chips and salsa
Price optimization, particularly
during promotions
Inventory management, e.g.
stocking the proper amount of the
dependent product
Refine marketing, e.g. targeting
segments based on the affinities
Copyright Third Nature, Inc.
How do you do it?
You can just check pairwise correlations if you
know what you’re looking for, otherwise this is
impractical.
Plus, what about arbitrary numbers of items?
▪ Affinity analysis, aka association rule mining - find
frequent itemsets, then build rules
▪ Tune rules based on confidence, lift, conviction so
produce less spurious, more useful ones
What about patterns over time?
▪ Time series clustering
Copyright Third Nature, Inc.
I wonder…Is it true?
I read about the beer
and diapers story in a
retail trade magazine.
In 1994-95-ish I ran a
test, in a grocery store
operating in PA, OH, MD
Specifically searched for
this pattern.
Seems
legit
Copyright Third Nature, Inc.
Is it true? No.
Original: 1992, Correlation
▪ Stated cause: Men buying diapers
My test: Grocery chain, 1994, No correlation
▪ No causal relationship
So began a long journey
Copyright Third Nature, Inc.
A little later: Is it true? Yes.
Wait, what?
Original: Grocery, 1992, Correlation
1. Grocery, ~1994, No correlation
2. Drug store, ~1995, Correlation
Copyright Third Nature, Inc.Copyright Third Nature, Inc.
Where did this story come from?
The trail through the press is muddy,
but the internet made it much easier.
Social networks more so.
It goes back to 1992, Osco:
“…point-of-sale data from Osco Drug stores -
1.2 million baskets worth...
…queries revealed that between 5pm and 7pm,
customers tended to co-purchase beer and
diapers.
…does not appear to have exploited the
information by moving the products around.
…we have a correlation between beer, diapers
and time, but no correlations with age, gender
or day.”
Copyright Third Nature, Inc.
Also: history of databases in No-tation
1970: NoSQL = We have no SQL
1980: NoSQL = Not yet SQL
1990: NoSQL = Know SQL
2000: NoSQL = No SQL!
2005: NoSQL = Not only SQL
2013: NoSQL = No, SQL!
In 1992, we didn’t use association rules. We used SQL
Copyright Third Nature, Inc.
Origin: “never let data get in
the way of a good story”
It’s sales propaganda.
“The warehouse notion is so
seductive that information
technology is using it to shake
the purse strings loose,” says
Douglas Hackney (1998)
Just like today, with
analytics and big data.
Copyright Third Nature, Inc.
Copyright Third Nature, Inc.
Variations of the story continued
11,400 academic papers, 14,000 books, ~1,000,000 web pages
“The discount chain moved the beer and snacks such as peanuts and pretzels next to the
disposable diapers and increased sales on peanuts and pretzels by more that 27%."
“For example, one Midwest grocery chain used the data mining capacity of Oracle software
to analyze local buying patterns. They discovered that when men bought diapers on
Thursdays and Saturdays, they also tended to buy beer.”
“[UK] Dads with newborns cannot go out to pubs to socialize with friends (me including).
So they buy more beers so that they can drink at home!”
“Some time ago, Wal-Mart decided to combine the data from its loyalty card system with
that from its PoS…Once combined, the data was mined extensively …On Friday afternoons,
young American males who buy diapers (nappies) also have a predisposition to buy beer.”
"Shoppers who buy diapers for the first time at a Tesco store can expect to receive
coupons by mail for baby wipes, toys -- and beer. Tesco's analysis showed that new fathers
tend to buy more beer because they are home with the baby and can't go to the pub."
"Sometimes the data can throw up surprises: mining of databases held by 7-Eleven stores
in the US revealed a link between purchases of beer and nappies. When they were moved
together, sales of both increased,”
Copyright Third Nature, Inc.
There are mixed
answers though.
Fragmentation,
ambiguity, what’s
going on here?
Are you asking the
right question?
Not “is it true” but
“under what set of
circumstances is it
true?”
Marianne Faithful's version of a
portrait from 'Destroy Rankin'
Copyright Third Nature, Inc.
Is it true? Yes. No. Maybe. I don’t know.
I kept checking after that second one.
Original: Grocery, 1992, Correlation
1. Grocery, ~1994, No correlation
2. Drug store, ~1995, Correlation
3. Drug store, 1997, Very high correlation
4. Drug store, 1997, No correlation
5. Grocery, 2000, Weak correlation
6. Multiple, 2013, Correlation
Copyright Third Nature, Inc.
Analytic insights that result in no action are expensive trivia.
It’s not the insight, but what you do with it, that matters
As a manager: what would you do?
Copyright Third Nature, Inc.
Reality
You wouldn’t do anything without
some idea of cause or context,
correlation isn’t enough .
sorry guys, causation does matter,
otherwise it might be ice cream & sharks
Copyright Third Nature, Inc.
Maybe it’s a nightcap to sleep well
Copyright Third Nature, Inc.
Forgetful husbands only remember the basics
“I send you to the
store for a few
necessities, and you
come back with chips
and beer instead of
talcum powder and
diapers.”
- MacGyver, 4/18/1988
Copyright Third Nature, Inc.
Or harried dads rewarding themselves with impulse buys
Copyright Third Nature, Inc.
Men can’t go out with their friends any more
Copyright Third Nature, Inc.
Maybe it’s mom, because in the late-1990s…
Copyright Third Nature, Inc.
Sometimes it’s not “purchasing” exactly, but it still
should still count as a correlation
Copyright Third Nature, Inc.
Nobody checked if it was diapers for babies…
Copyright Third Nature, Inc.
‘Cause no presentation is complete without charts
Whazzat?
Men, shopping before Thanksgiving and end of January?
Copyright Third Nature, Inc.
Do you want to know what the answers are?
So do I…
Copyright Third Nature, Inc.
Good news! You are all right.
Bad news! You are all wrong.
1. Grocery, 1992, Correlation – male, TH/SA, 5-7 PM, midwest
2. Grocery, ~1994, No correlation, mid-northeast
3. Drug store, ~1995, Correlation – male, weekends, east
4. Drug store, 1997, Very high correlation, both, always, US (they
decide first, ask questions later - not like science, like myth)
5. Drug store, 1997, No correlation, US
6. Grocery, 2000, Weak correlation, female, weekdays, west/mid
7. Multiple, 2013, Correlation, male, seasonal dates, US
8. Grocery, 2006, Correlation, male, UK, (not 1st hand)
The story is true, and false, and indeterminate.
It’s ambiguous.
Copyright Third Nature, Inc.
Are you asking the right question?
Not “is it true” but “under what circumstances is it true?”
Image credit: unknown
Copyright Third Nature, Inc.
“It is termed analytical cubism
because of its structured
dissection of the subject,
viewpoint-by-viewpoint,
resulting in a fragmentary
image of multiple viewpoints
and overlapping planes. Other
distinguishing features of
analytical cubism were a
simplified palette of colours,
so the viewer was not
distracted from the structure
of the form, and the density
of the image at the centre of
the canvas.”
Le Goûter / Tea Time / Femme à la Cuillère (1911)
Copyright Third Nature, Inc.
What is happening here is
that “the model” is seeing
pieces of a whole because
each of these models is
independent, with its own
observations and data, like
a snapshot that covers only
the focal point of your eye.
Copyright Third Nature, Inc.
You are actually seeing the
pieces individually, with
little context to link them,
Copyright Third Nature, Inc.
…not as independent, in
their appropriate places
within a larger context.
Copyright Third Nature, Inc.
When they are lined up
together the larger scene
emerges, in much the
same way that your brain
puts together a scene by
focusing on individual
pieces and assembling
the picture from them as
your focus moves from
place to place.
Joiner, David Hockney (1980s)
Copyright Third Nature, Inc.
Except that the pieces
you see here are not
nicely bounded. They
are assembled
piecemeal, in
overlapping bits, some
including one element
some another. Just like
David Hockney’s
“joiners” from the
1980s.
This is a problem any
time you generalize
from someone else’s
model.
Joiner, David Hockney (1980s)
Copyright Third Nature, Inc.
Therefore…
You would choose to merchandise, price or
promote in accordance with the causality you
believe or determine is true.
The actions you take and the related outcomes
will vary based on the local context.
Not one way to act but many, just like there are
many attributes, not one.
Copyright Third Nature, Inc.
Key points
1. There is no “single version of the truth”, the word “version” is a
clue. Truth is relative and contextual, not absolute. So don’t
promote models as black and white answer-makers.
2. “All models are wrong, some are useful” – George Box
3. Like a cubist, think of all the angles when you use analytics. “Do
what they did” management strategy doesn’t work when it
comes to contextual problems.
4. You need to have an idea about causation. Ignore the pundits.
5. Data science is one half of the solution, applying the results is
the other half. “Science” because you go into the unknown
when you act. Like Heisenberg , your actions cause effects, and
the choice of action may preclude another action.
Copyright Third Nature, Inc.
People are the link that gives context. Without a
person determining the applicability of analytical
models or interpreting the results, little sense can
be made of them.
'Pearblossom Highway, 11th to 18th April 1986 No.2', David Hockney
Copyright Third Nature, Inc.
The power of myth
The origin story is no less
true for having not
happened. A myth:
▪ Reveals some mystery
▪ Shows the complexity of the
world and explains what you
encounter
▪ Validates a sociological
system
▪ Show that there’s a role, a
place for us
Lose Santa, gain a holiday, just
like the Grinch.
(I still use this story )
Copyright Third Nature, Inc.
The final lesson learned at Osco in the 1990s:
If you put two products next to each other on a
shelf then they are more likely to be sold together.
Profundity.
Copyright Third Nature, Inc.
Some reference material
Predictive Analytics and Data Mining. 2014 (chapter 6, association rules)
The delights of seeing: Cubism, Joiners and The Multiple Viewpoint
What Three Wise Men have to say about diagnosis, BMJ 2011;343:d7769
Discovering temporal association rules: Algorithms, language and systems,
http://www.computer.org/csdl/proceedings/icde/2000/0506/00/05060306.pdf
Beer and Breastfeeding, Pubmed, http://www.ncbi.nlm.nih.gov/pubmed/11065057
Frames, Biases, and Rational Decision-Making in the Human Brain,
http://econ.as.nyu.edu/docs/IO/9877/De_Martino1.pdf
Copyright Third Nature, Inc.
About the Presenter
Mark Madsen is president of Third
Nature, a consulting and advisory firm
focused on analytics, data and
technology strategy and data
management. Mark is an award-winning
author, architect and CTO who has
received awards for his work from the
American Productivity & Quality Center,
Smithsonian Institute and industry
associations. He is an international
speaker, a contributor to Forbes, and
member of the O’Reilly Strata program
committee. For more information or to
contact Mark, follow @markmadsen on
Twitter or visit http://ThirdNature.net
Copyright Third Nature, Inc.
About Third Nature
Third Nature is a research and advisory firm focused on new and emerging
technology and practices in analytics, information strategy, business
intelligence and data management.
Our goal is to help organizations solve problems using data. We offer
education, advisory and research services to support business and IT
organizations. We also provide product-related consulting to software
vendors in the data industry.
We fill the gap between what the industry analyst firms cover and what IT
needs. We specialize in strategy and architecture, so we look at emerging
technologies and markets, evaluating how technologies are applied to
solve problems rather than simply comparing product features.

More Related Content

Viewers also liked

Sql vs NoSQL
Sql vs NoSQLSql vs NoSQL
Sql vs NoSQLRTigger
 
UX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and ArchivesUX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and ArchivesNed Potter
 
Designing Teams for Emerging Challenges
Designing Teams for Emerging ChallengesDesigning Teams for Emerging Challenges
Designing Teams for Emerging ChallengesAaron Irizarry
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with DataSeth Familian
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017Drift
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheLeslie Samuel
 
A Pragmatic Approach to Analyzing Customers
A Pragmatic Approach to Analyzing CustomersA Pragmatic Approach to Analyzing Customers
A Pragmatic Approach to Analyzing Customersmark madsen
 
Briefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionBriefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionmark madsen
 
Crossing the chasm with a high performance dynamically scalable open source p...
Crossing the chasm with a high performance dynamically scalable open source p...Crossing the chasm with a high performance dynamically scalable open source p...
Crossing the chasm with a high performance dynamically scalable open source p...mark madsen
 
Briefing Room analyst comments - streaming analytics
Briefing Room analyst comments - streaming analyticsBriefing Room analyst comments - streaming analytics
Briefing Room analyst comments - streaming analyticsmark madsen
 
Everything has changed except us
Everything has changed except usEverything has changed except us
Everything has changed except usmark madsen
 
Determine the Right Analytic Database: A Survey of New Data Technologies
Determine the Right Analytic Database: A Survey of New Data TechnologiesDetermine the Right Analytic Database: A Survey of New Data Technologies
Determine the Right Analytic Database: A Survey of New Data Technologiesmark madsen
 
Everything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data WarehouseEverything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data Warehousemark madsen
 
The State of Open Source BI Adoption
The State of Open Source BI AdoptionThe State of Open Source BI Adoption
The State of Open Source BI Adoptionmark madsen
 
Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)mark madsen
 
On the edge: analytics for the modern enterprise (analyst comments)
On the edge: analytics for the modern enterprise (analyst comments)On the edge: analytics for the modern enterprise (analyst comments)
On the edge: analytics for the modern enterprise (analyst comments)mark madsen
 
Big data analytics beyond beer and diapers
Big data analytics   beyond beer and diapersBig data analytics   beyond beer and diapers
Big data analytics beyond beer and diapersKai Zhao
 
Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)Kai Zhao
 

Viewers also liked (18)

Sql vs NoSQL
Sql vs NoSQLSql vs NoSQL
Sql vs NoSQL
 
UX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and ArchivesUX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and Archives
 
Designing Teams for Emerging Challenges
Designing Teams for Emerging ChallengesDesigning Teams for Emerging Challenges
Designing Teams for Emerging Challenges
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with Data
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
 
A Pragmatic Approach to Analyzing Customers
A Pragmatic Approach to Analyzing CustomersA Pragmatic Approach to Analyzing Customers
A Pragmatic Approach to Analyzing Customers
 
Briefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionBriefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collection
 
Crossing the chasm with a high performance dynamically scalable open source p...
Crossing the chasm with a high performance dynamically scalable open source p...Crossing the chasm with a high performance dynamically scalable open source p...
Crossing the chasm with a high performance dynamically scalable open source p...
 
Briefing Room analyst comments - streaming analytics
Briefing Room analyst comments - streaming analyticsBriefing Room analyst comments - streaming analytics
Briefing Room analyst comments - streaming analytics
 
Everything has changed except us
Everything has changed except usEverything has changed except us
Everything has changed except us
 
Determine the Right Analytic Database: A Survey of New Data Technologies
Determine the Right Analytic Database: A Survey of New Data TechnologiesDetermine the Right Analytic Database: A Survey of New Data Technologies
Determine the Right Analytic Database: A Survey of New Data Technologies
 
Everything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data WarehouseEverything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data Warehouse
 
The State of Open Source BI Adoption
The State of Open Source BI AdoptionThe State of Open Source BI Adoption
The State of Open Source BI Adoption
 
Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)
 
On the edge: analytics for the modern enterprise (analyst comments)
On the edge: analytics for the modern enterprise (analyst comments)On the edge: analytics for the modern enterprise (analyst comments)
On the edge: analytics for the modern enterprise (analyst comments)
 
Big data analytics beyond beer and diapers
Big data analytics   beyond beer and diapersBig data analytics   beyond beer and diapers
Big data analytics beyond beer and diapers
 
Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)
 

More from mark madsen

Data Architecture: OMG It’s Made of People
Data Architecture: OMG It’s Made of PeopleData Architecture: OMG It’s Made of People
Data Architecture: OMG It’s Made of Peoplemark madsen
 
Solve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for HumansSolve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for Humansmark madsen
 
The Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data ManagementThe Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data Managementmark madsen
 
Operationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the EnterpriseOperationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the Enterprisemark madsen
 
Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019mark madsen
 
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)mark madsen
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018mark madsen
 
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou RangeA Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Rangemark madsen
 
How to understand trends in the data & software market
How to understand trends in the data & software marketHow to understand trends in the data & software market
How to understand trends in the data & software marketmark madsen
 
Pay no attention to the man behind the curtain - the unseen work behind data ...
Pay no attention to the man behind the curtain - the unseen work behind data ...Pay no attention to the man behind the curtain - the unseen work behind data ...
Pay no attention to the man behind the curtain - the unseen work behind data ...mark madsen
 
Assumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slidesAssumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slidesmark madsen
 
Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?mark madsen
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecturemark madsen
 
Don't let data get in the way of a good story
Don't let data get in the way of a good storyDon't let data get in the way of a good story
Don't let data get in the way of a good storymark madsen
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogiesmark madsen
 
Don't follow the followers
Don't follow the followersDon't follow the followers
Don't follow the followersmark madsen
 
Exploring cloud for data warehousing
Exploring cloud for data warehousingExploring cloud for data warehousing
Exploring cloud for data warehousingmark madsen
 
Open Data: Free Data Isn't the Same as Freeing Data
Open Data: Free Data Isn't the Same as Freeing DataOpen Data: Free Data Isn't the Same as Freeing Data
Open Data: Free Data Isn't the Same as Freeing Datamark madsen
 
Exploring cloud for data warehousing
Exploring cloud for data warehousingExploring cloud for data warehousing
Exploring cloud for data warehousingmark madsen
 
Wake up and smell the data
Wake up and smell the dataWake up and smell the data
Wake up and smell the datamark madsen
 

More from mark madsen (20)

Data Architecture: OMG It’s Made of People
Data Architecture: OMG It’s Made of PeopleData Architecture: OMG It’s Made of People
Data Architecture: OMG It’s Made of People
 
Solve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for HumansSolve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for Humans
 
The Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data ManagementThe Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data Management
 
Operationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the EnterpriseOperationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the Enterprise
 
Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019
 
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018
 
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou RangeA Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
 
How to understand trends in the data & software market
How to understand trends in the data & software marketHow to understand trends in the data & software market
How to understand trends in the data & software market
 
Pay no attention to the man behind the curtain - the unseen work behind data ...
Pay no attention to the man behind the curtain - the unseen work behind data ...Pay no attention to the man behind the curtain - the unseen work behind data ...
Pay no attention to the man behind the curtain - the unseen work behind data ...
 
Assumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slidesAssumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slides
 
Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
 
Don't let data get in the way of a good story
Don't let data get in the way of a good storyDon't let data get in the way of a good story
Don't let data get in the way of a good story
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogies
 
Don't follow the followers
Don't follow the followersDon't follow the followers
Don't follow the followers
 
Exploring cloud for data warehousing
Exploring cloud for data warehousingExploring cloud for data warehousing
Exploring cloud for data warehousing
 
Open Data: Free Data Isn't the Same as Freeing Data
Open Data: Free Data Isn't the Same as Freeing DataOpen Data: Free Data Isn't the Same as Freeing Data
Open Data: Free Data Isn't the Same as Freeing Data
 
Exploring cloud for data warehousing
Exploring cloud for data warehousingExploring cloud for data warehousing
Exploring cloud for data warehousing
 
Wake up and smell the data
Wake up and smell the dataWake up and smell the data
Wake up and smell the data
 

Recently uploaded

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 

Recently uploaded (20)

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 

Beer, diapers, and correlation: A tale of ambiguity

  • 1. Beer, diapers, and correlation: A tale of ambiguity Strata, September 29, 2015 Mark Madsen www.ThirdNature.net @markmadsen
  • 2. Copyright Third Nature, Inc. "I will set down a tale... it may be history, it may be only a legend, a tradition. It may have happened, it may not have happened: but it could have happened.“ – Mark Twain
  • 3. Copyright Third Nature, Inc. Analysis, art and science all play a role For example, analytical cubism paints a picture in a way similar to how we see and remember a subject. Although cubism seems abstract it tries to capture the truth - it may be a more realistic way to paint an image. “Portrait of Dora Maar” Pablo Picasso (1937)
  • 4. Copyright Third Nature, Inc. Like art at the turn of the last century, analytics, particularly how people think they should be used, can get stuck in absolutist thinking. Models abstract reality by capturing dominant features, the essence of a situation. Like cubist paintings. “Portrait de Ambroise Vollard” Pablo Picasso (1910)
  • 5. Copyright Third Nature, Inc. People analyzing market basket data found an unexpected correlation between the sales of beer and diapers in the same transaction. They changed their merchandising to put them next to each other. They made lots of money. a) How many of you have heard this story? b) How many of you believe it’s true? The story of beer and diapers + =
  • 6. Copyright Third Nature, Inc. Why do this sort of analysis? Increase co-purchases of products bought together, e.g. office desk supplies and the forgotten item Cross-promotion using directional relationships, e.g. chips and salsa Price optimization, particularly during promotions Inventory management, e.g. stocking the proper amount of the dependent product Refine marketing, e.g. targeting segments based on the affinities
  • 7. Copyright Third Nature, Inc. How do you do it? You can just check pairwise correlations if you know what you’re looking for, otherwise this is impractical. Plus, what about arbitrary numbers of items? ▪ Affinity analysis, aka association rule mining - find frequent itemsets, then build rules ▪ Tune rules based on confidence, lift, conviction so produce less spurious, more useful ones What about patterns over time? ▪ Time series clustering
  • 8. Copyright Third Nature, Inc. I wonder…Is it true? I read about the beer and diapers story in a retail trade magazine. In 1994-95-ish I ran a test, in a grocery store operating in PA, OH, MD Specifically searched for this pattern. Seems legit
  • 9. Copyright Third Nature, Inc. Is it true? No. Original: 1992, Correlation ▪ Stated cause: Men buying diapers My test: Grocery chain, 1994, No correlation ▪ No causal relationship So began a long journey
  • 10. Copyright Third Nature, Inc. A little later: Is it true? Yes. Wait, what? Original: Grocery, 1992, Correlation 1. Grocery, ~1994, No correlation 2. Drug store, ~1995, Correlation
  • 11. Copyright Third Nature, Inc.Copyright Third Nature, Inc. Where did this story come from? The trail through the press is muddy, but the internet made it much easier. Social networks more so. It goes back to 1992, Osco: “…point-of-sale data from Osco Drug stores - 1.2 million baskets worth... …queries revealed that between 5pm and 7pm, customers tended to co-purchase beer and diapers. …does not appear to have exploited the information by moving the products around. …we have a correlation between beer, diapers and time, but no correlations with age, gender or day.”
  • 12. Copyright Third Nature, Inc. Also: history of databases in No-tation 1970: NoSQL = We have no SQL 1980: NoSQL = Not yet SQL 1990: NoSQL = Know SQL 2000: NoSQL = No SQL! 2005: NoSQL = Not only SQL 2013: NoSQL = No, SQL! In 1992, we didn’t use association rules. We used SQL
  • 13. Copyright Third Nature, Inc. Origin: “never let data get in the way of a good story” It’s sales propaganda. “The warehouse notion is so seductive that information technology is using it to shake the purse strings loose,” says Douglas Hackney (1998) Just like today, with analytics and big data.
  • 15. Copyright Third Nature, Inc. Variations of the story continued 11,400 academic papers, 14,000 books, ~1,000,000 web pages “The discount chain moved the beer and snacks such as peanuts and pretzels next to the disposable diapers and increased sales on peanuts and pretzels by more that 27%." “For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer.” “[UK] Dads with newborns cannot go out to pubs to socialize with friends (me including). So they buy more beers so that they can drink at home!” “Some time ago, Wal-Mart decided to combine the data from its loyalty card system with that from its PoS…Once combined, the data was mined extensively …On Friday afternoons, young American males who buy diapers (nappies) also have a predisposition to buy beer.” "Shoppers who buy diapers for the first time at a Tesco store can expect to receive coupons by mail for baby wipes, toys -- and beer. Tesco's analysis showed that new fathers tend to buy more beer because they are home with the baby and can't go to the pub." "Sometimes the data can throw up surprises: mining of databases held by 7-Eleven stores in the US revealed a link between purchases of beer and nappies. When they were moved together, sales of both increased,”
  • 16. Copyright Third Nature, Inc. There are mixed answers though. Fragmentation, ambiguity, what’s going on here? Are you asking the right question? Not “is it true” but “under what set of circumstances is it true?” Marianne Faithful's version of a portrait from 'Destroy Rankin'
  • 17. Copyright Third Nature, Inc. Is it true? Yes. No. Maybe. I don’t know. I kept checking after that second one. Original: Grocery, 1992, Correlation 1. Grocery, ~1994, No correlation 2. Drug store, ~1995, Correlation 3. Drug store, 1997, Very high correlation 4. Drug store, 1997, No correlation 5. Grocery, 2000, Weak correlation 6. Multiple, 2013, Correlation
  • 18. Copyright Third Nature, Inc. Analytic insights that result in no action are expensive trivia. It’s not the insight, but what you do with it, that matters As a manager: what would you do?
  • 19. Copyright Third Nature, Inc. Reality You wouldn’t do anything without some idea of cause or context, correlation isn’t enough . sorry guys, causation does matter, otherwise it might be ice cream & sharks
  • 20. Copyright Third Nature, Inc. Maybe it’s a nightcap to sleep well
  • 21. Copyright Third Nature, Inc. Forgetful husbands only remember the basics “I send you to the store for a few necessities, and you come back with chips and beer instead of talcum powder and diapers.” - MacGyver, 4/18/1988
  • 22. Copyright Third Nature, Inc. Or harried dads rewarding themselves with impulse buys
  • 23. Copyright Third Nature, Inc. Men can’t go out with their friends any more
  • 24. Copyright Third Nature, Inc. Maybe it’s mom, because in the late-1990s…
  • 25. Copyright Third Nature, Inc. Sometimes it’s not “purchasing” exactly, but it still should still count as a correlation
  • 26. Copyright Third Nature, Inc. Nobody checked if it was diapers for babies…
  • 27. Copyright Third Nature, Inc. ‘Cause no presentation is complete without charts Whazzat? Men, shopping before Thanksgiving and end of January?
  • 28. Copyright Third Nature, Inc. Do you want to know what the answers are? So do I…
  • 29. Copyright Third Nature, Inc. Good news! You are all right. Bad news! You are all wrong. 1. Grocery, 1992, Correlation – male, TH/SA, 5-7 PM, midwest 2. Grocery, ~1994, No correlation, mid-northeast 3. Drug store, ~1995, Correlation – male, weekends, east 4. Drug store, 1997, Very high correlation, both, always, US (they decide first, ask questions later - not like science, like myth) 5. Drug store, 1997, No correlation, US 6. Grocery, 2000, Weak correlation, female, weekdays, west/mid 7. Multiple, 2013, Correlation, male, seasonal dates, US 8. Grocery, 2006, Correlation, male, UK, (not 1st hand) The story is true, and false, and indeterminate. It’s ambiguous.
  • 30. Copyright Third Nature, Inc. Are you asking the right question? Not “is it true” but “under what circumstances is it true?” Image credit: unknown
  • 31. Copyright Third Nature, Inc. “It is termed analytical cubism because of its structured dissection of the subject, viewpoint-by-viewpoint, resulting in a fragmentary image of multiple viewpoints and overlapping planes. Other distinguishing features of analytical cubism were a simplified palette of colours, so the viewer was not distracted from the structure of the form, and the density of the image at the centre of the canvas.” Le Goûter / Tea Time / Femme à la Cuillère (1911)
  • 32. Copyright Third Nature, Inc. What is happening here is that “the model” is seeing pieces of a whole because each of these models is independent, with its own observations and data, like a snapshot that covers only the focal point of your eye.
  • 33. Copyright Third Nature, Inc. You are actually seeing the pieces individually, with little context to link them,
  • 34. Copyright Third Nature, Inc. …not as independent, in their appropriate places within a larger context.
  • 35. Copyright Third Nature, Inc. When they are lined up together the larger scene emerges, in much the same way that your brain puts together a scene by focusing on individual pieces and assembling the picture from them as your focus moves from place to place. Joiner, David Hockney (1980s)
  • 36. Copyright Third Nature, Inc. Except that the pieces you see here are not nicely bounded. They are assembled piecemeal, in overlapping bits, some including one element some another. Just like David Hockney’s “joiners” from the 1980s. This is a problem any time you generalize from someone else’s model. Joiner, David Hockney (1980s)
  • 37. Copyright Third Nature, Inc. Therefore… You would choose to merchandise, price or promote in accordance with the causality you believe or determine is true. The actions you take and the related outcomes will vary based on the local context. Not one way to act but many, just like there are many attributes, not one.
  • 38. Copyright Third Nature, Inc. Key points 1. There is no “single version of the truth”, the word “version” is a clue. Truth is relative and contextual, not absolute. So don’t promote models as black and white answer-makers. 2. “All models are wrong, some are useful” – George Box 3. Like a cubist, think of all the angles when you use analytics. “Do what they did” management strategy doesn’t work when it comes to contextual problems. 4. You need to have an idea about causation. Ignore the pundits. 5. Data science is one half of the solution, applying the results is the other half. “Science” because you go into the unknown when you act. Like Heisenberg , your actions cause effects, and the choice of action may preclude another action.
  • 39. Copyright Third Nature, Inc. People are the link that gives context. Without a person determining the applicability of analytical models or interpreting the results, little sense can be made of them. 'Pearblossom Highway, 11th to 18th April 1986 No.2', David Hockney
  • 40. Copyright Third Nature, Inc. The power of myth The origin story is no less true for having not happened. A myth: ▪ Reveals some mystery ▪ Shows the complexity of the world and explains what you encounter ▪ Validates a sociological system ▪ Show that there’s a role, a place for us Lose Santa, gain a holiday, just like the Grinch. (I still use this story )
  • 41. Copyright Third Nature, Inc. The final lesson learned at Osco in the 1990s: If you put two products next to each other on a shelf then they are more likely to be sold together. Profundity.
  • 42. Copyright Third Nature, Inc. Some reference material Predictive Analytics and Data Mining. 2014 (chapter 6, association rules) The delights of seeing: Cubism, Joiners and The Multiple Viewpoint What Three Wise Men have to say about diagnosis, BMJ 2011;343:d7769 Discovering temporal association rules: Algorithms, language and systems, http://www.computer.org/csdl/proceedings/icde/2000/0506/00/05060306.pdf Beer and Breastfeeding, Pubmed, http://www.ncbi.nlm.nih.gov/pubmed/11065057 Frames, Biases, and Rational Decision-Making in the Human Brain, http://econ.as.nyu.edu/docs/IO/9877/De_Martino1.pdf
  • 43. Copyright Third Nature, Inc. About the Presenter Mark Madsen is president of Third Nature, a consulting and advisory firm focused on analytics, data and technology strategy and data management. Mark is an award-winning author, architect and CTO who has received awards for his work from the American Productivity & Quality Center, Smithsonian Institute and industry associations. He is an international speaker, a contributor to Forbes, and member of the O’Reilly Strata program committee. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net
  • 44. Copyright Third Nature, Inc. About Third Nature Third Nature is a research and advisory firm focused on new and emerging technology and practices in analytics, information strategy, business intelligence and data management. Our goal is to help organizations solve problems using data. We offer education, advisory and research services to support business and IT organizations. We also provide product-related consulting to software vendors in the data industry. We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in strategy and architecture, so we look at emerging technologies and markets, evaluating how technologies are applied to solve problems rather than simply comparing product features.