Big Data and Data Science W's

Emanuele Della Valle
@manudellavalle
Prof. @polimi & Founder @fluxedo_

W's
18/06/2018 @manudellavalle - http://emanueledellavalle.org

Why?
• In many organizations decisions are made by
"questionable" methodologies such as
– Highest Paid Person Opinion (HiPPO)
– Flipism (all decisions are made by flipping a coin)

Why?
• In many organizations decisions are made by the
• This could have been the right approach in the '70s …
– See the "Theory of Bounded Rationality" by Herbert Simon

Why?
[source http://www.azquotes.com/quote/139996 ]

Why?
• … but in the Big Data era one can dream of a
data-driven organization

Why?
• Data-Driven Organization

Why?
Decisions no longer have to be made in the dark
or based on gut instinct; they can be based on
evidence, experiments and more accurate
forecasts.
-- McKinsey

Why?
• Data-driven organizations
– perform better
• The data shows where they can streamline their processes
– are operationally more predictable
• Data insights fuel current and future decision making
– are more profitable
• Constant improvements and better predictions help to
outsmart the competition and improve innovation.

Why?
• Moneyball: data + analysis to win games
[source: https://www.imdb.com/title/tt1210166/ ]

What's Big Data?
[source: IBM, 2012]

What's Big Data?
• Big Data is "crude oil" … that we have to
– Extract
– Transport in mega-tankers
– Ship through pipelines
– Store in massive silos
– …

What's Data Science?
• Data Science is "refining crude oil"
[source: http://allabtinstru.blogspot.com/2016/09/ProcessofRefiningCrudeOil.html]

What's Data Science?
• The Science [and Art] of…
– Discovering what we don’t know from data
– Obtaining predictive, actionable insight from data
– Creating Data Products that have business impact
now
– Communicating relevant business stories from data
– Building confidence in decisions that drive business
value

Who's a Data Scientist?
• Drew Conway, 2010

How?
• Statistics starts with data
• Two goals of analyzing data
– Descriptions: how nature associates responses to inputs
– Predictions: response for future input variables
[source: Statistical Modeling: The Two Cultures. Leo Breiman, 2001]
[figure: x (independent variable) → nature → y (response variable)]
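The two goals above can be sketched with a tiny least-squares fit. This is only an illustration: the observations are made up, and a straight line is an assumed model rather than anything the slide prescribes. The fitted slope plays the role of the description (how the response is associated with the input), and evaluating the line at a new input plays the role of the prediction.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for the model y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up observations of an input x and a response y (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(xs, ys)

# Description: how the response is associated with the input.
print(f"y changes by {slope:.2f} per unit of x (intercept {intercept:.2f})")

# Prediction: the response for a future input.
x_new = 6.0
y_new = slope * x_new + intercept
print(f"predicted response for x = {x_new}: {y_new:.2f}")
```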

How?
[source: Marc Andrews, 2014]
Leverage more of the data being captured

How?
Reduce effort required to leverage data

What?

How?
Data-driven exploration looking for correlation
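As a rough sketch of what "looking for correlation" can mean in practice, the snippet below computes the Pearson correlation for every pair of columns in a small table and ranks the pairs by strength. The column names and values are hypothetical, and Pearson's coefficient is just one assumed choice of measure; it only captures linear association.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical columns of a small dataset (values are made up).
data = {
    "ad_spend": [10, 20, 30, 40, 50],
    "visits": [12, 24, 33, 41, 55],
    "returns": [5, 3, 6, 2, 4],
}

# Correlate every pair of columns and rank by absolute strength.
ranked = sorted(
    (((a, b), pearson(data[a], data[b])) for a, b in combinations(data, 2)),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for (a, b), r in ranked:
    print(f"{a} vs {b}: r = {r:+.2f}")
```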

Your butcher …

… at scale!

How?
Leverage data as it is captured

How?
[source: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/]

How?
Overall picture by Gartner

Where?
[source https://www.ted.com/talks/anne_milgram_why_smart_statistics_are_the_key_to_fighting_crime ]
Improve public safety and
reduce violent crime
through data analytics
-41% murders | -27% crimes

What about cybersec?

Credits
• Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Carlos Somohano, 2013
– https://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do-world
• Becoming a data-driven organization: The what, why and how.
SAS, 2018
– https://www.sas.com/en_us/whitepapers/becoming-data-driven-organization-109150.html
• Never trust summary statistics alone; always visualize your data.
Alberto Cairo, 2016
– http://www.thefunctionalart.com/2016/08/download-datasaurus-never-trust-summary.html
• 2017 Planning Guide for Data and Analytics. John Hagerty (Gartner), 2016
– https://www.gartner.com/binaries/content/assets/events/keywords/catalyst/catus8/2017_planning_guide_for_data_analytics.pdf

Thank you!
Any questions?
Emanuele Della Valle
@manudellavalle
Prof. @polimi & Founder @fluxedo_
