Big Data Analytics: The art of the data scientist discusses the evolution of data analytics and roles of data scientists. It explains that while volume is interesting, distinguishing big data requires understanding patterns in incomplete and anonymized data from multiple sources. Effective data science discovers unknown insights, provides business value through predictive models and data products, and builds confidence in decisions through explanation and storytelling.
Keynote Dubai
1. Big Data Analytics: The Art of the Data Scientist
Neil Raden
Founder, Hired Brains Research
Twitter: @NeilRaden
Blog: http://hiredbrains.wordpress.com
Website: http://www.hiredbrains.com
Mail: nraden@hiredbrains.com
LinkedIn: http://www.linkedin.com/in/neilraden
2. [Figure: timeline by decade, 1950 to 2020, with eras labeled Batch Reporting, CICS/OLTP, C/S OLTP, Y2K/ERP, 4GL/PC/SS, DW/BI, Big Data, Hybrid]
Convergence: End of managing from scarcity
Copyright 2014 Neil Raden and Hired Brains Research LLC
3. Big Is Relative
This Pace Isn’t New, Just Magnitude
Though Volume is interesting, it isn’t what distinguishes Big Data
4. Moore’s Law & Ferrari
5. No More Managing from Scarcity
6. Even Big Data Doesn’t Speak for Itself
• Incomplete
• Behaviors under-represented
• Anonymizing disasters
• Single source of data inadequate
• Harmonization
Not a crystal ball
7. Decisions: A Miracle Happens?
40 years with decision support and BI. Are we making better decisions?
Will Data Science Lead Us to Better Decision Processes?
Getting to a culture of decision making requires your business to have real, solid wins using analytics to make people care from top to bottom.
8. What Is Data Science?
• Discovering what we don’t know from data
• Getting predictive and/or actionable insight
• Development of data products that have clear business value
• Providing value to the organization through sharing and learning
• Using techniques like storytelling and metaphor to explain concepts
• Building confidence in decisions
9. Do You Know This Number?
2.718281828459...
Why is this important?
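The slide does not spell out an answer. As a reminder of one standard place this number comes from, the sketch below approximates e as the limit of (1 + 1/n)^n, the value of one unit growing at 100% interest compounded n times. The function name compound_growth and the chosen values of n are illustrative assumptions, not part of the talk.

```python
import math


def compound_growth(n: int) -> float:
    """Value of 1 unit after one period at 100% interest, compounded n times."""
    return (1.0 + 1.0 / n) ** n


# As n grows, the result approaches Euler's number e = 2.718281828459...
for n in (1, 10, 1_000, 1_000_000):
    print(f"n = {n:>9,}: {compound_growth(n):.9f}")

print(f"math.e      : {math.e:.9f}")
```

The same constant underlies continuous growth and decay models, which is one reason it appears throughout statistics and analytics.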
10. Euler Gave Us the Tools
Contribution: Example
Graph Theory: Graph & Ontology Databases
Infinitesimal Calculus: Everything
Topology: Topological Data Analysis
Number Theory: Encryption
Nothing we do in Big Data would be possible without Euler
11. But Euler Got One Thing Wrong
• Tobias Mayer
• A contemporary of Euler
• Famous for his observations of the libration of the moon
• TONS of observations
• Figured out how to group them
Famous quote:
“Because these observations were derived from nine times as many observations, one can therefore conclude that they are nine times more accurate”
12. Euler Not a Data Scientist
Euler: “By the combination of two or more equations, the errors of the combinations and the calculations multiply themselves.”
The greatest mathematician of all time pre-dated the concept of statistical error
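The two positions on slides 11 and 12 can be checked with a small simulation. This is a minimal sketch under assumptions that are not in the talk: measurement errors are independent, Gaussian and of equal size, and the helper spread_of_means and the sample sizes 10 and 90 are hypothetical. Under those assumptions, combining observations reduces error rather than multiplying it, and nine times as many observations gives roughly three times the accuracy (the square root of nine), not nine times.

```python
import random
import statistics


def spread_of_means(n_obs: int, trials: int = 2000,
                    noise_sd: float = 1.0, seed: int = 0) -> float:
    """Standard deviation of the average of n_obs noisy measurements,
    estimated by repeating the experiment `trials` times."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, noise_sd) for _ in range(n_obs))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


few = spread_of_means(10, seed=1)    # error of an average of 10 observations
many = spread_of_means(90, seed=2)   # error of an average of 9x as many
ratio = few / many

print(f"error with 10 observations: {few:.3f}")
print(f"error with 90 observations: {many:.3f}")
print(f"improvement factor:         {ratio:.2f} (theory: sqrt(9) = 3)")
```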
13. Why Does This Matter?
Because Data Science is not the realm of the most brilliant mathematicians
It’s for people who know how to do it and who have the correct training and tools to do it themselves
14. The Data Scientist
• Term invented by Yahoo
• Super-tech, super-quant
• Business expert too
• Orientation: Search and Web
• We used to call them quants
• Few and far between
• How do you find/train them?
• Hint: like actuaries
15. Types of Analytics
A spectrum from Knowledge - Description to Action - Prescription:
Business Intelligence
How do I use data to learn about my customers? What has been happening in my business?
Data Mining
Who are my best/worst customers? How do I turn my data into rules for better decisions?
Predictive Analytics
How are those customers likely to behave in the future? How do they react to the myriad ways I can “touch” them?
Optimization
How do I make the best possible decisions given my constraints?
16. Descriptive Analytics - Improve Rules
[Figure: scatter plot with axes Income and Education, each running to High, with labeled segments:]
• Low-moderate income, young
• High income, low-moderate education
• Moderate-high education, low-moderate income
• Moderate education, low income, middle-aged
• Low education, low income
17. Predictive Analytics – Add Insight
[Figure: chart with axis ticks 10, 20, 30, 40; legend: Member completes treatment, Member fails to complete treatment]
18. Impact May Take Time to Play Out
19. Stat Tools Can Be Dangerous
• Tests are not the event
• Tests are flawed
• Tests detect things that don’t exist
• Tests give test probabilities not the real probabilities
• False positives skew results
• People prefer natural numbers
• Even Science is a test
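The bullets about test probabilities and false positives can be illustrated with Bayes' rule. The sketch below is not from the talk: the prevalence, sensitivity and specificity values are made-up numbers, and the function names are hypothetical. It shows that a test that looks accurate can still be wrong most of the time it says "positive" when the event is rare, and it restates the result as counts of people, in the spirit of the natural-numbers bullet.

```python
def positive_predictive_value(prevalence: float, sensitivity: float,
                              specificity: float) -> float:
    """P(event is real | test is positive), by Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


def natural_frequencies(population: int, prevalence: float,
                        sensitivity: float, specificity: float):
    """The same calculation expressed as whole-number counts of people."""
    affected = population * prevalence
    true_pos = round(affected * sensitivity)
    false_pos = round((population - affected) * (1.0 - specificity))
    return true_pos, false_pos


# Assumed, illustrative numbers: a rare event and a seemingly good test.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.99,
                                specificity=0.95)
tp, fp = natural_frequencies(10_000, 0.01, 0.99, 0.95)

print(f"Chance a positive result is real: {ppv:.1%}")
print(f"Out of 10,000 people: {tp} true positives, {fp} false positives")
```

With these assumed inputs, false positives outnumber true positives five to one, so the probability reported by the test is far from the real probability of the event.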
21. Analytic Types
Types of Analysis
Type I
Descriptive Title: Quantitative R&D
Quantitative Sophistication/Numeracy: PhD or equivalent
Sample Roles: Creation of theory, development of algorithms. Academic/research. Work in business/government for very specialized roles
Type II
Descriptive Title: Data Scientist or Quantitative Analyst
Quantitative Sophistication/Numeracy: Advanced Math/Stat, not necessarily PhD
Sample Roles: Internal expert in statistical and mathematical modelling and development, with solid business domain knowledge.
Type III
Descriptive Title: Operational Analytics
Quantitative Sophistication/Numeracy: Good business domain, background in statistics optional
Sample Roles: Running and managing analytical models. Strong skills in and/or project management of analytical systems implementation
Type IV
Descriptive Title: Business Intelligence/Discovery
Quantitative Sophistication/Numeracy: Data and numbers oriented, but no special advanced statistical skills
Sample Roles: Reporting, dashboard, OLAP and visualization, some design, posterior analysis of results from quantitative methods. Spreadsheets, “business discovery tools”
22. Analytic Types
Types of Analysis
[The Type I to Type IV table from slide 21, with these labels added:]
• Type V
• Better BI/Viz/Disco
• Training/Mentoring/Apps (shown twice)
• 3rd Party Services
• Type Shifting
23. A Typical Day
• Basic data manipulations to wrangle data and fit a variety of standard models - 40%
• Translate a business problem into the design of a data analysis strategy - 5%
• Graphically explore data to motivate modeling choices and improvements - 10%
• Interpret and critically examine standard model output - 5%
• Test the performance of models on holdout data - 10%
• Go to meetings - 30%
70% is not Data Scientist work
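The arithmetic behind the 70% figure can be checked directly. The slide does not say which activities it counts as "not Data Scientist work"; the sketch below assumes it means data wrangling plus meetings (40% + 30%), which is one reading of the numbers and is labeled as such in the code. The dictionary keys are shortened paraphrases of the bullets.

```python
# Share of a typical day per activity, in percent, as listed on the slide.
typical_day = {
    "wrangle data and fit standard models": 40,
    "translate business problem into analysis design": 5,
    "graphically explore data": 10,
    "interpret and examine model output": 5,
    "test models on holdout data": 10,
    "go to meetings": 30,
}

# ASSUMPTION (not stated on the slide): these two activities are the ones
# counted as "not Data Scientist work".
not_core = ("wrangle data and fit standard models", "go to meetings")

total = sum(typical_day.values())
other_share = sum(typical_day[task] for task in not_core)

print(f"total accounted for: {total}%")
print(f"not Data Scientist work (assumed grouping): {other_share}%")
print(f"remaining core work: {total - other_share}%")
```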
24. Type Shifting
• As much as 80% of “Data Scientist” work can be done by others
• Data gathering, cleansing, profiling, parsing and loading
• Data and process stewardship
• Platform availability
• Providing organizational and market domain expertise
• Creation of presentation material
25. The combination of some data and an aching
desire for an answer does not ensure that a
reasonable answer can be extracted from a
given body of data.
John Tukey
26. Analytics Is Hard
Analytics is hard
Analytics takes resources
Analytics takes effort to create and assimilate
You need to focus your analytics at the key leverage points of your business
UPS focuses on where the package is
Marriott focuses on yield management
If you try to do everything, you won’t do anything well.
27. A Final Thought About Analytics
The challenge of analytics is communication and creating a shared understanding.
It’s about focusing on high impact areas, moving forward one step at a time, being skeptical, being creative, searching for the truth.
Any company can “Compete on Analytics.”
But not like this
[Figure: Stock Market Returns for the “Competing on Analytics” Cohort. Chart with axis from -80% to 120% showing Amazon, Marriott, Honda, Intel, Novartis, Wal-Mart, UPS, Verizon, P&G, Progressive, Capital One, Yahoo, Dell and Barclays against the Average Stock Market Return]
28. Five Things to Remember
• Data is an “asset,” people make it valuable
• Your data scientists may well be a team
• Communication, insight and reason more important than math
• You have lurking data scientists in your firm
• Start with what matters, build confidence
29. Thank You
Neil Raden
Founder, Hired Brains Research
Twitter: @NeilRaden
Blog: http://hiredbrains.wordpress.com
Website: http://www.hiredbrains.com
Mail: nraden@hiredbrains.com
LinkedIn: http://www.linkedin.com/in/neilraden