Data Science Tech Institute - Big Data and Data Science Conference around Dr Gregory Piatetsky-Shapiro.
Keynote: An Overview of Big Data & Data Science, by Dr Gregory Piatetsky-Shapiro, KDnuggets.com Founder & Editor.
Paris May 23rd & Nice May 26th 2016 @ Data ScienceTech Institute (https://www.datasciencetech.institute/)
14. Earliest use of "data mining": 1962
(c) KDnuggets 2016 15
Source: Google Books
After eliminating many "following data. Mining cost is" examples
which refer to mining of minerals,
and books from "1958" that have a CD attached (errors in book year)
The earliest "data mining" reference I found is from 1962.
24. Regional Interest in "Data Science" in 2015
Google Trends
Note: search for "Data Science" is different from [Data Science]
38. The best data scientists have one thing in common: unbelievable curiosity
DJ Patil, US First Chief Data Scientist
http://www.sciencefriday.com/articles/10-questions-for-the-nations-first-chief-data-scientist
April 2016
51. Lesson 8: Limits to Predicting Human Behavior?
• Inherent randomness, complexity in human behavior
• Individual predictions have limited accuracy (but can still be better than random and very useful for consumer analytics)
• Aggregate predictions (e.g. who will win the election) are more accurate, because individual randomness cancels out
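The "individual randomness cancels out" point can be illustrated with a small calculation. This is only a sketch under simplifying assumptions that are not in the slides: every voter independently picks candidate A with the same probability `p`, and the function name `majority_probability` and the value `p = 0.55` are made up for the example.

```python
from math import comb


def majority_probability(n_voters, p):
    """Probability that more than half of n_voters pick A.

    Assumes independent voters who each pick A with probability p
    (a toy model, real electorates are not this simple).
    """
    needed = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p**k * (1 - p) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


p = 0.55  # assumed chance that any single voter picks A

# Best possible guess for one voter is "A", which is right with probability p.
individual_accuracy = max(p, 1 - p)

# Predicting "A wins" becomes more reliable as the electorate grows.
aggregate_accuracy = {n: majority_probability(n, p) for n in (1, 101, 501)}

print("individual:", individual_accuracy)
for n, prob in aggregate_accuracy.items():
    print(f"n = {n}: P(A wins) = {prob:.3f}")
```

Under these assumptions a single voter can be predicted correctly only 55% of the time, while the aggregate outcome becomes far more predictable as the number of voters grows.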
53. Direct Marketing Lift:
Random and Model-sorted Lists
[Chart: CPH (cumulative percent of hits) vs. percent of list, for Random and Model-sorted lists]
5% of the random list has 5% of the hits.
5% of the model-score ranked list has 21% of the hits.
Lift(5%) = 21%/5% = 4.2
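The lift arithmetic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name `lift_at` and the toy scores and labels are assumptions invented for the example.

```python
def lift_at(scores, labels, fraction):
    """Lift of the top `fraction` of a model-ranked list over a random list.

    scores: model scores (higher = more likely to be a hit)
    labels: 1 for a hit, 0 otherwise
    fraction: share of the list contacted, e.g. 0.05 for 5%
    """
    total_hits = sum(labels)
    if total_hits == 0:
        raise ValueError("no hits in the list, lift is undefined")
    # Rank the list by model score, best first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    hits_in_top = sum(label for _, label in ranked[:top_n])
    # Cumulative percent of hits (CPH) captured in the top slice.
    cph = hits_in_top / total_hits
    # A random list captures the same share of hits as the share of list used.
    return cph / (top_n / len(ranked))


# Toy data (hypothetical): 20 prospects, 4 hits, 3 of them in the top 5 scores.
scores = [i / 20 for i in range(20, 0, -1)]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
toy_lift = lift_at(scores, labels, 0.25)  # 75% of hits in 25% of the list

# The slide's numbers: 21% of hits in the top 5% of the list.
slide_lift = 0.21 / 0.05
print(toy_lift, round(slide_lift, 1))
```

A lift of 1 means the model-sorted list is no better than a random one at that list depth.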
54. Most lift curves are surprisingly similar: limit to human predictability?
Study of lift curves in banking, telecom
Best lift curves are similar
Special point T = Target percentage
Lift(T) ~ sqrt(1/T)
G. Piatetsky-Shapiro, B. Masand,
Estimating Campaign Benefits and
Modeling Lift, in Proceedings of
KDD-99 Conference, ACM Press,
1999.
[Chart: actual lift(T) vs. estimated lift(T); x-axis: 100*T%, y-axis: Lift]
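The rule of thumb Lift(T) ~ sqrt(1/T) is easy to tabulate. The snippet below is a sketch that assumes T is expressed as a fraction (0.05 for a 5% target percentage); the function name `estimated_lift` is hypothetical, and the values are the rule's estimates, not the measured lifts from the study.

```python
import math


def estimated_lift(target_rate):
    """Rule-of-thumb lift at the special point T, with T given as a fraction."""
    if not 0 < target_rate <= 1:
        raise ValueError("target_rate must be in (0, 1]")
    return math.sqrt(1.0 / target_rate)


# Estimated lift for a few target percentages in the chart's 0-25% range.
for t in (0.01, 0.05, 0.10, 0.25):
    print(f"T = {t:.0%}: estimated lift ~ {estimated_lift(t):.2f}")
```

For example, a 1% target percentage gives an estimated lift of about 10, while a 25% target percentage gives about 2, so rarer targets leave more room for a model to beat a random list.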
88. Shortage of Data Scientists?
• McKinsey (2011): shortage by 2018 in the US
  - 140,000-190,000 people with deep analytical skills
  - 1.5 M managers/analysts with the know-how to use the analysis of big data to make effective decisions.
Source:
www.mckinsey.com/mgi/publications/big_data/
89. Data Scientist: Sexiest Job of the 21st Century?
• Thomas H. Davenport and D.J. Patil (Harvard Business Review, 2012)
96. Big Data
• Next Industrial Revolution
• Data Science is the Engine of Big Data
97. Doing Old Things Better
• Application areas
  - Direct marketing/Customer modeling
  - Recommendations
  - Fraud detection
  - Security/Intelligence
  - Healthcare
  - …
• Competition will level companies
98. Big Data Enables New Things!
• Google: first big success of big data
• Social networks (Facebook, Twitter, LinkedIn, …): success depends on network size, i.e. big data
• Big Data in Healthcare
  - Image analysis, diagnosis
  - Personalized medicine
• Recommendations: Netflix streaming
Churn: the best algorithms for predicting churn have a lift of 5-7, i.e. 5-7 times better than random.
Behavioral advertising: 2-3% CTR, 10 times better than random.
The future is bright for Big Data, but use caution when evaluating claims.