https://www.datasciencetech.institute/
Data Science:
Past, Present, and Future
Gregory Piatetsky-Shapiro
KDnuggets
© KDnuggets 2016
La Science des données:
passé, présent et futur
Predicting Behavior -
Key to Survival
Better prediction - better intelligence
"Predictions": Astrology
My May 26 Horoscope:
So what if things aren't
completely wonderful in your
life right now? Just keep your
hopes high, and your fingers
crossed. … Being with the
people who make you feel good
about yourself will help keep
your thoughts bright, so get
together with your closest
friend as soon as you can.
www.astrology.com/horoscope/daily/aries.html
"Predictions": Turkish Coffee Grounds
If a big chunk of the coffee
grounds falls down on the saucer
then it is taken as the first positive
sign of your reading: "Trouble and
worries are leaving you".
Pundits' "Predictions"
• Nate Silver FiveThirtyEight.com prediction for
Trump winning Republican nomination:
• Aug 2015: 2%
• Sep 2015: 5%
• Nov 2015: 6%
• Jan 2016: 12%
• May 2016: 99%
Desire to Predict - Deep Human Trait
Lessons
• People are hard-wired to see patterns
• People want predictions
• Human intuition does not work for large-scale
data or for understanding probability
• A good story is essential to a convincing
prediction (whether true or false)
Data Science
Data-Driven, Scientific
approach to prediction
and data analysis
Outline
• Intro, Data Science History and Terms
• 10 Real-World Data Science Lessons
• Data Science Now: Polls & Trends
• Data Science Roles
• Data Science Job Trends
• Data Science Future
What do we call it?
• Statistics
• Data Mining
• Knowledge Discovery in Data
(KDD)
• Predictive Analytics
• Data Analytics
• Data Science
• …?
Core Idea: Finding Useful Patterns in Data
Pre-history (1800-2008): Statistics
From Google Ngram Viewer - English-language books
Search is case-insensitive.
Other languages need to be considered for the full picture.
"Statistics" is the biggest term in the 20th century,
"analytics" is used increasingly through the 20th century,
"data mining" appears in the late 1990s
French Books, 1800-2008
Statistiques vs Mathematiques
"Data Mining" Surges in 1996
Advances in Knowledge Discovery and
Data Mining, AAAI/MIT Press, 1996, Eds:
U. Fayyad, G. Piatetsky-Shapiro, P. Smyth,
and R. Uthurusamy
KDD-95, 1st Conference on Knowledge
Discovery and Data Mining, Montreal
[Chart: "Analytics" and "Data Mining" in Google N-grams, case-insensitive search, smoothing 1]
Earliest use of "data mining": 1962
Source: Google Books
After eliminating many "following data. Mining cost is" examples,
which refer to the mining of minerals,
and books from "1958" that have a CD attached (errors in book year),
the earliest "data mining" reference I found is from 1962.
Very Recent History
Using Google Trends
Google Trends, 2005-2016:
After 2006, Analytics > Data Mining
Global - all regions
>50% of "Analytics" searches are for
"Google Analytics"
Google Analytics introduced,
Dec 2005
Google Trends, 2005-2016
[Chart legend: data science, analytics - Google, big data, data mining]
Google Trends, 2005-2016
2012: Analytics down, Big Data up
Google Trends, 2005-2016
2013: Data Science grows
Google Trends:
Machine Learning, Data Science,
Deep Learning
Google Trends: Machine Learning
Machine Learning ~ "Machine Learning"
Google Trends: Data Science
[Data Science] != "Data Science"
Lesson: Examine assumptions carefully
Regional Interest in
"Data Science" in 2015
Google Trends
Note: search for "Data Science" is
different from [Data Science]
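Why the two searches can differ is easier to see with a toy matcher. The sketch below assumes that the bracketed form stands for an unquoted search (all words present, in any order) and the quoted form for an exact phrase. Whether Google Trends behaves exactly this way is an assumption here, and the queries are made up for illustration.

```python
def matches_terms(query, terms):
    """Unquoted style: every term occurs as a word in the query, in any order."""
    words = query.lower().split()
    return all(term in words for term in terms.lower().split())


def matches_phrase(query, phrase):
    """Quoted style: the words occur next to each other, in the given order."""
    words = query.lower().split()
    target = phrase.lower().split()
    return any(
        words[i:i + len(target)] == target
        for i in range(len(words) - len(target) + 1)
    )


# Hypothetical search queries, not real Google Trends data.
queries = [
    "data science jobs",
    "science of big data",
    "computer science data structures",
    "what is data science",
]

term_hits = [q for q in queries if matches_terms(q, "Data Science")]
phrase_hits = [q for q in queries if matches_phrase(q, "Data Science")]
print(len(term_hits), "term matches vs", len(phrase_hits), "phrase matches")
```

In this toy set the unquoted form also counts queries about computer science and big data, so a trend line built on it would overstate interest in the phrase itself.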
KDnuggets Audience by Region, Q1 2016
Data Science History
• < 1900 - Statistics
• 1960s - Data Mining = bad activity, data "dredging"
• 1990 - "Data Mining" is good, surges in 1996
• 2003 - "Data Mining" peaks (bad in press, invasion of
privacy?), slowly declines, but still popular
• 2006 - Google Analytics
• 2007 - Business/Data/Predictive Analytics
• 2012 - Big Data
• 2014 - Data Science
• 2015 - Deep Learning
• 2018 - ??
10 Real-World Lessons
from the Art & Practice
of Data Science &
Data Mining
Lesson 1: It Is an Iterative, Circular Process
Waterfall model does NOT work for Data Science
CRISP-DM: Iterative, Circular Process
See www.kdnuggets.com/2016/03/data-science-process-rediscovered.html
Data Mining Process - CRISP-DM, 1998
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment
Academic Data Science Process
See www.kdnuggets.com/2016/03/data-science-process-rediscovered.html
Harvard, 2013
Machine Learning Workflow, MS Azure
See
www.kdnuggets.com/2016/04/developers-need-know-about-machine-learning.html
blogs.msdn.microsoft.com/continuous_learning/2014/11/15/end-to-end-predictive-model-in-azureml-using-linear-regression/
Lesson 2: Data Engineering
Takes the Bulk of Time
• Building Machine Learning/Predictive Models
is the key (and most fun) part, but only a small
part of the whole process
• 60-80% (?) of the time is spent on Data
Preparation/Engineering
Competitions are different
March Machine Learning Mania 2016,
Winner's Interview: 1st Place, Miguel Alomar
https://twitter.com/kdnuggets/status/730417186167263232
http://blog.kaggle.com/2016/05/10/march-machine-learning-mania-2016-winners-interview-1st-place-miguel-alomar/
How #MachineLearning @Kaggle
winner spent time:
35% read forums,
25% build models,
25% evaluate results,
15% data preparation
Lesson 3: Question Assumptions
Problem:
Deciles not uniform
Decile 1 is too rare,
Decile 0 - too frequent?
Why?
* Not actual data
Measurement
Mass Spectrometry
Mass spectrometry (MS) is an
analytical technique that ionizes
chemical species and sorts the
ions based on their
mass-to-charge ratio.
It can produce a large number
(~ 20,000) of
m/z values for a sample.
Goal: find biomarkers for
disease, test, condition
Question Assumptions
Instead of Measurement Deciles
Examine actual ranges,
including 0
Nothing between 1 and 14
Value 0 is too frequent
Why?
* Not actual data
Measurement
Someone added a rule to round
raw measurement values
below 15 to zero
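The check described above (look at the actual values instead of deciles) can be sketched in a few lines. This is a minimal illustration, not the analysis from the talk: the function name `summarize_values` and the sample measurements are hypothetical.

```python
from collections import Counter


def summarize_values(values):
    """Return the share of zeros and the widest gap between observed values."""
    counts = Counter(values)
    distinct = sorted(counts)
    zero_share = counts.get(0, 0) / len(values)
    # Each gap is (width, lower value, upper value) for neighbouring distinct values.
    gaps = [(hi - lo, lo, hi) for lo, hi in zip(distinct, distinct[1:])]
    widest_gap = max(gaps) if gaps else None
    return zero_share, widest_gap


# Hypothetical measurements, made up to mimic the pattern on the slide.
raw = [0, 0, 0, 0, 15, 17, 16, 0, 22, 15, 0, 31, 18, 0, 0, 19]
zero_share, widest_gap = summarize_values(raw)
print(f"share of zeros: {zero_share:.0%}")
print(f"widest gap: nothing observed between {widest_gap[1]} and {widest_gap[2]}")
```

A large share of zeros next to an empty range just above zero is the kind of pattern that points to a preprocessing rule rather than to the underlying measurement.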
The best data scientists have one
thing in common -
unbelievable curiosity
DJ Patil, first US Chief Data Scientist
http://www.sciencefriday.com/articles/10-questions-for-the-nations-first-chief-data-scientist
April 2016
Lesson 4: Focus on the Right Metric -
Actionable
• Consumer: Churn may depend on age, region,
usage, and rate plan. Rate plan is the easiest to
change.
• Uplift Modeling in Marketing and Politics:
focus on persuadables
Right Metric: Uplift Modeling
Don't model whether a consumer will buy -
model whether a consumer will buy in response
to an offer
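One simple way to make this distinction concrete is to compare purchase rates with and without the offer inside each customer segment. The sketch below assumes a randomized offer and uses made-up records. The function name `uplift_by_segment` and the segment labels are hypothetical, and real uplift models are usually more elaborate than a difference of two rates.

```python
def uplift_by_segment(records):
    """records: iterable of (segment, got_offer, bought) tuples.

    Returns, per segment, P(buy | offer) - P(buy | no offer).
    """
    stats = {}
    for segment, got_offer, bought in records:
        # For each segment keep [count, purchases] for offered and not-offered groups.
        group = stats.setdefault(segment, {True: [0, 0], False: [0, 0]})
        group[got_offer][0] += 1
        group[got_offer][1] += int(bought)

    uplift = {}
    for segment, group in stats.items():
        treated_n, treated_buys = group[True]
        control_n, control_buys = group[False]
        if treated_n == 0 or control_n == 0:
            continue  # no comparison possible without both groups
        uplift[segment] = treated_buys / treated_n - control_buys / control_n
    return uplift


# Hypothetical data: both segments buy at 75% when offered,
# but only the "persuadable" segment buys less without the offer.
records = (
    [("loyal", True, True)] * 3 + [("loyal", True, False)] * 1
    + [("loyal", False, True)] * 3 + [("loyal", False, False)] * 1
    + [("persuadable", True, True)] * 3 + [("persuadable", True, False)] * 1
    + [("persuadable", False, True)] * 1 + [("persuadable", False, False)] * 3
)
uplift = uplift_by_segment(records)
print(uplift)
```

A model of "will buy" would score both segments equally at 75%, while the uplift view shows that the offer only changes behavior in one of them.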
Right Metric: Uplift Modeling
From Eric Siegel presentation at PAW, 2011
In Obama 2012 Campaign
www.thefiscaltimes.com/Articles/2013/01/21/The-Real-Story-Behind-Obamas-Election-Victory
Lesson 5: Be a Fox, not a Hedgehog
Read Isaiah Berlin's 1953 essay, The Hedgehog and the Fox
A fox knows many things, but
a hedgehog knows one important thing.
Lesson 5: Modeling
No Free Lunch Theorem - no method is universally the best (Wolpert)
In Kaggle competitions, there are 2 ways to win (Anthony Goldbloom, 2016):
• Handcrafted feature engineering
• Or Deep Learning Neural Networks
www.kdnuggets.com/2016/01/anthony-goldbloom-secret-winning-kaggle-competitions.html
• XGBoost - winning method in many recent Kaggle competitions
• Ensemble methods
For Structured Data (Sebastian Raschka)
• SVM (Support Vector Machines) for smaller data
• Random Forests - more data, more automated
www.kdnuggets.com/2016/04/deep-learning-vs-svm-random-forest.html
Unstructured:
• Deep Learning
Lesson 6: Avoid Overfitting
http://www.kdnuggets.com/2014/06/cardinal-sin-data-mining-data-science.html
Many examples at http://tylervigen.com/spurious-correlations
Avoid Overfitting
"Irreproducible" results - a BIG problem in social
sciences, medicine:
John P. A. Ioannidis's famous paper, Why Most Published
Research Findings Are False (PLoS Medicine, 2005).
Due to
• Small samples
• Testing too many hypotheses
• Confirmation bias (explicit or implicit)
• Poor training
How to Avoid Overfitting
• If it is too good to be true, it probably is
• Find the simplest possible hypothesis
• Adjusting the False Discovery Rate
• Randomization Testing
• Nested cross-validation (train, test, holdout)
• Regularization (adding a penalty for
complexity)
www.kdnuggets.com/2014/06/cardinal-sin-data-mining-data-science.html
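To illustrate the "testing too many hypotheses" problem and the False Discovery Rate bullet, here is a minimal sketch of the Benjamini-Hochberg procedure. The slide does not name a specific method, so the choice of Benjamini-Hochberg is an assumption, and the p-values are made up.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose sorted p-value is <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests run on the same data set.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
naive = sum(p < 0.05 for p in p_values)
adjusted = sum(benjamini_hochberg(p_values, q=0.05))
print(f"'significant' at p < 0.05: {naive}; after FDR adjustment: {adjusted}")
```

In this example, findings that pass the usual 0.05 threshold only marginally no longer count as discoveries once the number of tests is taken into account.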
Lesson 7: Tell a story
• Combine facts into a story
• Combine visual and text presentation
• Explanation gives credibility
• Dynamic / Interactive
• Examples: Kefir, Google Analytics, Quill
KEFIR (KEy FInding Reporter), 1994
• Overview report
www.kdnuggets.com/data_mining_course/kefir/overview.htm
• Inpatient admissions
www.kdnuggets.com/data_mining_course/kefir/s2.htm
Quill report for KDnuggets
• Sessions Stay Flat, But Way Higher Than 12-Month Weekly Average
• Sessions remained flat compared to the prior week. The 121,040
sessions, however, were above your 85,105-session weekly average
for the year. Your site's total pageviews stayed flat last week at
206,124, while pages per session grew less than a percent to 1.7.
That's equal to your weekly average for the year.
• Among all your pages, Analytics, Data Mining, and Data Science had
both the highest bounce rate (43%) and the most pageviews (8,734)
last week.
La Diseuse de bonne aventure (The Fortune Teller),
Caravaggio, 1595 (Louvre)
Beware of fortune tellers!
Lesson 8: Limits to Predicting Human
Behavior?
• Inherent randomness, complexity in human
behavior
• Individual predictions have limited accuracy
(but can still be better than random and very
useful for consumer analytics)
• Aggregate predictions (e.g. who will win the
election) more accurate, because individual
randomness cancels out
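The cancellation of individual randomness can be shown with a deliberately simplified model, assumed here only for illustration. Every voter independently picks candidate A with the same probability, so the best possible prediction for one voter is right only that often, while the aggregate outcome becomes far more certain as the electorate grows. The function name `prob_majority` and the 52% figure are hypothetical.

```python
import math


def prob_majority(n_voters, p_each):
    """Probability that more than half of n independent voters choose candidate A."""
    log_p, log_q = math.log(p_each), math.log(1.0 - p_each)
    total = 0.0
    for k in range(n_voters // 2 + 1, n_voters + 1):
        # Binomial probability of exactly k votes for A, computed in log space
        # so that large electorates do not overflow.
        log_pmf = (
            math.lgamma(n_voters + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_voters - k + 1)
            + k * log_p
            + (n_voters - k) * log_q
        )
        total += math.exp(log_pmf)
    return total


# Hypothetical setting: each voter favours A with probability 0.52.
for n in (1, 101, 1001, 10001):
    print(f"{n:>6} voters: P(A wins) = {prob_majority(n, 0.52):.4f}")
```

Real voters are neither independent nor identical, so this only illustrates the direction of the effect, not its size.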
Example: Netflix Prize, 2006
• The most advanced
algorithms were only a few percent better
than basic algorithms
See Gregory Piatetsky, "Big Data: Hype & Reality", Harvard Business
Review 2012, https://hbr.org/2012/10/big-data-hype-and-reality/
Direct Marketing Lift:
Random and Model-sorted Lists
[Chart: Cumulative Pct Hits (CPH) vs. Pct of list, 0-100, for Random and Model lists]
5% of a random list has 5% of hits
5% of a model-score-ranked list has 21% of hits.
Lift(5%) = 21%/5% = 4.2
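The lift arithmetic above can be sketched as a small function: rank the list by model score, take the top fraction, and divide the share of all hits found there by that fraction. The function name `lift_at` and the scores and hits below are hypothetical, chosen so the result is easy to check by hand.

```python
def lift_at(scores, hits, fraction):
    """Lift = share of all hits found in the top `fraction` of the ranked list / `fraction`."""
    ranked = sorted(zip(scores, hits), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    hits_in_top = sum(hit for _, hit in ranked[:top_n])
    cumulative_pct_hits = hits_in_top / sum(hits)
    return cumulative_pct_hits / (top_n / len(ranked))


# Hypothetical list of 100 customers with 20 responders ("hits").
# The model puts 4 of the 20 hits into its top 5 scores.
scores = list(range(100, 0, -1))
hits = [1, 1, 1, 1, 0] + [1 if i % 6 == 0 else 0 for i in range(95)]

print(f"lift at 5%: {lift_at(scores, hits, 0.05):.1f}")    # 20% of hits in 5% of the list
print(f"lift at 100%: {lift_at(scores, hits, 1.0):.1f}")   # whole list: no lift
```

A random ordering would on average find 5% of the hits in 5% of the list, which is why its lift is 1 at every point.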
Most lift curves are surprisingly similar -
a limit to human predictability?
Study of lift curves in banking,
telecom
Best lift curves are similar
Special point T=Target
percentage
Lift(T) ~ sqrt (1/T)
G. Piatetsky-Shapiro, B. Masand,
Estimating Campaign Benefits and
Modeling Lift, in Proceedings of
KDD-99 Conference, ACM Press,
1999.
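The rule of thumb Lift(T) ~ sqrt(1/T) is easy to tabulate. The sketch below only evaluates that formula for a few target percentages. It assumes T is expressed as a fraction between 0 and 1, and the function name `estimated_lift` is hypothetical.

```python
import math


def estimated_lift(target_fraction):
    """Rule-of-thumb lift at the point where the list fraction equals the target rate T."""
    return math.sqrt(1.0 / target_fraction)


# Target percentages chosen only for illustration.
for t in (0.01, 0.02, 0.05, 0.10, 0.25):
    print(f"T = {t:>4.0%}  estimated lift = {estimated_lift(t):.2f}")
```

The rarer the target, the higher the lift that a good model can be expected to reach at that point, but the growth is only with the square root of 1/T.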
[Chart: Actual lift(T) and Est. lift(T) vs. 100*T%]
More recent data is more predictive!
• Real-time behavior data more predictive than
historical, demographic data
• Ad retargeting
Lesson 9: Deployment & Maintenance
• Netflix Prize winning algorithm not deployed
• Technical debt of Machine Learning
- (Google, research.google.com/pubs/pub43146.html)
"… the additional accuracy gains that we
measured did not seem to justify the
engineering effort needed to bring them
into a production environment. Also, our
focus on improving Netflix personalization
had shifted to the next level by then."
http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html
Modeling in Real World vs Kaggle
• ROI of extra accuracy vs cost of maintenance
• Is the model explainable? (legal, acceptance reasons)
• Does the model discriminate on the basis of race,
gender, …?
• Netflix Prize algorithm which won $1M - not
implemented
• In the real world, simpler is usually better
Deployment: Test and Monitor
• Monitor assumptions
- Do fields have the same value distributions?
• Detect when model is no longer valid, needs
rebuilding
• Automatic model re-build
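One way to check whether a field still has the same value distribution is to compare category shares between training time and production. The slide does not name a method. The Population Stability Index (PSI) used below is one common choice, swapped in here as an assumption, and the field values are made up.

```python
import math
from collections import Counter


def population_stability_index(baseline, current, eps=1e-6):
    """PSI between two samples of a categorical field (0.0 means identical shares)."""
    categories = set(baseline) | set(current)
    base_counts, cur_counts = Counter(baseline), Counter(current)
    psi = 0.0
    for category in categories:
        # eps avoids log(0) when a category is missing from one of the samples.
        expected = max(base_counts[category] / len(baseline), eps)
        actual = max(cur_counts[category] / len(current), eps)
        psi += (actual - expected) * math.log(actual / expected)
    return psi


# Hypothetical "rate plan" field at training time and in production.
training = ["basic"] * 50 + ["premium"] * 50
same_mix = ["basic"] * 500 + ["premium"] * 500
shifted = ["basic"] * 90 + ["premium"] * 10

psi_same = population_stability_index(training, same_mix)
psi_shifted = population_stability_index(training, shifted)
print(f"PSI, same mix: {psi_same:.3f}; PSI, shifted mix: {psi_shifted:.3f}")
```

A commonly quoted rule of thumb treats values above roughly 0.25 as a large shift that warrants a closer look or a rebuild. That cutoff is a convention rather than something stated in the talk.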
Lesson 10: Don't just predict, optimize
• Prediction is usually just one part of making a
decision
• Consider cost, frequency, latency, human
behavior, etc.
• Goal: Optimization
• From Data Science to Decision Science
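A minimal sketch of moving from prediction to decision: combine the predicted response probability with the profit and cost of acting, and act only when the expected value is positive. The function names, probabilities, profit, and cost below are hypothetical, and a real decision would also weigh the other factors listed above.

```python
def expected_profit(p_response, profit_if_response, contact_cost):
    """Expected value of contacting one customer."""
    return p_response * profit_if_response - contact_cost


def choose_contacts(predictions, profit_if_response, contact_cost):
    """Keep only customers whose expected profit from a contact is positive."""
    return [
        customer
        for customer, p in predictions.items()
        if expected_profit(p, profit_if_response, contact_cost) > 0
    ]


# Hypothetical model outputs: predicted probability of responding to an offer.
predictions = {"a": 0.02, "b": 0.10, "c": 0.30}
selected = choose_contacts(predictions, profit_if_response=50.0, contact_cost=4.0)
print(selected)
```

With these numbers the break-even probability is 4 / 50 = 8%, so the decision threshold comes from the economics of the campaign, not from the model alone.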
Privacy in the age of Big Data
• Privacy laws much stricter in Europe
• Individual Privacy vs Benefits for all (e.g.
aggregated health-care data)
• Image and Face recognition (e.g. Facebook)
• Very hard to keep privacy with so many digital
breadcrumbs
• Privacy vs Security (e.g. FBI vs Apple)
• Politicians are behind the technology curve -
researchers should help society and politicians make
informed decisions
When Is It Ethical to Analyze
a Particular Dataset?
Data Ethics Golden Rule
Don't do with someone else's data
what you don't want done
with your data
Data Science Now
What, Where, How
KDnuggets Polls Findings
www.KDnuggets.com/polls/
www.kdnuggets.com/2016/01/poll-analytics-data-mining-data-science-applied-2015.html
Where did you apply Analytics,
Data Mining, Data Science ?
Avg. Number of Industries 2.7
Most Popular:
- CRM
- Finance
- Banking
- Health Care
- Science
- e-commerce
Highest growth in:
- Games, 121%
- Entertainment / Music, 74%
- Social Good/Non-profit, 68%
- Finance, 42%
- Education, 30%
Data Types
Analyzed/Mined
www.kdnuggets.com/polls/2014/data-types-sources-analyzed.html
Most popular:
- Table data
- Time series
- Text
- itemsets/transactions
Most growing:
- music/audio
- JSON
Largest Dataset Analyzed?
www.kdnuggets.com/2015/08/largest-dataset-analyzed-more-gigabytes-petabytes.html
Largest Dataset Analyzed?
Python swallowed an Elephant?
Antoine de Saint-Exupery
Largest Dataset Analyzed?
Big Data Miners -
elite group.
www.kdnuggets.com/2015/08/largest-dataset-analyzed-more-gigabytes-petabytes.html
Median in 11-100 GB
range, slight increase.
Largest Dataset Analyzed by Region
Big Data Miners:
Terabytes and
Petabytes
10-25%
4 Main Languages of Data Science
www.kdnuggets.com/2014/08/four-main-languages-analytics-data-mining-data-science.html
4 Main Languages of Data Science, 2
R vs Python
http://www.kdnuggets.com/2015/07/poll-primary-analytics-language-r-python.html
Surprising Stability:
88% of R users stayed with R
and 91% of Python users stayed with Python.
% of primary R, Python users up,
while % Other or None down.
Data Science Roles
• Data Analyst
• (Big) Data Engineer
• Data Scientist
• Machine Learning Researcher
• Data Science Manager/Director
• Chief Data Officer
• Company Founder
Data Science Venn Diagram, 2010
Drew Conway, 2010
LinkedIn Data Skills
LinkedIn has 334,000 Titles with "Data"
• Data Analyst 60,273
• Data Scientist 12,680
• Database Analyst 4,357
• Business Data Analyst 1,709
• Senior Data Scientist 1,691
• Sr. Data Analyst 1,131
Thanks to Lutz Finger, Director of Analytics at LinkedIn, for
this custom study
LinkedIn: 4 Groups of Skills
Skills of people with ā€œDataā€ in the title grouped into dedicated clusters - using similarity of members with similar skills.
Database Management and Software
• Access Database BTEQ Cubes Data Center Data Modeling Database Admin Database Administration Database
Design Databases DB2 Embedded SQL FastExport FastLoad MDX Memcached Microsoft SQL Server MLOAD
MongoDB Multiload MySQL NoSQL OA Framework Oracle Oracle Developer Suite Oracle Discoverer Oracle
Enterprise Manager Oracle PL/SQL Development Oracle RAC Oracle SQL Developer Performance Tuning
PhpMyAdmin PL/SQL PostgreSQL RDBMS Redis Relational Databases Replication RMAN SQL SQL Server
Management Studio SQL*Plus SQL400 SQLite Stored Procedures Sybase T-SQL Teradata Toad TPT TPUMP
Machine Learning
• Computational Linguistics Data Visualization Information Retrieval Machine Learning Natural Language Processing
Research Design Sentiment Analysis Structural Bioinformatics Text Mining
Mathematics
• Algebra Applied Mathematics Calculus Differential Equations Fortran Geometry Image Analysis LabVIEW Linear
Algebra Maple Mathematica Mathematical Modeling Mathematics Matlab Monte Carlo Simulation Numerical
Analysis Numerical Simulation Operations Research Partial Differential Equations Pre-Calculus Scientific Computing
Simulations Trigonometry
Statistical Analysis and Data Mining
• A/B Testing Analytics ANOVA Business Analytics Cluster Analysis Data Analysis Data Mining Decision Trees Design
of Experiments Economic Modeling Experimental Design Factor Analysis Google Analytics JMP Linear Regression
Logistic Regression Marketing Analytics Minitab Pattern Recognition Predictive Analytics Predictive Modeling
Primary Research Questionnaire Design Questionnaires R Sampling SAS SAS Programming SDTM Secondary
Research SPSS Statistical Consulting Statistical Data Analysis Statistical Modeling Statistical Programming Statistics
Survey Research Survival Analysis Time Series Analysis Web Analytics
LinkedIn Skills
Number of skills      Number of LinkedIn
relating to Data      members
1                     9,708,214
2                     3,870,376
3                     2,065,318
4                     1,097,849
5                     576,310
6                     305,266
7                     169,351
8                     98,284
9                     60,419
10                    37,689
Data Science Skills, Updated
[Diagram labels: Database/Coding Skills; Domain/Business Expertise]
Data Analyst/BI Analyst
Glassdoor, Apr 2016
US Avg Salary: $60-70,000
Positions: 13,000
Data Engineer
Glassdoor, Apr 2016
US Salary: $95,500
Jobs: 40,296
Ingénieur … Data
France: 5K Jobs
Machine Learning Researcher
"Unicorn" Data Scientist
Glassdoor, Apr 2016
US Salary: $113,400
Jobs: 2572
France: €43,500
Jobs: 180
Data Science Manager/Director
[Diagram labels: Database/Coding Skills; Domain/Business Expertise; People Management Skills; Data Science Leader]
Company Founder
[Diagram labels: Database/Coding Skills; Domain/Business Expertise; People Management Skills + Vision; Founder]
Data Career Progression
[Diagram nodes: BI/Data Analyst; Data Engineer; Data Scientist; Machine Learning Researcher; Data Science Manager/Director; Company Founder/CEO; Chief Data Officer; Chief Scientist]
DATA SCIENCE
JOB TRENDS
Shortage of Data Scientists?
• McKinsey (2011): shortage by 2018 in US
- 140-190,000 people with deep analytical skills
- 1.5 M managers/analysts with the know-how to
use the analysis of big data to make effective
decisions.
Source:
www.mckinsey.com/mgi/publications/big_data/
Data Scientist -
Sexiest Job of the 21st Century?
• Thomas H. Davenport and D.J. Patil (Harvard
Business Review, 2012)
"Data Scientist" - leading job trend
"Data Scientist" jobs have grown 1,700% from 2012 to 2016
Top 5 Tech Job Trends in 2016:
Data Scientist, DevOps, Puppet, PaaS, Hadoop
Indeed.com/jobtrends
Attention to Detail:
[Data Scientist] != "Data Scientist"
Indeed.com/jobtrends
Data Scientist
"Data Scientist" = "data scientist"
"Data Scientist" vs Statistician
Indeed.com job trends
[Chart legend: "Data Scientist", Statistician]
Data Scientist jobs on KDnuggets
[Chart: % Data Scientist jobs on KDnuggets, 2010-2015, y-axis 0%-40%]
Including Senior, Junior, Principal, Chief DS, …
LinkedIn 25 Hot Skills
2015
2014
Data Science Future
Big Data
• Next Industrial Revolution
• Data Science is the Engine of Big Data
Doing Old Things Better
Application areas
- Direct marketing/Customer modeling
- Recommendations
- Fraud detection
- Security/Intelligence
- Healthcare
- …
• Competition will level companies
Big Data Enables New Things!
• Google - first big success of big data
• Social networks (Facebook, Twitter, LinkedIn,
…) success depends on network size, i.e. big
data
• Big Data in Health-care
- image analysis, diagnosis,
- Personalized medicine
• Recommendations - Netflix streaming
New services, products, platforms
• Image recognition - FB uses it to decide what to
show users
• Face recognition - security
• Location-based services - Tinder
• Big Data to Power AI and Machine Learning
- Imagine Google DeepMind, IBM Watson, Siri in
2020?
Gartner Hype Cycle, 2012
[Chart: Gartner Hype Cycle; highlighted: Big Data]
Gartner Hype Cycle, 2013
[Chart: Gartner Hype Cycle; highlighted: Big Data]
Gartner Hype Cycle, 2014
[Chart: Gartner Hype Cycle; highlighted: Big Data, Data Science]
See http://diggdata.in/ which has 4 years of Gartner Hype Cycle
Gartner Hype Cycle, 2015
[Chart: Gartner Hype Cycle; labels: Big Data, Citizen Data Science, Machine Learning]
www.kdnuggets.com/2015/08/gartner-2015-hype-cycle-big-data-is-out-machine-learning-is-in.html
"Citizen" Data Science
This is Bob, our new Citizen Data Scientist.
He previously worked as a citizen dentist
and a citizen pilot.
Golden Age of Data Science,
Machine Learning
• Amazing New Tools
• Very complex algorithms are very easy to use
• scikit-learn, IPython notebooks, etc.
• One-Click deployment of TensorFlow on AWS
with GPU
Data Science Automated?
[Chart labels: Expert Human Ability; Current Computer Ability]
Data Science Automated By 2025?
KDnuggets Poll in 2015:
51% of voters expect Data Science Automation to happen in 10 years or less
www.kdnuggets.com/2015/05/data-scientists-automated-2025.html
Data Science Automation
I remember when only a Deep Learning
supercomputer could beat
me in a Data Science competition
Data Science Automation
KDnuggets: Software: Automated Data Science:
• AutoDiscovery from ButlerScientifics
• Automatic Business Modeler from Algolytics
• Automatic Statistician project
• DataRobot
• DMWay
• ForecastThis DSX
• FeatureLab
• Loom Systems
• machineJS: Automated machine learning
• Quill from Narrative Science
• SAP Predictive Analytics
• Savvy from Yseop
• Skytree Machine Learning Software
• Tree-based Pipeline Optimization Tool (TPOT)
Data Science Automation
• New tools make Data Scientists more
productive
• Make data results more widely available
• Automate lower-level Data Science tasks
"Soft" Data Science Skills
Harder to Automate
• Curiosity
• Intuition
• Business Knowledge
• Selecting a good metric
• Posing the right question
• Presentation Skills
Data Science - still a great profession
Questions?
KDnuggets: Analytics, Big Data, Data Science
• Subscribe to KDnuggets News email at
www.KDnuggets.com/subscribe.html
• Email to editor1@kdnuggets.com
• Twitter: @kdnuggets
• facebook.com/kdnuggets
• LinkedIn group: KDnuggets

More Related Content

What's hot

Big Data and the Art of Data Science
Big Data and the Art of Data ScienceBig Data and the Art of Data Science
Big Data and the Art of Data ScienceAndrew Gardner
Ā 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
Ā 
Data science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi PeriasamyData science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi PeriasamyPeter Kua
Ā 
2015 data-science-salary-survey
2015 data-science-salary-survey2015 data-science-salary-survey
2015 data-science-salary-surveyAdam Rabinovitch
Ā 
What is a Data Scientist
What is a Data Scientist What is a Data Scientist
What is a Data Scientist Experian_US
Ā 
What data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsWhat data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsHugo Bowne-Anderson
Ā 
Data Scientist: The Sexiest Job in the 21st Century
Data Scientist: The Sexiest Job in the 21st CenturyData Scientist: The Sexiest Job in the 21st Century
Data Scientist: The Sexiest Job in the 21st CenturyLyn Fenex
Ā 
How to become a Data Scientist?
How to become a Data Scientist? How to become a Data Scientist?
How to become a Data Scientist? HackerEarth
Ā 
Data Science 101
Data Science 101Data Science 101
Data Science 101odsc
Ā 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsSri Ambati
Ā 
Big Data Science: Intro and Benefits
Big Data Science: Intro and BenefitsBig Data Science: Intro and Benefits
Big Data Science: Intro and BenefitsChandan Rajah
Ā 
How to Become a Data Scientist ā€“Ā By Ryan Orban, VP of Operations and Expansio...
How to Become a Data Scientist ā€“Ā By Ryan Orban, VP of Operations and Expansio...How to Become a Data Scientist ā€“Ā By Ryan Orban, VP of Operations and Expansio...
How to Become a Data Scientist ā€“Ā By Ryan Orban, VP of Operations and Expansio...Galvanize
Ā 
The Evolution of Data Science
The Evolution of Data ScienceThe Evolution of Data Science
The Evolution of Data ScienceKenny Daniel
Ā 
Data Science For Social Scientists Workshop
Data Science For Social Scientists WorkshopData Science For Social Scientists Workshop
Data Science For Social Scientists WorkshopIan Hopkinson
Ā 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSampath Kumar
Ā 
A data view of the data science process
A data view of the data science processA data view of the data science process
A data view of the data science processMathieu d'Aquin
Ā 
Data_Scientist_Position_Description
Data_Scientist_Position_DescriptionData_Scientist_Position_Description
Data_Scientist_Position_DescriptionSuman Banerjee
Ā 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceANOOP V S
Ā 
Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Gregory Piatetsky-Shapiro
Ā 

What's hot (20)

Big Data and the Art of Data Science
Big Data and the Art of Data ScienceBig Data and the Art of Data Science
Big Data and the Art of Data Science
Ā 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Ā 
Data science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi PeriasamyData science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi Periasamy
Ā 
2015 data-science-salary-survey
2015 data-science-salary-survey2015 data-science-salary-survey
2015 data-science-salary-survey
Ā 
What is a Data Scientist
What is a Data Scientist What is a Data Scientist
What is a Data Scientist
Ā 
What data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsWhat data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientists
Ā 
Data Scientist: The Sexiest Job in the 21st Century
Data Scientist: The Sexiest Job in the 21st CenturyData Scientist: The Sexiest Job in the 21st Century
Data Scientist: The Sexiest Job in the 21st Century
Ā 
How to become a Data Scientist?
How to become a Data Scientist? How to become a Data Scientist?
How to become a Data Scientist?
Ā 
Data Science 101
Data Science 101Data Science 101
Data Science 101
Ā 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
Ā 
Intro to Data Science Concepts
Intro to Data Science ConceptsIntro to Data Science Concepts
Intro to Data Science Concepts
Ā 
Big Data Science: Intro and Benefits
Big Data Science: Intro and BenefitsBig Data Science: Intro and Benefits
Big Data Science: Intro and Benefits
Ā 
How to Become a Data Scientist ā€“Ā By Ryan Orban, VP of Operations and Expansio...
How to Become a Data Scientist ā€“Ā By Ryan Orban, VP of Operations and Expansio...How to Become a Data Scientist ā€“Ā By Ryan Orban, VP of Operations and Expansio...
How to Become a Data Scientist ā€“Ā By Ryan Orban, VP of Operations and Expansio...
Ā 
The Evolution of Data Science
The Evolution of Data ScienceThe Evolution of Data Science
The Evolution of Data Science
Ā 
Data Science For Social Scientists Workshop
Data Science For Social Scientists WorkshopData Science For Social Scientists Workshop
Data Science For Social Scientists Workshop
Ā 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Ā 
A data view of the data science process
A data view of the data science processA data view of the data science process
A data view of the data science process
Ā 
Data_Scientist_Position_Description
Data_Scientist_Position_DescriptionData_Scientist_Position_Description
Data_Scientist_Position_Description
Ā 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Ā 
Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?
Ā 

Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro

  • 2. Data Science: Past, Present, and Future. Gregory Piatetsky-Shapiro, KDnuggets. © KDnuggets 2016. French title: La Science des données: passé, présent et futur
  • 3. Predicting Behavior - Key to Survival. Better prediction - better intelligence
  • 4. "Predictions": Astrology. My May 26 Horoscope: "So what if things aren't completely wonderful in your life right now? Just keep your hopes high, and your fingers crossed. … Being with the people who make you feel good about yourself will help keep your thoughts bright, so get together with your closest friend as soon as you can." Source: www.astrology.com/horoscope/daily/aries.html
  • 5. "Predictions": Turkish Coffee Grinds. If a big chunk of the coffee grounds falls down on the saucer then it is taken as the first positive sign of your reading. "Trouble and worries are leaving you."
  • 6. Pundits' "Predictions"
      • Nate Silver's FiveThirtyEight.com prediction for Trump winning the Republican nomination:
          • Aug 2015: 2%
          • Sep 2015: 5%
          • Nov 2015: 6%
          • Jan 2016: 12%
          • May 2016: 99%
  • 7. Desire to Predict - Deep Human Trait. Lessons:
      • People are hard-wired to see patterns
      • People want predictions
      • Human intuition does not work well on large-scale data or for understanding probability
      • A good story is essential to a convincing prediction (whether true or false)
  • 8. Data Science: a data-driven, scientific approach to prediction and data analysis
  • 9. Outline
      • Intro, Data Science History and Terms
      • 10 Real-World Data Science Lessons
      • Data Science Now: Polls & Trends
      • Data Science Roles
      • Data Science Job Trends
      • Data Science Future
  • 10. What do we call it?
      • Statistics
      • Data Mining
      • Knowledge Discovery in Data (KDD)
      • Predictive Analytics
      • Data Analytics
      • Data Science
      • …?
    Core Idea: Finding Useful Patterns in Data
  • 11. Pre-history (1800-2008): Statistics. From Google Ngram Viewer, English-language books, case-insensitive search. Other languages need to be considered for the full picture. "Statistics" is the biggest term in the 20th century; "analytics" is used increasingly through the 20th century; "data mining" appears in the late 1990s
  • 12. French Books, 1800-2008: Statistiques vs Mathematiques
  • 13. "Data Mining" Surges in 1996. Chart series: Analytics, Data Mining. Annotations: KDD-95, 1st Conference on Knowledge Discovery and Data Mining, Montreal; Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996, Eds: U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Google N-grams, case-insensitive search, smoothing 1
  • 14. Earliest use of "data mining": 1962. Source: Google Books. After eliminating many "following data. Mining cost is" examples, which refer to mining of minerals, and books from "1958" that have a CD attached (errors in book year), the earliest "data mining" reference I found is from 1962
  • 15. Very Recent History Using Google Trends
  • 16. Google Trends, 2005-2016: After 2006, Analytics > Data Mining. Global, all regions
  • 17. >50% of "Analytics" searches are for "Google Analytics". Google Analytics introduced, Dec 2005
  • 18. Google Trends, 2005-2016. Chart series: data science; analytics - Google; big data; data mining
  • 19. Google Trends, 2005-2016. 2012: Analytics down, Big Data up
  • 20. Google Trends, 2005-2016. 2013: Data Science grows
  • 21. Google Trends: Machine Learning, Data Science, Deep Learning
  • 22. Google Trends: Machine Learning. Machine Learning ~ "Machine Learning"
  • 23. Google Trends: Data Science. [Data Science] != "Data Science". Lesson: Examine assumptions carefully
  • 24. Regional Interest in "Data Science" in 2015 (Google Trends). Note: a search for "Data Science" is different from [Data Science]
  • 25. KDnuggets Audience by Region, Q1 2016
  • 26. Data Science History
      • < 1900 - Statistics
      • 1960s - "Data Mining" = bad activity, data "dredging"
      • 1990 - "Data Mining" is good, surges in 1996
      • 2003 - "Data Mining" peaks (bad in press, invasion of privacy?), slowly declines, but still popular
      • 2006 - Google Analytics
      • 2007 - Business/Data/Predictive Analytics
      • 2012 - Big Data
      • 2014 - Data Science
      • 2015 - Deep Learning
      • 2018 - ??
  • 27. 10 Real-World Lessons from the Art & Practice of Data Science & Data Mining
  • 28. Lesson 1: It is an Iterative, Circular Process. The waterfall model does NOT work for Data Science
  • 29. CRISP-DM: Iterative, Circular Process. Data Mining Process: CRISP-DM, 1998. See www.kdnuggets.com/2016/03/data-science-process-rediscovered.html
      1. Business Understanding
      2. Data Understanding
      3. Data Preparation
      4. Modeling
      5. Evaluation
      6. Deployment
  • 30. Academic Data Science Process (Harvard, 2013). See www.kdnuggets.com/2016/03/data-science-process-rediscovered.html
  • 31. Machine Learning Workflow, MS Azure. See www.kdnuggets.com/2016/04/developers-need-know-about-machine-learning.html and blogs.msdn.microsoft.com/continuous_learning/2014/11/15/end-to-end-predictive-model-in-azureml-using-linear-regression/
  • 32. Lesson 2: Data Engineering Takes The Bulk of Time • Building Machine Learning/Predictive Models is the key (and most fun) part, but only a small part of the whole process • 60-80% (?) spent on Data Preparation/Engineering
  • 33. Competitions are different. March Machine Learning Mania 2016, Winner's Interview: 1st Place, Miguel Alomar. How #MachineLearning @Kaggle winner spent time: 35% read forums, 25% build models, 25% evaluate results, 15% data preparation. https://twitter.com/kdnuggets/status/730417186167263232 http://blog.kaggle.com/2016/05/10/march-machine-learning-mania-2016-winners-interview-1st-place-miguel-alomar/
  • 34. Lesson 3: Question Assumptions. Problem: Deciles not uniform. Decile 1 is too rare, Decile 0 - too frequent? Why? (Chart of Measurement deciles; * Not actual data)
  • 35. Mass Spectrometry. Mass spectrometry (MS) is an analytical technique that ionizes chemical species and sorts the ions based on their mass-to-charge ratio. Can produce a large number (~20,000) of m/z values for a sample. Goal: find biomarkers for disease, test, condition
  • 36. Question Assumptions. Instead of Measurement Deciles, examine actual ranges, including 0: nothing between 1 and 14; value 0 is too frequent. Why? (Chart of Measurement values; * Not actual data)
  • 37. Question Assumptions. Instead of Measurement Deciles, examine actual ranges, including 0: nothing between 1 and 14; value 0 is too frequent. Why? Someone added a rule to round raw measurement values below 15 to zero. (Chart of Measurement values; * Not actual data)
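The check described on the slide above (look at actual values rather than deciles) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `value_profile` and the sample readings are hypothetical, and the rounding rule is simulated rather than taken from any real data set.

```python
from collections import Counter

def value_profile(values):
    """Return (share of exact zeros, smallest nonzero value) for a field."""
    counts = Counter(values)
    zero_share = counts[0] / len(values)
    nonzero = [v for v in values if v != 0]
    smallest_nonzero = min(nonzero) if nonzero else None
    return zero_share, smallest_nonzero

# Hypothetical readings (not actual data); values below 15 are rounded to 0,
# mimicking the hidden rule mentioned on the slide.
raw = [3, 7, 12, 14, 15, 18, 22, 40, 9, 31]
stored = [0 if v < 15 else v for v in raw]

zero_share, smallest = value_profile(stored)
print(zero_share, smallest)  # 0.5 15
```

A large share of zeros combined with a gap between 0 and the smallest nonzero value is the kind of pattern that should prompt the question "why?".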
  • 38. "The best data scientists have one thing in common - unbelievable curiosity." DJ Patil, US First Chief Data Scientist, April 2016. http://www.sciencefriday.com/articles/10-questions-for-the-nations-first-chief-data-scientist
  • 39. Lesson 4: Focus on the Right Metric - Actionable • Consumer: Churn may depend on age, region, usage, and rate plan. Rate plan easiest to change. • Uplift Modeling in Marketing and Politics: focus on persuadables
  • 40. Right Metric: Uplift Modeling. Don't model if consumer will buy - model if consumer will buy in response to an offer
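One simple way to make the uplift idea concrete is to compare purchase rates with and without an offer inside each customer segment. The sketch below assumes a randomized offer and that every segment has both offered and non-offered customers; the function name, segment labels, and records are hypothetical, and real uplift models are usually more elaborate than this per-segment difference.

```python
from collections import defaultdict

def uplift_by_segment(records):
    """records: iterable of (segment, got_offer, bought) tuples.

    Returns {segment: P(buy | offer) - P(buy | no offer)}.
    Assumes each segment contains both offered and non-offered customers.
    """
    # tallies[segment][got_offer] = [number of buyers, number of customers]
    tallies = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, got_offer, bought in records:
        cell = tallies[segment][got_offer]
        cell[0] += int(bought)
        cell[1] += 1
    uplift = {}
    for segment, cells in tallies.items():
        offered_rate = cells[True][0] / cells[True][1]
        control_rate = cells[False][0] / cells[False][1]
        uplift[segment] = offered_rate - control_rate
    return uplift

# Hypothetical data: "loyal" customers buy at the same rate either way,
# "persuadable" customers buy more often when they receive the offer.
records = (
    [("loyal", True, True)] * 3 + [("loyal", True, False)]
    + [("loyal", False, True)] * 3 + [("loyal", False, False)]
    + [("persuadable", True, True)] * 3 + [("persuadable", True, False)]
    + [("persuadable", False, True)] + [("persuadable", False, False)] * 3
)
print(uplift_by_segment(records))  # {'loyal': 0.0, 'persuadable': 0.5}
```

Both segments have a high purchase rate when offered (75%), but only the persuadable segment changes its behavior because of the offer, which is why a plain "will buy" model would target the wrong people.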
  • 41. Right Metric: Uplift Modeling. From Eric Siegel presentation at PAW, 2011. In Obama 2012 Campaign: www.thefiscaltimes.com/Articles/2013/01/21/The-Real-Story-Behind-Obamas-Election-Victory
  • 42. Lesson 5: Be a Fox, not a Hedgehog. Read Isaiah Berlin's 1953 essay, The Hedgehog and the Fox. A fox knows many things, but a hedgehog - one important thing.
  • 43. Lesson 5: Modeling. No Free Lunch Theorem - no method is universally the best (Wolpert). In Kaggle competitions, there are 2 ways to win (Anthony Goldbloom, 2016): • Handcrafted feature engineering • Or Deep Learning Neural Networks. www.kdnuggets.com/2016/01/anthony-goldbloom-secret-winning-kaggle-competitions.html • XGBoost - winning method in many recent Kaggle competitions • Ensemble methods. For Structured Data (Sebastian Raschka): • SVM (Support Vector Machines) for smaller data • Random Forests - more data, more automated. www.kdnuggets.com/2016/04/deep-learning-vs-svm-random-forest.html For Unstructured Data: • Deep Learning
  • 44. Lesson 6: Avoid Overfitting. http://www.kdnuggets.com/2014/06/cardinal-sin-data-mining-data-science.html Many examples at http://tylervigen.com/spurious-correlations
  • 45. Avoid Overfitting. "Irreproducible" results - BIG problem in social sciences, medicine: John P. A. Ioannidis' famous paper Why Most Published Research Findings Are False (PLoS Medicine, 2005). Due to • Small samples • Testing too many hypotheses • Confirmation bias (explicit or implicit) • Poor training
  • 46. How to Avoid Overfitting • If it is too good to be true, it probably is • Find the simplest possible hypothesis • Adjusting the False Discovery Rate • Randomization Testing • Nested cross-validation (train, test, holdout) • Regularization (adding a penalty for complexity). www.kdnuggets.com/2014/06/cardinal-sin-data-mining-data-science.html
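Of the remedies listed on the slide above, randomization testing is easy to illustrate. The sketch below is a generic permutation test for a difference in means, not a method taken from the talk; the function name, the sample values, the number of permutations, and the seed are all assumptions chosen for illustration.

```python
import random

def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Estimate how often a random relabeling of the pooled data gives a
    difference in means at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling of group membership
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical samples: clearly separated groups vs. identical groups.
control = [0, 1, 2, 3, 4, 5, 6, 7]
treated = [10, 11, 12, 13, 14, 15, 16, 17]
print(permutation_p_value(treated, control) < 0.05)     # True
print(permutation_p_value([1, 1, 1, 1], [1, 1, 1, 1]))  # 1.0
```

If a "finding" is no stronger than what shuffled labels produce, it is likely an artifact of chance or of testing too many hypotheses.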
  • 47. Lesson 7: Tell a story • Combine facts into a story • Combine visual and text presentation • Explanation gives credibility • Dynamic / Interactive • Examples: Kefir, Google Analytics, Quill
  • 48. KEFIR (KEy FInding Reporter), 1994 • Overview report www.kdnuggets.com/data_mining_course/kefir/overview.htm • Inpatient admissions www.kdnuggets.com/data_mining_course/kefir/s2.htm
  • 49. Quill report for KDnuggets • Sessions Stay Flat, But Way Higher Than 12-Month Weekly Average • Sessions remained flat compared to the prior week. The 121,040 sessions, however, were above your 85,105-session weekly average for the year. Your site's total pageviews stayed flat last week at 206,124, while pages per session grew less than a percent to 1.7. That's equal to your weekly average for the year. • Among all your pages, Analytics, Data Mining, and Data Science had both the highest bounce rate (43%) and the most pageviews (8,734) last week.
  • 50. La Diseuse de bonne aventure (The Fortune Teller), Caravaggio, 1595 (Louvre). Beware of Fortune tellers!
  • 51. Lesson 8: Limits to Predicting Human Behavior? • Inherent randomness, complexity in human behavior • Individual predictions have limited accuracy (but can still be better than random and very useful for consumer analytics) • Aggregate predictions (e.g. who will win the election) more accurate, because individual randomness cancels out
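The claim that aggregate predictions beat individual ones because randomness cancels out can be checked with a toy simulation. Everything here is an assumption made for illustration: voters are independent, each votes for candidate A with probability 0.55, and the function name, sizes, and seed are hypothetical.

```python
import random

def simulate(n_elections=200, n_voters=1001, p_vote_a=0.55, seed=0):
    """Compare two predictions: 'this voter picks A' and 'A wins the election'."""
    rng = random.Random(seed)
    individual_correct = 0
    elections_correct = 0
    for _ in range(n_elections):
        votes_for_a = sum(rng.random() < p_vote_a for _ in range(n_voters))
        # The per-voter prediction "picks A" is right once per vote for A.
        individual_correct += votes_for_a
        # The aggregate prediction "A wins" is right when A has a majority.
        elections_correct += votes_for_a > n_voters / 2
    individual_accuracy = individual_correct / (n_elections * n_voters)
    aggregate_accuracy = elections_correct / n_elections
    return individual_accuracy, aggregate_accuracy

individual, aggregate = simulate()
print(individual, aggregate)
```

Under these assumptions the per-voter prediction is right only slightly more often than a coin flip, while the election-level prediction is right almost every time. Real voters are not independent, so the gap in practice is smaller than in this sketch.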
  • 52. Example: Netflix Prize, 2006 • The most advanced algorithms were only a few percent better than basic algorithms. See Gregory Piatetsky, "Big Data: Hype & Reality", Harvard Business Review 2012, https://hbr.org/2012/10/big-data-hype-and-reality/
  • 53. Direct Marketing Lift: Random and Model-sorted Lists (chart: CPH, Cumulative Pct Hits, vs. Pct list, for Random and Model). 5% of random list have 5% of hits; 5% of model-score ranked list have 21% of hits. Lift(5%) = 21%/5% = 4.2
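The lift arithmetic on the slide above can be reproduced on a scored list. This is a minimal sketch: the function name `lift_at` and the synthetic list of 2,000 prospects are hypothetical, constructed so that the top 5% by score contains 21 of the 100 hits.

```python
def lift_at(scored, fraction):
    """scored: list of (model_score, hit) pairs, hit being 0 or 1.

    Lift = share of all hits found in the top `fraction` of the
    score-ranked list, divided by the share of the list examined.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    total_hits = sum(hit for _, hit in ranked)
    top_hits = sum(hit for _, hit in ranked[:top_n])
    cumulative_pct_hits = top_hits / total_hits
    return cumulative_pct_hits / (top_n / len(ranked))

# Hypothetical list: 2,000 prospects, 100 hits overall, 21 of them in the
# 100 highest-scored prospects (the top 5%).
prospects = [
    (2000 - i, 1 if i < 21 or 100 <= i < 179 else 0)
    for i in range(2000)
]
print(round(lift_at(prospects, 0.05), 2))  # 4.2
```

A randomly ordered list would give a lift close to 1 at every depth, which is the baseline the model-sorted list is compared against.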
  • 54. Most lift curves are surprisingly similar - limit to human predictability? Study of lift curves in banking, telecom: best lift curves are similar. Special point T = Target percentage: Lift(T) ~ sqrt(1/T). (Chart: Actual lift(T) and Est. lift(T) vs. 100*T%.) G. Piatetsky-Shapiro, B. Masand, Estimating Campaign Benefits and Modeling Lift, in Proceedings of KDD-99 Conference, ACM Press, 1999.
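The rule of thumb Lift(T) ~ sqrt(1/T) is simple to evaluate. The sketch below assumes T is given as a fraction (so 5% is 0.05); the function name is hypothetical, and the formula is an empirical approximation reported on the slide, not an exact law.

```python
import math

def estimated_lift(target_fraction):
    """Rule-of-thumb lift at the special point T: sqrt(1 / T)."""
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    return math.sqrt(1 / target_fraction)

print(estimated_lift(0.25))            # 2.0
print(round(estimated_lift(0.05), 2))  # 4.47
```

Read this way, rarer targets allow higher lift: under the approximation, a 1% target rate corresponds to an estimated lift of about 10, a 25% target rate to only about 2.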
  • 55. More recent data is more predictive! • Real-time behavior data more predictive than historical, demographic data • Ad retargeting
  • 56. Lesson 9: Deployment & Maintenance • Netflix Prize winning algorithm not deployed: "... the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment. Also, our focus on improving Netflix personalization had shifted to the next level by then." http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html • Technical debt of Machine Learning (Google, research.google.com/pubs/pub43146.html)
  • 57. Modeling in Real World vs Kaggle • ROI of extra accuracy vs cost of maintenance • Is model explainable? (legal, acceptance reasons) • Does model discriminate on basis of race, gender, ...? • Netflix Prize algorithm which won $1M - not implemented • In real-world, simpler is usually better
  • 58. Deployment: Test and Monitor • Monitor assumptions - Do fields have the same value distributions? • Detect when model is no longer valid, needs rebuilding • Automatic model re-build
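One common way to check whether a field still has the same value distribution is the population stability index (PSI). The talk does not name a specific technique, so PSI is swapped in here as an assumption; the function name, the `floor` parameter, the sample data, and the 0.2 alert threshold (a widely quoted convention, not a guarantee) are all illustrative.

```python
import math
from collections import Counter

def population_stability_index(baseline, current, floor=1e-6):
    """PSI between two samples of a categorical field.

    0 means identical category shares; larger values mean more drift.
    `floor` keeps the logarithm finite for categories missing in one sample.
    """
    base_counts, cur_counts = Counter(baseline), Counter(current)
    psi = 0.0
    for category in set(base_counts) | set(cur_counts):
        p = max(base_counts[category] / len(baseline), floor)
        q = max(cur_counts[category] / len(current), floor)
        psi += (q - p) * math.log(q / p)
    return psi

# Hypothetical rate-plan field at training time and in production.
training = ["basic"] * 50 + ["premium"] * 50
production = ["basic"] * 90 + ["premium"] * 10

drift = population_stability_index(training, production)
print(drift > 0.2)  # True: the field has shifted, consider rebuilding the model
print(population_stability_index(training, training))  # 0.0
```

Running a check like this on every input field at a regular interval is one way to detect that a model is no longer valid before its predictions visibly degrade.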
  • 59. Lesson 10: Don't just predict, optimize • Prediction is usually just one part of making a decision • Consider cost, frequency, latency, human behavior, etc. • Goal: Optimization • From Data Science to Decision Science
  • 60. Privacy in the age of Big Data • Privacy laws much stricter in Europe • Individual Privacy vs Benefits for all (e.g. aggregated health-care data) • Image and Face recognition (e.g. Facebook) • Very hard to keep privacy with so many digital breadcrumbs • Privacy vs Security (e.g. FBI vs Apple) • Politicians are behind the technology curve - researchers should help society and politicians make informed decisions
  • 61. When Is It Ethical To Analyze A Particular Dataset?
  • 62. Data Ethics Golden Rule: Don't do with someone else's data what you don't want done with your data
  • 63. Data Science Now: What, Where, How. KDnuggets Polls Findings www.KDnuggets.com/polls/
  • 64. Where did you apply Analytics, Data Mining, Data Science? Avg. Number of Industries: 2.7. Most Popular: CRM, Finance, Banking, Health Care, Science, e-commerce. Highest growth in: Games, 121%; Entertainment / Music, 74%; Social Good/Non-profit, 68%; Finance, 42%; Education, 30%. www.kdnuggets.com/2016/01/poll-analytics-data-mining-data-science-applied-2015.html
  • 65. Data Types Analyzed/Mined. Most popular: table data, time series, text, itemsets/transactions. Fastest growing: music/audio, JSON. www.kdnuggets.com/polls/2014/data-types-sources-analyzed.html
  • 66. Largest Dataset Analyzed? www.kdnuggets.com/2015/08/largest-dataset-analyzed-more-gigabytes-petabytes.html
  • 67. Largest Dataset Analyzed? Python swallowed an Elephant? (Antoine de Saint-Exupery)
  • 68. Largest Dataset Analyzed? Big Data Miners - elite group. Median in 11-100 GB range, slight increase. www.kdnuggets.com/2015/08/largest-dataset-analyzed-more-gigabytes-petabytes.html
  • 69. Largest Dataset Analyzed by Region. Big Data Miners (Terabytes and Petabytes): 10-25%
  • 70. 4 Main Languages of Data Science. www.kdnuggets.com/2014/08/four-main-languages-analytics-data-mining-data-science.html
  • 71. 4 Main Languages of Data Science, 2
  • 72. R vs Python. Surprising Stability: 88% of R users stayed with R and 91% of Python users stayed with Python. % of primary R, Python users up, while % Other or None down. http://www.kdnuggets.com/2015/07/poll-primary-analytics-language-r-python.html
  • 73. Data Science Roles
  • 74. Data Science Roles • Data Analyst • (Big) Data Engineer • Data Scientist • Machine Learning Researcher • Data Science Manager/Director • Chief Data Officer • Company Founder
  • 75. Data Science Venn Diagram (Drew Conway, 2010)
  • 76. LinkedIn Data Skills. LinkedIn has 334,000 Titles with "Data" • Data Analyst 60,273 • Data Scientist 12,680 • Database Analyst 4,357 • Business Data Analyst 1,709 • Senior Data Scientist 1,691 • Sr. Data Analyst 1,131. Thanks to Lutz Finger, Director of Analytics at LinkedIn, for this custom study
  • 77. LinkedIn: 4 Groups of Skills. Skills of people with "Data" in the title grouped into dedicated clusters - using similarity of members with similar skills. • Database Management and Software: Access Database, BTEQ, Cubes, Data Center, Data Modeling, Database Admin, Database Administration, Database Design, Databases, DB2, Embedded SQL, FastExport, FastLoad, MDX, Memcached, Microsoft SQL Server, MLOAD, MongoDB, Multiload, MySQL, NoSQL, OA Framework, Oracle, Oracle Developer Suite, Oracle Discoverer, Oracle Enterprise Manager, Oracle PL/SQL Development, Oracle RAC, Oracle SQL Developer, Performance Tuning, PhpMyAdmin, PL/SQL, PostgreSQL, RDBMS, Redis, Relational Databases, Replication, RMAN, SQL, SQL Server Management Studio, SQL*Plus, SQL400, SQLite, Stored Procedures, Sybase, T-SQL, Teradata, Toad, TPT, TPUMP • Machine Learning: Computational Linguistics, Data Visualization, Information Retrieval, Machine Learning, Natural Language Processing, Research Design, Sentiment Analysis, Structural Bioinformatics, Text Mining • Mathematics: Algebra, Applied Mathematics, Calculus, Differential Equations, Fortran, Geometry, Image Analysis, LabVIEW, Linear Algebra, Maple, Mathematica, Mathematical Modeling, Mathematics, Matlab, Monte Carlo Simulation, Numerical Analysis, Numerical Simulation, Operations Research, Partial Differential Equations, Pre-Calculus, Scientific Computing, Simulations, Trigonometry • Statistical Analysis and Data Mining: A/B Testing, Analytics, ANOVA, Business Analytics, Cluster Analysis, Data Analysis, Data Mining, Decision Trees, Design of Experiments, Economic Modeling, Experimental Design, Factor Analysis, Google Analytics, JMP, Linear Regression, Logistic Regression, Marketing Analytics, Minitab, Pattern Recognition, Predictive Analytics, Predictive Modeling, Primary Research, Questionnaire Design, Questionnaires, R, Sampling, SAS, SAS Programming, SDTM, Secondary Research, SPSS, Statistical Consulting, Statistical Data Analysis, Statistical Modeling, Statistical Programming, Statistics, Survey Research, Survival Analysis, Time Series Analysis, Web Analytics
  • 78. LinkedIn Skills. Number of skills relating to Data: number of LinkedIn members. 1: 9,708,214; 2: 3,870,376; 3: 2,065,318; 4: 1,097,849; 5: 576,310; 6: 305,266; 7: 169,351; 8: 98,284; 9: 60,419; 10: 37,689
  • 79. Data Science Skills, Updated (diagram: Database, Coding Skills; Domain/Business Expertise)
  • 80. Data Analyst/BI Analyst (diagram: Database, Coding Skills; Domain/Business Expertise). Data Analyst, Glassdoor, Apr 2016: US Avg Salary: $60-70,000; Positions: 13,000
  • 81. Data Engineer (diagram: Database, Coding Skills; Domain/Business Expertise). Data Engineer, Glassdoor, Apr 2016: US Salary: $95,500; Jobs: 40,296. Ingénieur ... Data, France: 5K Jobs
  • 82. Machine Learning Researcher (diagram: Database, Coding Skills; Domain/Business Expertise)
  • 83. "Unicorn" Data Scientist (diagram: Database, Coding Skills; Domain/Business Expertise). Glassdoor, Apr 2016: US Salary: $113,400; Jobs: 2572. France: €43,500; Jobs: 180
  • 84. Data Science Manager/Director: Data Science Leader (diagram: Database, Coding Skills; Domain/Business Expertise; People Management Skills)
  • 85. Company Founder (diagram: Database, Coding Skills; Domain/Business Expertise; People Management Skills + Vision)
  • 86. Data Career Progression (diagram): BI/Data Analyst, Data Engineer, Data Scientist, Machine Learning Researcher, Data Science Manager/Director, Company Founder/CEO, Chief Data Officer, Chief Scientist
  • 87. DATA SCIENCE JOB TRENDS
  • 88. Shortage of Data Scientists? • McKinsey (2011): shortage by 2018 in US: 140-190,000 people with deep analytical skills; 1.5 M managers/analysts with the know-how to use the analysis of big data to make effective decisions. Source: www.mckinsey.com/mgi/publications/big_data/
  • 89. Data Scientist - Sexiest Job of the 21st Century? • Thomas H. Davenport and D.J. Patil (Harvard Business Review, 2012)
  • 90. "Data Scientist" - leading job trend. "Data Scientist" Job has grown 1,700% from 2012 to 2016. Top 5 Tech Job Trends in 2016: Data Scientist, Devops, Puppet, PaaS, Hadoop? Indeed.com/jobtrends
  • 91. Attention to Detail: [Data Scientist] != "Data Scientist". Indeed.com/jobtrends: Data Scientist; "Data Scientist" = "data scientist"
  • 92. "Data Scientist" vs Statistician (Indeed.com job trends chart)
  • 93. Data Scientist jobs on KDnuggets (chart: % Data Scientist jobs on KDnuggets, 2010-2015, y axis 0%-40%). Including Senior, Junior, Principal, Chief DS, ...
  • 94. LinkedIn 25 Hot Skills, 2015 and 2014
  • 96. Big Data • Next Industrial Revolution • Data Science is the Engine of Big Data
  • 97. Doing Old Things Better • Application areas: Direct marketing/Customer modeling, Recommendations, Fraud detection, Security/Intelligence, Healthcare, ... • Competition will level companies
  • 98. Big Data Enables New Things! • Google - first big success of big data • Social networks (Facebook, Twitter, LinkedIn, ...): success depends on network size, i.e. big data • Big Data in Health-care: image analysis, diagnosis; personalized medicine • Recommendations - Netflix streaming
  • 99. New services, products, platforms • Image recognition - FB uses to decide what to show users • Face recognition - security • Location-based services - Tinder • Big Data to Power AI and Machine Learning - Imagine Google DeepMind, IBM Watson, Siri in 2020?
  • 100. Gartner Hype Cycle, 2012 (highlighted: Big Data)
  • 101. Gartner Hype Cycle, 2013 (highlighted: Big Data)
  • 102. Gartner Hype Cycle, 2014 (highlighted: Big Data, Data Science). See http://diggdata.in/ which has 4 years of Gartner Hype Cycle
  • 103. Gartner Hype Cycle, 2015 (labels: Big Data, Citizen Data Science, Machine Learning). www.kdnuggets.com/2015/08/gartner-2015-hype-cycle-big-data-is-out-machine-learning-is-in.html
  • 104. "Citizen" Data Science. "This is Bob, our new Citizen Data Scientist. He previously worked as a citizen dentist and a citizen pilot."
  • 105. Golden Age of Data Science, Machine Learning • Amazing New Tools • Very Complex Algorithms are very easy to use • scikit-learn, iPython notebooks, etc. • One-Click deployment of TensorFlow on AWS with GPU
  • 106. Data Science Automated? (chart: Expert Human Ability; Current Computer Ability)
  • 107. Data Science Automated? (chart: Expert Human Ability)
  • 108. Data Science Automated By 2025? KDnuggets Poll in 2015: 51% of voters expect Data Science Automation to happen in 10 years or less - www.kdnuggets.com/2015/05/data-scientists-automated-2025.html
  • 109. Data Science Automation. "I remember when only a Deep Learning supercomputer could beat me in a Data Science competition"
  • 110. Data Science Automation. KDnuggets: Software: Automated Data Science: • AutoDiscovery from ButlerScientifics • Automatic Business Modeler from Algolytics • Automatic Statistician project • DataRobot • DMWay • ForecastThis DSX • FeatureLab • Loom Systems • machineJS: Automated machine learning • Quill from Narrative Science • SAP Predictive Analytics • Savvy from Yseop • Skytree Machine Learning Software • Tree-based Pipeline Optimization Tool (TPOT)
  • 111. Data Science Automation • New tools make Data Scientists more productive • Make data results more widely available • Automate lower-level Data Science tasks
  • 112. "Soft" Data Science Skills Harder to Automate • Curiosity • Intuition • Business Knowledge • Selecting a good metric • Posing the right question • Presentation Skills. Data Science - still a great profession
  • 113. Questions? KDnuggets: Analytics, Big Data, Data Science • Subscribe to KDnuggets News email at www.KDnuggets.com/subscribe.html • Email to editor1@kdnuggets.com • Twitter: @kdnuggets • facebook.com/kdnuggets • LinkedIn group: KDnuggets

Editor's Notes

  1. Churn: best algorithms for predicting churn have lift of 5-7, i.e. 5-7 times better than random. Behavioral advertising: 2-3% CTR, 10 times better than random
  2. Future is bright for Big Data, but use caution when evaluating claims