Laws and Limits of Data Science: 
The Next Decade 
Michael L. Brodie
Big Data is 
Opening the door to …
Grand Opportunities: 
Accelerating Scientific Discovery …
Grand Challenges: 
Many – efficacy, efficiency, …
What is Big Data? 
• Defining Big Data constrains this emerging phenomenon
• Since Big Data is not
— About data, but a problem-solving ecosystem
— A discipline, but a multidisciplinary sub-domain of most disciplines*
• What matters is what we will do with Big Data 
• Big Data is opening the door to profound change in 
— Processing 
— Thinking 
• Let’s use the potential of profound change to understand Big Data 
* “transformative … changing academia (… emerged … on the critical path for their sub-discipline)” and “is changing society” Michael Jordan.
Starting to Understand Big Data 
• Listen to Data 
— Hypothesis generation → overcome limits of human cognition*
• Multiple, Simultaneous Perspectives
— Ensemble models → Accelerating Scientific Discovery*
• And many more …
* Necessary condition: human guidance
Big Data is in its infancy 
With at least decade-long challenges
Outline 
• Big Picture: Why and What 
• Grand Opportunities 
• Grand Challenges 
— Efficacy, amongst many 
• Laws and Limits of Data Science
Big Picture
Scientific Method
[Diagram labels: Phenomenon, Hypothesis, Experiment, Model, Causality]
Big Picture: Why & What
[Diagram labels: Phenomenon; Experiment; Model; What (Big Data): Correlation, what might occur; Why (Empiricism): Causation, why it occurs]
Why: Scientific Method and the Search for Causation 
History of Science and the Scientific Method 
Mature Disciplines: Empiricism, Clinical Studies, Drug Discovery 
The Holy Grail of science is to identify accurate causality. 
Empirical, clinical trial, and drug discovery methods take time: 100+ years
Three Ages of Medicine [The Remedy: Goetz] 
Free-for-All: 1850s–1940s 
Rise of Trials: 1940s–2010s 
Beyond the Lab: Post-2010
What: Models and the Search for Meaningful 
Correlations 
• History of Modelling: mathematics, sciences, computing, … 
• Disciplines
— Mature (theory-driven): math, physics, statistics, …
— Emerging (data-driven): data mining, machine learning, neural networks, support vector machines, …
The Holy Grail of data-intensive discovery is correlations that are accurate, reliable, and meaningful.
Correlation does not imply causation
• Methodologies
— Mature: 100s of years
— Emerging: at least a decade
Big Data 
GRAND OPPORTUNITIES
Accelerating Scientific Discovery
[Diagram labels: Theory Driven (Why: Causation) and Data Driven (What: Correlation), connected through Hypotheses, Experiment, Model, and Correlations]
Accelerating Scientific Discovery
[Diagram labels: Theory Driven (Why: Causation) and Data Driven (What: Correlation), connected through Hypotheses, Experiment, Model, and Correlations; annotated with Baylor Watson, Scientists, and Wonderful Use Case]
Grand Challenges 
• Big Data is in its infancy: 10+ year evolution
— Efficiency: expression/language → execution (stack)
— Open Data: data use/reuse/sharing
— Efficacy
“major engineering and mathematical challenge, one 
that will not be solved by just gluing together a few 
existing ideas from statistics, optimization, databases 
and computer systems.” Michael Jordan
“With respect to Big Data, we’re now at the ‘what are the principles?’ point in time.” Michael Jordan
What is Data Science @ Scale? 
Data Science @ scale is to data-intensive discovery as 
The Scientific Method is to scientific discovery 
Reframe Empiricism*
— Data Science is the data component of the Scientific Method for data
— Concepts, tools, and techniques for data-intensive discovery
• Data-intensive discovery = virtual experiment
— Laws and Limits of Data Science
* With Dr. Jennie Duggan, MIT & Northwestern University
First Law of Data Science 
Meaning of a correlation requires empirical verification 
What is seldom enough 
Why is not always necessary 
Best Practice #1: Efficacy-driven data discovery 
(Efficacy before efficiency)
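A small simulation can illustrate why the First Law asks for empirical verification. The sketch below is not from the talk; the function names, sample sizes, and seed are all assumptions chosen for illustration. It scans many pure-noise features for the strongest correlation with a noise target, then re-measures a noise pair on fresh data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def noise(rng, n):
    """Return n independent standard-normal draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def discover_then_verify(n_features=200, n_discovery=30, n_verify=2000, seed=1):
    """Hypothetical experiment: every feature is unrelated to the target."""
    rng = random.Random(seed)
    target = noise(rng, n_discovery)
    # Discovery: keep the strongest correlation found among many noise features.
    discovery_r = max(
        (pearson(noise(rng, n_discovery), target) for _ in range(n_features)),
        key=abs,
    )
    # Verification: the "winning" feature is noise by construction, so fresh
    # observations of it and of the target are simply new independent draws.
    verification_r = pearson(noise(rng, n_verify), noise(rng, n_verify))
    return discovery_r, verification_r


discovery_r, verification_r = discover_then_verify()
print(f"discovery r = {discovery_r:+.2f}, verification r = {verification_r:+.2f}")
```

Under these assumptions the discovery-stage correlation looks substantial even though nothing is related to anything, while the verification-stage correlation stays close to zero. The "what" found by search alone is seldom enough.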
Second Law of Data Science* 
Causality can be determined from correlations only by 
community accepted mechanisms and metrics**, e.g., 
empiricism. 
* With Gregory Piatetsky-Shapiro, KDNuggets 
** for What and Why
Limits of Data Science 
We do not know where our concepts, tools, and 
techniques break on massive data sets! 
Caution: Big Data Winter Potential (Michael Jordan) 
Best Practice #2: Experiment + Error bars everywhere
— Common Practice: not so much
Best Practice #3: Machine-driven, human-guided
— Common Practice: not so much
Best Practice Not So Common* 
• BP1: Efficacy-driven data discovery
— Best eScience, Journalism, Economics, Computational X, …
— Big Data not so much (<5%)
• BP2: Experiment + Error bars everywhere
— Above + Best Data Scientists (~5%, w/ scientific, ML, … training)
— Big Data (<5%): Customers don’t ask; data scientists don’t practice
• BP3: Machine-driven, human-guided
— ~5% strict; 95% not so much, e.g., ~60 Data Curation products
— 50% partial: supervised / trained
• Example: based on the above Laws and Best Practices 
*Personal un-scientific study, limited data, yet so unbiased and oh so true
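As one possible reading of "Experiment + Error bars everywhere", a reported correlation can carry an interval rather than a single number. The sketch below is illustrative and not from the talk: it uses a percentile bootstrap (one of several ways to get error bars), and the synthetic data, `bootstrap_ci` helper, and parameter values are all assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def bootstrap_ci(xs, ys, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the correlation of paired data."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_resamples):
        # Resample (x, y) pairs with replacement, keeping each pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
    stats.sort()
    lo_idx = int(n_resamples * alpha / 2)
    hi_idx = n_resamples - 1 - lo_idx
    return stats[lo_idx], stats[hi_idx]


# Synthetic paired data (assumed for illustration): y depends weakly on x.
data_rng = random.Random(42)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
ys = [0.5 * x + data_rng.gauss(0.0, 1.0) for x in xs]

point_r = pearson(xs, ys)
lo, hi = bootstrap_ci(xs, ys)
print(f"r = {point_r:.2f}, 95% bootstrap interval = [{lo:.2f}, {hi:.2f}]")
```

Reporting the interval alongside the point estimate makes it visible how much of a finding could be sampling noise.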
Laws of Data Science Less So … 
1st: Correlation ≠ causation
Common confusion in science*, more in Data Science, even more in business
2nd: Causality (meaning) requires verification by community-accepted norms
Cornerstone of Science, hopefully emerging in Data Science**
*Richard Feynman, 1974 
** If #1 is rare, #2 is more so
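The first law can be made concrete with a toy confounder. The sketch below is not from the talk; the hidden variable `z`, the noise levels, and the sample size are assumptions chosen for illustration. Two quantities that never influence each other still correlate strongly because both depend on a common cause, and the correlation vanishes once that cause is accounted for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=5000, seed=0):
    """Return (raw, adjusted) correlations for a hypothetical confounded pair."""
    rng = random.Random(seed)
    # z is a hidden common cause; x and y each depend on z but not on each other.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [zi + rng.gauss(0.0, 0.5) for zi in z]
    y = [zi + rng.gauss(0.0, 0.5) for zi in z]
    # Subtracting the known contribution of z leaves only independent noise.
    rx = [xi - zi for xi, zi in zip(x, z)]
    ry = [yi - zi for yi, zi in zip(y, z)]
    return pearson(x, y), pearson(rx, ry)


raw_r, adjusted_r = simulate()
print(f"raw r = {raw_r:.2f}, r after removing the common cause = {adjusted_r:.2f}")
```

In practice the common cause is rarely known in advance, which is where the second law's community-accepted verification comes in.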
Conclusions 
• Big Data is in its infancy and is opening the door to … 
• Grand Opportunities 
• Grand Challenges 
• 10+ year evolution 
• Data Science ~= Scientific Method For Data 
• Laws of Data Science 
1 Correlations must be verified 
2 Verification relative to community-accepted norms 
• Data Science Best Practices 
1 Efficacy-driven discovery 
2 Experiment + Error Bars everywhere 
3 Machine-Driven – Human Guided 
• Limit of Data Science: we do not know where our tools break

Laws and Limits of Data Science

  • 1. Laws and Limits of Data Science: The Next Decade Michael L. Brodie
  • 2. 2 Big Data is Opening the door to …
  • 3. 3 Grand Opportunities: Accelerating Scientific Discovery …
  • 4. 4 Grand Challenges: Many – efficacy, efficiency, …
  • 5. What is Big Data? • Defining Big Data constrains this emerging phenomena • Since Big Data is not — About data, but a problem solving ecosystem — A discipline, but a multidisciplinary sub-domain of most disciplines* • What matters is what we will do with Big Data • Big Data is opening the door to profound change in — Processing — Thinking • Let’s use the potential of profound change to understand Big Data 5 * “transforma,ve … changing academia (… emerged .. on the cri,cal path for their sub-­‐discipline)” and is changing society” Michael Jordan.
  • 6. Starting to Understand Big Data • Listen to Data — Hypothesis generation → overcome limits of human cognition* • Multiple, Simultaneous Perspectives — Ensemble models → Accelerating Scientific Discovery* • And many more … * Necessary condition: human guidance
  • 7. Big Data is in its infancy, with at least decade-long challenges
  • 8. Outline • Big Picture: Why and What • Grand Opportunities • Grand Challenges — Efficacy, amongst many • Laws and Limits of Data Science
  • 9. Big Picture Scientific Method Hypothesis Phenomenon Causality Experiment Model
  • 10. Big Picture: Why & What Experiment Model What (Big Data) Why (Empiricism) Correlation: What might occur Causation: Why it occurs Phenomenon
  • 11. Why: Scientific Method and the Search for Causation. History of Science and the Scientific Method. Mature Disciplines: Empiricism, Clinical Studies, Drug Discovery. The Holy Grail of science is to identify accurate causality. Empirical, clinical trial, and drug discovery methods take time: 100+ years. Three Ages of Medicine [The Remedy: Goetz]: Free-for-All (1850s–1940s), Rise of Trials (1940s–2010s), Beyond the Lab (post-2010).
  • 12. What: Models and the Search for Meaningful Correlations • History of Modelling: mathematics, sciences, computing, … • Disciplines — Mature (theory-driven): math, physics, statistics, … — Emerging (data-driven): data mining, machine learning, neural networks, support vector machines, … • The Holy Grail of data-intensive discovery is correlations that are accurate, reliable, and meaningful. • Correlation does not imply causation • Methodologies — Mature: 100s of years — Emerging: at least a decade
  • 13. Big Data GRAND OPPORTUNITIES
  • 14. Accelerating Scientific Discovery Hypotheses Experiment Model Correlations Why: Causation What: Correlation Theory Driven Data Driven
  • 15. Accelerating Scientific Discovery Hypotheses Experiment Model Correlations Why: Causation Theory Driven Data Driven Baylor Watson Scientists What: Correlation Wonderful Use Case
  • 16. Grand Challenges • Big Data is in its infancy: 10+ year evolution — Efficiency: expression/language → execution (stack) — Open Data: data use/reuse/sharing — Efficacy • “major engineering and mathematical challenge, one that will not be solved by just gluing together a few existing ideas from statistics, optimization, databases and computer systems.” Michael Jordan
  • 17. “With respect to Big Data, we’re now at the ‘what are the principles?’ point in time.” Michael Jordan
  • 18. What is Data Science @ Scale? • Data Science @ Scale is to data-intensive discovery as the Scientific Method is to scientific discovery • Reframe Empiricism* — Data Science is the data component of the Scientific Method for data — Concepts, tools, and techniques for data-intensive discovery • Data-intensive discovery = virtual experiment — Laws and Limits of Data Science * With Dr. Jennie Duggan, MIT & Northwestern University
  • 19. First Law of Data Science: The meaning of a correlation requires empirical verification. “What” is seldom enough; “Why” is not always necessary. Best Practice #1: Efficacy-driven data discovery (efficacy before efficiency)
  • 20. Second Law of Data Science*: Causality can be determined from correlations only by community-accepted mechanisms and metrics**, e.g., empiricism. * With Gregory Piatetsky-Shapiro, KDnuggets ** for What and Why
  • 21. Limits of Data Science: We do not know where our concepts, tools, and techniques break on massive data sets! Caution: Big Data Winter Potential (Michael Jordan) • Best Practice #2: Experiment + Error bars everywhere — Common Practice: not so much • Best Practice #3: Machine-driven, human-guided — Common Practice: not so much
  • 22. Best Practice Not So Common* • BP1: Efficacy-driven data discovery — Best eScience, Journalism, Economics, Computational X, … — Big Data not so much (<5%) • BP2: Experiment + Error bars everywhere — Above + Best Data Scientists (~5%, w/ scientific, ML, … training) — Big Data (<5%): Customers don’t ask; data scientists don’t practice • BP3: Machine-driven, human-guided — ~5% strict; 95% not so much, e.g., ~60 Data Curation products — 50% partial: supervised / trained • Example: based on the above Laws and Best Practices * Personal un-scientific study, limited data, yet so unbiased and oh so true
  • 23. Laws of Data Science Less So … • 1st: Correlation ≠ Causation. Common confusion in science*, more in Data Science, even more in business • 2nd: Causality (meaning) requires verification by community-accepted norms. Cornerstone of Science, hopefully emerging in Data Science** * Richard Feynman, 1974 ** If #1 is rare, #2 is more so
  • 24. Conclusions • Big Data is in its infancy and is opening the door to … • Grand Opportunities • Grand Challenges • 10+ year evolution • Data Science ~= Scientific Method for Data • Laws of Data Science: (1) Correlations must be verified (2) Verification relative to community-accepted norms • Data Science Best Practices: (1) Efficacy-driven discovery (2) Experiment + Error bars everywhere (3) Machine-driven, human-guided • Limit of Data Science: we do not know where our tools break
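
One way to read Best Practice #2 ("Experiment + Error bars everywhere") together with the First Law is sketched below. This is not from the slides: it is a minimal illustration, assuming synthetic data in which a hidden variable `z` drives both `x` and `y`, and using a percentile bootstrap (a technique chosen here, not named in the talk) to put an interval around the measured correlation. The function names `pearson` and `bootstrap_ci` are hypothetical helpers written for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # degenerate sample: correlation is undefined, report 0
    return cov / math.sqrt(var_x * var_y)


def bootstrap_ci(xs, ys, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the Pearson correlation."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_resamples):
        # Resample (x, y) pairs with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
    estimates.sort()
    low = estimates[int((alpha / 2) * n_resamples)]
    high = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Synthetic data (an assumption for illustration): z is a hidden common
# cause of x and y; x and y never influence each other.
data_rng = random.Random(42)
z = [data_rng.gauss(0, 1) for _ in range(300)]
x = [zi + data_rng.gauss(0, 1) for zi in z]
y = [zi + data_rng.gauss(0, 1) for zi in z]

r = pearson(x, y)
ci_low, ci_high = bootstrap_ci(x, y)
print(f"r = {r:.3f}, 95% bootstrap interval = ({ci_low:.3f}, {ci_high:.3f})")
```

The interval says how stable the "What" is under resampling; it says nothing about the "Why". Here `x` and `y` correlate only because of `z`, so even a tight interval that excludes zero would not justify a causal claim without the kind of empirical verification the Laws call for.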