Data Science Isn’t a Fad
Let’s Keep It That Way


     Presentation to Research Triangle Analysts
                 February 21, 2013
                www.rtpanalysts.org
Data Science: Buyer Beware
Forbes article: Data Science: Buyer Beware “This is a management fad.”
Me: I’ve been doing this for 16 years. It isn’t a fad. You keep renaming it.
Result: Great conversation, and another Forbes article.
Obligatory Definition
 Wikipedia: Data science is a novel term that is often
 used interchangeably with competitive intelligence or
 business analytics, although it is becoming more
 common. Data science seeks to use all available and
 relevant data to effectively tell a story that can be easily
 understood by non-practitioners.
 Sexiest job of the 21st century. --Thomas H. Davenport and DJ Patil
 Pseudoscience performed by rock-star unicorns. --The Internet
Data SCIENCE
Data: emphasizes the transformation of raw
information into actionable results.
Science: emphasizes the commitment to verifiable and
repeatable process.
Data Science: The discipline of transforming raw
information into actionable results in a manner that is
verifiable and repeatable.


“Information is cheap. Meaning is expensive.”
           --George Dyson, 2011
Data Science Is....
Google’s Search Engine
Fraud Framework
Spotfire Operations Analytics
Analytics in Production
Once upon a time...	
Information was VERY expensive.
Data Science and Statistics

 The statistical methods you learn as an undergraduate
 were optimized to make efficient use of small data
 samples.
 Data is a unique resource: The more you have, the
 more valuable each individual piece becomes,
 provided you can extract meaning from the
 information.
“Big Data” = New Problems

Dynamic environment: relationships change.
Constant sampling means you will have false positives.
Large numbers of variables and data points means you
have to rely on automated tools.
Not all automated tools are created equal.
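The point about constant sampling can be illustrated with a small simulation. This is a minimal sketch, assuming a hypothetical monitoring job that runs a two-sided z-test at the 5% level over and over on pure noise; the function name, sample size, and threshold are illustrative assumptions, not from the talk.

```python
import random
import statistics


def count_false_positives(n_tests=1000, n_obs=50, z_crit=1.96, seed=0):
    """Run n_tests z-tests on pure noise and count how many look 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        # Every sample is drawn from N(0, 1), so the true mean is exactly 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        # z statistic for H0: mean = 0, with the standard deviation known to be 1.
        z = statistics.fmean(sample) * n_obs ** 0.5
        if abs(z) > z_crit:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_false_positives()
    print(f"{hits} of 1000 tests flagged an effect that is not there")
```

With a 5% threshold, about one test in twenty should fire on noise alone, so the count grows with the number of looks at the data.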
Cue Shameless Plug....
              John Sall
   Co-Founder & EVP of SAS Institute
           Director of JMP

     “From Big Data to Big Statistics”
           March 21, 6:30pm
           Louie and Charlies
        www.louieandcharlies.com
Raw Information to Actionable
Results


 The results of the analysis must answer the business
 question(s).
 The results of the analysis must provide a course of
 action.
Actionable
Click on this link.
Check this person’s file.
Stop/encourage this activity.
Look at this pattern.
Verifiable

 The assumptions from the underlying methods must be
 stated and shown to be true.
 Outlier cases must be documented and handled
 effectively.
   Different analysis, error table, excluded point.
Anscombe’s Quartet
Y = 3.0017 + 0.499X, Corr = 0.8199
Linear regression assumes a straight line
relationship and normally distributed errors.

Anscombe’s Quartet
Y = 3.0017 + 0.499X, Corr = 0.8199
This line has the same statistics as the one
before. But the relationship is not a straight line.

Anscombe’s Quartet
Y = 3.0017 + 0.499X, Corr = 0.8199
An outlier is affecting the equation.

Anscombe’s Quartet
Y = 3.0017 + 0.499X, Corr = 0.8199
One outlier drives the entire relationship.
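The four Anscombe slides can be reproduced numerically. This is a minimal sketch using Anscombe's published 1973 data and only the standard library; the helper name `fit_line` is an assumption for illustration. All four sets yield nearly the same intercept, slope, and correlation, which is why summary statistics alone cannot verify the assumptions behind a linear regression.

```python
import statistics

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (
        [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    ),
}


def fit_line(x, y):
    """Return (intercept, slope, correlation) for an ordinary least-squares line."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / (sxx * syy) ** 0.5


for name, (x, y) in QUARTET.items():
    intercept, slope, corr = fit_line(x, y)
    print(f"{name}: Y = {intercept:.4f} + {slope:.4f}X, Corr = {corr:.4f}")
```

Plotting each set, or inspecting the residuals, is what separates the well-behaved case from the curved one and the two outlier cases.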
Repeatable


When I do this again with data that meets the stated
assumptions, I should get the same answers.
Small changes in the data should NOT break the
algorithm.



          Easier said than done.
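One way to probe whether small changes in the data break a result is to drop one observation at a time and refit. This is a minimal sketch, assuming a least-squares slope stands in for "the algorithm"; the function names and the toy data (ten points at one x value plus a single distant point) are hypothetical.

```python
import statistics


def ols_slope(x, y):
    """Least-squares slope; raises ValueError when x has no variation."""
    if len(set(x)) < 2:
        raise ValueError("x has no variation; slope is undefined")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def leave_one_out_slopes(x, y):
    """Refit the slope with each observation removed in turn.

    Refits where the slope is undefined are reported as None.
    """
    slopes = []
    for i in range(len(x)):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        try:
            slopes.append(ols_slope(xs, ys))
        except ValueError:
            slopes.append(None)
    return slopes


# Hypothetical toy data: the last point is the only one away from x = 8.
x = [8.0] * 10 + [19.0]
y = [6.6, 5.8, 7.7, 8.8, 8.5, 7.0, 5.3, 5.6, 7.9, 6.9, 12.5]

for i, s in enumerate(leave_one_out_slopes(x, y)):
    print(i, "undefined" if s is None else round(s, 3))
```

Here removing the last point leaves no slope to estimate at all, which is the kind of fragility a repeatable analysis should surface before users depend on it.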
Making Results Repeatable
Automated verification of assumptions.
Good coding practices (no matter the language).
Out of sample testing.
  Do the same analysis with similar data.
Failure conditions
  Document what should happen when bad data goes
  into the algorithm.
  Run the algorithm with bad data.
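The failure-condition items can be made concrete with a guard function and a check that feeds it bad data on purpose. This is a minimal sketch, assuming a hypothetical `validate_inputs` helper and a documented policy of raising `ValueError`; the specific checks are illustrative, not from the talk.

```python
import math


def validate_inputs(x, y, min_rows=3):
    """Documented failure policy: raise ValueError rather than return a silent bad answer."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < min_rows:
        raise ValueError(f"need at least {min_rows} rows, got {len(x)}")
    for name, values in (("x", x), ("y", y)):
        for v in values:
            is_number = isinstance(v, (int, float)) and not isinstance(v, bool)
            if not is_number or not math.isfinite(v):
                raise ValueError(f"{name} contains a non-numeric or non-finite value: {v!r}")
    if len(set(x)) < 2:
        raise ValueError("x has no variation; a line cannot be fitted")


def rejects(x, y):
    """Return True if validate_inputs refuses the data."""
    try:
        validate_inputs(x, y)
    except ValueError:
        return True
    return False


# Run the check with deliberately bad data, as the slide suggests.
BAD_CASES = {
    "too few rows": ([1, 2], [1, 2]),
    "length mismatch": ([1, 2, 3], [1, 2]),
    "missing value": ([1, 2, float("nan")], [1, 2, 3]),
    "text in a numeric column": ([1, 2, "3"], [1, 2, 3]),
    "constant x": ([2, 2, 2], [1, 2, 3]),
}

for label, (x, y) in BAD_CASES.items():
    print(label, "->", "rejected" if rejects(x, y) else "ACCEPTED")
```

Keeping the bad cases in a table like this doubles as the documentation of what should happen when bad data goes in.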
This is the endpoint of the analysis.
Companies that hire data scientists use the results
to make decisions.
Repeatable: Closing the
Loop With Users
It is the data scientist’s responsibility to make sure the
results are used effectively.
Involve users at the beginning of the process.
Use iterative feedback to make sure results are:
  Actionable
  Verifiable
  Repeatable.
Why Bother?
           “Beware the Big Errors of Big Data”


  “Big Data is Falling into the
   Trough of Disillusionment”


         “If you asked me to describe the rising
          philosophy of the day, I would say it’s
                       data-sim...”
Really, Then, Why Bother?

“...the Oakland A's' front office ...fielded a team that could
compete successfully against richer competitors in Major
League Baseball (MLB).”
Because What We Do Matters
         “Refugees United...uses mobile and
        web technologies to help refugees find
              their missing loved ones.”
                    --datakind.org


      “Predictive analytics is saving lives and
        taxpayer dollars in New York City.”
    --Alex Howard, Michael Flowers interview
That’s Enough From Me
What do you think about me?


               mthielbar@gmail.com

          melindathielbar.wordpress.com
               info@rtpanalysts.org
                    THANK YOU!
All photos the property of their respective owners.

Data Science Isn't a Fad: Let's Keep it That Way

  • 1. Data Science Isn’t a Fad Let’s Keep It That Way Presentation to Research Triangle Analysts February 21, 2013 www.rtpanalysts.org
  • 2. Data Science: Buyer Beware Forbes article: Data Science: Buyer Beware “This is a management fad.” Me: I’ve been doing this for 16 years. It isn’t a fad. You keep renaming it. Result: Great conversation, and another Forbes article.
  • 3. Obligatory Definition Wikipedia: Data science is a novel term that is often used interchangeably with competitive intelligence or business analytics, although it is becoming more common. Data science seeks to use all available and relevant data to effectively tell a story that can be easily understood by non-practitioners. Sexiest job of the 21st century. --Thomas H. Davenport and DJ Patil Pseudo science performed by rock-star unicorns. -- The Internet
  • 4. Data SCIENCE Data: emphasizes the transformation of raw information into actionable results. Science: emphasizes the commitment to verifiable and repeatable process. Data Science: The discipline of transforming raw information into actionable results in a manner that is verifiable and repeatable. “Information is cheap. Meaning is expensive.” --George Dyson, 2011
  • 5. Data Science Is.... Google’s Search Engine. Fraud Framework. Spotfire Operations Analytics. Analytics in Production.
  • 6. Once upon a time... Information was VERY expensive.
  • 7. Data Science and Statistics The statistical methods you learn as an undergraduate were optimized to make efficient use of small data samples. Data is a unique resource: The more you have, the more valuable each individual piece becomes. Provided you can extract meaning from the information.
  • 8. “Big Data” = New Problems Dynamic environment: relationships change. Constant sampling means you will have false positives. Large numbers of variables and data points means you have to rely on automated tools. Not all automated tools are created equal.
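The false-positive point on slide 8 can be illustrated with a little arithmetic. The sketch below is not from the talk: it assumes each look at the data is an independent test run at a 5% significance level, so the chance of at least one false alarm after k looks is 1 - (1 - alpha)^k. The function name `prob_any_false_positive` and the 5% default are illustrative assumptions.

```python
def prob_any_false_positive(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests.

    Assumes every test is independent and uses the same threshold alpha.
    Both are simplifying assumptions made for this illustration.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests


# The more often you sample and test, the closer this gets to certainty.
for k in (1, 10, 100, 1000):
    print(f"{k:>5} tests: {prob_any_false_positive(k):.3f}")
```

Real monitoring data is rarely independent from one sample to the next, so treat the numbers as a rough guide to how quickly false alarms accumulate rather than an exact rate.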
  • 9. Cue Shameless Plug.... John Sall Co-Founder & EVP of SAS Institute Director of JMP “From Big Data to Big Statistics” March 21, 6:30pm Louie and Charlies www.louieandcharlies.com
  • 10. Raw Information to Actionable Results The results of the analysis must answer the business question(s). The results of the analysis must provide a course of action.
  • 11. Actionable Click on this link. Check this person’s file. Stop/encourage this activity. Look at this pattern.
  • 12. Verifiable The assumptions from the underlying methods must be stated and shown to be true. Outlier cases must be documented and handled effectively. Different analysis, error table, excluded point.
  • 13. Y = 3.0017 + 0.499X Corr = 0.8199 Anscombe’s Quartet Linear regression assumes a straight line relationship and normally distributed errors.
  • 14. Y = 3.0017 + 0.499X Corr = 0.8199 Anscombe’s Quartet This line has the same statistics as the one before. But the relationship is not a straight line.
  • 15. Y = 3.0017 + 0.499X Corr = 0.8199 Anscombe’s Quartet An outlier is affecting the equation.
  • 16. Y = 3.0017 + 0.499X Corr = 0.8199 Anscombe’s Quartet One outlier drives the entire relationship.
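The point of slides 13 to 16 can be checked directly. The sketch below fits a least-squares line to each of the four datasets and prints the intercept, slope, and correlation. The data values are the commonly published Anscombe (1973) figures, not taken from the slides, and the helper name `fit_line` is illustrative. With these values the four fits should come out nearly identical (intercept near 3.0, slope near 0.5, correlation near 0.82), even though plots of the four datasets look nothing alike.

```python
from math import sqrt

# Commonly published Anscombe quartet values (an assumption: the slides
# show only the plots and the summary statistics, not the raw data).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Return (intercept, slope, correlation) for an ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / sqrt(sxx * syy)
    return intercept, slope, correlation


summary = {name: fit_line(x, y) for name, (x, y) in ANSCOMBE.items()}
for name, (b0, b1, r) in summary.items():
    print(f"Dataset {name}: Y = {b0:.4f} + {b1:.4f}X, corr = {r:.4f}")
```

Matching summary statistics say nothing about whether the straight-line assumption holds, which is why the slides insist on stating and checking assumptions rather than trusting the fitted equation alone.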
  • 17. Repeatable When I do this again with data that meets the stated assumptions, I should get the same answers. Small changes in the data should NOT break the algorithm. Easier said than done.
  • 18. Making Results Repeatable Automated verification of assumptions. Good coding practices (no matter the language). Out-of-sample testing: do the same analysis with similar data. Failure conditions: document what should happen when bad data goes into the algorithm. Run the algorithm with bad data.
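One way to read slide 18 in code is sketched below. It is a minimal illustration, not the speaker's method: the names `BadDataError`, `check_assumptions`, `fit_line_checked`, and `run_failure_cases` are hypothetical, and the specific checks (matching lengths, a minimum number of points, finite values, some variation in x) are assumptions chosen for a simple line fit. The idea is that assumptions are verified automatically before the analysis runs, and that the documented failure behavior is exercised by deliberately feeding in bad data.

```python
import math


class BadDataError(ValueError):
    """Raised when the input violates a documented assumption."""


def check_assumptions(x, y, min_points=3):
    """Verify the stated input assumptions before any fitting happens."""
    if len(x) != len(y):
        raise BadDataError("x and y must have the same length")
    if len(x) < min_points:
        raise BadDataError(f"need at least {min_points} points")
    if any(not math.isfinite(v) for v in list(x) + list(y)):
        raise BadDataError("missing or non-finite value in the data")
    if max(x) == min(x):
        raise BadDataError("x has no variation, so the slope is undefined")


def fit_line_checked(x, y):
    """Least-squares (intercept, slope), refusing data that fails the checks."""
    check_assumptions(x, y)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def run_failure_cases():
    """Run the algorithm with bad data and record what actually happens."""
    bad_inputs = {
        "length mismatch": ([1, 2, 3], [1, 2]),
        "too few points": ([1, 2], [1, 2]),
        "missing value": ([1, 2, 3], [1.0, float("nan"), 3.0]),
        "constant x": ([5, 5, 5], [1, 2, 3]),
    }
    outcomes = {}
    for label, (x, y) in bad_inputs.items():
        try:
            fit_line_checked(x, y)
            outcomes[label] = "accepted"
        except BadDataError as err:
            outcomes[label] = f"rejected: {err}"
    return outcomes


for label, outcome in run_failure_cases().items():
    print(f"{label}: {outcome}")
```

Raising a dedicated exception type keeps the failure explicit: callers can tell "the data broke an assumption" apart from an ordinary bug, and the failure cases double as a regression test whenever the checks change.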
  • 19. This is the endpoint of the analysis. Companies who hire data scientists use the results to make decisions.
  • 20. Repeatable: Closing the Loop With Users It is the data scientist’s responsibility to make sure the results are used effectively. Involve users at the beginning of the process. Use iterative feedback to make sure results are: Actionable, Verifiable, Repeatable.
  • 21. Why Bother? “Beware the Big Errors of Big Data” “Big Data is Falling into the Trough of Disillusionment” “If you asked me to describe the rising philosophy of the day, I would say it’s data-ism...”
  • 22. Really, Then, Why Bother? “...the Oakland A's' front office ...fielded a team that could compete successfully against richer competitors in Major League Baseball (MLB).”
  • 23. Because What We Do Matters “Refugees United...uses mobile and web technologies to help refugees find their missing loved ones.” --datakind.org “Predictive analytics is saving lives and taxpayer dollars in New York City.” --Alex Howard, Michael Flowers interview
  • 24. That’s Enough From Me What do you think about me? mthielbar@gmail.com melindathielbar.wordpress.com info@rtpanalysts.org THANK YOU! All photos the property of their respective owners.