Data Science Isn't a Fad: Let's Keep it That Way

First presented at the February 2013 Research Triangle Analysts meeting, this presentation discusses the technical side of making data science a field that's here to last. This presentation focuses on the "science" aspect of data science and how it drives value to an organization.

Published in: Technology

  1. Data Science Isn’t a Fad: Let’s Keep It That Way. Presentation to Research Triangle Analysts, February 21, 2013. www.rtpanalysts.org
  2. Data Science: Buyer Beware. Forbes article, “Data Science: Buyer Beware”: “This is a management fad.” Me: I’ve been doing this for 16 years. It isn’t a fad. You keep renaming it. Result: a great conversation, and another Forbes article.
  3. Obligatory Definition. Wikipedia: “Data science is a novel term that is often used interchangeably with competitive intelligence or business analytics, although it is becoming more common. Data science seeks to use all available and relevant data to effectively tell a story that can be easily understood by non-practitioners.” “Sexiest job of the 21st century.” -- Thomas H. Davenport and DJ Patil. “Pseudo science performed by rock-star unicorns.” -- The Internet
  4. Data SCIENCE. Data: emphasizes the transformation of raw information into actionable results. Science: emphasizes the commitment to a verifiable and repeatable process. Data Science: the discipline of transforming raw information into actionable results in a manner that is verifiable and repeatable. “Information is cheap. Meaning is expensive.” -- George Dyson, 2011
  5. Data Science Is.... Google’s Search Engine Fraud Framework Spotfire Operations Analytics in Production Analytics
  6. Once upon a time... Information was VERY expensive.
  7. Data Science and Statistics. The statistical methods you learn as an undergraduate were optimized to make efficient use of small data samples. Data is a unique resource: the more you have, the more valuable each individual piece becomes, provided you can extract meaning from the information.
  8. “Big Data” = New Problems. Dynamic environment: relationships change. Constant sampling means you will have false positives. Large numbers of variables and data points mean you have to rely on automated tools. Not all automated tools are created equal.
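The false-positive point on slide 8 can be made concrete with a little arithmetic. This is a minimal sketch, assuming the tests are independent and all use the same significance level; neither assumption comes from the slides, and the names `alpha` and `n_tests` are illustrative.

```python
def prob_any_false_positive(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent
    tests when every null hypothesis is actually true."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    # Each test avoids a false positive with probability (1 - alpha);
    # all n_tests avoid one with probability (1 - alpha) ** n_tests.
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n in (1, 20, 100):
        print(n, round(prob_any_false_positive(0.05, n), 3))
```

Under these assumptions, twenty checks at the 5% level already carry roughly a 64% chance of at least one spurious hit, which is why constant sampling needs some form of correction or follow-up verification.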
  9. Cue Shameless Plug.... John Sall, Co-Founder & EVP of SAS Institute, Director of JMP: “From Big Data to Big Statistics.” March 21, 6:30pm, Louie and Charlies. www.louieandcharlies.com
  10. Raw Information to Actionable Results. The results of the analysis must answer the business question(s). The results of the analysis must provide a course of action.
  11. Actionable. Click on this link. Check this person’s file. Stop/encourage this activity. Look at this pattern.
  12. Verifiable. The assumptions from the underlying methods must be stated and shown to be true. Outlier cases must be documented and handled effectively. Different analysis, error table, excluded point.
  13. Anscombe’s Quartet. Y = 3.0017 + 0.499X, Corr = 0.8199. Linear regression assumes a straight-line relationship and normally distributed errors.
  14. Anscombe’s Quartet. Y = 3.0017 + 0.499X, Corr = 0.8199. This line has the same statistics as the one before, but the relationship is not a straight line.
  15. Anscombe’s Quartet. Y = 3.0017 + 0.499X, Corr = 0.8199. An outlier is affecting the equation.
  16. Anscombe’s Quartet. Y = 3.0017 + 0.499X, Corr = 0.8199. One outlier drives the entire relationship.
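The four Anscombe slides can be reproduced in a few lines. This sketch fits the same least-squares line to each of Anscombe's published data sets and then refits with each point left out, one simple way (my choice, not necessarily the presenter's) to expose an influential outlier. The helper names `fit_line` and `leave_one_out_slopes` are illustrative.

```python
from math import sqrt

# Anscombe's published values; sets I-III share the same x column.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                     7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                      6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                       6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(xs, ys):
    """Least squares for y = a + b*x. Returns (a, b, r), or None when
    x or y has no spread and the fit is undefined."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None
    b = sxy / sxx
    return my - b * mx, b, sxy / sqrt(sxx * syy)


def leave_one_out_slopes(xs, ys):
    """Slope after dropping each point in turn; None marks an undefined refit."""
    slopes = []
    for i in range(len(xs)):
        fit = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        slopes.append(None if fit is None else fit[1])
    return slopes


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        a, b, r = fit_line(xs, ys)
        slopes = leave_one_out_slopes(xs, ys)
        defined = [s for s in slopes if s is not None]
        print(f"{name}: a={a:.2f} b={b:.2f} r={r:.2f} "
              f"leave-one-out slope range=({min(defined):.2f}, {max(defined):.2f}) "
              f"undefined refits={slopes.count(None)}")
```

All four sets give nearly the same intercept, slope, and correlation, yet the leave-one-out refits differ sharply. In set III, dropping the outlying point pulls the slope down to roughly 0.35, and in set IV, dropping the single point at x = 19 leaves no variation in x, so no slope can be estimated at all.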
  17. Repeatable. When I do this again with data that meets the stated assumptions, I should get the same answers. Small changes in the data should NOT break the algorithm. Easier said than done.
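One way to test the claim that small changes in the data should not break the algorithm is to jitter the inputs and measure how far the answer moves. The sketch below does this for a least-squares slope. The noise size, trial count, and function names are assumptions chosen for illustration, and what counts as an acceptable shift is a judgment the analyst has to state.

```python
import random


def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def max_slope_shift(xs, ys, noise=0.01, trials=200, seed=0):
    """Largest change in slope seen when each y is jittered by up to +/- noise.

    A fixed seed makes the check itself repeatable from run to run.
    """
    rng = random.Random(seed)
    base = slope(xs, ys)
    worst = 0.0
    for _ in range(trials):
        jittered = [y + rng.uniform(-noise, noise) for y in ys]
        worst = max(worst, abs(slope(xs, jittered) - base))
    return worst


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 8.1, 9.8]
    print("baseline slope:", slope(xs, ys))
    print("worst shift under jitter:", max_slope_shift(xs, ys))
```

Seeding the random generator is a deliberate choice here: a stability check that gives different verdicts on different runs would itself fail the repeatability standard.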
  18. Making Results Repeatable. Automated verification of assumptions. Good coding practices (no matter the language). Out-of-sample testing: do the same analysis with similar data. Failure conditions: document what should happen when bad data goes into the algorithm, then run the algorithm with bad data.
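The "failure conditions" item lends itself to a short sketch: state the preconditions in code, raise a documented error when they fail, and then deliberately feed the routine bad data to confirm it fails the way the documentation says. Everything named here (`BadInputError`, `checked_slope`, the specific bad cases) is hypothetical and only illustrates the pattern.

```python
import math


class BadInputError(ValueError):
    """Raised when the data violate a documented precondition."""


def checked_slope(xs, ys, min_points=3):
    """Least-squares slope of y on x.

    Documented failure conditions, each raising BadInputError:
      - xs and ys differ in length
      - fewer than min_points observations
      - any value is NaN or infinite
      - x has no spread, so the slope is undefined
    """
    if len(xs) != len(ys):
        raise BadInputError("x and y differ in length")
    if len(xs) < min_points:
        raise BadInputError(f"need at least {min_points} points")
    if any(not math.isfinite(v) for v in list(xs) + list(ys)):
        raise BadInputError("non-finite value in input")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise BadInputError("x has no spread; slope undefined")
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


# Bad data on purpose: one case per documented failure condition.
BAD_CASES = {
    "length mismatch": ([1, 2, 3], [1, 2]),
    "too few points": ([1, 2], [1, 2]),
    "NaN in y": ([1, 2, 3], [1.0, float("nan"), 3.0]),
    "constant x": ([5, 5, 5], [1, 2, 3]),
}


def run_failure_cases():
    """Return {case name: True if the routine rejected the bad data}."""
    outcome = {}
    for name, (xs, ys) in BAD_CASES.items():
        try:
            checked_slope(xs, ys)
            outcome[name] = False
        except BadInputError:
            outcome[name] = True
    return outcome


if __name__ == "__main__":
    print(checked_slope([1, 2, 3], [2, 4, 6]))
    print(run_failure_cases())
```

Keeping the bad cases in a table next to the docstring makes it easy to see whether every documented failure condition actually has a test that exercises it.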
  19. This is the endpoint of the analysis. Companies that hire data scientists use the results to make decisions.
  20. Repeatable: Closing the Loop With Users. It is the data scientist’s responsibility to make sure the results are used effectively. Involve users at the beginning of the process. Use iterative feedback to make sure results are actionable, verifiable, and repeatable.
  21. Why Bother? “Beware the Big Errors of Big Data.” “Big Data is Falling into the Trough of Disillusionment.” “If you asked me to describe the rising philosophy of the day, I would say it’s data-ism...”
  22. Really, Then, Why Bother? “...the Oakland A’s front office ...fielded a team that could compete successfully against richer competitors in Major League Baseball (MLB).”
  23. Because What We Do Matters. “Refugees United...uses mobile and web technologies to help refugees find their missing loved ones.” -- datakind.org. “Predictive analytics is saving lives and taxpayer dollars in New York City.” -- Alex Howard, Michael Flowers interview
  24. That’s Enough From Me. What do you think about me? mthielbar@gmail.com melindathielbar.wordpress.com info@rtpanalysts.org THANK YOU! All photos the property of their respective owners.
