Image credits: Hilary Mason, http://www.hilarymason.com/media_and_press/im-in-glamour-magazine/; Ivan Fellegi, Chief Statistician of Canada and SSC President for 1981, http://www.flickr.com/photos/ssc_liaison/431047111/
Churn: the best algorithms for predicting churn have a lift of 5-7, i.e. they are 5-7 times better than random. Behavioral advertising: a 2-3% click-through rate (CTR), 10 times better than random.
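Lift is the ratio behind "5-7 times better than random": the share of true churners among the customers a model flags, divided by the churn rate of the whole customer base. The minimal Python sketch below assumes that standard definition; the function name and all numbers are hypothetical, chosen only to land inside the 5-7 range quoted above.

```python
def lift(hits_in_selection: int, selection_size: int,
         total_hits: int, population_size: int) -> float:
    """Lift = hit rate in the model-selected group divided by the overall base rate.

    Written as one ratio of products, which equals
    (hits_in_selection / selection_size) / (total_hits / population_size).
    """
    if min(selection_size, total_hits, population_size) <= 0:
        raise ValueError("selection_size, total_hits and population_size must be positive")
    return (hits_in_selection * population_size) / (selection_size * total_hits)


# Hypothetical numbers: 100,000 customers, 2,000 of whom churn (2% base rate).
# The model flags 5,000 customers as most likely to churn; 600 of them do (12%).
# A random pick of 5,000 would catch about 100 churners on average.
print(lift(600, 5_000, 2_000, 100_000))  # 6.0, i.e. 6 times better than random
```

Read the same way, "10 times better than random" for a 2-3% CTR would imply that untargeted ads get roughly 0.2-0.3%, assuming the claim is meant as a lift of 10.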
1. Data Scientist: Your Must-Have Business Investment NOW
© Kalido | Kalido Confidential | May 14, 2013
2. Today's Speakers #DataScienceNow
• Gregory Piatetsky, Editor, KDnuggets; co-founder of KDD and ACM SIGKDD
• David Smith, Data Scientist, Revolution Analytics
• Carla Gentry, Data Scientist, Analytical Solution
• Darren Peirce, CTO, Kalido
• Eric Kavanagh, DM Radio Host, Information Management Magazine's DM Radio
3. Revolution Confidential
[Image: © Dov Harrington, CC BY 2.0, http://www.flickr.com/photos/idovermani/4110546683/]
4. Statistician vs. Data Scientist

| | Statistician | Data Scientist |
|---|---|---|
| Image | Baseball (Cricket) | HBR: "Sexiest Job of the 21st Century" |
| Mode | Reactive | Consultative |
| Works | Solo | In a team |
| Inputs | Data file, hypothesis | A business problem |
| Data | Pre-prepared, clean | Distributed, messy, unstructured |
| Data size | Kilobytes | Gigabytes |
| Tools | SAS, mainframe | R, Python, awk, Hadoop, Linux, … |
| Nouns | Tables | Data visualizations |
| Focus | Inference (why) | Prediction (what) |
| Output | Report | Data app / data product |
| Latency | Weeks | Seconds |
| Stars | G.E.P. Box, Trevor Hastie | Hilary Mason, Nate Silver |

Source: http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
6. Three Essential Skills of Data Scientists
[Figure: data science Venn diagram, after Drew Conway, http://www.dataists.com/2010/09/the-data-science-venn-diagram/; labels include Data Integration, Mashups, Applications, Models, Visualization, Predictions, Uncertainty, Problems, Data Sources, Credibility, Effective Data Applications]
7. Data Science to the Rescue!
8. Business Intelligence and Data Science

| | Business Intelligence | Data Science |
|---|---|---|
| Perspective | Looking backwards | Looking forwards |
| Actions | Slice and dice | Interact |
| Expertise | Business user | Data scientist |
| Data | Warehoused, siloed | Distributed, real-time |
| Scope | Unlimited | Specific business question |
| Questions | What happened? | What will happen? What if? |
| Output | Table | Answer |
| Applicability | Historic, possible confounding factors | Future, correcting for influences |
| Tools | SAP, Cognos, MicroStrategy, SAS | Revolution R Enterprise; QlikView, Tableau, Jaspersoft |
| Hot or not? | So 1997 | Transformational |
9. What is Data Science?
By Carla Gentry, Data Scientist, Analytical-Solution
10. Data Science is…
• The term "data science" has existed for over thirty years; it was first mentioned by Peter Naur in 1960, but only recently has it gained a lot of attention!
11. Data Science can be broken down into 4 main areas of expertise:
• Data knowledge: design and structure
• Programming: SAS, R, SQL, NoSQL
• Analytics: insight
• Communication: tell the story
12. Data Knowledge: Part analyst, part IT
• What kind of servers do you own?
  - Servers vs. mainframe
• What kind of load can the server handle?
  - Iterations matter. Why ask this?
13. Programming: Pick a language and use it wisely
• Efficiency is KING!
  - Why?
• Number of iterations and complex algorithms or scripts. Snowflake vs. star schema?
  - Design is important, but why?
• Key things: normalize and index; there is more to Data Science than just analytics.
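The slide names schema design and indexing without showing either. Below is a minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical star schema: one fact table (`sales`) whose foreign key points at one dimension table (`customer`). A snowflake schema would normalize the dimension further, e.g. by moving `region` into its own table. All table, column, and index names are illustrative, not from the presentation.

```python
import sqlite3

# Hypothetical star schema: a sales fact table keyed to a customer dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    region      TEXT NOT NULL
);
CREATE TABLE sales (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
-- Index the foreign key that queries filter and join on.
CREATE INDEX idx_sales_customer ON sales(customer_id);
""")

conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "East"), (2, "West")])
conn.executemany("INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
                 [(1, 10.0), (1, 15.0), (2, 7.5)])

# Total sales for one customer; the WHERE clause can be answered via the index.
query = "SELECT SUM(amount) FROM sales WHERE customer_id = ?"
total = conn.execute(query, (1,)).fetchone()[0]

# EXPLAIN QUERY PLAN shows whether SQLite uses the index; the wording of the
# detail column (last column of each row) varies between SQLite versions.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()
uses_index = any("idx_sales_customer" in row[-1] for row in plan)

# A typical star-schema rollup: join fact to dimension, aggregate by an attribute.
by_region = conn.execute("""
    SELECT c.region, SUM(s.amount)
    FROM sales AS s JOIN customer AS c USING (customer_id)
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

print(total, uses_index, by_region)
```

On a three-row table the index makes no measurable difference; the point is the design habit, which pays off as the number of rows and iterations grows.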
14. How can I learn about Data Science?
• For those who want to invest their time and talent, there are resources:
  - College courses
  - Online
  - Webinars
  - Blogs
15. Data Science and Data Scientists Now
Gregory Piatetsky, @kdnuggets
Analytics, Big Data, Data Mining, and Data Science Resources
© KDnuggets 2013
16. Same Core Idea: Finding Useful Patterns in Data, Different Emphasis
• Statistics, since 1830
• Data mining, since 1980
• Knowledge Discovery in Data (KDD), since 1989
• Business Analytics, since 1997
• Predictive Analytics, since 2002
• Data Analytics, since 2011
• Data Science, since 2011
• …?
Trends from Google Ngrams (1800-2008) and Google Trends (2005-2013)
17. Big Data > Data Mining > Business Analytics > Predictive Analytics > Data Science
[Chart: Google Trends search interest, Jan 2008 to Apr 2013, worldwide; series include Big Data and Data mining]
18. Data Scientist: the sexiest job of the 21st century (???), say Thomas H. Davenport and D.J. Patil (HBR, Oct 2012)
"Data Scientist" is the fastest-growing term on www.kdnuggets.com/jobs:
• 1% of jobs in 2010
• 4% of jobs in 2011
• 19% of jobs in 2012
• 23% of jobs in 2013
19. [Chart: job trends for Data Mining, Big Data, Data Scientist, and Statistician]
• "Data mining" jobs are more common, but "Big Data" jobs are surging much faster than "Data Scientist" jobs.
• "Statistician" jobs are steady, but not growing.
20. • Big Data can produce better predictions, but expect limited improvement.
• Example: the Netflix Prize took 3 years to reduce the prediction error for movie ratings from 0.95 stars to 0.86.
• There is inherent randomness in human behavior.
• Data Science should help separate hype from reality.
• The biggest effects of Big Data come from new platforms, like Google, Facebook, and LinkedIn, and from personalized medicine.
• However, Big Data makes privacy online almost impossible.
Gregory Piatetsky-Shapiro, "Big Data Hype and Reality", Harvard Business Review blog, Oct 2012
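The 0.95 and 0.86 figures are prediction errors measured in stars; the Netflix Prize scored entries by root-mean-squared error (RMSE). A short Python sketch makes the "limited improvement" point concrete: three years of work bought roughly a 9.5% relative reduction in error. The helper names and the two-rating sample are illustrative only.

```python
import math


def rmse(predicted, actual):
    """Root-mean-squared error between predicted and actual star ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def relative_improvement(old_error, new_error):
    """Fraction by which the error shrank, relative to the old error."""
    return (old_error - new_error) / old_error


# Illustrative: predicting 3.5 and 4.0 stars for two movies actually rated 4 and 4.
print(rmse([3.5, 4.0], [4, 4]))

# The slide's figures: from 0.95 to 0.86 stars of RMSE.
print(f"{relative_improvement(0.95, 0.86):.1%}")  # 9.5%
```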
21. Gartner Hype Cycle: Big Data
Gartner VP says Big Data is falling into the Trough of Disillusionment (Jan 2013)
22. Q&A
• Gregory Piatetsky, Editor, KDnuggets; co-founder of KDD and ACM SIGKDD (@kdnuggets)
• David Smith, Data Scientist, Revolution Analytics (@revodavid)
• Carla Gentry, Data Scientist, Analytical Solution (@data_nerd)
• Darren Peirce, CTO, Kalido (@DarrenPeirce)
• Eric Kavanagh, DM Radio Host, Information Management Magazine's DM Radio (@eric_kavanagh)
23. Summer Sessions: Two Tracks for YOU
Track 1: Agile Information Foundation for the Data Scientist
• Series kickoff, May 14: Data Scientist: Your must-have business investment now.
• 30-minute learning sessions:
  - May 28: Rapid data integration tools and methods
  - June 4: Harmonizing data for the warehouse
  - June 11: Rapid iteration methodology using information models
Track 2: TCO: Find data warehouse costs before they find you.
• Series kickoff, June 25: Find your data warehouse's hidden costs before they find you.
• 30-minute learning sessions:
  - July 2: The real cost per release cycle
  - July 9: Automate to reduce operating costs
  - July 16: Reduce tool cost
  - July 23: Scale drives cost reductions
Visit get.kalido.com/summer-series to register.