Data Scientist 101 BI Dutch

Slides for my 30-minute 'keynote' during the June 2013 BI Dutch session

Transcript

  • 1. Data Scientist 101: How to become a Super Cruncher
  • 2. “All truths are easy to understand once they are discovered; the point is to discover them.”
  • 3. The 4 “soft” C's of a Data Scientist
  • 4. ...and the 5 R's of 21st Century Literacy ⇨ Reading ⇨ wRiting ⇨ aRithmetic ⇨ pRobability ⇨ R Source: Joe Blitzstein, Harvard
  • 5. "data scientists should take a page from social scientists, who have a long history of asking where the data they're working with comes from, what methods were used to gather and analyze it, and what cognitive biases they might bring to its interpretation." Kate Crawford, Microsoft Research/MIT
  • 6. Wrong prediction due to extensive media attention & coverage
  • 7. Data Science: whetting your appetite
  • 8. The Data Science Venn Diagram Source: Drew Conway, NYU http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  • 9. Another way to look at things...
  • 10. The nerdy approach... Source: Hilary Mason, bit.ly
  • 11. Data Scientists have more fun Source: "How to Engage and Retain Analytical Talent" by Elizabeth Craig, Jeanne G. Harris and Henry Egan, January 2010
  • 12. How Do I Become A Data Scientist? ⇨ Learn about matrix factorizations ⇨ Learn about distributed computing ⇨ Learn about statistical analysis ⇨ Learn about optimization ⇨ Learn about machine learning ⇨ Learn about information retrieval ⇨ Learn about signal detection and estimation ⇨ Master algorithms and data structures ⇨ Practice ⇨ Study Engineering Source: http://www.quora.com/Career-Advice/How-do-I-become-a-data-scientist
  • 13. 6 levels of expertise needed Data wrangling Statistics Data mining Visualization Communication Data Science* Domain & Business Expertise * a bit of programming skills doesn't hurt either
  • 14. Programming Skills? C C++ PAL Smalltalk VB.Net C# SQL LotusScript VBScript JavaScript HTML Delphi (Java) Python R Perl Me “Them” Prolog Octave Ruby SQL Pascal
  • 15. SQL Still Matters! ⇨ Big Data SQL ⇨ HBase & Hive ⇨ Amazon Redshift ⇨ Cloudera Impala ⇨ Hortonworks Stinger ⇨ ... Source: KDNuggets.com
  • 16. How about Technology?
  • 17. New analytics->new infrastructure
  • 18. The Analytics Landscape
  • 19. Why you need (some) Statistics
  • 20. Correlation != Causation
  • 21. Learning Statistics ⇨ Coursera.org ⇨ Statistics One ⇨ Passion Driven Statistics ⇨ Statistics: Making sense of Data
  • 22. “Essentially, all models are wrong... but some are useful.” George E.P. Box
  • 23. Learning Data Mining ⇨ Coursera.org ⇨ Machine Learning ⇨ Neural Networks for Machine Learning ⇨ Kaggle.com ⇨ Kaggle In Class
  • 24. Visualization
  • 25. Visualization is... The conversion of any abstract data into a graphical format so the characteristics and relationships of the data can be explored and analyzed. ⇨ Humans have the ability to analyze large amounts of information that is presented visually ⇨ This is good for certain types of pattern and trend analysis ⇨ It’s often easy to detect outliers and unusual patterns Useful for exploration, explanation, discovery, but not for automated system actions.
  • 26. How many 5's? 3435261241134352612203498723566 9623466620398652034095823450238 4560289567109238401645089630489 5769782364196873484
  • 27. Again: how many 5's? 3435261241134352612203498723566 9623466620398652034095823450238 4560289567109238401645089630489 5769782364196873484
  • 28. Learning Visualization ⇨ Stephen Few classes ($$) ⇨ Alberto Cairo ⇨ Introduction to Data Journalism
  • 29. Want to get your feet wet? Tableau Public http://www.tableausoftware.com/public/ SAS Visual Analytics http://www.sas.com/software/visual-analytics
  • 30. Where to go from here? ⇨ Read 'Competing on Analytics' ⇨ Move on to 'Data Analysis Using SQL and Excel' ⇨ Then buy 'Handbook of Statistical Analysis & Data Mining Applications' ⇨ Statistics for business: ⇨ http://home.ubalt.edu/ntsbarsh/Business-stat/opre504.htm ⇨ Data Mining: ⇨ www.rapid-i.com (RapidMiner) ⇨ http://www.thearling.com ⇨ http://www.autonlab.org/tutorials/ ⇨ For free text books, search www.scribd.com ⇨ Enter http://www.coursera.org
  • 31. More Resources to Get You Started Books: ⇨ Data Mining Techniques: For Marketing, Sales and Customer Support, Michael J. Berry and Gordon Linoff ⇨ Data Preparation for Data Mining, Dorian Pyle ⇨ Data Mining Algorithms, Eibe Frank, Ian Witten, Jim Gray ⇨ An Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze ⇨ Information Retrieval, C. J. van Rijsbergen ⇨ The Visual Display of Quantitative Information, Edward R. Tufte Journals, Newsletters, Web Sites: ⇨ SIGKDD Explorations, Newsletter of the ACM SIG on Knowledge Discovery and Data Mining ⇨ IEEE Transactions on Pattern Analysis and Machine Intelligence ⇨ SAS Knowledge Exchange: www.sas.com/knowledge-exchange/business-analytics ⇨ KDNuggets data mining resources: www.kdnuggets.com ⇨ FlowingData, visualization resources: http://flowingdata.com/ ⇨ Infoaesthetics, visual design resources: http://infosthetics.com/ ⇨ Visual Complexity, visualization resources: www.visualcomplexity.com/vc/index.cfm ⇨ Recommendation systems resources: http://www.deitel.com/ResourceCenters/Web20/RecommenderSystems/tabid/1229/Default.aspx ⇨ The Impoverished Social Scientist's Guide to Free Statistical Software and Resources: http://maltman.hmdc.harvard.edu/socsci.shtml
  • 32. Free Stuff So You Can Work Cheaply ⇨ WEKA http://www.cs.waikato.ac.nz/ml/weka/ ⇨ IND decision tree software http://opensource.arc.nasa.gov/software/ind/ ⇨ Clustering http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/ ⇨ Parallel Sets http://eagereyes.org/parallel-sets#download ⇨ RapidMiner http://rapid-i.com/content/blogcategory/38/69/ ⇨ Knime http://www.knime.org/ ⇨ Orange http://www.ailab.si/Orange/ ⇨ R statistics software http://www.r-project.org/ ⇨ ARC statistics software http://www.stat.umn.edu/arc/software.html ⇨ Octave numerical and matrix computation http://www.gnu.org/software/octave/ ⇨ Processing http://www.processing.org/ ⇨ Circos http://mkweb.bcgsc.ca/circos/ ⇨ Treemap http://www.cs.umd.edu/hcil/treemap/ ⇨ Many Eyes http://manyeyes.alphaworks.ibm.com/manyeyes/ ⇨ Dutch Students: SAS & SPSS Academic Licenses (e.g. SurfSpot.nl)
  • 33. Jos van Dongen, Principal Consultant @ SAS, Author/Speaker/Analyst, in BI since 1991. Web: www.sas.com Email: jos.vandongen<at>sas.com Phone: +31-(0)6-10172008 Skype: tholis.jos LinkedIn: jvdongen Twitter: josvandongen Delicious: jvdongen
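
Slide 20's point that correlation does not imply causation can be illustrated with a small simulation. This is a minimal sketch, not taken from the slides: the variable names (temperature, ice_cream_sales, sunburn_cases) and all coefficients are hypothetical, chosen so that one hidden factor drives two otherwise unrelated series.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical confounder: daily temperature drives both series below.
temperature = [rng.uniform(10, 35) for _ in range(1000)]
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn_cases = [0.5 * t + rng.gauss(0, 2) for t in temperature]

# The two series are strongly correlated although neither causes the other.
r = pearson(ice_cream_sales, sunburn_cases)

# Subtracting the (known, simulated) temperature effect leaves only noise,
# and the correlation between the residuals largely disappears.
sales_resid = [s - 2.0 * t for s, t in zip(ice_cream_sales, temperature)]
burn_resid = [b - 0.5 * t for b, t in zip(sunburn_cases, temperature)]
r_controlled = pearson(sales_resid, burn_resid)

print(f"raw correlation:                {r:.2f}")
print(f"after controlling for the cause: {r_controlled:.2f}")
```

In real data the confounder is usually unknown, so the second step is rarely this easy; the sketch only shows why a high correlation on its own says nothing about which variable, if any, drives the other.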