The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough

Big Data is important because we want to use it to make sense of our world. It’s tempting to think there’s some “magic bullet” for analyzing big data, but simple “data distillation” often isn’t enough, and unsupervised machine-learning systems can be dangerous. (Like, bringing-down-the-entire-financial-system dangerous.) Data Science is the key to unlocking insight from Big Data: by combining computer science skills with statistical analysis and a deep understanding of the data and the problem, we can not only make better predictions but also fill in gaps in our knowledge, and even find answers to questions we hadn’t thought of yet.

  1. Revolution Confidential. The Rise of Data Science in the Age of Big Data Analytics: Why Data Distillation and Machine Learning Aren’t Enough. David M Smith, VP Marketing and Community, Revolution Analytics
  2. Today, we’ll discuss: What is Data Science? Why machine learning isn’t enough; Why Data Science works; The Data Scientist’s Toolkit; The Future of Big Data Analytics; Closing thoughts and resources
  3. © Dov Harrington, CC BY 2.0, http://www.flickr.com/photos/idovermani/4110546683/
  4. Where is it safe to fish near San Francisco? San Francisco Estuary Institute: http://www.sfei.org/tools/wqt
  5. Hurricane Sandy. Bob Rudis: http://rud.is/b/2012/10/28/watch-sandy-in-r-including-forecast-cone/
  6. Hurricane Sandy. Ed Chen: http://blog.echen.me/hurricane-sandy-outages/
  7. When did Michael Jackson have his biggest hits? New York Times, June 25 2009 (3 hours after Michael Jackson’s death): http://www.nytimes.com/interactive/2009/06/25/arts/0625-jackson-graphic.html
  8. Three Essential Skills of Data Scientists. Diagram labels (order as extracted): Models; Data Integration Visualization Mashups Predictions Applications Uncertainty Problems Effective Data Sources Data Credibility Applications. Drew Conway: http://www.dataists.com/2010/09/the-data-science-venn-diagram/
  9. Image © Abode of Chaos, CC BY 2.0, http://www.flickr.com/photos/home_of_chaos/6418989233/
  10. Machine learning (ML) for predictions. Building the model: features and responses go into ML, which produces scoring rules. Scoring new data: new data and the scoring rules produce predictions (scores). Validating the model: the validation set and the scoring rules produce predictions, which are compared with the response to give “accuracy”.
  11. Problem: A lack of perspective. Image © 2010 David M Smith. Some rights reserved, CC BY 2.0
  12. Problem: Lack of credibility
  13. Problem: Complexity
  14. Data Science to the Rescue!
  15. Answer Unasked Questions. Revolutions blog: “The Uncanny Valley of Big Data” http://blog.revolutionanalytics.com/2012/02/the-uncanny-valley-of-big-data.html
  16. Fill in knowledge gaps. “Companies that have massive amounts of data without massive amounts of clue are going to be displaced by startups that have less data but more clue.” -- Tim O’Reilly. “More data beats better algorithms, every time” -- Google. Google Research, “The Unreasonable Effectiveness of Data”: http://googleresearch.blogspot.com/2009/03/unreasonable-effectiveness-of-data.html Tim O’Reilly on Google+: https://plus.google.com/107033731246200681024/posts/4Xa76AtxYwd TechnoCalifornia: http://technocalifornia.blogspot.com/2012/07/more-data-or-better-models.html
  17. Avoid ineffective reactions. S&P 500. Stupid Data Miner Tricks: http://nerdsonwallstreet.typepad.com/my_weblog/files/dataminejune_2000.pdf
  18. © Henricks Photos, CC BY-ND 2.0, http://www.flickr.com/photos/hendricksphotos/3240667626/
  19. 0. Data (Big & Messy)
  20. 1. A language for programming with data. Download the White Paper “R is Hot”: bit.ly/r-is-hot
  21. Data import and pre-processing. User-defined functions; Internet API interface; XML parsing; Iterative data processing; Custom graphics. Example: Grant awards to homeless veterans FY09 (Data: Data.gov; Analysis: Drew Conway)
  22. 2. Speed. Lots and lots of speed. Diagram labels: Data; Variable Transformation; Feature Selection; Sampling; Aggregation; Model Estimation; Model Refinement; Model Comparison / Benchmarking; Predictions
  23. Use all available computing cycles. Diagram labels: Disk; Shared Memory; Data; Core 0 (Thread 0), Core 1 (Thread 1), Core 2 (Thread 2), Core n (Thread n); Multicore Processor (4, 8, 16+ cores)
  24. 3. Algorithms that don’t choke on Big Data. PEMAs: Parallel External-Memory Algorithms. Diagram labels: Big Data; Data Partitions; Compute Nodes; Master Node
  25. Drink less coffee! Single Threaded; Non-optimized algorithms; Optimized Parallelized Algorithms
  26. 4. Move code to data (not vice versa). Map-Reduce. RHadoop: http://bit.ly/RHadoop
  27. Big Data Appliances. More info: http://bit.ly/R-Netezza
  28. Play Nice with Others. Presentation Layer: Business Intelligence Tools; Web-based data apps; Reporting / Spreadsheets. Analytics Layer: R. Data Layer: Relational datastores; Unstructured datastores
  29. What every data scientist needs (Open-Source R | Revolution R Enterprise):
      Interface with multiple data sources: ✓ | ✓✓
      Exploratory data analysis: ✓✓ | ✓✓
      Wide range of statistical methods: ✓✓ | ✓✓
      High-speed computation: ✘ | ✓✓
      Big Data support: ✘ | ✓✓
      Data/code locality (Hadoop, etc.): ✘ | ✓✓
      Print-quality data visualization: ✓ | ✓
      Scheduled batch production: ✓ | ✓✓
      Works in a multi-tool ecosystem: ✓✓ | ✓✓
      Integration into Data Apps: ✘ | ✓✓
  30. Revolution R Enterprise: Big-Data R (Open-Source R | Revolution R Enterprise):
      Interface with multiple data sources: ✓ | ✓✓
      Exploratory data analysis: ✓✓ | ✓✓
      Wide range of statistical methods: ✓✓ | ✓✓
      High-speed computation: ✘ | ✓✓
      Big Data support: ✘ | ✓✓
      Data/code locality (Hadoop, etc.): ✘ | ✓✓
      Print-quality data visualization: ✓✓ | ✓✓
      Scheduled batch production: ✓ | ✓✓
      Works in a multi-tool ecosystem: ✓✓ | ✓✓
      Integration into Data Apps: ✘ | ✓✓
      www.revolutionanalytics.com/products
  31. Image © www.tinyplanetphotography.com
  32. And … the future? Even more data; Cloud computing; Demand for Data Scientists; Diverging paradigms for data analytics. http://www.indeed.com/jobtrends
  33. Diverging data paradigms. Diagram labels (order as extracted): More data, better fault tolerance; Files, Data, Hadoop, Clusters, Appliances, NoSQL; Exploration, Storage, Modeling, Preprocessing; Easier programming, better performance; Production
  34. Data Science in Production. Real-time Big Data Analytics: From Deployment to Production. Thursday, November 29, 2012, 10:00AM - 11:00AM Pacific Time. www.revolutionanalytics.com/news-events/free-webinars/
  35. Building Data Science Teams. DJ Patil in O’Reilly Radar: http://oreil.ly/I3H5fI; Statistics and Data Science graduates; Kaggle and Chorus; Revolution Analytics R Training: http://www.revolutionanalytics.com/services/training/
  36. Closing Thoughts. Data Science process leads to more powerful, and more useful, models. Data Scientists need a technology platform to think about, explore, and model data. Revolution R Enterprise is R for Big Data
  37. Resources. Revolution R Enterprise: R for Big Data, www.revolutionanalytics.com/products. RHadoop: Connecting R and Hadoop, bit.ly/r-hadoop. Contact David Smith: david@revolutionanalytics.com, @revodavid, blog.revolutionanalytics.com
  38. Thank you. The leading commercial provider of software and support for the popular open source R statistics language. www.revolutionanalytics.com, 650.646.9545, Twitter: @RevolutionR
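
The slide on parallel external-memory algorithms (PEMAs) names the pieces (data partitions, compute nodes, a master node) without showing how partial results are combined. A minimal sketch of that pattern follows, written in Python for illustration rather than in R, and not taken from Revolution R Enterprise. It computes a mean and sample variance chunk by chunk. The function names (`partial_stats`, `merge_stats`, `parallel_mean_variance`) are hypothetical, a thread pool stands in for the compute nodes, and an in-memory list stands in for data that would really be read from disk one partition at a time.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partial_stats(chunk):
    """Work of one 'compute node': summarise a single data partition."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)  # sum of squared deviations
    return n, mean, m2


def merge_stats(a, b):
    """Work of the 'master node': combine two partial summaries."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


def chunked(values, size):
    """Yield successive partitions of at most `size` items.

    Assumption: a real external-memory version would read each
    partition from disk instead of slicing a list held in memory.
    """
    for start in range(0, len(values), size):
        yield values[start:start + size]


def parallel_mean_variance(values, chunk_size=1000, workers=4):
    """Return (mean, sample variance); needs at least two values."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunked(values, chunk_size)))
    n, mean, m2 = reduce(merge_stats, partials)
    return mean, m2 / (n - 1)


if __name__ == "__main__":
    data = [float(i % 97) for i in range(10_000)]
    print(parallel_mean_variance(data))
```

Only the three-number summary of each partition ever reaches the merge step, so the full data set never has to fit in memory at once. The same idea underlies the "move code to data" slide: ship the small summarising function to where the partition lives, and send back only the summary.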