Data Science with R for Java Developers

Published in: Technology, Education

  1. 1. Data Science with R for Java Developers @Sander_Mak
  2. 2. Agenda: Data Science; The R language; Gimme some Java!
  3. 3. 90% of the world’s data was produced in the last 2 years - SINTEF/ScienceDaily, June 2013. We need more than just CRUD
  4. 4. Stand back. I know Data Science!
  5. 5. Venn diagram labels: Hacking Skills, Math & Statistics, Machine Learning, Data Science, Danger! Perl ahead!, Operations Research, Domain Expertise
  6. 6. Venn diagram labels: Hacking Skills, Math & Statistics, Machine Learning, Data Science, Danger! Perl ahead!, Operations Research, Domain Expertise
  7. 7. Data Science: Achievement Unlocked
  8. 8. Data Science: Achievement Unlocked. Today: R, R-Studio
  9. 9. Agenda: Data Science; The R language; Gimme some Java!
  10. 10. Language Designers? Statisticians?
  11. 11. Language Designers? Statisticians? “The best thing about R is that it was developed by statisticians. The worst thing about R is that... it was developed by statisticians.” - Bo Cowgill, Google
  12. 12. Why R, then? De facto standard (in statistical research); open source; interactive data exploration. “It’s a DSL posing as general purpose language”
  13. 13. Why not R, then? Slow; memory bound; try googling for R... (Did I mention it’s a quirky language?)
  14. 14. Why not R, then? Slow; memory bound; try googling for R... (Did I mention it’s a quirky language?) ‘If you are using R and you think you’re in hell, this is a map for you.’ - The R Inferno
  15. 15. Apparently, statisticians aren’t designers, either...
  16. 16. R vs. Java
  17. 17. R: Functional/OO/Procedural; Dynamic (eval); Interpreted. Java: OO; Static types; Compiled
  18. 18. R: numeric, character, Factor. Java: Integer/Double/..., String, Enum
  19. 19. R: vector, list, dataframe; numeric, character, Factor. Java: Integer/Double/..., String, Enum
  20. 20. R: 1 2 3 4 (1-based). Java: 0 1 2 3 (0-based)
  21. 21. R: 1 2 3 4 (1-based); higher-order functions: sapply(vec, function(elm) { elm + 1; }). Java: 0 1 2 3 (0-based); for-loops
  22. 22. RStudio
  23. 23. RStudio; Comprehensive R Archive Network (CRAN); Central
  24. 24. Coding time!
  25. 25. Titanic Competition: Machine Learning from Disaster
  26. 26. Titanic Competition: Machine Learning from Disaster. Survived?
  27. 27. Titanic Competition: Machine Learning from Disaster. Decision Tree (diagram): splits on Sex == Female, Age > 16, Age > 50, Fare > 100; leaves labeled T/F
  28. 28. Titanic Competition: Machine Learning from Disaster. Decision Tree, Random Forest (diagram): several trees splitting on Sex == Female, Age > 16, Age > 50, Fare > 100; leaves labeled T/F
  29. 29. Demo time!
  31. 31. Agenda: Data Science; The R language; Gimme some Java!
  32. 32. Bridging R and Java: Integrate, Assimilate, Replace
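  33. 33. Integrate: rJava & Java/R interface. Two-way native interface: JNI (libjri), or TCP to Rserve.
          import org.rosuda.JRI.REXP;
          import org.rosuda.JRI.RVector;
          import org.rosuda.JRI.Rengine;
          // start an embedded R engine without its own main loop or callbacks
          Rengine re = new Rengine(new String[] {}, false, null);
          // wait until engine is ready
          if (!re.waitForR()) {
              throw new IllegalStateException("Can't load R engine");
          }
          // load the built-in cars dataset; false = do not convert the result to a Java object
          re.eval("data(cars)", false);
          REXP cars = re.eval("cars");
          RVector carsVector = cars.asVector();
          // dissect carsVector...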
  34. 34. Assimilate: Renjin, a reimplementation of R on the JVM. Fast & lean; parallelized; just-another-lib ... not production ready yet...
  35. 35. Assimilate: Renjin, a reimplementation of R on the JVM. Fast & lean; parallelized; just-another-lib ... not production ready yet...
          import javax.script.ScriptEngine;
          import javax.script.ScriptEngineManager;
          // create a script engine manager
          ScriptEngineManager factory = new ScriptEngineManager();
          // create an R engine
          ScriptEngine engine = factory.getEngineByName("Renjin");
          // load package from classpath
          engine.eval("library(survey)");
          // evaluate R code from String
          engine.eval("print('Hello from R')");
  36. 36. Big Data?
  37. 37. Replace: JVM libraries/platforms
  38. 38. Replace: scalable R distributions (non-JVM): Revolution Analytics, Oracle R Enterprise
  39. 39. Wrap-up: Data Science; The R language; Gimme some Java!
  40. 40. Sanitize, Explore, Model, Predict, Scale
  41. 41. Next steps: Install R; Read; Computing for Data Analysis starts Jan. 6th 2014
  42. 42. Questions? Data Science; The R language; Gimme some Java! @Sander_Mak, branchandbound.net
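
Slide 21 contrasts R's higher-order functions (sapply) and 1-based indexing with Java's for-loops and 0-based indexing. A minimal Java sketch of the same element-wise operation, assuming Java 8 streams are available; the class and method names are illustrative and do not come from the talk:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class SapplySketch {
    // Rough counterpart of sapply(vec, function(elm) { elm + 1; }):
    // apply a function to every element and collect the results.
    static int[] addOne(int[] vec) {
        return IntStream.of(vec).map(elm -> elm + 1).toArray();
    }

    // R positions start at 1, Java array indices at 0, so shift by one.
    static int elementAtRPosition(int[] vec, int rPosition) {
        return vec[rPosition - 1];
    }

    public static void main(String[] args) {
        int[] vec = {1, 2, 3, 4};
        System.out.println(Arrays.toString(addOne(vec)));  // [2, 3, 4, 5]
        System.out.println(elementAtRPosition(vec, 1));    // 1
    }
}
```

The stream version keeps the "map a function over a vector" shape of the R call, while the index helper makes the off-by-one difference between the two languages explicit.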
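
Slides 27 and 28 show a decision tree and a random forest for the Titanic competition. The sketch below only illustrates the voting idea behind a forest: several small trees each predict "survived?" and the majority wins. The three trees are hand-written and hypothetical; they reuse the split conditions named on the slides (Sex == Female, Age > 16, Age > 50, Fare > 100), but their structure and outcomes are assumptions, not the model from the demo and not learned from data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class ForestSketch {
    // Hypothetical passenger record with the features named on the slides.
    static final class Passenger {
        final boolean female;
        final double age;
        final double fare;

        Passenger(boolean female, double age, double fare) {
            this.female = female;
            this.age = age;
            this.fare = fare;
        }
    }

    // Three hand-written "trees"; each returns true for "survived".
    // The combinations of splits are invented for illustration only.
    static final List<Predicate<Passenger>> TREES = Arrays.asList(
        p -> p.female || p.age <= 16,   // not (Age > 16) counts as a child
        p -> p.fare > 100 || p.female,
        p -> p.age <= 50 && p.female
    );

    // Majority vote over all trees.
    static boolean predictSurvived(Passenger p) {
        long yes = TREES.stream().filter(tree -> tree.test(p)).count();
        return 2 * yes > TREES.size();
    }

    public static void main(String[] args) {
        System.out.println(predictSurvived(new Passenger(true, 30, 50)));   // true
        System.out.println(predictSurvived(new Passenger(false, 40, 20)));  // false
    }
}
```

A real random forest would train each tree on a bootstrap sample of the data with a random subset of features; only the final voting step is shown here.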