Data Science with R for Java developers

  1. 1. Data Science With R ~ for ~ Java Developers @Sander_Mak
  2. 2. Agenda: Data Science; The R language; Gimme some Java!
  3. 3. 90% of the world’s data was produced in the last 2 years (SINTEF/ScienceDaily, June 2013). We need more than just CRUD
  4. 4. Stand back. I know Data Science!
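  5. 5. Venn diagram of Software Engineering, Domain Expertise, and Math & Statistics, with Data Science at the center; overlaps labelled Machine Learning, Operations Research, and “Danger! Perl ahead!”
  6. 6. Venn diagram of Software Engineering, Domain Expertise, and Math & Statistics, with Data Science at the center; overlaps labelled Machine Learning, Operations Research, and “Danger! Perl ahead!”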
  7. 7. Data Science: Achievement Unlocked
  8. 8. Today: R, RStudio. Data Science: Achievement Unlocked
  9. 9. Agenda: Data Science; The R language; Gimme some Java!
  10. 10. Language Designers Statisticians
  11. 11. Language Designers? Statisticians? The best thing about R is that it was developed by statisticians. The worst thing about R is that... it was developed by statisticians. - Bo Cowgill, Google
  12. 12. Why R, then? Open source; de facto standard (in statistical research); “It’s a DSL posing as a general-purpose language”; interactive data exploration
  13. 13. Why not R, then? Slow; memory bound; (Did I mention it’s a quirky language?); try googling for R...
  14. 14. Why not R, then? Slow; memory bound; (Did I mention it’s a quirky language?); try googling for R... ‘If you are using R and you think you’re in hell, this is a map for you.’ - The R Inferno
  15. 15. Apparently, statisticians aren’t designers, either...
  16. 16. VS
  17. 17. R: dynamic (eval), interpreted, functional/OO/procedural. Java: static types, compiled, OO.
  18. 18. R type vs. Java type: factor vs. Enum; numeric vs. Integer/Double/...; character vs. String.
  19. 19. R type vs. Java type: factor vs. Enum; numeric vs. Integer/Double/...; character vs. String. R data structures: vector, list, dataframe.
  20. 20. Indexing: R is 1-based (1 2 3 4); Java is 0-based (0 1 2 3).
  21. 21. Indexing: R is 1-based (1 2 3 4); Java is 0-based (0 1 2 3). Iteration: for-loops in Java; higher-order functions in R, e.g. sapply(vec, function(elm) { elm + 1; })
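The R one-liner on slide 21, `sapply(vec, function(elm) { elm + 1; })`, has a close Java counterpart in the Streams API. What follows is a minimal sketch, assuming Java 8 or later; the class and method names are illustrative and not from the talk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SapplySketch {
    // Hypothetical helper mirroring sapply(vec, function(elm) elm + 1).
    static List<Integer> addOne(List<Integer> vec) {
        return vec.stream()
                  .map(elm -> elm + 1)           // apply the function to each element
                  .collect(Collectors.toList()); // gather the results into a new list
    }

    public static void main(String[] args) {
        List<Integer> vec = Arrays.asList(1, 2, 3, 4);
        System.out.println(addOne(vec)); // prints [2, 3, 4, 5]
    }
}
```

As with `sapply`, the input is left untouched and a new collection is returned.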
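  22. 22. R: lazy evaluation. Java: eager evaluation.
  23. 23. R: lazy evaluation; pass-by-value (copy-on-write). Java: eager evaluation; pass-by-reference. Diagram: call F(A) hands Value A to Function F; when F modifies it, F works on a copy, Value A’, and the caller's Value A stays unchanged.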
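R evaluates function arguments lazily: an argument expression is only computed if the function body actually uses it. Java computes every argument before the call, but a `Supplier` can approximate the lazy behaviour. A minimal sketch of the contrast on slide 22, with illustrative class and method names:

```java
import java.util.function.Supplier;

class LazyArgSketch {
    static int evaluations = 0;

    // Stands in for an expensive argument expression; counts how often it runs.
    static int expensive() {
        evaluations++;
        return 42;
    }

    // Eager: 'value' has already been computed by the time the method starts.
    static int eagerPick(boolean use, int value) {
        return use ? value : 0;
    }

    // Lazy: 'value' is only computed if get() is actually called.
    static int lazyPick(boolean use, Supplier<Integer> value) {
        return use ? value.get() : 0;
    }

    public static void main(String[] args) {
        eagerPick(false, expensive());             // expensive() runs although unused
        System.out.println(evaluations);           // prints 1
        lazyPick(false, LazyArgSketch::expensive); // expensive() is never called
        System.out.println(evaluations);           // prints 1
    }
}
```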
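The diagram on slide 23 contrasts R, where a function that modifies its argument works on a copy, with Java, where a method receives a reference to the caller's object. The sketch below shows the Java side, plus one way to get R-like behaviour with a defensive copy. The names are illustrative, and unlike R's copy-on-write the copy here is made eagerly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PassingSketch {
    // Mutates the caller's list: caller and callee share one object.
    static void modifyInPlace(List<Integer> values) {
        values.set(0, 99);
    }

    // R-like behaviour: modify a copy (A') and return it, leaving A untouched.
    static List<Integer> modifyCopy(List<Integer> values) {
        List<Integer> copy = new ArrayList<>(values);
        copy.set(0, 99);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> aPrime = modifyCopy(a);
        System.out.println(a);      // prints [1, 2, 3]
        System.out.println(aPrime); // prints [99, 2, 3]
        modifyInPlace(a);
        System.out.println(a);      // prints [99, 2, 3]
    }
}
```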
  24. 24. RStudio
  25. 25. Central; Comprehensive R Archive Network (CRAN); RStudio
  26. 26. Coding time!
  27. 27. Titanic Competition: Machine Learning from Disaster
  28. 28. Titanic Competition: Machine Learning from Disaster
  29. 29. Titanic Competition: Machine Learning from Disaster. Decision Tree: split nodes Sex == Female, Age > 50, Age > 16, Fare > 100, with True/False leaves
  30. 30. Titanic Competition: Machine Learning from Disaster. Decision Tree: split nodes Sex == Female, Age > 50, Age > 16, Fare > 100, with True/False leaves. Random Forest: many such trees side by side
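To make the two models on slides 29 and 30 concrete: a decision tree is a chain of nested tests ending in a True/False leaf, and a random forest combines many trees by majority vote. The sketch below reuses the split conditions named on the slides, but the tree shapes and leaf outcomes are assumptions made for illustration, not the trees from the demo. A real random forest would also train each tree on a random sample of rows and features, which is omitted here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class ForestSketch {
    // Hypothetical passenger record with the three features from the slide.
    static class Passenger {
        final boolean female;
        final double age;
        final double fare;

        Passenger(boolean female, double age, double fare) {
            this.female = female;
            this.age = age;
            this.fare = fare;
        }
    }

    // Each tree returns true for "survived". Leaf outcomes are assumed.
    static boolean treeA(Passenger p) {
        if (p.female) return p.age <= 50;
        return p.age <= 16;
    }

    static boolean treeB(Passenger p) {
        if (p.fare > 100) return true;
        return p.female;
    }

    static boolean treeC(Passenger p) {
        if (p.age > 16) return p.female;
        return true;
    }

    // Majority vote over all trees in the forest.
    static boolean forestPredict(List<Predicate<Passenger>> trees, Passenger p) {
        long yes = trees.stream().filter(t -> t.test(p)).count();
        return yes * 2 > trees.size();
    }

    static List<Predicate<Passenger>> forest() {
        return Arrays.asList(ForestSketch::treeA, ForestSketch::treeB, ForestSketch::treeC);
    }

    public static void main(String[] args) {
        System.out.println(forestPredict(forest(), new Passenger(true, 30, 50)));   // prints true
        System.out.println(forestPredict(forest(), new Passenger(false, 40, 150))); // prints false
    }
}
```

In the second call only `treeB` votes true, so the forest overrules it.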
  31. 31. Demo time!
  32. 32. ... ...
  33. 33. Agenda: Data Science; The R language; Gimme some Java!
  34. 34. Bridging R and Java: Integrate, Assimilate, Replace
  35. 35. Integrate: rJava & Java/R Interface (JRI). Two-way native interface via JNI (libjri), or TCP to Rserve. Example: Rengine re = new Rengine(new String[] {}, false, null); /* wait until engine is ready */ if (!re.waitForR()) { throw new IllegalStateException("Can't load R engine"); } re.eval("data(cars)", false); REXP cars = re.eval("cars"); RVector carsVector = cars.asVector(); /* dissect carsVector... */
  36. 36. Assimilate: reimplementation of R on JVM; fast & lean; parallelized; just-another-lib; ... not production ready yet...
  37. 37. Assimilate: reimplementation of R on JVM; fast & lean; parallelized; just-another-lib; ... not production ready yet... Example: /* create a script engine manager */ ScriptEngineManager factory = new ScriptEngineManager(); /* create an R engine */ ScriptEngine engine = factory.getEngineByName("Renjin"); /* load package from classpath */ engine.eval("library(survey)"); /* evaluate R code from String */ engine.eval("print('Hello from R')");
  38. 38. Assimilate: reimplementation of R on JVM. Share data: Integer[] data = {1, 2, 3}; engine.put("data", data); engine.eval("print(sum(data))");
  39. 39. Assimilate: reimplementation of R on JVM. Share data: Integer[] data = {1, 2, 3}; engine.put("data", data); engine.eval("print(sum(data))"); Use Java from Renjin (instantiate Java beans, then invoke a setter): import(com.foo.User); tim <- User$new(name='Tim', age=23); tom <- User$new(name='Tom', age=45); tim$name <- "Timmy"
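The Renjin snippets on slides 37 to 39 go through the standard `javax.script` API, so the data-sharing steps can be wrapped in a small helper. This is a sketch under assumptions: the class and method names are made up for illustration, and it only reaches R if a Renjin script engine jar is on the classpath. Without one, `getEngineByName` returns null and the helper returns an empty result.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class EngineSketch {
    // Binds 'data' under the name "data" and evaluates 'script', provided an
    // engine registered as 'engineName' can be found on the classpath.
    static Optional<Object> evalWithData(String engineName, Integer[] data, String script) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return Optional.empty(); // e.g. the Renjin jar is not on the classpath
        }
        try {
            engine.put("data", data);
            return Optional.ofNullable(engine.eval(script));
        } catch (ScriptException e) {
            throw new IllegalStateException("Script failed: " + script, e);
        }
    }

    public static void main(String[] args) {
        Integer[] data = {1, 2, 3};
        Optional<Object> result = evalWithData("Renjin", data, "sum(data)");
        System.out.println(result.isPresent()
                ? "R returned: " + result.get()
                : "No result (Renjin engine not found, or script returned null)");
    }
}
```

Checking for a null engine first keeps the failure mode readable when the dependency is missing, which is the most common setup problem with script engines.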
  40. 40. Big Data?
  41. 41. Replace: JVM libraries/platforms
  42. 42. Replace: scalable R distributions (non-JVM): Revolution Analytics, Oracle R Enterprise
  43. 43. Wrap-up: Data Science; The R language; Gimme some Java!
  44. 44. Sanitize, Explore, Model, Predict, Scale
  45. 45. Next steps: Computing for Data Analysis starts Sept. 23rd; install R; read
  46. 46. Questions? Data Science; The R language; Gimme some Java! @Sander_Mak, branchandbound.net
