Data Science with R for Java developers

As presented at JavaOne 2013


  1. 1. Data Science With R ~ for ~ Java Developers @Sander_Mak
  2. 2. Agenda: Data Science, The R language, Gimme some Java!
  3. 3. "90% of the world’s data was produced in the last 2 years" - SINTEF/ScienceDaily, June 2013. We need more than just CRUD
  4. 4. Stand back. I know Data Science!
  5. 5. Data Science (Venn diagram). Circles: Software Engineering, Domain Expertise, Math & Statistics. Overlaps: Machine Learning, Operations Research, "Danger! Perl ahead!"
  7. 7. Data Science: Achievement Unlocked
  8. 8. Today: R, RStudio. Data Science: Achievement Unlocked
  9. 9. Agenda: Data Science, The R language, Gimme some Java!
  10. 10. Language Designers Statisticians
  11. 11. Language Designers? Statisticians? "The best thing about R is that it was developed by statisticians. The worst thing about R is that... it was developed by statisticians." - Bo Cowgill, Google
  12. 12. Why R, then? Open source; de facto standard (in statistical research); interactive data exploration. “It’s a DSL posing as general purpose language”
  13. 13. Why not R, then? Slow; memory bound; (did I mention it’s a quirky language?); try googling for R...
  14. 14. Why not R, then? Slow; memory bound; (did I mention it’s a quirky language?); try googling for R... ‘If you are using R and you think you’re in hell, this is a map for you.’ - The R Inferno
  15. 15. Apparently, statisticians aren’t designers, either...
  16. 16. VS
  17. 17. R: dynamic (eval), interpreted, functional/OO/procedural. Java: static types, compiled, OO.
  18. 18. R types vs. Java types: factor vs. Enum; numeric vs. Integer/Double/...; character vs. String.
  19. 19. R types vs. Java types: factor vs. Enum; numeric vs. Integer/Double/...; character vs. String. R also has vector, list, dataframe.
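The factor/Enum pairing on this slide can be made concrete from the Java side. The following is only a sketch: the `Sex` enum and the `countLevels` helper are hypothetical names invented for illustration, and counting per level is merely similar in spirit to calling R's `table()` on a factor.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class FactorSketch {
    // Hypothetical categorical variable; an R factor would carry these as its levels.
    enum Sex { FEMALE, MALE }

    // Count observations per level, so that every level is present even with zero hits.
    static Map<Sex, Integer> countLevels(List<Sex> values) {
        Map<Sex, Integer> counts = new EnumMap<>(Sex.class);
        for (Sex level : Sex.values()) {
            counts.put(level, 0);
        }
        for (Sex value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Sex> observed = Arrays.asList(Sex.MALE, Sex.FEMALE, Sex.MALE);
        System.out.println(countLevels(observed)); // prints {FEMALE=1, MALE=2}
    }
}
```

An `EnumMap` keeps its keys in declaration order, which loosely mirrors the fixed ordering of a factor's levels.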
  20. 20. R: 1-based indexing (1 2 3 4). Java: 0-based indexing (0 1 2 3).
  21. 21. R: 1-based indexing (1 2 3 4), higher-order functions, e.g. sapply(vec, function(elm) { elm + 1; }). Java: 0-based indexing (0 1 2 3), for-loops.
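For Java readers, the `sapply` call on this slide maps roughly onto the two styles below. This is a sketch assuming Java 8 or later (the stream API was not yet released when the talk was given); the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SapplySketch {
    // Classic Java style: an explicit loop over 0-based indices.
    static List<Integer> incrementWithLoop(List<Integer> vec) {
        List<Integer> result = new ArrayList<>(vec.size());
        for (int i = 0; i < vec.size(); i++) {
            result.add(vec.get(i) + 1);
        }
        return result;
    }

    // Higher-order style, close to sapply(vec, function(elm) { elm + 1; }) in R.
    static List<Integer> incrementWithMap(List<Integer> vec) {
        return vec.stream()
                  .map(elm -> elm + 1)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> vec = Arrays.asList(1, 2, 3, 4);
        System.out.println(incrementWithLoop(vec)); // prints [2, 3, 4, 5]
        System.out.println(incrementWithMap(vec));  // prints [2, 3, 4, 5]
    }
}
```

Note that the first element is `vec.get(0)` in Java, whereas R would address it as `vec[1]`.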
  22. 22. R: lazy evaluation. Java: eager evaluation.
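R evaluates function arguments lazily, only when the function body actually uses them, while Java evaluates every argument before the call. The sketch below emulates the lazy behaviour in Java with a `Supplier`; the class, the `expensive` helper and the counter are hypothetical and exist only to make the difference observable.

```java
import java.util.function.Supplier;

class LazyArgumentSketch {
    // Counts how often the "expensive" computation actually runs.
    static int evaluations = 0;

    static int expensive() {
        evaluations++;
        return 42;
    }

    // Eager: the caller has already computed 'unused' before this method starts.
    static int eagerFirst(int first, int unused) {
        return first;
    }

    // Lazy emulation: the Supplier is only evaluated if get() is called, which never happens here.
    static int lazyFirst(int first, Supplier<Integer> unused) {
        return first;
    }

    public static void main(String[] args) {
        eagerFirst(1, expensive());
        System.out.println(evaluations); // prints 1: the argument was evaluated although unused

        lazyFirst(1, LazyArgumentSketch::expensive);
        System.out.println(evaluations); // still prints 1: the supplier was never invoked
    }
}
```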
  23. 23. R: lazy evaluation, pass-by-value (copy-on-write). Java: eager evaluation, pass-by-reference. Diagram: call F(A); when Function F modifies Value A, it works on a copy, Value A’, and the caller's Value A is unchanged.
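The difference in argument passing can be shown from the Java side. In the sketch below, `modifyInPlace` shows the usual Java behaviour, where the callee receives a reference to the caller's list, and `modifyCopy` imitates what an R function sees. The names are hypothetical, and the copy here is made eagerly, whereas R defers the copy until the first write.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CopySemanticsSketch {
    // Java style: the change is visible to the caller, because both share one list.
    static void modifyInPlace(List<Integer> values) {
        values.set(0, 99);
    }

    // R-like style: modify a private copy (A') and return it; the caller's list (A) stays untouched.
    static List<Integer> modifyCopy(List<Integer> values) {
        List<Integer> copy = new ArrayList<>(values);
        copy.set(0, 99);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3));

        List<Integer> aPrime = modifyCopy(a);
        System.out.println(a);      // prints [1, 2, 3]
        System.out.println(aPrime); // prints [99, 2, 3]

        modifyInPlace(a);
        System.out.println(a);      // prints [99, 2, 3]
    }
}
```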
  24. 24. RStudio
  25. 25. Central; Comprehensive R Archive Network (CRAN); RStudio
  26. 26. Coding time!
  27. 27. Titanic Competition: Machine Learning from Disaster
  29. 29. Titanic Competition: Machine Learning from Disaster. Decision Tree (diagram): splits on Sex == Female, Age > 50, Age > 16, Fare > 100; branches labelled T/F.
  30. 30. Titanic Competition: Machine Learning from Disaster. Decision Tree (diagram): splits on Sex == Female, Age > 50, Age > 16, Fare > 100; branches labelled T/F. Random Forest (diagram): several such trees side by side.
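To give Java readers a feel for the two models named on these slides: a decision tree is essentially nested conditionals, and a forest lets several trees vote. The sketch below reuses the four split conditions from the slide, but the way they are nested, the leaf outcomes and the two extra trees are assumptions invented purely for illustration. They are not the trees from the talk and not results from the Titanic data. A real random forest would also learn each tree from a random sample of the training data, which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class TitanicTreeSketch {
    // Hypothetical passenger record with only the attributes used in the slide's splits.
    static final class Passenger {
        final boolean female;
        final double age;
        final double fare;

        Passenger(boolean female, double age, double fare) {
            this.female = female;
            this.age = age;
            this.fare = fare;
        }
    }

    // Illustrative tree: the arrangement of splits and the leaf values are made up.
    static boolean treeA(Passenger p) {
        if (p.female) {
            return !(p.age > 50);
        }
        if (p.age > 16) {
            return p.fare > 100;
        }
        return true;
    }

    // Two further made-up single-split trees, so that the forest has something to vote on.
    static boolean treeB(Passenger p) {
        return p.fare > 100;
    }

    static boolean treeC(Passenger p) {
        return p.female;
    }

    static List<Predicate<Passenger>> forest() {
        List<Predicate<Passenger>> trees = new ArrayList<>();
        trees.add(TitanicTreeSketch::treeA);
        trees.add(TitanicTreeSketch::treeB);
        trees.add(TitanicTreeSketch::treeC);
        return trees;
    }

    // Forest prediction: true when a strict majority of the trees answers true.
    static boolean forestPredict(List<Predicate<Passenger>> trees, Passenger p) {
        long yes = trees.stream().filter(tree -> tree.test(p)).count();
        return 2 * yes > trees.size();
    }

    public static void main(String[] args) {
        Passenger p = new Passenger(true, 30.0, 20.0);
        System.out.println(treeA(p));                   // prints true
        System.out.println(forestPredict(forest(), p)); // prints true (2 of 3 trees vote true)
    }
}
```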
  31. 31. Demo time!
  32. 32. ... ...
  33. 33. Agenda: Data Science, The R language, Gimme some Java!
  34. 34. Bridging R and Java: Integrate, Assimilate, Replace
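  35. 35. Integrate: rJava & Java/R interface. Two-way native interface via JNI (libjri), or TCP to Rserve.

      import org.rosuda.JRI.REXP;
      import org.rosuda.JRI.RVector;
      import org.rosuda.JRI.Rengine;

      // start an embedded R engine: no R arguments, no main loop, no callbacks
      Rengine re = new Rengine(new String[] {}, false, null);
      // wait until engine is ready
      if (!re.waitForR()) {
          throw new IllegalStateException("Can't load R engine");
      }
      // load the cars dataset; 'false' skips converting the result to a Java object
      re.eval("data(cars)", false);
      REXP cars = re.eval("cars");
      RVector carsVector = cars.asVector();
      // dissect carsVector...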
  36. 36. Assimilate: reimplementation of R on the JVM. Fast & lean; parallelized; just-another-lib; ... not production ready yet...
  37. 37. Assimilate: reimplementation of R on the JVM. Fast & lean; parallelized; just-another-lib; ... not production ready yet...

      import javax.script.ScriptEngine;
      import javax.script.ScriptEngineManager;

      // create a script engine manager
      ScriptEngineManager factory = new ScriptEngineManager();
      // create an R engine
      ScriptEngine engine = factory.getEngineByName("Renjin");
      // load package from classpath (eval throws the checked javax.script.ScriptException)
      engine.eval("library(survey)");
      // evaluate R code from String
      engine.eval("print('Hello from R')");
  38. 38. Assimilate: reimplementation of R on the JVM. Share data:

      Integer[] data = {1, 2, 3};
      engine.put("data", data);
      engine.eval("print(sum(data))");
  39. 39. Assimilate: reimplementation of R on the JVM. Share data:

      Integer[] data = {1, 2, 3};
      engine.put("data", data);
      engine.eval("print(sum(data))");

      Use Java from Renjin:

      import(com.foo.User)
      # instantiate Java beans
      tim <- User$new(name='Tim', age=23)
      tom <- User$new(name='Tom', age=45)
      # invoke setter
      tim$name <- "Timmy"
  40. 40. Big Data?
  41. 41. Replace: JVM libraries/platforms
  42. 42. Replace: scalable R distributions (non-JVM): Revolution Analytics, Oracle R Enterprise
  43. 43. Wrap-up: Data Science, The R language, Gimme some Java!
  44. 44. Sanitize, Explore, Model, Predict, Scale
  45. 45. Next steps: Computing for Data Analysis (starts Sept. 23rd); Install R; Read
  46. 46. Questions? Data Science, The R language, Gimme some Java! @Sander_Mak branchandbound.net
