- 1. Data Science with R for Java Developers, @Sander_Mak
- 2. Agenda: Data Science, The R language, Gimme some Java!
- 3. 90% of the world’s data was produced in the last 2 years (SINTEF/ScienceDaily, June 2013). We need more than just CRUD.
- 4. Stand back. I know Data Science!
- 5. Venn diagram: Hacking Skills, Math & Statistics, Domain Expertise; overlaps labelled Machine Learning, Data Science, “Danger! Perl ahead!” and Operations Research.
- 7. Data Science: Achievement Unlocked
- 8. Data Science: Achievement Unlocked. Today: R, R-Studio
- 9. Agenda: Data Science, The R language, Gimme some Java!
- 10. Language Designers? Statisticians?
- 11. Language Designers? Statisticians? “The best thing about R is that it was developed by statisticians. The worst thing about R is that... it was developed by statisticians.” (Bo Cowgill, Google)
- 12. Why R, then? De facto standard (in statistical research); open source; interactive data exploration. “It’s a DSL posing as a general-purpose language.”
- 13. Why not R, then? Slow; memory bound; try googling for R... (Did I mention it’s a quirky language?)
- 14. Why not R, then? Slow; memory bound; try googling for R... (Did I mention it’s a quirky language?) “If you are using R and you think you’re in hell, this is a map for you.” (The R Inferno)
- 15. Apparently, statisticians aren’t designers, either...
- 16. R vs. Java
- 17. R: functional/OO/procedural, dynamic (eval), interpreted. Java: OO, static types, compiled.
- 18. R vs. Java types: numeric vs. Integer/Double/...; character vs. String; Factor vs. Enum.
- 19. R data structures: vector, list, dataframe. R vs. Java types: numeric vs. Integer/Double/...; character vs. String; Factor vs. Enum.
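As a rough illustration of the type pairs on this slide, here is a minimal Java sketch. The names `TypeMappingDemo`, `Passenger`, `Sex` and `ageColumn`, as well as the sample values, are made up for illustration. Modelling a data frame as a list of row objects is a simplifying assumption, since R stores data frames column by column. The stream call assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: all names and sample values here are hypothetical.
class TypeMappingDemo {

    // R factor (a fixed set of levels) -> Java enum
    enum Sex { FEMALE, MALE }

    // One data frame row modelled as a small immutable class.
    static final class Passenger {
        final String name; // R character -> String
        final double age;  // R numeric   -> double
        final Sex sex;     // R factor    -> enum

        Passenger(String name, double age, Sex sex) {
            this.name = name;
            this.age = age;
            this.sex = sex;
        }
    }

    // Rough analogue of df$age in R: extract one column as a numeric vector.
    static double[] ageColumn(List<Passenger> rows) {
        return rows.stream().mapToDouble(p -> p.age).toArray();
    }

    public static void main(String[] args) {
        List<Passenger> rows = Arrays.asList(
                new Passenger("A", 29.0, Sex.FEMALE),
                new Passenger("B", 41.5, Sex.MALE));
        System.out.println(Arrays.toString(ageColumn(rows))); // [29.0, 41.5]
    }
}
```

The enum gives compile-time checking of the allowed levels, which is roughly what a factor offers at runtime in R.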
- 20. R indexing is 1-based (1 2 3 4); Java indexing is 0-based (0 1 2 3).
- 21. R: higher-order functions, e.g. sapply(vec, function(elm) { elm + 1; }); 1-based indexing (1 2 3 4). Java: for-loops; 0-based indexing (0 1 2 3).
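To make the comparison concrete, here is a small sketch of how the sapply call from this slide might look on the Java side, once as a classic for-loop and once in a higher-order style. The class and method names (`IncrementDemo`, `incrementWithLoop`, `incrementWithStream`) are hypothetical, and the stream variant assumes Java 8 or later.

```java
import java.util.Arrays;

// Java counterparts of the R call sapply(vec, function(elm) { elm + 1; }).
class IncrementDemo {

    // Classic for-loop; note the 0-based index (R vectors start at 1).
    static int[] incrementWithLoop(int[] vec) {
        int[] result = new int[vec.length];
        for (int i = 0; i < vec.length; i++) {
            result[i] = vec[i] + 1;
        }
        return result;
    }

    // Higher-order style with a lambda, closer in spirit to sapply.
    static int[] incrementWithStream(int[] vec) {
        return Arrays.stream(vec).map(elm -> elm + 1).toArray();
    }

    public static void main(String[] args) {
        int[] vec = {1, 2, 3, 4};
        System.out.println(Arrays.toString(incrementWithLoop(vec)));   // [2, 3, 4, 5]
        System.out.println(Arrays.toString(incrementWithStream(vec))); // [2, 3, 4, 5]
        System.out.println(vec[0]); // the first element lives at index 0: prints 1
    }
}
```

Both variants leave the input array untouched and return a new one, which matches how sapply behaves in R.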
- 22. RStudio
- 23. RStudio; Comprehensive R Archive Network (CRAN) vs. Maven Central
- 24. Coding time!
- 25. Titanic Competition: Machine Learning from Disaster
- 26. Titanic Competition: Machine Learning from Disaster Survived?
- 27. Titanic Competition: Machine Learning from Disaster. Decision Tree (diagram): splits on Sex == Female, Age > 16, Age > 50, Fare > 100, ending in T/F leaves.
- 28. Titanic Competition: Machine Learning from Disaster. Decision Tree, Random Forest (diagram): several decision trees with splits on Sex == Female, Age > 16, Age > 50, Fare > 100, each ending in T/F leaves.
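The step from a single decision tree to a random forest can be sketched in a few lines of Java. This is a minimal illustration, not the model from the demo: the split conditions are taken from the slide, but the tree shapes, the leaf outcomes and all names (`ForestDemo`, `Passenger`, `treeA`, `treeB`, `treeC`, `forestPredict`) are assumptions. A real random forest also trains each tree on a random sample of rows and features; only the majority vote is shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: tree shapes and leaf outcomes are assumptions, not the slide's model.
class ForestDemo {

    static final class Passenger {
        final boolean female;
        final double age;
        final double fare;

        Passenger(boolean female, double age, double fare) {
            this.female = female;
            this.age = age;
            this.fare = fare;
        }
    }

    // A decision tree is a cascade of tests that ends in a true/false leaf.
    static boolean treeA(Passenger p) {
        if (p.female) {
            return p.age <= 50 || p.fare > 100;
        }
        return p.age <= 16;
    }

    static boolean treeB(Passenger p) {
        if (p.fare > 100) {
            return true;
        }
        return p.female && p.age > 16;
    }

    static boolean treeC(Passenger p) {
        if (p.age <= 16) {
            return true;
        }
        return p.female;
    }

    static final List<Predicate<Passenger>> TREES =
            Arrays.asList(ForestDemo::treeA, ForestDemo::treeB, ForestDemo::treeC);

    // Forest prediction: every tree votes, and a strict majority decides.
    static boolean forestPredict(Passenger p) {
        long votes = TREES.stream().filter(tree -> tree.test(p)).count();
        return votes * 2 > TREES.size();
    }

    public static void main(String[] args) {
        System.out.println(forestPredict(new Passenger(true, 30, 20)));   // true
        System.out.println(forestPredict(new Passenger(false, 40, 150))); // false
    }
}
```

In the second call only `treeB` votes true, so the single dissenting tree is outvoted; this is the sense in which a forest smooths over the quirks of individual trees.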
- 29. Demo time!
- 31. Agenda: Data Science, The R language, Gimme some Java!
- 32. Bridging R and Java: Integrate, Assimilate, Replace
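- 33. Integrate: rJava & Java/R Interface (JRI). Two-way native interface via JNI (libjri), or TCP to Rserve.

      import org.rosuda.JRI.REXP;
      import org.rosuda.JRI.RVector;
      import org.rosuda.JRI.Rengine;

      // start an embedded R engine without running R's main loop
      Rengine re = new Rengine(new String[] {}, false, null);
      // wait until engine is ready
      if (!re.waitForR()) {
          throw new IllegalStateException("Can't load R engine");
      }
      // load the cars data set without converting the result to a Java object
      re.eval("data(cars)", false);
      REXP cars = re.eval("cars");
      RVector carsVector = cars.asVector();
      // dissect carsVector...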
- 34. Assimilate: reimplementation of R on the JVM; fast & lean; parallelized; just another lib ... not production ready yet...
- 35. Assimilate: reimplementation of R on the JVM; fast & lean; parallelized; just another lib ... not production ready yet...

      import javax.script.ScriptEngine;
      import javax.script.ScriptEngineManager;

      // create a script engine manager
      ScriptEngineManager factory = new ScriptEngineManager();
      // create an R engine
      ScriptEngine engine = factory.getEngineByName("Renjin");
      // load package from classpath
      engine.eval("library(survey)");
      // evaluate R code from String
      engine.eval("print('Hello from R')");
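One detail worth guarding against with the script engine route: `ScriptEngineManager.getEngineByName` returns `null` when no engine with the given name is registered, for example when the Renjin jar is missing from the classpath. A minimal sketch of a defensive lookup follows; the names `ScriptEngineLookup` and `findEngine` are hypothetical, and only the standard `javax.script` API is used.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Illustrative helper: wraps the null-returning lookup in an Optional.
class ScriptEngineLookup {

    // Returns the named JSR-223 engine, or empty if none is registered.
    static Optional<ScriptEngine> findEngine(String name) {
        return Optional.ofNullable(new ScriptEngineManager().getEngineByName(name));
    }

    public static void main(String[] args) throws ScriptException {
        Optional<ScriptEngine> engine = findEngine("Renjin");
        if (engine.isPresent()) {
            // evaluate R code from a String, as on the slide
            engine.get().eval("print('Hello from R')");
        } else {
            System.out.println("No Renjin engine found on the classpath");
        }
    }
}
```

Failing early with a clear message is usually easier to debug than a `NullPointerException` on the first `eval` call.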
- 36. Big Data?
- 37. Replace: JVM libraries/platforms
- 38. Replace: scalable R distributions (non-JVM): Revolution Analytics, Oracle R Enterprise
- 39. Wrap-up: Data Science, The R language, Gimme some Java!
- 40. Sanitize, Explore, Model, Predict, Scale
- 41. Next steps: Install R; Read; Computing for Data Analysis (starts Jan. 6th 2014)
- 42. Questions? Data Science, The R language, Gimme some Java! @Sander_Mak, branchandbound.net

