Data Science with R for Java Developers

Published in: Technology, Education

Transcript

  • 1. Data Science With ~R~ for Java Developers. @Sander_Mak
  • 2. Agenda: Data Science; The R language; Gimme some Java!
  • 3. "90% of the world’s data was produced in the last 2 years" (SINTEF/ScienceDaily, June 2013). We need more than just CRUD!
  • 4. Stand back. I know Data Science!
  • 5. (Venn diagram labels) Hacking Skills; Math & Statistics; Machine Learning; Data Science; Danger! Perl ahead!; Operations Research; Domain Expertise
  • 6. (Venn diagram labels) Hacking Skills; Math & Statistics; Machine Learning; Data Science; Danger! Perl ahead!; Operations Research; Domain Expertise
  • 7. Data Science: Achievement Unlocked
  • 8. Data Science: Achievement Unlocked. Today: R, RStudio
  • 9. Agenda: Data Science; The R language; Gimme some Java!
  • 10. Language Designers? Statisticians?
  • 11. Language Designers? Statisticians? "The best thing about R is that it was developed by statisticians. The worst thing about R is that... it was developed by statisticians." (Bo Cowgill, Google)
  • 12. Why R, then? De-facto standard (in statistical research); Open Source; Interactive data exploration. “It’s a DSL posing as general purpose language”
  • 13. Why not R, then? Slow; Memory bound; Try googling for R... (Did I mention it’s a quirky language?)
  • 14. Why not R, then? Slow; Memory bound; Try googling for R... (Did I mention it’s a quirky language?) ‘If you are using R and you think you’re in hell, this is a map for you.’ (The R Inferno)
  • 15. Apparently, statisticians aren’t designers, either...
  • 16. VS
  • 17. R: Functional/OO/Procedural, Dynamic (eval), Interpreted. Java: OO, Static types, Compiled.
  • 18. R: numeric, character, Factor. Java: Integer/Double/..., String, Enum.
  • 19. R: vector, list, dataframe; numeric, character, Factor. Java: Integer/Double/..., String, Enum.
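
Slides 18 and 19 pair R's Factor with Java's Enum. The following is a minimal sketch of that pairing, not code from the talk: the `Sex` enum, the `FactorSketch` class and the `countLevels` helper are hypothetical names chosen for illustration. It counts values per level, roughly the way R's `table()` summarizes a factor.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class FactorSketch {
    // Hypothetical enum standing in for an R factor with two levels.
    enum Sex { FEMALE, MALE }

    // Counts occurrences per level. Every level gets an entry, even when
    // its count is zero, which mirrors how a factor keeps unused levels.
    static Map<Sex, Integer> countLevels(List<String> rawValues) {
        Map<Sex, Integer> counts = new EnumMap<>(Sex.class);
        for (Sex level : Sex.values()) {
            counts.put(level, 0);
        }
        for (String raw : rawValues) {
            // valueOf throws IllegalArgumentException for an unknown level
            Sex level = Sex.valueOf(raw.toUpperCase(Locale.ROOT));
            counts.merge(level, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countLevels(Arrays.asList("female", "male", "female")));
    }
}
```

One difference worth noting: the Java enum is fixed at compile time, whereas the levels of an R factor are derived from the data at run time.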
  • 20. R: 1-based indexing (1 2 3 4). Java: 0-based indexing (0 1 2 3).
  • 21. R: 1-based indexing (1 2 3 4); higher-order functions: sapply(vec, function(elm) { elm + 1; }). Java: 0-based indexing (0 1 2 3); for-loops.
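
Slide 21 sets R's sapply call against Java's for-loops. As a rough Java-side illustration (a sketch, not code from the talk; `SapplySketch` and its method names are made up), the same "add one to every element" operation can be written as a classic 0-based loop or, assuming Java 8 or later, as a stream `map`, which is the closest counterpart to the higher-order style.

```java
import java.util.Arrays;

final class SapplySketch {
    // Classic Java for-loop over a 0-based array.
    static int[] addOneLoop(int[] vec) {
        int[] out = new int[vec.length];
        for (int i = 0; i < vec.length; i++) { // first element is vec[0]; in R it is vec[1]
            out[i] = vec[i] + 1;
        }
        return out;
    }

    // Java 8+ counterpart of sapply(vec, function(elm) { elm + 1; })
    static int[] addOneStream(int[] vec) {
        return Arrays.stream(vec).map(elm -> elm + 1).toArray();
    }

    public static void main(String[] args) {
        int[] vec = {1, 2, 3, 4};
        System.out.println(Arrays.toString(addOneLoop(vec)));   // prints [2, 3, 4, 5]
        System.out.println(Arrays.toString(addOneStream(vec))); // prints [2, 3, 4, 5]
    }
}
```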
  • 22. RStudio
  • 23. RStudio; Comprehensive R Archive Network (CRAN); Central
  • 24. Coding time!
  • 25. Titanic Competition: Machine Learning from Disaster
  • 26. Titanic Competition: Machine Learning from Disaster. Survived?
  • 27. Titanic Competition: Machine Learning from Disaster. Decision Tree (diagram; splits: Sex == Female, Age > 16, Age > 50, Fare > 100; leaves: T/F)
  • 28. Titanic Competition: Machine Learning from Disaster. Decision Tree vs. Random Forest (diagram; several trees with splits such as Sex == Female, Age > 16, Age > 50, Fare > 100; leaves: T/F)
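
The tree diagrams on slides 27 and 28 did not survive in the transcript, so the following is only a sketch of the idea, under loud assumptions: the split conditions are the ones named on the slides, but their arrangement, the leaf labels, the `Passenger` class and the three stand-in trees are all hypothetical. A decision tree is a chain of nested conditions. A random forest combines many trees by majority vote. In a real random forest each tree is learned from a random sample of the training data; here the trees are written by hand purely to show the voting step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class TitanicTreesSketch {
    // Hypothetical passenger record; the field names are illustrative.
    static final class Passenger {
        final boolean female;
        final double age;
        final double fare;

        Passenger(boolean female, double age, double fare) {
            this.female = female;
            this.age = age;
            this.fare = fare;
        }
    }

    // One hand-written tree. The conditions come from the slide, but
    // their arrangement and the leaf labels are made up for this sketch.
    static boolean singleTree(Passenger p) {
        if (p.female) {
            if (p.age > 50) {
                return p.fare > 100;
            }
            return true;
        }
        return !(p.age > 16);
    }

    // Majority vote: predicts "survived" when more than half the trees say so.
    static boolean forestVote(List<Predicate<Passenger>> trees, Passenger p) {
        int votesForSurvived = 0;
        for (Predicate<Passenger> tree : trees) {
            if (tree.test(p)) {
                votesForSurvived++;
            }
        }
        return 2 * votesForSurvived > trees.size();
    }

    // Three hypothetical stand-in trees; a real forest would learn these.
    static List<Predicate<Passenger>> exampleForest() {
        List<Predicate<Passenger>> trees = new ArrayList<>();
        trees.add(TitanicTreesSketch::singleTree);
        trees.add(p -> p.female);
        trees.add(p -> p.fare > 100);
        return trees;
    }

    public static void main(String[] args) {
        Passenger p = new Passenger(true, 30, 20);
        System.out.println("single tree: " + singleTree(p));
        System.out.println("forest vote: " + forestVote(exampleForest(), p));
    }
}
```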
  • 29. Demo time!
  • 30. . . . . . .
  • 31. Agenda: Data Science; The R language; Gimme some Java!
  • 32. Bridging R and Java: Integrate, Assimilate, Replace
  • 33. Integrate: rJava & Java/R interface. Two-way native interface: JNI (libjri), or TCP to Rserve.
      import org.rosuda.JRI.REXP;
      import org.rosuda.JRI.RVector;
      import org.rosuda.JRI.Rengine;

      Rengine re = new Rengine(new String[] {}, false, null);
      // wait until engine is ready
      if (!re.waitForR()) {
        throw new IllegalStateException("Can't load R engine");
      }
      // load the built-in cars dataset, then fetch it as a Java object
      re.eval("data(cars)", false);
      REXP cars = re.eval("cars");
      RVector carsVector = cars.asVector();
      // dissect carsVector...
  • 34. Assimilate: Reimplementation of R on JVM; Fast & lean; Parallelized; Just-another-lib; ... not production ready yet...
  • 35. Assimilate: Reimplementation of R on JVM; Fast & lean; Parallelized; Just-another-lib; ... not production ready yet...
      import javax.script.ScriptEngine;
      import javax.script.ScriptEngineManager;

      // create a script engine manager
      ScriptEngineManager factory = new ScriptEngineManager();
      // create an R engine
      ScriptEngine engine = factory.getEngineByName("Renjin");
      // load package from classpath
      engine.eval("library(survey)");
      // evaluate R code from String
      engine.eval("print('Hello from R')");
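
One practical detail about the snippet on slide 35: `ScriptEngineManager.getEngineByName` returns null when no engine with that name is registered, so calling `eval` directly fails with a NullPointerException if the Renjin jar is missing from the classpath. A minimal defensive sketch, using only the standard `javax.script` API (the `EngineLookupSketch` class and `findEngine` helper are hypothetical names), could look like this:

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

final class EngineLookupSketch {
    // Wraps the lookup so a missing engine becomes an empty Optional
    // instead of a null reference.
    static Optional<ScriptEngine> findEngine(String name) {
        return Optional.ofNullable(new ScriptEngineManager().getEngineByName(name));
    }

    public static void main(String[] args) throws Exception {
        Optional<ScriptEngine> renjin = findEngine("Renjin");
        if (renjin.isPresent()) {
            // Only reached when a Renjin script engine is on the classpath.
            renjin.get().eval("print('Hello from R')");
        } else {
            System.out.println("No script engine named Renjin found on the classpath");
        }
    }
}
```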
  • 36. Big Data?
  • 37. Replace: JVM libraries/platforms
  • 38. Replace: Scalable R distributions (non-JVM): Revolution Analytics, Oracle R Enterprise
  • 39. Wrap-up: Data Science; The R language; Gimme some Java!
  • 40. Sanitize, Explore, Model, Predict, Scale
  • 41. Next steps: Install R; Read; Computing for Data Analysis starts Jan. 6th 2014
  • 42. Questions? Data Science; The R language; Gimme some Java! @Sander_Mak, branchandbound.net
