Data Science with R for Java developers

As presented at JavaOne 2013

Published in: Technology, Education
  • Hi Sander, it's a nice and thought-provoking presentation. I'm in the process of hiring entry-level Java developers for a Hadoop setup. Should I hire R programmers for analytics, or Java developers instead? Please advise on how to go forward.

    Thank you, and I look forward to hearing from you soon.

Data Science with R for Java developers

  1. 1. Data Science With R ~ for ~ Java Developers @Sander_Mak
  2. 2. Agenda: Data Science; The R language; Gimme some Java!
  3. 3. 90% of the world’s data was produced in the last 2 years - SINTEF/ScienceDaily, June 2013. We need more than just CRUD
  4. 4. Stand back. I know Data Science!
  5. 5. Software Engineering Domain Expertise Math & Statistics Data Science Machine Learning Operations Research Danger! Perl ahead!
  6. 6. Software Engineering Domain Expertise Math & Statistics Data Science Machine Learning Operations Research Danger! Perl ahead!
  7. 7. Data Science: Achievement Unlocked
  8. 8. R, R-Studio Today Data Science: Achievement Unlocked
  9. 9. Agenda: Data Science; The R language; Gimme some Java!
  10. 10. Language Designers Statisticians
  11. 11. Language Designers? Statisticians? The best thing about R is that it was developed by statisticians. The worst thing about R is that... it was developed by statisticians. - Bo Cowgill, Google
  12. 12. Why R, then? Open Source De-facto standard (in statistical research) “It’s a DSL posing as general purpose language” Interactive data exploration
  13. 13. Why not R, then? Slow Memory Bound (Did I mention it’s a quirky language?) Try googling for R...
  14. 14. Why not R, then? ‘If you are using R and you think you’re in hell, this is a map for you.’ - The R Inferno Slow Memory Bound (Did I mention it’s a quirky language?) Try googling for R...
  15. 15. Apparently, statisticians aren’t designers, either...
  16. 16. VS
  17. 17. R: Dynamic (eval), Interpreted, Functional/OO/Procedural vs. Java: Static types, Compiled, OO
  18. 18. R: Factor, numeric, character vs. Java: Enum, Integer/Double/..., String
  19. 19. R: Factor, numeric, character, vector, list, dataframe vs. Java: Enum, Integer/Double/..., String
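
For the Factor/Enum pairing on slides 18 and 19, a rough Java-side sketch may help. It assumes a hypothetical `PassengerClass` enum (not from the slides) and counts observations per level, which is roughly what R's `table()` reports for a factor. It is an illustration of the analogy only, not a full equivalent of R factors.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class FactorSketch {
    // Hypothetical categorical variable; in R this would be a factor with three levels.
    enum PassengerClass { FIRST, SECOND, THIRD }

    // Counts observations per level, roughly like table() on an R factor.
    static Map<PassengerClass, Integer> countLevels(List<PassengerClass> values) {
        Map<PassengerClass, Integer> counts = new EnumMap<>(PassengerClass.class);
        for (PassengerClass level : PassengerClass.values()) {
            counts.put(level, 0); // every level is listed, even with zero observations
        }
        for (PassengerClass value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<PassengerClass> observed = Arrays.asList(
                PassengerClass.THIRD, PassengerClass.FIRST, PassengerClass.THIRD);
        System.out.println(countLevels(observed));
    }
}
```

The fixed set of enum constants plays the role of the factor's levels: a value outside the set cannot be represented at all.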
  20. 20. R: 1-based indexing (1 2 3 4) vs. Java: 0-based indexing (0 1 2 3)
  21. 21. R: 1-based indexing (1 2 3 4) vs. Java: 0-based indexing (0 1 2 3). R: higher-order functions, e.g. sapply(vec, function(elm) { elm + 1; }) vs. Java: for-loops
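
Slide 21 sets Java for-loops against R's higher-order functions. As a sketch of the same "add one to every element" operation from the Java side, the class below (all names are illustrative) shows an index-based loop next to a stream-based `map`, which is the closest standard-library analogue to `sapply` that I would reach for.

```java
import java.util.Arrays;

class IncrementSketch {
    // Classic Java style: an index-based loop, with indices starting at 0.
    static int[] incrementWithLoop(int[] vec) {
        int[] result = new int[vec.length];
        for (int i = 0; i < vec.length; i++) {
            result[i] = vec[i] + 1;
        }
        return result;
    }

    // Higher-order style, closer in spirit to sapply(vec, function(elm) elm + 1).
    static int[] incrementWithMap(int[] vec) {
        return Arrays.stream(vec).map(elm -> elm + 1).toArray();
    }

    public static void main(String[] args) {
        int[] vec = {1, 2, 3, 4};
        System.out.println(Arrays.toString(incrementWithLoop(vec)));
        System.out.println(Arrays.toString(incrementWithMap(vec)));
    }
}
```

Both methods leave the input array untouched and return a new one, so they can be swapped freely.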
  22. 22. R: lazy evaluation vs. Java: eager evaluation
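
To make the lazy versus eager contrast on slide 22 concrete from the Java side: Java evaluates method arguments before the call, whereas R only evaluates a function argument when the function body actually uses it. The sketch below emulates the lazy behaviour with a `Supplier`. The class, the counter, and the method names are made up for illustration.

```java
import java.util.function.Supplier;

class LazySketch {
    // Counts how often the "expensive" computation actually runs.
    static int evaluations = 0;

    static int expensive() {
        evaluations++;
        return 42;
    }

    // Eager: the caller has already computed the argument, although it is never used here.
    static String eager(int unused) {
        return "done";
    }

    // Lazy (emulated): the supplier would only run if the body called get(), which it does not.
    static String lazy(Supplier<Integer> unused) {
        return "done";
    }

    public static void main(String[] args) {
        eager(expensive());
        System.out.println("after eager call: " + evaluations);
        lazy(() -> expensive());
        System.out.println("after lazy call: " + evaluations);
    }
}
```

The counter increases only for the eager call, because the lambda passed to `lazy` is never invoked.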
  23. 23. R: lazy evaluation vs. Java: eager evaluation. R: pass-by-value (copy-on-write) vs. Java: pass-by-reference. [Diagram: call F(A); Function F sees Value A; on modify, a copy Value A’ is created]
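
Slide 23's argument-passing contrast can be sketched in Java as follows. Strictly speaking Java passes object references by value, so "pass-by-reference" here describes the observable effect: a method can modify the caller's object. The second method emulates R's value semantics by copying before it modifies. The copy is made eagerly here, not on first write as in R, and all names are illustrative.

```java
import java.util.Arrays;

class ArgumentSketch {
    // The method receives a reference to the caller's array, so this write is visible outside.
    static void modifyInPlace(int[] a) {
        a[0] = 99;
    }

    // Emulates R-style value semantics: work on a copy, leave the caller's array untouched.
    static int[] modifyCopy(int[] a) {
        int[] copy = Arrays.copyOf(a, a.length);
        copy[0] = 99;
        return copy;
    }

    public static void main(String[] args) {
        int[] original = {1, 2, 3};
        int[] changed = modifyCopy(original);
        System.out.println(Arrays.toString(original) + " / " + Arrays.toString(changed));
        modifyInPlace(original);
        System.out.println(Arrays.toString(original));
    }
}
```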
  24. 24. Studio
  25. 25. Central; Comprehensive R Archive Network (CRAN); Studio
  26. 26. Coding time!
  27. 27. Titanic Competition: Machine Learning from Disaster
  28. 28. Titanic Competition: Machine Learning from Disaster
  29. 29. Titanic Competition: Machine Learning from Disaster. [Diagram: Decision Tree with splits Sex == Female, Age > 50, Age > 16, Fare > 100 and T/F leaves]
  30. 30. Titanic Competition: Machine Learning from Disaster. [Diagram: Decision Tree with splits Sex == Female, Age > 50, Age > 16, Fare > 100; Random Forest shown as multiple trees with T/F leaves]
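
The decision tree and random forest on slides 29 and 30 can be pictured in code as nested conditions plus a majority vote. The sketch below is purely illustrative: the `Passenger` fields, the three hand-written trees, and which branch leads to which outcome are assumptions, since the transcript does not preserve the shape of the tree. A real random forest also learns its trees from random samples of the data and of the features, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class ForestSketch {
    // Hypothetical passenger record with the three attributes named on the slide.
    static final class Passenger {
        final boolean female;
        final double age;
        final double fare;

        Passenger(boolean female, double age, double fare) {
            this.female = female;
            this.age = age;
            this.fare = fare;
        }
    }

    // Hypothetical tree A: split on sex first, then on age (thresholds taken from the slide).
    static boolean treeA(Passenger p) {
        if (p.female) {
            return p.age <= 50;
        }
        return p.age <= 16;
    }

    // Hypothetical tree B: split on fare first, then on sex.
    static boolean treeB(Passenger p) {
        return p.fare > 100 || p.female;
    }

    // Hypothetical tree C: a single split on age.
    static boolean treeC(Passenger p) {
        return p.age <= 16;
    }

    static List<Predicate<Passenger>> hypotheticalTrees() {
        List<Predicate<Passenger>> trees = new ArrayList<>();
        trees.add(ForestSketch::treeA);
        trees.add(ForestSketch::treeB);
        trees.add(ForestSketch::treeC);
        return trees;
    }

    // Forest prediction: every tree votes, and a strict majority of "true" votes wins.
    static boolean forestPredict(List<Predicate<Passenger>> trees, Passenger p) {
        long votes = trees.stream().filter(tree -> tree.test(p)).count();
        return votes * 2 > trees.size();
    }

    public static void main(String[] args) {
        Passenger passenger = new Passenger(true, 30, 50);
        System.out.println(forestPredict(hypotheticalTrees(), passenger));
    }
}
```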
  31. 31. Demo time!
  32. 32. ... ...
  33. 33. Agenda: Data Science; The R language; Gimme some Java!
  34. 34. Bridging R and Java Integrate Assimilate Replace
  35. 35. rJava & Java/R interface (Integrate). Two-way native interface: JNI (libjri), or TCP to Rserve.
          import org.rosuda.JRI.REXP;
          import org.rosuda.JRI.RVector;
          import org.rosuda.JRI.Rengine;

          Rengine re = new Rengine(new String[] {}, false, null);
          // wait until engine is ready
          if (!re.waitForR()) {
              throw new IllegalStateException("Can't load R engine");
          }
          // load the built-in cars dataset, then fetch it
          re.eval("data(cars)", false);
          REXP cars = re.eval("cars");
          RVector carsVector = cars.asVector();
          // dissect carsVector...
  36. 36. Assimilate Reimplementation of R on JVM Fast & lean Parallelized Just-another-lib ... not production ready yet...
  37. 37. Assimilate. Reimplementation of R on JVM; Fast & lean; Parallelized; Just-another-lib; ... not production ready yet...
          import javax.script.ScriptEngine;
          import javax.script.ScriptEngineManager;

          // create a script engine manager
          ScriptEngineManager factory = new ScriptEngineManager();
          // create an R engine
          ScriptEngine engine = factory.getEngineByName("Renjin");
          // load package from classpath (eval throws ScriptException on failure)
          engine.eval("library(survey)");
          // evaluate R code from String
          engine.eval("print('Hello from R')");
  38. 38. Assimilate. Reimplementation of R on JVM. Share data:
          Integer[] data = {1, 2, 3};
          engine.put("data", data);
          engine.eval("print(sum(data))");
  39. 39. Assimilate. Reimplementation of R on JVM. Share data:
          Integer[] data = {1, 2, 3};
          engine.put("data", data);
          engine.eval("print(sum(data))");
      Use Java from Renjin:
          import(com.foo.User)
          # instantiate Java beans
          tim <- User$new(name='Tim', age=23)
          tom <- User$new(name='Tom', age=45)
          # invoke setter
          tim$name <- "Timmy"
  40. 40. Big Data?
  41. 41. Replace JVM Libraries/platforms
  42. 42. Replace Scalable R distributions (non-JVM) Revolution Analytics Oracle Enterprise R
  43. 43. Wrap-up: Data Science; The R language; Gimme some Java!
  44. 44. Sanitize Explore Model Predict Scale
  45. 45. Next steps Computing for Data Analysis starts Sept. 23rd Install R Read
  46. 46. Questions? Data Science; The R language; Gimme some Java! @Sander_Mak branchandbound.net
