Data Science with R for Java developers

Data Science With R
~ for ~
Java Developers
@Sander_Mak
Agenda
Data Science
The R language
Gimme some Java!
90% of the world’s data was
produced in the last 2 years
- SINTEF/ScienceDaily, June 2013
We need more than
just CRUD
Stand back.
I know Data Science!
[Venn diagram of Software Engineering, Domain Expertise, and Math & Statistics. Overlaps: Machine Learning, Operations Research, and "Danger! Perl ahead!". Center: Data Science.]
Data Science:
Achievement Unlocked
R, R-Studio
Today
Agenda
Data Science
The R language
Gimme some Java!
Language designers?
Statisticians?
The best thing about R is that it was developed by statisticians. The
worst thing about R is that... it was developed by statisticians.
- Bo Cowgill, Google
Why R, then?
Open Source
De-facto standard (in statistical research)
“It’s a DSL posing as general purpose language”
Interactive data exploration
Why not R, then?
Slow
Memory Bound
(Did I mention it’s a quirky language?)
Try googling for R...
‘If you are using R and you think
you’re in hell, this is a map for you.’
- The R Inferno
Apparently, statisticians aren’t designers, either...
R                           vs  Java
Dynamic (eval)                  Static types
Interpreted                     Compiled
Functional/OO/Procedural        OO
Factor                          Enum
numeric                         Integer/Double/...
character                       String
vector, list, dataframe
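For Java developers, the Factor/Enum pairing is probably the least familiar one in this comparison. A minimal Java sketch of the analogy follows. The `Sex` enum and the `levelCodes` helper are hypothetical names, not from the slides. An R factor stores each observation as an integer code into a fixed set of levels, much like an enum constant's ordinal, except that R's codes start at 1.

```java
import java.util.Arrays;

// Hypothetical example: an enum playing the role of an R factor.
class FactorSketch {
    // The enum constants correspond to the factor's levels.
    enum Sex { FEMALE, MALE }

    // Mimics what as.integer() returns for a factor: 1-based level codes.
    static int[] levelCodes(Sex[] observations) {
        int[] codes = new int[observations.length];
        for (int i = 0; i < observations.length; i++) {
            codes[i] = observations[i].ordinal() + 1; // ordinal() is 0-based
        }
        return codes;
    }

    public static void main(String[] args) {
        Sex[] observations = { Sex.MALE, Sex.FEMALE, Sex.FEMALE };
        System.out.println(Arrays.toString(levelCodes(observations))); // prints [2, 1, 1]
    }
}
```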
1-based: 1, 2, 3, 4
0-based: 0, 1, 2, 3
for-loops
higher-order functions
sapply(vec, function(elm) {
  elm + 1  # add one to each element; the last expression is the return value
})
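For comparison, here is a rough Java counterpart of the `sapply` call, written once as a for-loop and once with Java 8 streams. This is only a sketch: the class and method names are made up for illustration, and it assumes an `int[]` input where R would accept any vector.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class SapplySketch {
    // Classic Java: an explicit for-loop over the indices.
    static int[] addOneLoop(int[] vec) {
        int[] result = new int[vec.length];
        for (int i = 0; i < vec.length; i++) {
            result[i] = vec[i] + 1;
        }
        return result;
    }

    // Higher-order style, closer to sapply(vec, function(elm) elm + 1).
    static int[] addOneStream(int[] vec) {
        return IntStream.of(vec).map(elm -> elm + 1).toArray();
    }

    public static void main(String[] args) {
        int[] vec = { 1, 2, 3 };
        System.out.println(Arrays.toString(addOneLoop(vec)));   // prints [2, 3, 4]
        System.out.println(Arrays.toString(addOneStream(vec))); // prints [2, 3, 4]
    }
}
```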
eager evaluation
lazy evaluation
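The difference between eager and lazy evaluation can be illustrated from the Java side. Java evaluates every argument before the call. R delays evaluating an argument until the function body actually uses it. The sketch below emulates the lazy behaviour with a `Supplier`. All names are hypothetical. Unlike an R promise, the `Supplier` does not cache its value once computed.

```java
import java.util.function.Supplier;

class LazySketch {
    // Counts how often the "expensive" computation actually runs.
    static int evaluations = 0;

    static int expensive() {
        evaluations++;
        return 42;
    }

    // Eager: the caller has already computed 'value', even if it goes unused.
    static int eagerPick(boolean use, int value) {
        return use ? value : 0;
    }

    // Lazy (emulated): 'value' is only computed when the body asks for it.
    static int lazyPick(boolean use, Supplier<Integer> value) {
        return use ? value.get() : 0;
    }

    public static void main(String[] args) {
        eagerPick(false, expensive());
        System.out.println("after eager call: " + evaluations); // prints after eager call: 1
        lazyPick(false, LazySketch::expensive);
        System.out.println("after lazy call: " + evaluations);  // prints after lazy call: 1
    }
}
```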
pass-by-value
(copy-on-write)
pass-by-reference
[Diagram: call F(A) hands Value A to Function F; when F modifies it, a copy Value A’ is made and the caller's Value A stays unchanged.]
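A small Java sketch of the two behaviours, under the assumption that a `List<Integer>` stands in for an R vector (class and method names are illustrative). Java hands the callee a reference to the caller's object, so mutations are visible outside. The R-style variant works on a copy. R only makes that copy when the value is actually modified, whereas this sketch copies up front for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CopySemanticsSketch {
    // Java style: the callee mutates the very list the caller holds.
    static List<Integer> modifyInPlace(List<Integer> values) {
        values.set(0, 99);
        return values;
    }

    // R style (emulated): the callee modifies a copy; the caller's list is untouched.
    static List<Integer> modifyCopy(List<Integer> values) {
        List<Integer> copy = new ArrayList<>(values);
        copy.set(0, 99);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> aPrime = modifyCopy(a);
        System.out.println(a + " " + aPrime); // prints [1, 2, 3] [99, 2, 3]
        modifyInPlace(a);
        System.out.println(a);                // prints [99, 2, 3]
    }
}
```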
RStudio
Central
Comprehensive R Archive Network (CRAN)
Coding time!
Titanic Competition:
Machine Learning from Disaster
Decision Tree
[Figure: decision tree with splits Sex == Female, Age > 50, Age > 16, and Fare > 100, ending in T/F leaves]
Random Forest
[Figure: several such decision trees side by side, each ending in T/F leaves]
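To make the decision tree and random forest idea concrete for Java readers, here is a toy sketch. It reuses the split conditions named on the slide, but the branch layout, the leaf outcomes, the `Passenger` class, and the two extra trees are all assumptions made for illustration. A real random forest learns many trees from random subsets of the training data. This sketch only shows the prediction step, where the trees vote and the majority wins.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class ForestSketch {
    // Hypothetical passenger record; field names are illustrative only.
    static final class Passenger {
        final boolean female;
        final double age;
        final double fare;

        Passenger(boolean female, double age, double fare) {
            this.female = female;
            this.age = age;
            this.fare = fare;
        }
    }

    // A hand-written tree using the slide's splits (layout and leaves assumed).
    static boolean treeA(Passenger p) {
        if (p.female) {
            return p.age <= 50;   // Age > 50 leads to F
        }
        if (p.age > 16) {
            return p.fare > 100;  // Fare > 100 leads to T
        }
        return true;
    }

    // Two more made-up trees so there is something to vote with.
    static boolean treeB(Passenger p) {
        return p.female;
    }

    static boolean treeC(Passenger p) {
        return p.fare > 100 || p.age <= 16;
    }

    static final List<Predicate<Passenger>> TREES =
            Arrays.asList(ForestSketch::treeA, ForestSketch::treeB, ForestSketch::treeC);

    // Forest prediction: each tree votes, a strict majority decides.
    static boolean forestPredict(List<Predicate<Passenger>> trees, Passenger p) {
        long yes = trees.stream().filter(tree -> tree.test(p)).count();
        return yes * 2 > trees.size();
    }

    public static void main(String[] args) {
        System.out.println(forestPredict(TREES, new Passenger(true, 30, 20)));  // prints true
        System.out.println(forestPredict(TREES, new Passenger(false, 40, 20))); // prints false
    }
}
```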
Demo time!
...
...
Agenda
Data Science
The R language
Gimme some Java!
Bridging R and Java
Integrate
Assimilate
Replace
Integrate
rJava & JRI (Java/R Interface)
Two-way native interface
- JNI: libjri
- or TCP to Rserve
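import org.rosuda.JRI.REXP;
import org.rosuda.JRI.RVector;
import org.rosuda.JRI.Rengine;

// start R without a main loop or callbacks
Rengine re = new Rengine(new String[] {}, false, null);
// wait until engine is ready
if (!re.waitForR()) {
    throw new IllegalStateException("Can't load R engine");
}
// load the built-in cars dataset; false = don't convert the result to a Java object
re.eval("data(cars)", false);
REXP cars = re.eval("cars");
RVector carsVector = cars.asVector();
// dissect carsVector...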
Assimilate
Reimplementation of R on JVM
Fast & lean
Parallelized
Just-another-lib
... not production ready yet...
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// create a script engine manager
ScriptEngineManager factory = new ScriptEngineManager();
// create an R engine
ScriptEngine engine = factory.getEngineByName("Renjin");
// load package from classpath
engine.eval("library(survey)");
// evaluate R code from String
engine.eval("print('Hello from R')");
Share data:
Integer[] data = {1, 2, 3};
engine.put("data", data);
engine.eval("print(sum(data))");
Use Java from Renjin:
import(com.foo.User)
# instantiate Java beans
tim <- User$new(name='Tim', age=23)
tom <- User$new(name='Tom', age=45)
# invoke setter
tim$name <- "Timmy"
Assimilate
Big Data?
Replace
JVM Libraries/platforms
Replace
Scalable R distributions
(non-JVM)
Revolution Analytics
Oracle R Enterprise
Wrap-up
Data Science
The R language
Gimme some Java!
Sanitize
Explore
Model
Predict
Scale
Next steps
Computing for Data Analysis
starts Sept. 23rd
Install R
Read
Questions?
Data Science
The R language
Gimme some Java!
@Sander_Mak
branchandbound.net