Scalable Prediction Services with R
#RstatsNYC @Socure
• Real-time fraud detection service using social and online data.
• Predictive R models.
• Latency SLA with customers.
• Model versioning.
• Zero-downtime updates.
Challenges
• R not dev-ops friendly.
• Enterprise prediction services a large commitment.
• Enterprise prediction services offer limited model types.
• Transferability and transparency of models.
• Vendor lock-in.
Solution
• Embed R models within dev-ops-friendly middleware.
• Management, deployment, and integration leverage existing dev-ops
processes.
• Service scaling using established strategies and methods.
<model>: name = generic, version = 20150215
<file>: gen_20150215.rds (saveRDS() to write, readRDS() to load)
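The save and load round trip on this slide can be sketched in base R. This is a minimal illustration, not the service's actual code: the toy `lm` model, the metadata list, and the `tempdir()` location are assumptions, and only the `gen_20150215.rds` file name and the `saveRDS()`/`readRDS()` calls come from the slide.

```r
# Fit a toy model; the real service's models are not described in the slides.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical metadata mirroring the <model> box: name and version.
model <- list(name = "generic", version = "20150215", fit = fit)

# File name follows the gen_20150215.rds pattern from the slide;
# the tempdir() location is an assumption made for this sketch.
path <- file.path(tempdir(), sprintf("gen_%s.rds", model$version))
saveRDS(model, path)

# Later, in the serving process: restore the object and predict with it.
restored <- readRDS(path)
newdata <- data.frame(wt = 2.5, hp = 110)
pred <- predict(restored$fit, newdata = newdata)
```

Keeping the name and version next to the fitted object means the serving side can check what it loaded without parsing the file name.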
Rook
http://…./model/20150215
• <model>: name = generic, version = 20150215
• <file>: gen_20150215.rds (saveRDS() to write, readRDS() to load)
• Model Map: name = model, version = 20150215
• predict()
• JSON
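The Model Map lookup followed by predict() might look roughly like the sketch below. Everything here is hypothetical scaffolding: the `name/version` key format, `register_model()`, and `handle_request()` are illustrative names, not the talk's API. A Rook application would call a handler like this with the parsed request; the Rook wiring and the JSON decoding and encoding are left out so the sketch stays in base R.

```r
# Model Map: an environment keyed by "name/version" (key format is an assumption).
model_map <- new.env()

# Load a serialized model from disk and register it under its name and version.
register_model <- function(name, version, path) {
  assign(paste(name, version, sep = "/"), readRDS(path), envir = model_map)
}

# Hypothetical handler: resolve the model from the request path, then predict.
# `input` is a named list standing in for an already-decoded JSON body.
handle_request <- function(path, input) {
  key <- sub("^/+", "", path)
  if (!nzchar(key) || !exists(key, envir = model_map, inherits = FALSE)) {
    return(list(status = 404L, body = NULL))
  }
  fit <- get(key, envir = model_map, inherits = FALSE)
  score <- predict(fit, newdata = as.data.frame(input))
  list(status = 200L, body = list(model = key, prediction = unname(score)))
}

# Usage with a toy model standing in for a real one.
rds <- file.path(tempdir(), "gen_20150215.rds")
saveRDS(lm(mpg ~ wt, data = mtcars), rds)
register_model("generic", "20150215", rds)

ok <- handle_request("/generic/20150215", list(wt = 3.0))
not_found <- handle_request("/generic/19990101", list(wt = 3.0))
```

Because each version is a separate key, a new version can be registered alongside the old one and traffic moved over without restarting the process.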
POST generic/20150215
fork(): Rook, Rook, Rook, Rook, …
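The fork() idea, several worker processes sharing one preloaded model, can be illustrated with the base `parallel` package. This is only a sketch under assumptions: the slides do not show how the Rook workers are actually forked, and `mclapply()` over a hypothetical batch of requests stands in for long-running server processes.

```r
library(parallel)

# Model loaded once in the parent process (a toy model for this sketch).
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical batch of requests.
requests <- list(list(wt = 2.0), list(wt = 3.0), list(wt = 4.0), list(wt = 5.0))

# Forking is Unix-only; fall back to a single core elsewhere.
cores <- if (.Platform$OS.type == "unix") 2L else 1L

# Each forked child inherits `fit` from the parent without reloading it.
scores <- mclapply(requests, function(input) {
  list(pid = Sys.getpid(),
       prediction = unname(predict(fit, newdata = as.data.frame(input))))
}, mc.cores = cores)

predictions <- vapply(scores, function(s) s$prediction, numeric(1))
```

Forked children share the parent's memory pages until they write to them, so a large model read with readRDS() before forking is not duplicated per worker.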
pmml
http://…./generic/20150215
• pmml.gbm()
• Servlet: doPost()
• org.jpmml.evaluator: unmarshalPMML(), ModelEvaluator, evaluate()
POST generic/20150215
Servlet, Servlet, Servlet, Servlet, …
Virtual Machine
Docker, Public Repository, ECS
Elastic Beanstalk
R R R R R R
http://…./generic/20150215
Elastic Beanstalk
• US-EAST-1A: Prediction Service, Prediction Service, Prediction Service
• US-EAST-1A: Prediction Service, Prediction Service, Prediction Service
• US-EAST-1A: Prediction Service, Prediction Service, Prediction Service
Conclusions
• Rapid deployment of R models in a scalable, robust environment.
• Directly leverage R models developed by data scientists and
analysts.
• Apply existing dev-ops processes for testing, monitoring, scaling,
and alerting of predictive models.
• Possible future use of PMML to serialize models for compliance.
GitHub
https://github.com/Socure/moduleR
We’re Hiring
http://www.socure.com/hiring
Director of Data Science
Senior Data Scientist
Director of Engineering
