SlideShare a Scribd company logo
DataRobot R Package
Ron Pearson
DataRobot
Today’s agenda:
1. What is DataRobot?
a. A Boston-based software company
b. A massively parallel modeling engine
c. An R package (today’s focus)
2. An example to demonstrate the R package:
a. Predicting compressive strength of concrete
b. Shows typical DataRobot modeling project
c. Demonstrates partial dependence plots
What is DataRobot?
1: Boston-based software company
2: Massively parallel modeling engine
- On multiple hardware platforms
- Under various operating systems
- Has many software components:
- R
- Python
- … and various others
API server
3: R API client -
the DataRobot
R package
A few key details:
➢ Package in the final stages of internal testing
➢ Will be released via R-Forge under the MIT license
➢ Two vignettes are available with the package:
■ “Introduction to the DataRobot R package”
■ “Interpreting Predictive Models, from Linear Regression to
Machine Learning Ensembles”
Today’s predictive modeling example:
● Objective: predict the compressive strength of concrete
● Data source: ConcreteFrame
● Each record describes one laboratory concrete sample
● Target variable: Strength
● Prediction variables: Age + 7 composition variables
Creating the DataRobot modeling project
1. Upload the modeling dataframe:
> MyDRProject <- SetupProject(ConcreteFrame,"ConcreteProject")
2. Specify the target variable to be predicted and start building models:
> StartAutopilot(Target="Strength", Project = MyDRProject)
3. Add a custom R model to the project:
> CreateCustomModel(MyDRProject, LogAgeFit, LogAgePredict, “R:LinearWithLogAge”)
4. Retrieve the leaderboard summarizing all project models:
> ConcreteLeaderboard <- GetAllModels(MyDRProject)
Function arguments for CreateCustomModel
LogAgeFit <- function(response, data, extras=NULL){
#
DF <- cbind.data.frame(data,
CustomModelResponseY = response)
model <- lm(CustomModelResponseY ~ . - Age +
log(Age), data = DF)
return(model)
}
LogAgePredict <- function(model,data){
predictions <- predict(model, newdata = data)
return(predictions)
}
Model fitting function
Prediction function
ConcreteLeaderboard: the poorer models (ranks 16-31)
Same Models,
Different Preprocessing
ConcreteLeaderboard: the better models (ranks 1-15)
Ensemble Models
Understanding complicated models
● Linear regression models - easily explained via coefficients
● Not true for random forests, boosted trees, SVMs, etc.
● Alternative: partial dependence plots (Friedman 2001)
○ To assess dependence on xj
, average the predictions over all other covariates
○ Compute this average for a representative range of xj
values and plot
○ Linear model: plot is straight line with slope equal to coefficient of xj
Partial dependence plots for Age
Partial dependence plots for Water
Importance from permutations: large RMSE increase => important variable
Large average Age shift
Even larger Age shift for ENET Blender
Summary of the concrete strength example
● Best model = ENET Blender
○ Age dependence shows nonlinear hard saturation behavior
○ Water dependence is nonlinear and non-monotonic
● Combining with variable importance results:
○ Average measures: Age > Cement > Blast Furnace Slag > Water
○ ENET Blender measures: Age > Water > Cement > Blast Furnace Slag
● Best model captures Age saturation behavior and nonmonotonic
Water dependence that other models can’t
Note the novel water dependence for the best model
Questions?
Ronald K. Pearson | Data Scientist @DataRobot | ron@datarobot.com
Slides available at: bit.ly/UseR2015
Vignette - Intro to DataRobot: bit.ly/1dzgppC
Vignette - Two Applications: bit.ly/1KtWCGI
Want to use the DataRobot
R package?
Email us at:
useR2015@datarobot.com

More Related Content

What's hot

Ad science bid simulator (public ver)
Ad science bid simulator (public ver)Ad science bid simulator (public ver)
Ad science bid simulator (public ver)
Marsan Ma
 
Deep MIML Network
Deep MIML NetworkDeep MIML Network
Deep MIML Network
Saad Elbeleidy
 
Feature engineering pipelines
Feature engineering pipelinesFeature engineering pipelines
Feature engineering pipelines
Ramesh Sampath
 
VSSML16 L3. Clusters and Anomaly Detection
VSSML16 L3. Clusters and Anomaly DetectionVSSML16 L3. Clusters and Anomaly Detection
VSSML16 L3. Clusters and Anomaly Detection
BigML, Inc
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
HJ van Veen
 
Universal job embedding in recommendation (public ver.)
Universal job embedding in recommendation (public ver.)Universal job embedding in recommendation (public ver.)
Universal job embedding in recommendation (public ver.)
Marsan Ma
 
Winning data science competitions
Winning data science competitionsWinning data science competitions
Winning data science competitions
Owen Zhang
 
Important Concepts for Machine Learning
Important Concepts for Machine LearningImportant Concepts for Machine Learning
Important Concepts for Machine Learning
SolivarLabs
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
Venkata Reddy Konasani
 
Supervised embedding techniques in search ranking system
Supervised embedding techniques in search ranking systemSupervised embedding techniques in search ranking system
Supervised embedding techniques in search ranking system
Marsan Ma
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
Mark Peng
 
Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoost
Joonyoung Yi
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
Benjamin Bengfort
 
VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1
BigML, Inc
 
BSSML16 L2. Ensembles and Logistic Regressions
BSSML16 L2. Ensembles and Logistic RegressionsBSSML16 L2. Ensembles and Logistic Regressions
BSSML16 L2. Ensembles and Logistic Regressions
BigML, Inc
 
Automatic image moderation in classifieds
Automatic image moderation in classifiedsAutomatic image moderation in classifieds
Automatic image moderation in classifieds
Jaroslaw Szymczak
 
Applications of data structures
Applications of data structuresApplications of data structures
Applications of data structures
Wipro
 
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntKaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Eugene Yan Ziyou
 
Intro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data ScientistsIntro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data Scientists
Parinaz Ameri
 
BSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 SessionsBSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 Sessions
BigML, Inc
 

What's hot (20)

Ad science bid simulator (public ver)
Ad science bid simulator (public ver)Ad science bid simulator (public ver)
Ad science bid simulator (public ver)
 
Deep MIML Network
Deep MIML NetworkDeep MIML Network
Deep MIML Network
 
Feature engineering pipelines
Feature engineering pipelinesFeature engineering pipelines
Feature engineering pipelines
 
VSSML16 L3. Clusters and Anomaly Detection
VSSML16 L3. Clusters and Anomaly DetectionVSSML16 L3. Clusters and Anomaly Detection
VSSML16 L3. Clusters and Anomaly Detection
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Universal job embedding in recommendation (public ver.)
Universal job embedding in recommendation (public ver.)Universal job embedding in recommendation (public ver.)
Universal job embedding in recommendation (public ver.)
 
Winning data science competitions
Winning data science competitionsWinning data science competitions
Winning data science competitions
 
Important Concepts for Machine Learning
Important Concepts for Machine LearningImportant Concepts for Machine Learning
Important Concepts for Machine Learning
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
Supervised embedding techniques in search ranking system
Supervised embedding techniques in search ranking systemSupervised embedding techniques in search ranking system
Supervised embedding techniques in search ranking system
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoost
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1
 
BSSML16 L2. Ensembles and Logistic Regressions
BSSML16 L2. Ensembles and Logistic RegressionsBSSML16 L2. Ensembles and Logistic Regressions
BSSML16 L2. Ensembles and Logistic Regressions
 
Automatic image moderation in classifieds
Automatic image moderation in classifiedsAutomatic image moderation in classifieds
Automatic image moderation in classifieds
 
Applications of data structures
Applications of data structuresApplications of data structures
Applications of data structures
 
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntKaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
 
Intro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data ScientistsIntro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data Scientists
 
BSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 SessionsBSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 Sessions
 

Viewers also liked

Leverage Social Media for Employer Brand and Recruiting
Leverage Social Media for Employer Brand and RecruitingLeverage Social Media for Employer Brand and Recruiting
Leverage Social Media for Employer Brand and RecruitingHackerEarth
 
USC LIGHT Ministry Introduction
USC LIGHT Ministry IntroductionUSC LIGHT Ministry Introduction
USC LIGHT Ministry Introduction
Jeong-Yoon Lee
 
No-Bullshit Data Science
No-Bullshit Data ScienceNo-Bullshit Data Science
No-Bullshit Data Science
Domino Data Lab
 
Druva Casestudy - HackerEarth
Druva Casestudy - HackerEarthDruva Casestudy - HackerEarth
Druva Casestudy - HackerEarthHackerEarth
 
Open Innovation - A Case Study
Open Innovation - A Case StudyOpen Innovation - A Case Study
Open Innovation - A Case Study
HackerEarth
 
Kill the wabbit
Kill the wabbitKill the wabbit
Kill the wabbit
Joe Kleinwaechter
 
HackerEarth Sourcing Solution
HackerEarth Sourcing SolutionHackerEarth Sourcing Solution
HackerEarth Sourcing Solution
HackerEarth
 
Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)
Domino Data Lab
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
Ted Xiao
 
How hackathons can drive top line revenue growth
How hackathons can drive top line revenue growthHow hackathons can drive top line revenue growth
How hackathons can drive top line revenue growth
HackerEarth
 
HackerEarth helping a startup hire developers - The Practo Case Study
HackerEarth helping a startup hire developers - The Practo Case StudyHackerEarth helping a startup hire developers - The Practo Case Study
HackerEarth helping a startup hire developers - The Practo Case Study
HackerEarth
 
How to recruit excellent tech talent
How to recruit excellent tech talentHow to recruit excellent tech talent
How to recruit excellent tech talent
HackerEarth
 
Data Science Competition
Data Science CompetitionData Science Competition
Data Science Competition
Jeong-Yoon Lee
 
Make Sense Out of Data with Feature Engineering
Make Sense Out of Data with Feature EngineeringMake Sense Out of Data with Feature Engineering
Make Sense Out of Data with Feature Engineering
DataRobot
 
How to assess & hire Java developers accurately?
How to assess & hire Java developers accurately?How to assess & hire Java developers accurately?
How to assess & hire Java developers accurately?
HackerEarth
 
Data science at the command line
Data science at the command lineData science at the command line
Data science at the command line
Sharat Chikkerur
 
Fairly Measuring Fairness In Machine Learning
Fairly Measuring Fairness In Machine LearningFairly Measuring Fairness In Machine Learning
Fairly Measuring Fairness In Machine Learning
HJ van Veen
 
Tda presentation
Tda presentationTda presentation
Tda presentation
HJ van Veen
 
State of women in technical workforce
State of women in technical workforceState of women in technical workforce
State of women in technical workforce
HackerEarth
 

Viewers also liked (20)

Leverage Social Media for Employer Brand and Recruiting
Leverage Social Media for Employer Brand and RecruitingLeverage Social Media for Employer Brand and Recruiting
Leverage Social Media for Employer Brand and Recruiting
 
USC LIGHT Ministry Introduction
USC LIGHT Ministry IntroductionUSC LIGHT Ministry Introduction
USC LIGHT Ministry Introduction
 
No-Bullshit Data Science
No-Bullshit Data ScienceNo-Bullshit Data Science
No-Bullshit Data Science
 
Druva Casestudy - HackerEarth
Druva Casestudy - HackerEarthDruva Casestudy - HackerEarth
Druva Casestudy - HackerEarth
 
Open Innovation - A Case Study
Open Innovation - A Case StudyOpen Innovation - A Case Study
Open Innovation - A Case Study
 
Kill the wabbit
Kill the wabbitKill the wabbit
Kill the wabbit
 
HackerEarth Sourcing Solution
HackerEarth Sourcing SolutionHackerEarth Sourcing Solution
HackerEarth Sourcing Solution
 
Work - LIGHT Ministry
Work - LIGHT MinistryWork - LIGHT Ministry
Work - LIGHT Ministry
 
Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
How hackathons can drive top line revenue growth
How hackathons can drive top line revenue growthHow hackathons can drive top line revenue growth
How hackathons can drive top line revenue growth
 
HackerEarth helping a startup hire developers - The Practo Case Study
HackerEarth helping a startup hire developers - The Practo Case StudyHackerEarth helping a startup hire developers - The Practo Case Study
HackerEarth helping a startup hire developers - The Practo Case Study
 
How to recruit excellent tech talent
How to recruit excellent tech talentHow to recruit excellent tech talent
How to recruit excellent tech talent
 
Data Science Competition
Data Science CompetitionData Science Competition
Data Science Competition
 
Make Sense Out of Data with Feature Engineering
Make Sense Out of Data with Feature EngineeringMake Sense Out of Data with Feature Engineering
Make Sense Out of Data with Feature Engineering
 
How to assess & hire Java developers accurately?
How to assess & hire Java developers accurately?How to assess & hire Java developers accurately?
How to assess & hire Java developers accurately?
 
Data science at the command line
Data science at the command lineData science at the command line
Data science at the command line
 
Fairly Measuring Fairness In Machine Learning
Fairly Measuring Fairness In Machine LearningFairly Measuring Fairness In Machine Learning
Fairly Measuring Fairness In Machine Learning
 
Tda presentation
Tda presentationTda presentation
Tda presentation
 
State of women in technical workforce
State of women in technical workforceState of women in technical workforce
State of women in technical workforce
 

Similar to DataRobot R Package

An Intoduction to R
An Intoduction to RAn Intoduction to R
An Intoduction to R
Mahmoud Shiri Varamini
 
OpenSees: Future Directions
OpenSees: Future DirectionsOpenSees: Future Directions
OpenSees: Future Directions
openseesdays
 
Recent and Planned Improvements to the System Advisor Model
Recent and Planned Improvements to the System Advisor ModelRecent and Planned Improvements to the System Advisor Model
Recent and Planned Improvements to the System Advisor Model
Sandia National Laboratories: Energy & Climate: Renewables
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query ExecutionJ Singh
 
Introduction to Machine Learning with Spark
Introduction to Machine Learning with SparkIntroduction to Machine Learning with Spark
Introduction to Machine Learning with Spark
datamantra
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...
NECST Lab @ Politecnico di Milano
 
dimensional analysis
dimensional analysisdimensional analysis
dimensional analysis
RESHMAFEGADE
 
Nathaniel Cook - Forecasting Time Series Data at scale with the TICK stack
Nathaniel Cook - Forecasting Time Series Data at scale with the TICK stackNathaniel Cook - Forecasting Time Series Data at scale with the TICK stack
Nathaniel Cook - Forecasting Time Series Data at scale with the TICK stack
PyData
 
Surrogate modeling for industrial design
Surrogate modeling for industrial designSurrogate modeling for industrial design
Surrogate modeling for industrial design
Shinwoo Jang
 
ICPE 2022 - Data Challenge
ICPE 2022 - Data ChallengeICPE 2022 - Data Challenge
ICPE 2022 - Data Challenge
Luc Lesoil
 
Sampling-Based Planning Algorithms for Multi-Objective Missions
Sampling-Based Planning Algorithms for Multi-Objective MissionsSampling-Based Planning Algorithms for Multi-Objective Missions
Sampling-Based Planning Algorithms for Multi-Objective Missions
Md Mahbubur Rahman
 
Model Transformation Reuse
Model Transformation ReuseModel Transformation Reuse
Model Transformation Reuse
miso_uam
 
9781285852744 ppt ch09
9781285852744 ppt ch099781285852744 ppt ch09
9781285852744 ppt ch09
Terry Yoast
 
Simulation Software Performances And Examples
Simulation Software Performances And ExamplesSimulation Software Performances And Examples
Simulation Software Performances And Examples
Hector Alberto Cerdan Arteaga
 
My mapreduce1 presentation
My mapreduce1 presentationMy mapreduce1 presentation
My mapreduce1 presentationNoha Elprince
 
Exploring Large Chemical Data Sets
Exploring Large Chemical Data SetsExploring Large Chemical Data Sets
Exploring Large Chemical Data Sets
kylelutz
 
Week-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docxWeek-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docx
helzerpatrina
 
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsData Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Esther Vasiete
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSEL
Joel Falcou
 

Similar to DataRobot R Package (20)

An Intoduction to R
An Intoduction to RAn Intoduction to R
An Intoduction to R
 
OpenSees: Future Directions
OpenSees: Future DirectionsOpenSees: Future Directions
OpenSees: Future Directions
 
Recent and Planned Improvements to the System Advisor Model
Recent and Planned Improvements to the System Advisor ModelRecent and Planned Improvements to the System Advisor Model
Recent and Planned Improvements to the System Advisor Model
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
 
Introduction to Machine Learning with Spark
Introduction to Machine Learning with SparkIntroduction to Machine Learning with Spark
Introduction to Machine Learning with Spark
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...
 
dimensional analysis
dimensional analysisdimensional analysis
dimensional analysis
 
Nathaniel Cook - Forecasting Time Series Data at scale with the TICK stack
Nathaniel Cook - Forecasting Time Series Data at scale with the TICK stackNathaniel Cook - Forecasting Time Series Data at scale with the TICK stack
Nathaniel Cook - Forecasting Time Series Data at scale with the TICK stack
 
Surrogate modeling for industrial design
Surrogate modeling for industrial designSurrogate modeling for industrial design
Surrogate modeling for industrial design
 
ICPE 2022 - Data Challenge
ICPE 2022 - Data ChallengeICPE 2022 - Data Challenge
ICPE 2022 - Data Challenge
 
Sampling-Based Planning Algorithms for Multi-Objective Missions
Sampling-Based Planning Algorithms for Multi-Objective MissionsSampling-Based Planning Algorithms for Multi-Objective Missions
Sampling-Based Planning Algorithms for Multi-Objective Missions
 
Model Transformation Reuse
Model Transformation ReuseModel Transformation Reuse
Model Transformation Reuse
 
CS267_Graph_Lab
CS267_Graph_LabCS267_Graph_Lab
CS267_Graph_Lab
 
9781285852744 ppt ch09
9781285852744 ppt ch099781285852744 ppt ch09
9781285852744 ppt ch09
 
Simulation Software Performances And Examples
Simulation Software Performances And ExamplesSimulation Software Performances And Examples
Simulation Software Performances And Examples
 
My mapreduce1 presentation
My mapreduce1 presentationMy mapreduce1 presentation
My mapreduce1 presentation
 
Exploring Large Chemical Data Sets
Exploring Large Chemical Data SetsExploring Large Chemical Data Sets
Exploring Large Chemical Data Sets
 
Week-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docxWeek-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docx
 
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsData Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSEL
 

Recently uploaded

Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 

Recently uploaded (20)

Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 

DataRobot R Package

  • 1. DataRobot R Package Ron Pearson DataRobot
  • 2. Today’s agenda: 1. What is DataRobot? a. A Boston-based software company b. A massively parallel modeling engine c. An R package (today’s focus) 2. An example to demonstrate the R package: a. Predicting compressive strength of concrete b. Shows typical DataRobot modeling project c. Demonstrates partial dependence plots
  • 3. What is DataRobot? 1: Boston-based software company 2: Massively parallel modeling engine - On multiple hardware platforms - Under various operating systems - Has many software components: - R - Python - … and various others API server 3: R API client - the DataRobot R package
  • 4.
  • 5. A few key details: ➢ Package in the final stages of internal testing ➢ Will be released via R-Forge under the MIT license ➢ Two vignettes are available with the package: ■ “Introduction to the DataRobot R package” ■ “Interpreting Predictive Models, from Linear Regression to Machine Learning Ensembles”
  • 6. Today’s predictive modeling example: ● Objective: predict the compressive strength of concrete ● Data source: ConcreteFrame ● Each record describes one laboratory concrete sample ● Target variable: Strength ● Prediction variables: Age + 7 composition variables
  • 7. Creating the DataRobot modeling project 1. Upload the modeling dataframe: > MyDRProject <- SetupProject(ConcreteFrame,"ConcreteProject") 2. Specify the target variable to be predicted and start building models: > StartAutopilot(Target="Strength", Project = MyDRProject) 3. Add a custom R model to the project: > CreateCustomModel(MyDRProject, LogAgeFit, LogAgePredict, “R:LinearWithLogAge”) 4. Retrieve the leaderboard summarizing all project models: > ConcreteLeaderboard <- GetAllModels(MyDRProject)
  • 8. Function arguments for CreateCustomModel LogAgeFit <- function(response, data, extras=NULL){ # DF <- cbind.data.frame(data, CustomModelResponseY = response) model <- lm(CustomModelResponseY ~ . - Age + log(Age), data = DF) return(model) } LogAgePredict <- function(model,data){ predictions <- predict(model, newdata = data) return(predictions) } Model fitting function Prediction function
  • 9. ConcreteLeaderboard: the poorer models (ranks 16-31) Same Models, Different Preprocessing
  • 10. ConcreteLeaderboard: the better models (ranks 1-15) Ensemble Models
  • 11. Understanding complicated models ● Linear regression models - easily explained via coefficients ● Not true for random forests, boosted trees, SVMs, etc. ● Alternative: partial dependence plots (Friedman 2001) ○ To assess dependence on xj , average the predictions over all other covariates ○ Compute this average for a representative range of xj values and plot ○ Linear model: plot is straight line with slope equal to coefficient of xj
  • 14. Importance from permutations: large RMSE increase => important variable Large average Age shift Even larger Age shift for ENET Blender
  • 15. Summary of the concrete strength example ● Best model = ENET Blender ○ Age dependence shows nonlinear hard saturation behavior ○ Water dependence is nonlinear and non-monotonic ● Combining with variable importance results: ○ Average measures: Age > Cement > Blast Furnace Slag > Water ○ ENET Blender measures: Age > Water > Cement > Blast Furnace Slag ● Best model captures Age saturation behavior and nonmonotonic Water dependence that other models can’t
  • 16. Note the novel water dependence for the best model
  • 18. Ronald K. Pearson | Data Scientist @DataRobot | ron@datarobot.com Slides available at: bit.ly/UseR2015 Vignette - Intro to DataRobot: bit.ly/1dzgppC Vignette - Two Applications: bit.ly/1KtWCGI Want to use the DataRobot R package? Email us at: useR2015@datarobot.com