May 2016
Dan Steinberg
Mikhail Golovnya
Salford Systems
On the Cutting Edge of Technology
Salford Systems: Company Overview
• Founded 1983 by Dan Steinberg
o First products were SAS® procedures: MLOGIT, MPROBIT
• 1993 CART Version 1.0
• 1994-1997 Extensive data mining consulting
• 1999 MARS Version 1.0
• 2000 CART 4.0 MARS 2.0
• 2002 CART 5.0 TreeNet 1.0
• 2004 Random Forests
• 2005 TreeNet 2.0
• 2006 CART 6.0
Salford Systems ©2014 Company and Product Line Overview 2
Recent Product Line Development
• 2008-2013 Series of Products
o CART® 6.0 in Standard, EX, and PRO versions
o TreeNet® 2.0 PRO EX version
o MARS® 3.0 with Time Series Support
o RandomForests® 2.0
o Generalized Path Seeker
• 2014 Data Mining Suite (integrates all tools)
o SPM 7.0
o Multithreading and modular design
o Engine APIs
o Ongoing work on SPM 8.0 and Big Data approaches
Salford Competitive Awards
• 2010 Direct Marketing Association: 1st place*
• 2009 KDDCup: IDAnalytics* and FEG Japan*, 1st Runner Up
• 2008 DMA Direct Marketing Association: 1st Runner Up
• 2007 PAKDD Pacific Asia KDD: Credit Card Cross Sell, 1st place
• 2006 DMA Direct Marketing Association: Predictive Modeling*
• 2006 PAKDD Pacific Asia KDD: Telco Customer Type Profiling
• 2005 BI-Cup Latin America: Predictive Modeling, E-commerce*, 1st place
• 2004 KDDCup: Predictive Modeling, “Most Accurate”*
• 2002 NCR/Teradata Duke University: Predictive Modeling, Churn
o 1st place in all four separate predictive modeling challenges
• 2000 KDDCup: Predictive Modeling, Online Behavior, 1st place
• 2000 KDDCup: CRM Analysis, 1st place
*Won either by Salford or by a client using Salford tools
Salford Systems: R&D Staff and Partners
• Dan Steinberg, Ph.D. Econometrics, Harvard (CART, MARS, Discrete Choice)
• Nicholas Scott Cardell, Ph.D. Econometrics, Harvard (Data Mining, Discrete Choice)
• Jerome H. Friedman, Stanford University (algorithm coder: CART, MARS, TreeNet, HotSpotDetector)
• Leo Breiman, UC Berkeley (algorithm developer: ensembles of trees, randomization techniques to improve trees)
• Richard Olshen, Stanford University (Survival CART, Tree-Based Clustering)
• Charles Stone, UC Berkeley (CART large-sample theory)
• Richard Carson, UC San Diego (Visualization Methods, Supercomputer Methods)
Introduction to SPM Salford Systems ©2014
Machine Learning Defined
• Machine Learning is the search for patterns in data using modern, highly automated, computer-intensive methods
o Data mining may be best defined as the use of a specific class of tools (data mining methods) in the analysis of data
o The term “search” is key to this definition, as is “automated”
• The literature often refers to finding hidden information in data
• We will focus on patterns that allow us to accomplish two tasks, known as “supervised learning”:
o Classification
o Regression
• There is also a third common task, known as “unsupervised learning”:
o Finding groups in data (clustering, density estimation)
The Essence of Machine Learning
• In a nutshell: use historical data to gain insights and/or make predictions on new data
[Diagram: historical data from the population feeds the DM engine, which the analyst uses to build a model; scoring new data with the model yields insights and predictions]
Boston Housing Data Set
• Concerns housing values in the Boston area
• Harrison, D. and D. Rubinfeld. “Hedonic Prices and the Demand for Clean Air.” Journal of Environmental Economics and Management, v. 5, pp. 81-102, 1978
• Combined information from 10 separate governmental and educational sources to produce this data set
• 506 census tracts in the Boston metropolitan area for the year 1970
o Goal: study the relationship between quality-of-life variables and property values
o MV: median value of owner-occupied homes in tract ($1,000s)
o CRIM: per capita crime rate
o NOX: concentration of nitric oxides (parts per 10 million)
o AGE: percent of owner-occupied units built before 1940
o DIS: weighted distance to centers of employment
o RM: average number of rooms per house
o LSTAT: percent lower-status population
o RAD: accessibility to radial highways
o CHAS: borders Charles River (0/1)
o INDUS: percent non-retail business acres
o TAX: property tax rate per $10,000
o PT: pupil-teacher ratio
Target: Median House Value (MV)
The distribution of the target variable (in $1,000s)
Values look low by today's standards, a clear manifestation of inflation over the past 40 years
Mutual Dependency
• The data violate all conventional modeling assumptions
• Some distributions are clearly non-normal, and some relationships clearly non-linear
OLS Regression
• OLS: ordinary least squares regression
o Discovered by Legendre (1805) and Gauss (1809) to solve problems in astronomy using pen and paper
o Given a solid statistical foundation by Fisher in the 1920s
o 1950s: use of electro-mechanical calculators
• The model is always of the form
Response = A + B1X1 + B2X2 + B3X3 + …
• The response surface is a hyperplane!
• A: the intercept term
• B1, B2, B3, …: parameter estimates
• Usually a unique combination of parameter values exists that minimizes the mean squared error of predictions on the learn sample
• Stepwise approaches are used to determine model size
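The least-squares criterion above can be made concrete with a minimal pure-Python sketch that solves the normal equations (X'X)b = X'y for A, B1, B2, …. This is a textbook approach, not necessarily how SPM computes OLS; the helper names `solve` and `ols_fit` and the toy data are hypothetical.

```python
# A minimal OLS sketch: solve the normal equations (X'X) b = X'y in pure Python.

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols_fit(X, y):
    """Return [A, B1, B2, ...] minimizing the mean squared error of A + B1*X1 + B2*X2 + ..."""
    rows = [[1.0] + list(x) for x in X]  # prepend a column of ones for the intercept A
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)


# Hypothetical example: points lying exactly on the plane Response = 2 + 3*X1 - 1*X2
X = [(float(i), float((i * i) % 7)) for i in range(10)]
y = [2.0 + 3.0 * x1 - 1.0 * x2 for x1, x2 in X]
coef = ols_fit(X, y)
print([round(c, 6) for c in coef])
```

Forming X'X explicitly is fine for a handful of predictors; for ill-conditioned data a QR- or SVD-based solver such as `numpy.linalg.lstsq` is the safer choice.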
OLS on Boston Data
• 414 records in the learn sample
• 92 records in the test sample
• Good agreement between learn and test error:
o LEARN MSE = 27.455
o TEST MSE = 26.147
• 3-variable solution with coefficients -0.597, +5.247, and -0.858
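The learn/test protocol can be sketched as follows, assuming a simple random hold-out of 92 of the 506 records (the slide does not say how the split was drawn). The data are a synthetic stand-in, not the Boston tracts, and a one-variable least-squares line replaces the 3-variable model to keep the sketch short; all names and the seeds are hypothetical.

```python
import random


def learn_test_split(records, n_test, seed=0):
    """Randomly partition records into (learn, test), holding out n_test records."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def mse(actual, predicted):
    """Mean squared error of predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical stand-in for the 506 tracts: y = 5 + 2x plus unit-variance noise
rng = random.Random(1)
xs_all = [rng.uniform(0, 10) for _ in range(506)]
records = [(x, 5.0 + 2.0 * x + rng.gauss(0, 1)) for x in xs_all]
learn, test = learn_test_split(records, n_test=92)

# Fit a one-variable least-squares line on the learn sample only
lx = [x for x, _ in learn]
ly = [y for _, y in learn]
x_bar, y_bar = sum(lx) / len(lx), sum(ly) / len(ly)
slope = sum((x - x_bar) * (y - y_bar) for x, y in learn) / sum((x - x_bar) ** 2 for x in lx)
intercept = y_bar - slope * x_bar

# Evaluate the same fitted line on both samples
learn_mse = mse(ly, [intercept + slope * x for x in lx])
test_mse = mse([y for _, y in test], [intercept + slope * x for x, _ in test])
print(len(learn), len(test), round(learn_mse, 3), round(test_mse, 3))
```

Comparable learn and test MSE values, as on the slide, suggest the model is not overfitting the learn sample.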
Unique Personalities – the “Founding Fathers” of CART
Leo Breiman
Jerome Friedman
Richard Olshen
Charles Stone
1984 CART Monograph
© Copyright Salford Systems 1999-2015
Regression Tree Model
• All cases in a given terminal node are assigned the same predicted response: the node average of the original target
• Nodes are color-coded according to the predicted response
• The tree yields a convenient segmentation of the population according to average response levels
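A minimal sketch of the core regression-tree step, assuming one numeric predictor and squared-error splitting: every candidate threshold is tried, each child predicts its node average, and the split with the smallest total squared error wins. A full CART tree repeats this search over all predictors and recursively within each child, then prunes; that is omitted here, and the function names and toy data are hypothetical.

```python
def best_split(x, y):
    """Best single split of one numeric predictor for a regression tree.

    Each child node predicts the average target of its cases; the split is chosen
    to minimize the total within-node sum of squared errors (SSE).
    Returns (threshold, left_mean, right_mean, sse) or None if no split exists.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total = sum(v for _, v in pairs)
    total_sq = sum(v * v for _, v in pairs)
    best = None
    left_sum = left_sq = 0.0
    for k in range(n - 1):
        left_sum += pairs[k][1]
        left_sq += pairs[k][1] ** 2
        if pairs[k][0] == pairs[k + 1][0]:
            continue  # cannot split between identical predictor values
        nl, nr = k + 1, n - k - 1
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        # Node SSE = sum(y^2) - (sum y)^2 / n
        sse = (left_sq - left_sum ** 2 / nl) + (right_sq - right_sum ** 2 / nr)
        if best is None or sse < best[3]:
            threshold = (pairs[k][0] + pairs[k + 1][0]) / 2
            best = (threshold, left_sum / nl, right_sum / nr, sse)
    return best


def predict(split, x_new):
    """Every case in a terminal node gets that node's average response."""
    threshold, left_mean, right_mean, _ = split
    return left_mean if x_new <= threshold else right_mean


# Hypothetical toy data: the target jumps once the predictor exceeds 3
split = best_split([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
print(split, predict(split, 2.5), predict(split, 4.5))
```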
The Best and the Worst Segments
Gradient Boosting
• Begin with a very small tree as the initial model
• Compute “residuals” (prediction errors) of this simple model for every record in the data
• Grow a second small tree to predict the residuals from the first tree
• Compute residuals from this new 2-tree model and grow a third tree to predict the revised residuals
• Repeat this process to grow a sequence of trees
[Figure: Tree 1 + Tree 2 + Tree 3 + … + more trees]
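The loop described above can be sketched for squared-error loss in pure Python. Assumptions: a single predictor, two-terminal-node trees (stumps) as the small trees, the overall mean as the starting model instead of a first small tree, and a hypothetical learning rate of 0.1 that shrinks each tree's contribution. This is a generic illustration of gradient boosting, not TreeNet's implementation, and all names are hypothetical.

```python
def fit_stump(x, r):
    """Least-squares regression stump (a 2-terminal-node tree) fit to residuals r."""
    pairs = sorted(zip(x, r))
    n = len(pairs)
    total = sum(v for _, v in pairs)
    best, best_gain = (None, total / n, total / n), 0.0  # fallback: no split
    left = 0.0
    for k in range(n - 1):
        left += pairs[k][1]
        if pairs[k][0] == pairs[k + 1][0]:
            continue
        nl, nr = k + 1, n - k - 1
        right = total - left
        gain = left ** 2 / nl + right ** 2 / nr - total ** 2 / n  # reduction in SSE
        if gain > best_gain:
            best_gain = gain
            best = ((pairs[k][0] + pairs[k + 1][0]) / 2, left / nl, right / nr)
    return best


def stump_predict(stump, xi):
    threshold, left_value, right_value = stump
    if threshold is None or xi <= threshold:
        return left_value
    return right_value


def boost(x, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting for squared error: each new small tree predicts current residuals."""
    f0 = sum(y) / len(y)  # start from the overall mean
    fitted = [f0] * len(y)
    trees, history = [], []
    for _ in range(n_trees):
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        stump = fit_stump(x, residuals)
        trees.append(stump)
        # Add a shrunken copy of the new tree's predictions to the running model
        fitted = [fi + learning_rate * stump_predict(stump, xi) for fi, xi in zip(fitted, x)]
        history.append(sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / len(y))
    return f0, learning_rate, trees, history


def boost_predict(model, xi):
    f0, learning_rate, trees, _ = model
    return f0 + learning_rate * sum(stump_predict(t, xi) for t in trees)


# Hypothetical toy data: a parabola on a grid
x = [i / 10 for i in range(50)]
y = [(xi - 2.5) ** 2 for xi in x]
model = boost(x, y)
print(model[3][0], model[3][-1])  # training MSE after the first and the last tree
```

Because each stump is the least-squares fit to the current residuals, adding a shrunken copy of it cannot increase the training error, so `history` is non-increasing; test-sample error, by contrast, can eventually rise if too many trees are added, which is why the number of trees is normally chosen on held-out data.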
Illustration: Saddle Function
• 500 {X1, X2} points drawn at random from the [-3, +3] box produce the XOR response surface Y = X1 * X2
• We use 3-node trees to show the evolution of the TreeNet response surface
[Figure panels: TreeNet response surface after 1, 2, 3, 4, 10, 20, 30, 40, 100, and 195 trees]
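The sketch below (pure Python; the seed and function names are hypothetical) draws the 500 saddle points and shows why single splits are not enough: Y = X1 * X2 has no main effects, so each half of a single split on X1 or X2 averages close to zero, while the four quadrants formed by splitting on both variables have clearly positive or negative averages. Assuming "3-node" means three terminal nodes, such a tree can hold one split on X1 plus a split on X2 inside one child, which lets boosting build up the interaction tree by tree.

```python
import random


def make_saddle(n=500, seed=0):
    """Draw n points uniformly from the [-3, +3] x [-3, +3] box; the response is Y = X1 * X2."""
    rng = random.Random(seed)
    points = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(n)]
    return [(x1, x2, x1 * x2) for x1, x2 in points]


def mean_y(rows, keep):
    """Average Y over the rows for which keep(x1, x2) is true."""
    selected = [y for x1, x2, y in rows if keep(x1, x2)]
    return sum(selected) / len(selected)


data = make_saddle()

# One split on a single variable: each half averages out to roughly zero
half_means = [
    mean_y(data, lambda a, b: a > 0), mean_y(data, lambda a, b: a <= 0),
    mean_y(data, lambda a, b: b > 0), mean_y(data, lambda a, b: b <= 0),
]

# Splits on both variables: the four quadrants have averages of opposite signs
quadrant_means = {
    (s1, s2): mean_y(data, lambda a, b: (a > 0) == s1 and (b > 0) == s2)
    for s1 in (True, False)
    for s2 in (True, False)
}
print([round(m, 2) for m in half_means])
print({k: round(v, 2) for k, v in quadrant_means.items()})
```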
Delinquency Dataset
VARIABLE        DESCRIPTION
DELINQUENT      Borrower experienced a delinquency of 90 days past due or worse
AGE             Age of borrower in years
DEBT_RATIO      Monthly debt payments, alimony, and living costs divided by monthly gross income
MONTH_INCOME    Monthly income
N_OPEN_LINES    Number of open loans (mortgages, car loans, credit cards, etc.)
N_MORTGAGES     Number of mortgage and real estate loans
N_DEPENDENTS    Number of dependents in family, excluding the borrower
Salford Systems ©2015 Advanced Uses of SPM 19
TreeNet Model for Delinquency
[Figure panels: TreeNet; Logistic Regression]
• Here we compare the performance of the 6-variable TreeNet model with that of the equivalent logistic regression model
• TreeNet has a clear edge of 5 points of ROC area over logistic regression!
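For reference, ROC area can be computed directly as the probability that a randomly chosen delinquent case receives a higher score than a randomly chosen non-delinquent one. Assuming "5 points" means ROC area expressed as a percentage, the gap corresponds to a difference of 0.05 on the 0-to-1 scale. The scores below are invented toy values, not the delinquency results, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """ROC area: probability that a random positive case scores above a random negative case.

    Ties count as one half. This pairwise form is O(n_pos * n_neg); rank-based
    formulas give the same value in O(n log n).
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores from two models on six cases (1 = delinquent)
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.2, 0.7, 0.4, 0.6, 0.5]
model_b = [0.9, 0.6, 0.3, 0.4, 0.7, 0.2]
auc_a, auc_b = roc_auc(labels, model_a), roc_auc(labels, model_b)
print(auc_a, auc_b, round(100 * (auc_a - auc_b), 1))  # gap in ROC-area points
```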
Gathering Up Transformations
• TreeNet provides powerful insights into the inner workings of the model by constructing partial dependence plots
• We can now use the partial dependence plots to construct first- and second-order univariate spline transformations
• The resulting transforms are added to the dataset, and the code is saved for future use
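The two steps above can be sketched in pure Python under stated assumptions: partial dependence is computed in the usual way (fix one feature at a grid value for every record and average the predictions), and the spline transformations are assumed to be truncated-power terms max(0, x - knot) and max(0, x - knot)², with the knot read off a kink in the partial dependence curve. The slide does not specify SPM's exact spline form; `toy_model`, its coefficients, and the derived column names are hypothetical.

```python
def partial_dependence(model, rows, feature, grid):
    """Partial dependence of `model` on one feature.

    For each grid value v, set `feature` to v in every row (other features unchanged)
    and average the model's predictions over all rows.
    """
    curve = []
    for v in grid:
        predictions = [model({**row, feature: v}) for row in rows]
        curve.append(sum(predictions) / len(predictions))
    return curve


def hinge(x, knot, order=1):
    """Truncated-power spline term max(0, x - knot) ** order; order 1 or 2."""
    return max(0.0, x - knot) ** order


# Hypothetical fitted model (a stand-in for a TreeNet score) and toy rows
def toy_model(row):
    return 0.02 * row["AGE"] + 0.5 * max(0.0, row["DEBT_RATIO"] - 1.0)


rows = [{"AGE": 30, "DEBT_RATIO": 0.5}, {"AGE": 50, "DEBT_RATIO": 2.0}]
curve = partial_dependence(toy_model, rows, "DEBT_RATIO", [0.0, 1.0, 2.0, 3.0])
print([round(c, 3) for c in curve])  # flat up to 1.0, then rising: suggests a knot near 1.0

# New transformed columns that a logistic regression could then use
for row in rows:
    row["DEBT_RATIO_H1"] = hinge(row["DEBT_RATIO"], 1.0, order=1)
    row["DEBT_RATIO_H2"] = hinge(row["DEBT_RATIO"], 1.0, order=2)
```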
Enhanced Logistic Regression Model
• We can now build a logistic regression model using the transformed features
• Our new model almost completely recovers the performance of the original unconstrained TreeNet model!
• This is because the data exhibit virtually no interactions, which can easily be confirmed by building a constrained additive model in TreeNet
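The slide confirms the absence of interactions by fitting a constrained additive TreeNet model. A complementary, model-agnostic probe (an illustration, not an SPM feature) is the double difference below: it is zero for any additive model and nonzero when the effect of x1 depends on x2. Both models and all function names are hypothetical toys.

```python
def interaction_contrast(f, a, a2, b, b2):
    """Double difference f(a, b) - f(a2, b) - f(a, b2) + f(a2, b2).

    For any additive model f(x1, x2) = g1(x1) + g2(x2) the g1 and g2 terms cancel,
    so the result is zero up to rounding; a clearly nonzero value signals an
    x1-by-x2 interaction at those points.
    """
    return f(a, b) - f(a2, b) - f(a, b2) + f(a2, b2)


def additive_model(x1, x2):
    # Hypothetical additive score: separate curves in x1 and x2, no cross terms
    return 0.3 * x1 ** 2 - 2.0 * max(0.0, x2 - 1.0)


def saddle_model(x1, x2):
    # Pure interaction, as in the saddle illustration
    return x1 * x2


print(interaction_contrast(additive_model, 2.0, -1.0, 3.0, 0.0))
print(interaction_contrast(saddle_model, 2.0, -1.0, 3.0, 0.0))
```

In practice one would evaluate the contrast at many point pairs drawn from the data, since a single pair can miss an interaction confined to another region.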
Salford Predictive Modeler SPM
• Download a current version from our website: http://www.salford-systems.com
• The downloaded version runs without a license key for 10 days
• Request a license key from unlock@salford-systems.com
• Request a configuration to meet your needs:
o Data handling capacity
o Data mining engines made available

More Related Content

Similar to Salford Systems - On the Cutting Edge of Technology

Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeTed Dunning
 
Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...STEP_scotland
 
Spark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWXSpark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWXKirk Haslbeck
 
SPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARSSPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARSSalford Systems
 
Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...STEP_scotland
 
An Empirical Study of Reliability Growth of Open versus Closed Source Softwar...
An Empirical Study of Reliability Growth of Open versus Closed Source Softwar...An Empirical Study of Reliability Growth of Open versus Closed Source Softwar...
An Empirical Study of Reliability Growth of Open versus Closed Source Softwar...najeeb1984
 
IoT with Azure Machine Learning and InfluxDB
IoT with Azure Machine Learning and InfluxDBIoT with Azure Machine Learning and InfluxDB
IoT with Azure Machine Learning and InfluxDBIvo Andreev
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forTed Dunning
 
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016Seattle DAML meetup
 
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...STEP_scotland
 
Prospering from the Energy Revolution: Six in Sixty - Data and Digitalisation
Prospering from the Energy Revolution: Six in Sixty - Data and DigitalisationProspering from the Energy Revolution: Six in Sixty - Data and Digitalisation
Prospering from the Energy Revolution: Six in Sixty - Data and DigitalisationKTN
 
The Paradigm of Fog Computing with Bio-inspired Search Methods and the “5Vs” ...
The Paradigm of Fog Computing with Bio-inspired Search Methods and the “5Vs” ...The Paradigm of Fog Computing with Bio-inspired Search Methods and the “5Vs” ...
The Paradigm of Fog Computing with Bio-inspired Search Methods and the “5Vs” ...israel edem
 
Gray-Box Models for Performance Assessment of Spark Applications
Gray-Box Models for Performance Assessment of Spark ApplicationsGray-Box Models for Performance Assessment of Spark Applications
Gray-Box Models for Performance Assessment of Spark ApplicationsATMOSPHERE .
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudRevolution Analytics
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionRevolution Analytics
 
20150814 Wrangling Data From Raw to Tidy vs
20150814 Wrangling Data From Raw to Tidy vs20150814 Wrangling Data From Raw to Tidy vs
20150814 Wrangling Data From Raw to Tidy vsIan Feller
 

Similar to Salford Systems - On the Cutting Edge of Technology (20)

Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
 
Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...
 
Srikanta Mishra
Srikanta MishraSrikanta Mishra
Srikanta Mishra
 
Spark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWXSpark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWX
 
SPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARSSPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARS
 
Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Quality Steering Group - Modelling & Monitoring Workshop -...
 
An Empirical Study of Reliability Growth of Open versus Closed Source Softwar...
An Empirical Study of Reliability Growth of Open versus Closed Source Softwar...An Empirical Study of Reliability Growth of Open versus Closed Source Softwar...
An Empirical Study of Reliability Growth of Open versus Closed Source Softwar...
 
IoT with Azure Machine Learning and InfluxDB
IoT with Azure Machine Learning and InfluxDBIoT with Azure Machine Learning and InfluxDB
IoT with Azure Machine Learning and InfluxDB
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
 
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
 
Prospering from the Energy Revolution: Six in Sixty - Data and Digitalisation
Prospering from the Energy Revolution: Six in Sixty - Data and DigitalisationProspering from the Energy Revolution: Six in Sixty - Data and Digitalisation
Prospering from the Energy Revolution: Six in Sixty - Data and Digitalisation
 
The Paradigm of Fog Computing with Bio-inspired Search Methods and the “5Vs” ...
The Paradigm of Fog Computing with Bio-inspired Search Methods and the “5Vs” ...The Paradigm of Fog Computing with Bio-inspired Search Methods and the “5Vs” ...
The Paradigm of Fog Computing with Bio-inspired Search Methods and the “5Vs” ...
 
Gray-Box Models for Performance Assessment of Spark Applications
Gray-Box Models for Performance Assessment of Spark ApplicationsGray-Box Models for Performance Assessment of Spark Applications
Gray-Box Models for Performance Assessment of Spark Applications
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the Cloud
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and Revolution
 
20150814 Wrangling Data From Raw to Tidy vs
20150814 Wrangling Data From Raw to Tidy vs20150814 Wrangling Data From Raw to Tidy vs
20150814 Wrangling Data From Raw to Tidy vs
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Hadoop PDF
Hadoop PDFHadoop PDF
Hadoop PDF
 

Recently uploaded

Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...gajnagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...kumargunjan9515
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 

Recently uploaded (20)

Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 

Salford Systems - On the Cutting Edge of Technology

  • 1. May 2016 Dan Steinberg Mikhail Golovnya Salford Systems On the Cutting Edge of Technology
  • 2. Salford Systems: Company Overview • Founded 1983 by Dan Steinberg o First products were SAS® procedures: MLOGIT, MPROBIT • 1993 CART Version 1.0 • 1994-1997 Extensive data mining consulting • 1999 MARS Version 1.0 • 2000 CART 4.0 MARS 2.0 • 2002 CART 5.0 TreeNet 1.0 • 2004 Random Forests • 2005 TreeNet 2.0 • 2006 CART 6.0 Salford Systems ©2014 Company and Product Line Overview 2
  • 3. Recent Product Line Development • 2008-2013 Series of Products o CART® 6.0 in Standard, EX, and PRO versions o TreeNet® 2.0 PRO EX version o MARS® 3.0 with Time Series Support o RandomForests® 2.0 o Generalized Path Seeker • 2014 Data Mining Suite (integrates all tools) o SPM 7.0 o Multithreading and modular design o Engine APIs o Ongoing work on SPM 8.0 and Big Data approaches Salford Systems ©2014 Company and Product Line Overview 3
  • 4. Salford Competitive Awards • 2010 Direct Marketing Association. 1st place* • 2009 KDDCup IDAnalytics*, and FEG Japan* 1st Runner Up • 2008 DMA Direct Marketing Association 1st Runner Up • 2007 Pacific Asia PAKDD: Credit Card Cross Sell. 1st place • 2006 DMA Direct Marketing Association: Predictive Modeling* • 2006 PAKDD Pacific Asia KDD: Telco Customer Type Profiling • 2005 BI-Cup Latin America: Predictive Modeling E-commerce* 1st place • 2004 KDDCup: Predictive Modeling ‘Most Accurate”* • 2002 NCR/Teradata Duke University: Predictive Modeling-Churn o all four separate predictive modeling challenges 1st place • 2000 KDDCup: Predictive Modeling- Online behavior 1st place • 2000 KDDCup: CRM Analysis 1st place Salford Systems ©2014 Company and Product Line Overview *Won either by Salford or by client using Salford tools 4
  • 5. Salford Systems: R&D Staff and Partners • Dan Steinberg, Ph.D Econometrics, Harvard (CART, MARS, Discrete Choice) • Nicholas Scott Cardell, PhD Econometrics, Harvard (Data Mining, Discrete Choice) • Jerome H. Friedman, Stanford University (algorithm coder CART, MARS, Treenet, HotSpotDetector) • Leo Breiman, UC Berkeley (algorithm developer, ensembles of trees, randomization techniques to improve trees) • Richard Olshen, Stanford University (Survival CART, Tree- Based Clustering) • Charles Stone, UC Berkeley (CART large sample theory) • Richard Carson, UC San Diego, Visualization Methods, Super Computer methods Salford Systems ©2014 Company and Product Line Overview 5
  • 6. Machine Learning Defined
    • Machine learning is the search for patterns in data using modern, highly automated, computer-intensive methods
      o Data mining may be best defined as the use of a specific class of tools (data mining methods) in the analysis of data
      o The term “search” is key to this definition, as is “automated”
    • The literature often refers to finding hidden information in data
    • We will focus on patterns that allow us to accomplish two tasks, known as “supervised learning”:
      o Classification
      o Regression
    • There is also a third common task, known as “unsupervised learning”:
      o Finding groups in data (clustering, density estimation)
  • 7. The Essence of Machine Learning
    • In a nutshell: use historical data to gain insights and/or make predictions on new data
    • (Diagram elements: Population, Historical Data, DM Engine, Model, Analyst, Insights, New Data, Scoring, Predictions)
  • 8. Boston Housing Data Set
    • Concerns housing values in the Boston area
    • Harrison, D. and D. Rubinfeld. Hedonic Housing Prices and the Demand for Clean Air. Journal of Environmental Economics and Management, v5, 81-102, 1978
    • Combined information from 10 separate governmental and educational sources to produce this data set
    • 506 census tracts in the City of Boston for the year 1970
      o Goal: study the relationship between quality-of-life variables and property values
      o MV: median value of owner-occupied homes in tract ($1,000s)
      o CRIM: per capita crime rate
      o NOX: concentration of nitric oxides (parts per 10 million)
      o AGE: percent of units built before 1940
      o DIS: weighted distance to centers of employment
      o RM: average number of rooms per house
      o LSTAT: percent lower status of the population
      o RAD: accessibility to radial highways
      o CHAS: borders the Charles River (0/1)
      o INDUS: percent non-retail business
      o TAX: property tax rate per $10,000
      o PT: pupil-teacher ratio
  • 9. Target: Median House Value (MV)
    • Distribution of the target variable (in thousands of dollars)
    • The low values (1970 prices) clearly reflect inflation over the past 40 years
  • 10. Mutual Dependency
    • The data violate conventional modeling assumptions
    • There are clearly non-normal distributions and non-linear relationships
  • 11. OLS Regression
    • OLS: ordinary least squares regression
      o Discovered by Legendre (1805) and Gauss (1809) to solve problems in astronomy using pen and paper
      o Given a solid statistical foundation by Fisher in the 1920s
      o 1950s: use of electro-mechanical calculators
    • The model is always of the form
      Response = A + B1X1 + B2X2 + B3X3 + …
    • The response surface is a hyper-plane!
    • A: the intercept term
    • B1, B2, B3, …: parameter estimates
    • A unique combination of values usually exists which minimizes the mean squared error of predictions on the learn sample
    • Stepwise approaches are used to determine model size
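To make "the combination of values which minimizes the mean squared error" concrete, here is a minimal pure-Python sketch, an illustration rather than SPM's implementation, that fits Response = A + B1X1 + B2X2 + … by solving the normal equations. The function names `ols_fit` and `ols_predict` and the toy data are hypothetical. The singular-matrix error corresponds to the "usually unique" caveat: when predictors are exactly collinear, many coefficient vectors reach the same minimum.

```python
# Minimal OLS sketch (pure Python, no libraries): choose A, B1, B2, ... to
# minimize the mean squared error by solving the normal equations (X'X)b = X'y.


def ols_fit(rows, y):
    """Return [A, B1, B2, ...] for Response = A + B1*X1 + B2*X2 + ..."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 -> intercept A
    p = len(X[0])
    # Normal equations: X'X (p x p) augmented with the column X'y.
    aug = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # Collinear predictors: the minimizing coefficients are not unique.
            raise ValueError("X'X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(p):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def ols_predict(coefs, rows):
    """Evaluate the fitted hyper-plane at each row."""
    return [coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)) for r in rows]


# Toy data lying exactly on Response = 2 + 3*X1 - 1*X2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [2, 5, 1, 4, 7]
coefs = ols_fit(rows, y)
print([round(c, 6) for c in coefs])  # [2.0, 3.0, -1.0]
```

Forming X'X explicitly is the most direct transcription of the least squares criterion; QR- or SVD-based solvers are numerically safer for real data.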
  • 12. OLS on Boston Data
    • 414 records in the learn sample
    • 92 records in the test sample
    • Good agreement between learn and test error:
      o LEARN MSE = 27.455
      o TEST MSE = 26.147
    • 3-variable solution, coefficients: -0.597, +5.247, -0.858
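The slide does not say how the 506 tracts were divided into 414 learn and 92 test records. The sketch below assumes a simple random partition (the helper names and the fixed seed are hypothetical) and shows the MSE calculation behind the LEARN and TEST figures.

```python
import random


def learn_test_split(n_records, n_test, seed=0):
    """Randomly assign n_test record indices to the test sample, the rest to learn."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def mse(actual, predicted):
    """Mean squared error: the average of squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


learn_idx, test_idx = learn_test_split(506, 92)
print(len(learn_idx), len(test_idx))  # 414 92
```

Computing `mse` separately on the learn and test records and comparing the two is the check behind "good agreement": a test MSE close to the learn MSE suggests the model is not overfitting.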
  • 13. Unique Personalities: the “Founding Fathers” of CART
    • Leo Breiman
    • Jerome Friedman
    • Richard Olshen
    • Charles Stone
  • 14. 1984 CART Monograph © Copyright Salford Systems 1999-2015
  • 15. Regression Tree Model
    • All cases in a given node are assigned the same predicted response: the node average of the original target
    • Nodes are color-coded according to the predicted response
    • We obtain a convenient segmentation of the population according to average response level
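The rule that every case in a node receives the node average can be sketched as a single CART-style split: search all predictors and thresholds and keep the split whose two child nodes, each predicted by its own average, have the smallest total squared error. This is a simplified illustration with toy data and hypothetical names (`best_split`, `predict`); a full regression tree applies the same search recursively to each child node and then prunes.

```python
def sse(values):
    """Sum of squared deviations from the node average."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, ys):
    """Try every predictor and threshold; each child node predicts its own
    node average. Return (total SSE, feature, threshold, left avg, right avg)."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows})[:-1]:  # x <= t goes left
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t, sum(left) / len(left), sum(right) / len(right))
    return best


def predict(split, row):
    """Every case in a node gets that node's average response."""
    _, f, t, left_avg, right_avg = split
    return left_avg if row[f] <= t else right_avg


rows = [(1, 5), (2, 3), (3, 8), (4, 1), (5, 7), (6, 2)]
ys = [10, 11, 9, 30, 31, 29]
split = best_split(rows, ys)
print(split)  # (4.0, 0, 3, 10.0, 30.0)
```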
  • 16. The Best and the Worst Segments
  • 17. Gradient Boosting
    • Begin with a very small tree as the initial model
    • Compute “residuals” (prediction errors) from this simple model for every record in the data
    • Grow a second small tree to predict the residuals from the first tree
    • Compute residuals from this new 2-tree model and grow a 3rd tree to predict the revised residuals
    • Repeat this process to grow a sequence of trees
    • (Figure: Tree 1 + Tree 2 + Tree 3 + … more trees)
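The boosting loop described on this slide can be sketched in a few lines, assuming squared-error loss and two-node trees (stumps) as the "very small tree". Tree 1 is fitted to the target itself and each later tree to the current residuals. The `learn_rate` parameter, which scales each tree's contribution, is an assumption added for illustration; with its default of 1.0 the loop follows the slide's steps exactly. All names are hypothetical, and this is not TreeNet's code.

```python
def fit_stump(rows, targets):
    """A very small tree: one split, two terminal nodes, each predicting the
    node average of the targets it is fitted to."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [e for r, e in zip(rows, targets) if r[f] <= t]
            right = [e for r, e in zip(rows, targets) if r[f] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((e - ml) ** 2 for e in left) + sum((e - mr) ** 2 for e in right)
            if best is None or err < best[0]:
                best = (err, f, t, ml, mr)
    _, f, t, ml, mr = best
    return lambda r: ml if r[f] <= t else mr


def gradient_boost(rows, ys, n_trees, learn_rate=1.0):
    """Tree 1 is fitted to the target; every later tree is fitted to the
    residuals of the model built so far. Returns the summed model."""
    preds = [0.0] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # for tree 1: the target
        tree = fit_stump(rows, residuals)
        trees.append(tree)
        preds = [p + learn_rate * tree(r) for p, r in zip(preds, rows)]

    def model(row):
        # Tree 1 + Tree 2 + Tree 3 + ..., each scaled by learn_rate
        return learn_rate * sum(tree(row) for tree in trees)

    return model


rows = [(x,) for x in range(10)]
ys = [x * x for x in range(10)]
for n in (1, 5, 20):
    model = gradient_boost(rows, ys, n)
    err = sum((y - model(r)) ** 2 for r, y in zip(rows, ys)) / len(ys)
    print(n, "trees: learn MSE", round(err, 3))
```

With squared-error residuals, adding a tree whose terminal nodes predict residual averages never increases the learn-sample error, so the printed learn MSE falls as trees are added. Test error can eventually rise, which is why the number of trees is normally chosen on held-out data.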
  • 18. Illustration: Saddle Function
    • 500 {X1, X2} points drawn at random from the [-3, +3] box produce the XOR-like response surface Y = X1 * X2
    • We use 3-node trees to show how the TreeNet response surface evolves
    • (Figure panels: response surface after 1, 2, 3, 4, 10, 20, 30, 40, 100, and 195 trees)
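A sketch of the saddle experiment under stated assumptions: points are drawn uniformly from the [-3, +3] box with a fixed seed, each 3-node tree is grown greedily (split the root, then split whichever child reduces squared error more), and each tree is scaled by a learning rate of 0.1. The learning rate, the seed, the growth rule, and all function names are assumptions, not TreeNet's documented settings. Instead of drawing the surfaces, the code reports learn MSE at the tree counts shown in the figure.

```python
import random


def best_split(idx, X, r):
    """Best single split of records idx on residuals r.
    Returns (SSE reduction, feature, threshold), or None if no split exists."""
    n = len(idx)
    total = sum(r[i] for i in idx)
    best = None
    for f in range(len(X[0])):
        order = sorted(idx, key=lambda i: X[i][f])
        left_sum = 0.0
        for k in range(n - 1):
            left_sum += r[order[k]]
            if X[order[k]][f] == X[order[k + 1]][f]:
                continue  # cannot separate tied values
            nl, nr = k + 1, n - k - 1
            # SSE(parent) - SSE(left) - SSE(right), computed from running sums
            gain = left_sum ** 2 / nl + (total - left_sum) ** 2 / nr - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, f, X[order[k]][f])
    return best


def node_mean(ids, r):
    return sum(r[i] for i in ids) / len(ids)


def fit_three_node_tree(X, r):
    """Split the root, then split whichever child gives the larger SSE
    reduction, producing 3 terminal nodes."""
    idx = list(range(len(X)))
    _, f0, t0 = best_split(idx, X, r)
    left = [i for i in idx if X[i][f0] <= t0]
    right = [i for i in idx if X[i][f0] > t0]
    sl, sr = best_split(left, X, r), best_split(right, X, r)
    if sl is None and sr is None:  # neither child can be split further
        vl, vr = node_mean(left, r), node_mean(right, r)
        return lambda x: vl if x[f0] <= t0 else vr
    if sr is None or (sl is not None and sl[0] >= sr[0]):
        _, f1, t1 = sl
        a = node_mean([i for i in left if X[i][f1] <= t1], r)
        b = node_mean([i for i in left if X[i][f1] > t1], r)
        c = node_mean(right, r)
        return lambda x: (a if x[f1] <= t1 else b) if x[f0] <= t0 else c
    _, f1, t1 = sr
    a = node_mean(left, r)
    b = node_mean([i for i in right if X[i][f1] <= t1], r)
    c = node_mean([i for i in right if X[i][f1] > t1], r)
    return lambda x: a if x[f0] <= t0 else (b if x[f1] <= t1 else c)


def saddle_data(n=500, seed=1):
    """n points drawn uniformly from the [-3, +3] box, with Y = X1 * X2."""
    rng = random.Random(seed)
    X = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(n)]
    return X, [x1 * x2 for x1, x2 in X]


def boost(X, y, n_trees, learn_rate=0.1):
    """Boost 3-node trees on residuals; return the learn MSE after each tree."""
    preds = [0.0] * len(y)
    history = []
    for _ in range(n_trees):
        r = [yi - pi for yi, pi in zip(y, preds)]
        tree = fit_three_node_tree(X, r)
        preds = [p + learn_rate * tree(x) for p, x in zip(preds, X)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y))
    return history


X, y = saddle_data()
history = boost(X, y, 195)
for k in (1, 2, 3, 4, 10, 20, 30, 40, 100, 195):
    print(k, "trees: learn MSE", round(history[k - 1], 3))
```

Three terminal nodes matter here: a tree with one split on X1 and a second split on X2 can represent an interaction, while any sum of single-split trees is purely additive in X1 and X2. In the population, the best additive approximation to X1 * X2 on this symmetric box is a constant, so boosted stumps could capture little beyond sampling noise.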
  • 19. Delinquency Dataset
    • Variables:
      o DELINQUENT: person experienced 90 days past due delinquency or worse
      o AGE: age of borrower in years
      o DEBT_RATIO: monthly debt payments, alimony, and living costs divided by monthly gross income
      o MONTH_INCOME: monthly income
      o N_OPEN_LINES: number of open loans (mortgages, car loans, credit cards, etc.)
      o N_MORTGAGES: number of mortgage and real estate loans
      o N_DEPENDENTS: number of dependents in family, excluding the borrower
  • 20. TreeNet Model for Delinquency
    • (Figure panels: TreeNet; Logistic Regression)
    • Here we show the performance of the 6-variable TreeNet model compared to the performance of the equivalent logistic regression model
    • TreeNet has a clear edge of 5 points over logistic regression in terms of ROC area
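ROC area has a simple interpretation that this sketch computes directly: the probability that a randomly chosen delinquent record receives a higher score than a randomly chosen non-delinquent one, with ties counted as one half. Reading the slide's "5 points" as ROC area on a 0 to 100 scale is an assumption; on the 0 to 1 scale used below it would be a difference of 0.05. The function name is hypothetical.

```python
def roc_area(labels, scores):
    """Probability that a random positive (label 1) outscores a random
    negative (label 0); ties count as one half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("ROC area needs at least one record of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores: one delinquent record is ranked below one non-delinquent record.
print(roc_area([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3]))  # 0.75
```

The pairwise loop costs O(positives × negatives); for large files a rank-based computation gives the same value in O(n log n).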
  • 21. Gathering Up Transformations
    • TreeNet provides powerful insights into the inner workings of the model by constructing partial dependence plots
    • We can use the partial dependence plots to construct first- and second-order univariate spline transformations
    • The resulting transforms are added to the dataset, and the code is saved for future use
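Partial dependence is conventionally computed by fixing one predictor at each grid value across all records and averaging the model's predictions; the sketch below does that for any prediction function. The hinge functions show one common way to write first- and second-order univariate spline pieces. Whether TreeNet uses this exact basis is not stated on the slide, and the toy model, data, and knot are hypothetical.

```python
def partial_dependence(model, rows, feature, grid):
    """For each grid value v, set `feature` to v in every record (other
    predictors unchanged) and average the model's predictions."""
    curve = []
    for v in grid:
        total = 0.0
        for r in rows:
            modified = list(r)
            modified[feature] = v
            total += model(modified)
        curve.append(total / len(rows))
    return curve


def hinge(x, knot):
    """First-order (linear) spline piece: 0 left of the knot, x - knot to the right."""
    return max(0.0, x - knot)


def hinge2(x, knot):
    """Second-order (quadratic) spline piece."""
    return max(0.0, x - knot) ** 2


def toy_model(r):
    """Hypothetical stand-in for a fitted TreeNet model."""
    return 2 * r[0] + r[1] ** 2


rows = [(0, 1), (1, 2), (2, 3)]
print(partial_dependence(toy_model, rows, 0, [0, 1]))
# Hypothetical knot at 1.5, as if read off a partial dependence plot:
new_feature = [hinge(r[0], 1.5) for r in rows]
```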
  • 22. Enhanced Logistic Regression Model
    • We can now build a logistic regression model using the transformed features
    • The new model almost completely recovers the performance of the original unconstrained TreeNet model
    • This is because the data exhibit virtually no interactions, which can easily be confirmed by building a constrained additive model in TreeNet
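The enhanced model is ordinary logistic regression fitted on the transformed columns. This sketch assumes batch gradient descent on the average log-loss (many statistical packages instead use Newton-type iterations) and a hypothetical hinge transform with a knot at 1.5; all names and data are illustrative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, r):
    """P(y = 1 | r) for weights w = [b0, b1, b2, ...]."""
    return sigmoid(w[0] + sum(b * x for b, x in zip(w[1:], r)))


def fit_logistic(rows, labels, steps=2000, lr=0.1):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x1 + ...) by batch gradient descent
    on the average log-loss. Returns [b0, b1, ...]."""
    n, p = len(rows), len(rows[0])
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for r, y in zip(rows, labels):
            err = predict_proba(w, r) - y
            grad[0] += err
            for j, x in enumerate(r):
                grad[j + 1] += err * x
        w = [b - lr * g / n for b, g in zip(w, grad)]
    return w


def log_loss(w, rows, labels):
    """Average negative log-likelihood of the labels."""
    total = 0.0
    for r, y in zip(rows, labels):
        p = predict_proba(w, r)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(rows)


# Hypothetical example: each raw value plus one hinge transform (knot at 1.5).
raw = [0.0, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 1]
rows = [(x, max(0.0, x - 1.5)) for x in raw]
w = fit_logistic(rows, labels)
print([round(b, 3) for b in w], round(log_loss(w, rows, labels), 4))
```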
  • 23. Salford Predictive Modeler SPM
    • Download a current version from our website: http://www.salford-systems.com
    • The version will run without a license key for 10 days
    • Request a license key from unlock@salford-systems.com
    • Request a configuration to meet your needs:
      o Data handling capacity
      o Data mining engines made available