Advances in Boosted Tree Technology:
TreeNet Model Compression and Optimal Rule
                Extraction




 Dan Steinberg, Mikhail Golovnya, N. Scott Cardell
                    May 2012
                Salford Systems
      http://www.salford-systems.com
Beyond TreeNet

• TreeNet has set a high bar for automatic off-the-shelf model
  performance
   – TreeNet was used to win all four 1st place awards in the
     Duke/Teradata churn modeling competition of 2002
   – Awards in 2010, 2009, 2008, 2007, and 2004 were all based on TreeNet


• TreeNet was first developed (MART) in 1999 and essentially
  perfected in 2000
   – Many improvements since then but the fundamentals are largely
     those of the 2000 technology
• In subsequent work Friedman has introduced major
  extensions that go beyond the framework of boosted trees


                     © Copyright Salford Systems 2012
Importance Sampled Learning Ensembles (ISLE)

• Friedman’s work in 2003 is somewhat more complex than
  what we describe here
   – He presented this paper at our first data mining conference in San
     Francisco in March 2004
• We focus on the concept of model compression
• A TreeNet model is grown myopically, one tree at a time
   –   Starting from the current model, attempt to improve it by predicting its residuals
   –   Each tree represents incremental learning and error correction
   –   Slow learning, small steps
   –   During model development we do not know where we are going to
       end up
• Once the TreeNet model is complete, can we review it and “clean it
  up”?
                       © Copyright Salford Systems 2012
Post-Processing With Regularized Regression

• Friedman’s ISLE takes a TreeNet model as its raw material
  and considers how we can refine it using regression
• Consider: every tree takes our raw data as input and
  generates outputs at the terminal nodes
• Each tree can be thought of as a new variable constructed
  out of the original data
   – No missing values in tree outputs even if there were missing values
     in the raw data
   – Outliers among such predictors are expected to be rare as each
     terminal is doing averaging and the trees are typically small
• Might create many more generated variables than original
  raw variables
   – Boston data set has 13 predictors, TN might generate 1000 trees
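To make the “each tree is a new variable” idea concrete, here is a minimal sketch using scikit-learn’s GradientBoostingRegressor as a rough stand-in for TreeNet (TreeNet is Salford’s commercial engine, so the classes and data set below are illustrative assumptions, not the product’s API):

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor

# Stand-in for a TreeNet run: 1000 small least-squares trees
X, y = fetch_california_housing(return_X_y=True)
gbm = GradientBoostingRegressor(loss="squared_error", n_estimators=1000,
                                max_leaf_nodes=6, learning_rate=0.1)
gbm.fit(X, y)

# Each fitted tree becomes one constructed variable: its prediction for every row.
# The result is a complete (no missing values), outlier-light design matrix
# with one column per tree.
tree_features = np.column_stack([tree[0].predict(X) for tree in gbm.estimators_])
print(tree_features.shape)   # (n_rows, 1000)
```

These tree columns, rather than the original raw predictors, become the inputs to the second-stage regression described on the following slides.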

                      © Copyright Salford Systems 2012
Regularized Regression

• Modern regression techniques start with Ridge regression, then
  the Lasso, and finally hybrid models
• Methods have advantages over classical regression
   –   Can handle highly correlated variables (Ridge)
   –   Can work with data sets with more columns than rows
   –   Can do variable selection (Lasso, Ridge-Lasso hybrids)
   –   Much more effective and reliable than old-fashioned stepwise regression
• Regularized regression is still regression and thus suffers
  from all the primary limitations of classical regression
   – No missing value handling
   – Linear additive model (no interactions)
   – Sensitive to functional form of predictors


                       © Copyright Salford Systems 2012
Regularized Regression Applied to Trees

• Applying regularized regression to trees is not vulnerable
  to these traditional problems
   –   Missing values already handled and transformed to non-missing
   –   Interactions incorporated into the tree structure
   –   Trees are invariant with respect to typical univariate transformations
   –   Any order-preserving transform will not affect the tree
• What will a regularized regression on trees accomplish?
   –   Combine all identical trees into one
   –   Combine several similar trees into a compromise tree
   –   Bypass any meandering while TreeNet searched for the optimum
   –   Reweight the trees (in TN all trees have equal weight)
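Continuing the earlier sketch (reusing np, y, and tree_features), a Lasso over the tree columns illustrates the selection and reweighting: trees with zero coefficients drop out, near-duplicate trees get absorbed into a single surviving coefficient, and the rest are reweighted rather than counted equally. This is an open-source approximation of the idea, not the post processor’s internals.

```python
from sklearn.linear_model import LassoCV

# Second stage: regress the target on the per-tree predictions.
# Zeroed coefficients drop trees (compression); nonzero ones reweight them.
lasso = LassoCV(cv=5, max_iter=5000).fit(tree_features, y)
kept = np.flatnonzero(lasso.coef_)
print(f"{kept.size} of {tree_features.shape[1]} trees kept after compression")
```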




                        © Copyright Salford Systems 2012
Regularized Regression of TreeNet

• In this mode of ISLE we develop the best TreeNet model we
  can
• Post-process results allowing for different degrees of
  compression
• By default we run four models on the TreeNet
   –   Ridge (no compression, just reweighting)
   –   Lasso (compression possible)
   –   Ridged Lasso (hybrid of Lasso and Ridge but mostly Lasso)
   –   Compact (maximum compression)
• Goal usually is to find a substantial degree of compression
  while giving up little or nothing on test sample performance
• Could focus only on beating TN performance
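A rough open-source analogue of the four default fits might look like the following (again reusing tree_features and y; mapping “Ridged Lasso” to an ElasticNet with a high l1_ratio and “Compact” to a heavily penalized Lasso is our assumption, not a documented equivalence):

```python
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV, Lasso

ridge   = RidgeCV().fit(tree_features, y)                         # reweight only
lasso   = LassoCV(cv=5).fit(tree_features, y)                     # compression possible
rlasso  = ElasticNetCV(l1_ratio=0.9, cv=5).fit(tree_features, y)  # mostly-Lasso hybrid
compact = Lasso(alpha=10 * lasso.alpha_).fit(tree_features, y)    # force heavier compression
```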

                       © Copyright Salford Systems 2012
Model Compression: Early Days

• TreeNet has always offered model truncation
• Instead of using the fully articulated model, stop the process
  early
• In 2005 this method was being used by a major web portal
   – TreeNet model used to predict likely response to item presented to
     visitor on a web page (ad, link, photo, story)
   – To deliver real-time response the TN model was limited to its first 30 trees
   – Sacrificed considerable predictive accuracy to have a model that
     could score fast enough in real time
   – The TreeNet truncated at 30 trees was still better than the other alternatives
   – Consider that the model might have been rebuilt every hour
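Truncation of this kind needs no refitting. In the scikit-learn sketch above it amounts to stopping the staged predictions early (staged_predict is scikit-learn’s mechanism; TreeNet exposes the same idea through its own controls):

```python
from itertools import islice

# Score with only the first 30 trees of the already-fitted 1000-tree model
pred_30 = next(islice(gbm.staged_predict(X), 29, None))
```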




                      © Copyright Salford Systems 2012
Illustrative Example: Boston Housing Data Set
                                Set Up Model




              © Copyright Salford Systems 2012
TreeNet Controls




1000 trees, Least Squares, AUTO Learnrate
                   © Copyright Salford Systems 2012
Post Processor Controls:
    What Type of Post Processing




© Copyright Salford Systems 2012
Post Processor Details:
                                                   Use all defaults




• Standardizing the “trees” gives all equal weight in regularized regression
• Worth experimenting with unstandardized – larger variance trees will dominate
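In scikit-learn terms, the default corresponds to standardizing each tree column before the regularized fit, and the unstandardized alternative simply drops the scaler. A hedged sketch of the default, reusing tree_features and y from earlier; this is not the post processor’s actual internals:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV

# Default: every tree column standardized, so all trees start with equal weight
post = make_pipeline(StandardScaler(), LassoCV(cv=5)).fit(tree_features, y)
```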
                         © Copyright Salford Systems 2012
Two Stage Modeling Process

• First Stage here is a TreeNet but in SPM could also be
   –   Single CART Tree (focus would be on nodes, e.g., from the maximal tree)
   –   Ensemble of CART trees (bagger)
   –   MARS model (basis functions from maximal model)
   –   Random Forests
• In ISLE mode we need to operate on a collection of
  variables created by a learning machine
   – these can come from any of our tree engines or MARS


• We will get first stage results: a model
• Then get second stage: model refinement
   – Model compression or model selection (e.g., tree pruning)

                      © Copyright Salford Systems 2012
TreeNet Results




Test Set R² = 0.87875, MSE = 7.407
                  © Copyright Salford Systems 2012
TreeNet Results: Residual Stats




One substantial outlier, more than 5 IQRs outside the central data range
                          © Copyright Salford Systems 2012
TreeNet and Compressed TreeNet
                                                 Both Models Reported Below




•   The dashed lines show the evolution of the compressed model
•   Because we can choose any of our 1000 trees to start with, the compressed model starts
    off much better than the original TreeNet, and its first tree carries a coefficient
                           © Copyright Salford Systems 2012
ISLE Reweighted TreeNet:
                   Test Data Results




© Copyright Salford Systems 2012
TreeNet vs ISLE Residuals
              ISLE is wider in the center but narrower top to bottom


TreeNet Residuals                       ISLE Compressed TreeNet




               © Copyright Salford Systems 2012
Comment on the First Tree

• It is interesting to observe that in this example the
  compressed model with just one tree in it outperforms the
  TreeNet model with just one tree
• Trees are built without look-ahead, but having a menu of
  1000 trees to choose from allows the 2nd stage model to do
  better
• The worst case is that the 2nd stage chooses the same first
  tree
• Even then, its coefficient can spread out the predictions




                   © Copyright Salford Systems 2012
TreeNet Model Compression

• TreeNet has set a high bar for predictive accuracy in the
  data mining field
• We now offer several ways in which a TreeNet can be
  further improved by post-processing
• Consider that a TreeNet model is built one step at a time
  without knowledge of where we will end up
   – Some trees are exact or almost exact copies of other trees
   – Some trees may exhibit some “wandering” before the right direction
     is found
   – Trees are each built on a different random subset of the data and
     some trees may just be “unlucky”
   – Post processing can combine multiple copies of essentially the same
     tree and skip any unnecessary wandering

                     © Copyright Salford Systems 2012
How Much Compression is Possible?

• Our experience derives from working with data from several
  industries (retail sales, online web advertising, credit risk,
  direct marketing)
• Compression of 80% is not uncommon for the best model
  generated by the post-processing
• However, the user is free to truncate the compressed model, as
  it is also built up sequentially (we add one tree at a time to
  the model)
• The user can thus choose from a possibly broad range of
  tradeoffs, opting for the even greater compression available from
  a less accurate model
• In the BOSTON example 90% compression also performs
  quite well (about 40 trees instead of the optimal 91 trees)
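The tradeoff the user navigates can be traced by sweeping the Lasso penalty: stronger penalties keep fewer trees at some cost in accuracy. A rough sketch, reusing tree_features and y from the earlier sketches; the alpha grid is arbitrary, and in a real run the trees themselves would be grown on the training split only:

```python
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

Xtr, Xte, ytr, yte = train_test_split(tree_features, y, test_size=0.3, random_state=0)
for alpha in (0.001, 0.01, 0.1, 1.0):
    m = Lasso(alpha=alpha, max_iter=5000).fit(Xtr, ytr)
    n_kept = int((m.coef_ != 0).sum())
    print(alpha, n_kept, "trees, test MSE =", mean_squared_error(yte, m.predict(Xte)))
```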
                    © Copyright Salford Systems 2012
A Comment on the Theory behind ISLE

• In Friedman’s paper on ISLE he provides a rationale for this
  approach quite different from ours
• Consider that our goal is to learn a model from data where it
  is clear that a linear regression is not adequate
• How to automatically manufacture basis functions that
  capture more complex structure than raw variables
   – Imagine offering high order polynomials
   – Some have suggested adding Xi*Xj interactions, 1/Xi, and log(Xi) (for
     strictly positive regressors) as new predictors
   – Friedman proposes TreeNet as a vehicle for generating such new
     variables in the search for a more faithful model (to the truth)
   – Think of TreeNet as a search engine for features (constructed
     predictors)

                      © Copyright Salford Systems 2012
From Trees to Nodes

• In a second round of work on the idea of post-processing a
  tree ensemble Friedman suggested working with nodes
• Every node in a decision tree (other than the root) defines a
  potentially interesting subset of data
• Analysts have long thought about the terminal nodes of a
  CART tree in this way
   – Each terminal node is a segment or can be thought of as an
     interesting rule
   – Cardell and Steinberg proposed blending CART and logistic
     regression in this way (each terminal node is a dummy variable)
• Now we extend this thinking to all nodes below the root
   •   Tibshirani proposed using all the nodes of a maximal tree in a Lasso model
       to “prune” the tree

                         © Copyright Salford Systems 2012
Nodes in a Single TreeNet Tree
                                Tree grown to have T=6 terminal nodes


                                     •   Typical TreeNet has T=6 terminal nodes
                                     •   One level down has two nodes
                                     •   Next level has 4 nodes (3 terminal)
                                     •   Next 2 levels have 2 nodes each
                                     •   Total is 10 non-root nodes
                                     •   Will always be T + (T-1) - 1 = 2(T-1): T terminals plus T-1 internal nodes, minus the root

                                     • Represent each node as a 0/1 indicator
                                     • Record passes through this node (1) or
                                       does not pass through this node (0)



• With 10 node indicators per 6-terminal tree, a 1,000-tree TreeNet will
  generate 10,000 node indicators
• Now we want to post-process this node representation of the TreeNet
• Methodology can generate an immense number of predictors
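In scikit-learn the node indicator matrix for each tree can be pulled with decision_path; stacking these across all trees gives the 0/1 “rule” predictors described above. A sketch, reusing gbm and X from the earlier sketches; note the columns include the root node, which would be dropped in practice:

```python
from scipy import sparse

# One 0/1 column per node of every tree: does the record pass through that node?
node_blocks = [tree[0].decision_path(X) for tree in gbm.estimators_]
node_dummies = sparse.hstack(node_blocks).tocsr()
print(node_dummies.shape)   # roughly n_rows x (nodes per tree x n_trees)
```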

                          © Copyright Salford Systems 2012
Use Regularized Regression to Post Process

• Essential because even if we start with a small data set
  (rows and columns) we might generate thousands of trees
• The regularized regression is used to
   – SELECT trees (only a subset of the original trees will be used)
   – REWEIGHT trees (originally all had equal weight)
• The new model is still an ensemble of regression trees but
  now recombined differently
   – Some trees might get a negative weight
• New model could have two advantages
   – Could be MUCH smaller than original model (good for deployment)
   – Could be more accurate on holdout data
• No guarantees but results often attractive

                      © Copyright Salford Systems 2012
Variations on Node Post Processing

•   Pure: nodes (only node dummies in 2nd stage model)
•   Hybrid: nodes + trees (mix of ISLE and nodes)
•   Hybrid: raw predictors + nodes (Friedman’s preferred)
•   Hybrid: raw predictors + ISLE variables
•   Hybrid: raw predictors + ISLE trees + nodes

• In addition we could add the original TreeNet prediction to
  any of these sets of predictors
• Ideal interaction detection: include TreeNet prediction from a
  pure additive model and node indicators as regressors
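The hybrid variants amount to concatenating different column blocks before the second-stage fit. For example (reusing np, X, tree_features, and node_dummies from the earlier sketches; the two combinations shown are just illustrations of the variants listed above):

```python
from scipy import sparse

# Hybrid: raw predictors + node dummies (Friedman's preferred variant above)
raw_plus_nodes = sparse.hstack([sparse.csr_matrix(X), node_dummies])

# Hybrid: raw predictors + ISLE tree variables
raw_plus_trees = np.hstack([X, tree_features])
```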



                    © Copyright Salford Systems 2012
Raw Predictor Problems

• Much of our empirical work involves incomplete data
  (missing values) and the 2nd stage model requires complete
  data (listwise deletion)
• While the hybrid models involving raw variables can capture
  nonlinearity and interactions, the raw predictors act as
  everyday regressors
   – Issue of functional form
   – Issue of outliers
• Using ISLE variables may be far better for working with data
  for which careful cleaning and repair is not an option




                      © Copyright Salford Systems 2012
Same Data Post-Processing Nodes




• In this example running only on nodes does not do well
    • See the upper dotted performance curves
• Still we will examine the outputs generated
• Which method works best will vary with specifics of the data

                       © Copyright Salford Systems 2012
Pure RuleSeeker




•   Each variable in the model is a node, or a RULE
•   Worthwhile to examine mean target, lift, support and agreement with test data
•   All shown above
                            © Copyright Salford Systems 2012
Rule table:
                                                        Display is Sortable




•   The number of terms in a rule is determined by the location of the node in the tree
•   Deep nodes can involve more variables (minimum is one, maximum equals the depth of the tree)
                           © Copyright Salford Systems 2012
Rule Statistics




More columns from the Rule Table Display

                      © Copyright Salford Systems 2012
Lift Report:
                         High Lifts Represent Interesting Segments




Dot for each rule (here displaying test data results)

                        © Copyright Salford Systems 2012
Parametric Bootstrap For Interaction Statistics




© Copyright Salford Systems 2012
Final Details

• We have described RuleSeeker as a way to post-process a
  TreeNet model and this is a fundamental use of the method
• When our goal from the start is to extract rules, we are
  advised to modify the TreeNet controls in two ways
   – Allow the sizes of the trees to vary at random
   – Use very small subsets of the data when growing each tree
• Friedman recommends an average tree size of 4 terminal
  nodes and using a Poisson distribution to generate varying
  tree sizes (will often yield a few trees with 10-16 nodes)
• Friedman describes experiments in which each tree in the
  TreeNet is grown on just 5% of the available data
   – The TreeNet first stage is inferior to a standard TreeNet, but the 2nd stage could
     actually outperform the standard TreeNet
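A bare-bones version of this rule-oriented first stage, written as a hand-rolled least-squares boosting loop because scikit-learn’s boosting classes fix the tree size for the whole run, might look like the following. This is our reconstruction of the recipe, with the Poisson size draw around 4 terminal nodes and the 5% per-tree sample as described above; all parameter choices are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def rule_oriented_boost(X, y, n_trees=200, learn_rate=0.01,
                        mean_leaves=4, sample_frac=0.05, seed=0):
    """Least-squares boosting with random tree sizes and tiny per-tree samples."""
    rng = np.random.default_rng(seed)
    pred = np.full(len(y), y.mean())
    trees = []
    for _ in range(n_trees):
        resid = y - pred
        # Tree size varies around 4 terminal nodes (occasionally much larger)
        leaves = max(2, 2 + rng.poisson(mean_leaves - 2))
        # Each tree sees only ~5% of the data
        rows = rng.choice(len(y), size=max(10, int(sample_frac * len(y))), replace=False)
        t = DecisionTreeRegressor(max_leaf_nodes=leaves)
        t.fit(X[rows], resid[rows])
        pred += learn_rate * t.predict(X)
        trees.append(t)
    return trees, pred
```

The node dummies extracted from such a run would then feed the same second-stage regularized regression sketched earlier.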
                       © Copyright Salford Systems 2012
RuleSeeker and Huge Data

• If the RuleSeeker approach can in fact outperform standard
  TreeNet this suggests a sampling approach to massive data
  sets
• Extract rather small (possibly stratified) samples from each
  of many data repositories
• Grow a TreeNet tree
• Repeat random draws to grow subsequent trees
• Friedman’s approach does not grow very many trees (200)
• The 2nd stage regression must be run on a much larger
  sample but regression is much easier to distribute than trees



                   © Copyright Salford Systems 2012
RuleSeeker Summary

• A RuleSeeker model has several interesting dimensions
   –   It is a post-processed version of a TreeNet
   –   RuleSeeker model could offer better performance than original TN
   –   RuleSeeker model might also be more compact
   –   Rules extracted could be seen as important INTERACTIONS
   –   Rules could be studied as rules
        • Compare train vs test Lift (want good agreement)
        • Consider tradeoff of Lift versus Support
            – Rules can guide targeting but only worthwhile if support is sufficient




                          © Copyright Salford Systems 2012
Big Data

• Currently we support 64-bit single-server operation
• Typical modern servers offer 32 cores and 512GB of RAM
   – Shortly we expect to see 200 cores and 2TB RAM at modest prices
   – Our training data can reach about 1/3 of RAM without disk thrashing
   – 200GB of training data (50 million rows by 1000 predictors)


• MapReduce/Hadoop appears to be the emerging standard
  for massively parallel data stores and computation
• Our approach will be bagging models that extract random
  samples from each of the data stores
• Each mapper and reducer is expected to have 4GB RAM
• We will require reducers to be equipped with 16GB
                     © Copyright Salford Systems 2012
