Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal Data Scientist, SAS Institute Inc. at MLconf ATL 2016

Local Search Optimization for Hyper-Parameter Tuning: Many machine learning algorithms are sensitive to their hyper-parameter settings and lack good universal rule-of-thumb defaults. In this talk we discuss the use of black-box local search optimization (LSO) for machine learning hyper-parameter tuning. Viewed as black-box objective functions of their hyper-parameters, machine learning algorithms create a difficult class of optimization problems: the objective functions tend to be nonsmooth, discontinuous, and unpredictably computationally expensive, and they require support for continuous, categorical, and integer variables. Further, evaluations can fail for a variety of reasons, such as early exits due to node failure or hitting a maximum time limit. Additionally, not all hyper-parameter combinations are compatible, creating so-called "hidden constraints". In this context, we apply a parallel hybrid derivative-free optimization algorithm that can make progress despite these difficulties, providing significantly improved results over default settings with minimal user interaction. We also address efficient parallel paradigms for different types of machine learning problems, explore the importance of validation to avoid overfitting, and emphasize that even for small-data problems, the need to perform cross validation can create computationally intense functions that benefit from a distributed/threaded environment.
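
To make the black-box framing concrete, here is a minimal Python sketch of how a tuner might see one training run; the train_and_validate callback and the infinite-penalty convention are illustrative assumptions, not the implementation described in the talk.

    import math

    def make_objective(train_and_validate, max_seconds=3600):
        # Wrap one training run as a black-box tuning objective T(x).
        # train_and_validate(params, max_seconds) is a hypothetical callback
        # that fits a model and returns its validation error score.
        def objective(params):
            try:
                return train_and_validate(params, max_seconds=max_seconds)
            except Exception:
                # Node failures, timeouts, and incompatible hyper-parameter
                # combinations ("hidden constraints") surface as exceptions;
                # map them to a worst-case score so the search can continue.
                return math.inf
        return objective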

  1. LOCAL SEARCH OPTIMIZATION FOR HYPERPARAMETER TUNING
     Funda Gunes, Patrick Koch
  2. OVERVIEW
     1. Training Predictive Models: the optimization problem; parameters for optimization and model training
     2. Tuning Predictive Models: manual, grid search, optimization; optimization challenges
     3. Sample Results
     4. Distributed / Parallel Processing
     5. Opportunities and Challenges
  3. PREDICTIVE MODELS
     Transformation of relevant data into high-quality descriptive and predictive models.
     Example: neural network
     • How to determine the model configuration?
     • How to determine the weights and activation functions?
     [Diagram: feed-forward network with input layer x1…xn, one hidden layer, and output layer f1(x)…fm(x), connected by weights wij and wjk]
  4. MODEL TRAINING – OPTIMIZATION PROBLEM
     • Objective function:
       f(w) = \frac{1}{n} \sum_{i=1}^{n} L(w; x_i, y_i) + \lambda_1 \|w\|_1 + \frac{\lambda_2}{2} \|w\|_2^2
     • Loss L(w; x_i, y_i) for observation (x_i, y_i) and weights w
     • Stochastic gradient descent (SGD), a variation of gradient descent:
       w^{k+1} = w^k - \eta \nabla f(w^k)
     • Approximate \nabla f(w^k) with the gradient of a mini-batch sample \{x_{k_i}, y_{k_i}\}:
       \nabla f_t(w) = \frac{1}{m} \sum_{i=1}^{m} \nabla L(w; x_{k_i}, y_{k_i})
     • The choice of \lambda_1, \lambda_2, \eta, and m can have significant effects on solution quality (a code sketch follows)
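
To illustrate how these quantities interact, here is a minimal NumPy sketch of mini-batch SGD for a squared-error loss with the L1/L2 penalties above; the choice of loss and the function signature are illustrative, not the talk's implementation.

    import numpy as np

    def sgd(X, y, eta=0.01, m=32, lam1=0.0, lam2=0.0, epochs=10, seed=0):
        # Mini-batch SGD for a squared-error loss with L1/L2 regularization,
        # following the update w_{k+1} = w_k - eta * grad.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            for idx in np.array_split(rng.permutation(n), max(n // m, 1)):
                Xb, yb = X[idx], y[idx]
                grad = Xb.T @ (Xb @ w - yb) / len(idx)  # mini-batch gradient of L
                grad += lam2 * w + lam1 * np.sign(w)    # L2 gradient + L1 subgradient
                w -= eta * grad
        return w

Even in this toy setting, a learning rate eta that is too large makes the iterates diverge, and lam1/lam2 values that are too large drive w toward 0, which is exactly the sensitivity the next slide catalogs.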
  5. MODEL TRAINING – OPTIMIZATION PARAMETERS (SGD)
     • Learning rate, η: too high, diverges; too low, slow progress
     • Momentum, μ: too high, could "pass" the solution; too low, no performance improvement
     • Other parameters:
       • Mini-batch size, m
       • Adaptive decay rate, β
       • Annealing rate, r
       • Communication frequency, c
       • Regularization parameters, λ1 and λ2: too low, little effect; too high, drives the iterates to 0
  6. HYPERPARAMETERS
     • The quality of the trained model is governed by so-called 'hyperparameters', with no clear defaults agreeable to a wide range of applications:
       • Optimization options (SGD)
       • Neural net training options: number of hidden layers; number of neurons in each hidden layer; random distribution for initial connection weights (normal, uniform, Cauchy); error function (gamma, normal, Poisson, entropy)
     [Diagram: the same feed-forward network schematic as on slide 3]
  7. MODEL TUNING
     • Traditional approach: manual tuning. Even with expertise in machine learning algorithms and their parameters, the best settings depend directly on the data used in training and scoring.
     • Hyperparameter optimization: grid search vs. random search vs. optimization (sketched below)
     [Diagram: 2-D (x1, x2) sampling patterns for a standard grid search, a random Latin hypercube, and random search]
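
For reference, the three sampling designs on this slide can be sketched in a few lines of Python on the unit square; the helper names are hypothetical and the designs are shown in their simplest form.

    import itertools
    import random

    def grid_search_points(levels):
        # Standard grid: the Cartesian product of a few levels per parameter.
        return list(itertools.product(levels, levels))

    def random_search_points(n, seed=0):
        # Random search: n independent uniform draws from the unit square.
        rng = random.Random(seed)
        return [(rng.random(), rng.random()) for _ in range(n)]

    def latin_hypercube_points(n, seed=0):
        # Random Latin hypercube: exactly one sample in each row and each
        # column of an n-by-n grid, so every 1-D projection is stratified.
        rng = random.Random(seed)
        rows = list(range(n))
        rng.shuffle(rows)
        return [((i + rng.random()) / n, (rows[i] + rng.random()) / n)
                for i in range(n)]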
  8. HYPERPARAMETER OPTIMIZATION CHALLENGES
     • Objective blows up
     • Categorical / integer variables
     • Noisy / nondeterministic evaluations
     • Flat regions
     • Node failure
     The tuning objective, T(x), is the validation error score (to avoid an increased overfitting effect).
     [Figure: sketches of each of these failure modes for T(x)]
  9. PARALLEL HYBRID DERIVATIVE-FREE OPTIMIZATION STRATEGY
     Default hybrid search strategy (see the sketch after this list):
     1. Initial search: Latin hypercube sampling (LHS)
     2. Global search: genetic algorithm (GA), which supports integer and categorical variables and handles nonsmooth, discontinuous spaces
     3. Local search: generating set search (GSS), similar to pattern search, with first-order convergence properties; developed for continuous variables
     All are naturally parallelizable methods, and any combination of solvers can be run in parallel.
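
The following is a deliberately simplified Python sketch of this hybrid pipeline on the unit cube, with a crude GA step and a coordinate-direction GSS poll; real GA and GSS solvers are considerably more sophisticated, and this is not SAS's solver.

    import random

    def hybrid_tune(objective, dim, budget=200, seed=0):
        # Illustrative LHS -> GA -> GSS pipeline on [0, 1]^dim.
        rng = random.Random(seed)

        # 1. Initial search: a small Latin hypercube sample.
        n = 10
        cols = [list(range(n)) for _ in range(dim)]
        for c in cols:
            rng.shuffle(c)
        pts = [[(cols[d][i] + rng.random()) / n for d in range(dim)]
               for i in range(n)]
        evals = [(objective(p), p) for p in pts]

        # 2. Global search: a crude GA step that recombines and mutates
        #    the two best points found so far.
        for _ in range(budget // 2):
            (fa, a), (fb, b) = sorted(evals)[:2]
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            j = rng.randrange(dim)
            child[j] = min(1.0, max(0.0, child[j] + rng.gauss(0.0, 0.1)))
            evals.append((objective(child), child))

        # 3. Local search: GSS-style polling along +/- coordinate
        #    directions, shrinking the step when no direction improves.
        best_f, best = min(evals)
        step = 0.1
        while step > 1e-3:
            improved = False
            for d in range(dim):
                for s in (step, -step):
                    cand = list(best)
                    cand[d] = min(1.0, max(0.0, cand[d] + s))
                    f = objective(cand)
                    if f < best_f:
                        best_f, best, improved = f, cand, True
            if not improved:
                step /= 2
        return best_f, best

Because each phase evaluates many independent candidate points, all three phases parallelize naturally across nodes.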
  10. SAMPLE TUNING RESULTS – AVERAGE IMPROVEMENT
      • 13 test problems
      • Single 30% validation partition
      • Single machine
      • Conservative default tuning process: 5 iterations, 10 configurations per iteration
      • Run 10 times each; results averaged

      ML algorithm | Average error % reduction (higher is better)
      DT           | 5.2
      GBT          | 5.0
      RF           | 4.4
      DT-P         | 3.0

      No free lunch.
  11. SAMPLE TUNING RESULTS – TUNING TIME
      • 13 test problems
      • Single 30% validation partition
      • Single machine
      • Conservative default tuning process: 5 iterations, 10 configurations per iteration
      • Run 10 times each; results averaged

      ML algorithm | Average % error | Average time
      DT           | 15.3            | 9.2
      DT-P         | 13.9            | 12.8
      RF           | 12.7            | 49.1
      GBT          | 11.7            | 121.8
  12. SINGLE PARTITION VS. CROSS VALIDATION
      • For each problem: tune with a single 30% partition; score the best model on the test set; repeat 10 times; average the difference between test error and validation error
      • Repeat the process with 5-fold cross validation
      • Cross validation suits small-to-medium data sets
      • It imposes a 5x cost increase on a sequential tuning process; manage the cost in a parallel / threaded environment (see the sketch below)
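
As a sketch, a k-fold tuning objective looks like the following; train_and_score is a hypothetical callback, and because the folds are independent, the k calls are natural units for parallel or threaded execution.

    import numpy as np

    def cv_error(train_and_score, X, y, k=5, seed=0):
        # Tuning objective under k-fold cross validation: the mean
        # validation error over k folds. train_and_score(Xtr, ytr, Xva, yva)
        # is a hypothetical callback that trains one model and returns its
        # validation error.
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(y)), k)
        errors = []
        for i in range(k):
            va = folds[i]
            tr = np.concatenate([folds[j] for j in range(k) if j != i])
            errors.append(train_and_score(X[tr], y[tr], X[va], y[va]))
        return float(np.mean(errors))

The 5-fold variant of the study above is exactly this with k=5, at roughly five times the sequential cost.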
  13. PARALLEL MODE: DATA PARALLEL VS. MODEL PARALLEL
      [Diagram: two arrangements of the hybrid solver manager (LHS, GA, GSS, ...). "Data distributed / parallel": each candidate x is trained and scored using all k nodes at once. "Model parallel": each node trains and scores a different candidate and returns its f(x) to the solver manager.]
      A sketch of the model-parallel arrangement follows.
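
A minimal model-parallel sketch using Python's standard library, assuming the objective is picklable and trains one model per call:

    from concurrent.futures import ProcessPoolExecutor

    def evaluate_parallel(objective, configs, workers=4):
        # "Model parallel": each hyperparameter configuration is an
        # independent train/score job on its own worker, while the solver
        # manager collects the f(x) values.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(objective, configs))

The data-parallel alternative inverts this: one configuration at a time, with the training of each model itself distributed across the nodes.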
  14. "DATA PARALLEL" – DISTRIBUTED TRAIN, SEQUENTIAL TUNE
      IRIS forest tuning time (105 train / 45 validate):
      Nodes used for training | 1  | 2  | 4  | 8  | 16  | 32  | 64  | 128
      Time (seconds)          | 15 | 36 | 34 | 40 | 65  | 112 | 140 | 226
      [Chart: credit data tuning time in seconds (49k train / 21k validate) vs. number of nodes used for training, 1 to 64]
  15. "DATA AND MODEL PARALLEL" – DISTRIBUTED TRAIN, PARALLEL TUNE
      • Distributed train, parallel tune: train on 4 workers; tune n models in parallel
      [Chart: handwritten data (train 60k / validate 10k) tuning time in hours vs. maximum models trained in parallel, 2 to 32]
      [Diagram: hybrid solver manager dispatching train/score jobs to groups of 4 nodes]
  16. HYPERPARAMETER OPTIMIZATION RESULTS
      • Default neural network: 9.6% error
      • Iteration 1 – best Latin hypercube sample: 6.2% error
      • Very similar configurations can have very different error
      • Best after tuning: 2.6% error
      [Charts: error % of the iteration-1 Latin hypercube samples; best misclassification error % by tuning iteration, handwritten data (train 60k / validate 10k)]

      Tuned neural net:
      Hidden layers  | 2
      Neurons        | 200, 200
      Learning rate  | 0.012
      Annealing rate | 2.3e-05
  17. MANY TUNING OPPORTUNITIES AND CHALLENGES
      • Initial implementation: PROCs NNET, TREESPLIT, FOREST, and GRADBOOST in SAS® Viya™ Data Mining and Machine Learning
      • Other search methods, extending the hybrid solver framework: Bayesian / surrogate-based optimization; new hybrid search strategies
      • Selecting the best machine learning algorithm: parallel tuning across algorithms and strategies for effective node usage; combining models
      Better (generalized) models = better predictions = better decisions
  18. THANK YOU!
      Funda.Gunes@sas.com
      Patrick.Koch@sas.com
      sas.com/viya
  19. [Diagram: the SAS® Viya™ platform. Cloud Analytics Services (CAS) provides distributed computing via an in-memory engine running in-cloud, in-database, in-Hadoop, and in-stream. Microservices (UAA, query generation, folders, CAS management, data source management, model management, environment manager, logging, auditing) sit behind analytics, BI, and data management GUIs and APIs. Solution areas include analytics, data management, fraud and security intelligence, business visualization, risk management, and customer intelligence. Connectivity spans parallel and serial transfer, pub/sub, web services, and message queues, with source-based engines.]
  20. EXPERIMENTAL RESULTS – EXPERIMENT SETTINGS: HYPERPARAMETERS
      • Decision tree: depth; number of bins for interval variables; splitting criterion
      • Random forest: number of trees; number of variables at each split; fraction of observations used to build a tree
      • Gradient boosting: number of trees; number of variables at each split; fraction of observations used to build a tree; L1 regularization; L2 regularization; learning rate
      • Neural networks: number of hidden layers; number of neurons in each hidden layer; LBFGS optimization parameters; SGD optimization parameters
      For this study: a Linux cluster with 144 nodes, 16 cores per node, running Viya Data Mining and Machine Learning. A hypothetical encoding of one of these search spaces follows.
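
As an illustration only, the gradient boosting search space above might be encoded as a plain table of variable types and bounds for a tuner to consume; every name and range below is hypothetical, not the settings used in the study.

    # Hypothetical search-space definition: (type, lower bound, upper bound).
    # Integer and continuous variables are exactly the mixed-type support
    # the hybrid solver needs.
    gradboost_space = {
        "n_trees":         ("integer",    20,   500),
        "vars_per_split":  ("integer",    1,    50),
        "sample_fraction": ("continuous", 0.1,  1.0),
        "l1":              ("continuous", 0.0,  10.0),
        "l2":              ("continuous", 0.0,  10.0),
        "learning_rate":   ("continuous", 0.01, 1.0),
    }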
