Hyperparameter Search
wxshi
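1. Hyperparameter optimization algorithms
2. Hyperparameter optimization tools
3. Multi-label classification: experimental results
• Manual tuning cannot guarantee the best hyperparameter combination and also takes more time.
• Two common approaches to automatic tuning:
  • Parallel search: Grid Search, Random Search, etc. Trials cannot use each other's optimization information.
  • Sequential optimisation: Bayesian Optimization, etc. Time-consuming.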
Methods covered: Random Search, Grid Search, Bayesian Optimization, HyperBand, Population-based Training
Hyperparameter search
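Grid search builds a model for every combination of the hyperparameter values specified in the grid, evaluates each one, and selects the best model.
Benefit:
• Explainable
• Easily parallelizable
Problem:
• Tries every hyperparameter combination and picks the best one by cross-validation score, which is slow and inefficient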
Hyperparameter search--Grid Search
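The exhaustive enumeration described on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the deck's code: `grid_search`, `toy_score` and the example grid are hypothetical names and values, and `toy_score` merely stands in for a real cross-validation score (higher is better).

```python
from itertools import product


def grid_search(grid, score_fn):
    """Evaluate every combination in `grid`; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    # product() yields one tuple per point of the Cartesian grid.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for a cross-validation score.
    return -(params["lr"] - 0.01) ** 2 - (params["dropout"] - 0.25) ** 2


grid = {"lr": [0.1, 0.01, 0.001], "dropout": [0.25, 0.5]}
best_params, best_score = grid_search(grid, toy_score)
n_trials = len(grid["lr"]) * len(grid["dropout"])  # 3 * 2 evaluations
```

The number of evaluations is the product of the grid sizes, which is why the cost grows quickly as more hyperparameters are added. Each evaluation is independent, so the loop body could be distributed across workers.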
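Random search draws hyperparameter combinations at random from the search space, for a fixed, user-specified number of iterations.
Benefit:
• Better coverage on important parameters
• Easily parallelizable
• Hard to beat on high dimensions
Problem:
• No guarantee that the returned combination is the best one
• Slow and inefficient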
Hyperparameter search--Random Search
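A minimal sketch of random search under a fixed iteration budget, assuming each hyperparameter is described by a sampling function. The names `random_search`, `space` and `toy_score`, the ranges, and the log-uniform choice for the learning rate are all assumptions made for illustration.

```python
import math
import random


def random_search(space, score_fn, n_iter, seed=0):
    """Sample `n_iter` random configurations; return (best_params, best_score, history)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    history = []
    for _ in range(n_iter):
        # Draw every hyperparameter independently from its own sampler.
        params = {name: sample(rng) for name, sample in space.items()}
        score = score_fn(params)
        history.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, history


# Hypothetical search space: log-uniform learning rate, uniform dropout.
space = {
    "lr": lambda rng: 10 ** rng.uniform(-5, -1),
    "dropout": lambda rng: rng.uniform(0.0, 0.5),
}


def toy_score(params):
    # Hypothetical stand-in for a validation score (higher is better).
    return -(math.log10(params["lr"]) + 2) ** 2 - (params["dropout"] - 0.25) ** 2


best_params, best_score, history = random_search(space, toy_score, n_iter=20)
```

Unlike a grid, every trial here tries a new value of every hyperparameter, which is where the better coverage of the important dimensions comes from.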
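• Optimization problem
Hyperparameter search--Bayesian Optimization
Given a set of hyperparameter combinations $X = \{x_1, x_2, \dots, x_n\}$, assume the hyperparameters and the loss being optimized by the model are related by a function $f: X \to \mathbb{R}$. The goal is to find, within $x \in X$,
$$x^* = \arg\min_{x \in X} f(x)$$
At each iteration $t = 1, \dots, T$, a point $x_t \in X$ has the value $f(x_t)$, but in most cases only a noisy value $y_t = f(x_t) + \epsilon$, with $\epsilon \sim \mathcal{N}(0, \sigma^2)$, can be observed. It is added to the observation set $D_{1:t} = \{(x_1, y_1), \dots, (x_t, y_t)\}$.
• f is explicitly unknown and multimodal.
• Evaluations of f may be perturbed.
• Evaluations of f are expensive.
How should $x_t$ be chosen?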
Hyperparameter search--Bayesian Optimization
• General idea: surrogate modelling
1. Use a surrogate model of f to carry out the optimization.
2. Define a utility function to collect new data points satisfying some optimality criterion: optimization as decision.
3. Study decision problems as inference using the surrogate model: use a probabilistic model able to calibrate both epistemic and aleatoric uncertainty.
Bayesian optimization is a family of approximation methods. It fits a surrogate function (surrogate model) to the relationship between the hyperparameters and the model's evaluation score, iteratively selects promising hyperparameter combinations, and finally returns the best-performing combination.
• Bayesian Optimization
Hyperparameter search--Bayesian Optimization
Core components of BO:
• Probabilistic surrogate model
  Gaussian process, random forest, Student-t process, etc.
• Acquisition function
  EI, MPI, LCB, KG, ES, etc.
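To make the acquisition-function idea concrete, here is a small sketch of two of the listed functions, EI (expected improvement) and LCB (lower confidence bound), for a minimization problem. It assumes the surrogate returns a Gaussian prediction with mean `mu` and standard deviation `sigma` at each candidate; the candidate values, `f_best`, `xi` and `kappa` are made-up illustration values, not numbers from the deck.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, f_best, xi=0.0):
    """EI for minimization: E[max(f_best - f(x) - xi, 0)] with f(x) ~ N(mu, sigma^2)."""
    gain = f_best - mu - xi
    if sigma <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(gain, 0.0)
    z = gain / sigma
    return gain * normal_cdf(z) + sigma * normal_pdf(z)


def lower_confidence_bound(mu, sigma, kappa=2.0):
    """LCB for minimization: smaller values are more attractive."""
    return mu - kappa * sigma


# Hypothetical surrogate predictions (mean, std) at three candidate points.
candidates = {"a": (0.30, 0.01), "b": (0.35, 0.20), "c": (0.50, 0.05)}
f_best = 0.32  # best (lowest) loss observed so far

ei = {name: expected_improvement(mu, sd, f_best) for name, (mu, sd) in candidates.items()}
lcb = {name: lower_confidence_bound(mu, sd) for name, (mu, sd) in candidates.items()}
next_by_ei = max(ei, key=ei.get)
next_by_lcb = min(lcb, key=lcb.get)
```

In this toy setting both criteria pick candidate "b": its predicted mean is slightly worse than the incumbent, but its large uncertainty makes it the most promising point to evaluate next, whereas "a" is only a small, nearly certain improvement.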
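Trade-off between improving on an already good point and evaluating new points in under-explored areas
Problem:
• For high-dimensional or non-convex functions with unknown smoothness and noisy evaluations, these algorithms are hard to fit and optimize
• BO combined with heuristic algorithms is hard to parallelize
• Results are unstable
• Consumes large amounts of compute and time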
Hyperparameter search--Bayesian Optimization
Benefit:
• Can utilize prior information
• Semi-Parallelizable
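Hyperparameter search--HyperBand
• Use previous search results to infer the next location worth exploring and so reach the optimum faster, as in Bayesian optimization
• Speed up the evaluation of each hyperparameter combination during the search, so that more combinations can be explored and evaluated in the same time and good hyperparameters are found sooner, as in HyperBand
• HyperBand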
Intuition:
• Compare relative performance
• Terminate bad performing trials
• Continue better trials for longer period of time
Notes:
• Can be combined with Bayesian Optimization
• Can be easily parallelized
Hyperparameter search--HyperBand
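Framework: SUCCESSIVE HALVING
• requires the number of configurations n and budget B as an input
• trade-off: n vs. B/n (many configurations with a small budget each, or few configurations with a large budget each)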
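The successive halving framework can be sketched as follows. This is a simplified illustration under stated assumptions: the total budget B is split evenly across elimination rounds and then evenly across the surviving configurations, and `evaluate(config, budget)` is a hypothetical function returning a validation loss (lower is better). The function and variable names are not from the deck.

```python
def successive_halving(configs, evaluate, total_budget, eta=2):
    """Keep the best 1/eta of the configurations each round until one remains."""
    # Count the elimination rounds needed to go from len(configs) down to 1.
    rounds, n = 0, len(configs)
    while n > 1:
        n = max(1, n // eta)
        rounds += 1

    survivors = list(configs)
    schedule = []  # (number of configurations, budget per configuration) per round
    for _ in range(rounds):
        per_config = total_budget / (rounds * len(survivors))
        schedule.append((len(survivors), per_config))
        ranked = sorted(survivors, key=lambda c: evaluate(c, per_config))
        survivors = ranked[: max(1, len(survivors) // eta)]
    return survivors[0], schedule


def toy_loss(config, budget):
    # Hypothetical loss: best at 0.3, and it improves as the budget grows.
    return abs(config - 0.3) + 1.0 / budget


configs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
best, schedule = successive_halving(configs, toy_loss, total_budget=24, eta=2)
```

With 8 configurations, a total budget of 24 and eta = 2, the per-configuration budget doubles each round while the number of survivors halves, so promising configurations receive progressively more resources.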
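• HyperBand
Hyperparameter search--HyperBand
• r: the budget actually allocated to a single hyperparameter configuration
• R: the maximum budget that can be allocated to a single hyperparameter configuration
• s_max: controls the size of the total budget
• B: the total budget
• η: controls the proportion of configurations discarded after each iteration
• get_hyperparameter_configuration(n): samples n different hyperparameter configurations
• run_then_return_val_loss(t, r_i): computes the validation loss for the given configuration and budget. L denotes the validation losses of the configurations at budget r_i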
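As a rough illustration of how the quantities above fit together, the sketch below computes the bracket schedule using the formulas as I understand them from the published HyperBand algorithm: s_max = floor(log_η R), B = (s_max + 1) R, and for each bracket s, n = ceil((B/R) η^s / (s + 1)) and r = R η^(-s), with rung i running floor(n η^(-i)) configurations at budget r η^i. The function name `hyperband_brackets` is hypothetical, and no training is performed here.

```python
def hyperband_brackets(R, eta=3):
    """Return, per bracket, the list of (n_i, r_i) rungs of successive halving."""
    # s_max = floor(log_eta(R)), computed with integers to avoid float rounding.
    s_max = 0
    while eta ** (s_max + 1) <= R:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # n = ceil((B / R) * eta^s / (s + 1)) with B = (s_max + 1) * R.
        n = -(-(s_max + 1) * eta ** s // (s + 1))
        r = R / eta ** s  # initial budget per configuration in this bracket
        rungs = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
        brackets.append(rungs)
    return brackets


brackets = hyperband_brackets(R=81, eta=3)
```

The first bracket is the most aggressive (many configurations, tiny initial budget), and the last one is plain random search in which every configuration gets the full budget R. Running all brackets is how HyperBand hedges over the n versus B/n trade-off.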
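(a) Sequential optimisation refines only one model at a time, which takes a great deal of time.
(b) Parallel search saves time, but the runs do not interact at all, so no information is shared between them.
(c) PBT combines the advantages of both.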
Hyperparameter search--PBT
• PBT
Hyperparameter search--PBT
Hyperparameter search--PBT
• Exploit
  ◦ Truncation Selection:
    • Rank all agents in the population
    • If the current agent is in the bottom 20% of the population, sample another agent uniformly from the top 20% of the population and copy its weights and hyperparameters
  ◦ Binary Tournament / T-Test Selection
    • Uniformly sample another agent in the population
    • If the sampled agent has a higher score, the weights and hyperparameters are copied to replace the current agent
• Explore
  ◦ Perturb
    • Each hyperparameter is independently randomly perturbed by a factor of 1.2 or 0.8
  ◦ Resample
    • Each hyperparameter is resampled, with some probability, from the original prior distribution
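The truncation-selection exploit step and the perturb explore step can be sketched together. This is an illustrative toy, not the paper's or any library's implementation: agents are plain dictionaries with hypothetical keys `score`, `weights` and `hyperparams`, and the 20% fraction and the 0.8 / 1.2 factors are taken from the slide.

```python
import copy
import random


def exploit_and_explore(population, rng, fraction=0.2, factors=(0.8, 1.2)):
    """Bottom agents copy a random top agent, then perturb the copied hyperparameters."""
    ranked = sorted(population, key=lambda agent: agent["score"], reverse=True)
    cutoff = max(1, int(len(ranked) * fraction))
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for agent in bottom:
        donor = rng.choice(top)
        # Exploit: copy the donor's weights.
        agent["weights"] = copy.deepcopy(donor["weights"])
        # Explore: perturb each copied hyperparameter independently.
        agent["hyperparams"] = {
            name: value * rng.choice(factors)
            for name, value in donor["hyperparams"].items()
        }
    return population


rng = random.Random(0)
population = [
    {"score": i, "weights": [i], "hyperparams": {"lr": 0.01}} for i in range(10)
]
exploit_and_explore(population, rng)
```

After the call, the two lowest-scoring agents carry the weights of one of the two best agents and a learning rate scaled by 0.8 or 1.2. Their stored `score` is stale until they are trained and evaluated again, which a real training loop would do before the next ranking.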
Hyperparameter search--PBT
• the hyperparameters are clearly being
focused on the best part of the sampling
range, and adapted over time
• agents which are lucky in environment
exploration are quickly propagated to more
workers, meaning that all members of the
population benefit from the exploration luck
of the remainder of the population.
Hyperparameter search--PBT
• Population-based training
Main idea:
• Evaluate a population in parallel
• Terminate lowest performers
• Copy weights of the best performers and mutate hyperparameters
Benefits:
• Easily parallelizable
• Can search over ‘schedules’
• Terminates bad performers
Hyperparameter search
https://docs.ray.io/en/latest/tune/index.html
https://github.com/ray-project/ray
https://deephyper.readthedocs.io/en/latest/
https://github.com/deephyper/deephyper
https://www.wandb.com/
https://github.com/wandb/client
https://docs.determined.ai/latest/tutorials/quick-start.html
https://github.com/determined-ai/determined
https://github.com/topics/hyperparameter-search
https://parameter-sherpa.readthedocs.io/en/latest/
https://github.com/sherpa-ai/sherpa
https://www.neuraxle.org/stable/index.html
https://github.com/Neuraxio/Neuraxle
https://developer.nvidia.com/blog/powering-automl-enabled-ai-model-training-with-clara-train/
Hyperparameter optimization tools
https://districtdatalabs.silvrback.com/parameter-tuning-with-hyperopt
https://github.com/hyperopt
HyperOpt
https://optuna.readthedocs.io/en/stable/
https://github.com/optuna/optuna
https://github.com/tykimos/keras-tuner
Keras-Tuner
https://github.com/automl/HpBandSter
HpBandSter
Hyperparameter optimization tools
ray.readthedocs.io/en/latest/tune.html
Tune
Ray.tune: Distributed Hyperparameter Search
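https://docs.ray.io/en/latest/index.html
• Ray: a framework for building and running distributed applications
• Ray.tune: built on Ray's distributed computing framework, it integrates many hyperparameter optimization methods and is a highly extensible hyperparameter optimization tool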
Ray.tune
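• Scalable implementations of search algorithms, such as model-based optimization (HyperOpt) and HyperBand
• Integration with visualization tools such as TensorBoard, rllab's VisKit, and parallel-coordinates plots
• Flexible trial variant generation, including grid search, random search, and conditional parameter distributions
• Resource-aware scheduling, including support for concurrent runs of algorithms that may themselves be parallel and distributed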
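• Tune accepts a user-defined Python function or class
• Multiple trials can run in parallel (trials are scheduled and managed by Schedulers)
The bottom layer is the set of pluggable parameter search algorithms
The top layer is the API for Tune's two model-training styles
Ray.tune
Trial: one evaluation of a single hyperparameter configuration
• Tune's API for the two model-training styles:
Ray.tune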
Ray.tune
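• Define the neural network training function, e.g. my_train_func
• Write the experiment specification (including algorithm parameters), i.e. the search space, e.g. train_spec
• Choose a scheduling algorithm, e.g.
    scheduler = HyperBandScheduler(
        time_attr="training_iteration",  # unit of time, bounded by max_t
        reward_attr="mean_accuracy",     # objective value
        max_t=400)
• Choose a search algorithm, e.g.
    algo = HyperOptSearch(space, max_concurrent=4, reward_attr="mean_accuracy")
• Run every parameter instance with the search and scheduling algorithms
    tune.run_experiments(train_spec, search_alg=algo, scheduler=scheduler)
• Retrieve the best model from the search
    best_model = Tune.get_best_model(My_Model._build_model(), trials, metric="mean_accuracy")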
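    from ray.tune import grid_search

    train_spec = {
        "run": my_train_func,
        # With an asynchronous search algorithm, as many models can train
        # at the same time as there are GPUs.
        "trial_resources": {"cpu": 20, "gpu": 2},
        "stop": {  # stopping conditions (any reported metric can be used)
            "mean_accuracy": 0.2,
            "training_iteration": 10,  # number of iterations
            "stop_loss_num": 5,  # stop once the loss has failed to decrease 5 times
        },
        "config": {
            "checkpoint_dir": "checkpoint_dir",
            "epochs": 1,
            "batch_size": 64,
            "lr": grid_search([10**-4, 10**-5]),  # grid search over this parameter
            # Tune also supports parameters sampled by a user-supplied lambda.
            "decay": lambda spec: spec.config.lr / 100.0,
            "dropout": grid_search([0.25, 0.5]),
        },
        # Number of trials to run, i.e. how many times the hyperparameter
        # space is sampled (not the batch size).
        "num_samples": 4,
    }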
Ray.tune
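TrialRunner is the core data structure. It manages a collection of Trial objects and runs an event loop that submits these tasks to the Ray cluster through a TrialExecutor
RayTrialExecutor is responsible for resource management
TrialScheduler: the scheduler
SearchAlgorithm: the search algorithm (BasicVariantGenerator by default), mainly used to generate new parameters
• Main flow of Ray.tune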
• Tune Algorithm Offerings
Trial Schedulers Provided
• Population-based Training
• HyperBand
• ASHA
• Median-stopping Rule
• BOHB
Search Algorithms Provided
• HyperOpt (TPE)
• Bayesian Optimization
• SigOpt
• Nevergrad
• Scikit-Optimize
• Ax/Botorch (PyTorch BayesOpt)
Ray.tune
scheduler=Scheduler(metric="accuracy", mode="max")
• PopulationBasedTraining (PBT)
Ray.tune.schedulers
• Implements the Population Based Training (PBT) algorithm
• Trains a group of models in parallel
• Poorly performing models periodically clone the state of the best performers and apply a random mutation to their hyperparameters, in the hope of obtaining a better model
• Unlike other hyperparameter search algorithms, PBT mutates hyperparameters during training time
• If the number of trials exceeds the cluster capacity, they will be time-multiplexed so as to balance training progress
Ray.tune.schedulers
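TrialRunner calls the on_trial_result() function of PopulationBasedTraining, whose main flow is as follows:
1. If the configured interval since the last perturbation has not yet elapsed, return and let the trial continue training.
2. Call _quantiles() to split all trials, according to the configured fraction _quantile_fraction, into a well-performing top group and a poorly performing bottom group.
3. If the current trial is in the top group, save a checkpoint so that other trials can clone it.
4. If the current trial is in the bottom group, pick a trial at random from the top group (trial_to_clone) and call _exploit(). That function calls explore() to perturb the configuration of trial_to_clone, then applies the resulting configuration and the checkpoint to the current trial, which restarts from that state.
5. If the TrialRunner has trials in the PENDING or PAUSED state, request that the current trial pause and release its resources. Otherwise, keep training.
• PopulationBasedTraining (PBT)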
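The first four steps of this flow can be paraphrased as a small decision function. This is a hand-written sketch of the logic described on the slide, not Ray's source code: `pbt_decision`, its arguments and its return strings are all hypothetical, and the quantile fraction of 0.25 is an assumed illustration value.

```python
def pbt_decision(trial, scores, now, last_perturbation, interval, quantile_fraction=0.25):
    """Return 'CONTINUE', 'CHECKPOINT' or 'EXPLOIT' for `trial`.

    `scores` maps trial names to their latest score (higher is better).
    """
    # Step 1: too early since the last perturbation, keep training.
    if now - last_perturbation < interval:
        return "CONTINUE"
    if len(scores) <= 1:
        return "CONTINUE"

    # Step 2: split the trials into a bottom and a top quantile.
    ranked = sorted(scores, key=scores.get)  # worst first
    k = max(1, int(len(ranked) * quantile_fraction))
    bottom, top = ranked[:k], ranked[-k:]

    # Step 3: good trials save a checkpoint for others to clone.
    if trial in top:
        return "CHECKPOINT"
    # Step 4: bad trials clone and perturb a trial from the top group.
    if trial in bottom:
        return "EXPLOIT"
    return "CONTINUE"


scores = {"t0": 0.1, "t1": 0.5, "t2": 0.7, "t3": 0.9}
decisions = {
    name: pbt_decision(name, scores, now=10, last_perturbation=0, interval=5)
    for name in scores
}
```

Trials in the middle of the ranking are left alone, so only the extremes of the population ever save or load checkpoints at a given perturbation point.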
Experimental data
Overall label distribution (number of samples per label):
opacity: 1533
diabetic retinopathy: 755
glaucoma: 597
macular edema: 516
macular degeneration: 574
retinal vascular occlusion: 440
normal: 525
Public Kaggle dataset: vietai-advance-retinal-disease-detection-2020
Training set: 2577
Validation set: 859
Experimental results
Labels: o = opacity, dr = diabetic retinopathy, g = glaucoma, me = macular edema, md = macular degeneration, rvo = retinal vascular occlusion, n = normal.
Each cell lists sp, se, f1 and the confusion matrix as [TN FP; FN TP].
Label | baseline | Population based training (num_samples=20), time = 11h | Hyperband (num_samples=50), time = 12h | BO (num_samples=50), time = 29h
o | 0.81 0.90 0.83 [398 93; 38 330] | 0.86 0.92 0.88 [424 67; 29 339] | 0.81 0.96 0.87 [399 92; 14 354] | 0.86 0.91 0.87 [422 69; 33 335]
dr | 0.96 0.84 0.84 [642 29; 30 158] | 0.99 0.89 0.92 [661 10; 20 168] | 0.97 0.87 0.88 [653 18; 25 163] | 0.98 0.86 0.89 [658 13; 26 162]
g | 0.99 0.69 0.79 [702 10; 45 102] | 0.97 0.88 0.88 [693 19; 17 130] | 0.98 0.81 0.85 [699 13; 28 119] | 0.99 0.83 0.88 [703 9; 25 122]
me | 0.95 0.74 0.71 [700 40; 31 88] | 0.95 0.78 0.74 [701 39; 26 93] | 0.95 0.79 0.74 [700 40; 25 94] | 0.95 0.75 0.73 [704 36; 30 89]
md | 0.96 0.84 0.84 [680 25; 24 130] | 0.96 0.89 0.86 [679 26; 17 137] | 0.97 0.87 0.87 [684 21; 20 134] | 0.97 0.85 0.86 [685 20; 23 131]
rvo | 0.98 0.71 0.77 [745 12; 30 72] | 0.99 0.85 0.88 [748 9; 15 87] | 0.99 0.81 0.87 [751 6; 19 83] | 0.99 0.84 0.88 [749 8; 16 86]
n | 1 0.92 0.96 [727 0; 11 121] | 0.99 0.94 0.96 [724 3; 8 124] | 0.99 0.93 0.96 [725 2; 9 123] | 0.99 0.95 0.94 [718 9; 6 126]
mean | 0.95 0.806 0.82 | 0.96 0.879 0.874 | 0.95 0.863 0.863 | 0.96 0.856 0.864
kaggle | 0.8099 | 0.86501 | 0.85883 | --
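The sp, se and f1 columns can be recomputed from the confusion matrices. The sketch below assumes each matrix is laid out as [TN FP; FN TP], which is consistent with the reported values, and checks it on the baseline row for label "o" (398, 93, 38, 330). The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tn, fp, fn, tp):
    """Return (specificity, sensitivity, F1) for one binary label."""
    sp = tn / (tn + fp)               # specificity: true negative rate
    se = tp / (tp + fn)               # sensitivity: recall on the positive class
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
    return sp, se, f1


# Baseline, label "o": matrix rows read as (TN, FP) and (FN, TP).
sp, se, f1 = binary_metrics(tn=398, fp=93, fn=38, tp=330)
rounded = (round(sp, 2), round(se, 2), round(f1, 2))
```

Rounded to two decimals this gives 0.81, 0.90 and 0.83, matching the baseline row for "o". The function divides by zero if a label has no negative or no positive samples, which does not occur in this validation set.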
Each cell lists sp, se, f1 and the confusion matrix as [TN FP; FN TP].
Label | Population based training (num_samples=10), time = 6h | Population based training (num_samples=20), time = 11h
o | 0.83 0.94 0.87 [409 82; 22 346] | 0.86 0.92 0.88 [424 67; 29 339]
dr | 0.95 0.87 0.85 [639 32; 24 164] | 0.99 0.89 0.92 [661 10; 20 168]
g | 0.98 0.81 0.84 [696 16; 28 119] | 0.97 0.88 0.88 [693 19; 17 130]
me | 0.93 0.76 0.69 [688 52; 28 91] | 0.95 0.78 0.74 [701 39; 26 93]
md | 0.96 0.88 0.86 [680 25; 19 135] | 0.96 0.89 0.86 [679 26; 17 137]
rvo | 0.99 0.8 0.85 [749 8; 20 82] | 0.99 0.85 0.88 [748 9; 15 87]
n | 0.99 0.94 0.95 [721 6; 8 124] | 0.99 0.94 0.96 [724 3; 8 124]
mean | 0.95 0.857 0.844 | 0.96 0.879 0.874
kaggle | 0.84426 | 0.86501
Experimental results
Each cell lists sp, se, f1 and the confusion matrix as [TN FP; FN TP].
Label | Hyperband (num_samples=50), time = 12h, 8 GPUs | HB_aug_without_pretrain_model (num_samples=50) | HB_aug_with_pretrain_model (num_samples=50), time = 17h, 6 GPUs
o | 0.81 0.96 0.87 [399 92; 14 354] | 0.88 0.91 0.88 [430 61; 33 335] | 0.87 0.91 0.88 [428 63; 32 336]
dr | 0.97 0.87 0.88 [653 18; 25 163] | 0.97 0.85 0.88 [654 17; 28 160] | 0.97 0.86 0.88 [653 18; 27 161]
g | 0.98 0.81 0.85 [699 13; 28 119] | 0.99 0.82 0.87 [702 10; 27 120] | 0.98 0.81 0.85 [699 13; 28 119]
me | 0.95 0.79 0.74 [700 40; 25 94] | 0.96 0.79 0.78 [711 29; 25 94] | 0.97 0.76 0.77 [715 25; 29 90]
md | 0.97 0.87 0.87 [684 21; 20 134] | 0.97 0.88 0.88 [687 18; 19 135] | 0.96 0.89 0.86 [679 26; 17 137]
rvo | 0.99 0.81 0.87 [751 6; 19 83] | 0.99 0.85 0.88 [749 8; 15 87] | 0.99 0.86 0.89 [750 7; 14 88]
n | 0.99 0.93 0.96 [725 2; 9 123] | 0.99 0.95 0.96 [722 5; 6 126] | 0.99 0.94 0.96 [724 3; 8 124]
mean | 0.951 0.863 0.863 | 0.96 0.864 0.876 | 0.961 0.866 0.87
kaggle | 0.85883 | -- | --
Experimental results
[Figures: PBT (num_samples=20); BO (num_samples=20)]
Experimental results
[Figures: hb_aug (num_samples=50), overfitting; Hyperband (num_samples=50)]
Experimental results
[Figure: hb_aug_with_pretrain_model (num_samples=50)]
Experimental results

Editor's Notes

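  1. To solve the optimization problem above, the simplest idea is to sample all functions that satisfy the constraints, find the location of each function's minimum, and take the most frequent location as the predicted minimum. For example, suppose we look for the location in [0, 1] where f(x) is smallest and have observed f(x) at 4 points. We do not know what the true f(x) looks like, but among the functions that pass through these 4 points, the location where their minima occur most often can be taken as the predicted minimum. In the histogram, for instance, the positions 0.58 and 0.7 are the most likely locations of the predicted minimum. Bayesian optimization takes this kind of approximation approach.
  2. Bayesian optimization uses a regression method to build a model of the objective function, uses that model to choose the next point to evaluate, and then updates the model. It is called Bayesian optimization because the algorithm chooses the next point by computing the posterior distribution of the objective function from the likelihood of the data acquired so far and a prior over the type of function.
  3. In essence, Bayesian optimization is a regression model: the function values predicted by the regression model are used to choose the next search point.
  4. Compared with random search or grid search, current hyperparameter optimization algorithms make progress in two main directions. One, represented by Bayesian optimization, uses previous search results to infer the next location worth exploring and so finds the optimum faster. The other speeds up the evaluation of each hyperparameter combination during the search, so that more combinations can be explored and evaluated in the same time and good hyperparameters are found sooner. (a) The heat map shows the validation error over a two-dimensional search space, with red marking regions of lower validation error. The candidate configurations are selected sequentially, as the numbers indicate. (b) The plot shows the validation error as a function of the resources allocated to each configuration (i.e., each line in the plot). Configuration evaluation methods allocate more resources to promising configurations.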
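  5. HyperBand extends SUCCESSIVE HALVING, which works roughly as follows: allocate the same resources to each of a set of hyperparameter configurations, evaluate them, discard the worse-performing half, and repeat until only one configuration remains. The algorithm takes the total number of configurations n and the total resources B as input, so each configuration under evaluation receives B/n resources on average. For a fixed B, however, it is hard to know whether a larger n with fewer resources each or a smaller n with more resources each will give the better result.
  6. The goal is to find the best parameters while also taking time and compute resources into account. For a fixed B, Hyperband performs a grid search over the feasible values of n. Hyperband takes two inputs: R, the maximum resources that can be allocated to a single configuration, and η, which controls the proportion of configurations kept in each round. get_hyperparameter_configuration samples according to some distribution defined over the hyperparameter space, for example a uniform distribution. The more this distribution favours high-quality parameters, the better Hyperband performs. run_then_return_val_loss computes the validation loss of a configuration under the given budget. From the set of configurations and their corresponding losses, the top k performers are then kept. What HyperBand does is manage this trade-off: it considers as many hyperparameter configurations as possible while also giving each configuration as large a budget as possible, to maximize the chance of finding the best hyperparameters.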
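  7. PBT is a hyperparameter search method proposed by DeepMind in 2017. (a) Sequential optimization requires multiple training runs (possibly with early stopping): new hyperparameters are chosen and the model is retrained from scratch with them. This fixed sequential process uses the least compute but consumes a great deal of time. (b) Parallel random/grid search over hyperparameters trains several models with different weight initializations and hyperparameters at the same time, on the assumption that one of them will be optimized best. This takes only the time of a single training run but needs more compute to train many models in parallel. (c) PBT starts like parallel search, randomly sampling hyperparameters and weight initializations. However, each training run asynchronously evaluates its performance at regular intervals. If a model in the population is underperforming, it exploits the rest of the population by being replaced with a better-performing model, and before training continues it explores new hyperparameters by modifying those of the better model. This process allows hyperparameters to be optimized online and focuses compute on the regions of hyperparameter and weight space most likely to produce good results.
  8. PBT is an asynchronous automatic tuning method. Its main flow is as follows. PBT first randomly initializes several models and sets a checkpoint after each period of training. Each model then adjusts itself according to how well the other models perform. If its own model is good, it keeps training. If not, it replaces its parameters with those of a better model (exploit), adds a random perturbation (explore), and continues training. The checkpoint interval, i.e. the number of steps between checks, is set manually. The perturbation either adds noise to the original hyperparameters or parameters, or resamples them. Step: train the model for one step. Whether a step is one iteration, one epoch, or something else can be specified as needed. Eval: evaluate on the validation set. Ready: once the specified time or number of iterations has elapsed since the last such operation, select a model from the population for the exploit and explore operations (i.e. perturbation) below. Exploit: replace models that were evaluated as poor with models that performed well. Explore: add a random perturbation to the copy produced in the previous step, for example by adding a random value or resampling.
  9. Exploit: the experiments in the paper use two exploit strategies. One is truncation selection: rank all agents by score, and if the current agent is in the bottom 20% of the population, sample another agent uniformly from the top 20% of the ranking and copy its weights and hyperparameters. The other is t-test selection or binary tournament. In t-test selection, another agent is sampled uniformly from the population and the last 10 scores are compared using Welch's t-test. If the sampled agent has a higher mean score and the t-test is satisfied, its weights and hyperparameters are copied. In binary tournament, each member of the population randomly selects another member and copies all of its parameters if that member's score is higher. Explore: the paper's experiments also try two strategies. One is perturbation: each hyperparameter is independently perturbed at random by a factor of 1.2 or 0.8. The other is resampling: each hyperparameter is resampled from the original prior distribution with some probability.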
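  10. These are comparison results from several experiments in the paper. In the left plot, the blue curves are training curves with PBT and the black curves are training curves without PBT. Thin lines represent individual members, and thick lines the mean of the top 5 members at each time step. The small plots beside each curve show how the hyperparameters in the population were adjusted over the course of training, including learning-rate decay. The hyperparameters concentrate on the best part of the sampling range and adapt over time. Because PBT copies the weights of good agents in the exploit phase, all members benefit.
  11. (a) Effect of population size on PBT performance: smaller sizes lead to higher variance and suboptimal results, while size >= 20 gives consistent improvement. (b) Comparison of different exploit strategies. (c) PBT on hyperparameters only (model weights are not copied between members), on model weights only (hyperparameters are not copied between members), and on both. Copying both hyperparameters and weights does perform best. (d) Because PBT allows hyperparameters to be adjusted online during training, the full PBT run is compared with training on the hyperparameter set PBT had found by the end of training, to assess how much the adaptation matters. The conclusion is that PBT's advantage lies in making the hyperparameters adaptive, not merely in finding a good prior over hyperparameter space.
  12. PBT is an asynchronous automatic tuning method that trains a batch of randomly initialized models in parallel. During training it periodically replaces poorly performing models with well-performing ones and adds a random perturbation. Because it adjusts hyperparameters during training, it can quickly discover good hyperparameters and good schedules.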
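  13. Ray is a framework for building and running distributed applications. Tune is built on Ray's distributed computing framework, integrates many hyperparameter optimization methods, and is a highly extensible hyperparameter optimization tool.
  14. Tune accepts a user-defined Python function or class and evaluates it on a hyperparameter configuration drawn from the hyperparameter space. Each evaluation of one hyperparameter configuration is called a trial, and Tune can run multiple trials in parallel. Configurations can be generated by Tune itself or obtained from a user-specified search algorithm. Trials are scheduled and managed by Schedulers.
  15. The function-based API requires a reporter to monitor the model's state. It returns the metrics once per batch, and the monitored metrics can be customized. The API based on subclassing the Trainable class requires overriding class methods. In general, implementing _train, _save and _restore in the subclass is enough.
  16. Tune provides many algorithms for optimizing the hyperparameter search.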