Lab of Human Computer Speech Interaction, Key Lab of Pervasive Computing
MediaEval 2014: 
THU-HCSIL Approach to Emotion in Music Task 
using Multi-level Regression 
Yuchao Fan, Mingxing Xu 
THU-HCSIL Team 
Tsinghua University 
Department of Computer Science and Technology, Tsinghua University
Outline
• Feature Extraction
  - Feature selection
  - Feature combination
• Regression
  - Data Observation
  - Framework of Proposed Approach
• Evaluation
• Conclusion
Feature Selection and Combination
• Original Feature: EMO-LARGE features
  - Provided by organizers
  - Extracted by openSMILE
  - 6552 Dimensions
  - 56 Low-Level Descriptors
Feature Selection and Combination
• Feature Selection by Ranking Feature Importance
  - Tree-based Estimators: Random Forest and Extra-Trees
    • A multitude of decision trees, compute feature importance through permutation
    • Different node-splitting strategy
    • Toolbox: scikit-learn
  - Selected Feature
    • $F_n^{RF}$: the first n important features in Random Forest (RF)
    • $F_n^{ET}$: the first n important features in Extra-Trees (ET)
    • Selected Feature (SLT): $F_n^{SLT} = F_n^{RF} \cap F_n^{ET}$
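The intersection rule above can be sketched in a few lines of Python. This is only an illustration: the helper names `top_n` and `select_features` and the toy scores are hypothetical, not taken from the slides. In scikit-learn, one possible source of such scores is the `feature_importances_` attribute of a fitted `RandomForestRegressor` or `ExtraTreesRegressor`; that attribute is impurity-based, so it may differ from the permutation-based importance the slide mentions.

```python
def top_n(importances, n):
    """Return the indices of the n largest importance scores."""
    order = sorted(range(len(importances)),
                   key=lambda j: importances[j], reverse=True)
    return set(order[:n])


def select_features(imp_rf, imp_et, n):
    """F_n^SLT: features ranked in the top n by both RF and ET."""
    return sorted(top_n(imp_rf, n) & top_n(imp_et, n))


# Toy importance scores for 4 features (hypothetical values).
imp_rf = [0.4, 0.3, 0.2, 0.1]
imp_et = [0.1, 0.4, 0.3, 0.2]
selected = select_features(imp_rf, imp_et, n=2)
print(selected)
```

Because the two rankings only partly agree, the intersection is usually smaller than n, which is consistent with n = 1280 yielding fewer than 1280 selected dimensions.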
Feature Selection and Combination
Selected Feature    Arousal    Valence
n                   1280       1280
Dimension           661        522
[Figure: RMSE of THU feature performance for Arousal; y-axis: RMSE (0.17 to 0.19); x-axis: n (80, 160, 320, 640, 1280, 2560, 5120)]
Feature Selection and Combination
• Supplement Feature
  - Music-related, total 217 dimensions
• THU feature set = {Selected Feature ∪ Supplement Feature}
  - Arousal: 878 dim (661 + 217)
  - Valence: 739 dim (522 + 217)
Ground Truth Distribution of Dev. Set
[Figure: ground truth distributions of the Dev. set; panels: Arousal, Valence]
Ground Truth Distribution (4x4) 
Ground Truth Distribution (6x6) 
Ground Truth Distribution (8x8) 
Ground Truth Distribution (10x10) 
Global Reg. VS Local Reg.
• Arousal RMSE distribution
  - Left: Global Reg, RMSE Avg. = 0.1764
  - Right: Local Reg, RMSE Avg. = 0.1490
Global Reg. VS Local Reg.
• Valence RMSE distribution
  - Left: Global Reg, RMSE Avg. = 0.1785
  - Right: Local Reg, RMSE Avg. = 0.1517
Global Reg. VS Local Reg.
System (RMSE)    Arousal    Valence
Global Reg.      0.1764     0.1785
Local Reg.       0.1490     0.1517
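To put the gap between the two rows in perspective, the relative RMSE reduction can be worked out directly from the averages reported above. The sketch below uses the standard RMSE definition; the helper names `rmse` and `relative_reduction` are illustrative and do not come from the slides.

```python
import math


def rmse(truth, pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers the error of `baseline`."""
    return (baseline - improved) / baseline


# Average RMSE values reported on the slide.
arousal_gain = relative_reduction(0.1764, 0.1490)  # about 0.155
valence_gain = relative_reduction(0.1785, 0.1517)  # about 0.150
print(round(arousal_gain, 3), round(valence_gain, 3))
```

On these figures, the local regression lowers the average RMSE by roughly 15% on both dimensions.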
Ground Truth Distribution of Regression Results 
result∈[−0.5,−0.25) 
Mean = -0.2770, Std = 0.1439 
Ground Truth Distribution of Regression Results 
result∈[−0.25,0) 
Mean = -0.0827, Std = 0.1816 
Ground Truth Distribution of Regression Results 
result∈[0,0.25) 
Mean = 0.1548, Std = 0.2304 
Ground Truth Distribution of Regression Results 
result∈[0.25,0.5) 
Mean = 0.3697, Std = 0.1917 
Ground Truth Distribution of Regression Results
[Figure: ground truth distributions for Range 1, Range 2, Range 3 and Range 4]
Framework of Proposed Approach
[Figure: framework diagram, blocks in reading order]
Training:
  Training Set $X^{Tr}$ -> Booster Trainer -> global model $R^G$
  Training Set $X^{Tr}$ -> Boundary Computation -> subsets $X^{Tr}_1, X^{Tr}_2, \ldots, X^{Tr}_6$ -> Booster Trainer -> local models $R^L_1, R^L_2, \ldots, R^L_6$
Testing:
  Test sample $x$ -> Regression with $R^G$ -> $r_0$ -> Subset Selection, $i = Ind(x)$ -> Regression with $R^L_i$ -> $r_1$ -> Post-Smoothing
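The training and testing paths of the framework can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `trainer` stands in for the "Booster Trainer" block and is any callable that turns a list of (features, ground truth) pairs into a model; models are plain callables; `ind` maps the global prediction to a bin index; and the mean-predicting trainer and the two-bin setup in the usage example are purely hypothetical.

```python
def train_two_level(train, bounds, trainer):
    """Train one global model and one local model per bin.

    train:   list of (features, ground_truth) pairs
    bounds:  dict mapping bin index i to its (a_i, b_i) boundaries
    trainer: callable turning a list of pairs into a model (assumed interface)
    """
    global_model = trainer(train)
    local_models = {}
    for i, (a, b) in bounds.items():
        subset = [(x, g) for x, g in train if a <= g <= b]
        local_models[i] = trainer(subset)
    return global_model, local_models


def two_level_predict(x, global_model, local_models, ind):
    """Return (r0, i, r1): global result, selected bin, local result."""
    r0 = global_model(x)
    i = ind(r0)
    r1 = local_models[i](x)
    return r0, i, r1


def mean_trainer(samples):
    """Toy trainer: the model always predicts the mean ground truth."""
    mean = sum(g for _, g in samples) / len(samples)
    return lambda x: mean


# Toy data with two bins (hypothetical; the slides use six).
train = [(0, -0.5), (0, -0.25), (0, 0.25), (0, 0.5)]
bounds = {1: (-1.0, 0.0), 2: (0.0, 1.0)}
g_model, l_models = train_two_level(train, bounds, mean_trainer)
result = two_level_predict(0, g_model, l_models, lambda r0: 1 if r0 < 0 else 2)
print(result)
```

Passing the trainer and the bin-index function as parameters keeps the sketch independent of any particular regression library.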
[Excerpt from the accompanying paper (... 2014, Barcelona, Spain); its funding note is truncated: "China (973 Program, ..."]
... improved by reducing the scope of the model, we propose a 2-level regression method in this section. Figure 1 shows the framework of this method.
The first-level regression is just a naive procedure commonly used in regression modules. For $x$ in test set $X$, we predict the test result $r_0 = f(x, R^G)$ with model $R^G$, a global model trained on the entire training dataset $X^{Tr}$.
For each emotion dimension, we divide $[-0.75, 0.75]$ (based on the ground truth distribution of $X^{Tr}$) into 6 bins of equal length (0.25). For $x$ in $X$, its bin index is $i = Ind(x) = \lceil (f(x, R^G) + 0.75) / 0.25 \rceil$. The 2nd-level regression result of $x$ is $r_1 = f(x, R^L_i)$, where $R^L_i$ is the local model for the $i$th bin, trained on the subset $X^{Tr}_i$ of the training dataset, which is determined by $X^{Tr}_i = \{x \mid x \in X^{Tr} \text{ and } g(x) \in [a_i, b_i]\}$, where $g(x)$ is the ground truth of $x$ and the boundaries $\{a_i, b_i\}$ are computed on dev set $X^D$ as follows:
We first define $G_i = \{g(x) \mid Ind(x) = i, x \in X^D\}$. After investigating the distribution of $G_i$, we then select $a_i = \min\{g \mid CDF_i(g) \geq 1.5\%\}$ and $b_i = \max\{g \mid CDF_i(g) \leq 98.5\%\}$, where $CDF_i$ is the cumulative distribution function for $G_i$, to eliminate outliers and fix the confidence level of dev set $X^D$ at 97%.
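The bin index and the boundary rule described above can be sketched in plain Python. Several details are assumptions rather than statements from the paper: predictions outside [-0.75, 0.75] are clamped to bins 1 to 6, the CDF is taken to be the empirical CDF of the dev-set ground truths, and the helper names (`bin_index`, `boundaries`, `local_subset`) are illustrative.

```python
import math
from bisect import bisect_right

LOW, WIDTH, N_BINS = -0.75, 0.25, 6  # [-0.75, 0.75] split into 6 bins


def bin_index(r0):
    """Ind(x) = ceil((r0 + 0.75) / 0.25), with r0 the global prediction."""
    i = math.ceil((r0 - LOW) / WIDTH)
    # Assumption: out-of-range predictions are clamped to bins 1..6.
    return min(max(i, 1), N_BINS)


def boundaries(ground_truths, lower=0.015, upper=0.985):
    """Return (a_i, b_i) from the empirical CDF of G_i."""
    g = sorted(ground_truths)
    n = len(g)

    def cdf(v):
        # Fraction of values in G_i that are <= v.
        return bisect_right(g, v) / n

    a = min((v for v in g if cdf(v) >= lower), default=g[0])
    b = max((v for v in g if cdf(v) <= upper), default=g[-1])
    return a, b


def local_subset(train, a, b):
    """Keep the (features, ground_truth) pairs whose truth lies in [a, b]."""
    return [(x, gt) for x, gt in train if a <= gt <= b]


print(bin_index(0.1), boundaries(range(1, 201)))
```

With 200 distinct values, the rule keeps the 3rd through the 197th, trimming the extreme values on each side.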
4. RESULTS AND CONCLUSIONS 
Results in this section are all modeled on THU feature set 
and experiments are all for dynamic task. 
Figure 1: Process framework for multi-level regression system.
How to Select Regression Models?
Performance of Different Regression Models
Table 2: Regression results with different models. ET = Extra-Trees, RF = Random Forest.
Model      Arousal MAE    Arousal RMSE    Valence MAE    Valence RMSE    Tool
Linear     0.159          0.203           0.158          0.199           WEKA
Pace       0.153          0.195           0.154          0.197           WEKA
M5Rule     0.154          0.195           0.159          0.205           WEKA
ET         0.150          0.188           0.149          0.191           scikit-learn
RF         0.142          0.177           0.143          0.183           scikit-learn
Booster    0.141          0.176           0.138          0.178           XGBoost
• 5-fold cross validation is used and is song independent (separated by song IDs randomly)
• 1/5 of training set is used as Dev. dataset
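A song-independent split means that all segments of one song land in the same fold, so no song contributes to both training and validation. The sketch below shows one way to build such folds; the function name, the round-robin assignment of shuffled songs and the seed are assumptions made for illustration. scikit-learn's `GroupKFold` offers a comparable group-wise split.

```python
import random


def song_independent_folds(song_ids, k=5, seed=0):
    """Split segment indices into k folds without splitting any song.

    song_ids: one song ID per segment, in segment order.
    Returns a list of k lists of segment indices.
    """
    songs = sorted(set(song_ids))
    random.Random(seed).shuffle(songs)  # random assignment of songs to folds
    fold_of_song = {s: j % k for j, s in enumerate(songs)}
    folds = [[] for _ in range(k)]
    for idx, s in enumerate(song_ids):
        folds[fold_of_song[s]].append(idx)
    return folds


# Toy data: 10 songs with 3 segments each (hypothetical sizes).
song_ids = [s for s in range(10) for _ in range(3)]
folds = song_independent_folds(song_ids)
print([len(f) for f in folds])
```

Holding out one of the five folds then corresponds to the 1/5 share used as the Dev. dataset.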
One-level VS Multi-level Regression
Regression Strategy    RMSE Arousal    RMSE Valence
One-level              0.176           0.178
Multi-level            0.175           0.177
NOTES: XGBoost, an open-source toolbox, was used for regression in both cases.

Evaluation on Test Data
Table 4: Results on required test data.
Task       Run         Arousal APC    Arousal RMSE    Valence APC    Valence RMSE
Dynamic    baseline    0.180          0.270           0.110          0.190
Dynamic    1           0.128          0.124           0.064          0.100
Dynamic    2           0.127          0.125           0.074          0.099
Dynamic    3           0.170          0.119           0.091          0.094
Dynamic    4           0.167          0.120           0.098          0.094
Static     5           0.770          0.107           0.459          0.097
Runs 1 to 4 are for the dynamic task and run 5 is for the static task. The XGBoost library is employed for regression in all runs.
• Dynamic task (subtask 2):
  Run 1. One-level regression.
  Run 2. Two-level regression.
  Run 3. Run 1 + smooth. For one clip, we replace each segment result in Run 1 with the average of its 4 adjacent segment results and itself, as follows:
  $r^{run3}(i) = \mathrm{mean}(r^{run1}_{i-2}, \ldots, r^{run1}_{i+2}),\ i = 3, 4, \ldots, 88.$
  Run 4. Run 2 + smooth.
• Static task (subtask 1):
  Run 5. For each clip, average over 90 segment results from Run 2.
The results show that our runs outperform the baseline on RMSE.
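The post-smoothing of Runs 3 and 4 and the clip-level averaging of Run 5 can be sketched as below. The formula only covers segments 3 to 88 of a 90-segment clip, so leaving the first two and last two segments unchanged is an assumption of this sketch; the function names are illustrative.

```python
def smooth(results, half_window=2):
    """Replace each interior segment result by the mean of itself and its
    2 neighbours on each side. Edge segments are left unchanged (assumption)."""
    out = list(results)
    for i in range(half_window, len(results) - half_window):
        window = results[i - half_window:i + half_window + 1]
        out[i] = sum(window) / len(window)
    return out


def clip_average(results):
    """Static-task value for a clip: mean of its segment results."""
    return sum(results) / len(results)


# Toy clip with a single spike (hypothetical values).
segments = [0.0, 0.0, 5.0, 0.0, 0.0]
print(smooth(segments), clip_average(segments))
```

With 90 segments, the zero-based loop covers indices 2 to 87, which matches i = 3, ..., 88 in the one-based formula.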
Conclusion
• Randomized decision trees based feature ranking for feature selection
• Multi-level regression proposed, which consists of a global regression and several local regressions
• Post-smoothing adopted for prediction refinement
In future, we will focus on more sophisticated methods to define the neighbor set for training local regression models and on different smoothing strategies for post-processing.
Questions ? / Comments ? / Suggestions ? 
THANKS 

 

MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task using Multi-level Regression

  • 1. Lab of Human Computer Speech Interaction, Key Lab of Pervasive Computing. MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task using Multi-level Regression. Yuchao Fan, Mingxing Xu, THU-HCSIL Team, Tsinghua University. Department of Computer Science and Technology, Tsinghua University.
  • 2. Outline. Feature Extraction: feature selection, feature combination. Regression: data observation, framework of proposed approach. Evaluation. Conclusion.
  • 3. Feature Selection and Combination. Original feature: EMO-LARGE features, provided by the organizers and extracted by openSMILE; 6552 dimensions, 56 low-level descriptors.
  • 4. Feature Selection and Combination. Feature selection by ranking feature importance. Tree-based estimators: Random Forest and Extra-Trees (a multitude of decision trees; feature importance computed through permutation; the two differ in node-splitting strategy; toolbox: scikit-learn). Selected feature: $F_n^{RF}$ is the first n important features in Random Forest (RF); $F_n^{ET}$ is the first n important features in Extra-Trees (ET); the selected feature set (SLT) is $F_n^{SLT} = F_n^{RF} \cap F_n^{ET}$.
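The intersection rule on slide 4 can be sketched in a few lines. This is only an illustration, not the authors' code: the helper names `top_n` and `select_features` and the toy importance scores are made up here, and the importance vectors are assumed to have been computed beforehand (for example from the two scikit-learn ensembles) and passed in as plain lists.

```python
def top_n(importances, n):
    """Return the indices of the n largest importance scores (ties broken by index)."""
    order = sorted(range(len(importances)), key=lambda j: (-importances[j], j))
    return set(order[:n])


def select_features(rf_importances, et_importances, n):
    """F_n^SLT = F_n^RF ∩ F_n^ET, returned as a sorted list of column indices."""
    return sorted(top_n(rf_importances, n) & top_n(et_importances, n))


# Toy importance scores for 6 features (invented for illustration only).
rf = [0.30, 0.05, 0.25, 0.10, 0.20, 0.10]
et = [0.10, 0.30, 0.25, 0.05, 0.20, 0.10]
selected = select_features(rf, et, n=3)
print(selected)
```

Because only features ranked in the top n by both estimators survive, the selected set is usually smaller than n, which is consistent with n = 1280 yielding 661 and 522 dimensions on the next slide.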
  • 5. Feature Selection and Combination. Selected feature: n = 1280 for both arousal and valence, giving 661 dimensions for arousal and 522 dimensions for valence. [Figure: THU feature performance for Arousal; RMSE (axis ticks 0.17 to 0.19) versus n (80, 160, 320, 640, 1280, 2560, 5120).]
  • 6. Feature Selection and Combination. Supplement feature: music-related, 217 dimensions in total. THU feature set = {Selected Feature ∪ Supplement Feature}. Arousal: 878 dim (661 + 217). Valence: 739 dim (522 + 217).
  • 7. Ground Truth Distribution of Dev. Set. [Figure panels: Arousal, Valence.]
  • 8. Ground Truth Distribution (4x4)
  • 9. Ground Truth Distribution (6x6)
  • 10. Ground Truth Distribution (8x8)
  • 11. Ground Truth Distribution (10x10)
  • 12. Global Reg. vs. Local Reg.: arousal RMSE distribution. Left: global reg., RMSE avg. = 0.1764. Right: local reg., RMSE avg. = 0.1490.
  • 13. Global Reg. vs. Local Reg.: valence RMSE distribution. Left: global reg., RMSE avg. = 0.1785. Right: local reg., RMSE avg. = 0.1517.
  • 14. Global Reg. vs. Local Reg. (RMSE by system). Global reg.: arousal 0.1764, valence 0.1785. Local reg.: arousal 0.1490, valence 0.1517.
  • 15. Ground Truth Distribution of Regression Results, result ∈ [−0.5, −0.25): mean = −0.2770, std = 0.1439.
  • 16. Ground Truth Distribution of Regression Results, result ∈ [−0.25, 0): mean = −0.0827, std = 0.1816.
  • 17. Ground Truth Distribution of Regression Results, result ∈ [0, 0.25): mean = 0.1548, std = 0.2304.
  • 18. Ground Truth Distribution of Regression Results, result ∈ [0.25, 0.5): mean = 0.3697, std = 0.1917.
  • 19. Ground Truth Distribution of Regression Results. [Figure panels: Range 1, Range 2, Range 3, Range 4.]
  • 20. Framework of Proposed Approach. [Diagram. Training: boundary computation splits the training set $X^{Tr}$ into subsets $X^{Tr}_1, X^{Tr}_2, \ldots, X^{Tr}_6$; a booster trainer produces the global model $R^G$ and the local models $R^L_1, R^L_2, \ldots, R^L_6$. Testing: test sample, $R^G$, subset selection $i = Ind(x)$, $R^L_i$.]
  • 21. Framework of Proposed Approach, testing. [Diagram: test sample $x$ → regression with $R^G$ → $r_0$ → subset selection $i = Ind(x)$ → regression with $R^L_i$ → $r_1$ → post-smoothing.]
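The testing path in the diagram (global regression, subset selection, local regression) could look roughly like the sketch below. It assumes the bin rule from the accompanying paper text, Ind(x) = ⌈(r0 + 0.75)/0.25⌉ over six bins of width 0.25 on [−0.75, 0.75]. Clamping out-of-range predictions to bins 1 and 6, representing models as plain callables, and the names `bin_index` and `predict_two_level` are assumptions made for illustration.

```python
import math

LOW, WIDTH, N_BINS = -0.75, 0.25, 6  # six equal bins covering [-0.75, 0.75]


def bin_index(r0):
    """Ind(x) = ceil((r0 + 0.75) / 0.25), clamped to 1..6 (clamping is an assumption)."""
    i = math.ceil((r0 - LOW) / WIDTH)
    return min(max(i, 1), N_BINS)


def predict_two_level(x, global_model, local_models):
    """Return (r0, i, r1): global prediction, selected bin, local prediction."""
    r0 = global_model(x)      # first level: global model R^G
    i = bin_index(r0)         # subset selection
    r1 = local_models[i](x)   # second level: local model R^L_i
    return r0, i, r1


# Hypothetical stand-in models: the "global" model halves its input, and each
# "local" model simply returns the centre of its own bin.
toy_global = lambda x: 0.5 * x
toy_locals = {i: (lambda x, c=LOW + (i - 0.5) * WIDTH: c) for i in range(1, N_BINS + 1)}

r0, i, r1 = predict_two_level(0.2, toy_global, toy_locals)
print(r0, i, r1)
```

In a real pipeline the callables would wrap trained regressors; the post-smoothing stage shown at the end of the diagram is a separate step applied to the sequence of r1 values.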
  • 22. Table 2: Regression results with different models. ET = Extra-Trees, RF = Random Forest.
    Model    Tool          Arousal MAE  Arousal RMSE  Valence MAE  Valence RMSE
    Linear   WEKA          0.159        0.203         0.158        0.199
    Pace     WEKA          0.153        0.195         0.154        0.197
    M5Rule   WEKA          0.154        0.195         0.159        0.205
    ET       scikit-learn  0.150        0.188         0.149        0.191
    RF       scikit-learn  0.142        0.177         0.143        0.183
    Booster  XGBoost       0.141        0.176         0.138        0.178
    Excerpt from the accompanying paper (surviving footer fragments: "China (973 Program", "2014, Barcelona, Spain"):
    [...] improved by reducing the scope of model, we propose a 2-level regression method in this section. Figure 1 shows the framework of this method. The first-level regression is just a naive procedure commonly used in regression module. For $x$ in test set $X$, we predict the test result $r_0 = f(x, R^G)$ by model $R^G$, a global model trained with the entire training dataset $X^{Tr}$. For each emotion dimension, we divide $[-0.75, 0.75]$ (based on the ground truth distribution of $X^{Tr}$) into 6 bins with equal length (0.25). For $x$ in $X$, its bin index is $i = Ind(x) = \lceil (f(x, R^G) + 0.75) / 0.25 \rceil$. The 2nd-level regression result of $x$ is $r_1 = f(x, R^L_i)$, where $R^L_i$ is the local model for the $i$th bin, trained on the subset of the training dataset $X^{Tr}_i = \{x \mid x \in X^{Tr} \text{ and } g(x) \in [a_i, b_i]\}$, where $g(x)$ is the ground truth of $x$ and the boundaries $\{a_i, b_i\}$ are computed on dev set $X^D$ as follows. We first define $G_i = \{g(x) \mid Ind(x) = i, x \in X^D\}$. After investigating the distribution of $G_i$, we then select $a_i = \min\{g \mid CDF_i(g) \geq 1.5\%\}$ and $b_i = \max\{g \mid CDF_i(g) \leq 98.5\%\}$, where $CDF_i$ is the cumulative distribution function for $G_i$, to eliminate outliers and fix the confidence level of dev set $X^D$ at 97%.
    4. RESULTS AND CONCLUSIONS. Results in this section are all modeled on the THU feature set and experiments are all for the dynamic task. [...]
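The boundary rule in the excerpt (keep the central 97% of the dev-set ground truth that falls in bin i) might be computed as in the sketch below. It assumes an empirical CDF evaluated only at the observed ground-truth values, which the excerpt does not spell out; the function names and the toy data are hypothetical.

```python
from bisect import bisect_right


def trim_boundaries(g_values, lo=0.015, hi=0.985):
    """a = min{g : CDF(g) >= lo}, b = max{g : CDF(g) <= hi} on the empirical CDF.

    Assumes enough samples that some value has CDF <= hi; with very few
    samples the max() below would receive an empty sequence and raise.
    """
    g = sorted(g_values)
    n = len(g)
    cdf = [bisect_right(g, v) / n for v in g]  # fraction of samples <= v
    a = min(v for v, c in zip(g, cdf) if c >= lo)
    b = max(v for v, c in zip(g, cdf) if c <= hi)
    return a, b


def local_subset(samples, truths, a, b):
    """X^Tr_i: training samples whose ground truth lies in [a, b]."""
    return [x for x, t in zip(samples, truths) if a <= t <= b]


# Toy dev-set ground truth for one bin: 300 evenly spaced values (illustrative only).
dev_truths = [k / 1000 for k in range(1, 301)]
a_i, b_i = trim_boundaries(dev_truths)
print(a_i, b_i)
```

Note that the local training subset is selected by ground truth, not by predicted bin, so neighbouring subsets can overlap when the global model's errors spread a bin's ground truth beyond its nominal 0.25 width.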
  • 23. How to Select Regression Models? Performance of Different Regression Models (Table 2: regression results with different models; ET = Extra-Trees, RF = Random Forest). 5-fold cross validation is used and is song independent (folds are separated randomly by song ID). 1/5 of the training set is used as the dev. dataset. Figure 1: Process framework for multi-level regression system.
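A song-independent split means that all segments of one song land in the same fold, so no song contributes to both training and validation. A minimal sketch under that reading follows; the function name, the fixed seed, and the round-robin assignment of shuffled song IDs to folds are illustrative choices, not details taken from the slides.

```python
import random


def song_independent_folds(song_ids, k=5, seed=0):
    """Split segment indices into k folds so that each song stays in one fold."""
    unique = sorted(set(song_ids))
    random.Random(seed).shuffle(unique)                  # random separation by song ID
    fold_of_song = {s: j % k for j, s in enumerate(unique)}
    folds = [[] for _ in range(k)]
    for idx, s in enumerate(song_ids):
        folds[fold_of_song[s]].append(idx)
    return folds


# Toy data: 10 songs with 3 segments each (song ID repeated per segment).
song_ids = [s for s in range(10) for _ in range(3)]
folds = song_independent_folds(song_ids)
for j, fold in enumerate(folds):
    print(j, sorted({song_ids[i] for i in fold}))
```

Each fold can then serve once as the held-out part while the other four are used for training. scikit-learn's `GroupKFold` provides a comparable group-wise split if that library is already in use.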
  • 24. One-level VS Multi-level Regression. NOTES: XGBoost, an open-source toolbox, was used for regression in both cases.
  • 25. Evaluation on Test Data.
    One-level vs. multi-level regression (RMSE):
    Regression strategy  Arousal  Valence
    One-level            0.176    0.178
    Multi-level          0.175    0.177
    Table 4: Results on required test data.
    Task     Run       Arousal APC  Arousal RMSE  Valence APC  Valence RMSE
    Dynamic  baseline  0.180        0.270         0.110        0.190
    Dynamic  1         0.128        0.124         0.064        0.100
    Dynamic  2         0.127        0.125         0.074        0.099
    Dynamic  3         0.170        0.119         0.091        0.094
    Dynamic  4         0.167        0.120         0.098        0.094
    Static   5         0.770        0.107         0.459        0.097
    [...] and run 5 is for the static task. The XGBoost lib is employed for regression in all runs.
    Dynamic task (subtask 2): Run 1: one-level regression. Run 2: two-level regression. Run 3: Run 1 + smooth. For one clip, we replace each segment result in Run 1 with the average of its 4 adjacent segment results and itself: $r^{run3}(i) = \mathrm{mean}(r^{run1}_{i-2}, \ldots, r^{run1}_{i+2})$, $i = 3, 4, \ldots, 88$. Run 4: Run 2 + smooth.
    Static task (subtask 1): Run 5: for each clip, average over the 90 segment results from Run 2.
    The results show that our result outperforms the baseline on RMSE [...]
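The post-smoothing of Runs 3 and 4 and the clip-level average of Run 5 are simple enough to sketch directly. The formula covers segments i = 3..88 of a 90-segment clip (1-based); leaving the two border segments at each end unchanged is an assumption here, since the slides do not say how they are handled, and the function names are invented for illustration.

```python
def smooth_run(r, half_window=2):
    """Replace each interior value with the mean of itself and its 4 neighbours.

    With 90 segments this touches 0-based indices 2..87, i.e. i = 3..88 in the
    1-based notation of the slide. Border segments are copied through unchanged
    (an assumption, not stated in the source).
    """
    out = list(r)
    for i in range(half_window, len(r) - half_window):
        window = r[i - half_window : i + half_window + 1]
        out[i] = sum(window) / len(window)
    return out


def static_value(r):
    """Run 5 style clip-level value: the average over all segment results."""
    return sum(r) / len(r)


# Toy clip: 90 segment predictions that are all zero except one spike.
clip = [0.0] * 90
clip[10] = 5.0
smoothed = smooth_run(clip)
print(smoothed[7:14])
print(static_value(clip))
```

The single spike is spread evenly over the five windows that contain it, which is the intended effect: isolated jumps between adjacent segments are damped while the clip-level mean is essentially preserved.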
  • 26. Conclusion. Feature ranking based on randomized decision trees is used for feature selection. Multi-level regression is proposed, consisting of a global regression and several local regressions. Post-smoothing is adopted for prediction refinement. In the future, we will focus on more sophisticated methods to define the neighbor set for training local regression models and on different smoothing strategies for post-processing.
  • 27. Questions? / Comments? / Suggestions? THANKS