A Psychophysical Design towards Fair Bandwidth Allocation among VoIP Sessions
1. MODELING THE QOE OF RATE CHANGES IN SKYPE/SILK VOIP CALLS
CHIEN-NAN CHEN (UNIVERSITY OF ILLINOIS, URBANA-CHAMPAIGN)
CING-YU CHU, SU-LING YEH, HAO-HUA CHU, POLLY HUANG (NATIONAL TAIWAN UNIVERSITY)
3. Conclusion Evaluation Large Exp. Proposed Model Pre. Exp. Motivation
VOICE OVER IP
Internet
Delay Jitter Packet Loss Bandwidth Fluctuation
4. RATE ADAPTATION
• Available bandwidth goes up
Ramping up the sending rate
Is the quality improved proportionally?
• Available bandwidth goes down
Tuning down the sending rate
Does the rate change disturb users?
5. GOAL
• Investigating the relationship of
Sending rate vs. Perceived quality
• To explore the influence of
Rate change magnitude/frequency
• Methodology
• Synthesized VoIP calls
• User study experiments
6. CONTRIBUTION
• Sending bitrate vs. user perception
Logarithmic Relationship
• Frequency of rate change
Logarithmic Relationship
• Magnitude of rate change
Complicated, but Interesting
• Closed-form models to predict user
perception under bandwidth fluctuation
7. PRELIMINARY EXPERIMENT
• To confirm the influence of
• sending bitrate
• rate change magnitude
• rate change frequency
• 5-level MOS (Mean Opinion Score)
• 14 participants
9. RESULT: FIXED-RATE
• MOS vs. sending bitrate
[Figure annotations: user variation; logarithmic trend]
10. RESULT: VARIABLE-RATE
• MOS - ΔT plot
Rate change matters!
11. EFFECT OF RATE CHANGE FREQUENCY
• When ΔT varies…
Logarithmic trend
12. EFFECT OF RATE CHANGE MAGNITUDE
• When sharing the same average bitrate…
Magnitude up, MOS down
13. EFFECT OF RATE CHANGE MAGNITUDE
• However, with the same magnitude…
[Figure legend: higher (hr + lr) vs. lower (hr + lr)]
14. SHORT SUMMARY
• Fixed-rate
• MOS – bitrate logarithmic
• Variable-rate
• MOS – ΔT logarithmic
• MOS – (hr, lr)
• (hr - lr) up, MOS down
• (hr + lr) up, MOS up
15. PROPOSED MODELS
• Fixed-rate model
• Variable-rate model
Massive Data Numerical Fitting
16. LARGE-SCALE EXPERIMENT
• Same methodology
• 127 participants
• Each track is scored by 30 participants
• Rate selection
Rate:            r9   r8   r7   r6   r5    r4    r3    r2    r1
Bitrate (kbps):  5.6  6.1  7.1  8.5  10.7  14.1  19.4  27.7  40.6
17. SIGNIFICANCE OF FACTORS
• ANOVA tests
• MOS – sending bitrate
Significant
• Interaction between ΔT and (hr, lr)
Significant
• MOS - ΔT
Test  p-value
r1r2  .31
r3r4  .42
r4r5  .31
r6r7  .31
r6r8  .11
r6r9  .09
r7r8  .26
r7r9  .34
r8r9  .32
18. MODEL SPECIFICS: FIXED-RATE MODEL
• α=4.091, β=1.515, and γ=1.000
• with R-square = 0.96
Lower bound of user perception (?)
close to the lowest bitrate of SILK
19. MODEL SPECIFICS: VARIABLE-RATE MODEL
• Logarithmic regression on each (hr, lr) pair
(r1, r2): p12 × ln(ΔT) + q12
(r1, r3): p13 × ln(ΔT) + q13
(r1, r4): p14 × ln(ΔT) + q14
(r1, r5): p15 × ln(ΔT) + q15
...
The p coefficients feed SCALE(); the q coefficients feed SHIFT()
20. MODEL SPECIFICS: SCALE()
• Polynomial regression
• x = hr – lr , y = hr + lr
21. MODEL SPECIFICS: SHIFT()
• Independent of ΔT
• Basic idea
• ΔT approaches the track duration
• Fluctuation diminishes
22. EVALUATION: GOODNESS OF FIT
• Training data
• R-square = 0.86
23. EVALUATION: ACCURACY OF PREDICTION
• 2 datasets independent of the training data
• Dataset I: Preliminary experiment
• Dataset II: Additional (New) experiment
24. PESQ
• Perceptual Evaluation of Speech Quality
• Limited spectrum
• Narrow-band: 8 kHz
• Wide-band: 16 kHz
(SILK: 8, 12, 16 and 24 kHz)
• Requires both original and degraded
audio files
25. COMPARISON WITH PESQ – FIXED RATE
model            Proposed  PESQ
R-square         0.9601    0.7841
Avg. Err. Ratio  3.68%     14.59%
26. COMPARISON WITH PESQ – VARIABLE RATE
model            Proposed  PESQ
R-square         0.2512    -0.3491
Avg. Err. Ratio  8.03%     12.60%
27. COMPARISON ON AMR-WB
• AMR-WB audio codec
• Older Codec
• Widely used in 3G network
• 9 different coding bitrates
• User study experiment
• Same methodology
• 14 participants
28. COMPARISON ON AMR-WB
model            Proposed  PESQ
R-square         0.7878    0.6289
Avg. Err. Ratio  2.18%     2.86%
29. CONCLUSION
• The logarithmic relationship (Weber-Fechner Law) is observed in the MOS-bitrate relationship of Skype/SILK
• Rate change frequency (W-F Law) and magnitude (complicated) have significant influence on perceived quality
• We have established both fixed- (SIGCOMM’12 W-MUST) and variable-rate models
• User-centric rate adaptation for VoIP applications (coming next)
Hello everyone, my name is Cing-Yu Chu. Today, on behalf of our research group, I am going to introduce our paper "Modeling the QoE of Rate Changes in Skype/SILK VoIP Calls". It is joint work with Chien-Nan Chen, who is now at UIUC, and Su-Ling Yeh, Hao-Hua Chu and Polly Huang from National Taiwan University.
Here is the outline of my presentation today. I will start with our motivation and then introduce the preliminary experiment that led us to propose the models for quality prediction. We then found the coefficients of these models using a large-scale experiment, and evaluated the derived models with both the training data and other datasets that are independent of the training data.
So let's begin with what a Voice over IP, or VoIP, application is. As shown in this slide, VoIP applications allow users to send their voice data, or voice packets, into the Internet. The Internet then forwards these packets to the receiver, another end user. In this process, we can easily see that the quality of VoIP applications is influenced by network conditions such as delay, jitter and packet loss. And all of these can actually be attributed to bandwidth fluctuation.
Therefore, rate adaptation, which deals with bandwidth fluctuation, has long been a classical topic in VoIP applications. Traditionally, to fully utilize the network resource, we ramp up the sending rate when there is extra bandwidth available. We believe this can improve the service quality. However, our first concern is whether the quality improves in proportion to the increased sending rate. On the other hand, when the available bandwidth is insufficient, we have to tune down the sending rate to avoid congestion or packet loss. By ramping up and tuning down the sending rate, we actually introduce quality fluctuation, which we call a rate change event. So our second concern is whether such rate changes disturb users and make them unhappy.
So, in this work, we tried to understand the relationship between the sending bitrate and the perceived quality. We are also interested in how rate changes, in both magnitude and frequency, influence the perceived quality. We answer these questions by synthesizing VoIP calls with different sending bitrates and different rate change magnitudes and frequencies. We then conducted a series of user study experiments to see how real humans perceive these VoIP calls.
At the end of this work, we found that the relationship between the sending bitrate and the perceived quality is logarithmic. The same logarithmic relationship can also be observed in the influence of rate change frequency. As for the rate change magnitude, we found it to be kind of complicated but interesting, and we will talk about the details later. With these findings, we were able to derive closed-form models to predict the perceived quality.
All of our observations are based on a preliminary experiment. The purpose of this preliminary experiment is to confirm whether the sending bitrate and the rate change magnitude and frequency really influence the perceived quality and, if so, how. In our work, we adopted the 5-level Mean Opinion Score (MOS) recommended by the ITU, where 5 is the best quality and 1 is the worst. We use MOS to represent the perceived quality throughout our work. In this experiment, a total of 14 participants were recruited.
Here we have to explain how we produced the synthesized VoIP calls. Since Skype might be the most popular VoIP application, we chose it as our research target, and SILK is the audio codec adopted by Skype in its latest version. The most desirable property of SILK is that it allows an arbitrary coding bitrate from 5 kbps to 40 kbps, which makes it very suitable for investigating the influence of sending rate and rate change. We used SILK to encode and decode an original audio track. This audio track is 30 seconds long and is composed of several simple and meaningful sentences. After processing it with SILK, we got the degraded audio tracks. These audio tracks can be classified into 2 categories. The first one is called fixed-rate tracks. We evenly chose 10 different rates between the minimum and maximum of SILK's coding bitrate, and we used these bitrates to encode and decode the audio files to form the fixed-rate tracks. The second one is called variable-rate tracks. A variable-rate track is defined by three parameters: high rate, low rate and delta T. We switched the coding bitrate between the high rate and the low rate every delta T to introduce quality fluctuation, or rate change.
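To make the three parameters concrete, the bitrate schedule of a variable-rate track could be sketched as below. This is only an illustration and not the tooling used in the study: the function names are hypothetical, and starting the track at the high rate is an assumption, since the talk does not say which rate comes first.

```python
def variable_rate_schedule(high_kbps, low_kbps, delta_t, duration=30.0, start_high=True):
    """Return (start_s, end_s, bitrate_kbps) segments alternating between two bitrates.

    start_high is an assumption; the talk does not state which rate a track starts with.
    """
    if delta_t <= 0:
        raise ValueError("delta_t must be positive")
    segments = []
    start = 0.0
    use_high = start_high
    while start < duration:
        end = min(start + delta_t, duration)  # the last segment may be shorter
        segments.append((start, end, high_kbps if use_high else low_kbps))
        use_high = not use_high
        start = end
    return segments


def average_bitrate(segments):
    """Time-weighted mean bitrate of a schedule."""
    total = sum(end - start for start, end, _ in segments)
    return sum((end - start) * rate for start, end, rate in segments) / total


# A 30-second track switching between 40.6 and 17.2 kbps every 3 seconds.
schedule = variable_rate_schedule(40.6, 17.2, delta_t=3.0)
print(len(schedule), average_bitrate(schedule))
```

Each segment would then be encoded and decoded at its own bitrate; that codec step is outside this sketch.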
Here is the result of the fixed-rate case. The figure is the MOS-bitrate plot, where the x-axis is the bitrate and the y-axis is the MOS. From this figure, we can see that there is user variation, indicated by the error bars of one standard deviation. However, we can still observe a trend if we look at the average user scores, indicated by the red points. The trend we found here is a logarithmic relationship between the bitrate and the perceived quality.
As for the variable-rate case, the MOS-delta T plot here tells us how rate change influences the perceived quality. Again, the red points are the average user scores, and they are the results of variable-rate tracks whose high rate is 40.6 kbps and low rate is 17.2 kbps. We can clearly observe that the perceived quality changes when we vary delta T, and that it becomes better when delta T is bigger. Furthermore, the three horizontal lines in this figure are the quality of the fixed-rate tracks whose bitrate equals the high rate, the low rate, and the average of the two. So it tells us that the quality of a variable-rate track is different from the quality of a fixed-rate track at the average bitrate. Also, even though the bitrate of this variable-rate track never goes below the low rate, its quality can be worse than that of the low rate when the rate changes rapidly. So we can conclude that rate change plays an important role!
Then, let's look at the influence of frequency and magnitude separately. Here are the results of a few variable-rate tracks. Because logarithmic regression fits all of these curves well, we simply conclude that the influence of rate change frequency is logarithmic.
As for the rate change magnitude, here are the results of variable-rate tracks that share the same average bitrate. The upper curve has a smaller rate change magnitude, and its quality is always better than that of the lower one. This suggests that when the rate change magnitude is bigger, the quality is worse. However, does it mean the difference between the high rate and the low rate is the only factor that influences the quality? The answer is NO.
Here are the cases where delta T is 3 seconds. This time the x-axis is the difference between the high rate and the low rate, while the y-axis is still the MOS. The red line is the result of tracks that have a higher high rate, and the blue line is the result of tracks that have a lower high rate. From this figure, we can see that even when the rate difference is the same, tracks whose high rate and low rate are both high have better quality. So we can conclude that the quality is determined not only by the rate difference but also by the level of both the high rate and the low rate.
Here is a short summary of all the findings in the preliminary experiment. Based on these findings, we then proposed 2 models for quality prediction.
Because we have identified that the relationship between bitrate and perceived quality is logarithmic, we simply proposed a logarithmic model for the fixed-rate case. As for the variable-rate case, because we have found that the influence of rate change frequency is logarithmic, we proposed a logarithmic model in this form and distributed the impact of the high rate and the low rate into 2 components: SCALE and SHIFT. So what we have to do next is to collect a lot of data and use it to find all the coefficients in the proposed models.
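Written out, the structure of the variable-rate model described here can be sketched as follows. The symbols follow the slides (hr, lr, ΔT, SCALE, SHIFT), and the natural logarithm is taken from the ln(ΔT) regressions shown there. The exact forms of SCALE and SHIFT are not reproduced here and are left to the paper.

```latex
\mathrm{MOS}(h_r, l_r, \Delta T) \;\approx\; \mathrm{SCALE}(h_r, l_r)\,\ln(\Delta T) \;+\; \mathrm{SHIFT}(h_r, l_r)
```

SCALE multiplies the ΔT term, so it controls how strongly the rate change frequency matters, while SHIFT moves the whole curve up or down without interacting with ΔT.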
So, we conducted a large-scale experiment. The methodology is similar to that of the preliminary experiment, but this time we recruited a total of 127 participants to make sure each audio track is scored by 30 participants, which gives a more reliable result. Also, unlike the preliminary experiment, this time we chose the coding bitrates based on the perceived quality. Since we already have the result from the preliminary experiment, we can now divide the perceived quality evenly based on the curve and find the corresponding bitrates. This helps us collect more data points in the region where MOS changes fast. It also allows us to examine the result from a quality perspective instead of only from a bitrate perspective.
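One way to picture this rate selection is to invert a logarithmic MOS curve at evenly spaced MOS levels. The sketch below assumes a plain curve of the form MOS = a·ln(r) + b with hypothetical coefficients; the curve actually fitted in the preliminary experiment may be parameterized differently, so the numbers printed here are not the study's rates.

```python
import math


def rates_for_even_quality(a, b, r_min, r_max, n):
    """Pick n bitrates whose predicted MOS values are evenly spaced.

    Assumes MOS = a * ln(r) + b, with hypothetical coefficients a and b.
    """
    mos_lo = a * math.log(r_min) + b
    mos_hi = a * math.log(r_max) + b
    step = (mos_hi - mos_lo) / (n - 1)
    targets = [mos_lo + i * step for i in range(n)]
    # Invert the curve: r = exp((MOS - b) / a)
    return [math.exp((mos - b) / a) for mos in targets]


# Hypothetical coefficients, chosen only for illustration.
for rate in rates_for_even_quality(a=1.5, b=-1.0, r_min=5.0, r_max=40.0, n=4):
    print(round(rate, 2))
```

Under a purely logarithmic curve, even steps in quality turn into geometric steps in bitrate, which is why the selected rates are packed more densely at the low end.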
After the data collection, we applied ANOVA tests to check whether the influence of each factor is significant. The results suggest that the sending bitrate has a significant influence. The interaction between delta T and the (high rate, low rate) pair is also significant, which supports multiplying the delta T term by SCALE. As for the influence of delta T, we found that it is significant in most cases. However, when the quality of the high rate and the low rate are very close to each other, the influence is not significant, because users seem unable to tell whether there is a rate change or quality fluctuation.
OK, so with the collected data, we found the coefficients of the fixed-rate model, with a high R-square value. With this model, we found something interesting. If we set the MOS to 1, which means the worst quality, and use our model to find the corresponding bitrate, this bitrate is close to 5 kbps, which is also the minimum of SILK's coding rate. So we wonder whether this means Skype is actually aware of where the worst quality that users perceive lies. But of course, we don't know the answer.
As for the variable-rate case, we need the ground truth of both SCALE and SHIFT. We first grouped all the variable-rate tracks by their high rate and low rate, so each group contains 5 different values of delta T. We then applied logarithmic regression to each group, and the results of these regressions formed the ground truth of SCALE and SHIFT.
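The grouping and per-group regression can be sketched in a few lines. This is a minimal illustration using ordinary least squares on ln(ΔT); the function names, the sample scores and the five ΔT values are all made up for the example and are not the study's data.

```python
import math
from collections import defaultdict


def fit_log(points):
    """Least-squares fit of mos = p * ln(delta_t) + q; returns (p, q)."""
    xs = [math.log(dt) for dt, _ in points]
    ys = [mos for _, mos in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx
    return p, mean_y - p * mean_x


def fit_per_pair(samples):
    """samples: iterable of (hr, lr, delta_t, mos). Returns {(hr, lr): (p, q)}."""
    groups = defaultdict(list)
    for hr, lr, dt, mos in samples:
        groups[(hr, lr)].append((dt, mos))
    return {pair: fit_log(points) for pair, points in groups.items()}


# Made-up scores generated from known (p, q) values, five delta T values per pair.
true_params = {(40.6, 19.4): (0.40, 2.5), (27.7, 14.1): (0.25, 2.2)}
samples = [
    (hr, lr, dt, p * math.log(dt) + q)
    for (hr, lr), (p, q) in true_params.items()
    for dt in (1, 2, 3, 5, 10)
]
for pair, (p, q) in fit_per_pair(samples).items():
    print(pair, round(p, 3), round(q, 3))
```

The fitted slope p of each group plays the role of a SCALE sample and the intercept q the role of a SHIFT sample for that (high rate, low rate) pair.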
With this ground truth, we can now explore how SCALE and SHIFT interact with the high rate and the low rate. As I mentioned earlier, SCALE is determined not only by the rate difference but also by the level of both the high rate and the low rate. So we used 2 variables, x and y, to represent the difference and the level of the high rate and the low rate. We then applied polynomial regression to the collected data and got this 3D plot. With this 3D plot, we are able to observe how SCALE changes with different high rates and low rates. To make this figure easier to explain and the trend clearer, we convert the 3D plot into a contour plot. Here, we can clearly see that SCALE becomes larger when the rate difference increases. Actually, we would like to interpret SCALE as the sensitivity to rate change. So a larger SCALE amplifies the influence of delta T. On the other hand, a smaller SCALE means the sensitivity to rate change is low, so users cannot perceive that there is a rate change. From this contour plot, we can also observe that when the level of both the high rate and the low rate is higher, SCALE is smaller. This is because in such cases the quality of the high rate and the low rate is quite close, so users cannot really tell the difference, which leads to a lower sensitivity.
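As a rough sketch of such a surface fit, the code below regresses SCALE on x = hr - lr and y = hr + lr with a second-degree polynomial. The degree is an assumption, since the talk only says "polynomial regression", and the toy surface used to generate data is invented for illustration. The normal equations are solved with a small Gaussian elimination so that the example needs no external library.

```python
def features(x, y):
    """Second-degree polynomial terms in x and y (the degree is an assumption)."""
    return [1.0, x, y, x * x, x * y, y * y]


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


def fit_scale_surface(samples):
    """samples: iterable of (hr, lr, scale). Returns the six polynomial coefficients."""
    rows = [features(hr - lr, hr + lr) for hr, lr, _ in samples]
    targets = [scale for _, _, scale in samples]
    k = len(rows[0])
    # Normal equations: (A^T A) z = A^T t
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(ata, atb)


def predict_scale(coeffs, hr, lr):
    return sum(c * f for c, f in zip(coeffs, features(hr - lr, hr + lr)))


def toy_scale(hr, lr):
    """Made-up surface purely for illustration; not the paper's fitted SCALE."""
    return 0.05 + 0.02 * (hr - lr) - 0.004 * (hr + lr)


rates = [5.6, 8.5, 14.1, 27.7, 40.6]
data = [(hr, lr, toy_scale(hr, lr)) for hr in rates for lr in rates if hr > lr]
coeffs = fit_scale_surface(data)
print(predict_scale(coeffs, 40.6, 14.1))
```

In the toy surface, SCALE grows with the difference and shrinks with the level, which mirrors the qualitative trend read off the contour plot; the coefficients themselves carry no meaning.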
And the last part of the variable-rate model is SHIFT. It is a term that doesn't interact with delta T. Because the derivation of SHIFT is more like a pure numerical fitting task, we suggest that people who are interested in this part refer to our paper for more details.
To evaluate our model, we first check whether the derived model captures the training data properly. Here the x-axis is the predicted score and the y-axis is the average user score, so the diagonal black line represents perfect prediction. As we can see from this figure, the points cluster densely around the perfect-prediction line, which is also supported by a high R-square value. This means our model is able to capture user perception well.
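For readers who want to reproduce this kind of check, the two metrics reported on the slides could be computed along the following lines. R-square is taken here as one minus the ratio of residual to total sum of squares, which can go negative when predictions are worse than simply predicting the mean score. The definition of the average error ratio is an assumption (mean absolute error relative to the user score), since the slides do not spell it out, and the scores below are made up.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def average_error_ratio(actual, predicted):
    """Mean of |actual - predicted| / actual (assumed definition)."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Made-up average user scores and model predictions, for illustration only.
user_scores = [1.8, 2.4, 3.1, 3.6, 4.0]
model_scores = [1.9, 2.3, 3.0, 3.7, 3.9]
print(r_squared(user_scores, model_scores))
print(average_error_ratio(user_scores, model_scores))
```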
To see whether our model is still applicable when the content or the participants are different, we used 2 datasets that are independent of the training dataset. The first one is the data collected from the preliminary experiment, and the second one comes from an additional experiment. The audio content of this new dataset is different from that of the preliminary and large-scale experiments. The prediction accuracy is illustrated in this figure, and it suggests that our model is robust when the participants or the audio content are different.
So, at the end of my presentation, I would like to conclude our work. We have identified that the relationship between the bitrate and the perceived quality is logarithmic, which echoes the well-known psychophysical law called the Weber-Fechner Law. This law states that the relationship between the intensity of a stimulus and human perception is logarithmic. So, in our case, we can regard the bitrate as a kind of stimulus. We further explored the influence of rate change, in both frequency and magnitude. We then derived closed-form models for quality prediction. We are now interested in whether we can use these models to design a user-centric rate adaptation mechanism for VoIP applications, and that is what we are working on.
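For reference, the Weber-Fechner Law is commonly written as below, where p is the perceived intensity, S is the stimulus intensity, S_0 is the threshold below which nothing is perceived, and k is a constant that depends on the sense involved. Reading S as the bitrate and p as the MOS is the analogy drawn in this talk, not part of the law itself.

```latex
p = k \,\ln\!\left(\frac{S}{S_0}\right)
```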
Thanks for your attention, and I am happy to take questions.