
Quantifying Skype User Satisfaction


The success of Skype has inspired a generation of peer-to-peer-based solutions for satisfactory real-time multimedia services over the Internet. However, fundamental questions, such as whether VoIP services like Skype are good enough in terms of user satisfaction, have not been formally addressed. One of the major challenges lies in the lack of an easily accessible and objective index to quantify the degree of user satisfaction.

In this work, we propose a model, geared to Skype but generalizable to other VoIP services, to quantify VoIP user satisfaction based on a rigorous analysis of the call duration from actual Skype traces. The User Satisfaction Index (USI) derived from the model is unique in that 1) it is composed of objective source- and network-level metrics, such as the bit rate, bit rate jitter, and round-trip time, 2) unlike speech quality measures based on voice signals, such as the PESQ model standardized by ITU-T, the metrics are easily accessible and computable for real-time adaptation, and 3) the model development only requires network measurements, i.e., no user surveys or voice signals are necessary. Our model is validated by an independent set of metrics that quantifies the degree of user interaction from the actual traces.


  1. Quantifying User Satisfaction
     Sheng‐Wei (Kuan‐Ta) Chen
     Institute of Information Science, Academia Sinica
     Collaborators: Chun‐Ying Huang, Polly Huang, Chin‐Laung Lei (National Taiwan University)
     2008/10/30
  2. Motivation
     Are users satisfied with our system?
       - User survey
       - Market response
     A user satisfaction metric would also let a system self‐adapt in real time for a better user experience.
     → We need a Quality‐of‐Experience (QoE) metric!
     Kuan‐Ta Chen / Quantifying Skype User Satisfaction (MRA 2008)
  3. QoE metrics
     - FTP applications: data throughput rate
     - Web applications: response time and page load time
     - VoIP applications: voice quality (fidelity, loudness, noise), conversational delay, echo
     - Online games: interactivity, responsiveness, consistency, fairness
     QoE is multi‐dimensional, especially for real‐time interactive applications!
  4. What path should Skype choose?
     Which path through the Internet is “the best”?

     path   avail bandwidth   loss rate   delay
            10 Kbps           2%          100 ms
            20 Kbps           1%          300 ms
            30 Kbps           3%          500 ms
  5. QoS and QoE
     QoS (Quality of Service): the quality level of a “native” performance metric
       - Communication networks: delay, loss rate
       - Voice/audio codec: fidelity
       - DBMS: query completion time
     QoE (Quality of Experience): how users “feel” about a service
       - Usually multi‐dimensional, with tradeoffs between dimensions (download time vs. video quality, responsiveness vs. smoothness)
       - However, a unified (scalar) index is normally desired!
  6. A typical relationship between QoS and QoE
     [Figure: QoE vs. QoS (e.g., network bandwidth)]
     - At the low end, it is hard to tell “very bad” from “extremely bad”
     - At the high end, the marginal benefit of additional QoS is small
  7. Mapping between QoS and QoE
     Which QoS metric is most influential on users’ perceptions (QoE)?
     Source rate? Loss? Delay? Jitter? A combination of the above?
  8. How to measure QoE: A quick review
     Subjective evaluation procedures
       - Human studies: not scalable, costly!
     Objective evaluation procedures
       - Statistical models based on subjective evaluation results
       - Pros: computation without human involvement
       - Cons: (over‐)simplifications of model parameters
         • e.g., a single “loss rate” to capture the packet loss process
         • e.g., the assumption that every voice/video packet is equally important
         • external effects such as loudness and handset quality are not considered
  9. Subjective Evaluation Procedures
     - Single Stimulus Method (SSM)
     - Single Stimulus Continuous Quality Evaluation (SSCQE)
     - Double Stimulus Continuous Quality Scale (DSCQS)
     - Double Stimulus Impairment Scale (DSIS)
  10. Objective Evaluation Methods
     Referenced models
       - Speech‐layer model: PESQ (ITU‐T P.862), which compares original and degraded signals
     Unreferenced models (no original signals required)
       - Speech‐layer model: P.VTQ (ITU‐T P.563), which detects unnatural voices, noise, and mutes/interruptions in degraded signals
       - Network‐layer model: E‐model (ITU‐T G.107), a regression model based on delay, loss rate, and 20+ variables
         The equations are over‐complex for physical interpretation, e.g.
           Is = 20 [ (1 + (Xolr/8)^8)^(1/8) − Xolr/8 ],  where Xolr = OLR + 0.2(64 + No − RLR)
  11. Our goals
     An objective QoE assessment framework with
       - passive measurement (thus scalable)
       - easy‐to‐construct models (for your own application)
       - easy‐to‐access input parameters
       - easy computation in real time
  12. Our contributions
     USI = 2.15 × log(bit rate) − 1.55 × log(jitter) − 0.36 × RTT
       - bit rate: data rate of voice packets
       - jitter: receiving‐rate jitter (level of network congestion)
       - RTT: round‐trip time between the two parties
     An index for Skype user satisfaction
       - derived from real‐life Skype call sessions
       - verified by users’ speech interactivity in calls
       - accessible and computable in real time
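The USI formula above can be written as a small function. A minimal sketch, with the units as an assumption (natural logarithm, bit rate and jitter in Kbps, RTT in seconds; this choice reproduces the USI values on the multi‐path slide later in the talk):

```python
import math

def usi(bit_rate_kbps, jitter_kbps, rtt_s):
    """User Satisfaction Index (USI) from the final Cox model.

    Unit assumptions (not stated explicitly on the slide): natural log,
    bit rate and jitter in Kbps, RTT in seconds.
    """
    return (2.15 * math.log(bit_rate_kbps)
            - 1.55 * math.log(jitter_kbps)
            - 0.36 * rtt_s)
```

For example, a 20 Kbps call with 1 Kbps jitter and 300 ms RTT scores a USI of about 6.33, matching the multi‐path table.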
  13. Talk outline
     The Question → Measurement → Modeling → Validation → Significance
  14. Setting things up
     [Diagram: campus uplink with port mirroring on an L3 switch; a traffic monitor captures the mirrored packets, and a dedicated Skype node carries relayed traffic]
  15. Capturing Skype traffic
     1. Identify Skype hosts and ports
        - Track hosts sending HTTP to “”
        - Track their ports sending UDP within 10 seconds → (host, port)
        - Add other parties that communicate with the discovered (host, port) pairs
     2. Record packets whose source or destination ∈ these (host, port) pairs
        - Reduces the number of traced packets to 1–2%
  16. Extracting Skype calls
     1. Take the sessions with
        - average packet rate within (10, 100) pkt/sec
        - average packet size within (30, 300) bytes
        - duration longer than 10 seconds
     2. Merge two sessions into one relay session if
        - the two sessions share a common relay node,
        - their start and finish times are within 30 seconds of each other, and
        - their packet rate series are correlated
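Step 1's screening rules translate directly into a predicate; a sketch with hypothetical parameter names:

```python
def looks_like_voice_session(avg_pkt_rate, avg_pkt_size_bytes, duration_s):
    """Step-1 filter from the slide: a session is kept only if its average
    packet rate is within (10, 100) pkt/sec, its average packet size is
    within (30, 300) bytes, and it lasts longer than 10 seconds."""
    return (10 < avg_pkt_rate < 100
            and 30 < avg_pkt_size_bytes < 300
            and duration_s > 10)
```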
  17. Probing RTTs
     As we take traces, we send ICMP ping, application‐level ping, and traceroute probes at exponentially distributed intervals.
  18. Trace Summary
     [Diagram as on slide 14: direct sessions traverse the campus uplink; relayed sessions pass through the dedicated Skype node]

     Category   Calls   Hosts   Avg. Time
     Direct     253     240     29 min
     Relayed    209     369     18 min
     Total      462     570     24 min
  19. Talk outline
     The Question → Measurement → Modeling → Validation → Significance
  20. The intuition behind our analysis
     The conversation quality (i.e., QoE) perceived by the call parties is more or less related to the call duration.
     The network conditions of a VoIP call are independent of
       - the importance of the talk content
       - the call parties’ schedules
       - the call parties’ talkativeness
       - other incentives to talk (e.g., free of charge)
  21. First, getting a better sense
     Is call duration (QoE) correlated with the QoS factors?
       - Service level: source rate, relayed?, TCP / UDP?
       - Network quality: jitter, RTT
  22. Is call duration related to each factor?
     For each factor:
       - Scatter plot of the factor against call duration: see whether they are positively, negatively, or not correlated
       - Hypothesis tests: confirm whether they are indeed positively, negatively, or not correlated
  23. Call duration vs. jitter
     [Figure: average call duration (min) vs. average jitter (Kbps; the std. dev. of received bytes/sec), with a 95% confidence band of the average]
     - There are short calls with low jitter
     - The average shows a negative correlation between the two variables
  24. Effect of Jitter: Hypothesis Testing
     [Figure: the probability distribution of hanging up a call, by jitter level]
     Null hypothesis: all the survival curves are equivalent
     Log‐rank test: P < 1e‐20
     → We have > 99.999% confidence claiming jitter is correlated with call duration
  25. Effect of Source Rate
     [Figure: average session time (min) vs. source rate, the bandwidth Skype intended to use]
  26. The better sense
     How each factor correlates with call duration:

     factor        correlation
     source rate   positive
     relayed?      negative
     TCP / UDP?    none
     jitter        negative
     RTT           negative (non‐significant)
  27. Linear regression? No!
     Reasons:
     - Assumptions no longer hold
       • errors are not independent and not normally distributed
       • the variance of the errors is not constant
     - Censorship
       • some calls had already been going on for a while when tracing started
       • some calls had not yet finished by the time we terminated tracing
       • we can’t simply discard these calls; otherwise we end up with a biased set of calls with limited call duration
  28. Cox regression modeling
     The Cox regression model provides a good fit for, e.g., the effect of treatment on patients’ survival time.
     The log‐hazard function is proportional to the weighted sum of factors:
       log h(t|Z) ∝ β^T Z
         Z: factors (bit rate = x, jitter = y, RTT = z, …)
         β: weights of the factors
     Hazard function (conditional failure rate): the instantaneous rate at which failures occur for observations that have survived to time t:
       h(t) = lim_{∆t→0} Pr[t ≤ T < t + ∆t | T ≥ t] / ∆t
  29. Functional Form Checks
     The proportional‐hazards assumption h(t|Z) ∝ exp(β^T Z) must hold.
     - We explore the “true” functional forms of the factors with generalized additive models.
     - Humans are known to be sensitive to the scale of a physical quantity rather than its magnitude:
       • scale of sound (decibels vs. intensity)
       • musical staff for notes (distance vs. frequency)
       • star magnitudes (magnitude vs. brightness)
     → Bit rate and jitter: log scale
  30. The Logarithm Fits Better (Bit rate)
     [Figure: functional‐form check for bit rate, before and after taking the logarithm]
  31. The Logarithm Fits Better (Jitter)
     [Figure: functional‐form check for jitter, before and after taking the logarithm]
  32. Final model & interpretation

     variable        coef    std. err.   signif.
     log(bit rate)   −2.15   0.13        < 1e‐20
     log(jitter)     1.55    0.09        < 1e‐20
     RTT             0.36    0.18        4.3e‐02

     Interpretation
       A: bit rate = 20 Kbps
       B: bit rate = 15 Kbps, other factors the same as A
       The hazard ratio between A and B can be computed by exp((log(15) − log(20)) × −2.15) ≈ 1.86
       → The probability that B will hang up is 1.86 times the probability that A will do so at any instant.
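The hazard‐ratio arithmetic in the interpretation can be checked in a few lines:

```python
import math

# A: bit rate = 20 Kbps; B: bit rate = 15 Kbps, other factors equal.
# The coefficient of log(bit rate) in the final model is -2.15.
hazard_ratio = math.exp((math.log(15) - math.log(20)) * -2.15)
# hazard_ratio comes out to about 1.86: at any instant, B is roughly
# 1.86 times as likely to hang up as A.
```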
  33. Hang‐up rate and USI
     Log hang‐up rate = −2.15 × log(bit rate) + 1.55 × log(jitter) + 0.36 × RTT
     User satisfaction index (USI) = −Hang‐up rate
                                   = 2.15 × log(bit rate) − 1.55 × log(jitter) − 0.36 × RTT
  34. Actual and Predicted Time vs. USI
     [Figure: average session time (min), actual and predicted, vs. USI]
  35. The multi‐path scenario

     path   avail bandwidth   jitter   RTT      USI
            10 Kbps           2 Kbps   100 ms   3.84
            20 Kbps           1 Kbps   300 ms   6.33
            30 Kbps           3 Kbps   500 ms   5.43

     BUT, is call hang‐up rate a good indication of user satisfaction?
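Given the USI model, path selection reduces to an argmax over the candidate paths. A sketch under assumed units (natural log, Kbps, seconds), which reproduce the table's USI values:

```python
import math

def usi(bit_rate_kbps, jitter_kbps, rtt_s):
    # USI from the final model; units (Kbps, seconds, natural log) are assumed.
    return (2.15 * math.log(bit_rate_kbps)
            - 1.55 * math.log(jitter_kbps)
            - 0.36 * rtt_s)

# Candidate paths from the table: (avail bandwidth Kbps, jitter Kbps, RTT s).
paths = [(10, 2, 0.1), (20, 1, 0.3), (30, 3, 0.5)]
best = max(paths, key=lambda p: usi(*p))
# best is the 20 Kbps path, whose USI (about 6.33) is the highest of the three.
```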
  36. Talk outline
     The Question → Measurement → Modeling → Validation → Significance
  37. User satisfaction: Validation
     Call duration
     Intuition: call duration <‐> satisfaction (not confirmed yet)
  38. User satisfaction: One step further
     Call duration ↔ speech interactivity: now we’re going to check!
     Intuition: speech activities are interactive and tight in a cheerful conversation
  39. Identifying talk bursts
     The problem
       - Every voice packet is encrypted with 256‐bit AES (Advanced Encryption Standard)
     Possible solutions
       - packet rate: no (there is no silence suppression in Skype)
       - packet size: our choice
  40. What we need to achieve
     Input: a time series of packet sizes
     Output: estimated ON/OFF periods over time (ON = talk, OFF = silence)
  41. Speech activity detection
     1. Wavelet de‐noising: remove high‐frequency fluctuations
     2. Detect peaks and dips
     3. Dynamic thresholding: decide the beginning/end of each talk burst
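The pipeline above can be sketched end to end. This is only an illustration, not the paper's actual algorithm: the wavelet de‐noising step is replaced by a plain moving average, and the dynamic threshold by the midpoint of the smoothed range:

```python
def detect_talk_bursts(sizes, window=5):
    """Estimate ON (talk) periods from a series of packet sizes.

    Simplified stand-in for the slide's pipeline: moving-average smoothing
    instead of wavelet de-noising, and a fixed midpoint threshold instead
    of dynamic thresholding.
    """
    n = len(sizes)
    # Step 1: smooth out high-frequency fluctuations.
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - window // 2), min(n, i + window // 2 + 1)
        smoothed.append(sum(sizes[lo:hi]) / (hi - lo))
    # Steps 2-3: threshold halfway between the smoothed extremes.
    thr = (min(smoothed) + max(smoothed)) / 2
    flags = [s > thr for s in smoothed]
    # Collapse the boolean series into (start, end) index pairs of ON runs.
    bursts, start = [], None
    for i, on in enumerate(flags + [False]):
        if on and start is None:
            start = i
        elif not on and start is not None:
            bursts.append((start, i))
            start = None
    return bursts
```

For instance, a series with large packets only in the middle positions yields a single burst spanning roughly those positions.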
  42. Speech detection algorithm: Validation
     The speech detection algorithm is validated with:
       - synthesized sine waves (500 Hz – 2000 Hz)
       - real speech recordings
     Setup: play sound, then capture the packet size processes through a relay node (chosen by Skype); the packet size processes are thus contaminated by serious network impairment (delay and loss).
     Average RTT: 350 ms; jitter: 5.1 Kbps
  43. Validation with synthesized sine waves
     [Figure: true ON periods vs. estimated ON periods]
     - 3 runs for each of 10 test cases
     - correctness (ratio of matched 0.1‐second periods): 0.73 – 0.92
  44. Validation with speech recordings
     [Figure: true ON periods vs. estimated ON periods]
     - 3 runs for each of 3 test cases
     - correctness (ratio of matched 0.1‐second periods): 0.71 – 0.85
  45. Speech interactivity analysis
     - Responsiveness: whether the other party responds
     - Avg. response delay: how long before the other party responds
     - Avg. burst length: how long a speech burst lasts
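Once per-party ON periods are available, these metrics are easy to compute. A sketch of one of them, average response delay, using a hypothetical representation of bursts as (start, end) times; the paper's exact definitions may differ:

```python
def avg_response_delay(bursts_a, bursts_b):
    """For each talk burst of party A, measure the gap until party B's
    next burst begins, then average the gaps (illustrative definition)."""
    delays = []
    for _, end_a in bursts_a:
        starts_after = [s for s, _ in bursts_b if s >= end_a]
        if starts_after:
            delays.append(min(starts_after) - end_a)
    return sum(delays) / len(delays) if delays else None
```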
  46. USI vs. Speech interactivity
     - higher USI ↔ higher responsiveness
     - higher USI ↔ shorter response delay
     - higher USI ↔ shorter burst length
     All are statistically significant (at the 0.01 significance level).
     → Speech interactivity in conversation supports the proposed USI
  47. Talk outline
     The Question → Measurement → Modeling → Validation → Significance
  48. Implications
     We should pay more attention to delay jitter (rather than focusing on network delay only), and to the encoding bit rate!
  49. Significance
     QoE‐aware systems can optimize user experience at run time:
       - Is it worth sacrificing 20 ms of latency to reduce jitter by 10 ms (say, with a de‐jitter buffer)?
       - Pick the most appropriate parameters at run time:
         • playout scheduling (buffer time)
         • coding scheme (& rate)
         • source rate
         • data path (overlay routing)
         • transmission scheme (redundancy, erasure coding, …)
  50. Future work (1)
     Measurement
       - larger data sets (p2p traffic is hard to collect)
       - diverse locations
     Validation
       - user studies
       - comparison with existing models (PESQ, etc.)
  51. Future work (2)
     Beyond “call duration”:
     - Call behavior
       • Who hangs up a call?
       • Call disconnect‐n‐connect behavior
     - More sophisticated modeling
       • voice codec
       • pricing effect
       • time‐of‐day effect
       • time‐dependent impact
  52. Thank You!
     Sheng‐Wei (Kuan‐Ta) Chen