Network and Multimedia QoE Management

    1. 1. Network and Multimedia QoE Management Sheng-Wei (Kuan-Ta) Chen Institute of Information Science Academia Sinica
    2. 2. What is QoE? Quality of Experience = User Satisfaction in Using Computer/Communication Systems
    3. 3. What is QoE Management? Measurement: measure user satisfaction. Provisioning: improve system design to provide a more satisfactory user experience.
    4. 4. Goal of QoE Management To Provide Satisfactory User Experience in Computer/Communication Systems
    5. 5. Motivating Example
       • Network/computation resources are not infinite → conflicting goals everywhere
       • Which path is “the best” for voice data over the Internet?
         path | avail bandwidth | delay  | loss rate
         1    | 10 Kbps         | 100 ms | 2%
         2    | 20 Kbps         | 300 ms | 1%
         3    | 30 Kbps         | 500 ms | 3%
    6. 6. Conflicting Goals in Video Conferencing
       • Audio quality vs. video quality
       • Audio/video quality vs. real-timeliness (illustrations: a time-lagged stream and a low-resolution stream)
    7. 7. Challenges
       • Hard to measure and quantify users’ perception: not directly observable, massively multidimensional
       • Hard to reduce the system’s parameter space
         – Network factors (delay, loss, jitter, …)
         – Transmission factors (redundancy, compression, …)
         – Codec factors (many codec-dependent parameters)
       • Hard to measure and quantify the environment that may affect users’ experience: ambient noise, quality of headset, distance from viewer to display
    8. 8. Our Research Focus Video Conferencing Online Entertainment VoIP
    9. 9. Our Work
       • Selected contributions:
         – The first QoE measurement methodology based on large-scale user behavior observation
         – OneClick: a simple yet efficient framework for QoE measurement experiments
         – The first crowdsourcable QoE evaluation methodology
       • None of them is incremental work
    10. 10. Our Contribution #1
       • The first QoE measurement methodology based on large-scale user behavior observation
       • Rationale (VoIP as an example): the QoE perceived by users is more or less related to their call duration
       • (Diagram: QoS factors — jitter, delay, source rate, TCP/UDP, relayed or not — are correlated with call duration, which reflects QoE)
    11. 11. Skype Call Duration vs. Network Quality
       • There are short calls with good network quality
       • The average shows a negative correlation between the two variables
       • (Plot: call duration in minutes vs. jitter in Kbps, with the average and its 95% confidence band; quality worsens as jitter grows)
    12. 12. Our Contribution #1 (cont)
       • Proportional-hazards modeling → Skype’s QoE prediction
       • Features:
         – No user studies required (more scalable)
         – Can be used to adjust system parameters at run time
         – Applies to all real-time interactive applications
       • Chen et al., "Quantifying Skype User Satisfaction," ACM SIGCOMM 2006.
       • Chen et al., "On the Sensitivity of Online Game Playing Time to Network QoS," IEEE INFOCOM 2006.
       • Chen et al., "How Sensitive are Online Gamers to Network Quality?," Communications of the ACM, 2006.
       • Chen et al., "Effect of Network Quality on Player Departure Behavior in Online Games," IEEE TPDS 2008.
    13. 13. Our Contribution #2
       • OneClick: a simple yet efficient framework for QoE measurement experiments
       • Analogy — knocking at someone’s door:
         – Knock on the door
         – You wait, and you knock on the door again
         – You wait, and you knock on the door again and again, and …
    14. 14. Our Contribution #2 (cont)
       • Simple instruction to users:
         – Click when you feel dissatisfied
         – Click multiple times when you feel even less satisfied
       • Estimate QoE from the application quality and the users’ click event process
    15. 15. Our Contribution #2 (cont)
       • Natural: we already click this way to show loss of patience all the time
       • Bad-memory proof: real-time decisions, no need to “remember” past experience
       • Time-aware: captures users’ responses at the time of the problems; useful for studying recency and habituation effects
       • Chen et al., "OneClick: A Framework for Measuring Network Quality of Experience," IEEE INFOCOM 2009.
    16. 16. Our Contribution #3
       • The first crowdsourcable QoE evaluation framework
       • Users’ inputs can be verified:
         – the transitivity property: A > B and B > C → A > C
         – detect inconsistent judgments from problematic users
       • Experiments can thus be outsourced to the Internet crowd:
         – lower monetary cost
         – wider participant diversity
         – while maintaining the quality of the evaluation results
       • Chen et al., "A Crowdsourceable QoE Evaluation Framework for Multimedia Content," ACM Multimedia 2009.
    17. 17. Quantifying User Satisfaction. Collaborators: Chun-Ying Huang, Polly Huang, Chin-Laung Lei (National Taiwan University); Sheng-Wei (Kuan-Ta) Chen, Institute of Information Science, Academia Sinica. Appeared in ACM SIGCOMM 2006.
    18. 18. Motivation
       • Are users satisfied with our system?
         – User surveys
         – Market response
         – A user satisfaction metric
       • To make a system self-adaptable in real time for better user experience
         – A user satisfaction metric
       • We need a Quality-of-Experience (QoE) metric!
    19. 19. QoE metrics
       • FTP applications: data throughput rate
       • Web applications: response time and page load time
       • VoIP applications: voice quality (fidelity, loudness, noise), conversational delay, echo
       • Online games: interactivity, responsiveness, consistency, fairness
       • QoE is multi-dimensional, especially for real-time interactive applications!
    20. 20. What path should Skype choose? Which path is “the best”?
         path | avail bandwidth | delay  | loss rate
         1    | 10 Kbps         | 100 ms | 2%
         2    | 20 Kbps         | 300 ms | 1%
         3    | 30 Kbps         | 500 ms | 3%
    21. 21. QoS and QoE
       • QoS (Quality of Service): the quality level of a “native” performance metric
         – Communication networks: delay, loss rate
         – Voice/audio codec: fidelity
         – DBMS: query completion time
       • QoE (Quality of Experience): how users “feel” about a service
         – Usually multi-dimensional, and tradeoffs exist between dimensions (download time vs. video quality, responsiveness vs. smoothness)
         – However, a unified (scalar) index is normally desired!
    22. 22. A typical relationship between QoS and QoE (Plot: QoE rises with QoS, e.g., network bandwidth; at the low end it is hard to tell “very bad” from “extremely bad”, and at the high end the marginal benefit is small)
    23. 23. Mapping between QoS and QoE
       • Which QoS metric is most influential on users’ perceptions (QoE)? Source rate? Loss? Delay? Jitter? A combination of the above?
    24. 24. How to measure QoE: A quick review
       • Subjective evaluation procedures: human studies, not scalable, costly!
       • Objective evaluation procedures: statistical models based on subjective evaluation results
         – Pros: computation without human involvement
         – Cons: (over-)simplifications of model parameters
           · e.g., using a single “loss rate” to capture the packet loss process
           · e.g., assuming every voice/video packet is equally important
           · external effects such as loudness and handset quality are not considered
    25. 25. Subjective Evaluation Procedures
       • Single Stimulus Method (SSM)
       • Single Stimulus Continuous Quality Evaluation (SSCQE)
       • Double Stimulus Continuous Quality Scale (DSCQS)
       • Double Stimulus Impairment Scale (DSIS)
    26. 26. Objective Evaluation Methods
       • Referenced models
         – speech-layer model: PESQ (ITU-T P.862): compare original and degraded signals
       • Unreferenced models (no original signals required)
         – speech-layer model: P.VTQ (ITU-T P.563): detect unnatural voices, noise, and mutes/interruptions in degraded signals
         – network-layer model: E-model (ITU-T G.107): regression model based on delay, loss rate, and 20+ variables; the equations are over-complex for physical interpretation, e.g.
           I_s = 20 · ( { 1 + (X_olr / 8)^8 }^(1/8) − X_olr / 8 ),  where  X_olr = OLR + 0.2 · (64 + No − RLR)
    27. 27. Our goals
       • An objective QoE assessment framework that is
         – based on passive measurement (thus scalable)
         – easy to construct models with (for your own application)
         – easy to access input parameters for
         – easy to compute in real time
    28. 28. Our contributions
       • An index for Skype user satisfaction
         – derived from real-life Skype call sessions
         – verified by users’ speech interactivity in calls
         – accessible and computable in real time
       • USI = 2.15 × log(bit rate) − 1.55 × log(jitter) − 0.36 × RTT
         – bit rate: data rate of voice packets
         – jitter: receiving-rate jitter (level of network congestion)
         – RTT: round-trip time between the two parties
    29. 29. Talk outline
       • The Question
       • Measurement
       • Modeling
       • Validation
       • Significance
    30. 30. Setting things up
    31. 31. Capturing Skype traffic
       • 1. Identify Skype hosts and ports
         – Track hosts sending HTTP to “ui.skype.com”
         – Track their ports sending UDP within 10 seconds → (host, port)
         – Also track the other parties that communicate with the discovered host-port pairs
       • 2. Record packets whose source or destination matches these (host, port) pairs
       • This reduces the number of traced packets to 1–2%
    32. 32. Extracting Skype calls
       • 1. Keep the sessions with
         – average packet rate within (10, 100) pkt/sec
         – average packet size within (30, 300) bytes
         – duration longer than 10 seconds
       • 2. Merge two sessions into one relayed session if
         – the two sessions share a common relay node,
         – their start and finish times are within 30 seconds of each other, and
         – their packet rate series are correlated
       • (A minimal sketch of the filtering heuristic follows below.)
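For concreteness, here is a minimal Python sketch of the session-filtering step above; the flow-record field names (avg_pkt_rate, avg_pkt_size, duration) are illustrative placeholders, not taken from the paper's tooling.

```python
# Sketch of the call-extraction heuristic above; field names are illustrative.
def is_candidate_call(flow):
    """Keep sessions that look like Skype voice flows."""
    return (10 < flow["avg_pkt_rate"] < 100        # packets per second
            and 30 < flow["avg_pkt_size"] < 300    # bytes
            and flow["duration"] > 10)             # seconds

flows = [
    {"avg_pkt_rate": 33, "avg_pkt_size": 120, "duration": 540},    # voice-like
    {"avg_pkt_rate": 900, "avg_pkt_size": 1400, "duration": 300},  # bulk transfer
]
print([is_candidate_call(f) for f in flows])  # -> [True, False]
```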
    33. 33. Probing RTTs
       • As we take traces, send ICMP ping, application-level ping, and traceroute probes at exponentially distributed intervals
    34. 34. Trace Summary
         Category | Calls | Hosts | Avg. Time
         Direct   | 253   | 240   | 29 min
         Relayed  | 209   | 369   | 18 min
         Total    | 462   | 570   | 24 min
    35. 35. Talk outline
       • The Question
       • Measurement
       • Modeling
       • Validation
       • Significance
    36. 36. The intuition behind our analysis
       • The conversation quality (i.e., QoE) perceived by the call parties is more or less related to the call duration
       • The network conditions of a VoIP call are independent of
         – the importance of the talk content
         – the call parties’ schedules
         – the call parties’ talkativeness
         – other incentives to talk (e.g., free of charge)
    37. 37. First, getting a better sense (Diagram: are the QoS factors — jitter, RTT, source rate, TCP/UDP, relayed or not — correlated with call duration, the QoE proxy?)
    38. 38. Is call duration related to each factor?
       • For each factor: draw a scatter plot of the factor against call duration and see whether they are positively, negatively, or not correlated
       • Hypothesis tests: confirm whether they are indeed positively, negatively, or not correlated
    39. 39. Call duration vs. jitter
       • There are short calls with low jitter
       • The average shows a negative correlation between the two variables
       • (Plot: average call duration in minutes vs. jitter in Kbps — the standard deviation of received bytes/sec — with the average and its 95% confidence band)
    40. 40. Effect of Jitter – Hypothesis Testing
       • Null hypothesis: all the survival curves (the probability distributions of hanging up a call) are equivalent
       • Log-rank test: P < 1e-20 → we have > 99.999% confidence claiming that jitter is correlated with call duration
    41. 41. Effect of Source Rate (the bandwidth Skype intended to use) — (Plot: average session time in minutes vs. source rate)
    42. 42. The better sense (Diagram summarizing the correlations with call duration: positive for source rate; negative for jitter and RTT; negative or non-significant for the remaining factors, TCP/UDP and whether the call is relayed)
    43. 43. Linear regression? No! Reasons:
       • The assumptions no longer hold: errors are not independent and not normally distributed, and the variance of the errors is not constant
       • Censorship: some calls had already been going on when tracing started, and some had not yet finished when we terminated tracing; we can’t simply discard these calls, otherwise we end up with a biased set of calls with limited durations
    44. 44. Cox regression modeling
       • The Cox regression model provides a good fit (originally used for the effect of treatment on patients’ survival time)
       • The log-hazard function is proportional to the weighted sum of factors
       • Hazard function (conditional failure rate): the instantaneous rate at which failures occur for observations that have survived to time t
         h(t) = lim_{Δt→0} Pr[t ≤ T < t + Δt | T ≥ t] / Δt
       • log h(t|Z) ∝ βᵀZ, where Z are the factors (bit rate = x, jitter = y, RTT = z, …) and β are the weights of the factors
       • (A fitting sketch follows below.)
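As an illustration of this modeling step, the sketch below fits a Cox proportional-hazards model with the lifelines Python package on synthetic call data; the data-generation process and variable names are invented for the example and are not the paper's trace or code.

```python
# Minimal Cox proportional-hazards sketch (synthetic data, lifelines package).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500
log_bitrate = np.log(rng.uniform(10, 60, n))   # bit rate in Kbps
log_jitter = np.log(rng.uniform(0.5, 5, n))    # jitter in Kbps
rtt = rng.uniform(0.05, 0.5, n)                # RTT in seconds

# Shorter calls (higher hazard) for low bit rate, high jitter, high RTT.
hazard = 20 * np.exp(-2.15 * log_bitrate + 1.55 * log_jitter + 0.36 * rtt)
duration = rng.exponential(1.0 / hazard)
observed = (duration < 60).astype(int)         # calls still alive at 60 min are censored
duration = np.minimum(duration, 60)

df = pd.DataFrame({"duration": duration, "observed": observed,
                   "log_bitrate": log_bitrate, "log_jitter": log_jitter, "rtt": rtt})
cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="observed")
cph.print_summary()  # coefficients should come out near (-2.15, 1.55, 0.36)
```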
    45. 45. Functional Form Checks
       • The proportional-hazards assumption h(t|Z) ∝ exp(βᵀZ) must be verified
       • Explore the “true” functional forms of the factors with generalized additive models
       • Bit rate and jitter → log scale
       • Human beings are known to be sensitive to the scale of a physical quantity rather than its magnitude:
         – scale of sound (decibels vs. intensity)
         – musical staff for notes (distance vs. frequency)
         – star magnitudes (magnitude vs. brightness)
    46. 46. The Logarithm Fits Better (Bit rate) After taking logarithm …
    47. 47. The Logarithm Fits Better (Jitter) After taking logarithm …
    48. 48. Final model & interpretation
         variable      | coef  | std. err. | signif.
         log(bit rate) | -2.15 | 0.13      | < 1e-20
         log(jitter)   | 1.55  | 0.09      | < 1e-20
         RTT           | 0.36  | 0.18      | 4.3e-02
       • Interpretation: let A have bit rate = 20 Kbps and B have bit rate = 15 Kbps with all other factors the same. The hazard ratio between B and A is exp((log(15) − log(20)) × −2.15) ≈ 1.86, i.e., at any instant the probability that B hangs up is 1.86 times the probability that A does. (Worked out in the snippet below.)
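The hazard-ratio arithmetic in the interpretation can be reproduced in a couple of lines:

```python
# Hazard ratio for B (15 Kbps) vs. A (20 Kbps), coefficient of log(bit rate) = -2.15.
import math
hazard_ratio = math.exp((math.log(15) - math.log(20)) * -2.15)
print(round(hazard_ratio, 2))  # -> 1.86
```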
    49. 49. Hang-up rate and USI
       • Hang-up rate (hazard) ∝ exp(1.55 × log(jitter) + 0.36 × RTT − 2.15 × log(bit rate))
       • User satisfaction index (USI) = 2.15 × log(bit rate) − 1.55 × log(jitter) − 0.36 × RTT, i.e., the negated log hang-up rate
    50. 50. Actual and Predicted Time vs. USI (Plot: average session time in minutes against USI, actual vs. predicted)
    51. 51. The multi-path scenario — BUT, is call hang-up rate a good indication of user satisfaction?
         path | avail bandwidth | RTT    | jitter | USI
         1    | 10 Kbps         | 100 ms | 2 Kbps | 3.84
         2    | 20 Kbps         | 300 ms | 1 Kbps | 6.33
         3    | 30 Kbps         | 500 ms | 3 Kbps | 5.43
       (These USI values are reproduced from the fitted formula in the snippet below.)
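As a quick check, the USI column of the table follows directly from the fitted formula (bit rate and jitter in Kbps, RTT in seconds, matching the slide):

```python
# Recompute the USI for the three candidate paths in the table above.
import math

def usi(bitrate_kbps, jitter_kbps, rtt_sec):
    return 2.15 * math.log(bitrate_kbps) - 1.55 * math.log(jitter_kbps) - 0.36 * rtt_sec

for bitrate, jitter, rtt in [(10, 2, 0.1), (20, 1, 0.3), (30, 3, 0.5)]:
    print(round(usi(bitrate, jitter, rtt), 2))  # -> 3.84, 6.33, 5.43 (path 2 wins)
```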
    52. 52. Talk outline
       • The Question
       • Measurement
       • Modeling
       • Validation
       • Significance
    53. 53. User satisfaction: Validation — the intuition that call duration reflects satisfaction has not been confirmed yet
    54. 54. User satisfaction: One step further — now we check speech interactivity against call duration; the intuition is that a cheerful conversation shows interactive and tight speech activity
    55. 55. Identifying talk bursts
       • The problem: every voice packet is encrypted with 256-bit AES (Advanced Encryption Standard)
       • Possible solutions:
         – packet rate: Skype applies no silence suppression
         – packet size: our choice
    56. 56. What we need to achieve
       • Input: a time series of packet sizes
       • Output: estimated ON/OFF periods (ON = talk / OFF = silence)
    57. 57. Speech activity detection
       • Wavelet de-noising: remove high-frequency fluctuations
       • Detect peaks and dips
       • Dynamic thresholding: decide the beginning/end of each talk burst
       • (A rough sketch of this pipeline follows below.)
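A rough sketch of such a pipeline is shown below; it uses the PyWavelets and NumPy packages and a simple midpoint threshold in place of the paper's peak/dip detection and dynamic thresholding, so it only illustrates the general idea.

```python
# Rough talk-burst detector: wavelet de-noising + a simple threshold (illustrative only).
import numpy as np
import pywt

def detect_talk_bursts(pkt_sizes, wavelet="db4", level=4):
    # 1. Wavelet de-noising: drop all detail coefficients, keep the coarse approximation.
    coeffs = pywt.wavedec(pkt_sizes, wavelet, level=level)
    coeffs[1:] = [np.zeros_like(c) for c in coeffs[1:]]
    smoothed = pywt.waverec(coeffs, wavelet)[: len(pkt_sizes)]
    # 2. Threshold halfway between the silence and talk packet-size levels.
    threshold = 0.5 * (smoothed.min() + smoothed.max())
    return smoothed > threshold  # boolean ON/OFF series

# Toy input: larger packets during a talk burst, smaller ones during silence.
rng = np.random.default_rng(1)
sizes = np.concatenate([rng.normal(60, 5, 100),    # silence
                        rng.normal(120, 5, 100),   # talk burst
                        rng.normal(60, 5, 100)])   # silence
on = detect_talk_bursts(sizes)
print(on[:100].mean(), on[100:200].mean())  # mostly 0.0, then mostly 1.0
```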
    58. 58. Speech detection algorithm: Validation
       • The speech detection algorithm is validated with synthesized sine waves (500 Hz – 2000 Hz) and real speech recordings
       • Setup: play the sound over a Skype call through a relay node (chosen by Skype; average RTT 350 ms, jitter 5.1 Kbps) and capture the packet size process, so that the packet size process is contaminated by serious network impairment (delay and loss)
    59. 59. Validation with synthesized sine waves
       • 3 runs for each of 10 test cases
       • correctness (ratio of matched 0.1-second periods): 0.73 – 0.92
       • (Plot: true ON periods vs. estimated ON periods)
    60. 60. Validation with speech recordings
       • 3 runs for each of 3 test cases
       • correctness (ratio of matched 0.1-second periods): 0.71 – 0.85
       • (Plot: true ON periods vs. estimated ON periods)
    61. 61. Speech interactivity analysis
       • Responsiveness: whether the other party responds
       • Avg. response delay: how long before the other party responds
       • Avg. burst length: how long a speech burst lasts
       • (An illustrative computation follows below.)
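The sketch below illustrates one way these three metrics could be computed from the estimated talk bursts of two parties; the (start, end) tuples and the 5-second response window are hypothetical choices for the example, not the paper's exact definitions.

```python
# Illustrative interactivity metrics for party B responding to party A.
def interactivity(bursts_a, bursts_b, window=5.0):
    """bursts_*: lists of (start, end) talk bursts in seconds."""
    delays = []
    for start, end in bursts_a:
        # First burst of B starting within `window` seconds after A's burst ends.
        replies = [s for s, _ in bursts_b if end <= s <= end + window]
        if replies:
            delays.append(min(replies) - end)
    responsiveness = len(delays) / len(bursts_a)
    avg_response_delay = sum(delays) / len(delays) if delays else None
    avg_burst_length = sum(e - s for s, e in bursts_a) / len(bursts_a)
    return responsiveness, avg_response_delay, avg_burst_length

a = [(0.0, 2.0), (10.0, 11.5), (20.0, 24.0)]
b = [(2.5, 4.0), (12.0, 13.0)]
print(interactivity(a, b))  # -> (0.666..., 0.5, 2.5)
```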
    62. 62. USI vs. Speech interactivity
       • Higher USI → higher responsiveness, shorter response delay, and shorter burst length
       • All correlations are statistically significant (at the 0.01 significance level)
       • The speech interactivity in conversations supports the proposed USI
    63. 63. Talk outline
       • The Question
       • Measurement
       • Modeling
       • Validation
       • Significance
    64. 64. Implications
       • We should pay more attention to delay jitter (rather than focusing on network delay only)
       • … and to the encoding bit rate!
    65. 65. Significance
       • QoE-aware systems that can optimize user experience at run time
       • Is it worth sacrificing 20 ms of latency to reduce jitter by 10 ms (say, with a de-jitter buffer)?
       • Pick the most appropriate parameters at run time:
         – playout scheduling (buffer time)
         – coding scheme (& rate)
         – source rate
         – data path (overlay routing)
         – transmission scheme (redundancy, erasure coding, …)
    66. 66. Future work (1)
       • Measurement: larger data sets (p2p traffic is hard to collect), more diverse locations
       • Validation: user studies, comparison with existing models (PESQ, etc.)
    67. 67. Future work (2)
       • Beyond “call duration”: Who hangs up a call? Call disconnect-and-reconnect behavior
       • More sophisticated modeling: voice codec, pricing effect, time-of-day effect, time-dependent impact
    68. 68. How Aware Are Gamers of Service Quality?
       • Real-time interactive online games are generally considered QoS-sensitive
       • Gamers are always complaining about high “ping times” or network lag
       • Yet online gaming is increasingly popular despite the best-effort Internet
       • Q1: Are game players really as sensitive to network quality as they claim?
       • Q2: If so, how do they react to poor network quality? (Appeared in IEEE INFOCOM 2006)
    69. 69. Case Study: ShenZhou Online
    70. 70. Traffic Trace Collection
         trace | conn. # | packets (in/out/both) | bytes (in/out/both)
         N1    | 57,945  | 342M / 353M / 695M    | 4.7TB / 27.3TB / 32.0TB
         N2    | 54,424  | 325M / 336M / 661M    | 4.7TB / 21.7TB / 26.5TB
    71. 71. Delay Jitter vs. Session Time (std. dev. of the round-trip times)
    72. 72. Hypothesis Testing – Effect of Loss Rate
       • Null hypothesis: all the survival curves (the CCDFs of game session times for low, medium, and high loss) are equivalent
       • Log-rank test: P < 1e-20 → we have > 99.999% confidence claiming that loss rates are correlated with game playing times
    73. 73. Regression Modeling
       • Linear regression is not adequate: it violates the assumptions (normal errors, equal variance, …)
       • The Cox regression model provides a good fit: the log-hazard function is proportional to the weighted sum of factors
       • Hazard function (conditional failure rate): the instantaneous rate at which a player quits the game, given the session has survived to time t
         h(t) = lim_{Δt→0} Pr[t ≤ T < t + Δt | T ≥ t] / Δt
       • log h(t|Z) ∝ βᵀZ, where each session has factors Z (RTT = x, jitter = y, …) and our aim is to estimate β
    74. 74. Final Model & Interpretation
         Variable    | Coef | Std. Err. | Signif.
         log(RTT)    | 1.27 | 0.04      | < 1e-20
         log(jitter) | 0.68 | 0.03      | < 1e-20
         log(closs)  | 0.12 | 0.01      | < 1e-20
         log(sloss)  | 0.09 | 0.01      | 7e-13
       • Interpretation: let A have RTT = 200 ms and B have RTT = 100 ms with all other factors the same. The hazard ratio between A and B is exp((log(0.2) − log(0.1)) × 1.27) ≈ 2.4, i.e., at any moment A is 2.4 times as likely as B to leave the game.
    75. 75. How good does the model fit?
    76. 76. Relative Influence of QoS Factors (Pie chart: delay jitter 45%, client packet loss 20%, latency 20%, server packet loss 15%)
    77. 77. An Index for ShenZhou Online
       • log(departure rate) ∝ 1.27 × log(RTT) + 0.68 × log(jitter) + 0.12 × log(closs) + 0.09 × log(sloss)
         – RTT: round-trip time; jitter: level of network congestion; closs: loss rate of client packets; sloss: loss rate of server packets
       • Features: derived from real-life game sessions; accessible and computable in real time
       • Implication: delay jitter is less tolerable than delay
       • (A small helper that computes this index follows below.)
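A small helper implementing this index, used here to compare two hypothetical network conditions; the units (RTT and jitter in seconds, loss rates as fractions) are assumed so that the result matches the hazard-ratio example on the earlier "Final Model & Interpretation" slide.

```python
# Departure-rate index for ShenZhou Online (fitted coefficients from the slide).
import math

def log_departure_rate(rtt, jitter, closs, sloss):
    return (1.27 * math.log(rtt) + 0.68 * math.log(jitter)
            + 0.12 * math.log(closs) + 0.09 * math.log(sloss))

cond_a = dict(rtt=0.20, jitter=0.05, closs=0.01, sloss=0.01)  # 200 ms RTT
cond_b = dict(rtt=0.10, jitter=0.05, closs=0.01, sloss=0.01)  # 100 ms RTT, rest equal
ratio = math.exp(log_departure_rate(**cond_a) - log_departure_rate(**cond_b))
print(round(ratio, 1))  # -> 2.4: players under A leave ~2.4x as readily as under B
```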
    78. 78. App #1: Evaluation of Alternative Designs
       • Suppose we have two designs (e.g., protocols)
       • One leads to lower delay but higher jitter: 100 ms, 120 ms, 100 ms, 120 ms, 100 ms, 120 ms, …
       • One leads to higher delay but lower jitter: 150 ms, 150 ms, 150 ms, 150 ms, 150 ms, 150 ms, …
       • Which design shall we choose?
    79. 79. App #2: Overlay Path Selection
         path | delay      | loss rate | jitter    | score
         1    | 100 ms (G) | 5% (P)    | 50 ms (P) | 3.84
         2    | 150 ms (A) | 1% (A)    | 20 ms (G) | 6.33
         3    | 200 ms (P) | 1% (A)    | 30 ms (A) | 5.43
    80. 80. Other Applications
       • Deciding the smoothing buffer: is it worth sacrificing 20 ms of latency to reduce jitter by 10 ms?
       • Maintaining fairness: allocate more resources to players experiencing poor QoS
    81. 81. Player Departure Behavior Analysis
       • The player departure rate decreases over time
       • The golden time is the first 10 minutes: the longer gamers play, the more external factors affect their decision to stay or leave
         – e.g., allocate more resources to players who have just entered
    82. 82. To be continued … Sheng-Wei (Kuan-Ta) Chen http://www.iis.sinica.edu.tw/~swc
    83. 83. OneClick – A Framework for Measuring Network Quality of Experience. Kuan-Ta Chen, Cheng-Chu Tu, Wei-Cheng Xiao, Institute of Information Science, Academia Sinica. Appeared in IEEE INFOCOM 2009.
    84. 84. QoS and QoE
       • QoS (Quality of Service): the quality level of a system performance metric
         – Communication networks: delay, loss rate
         – DBMS: query completion time
       • QoE (Quality of Experience): the quality of how users “feel” about a service
         – Subjective: Mean Opinion Score (MOS)
         – Objective: PSNR (picture), PESQ (voice), VQM (video)
    85. 85. Relationship between QoS and QoE (Plot: QoE vs. QoS, e.g., network bandwidth — below a point the quality is too bad to perceive differences, and above the comfort range the marginal benefit is small)
    86. 86. Knowing the Relationship is Important!
       • So we know how to adapt the voice/video/game data rate (QoS) for user satisfaction (QoE)
       • So we really know how to send multimedia data over the Internet
    87. 87. Measuring QoS and QoE
       • QoS (a great body of work): measure network loss, delay, and available bandwidth; infer topology; estimate network capacity; etc.
       • QoE (some work)
         – Objective: PSNR (picture), PESQ (voice), VQM (video)
         – Subjective: MOS (general)
         – Still not quite the human experience, which is multi-dimensional — that is what’s left!
    88. 88. MOS (Mean Opinion Score) – Problems
       • Slow in scoring (think/interpretation time)
       • People are limited by finite memory
       • Cannot capture users’ perceptions over time
       • MOS is coarse in scale granularity
       • Dissimilar interpretations of the scale among users
    89. 89. Our Ambition Identify a simple and yet efficient way to measure users’ satisfaction
    90. 90. The Idea: Click, Click, Click
       • Web surfing: click on a link; you wait, and you refresh the link; you wait, and you refresh the link again, and again, and …
       • Knocking at someone’s door: knock on the door; you wait, and you knock on the door again; you wait, and you knock on the door again and again, and …
    91. 91. Introducing OneClick
       • Simple instruction to users:
         – Click when you feel dissatisfied
         – Click multiple times when you feel even less satisfied
       • The clicking rate serves as the QoE
    92. 92. Nice Things about OneClick
       • Natural: we already click this way to show loss of patience all the time
       • Bad-memory proof: real-time decisions, no need to “remember” past experience
       • Time-aware: captures users’ responses at the time of the problems; useful for studying recency, memory access, and habituation effects
    93. 93. Easy to Implement
       • As a plug-in to your network applications (Flash version done!)
       • Co-measurement of QoS and QoE
    94. 94. Talk Progress
       • Overview
       • Methodology
       • Pilot Study
       • Validation
       • Case Studies
       • Conclusion
    95. 95. Human as a QoE Rating System (Diagram: the network setting — which we vary — affects the application QoS; the application QoE reflects the QoS; the user’s click events — which we observe — reflect the QoE)
    96. 96. QoE–QoS Modeling
       • Model the click events as a counting process and fit a Poisson regression:
         – C(t): QoE — the clicking rate at time t
         – N1(t), N2(t), …: QoS — the network conditions at time t
         – αi: regression coefficients, derived by the maximum likelihood method
       • (A minimal fitting sketch follows below.)
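A minimal Poisson-regression sketch using the statsmodels package; the covariates, their coefficients, and the synthetic click data are invented for illustration and are not the paper's experiment.

```python
# Fit click counts C(t) against network conditions N1(t), N2(t) with Poisson regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
T = 300                                 # one observation per second
loss = rng.uniform(0.0, 0.3, T)         # N1(t): loss rate
delay = rng.uniform(0.05, 0.5, T)       # N2(t): one-way delay in seconds
clicks = rng.poisson(np.exp(-1.0 + 4.0 * loss + 2.0 * delay))  # synthetic C(t)

X = sm.add_constant(np.column_stack([loss, delay]))
fit = sm.GLM(clicks, X, family=sm.families.Poisson()).fit()
print(fit.params)  # estimated alpha_0, alpha_1, alpha_2
```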
    97. 97. Wait a Minute…
       • Response delays? Users may not be able to click immediately after they become aware of the degraded quality
       • Is a user’s clicking rate consistent? Does a subject give similar ratings in repeated experiments?
       • Is the clicking rate consistent across users? Different subjects may have different preferences in their click decisions
    98. 98. Pilot Study
       • A 5-minute English song
       • Audio quality of AIM Messenger under various network settings
    99. 99. Test Material Compilation
       • For each network setting: play the song and record it → K settings yield K recordings
       • A test material = random non-overlapping segments from the K different recordings
    100. 100. Response Delays
       • Try a Poisson regression of C(t+x) on N1(t), N2(t), …, varying x
       • Show the goodness of fit for each x
    101. 101. 1-2 Seconds Delay Response delay calibration needed!
    102. 102. Our Solution
       • Shift the click event process by time d
       • d is decided by model fitting: let d be the x for which the goodness of fit is best, i.e., the residual deviance is minimal
       • (A sketch of this calibration follows below.)
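Continuing the sketch above, response-delay calibration can be done by refitting the Poisson regression for several candidate shifts d and keeping the one with the smallest residual deviance; the helper and its toy data below are illustrative only.

```python
# Pick the click-process shift d (in samples) that minimizes the residual deviance.
import numpy as np
import statsmodels.api as sm

def best_shift(clicks, covariates, max_shift=5):
    deviances = {}
    for d in range(max_shift + 1):
        y = clicks[d:]                                # align C(t + d) with N(t)
        X = sm.add_constant(covariates[: len(y)])
        fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
        deviances[d] = fit.deviance
    return min(deviances, key=deviances.get)

rng = np.random.default_rng(3)
cov = rng.uniform(0.0, 0.3, 500).reshape(-1, 1)                # one QoS covariate
clk = rng.poisson(np.exp(-1.0 + 6.0 * np.roll(cov[:, 0], 2)))  # clicks lag QoS by 2 samples
print(best_shift(clk, cov))  # -> 2 in most runs
```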
    103. 103. Consistency of C(t+d) from Same User
    104. 104. Consistency of C(t+d) from Different Users Cross-user normalization needed!
    105. 105. Calibration and Normalization (Diagram: raw OneClick measurements from User #1 and User #2 go through response delay calibration, then regression modeling with cross-user normalization)
    106. 106. Talk Progress
       • Overview
       • Methodology
       • Pilot Study
       • Validation
       • Case Studies
       • Conclusion
    107. 107. Rationale [Validation]
       • Direct: ask people to do both OneClick and MOS and compare the click rate against MOS — but obtaining reliable subjective ratings is the exact problem we are trying to solve
       • Indirect: ask people to do OneClick and compare the click rate against PESQ/VQM instead
    108. 108. PESQ-based Validation [Validation]
       • PESQ: Perceptual Evaluation of Speech Quality
       • OneClick vs. PESQ to evaluate the audio quality of three VoIP applications: AIM, MSN Messenger, Skype
       • Network factors: loss rate (0% – 30%), bandwidth (10 Kbps – 100 Kbps)
    109. 109. Qualitative Comparison [Validation] (Plots: click rate vs. PESQ under varying network loss rate and bandwidth)
    110. 110. VQM-based Validation [Validation]
       • VQM: Video Quality Measurement
       • OneClick vs. VQM to evaluate the video quality of two video codecs: H.264 and WMV9 (Windows Media Video)
       • Factor: compression bit rate (200 Kbps – 1000 Kbps)
    111. 111. Qualitative Comparison [Validation]
    112. 112. Talk Progress
       • Overview
       • Methodology
       • Pilot Study
       • Validation
       • Case Studies
       • Conclusion
    113. 113. Case Studies
       • Evaluation of applications’ QoE
         – VoIP applications: AIM, MSN Messenger, Skype
         – First-person shooter games: Halo, Unreal Tournament
    114. 114. Varying Bandwidth [Case Study]
       • MSN Messenger is generally the worst
       • Skype is the best if bandwidth < 80 Kbps; otherwise AIM is the best
    115. 115. Contour Lines of Click Rates [Case Study]
       • The slope of a contour line reveals an application’s sensitivity to loss vs. bandwidth shortage
       • AIM is relatively more sensitive to network losses
    116. 116. Comfort Region [Case Study]
       • Comfort region: the set of network configurations that leads to satisfactory QoE
       • Skype is the best in bandwidth-restricted scenarios (< 60 Kbps) when the loss rate is < 10%
    117. 117. Talk Progress
       • Overview
       • Methodology
       • Pilot Study
       • Validation
       • Case Studies
       • Conclusion
    118. 118. Nice about OneClick
       • Natural & fast: we already click this way to show loss of patience all the time
       • Bad-memory proof: no need to “remember” past experience
       • Time-aware: captures users’ responses at the time of the problems
       • Fine-grained: the score can be 0.2, 3.5, or even 12.345
       • Normalized user interpretation: different interpretations are normalized
       • Easy to implement: http://mmnet.iis.sinica.edu.tw/proj/oneclick/
    119. 119. OneClick Online
    120. 120. On-Going Work
       • Large-scale experiments (by crowdsourcing): http://mmnet.iis.sinica.edu.tw/proj/oneclick/
       • Click rate vs. MOS
       • QoE-centric multimedia networking — e.g., “Tuning the Redundancy Control Algorithm of Skype for User Satisfaction,” IEEE INFOCOM 2009
    121. 121. To be continued … Kuan-Ta Chen http://www.iis.sinica.edu.tw/~swc
    122. 122. A Crowdsourceable QoE Evaluation Framework for Multimedia Content. Kuan-Ta Chen (Academia Sinica); Chen-Chi Wu, Yu-Chun Chang, and Chin-Laung Lei (National Taiwan University). Appeared in ACM Multimedia 2009.
    123. 123. What is QoE? Quality of Experience = users’ satisfaction with a service (e.g., multimedia content)
    124. 124. Quality of Experience Poor (underexposed) Good (exposure OK)
    125. 125. Challenges
       • How to quantify the QoE of multimedia content efficiently and reliably?
    126. 126. Mean Opinion Score (MOS)
       • Idea: Single Stimulus Method (SSM) + Absolute Category Rating (ACR) — vote Excellent / Good / Fair / Poor / Bad
         MOS | Quality   | Impairment
         5   | Excellent | Imperceptible
         4   | Good      | Perceptible but not annoying
         3   | Fair      | Slightly annoying
         2   | Poor      | Annoying
         1   | Bad       | Very annoying
    127. 127. Drawbacks of MOS-based Evaluations (our framework solves all of these)
       • ACR-based
         – The concepts behind the scale cannot be concretely defined
         – Dissimilar interpretations of the scale among users
         – Only an ordinal scale, not an interval scale
         – Difficult to verify users’ scores
       • Subjective experiments in the laboratory
         – Monetary cost (rewards, transportation)
         – Labor cost (supervision)
         – Physical space/time/hardware constraints
    128. 128. Drawbacks of MOS-based Evaluations (cont) — the ACR-related drawbacks are addressed by paired comparison, and the laboratory-related drawbacks by crowdsourcing
    129. 129. Contribution
    130. 130. Talk Progress
       • Overview
       • Methodology: Paired Comparison, Crowdsourcing Support, Experiment Design
       • Case Study & Evaluation: Acoustic QoE, Optical QoE
       • Conclusion
    131. 131. Current Approach: MOS Rating — Excellent? Good? Fair? Poor? Bad? Vote: ?
    132. 132. Our Proposal: Paired Comparison — Which one is better, A or B? Vote: B
    133. 133. Properties of Paired Comparison
       • Generalizable across different content types and applications
       • Simple comparative judgment: a dichotomous decision is easier than a 5-category rating
       • Interval-scale QoE scores can be inferred
       • The users’ inputs can be verified
    134. 134. Choice Frequency Matrix — 10 experiments, each containing C(4,2) = 6 paired comparisons (each cell counts how often the row item was preferred over the column item)
             A   B   C   D
         A   0   4   2   1
         B   6   0   3   0
         C   8   7   0   1
         D   9  10   9   0
    135. 135. Inference of QoE Scores
       • Bradley-Terry-Luce (BTL) model
         – input: the choice frequency matrix
         – output: an interval-scale score for each content item (based on maximum likelihood estimation)
       • For n content items T1, …, Tn, Pij is the probability of choosing Ti over Tj, and u(Ti) is the estimated QoE score of quality level Ti
       • Basic idea: P12 = P23 → u(T1) − u(T2) = u(T2) − u(T3)
       • (A generic fitting sketch follows below.)
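The sketch below fits the BTL model to the choice-frequency matrix shown on the previous slide using Hunter's minorization-maximization updates; it is a generic estimator written for illustration (assuming M[i][j] counts how often item i was preferred over item j), not necessarily the code used in the paper.

```python
# Generic Bradley-Terry(-Luce) fit via MM iterations on a choice-frequency matrix.
import numpy as np

M = np.array([[0, 4, 2, 1],
              [6, 0, 3, 0],
              [8, 7, 0, 1],
              [9, 10, 9, 0]], dtype=float)  # rows/cols: items A, B, C, D

def btl_scores(M, iters=200):
    n = M.shape[0]
    N = M + M.T                     # total comparisons per pair
    wins = M.sum(axis=1)
    g = np.ones(n)                  # strength parameters gamma_i
    for _ in range(iters):
        for i in range(n):
            denom = sum(N[i, j] / (g[i] + g[j]) for j in range(n) if j != i)
            g[i] = wins[i] / denom
        g /= g.sum()                # fix the arbitrary scale
    return np.log(g)                # interval-scale scores u(T_i) = log gamma_i

scores = btl_scores(M)
print(scores.argsort())  # ascending order; D (index 3) comes out on top
```

The fitted scores are only defined up to an affine transformation, so they are typically shifted and rescaled (e.g., to [0, 1]) before being reported, as on the next slide.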
    136. 136. Inferred QoE Scores (normalized to [0, 1]): 0, 0.63, 0.91, 1
    137. 137. Talk Progress
       • Overview
       • Methodology: Paired Comparison, Crowdsourcing Support, Experiment Design
       • Case Study & Evaluation: Acoustic QoE, Optical QoE
       • Conclusion
    138. 138. Crowdsourcing = Crowd + Outsourcing: “soliciting solutions via open calls to large-scale communities”
    139. 139. Image Understanding — main theme? key objects? unique attributes? Reward: 0.04 USD
    140. 140. Linguistic Annotations — word similarity (Snow et al. 2008): USD 0.2 for labeling 30 word pairs
    141. 141. More Examples
       • Document relevance evaluation — Alonso et al. (2008)
       • Document rating collection — Kittur et al. (2008)
       • Noun compound paraphrasing — Nakov (2008)
       • Person name resolution — Su et al. (2007)
    142. 142. The Risk
       • Not every Internet user is trustworthy!
       • Users may give erroneous feedback perfunctorily, carelessly, or dishonestly
       • Dishonest users have more incentive to perform tasks
       • We need an ONLINE algorithm to detect problematic inputs!
    143. 143. Verification of Users’ Inputs (1)
       • Transitivity property: if A > B and B > C, then A should be > C
       • Transitivity Satisfaction Rate (TSR): the fraction of checkable judgment triples that satisfy transitivity
    144. 144. Verification of Users’ Inputs (2)
       • Detect inconsistent judgments from problematic users:
         – TSR = 1 → perfect consistency
         – TSR >= 0.8 → generally consistent
         – TSR < 0.8 → judgments are inconsistent
       • TSR-based reward / punishment (e.g., only pay a reward if TSR > 0.8)
       • (A possible TSR computation is sketched below.)
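One straightforward way to compute a TSR-style consistency score from a single experiment's judgments is sketched below; the stimulus labels and choices are hypothetical, and the paper's exact TSR definition may differ in detail.

```python
# Sketch: fraction of checkable ordered triples (a > b, b > c) that also satisfy a > c.
from itertools import permutations

def tsr(prefers):
    """prefers: set of (winner, loser) pairs from one experiment."""
    checked = satisfied = 0
    items = {x for pair in prefers for x in pair}
    for a, b, c in permutations(items, 3):
        if (a, b) in prefers and (b, c) in prefers:
            checked += 1
            satisfied += (a, c) in prefers
    return satisfied / checked if checked else 1.0

# A perfectly consistent set of choices over four stimuli (D > C > B > A).
choices = {("D", "C"), ("D", "B"), ("D", "A"), ("C", "B"), ("C", "A"), ("B", "A")}
print(tsr(choices))  # -> 1.0, so this participant would be rewarded

# Flip one judgment (A preferred over D) and the TSR drops well below the 0.8 threshold.
inconsistent = (choices - {("D", "A")}) | {("A", "D")}
print(tsr(inconsistent))  # -> 0.25
```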
    145. 145. Experiment Design
       • For n algorithms (e.g., speech encodings):
         – pick a source content item as the evaluation target
         – apply the n algorithms to generate n versions with different quality
         – ask a user to perform the paired comparisons
         – compute the TSR after an experiment
       • Reward a user ONLY if his inputs are self-consistent (i.e., the TSR is higher than a certain threshold)
    146. 146. Concept Flow in Each Round
    147. 147. Audio QoE Evaluation — Which one is better? (The listener toggles between the two clips by pressing and releasing the SPACE key.)
    148. 148. Video QoE evaluation — Which one is better? (The viewer toggles between the two clips by pressing and releasing the SPACE key.)
    149. 149. Talk Progress
       • Overview
       • Methodology: Paired Comparison, Crowdsourcing Support, Experiment Design
       • Case Study & Evaluation: Acoustic QoE, Optical QoE
       • Conclusion
    150. 150. Audio QoE Evaluation
       • MP3 compression level
         – Source clips: one fast-paced and one slow-paced song
         – MP3 CBR format at 6 bit rates: 32, 48, 64, 80, 96, and 128 Kbps
         – 127 participants and 3,660 paired comparisons
       • Effect of packet loss rate on VoIP
         – Two speech codecs: G.722.1 and G.728
         – Packet loss rates: 0%, 4%, and 8%
         – 62 participants and 1,545 paired comparisons
    151. 151. Inferred QoE Scores (Plots: MP3 compression level and VoIP packet loss rate experiments)
    152. 152. Video QoE Evaluation
       • Video codec
         – Source clips: one fast-paced and one slow-paced video clip
         – Three codecs: H.264, WMV3, and XVID
         – Two bit rates: 400 and 800 Kbps
         – 121 participants and 3,345 paired comparisons
       • Loss concealment scheme
         – Source clips: one fast-paced and one slow-paced video clip
         – Two schemes: frame copy (FC) and FC with frame skip (FCFS)
         – Packet loss rates: 1%, 5%, and 8%
         – 91 participants and 2,745 paired comparisons
    153. 153. Inferred QoE Scores (Plots: video codec and loss concealment scheme experiments)
    154. 154. Participant Source
       • Laboratory: recruit part-time workers at an hourly rate of 8 USD
       • MTurk: post experiments on the Mechanical Turk web site; pay the participant 0.15 USD for each qualified experiment
       • Community: seek participants on the website of an Internet community with 1.5 million members; pay the participant an amount of virtual currency equivalent to one US cent for each qualified experiment
    155. 155. Participant Source Evaluation
       • With crowdsourcing we get lower monetary cost and wider participant diversity while maintaining the quality of the evaluation results
       • Crowdsourcing seems a good strategy for multimedia QoE assessment!
    156. 156. http://mmnet.iis.sinica.edu.tw/link/qoe
    157. 157. Conclusion
       • Crowdsourcing is not without limitations: no physical contact with participants, limited environment control, and constraints on the media that can be used
       • With paired comparison and user input verification we obtain lower monetary cost, wider participant diversity, a shorter experiment cycle, and maintained evaluation quality
    158. 158. Thank You! Kuan-Ta Chen Academia Sinica
    159. 159. Future Plan
       • QoE measurement
         – Psychophysical approach
         – Exploit social gaming to provide the incentives for large-scale studies
         – Goals: cross-application, cross-modal, content-dependent, context-dependent
       • QoE provisioning
         – QoE-aware communication systems
         – Parameters auto-configured at run time: playout scheduling, coding scheme (& rate), overlay path routing, transmission (redundancy, coding, protocol), etc.
    160. 160. Acknowledgements Chin-Laung Lei Chun-Ying Huang Polly Huang Yu-Chun Chang Te-Yuan Huang William Tu Hung-Hsuan Chen Chen-Chi Wu Wei-Cheng Xiao
    161. 161. Thank you! Sheng-Wei (Kuan-Ta) Chen http://www.iis.sinica.edu.tw/~swc Institute of Information Science, Academia Sinica
