
EPG content recommendation in large scale: a case study on interactive TV platform


EPG content recommendation in large scale: a case study on interactive TV platform -- ICMLA 2013 - Machine Learning with Multimedia Data (7th December 2013, Miami, FL)


  1. EPG content recommendation in large scale: a case study on interactive TV platform. D. Zibriczky, Z. Petres, M. Waszlavik, D. Tikk. ICMLA 2013 - Machine Learning with Multimedia Data, 7th December 2013, Miami, United States
  2. Outline • Introduction • Problem • Solution • Offline results • Online results • Conclusion
  3. Introduction / Consumption trends
  4. Introduction / Electronic Program Guide
  5. Introduction / Goal • SaskTel • Finding relevant content with minimal effort • Time-shifting • Multiple devices per household • Graphical User Interface • Increasing content consumption / watching length • Increasing click-through rate (CTR) using Gravity’s GUI
  6. Problem / Recommendation concept
     • User: Device
       - Users cannot be distinguished explicitly
       - More than one device per household
     • Item: Scheduled content (time, program id, channel id)
       - Typically series or programs without episodes
       - Metadata: Information about the items
     • Event: Remote controller / set-top-box based implicit feedback
       - Switching channel, set to record, rewind, replay, stop, pause
       - Next schedule, watching duration
     • Recommendable items
       - Set of series or programs that are broadcast at the moment of the recommendation request or later ("on now" and "on later" scenarios)
     • Recommendation
       - Sorting recommendable items by prediction values
       - Other recommendation logic (randomization, mixing, etc.)
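
The "on now / on later" candidate selection and the sorting by prediction value can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `ScheduledItem` fields, the minute-based time unit, the `horizon` parameter and the `scores` dictionary are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledItem:
    program_id: str
    channel_id: str
    start: int  # minutes since midnight (hypothetical time unit)
    end: int


def recommendable(schedule, now, horizon):
    """Keep items that are on air at `now` or start within `horizon` minutes."""
    return [s for s in schedule if s.end > now and s.start <= now + horizon]


def recommend(schedule, scores, now, horizon=120, top_n=15):
    """Sort the recommendable items by their prediction value, highest first."""
    candidates = recommendable(schedule, now, horizon)
    ranked = sorted(candidates, key=lambda s: scores.get(s.program_id, 0.0), reverse=True)
    return ranked[:top_n]
```

Items without a prediction value default to a score of 0.0 here; any extra logic such as randomization or mixing would be applied after this ranking step.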
  7. Problem / Difficulties • Implicit feedback only (no explicit data) • Huge but noisy data set (zapping, leave-on, irrelevant events, …) • Cold start problem (new items, short lifetime) • Small recommendable set at a time • Context dependency (time, multiple users per household) • Difference between offline and online optimization
  8. Solution / Baselines • Most popular channels • Most popular contents (series or programs) • Users’ favourite channels • Users’ favourite contents (series or programs)
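
The four baselines are all frequency counts over the event log, either global or per user (device). A minimal sketch, assuming each event is a dictionary with hypothetical `user`, `channel` and `program` keys:

```python
from collections import Counter


def most_popular(events, key, top_n=15):
    """Rank the values of `key` (e.g. channel or program) by event count."""
    counts = Counter(e[key] for e in events)
    return [value for value, _ in counts.most_common(top_n)]


def user_favourites(events, user, key, top_n=15):
    """Same ranking, restricted to the events of one user (device)."""
    own_events = [e for e in events if e["user"] == user]
    return most_popular(own_events, key, top_n)
```

Calling `most_popular(events, "channel")` gives the "most popular channels" baseline, and `user_favourites(events, "d1", "program")` gives the favourite contents of device `d1`.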
  9. Solution / Content-based filtering
     • Cosine Similarity
     • User model: Weighted average of meta vectors
     • Prediction: Cosine similarity of vectors
     • Improvement: Term frequency based weighting (TF-IDF)
     M                          The Simpsons   How I Met Your Mother   Futurama   …
     Genre = Animation          1              0                       1          …
     Genre = Comedy             1              1                       1          …
     …                          …              …                       …          …
     Director = Matt Groening   1              0                       1          …
     Director = Carter Bays     0              1                       0          …
     Actor = Dan Castellaneta   1              0                       0          …
     Actor = Billy West         0              0                       1          …
     …                          …              …                       …          …
     User 1: 0.53 0.81 … 0.18 0.00 0.18 0.00 …
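
As a small worked illustration of the content-based approach, the sketch below builds a user model as a weighted average of item metadata vectors and scores items by cosine similarity. The item vectors are the columns of the matrix M on the slide; the weights are an assumption for the example, and the TF-IDF weighting mentioned on the slide is left out.

```python
import math


def weighted_average(vectors, weights):
    """User model: weighted average of the metadata vectors of consumed items."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[k] for v, w in zip(vectors, weights)) / total for k in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Feature order: Animation, Comedy, Matt Groening, Carter Bays, Dan Castellaneta, Billy West
SIMPSONS = [1, 1, 1, 0, 1, 0]
HIMYM = [0, 1, 0, 1, 0, 0]
FUTURAMA = [1, 1, 1, 0, 0, 1]

# Hypothetical user who has only watched The Simpsons (weight 1.0).
user_model = weighted_average([SIMPSONS], [1.0])
```

With this user model, Futurama scores higher than How I Met Your Mother because it shares the genre and director features with The Simpsons.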
  10. Solution / Collaborative Filtering
      • Matrix Factorization
      • User model: User factors
      • Prediction: Dot product of latent factors
      • Solver: Alternating Least Squares with Coordinate Descent (IALS1)
      R          The Simpsons   How I Met Your Mother   Futurama   …
      User 1     1                                                 …
      User 2     1              1                       u2*i3      …
      User 3                    1                                  …
      …          …              …                       …          …
      User factors: u11 u12 / u21 u22 / u31 u32 / …
      Item factors: i11 i21 i31 … / i12 i22 i32 …
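
The prediction step of matrix factorization (the `u2*i3` cell on the slide) is a dot product of a user's and an item's latent factors. The sketch below shows only this scoring and ranking step; training the factors with IALS1 is not shown, and all factor values used with it are hypothetical.

```python
def predict(user_factors, item_factors):
    """Predicted preference: dot product of user and item latent factors."""
    return sum(u * i for u, i in zip(user_factors, item_factors))


def rank_items(user_factors, items):
    """Order item names by predicted preference, highest first.

    `items` maps an item name to its latent factor vector.
    """
    return sorted(items, key=lambda name: predict(user_factors, items[name]), reverse=True)
```

For example, with two latent factors, `rank_items([0.5, 1.0], {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]})` orders the items as C, B, A.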
  11. Solution / Hybrid filtering
      • Hybrid IALS1
      R*                         The Simpsons   How I Met Your Mother   Futurama   …
      User 1                     1              0                       0          …
      User 2                     1              1                       0          …
      User 3                     0              1                       0          …
      …                          …              …                       …          …
      Genre = Animation          1              0                       1          …
      Genre = Comedy             1              1                       1          …
      …                          …              …                       …          …
      Director = Matt Groening   1              0                       1          …
      Director = Carter Bays     0              1                       0          …
      Actor = Dan Castellaneta   1              0                       0          …
      Actor = Billy West         0              0                       1          …
      …                          …              …                       …          …
      User factors: u11 u12 / u21 u22 / u31 u32 / … / pu11 pu12 / pu21 pu22 / …
      Item factors: i11 i21 i31 … / i12 i22 i32 …
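
One reading of the R* matrix on this slide is that the metadata rows are appended below the user rows, so that each metadata value behaves like an extra pseudo-user during factorization. The sketch below only builds such a stacked matrix from the values shown on the slide; the function name is hypothetical and the factorization itself is not shown.

```python
def stack_hybrid(user_item, meta_item):
    """Append metadata rows to the user-item rows; all rows share the item columns."""
    rows = [list(r) for r in user_item] + [list(r) for r in meta_item]
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have one entry per item")
    return rows


# Columns: The Simpsons, How I Met Your Mother, Futurama
R = [
    [1, 0, 0],  # User 1
    [1, 1, 0],  # User 2
    [0, 1, 0],  # User 3
]
M = [
    [1, 0, 1],  # Genre = Animation
    [1, 1, 1],  # Genre = Comedy
    [1, 0, 1],  # Director = Matt Groening
    [0, 1, 0],  # Director = Carter Bays
    [1, 0, 0],  # Actor = Dan Castellaneta
    [0, 0, 1],  # Actor = Billy West
]
R_STAR = stack_hybrid(R, M)
```

Under this reading, an item with no viewing events (Futurama above) still has non-zero entries in its column through the metadata rows, so it can receive item factors.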
  12. Solution / Channel recommendation
      • Tensor factorization (ITALS1)
      • Prediction: Hadamard product of latent factors
      • Improvement: Watching duration based weighting
      Columns of each matrix: Channel Sports 1, Channel Sports 2, Channel News 1, … (non-empty cells per user)
      R (4:00-12:00)    User 1: 1 1 …   User 2: 1 …     User 3: 1 …
      R (12:00-20:00)   User 1: 1 …     User 2: 1 1 …   User 3: 1 …
      R (20:00-4:00)    User 1: 1 …     User 2: 1 …     User 3: u3°i2°t3 …
      User factors: u11 u12 / u21 u22 / u31 u32 / …
      Item factors: i11 i21 i31 … / i12 i22 i32 …
      TP factors: t11 t12 / t21 t22 / t31 t32
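
The `u3°i2°t3` cell can be illustrated with a short sketch. It assumes that the score is the sum of the elements of the Hadamard (element-wise) product of the user, channel and time-period factor vectors, which is the usual three-way generalization of the dot product; the slide itself only names the Hadamard product. The `time_period` helper maps an hour to the three bands shown on the slide; its name and the factor values in the usage are hypothetical.

```python
def time_period(hour):
    """Map an hour (0-23) to the slide's bands: 4-12, 12-20, 20-4."""
    if 4 <= hour < 12:
        return 0
    if 12 <= hour < 20:
        return 1
    return 2


def predict(user_factors, item_factors, period_factors):
    """Sum over the element-wise product of the three latent factor vectors."""
    return sum(u * i * t for u, i, t in zip(user_factors, item_factors, period_factors))
```

For example, `predict([1, 2], [3, 4], [0.5, 0.25])` multiplies the vectors element by element (1.5 and 2.0) and sums the result.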
  13. Solution / Item grouping
  14. Solution / Preprocessing
      1. Event type based filtering by significance: original set 308M → 82M
      2. Filtering by leave-on and short duration: 82M → 23M
      3. Splitting by time: 23M → train set 22M, test set 676K
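
The three preprocessing steps can be sketched as a small filter pipeline. The event fields, the set of "significant" event types and both duration thresholds below are hypothetical; the slides do not state the actual values.

```python
SIGNIFICANT_TYPES = {"watch", "record"}  # hypothetical choice of significant event types
MIN_SECONDS = 60                         # hypothetical threshold for "short duration"
MAX_SECONDS = 4 * 3600                   # hypothetical threshold for "leave-on"


def preprocess(events, split_time):
    """Filter events by type and duration, then split them into train and test by time."""
    kept = [e for e in events if e["type"] in SIGNIFICANT_TYPES]
    kept = [e for e in kept if MIN_SECONDS <= e["duration"] <= MAX_SECONDS]
    train = [e for e in kept if e["time"] < split_time]
    test = [e for e in kept if e["time"] >= split_time]
    return train, test
```

Splitting by time rather than at random keeps the evaluation closer to the live setting, where only past events are available when a recommendation is requested.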
  15. Offline results / Measurement
      • Metrics:
        - Recall@N
        - Mean Reciprocal Rank (MRR)
      • Item splits:
        - Having events in the training set or not
          o old items
          o new items
        - Popularity 20-80 split
          o popular items
          o tail items
        - Episode of a series or not
          o series
          o non-series
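
Both offline metrics can be computed from ranked recommendation lists. The sketch below assumes that each test event has exactly one relevant item and that a missing item contributes zero to MRR; the slides do not spell out these details.

```python
def recall_at_n(ranked_lists, relevant_items, n):
    """Share of test events whose relevant item appears in the top n of its list."""
    hits = sum(1 for ranked, rel in zip(ranked_lists, relevant_items) if rel in ranked[:n])
    return hits / len(relevant_items)


def mean_reciprocal_rank(ranked_lists, relevant_items):
    """Average of 1/rank of the relevant item (0 when it is not in the list)."""
    total = 0.0
    for ranked, rel in zip(ranked_lists, relevant_items):
        if rel in ranked:
            total += 1.0 / (ranked.index(rel) + 1)
    return total / len(relevant_items)
```

Computing the metrics separately for each item split (old/new, popular/tail, series/non-series) only requires filtering the test events before calling these functions.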
  16. Offline results / Comparison
      • Recall@15
      Algorithm                    Type  All items  Old items  New items  Popular items  Tail items  Series  Non-series
      Most popular channels        BL    17.45%     17.81%     10.59%     19.95%         3.90%       18.58%  2.91%
      Most popular series          BL    25.68%     27.06%     0.00%      30.35%         0.44%       27.69%  0.00%
      Favourite channels           BL    30.98%     31.61%     19.20%     32.68%         21.80%      32.52%  11.28%
      Favourite series / programs  BL    48.58%     51.13%     0.00%      53.55%         21.83%      52.34%  1.09%
      CosineSim                    CBF   52.02%     52.92%     34.94%     53.58%         43.65%      53.69%  30.93%
      IALS1*                       CF    46.75%     49.26%     0.00%      52.30%         16.78%      50.28%  1.65%
      ITALS1**                     CF    41.68%     42.60%     24.48%     44.53%         26.26%      43.84%  14.06%
      Hybrid IALS1*                HF    51.08%     53.82%     6.78%      56.46%         22.01%      54.95%  1.63%
      Blend***                     -     55.48%     56.98%     26.15%     57.64%         43.61%      57.91%  24.41%
      Blend*** (MRR)               -     0.1038     0.1070     0.0405     0.1097         0.0712      0.1094  0.0322
      * Items are grouped by series ids or program ids
      ** Items are grouped by channel ids
      *** Blend: Combination of CosineSim, IALS1, ITALS1, HybridIALS1 and favourite programs/series
  17. Offline results / Comparison (same Recall@15 table as slide 16)
  18. Online results / User Interface
  19. Online results / Measurement
      • Metrics:
        - Click-through rate (CTR)
        - Watching Length Ratio (WR): The average watching length of content that was watched for at least 1 minute by the user.
        - Completed Watching Ratio (CWR): The average ratio of events in which the content was watched for at least 90% of its remaining length.
      • Methods:
        - EPG-Z: Standard consumption method (EPG and channel zapping)
        - R4U: Recommended 4 U
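
The online metrics can be sketched as follows. Several details are assumptions made for the example: each viewing event is a `(watched_seconds, remaining_seconds)` pair, WR is taken relative to the remaining length of the content at the time of selection, CWR is computed over all events, and CTR is clicks per recommendation request.

```python
def click_through_rate(clicks, requests):
    """Clicks per recommendation request (assumed definition)."""
    return clicks / requests if requests else 0.0


def watching_length_ratio(events, min_seconds=60):
    """Mean watched/remaining ratio over events watched for at least one minute."""
    ratios = [w / r for w, r in events if w >= min_seconds and r > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0


def completed_watching_ratio(events, threshold=0.9):
    """Share of events in which at least 90% of the remaining length was watched."""
    valid = [(w, r) for w, r in events if r > 0]
    if not valid:
        return 0.0
    completed = sum(1 for w, r in valid if w / r >= threshold)
    return completed / len(valid)
```

The same functions would be applied separately to events that started from EPG-Z and from R4U in order to compare the two consumption methods.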
  20. Online results / Measurement (same metrics and methods as slide 19): EPG-Z vs. R4U?
  21. Online results / Clicks [Chart: Distribution of clicks by position; x-axis: positions 1 to 15, y-axis: 0% to 40%] • Users like to click on the first item. • 80% of the clicks come from one of the top 5 positions. • More clicks in the 15th position (2.2%) than in the 14th (1.3%).
  22. Online results / CTR
      Click-through rate by different item splits
      Item split      CTR
      All items       33.20%
      Non-series      35.30%
      Series          33.12%
      Tail items      26.49%
      Popular items   39.05%
      New items       52.07%
      Old items       33.16%
  23. Online results / CTR by usage [Chart: Average CTR vs. number of recommendation requests since the first use of R4U; x-axis ticks: 1, 10, 100; y-axis: 0% to 80%]
  24. Online results / Watching behavior
                        Watching Length Ratio   Completed Watching Ratio
      Item splits       EPG-Z       R4U         EPG-Z       R4U
      Old items         30.02%      42.04%      16.02%      31.03%
      New items         21.11%      35.51%      8.01%       23.12%
      Popular items     30.81%      44.19%      16.30%      32.27%
      Long-tail items   28.01%      38.43%      15.11%      27.66%
      Series            31.04%      43.00%      16.92%      31.51%
      Non-series        17.94%      15.26%      5.31%       7.22%
      All items         29.90%      42.02%      15.91%      30.53%
  25. Online results / Watching behavior • Content selected via R4U is watched 40% longer and is almost twice as likely to be completed as content selected in the standard way (EPG-Z).
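
The "40% longer" and "almost twice" figures follow from the "All items" row of the watching behavior table. A short check of the arithmetic (the function name is only for illustration):

```python
def relative_lift(baseline, treatment):
    """Relative increase of `treatment` over `baseline` (0.25 means +25%)."""
    return treatment / baseline - 1.0


# "All items" row: Watching Length Ratio, EPG-Z 29.90% vs. R4U 42.02%
wr_lift = relative_lift(29.90, 42.02)   # about 0.405, i.e. roughly 40% longer

# "All items" row: Completed Watching Ratio, EPG-Z 15.91% vs. R4U 30.53%
cwr_ratio = 30.53 / 15.91               # about 1.92, i.e. almost twice as likely
```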
  26. Online results / Offline vs. Online metrics
      • High correlation between Recall/MRR and Completed Watching Ratio
                        Offline              Online
                        Recall@15   MRR      CWR
      Old items         56.98%      0.1070   31.03%
      New items         26.15%      0.0405   23.12%
      Popular items     57.64%      0.1097   32.27%
      Long-tail items   43.61%      0.0712   27.66%
      Series            57.91%      0.1094   31.51%
      Non-series        24.41%      0.0322   7.22%
      All items         55.48%      0.1038   30.53%
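
One way to quantify the relationship in this table is the Pearson correlation coefficient across the seven item splits. The slides do not say which correlation measure the authors used, so this is only an illustrative check; note also that the splits overlap (for example "All items" contains every other split), so the rows are not independent samples.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


# Rows: old, new, popular, long-tail, series, non-series, all items
RECALL_AT_15 = [56.98, 26.15, 57.64, 43.61, 57.91, 24.41, 55.48]
MRR = [0.1070, 0.0405, 0.1097, 0.0712, 0.1094, 0.0322, 0.1038]
CWR = [31.03, 23.12, 32.27, 27.66, 31.51, 7.22, 30.53]

recall_vs_cwr = pearson(RECALL_AT_15, CWR)
mrr_vs_cwr = pearson(MRR, CWR)
```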
  27. Conclusion • Linear recommendation difficulties. • Metadata-based item modeling (CBF) is quite effective; combining it with CF gives additional improvement. • Users prefer the first items; they do not make much effort. • High click-through rate, especially for new items. • R4U affects user behavior and satisfaction. • Content selected via R4U is watched 40% longer and is almost twice as likely to be completed as content selected in the standard way. • High correlation between the proposed offline and online metrics.
