EPG content recommendation in large scale: a case study on interactive TV platform
EPG content recommendation in large scale: a case study on interactive TV platform -- ICMLA 2013 - Machine Learning with Multimedia Data (7th December 2013, Miami, FL)
1. EPG content recommendation in large scale: a case study on interactive TV platform
D. Zibriczky, Z. Petres, M. Waszlavik, D. Tikk
ICMLA 2013 - Machine Learning with Multimedia Data
7th December 2013. Miami. United States
2. Outline
• Introduction
• Problem
• Solution
• Offline results
• Online results
• Conclusion
Enter date in master2 ICMLA 2013 - Machine Learning with Multimedia Data. 2013. Miami. United States
3. Introduction / Consumption trends
4. Introduction / Electronic Program Guide
5. Introduction / Goal
• SaskTel
• Finding relevant contents with minimal effort
• Time-shifting
• Multiple devices per household
• Graphical User Interface
• Increasing content consumption / watching length
• Increasing click through rate (CTR) using Gravity’s GUI
6. Problem / Recommendation concept
• User: Device
Users cannot be distinguished explicitly
More than one device per household
• Item: Scheduled contents (time, program id, channel id)
Typically series or programs without episodes
Metadata: Information about the items
• Event: Implicit feedback from the remote controller / set-top box
Switching channel, set to record, rewind, replay, stop, pause
Next schedule, watching duration
• Recommendable items
Set of series or programs that are broadcast at the moment of the
recommendation request or later ("on now" and "on later" scenarios)
• Recommendation
Sorting recommendable items by prediction values
Other recommendation logic (randomization, mixing, etc.)
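The recommendation concept above can be sketched as a filter followed by a sort. This is a minimal illustration, not the platform's implementation: the `ScheduledItem` fields, the `horizon` window for "on later", and the score dictionary are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScheduledItem:
    program_id: str
    channel_id: str
    start: int  # minutes since midnight (hypothetical unit)
    end: int

def recommend(schedule, scores, now, horizon=120, top_n=15):
    """Return the top-N program ids that are on now or start within `horizon` minutes."""
    # Recommendable set: still running at `now`, or starting soon ("on later").
    candidates = [
        item for item in schedule
        if item.end > now and item.start <= now + horizon
    ]
    # Rank the recommendable items by their predicted score, highest first.
    ranked = sorted(
        candidates,
        key=lambda item: scores.get(item.program_id, 0.0),
        reverse=True,
    )
    return [item.program_id for item in ranked[:top_n]]

schedule = [
    ScheduledItem("news", "ch1", 1080, 1140),
    ScheduledItem("simpsons", "ch2", 1140, 1170),
    ScheduledItem("late_film", "ch3", 1380, 1500),
    ScheduledItem("morning_show", "ch1", 480, 600),
]
scores = {"news": 0.2, "simpsons": 0.9, "late_film": 0.7, "morning_show": 0.8}
top = recommend(schedule, scores, now=1100)
```

Here `morning_show` has a high score but is excluded because it has already ended, which illustrates why the recommendable set is small at any given time.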
7. Problem / Difficulties
• Implicit feedback only (no explicit data)
• Huge but noisy data set (zapping, leave-on, irrelevant events, …)
• Cold start problem (new items, short lifetime)
• Small recommendable set at a time
• Context dependency (time, multiple users per household)
• Difference between offline and online optimization
8. Solution / Baselines
• Most popular channels
• Most popular contents (series or programs)
• Users’ favourite channels
• Users’ favourite contents (series or programs)
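The popularity and favourite baselines amount to counting events, globally or per device. A minimal sketch under the assumption that events are simple `(device_id, channel_id)` pairs; the sample data is invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (device_id, channel_id)
events = [
    ("d1", "sports1"), ("d1", "sports1"), ("d1", "news1"),
    ("d2", "news1"), ("d2", "news1"), ("d2", "news1"),
    ("d3", "sports2"),
]

def most_popular(events, top_n=3):
    """Global baseline: channels ordered by total event count."""
    counts = Counter(channel for _, channel in events)
    return [channel for channel, _ in counts.most_common(top_n)]

def favourites(events, top_n=3):
    """Per-device baseline: each device's own most watched channels."""
    per_device = defaultdict(Counter)
    for device, channel in events:
        per_device[device][channel] += 1
    return {
        device: [channel for channel, _ in counts.most_common(top_n)]
        for device, counts in per_device.items()
    }

popular = most_popular(events)
favs = favourites(events)
```

The same counting works for contents (series or programs) by replacing the channel id with a series or program id.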
9. Solution / Content-based filtering
• Cosine Similarity
• User model: Weighted average of meta vectors
• Prediction: Cosine similarity of vectors
• Improvement: Term frequency based weighting (TF-IDF)
Item metadata matrix M, with the User 1 model vector alongside:

M                          The Simpsons  How I Met Your Mother  Futurama  …    User 1
Genre = Animation          1             0                      1         …    0.53
Genre = Comedy             1             1                      1         …    0.81
…                          …             …                      …         …    …
Director = Matt Groening   1             0                      1         …    0.18
Director = Carter Bays     0             1                      0         …    0.00
Actor = Dan Castellaneta   1             0                      0         …    0.18
Actor = Billy West         0             0                      1         …    0.00
…                          …             …                      …         …    …
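A minimal sketch of the content-based approach described on this slide: the user model is a weighted average of the metadata vectors of watched items, and the prediction is the cosine similarity between the user model and an item vector. The item vectors follow the six explicit metadata rows of the slide; the watch weights are hypothetical, and TF-IDF weighting is left out.

```python
import math

# Binary metadata vectors: [Animation, Comedy, Groening, Bays, Castellaneta, West]
items = {
    "The Simpsons":          [1, 1, 1, 0, 1, 0],
    "How I Met Your Mother": [0, 1, 0, 1, 0, 0],
    "Futurama":              [1, 1, 1, 0, 0, 1],
}

def user_model(watched):
    """Weighted average of the metadata vectors of the watched items."""
    total = sum(watched.values())
    dim = len(next(iter(items.values())))
    model = [0.0] * dim
    for title, weight in watched.items():
        for k, value in enumerate(items[title]):
            model[k] += weight * value / total
    return model

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical weights, e.g. derived from the number of watching events.
model = user_model({"The Simpsons": 3.0})
score_futurama = cosine(model, items["Futurama"])
score_himym = cosine(model, items["How I Met Your Mother"])
```

Because the score depends only on metadata, a new item with no events can still be ranked, which is consistent with the cold-start difficulty listed earlier.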
10. Solution / Collaborative Filtering
• Matrix Factorization
• User model: User factors
• Prediction: Dot product of latent factors
• Solver: Alternating Least Squares with Coordinate Descent (IALS1)
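Figure: event matrix R with users (User 1, User 2, User 3, …) as rows and items (The Simpsons, How I Met Your Mother, Futurama, …) as columns; observed events are marked 1. The missing entry for User 2 and Futurama is predicted as u2*i3.

User factors (one row per user):
u11 u12
u21 u22
u31 u32
… …

Item factors (one column per item):
i11 i21 i31 …
i12 i22 i32 …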
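The following is a simplified sketch of implicit-feedback matrix factorization solved by alternating coordinate descent. It is not the authors' IALS1 solver: it works on a small dense matrix, and the confidence weight `alpha`, the regularization `reg`, and the toy matrix `R` are all assumptions made for illustration.

```python
import random

def cd_pass(target, weights, fixed, current, reg):
    """One coordinate-descent sweep over a single factor vector.

    Minimizes sum_j w_j * (t_j - current . fixed[j])^2 + reg * |current|^2,
    updating one coordinate of `current` at a time.
    """
    k_dim = len(current)
    for k in range(k_dim):
        num, den = 0.0, reg
        for t, w, f in zip(target, weights, fixed):
            # Residual of this cell when coordinate k is left out.
            residual = t - sum(current[j] * f[j] for j in range(k_dim) if j != k)
            num += w * residual * f[k]
            den += w * f[k] * f[k]
        current[k] = num / den
    return current

def factorize(R, k_dim=2, reg=0.1, alpha=10.0, sweeps=30, seed=0):
    """Alternate between user and item updates on a dense 0/1 event matrix."""
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k_dim)] for _ in range(n_users)]
    I = [[rng.uniform(-0.1, 0.1) for _ in range(k_dim)] for _ in range(n_items)]
    # Observed events get a higher confidence weight than missing ones.
    W = [[1.0 + alpha * r for r in row] for row in R]
    for _ in range(sweeps):
        for u in range(n_users):
            cd_pass(R[u], W[u], I, U[u], reg)
        for i in range(n_items):
            column = [R[u][i] for u in range(n_users)]
            w_column = [W[u][i] for u in range(n_users)]
            cd_pass(column, w_column, U, I[i], reg)
    return U, I

def predict(U, I, u, i):
    """Prediction is the dot product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(U[u], I[i]))

R = [[1, 0, 0], [1, 1, 0], [0, 1, 0]]  # hypothetical users x items events
U, I = factorize(R)
```

With these factors, `predict(U, I, 1, 2)` plays the role of the u2*i3 cell in the figure: a score for an item the user has no event on.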
11. Solution / Hybrid filtering
• Hybrid IALS1

R*: event rows and item metadata rows stacked over the same item columns

R*                         The Simpsons  How I Met Your Mother  Futurama  …
User 1                     1             0                      0         …
User 2                     1             1                      0         …
User 3                     0             1                      0         …
…                          …             …                      …         …
Genre = Animation          1             0                      1         …
Genre = Comedy             1             1                      1         …
…                          …             …                      …         …
Director = Matt Groening   1             0                      1         …
Director = Carter Bays     0             1                      0         …
Actor = Dan Castellaneta   1             0                      0         …
Actor = Billy West         0             0                      1         …
…                          …             …                      …         …

User factors, followed by the factors of the metadata rows (pu):
u11 u12
u21 u22
u31 u32
… …
pu11 pu12
pu21 pu22
… …

Item factors:
i11 i21 i31 …
i12 i22 i32 …
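As the diagram suggests, the hybrid input can be built by appending the metadata rows to the event rows, so that each metadata term is factorized like an additional user. A minimal sketch of that stacking step only; the variable names are hypothetical and the factorization itself is not shown.

```python
# Event rows: users x items (The Simpsons, How I Met Your Mother, Futurama)
R = [
    [1, 0, 0],  # User 1
    [1, 1, 0],  # User 2
    [0, 1, 0],  # User 3
]

# Metadata rows: terms x items, same item columns as R
M = [
    [1, 0, 1],  # Genre = Animation
    [1, 1, 1],  # Genre = Comedy
    [1, 0, 1],  # Director = Matt Groening
    [0, 1, 0],  # Director = Carter Bays
    [1, 0, 0],  # Actor = Dan Castellaneta
    [0, 0, 1],  # Actor = Billy West
]

# Stack events and metadata into one matrix over the shared item columns.
R_star = R + M
n_users = len(R)

# After factorizing R_star, the first n_users factor rows belong to real
# users and the remaining rows to the metadata terms.
user_rows = range(n_users)
metadata_rows = range(n_users, len(R_star))
```

Items that share metadata terms are pulled towards similar item factors even when they have few events of their own.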
12. Solution / Channel recommendation
• Tensor factorization (ITALS1)
• Prediction: Hadamard product of latent factors
• Improvement: Watching duration based weighting

Figure: three event matrices R, one per time period (4:00-12:00, 12:00-20:00, 20:00-4:00), each with users (User 1, User 2, User 3, …) as rows and channels (Channel Sports 1, Channel Sports 2, Channel News 1, …) as columns; observed events are marked 1. The missing entry for User 3 and Channel Sports 2 in the 20:00-4:00 period is predicted as u3°i2°t3.

User factors:
u11 u12
u21 u22
u31 u32
… …

Item factors:
i11 i21 i31 …
i12 i22 i32 …

TP factors:
t11 t12
t21 t22
t31 t32
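A minimal sketch of the prediction step for the three-way model: the score is the sum over the element-wise (Hadamard) product of the user, channel, and time-period factor vectors. The period boundaries follow the slide; the factor values below are hypothetical, and training (ITALS1) is not shown.

```python
def time_period(hour):
    """Map an hour of day (0-23) to the three time periods on the slide."""
    if 4 <= hour < 12:
        return 0  # 4:00-12:00
    if 12 <= hour < 20:
        return 1  # 12:00-20:00
    return 2      # 20:00-4:00

def predict(user_f, item_f, period_f):
    """Sum of the element-wise product of the three factor vectors."""
    return sum(u * i * t for u, i, t in zip(user_f, item_f, period_f))

# Hypothetical two-dimensional factors for one user, one channel, three periods.
u3 = [1.0, 2.0]
i2 = [3.0, 4.0]
tp = [[0.1, 0.0], [0.0, 0.1], [0.5, 0.25]]

evening_score = predict(u3, i2, tp[time_period(21)])
```

The same user and channel get different scores in different periods, which is how the model captures the time context of a household.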
13. Solution / Item grouping
14. Solution / Preprocessing
Original set: 308M
1. Event type based filtering by significance: 82M
2. Filtering by leave-on and short duration: 23M
3. Splitting by time: Train set 22M, Test set 676K
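Steps 2 and 3 of the preprocessing can be sketched as follows. The slide does not give the duration thresholds, so `MIN_SECONDS` and `MAX_SECONDS` are hypothetical values, as are the `Event` fields.

```python
from dataclasses import dataclass

@dataclass
class Event:
    device_id: str
    item_id: str
    timestamp: int  # seconds since epoch
    duration: int   # watched seconds

MIN_SECONDS = 60        # hypothetical: shorter events are treated as zapping
MAX_SECONDS = 4 * 3600  # hypothetical: longer events are treated as leave-on

def clean(events):
    """Drop events that are too short (zapping) or too long (leave-on)."""
    return [e for e in events if MIN_SECONDS <= e.duration <= MAX_SECONDS]

def split_by_time(events, split_ts):
    """Everything before split_ts is training data, the rest is test data."""
    train = [e for e in events if e.timestamp < split_ts]
    test = [e for e in events if e.timestamp >= split_ts]
    return train, test

events = [
    Event("d1", "a", 100, 30),
    Event("d1", "b", 200, 1800),
    Event("d2", "c", 300, 20000),
    Event("d2", "d", 400, 600),
]
train, test = split_by_time(clean(events), split_ts=300)
```

Splitting by time rather than at random keeps the test set strictly in the future of the training set, which matches how the recommender is used online.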
15. Offline results / Measurement
• Metrics:
Recall@N
Mean Reciprocal Rank (MRR)
• Item splits:
Having events on training set or not
o old items
o new items
Popularity 20-80 split
o popular items
o tail items
Episode of a series or not
o series
o non-series
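The two offline metrics can be sketched as below, assuming each test event has exactly one relevant item and the recommender returns a ranked list for it. The function names and sample lists are illustrative.

```python
def recall_at_n(ranked_lists, relevant, n=15):
    """Share of test events whose relevant item appears in the top n."""
    hits = sum(
        1 for ranked, target in zip(ranked_lists, relevant)
        if target in ranked[:n]
    )
    return hits / len(relevant)

def mean_reciprocal_rank(ranked_lists, relevant):
    """Average of 1/rank of the relevant item; 0 when it is not in the list."""
    total = 0.0
    for ranked, target in zip(ranked_lists, relevant):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(relevant)

ranked_lists = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
relevant = ["a", "a", "d"]

recall = recall_at_n(ranked_lists, relevant, n=2)
mrr = mean_reciprocal_rank(ranked_lists, relevant)
```

To report the item splits, the same functions can be applied to the subset of test events whose relevant item falls into each split (old or new, popular or tail, series or non-series).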
16. Offline results / Comparison
• Recall@15

Algorithm                    Type  All items  Old items  New items  Popular items  Tail items  Series  Non-series
Most popular channels        BL    17.45%     17.81%     10.59%     19.95%         3.90%       18.58%  2.91%
Most popular series          BL    25.68%     27.06%     0.00%      30.35%         0.44%       27.69%  0.00%
Favourite channels           BL    30.98%     31.61%     19.20%     32.68%         21.80%      32.52%  11.28%
Favourite series / programs  BL    48.58%     51.13%     0.00%      53.55%         21.83%      52.34%  1.09%
CosineSim                    CBF   52.02%     52.92%     34.94%     53.58%         43.65%      53.69%  30.93%
IALS1*                       CF    46.75%     49.26%     0.00%      52.30%         16.78%      50.28%  1.65%
ITALS1**                     CF    41.68%     42.60%     24.48%     44.53%         26.26%      43.84%  14.06%
Hybrid IALS1*                HF    51.08%     53.82%     6.78%      56.46%         22.01%      54.95%  1.63%
Blend***                           55.48%     56.98%     26.15%     57.64%         43.61%      57.91%  24.41%
Blend*** (MRR)                     0.1038     0.1070     0.0405     0.1097         0.0712      0.1094  0.0322

* Items are grouped by series ids or program ids
** Items are grouped by channel ids
*** Blend: Combination of CosineSim, IALS1, ITALS1, HybridIALS1 and favourite programs/series
18. Online results / User Interface
19. Online results / Measurement
• Metrics:
Click-through rate (CTR)
Watching Length Ratio (WR): The average watching length of the contents
that were watched for at least 1 minute by the user.
Completed Watching Ratio (CWR): The ratio of the events in which the
content was watched for at least 90% of its remaining length.
• Methods:
EPG-Z: Standard consumption method (EPG and channel zapping)
R4U: Recommended 4 U
EPG-Z vs. R4U?
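Two of the online metrics above can be sketched as simple ratios. The slide does not define the CTR denominator, so dividing clicks by recommendation requests is an assumption here, as is the `(watched_seconds, remaining_seconds)` input format for CWR.

```python
def click_through_rate(clicks, requests):
    """Clicks per recommendation request (assumed denominator)."""
    return clicks / requests if requests else 0.0

def completed_watching_ratio(views, threshold=0.9):
    """Share of viewing events watched for at least `threshold` of the
    length that remained when the user tuned in.

    views: list of (watched_seconds, remaining_seconds) pairs.
    """
    valid = [(w, r) for w, r in views if r > 0]
    if not valid:
        return 0.0
    completed = sum(1 for w, r in valid if w / r >= threshold)
    return completed / len(valid)

views = [(1800, 1800), (900, 1800), (1700, 1800), (60, 600)]
cwr = completed_watching_ratio(views)
ctr = click_through_rate(332, 1000)
```

Computing the same ratios separately for events that started from EPG-Z and from R4U gives the comparison the slide asks for.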
21. Online results / Clicks
Figure: Distribution of clicks by position (x-axis: positions 1 to 15; y-axis: share of clicks, 0% to 40%)
• Users like to click on the first item.
• 80% of the clicks come from the top 5 positions.
• More clicks in the 15th position (2.2%) than in the 14th (1.3%).
22. Online results / CTR
Click-through rate by different item splits

Item split       CTR
All items        33.20%
Non-series       35.30%
Series           33.12%
Tail items       26.49%
Popular items    39.05%
New items        52.07%
Old items        33.16%
23. Online results / CTR by usage
Figure: Average CTR vs. number of recommendation requests since the first use of R4U (x-axis ticks: 1, 10, 100; y-axis: 0% to 80%)
24. Online results / Watching behavior
                   Watching Length Ratio    Completed Watching Ratio
Item splits        EPG-Z      R4U           EPG-Z      R4U
Old items          30.02%     42.04%        16.02%     31.03%
New items          21.11%     35.51%        8.01%      23.12%
Popular items      30.81%     44.19%        16.30%     32.27%
Long-tail items    28.01%     38.43%        15.11%     27.66%
Series             31.04%     43.00%        16.92%     31.51%
Non-series         17.94%     15.26%        5.31%      7.22%
All items          29.90%     42.02%        15.91%     30.53%
25. Online results / Watching behavior
• Content selected via R4U is watched about 40% longer and is almost
twice as likely to be completed as content selected the standard way.
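The "40% longer" and "almost twice" figures can be checked against the "All items" row of the watching behavior table (EPG-Z vs. R4U). A short worked calculation, using only those four reported values:

```python
# "All items" row: Watching Length Ratio and Completed Watching Ratio (percent)
wr_epg, wr_r4u = 29.90, 42.02
cwr_epg, cwr_r4u = 15.91, 30.53

wr_gain = wr_r4u / wr_epg - 1   # relative increase in watching length
cwr_factor = cwr_r4u / cwr_epg  # how many times more likely to be completed

print(f"{wr_gain:.1%} longer, {cwr_factor:.2f}x completion")
```

This gives a relative increase of roughly 40.5% in watching length and a completion ratio about 1.92 times that of the standard method.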
26. Online results / Offline vs. Online metrics
• High correlation between Recall/MRR and Completed Watching Ratio
                   Offline                 Online
Item splits        Recall@15    MRR        CWR
Old items          56.98%       0.1070     31.03%
New items          26.15%       0.0405     23.12%
Popular items      57.64%       0.1097     32.27%
Long-tail items    43.61%       0.0712     27.66%
Series             57.91%       0.1094     31.51%
Non-series         24.41%       0.0322     7.22%
All items          55.48%       0.1038     30.53%
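One way to quantify the correlation claimed on this slide is a Pearson coefficient over the seven rows of the table. This is only an illustrative check: the slide does not say how the correlation was measured, and the rows overlap (for example "All items" contains every other split), so they are not independent samples.

```python
import math

# Offline Recall@15 and online CWR per item split, in table order (percent).
recall_15 = [56.98, 26.15, 57.64, 43.61, 57.91, 24.41, 55.48]
cwr = [31.03, 23.12, 32.27, 27.66, 31.51, 7.22, 30.53]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson(recall_15, cwr)
```

The coefficient comes out clearly positive, with the non-series split driving much of it because it is low on both metrics.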
27. Conclusion
• Recommending linear (scheduled) TV content has its own difficulties.
• Metadata-based item modeling (CBF) is quite effective; combining it
with CF brings additional improvement.
• Users prefer the first items; they do not make much effort.
• High click-through rate, especially for new items.
• R4U affects user behavior and satisfaction.
• Content selected via R4U is watched about 40% longer and is almost
twice as likely to be completed as content selected the standard way.
• High correlation between the proposed offline and online metrics.