Optimising digital content delivery




          Tamas Jambor
     University College London
      EPSRC Industrial CASE
Structure of the talk

•   Problem description
•   Features of the data
•   Baseline algorithms
•   Modified algorithms for content delivery
    – Time-aware models
• Evaluating efficient content delivery
• Future work
Background




• Video traffic increasing over the internet
Increased video traffic

• Peak-time traffic slows connection speed
• Delivering videos beforehand
  –   Cheaper to deliver
  –   Reduce peak time traffic
  –   Users can watch content instantly, even on a slow connection
  –   HD content can be delivered, even over a slow connection
Features of the data

• Film Data (views and previews)
   – 1 July 2009 – 31 January 2010
   – 2.3 million entries, 64 000 users, 1300 assets
• Removing inconsistencies
   – Unknown entries
   – Assets whose availability window ends before it starts
• After filtering
   – 1.9 million entries, 64 000 users, 1267 items
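
The filtering step above can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the record fields (`user`, `asset`, `window_start`, `window_end`) and the use of `None` for an unknown entry are assumptions.

```python
from datetime import date

# Hypothetical schema: the slides do not describe the actual record layout.
REQUIRED_FIELDS = ("user", "asset", "window_start", "window_end")


def is_consistent(record):
    """Return True if a record has no unknown fields and a valid window."""
    # Drop records with unknown (missing) entries.
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return False
    # Drop assets whose availability window ends before it starts.
    return record["window_end"] >= record["window_start"]


def filter_records(records):
    """Keep only the consistent records."""
    return [r for r in records if is_consistent(r)]
```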
Training and test sets

• Requirements
  – Every user must have at least one preview or view in the
    training set and at least one view in the test set
  – No previews in the test set
• Training
  – 1 July 2009 – 31 December 2009
  – 1.2 million entries, 26 000 users, 1267 items
• Test
  – 1 January 2010 – 31 January 2010
  – 72 000 entries, 26 000 users, 1267 items
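
A minimal sketch of the split described above, assuming each event is a `(user, asset, kind, day)` tuple with `kind` equal to `"view"` or `"preview"`. The tuple layout and function name are illustrative only.

```python
from datetime import date

SPLIT_DATE = date(2010, 1, 1)  # training: before this date; test: on or after


def split_train_test(events):
    """Split events by date and apply the requirements listed on the slide."""
    train = [e for e in events if e[3] < SPLIT_DATE]
    # No previews in the test set.
    test = [e for e in events if e[3] >= SPLIT_DATE and e[2] == "view"]
    # Keep users with at least one training event and at least one test view.
    eligible = {e[0] for e in train} & {e[0] for e in test}
    train = [e for e in train if e[0] in eligible]
    test = [e for e in test if e[0] in eligible]
    return train, test
```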
Unique features of the dataset

• Implicit feedback carries less information
  – Feedback is expressed before an opinion could be
    formed
     • User might not like the item
  – Implicit feedback recommender systems make
    assumptions on missing rating scores
     • User is not interested
     • User does not know the item
Unique features of the dataset

• Preview information
   – Weak indication of interest
   [Figure: bar chart with groups "Per Item" and "Per User"; series "Purchased within one day" and "Purchased after one day"; vertical axis from 0 to 0.16]
Baseline algorithm

• Implicit SVD
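    $\min_{q,d} \sum_{u,i} w_{u,i} \left( r_{u,i} - q_i^T d_u \right)^2 + \lambda \left( \sum_i \|q_i\|^2 + \sum_u \|d_u\|^2 \right)$

• Fix item or user
    $d_u = \left( Y^T C^u Y + \lambda I \right)^{-1} Y^T C^u r(u)$
  – $Y$ stacks the item vectors $q_i$ as rows; $C^u$ is the diagonal matrix of the weights $w_{u,i}$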
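
The closed-form user update can be sketched in plain Python as below. This is an illustrative implementation for small factor dimensions, not the code used in the talk; the function names and the dense Gaussian-elimination solver are assumptions chosen to keep the example dependency-free. The item update is symmetric, with the roles of users and items swapped.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def update_user(Y, weights, ratings, lam):
    """Return d_u = (Y^T C^u Y + lam I)^{-1} Y^T C^u r(u) for one user.

    Y: item factors, one row q_i per item.
    weights: the w_{u,i} for this user (the diagonal of C^u).
    ratings: the r_{u,i} for this user.
    lam: regularisation strength.
    """
    k = len(Y[0])
    a = [[sum(w * q[r] * q[c] for q, w in zip(Y, weights))
          + (lam if r == c else 0.0)
          for c in range(k)] for r in range(k)]
    b = [sum(w * q[r] * x for q, w, x in zip(Y, weights, ratings))
         for r in range(k)]
    return solve(a, b)
```

Because each user's update depends only on the fixed item factors, the calls to `update_user` are independent of one another and can be run in parallel.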
Baseline algorithm

• Advantage of this approach
  – Task can be divided into independent chunks (user/item)
  – Scalable solution
  – It can be computed in a parallel fashion
• Weights
  – Additional information / assumptions about the data
Weights

• A weight can be assigned to each user-item pair
  – Previews
      $w_{u,i} = \alpha P(t \mid p, u) + (1 - \alpha) P(t \mid p, i)$
     • Items that have been previewed are more likely to be watched
  – Confidence decay in time
      $w_{u,i} = e^{-(t - t_r)}$
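
The two weighting schemes can be sketched as follows. The mixing parameter `alpha`, the decay `rate` and the function names are assumptions for illustration; the slide does not give their values.

```python
import math


def preview_weight(p_user, p_item, alpha=0.5):
    """Blend user-level and item-level probabilities of a view after a preview.

    p_user stands for P(t | p, u) and p_item for P(t | p, i);
    alpha in [0, 1] is an assumed mixing parameter.
    """
    return alpha * p_user + (1.0 - alpha) * p_item


def decay_weight(t, t_r, rate=1.0):
    """Exponentially decaying confidence exp(-rate * (t - t_r)).

    t is the current time and t_r the time of the feedback, in the same
    units; the rate parameter is an assumption (rate=1 gives e^{-(t - t_r)}).
    """
    return math.exp(-rate * (t - t_r))
```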
Popular items

Title                                                  Frequency   Avg(days)   SD(days)   Available (days)
I Now Pronounce You Chuck & Larry (PictureBox)         4469        8.30        8.29       28.00
Curious George: A Very Monkey Christmas (PictureBox)   3753        8.73        7.21       31.00
Kingdom                                                3709        8.96        8.05       28.00
Santa Claus (PictureBox)                               3654        3.37        2.72       18.00
Munster's Scary Little Christmas (PictureBox)          3654        8.38        8.09       28.00
Inside Man (PictureBox)                                3530        9.31        8.35       28.00
Step Up (PictureBox)                                   3326        9.05        8.40       28.00
Wiz                                                    3291        14.29       12.04      41.46
Smokin' Aces (PictureBox)                              3253        7.68        7.64       28.00
Break-Up                                               3203        9.32        7.84       27.96
Jarhead (PictureBox)                                   3041        8.84        7.90       28.00
Stealing Christmas (PictureBox)                        3026        3.69        3.03       18.00
Hangover                                               3006        11.10       6.88       26.56
Viewing habits
   [Figure: number of views by date for "Patch Adams" and "Elizabeth - The Golden Age"; vertical axis from 0 to 70]
Viewing habits


• Viewing behaviour
  – During the day
     • Differentiate who is watching
  – During the week
     • Weekends/weekdays
  – Categories
     • Some content is likely to be watched at specific times
Viewing habits
• Gaussian CDF: $\Phi(t, \mu, \sigma) = \frac{1}{2} \left[ 1 + \operatorname{erf}\left( \frac{t - \mu}{\sqrt{2\sigma^2}} \right) \right]$
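
The Gaussian CDF can be evaluated directly with the error function from Python's standard library. The function name is illustrative.

```python
import math


def gaussian_cdf(t, mu, sigma):
    """Gaussian cumulative distribution function evaluated at t."""
    return 0.5 * (1.0 + math.erf((t - mu) / math.sqrt(2.0 * sigma ** 2)))
```

For example, with `t` as the hour of day and a category whose viewing is centred on `mu = 20` with `sigma = 2`, `gaussian_cdf(20, 20, 2)` returns 0.5.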
Prediction

• For known items
      $r_{c,t} = r_b + \Phi(t_d, \mu_{c,d}, \sigma_{c,d}) + \Phi(t_w, \mu_{c,w}, \sigma_{c,w})$
  – Baseline prediction
  – Daily Gaussian distribution for category
  – Weekly Gaussian distribution for category
• For new items
      $r_{c,t} = r_c + \Phi(t_d, \mu_{c,d}, \sigma_{c,d}) + \Phi(t_w, \mu_{c,w}, \sigma_{c,w})$
  – Prediction for the category
  – Daily Gaussian distribution for category
  – Weekly Gaussian distribution for category
Evaluation method
• Top-N hit rate: $l_u = \frac{h_u}{v_u}$
  – $h_u$ = number of assets watched that are also in the top-N recommendations
  – $v_u$ = total number of assets watched
• Overall performance: $l = \frac{1}{M} \sum_{i=1}^{M} l_i$
  – Average performance across all users (M)
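
A minimal sketch of the hit-rate computation, assuming recommendations and views are available as lists of asset identifiers per user. The dictionary layout and function names are illustrative.

```python
def hit_rate(recommended, watched):
    """Per-user top-N hit rate l_u = h_u / v_u."""
    watched = set(watched)
    if not watched:
        return 0.0
    hits = len(watched & set(recommended))  # h_u
    return hits / len(watched)              # divided by v_u


def overall_hit_rate(recs_by_user, views_by_user):
    """Average of the per-user hit rates over the M users with test views."""
    rates = [hit_rate(recs_by_user.get(user, []), views)
             for user, views in views_by_user.items()]
    return sum(rates) / len(rates) if rates else 0.0
```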
Results: Top-15 Performance
   [Figure: Top-15 hit rate (left axis, 0 to 0.25) and number of users (right axis, 0 to 9000) for the groups 500--Above, 200--500, 100--200, 50--100, 20--50, 10--20, 5--10, 1--5 and All]
Efficient caching
   [Diagram: Content Provider, WCC, STB]

• Pre-cache items that are predicted to be relevant
  –   Cheaper to deliver
  –   Reduce peak time traffic
  –   Users can watch content instantly, even on a slow connection
  –   HD content can be delivered, even over a slow connection
Predictive caching

• CONTENT
  1.   Assets
  2.   Size
  3.   Schedule (window start/end)
  4.   Category
• CUSTOMERS
  1.   View History (time)
• MODELS
  1.   Personalised Top-N
  2.   Popular items
  3.   Marketing suggestions
• CACHE LIST
  – Cost per customer
  – Overall cost
Cost function


    $c_{all} = c_{be} \times n_{be} + c_{af} \times n_{af}$
• Cost of delivering best effort (BE)
• Cost of delivering in real time (AF)
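
The cost function translates directly into code. The function name is illustrative, and the counts and unit prices are passed in as plain numbers.

```python
def total_cost(n_be, n_af, c_be, c_af):
    """Total delivery cost: c_all = c_be * n_be + c_af * n_af.

    n_be: number of items delivered best effort (BE, in advance).
    n_af: number of items delivered in real time (AF).
    c_be, c_af: unit cost of each delivery method.
    """
    return c_be * n_be + c_af * n_af
```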
Assumptions of the model

• Two (or more) different prices for different
  delivery methods
• Fixed line speed
• Simplified markets
• Ignore network infrastructure
Preliminary Evaluation

• Hit rate
   – Not sensitive to sparsity
   – Good to measure performance
• Precision
   – Sensitive to sparsity and relevant items
Results: Hit rate
   [Figure: hit rate (vertical axis, 0 to 0.3) against number of retrieved items (horizontal axis, 1 to 46)]
Results: Average precision
   [Figure: average precision (vertical axis, 0 to 0.0018) against number of retrieved items (horizontal axis, 1 to 46)]
Sparse data
   [Figure: average views in January 2010 (vertical axis, 0 to 0.3) against profile size (horizontal axis, 0 to 225)]
Sparse data – how many items to upload

• Non-personalised
  – Upload frequency varies from once a day to once
    a month
• Personalised
  – How many items the user watched recently
Predictive caching

• Error I:
   – Predict the number of items the user will watch
      • Control the maximum number of items cached
• Error II:
   – Prediction accuracy
      • Only predict for less risky users
Maximum number of items cached


    $n_{u,be} \le \frac{c_{af} \times v_u}{c_{be}}$
• Example
  – User will watch 5 items in the coming month (predicted)
  – Deliver in real time (AF): £0.70
  – Deliver beforehand (BE): £0.30
    $n_{u,be} \le \frac{0.70 \times 5}{0.30} \approx 11.67$
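
The bound on the number of cached items can be sketched as below. Rounding down to a whole number of items is an assumption of this sketch; the slide states only the ratio.

```python
import math


def max_cached_items(predicted_views, c_be, c_af):
    """Largest whole number of items n_{u,be} with n_{u,be} <= c_af * v_u / c_be."""
    return math.floor(c_af * predicted_views / c_be)
```

With the prices from the example (AF £0.70, BE £0.30) and 5 predicted views, the bound of about 11.67 rounds down to 11 items.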
Performance


    $l_u = \frac{h_{u,be}}{n_{u,be}}$
  – $h_{u,be}$: hits on cached items
  – $n_{u,be}$: number of items cached
• Overall performance
    $l = \frac{\sum_{i=1}^{N} h_{i,be}}{\sum_{j=1}^{M} n_{j,be}}$
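
A short sketch of the two performance measures, assuming the per-user results are available as `(hits, cached)` pairs. The function names and the pair layout are illustrative.

```python
def user_performance(hits, cached):
    """Per-user performance l_u = h_{u,be} / n_{u,be}."""
    return hits / cached if cached else 0.0


def overall_performance(per_user):
    """Total hits on cached items divided by total items cached.

    per_user: iterable of (hits, cached) pairs, one per user.
    """
    pairs = list(per_user)
    total_hits = sum(h for h, _ in pairs)
    total_cached = sum(n for _, n in pairs)
    return total_hits / total_cached if total_cached else 0.0
```

Note that the overall figure pools hits and cached items across users rather than averaging the per-user ratios, so users with larger caches carry more weight.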
Performance of the system

    $l \ge \frac{c_{be}}{c_{af}}$

• To save on cost compare
  – The performance of the system
  – Ratio between the two delivery methods
Example

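  – Performance
     • 3 hits on 5 delivered items, 2 items streamed
       $l_u = \frac{h_{u,be}}{n_{be}} = \frac{3}{5} = 0.6$
     • Deliver in real time (AF): £0.70
     • Deliver beforehand (BE): £0.30
       $l \ge \frac{c_{be}}{c_{af}} = \frac{0.3}{0.7} \approx 0.43$
  – Cost
       $c_{all} = c_{be} \times n_{be} + c_{af} \times n_{af} = 2 \times 0.7 + 5 \times 0.3 = 2.9$
     • Less than streaming only: the 5 watched items would cost 5 × 0.7 = 3.5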
Evaluation II

• Upload ratio: $\frac{n_{be}}{v} \le \frac{c_{af}}{c_{be}}$
     • Number of items cached
     • Example (caf=£0.7, cbe=£0.3): for every watched item we can
       cache at most 2.3 items
• Upload hits: $\frac{h_{be}}{n_{be}} \ge \frac{c_{be}}{c_{af}}$
     • Performance of the model
     • Example (caf=£0.7, cbe=£0.3): for every cached item we need at
       least 0.43 hits
• If both conditions are satisfied, a cost saving is guaranteed
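
The two conditions and the resulting saving can be checked with a few lines of code. This sketch assumes that every watched item not found in the cache is streamed at the AF price, as in the worked example earlier in the talk; the function names are illustrative.

```python
def upload_ratio_ok(n_be, v, c_be, c_af):
    """Upload ratio condition: n_be / v <= c_af / c_be."""
    return n_be / v <= c_af / c_be


def upload_hits_ok(h_be, n_be, c_be, c_af):
    """Upload hits condition: h_be / n_be >= c_be / c_af."""
    return h_be / n_be >= c_be / c_af


def saving(h_be, n_be, v, c_be, c_af):
    """Streaming-only cost minus the cost with caching (positive = saving)."""
    streaming_only = c_af * v
    with_cache = c_be * n_be + c_af * (v - h_be)
    return streaming_only - with_cache
```

For 3 hits on 5 cached items and 5 watched items at AF £0.70 and BE £0.30, both conditions hold and the saving is £0.60 (3.5 against 2.9).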
Results – Combining personalised and non-personalised recommenders
   [Figure: upload hits (vertical axis, 0 to 0.02) against the personalised vs popular mix (horizontal axis, 0 to 1)]
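
One way such a mix could be formed is sketched below. The slides do not say how the two recommenders are combined, so the scheme here is purely an assumption: a `share` between 0 and 1 sets the fraction of the cache list taken from the personalised ranking, and the rest is filled from the popularity ranking.

```python
def blended_cache_list(personalised, popular, size, share):
    """Build a cache list of up to `size` items from two rankings.

    Takes round(share * size) items from the personalised ranking, then
    fills the remaining slots from the popularity ranking, skipping
    items that are already chosen.
    """
    n_personal = round(share * size)
    chosen = list(personalised[:n_personal])
    for item in popular:
        if len(chosen) >= size:
            break
        if item not in chosen:
            chosen.append(item)
    return chosen
```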
Unique characteristics of the system

• Recommender algorithm
  – Low risk approach
  – No prediction is made when it is unlikely to be correct
• Caching strategy
  – Only for users who will use the system
  – Predict the number of items to be uploaded
Future work

•   Test the system on other datasets
•   Redefine baseline algorithm
•   Availability might influence choice
•   Adaptive temporal approach
    – Controlling the update of the system
       • How much data is flowing in
       • How much performance loss the system expects

Optimising digital content delivery

  • 1. Optimising digital content delivery Tamas Jambor University College London EPSRC Industrial CASE
  • 2. Structure of the talk • Problem description • Features of the data • Baseline algorithms • Modified algorithms for content delivery – Time-aware models • Evaluating efficient content delivery • Future work
  • 3. Background • Video traffic increasing over the internet
  • 4. Increased video traffic • Peak-time traffic slows connection speed • Delivering videos beforehand – Cheaper to deliver – Reduce peak time traffic – User can watch content instantly (slow connection) – HD content can be delivered (slow connection)
  • 5. Features of the data • Film Data (views and previews) – 1 July 2009 – 31 January 2010 – 2.3 million entries, 64 000 users, 1300 assets • Removing inconsistencies – Unknown entries – Assets end earlier than assets start • After filtering – 1.9 million entries, 64 000 users, 1267 items
  • 6. Training and test sets • Requirements – Any user has to have at least one preview or view in the training and one view in the test – No previews in the test • Training – 1 July 2009 – 31 December 2009 – 1.2 million entries, 26 000 users, 1267 items • Test – 1 January 2010 – 31 January 2010 – 72000 entries, 26 000 users, 1267 items
  • 7. Unique features of the dataset • Implicit feedback carries less information – Feedback is expressed before an opinion could be formed • User might not like the item – Implicit feedback recommender systems make assumptions on missing rating scores • User is not interested • User does not know the item
  • 8. Unique features of the dataset • Preview information – Weak indication of interest 0.16 0.14 0.12 0.1 Purchased after one 0.08 day 0.06 Purchased within one day 0.04 0.02 0 Per Item Per User
  • 9. Baseline algorithm • Implicit SVD $\min_{q,d} \sum_{u,i} w_{u,i} (r_{u,i} - q_i^T d_u)^2 + \lambda \left( \sum_i \|q_i\|^2 + \sum_u \|d_u\|^2 \right)$ • Fix item or user $d_u = (Y^T C^u Y + \lambda I)^{-1} Y^T C^u r(u)$
  • 10. Baseline algorithm • Advantage of this approach – Task can be divided into independent chunks (user/item) – Scalable solution – It can be computed in parallel • Weights – Additional information / assumptions about the data
  • 11. Weights • A weight can be assigned to each user-item pair – Previews $w_{u,i} = \alpha P(t \mid p, u) + (1 - \alpha) P(t \mid p, i)$ • Items that were previewed before are more likely to be watched – Confidence decay in time $w_{u,i} = e^{-(t - t_r)}$
  • 12. Popular items
      Title                                                  Frequency  Avg (days)  SD (days)  Available (days)
      I Now Pronounce You Chuck & Larry (PictureBox)         4469        8.30        8.29      28.00
      Curious George: A Very Monkey Christmas (PictureBox)   3753        8.73        7.21      31.00
      Kingdom                                                3709        8.96        8.05      28.00
      Santa Claus (PictureBox)                               3654        3.37        2.72      18.00
      Munster's Scary Little Christmas (PictureBox)          3654        8.38        8.09      28.00
      Inside Man (PictureBox)                                3530        9.31        8.35      28.00
      Step Up (PictureBox)                                   3326        9.05        8.40      28.00
      Wiz                                                    3291       14.29       12.04      41.46
      Smokin' Aces (PictureBox)                              3253        7.68        7.64      28.00
      Break-Up                                               3203        9.32        7.84      27.96
      Jarhead (PictureBox)                                   3041        8.84        7.90      28.00
      Stealing Christmas (PictureBox)                        3026        3.69        3.03      18.00
      Hangover                                               3006       11.10        6.88      26.56
  • 13. Viewing habits [Figure: number of views by date for Patch Adams and Elizabeth - The Golden Age]
  • 14. Viewing habits • Viewing behaviour – During the day • Differentiate who is watching – During the week • Weekends/weekdays – Categories • Some content is likely to be watched at specific times
  • 15. Viewing habits • Gaussian CDF $\Phi(t, \mu, \sigma) = \frac{1}{2} \left[ 1 + \operatorname{erf}\left( \frac{t - \mu}{\sqrt{2 \sigma^2}} \right) \right]$
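The Gaussian CDF named on this slide can be evaluated with the standard error function. The sketch below is only an illustration: the function name `gaussian_cdf` and the example values (an evening peak at 20:00 with a two-hour spread) are assumptions, not values from the talk.

```python
import math


def gaussian_cdf(t, mu, sigma):
    """Gaussian CDF written with erf: 0.5 * (1 + erf((t - mu) / sqrt(2 * sigma^2)))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * (1.0 + math.erf((t - mu) / math.sqrt(2.0 * sigma ** 2)))


if __name__ == "__main__":
    # Hypothetical category whose viewing is centred on 20:00 (mu) with a 2-hour spread (sigma).
    for hour in (16.0, 20.0, 24.0):
        print(hour, gaussian_cdf(hour, mu=20.0, sigma=2.0))
```

At `t = mu` the function returns one half, and it increases monotonically in `t`, which is what makes it usable as a time-dependent factor in the prediction that follows.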
  • 16. Prediction • For known items $r_{c,t} = r_b \, \Phi(t_d, \mu_{c,d}, \sigma_{c,d}) \, \Phi(t_w, \mu_{c,w}, \sigma_{c,w})$ – Baseline prediction – Daily Gaussian distribution for category – Weekly Gaussian distribution for category • For new items $r_{c,t} = r_c \, \Phi(t_d, \mu_{c,d}, \sigma_{c,d}) \, \Phi(t_w, \mu_{c,w}, \sigma_{c,w})$ – Prediction for the category – Daily Gaussian distribution for category – Weekly Gaussian distribution for category
  • 17. Evaluation method • Top-N Hit rate $l_u = \frac{h_u}{v_u}$ – $h_u$ = number of assets that were both watched and in the top-N recommended list – $v_u$ = total number of assets watched • Overall performance $l = \frac{1}{M} \sum_{i=1}^{M} l_i$ – Average performance across all users (M)
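The top-N hit rate described on this slide can be sketched in a few lines. The function names and the toy data are hypothetical, and skipping users with no watched assets is an assumption, since the talk does not say how such users are handled.

```python
def hit_rate(watched, recommended, n):
    """Per-user hit rate: watched assets found in the top-n list, divided by all watched assets."""
    watched_set = set(watched)
    if not watched_set:
        return None  # assumption: undefined for users who watched nothing
    top_n = set(recommended[:n])
    return len(watched_set & top_n) / len(watched_set)


def overall_hit_rate(per_user_rates):
    """Average of the per-user hit rates, ignoring undefined (None) entries."""
    rates = [r for r in per_user_rates if r is not None]
    return sum(rates) / len(rates) if rates else 0.0


if __name__ == "__main__":
    # Hypothetical users: (assets watched in the test period, ranked recommendations).
    users = [
        (["a", "b", "c", "d"], ["a", "x", "c", "y"]),
        (["e"], ["e", "f", "g", "h"]),
    ]
    rates = [hit_rate(w, r, n=3) for w, r in users]
    print(rates, overall_hit_rate(rates))
```

Averaging per-user rates gives every user equal weight regardless of how much they watched, which matches the "average performance across all users" wording on the slide.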
  • 18. Results: Top-15 Performance [Figure: top-15 hit rate and number of users for each group: 500--Above, 200--500, 100--200, 50--100, 20--50, 10--20, 5--10, 1--5, All]
  • 19. Efficient caching [Diagram: Content Provider, WCC, STB] • Pre-cache items that are predicted to be relevant – Cheaper to deliver – Reduce peak time traffic – User can watch content instantly (slow connection) – HD content can be delivered (slow connection)
  • 20. Predictive caching • CONTENT – 1. Assets 2. Size 3. Schedule (window start/end) 4. Category • CUSTOMERS – 1. View History (time) • MODELS – 1. Personalised Top-N 2. Popular items 3. Marketing suggestions • CACHE LIST – Cost per customer – Overall cost
  • 21. Cost function $c_{all} = c_{be} \cdot n_{be} + c_{af} \cdot n_{af}$ • Cost of delivering best effort (BE) • Cost of delivering in real time (AF)
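As a minimal sketch, the cost function is a weighted sum of the items delivered by each method. The function name `total_cost` is hypothetical; the prices and counts are the example values used later in the talk (5 items pre-delivered at £0.30, 2 items streamed at £0.70).

```python
def total_cost(n_be, n_af, c_be, c_af):
    """Total delivery cost: best-effort (BE) items plus real-time (AF) items."""
    return c_be * n_be + c_af * n_af


if __name__ == "__main__":
    # 5 items pre-delivered at 0.30 each, 2 items streamed at 0.70 each.
    print(f"{total_cost(n_be=5, n_af=2, c_be=0.30, c_af=0.70):.2f}")
```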
  • 22. Assumptions of the model • Two (or more) different prices for different delivery methods • Fixed line speed • Simplified markets • Network infrastructure is ignored
  • 23. Preliminary Evaluation • Hit rate – Not sensitive to sparsity – Good to measure performance • Precision – Sensitive to sparsity and relevant items
  • 24. Results: Hit rate [Figure: hit rate against number of retrieved items]
  • 25. Results: Average precision [Figure: average precision against number of retrieved items]
  • 26. Sparse data [Figure: average views (January 2010) against profile size]
  • 27. Sparse data – how many items to upload • Non-personalised – Upload frequency varies from once a day to once a month • Personalised – How many items the user watched recently
  • 28. Predictive caching • Error I: – Predict the number of items the user will watch • Control the maximum number of items cached • Error II: – Prediction accuracy • Only predict for less risky users
  • 29. Maximum number of items cached $n_{u,be} = \frac{c_{af} \cdot v_u}{c_{be}}$ • Example – User will watch 5 items in the coming month (predicted) – Deliver in real time (AF): £0.70 – Deliver beforehand (BE): £0.30 $n_{u,be} = \frac{0.70 \times 5}{0.30} \approx 11.66$
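The cap on cached items can be checked with a short sketch. The function name is hypothetical, and rounding down to whole items is an assumption added here because a fractional item cannot be cached; the slide itself reports the unrounded ratio.

```python
import math


def max_cached_items(predicted_views, c_af, c_be):
    """Upper bound on pre-delivered items: c_af * v_u / c_be."""
    if c_be <= 0:
        raise ValueError("best-effort cost must be positive")
    return c_af * predicted_views / c_be


if __name__ == "__main__":
    bound = max_cached_items(predicted_views=5, c_af=0.70, c_be=0.30)
    # Assumption: only whole items can be cached, so round the bound down.
    print(bound, math.floor(bound))
```

Read this way, the bound is the point at which pre-delivery alone costs as much as streaming every predicted view, so caching more items than this cannot pay off even if every prediction is right.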
  • 30. Performance $l_u = \frac{h_{u,be}}{n_{u,be}}$ – Hits on cached items – Number of items cached • Overall performance $l = \frac{\sum_{i=1}^{N} h_{i,be}}{\sum_{j=1}^{M} n_{j,be}}$
  • 31. Performance of the system $l \ge \frac{c_{be}}{c_{af}}$ • To save on cost, compare – The performance of the system – The cost ratio between the two delivery methods
  • 32. Example – Performance • 3 hits on 5 delivered items, 2 items streamed $l_u = \frac{h_{u,be}}{n_{be}} = \frac{3}{5} = 0.6$ • Deliver in real time (AF): £0.70 • Deliver beforehand (BE): £0.30 $l \ge \frac{c_{be}}{c_{af}} = \frac{0.3}{0.7} \approx 0.42$ – Cost $c_{all} = c_{be} \cdot n_{be} + c_{af} \cdot n_{af} = 5 \times 0.3 + 2 \times 0.7 = 2.9$ • (expected to be less than streaming only)
  • 33. Evaluation II • Upload ratio $\frac{n_{be}}{v} \le \frac{c_{af}}{c_{be}}$ • Number of items cached • Example ($c_{af}$ = £0.7, $c_{be}$ = £0.3): for every watched item we can cache at most 2.3 items • Upload hits $\frac{h_{be}}{n_{be}} \ge \frac{c_{be}}{c_{af}}$ • Performance of the model • Example ($c_{af}$ = £0.7, $c_{be}$ = £0.3): for every cached item we need at least 0.42 hits • If both are satisfied, cost saving is guaranteed
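The two conditions can be compared against the actual costs in a small sketch. The function name and returned keys are hypothetical. It assumes that every watched item that was not a cache hit is streamed at the real-time (AF) price, as in the earlier example with 3 hits on 5 delivered items and 2 items streamed.

```python
def evaluate_caching(views, cached, hits, c_af, c_be):
    """Compare streaming-only cost with caching cost and check both ratio conditions."""
    if views <= 0 or cached <= 0:
        raise ValueError("views and cached must be positive")
    if hits > min(views, cached):
        raise ValueError("hits cannot exceed views or cached items")
    streaming_only = c_af * views
    # Assumption: watched items that miss the cache are streamed at the AF price.
    with_caching = c_be * cached + c_af * (views - hits)
    return {
        "upload_ratio_ok": cached / views <= c_af / c_be,
        "upload_hits_ok": hits / cached >= c_be / c_af,
        "streaming_only": streaming_only,
        "with_caching": with_caching,
        "saves_cost": with_caching < streaming_only,
    }


if __name__ == "__main__":
    # Values from the talk's example: 5 views, 5 cached items, 3 hits.
    for key, value in evaluate_caching(5, 5, 3, c_af=0.70, c_be=0.30).items():
        print(key, value)
```

Computing both totals directly, rather than relying on the ratios alone, makes it easy to see how close a given user is to break-even.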
  • 34. Results – Combining personalised and non-personalised recommenders [Figure: upload hits (0 to 0.02) against the personalised vs popular weighting (0 to 1)]
  • 35. Unique characteristics of the system • Recommender algorithm – Low risk approach – No prediction if it is not likely to get it right • Caching strategy – Only for users who will use the system – Predict the number of items to be uploaded
  • 36. Future work • Test the system on other datasets • Redefine baseline algorithm • Availability might influence choice • Adaptive temporal approach – Controlling the update of the system • How much data is flowing in • How much performance loss the system expects