Music Video Redundancy and Half-Life in YouTube

Matthias Prellwitz and Michael L. Nelson
  matthias.prellwitz@googlemail.com
           mln@cs.odu.edu



                                             TPDL 2011
                                       Berlin, Germany
                                                9/26/11
Linking to a particular copy
“Rolling Stones - Satisfaction”




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson          2
Metadata lost
when YouTube video disappears

                                      video title        The Rolling Stones – Satisfaction
                                      url                http://www.youtube.com/watch?v=214szPQBUYc




Metadata hard to recover from
Search Engines




But nearly 300 copies remain in YouTube




Linking music-related URIs

‣ Transparent URI semantics
  ‣ http://www.last.fm/music/John+Lennon/_/Imagine
  ‣ http://www.ilike.com/artist/The+Cribs/track/I%27m+A+Realist
  ‣ http://www.last.fm/music/Johnny+Cash/_/Highwayman
‣ Opaque URI semantics
  ‣ http://vids.myspace.com/index.cfm?fuseaction=vids.individual&videoid=5168491
  ‣ http://www.youtube.com/watch?v=VST2KKIYn50
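
The distinction matters once metadata has to be recovered from the link alone: a transparent URI still carries artist and track, an opaque one carries only an identifier. A minimal Python sketch, assuming the Last.fm path pattern `/music/<artist>/_/<track>` seen in the examples above (the function name is hypothetical):

```python
from urllib.parse import urlsplit, unquote_plus


def parse_lastfm_track(uri):
    """Return (artist, track) from a Last.fm track URI, or None if the path does not match."""
    parts = urlsplit(uri).path.split("/")
    # Expected path segments: '', 'music', <artist>, '_', <track>
    if len(parts) == 5 and parts[1] == "music" and parts[3] == "_":
        return unquote_plus(parts[2]), unquote_plus(parts[4])
    return None


print(parse_lastfm_track("http://www.last.fm/music/John+Lennon/_/Imagine"))
print(parse_lastfm_track("http://www.youtube.com/watch?v=VST2KKIYn50"))
```

The opaque YouTube URI yields nothing, which is why its metadata has to be preserved elsewhere while the video is still online.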




Popular Music
US Top 40 Singles Charts of 9/25/10




Popular Music
Selected Music Blogs




Popular Music
The 500 Greatest Songs of all Time




Total Result Size Range
US Top 40 Singles Charts of 9/25/10

‣ Largest: Lady Gaga, "Alejandro" (result sizes: 123,239; 83,298; 43,945)
‣ Smallest: Selena Gomez & The Scene, "A Year Without Rain" (result sizes: 66; 35; 26)




Total Result Size Range
Selected Music Blogs

‣ Largest: Lady Gaga, "Bad Romance" (result sizes: 264,753; 256,205; 232,936)
‣ Smallest: Mariah Carey featuring Juelz Santana & Bone Thugs-n-Harmony, "Don't Forget About Us" (result sizes: 0; 0; 0)




Total Result Size Range
The 500 Greatest Songs of all Time

‣ Largest: Michael Jackson, "Billie Jean" (result sizes: 174,088; 162,937; 145,076)
‣ Smallest: The Isley Brothers, "That Lady (Part 1 and 2)" (result sizes: 0; 0; 0)




URI Unavailability
Rooted from a selected collection




URI Unavailability
Expected Half-life
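
A common way to turn an observed loss rate into a half-life is to assume exponential decay. Whether the estimate on this slide was computed that way is not stated, so the sketch below is an assumption, and the sample input is illustrative only, not a result from the study:

```python
import math


def half_life(remaining_fraction, elapsed_days):
    """Half-life in days, assuming exponential decay N(t) = N0 * 0.5 ** (t / T)."""
    if not 0.0 < remaining_fraction < 1.0:
        raise ValueError("remaining_fraction must be strictly between 0 and 1")
    # Solve remaining_fraction = 0.5 ** (elapsed_days / T) for T.
    return elapsed_days * math.log(0.5) / math.log(remaining_fraction)


# Illustrative input only: 80% of the URIs still available after 365 days.
print(round(half_life(0.8, 365)))
```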




URI Publication and Removal Rate




Lifetimes of unavailable videos




[Figure: lifetimes of unavailable videos, with panels labeled "Years" and "Month"]


Reasons for unavailable videos




When a YouTube video disappears

‣ video title The Rolling Stones - Satisfaction
‣ url         http://www.youtube.com/watch?v=214szPQBUYc
‣ Published 2009-06-13 13:44                 Removed 2010-04-09 (300 days online)




     HTTP/1.1 404 Not Found
     Content-Type: text/html; charset=utf-8
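
The "300 days online" figure follows directly from the publication and removal dates. A minimal Python sketch of that arithmetic (the function name is hypothetical, and the time of day is ignored):

```python
from datetime import date


def days_online(published, removed):
    """Number of whole days between publication and removal."""
    return (removed - published).days


print(days_online(date(2009, 6, 13), date(2010, 4, 9)))  # 300
```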




Metadata purged from YouTube Databases

‣ Video feed
     curl -I "http://gdata.youtube.com/feeds/api/videos/214szPQBUYc"

     HTTP/1.1 404 Not Found
     Content-Type: text/html; charset=UTF-8
     Private video


‣ Related videos
     curl -I "http://gdata.youtube.com/feeds/api/videos/214szPQBUYc/related"

     HTTP/1.1 404 Not Found
     Content-Type: text/html; charset=UTF-8
     Parent Video not found


‣ Video comments
     curl -I "http://gdata.youtube.com/feeds/api/videos/214szPQBUYc/comments"

     HTTP/1.1 200 OK
     Content-Type: application/atom+xml; charset=UTF-8
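
The three `curl -I` checks above can be scripted. A minimal Python sketch, assuming the GData endpoints exactly as shown on the slide (that API has since been retired, so live requests are not expected to succeed today); the helper names are hypothetical:

```python
import urllib.error
import urllib.request

GDATA_BASE = "http://gdata.youtube.com/feeds/api/videos/"


def feed_urls(video_id):
    """Build the video, related-videos and comments feed URLs for one video ID."""
    base = GDATA_BASE + video_id
    return {
        "video": base,
        "related": base + "/related",
        "comments": base + "/comments",
    }


def head_status(url, timeout=10):
    """Send a HEAD request (like `curl -I`) and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises 4xx/5xx responses as HTTPError; the code is still useful here.
        return error.code


for name, url in feed_urls("214szPQBUYc").items():
    print(name, url)
```

Calling `head_status(url)` on each entry would reproduce the status lines above while the service was still running.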
Metadata Normalization




                                                              Dereferencing ASIN via amazon.com Webservice:
                                                              Artist: Michael Jackson
                                                              Title:  Billie Jean (Single Version)




Availability of music-related metadata

‣ parsed out only the first time a URI showed up in the result list
  ‣ YouTube crawling restrictions




‣ Remaining portion
  ‣ query video title against music related services via search engines
    ‣ Google/Yahoo! with site parameter www.last.fm/music
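
The site-restricted lookup can be sketched as follows. This is a minimal Python illustration, assuming the search engine's `site:` operator and the public Google search URL; the function names are hypothetical, and result parsing is left out:

```python
from urllib.parse import urlencode


def site_query(video_title, site="www.last.fm/music"):
    """Combine a free-form video title with a site restriction."""
    return f"site:{site} {video_title}"


def google_search_url(video_title, site="www.last.fm/music"):
    """Build a search URL for the site-restricted query."""
    return "https://www.google.com/search?" + urlencode({"q": site_query(video_title, site)})


print(site_query("The Rolling Stones - Satisfaction"))
print(google_search_url("The Rolling Stones - Satisfaction"))
```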


Retrieving and preserving
a video’s metadata

‣ Active preservation
     attempt once a video copy is available
     ‣ Parse the HTML for structured music-related metadata
       ‣ YouTube-generated metadata
       ‣ AmazonMP3 affiliate link
       ‣ search engines: query the free-form video title against music-related websites

‣ Preserving metadata into the public web infrastructure
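
When no structured metadata is present, the free-form video title is the only starting point. A minimal Python sketch of a first normalization step, assuming titles follow an "Artist - Title" convention with a hyphen or dash as separator (an assumption, since many titles do not); the function name is hypothetical:

```python
import re

# Hyphen, en dash or em dash surrounded by whitespace.
_SEPARATOR = re.compile(r"\s+[-\u2013\u2014]\s+")


def split_video_title(video_title):
    """Return (artist, title) if the video title has one clear separator, else None."""
    parts = _SEPARATOR.split(video_title.strip(), maxsplit=1)
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    return None


print(split_video_title("The Rolling Stones - Satisfaction"))
print(split_video_title("Satisfaction"))
```

Titles that return `None` would fall back to the search-engine query against music-related websites.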




Preservation Prototype




Metadata preservation
Example: twitter




Pointing to a Resolver service

‣ http://ytresolve.cs.odu.edu/r/http://www.youtube.com/watch?v=214szPQBUYc/


‣ Author-side approach
  ‣ content creator points directly to a resolver service
‣ Server-side approach
  ‣ Plugin/Renderer class automatically rewrites YouTube video watch URIs to resolver service
‣ Client-side approach
  ‣ Web browser plugin intercepts clicks on YouTube video watch URIs and redirects to resolver service
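
The server-side rewrite can be sketched in a few lines. This is a minimal Python illustration, assuming the resolver prefix shown above and 11-character video IDs (an assumption based on the example ID); the function name and regular expression are hypothetical, not the plugin's actual code:

```python
import re

RESOLVER_PREFIX = "http://ytresolve.cs.odu.edu/r/"

# Match watch URIs that are not already behind the resolver prefix.
_WATCH_URI = re.compile(r"(?<!/r/)https?://www\.youtube\.com/watch\?v=[\w-]{11}")


def rewrite_watch_uris(html):
    """Prefix every YouTube watch URI in the text with the resolver service."""
    return _WATCH_URI.sub(lambda match: RESOLVER_PREFIX + match.group(0), html)


page = '<a href="http://www.youtube.com/watch?v=214szPQBUYc">Satisfaction</a>'
print(rewrite_watch_uris(page))
```

The negative lookbehind keeps the rewrite idempotent, so running it twice over the same page does not stack prefixes.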




YouTube Resolver service

      http://www.youtube.com/watch?v=214szPQBUYc
      http://www.youtube.com/v/214szPQBUYc
      http://www.youtu.be/214szPQBUYc
      http://www.youtube.com/user/WEASELxLOVER#p/a/u/2/214szPQBUYc



HTTP status code of the requested URI
‣ HTTP/1.1 200 OK or HTTP/1.1 303 See Other *)
  ‣ redirect
‣ HTTP/1.1 404 Not Found
  ‣ search for preserved metadata
    ‣ in list of designated accounts
  ‣ extract best available granularity
  ‣ query YouTube API with those
  ‣ provide (and evaluate) alternative copies

*) http://www.youtube.com/verify_controversy...
   http://www.youtube.com/verify_age...
   https://www.google.com/accounts/ServiceLogin...
   http://www.youtube.com/das_captcha..
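
Before any status check, the resolver has to reduce the different URI forms listed above to one video ID. A minimal Python sketch covering only those four forms, assuming 11-character IDs; the function name is hypothetical and this is not the resolver's actual implementation:

```python
import re
from urllib.parse import parse_qs, urlsplit

_VIDEO_ID = re.compile(r"[\w-]{11}", re.ASCII)
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com"}
_SHORT_HOSTS = {"youtu.be", "www.youtu.be"}


def extract_video_id(uri):
    """Return the video ID from a YouTube URI, or None if no known form matches."""
    parts = urlsplit(uri)
    host = parts.netloc.lower()
    candidates = []
    if host in _YOUTUBE_HOSTS:
        candidates += parse_qs(parts.query).get("v", [])  # /watch?v=<id>
        if parts.path.startswith("/v/"):  # /v/<id>
            candidates.append(parts.path[3:].strip("/"))
        if parts.fragment:  # /user/<name>#p/a/u/2/<id>
            candidates.append(parts.fragment.rsplit("/", 1)[-1])
    elif host in _SHORT_HOSTS:  # youtu.be/<id>
        candidates.append(parts.path.strip("/"))
    for candidate in candidates:
        if _VIDEO_ID.fullmatch(candidate):
            return candidate
    return None


for uri in (
    "http://www.youtube.com/watch?v=214szPQBUYc",
    "http://www.youtube.com/v/214szPQBUYc",
    "http://www.youtu.be/214szPQBUYc",
    "http://www.youtube.com/user/WEASELxLOVER#p/a/u/2/214szPQBUYc",
):
    print(extract_video_id(uri))
```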


Future Work

‣ Evaluation of preservation and retrieval quality of chosen services
  ‣ exchange services
‣ additional automation of preservation process
  ‣ once a YouTube URI has been passed for resolving
‣ Evaluation of retrieved available copies
  ‣ redirect to best copy instead of returning a list to choose from
‣ Consider international requesters
  ‣ taking requester’s location (country) into account




Summary

     ‣ Pointing to a specific YouTube video copy by its URI carries a risk of disappearance
       ‣ alternative copies become available over time
       ‣ YouTube URIs unlikely to be cached once gone
       ‣ YouTube metadata only reliable for available URIs
          ‣ active preservation attempt

     ‣ Introducing a level of indirection: Resolver service
       ‣ check URI status and location header
       ‣ search the public web for injected metadata
          ‣ query for alternative copies





 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

Music Video Redundancy and Half-Life in YouTube

  • 1. Music Video Redundancy and Half-Life in YouTube Matthias Prellwitz and Michael L. Nelson matthias.prellwitz@googlemail.com mln@cs.odu.edu TPDL 2011 Berlin, Germany 9/26/11
  • 2. Linking to a particular copy “Rolling Stones - Satisfaction” TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 2
  • 3. Metadata lost when YouTube video disappears
    ‣ video title: The Rolling Stones – Satisfaction
    ‣ url: http://www.youtube.com/watch?v=214szPQBUYc
  • 4. Metadata hard to recover from Search Engines
  • 5. But nearly 300 copies remain in YouTube
  • 6. Linking music-related URIs
    ‣ Transparent URI semantics
      ‣ http://www.last.fm/music/John+Lennon/_/Imagine
      ‣ http://www.ilike.com/artist/The+Cribs/track/I%27m+A+Realist
      ‣ http://www.last.fm/music/Johnny+Cash/_/Highwayman
    ‣ Opaque URI semantics
      ‣ http://vids.myspace.com/index.cfm?fuseaction=vids.individual&videoid=5168491
      ‣ http://www.youtube.com/watch?v=VST2KKIYn50
  • 7. Popular Music: US Top 40 Singles Charts of 9/25/10
  • 8. Popular Music: Selected Music Blogs
  • 9. Popular Music: The 500 Greatest Songs of all Time
  • 10. Total Result Size Range: US Top 40 Singles Charts of 9/25/10
    ‣ Largest: Lady Gaga, Alejandro (123,239 / 83,298 / 43,945)
    ‣ Smallest: Selena Gomez & The Scene, A Year Without Rain (66 / 35 / 26)
  • 11. Total Result Size Range: Selected Music Blogs
    ‣ Largest: Lady Gaga, Bad Romance (264,753 / 256,205 / 232,936)
    ‣ Smallest: Mariah Carey featuring Juelz Santana & Bone Thugs-n-Harmony, Don't Forget About Us (0 / 0 / 0)
  • 12. Total Result Size Range: The 500 Greatest Songs of all Time
    ‣ Largest: Michael Jackson, Billie Jean (174,088 / 162,937 / 145,076)
    ‣ Smallest: The Isley Brothers, That Lady (Part 1 and 2) (0 / 0 / 0)
  • 13. URI Unavailability: Rooted from a selected collection
  • 14. URI Unavailability: Expected Half-life
  • 15. URI Publication and Removal Rate
  • 16. Lifetimes of unavailable videos (figure; axes: Years, Month)
  • 17. Reasons for no unavailable videos
  • 18. When a YouTube video disappears
    ‣ video title: The Rolling Stones - Satisfaction
    ‣ url: http://www.youtube.com/watch?v=214szPQBUYc
    ‣ Published 2009-06-13 13:44, removed 2010-04-09 (300 days online)
      HTTP/1.1 404 Not Found
      Content-Type: text/html; charset=utf-8
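The 300-day lifetime quoted on slide 18 follows from the two dates shown there. A minimal sketch of that arithmetic; `days_online` is a hypothetical helper name, and counting whole calendar days while ignoring the time of day is an assumption, not something the slides specify:

```python
from datetime import datetime


def days_online(published: str, removed: str) -> int:
    """Whole calendar days between the publication and removal dates.

    Assumption: the time of day on the publication timestamp is ignored.
    """
    pub_date = datetime.strptime(published, "%Y-%m-%d %H:%M").date()
    rem_date = datetime.strptime(removed, "%Y-%m-%d").date()
    return (rem_date - pub_date).days


# Dates taken from slide 18.
print(days_online("2009-06-13 13:44", "2010-04-09"))  # 300
```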
  • 19. Metadata purged from YouTube Databases
    ‣ Video feed
      curl -I "http://gdata.youtube.com/feeds/api/videos/214szPQBUYc"
      HTTP/1.1 404 Not Found
      Content-Type: text/html; charset=UTF-8
      Private video
    ‣ Related videos
      curl -I "http://gdata.youtube.com/feeds/api/videos/214szPQBUYc/related"
      HTTP/1.1 404 Not Found
      Content-Type: text/html; charset=UTF-8
      Parent Video not found
    ‣ Video comments
      curl -I "http://gdata.youtube.com/feeds/api/videos/214szPQBUYc/comments"
      HTTP/1.1 200 OK
      Content-Type: application/atom+xml; charset=UTF-8
  • 20. Metadata Normalization: dereferencing the ASIN via the amazon.com web service
    ‣ Artist: Michael Jackson
    ‣ Title: Billie Jean (Single Version)
  • 21. Availability of music-related metadata
    ‣ parsed out only the first time a URI showed up in the result list
    ‣ YouTube crawling restrictions
    ‣ Remaining portion
      ‣ query video title against music-related services via search engines
      ‣ Google/Yahoo! with site parameter www.last.fm/music
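The site-restricted lookup on slide 21 can be sketched as plain query construction. This is an illustration only: `site_query` and `encode_query` are hypothetical names, the `site:` operator syntax is an assumption about how the "site parameter" was passed, and no search-engine endpoint is contacted:

```python
from urllib.parse import quote_plus


def site_query(video_title: str, site: str = "www.last.fm/music") -> str:
    """Build a free-form query restricted to one music-related site."""
    return f"site:{site} {video_title.strip()}"


def encode_query(query: str) -> str:
    """Percent-encode the query so it can be placed in a URL query string."""
    return quote_plus(query)


query = site_query("The Rolling Stones - Satisfaction")
print(query)
print(encode_query(query))
```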
  • 22. Retrieving and preserving a video’s metadata
    ‣ Active preservation attempt once a video copy is available
    ‣ Parse the HTML for structured music-related metadata
      ‣ YouTube-generated metadata
      ‣ AmazonMP3 affiliate link
    ‣ Query search engines with the free-form video title against music-related websites
    ‣ Preserve metadata in the public web infrastructure
  • 23. Preservation Prototype
  • 24. Metadata preservation example: Twitter
  • 25. Pointing to a Resolver service
    ‣ http://ytresolve.cs.odu.edu/r/http://www.youtube.com/watch?v=214szPQBUYc/
    ‣ Author-side approach
      ‣ content creator points directly to a resolver service
    ‣ Server-side approach
      ‣ Plugin/Renderer class automatically rewrites YouTube video watch URIs to the resolver service
    ‣ Client-side approach
      ‣ Web browser plugin intercepts clicks on YouTube video watch URIs and redirects to the resolver service
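The server-side approach on slide 25 amounts to prefixing each watch URI with the resolver address. A minimal sketch under stated assumptions: the resolver prefix is the one shown on the slide, while `rewrite_watch_uris`, the regular expression, and the 11-character video ID length are illustrative choices rather than the authors' plugin:

```python
import re

# Resolver prefix as shown on the slide.
RESOLVER_PREFIX = "http://ytresolve.cs.odu.edu/r/"

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
# The lookbehind skips URIs that already sit behind the resolver prefix.
WATCH_URI = re.compile(r"(?<!/r/)http://www\.youtube\.com/watch\?v=[\w-]{11}")


def rewrite_watch_uris(html: str) -> str:
    """Prefix every YouTube watch URI in the markup with the resolver address."""
    return WATCH_URI.sub(lambda m: RESOLVER_PREFIX + m.group(0), html)


page = '<a href="http://www.youtube.com/watch?v=214szPQBUYc">Satisfaction</a>'
print(rewrite_watch_uris(page))
```

Because of the lookbehind, running the rewrite twice leaves already-rewritten links unchanged, which matters if a renderer processes the same markup more than once.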
  • 26. YouTube Resolver service
    ‣ Example URIs
      ‣ http://www.youtube.com/watch?v=214szPQBUYc
      ‣ http://www.youtube.com/v/214szPQBUYc
      ‣ http://www.youtu.be/214szPQBUYc
      ‣ http://www.youtube.com/user/WEASELxLOVER#p/a/u/2/214szPQBUYc
    ‣ HTTP status codes
      ‣ HTTP/1.1 200 OK
      ‣ HTTP/1.1 404 Not Found
      ‣ HTTP/1.1 303 See Other *
    ‣ search for preserved metadata
      ‣ in list of designated accounts
    ‣ exact best available granularity
    ‣ query YouTube API with those
    ‣ Provide (and evaluate) alternative copies
    ‣ *) HTTP redirect status code, with targets such as:
      ‣ http://www.youtube.com/verify_controversy...
      ‣ http://www.youtube.com/verify_age...
      ‣ https://www.google.com/accounts/ServiceLogin...
      ‣ http://www.youtube.com/das_captcha..
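The four URI forms on slide 26 all carry the same video ID, so a resolver would first need to normalize them. The sketch below shows one way to do that and to sort the three status codes from the slide into outcomes. The function names, the outcome labels, and the reading of a 303 to one of the listed targets as an interstitial page are assumptions for illustration; no network request is made:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Redirect targets listed on the slide (truncated there, so matched as prefixes).
INTERSTITIAL_PREFIXES = (
    "http://www.youtube.com/verify_controversy",
    "http://www.youtube.com/verify_age",
    "https://www.google.com/accounts/ServiceLogin",
    "http://www.youtube.com/das_captcha",
)


def extract_video_id(uri: str) -> Optional[str]:
    """Return the video ID from any of the four URI forms, else None."""
    parts = urlparse(uri)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host == "youtu.be":
        return parts.path.strip("/") or None
    if host != "youtube.com":
        return None
    if parts.path == "/watch":
        return parse_qs(parts.query).get("v", [None])[0]
    if parts.path.startswith("/v/"):
        return parts.path[len("/v/"):].strip("/") or None
    if parts.path.startswith("/user/") and parts.fragment:
        # Channel-page form: the ID is the last segment of the fragment.
        return parts.fragment.rsplit("/", 1)[-1] or None
    return None


def classify(status: int, location: str = "") -> str:
    """Map a status code (and Location header, if any) to a hypothetical outcome label."""
    if status == 200:
        return "available"
    if status == 404:
        return "unavailable"
    if status == 303 and location.startswith(INTERSTITIAL_PREFIXES):
        return "interstitial"
    return "unknown"


for uri in (
    "http://www.youtube.com/watch?v=214szPQBUYc",
    "http://www.youtube.com/v/214szPQBUYc",
    "http://www.youtu.be/214szPQBUYc",
    "http://www.youtube.com/user/WEASELxLOVER#p/a/u/2/214szPQBUYc",
):
    print(extract_video_id(uri))
```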
  • 27. Future Work
    ‣ Evaluation of preservation and retrieval quality of chosen services
      ‣ exchange services
    ‣ Additional automation of preservation process
      ‣ once YT URI was passed for resolving
    ‣ Evaluation of retrieved available copies
      ‣ redirect to best copy instead of returning a list to choose from
    ‣ Consider international requesters
      ‣ taking requester’s location (country) into account
  • 28. Summary
    ‣ Pointing to a specific YouTube video copy by its URI has a risk of disappearance
      ‣ alternative copies become available over time
    ‣ YouTube URIs unlikely to be cached once gone
    ‣ YouTube metadata only reliable for available URIs
      ‣ active preservation attempt
    ‣ Introducing a level of indirection: Resolver service
      ‣ check URI status and location header
      ‣ search the public web for injected metadata
      ‣ query for alternative copies