Supporting crowd-sourced listening 
experiences with Web Data 
technologies 
Mathieu d’Aquin, Alessandro Adamou 
Knowledge Media Institute, The Open University 
{mathieu.daquin|alessandro.adamou}@open.ac.uk 
@mdaquin | @anticitizen79
Crowdsourcing in databases 
Aggregating data by soliciting contributions from a 
community. 
examples: 
• Discogs, Setlist.fm, Encyclopedia Metallum 
• Zooniverse (SETIlive, Old Weather etc.) 
• Wikipedia (disputed?) 
• UK Reading Experience Database 
• Historic Cambridge Newspaper Collection
Bootstrapping a crowdsourced 
database 
• Earliest contributors implicitly dictate de facto 
quality standards that will be followed by their 
successors. 
• Risk of recreating the same data multiple 
times: no benefit from prior contributions. 
A non-empty initial database can mitigate these 
issues.
Naïve aggregation – same user 
“La Valse” 
location: London 
location-country: UK 
date: 30 Oct. 1930 
performer: Benjamin Britten 
performer-birthdate: 22 Nov. 1913 
performer-birthplace: Lowestoft 
performer-birth_country: UK 
performer-deathdate: 4 Dec. 1976 
performer-deathplace: Aldeburgh 
performer-death_country: UK 
performer-occupation: Musician 
composer: Maurice Ravel 
composer-birthdate: …. 
…………. 
(event attended by Benjamin 
Britten) 
location: Queen’s Hall 
location-country: UK 
date: 23 Sept. 1930 
listener: Benjamin Britten 
listener-birthplace: … 
listener-birth_country: … 
… 
… 
Not again! I’ll just write “Benjamin Britten”
Reuse 
“La Valse” 
location: London 
location-country: UK 
date: 30 Oct. 1930 
performer: Benjamin Britten 
composer: Maurice Ravel 
Benjamin Britten 
birthdate: 22 Nov. 1913 
birthplace: Lowestoft 
birth_country: UK 
deathdate: 4 Dec. 1976 
deathplace: Aldeburgh 
death_country: UK 
occupation: Musician 
(event attended by Benjamin 
Britten) 
location: Queen’s Hall 
location-country: UK 
date: 23 Sept. 1930 
listener: Benjamin Britten 
“Benjamin Britten” is still not shared 
across the userbase.
Still, for two users: 
• Two “Benjamin Britten”s 
• Different degrees of detail (possibly 
discordant!) 
• Two different sets of 
performances/experiences for each Britten.
Reuse, cross-user 
“La Valse” 
location: London 
location-country: UK 
date: 30 Oct. 1930 
performer: Benjamin Britten 
composer: Maurice Ravel 
(event attended by Benjamin 
Britten) 
location: Queen’s Hall 
location-country: UK 
date: 23 Sept. 1930 
listener: Benjamin Britten 
Benjamin Britten 
birthdate: 22 Nov. 1913 
birthplace: Lowestoft 
birth_country: UK 
deathdate: 4 Dec. 1976 
deathplace: Aldeburgh 
death_country: UK 
occupation: Musician 
User A User B
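The move from naïve aggregation to cross-user reuse can be sketched in Python as follows. This is only an illustration under stated assumptions, not LED's actual data model: the `Person` and `Event` classes, the `people` registry and the key `"benjamin-britten"` are hypothetical names. The point is that biographical details are stored once and events from different users refer to them by key.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    # Biographical details are stored once, on the shared record.
    name: str
    birthdate: str
    birthplace: str


@dataclass
class Event:
    # An event refers to people by key instead of copying their details.
    description: str
    location: str
    date: str
    submitted_by: str
    roles: dict = field(default_factory=dict)  # role name -> person key


# Hypothetical shared registry, visible to every user of the database.
people = {
    "benjamin-britten": Person("Benjamin Britten", "22 Nov. 1913", "Lowestoft"),
}

events = [
    Event("performance of La Valse", "London", "30 Oct. 1930",
          submitted_by="User A", roles={"performer": "benjamin-britten"}),
    Event("event attended by Benjamin Britten", "Queen's Hall", "23 Sept. 1930",
          submitted_by="User B", roles={"listener": "benjamin-britten"}),
]


def events_involving(person_key):
    """Return every event that mentions the person, whoever submitted it."""
    return [e for e in events if person_key in e.roles.values()]


for event in events_involving("benjamin-britten"):
    person = people["benjamin-britten"]
    print(f"{event.submitted_by}: {event.description} ({person.name})")
```

Because both users point at the same key, a single lookup returns the performances and listening experiences contributed by either of them.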
…but does any modern 
database really start up 
empty today?
Reuse, cross-user, cross-database 
Wikipedia 
MusicBrainz 
Geonames 
Benjamin Britten 
birthdate: 22 Nov. 1913 
birthplace: Lowestoft 
deathdate: 4 Dec. 1976 
deathplace: Aldeburgh 
occupation: Musician 
“La Valse” 
composer: Maurice Ravel 
part of: Catalogue Marcel Marnat des oeuvres de MR 
Lowestoft 
region: Suffolk 
country: UK 
Queen’s Hall 
location: London 
London 
country: UK 
User A 
(performance of) “La Valse” 
location: London 
date: 30 Oct. 1930 
performer: Benjamin Britten 
User B 
(event attended by Benjamin 
Britten) 
location: Queen’s Hall 
date: 23 Sept. 1930 
listener: Benjamin Britten
Almost every database can be bootstrapped out 
of a rich, human-readable and machine-readable 
data source: the Web [of data]. 
Because the Web isn’t just pages anymore.
Leverage the ambiguities 
already resolved in 
data on the Web.
http://en.wikipedia.org/wiki/Eine_kleine_Nachtmusik 
http://dbpedia.org/resource/Eine_kleine_Nachtmusik (machine-readable form) 
http://musicbrainz.org/release-group/d25f8fee-68ff-38b9-94e7-c1533b0a0f77#_ 
These URIs already denote 
unambiguous entities. 
So let’s use them.
Enter Linked Data
• URIs identify things, not pages 
– They may also produce Web pages 
• Data are encoded in terms of relations 
between the things identified by the URIs 
<http://...Eine_Kleine_Nachtmusik_(album)> 
<http://...performer> <http://...Venom_(band)> ; 
<http://…track> <http://... 4b42269f4510> . 
<http://... 4b42269f4510> <http://…title> “Countess Bathory” . 
• There are standards for making these relations 
machine-readable 
– data representation paradigm: RDF 
– query language: SPARQL
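The idea of relations between URI-identified things can be illustrated without any RDF library. The sketch below is a plain-Python stand-in, not RDF or SPARQL themselves: the `example.org` URIs are hypothetical placeholders for the elided ones on the slide, and `match` and `track_titles` are illustrative helper names. It stores statements as (subject, predicate, object) triples and answers a two-step question by pattern matching, which is roughly what a SPARQL query does.

```python
# Hypothetical example.org URIs stand in for the elided ones on the slide.
ALBUM = "http://example.org/album/1"
BAND = "http://example.org/band/1"
TRACK = "http://example.org/track/1"
PERFORMER = "http://example.org/vocab/performer"
HAS_TRACK = "http://example.org/vocab/track"
TITLE = "http://example.org/vocab/title"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (ALBUM, PERFORMER, BAND),
    (ALBUM, HAS_TRACK, TRACK),
    (TRACK, TITLE, "Countess Bathory"),
}


def match(pattern, data):
    """Yield triples matching a pattern; None acts as a wildcard."""
    for triple in data:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


def track_titles(album, data):
    """Follow album -> track -> title, a two-step join over the triples."""
    titles = []
    for _, _, track in match((album, HAS_TRACK, None), data):
        for _, _, title in match((track, TITLE, None), data):
            titles.append(title)
    return titles


print(track_titles(ALBUM, triples))
```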
Most of all, reuse URIs! 
If a URI that identifies the song “Countess Bathory” is 
http://musicbrainz.org/recording/c3a0be45-d8c3-4e16-b44c-4b42269f4510, 
that does not make MusicBrainz the only authority with the 
right to provide and store data about it using this name.
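A small sketch of why URI reuse pays off: when two independent datasets use the same URI as subject, their statements can be combined with no matching step at all. The recording URI is the one given on the slide; the property names `title` and `heard_by`, the `example.org` listener URI and the `merge` helper are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# URI taken from the slide; it identifies the recording "Countess Bathory".
RECORDING = "http://musicbrainz.org/recording/c3a0be45-d8c3-4e16-b44c-4b42269f4510"

# Two independent datasets (contents are illustrative, not real records).
source_a = [(RECORDING, "title", "Countess Bathory")]
source_b = [(RECORDING, "heard_by", "http://example.org/listener/1")]


def merge(*datasets):
    """Group property/value pairs by subject URI across all datasets."""
    merged = defaultdict(dict)
    for dataset in datasets:
        for subject, prop, value in dataset:
            merged[subject].setdefault(prop, []).append(value)
    return dict(merged)


combined = merge(source_a, source_b)
print(combined[RECORDING])
```

Had the second dataset minted its own identifier for the same song, the merge would have produced two unrelated entries instead of one.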
LED (the Listening Experience Database) integrates 
Linked Data reuse from external sources 
throughout the data lifecycle.
When entering data 
Searches across items submitted by LED users 
Falls back to Linked Data sources
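The lookup order described here (local items first, external Linked Data sources as a fallback) could look roughly like the sketch below. It is not LED's implementation: `search_local`, `lookup` and `fake_external_search` are hypothetical names, and the external service is a stub that returns canned results instead of querying a real source.

```python
def search_local(term, local_items):
    """Case-insensitive substring search over locally submitted items."""
    needle = term.lower()
    return [item for item in local_items if needle in item["label"].lower()]


def lookup(term, local_items, external_search):
    """Prefer items already in the local database; otherwise ask external sources."""
    hits = search_local(term, local_items)
    if hits:
        return hits, "local"
    return external_search(term), "external"


# Hypothetical data: one locally known entity and a stubbed external service.
local_items = [{"label": "Benjamin Britten", "uri": "http://example.org/local/britten"}]


def fake_external_search(term):
    # Stand-in for a real Linked Data lookup; returns a canned result.
    return [{"label": term.title(),
             "uri": "http://example.org/external/" + term.replace(" ", "_")}]


print(lookup("britten", local_items, fake_external_search))
print(lookup("maurice ravel", local_items, fake_external_search))
```

Searching local items first means an entity already described by another contributor is offered for reuse before a new external one is pulled in.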
When finding data 
Search categories (facets) 
are not hardcoded in the 
system. 
Facet values integrate LED 
with external data sources.
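One way to avoid hardcoded facets is to derive them from whatever properties the data happens to carry. The sketch below assumes a simplified record shape (flat dictionaries with hypothetical property names) and a hypothetical `build_facets` helper; how LED actually builds its facets is not stated on the slide.

```python
from collections import defaultdict

# Illustrative records; property names are hypothetical.
records = [
    {"genre": "Opera", "location": "London"},
    {"genre": "Harp music", "location": "London", "instrument": "Harp"},
]


def build_facets(items):
    """Collect every property seen in the data and count its values."""
    facets = defaultdict(lambda: defaultdict(int))
    for item in items:
        for prop, value in item.items():
            facets[prop][value] += 1
    return {prop: dict(values) for prop, values in facets.items()}


facets = build_facets(records)
print(sorted(facets))
print(facets["location"])
```

A new property in the data (here `instrument`) shows up as a facet without any change to the code.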
Data reconciliation for 
redundancy containment
“We have had a charming Concert.... Mr. Jones, the harper, began the Concert. He has a fine 
instrument of Merlin's construction; he plays with great neatness and delicacy; but as 
expression must have meaning, he does not abound in that commodity. […]” 
- Diary of Frances Burney, May 1775 
people: Charles Burney, Frances Burney, Baron Deiden, Baroness Deiden, Edward Jones, Mr. Merlin (?), Miss Burney (?) 
performances: 
• harp music (perf. Edward Jones) 
• harpsichord duet (comp. Muthel; perf. Charles Burney, Miss Burney) 
• keyboard music (comp. Charles Burney, Echard, Schobert; perf. Charles Burney, Miss Burney…) 
• vocal music (perf. Miss Louisa Harris)
“Our Concert proved to be very much the Thing... ....Mr Burney... Fired away, with his usual 
successful velocity, to the amazement and delight of all present... [He] played a concerto of 
Schobert, and one of my Father's , and a great deal of Extemporary Preluding. […]” 
- Letter from Frances Burney to Samuel Crisp, May 1775 
people: Charles Burney, Frances Burney, Baron Deiden, Baroness Deiden, Edward Jones, John Joseph Merlin, Esther Burney 
performances: 
• harp music (perf. Edward Jones) 
• harpsichord duet (comp. Muthel; perf. Charles Burney, Esther Burney) 
• Lesson by Charles Burney (comp. Charles Burney; perf. Esther Burney) 
• Rondeau from Piramo and Tisbé (comp. Venanzio Rauzzini; perf. James Harris, Miss Harris) 
• piece by Johann Gottfried Eckard (comp. Johann Gottfried Eckard; perf. Esther Burney) 
• concerto by Johann Schobert (comp. Johann Schobert; perf. Charles Burney)
“We have had a charming Concert.... Mr. Jones, the harper, began the Concert. He has a fine 
instrument of Merlin's construction; he plays with great neatness and delicacy; but as 
expression must have meaning, he does not abound in that commodity. […]” 
- Diary of Frances Burney, May 1775 
“Our Concert proved to be very much the Thing... ....Mr Burney... Fired away, with his usual 
successful velocity, to the amazement and delight of all present... [He] played a concerto of 
Schobert, and one of my Father's , and a great deal of Extemporary Preluding. […]” 
- Letter from Frances Burney to Samuel Crisp, May 1775 
people: Charles Burney, Frances Burney, Baron Deiden, Baroness Deiden, Edward Jones, John Joseph Merlin, Esther Burney 
performances: 
• harp music (perf. Edward Jones) 
• harpsichord duet (comp. Muthel; perf. Charles Burney, Esther Burney) 
• Lesson by Charles Burney (comp. Charles Burney; perf. Esther Burney) 
• Rondeau from Piramo and Tisbé (comp. Venanzio Rauzzini; perf. James Harris, Louisa Harris) 
• piece by Johann Gottfried Eckard (comp. Johann Gottfried Eckard; perf. Esther Burney) 
• concerto by Johann Schobert (comp. Johann Schobert; perf. Charles Burney)
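The reconciliation shown across these slides can be sketched as a name-resolution step followed by a duplicate-free merge. The alias pairs below follow the slides ("Miss Burney" resolved to Esther Burney, "Mr. Merlin" to John Joseph Merlin, "Echard" to Johann Gottfried Eckard); the `ALIASES` table and the `canonical` and `reconcile` helpers are hypothetical, and a real system would curate such mappings or derive them from external data instead of hardcoding them.

```python
# Alias table mapping names as written in the sources to one canonical form.
ALIASES = {
    "Miss Burney": "Esther Burney",
    "Mr. Merlin": "John Joseph Merlin",
    "Echard": "Johann Gottfried Eckard",
    "Schobert": "Johann Schobert",
}


def canonical(name):
    """Return the canonical form of a name, or the name itself if unknown."""
    return ALIASES.get(name, name)


def reconcile(*mention_lists):
    """Merge name mentions from several sources, dropping duplicates in order."""
    seen = []
    for mentions in mention_lists:
        for name in mentions:
            resolved = canonical(name)
            if resolved not in seen:
                seen.append(resolved)
    return seen


diary = ["Charles Burney", "Edward Jones", "Mr. Merlin", "Miss Burney"]
letter = ["Charles Burney", "Edward Jones", "John Joseph Merlin", "Esther Burney"]

print(reconcile(diary, letter))
```

Eight mentions across the two documents collapse into four people, which is the redundancy containment the section title refers to.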
Enhancing the Linked Data 
cloud
Motivation 
• Consolidate existing Web Data by integrating 
new or refined information. 
• Assist information retrieval applications by 
providing an additional data node to traverse. 
• Provide factual support to assess the 
truthfulness of stated assertions on the Web.
BNB: British National Bibliography (The British Library) 
DBpedia: structured data from Wikipedia infoboxes 
LinkedBrainz: MusicBrainz as linked data 
VIAF: Virtual International Authority File 
[Diagram: Linked Data cloud nodes DBpedia, BNB, VIAF, Geonames and LinkedBrainz]
LED data are re-published 
as a Linked Open Data set 
• Hosted at http://data.open.ac.uk 
• SPARQL query service at 
http://data.open.ac.uk/query 
• Documentation at 
http://led.kmi.open.ac.uk/linkeddata
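As a hedged illustration of how such a query service is typically addressed, the sketch below builds an HTTP GET URL following the SPARQL 1.1 Protocol, which passes the query text in a `query` parameter. The endpoint address is the one on the slide; the query is deliberately generic because the vocabulary used by LED is not described here, and `build_query_url` is a hypothetical helper. No request is sent, and whether the endpoint is still online is not confirmed.

```python
from urllib.parse import urlencode

ENDPOINT = "http://data.open.ac.uk/query"  # endpoint named on the slide

# Generic query: it makes no assumption about the vocabulary used by LED.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_query_url(endpoint, sparql):
    """Encode a SPARQL query as an HTTP GET URL (SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urlencode({"query": sparql})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL could then be fetched with `urllib.request`, typically with an `Accept: application/sparql-results+json` header to ask for JSON results.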
What does LED contribute to 
the LD cloud?
New data 
Historical music performances 
Royal Carl Rosa Company – “Faust” 
for orchestra and voice 
date: 14 May, 1917 
location: Garrick Theatre 
Patron’s Fund - “The Birthday of the Infanta” 
date: 9 July, 1931 
location: London (indoors, private space) 
(you won’t find them on last.fm or setlist.fm)
New data 
Portions and quotes of source documents / manuscripts 
Journeying boy : the diaries of the young Benjamin Britten 
1928-1938 
(provided by the British Library) 
Author: Benjamin Britten 
Editor: John Evans 
Published: Faber, London, 2009 
ISBN: 9780571238835 
… 
(provided by LED) 
Diary entries: 
• Page 17, Feb 14 1929: “Still absent from school work. Everso much more snow […]” 
• Page 67, March 18 1931: “Go with Mummy to B.B.C – Beethoven concert […]” 
• Page 70, April 22 1931: “Go to John Nicholson’s to tea at 2.45. & to hear Gramophone 
records on his new Radio-Gram Hear. Brahms. Pft. Concerto Mov. 1. (Rubenstein) Tchaik.” 
• …
Refinements of existing data 
Mary Somerville 
(Provided by DBpedia) 
Born: 1780-12-26 in Jedburgh 
Died: 1872-11-28 in Naples 
Field: Polymath, Science journalism 
VIAF ID: 27288356 
… 
(integrated by LED) 
Full name: Mary Fairfax Greig Somerville 
Social group: Rulers, chiefs, aristocracy & gentry etc. 
Occupation: Scientist 
Religion: Christian, Protestant 
wrote: Memoir of Mary Somerville (1817, 1840’s, 1849, 1850…)
Alignments 
dbpedia:Aaron_Copland 
dbpedia:Jane_Austen 
≡ 
≡ 
bnb:CoplandAaron1900-1990 
bnb:AustenJane1775-1817 
These semantic links are not found on the LD cloud. 
By exposing them, we assist Semantic Web applications in the 
retrieval of relevant information from multiple data sources.
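To show how an application might exploit such alignments, the sketch below groups identifiers into equivalence classes with a small union-find structure. The two identifier pairs are the ones on the slide, written as prefixed names. In RDF, links of this kind are commonly expressed with `owl:sameAs`, but the slide does not say which property LED uses, so that is an assumption; `find`, `union` and `same_entity` are illustrative helper names.

```python
def find(parent, x):
    """Return the representative of x's equivalence class."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def union(parent, a, b):
    """Record that a and b denote the same entity."""
    parent[find(parent, a)] = find(parent, b)


# Alignments from the slide, written as prefixed names.
alignments = [
    ("dbpedia:Aaron_Copland", "bnb:CoplandAaron1900-1990"),
    ("dbpedia:Jane_Austen", "bnb:AustenJane1775-1817"),
]

parent = {}
for left, right in alignments:
    union(parent, left, right)


def same_entity(a, b):
    """True if a chain of alignments connects the two identifiers."""
    return find(parent, a) == find(parent, b)


print(same_entity("bnb:CoplandAaron1900-1990", "dbpedia:Aaron_Copland"))
print(same_entity("dbpedia:Aaron_Copland", "dbpedia:Jane_Austen"))
```

With the classes in place, data retrieved under either identifier can be attributed to the same person.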
Figures on reuse 
Computed on 1102 distinct listening experiences 
Type                                          Unique instances   Total reuse   Peak 
People                                        626                1687          184 
Written works                                 902                990           46 
Geographical locations                        492                266           70 
Musical items (songs, albums, performances)   2400               337           22 
Musical genres                                78                 644           219 

from external data sources 

Source        Reused distinct instances 
DBpedia       823 
BNB           339 
data.gov.uk   816 
MusicBrainz   SOON
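The slide does not define its column headings, so the sketch below rests on one possible reading, stated here as an assumption: "unique instances" is the number of distinct entities, "total reuse" counts every mention of an entity beyond its first, and "peak" is the mention count of the most-mentioned entity. The `mentions` list and the `reuse_figures` helper are hypothetical.

```python
from collections import Counter

# Hypothetical references: each string is one mention of an entity in a
# listening experience record.
mentions = ["britten", "ravel", "britten", "burney", "britten",
            "ravel", "jones", "ravel", "ravel"]


def reuse_figures(refs):
    """Compute (unique instances, total reuse, peak) under the assumed definitions."""
    counts = Counter(refs)
    unique = len(counts)
    total_reuse = sum(n - 1 for n in counts.values())  # mentions beyond the first
    peak = max(counts.values(), default=0)             # most-mentioned entity
    return unique, total_reuse, peak


print(reuse_figures(mentions))
```

If the authors used different definitions, only the two expressions inside `reuse_figures` would need to change.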
Questions? 

More Related Content

Similar to Supporting crowd-sourced listening experiences with Web Data technologies

Writing a Big Data History of Music
Writing a Big Data History of MusicWriting a Big Data History of Music
Writing a Big Data History of MusicDigital History
 
IASA 2014 Conference - Cape Town, South Africa #iasa2014
IASA 2014 Conference - Cape Town, South Africa #iasa2014IASA 2014 Conference - Cape Town, South Africa #iasa2014
IASA 2014 Conference - Cape Town, South Africa #iasa2014Karen Du Toit
 
A Virtual Walk on the Wild Side!
A Virtual Walk on the Wild Side!A Virtual Walk on the Wild Side!
A Virtual Walk on the Wild Side!Stella Wisdom
 
McNair Powerpoint for Shakespeare's use of sexual imagery
McNair Powerpoint for Shakespeare's use of sexual imageryMcNair Powerpoint for Shakespeare's use of sexual imagery
McNair Powerpoint for Shakespeare's use of sexual imagerymhall1745
 
The Europeana Music Collections
The Europeana Music CollectionsThe Europeana Music Collections
The Europeana Music CollectionsDavid Haskiya
 
H03 david haskiya_music_channel
H03 david haskiya_music_channelH03 david haskiya_music_channel
H03 david haskiya_music_channelevaminerva
 
H03 david haskiya_music_channel
H03 david haskiya_music_channelH03 david haskiya_music_channel
H03 david haskiya_music_channelevaminerva
 
Archives of the Columbia-Princeton Electronic Music Center (@ Pratt)
Archives of the Columbia-Princeton Electronic Music Center (@ Pratt)Archives of the Columbia-Princeton Electronic Music Center (@ Pratt)
Archives of the Columbia-Princeton Electronic Music Center (@ Pratt)Nick Patterson
 
Evidence Over Story: Assembly Over Algorithm
Evidence Over Story: Assembly Over AlgorithmEvidence Over Story: Assembly Over Algorithm
Evidence Over Story: Assembly Over AlgorithmRick Prelinger
 
1 history sound_design
1 history sound_design1 history sound_design
1 history sound_designGints Rutks
 
The JazzTraffic Trio present: Swingin' Musicals. wwwjazztraffic.nl
The JazzTraffic Trio present: Swingin' Musicals. wwwjazztraffic.nlThe JazzTraffic Trio present: Swingin' Musicals. wwwjazztraffic.nl
The JazzTraffic Trio present: Swingin' Musicals. wwwjazztraffic.nlJazz Trio JazzTraffic
 
Artistic Practice and The Archive
Artistic Practice and The ArchiveArtistic Practice and The Archive
Artistic Practice and The ArchiveAndrew Prescott
 
Art Tracks: From Provenance to Structured Data
Art Tracks: From Provenance to Structured DataArt Tracks: From Provenance to Structured Data
Art Tracks: From Provenance to Structured DataDavid Newbury
 
JazzTrio JazzTraffic presents: Swingin' Musicals
JazzTrio JazzTraffic presents: Swingin' MusicalsJazzTrio JazzTraffic presents: Swingin' Musicals
JazzTrio JazzTraffic presents: Swingin' MusicalsJazz Trio JazzTraffic
 
Challenges and Opportunities in Digital Musicology
Challenges and Opportunities in Digital Musicology Challenges and Opportunities in Digital Musicology
Challenges and Opportunities in Digital Musicology Eleanor Selfridge-Field
 
The Winner Takes it All? -APIs and Linked Data Battle It Out
The Winner Takes it All? -APIs and Linked Data Battle It OutThe Winner Takes it All? -APIs and Linked Data Battle It Out
The Winner Takes it All? -APIs and Linked Data Battle It OutAdrian Stevenson
 

Similar to Supporting crowd-sourced listening experiences with Web Data technologies (20)

Writing a Big Data History of Music
Writing a Big Data History of MusicWriting a Big Data History of Music
Writing a Big Data History of Music
 
IASA 2014 Conference - Cape Town, South Africa #iasa2014
IASA 2014 Conference - Cape Town, South Africa #iasa2014IASA 2014 Conference - Cape Town, South Africa #iasa2014
IASA 2014 Conference - Cape Town, South Africa #iasa2014
 
A Virtual Walk on the Wild Side!
A Virtual Walk on the Wild Side!A Virtual Walk on the Wild Side!
A Virtual Walk on the Wild Side!
 
McNair Powerpoint for Shakespeare's use of sexual imagery
McNair Powerpoint for Shakespeare's use of sexual imageryMcNair Powerpoint for Shakespeare's use of sexual imagery
McNair Powerpoint for Shakespeare's use of sexual imagery
 
The Europeana Music Collections
The Europeana Music CollectionsThe Europeana Music Collections
The Europeana Music Collections
 
H03 david haskiya_music_channel
H03 david haskiya_music_channelH03 david haskiya_music_channel
H03 david haskiya_music_channel
 
H03 david haskiya_music_channel
H03 david haskiya_music_channelH03 david haskiya_music_channel
H03 david haskiya_music_channel
 
Archives of the Columbia-Princeton Electronic Music Center (@ Pratt)
Archives of the Columbia-Princeton Electronic Music Center (@ Pratt)Archives of the Columbia-Princeton Electronic Music Center (@ Pratt)
Archives of the Columbia-Princeton Electronic Music Center (@ Pratt)
 
Evidence Over Story: Assembly Over Algorithm
Evidence Over Story: Assembly Over AlgorithmEvidence Over Story: Assembly Over Algorithm
Evidence Over Story: Assembly Over Algorithm
 
1 history sound_design
1 history sound_design1 history sound_design
1 history sound_design
 
The JazzTraffic Trio present: Swingin' Musicals. wwwjazztraffic.nl
The JazzTraffic Trio present: Swingin' Musicals. wwwjazztraffic.nlThe JazzTraffic Trio present: Swingin' Musicals. wwwjazztraffic.nl
The JazzTraffic Trio present: Swingin' Musicals. wwwjazztraffic.nl
 
Artistic Practice and The Archive
Artistic Practice and The ArchiveArtistic Practice and The Archive
Artistic Practice and The Archive
 
Art Tracks: From Provenance to Structured Data
Art Tracks: From Provenance to Structured DataArt Tracks: From Provenance to Structured Data
Art Tracks: From Provenance to Structured Data
 
JazzTrio JazzTraffic presents: Swingin' Musicals
JazzTrio JazzTraffic presents: Swingin' MusicalsJazzTrio JazzTraffic presents: Swingin' Musicals
JazzTrio JazzTraffic presents: Swingin' Musicals
 
PGS AGM 2023
PGS AGM 2023PGS AGM 2023
PGS AGM 2023
 
Challenges and Opportunities in Digital Musicology
Challenges and Opportunities in Digital Musicology Challenges and Opportunities in Digital Musicology
Challenges and Opportunities in Digital Musicology
 
მასწ ინგლისურის ტესტი 2012
მასწ ინგლისურის ტესტი 2012მასწ ინგლისურის ტესტი 2012
მასწ ინგლისურის ტესტი 2012
 
Nati
NatiNati
Nati
 
The Winner Takes it All? -APIs and Linked Data Battle It Out
The Winner Takes it All? -APIs and Linked Data Battle It OutThe Winner Takes it All? -APIs and Linked Data Battle It Out
The Winner Takes it All? -APIs and Linked Data Battle It Out
 
Coffee and a byte
Coffee and a byteCoffee and a byte
Coffee and a byte
 

Recently uploaded

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Supporting crowd-sourced listening experiences with Web Data technologies

  • 1. Supporting crowd-sourced listening experiences with Web Data technologies Mathieu d’Aquin, Alessandro Adamou Knowledge Media Institute, The Open University {mathieu.daquin|alessandro.adamou}@open.ac.uk @mdaquin | @anticitizen79
  • 2. Crowdsourcing in databases Aggregating data by soliciting contributions from a community. examples: • Discogs, Setlist.fm, Encyclopedia Metallum • Zooniverse (SETIlive, Old Weather etc.) • Wikipedia (disputed?) • UK Reading Experience Database • Historic Cambridge Newspaper Collection
  • 3.
  • 4. Bootstrapping a crowdsourced database • Earliest contributors implicitly dictate de facto quality standards that will be followed by their successors.
  • 5. Bootstrapping a crowdsourced database • Earliest contributors implicitly dictate de facto quality standards that will be followed by their successors. • Risk of recreating the same data multiple times: no benefit from prior contributions.
  • 6. Bootstrapping a crowdsourced database • Earliest contributors implicitly dictate de facto quality standards that will be followed by their successors. • Risk of recreating the same data multiple times: no benefit from prior contributions. A non-empty initial database can mitigate these issues.
  • 7.
  • 8. Naïve aggregation – same user “La Valse” location: London location-country: UK date: 30 Oct. 1930 performer: Benjamin Britten performer-birthdate: 22 Nov. 1913 performer-birthplace: Lowestoft performer-birth_country: UK performer-deathdate: 4 Dec. 1976 performer-deathplace: Aldeburgh performer-birth_country: UK performer-occupation: Musician composer: Maurice Ravel performer-birthdate: …. …………. (event attended by Benjamin Britten) location: Queen’s Hall location-country: UK date: 23 Sept. 1930 listener: Benjamin Britten listener-birthplace: … listener-birth_country: … … … Not again! I’ll just write “Benjamin Britten”
  • 9. Reuse “La Valse” location: London location-country: UK date: 30 Oct. 1930 performer: Benjamin Britten composer: Maurice Ravel Benjamin Britten birthdate: 22 Nov. 1913 birthplace: Lowestoft birth_country: UK deathdate: 4 Dec. 1976 deathplace: Aldeburgh death_country: UK occupation: Musician (event attended by Benjamin Britten) location: Queen’s Hall location-country: UK date: 23 Sept. 1930 listener: Benjamin Britten “Benjamin Britten” is still not shared across the userbase.
  • 10. Still, for two users: • Two “Benjamin Britten”s • Different degrees of detail (possibly discordant!) • Two different sets of performances/experiences for each Britten.
  • 11. Reuse, cross-user “La Valse” location: London location-country: UK date: 30 Oct. 1930 performer: Benjamin Britten composer: Maurice Ravel (event attended by Benjamin Britten) location: Queen’s Hall location-country: UK date: 23 Sept. 1930 listener: Benjamin Britten Benjamin Britten birthdate: 22 Nov. 1913 birthplace: Lowestoft birth_country: UK deathdate: 4 Dec. 1976 deathplace: Aldeburgh death_country: UK occupation: Musician User A User B
  • 12. …but does any modern database really start up empty today?
  • 13. Reuse, cross-user, cross-database Wikipedia MusicBrainz Geonames Benjamin Britten birthdate: 22 Nov. 1913 birthplace: Lowestoft deathdate: 4 Dec. 1976 deathplace: Aldeburgh occupation: Musician “La Valse” composer: Maurice Ravel part of: Catalogue Marcel Marnat des oeuvres del MR Lowestoft region: Suffolk country: UK Queen’s Hall location: London London country: UK User A (performance of) “La Valse” location: London date: 30 Oct. 1930 performer: Benjamin Britten User B (event attended by Benjamin Britten) location: Queen’s Hall date: 23 Sept. 1930 listener: Benjamin Britten
  • 14. Almost every database can be bootstrapped out of a rich, human-readable and machine-readable data source: the Web [of data]. Because the Web isn’t just pages anymore.
  • 15.
  • 16. Leverage the existing (resolved) ambiguities of data on the Web.
  • 17. http://en.wikipedia.org/wiki/Eine_kleine_Nachtmusik http://dbpedia.org/resource/Eine_kleine_Nachtmusik (machine-readable form) http://musicbrainz.org/release-group/d25f8fee-68ff-38b9-94e7-c1533b0a0f77#_
  • 18. http://en.wikipedia.org/wiki/Eine_kleine_Nachtmusik http://dbpedia.org/resource/Eine_kleine_Nachtmusik (machine-readable form) These URIs already denote unambiguous entities. So let’s use them. http://musicbrainz.org/release-group/d25f8fee-68ff-38b9-94e7-c1533b0a0f77#_
  • 20. • URIs identify things, not pages – They may also produce Web pages • Data are encoded in terms of relations between the things identified by the URIs <http://...Eine_Kleine_Nachtmusik_(album)> <http://...performer> <http://...Venom_(band)> ; <http://…track> <http://... 4b42269f4510> . <http://... 4b42269f4510> <http://…title> “Countess Bathory” . • There are standards for making these relations machine-readable – data representation paradigm: RDF – query language: SPARQL
  • 21. Most of all, reuse URIs! If a URI that identifies the song “Countess Bathory” is http://musicbrainz.org/recording/c3a0be45-d8c3-4e16-b44c-4b42269f4510, that does not make MusicBrainz the only authority with the right to provide and store data about it using this name.
  • 22. LED integrates Linked Data reuse from external sources in the whole data lifecycle.
  • 23. When entering data Searches across items submitted by LED users Falls back to Linked Data sources
  • 24. When finding data Search categories (facets) are not hardcoded in the system. Facet values integrate LED with external data sources.
  • 25. Data reconciliation for redundancy containment
  • 26. “We have had a charming Concert.... Mr. Jones, the harper, began the Concert. He has a fine instrument of Merlin's construction; he plays with great neatness and delicacy; but as expression must have meaning, he does not abound in that commodity. […]” - Diary of Frances Burney, May 1775 ? Charles Burney Frances Burney Baron Deiden Baroness Deiden Edward Jones Mr. Merlin ? Miss Burney performances: • harp music (perf. Edward Jones) • harpsichord duet (comp. Muthel; perf. Charles Burney, Miss Burney) • keyboard music (comp. Charles Burney, Echard, Schobert; perf. Charles Burney, Miss Burney…) • vocal music (perf. Miss Louisa Harris)
  • 27. “Our Concert proved to be very much the Thing... ....Mr Burney... Fired away, with his usual successful velocity, to the amazement and delight of all present... [He] played a concerto of Schobert, and one of my Father's , and a great deal of Extemporary Preluding. […]” - Letter from Frances Burney to Samuel Crisp, May 1775 Charles Burney Frances Burney Baron Deiden Baroness Deiden Edward Jones John Joseph Merlin Esther Burney performances: • harp music (perf. Edward Jones) • harpsichord duet (comp. Muthel; perf. Charles Burney, Esther Burney) • Lesson by Charles Burney (comp. Charles Burney; perf. Esther Burney) • Rondeau from Piramo and Tisbé (comp. Venanzio Rauzzini; perf. James Harris, Miss Harris) • piece by Johann Gottfried Eckard (comp. Johann Gottfried Eckard; perf. Esther Burney) • concerto by Johann Schobert (comp. Johann Schobert; perf. Charles Burney)
  • 28. “We have had a charming Concert.... Mr. Jones, the harper, began the Concert. He has a fine instrument of Merlin's construction; he plays with great neatness and delicacy; but as expression must have meaning, he does not abound in that commodity. […]” - Diary of Frances Burney, May 1775 “Our Concert proved to be very much the Thing... ....Mr Burney... Fired away, with his usual successful velocity, to the amazement and delight of all present... [He] played a concerto of Schobert, and one of my Father's , and a great deal of Extemporary Preluding. […]” - Letter from Frances Burney to Samuel Crisp, May 1775 Charles Burney Frances Burney Baron Deiden Baroness Deiden Edward Jones John Joseph Merlin Esther Burney performances: • harp music (perf. Edward Jones) • harpsichord duet (comp. Muthel; perf. Charles Burney, Esther Burney) • Lesson by Charles Burney (comp. Charles Burney; perf. Esther Burney) • Rondeau from Piramo and Tisbé (comp. Venanzio Rauzzini; perf. James Harris, Louisa Harris) • piece by Johann Gottfried Eckard (comp. Johann Gottfried Eckard; perf. Esther Burney) • concerto by Johann Schobert (comp. Johann Schobert; perf. Charles Burney)
  • 29. Enhancing the Linked Data cloud
  • 30.
  • 31. Motivation • Consolidate existing Web Data by integrating new or refined information. • Assist information retrieval applications by providing an additional data node to traverse. • Provide factual support to assess the truthfulness of stated assertions on the Web.
  • 32. BNB: British National Bibliography (The British Library) DBpedia: structured data from Wikipedia infoboxes LinkedBrainz: MusicBrainz as linked data VIAF: Virtual International Authority File DBpedia BNB VIAF Geonames LinkedBrainz
  • 33. DBpedia BNB LinkedBrainz Geonames VIAF
  • 34. LED data are re-published as a Linked Open Data set • Hosted at http://data.open.ac.uk • SPARQL query service at http://data.open.ac.uk/query • Documentation at http://led.kmi.open.ac.uk/linkeddata
  • 35. What does LED contribute to the LD cloud?
  • 36. New data Historical music performances Royal Carl Rosa Company – “Faust” for orchestra and voice date: 14 May, 1917 location: Garrick Theatre Patron’s Fund - “The Birthday of the Infanta” date: 9 July, 1931 location: London (indoors, private space) (you won’t find them on last.fm or setlist.fm)
  • 37. New data Portions and quotes of source documents / manuscripts Journeying boy : the diaries of the young Benjamin Britten 1928-1938 (provided by the British Library) Author: Benjamin Britten Editor: John Evans Published: Faber, London, 2009 ISBN: 9780571238835 … (provided by LED) Diary entries: • Page 17, Feb 14 1929: “Still absent from school work. Everso much more snow […]” • Page 67, March 18 1931: “Go with Mummy to B.B.C – Beethoven concert […]” • Page 70, April 22 1931: “Go to John Nicholson’s to tea at 2.45. & to hear Gramophone records on his new Radio-Gram Hear. Brahms. Pft. Concerto Mov. 1. (Rubenstein) Tchaik.” • …
  • 38. Refinements of existing data Mary Somerville (Provided by DBpedia) Born: 1780-12-26 in Jedburgh Died: 1872-11-28 in Naples Field: Polymath, Science journalism VIAF ID: 27288356 … (integrated by LED) Full name: Mary Fairfax Greig Somerville Social group: Rulers, chiefs, aristocracy & gentry etc. Occupation: Scientist Religion: Christian, Protestant wrote: Memoir of Mary Somerville (1817, 1840’s, 1849, 1850…)
  • 39. Alignments dbpedia:Aaron_Copland dbpedia:Jane_Austen ≡ ≡ bnb:CoplandAaron1900-1990 bnb:AustenJane1775-1817 These semantic links are not found on the LD cloud. By exposing them, we assist Semantic Web applications in the retrieval of relevant information from multiple data sources.
  • 40. Figures on reuse. Computed on 1102 distinct listening experiences.
      Type                                          Unique instances   Total reuse   Peak
      People                                        626                1687          184
      Written works                                 902                990           46
      Geographical locations                        492                266           70
      Musical items (songs, albums, performances)   2400               337           22
      Musical genres                                78                 644           219
      From external data sources:
      Source         Reused distinct instances
      DBpedia        823
      BNB            339
      data.gov.uk    816
      MusicBrainz    SOON
  • 41. Questions? Mathieu d’Aquin, Alessandro Adamou Knowledge Media Institute, The Open University {mathieu.daquin|alessandro.adamou}@open.ac.uk @mdaquin | @anticitizen79
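
Slides 20 and 21 describe data as relations between things identified by URIs, and argue that anyone may make statements about a URI minted elsewhere. A minimal sketch of that idea in plain Python follows. It assumes a toy in-memory set of triples rather than a real RDF store or SPARQL engine, and every example.org URI is a hypothetical placeholder for the URIs elided on the slides; only the MusicBrainz recording URI is taken from slide 21.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# EX URIs are hypothetical placeholders, not the URIs used on the slides.
EX = "http://example.org/"

# Recording URI quoted on slide 21 (identifies the song "Countess Bathory").
RECORDING = "http://musicbrainz.org/recording/c3a0be45-d8c3-4e16-b44c-4b42269f4510"

triples = {
    (EX + "album/1", EX + "performer", EX + "band/1"),
    (EX + "album/1", EX + "track", RECORDING),
    (RECORDING, EX + "title", "Countess Bathory"),
    # A different data provider reuses the same URI to add its own statement.
    (RECORDING, EX + "heardAt", EX + "event/1"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Roughly what a SPARQL pattern such as `<recording> <title> ?o` would select.
titles = [o for (_, _, o) in match(triples, s=RECORDING, p=EX + "title")]

# Everything anyone has said about the recording, regardless of who said it.
about_recording = match(triples, s=RECORDING)
```

Because both statements share the same subject URI, a single pattern retrieves them together; that is the practical payoff of reusing URIs instead of minting new ones.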

Editor's Notes

  1. I am presenting, also on behalf of Mathieu sitting over there, on how we combined cutting-edge data management with the practices of crowdsourcing.
  2. Having a non-empty database to begin with can help a lot. Though this seems a contradiction, it is only an apparent one.
  3. We started off with a set of curated entries, so crowdsourcing is not the only way LED aggregates content, but it is the trickiest.
  4. Wouldn’t it be great if we could just store the essential information that is unique to our knowledge and rely upon authoritative data sources for the rest? This would be the machine equivalent of when I tell someone “Hey, I just went to see a Ravel concert”, they ask “Who’s Ravel?”, and I quite practically, if impolitely, answer “Ah come on, look it up on Wikipedia!”
  5. What seems to be the catch with it? It would appear that if I wanted to put the data on those websites to good use, I would have to either hack my way through to the underlying databases, or program my software systems to read the Web pages, with all the unpredictability that comes with it.
  6. The never-abused-enough Mozart example, but with a twist. Google did not start by publishing these structured data firsthand, but rather by compiling them out of data sources from the Web, the same ones it indexes for us to search.
  7. This kind of alignment can be done on-the-fly, not so much because we are reusing Web Data, but because we adopt the same paradigm as Linked Data within the LED system, where each entity is a node and we just move references around.
  8. For the record, the Linked Data cloud is something that looks like this today, with every circle being a data provider
  9. What does this mean? That we have saved users the burden of rewriting information about the same people like seventeen hundred times overall, and up to 184 times for one person, who I think is Britten.
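
Note 7 says alignment can be done on the fly because each entity is a node and only references move. A minimal sketch of what that could look like, assuming data held as (subject, predicate, object) tuples; the `person:` and `perf:` identifiers are hypothetical, and treating “Miss Burney” as Esther Burney follows what slides 26 to 28 suggest rather than anything LED's actual code is known to do.

```python
def merge_entities(triples, duplicate, canonical):
    """Redirect every reference to `duplicate` so it points at `canonical`."""
    def swap(term):
        return canonical if term == duplicate else term

    # Only subject and object positions hold entity references here.
    return {(swap(s), p, swap(o)) for (s, p, o) in triples}


# Hypothetical identifiers for illustration only.
data = {
    ("perf:duet", "performer", "person:MissBurney"),
    ("perf:duet", "performer", "person:CharlesBurney"),
    ("perf:lesson", "performer", "person:EstherBurney"),
}

# Reconcile the vaguer "Miss Burney" node with the resolved "Esther Burney" node.
merged = merge_entities(data, "person:MissBurney", "person:EstherBurney")
```

No descriptive data about the person is copied or rewritten; the performances simply point at a different node, which is why the operation is cheap.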