Supporting crowd-sourced listening 
experiences with Web Data 
technologies 
Mathieu d’Aquin, Alessandro Adamou 
Knowledge Media Institute, The Open University 
{mathieu.daquin|alessandro.adamou}@open.ac.uk 
@mdaquin | @anticitizen79
Crowdsourcing in databases 
Aggregating data by soliciting contributions from a 
community. 
examples: 
• Discogs, Setlist.fm, Encyclopedia Metallum 
• Zooniverse (SETIlive, Old Weather etc.) 
• Wikipedia (disputed?) 
• UK Reading Experience Database 
• Historic Cambridge Newspaper Collection
Bootstrapping a crowdsourced 
database 
• Earliest contributors implicitly dictate de facto 
quality standards that will be followed by their 
successors. 
• Risk of recreating the same data multiple 
times: no benefit from prior contributions. 
A non-empty initial database can mitigate these 
issues.
Naïve aggregation – same user 
“La Valse” 
location: London 
location-country: UK 
date: 30 Oct. 1930 
performer: Benjamin Britten 
performer-birthdate: 22 Nov. 1913 
performer-birthplace: Lowestoft 
performer-birth_country: UK 
performer-deathdate: 4 Dec. 1976 
performer-deathplace: Aldeburgh 
performer-death_country: UK 
performer-occupation: Musician 
composer: Maurice Ravel 
composer-birthdate: …. 
…………. 
(event attended by Benjamin 
Britten) 
location: Queen’s Hall 
location-country: UK 
date: 23 Sept. 1930 
listener: Benjamin Britten 
listener-birthplace: … 
listener-birth_country: … 
… 
… 
Not again! I’ll just write “Benjamin Britten”
Reuse 
“La Valse” 
location: London 
location-country: UK 
date: 30 Oct. 1930 
performer: Benjamin Britten 
composer: Maurice Ravel 
Benjamin Britten 
birthdate: 22 Nov. 1913 
birthplace: Lowestoft 
birth_country: UK 
deathdate: 4 Dec. 1976 
deathplace: Aldeburgh 
death_country: UK 
occupation: Musician 
(event attended by Benjamin 
Britten) 
location: Queen’s Hall 
location-country: UK 
date: 23 Sept. 1930 
listener: Benjamin Britten 
“Benjamin Britten” is still not shared 
across the userbase.
Still, for two users: 
• Two “Benjamin Britten”s 
• Different degrees of detail (possibly 
discordant!) 
• Two different sets of 
performances/experiences for each Britten.
Reuse, cross-user 
“La Valse” 
location: London 
location-country: UK 
date: 30 Oct. 1930 
performer: Benjamin Britten 
composer: Maurice Ravel 
(event attended by Benjamin 
Britten) 
location: Queen’s Hall 
location-country: UK 
date: 23 Sept. 1930 
listener: Benjamin Britten 
Benjamin Britten 
birthdate: 22 Nov. 1913 
birthplace: Lowestoft 
birth_country: UK 
deathdate: 4 Dec. 1976 
deathplace: Aldeburgh 
death_country: UK 
occupation: Musician 
User A User B
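The step from per-record copies to one shared description can be sketched as follows. This is only an illustration under stated assumptions, not LED's actual data model: the dictionary layout and the key `britten` are hypothetical, while the field values come from the slide.

```python
# Shared entity table: one description per person, whoever entered it.
entities = {
    "britten": {  # hypothetical local key, not an LED identifier
        "name": "Benjamin Britten",
        "birthdate": "22 Nov. 1913",
        "birthplace": "Lowestoft",
        "deathdate": "4 Dec. 1976",
        "deathplace": "Aldeburgh",
        "occupation": "Musician",
    }
}

# Records from two users hold a reference to the entity, not a copy of it.
record_user_a = {
    "work": "La Valse",
    "location": "London",
    "date": "30 Oct. 1930",
    "performer": "britten",
}
record_user_b = {
    "location": "Queen's Hall",
    "date": "23 Sept. 1930",
    "listener": "britten",
}


def experiences_of(entity_key, records):
    """Return every record that mentions the entity in any role."""
    return [r for r in records if entity_key in r.values()]


linked = experiences_of("britten", [record_user_a, record_user_b])
print(len(linked))
```

Because both records point at the same key, a correction to the shared description is visible from either user's record, and all experiences involving Britten can be collected in one pass.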
…but does any modern 
database really start up 
empty today?
Reuse, cross-user, cross-database 
Wikipedia 
MusicBrainz 
Geonames 
Benjamin Britten 
birthdate: 22 Nov. 1913 
birthplace: Lowestoft 
deathdate: 4 Dec. 1976 
deathplace: Aldeburgh 
occupation: Musician 
“La Valse” 
composer: Maurice Ravel 
part of: Catalogue Marcel Marnat des oeuvres de MR 
Lowestoft 
region: Suffolk 
country: UK 
Queen’s Hall 
location: London 
London 
country: UK 
User A 
(performance of) “La Valse” 
location: London 
date: 30 Oct. 1930 
performer: Benjamin Britten 
User B 
(event attended by Benjamin 
Britten) 
location: Queen’s Hall 
date: 23 Sept. 1930 
listener: Benjamin Britten
Almost every database can be bootstrapped out 
of a rich, human-readable and machine-readable 
data source: the Web [of data]. 
Because the Web isn’t just pages anymore.
Leverage the existing 
(resolved) ambiguities of 
data on the Web.
http://en.wikipedia.org/wiki/Eine_kleine_Nachtmusik 
→ http://dbpedia.org/resource/Eine_kleine_Nachtmusik (machine-readable form) 
http://musicbrainz.org/release-group/d25f8fee-68ff-38b9-94e7-c1533b0a0f77#_
These URIs already denote unambiguous entities. 
So let’s use them. 
Enter Linked Data
• URIs identify things, not pages 
– They may also produce Web pages 
• Data are encoded in terms of relations 
between the things identified by the URIs 
<http://...Eine_Kleine_Nachtmusik_(album)> 
<http://...performer> <http://...Venom_(band)> ; 
<http://…track> <http://... 4b42269f4510> . 
<http://... 4b42269f4510> <http://…title> “Countess Bathory” . 
• There are standards for making these relations 
machine-readable 
– data representation paradigm: RDF 
– query language: SPARQL
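The idea of data as relations between URI-identified things can be sketched with plain tuples. This is a toy illustration, not an RDF library or a SPARQL engine: the `http://example.org/` namespace is a placeholder (the slide elides the real URIs), and `match` is a hypothetical helper that only imitates a single SPARQL triple pattern.

```python
EX = "http://example.org/"  # placeholder namespace; the slide elides the real URIs

# Each statement is a (subject, predicate, object) triple.
triples = {
    (EX + "album/Eine_Kleine_Nachtmusik", EX + "performer", EX + "band/Venom"),
    (EX + "album/Eine_Kleine_Nachtmusik", EX + "track", EX + "recording/1"),
    (EX + "recording/1", EX + "title", "Countess Bathory"),
}


def match(pattern, data):
    """Yield triples matching a (subject, predicate, object) pattern.

    None in the pattern acts as a wildcard, loosely like a SPARQL variable.
    """
    for triple in data:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


# "What are the titles of the tracks on this album?"
album = EX + "album/Eine_Kleine_Nachtmusik"
titles = [
    title
    for (_, _, track) in match((album, EX + "track", None), triples)
    for (_, _, title) in match((track, EX + "title", None), triples)
]
print(titles)
```

The query follows a reference from the album to its track and then to the title, which is the kind of graph traversal SPARQL expresses declaratively.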
Most of all, reuse URIs! 
If a URI that identifies the song “Countess Bathory” is 
http://musicbrainz.org/recording/c3a0be45-d8c3-4e16-b44c-4b42269f4510, 
that does not make MusicBrainz the only authority with the right to 
provide and store data about it using this name.
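A minimal sketch of what reusing a URI buys, assuming statements are kept as triples: the recording URI is the one quoted above, while the `http://example.org/` vocabulary and the `experience/42` identifier are hypothetical.

```python
RECORDING = "http://musicbrainz.org/recording/c3a0be45-d8c3-4e16-b44c-4b42269f4510"
EX = "http://example.org/"  # hypothetical vocabulary and local identifiers

# Statements as an external source might publish them.
external = {
    (RECORDING, EX + "title", "Countess Bathory"),
}

# Our own statements reuse the same URI instead of minting a new one.
local = {
    (EX + "experience/42", EX + "heard", RECORDING),
    (RECORDING, EX + "title", "Countess Bathory"),
}

# Because both sides name the song identically, merging is a set union
# and the duplicated statement collapses into one.
merged = external | local
about_recording = {t for t in merged if RECORDING in (t[0], t[2])}
print(len(merged), len(about_recording))
```

Had the local data minted its own identifier for the song, the two title statements would not have collapsed, and a separate alignment step would be needed to relate them.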
LED integrates Linked Data 
reuse from external sources 
in the whole data lifecycle.
When entering data 
Searches across items submitted by LED users 
Falls back to Linked Data sources
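The lookup order described on this slide can be sketched as a search with a fallback. The function names, the sample items and the stand-in for the remote source are all assumptions made for illustration; LED's real implementation is not shown in the slides.

```python
def suggest(term, local_items, external_lookup):
    """Suggest entities for a typed term: local matches first, then fallback."""
    needle = term.casefold()
    hits = [item for item in local_items if needle in item["label"].casefold()]
    if hits:
        return hits
    # No local match: ask an external Linked Data source instead.
    return external_lookup(term)


# Items already submitted by users (hypothetical sample data).
local_items = [
    {"label": "Benjamin Britten", "uri": "http://example.org/person/1"},
]


def fake_external_lookup(term):
    """Stand-in for a remote lookup; a real one would query a Linked Data source."""
    catalogue = {"maurice ravel": "http://dbpedia.org/resource/Maurice_Ravel"}
    uri = catalogue.get(term.casefold())
    return [{"label": term, "uri": uri}] if uri else []


print(suggest("britten", local_items, fake_external_lookup))
print(suggest("Maurice Ravel", local_items, fake_external_lookup))
```

Passing the external lookup in as a function keeps the sketch testable offline and leaves open which sources are consulted.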
When finding data 
Search categories (facets) 
are not hardcoded in the 
system. 
Facet values integrate LED 
with external data sources.
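One way to avoid hardcoded facets is to derive them from whatever properties the data carries. The following is a rough sketch under that assumption; the record layout and property names are hypothetical and do not reflect LED's schema.

```python
from collections import defaultdict

# Hypothetical records; the property names are not LED's actual schema.
records = [
    {"performer": "Benjamin Britten", "location": "London"},
    {"listener": "Benjamin Britten", "location": "Queen's Hall"},
    {"composer": "Maurice Ravel", "location": "London"},
]


def build_facets(items):
    """Collect every property seen in the data, with value counts."""
    facets = defaultdict(lambda: defaultdict(int))
    for item in items:
        for prop, value in item.items():
            facets[prop][value] += 1
    return {prop: dict(values) for prop, values in facets.items()}


facets = build_facets(records)
print(sorted(facets))
```

A new property appearing in the data, whether entered by a user or pulled from an external source, would show up as a facet without any change to the code.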
Data reconciliation for 
redundancy containment
“We have had a charming Concert.... Mr. Jones, the harper, began the Concert. He has a fine 
instrument of Merlin's construction; he plays with great neatness and delicacy; but as 
expression must have meaning, he does not abound in that commodity. […]” 
- Diary of Frances Burney, May 1775 
Charles Burney, Frances Burney, Baron Deiden, Baroness Deiden, Edward Jones, Mr. Merlin (?), Miss Burney (?) 
performances: 
• harp music (perf. Edward Jones) 
• harpsichord duet (comp. Muthel; perf. Charles Burney, Miss Burney) 
• keyboard music (comp. Charles Burney, Echard, Schobert; perf. Charles Burney, Miss Burney…) 
• vocal music (perf. Miss Louisa Harris)
“Our Concert proved to be very much the Thing... ....Mr Burney... Fired away, with his usual 
successful velocity, to the amazement and delight of all present... [He] played a concerto of 
Schobert, and one of my Father's, and a great deal of Extemporary Preluding. […]” 
- Letter from Frances Burney to Samuel Crisp, May 1775 
Charles Burney, Frances Burney, Baron Deiden, Baroness Deiden, Edward Jones, John Joseph Merlin, Esther Burney 
performances: 
• harp music (perf. Edward Jones) 
• harpsichord duet (comp. Muthel; perf. Charles Burney, Esther Burney) 
• Lesson by Charles Burney (comp. Charles Burney; perf. Esther Burney) 
• Rondeau from Piramo and Tisbé (comp. Venanzio Rauzzini; perf. James Harris, Miss Harris) 
• piece by Johann Gottfried Eckard (comp. Johann Gottfried Eckard; perf. Esther Burney) 
• concerto by Johann Schobert (comp. Johann Schobert; perf. Charles Burney)
“We have had a charming Concert.... Mr. Jones, the harper, began the Concert. He has a fine 
instrument of Merlin's construction; he plays with great neatness and delicacy; but as 
expression must have meaning, he does not abound in that commodity. […]” 
- Diary of Frances Burney, May 1775 
“Our Concert proved to be very much the Thing... ....Mr Burney... Fired away, with his usual 
successful velocity, to the amazement and delight of all present... [He] played a concerto of 
Schobert, and one of my Father's, and a great deal of Extemporary Preluding. […]” 
- Letter from Frances Burney to Samuel Crisp, May 1775 
Charles Burney, Frances Burney, Baron Deiden, Baroness Deiden, Edward Jones, John Joseph Merlin, Esther Burney 
performances: 
• harp music (perf. Edward Jones) 
• harpsichord duet (comp. Muthel; perf. Charles Burney, Esther Burney) 
• Lesson by Charles Burney (comp. Charles Burney; perf. Esther Burney) 
• Rondeau from Piramo and Tisbé (comp. Venanzio Rauzzini; perf. James Harris, Louisa Harris) 
• piece by Johann Gottfried Eckard (comp. Johann Gottfried Eckard; perf. Esther Burney) 
• concerto by Johann Schobert (comp. Johann Schobert; perf. Charles Burney)
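The reconciliation of the two Burney records can be sketched as redirecting references from an ambiguous name to the resolved entity and then merging. This is an illustrative simplification: performances are reduced to (work, performer) pairs, and the alignment is assumed to be already known rather than computed.

```python
# Performances from two submissions about the same 1775 concert,
# reduced to (work, performer) pairs taken from the slides.
submission_1 = [
    ("harp music", "Edward Jones"),
    ("harpsichord duet", "Charles Burney"),
    ("harpsichord duet", "Miss Burney"),
]
submission_2 = [
    ("harp music", "Edward Jones"),
    ("harpsichord duet", "Charles Burney"),
    ("harpsichord duet", "Esther Burney"),
]

# Alignment decided by a curator or a matching step (assumed given here).
same_as = {"Miss Burney": "Esther Burney"}


def reconcile(pairs, alignment):
    """Redirect references to ambiguous names towards the resolved entity."""
    return {(work, alignment.get(person, person)) for work, person in pairs}


merged = reconcile(submission_1, same_as) | reconcile(submission_2, same_as)
print(sorted(merged))
```

Once "Miss Burney" points at the same entity as "Esther Burney", the overlapping performances collapse and only the distinct ones remain.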
Enhancing the Linked Data 
cloud
Motivation 
• Consolidate existing Web Data by integrating 
new or refined information. 
• Assist information retrieval applications by 
providing an additional data node to traverse. 
• Provide factual support to assess the 
truthfulness of stated assertions on the Web.
BNB: British National Bibliography (The British Library) 
DBpedia: structured data from Wikipedia infoboxes 
LinkedBrainz: MusicBrainz as linked data 
VIAF: Virtual International Authority File 
[Diagram of data sources: DBpedia, BNB, VIAF, Geonames, LinkedBrainz]
LED data are re-published 
as a Linked Open Data set 
• Hosted at http://data.open.ac.uk 
• SPARQL query service at 
http://data.open.ac.uk/query 
• Documentation at 
http://led.kmi.open.ac.uk/linkeddata
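As a hedged sketch of how such a query service is typically addressed, the snippet below builds a request URL following the SPARQL 1.1 Protocol, where the query travels in a `query` parameter. The endpoint address is the one on the slide and may no longer be live; the query is deliberately generic so that it assumes nothing about the vocabulary LED uses.

```python
from urllib.parse import urlencode

ENDPOINT = "http://data.open.ac.uk/query"  # from the slide; may no longer be live

# A generic query that makes no assumption about the vocabulary used.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def sparql_get_url(endpoint, query):
    """Build the GET URL defined by the SPARQL 1.1 Protocol (query=...)."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
# To run it, one could pass `url` to urllib.request.urlopen together with
# an "Accept: application/sparql-results+json" header.
```

The network call is left out on purpose, so the sketch runs without relying on the service being available.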
What does LED contribute to 
the LD cloud?
New data 
Historical music performances 
Royal Carl Rosa Company – “Faust” 
for orchestra and voice 
date: 14 May, 1917 
location: Garrick Theatre 
Patron’s Fund - “The Birthday of the Infanta” 
date: 9 July, 1931 
location: London (indoors, private space) 
(you won’t find them on last.fm or setlist.fm)
New data 
Portions and quotes of source documents / manuscripts 
Journeying boy : the diaries of the young Benjamin Britten 
1928-1938 
(provided by the British Library) 
Author: Benjamin Britten 
Editor: John Evans 
Published: Faber, London, 2009 
ISBN: 9780571238835 
… 
(provided by LED) 
Diary entries: 
• Page 17, Feb 14 1929: “Still absent from school work. Everso much more snow […]” 
• Page 67, March 18 1931: “Go with Mummy to B.B.C – Beethoven concert […]” 
• Page 70, April 22 1931: “Go to John Nicholson’s to tea at 2.45. & to hear Gramophone 
records on his new Radio-Gram Hear. Brahms. Pft. Concerto Mov. 1. (Rubenstein) Tchaik.” 
• …
Refinements of existing data 
Mary Somerville 
(Provided by DBpedia) 
Born: 1780-12-26 in Jedburgh 
Died: 1872-11-28 in Naples 
Field: Polymath, Science journalism 
VIAF ID: 27288356 
… 
(integrated by LED) 
Full name: Mary Fairfax Greig Somerville 
Social group: Rulers, chiefs, aristocracy & gentry etc. 
Occupation: Scientist 
Religion: Christian, Protestant 
wrote: Memoir of Mary Somerville (1817, 1840’s, 1849, 1850…)
Alignments 
dbpedia:Aaron_Copland ≡ bnb:CoplandAaron1900-1990 
dbpedia:Jane_Austen ≡ bnb:AustenJane1775-1817 
These semantic links are not found on the LD cloud. 
By exposing them, we assist Semantic Web applications in the 
retrieval of relevant information from multiple data sources.
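How an application might use such alignments can be sketched as follows, reading ≡ as an equivalence link between identifiers (in Linked Data this is commonly expressed with owl:sameAs, which is an assumption here). The identifier pairs come from the slide; the facts attached to them are hypothetical placeholders.

```python
# Alignment pairs from the slide, written as prefixed names.
alignments = [
    ("dbpedia:Aaron_Copland", "bnb:CoplandAaron1900-1990"),
    ("dbpedia:Jane_Austen", "bnb:AustenJane1775-1817"),
]

# Facts held by each source (hypothetical values, for illustration only).
facts = {
    "dbpedia:Jane_Austen": {"source": "DBpedia"},
    "bnb:AustenJane1775-1817": {"catalogued_by": "BNB"},
}


def equivalents(identifier, pairs):
    """Return the identifier plus everything linked to it, transitively."""
    group, frontier = {identifier}, [identifier]
    while frontier:
        current = frontier.pop()
        for a, b in pairs:
            for near, far in ((a, b), (b, a)):
                if near == current and far not in group:
                    group.add(far)
                    frontier.append(far)
    return group


def describe(identifier):
    """Merge the facts known under any equivalent identifier."""
    merged = {}
    for alias in sorted(equivalents(identifier, alignments)):
        merged.update(facts.get(alias, {}))
    return merged


print(describe("dbpedia:Jane_Austen"))
```

Starting from either identifier yields the same merged description, which is what lets an application combine several sources without knowing in advance which one holds which fact.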
Figures on reuse 
Computed on 1102 distinct listening experiences 

Type                                          Unique instances   Total reuse   Peak
People                                        626                1687          184
Written works                                 902                990           46
Geographical locations                        492                266           70
Musical items (songs, albums, performances)   2400               337           22
Musical genres                                78                 644           219

From external data sources 

Source        Reused distinct instances
DBpedia       823
BNB           339
data.gov.uk   816
MusicBrainz   SOON
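The figures in the table can be turned into two simple ratios. This sketch assumes that "Total reuse" counts the times an existing entity was referenced instead of being re-entered, and that "Peak" is the highest count for a single entity, as the speaker notes suggest; the derived ratios are not stated in the slides.

```python
# (unique instances, total reuse, peak) per entity type, from the table.
figures = {
    "People": (626, 1687, 184),
    "Written works": (902, 990, 46),
    "Geographical locations": (492, 266, 70),
    "Musical items": (2400, 337, 22),
    "Musical genres": (78, 644, 219),
}

for kind, (unique, reuse, peak) in figures.items():
    average = reuse / unique        # reuses per distinct entity
    peak_share = peak / reuse       # share of reuse taken by the top entity
    print(f"{kind}: {average:.2f} reuses per instance, top entity {peak_share:.0%}")

total_reuse = sum(reuse for _, reuse, _ in figures.values())
```

Read this way, entity types with few distinct instances and many references, such as genres and people, benefit most from sharing.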
Questions? 


Editor's Notes

  1. On behalf of Mathieu, sitting over there, I will talk about how we combined cutting-edge data management with the practices of crowdsourcing.
  2. Having a non-empty database to begin with can help a lot. Though this seems a contradiction, it is only an apparent one.
  3. We started off with a set of curated entries, so crowdsourcing is not the only way LED aggregates content, but it is the trickiest.
  4. Wouldn’t it be great if we just stored the essential information that is unique to our knowledge and relied upon authoritative data sources for the rest? This would be the machine equivalent of when I tell someone “Hey, I just went to see a Ravel concert”, they ask “Who’s Ravel?”, and I quite practically, if impolitely, answer “Ah come on, look it up on Wikipedia!”
  5. What seems to be the catch with it? It would appear that if I wanted to put the data on those websites to good use, I would have to either hack my way through to the underlying databases, or program my software systems to read the Web pages, with all the unpredictability that comes with it.
  6. The never-abused-enough Mozart example, but with a twist. Google did not start by publishing these structured data firsthand, but rather by compiling them out of data sources from the Web, the same ones it indexes for us to search.
  7. This kind of alignment can be done on-the-fly, not so much because we are reusing Web Data, but because we adopt the same paradigm as Linked Data within the LED system, where each entity is a node and we just move references around.
  8. For the record, the Linked Data cloud is something that looks like this today, with every circle being a data provider.
  9. What does this mean? That we have saved users the burden of rewriting information about the same people about seventeen hundred times overall, and up to 184 times for one person, who I think is Britten.