Royal Opera House: Why we love linked data and the semantic web

Ongoing project to rebuild the Royal Opera House website along semantic lines.
The challenges of thoroughly modelling information for a cultural institution.
The potential benefits for arts organisations of exposing their data.

Ellen West and Jamie Tetlow gave this presentation at Culture Geek conference at the Barbican Centre in London on Friday 7th September 2012.

http://www.culturegeek.com

Published in: Technology, Business

  • 1. Why we love linked data and the semantic web Ellen West (Online Content Manager) - Studied English, entirely editorial background. 20 years ago would have wanted to go into publishing. Joined BBC online in 1999 and then ROH in 2011. Jamie Tetlow (Design and Development Manager) - Studied Fine Art. Fell into the web through a diversifying music production company. Joined BBC in 2002 and then ROH in 2010.
  • 2. Teams Content - 8 staff including Content Producers, Social Media Manager, Learning Producer, AV Editor and a Web Editor. Technical Development - 2 Developers - a completely new team, used to be outsourced to agencies but ROH wants to take more control.
  • 3. Some definitions
  • 4. 'Linked Data' / 'Semantic Web' Tim Berners-Lee, Design Issues, 2006-07-27 http://www.w3.org/DesignIssues/LinkedData.html Data enlightenment - providing a key for your data's potential energy. Is your data 5 star? ROH is lingering somewhere between 2 and 3. How did we get there? What influenced us? Where are we going?
  • 5. Culture Hack Day, January 2011
  • 6. Culture Hack Day (Weekend) Actually a weekend, 15-16 January 2011 http://culturehackday.org.uk/previous-hacks/culture-hack-london-2011/ The first Culture Hack Day was curated by the Royal Opera House but was mostly the product of Rachel Coldicutt (now of Caper and continuing the Culture Hack strand). Many developers worked with data freely available on the web. We had brought 2 CSV files for people to use, which were useful but harder to get started with. For the key sponsor, this was a little embarrassing.
  • 7. Some specifics CULTURE HACK DAY – THE STATS: 69 DEVELOPERS 8 INSPIRING SPEAKERS 12 CULTURAL ORGANISATIONS 1 SOFTWARE COMPANY 3 MEDIA ORGANISATIONS 2 FUNDING BODIES 80 PEOPLE WHO WERE INTERESTED ENOUGH TO TURN UP AND JOIN IN WITH THE DEBATE 276 TWEETS FROM @CULTUREHACKDAY 15 GIANT BEAN BAGS 82 PIZZAS 11 CRATES OF BEER 100 GOODIE BAGS FULL OF USB STICKS, DVDS, STORYCUBES, MAGAZINES...
  • 8. Hack by Clare Lovell and Matthew Somerville Pepys' Shows http://pepysshows.co.uk Combining the Pepys diary blog serialisation by Phil Gyford and Matthew Somerville's own Theatricalia theatre database to provide Sam's postings and reviews on productions he'd seen.
  • 9. Hack by Dan Williams When Should I Visit? Gathers data from Foursquare check-ins to let people see which days are less busy for visiting a museum.
  • 10. The BBC
  • 11. BBC Programmes The BBC used to create bespoke programme websites (for the higher budget shows) and schedules for all channels but [ephemerally] showing just the next 10 days. This was expensive, incoherent and eroded over time. BBC Programmes aimed to create a permanent, findable web presence for every programme the BBC broadcast. This influenced our thinking about the foundations of our website. URL design and permanence. Building the archive as you go.
  • 12. BBC Music The BBC Music website has a page for every artist featured on TV and Radio. Not all music, just the music relevant to the BBC.
  • 13. musicbrainz.org BBC Music uses MusicBrainz, the publicly contributed online music encyclopedia, as the source for its web-scale artist identifiers (URL slugs) rather than minting its own. Re-using what's already available. The BBC's digital play-out systems identify artist names, which are matched against MusicBrainz to determine which artist pages are available. It also uses the official Wikipedia URLs provided by MusicBrainz to extract biography information from Wikipedia (a bot tracks Wikipedia article edits being announced in the IRC chatrooms). BBC Music editorial staff became trusted contributors to MusicBrainz and Wikipedia. This project influenced our thinking about the web as content management system.
  • 14. BBC Wildlife Finder One of the richest a/v archives the BBC actually owns the rights to is the output of the BBC Natural History Unit in Bristol. The project used the 18th-century Swedish botanist Carl Linnaeus' model of species relationships as its foundation.
  • 15. dbpedia.org The project applied Wikipedia principles similar to those of BBC Music but went a step further by using URL slugs from DBpedia (a project, central to the world of Linked Data, that aims to extract structured data from Wikipedia and make this information available on the web). This project influenced our thoughts on creating a rich experience. All BBC projects are well documented in many other presentations: http://www.slideshare.net/fantasticlife/semweb-at-the-bbc http://www.slideshare.net/metade/linked-data-on-the-bbc http://www.slideshare.net/fantasticlife/how-we-make-websites-iwmw2009
  • 16. The old Royal Opera House website Had been around for 5-6 years.
  • 17. Waiting room The old site gave customers a very poor experience: it used a waiting room to handle load on busy ticket sale days. Customers spending considerable sums could be held in a waiting room for several hours and, because the waiting room sat in front of the whole website, it also frustrated casual browsers, who deserted the site.
  • 18. A big sitemap Nothing wrong with big websites but over 600 statically published pages is not an easy thing to manage. Content quickly becomes dated. Pages and structure had been produced in departmental silos creating a confusing experience. Editorial teams had been given a page making tool and they went forth and made pages. Over 60% of the traffic was centered on the 5 or 6 dynamic listing/ticket-sales pages. There was a massive discrepancy between editorial effort and user engagement. The richest editorial content was hidden within the, ironically named, 'Discover' section.
  • 19. Verdi’s Aida – lots of content An example of our content offering for the opera Aida. Event details, long-form articles, videos, photos, blog posts, artifacts, products, etc. Great content but...
  • 20. Incoherently connected All manually interlinked. So the ticket buyer may not be able to find the rich background information and the content consumer, arriving from Google, may never know we have events. Missed opportunities.
  • 21. Bloated CMS A CMS with 30 attributes for a production. Many attributes were replications of data already in our internal CRM/ticketing software. Beyond the essential attributes only 2 or 3 others were used, and their labels were not respected: they simply happened to be the attributes that appeared near the top of the page. This is people thinking about 'page making' and not 'managing data'.
  • 22. The new Royal Opera House website
  • 23. Domain modelling Following the principles of Domain Driven Design, we had many conversations about our domain and sketched out the relationships to get a shared understanding. We referenced Luke Blaney's Theatre Ontology - http://lukeblaney.co.uk/semweb/theatre - This was expanded when he attended the Culture Hack Day and explored our performance archive data.
  • 24. Evolving prototype Before any traditional visual or graphic design had taken place we went straight to prototyping in code. This is the ONLY way to get a true feel for whether your modelling (structure and relationship) actually works.
  • 25. Make the most of existing internal tools We made use of existing tools, pushing text-based content that used to live in our CMS upstream into our CRM/ticketing software. Tessitura has easily configurable classification systems and object attributes: Tessitura > Production Elements > Keywords and Content Items.
  • 26. Content on the web Our old website had a video player, photo galleries, long-form text and links, but these were manually integrated and the functionality hadn't evolved at the same rate as the web. These would have been costly to update when we were already cross-posting on YouTube and Flickr, running a WordPress blog and storing links in Delicious. On these sites the content was managed better, the technology was more robust, the rights were clearer and the content reached a much greater audience. These websites could also be considered platforms as they follow open data principles and expose their content in machine-readable formats.
  • 27. Tags These platforms all utilise tagging and so we defined a simple, natural language, tagging scheme to enable us to re-aggregate content around our productions (our principal first order object). We had begun to use the 'Web as our CMS'.
  • 28. Production page The tagging scheme maps to data already stored in our internal systems, which we also use in designing our production page URLs.
  • 29. Production page content sources The production page combines all the sources of data. We have brought structure to our content :-) A pre-defined set of Tessitura data is exported to a MySQL database. We have a Zend application running on top, where we've created a 'Media Manager' module that collects data from the various sources on the web based on the tagging scheme. This is aggregated to create a production page.
  • 30. Where's it heading We said our data is somewhere between two and three stars on the linked data scale. Why do we want to structure our data so much?
  • 31. 'Verdi' Search for Verdi on Google and the results are no longer a simple 'top ten pages'. Search results are becoming a 'media' experience highlighting videos, images, news and events and, most recently and of particular significance to us, the info boxes appearing on the right-hand side that feature people, places and…
  • 32. …which links to ‘Aida’ creative works! Could it get to a point where Google has got so far with its mission to "organize the world's information and make it universally accessible and useful" that you no longer need to leave Google to get the information you need? We can't be too sure but we certainly want to be a part of this organising mission and not be watching from the outside.
  • 33. schema.org Fortunately it's quite easy to follow this organising effort as Google have come together with Bing and Yahoo to agree on standards for exposing data.
  • 34. Schemas You can read all the specifications for defining People, Places, Organisations, Events, Creative works, etc on schema.org
  • 35. HTML view for people We're getting in shape as we've structured our site and URLs so that they can not only be read by people…
  • 36. Data view for machines …but can also be consumed by machines (this is work in progress!)
  • 37. Challenges
  • 38. Opening up other areas of the business So much to model, so much content to bring structure to. This stuff takes time and effort and so one of the biggest challenges is managing the expectations of the business.
  • 39. Conclusions We work in an area that is creative as well as commercial. We want to reach as many people as possible and inspire them. Working for a historic institution also makes you a custodian of its heritage - there's a lot that the ROH can contribute by opening up. Open data, linked open data and the semantic web are a way of taking part in innovations that we as yet have no conception of. There are untold benefits.
  • 40. Thank you – http://www.roh.org.uk
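
The tagging approach from slides 27 to 29 and the machine-readable view from slides 33 to 36 can be sketched roughly as follows. This is a minimal illustration, not the ROH implementation: the talk describes a Zend application on MySQL, whereas this sketch uses Python, and the sample items, the `slugify`, `aggregate` and `data_view` helpers and the `example.org` base URL are all hypothetical. It assumes each external item carries a list of tags and that a production's tag is simply its URL slug.

```python
import json
import re
from collections import defaultdict


def slugify(title: str) -> str:
    """Turn a production title into a lowercase, hyphenated slug."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


# Hypothetical items, standing in for what external platforms might return.
ITEMS = [
    {"source": "youtube", "title": "Aida trailer", "tags": ["aida", "opera"]},
    {"source": "flickr", "title": "Aida rehearsal photo", "tags": ["aida"]},
    {"source": "wordpress", "title": "Designing La traviata", "tags": ["la-traviata"]},
    {"source": "delicious", "title": "Unrelated link", "tags": ["news"]},
]

# Hypothetical production titles, standing in for data from internal systems.
PRODUCTIONS = ["Aida", "La traviata"]


def aggregate(productions, items):
    """Group item titles by source under each production whose slug matches a tag."""
    pages = {slugify(name): defaultdict(list) for name in productions}
    for item in items:
        for tag in item["tags"]:
            if tag in pages:
                pages[tag][item["source"]].append(item["title"])
    return {slug: dict(by_source) for slug, by_source in pages.items()}


def data_view(name, slug, base_url="https://example.org/productions/"):
    """Describe a production with schema.org vocabulary, as a JSON-LD style dict."""
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": name,
        "url": base_url + slug,
    }


if __name__ == "__main__":
    # Human-facing page content, grouped per production and per source.
    print(json.dumps(aggregate(PRODUCTIONS, ITEMS), indent=2))
    # Machine-facing description of a single production.
    print(json.dumps(data_view("Aida", slugify("Aida")), indent=2))
```

Using the same slug for the tag and the page URL means one identifier ties together the internal record, the external content and the published page, which is the idea behind treating the web as the CMS.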