Drupal case study: ABC Dig Music

A Drupal case study on developing the Australian Broadcasting Corporation's Dig Music website. I gave this talk at Drupal Downunder #ddu2011 in Brisbane, Australia (Jan 23, 2011).

I discuss how the Semantic Web was used to create a real-time snapshot of a musical artist, pulled live from the digital radio broadcast.

I also talk about performance issues we encountered and ways that they were overcome.

Published in: Technology, Sports

  1. 1. Case Study – ABC Dig Music • David Peterson @davidseth #ddu2011 http://www.flickr.com/photos/soyignatius/
  2. 2. David Peterson @davidseth
  3. 3. Challenge: Create a snapshot of an artist
  4. 4. Combining • Known Data • Data in the Wild
  5. 5. Problem: <xml> <track> <title>Purple Rain</title> <artistName>Prince</artistName> </track> </xml>
  6. 6. Into
  7. 7. It’s all about Storytelling…
  8. 8. Shared Understanding • Can’t tell a story if the other person doesn’t get what we mean • Or even speak the same language
  9. 9. • The story matters • ... but ... • You never really have all the information you need, whether big or small
  10. 10. You Just don’t Always Know • Someone else knows more than you • How to find it?
  11. 11. One Exception
  12. 12. Semantic Web • Core idea – you never really know the entire picture • This is a “good thing” • Freedom
  13. 13. Open World • Closed World http://www.flickr.com/photos/almasryalyoum_e/
  14. 14. “If the graph of people is cool, imagine a graph of everything” - Dries Buytaert
  15. 15. Open Data
  16. 16. Facebook? • A little late to the party ;)
  17. 17. Finding a Solution • Which APIs to use • Which APIs can we use • How can we combine data from multiple sources • How can we automate it
  18. 18. The Curse of too Much • There are over 50 APIs listed on programmableweb.com • Too many to look into • Each has its own API methods and return data formats – JSON, XML, RSS, RDF !!!
  19. 19. Take your Pick • APIs everywhere – BBC Music – Discogs – Last.fm – MusicBrainz – Yahoo Music – Flickr – YouTube – The Hype Machine
  20. 20. Finding the Key • One common feature was the use of a MusicBrainz ID – Last.fm – Discogs – Freebase – Wikipedia/DBpedia – BBC
  21. 21. Eureka! • Great, now all I had to do was use the MusicBrainz API to look up the ID and I was done. Easy... • :( • The search API sucked. It returned too many fuzzy results • crap
  22. 22. Back to the Future • This is where the Semantic Web enters the picture – All that stuff about storytelling – Shared understanding – URIs (web links)
  23. 23. SPARQL: Think of it as Google with a WHERE clause
  24. 24. SELECT ?artist WHERE { ?artist foaf:name "Prince"@en . ?artist a <http://dbpedia.org/ontology/MusicalArtist> . }
  25. 25. SELECT ?artist ?bio ?url ?album WHERE { ?artist foaf:name "Prince"@en . ?artist a <http://dbpedia.org/ontology/MusicalArtist> . ?artist dbpedia2:abstract ?bio . ?artist foaf:page ?url . OPTIONAL { ?album <http://dbpedia.org/ontology/artist> ?artist . ?album rdfs:label "Purple Rain"@en . } } LIMIT 1
  26. 26. Pinpoint Results • This returns ONE result • “exactly” what we are looking for (or nothing!)
  27. 27. {170d193a-845c-479f-980e-bef15710653e} http://www.flickr.com/photos/riseofphoenix/
  28. 28. {070d193a-845c-479f-980e-bef15710653e} http://www.flickr.com/photos/angeldew/
  29. 29. Raw Data • Not too pretty to look at • But computers LOVE this stuff
  30. 30. So, what do we get • Disambiguation • MusicBrainz ID • Discography • Related Artists • Official homepage • Bio • Credit card details (sometime in 2012)
  31. 31. The Rosetta Stone • MusicBrainz ID is our key to the wild web of APIs • Wikipedia URL is the key to Semantic Web • One happy family :) http://www.flickr.com/photos/vportals/
  32. 32. Take a look [browser]
  33. 33. Hindsight is 20/20 ... or lessons learned
  34. 34. Drupal Sucks • Drupal performance, what performance?
  35. 35. Don’t use Drupal • To get the best performance out of Drupal 6, don’t use Drupal 6!
  36. 36. Pressflow • Key patches and enhancements • Releases mirror official Drupal releases • Big players are using it – Drupal.org – ABC – Music labels – Newspapers
  37. 37. Start your Engines: MySQL base install is ... lacking • MyISAM == slow • Use Percona XtraDB • ... or ... InnoDB
  38. 38. Reduce your footprint • APC – PHP app is compiled & cached in memory • Memcached
  39. 39. Search • Drupal’s built-in search can be a dawg • Solr – Much faster search – Offers faceting – Can become a platform in its own right
  40. 40. A Fresh Coat of Paint • Varnish – Last but certainly not least – Up to millions of hits per hour
  41. 41. Performance Optimisations • Switch host to Linode • Two-server architecture - db server and app server • Master-slave relationship for MySQL • Migrated Drupal to Pressflow • Changed tables to InnoDB • Varnish for serving pages • Memcached for caching • Set up Munin to monitor servers
  42. 42. An Alternate Future • RDFa • View • Entit • Fiel • Medi • Strea • Mongo
  43. 43. An Alternate Future • Drupal 7 – RDFa – Views 3 – Entities – Fields – Media Module – Stream Wrappers – MongoDB
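
To make the step from the broadcast XML (slide 5) to the pinpoint SPARQL query (slide 25) concrete, here is a minimal Python sketch. It is not the code used on the Dig Music site, which ran on Drupal. The function names are hypothetical, and the prefix declarations are an assumption: the slides use foaf:, rdfs: and dbpedia2: without declaring them, so the URIs below are the ones those prefixes conventionally expand to on DBpedia.

```python
import xml.etree.ElementTree as ET

# Sample payload copied from slide 5.
TRACK_XML = """<xml>
  <track>
    <title>Purple Rain</title>
    <artistName>Prince</artistName>
  </track>
</xml>"""

# Assumption: the slides never declare these prefixes; these are the
# conventional expansions for foaf:, rdfs: and DBpedia's dbpedia2:.
PREFIXES = (
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX dbpedia2: <http://dbpedia.org/property/>\n"
)


def parse_track(xml_text):
    """Return (artist, title) from the broadcast XML shown on slide 5."""
    track = ET.fromstring(xml_text).find("track")
    return track.findtext("artistName").strip(), track.findtext("title").strip()


def sparql_literal(text, lang="en"):
    """Quote a string as a language-tagged SPARQL literal, escaping \\ and "."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def build_artist_query(artist, title):
    """Build the slide 25 query for a given artist name and track title."""
    lines = [
        "SELECT ?artist ?bio ?url ?album WHERE {",
        f"  ?artist foaf:name {sparql_literal(artist)} .",
        "  ?artist a <http://dbpedia.org/ontology/MusicalArtist> .",
        "  ?artist dbpedia2:abstract ?bio .",
        "  ?artist foaf:page ?url .",
        "  OPTIONAL {",
        "    ?album <http://dbpedia.org/ontology/artist> ?artist .",
        f"    ?album rdfs:label {sparql_literal(title)} .",
        "  }",
        "}",
        "LIMIT 1",
    ]
    return PREFIXES + "\n".join(lines)


if __name__ == "__main__":
    artist, title = parse_track(TRACK_XML)
    print(build_artist_query(artist, title))
```

Escaping the artist and title before placing them in the query matters here because the names come from an external feed; an unescaped quote in a track title would otherwise break the query. Sending the query to an endpoint and reading the result is left out, since the slides do not say how that was done.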
