Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Knowledge Acquisition from Music Digital Libraries


Published on

A vast amount of musical knowledge has been gathered for centuries by musicologists and music enthusiasts. Most of this knowledge is implicitly expressed in artist biographies, reviews, facsimile editions, etc. Music Digital Libraries make this information available and searchable. Documents are indexed and keyword-based search is generally provided. However, implicit knowledge present in text is not understood by machines, so complex queries cannot be answered.

As a first step towards the approximation of machine understanding to the accumulated musicological knowledge, documents stored in Digital Libraries must be semantically annotated. Current descriptive metadata and markup annotations provide some structured information. Nevertheless, it is insignificant compared with the epistemic potential of the source text. Once documents are properly annotated, complex structures and meaningful relations between pieces of information may emerge. This supports a paradigm shift, from keyword-based systems to knowledge-based systems, hence enabling musicologists to formulate more complex queries. As a consequence of that, Digital Libraries will turn into real knowledge environments, instead of mere searchable repositories.

Manual annotation of documents is very expensive, and sometimes unwieldy. Thus, the use of reliable, automatic processes is crucial to build knowledge environments. In the last few years several studies and approaches on the use of Semantic Web technologies in Digital Libraries have been proposed. The result of these intersection has been coined as Semantic Digital Libraries. Most of the related work is focused on the acquisition of Semantic Web methodologies for knowledge representation, which, among other advantages, facilitates information exchange between multiple knowledge bases. In the case of Music Digital Libraries though, tools and methodologies developed around the Semantic Web for automatic knowledge acquisition have received less attention.

In our work, an extensive survey is provided about the applicability and performance of state-of-the-art semantic technologies for knowledge acquisition from Music Digital Libraries. These technologies are adapted to fulfill the requirements and specificity of the music domain. An evaluation of analyzed tools is performed over artist biographies and Flamenco Music articles gathered from different sources (e.g. The Grove Dictionary, Wikipedia). An exhaustive overview of the possibilities that a knowledge layer may offer to Music Digital Libraries is exposed. Finally, some guidelines for future work in this research direction are also provided.

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

Knowledge Acquisition from Music Digital Libraries

  1. 1. Knolwedge  Acquisi0on  from   Music  Digital  Libraries       Sergio  Oramas,  Mohamed  Sordo       IAML/IMS  Congress  2015    
  2. 2. •  Musicological  Knowledge  is  hidden  between  the  lines   •  Machines  don’t  know  how  to  read   Mo0va0on  
  3. 3. •  Obtain  knowledge  automa0cally   •  Make  complex  ques0ons   •  Visualize  the  informa0on   •  Improve  naviga0on   •  Share  knowledge   Why  Knowledge  Acquisi0on?   4  
  4. 4. Musical  Libraries  
  5. 5. Musical  Libraries   digital   recording,   scan  
  6. 6. Musical  Libraries   digital   recording,   scan   OCR,  manual   transcrip0on  
  7. 7. Musical  Libraries   digital   recording,   scan   OCR,  manual   transcrip0on   Informa0on  Extrac0on,   seman0c  annota0on  
  8. 8. Musical  Libraries   Current  DL   digital   recording,   scan   OCR,  manual   transcrip0on   Informa0on  Extrac0on,   seman0c  annota0on  
  9. 9. Musical  Libraries   Current  DL   Web  search   digital   recording,   scan   OCR,  manual   transcrip0on   Informa0on  Extrac0on,   seman0c  annota0on  
  10. 10. Musical  Libraries   digital   recording,   scan   OCR,  manual   transcrip0on   Informa0on  Extrac0on,   seman0c  annota0on   Current  DL   Web  search  
  11. 11. •  The  Seman&c  Web  aims  at  conver0ng  the  current  web,   dominated  by  unstructured  and  semi-­‐structured  documents   into  a  web  of  linked  data.   •  Achievements  useful  for  Digital  Libraries   –  Common  framework  for  data  representa0on  and   interconnec0on  (RDF,  ontologies)   –  Seman0c  technologies  to  annotate  texts  (En0ty  Linking)   –  Language  for  complex  queries  (SPARQL)   Seman0c  Web  
  12. 12. Wikipedia  and  DBpedia   13   -­‐  Digital  Encyclopedia   -­‐  Unstructured   -­‐  Keyword  search   -­‐  Knowledge  Base   -­‐  Structured   -­‐  Query  search  
  13. 13. 14  
  14. 14. 15  
  15. 15. •  Dbpedia  example  queries   –  Composers  born  in  Vienna  in  XVIII  Century   –  American  jazz  musicians  that  have  wri_en  songs  recorded  by   RCA  Records   •  Dbpedia  graph  applica0ons   –  En0ty  Relevance   –  En0ty  Similarity   –  En0ty  Recommenda0on   DBpedia   16  
  16. 16. Google  Knowledge  Graph   17  
  17. 17. •  Current  naviga0on   –  Indexes   –  Keyword-­‐based  search   •  From  searchable  repositories  to  knowledge  environments   –  Complex  queries   –  Informa0on  visualiza0on   –  Recommenda0on   –  Data  Analy0cs   –  Q&A   Music  Digital  Libraries  
  18. 18. Music  Digital  Libraries   19  
  19. 19. Music  Digital  Libraries   20  
  20. 20. •  Encyclopedic  dic0onary   •  One  of  the  largest  reference  works  in  Western  music   •  Ar0st  biographies  crawled  from  the  Grove  Music  Online   –  16,707  biographies  (1st  paragrahph)   –  From  pre-­‐medieval  to  contemporary   Dataset:  The  New  Grove  
  21. 21. •  What  are  the  most  relevant  music  schools?   •  What  are  the  ar0sts  most  similar  to  Schoenberg?   •  Which  are  the  most  represented  roles  in  the  Grove?   •  Is  there  a  migra0on  tendency  in  ar0sts?  To  which  ci0es?   •  What  is  the  best  city  to  die  for  a  musician?   Dataset:  The  New  Grove   22  
  22. 22. Dataset:  The  New  Grove   23  
  23. 23.   Anton  Webern   (b  Vienna,  3  Dec  1883;  d  Mi_ersill,15  Sept  1945).  Austrian   composer  and  conductor.  Webern,  who  was  probably   Schoenberg's  first  private  pupil,  and  Alban  Berg,  who  came  to   him  a  few  weeks  later  […]  Boulez  and  Stockhausen  and  other   integral  serialists  of  the  Darmstadt  School  […]   Informa0on  Extrac0on   24  
  24. 24.   Anton  Webern   (b  Vienna,  3  Dec  1883;  d  Mi_ersill,15  Sept  1945).  Austrian   composer  and  conductor.  Webern,  who  was  probably   Schoenberg's  first  private  pupil,  and  Alban  Berg,  who  came  to   him  a  few  weeks  later  […]  Boulez  and  Stockhausen  and  other   integral  serialists  of  the  Darmstadt  School  […]   Informa0on  Extrac0on   25  
  25. 25.   Anton  Webern   (b  Vienna,  3  Dec  1883;  d  Mi_ersill,15  Sept  1945).  Austrian   composer  and  conductor.  Webern,  who  was  probably   Schoenberg's  first  private  pupil,  and  Alban  Berg,  who  came  to   him  a  few  weeks  later  […]  Boulez  and  Stockhausen  and  other   integral  serialists  of  the  Darmstadt  School  […]   Informa0on  Extrac0on   26  
  26. 26.   Anton  Webern   (b  Vienna,  3  Dec  1883;  d  Mi_ersill,15  Sept  1945).  Austrian   composer  and  conductor.  Webern,  who  was  probably   Schoenberg's  first  private  pupil,  and  Alban  Berg,  who  came  to   him  a  few  weeks  later  […]  Boulez  and  Stockhausen  and  other   integral  serialists  of  the  Darmstadt  School  […]   Informa0on  Extrac0on   27  
  27. 27.   Anton  Webern   (b  Vienna,  3  Dec  1883;  d  Mi_ersill,15  Sept  1945).  Austrian   composer  and  conductor.  Webern,  who  was  probably   Schoenberg's  first  private  pupil,  and  Alban  Berg,  who  came  to   him  a  few  weeks  later  […]  Boulez  and  Stockhausen  and  other   integral  serialists  of  the  Darmstadt  School  […]   Informa0on  Extrac0on:  En0ty  Linking   28   Domain   Knowledge  Base  
  28. 28. •  Webern,  who  was  probably  Schoenberg's  first  private  pupil   Informa0on  Extrac0on:  Rela0on  Extrac0on   29  
  29. 29. •  Webern,  who  was  probably  Schoenberg's  first  private  pupil   Informa0on  Extrac0on:  Rela0on  Extrac0on   30   Webern   Schoenberg   pupil_of  
  30. 30. Methodology   31  
  31. 31. Knowledge  Graph   32  
  32. 32. •  16,707  biographies   •  434  roles   •  Graph   –  47,367  nodes   –  274,333  edges   Knowledge  Graph:  Data  Analy0cs   33   Role   Amount   composer                   2618   teacher   1065   conductor                 968   pianist   704   organist                   676   singer       404   violinist                 285   …       musicologist           144   cri0c       133  
  33. 33. 34   Birth  Year   Death  Year  
  34. 34. 35   Birth  Year   Death  Year  
  35. 35. Knowledge  Graph:  Data  Analy0cs   36   Country   Births   Deaths   Difference   United  States   2317   2094   -­‐10%   Italy   1616   1279   -­‐21%   Germany   1270   1292   2%   France   991   1058   7%   United  Kingdom   882   877   -­‐1%   City   Births   Deaths   Difference   London   322   507   57%   Paris   304   720   137%   New  York   266   501   88%   Vienna   177   292   65%   Rome   159   256   61%  
  36. 36. •  City   –  Paris,  London,  Vienna,  Rome,  Venice,  Berlin,  Paris,  New  York   •  Venue   –  Covent  Garden  Theatre,  King's  Theatre,  Drury  Lane,  Carnegie   Hall,  Théâtre  de  la  Monnaie,  Stad_heater,  Theatre  Royal,  …   •  Educa0onal  Ins0tu0on   –  Paris  Conservatoire,  Moscow  Conservatory,  Juilliard  School,  St   Petersburg  Conservatory,  Bmus,  Prague  Conservatory,  Leipzig   Conservatory,  Vienna  Hochschule  für  Musik,  ...   Knowledge  Graph:  En0ty  Relevance   37  
  37. 37. •  Biography  subject   –  Haydn,  Claude  Debussy,  Arnold  Schoenberg,  Robert  Stevenson,   Paul  Hindemith,  Giovanni  Pierluigi  da  Palestrina,  Gustav  Mahler,   Maurice  Ravel,  Jean-­‐Philippe  Rameau   –  Mozart?,  Bach?,  Wagner?   •  Genre   –  chamber  music,  cappella,  jazz,  folk  music,  avant  garde,  baroque   music,  electronic  music,  musical  theatre,  plainchant   Knowledge  Graph:  En0ty  Relevance   38  
  38. 38. •  PageRank  algorithm  and  Maximal  Common  Subgraph     •  Arnold  Schoenberg:  Anton  Webern,  Paul  Hindemith,  Gustav   Mahler,  Alban  Berg,  Claude  Debussy   •  Guido  Adler:  Heinrich  Jalowetz,  Eusebius  Mandyczewski,   Robert  Fuchs,  Karl  Weigl,  Anton  Wranitzky   •  Manuel  de  Falla:  Ricardo  Viñes,  Juan  Vicente  Lecuna,  Enrique   Granados,  Miguel  Llobet  Soles,  Luigi  Russolo   •  Miles  Davis:  Dizzy  Gillespie,  Herbie  Hancock,  Paul  Chambers,   Tony  Williams,  Cannonball  Adderley   Knowledge  Graph:  En0ty  Similarity   39  
  39. 39. Rela0ons  Graph:  Informa0on  Visualiza0on   40  
  40. 40. Rela0ons  Graph:  Informa0on  Visualiza0on   41  
  41. 41. •  Crea0on  and  cura0on  of  a  library  from  Web  content   •  FlaBase:  Flamenco  Knowledge  Base   Mixing  Different  Data  Sources   42  
  42. 42. •  Data  Acquisi0on   –  APIs   –  Web  crawling   –  SPARQL  endpoints   •  En0ty  Resolu0on   –  String  similarity  between  labels   –  Graph  similarity  (context  informa0on)   Mixing  Different  Data  Sources   43  
  43. 43. FlaBase:  Flamenco  Knowledge  Base   44  
  44. 44. •  Data  gathered   –  1,174  Ar0sts  (text  biography)   –  76  Palos  (flamenco  genres)   –  2,913  Albums   –  14,078  Tracks   –  771  Andalusian  loca0ons   •  Knowledge  Extracted   –  Place  of  birth   –  Date  of  birth   –  En0ty  men0ons  in  text   FlaBase:  Flamenco  Knowledge  Base   45  
  45. 45. •  Number  of  ar0sts  by  year  of  birth   FlaBase:  Data  Analy0cs   46  
  46. 46. FlaBase:  Data  Analy0cs   47  
  47. 47. •  Flamenco  expert  evalua0on   FlaBase:  Ar0st  Relevance   48   Precision  Values  
  48. 48. •  Music  Digital  Libraries  can  benefit  from  seman0c  approaches     •  Music  Digital  Libraries  are  s0ll  in  an  early  stage  of   development  compared  to  the  Web  (Linked  Open  Data,   Google  Knowledge  Graph)   •  Knowledge  acquisi0on  from  Digital  Libraries  can  help   musicologists  not  only  to  search  content,  but  also  to  discover   new  knowledge   Conclusions   49  
  49. 49. •  This  work  was  partly  funded  by  the  COFLA2  research  project   (Proyectos  de  Excelencia  de  la  Junta  de  Andalucía,  FEDER  P12-­‐ TIC-­‐1362).   Aknowledgments   50  
  50. 50. •  Oramas  S.,  Sordo  M.,  Espinosa-­‐Anke  L.,  Serra  X.  (2015).  A  Seman,c-­‐based   approach  for  Ar,st  Similarity.  Interna0onal  Society  for  Music  Informa0on  Retrieval   Conference  ISMIR  2015.  In  Press.   •  Oramas  S.,  Gomez  F.,  Gomez  E.,  Mora  J.  (2015).  FlaBase:  Towards  the  crea,on  of  a   Flamenco  Music  Knowledge  Base.  Interna0onal  Society  for  Music  Informa0on   Retrieval  Conference  ISMIR  2015.  In  Press.   •  Sordo,  M.,  Oramas  S.,  &  Espinosa-­‐Anke  L.  (2015).  Extrac,ng  Rela,ons  from   Unstructured  Text  Sources  for  Music  Recommenda,on.  Interna0onal  Conference   on  Applica0ons  of  Natural  Language  to  Informa0on  Systems  NLDB  2015.   •  Oramas  S.,  Sordo  M.,  Espinosa-­‐Anke  L.  (2015).  A  Rule-­‐based  Approach  to   Extrac,ng  Rela,ons  from  Music  Tidbits.  2nd  Workshop  on  Knowledge  Extrac0on   from  Text  at  WWW  2015.   •  Oramas  S.,  Sordo  M.,  Serra  X.  (2014).  Automa,c  Crea,on  of  Knowledge  Graphs   from  Digital  Musical  Document  Libraries.  Conference  in  Interdisciplinary   Musicology  CIM  2014.   Bibliography   51  
  51. 51. Knolwedge  Acquisi0on  from   Music  Digital  Libraries       Sergio  Oramas,  Mohamed  Sordo   @sergiooramas   Thanks!