Session 0.0 Media Panel - Matthias Priem - GTUO - SEMANTiCS 2017

Talk at SEMANTiCS 2017
www.semantics.cc

Published in: Technology

  1. Linking thesauri
  2. About me: Matthias Priem, VIAA, Manager Archiving
  3. About VIAA
  4. Not (only) a Broadcast Archive
  5. CULTURAL HERITAGE
     ● We don’t own collections
     ● Currently: +/- 100 orgs
     ● CITY ARCHIVES BROADCAST SERVICE PROVIDER
  6. DIGITISATION
  7. VIAA FACTS & FIGURES
     ● Total carriers identified: 650,000
     ● Total carriers digitised today: 227,000
     ● About 30% done (but most projects are started)
  8. DIGITAL ARCHIVE SYSTEM
  9. VIAA FACTS & FIGURES
     ● Archive today: 80,000+ hours (2 PB)
     ● Growing 15+ TB/day (that’s almost 0.5 PB/month)
  10. VIAA FACTS & FIGURES
     ● Archive today: 80,000+ hours / 2 PB
     ● Growing 15 TB/day (that’s 0.5 PB/month)
  11. EDUCATION
  12. VIAA FACTS & FIGURES
     ● In 50% of all schools (in < 1 year)
  13. VIAA FACTS & FIGURES
     ● In 50% of all schools
     ● Fiat/IFTA nominee
  14. VIAA and thesauri
     ● VIAA archives collections of more than 100 organisations
     ● Multiple sectors with large differences
     ● Feasibility study in 2014: “Can we build one ‘unified’ thesaurus and if so: how?”
  15. Feasibility study: main outcomes
     ● One generic thesaurus won’t work
       ○ Huge discussions on content
       ○ Very niche thesauri vs. more generalist ones
     ● But
       ○ What about SKOS?
       ○ And what about linking thesauri?
  16. What about SKOS?
     ● Standardized way of organising knowledge
     ● W3C, semantic web
     ● Main contents
       ○ Concepts with descriptions
       ○ Hierarchy
       ○ Alternative names
       ○ Multilingual
       ○ ...and links: exactMatch, ...
  17. What about linking thesauri? (diagram: voc1, voc2)
     ● Allows people to work on their own thesaurus
     ● Benefit from each other’s work
     ● Allows unified search
  18. But linking manually…?
  19. About Sound & Vision
     ● Audiovisual archive of the Netherlands
     ● > 800,000 hours of material (TV, radio, music, documentaries, film, commercials, etc.)
     ● Archive + Access
     ● GTAA!!
     ● “Archive as Lab”
       ○ Smart
       ○ Connected
       ○ Open
  20. About Taalunie
     ● Support of Dutch around the globe
     ● Also active in the field of digital heritage
  21. Linked Thesaurus for Uniform Dissemination
     ● Source selection
     ● SKOSification
     ● Alignment (linking of the sources)
     ● Demonstrator
  22. With a little help from our friends
  23. Source thesauri
     ● GTAA (SKOS)
     ● VRT Thesaurus (not SKOS... yet)
  24. VRT Thesaurus
     ● 100,000+ terms
     ● Structured
     ● Every term has an ID
     ● Not standardized
     ● Not published
     ● Managed as part of AVID
     ● Bit of a ‘black box’
  25. VRT Thesaurus: relations in the thesaurus
  26. VRT Thesaurus
     ● Hierarchical relations, to SKOS (‘broader’ and ‘narrower’)
     ● Also: related terms
  27. Amsterdam in SKOS (Turtle)
  28. VRT Thesaurus: scopeNotes
  29. Synonyms
  30. VRT Thesaurus after SKOSification
     ● 102,172 terms (concepts)
     ● 97,744 terms are linked hierarchically
     ● 4,429 topConcepts
     ● 212 scopeNotes
     ● 6,828 relations between terms
  31. GTAA
     ● 184,484 terms
     ● 19,695 terms are linked hierarchically
     ● 9 conceptSchemes (persons, locations, …)
     ● 90,708 scopeNotes
     ● 33,542 relations
     ● => published as linked open data :)
  32. Linking thesauri together: http://CultuurLINK.beeldengeluid.nl
  33. Linking thesauri
     ● Start working with two thesauri
     ● Isolate a part of each (e.g. geographical names)
     ● Compare the resulting terms
       ○ String matching
       ○ or more complex operations
     ● Iterate again on the non-matching terms
  34. Linking subjects
  35. Linking People (zoom): FILTERS, COMPARISON
  36. Linking People (zoom): VERIFY
  37. Linking People (zoom): REPEAT
  38. Linking People
  39. Result of the linking process

     Term                            # of links
     Subjects (things)                    4,167
     Names (bands, companies, ...)        2,197
     Locations                            4,011
     Persons                             11,265
     Total                               21,640

  40. Learnings
     ● Linking is not (very) technical
     ● Linking requires knowledge of the sources
     ● Linking requires human input
     ● Richer thesaurus: more chance of finding links
  41. Demonstrator: http://link.spinque.com/VIAA-1.0/
  42. Demonstrator - Sources
     ● VIAA
       ○ Part of the VRT collection
       ○ +/- 35,000 items (of 1 million records)
       ○ Annotations and links to the thesaurus
     ● Sound & Vision
       ○ ‘Openbeelden’
       ○ Again: a small part of the archive
       ○ The annotations of those items
  43. Demonstrator screenshot
     ● SLIDER determines the weight of the collection
     ● SEARCH
  44. Keyword “migration”
     ● All matching keywords of all search results (grouped & with indicator)
     ● Search results with highlighted
       ○ matches in descriptions
       ○ matches in thesaurus terms
     ● Collection indicator
  45. Changed the weight towards the North … more results from across the border
  46. The clicked video is now the search criterion
     ● Terms associated with all related videos
     ● Related videos
  47. User searches for related VRT content based on the B&G video.
  48. Conclusions
     ● Up until now: we gained a lot of insight into working with thesauri
       ○ SKOS
       ○ Linking thesauri
       ○ Linked and open data
       ○ Got a good view of what the thesaurus landscape looks like
     ● Linking makes thesauri richer, allows for collaborative work
  49. Many more steps ahead
     ● Select thesauri to start working from
       ○ Link them where appropriate
       ○ Reuse existing sources where possible (VIAF, GeoNames, …)
     ● Manage
       ○ Good management tool and workflow for thesauri
     ● Use (integration, integration, integration)
       ○ Use (public) LOD sources in our collection mgmt system (!)
       ○ Use them as reference source for term extraction
       ○ Re-use the terms in user-facing systems
  50. Thanks!
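
Slides 26 to 29 describe SKOSification: hierarchical relations become ‘broader’ and ‘narrower’, synonyms become alternative labels, and scope notes are carried over. A minimal sketch of that mapping is given here, assuming a flat export in which every term has an ID, a label and an optional parent ID. The input shape, the `BASE` namespace, the sample terms and the function names are illustrative assumptions, not the actual VRT export or the tooling used in the project.

```python
# Sketch only: the record layout and BASE namespace are assumptions.
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
BASE = "http://example.org/thesaurus/"  # placeholder namespace


def literal(text, lang="nl"):
    """Render a language-tagged Turtle string literal with basic escaping."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"@{lang}'


def to_turtle(terms):
    """Turn {id: {"label", "parent", "synonyms", "note"}} into SKOS Turtle."""
    # Invert the parent pointers so each concept can also list its children.
    children = {}
    for term_id, term in terms.items():
        if term.get("parent") is not None:
            children.setdefault(term["parent"], []).append(term_id)

    blocks = [SKOS_PREFIX]
    for term_id, term in terms.items():
        lines = [f"<{BASE}{term_id}> a skos:Concept",
                 f"    skos:prefLabel {literal(term['label'])}"]
        for synonym in term.get("synonyms", []):
            lines.append(f"    skos:altLabel {literal(synonym)}")
        if term.get("note"):
            lines.append(f"    skos:scopeNote {literal(term['note'])}")
        if term.get("parent") is not None:
            lines.append(f"    skos:broader <{BASE}{term['parent']}>")
        for child in children.get(term_id, []):
            lines.append(f"    skos:narrower <{BASE}{child}>")
        blocks.append(" ;\n".join(lines) + " .")
    return "\n\n".join(blocks)


# Invented sample terms, for illustration only.
sample_terms = {
    "100": {"label": "Nederland", "parent": None},
    "101": {"label": "Amsterdam", "parent": "100",
            "synonyms": ["A'dam"], "note": "Hoofdstad van Nederland"},
}
turtle = to_turtle(sample_terms)
print(turtle)
```

Terms without a parent are the ones that would end up as topConcepts; declaring the concept scheme itself is left out to keep the sketch short.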
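
The linking loop on slide 33 (isolate a part of both thesauri, compare by string matching, iterate again on what did not match) can be sketched as follows. This is not how CultuurLINK is implemented; it only illustrates the simplest comparison step. The `Concept` class, the `ex:` identifiers and the sample labels are hypothetical.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical minimal view of a thesaurus term."""
    uri: str
    scheme: str   # e.g. "locations" or "persons"
    label: str


def normalize(label):
    """Case-fold, strip accents and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.casefold().split())


def link_by_label(source, target, scheme):
    """Return (links, unmatched) for one isolated part of two thesauri."""
    index = {}
    for concept in target:
        if concept.scheme == scheme:
            index.setdefault(normalize(concept.label), []).append(concept)

    links, unmatched = [], []
    for concept in source:
        if concept.scheme != scheme:
            continue
        candidates = index.get(normalize(concept.label), [])
        if len(candidates) == 1:
            links.append((concept.uri, "skos:exactMatch", candidates[0].uri))
        else:
            # Nothing found, or ambiguous: keep for the next pass or a human.
            unmatched.append(concept)
    return links, unmatched


# Invented sample data, for illustration only.
vrt = [
    Concept("ex:vrt/1", "locations", "Amsterdam"),
    Concept("ex:vrt/2", "locations", "Liège"),
    Concept("ex:vrt/3", "locations", "Gent"),
    Concept("ex:vrt/4", "persons", "Amsterdam"),
]
gtaa = [
    Concept("ex:gtaa/10", "locations", "amsterdam"),
    Concept("ex:gtaa/11", "locations", "Liege"),
    Concept("ex:gtaa/12", "locations", "Ghent"),
]

links, unmatched = link_by_label(vrt, gtaa, "locations")
print(links)
print([c.label for c in unmatched])
```

In this sample “Gent” and “Ghent” stay unlinked, which is the kind of case the slides leave to more complex operations and to human verification.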
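
Slides 43 and 45 show a slider that sets the weight of each collection in the search results. The slides do not say how the demonstrator combines relevance and weight; one simple possibility, shown purely as an assumption, is to multiply each result's relevance by the weight of its collection. The function name, the 0 to 1 slider range and the sample results are all hypothetical.

```python
def rank(results, north_weight):
    """Order (title, collection, relevance) tuples by weighted relevance.

    north_weight is the assumed slider position between 0 and 1:
    0 favours VIAA only, 1 favours Sound & Vision only.
    """
    weights = {"Sound & Vision": north_weight, "VIAA": 1.0 - north_weight}
    scored = [(relevance * weights[collection], title, collection)
              for title, collection, relevance in results]
    scored.sort(key=lambda item: item[0], reverse=True)
    # Results from a collection with zero weight drop out entirely.
    return [(title, collection) for score, title, collection in scored if score > 0]


# Invented sample results, for illustration only.
results = [
    ("Item A", "VIAA", 0.8),
    ("Item B", "Sound & Vision", 0.6),
]
balanced = rank(results, 0.5)
towards_north = rank(results, 0.9)
print(balanced)
print(towards_north)
```

Moving the slider towards the North lets the Sound & Vision item overtake the VIAA item, which matches the behaviour described on slide 45.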
