Digital Archiving, The Semantic Web, and Modern AI

This was my keynote talk on accepting the "Spotlight Award" from the Association of Moving Image Archivists. The talk relates the needs of archiving, the use of semantic (web) metadata, and deep learning for archiving.

Published in: Technology

  1. Tetherless World Constellation, RPI. Digital Archiving, The Semantic Web, and Modern AI. Jim Hendler, Tetherless World Professor of Computer, Web and Cognitive Sciences; Director, Institute for Data Exploration and Applications, Rensselaer Polytechnic Institute. http://www.cs.rpi.edu/~hendler, @jahendler (Twitter). Major talks at: http://www.slideshare.net/jahendler
  2. Not going to talk today about issues of AI and society, personal data, unemployment, etc. Wrote a book about those, happy to discuss with people… (or see SlideShare for “jahendler”, TEDx, …). Today I will focus on archiving: metadata, knowledge graphs, and new directions in AI.
  3. The real challenge
     • Today would be the 60th birthday of my best friend growing up, Jack Pressman (who passed away 20 years ago)
       – How could we find a picture/image of him?
     • Not famous enough for Wikipedia
     • Never made it into a YouTube video
     • Common name (and not likely to have been annotated)
  4. Finding Jack
     • What would you do?
       – (Class exercise)
     • We’d learn what we could about him
       – We know his age
       – Where did he grow up?
         • Any of those locations have pictures with people?
       – Where did he go to school?
         • Any famous classmates he may be in a picture with?
       – Any major accomplishments?
         • He wrote a well-respected book on the history of medicine (lobotomies)
     • Essentially, we look for things that “link” him to places, events, objects, times …
       – This is how finding things in archives happens
     • How can machines help?
  5. So we annotate images/videos. But the information is saved internally to the system, generally for later search, and is not exposed externally…
  6. C) Semantic Web 2001
  7. On the Web, links are critical! [Diagram: in HTML, <a href=“http://…”> links a Web page to any Web resource; in RDF, URIs link to URIs.] RDF is like the Web!
  8. Links in the data. <mind:Person rdf:id=“Hendler”> <mind:title jobs:Professor> <jobs:placeOfWork http://www.cs.rpi.edu> </mind:Person> [Diagram: in DOC1, the node Hendler has a Mind:title link to Jobs:Professor and a Jobs:placeOfWork link to the Web page http://www…]
  9. Asserting links in the data. <mind:Person rdf:id=“Hendler”> owl:sameAs <http://dbpedia.org/page/James_Hendler> [Diagram: in DOC2, the node Hendler has an Owl:sameAs link to Dbpedia:Hendler, which has a Dbpedia:occupation link to Dbpedia:ComputerScientist, alongside the Mind:title link to Jobs:Professor and the Jobs:placeOfWork link to the Web page http://www…]
  10. Led to Linked Data experimentation and growth. Billions of links in the public cloud, across many sectors.
  11. Marking up metadata in images (slide from 2002)
  12. Based on RDF Schema/OWL. PhotoStuff, ca. 2005-2007.
  13. And instances
  14. NASA image markup (SemSpace, 2006). Also used by other government agencies in DoD.
  15. Extended to video markup (segments), 2008. A particular scene from a movie… The story that ran on NHK television from 0847-0903 on 2001-09-11 (GMT + 9).
  16. Extended to video annotation, 2008
  17. Various experiments in museums (Lora Aroyo, 2011)
  18. BBC Ontologies. Many demos. 2012 Olympics.
  19. Commercial takeoff really started ca. 2012
  20. Google 2012: The Knowledge Graph
  21. Facebook 2012: The Open Graph Protocol
  22. Impressive results. Google finds embedded metadata on >30% of its crawl (Guha, 2015). Google “knowledge vault” reported to have over 5 billion “facts” (links).
  23. But the knowledge graph isn’t all automated (P. Norvig, WWW 2016, 4/16)
  24. © Peter Mika, 2014.
  25. © Peter Mika, 2014.
  26. © Peter Mika, 2014.
  27. What about image/video archiving?
      • Despite this growth, still mostly “experimental” in the archiving community
        – Especially image/video
      • Two main impediments
        – High cost of annotating collections with enhanced metadata
        – How does doing the annotation increase the “value” of a collection?
          • Beyond search
  28. Recent major breakthrough in automating computer vision: a “phase transition” in the capabilities of neural networks with machine power
  29. “Deep learning”: a “phase transition” in the capabilities of neural networks with machine power
  30. Impressive results. Increasingly powerful techniques have yielded incredible results in the past few years.
  31. Moving to Vision and Text Mix
  32. Context issues are a problem
  33. And still a long way to go
  34. But recent “action” descriptions are doing better than question answering. A very promising direction for jumpstarting (semi-)automated annotation.
  35. Moving from search to exploration (Mei Si, 2017). Using “narrative” technology to turn our campus archive into an interactive “story”.
  36. At human scales. Cognitive and Immersive Systems Laboratory, http://cisl.rpi.edu
  37. Summary
      • Semantic Web (Linked Data) has been a small but growing presence in the archiving world
        – increasing use in library and museum communities
        – increasing interest in collection management
        – increasing interest in collection sharing
      • Semantic technologies are being deployed at scale in the larger Web world
        – still primarily for search (ad match) and social networking (ad match)
      • New AI technologies have the potential to overcome some of the key problems
        – reducing the cost of metadata generation/annotation
        – making archives “alive” and explorable
  38. Questions?
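
The “links in the data” idea from slides 8 and 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used in the talk: the triples are plain tuples, the prefixes (mind:, jobs:, dbpedia:, doc1:) are the slide shorthand rather than resolved namespaces, `doc1:Hendler` and `dbpedia:James_Hendler` are hypothetical short labels for the two nodes, and the `rdf:type` triple is my reading of the `<mind:Person …>` element. The helper names `same_as_closure` and `describe` are made up for this sketch.

```python
from collections import defaultdict

# Triples are (subject, predicate, object). All identifiers are illustrative
# shorthand taken from the slides, not resolved namespace URIs.
DOC1 = [
    ("doc1:Hendler", "rdf:type", "mind:Person"),
    ("doc1:Hendler", "mind:title", "jobs:Professor"),
    ("doc1:Hendler", "jobs:placeOfWork", "http://www.cs.rpi.edu"),
]
DOC2 = [
    # The asserted link between the local node and the DBpedia node.
    ("doc1:Hendler", "owl:sameAs", "dbpedia:James_Hendler"),
    ("dbpedia:James_Hendler", "dbpedia:occupation", "dbpedia:ComputerScientist"),
]


def same_as_closure(triples, start):
    """Return every identifier reachable from start over owl:sameAs links."""
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == "owl:sameAs":
            # Treat sameAs as symmetric so the link can be followed both ways.
            neighbours[s].add(o)
            neighbours[o].add(s)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for other in neighbours[node] - seen:
            seen.add(other)
            stack.append(other)
    return seen


def describe(triples, start):
    """Collect (predicate, object) pairs for start and all of its aliases."""
    aliases = same_as_closure(triples, start)
    return sorted(
        (p, o) for s, p, o in triples
        if s in aliases and p != "owl:sameAs"
    )


if __name__ == "__main__":
    for predicate, obj in describe(DOC1 + DOC2, "doc1:Hendler"):
        print(predicate, obj)
```

Queried on DOC1 alone, the node only carries its local title and place of work; once the owl:sameAs triple from DOC2 is merged in, the same query also returns the DBpedia occupation. That is the sense in which a single asserted link connects an archive's internal annotations to outside data.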