Tetherless World Constellation, RPI
Digital Archiving, The Semantic Web, and Modern AI
Jim Hendler
Tetherless World Professor of Computer, Web and Cognitive Sciences
Director, Institute for Data Exploration and Applications
Rensselaer Polytechnic Institute
http://www.cs.rpi.edu/~hendler
@jahendler (twitter)
Major talks at: http://www.slideshare.net/jahendler
Not going to talk today about issues of
AI and society, personal data, unemployment, etc.
Wrote a book about those, happy to discuss w/people…
Today I will focus on archiving:
metadata, knowledge graphs, & new directions in AI
(or see SlideShare for “jahendler”, TEDx, …)
The real challenge
• Today would be the 60th birthday of my best friend growing up, Jack Pressman (who passed away 20 years ago)
– How could we find a picture/image of him?
• Not famous enough for Wikipedia
• Never made it into a YouTube video
• Common name (and not likely to have been annotated)
Finding Jack
• What would you do?
– (Class exercise)
• We’d learn what we could about him
– We know his age
– Where did he grow up?
• Any of those locations have pictures with people?
– Where did he go to school?
• Any famous classmates he may be in a picture with?
– Any major accomplishments?
• He wrote a well-respected book on the history of medicine (lobotomies)
• Essentially, we look for things that “link” him to places, events, objects, times …
– This is how finding things in archives happens
• How can machines help?
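One way a machine could follow those “links” is sketched below. It assumes each photo already carries a set of annotated entities and ranks photos by how many entities they share with the person. Every identifier here (`person:JackPressman`, `place:Hometown`, `photo:001`, and so on) is a hypothetical placeholder, not data from any real archive, and `candidate_photos` is an illustrative helper rather than an existing tool.

```python
# All identifiers below are hypothetical placeholders for illustration only.
PERSON_LINKS = {
    "person:JackPressman": {
        "place:Hometown",
        "school:HighSchool",
        "topic:HistoryOfMedicine",
    },
}

PHOTO_ANNOTATIONS = {
    "photo:001": {"place:Hometown", "event:Parade"},
    "photo:002": {"school:HighSchool", "event:Graduation"},
    "photo:003": {"place:SomewhereElse"},
}


def candidate_photos(person, person_links, photo_annotations):
    """Return photos that share at least one linked entity with the person."""
    links = person_links.get(person, set())
    scored = []
    for photo, tags in photo_annotations.items():
        shared = links & tags
        if shared:
            scored.append((len(shared), photo, sorted(shared)))
    # Most shared links first; the photo id breaks ties deterministically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(photo, shared) for _, photo, shared in scored]


if __name__ == "__main__":
    for photo, shared in candidate_photos(
        "person:JackPressman", PERSON_LINKS, PHOTO_ANNOTATIONS
    ):
        print(photo, shared)
```

The photos returned are only candidates: a shared place or school narrows the search, and a person still has to look at the pictures.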
So we annotate images/videos
But the information is saved inside the system, generally for later search, and not exposed externally…
C) Semantic Web
2001
On the Web -- links are critical!
HTML (<a href= URI>): Web page -- <a href="http://…"> --> Any Web Resource
RDF: URI -- URI --> URI
RDF is like the web!
Links in the data
DOC1:
<mind:Person rdf:id="Hendler">
  <mind:title jobs:Professor>
  <jobs:placeOfWork http://www.cs.rpi.edu>
</mind:Person>
Graph in DOC1:
Hendler -- mind:title --> jobs:Professor
Hendler -- jobs:placeOfWork --> Web page (http://www…)
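The “links in the data” idea can be sketched as a list of (subject, predicate, object) triples. This is a minimal plain-Python illustration, not an RDF library. The `mind:` and `jobs:` prefixes follow the slide; the `doc1:Hendler` identifier, the `rdf:type` triple, and the `objects` helper are assumptions added for the example.

```python
# Each link is a (subject, predicate, object) triple.
# "doc1:Hendler" is an assumed identifier for the node named "Hendler" in DOC1.
TRIPLES = [
    ("doc1:Hendler", "rdf:type", "mind:Person"),
    ("doc1:Hendler", "mind:title", "jobs:Professor"),
    ("doc1:Hendler", "jobs:placeOfWork", "http://www.cs.rpi.edu"),
]


def objects(triples, subject, predicate):
    """Follow one kind of link from a subject and return what it points to."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    print(objects(TRIPLES, "doc1:Hendler", "mind:title"))
    print(objects(TRIPLES, "doc1:Hendler", "jobs:placeOfWork"))
```

Because the predicate is itself a name that can be looked up, the link carries meaning in a way a bare HTML hyperlink does not.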
Asserting links in the data
DOC2:
<mind:Person rdf:id="Hendler"> owl:sameAs <http://dbpedia.org/page/James_Hendler>
Graph in DOC2:
Hendler -- mind:title --> jobs:Professor
Hendler -- jobs:placeOfWork --> Web page (http://www…)
Hendler -- owl:sameAs --> dbpedia:Hendler
dbpedia:Hendler -- dbpedia:occupation --> dbpedia:ComputerScientist
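A rough sketch of what an asserted `owl:sameAs` link buys: once two identifiers are declared to name the same thing, facts stated about either can be gathered together. This is plain Python under stated assumptions, not a reasoner. The `doc2:Hendler` identifier and the helper names `same_as_group` and `facts_about` are made up for the example; the predicates and DBpedia names are taken from the slide.

```python
TRIPLES = [
    ("doc2:Hendler", "mind:title", "jobs:Professor"),
    ("doc2:Hendler", "owl:sameAs", "dbpedia:Hendler"),
    ("dbpedia:Hendler", "dbpedia:occupation", "dbpedia:ComputerScientist"),
]


def same_as_group(triples, node):
    """Collect every identifier reachable from node via owl:sameAs links.

    sameAs is treated as symmetric and transitive, so links are followed
    in both directions until no new identifier turns up.
    """
    group = {node}
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != "owl:sameAs":
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in group:
                    group.add(b)
                    frontier.append(b)
    return group


def facts_about(triples, node):
    """Return (predicate, object) pairs stated about node or any of its aliases."""
    group = same_as_group(triples, node)
    return sorted(
        (p, o) for s, p, o in triples if s in group and p != "owl:sameAs"
    )


if __name__ == "__main__":
    for predicate, obj in facts_about(TRIPLES, "doc2:Hendler"):
        print(predicate, obj)
```

Asking about the local `doc2:Hendler` node now also returns the occupation that was only stated about the DBpedia node.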
Led to Linked Data experimentation and growth
Billions of links in public cloud – across many sectors
Marking up metadata in images
Slide from 2002
Based on RDF Schema/OWL
PhotoStuff, ca. 2005-2007
And instances
NASA image markup (SemSpace, 2006)
Also used by other govt agencies in DoD
Extended to video markup (segments)
A particular scene from a movie…
The story that ran on NHK television from 0847-0903 on 2001-09-11 (GMT + 9)
2008
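A segment annotation like the NHK example can be sketched as a time interval with an explicit UTC offset. The sketch below takes the slide's values at face value and assumes “0847-0903” means 24-hour local clock times; the dictionary layout and the `duration` and `covers` helpers are illustrative, not the markup format the slide shows.

```python
from datetime import datetime, timedelta, timezone

# GMT + 9, as stated on the slide.
GMT_PLUS_9 = timezone(timedelta(hours=9))

# Assumption: "0847-0903" is read as 08:47 to 09:03 local time.
segment = {
    "source": "NHK television",
    "start": datetime(2001, 9, 11, 8, 47, tzinfo=GMT_PLUS_9),
    "end": datetime(2001, 9, 11, 9, 3, tzinfo=GMT_PLUS_9),
}


def duration(seg):
    """Length of the annotated segment."""
    return seg["end"] - seg["start"]


def covers(seg, instant):
    """True if a timezone-aware instant falls inside the segment."""
    return seg["start"] <= instant < seg["end"]


if __name__ == "__main__":
    print(duration(segment))
    print(segment["start"].astimezone(timezone.utc).isoformat())
```

Storing the offset with the timestamps lets segments from broadcasts in different time zones be compared on a single timeline.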
Extended to video annotation
2008
Various experiments in museums
Lora Aroyo, 2011
BBC Ontologies
Many demos 2012 Olympics
Commercial takeoff really started ca. 2012
Google 2012
The Knowledge Graph
Facebook 2012
The Open Graph Protocol
Impressive results
Google finds embedded metadata on >30% of its crawl – Guha, 2015
Google “knowledge vault” reported to have over 5 billion “facts” (links)
But, the knowledge graph isn’t all automated
(P. Norvig, WWW 2016, 4/16)
© Peter Mika, 2014.
What about image/video archiving?
• Despite this growth, still mostly “experimental” in the archiving community
– Especially image/video
• Two main impediments
– High cost of annotating collections with enhanced metadata
– How does doing the annotation increase the “value” of a collection?
• Beyond search
Recent major breakthrough in automating computer vision: “deep learning”
“phase transition” in capabilities of neural networks w/machine power
Impressive results
Increasingly powerful techniques have yielded incredible results in the past few years
Moving to Vision and Text Mix
Context issues a problem
And still a long way to go
But recent “action” descriptions are doing better than question answering
A very promising direction for jumpstarting (semi-)automated annotation
Moving from search to exploration
(Mei Si, 2017)
Using “narrative” technology to turn our campus archive into an interactive “story”
At human scales
Cognitive and Immersive Systems Laboratory
http://cisl.rpi.edu
Summary
Semantic Web (Linked Data) has been a small but growing presence in the archiving world
- increasing use in library and museum communities
- increasing interest in collection management
- increasing interest in collection sharing
Semantic technologies are being deployed at scale in the larger Web world
- still primarily for search (ad match) and social networking (ad match)
New AI technologies have the potential to overcome some of the key problems
- reducing the cost of metadata generation/annotation
- making archives “alive” and explorable
Questions?
