The Unreasonable Effectiveness of Metadata
Prof. Jim Hendler
Director, Institute of Data Exploration and Applications
Rensselaer Polytechnic Institute (RPI)
USA
@jahendler
Tetherless World Constellation
A new response to
Halevy, Norvig and Pereira (2009)
Talks I’m not giving today
https://www.slideshare.net/jahendler
The Semantic Web
2001
The original vision included rich metadata
Slide from 2002
Deployed around the world in libraries
and museums
Lora Aroyo, 2011
BBC Ontologies
Many demos 2012 Olympics
So back to Google, 2009
Google 2011
…. Now we hear a lot of talk about the Semantic Web and that’s coming
along, but it’s not quite here yet ….
Google 2013: Schema.org
…. A longstanding goal of the semantic web initiative is to get webmasters
to make the structured data directly available on the web …. Learning from
these earlier attempts has guided the development of schema.org
From
Google finds embedded metadata on >30% of its crawl – Guha, 2015
Google “knowledge vault” reported to have over 5 billion “facts” (links)
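As a rough illustration of what "embedded metadata" means in practice, the sketch below pulls schema.org-style JSON-LD out of a web page using only the Python standard library. The sample page, the person named in it, and the helper names (`JsonLdExtractor`, `extract_jsonld`) are invented for this example; it is not a description of how Google's crawler works.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies arrive here as raw text.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


def extract_jsonld(html):
    """Return every JSON-LD object embedded in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page: the person and affiliation are made up.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Person",
 "name": "Jane Example", "affiliation": "Example University"}
</script>
</head><body>Hello</body></html>
"""

if __name__ == "__main__":
    for item in extract_jsonld(SAMPLE_PAGE):
        print(item["@type"], item["name"])
```

The point of the sketch is that the publisher, not the crawler, states what the page is about; the consumer only has to read it.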
But, the knowledge graph isn’t all automated
(P. Norvig, WWW 2016, 4/16)
© Peter Mika, 2014.
Facebook 2012
The Open Graph Protocol
So all this gives us tons of linked data…
most of which isn’t used much
Office of Research
DIVE into data
Discover
Integrate
Validate
Explore
Thinking outside the (database)
box…
Metadata lets us DIVE into data
Discover
Find datasets and/or content (incl. outside your own organization)
Integrate
Link the relations using meaningful labels: “reconciliation”
Validate
Provide inputs to modeling and simulation systems
Explore
Develop (multimodal) approaches to turn data into actionable
knowledge
How does one discover resources in linked data?
With great difficulty
Why?
Person                Date
Smith James           June 4
Jones Fred            May 17
O’Connell Frank       April 3
Chang Wu              February 21
Hoffman Bernd         December 9
It’s not enough just to describe the data elements…
Describing a dataset … requires a context
1976 Dates of Birth
Person                Date
Smith James           June 4
Jones Fred            May 17
O’Connell Frank       April 3
Chang Wu              February 21
Hoffman Bernd         December 9
Describing a dataset … requires a context
How do we capture more of this information?
1976 Cancer Mortality Dates
Person                Date
Smith James           June 4
Jones Fred            May 17
O’Connell Frank       April 3
Chang Wu              February 21
Hoffman Bernd         December 9
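One way to read the two tables above: the column-level description (Person, Date) is identical, and only a dataset-level description tells the two apart. The sketch below makes that concrete. The class names, field names, and the `describe` helper are assumptions made for illustration, not part of any metadata standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDescription:
    name: str
    datatype: str


@dataclass(frozen=True)
class DatasetDescription:
    # Dataset-level context that the column labels alone cannot express.
    title: str
    year: int
    columns: tuple


ROWS = [
    ("Smith James", "June 4"),
    ("Jones Fred", "May 17"),
    ("O’Connell Frank", "April 3"),
    ("Chang Wu", "February 21"),
    ("Hoffman Bernd", "December 9"),
]

COLUMNS = (
    ColumnDescription("Person", "name"),
    ColumnDescription("Date", "month-day"),
)

births = DatasetDescription("Dates of birth", 1976, COLUMNS)
deaths = DatasetDescription("Cancer mortality dates", 1976, COLUMNS)


def describe(row, dataset):
    """Render one row together with the dataset context it belongs to."""
    person, date = row
    return f"{person}: {date}, {dataset.year} ({dataset.title})"


if __name__ == "__main__":
    print(describe(ROWS[0], births))
    print(describe(ROWS[0], deaths))
```

The same row yields two very different statements about James Smith depending on which dataset description is attached.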
Data
MetaData
Challenge: Linked Metadata
“A Small World of Big Data”
Linked Semantic Web “metadata” documents can be used to link very large databases in
distributed data systems. This leads to orders of magnitude reduction in information flow for
large-scale distributed data problems.
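A toy version of this idea, under heavy assumptions: each site publishes a small metadata record for a large dataset, and a client filters those records before requesting any data. The URLs, topics, and row counts below are all invented, and `discover` and `rows_avoided` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    dataset_url: str   # where the (large) dataset lives
    topic: str
    year: int
    row_count: int     # illustrative size only


# Invented catalog: three small records standing in for three large datasets.
CATALOG = [
    MetadataRecord("https://example.org/data/births-1976", "births", 1976, 3_000_000),
    MetadataRecord("https://example.org/data/cancer-mortality-1976", "cancer mortality", 1976, 400_000),
    MetadataRecord("https://example.org/data/cancer-mortality-1977", "cancer mortality", 1977, 410_000),
]


def discover(catalog, topic, year):
    """Select datasets by consulting metadata only; no data is transferred."""
    return [r for r in catalog if r.topic == topic and r.year == year]


def rows_avoided(catalog, selected):
    """Rows that never need to move because the metadata ruled them out."""
    return sum(r.row_count for r in catalog) - sum(r.row_count for r in selected)


if __name__ == "__main__":
    hits = discover(CATALOG, "cancer mortality", 1976)
    print([r.dataset_url for r in hits])
    print(rows_avoided(CATALOG, hits))
```

With these made-up numbers, one dataset is fetched and the other two are skipped entirely, which is the kind of reduction in information flow the slide refers to.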
Linked Data STILL needs semantics
• Data is converted into RDF and SPARQL’ed
– creates huge graph DBs less efficient than the
original DB
• Data is converted from the DB into SPARQL
results on demand
– much better, but you must know the mapping
• owl:sameAs is (ab)used to map data to
data
– but that only lets you map equals – which is
an easy mapping to express in many ways
• defining equality correctly in a model theory is much
harder, and thus the abuse, but let’s leave that for
another talk
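To show how little an equality link expresses, here is a minimal sketch that treats owl:sameAs purely as "these two names denote the same thing" and merges descriptions accordingly. Triples are plain Python tuples, the prefixes `a:` and `b:` stand for two hypothetical sources, and the union-find merge is an illustration, not OWL's model-theoretic semantics.

```python
OWL_SAME_AS = "owl:sameAs"


def same_as_resolver(triples):
    """Build a function mapping each term to one representative of its
    owl:sameAs equivalence class (simple union-find)."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            root_s, root_o = find(subject), find(obj)
            if root_s != root_o:
                # Keep the lexicographically smaller name as representative.
                parent[max(root_s, root_o)] = min(root_s, root_o)
    return find


def merge_same_as(triples):
    """Rewrite all triples onto class representatives and drop the sameAs links."""
    find = same_as_resolver(triples)
    return {(find(s), p, find(o)) for s, p, o in triples if p != OWL_SAME_AS}


# Two hypothetical sources describing the same city under different names.
TRIPLES = [
    ("a:Troy", OWL_SAME_AS, "b:Troy_NY"),
    ("a:Troy", "a:hasUniversity", "a:RPI"),
    ("b:Troy_NY", "b:inState", "b:New_York"),
]

if __name__ == "__main__":
    for triple in sorted(merge_same_as(TRIPLES)):
        print(triple)
```

Everything the link can do is collapse two names into one. Relations such as part-whole or "valid during" cannot be stated this way, which is why equality links end up stretched beyond what they mean.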
• OWL was based on a model having to do
with inference and reasoning
• Not “Big Data” and integration
• Proposed solution: rework OWL
• Add key features for the world of linked data
• Part-Whole
• Procedural Attachment
• Simple temporal relations
• Ability to close (named) graphs
• SEE: https://www.slideshare.net/jahendler/on-beyond-owl-challenges-for-ontologies-on-the-web
Making Metadata work for integration
Metadata can help to “Close the loop”
We have to close the loop between the correlational nature of
predictive data analytics (machine learning) and the “causal” models
we need for science (natural & social) and engineering!
Data Exploration
Challenge: Can metadata help us explore/understand the data
(structured or unstructured) in archives, collections, datasets,
etc.?
Agent-based language project (RPI)
• Agent-based guide tells student
about modern Chinese tourist site
(in Chinese)
• Student can ask questions about it
(in Chinese)
• These examples are in English
(text) because presenter doesn’t
speak Chinese.
• Agent-based guide tells user about information on a tourist site
• User can ask questions about it
Using a game-based storytelling approach and limited language
First demo covered ~500 topics
re: 2008 Beijing Olympics
• These include venues, sports, athletes, special events,
and date information
– topics can be linked along these dimensions (i.e., same sport,
same date, same place, etc.)
• Example: Michael Phelps
Under the hood – a knowledge graph
• Agent reasons about a network of connected entities and chooses the
best next one to discuss based on a combination of factors
– Consistency, Novelty and Ontological relationships
• Different settings of these factors can lead to different presentation order.
– Settings can be pre-determined for pedagogical or application specific needs
– Agent can learn users’ interests and preferences over time, adjusting the settings
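A minimal sketch of how such settings could change the presentation order, assuming a simple weighted sum. The slides do not say how consistency, novelty, and ontological relationships are computed, so the three definitions below (attribute overlap, share of unseen attributes, same kind of entity) are stand-ins chosen for illustration, as are the topic attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    name: str
    kind: str              # e.g. "athlete", "venue"
    attributes: frozenset  # e.g. {"sport:swimming", "place:Water Cube"}


@dataclass(frozen=True)
class Weights:
    consistency: float = 1.0
    novelty: float = 1.0
    ontological: float = 1.0


def score(candidate, current, visited, weights):
    """Weighted sum of three assumed factors for one candidate topic."""
    union = candidate.attributes | current.attributes
    # Consistency (assumed): attribute overlap with the current topic.
    consistency = len(candidate.attributes & current.attributes) / len(union) if union else 0.0
    # Novelty (assumed): share of the candidate's attributes not seen so far.
    seen = set()
    for topic in visited:
        seen |= topic.attributes
    novelty = (len(candidate.attributes - seen) / len(candidate.attributes)
               if candidate.attributes else 0.0)
    # Ontological relationship (assumed): same kind of entity as the current topic.
    ontological = 1.0 if candidate.kind == current.kind else 0.0
    return (weights.consistency * consistency
            + weights.novelty * novelty
            + weights.ontological * ontological)


def choose_next(current, candidates, visited, weights):
    """Pick the highest-scoring topic that has not been presented yet."""
    unvisited = [c for c in candidates if c != current and c not in visited]
    return max(unvisited, key=lambda c: score(c, current, visited, weights))


# Illustrative topics in the spirit of the 2008 Beijing Olympics demo.
PHELPS = Topic("Michael Phelps", "athlete",
               frozenset({"sport:swimming", "place:Water Cube"}))
WATER_CUBE = Topic("Water Cube", "venue",
                   frozenset({"sport:swimming", "sport:diving", "place:Water Cube"}))
BOLT = Topic("Usain Bolt", "athlete",
             frozenset({"sport:athletics", "place:Bird's Nest"}))

if __name__ == "__main__":
    topics = [PHELPS, WATER_CUBE, BOLT]
    stay_on_topic = Weights(consistency=3.0, novelty=1.0, ontological=0.0)
    seek_variety = Weights(consistency=0.5, novelty=2.0, ontological=1.0)
    print(choose_next(PHELPS, topics, [PHELPS], stay_on_topic).name)
    print(choose_next(PHELPS, topics, [PHELPS], seek_variety).name)
```

With consistency weighted heavily the guide moves from Phelps to the venue he swam in; with novelty weighted heavily it jumps to a different athlete. Learning a user's preferences would amount to adjusting the `Weights` values over time.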
Our Approach
• Automate Agent Creation combining:
Information Extraction: Using a new "living information extraction"
technique, we will be able to create a "never-ending extractor" that pulls
information about entities and events, and the relationships between them,
from web documents.
The new system can work in a dynamic node and does not need human-annotated
samples for training, but it works best if there are a number of known
relationships between pages to build on.
Semantic Web: The Semantic Web provides a number of known relationships
between pages on the Web in a number of domains. Using general knowledge
sources, like DBpedia and YAGO, and specialized knowledge sources, like the
data from MusicBrainz, the reviews from Yelp (which have semantic
annotations) and even the Open Graph of Facebook (which is available in a
Semantic Web format), provides a jumpstart for the language extraction.
However, the Semantic Web relates pages but doesn't have any sort of
"understanding" of what is on the pages.
Cognitive Computing: Cognitive computing can give us a better way of
accessing information about the entities found on the Web and of finding
other information about the same entities using various kinds of search and
language heuristics. This allows us to have more organized information,
rapidly generated, about the entities being explored.
However, given a large graph of entities
(even the organized linked-open data cloud
has information about billions of things),
how do we choose what to display next? If
the best we can do is provide links, all of
the above isn't much better than choosing
a page and clicking from there.
Cognitive Story-telling Technology: Interactive storytelling techniques are
being explored to take information in the kind of "knowledge graph"
resulting from the above and tailor the presentation to a user. The aim is
to present the information as an interesting and meaningful story by taking
into consideration a combination of factors ranging from topic consistency
and novelty to learned user interests and even a user’s emotional
reactions. The system can essentially determine "where to go next" and what
to do there in the organized information as processed above.
8/3/2017
Turning University Archive Metadata into
narrative “tours”
Metadata as “story telling”
Archive Metadata includes:
Cognitive and Immersive Systems Lab
Summary
• The long-standing goal of the “Semantic Web
initiative” is to create metadata
– VIVO understood this from day one
• But increasingly we need to focus
on enriching the metadata
– To help provide more semantics for the data linking
• For discovery
• For integration
• And to use the metadata in new and exciting
ways
– Using the Semantics to power new technologies
• Closing the loop between correlation and causation research
• Providing new ways of interacting with the metadata collections
Questions?
By the way:
We are exploring a 3rd edition
(different publisher) with more
explicit linked data coverage
+ please send me
comments/thoughts/etc.
(email address at
http://www.cs.rpi.edu/~hendler )

More Related Content

What's hot

Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspective
James Hendler
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)
James Hendler
 
Data Big and Broad (Oxford, 2012)
Data Big and Broad (Oxford, 2012)Data Big and Broad (Oxford, 2012)
Data Big and Broad (Oxford, 2012)
James Hendler
 
Watson: An Academic's Perspective
Watson: An Academic's PerspectiveWatson: An Academic's Perspective
Watson: An Academic's Perspective
James Hendler
 
Facilitating Web Science Collaboration through Semantic Markup
Facilitating Web Science Collaboration through Semantic MarkupFacilitating Web Science Collaboration through Semantic Markup
Facilitating Web Science Collaboration through Semantic Markup
James Hendler
 
The Semantic Web: It's for Real
The Semantic Web: It's for RealThe Semantic Web: It's for Real
The Semantic Web: It's for Real
James Hendler
 
The Future of AI: Going Beyond Deep Learning, Watson, and the Semantic Web
The Future of AI: Going BeyondDeep Learning, Watson, and the Semantic WebThe Future of AI: Going BeyondDeep Learning, Watson, and the Semantic Web
The Future of AI: Going Beyond Deep Learning, Watson, and the Semantic Web
James Hendler
 
"Why the Semantic Web will Never Work" (note the quotes)
"Why the Semantic Web will Never Work"  (note the quotes)"Why the Semantic Web will Never Work"  (note the quotes)
"Why the Semantic Web will Never Work" (note the quotes)
James Hendler
 
Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)
James Hendler
 
The Future(s) of the World Wide Web
The Future(s) of the World Wide WebThe Future(s) of the World Wide Web
The Future(s) of the World Wide Web
James Hendler
 
The Semantic Web: 2010 Update
The Semantic Web: 2010 Update The Semantic Web: 2010 Update
The Semantic Web: 2010 Update
James Hendler
 
Social Machines Oxford Hendler
Social Machines Oxford HendlerSocial Machines Oxford Hendler
Social Machines Oxford Hendler
James Hendler
 
HyperMembrane Structures for Open Source Cognitive Computing
HyperMembrane Structures for Open Source Cognitive ComputingHyperMembrane Structures for Open Source Cognitive Computing
HyperMembrane Structures for Open Source Cognitive Computing
Jack Park
 
But Who Protects the Moderators?
But Who Protects the Moderators?But Who Protects the Moderators?
But Who Protects the Moderators?
Matthew Lease
 
Data Science For Social Scientists Workshop
Data Science For Social Scientists WorkshopData Science For Social Scientists Workshop
Data Science For Social Scientists Workshop
Ian Hopkinson
 
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Matthew Lease
 
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015
Jonathan Woodward
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Matthew Lease
 
Data Science 101
Data Science 101Data Science 101
Data Science 101
Virot "Ta" Chiraphadhanakul
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Matthew Lease
 

What's hot (20)

Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspective
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)
 
Data Big and Broad (Oxford, 2012)
Data Big and Broad (Oxford, 2012)Data Big and Broad (Oxford, 2012)
Data Big and Broad (Oxford, 2012)
 
Watson: An Academic's Perspective
Watson: An Academic's PerspectiveWatson: An Academic's Perspective
Watson: An Academic's Perspective
 
Facilitating Web Science Collaboration through Semantic Markup
Facilitating Web Science Collaboration through Semantic MarkupFacilitating Web Science Collaboration through Semantic Markup
Facilitating Web Science Collaboration through Semantic Markup
 
The Semantic Web: It's for Real
The Semantic Web: It's for RealThe Semantic Web: It's for Real
The Semantic Web: It's for Real
 
The Future of AI: Going Beyond Deep Learning, Watson, and the Semantic Web
The Future of AI: Going BeyondDeep Learning, Watson, and the Semantic WebThe Future of AI: Going BeyondDeep Learning, Watson, and the Semantic Web
The Future of AI: Going Beyond Deep Learning, Watson, and the Semantic Web
 
"Why the Semantic Web will Never Work" (note the quotes)
"Why the Semantic Web will Never Work"  (note the quotes)"Why the Semantic Web will Never Work"  (note the quotes)
"Why the Semantic Web will Never Work" (note the quotes)
 
Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)
 
The Future(s) of the World Wide Web
The Future(s) of the World Wide WebThe Future(s) of the World Wide Web
The Future(s) of the World Wide Web
 
The Semantic Web: 2010 Update
The Semantic Web: 2010 Update The Semantic Web: 2010 Update
The Semantic Web: 2010 Update
 
Social Machines Oxford Hendler
Social Machines Oxford HendlerSocial Machines Oxford Hendler
Social Machines Oxford Hendler
 
HyperMembrane Structures for Open Source Cognitive Computing
HyperMembrane Structures for Open Source Cognitive ComputingHyperMembrane Structures for Open Source Cognitive Computing
HyperMembrane Structures for Open Source Cognitive Computing
 
But Who Protects the Moderators?
But Who Protects the Moderators?But Who Protects the Moderators?
But Who Protects the Moderators?
 
Data Science For Social Scientists Workshop
Data Science For Social Scientists WorkshopData Science For Social Scientists Workshop
Data Science For Social Scientists Workshop
 
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
 
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
 
Data Science 101
Data Science 101Data Science 101
Data Science 101
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
 

Similar to The Unreasonable Effectiveness of Metadata

Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic Web
GIS Colorado
 
20111120 warsaw learning curve by b hyland notes
20111120 warsaw   learning curve by b hyland notes20111120 warsaw   learning curve by b hyland notes
20111120 warsaw learning curve by b hyland notes
Bernadette Hyland-Wood
 
The Semantic Web: 2010 Update
The Semantic Web: 2010 UpdateThe Semantic Web: 2010 Update
The Semantic Web: 2010 Update
James Hendler
 
How To Make Linked Data More than Data
How To Make Linked Data More than DataHow To Make Linked Data More than Data
How To Make Linked Data More than Data
Artificial Intelligence Institute at UofSC
 
How To Make Linked Data More than Data
How To Make Linked Data More than DataHow To Make Linked Data More than Data
How To Make Linked Data More than Data
Amit Sheth
 
Linked Open Data and data-driven journalism
Linked Open Data and data-driven journalismLinked Open Data and data-driven journalism
Linked Open Data and data-driven journalism
Pia Jøsendal
 
Slides.ppt
Slides.pptSlides.ppt
Slides.ppt
Asma Adala
 
Slides.ppt
Slides.pptSlides.ppt
Slides.ppt
RameshN849679
 
Slides.ppt
Slides.pptSlides.ppt
Slides.ppt
JoydipChandra2
 
Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview. Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview.
Doug Needham
 
Linked Data and the OpenART project
Linked Data and the OpenART projectLinked Data and the OpenART project
Linked Data and the OpenART project
Julie Allinson
 
FAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesFAIR data: LOUD for all audiences
FAIR data: LOUD for all audiences
Alessandro Adamou
 
An Introduction to Force11 at WWW2013
An Introduction to Force11 at WWW2013An Introduction to Force11 at WWW2013
An Introduction to Force11 at WWW2013
National Information Standards Organization (NISO)
 
Parcos_Data Explorer v2.pdf
Parcos_Data Explorer v2.pdfParcos_Data Explorer v2.pdf
Parcos_Data Explorer v2.pdf
SMTahsinurRahman1
 
Parcos_Data Explorer v2.pdf
Parcos_Data Explorer v2.pdfParcos_Data Explorer v2.pdf
Parcos_Data Explorer v2.pdf
ParCosProject
 
In search of lost knowledge: joining the dots with Linked Data
In search of lost knowledge: joining the dots with Linked DataIn search of lost knowledge: joining the dots with Linked Data
In search of lost knowledge: joining the dots with Linked Data
jonblower
 
Slides.ppt
Slides.pptSlides.ppt
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things
PayamBarnaghi
 
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
Stefan Dietze
 
Semantic Web: introduction & overview
Semantic Web: introduction & overviewSemantic Web: introduction & overview
Semantic Web: introduction & overview
Amit Sheth
 

Similar to The Unreasonable Effectiveness of Metadata (20)

Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic Web
 
20111120 warsaw learning curve by b hyland notes
20111120 warsaw   learning curve by b hyland notes20111120 warsaw   learning curve by b hyland notes
20111120 warsaw learning curve by b hyland notes
 
The Semantic Web: 2010 Update
The Semantic Web: 2010 UpdateThe Semantic Web: 2010 Update
The Semantic Web: 2010 Update
 
How To Make Linked Data More than Data
How To Make Linked Data More than DataHow To Make Linked Data More than Data
How To Make Linked Data More than Data
 
How To Make Linked Data More than Data
How To Make Linked Data More than DataHow To Make Linked Data More than Data
How To Make Linked Data More than Data
 
Linked Open Data and data-driven journalism
Linked Open Data and data-driven journalismLinked Open Data and data-driven journalism
Linked Open Data and data-driven journalism
 
Slides.ppt
Slides.pptSlides.ppt
Slides.ppt
 
Slides.ppt
Slides.pptSlides.ppt
Slides.ppt
 
Slides.ppt
Slides.pptSlides.ppt
Slides.ppt
 
Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview. Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview.
 
Linked Data and the OpenART project
Linked Data and the OpenART projectLinked Data and the OpenART project
Linked Data and the OpenART project
 
FAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesFAIR data: LOUD for all audiences
FAIR data: LOUD for all audiences
 
An Introduction to Force11 at WWW2013
An Introduction to Force11 at WWW2013An Introduction to Force11 at WWW2013
An Introduction to Force11 at WWW2013
 
Parcos_Data Explorer v2.pdf
Parcos_Data Explorer v2.pdfParcos_Data Explorer v2.pdf
Parcos_Data Explorer v2.pdf
 
Parcos_Data Explorer v2.pdf
Parcos_Data Explorer v2.pdfParcos_Data Explorer v2.pdf
Parcos_Data Explorer v2.pdf
 
In search of lost knowledge: joining the dots with Linked Data
In search of lost knowledge: joining the dots with Linked DataIn search of lost knowledge: joining the dots with Linked Data
In search of lost knowledge: joining the dots with Linked Data
 
Slides.ppt
Slides.pptSlides.ppt
Slides.ppt
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things
 
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
 
Semantic Web: introduction & overview
Semantic Web: introduction & overviewSemantic Web: introduction & overview
Semantic Web: introduction & overview
 

More from James Hendler

Knowing what AI Systems Don't know and Why it matters
Knowing what AI  Systems Don't know and Why it mattersKnowing what AI  Systems Don't know and Why it matters
Knowing what AI Systems Don't know and Why it matters
James Hendler
 
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")Exploring the Boundaries of Artificial Intelligence (or "Modern AI")
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")
James Hendler
 
Tragedy of the (Data) Commons
Tragedy of the (Data) CommonsTragedy of the (Data) Commons
Tragedy of the (Data) Commons
James Hendler
 
Knowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityKnowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/Interoperability
James Hendler
 
Enhancing Precision Wellness with Personal Health Knowledge Graphs
Enhancing Precision Wellness with Personal Health Knowledge Graphs Enhancing Precision Wellness with Personal Health Knowledge Graphs
Enhancing Precision Wellness with Personal Health Knowledge Graphs
James Hendler
 
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
James Hendler
 
Enhancing Precision Wellness with Knowledge Graphs and Semantic Analytics: O...
Enhancing Precision Wellness with  Knowledge Graphs and Semantic Analytics: O...Enhancing Precision Wellness with  Knowledge Graphs and Semantic Analytics: O...
Enhancing Precision Wellness with Knowledge Graphs and Semantic Analytics: O...
James Hendler
 
Artificial Intelligence: Existential Threat or Our Best Hope for the Future?
Artificial Intelligence: Existential Threat or Our Best Hope for the Future?Artificial Intelligence: Existential Threat or Our Best Hope for the Future?
Artificial Intelligence: Existential Threat or Our Best Hope for the Future?
James Hendler
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science
James Hendler
 
Big Data and Computer Science Education
Big Data and Computer Science EducationBig Data and Computer Science Education
Big Data and Computer Science Education
James Hendler
 
The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration
James Hendler
 
Watson at RPI - Summer 2013
Watson at RPI - Summer 2013Watson at RPI - Summer 2013
Watson at RPI - Summer 2013
James Hendler
 
Future of the World WIde Web (India)
Future of the World WIde Web (India)Future of the World WIde Web (India)
Future of the World WIde Web (India)
James Hendler
 

The Unreasonable Effectiveness of Metadata

  • 1. The unreasonable effectiveness of metadata Prof. Jim Hendler Director, Institute of Data Exploration and Applications Rensselaer Polytechnic Institute (RPI) USA @jahendler
  • 2. Tetherless World Constellation A new response to Halevy, Norvig and Pereira (2009)
  • 3. Talks I’m not giving today https://www.slideshare.net/jahendler
  • 4. The Semantic Web 2001
  • 5. The original vision included rich metadata Slide from 2002
  • 6. Deployed around the world in libraries and museums Lora Aroyo, 2011
  • 7. BBC Ontologies Many demos 2012 Olympics
  • 8. So back to Google, 2009
  • 10. Google 2011 …. Now we hear a lot of talk about the Semantic Web and that’s coming along, but it’s not quite here yet ….
  • 12. Google 2013: Schema.org …. A longstanding goal of the semantic web initiative is to get webmasters to make the structured data directly available on the web …. Learning from these earlier attempts has guided the development of schema.org
  • 13. From Google finds embedded metadata on >30% of its crawl – Guha, 2015 Google “knowledge vault” reported to have over 5 billion “facts” (links)
  • 14. But, the knowledge graph isn’t all automated (P. Norvig, WWW 2016, 4/16)
  • 15. © Peter Mika, 2014.
  • 16. © Peter Mika, 2014.
  • 17. Facebook 2012 The Open Graph Protocol
  • 18. So all this gives us tons of linked data… most of which isn’t used much
  • 19. Office of Research DIVE into data Discover Integrate Validate Explain Thinking outside the (database) box…
  • 20. Metadata lets us DIVE into data. Discover: Find datasets and/or content (incl. outside your own organization). Integrate: Link the relations using meaningful labels: “reconciliation”. Validate: Provide inputs to modeling and simulation systems. Explore: Develop (multimodal) approaches to turn data into actionable knowledge.
  • 21. Metadata lets us DIVE INTO DATA. Discover: Find datasets and/or content (incl. outside your own organization). Integrate: Link the relations using meaningful labels: “reconciliation”. Validate: Provide inputs to modeling and simulation systems. Explore: Develop (multimodal) approaches to turn data into actionable knowledge.
  • 22. How does one discover resources in linked data? With great difficulty
  • 23. Why? It’s not enough just to describe the data elements…
        Person            Date
        Smith James       June 4
        Jones Fred        May 17
        O’Connell Frank   April 3
        Chang Wu          February 21
        Hoffman Bernd     December 9
  • 24. Describing a dataset … requires a context
        1976 Dates of Birth
        Person            Date
        Smith James       June 4
        Jones Fred        May 17
        O’Connell Frank   April 3
        Chang Wu          February 21
        Hoffman Bernd     December 9
  • 25. Describing a dataset … requires a context. How do we capture more of this information?
        1976 Cancer Mortality dates
        Person            Date
        Smith James       June 4
        Jones Fred        May 17
        O’Connell Frank   April 3
        Chang Wu          February 21
        Hoffman Bernd     December 9
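
The point of slides 23 to 25 can be sketched in a few lines of Python: the same five rows mean different things depending on dataset-level metadata. This is only an illustration, not anything shown in the talk. The `DatasetContext` class, its field names, and the `describe` helper are hypothetical, and reading "1976" as the year that the partial dates belong to is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetContext:
    """Dataset-level metadata. Field names are illustrative assumptions."""
    title: str         # what the dataset is about
    year: int          # assumed: the year the partial dates belong to
    date_meaning: str  # what the "Date" column records


# Rows exactly as shown on the slide: (Person, Date).
ROWS = [
    ("Smith James", "June 4"),
    ("Jones Fred", "May 17"),
    ("O'Connell Frank", "April 3"),
    ("Chang Wu", "February 21"),
    ("Hoffman Bernd", "December 9"),
]


def describe(row, ctx):
    """Render one row as a statement whose meaning depends on the context."""
    person, date = row
    return f"{person}: {ctx.date_meaning} {date}, {ctx.year}"


births = DatasetContext("Dates of Birth", 1976, "born")
deaths = DatasetContext("Cancer Mortality dates", 1976, "died of cancer")

# Identical data elements, two very different readings.
for ctx in (births, deaths):
    print(ctx.title, "->", describe(ROWS[0], ctx))
```

Column-level descriptions ("Person", "Date") are identical in both cases; only the dataset-level context separates a birthday list from a mortality record.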
  • 26. Data MetaData Challenge: Linked Metadata “A Small World of Big Data” Linked Semantic Web “metadata” documents can be used to link very large databases in distributed data systems. This leads to orders of magnitude reduction in information flow for large-scale distributed data problems.
  • 27. Metadata lets us DIVE INTO DATA. Discover: Find datasets and/or content (incl. outside your own organization). Integrate: Link the relations using meaningful labels: “reconciliation”. Validate: Provide inputs to modeling and simulation systems. Explore: Develop (multimodal) approaches to turn data into actionable knowledge.
  • 28. Linked Data STILL needs semantics
        • Data is converted into RDF and SPARQL’ed
            – creates huge graph DBs less efficient than the original DB
        • Data is converted from DB into SPARQL return on demand
            – much better, but you must know the mapping
        • owl:sameAs is (ab)used to map data to data
            – but that only lets you map equals
            – which is an easy mapping to express in many ways
            • defining equality correctly in a model theory is much harder, and thus the abuse, but let’s leave that for another talk
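
A minimal sketch of why sameAs-style links "only let you map equals": treating each link as an equality and taking the closure just partitions identifiers into groups. The identifiers below are made up for illustration, and the union-find helper is a plain-Python stand-in, not an OWL reasoner.

```python
def same_as_classes(links):
    """Group identifiers into equivalence classes from (a, b) equality links."""
    parent = {}

    def find(x):
        # Find the representative of x, halving the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    classes = {}
    for node in list(parent):
        classes.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical identifiers from three imaginary databases.
links = [
    ("dbA:Troy_NY", "dbB:city/troy"),
    ("dbB:city/troy", "dbC:4012"),
    ("dbA:RPI", "dbB:org/rpi"),
]

for group in same_as_classes(links):
    print(group)
```

The result says which identifiers name the same thing and nothing more. A relation such as "the university is located in the city" cannot be expressed this way, which is the gap the next slide's proposed features (for example Part-Whole) are aimed at.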
  • 29. Making Metadata work for integration
        • OWL was based on a model having to do with inference and reasoning
            • Not “Big Data” and integration
        • Proposed solution: rework OWL
            • Add key features for the world of linked data
                • Part-Whole
                • Procedural Attachment
                • Simple temporal relations
                • Ability to close (named) graphs
        • SEE: https://www.slideshare.net/jahendler/on-beyond-owl-challenges-for-ontologies-on-the-web
  • 30. Metadata lets us DIVE INTO DATA. Discover: Find datasets and/or content (incl. outside your own organization). Integrate: Link the relations using meaningful labels: “reconciliation”. Validate: Provide inputs to modeling and simulation systems. Explore: Develop (multimodal) approaches to turn data into actionable knowledge.
  • 31. Metadata can help to “Close the loop”. We have to close the loop between the correlational nature of predictive data analytics (machine learning) and the “causal” models we need for science (natural & social) and engineering!
  • 32. Metadata lets us DIVE INTO DATA. Discover: Find datasets and/or content (incl. outside your own organization). Integrate: Link the relations using meaningful labels: “reconciliation”. Validate: Provide inputs to modeling and simulation systems. Explore: Develop (multimodal) approaches to turn data into actionable knowledge.
  • 33. Data Exploration Challenge: Can metadata help us explore/understand the data (structured or unstructured) in archives, collections, datasets, etc?
  • 34. Agent-based language project (RPI)
        • Agent-based guide tells student about modern Chinese tourist site (in Chinese)
        • Student can ask questions about it (in Chinese)
        • These examples are in English (text) because presenter doesn’t speak Chinese.
        • Agent-based guide tells user about information on a tourist site
        • User can ask questions about it
        Using game-based story-telling approach and limited language
  • 35. First demo covered ~500 topics re: 2008 Beijing Olympics • These include venues, sports, athletes, special events, and date information – topics can be linked along these dimensions (i.e., same sport, same date, same place, etc.)
  • 37. Under the hood – a knowledge graph • Agent reasons about a network of connected entities and chooses the best next one to discuss based on a combination of factors – Consistency, Novelty and Ontological relationships
  • 38. Under the hood – a knowledge graph • Different settings of these factors can lead to different presentation order. – Settings can be pre-determined for pedagogical or application-specific needs – Agent can learn users’ interests and preferences over time, adjusting the settings
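
The "choose the best next one" step on slides 37 and 38 might be sketched as a weighted score over the three named factors. The talk does not give the actual formulas, so everything here is an assumption: the `Topic` fields, the three scoring functions, the weights, and the sample topics are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    name: str
    kind: str   # e.g. "event", "venue", "athlete"
    sport: str
    date: str
    place: str


def consistency(current, candidate):
    """Assumed measure: fraction of shared dimensions (sport, date, place)."""
    fields = ("sport", "date", "place")
    shared = sum(getattr(current, f) == getattr(candidate, f) for f in fields)
    return shared / len(fields)


def novelty(candidate, visited):
    """Assumed measure: 1.0 for topics not yet presented, else 0.0."""
    return 0.0 if candidate.name in visited else 1.0


def ontological(current, candidate):
    """Assumed measure: 1.0 when both topics are the same kind of entity."""
    return 1.0 if current.kind == candidate.kind else 0.0


def choose_next(current, candidates, visited, weights):
    """Return the candidate with the highest weighted score."""
    w_cons, w_nov, w_onto = weights

    def score(c):
        return (w_cons * consistency(current, c)
                + w_nov * novelty(c, visited)
                + w_onto * ontological(current, c))

    return max(candidates, key=score)


# Made-up topics; not taken from the demo's ~500 real ones.
current = Topic("100m freestyle", "event", "swimming", "day 3", "pool")
repeat = Topic("200m freestyle", "event", "swimming", "day 3", "pool")
fresh = Topic("marathon", "event", "athletics", "day 9", "streets")
visited = {"100m freestyle", "200m freestyle"}

print(choose_next(current, [repeat, fresh], visited, (1.0, 0.2, 0.5)).name)
print(choose_next(current, [repeat, fresh], visited, (0.2, 1.0, 0.5)).name)
```

With consistency weighted highest the agent stays on closely related material, and with novelty weighted highest it moves to an unseen topic, which matches the slide's remark that different settings lead to a different presentation order.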
  • 39. Our Approach • Automate Agent Creation combining: Information Extraction: Using a new "living information extraction" technique, we will be able to create a "never-ending extractor" which will pull information about entities and events, and the relationships between them, from web documents. The new system can work in a dynamic node and does not need human-annotated samples for training, but it works best if there are a number of known relationships between pages to build on.
  • 40. Our Approach • Automate Agent Creation combining: Semantic Web: The Semantic Web provides a number of known relationships between pages on the Web in a number of domains. Using general knowledge sources, like dbpedia and Yago, and specialized knowledge sources, like the data from musicbrainz, the reviews from Yelp (which have semantic annotations) and even the Open Graph of Facebook (which is available in a semantic web format), provides a jumpstart for the language extraction. However, the Semantic Web relates pages, but doesn't have any sort of "understanding" of what is on the pages.
  • 41. Our Approach • Automate Agent Creation combining: Cognitive Computing: Cognitive Computing can give us a better way of accessing information about the entities found on the Web and of finding other information about the same entities using various kinds of search and language heuristics. This allows us to have more organized information, rapidly generated, about the entities being explored. However, given a large graph of entities (even the organized linked-open data cloud has information about billions of things), how do we choose what to display next? If the best we can do is provide links, all of the above isn't much better than choosing a page and clicking from there.
  • 42. Our Approach • Automate Agent Creation combining: Cognitive Story-telling Technology: Interactive storytelling techniques are being explored to take information in the kind of "knowledge graph" resulting from the above and tailor the presentation to a user. The aim is to present the information as an interesting and meaningful story by taking into consideration a combination of factors ranging from topic consistency and novelty to learned user interests and even a user’s emotional reactions. The system can essentially determine "where to go next" and what to do there in the organized information as processed above.
  • 43. 8/3/2017 Turning University Archive Metadata into narrative “tours” Metadata as “story telling” Archive Metadata includes:
  • 44. Cognitive and Immersive Systems Lab
  • 45. Summary
        • The long-standing goal of the “Semantic Web initiative” is to create metadata
            – VIVO understood this from day one
        • But increasingly we need to focus on enriching the metadata
            – To help provide more semantics for the data linking
                • For discovery
                • For integration
        • And to use the metadata in new and exciting ways
            – Using the Semantics to power new technologies
                • Closing the loop between correlation and causation research
                • Providing new ways of interacting with the metadata collections
  • 46. Questions? By the way: We are exploring a 3rd edition (different publisher) with more explicit linked data coverage + please send me comments/thoughts/etc. (email address at http://www.cs.rpi.edu/~hendler )