JIST2014 Tutorial on Linked Data and Knowledge Graphs: Constructing and Understanding Knowledge Graphs
Presenter: Jeff Z. Pan (University of Aberdeen)
Contributors
• Honghan Wu (University of Aberdeen)
• Yuan Ren (University of Aberdeen)
• Panos Alexopoulos (iSOCO)
• Jeff Z. Pan (University of Aberdeen)
Agenda
PART I LINKED DATA & KNOWLEDGE GRAPHS
1:00pm – 1:20pm  Overview & Applications
1:20pm – 1:35pm  Example Linked Data Knowledge Repositories
1:35pm – 1:45pm  The Current Status of Linked Data: the Good, the Bad and the Ugly
1:45pm – 2:00pm  Research Challenges
Agenda
PART II METHODS & TECHNIQUES
2:00pm – 3:05pm  Constructing Knowledge Graphs
2:30pm – 2:45pm  Coffee Break
3:05pm – 3:40pm  Understanding Knowledge Graphs
3:40pm – 3:45pm  Outlook
PART I
LINKED DATA & KNOWLEDGE GRAPHS
• Overview
• Applications
• Linked Data Knowledge Repositories
• Knowledge Graph on Linked Data
• Research Challenges
Knowledge
• What is knowledge?
  • Something that is known
  • Structured information
  • About certain aspects of the (real) world
Semantic Networks
A semantic network is a graph structure for representing knowledge in patterns of interconnected nodes and arcs,
• with nodes representing objects, concepts, or situations, and
• arcs representing relationships.
RDF: Standard for Directed Labelled Graph KBs for the Web
• RDF is
  • a modern version of semantic networks, with formal syntax and semantics
  • a standard model for data interchange on the Web
• RDF statements: subject-property-value triples
  [my-chair colour tan .]
  [my-chair rdf:type chair .]
  [chair rdfs:subClassOf furniture .]
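The three statements above form a tiny directed labelled graph. The sketch below stores them as plain Python string tuples, so no RDF library is needed. It applies only the RDFS rule that propagates rdf:type along rdfs:subClassOf, which is enough to conclude that my-chair is also furniture. The tuple encoding and the function names are illustrative assumptions, not part of any RDF standard API.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, value)

# The example statements, encoded as plain string triples.
GRAPH: Set[Triple] = {
    ("my-chair", "colour", "tan"),
    ("my-chair", "rdf:type", "chair"),
    ("chair", "rdfs:subClassOf", "furniture"),
}

def rdfs_type_closure(graph: Set[Triple]) -> Set[Triple]:
    """Apply one RDFS rule until nothing new is derived:
    (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D).
    The same loop also closes rdfs:subClassOf transitively, so chains of any length work."""
    inferred = set(graph)
    while True:
        new: Set[Triple] = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                if o == s2 and p2 == "rdfs:subClassOf" and p in ("rdf:type", "rdfs:subClassOf"):
                    new.add((s, p, o2))
        if new <= inferred:  # fixpoint reached
            return inferred
        inferred |= new

def types_of(graph: Set[Triple], entity: str) -> Set[str]:
    """All asserted and inferred classes of an entity."""
    return {o for (s, p, o) in rdfs_type_closure(graph) if s == entity and p == "rdf:type"}

print(sorted(types_of(GRAPH, "my-chair")))  # ['chair', 'furniture']
```

The naive nested loop is quadratic per iteration. It is only meant to make the entailment visible; real triple stores use indexed rule engines instead.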
  
Linked Data and Knowledge Graphs
• Linked Data refers to (RDF) data published on the Web
  • whose meaning is explicitly defined with ontological (OWL) vocabulary
  • which can be inter-linked with external datasets
• A knowledge graph is a set of interconnected typed entities and their attributes
Knowledge Graph (KG) Services and Related Research Problems
• KG construction: how to construct high-quality knowledge graphs?
  • Knowledge acquisition
  • Knowledge evaluation
• KG understanding: how to make it easier to access and reuse knowledge?
  • for end users
  • for data engineers
• KG reasoning: how to bridge the gap between the vocabulary used in the graphs and the vocabulary used in queries?
  • Scalability
  • Efficiency
APPLICATIONS OF KNOWLEDGE GRAPHS
Summary of entities, faceted facts, from best to list, entity associations, structured queries, and question answering
ENTITY UNDERSTANDING: THINGS, NOT STRINGS

What is it? (Entity Understanding)

FACETED FACT: GETTING THE VALUE OF SOME ATTRIBUTE

What is the time there? (Faceted Fact)
FROM BEST TO LIST: NOT ONLY THE BEST

Give a List instead of the Best

ENTITY ASSOCIATION: SHOW THE CONNECTIONS

How are they connected? (Entity Association)
Gong Cheng, Yanan Zhang, and Yuzhong Qu. Explass: Exploring Associations between Entities via Top-K Ontological Patterns and Facets. In Proc. of ISWC 2014, pp. 422–437.
http://ws.nju.edu.cn/explass/
STRUCTURED QUERIES: EVEN WHEN THE INPUTS ARE KEYWORDS

From keywords to structured queries
Haofen Wang, Kang Zhang, Qiaoling Liu, Thanh Tran, and Yong Yu. Q2Semantic: A lightweight keyword interface to semantic search. In Proc. of ESWC 2008, pp. 584–598.
Example: "Capin SVG" means: find specifications about "SVG" whose author's name is "Capin".
QUESTION ANSWERING: COMPUTE ANSWERS WITH THE KG

Question Answering
Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, and Philipp Cimiano. "Template-based question answering over RDF data." In Proceedings of the 21st International Conference on World Wide Web, pp. 639–648. ACM, 2012.
Example: "films starring Brad Pitt"
SAMPLE LINKED DATA KNOWLEDGE REPOSITORIES
DBpedia, WikiData, GoodRelations
DBpedia
• A crowd-sourced community effort to extract structured information from Wikipedia
  • allows users to ask structured queries against Wikipedia
  • and to link the different data sets on the Web to Wikipedia data.
DBpedia – the content
• Entities and their attributes from Wikipedia: infobox templates, categorisation information, images, geo-coordinates, etc.
• Classification schemas
  • Wikipedia categories are represented using the SKOS vocabulary and DCMI terms.
  • The YAGO classification is derived from the Wikipedia category system using WordNet.
  • WordNet synset links were generated by manually relating Wikipedia infobox templates and WordNet synsets.
• The DBpedia 2014 release consists of 3 billion RDF triples.
DBpedia – services
• SPARQL endpoint: http://dbpedia.org/sparql
• Query builders (e.g. the Leipzig query builder at http://querybuilder.dbpedia.org)
• Public faceted web service interface
• Dump downloads
  • DBpedia dumps in 125 languages on the DBpedia download server
  • DBpedia Ontology
DBpedia – use cases
• Nucleus for the Web of Data
• Revolutionise access to Wikipedia information, e.g. "Give me all cities in New Jersey with more than 10,000 inhabitants"
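The example question can be sent as a structured query to the public SPARQL endpoint (http://dbpedia.org/sparql). The sketch below only builds the request URL. The vocabulary terms dbo:City, dbo:isPartOf, dbo:populationTotal and dbr:New_Jersey are assumptions about how DBpedia models these cities. The `format` parameter is an assumed convenience parameter of DBpedia's Virtuoso endpoint, while `query` is the standard SPARQL-protocol parameter. The real answer set depends on the live data.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed DBpedia terms; check them against the current DBpedia Ontology before use.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:isPartOf dbr:New_Jersey ;
        dbo:populationTotal ?population .
  FILTER (?population > 10000)
}
ORDER BY DESC(?population)
"""

def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode a SPARQL query as an HTTP GET URL (SPARQL 1.1 protocol 'query' parameter)."""
    params = {"query": query, "format": "application/sparql-results+json"}  # 'format': assumed Virtuoso option
    return endpoint + "?" + urlencode(params)

url = build_request_url(QUERY)
# Sending it requires network access, e.g.:
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     rows = json.load(resp)["results"]["bindings"]
```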
WikiData
• A collaboratively edited knowledge base operated by the Wikimedia Foundation.
• Can be read and edited by both humans and machines.
• Acts as central storage for the structured data of its Wikimedia sister projects, including Wikipedia, Wikivoyage, Wikisource, and others.
WikiData – the content
Wikidata is document-oriented, focused around topics.
• Information is added to items by creating statements (key-value pairs).
WikiData – to Linked Data Web (1)
Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić. Introducing Wikidata to the Linked Data Web. In Proc. of ISWC 2014, pp. 50–65.
Exporting statements as triples
• Faithful representations: with additional qualifiers and references
• Simplified representations: without additional qualifiers and references
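The difference between the two export styles can be sketched on a single toy statement. In the simplified form, the statement becomes one direct triple. In the faithful form, the statement is reified as its own node so that qualifiers and references can be attached to it. All identifiers below are hypothetical. The predicate suffixes `/statement` and `/value` and the `ex:hasReference` predicate are illustrative names only. They are not the vocabulary of the actual Wikidata RDF export.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# A toy Wikidata-style statement (all identifiers are made up for illustration).
STATEMENT: Dict = {
    "id": "ex:stmt1",
    "item": "ex:Alice",
    "property": "ex:educatedAt",
    "value": "ex:SomeCollege",
    "qualifiers": [("ex:endTime", "2001")],
    "references": [("ex:statedIn", "ex:SomeSource")],
}

def simplified(stmt: Dict) -> List[Triple]:
    """Drop qualifiers and references: one direct item-property-value triple."""
    return [(stmt["item"], stmt["property"], stmt["value"])]

def faithful(stmt: Dict) -> List[Triple]:
    """Reify the statement as a node so qualifiers and references can hang off it."""
    s = stmt["id"]
    triples = [
        (stmt["item"], stmt["property"] + "/statement", s),  # illustrative predicate name
        (s, stmt["property"] + "/value", stmt["value"]),     # illustrative predicate name
    ]
    triples += [(s, q, v) for (q, v) in stmt["qualifiers"]]
    for i, (ref_prop, ref_val) in enumerate(stmt["references"]):
        ref = f"{s}/ref{i}"
        triples += [(s, "ex:hasReference", ref), (ref, ref_prop, ref_val)]
    return triples
```

The simplified form is easier to query, because the item links directly to the value. The faithful form keeps the context a statement was made in, at the cost of extra hops in every query.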
WikiData – to Linked Data Web (2)
Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić. Introducing Wikidata to the Linked Data Web. In Proc. of ISWC 2014, pp. 50–65.
Extracting schema information from Wikidata
• instance of (P31) → rdf:type and subclass of (P279) → rdfs:subClassOf
• constraints for the use of properties → OWL axioms
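The P31/P279 mapping can be sketched as a simple rewrite of statement triples. Once rewritten, the ordinary RDFS subclass reasoning applies. The sketch uses bare property IDs (`P31`, `P279`) and hypothetical `ex:` items. It does not reproduce the exact IRIs of the Wikidata export, and it does not cover the constraint-to-OWL part.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Mapping stated on the slide; any other property passes through unchanged.
SCHEMA_PROPERTY_MAP = {"P31": "rdf:type", "P279": "rdfs:subClassOf"}

def extract_schema(triples: List[Triple]) -> List[Triple]:
    """Rewrite Wikidata's two schema properties into RDF(S) terms."""
    return [(s, SCHEMA_PROPERTY_MAP.get(p, p), o) for (s, p, o) in triples]

def inferred_types(triples: List[Triple], entity: str) -> Set[str]:
    """Direct rdf:type classes plus every class reachable via rdfs:subClassOf."""
    rdf = extract_schema(triples)
    types = {o for (s, p, o) in rdf if s == entity and p == "rdf:type"}
    frontier = list(types)
    while frontier:
        c = frontier.pop()
        for (s, p, o) in rdf:
            if s == c and p == "rdfs:subClassOf" and o not in types:
                types.add(o)
                frontier.append(o)
    return types

# Hypothetical statements.
statements = [
    ("ex:Alice", "P31", "ex:Human"),
    ("ex:Human", "P279", "ex:Person"),
    ("ex:Alice", "ex:educatedAt", "ex:SomeCollege"),  # a non-schema property
]
print(inferred_types(statements, "ex:Alice"))
```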
WikiData – use cases & data access
Use cases
• Information about the sources helps support the notion of verifiability
• Collecting structured data allows easy reuse of that data
• Support for Wikimedia projects: reducing the workload in Wikipedia and increasing its quality
• Support well beyond that: everyone can use Wikidata
Accessing the data
• MediaWiki Lua Scribunto interface
• Wikibase API
• RDF dump: http://tools.wmflabs.org/wikidata-exports/rdf/exports/20141013/
GoodRelations
GoodRelations is a lightweight ontology for annotating offerings and other aspects of e-commerce on the Web.
[Slide credit: Martin Hepp]
GoodRelations – use cases
[Slide credit: Martin Hepp]
GoodRelations – use cases (2)
Google, Bing, Yahoo, and Yandex will improve the rendering of your page directly in the search results.
• Rich Snippets: search engines use your markup to augment the preview of your site
• Targeted Searching: the profile and preferences of the person behind the query
GoodRelations – who is using it
• Search engines
• 10,000+ small and large shops
• Publishers
• Software: OpenLink (Virtuoso)
CURRENT STATUS OF ONLINE LINKED DATA
The good, the bad and the ugly
The Good
Flexible linked data eco-system, covering ontology mapping, data linkage, and RDF/OWL querying and reasoning techniques:
• Flexible schema setting: schemaless → simple schema → rich schema
• Universally unique IDs for data entities: URIs
• Shared vocabularies
• Schema mapping
• Instance mapping
• SPARQL entailment regimes
• Distributed SPARQL endpoints
The Good
• Flexible linked data eco-system
• Facilities for sharing and linking knowledge in an open environment
• Knowledge representation: various levels of expressive power
• Services, tools, and approaches for knowledge generation, understanding, and consumption
• Interlinked knowledge repositories across various domains
The Bad
• Knowledge quality (errors, provenance, quantifiers, freshness…)
• Data protection (license, access control)
• Data business model
The Ugly
• Linked Data excels in knowledge representation
  • But a large number of datasets are missing schema information
• RDF is a triple-based model
  • But it is hard and time-consuming (even for SW geeks) to understand an RDF knowledge repository
RESEARCH CHALLENGES

Research Challenges
• KG Construction
  • Ontology / Schema Construction
  • Data Lifting
  • Quality Evaluation
• Understanding KG
  • User Understanding
  • Data Understanding
• Dynamic Knowledge in KG
  • Stream Data / Prediction
  • Belief Revision
• Intelligent Services for KG
  • Ontology Reasoning (see my tutorial at ISWC2014)
  • Problem Solving / Workflow
Challenges in Automatic Construction
• Incompleteness of data: is the constructed schema generic enough to accommodate new data?
• Inconsistency of data: what if data items conflict with each other?
E.g. birthdates of people: some people may not have a birthdate asserted in the dataset; should the schema specify that each person has a birthdate? Some people may have different birthdates asserted in different datasets; should the schema specify that the birthdate is unique?
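The birthdate example can be made concrete with a small profiling check. The check reports the people who would violate a "every person has a birthdate" requirement and the people who would violate a "birthdate is unique" requirement. The two datasets, the `ex:` names and the dates are made up for illustration. The sketch only detects the cases. It does not decide which schema choice is right.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical datasets describing overlapping people (toy data).
DATASET_A = [("ex:p1", "ex:birthDate", "1970-01-01"), ("ex:p2", "rdf:type", "ex:Person")]
DATASET_B = [("ex:p1", "ex:birthDate", "1970-01-02"), ("ex:p1", "rdf:type", "ex:Person")]

def profile_birthdates(*datasets: List[Triple]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (people with no birth date, people with conflicting birth dates)."""
    people: Set[str] = set()
    dates: Dict[str, Set[str]] = defaultdict(set)
    for triples in datasets:
        for s, p, o in triples:
            if p == "rdf:type" and o == "ex:Person":
                people.add(s)
            elif p == "ex:birthDate":
                people.add(s)  # assumption: only people have birth dates here
                dates[s].add(o)
    missing = {x for x in people if not dates.get(x)}               # incompleteness
    conflicts = {x: d for x, d in dates.items() if len(d) > 1}      # inconsistency
    return missing, conflicts

missing, conflicts = profile_birthdates(DATASET_A, DATASET_B)
```

If the schema made birthDate mandatory, `ex:p2` would be incomplete. If it declared birthDate unique (functional), `ex:p1` would make the merged data inconsistent.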
  
Challenges in Manual Construction
• Expertise of ontology engineers: do the engineers have sufficient understanding of and experience with ontology technologies (RDF(S), OWL, SPARQL, RIF, etc.)?
• Workload of ontology engineers: how much time does it take to manually construct a large ontology? E.g. SNOMED CT has about 400,000 concepts.
• Collaboration: when multiple ontology engineers work together, how can we make sure they have a consistent understanding of the ontology?
Challenges for both Automatic and Manual Construction
• Requirements and evaluation: how do we specify the requirements of ontology construction and test whether they have been fulfilled?
• Expressiveness vs. efficiency: which knowledge representation should we use? Is it sufficient to describe the domain? Are efficient reasoning and query answering mechanisms and systems available?
• Ontology reuse: do we have to construct everything from scratch? Is there an available ontology that partially covers the domain?
Challenge in Data Lifting
Data lifting enriches unstructured data with structural annotations, thereby extracting the entities, their relations, and their properties for the knowledge graph.
Key challenges:
• Entity identification: certain entities can be hard to identify, e.g. movie titles
• AVP (attribute-value pair) identification: an entity, an attribute and its value may be scattered across the text or dataset, making it hard to establish the relation
Challenge in Entity Identification
• There are different ways to refer to the same entity: e.g. "The President of the U.S." and "Barack Obama"
• The same name can refer to different entities
• People may use acronyms or abbreviations for entities: e.g. "K-Drive" is the acronym for the "Knowledge-driven Data Exploitation" project, not the drive labelled K on my computer.
• Natural language text may contain typos, and values may use different notations
Challenge in Data Understanding
• Users are unfamiliar with the content of knowledge graphs:
  • What is the vocabulary?
  • What is described by the knowledge graph?
  • How is the content organised?
  • How is it connected to the other datasets I have?
• Users do not know how to exploit the knowledge graph:
  • Which queries can I ask this knowledge graph?
  • Which queries can be answered with this knowledge graph?
Challenge in Knowledge Dynamics
• Validity of knowledge: is a piece of information permanent or temporary?
• Representation: e.g. representing the temporal dependency of knowledge, as in "George W. Bush was the president of the U.S. until Barack Obama became the president."
• Updating the knowledge graph: when and how do we retract a previously unknown mistake from the knowledge graph? Which knowledge should become obsolete after the current update?
• Querying: querying w.r.t. the temporal properties of knowledge, e.g. "Who was the last president of the U.S.?"
• Predicting the dynamics: which changes are likely to occur, given the history of the knowledge graph?
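One simple way to address the representation and querying points is to attach a valid-time interval to each fact. The sketch below assumes year granularity and an exclusive end year. It is one possible encoding chosen for illustration, not a representation the tutorial prescribes. Closing a fact by setting its `end` keeps the history queryable, which deleting the fact would not.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class TemporalFact:
    subject: str
    predicate: str
    obj: str
    start: int            # first year in which the fact holds
    end: Optional[int]    # first year in which it no longer holds; None = still valid

# Toy data for the example above (year granularity only).
FACTS = [
    TemporalFact("George_W._Bush", "presidentOf", "USA", 2001, 2009),
    TemporalFact("Barack_Obama", "presidentOf", "USA", 2009, None),
]

def holds_at(fact: TemporalFact, year: int) -> bool:
    return fact.start <= year and (fact.end is None or year < fact.end)

def president_at(year: int) -> List[str]:
    return [f.subject for f in FACTS if f.predicate == "presidentOf" and holds_at(f, year)]

def previous_president(year: int) -> Optional[str]:
    """The president whose term ended most recently (at or before `year`), if recorded."""
    ended = [f for f in FACTS
             if f.predicate == "presidentOf" and f.end is not None and f.end <= year]
    return max(ended, key=lambda f: f.end).subject if ended else None
```

For example, `previous_president(2014)` answers "Who was the last president of the U.S.?" as of 2014 with `"George_W._Bush"`.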
Challenge in Intelligent Services
The large amount of information in a knowledge graph, and its interconnections, can be used to provide intelligent services; e.g. reasoning can be used to discover hidden relations in a knowledge graph.
Key challenges:
• Efficiency of the services: knowledge graphs are usually accessed by multiple users in real time, so efficiency is crucial to the quality of service.
• Scalability of the services: knowledge graphs are usually large, while even basic reasoning services, e.g. transitive closure, can already consume a large amount of time and resources.
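To see why even transitive closure is costly, here is a straightforward closure computation that runs one breadth-first search per node. The toy `partOf` edges are illustrative. The docstring states the cost: the number of derived pairs alone can grow quadratically with the number of nodes.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

def transitive_closure(edges: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """All pairs (a, c) such that c is reachable from a via one or more edges.
    One BFS per node gives O(V * (V + E)) time, and the output itself can
    contain O(V^2) pairs, which is what makes this expensive on large KGs."""
    succ: Dict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set()
    for a, b in edges:
        succ[a].add(b)
        nodes.update((a, b))
    closure: Set[Tuple[str, str]] = set()
    for start in nodes:
        seen: Set[str] = set()
        queue = deque(succ[start])
        while queue:
            n = queue.popleft()
            if n in seen:
                continue
            seen.add(n)
            closure.add((start, n))
            queue.extend(succ[n] - seen)
    return closure

# Toy transitive property (partOf): 3 asserted edges yield 6 closure pairs.
EDGES = [("Aberdeen", "Scotland"), ("Scotland", "UK"), ("UK", "Europe")]
print(sorted(transitive_closure(EDGES)))
```

A chain of n asserted edges already yields n(n+1)/2 pairs. Production reasoners therefore often compute such closures lazily at query time, or materialise them incrementally.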
CONSTRUCTING KNOWLEDGE GRAPHS
• Test-Driven Ontology Construction
  • Methodology
  • A Protégé plug-in
• Handling Entity Disambiguation
  • Approach
  • Some evaluation results
• Bridging Requirements and Authoring Tests
  • Competency Questions as Informal Requirement Specification
  • Some evaluation results
Uschold & King's (1995) Methodology on Ontology Construction
• Key steps: capturing, coding, integrating and evaluating/testing
• Ontology evaluation/testing:
  • to make a technical judgment of the ontologies
  • w.r.t. a frame of reference
• A frame of reference can be:
  • requirement specifications
  • competency questions
  • or the real world
Ontology and Tests
• Uschold & King's methodology
  • Test the ontology after axioms are written
• Test-driven ontology authoring
  • Write authoring tests before writing axioms
  • Writing authoring tests before axioms does not take any more effort than writing them after axioms
  • Forces authors to think about requirements before writing axioms
  • Writing authoring tests first helps authors detect and remove errors sooner
  • Helps authors understand how good an existing/reused ontology is
Gruninger & Fox's (1995) Methodology
Key steps:
1.  Motivating scenarios
2.  Informal competency questions
3.  FOL terminology (classes, properties, objects)
4.  Formal competency questions (2 -> 4?)
5.  FOL axioms
6.  Completeness theorem (defining the conditions under which the solutions to the questions are complete)
The METHONTOLOGY (2003) Methodology
• Key steps:
  1.  specification of requirements
  2.  terminology with tabular and/or graph notations
  3.  formalisation with a logic-based ontology language
  4.  maintenance (including evaluation/testing)
• Ontology evaluation/testing:
  • checking consistency, completeness, redundancy
The DKAP (2007) Methodology
• Key steps:
  1.  determine the domain and scope
  2.  check availability of existing ontologies
  3.  collect and analyse data for knowledge extraction
  4.  develop initial ontology
  5.  refine and validate ontology
• Ontology validation/testing:
  • consistency and accuracy checking
Limitations of Existing Methodologies
• Methodology level:
  • Lack of details about the transitions
    • from requirements to tests
    • from requirements to terminology
    • from terminology to axioms
• Tool level:
  • Lack of tools to guide the above transitions
An Approach to Test-Driven Ontology Authoring (presented in an invited talk at BMIR, Stanford University, June 2013)
• An ontology contains not only OWL files, but also a test suite
• A test suite contains a set of tests as SPARQL 1.1 queries
  • not all requirements can be represented in SPARQL 1.1, though
• Ontology reuse
  • check the associated test suite before reusing an ontology, to better understand the original intention
• Collaborative ontology authoring
  • all authors agree upon a common test suite
  • each author can also keep an extra test suite locally
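Authoring Tests
[Diagram: a test suite contains tests (Test 1, Test 2, …); each test pairs a SPARQL 1.1 query with its expected results. A reasoner evaluates the query over the ontology to obtain the actual results, which are compared with the expected results to give pass/fail.]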
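The test loop in the diagram (query, expected results, reasoner, actual results, pass/fail) can be sketched in a few lines. In this sketch, plain Python functions stand in for SPARQL 1.1 queries, and a tiny rdf:type/rdfs:subClassOf propagation stands in for an OWL reasoner. Both are simplifying assumptions; a real setup would evaluate SPARQL queries with a reasoner such as the one the plug-in uses. The `ex:` ontology and the test names are hypothetical.

```python
from typing import Callable, FrozenSet, List, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]

class AuthoringTest(NamedTuple):
    name: str
    query: Callable[[Set[Triple]], FrozenSet[str]]  # stands in for a SPARQL 1.1 query
    expected: FrozenSet[str]

def reason(ontology: Set[Triple]) -> Set[Triple]:
    """Stand-in reasoner: propagate rdf:type along (transitive) rdfs:subClassOf."""
    kb = set(ontology)
    changed = True
    while changed:
        derived = {(x, p, d)
                   for (x, p, c) in kb if p in ("rdf:type", "rdfs:subClassOf")
                   for (c2, q, d) in kb if q == "rdfs:subClassOf" and c2 == c}
        changed = not derived <= kb
        kb |= derived
    return kb

def instances_of(cls: str) -> Callable[[Set[Triple]], FrozenSet[str]]:
    """Roughly: SELECT ?x WHERE { ?x rdf:type <cls> }"""
    return lambda kb: frozenset(s for (s, p, o) in kb if p == "rdf:type" and o == cls)

def run_suite(ontology: Set[Triple], suite: List[AuthoringTest]) -> List[Tuple[str, bool]]:
    kb = reason(ontology)  # actual results are computed over the inferred KB
    return [(t.name, t.query(kb) == t.expected) for t in suite]

ONTOLOGY = {
    ("ex:mozzarella", "rdf:type", "ex:CheeseTopping"),
    ("ex:CheeseTopping", "rdfs:subClassOf", "ex:PizzaTopping"),
}
SUITE = [
    AuthoringTest("T1", instances_of("ex:PizzaTopping"), frozenset({"ex:mozzarella"})),
    AuthoringTest("T2", instances_of("ex:VegetableTopping"), frozenset({"ex:tomato"})),
]
print(run_suite(ONTOLOGY, SUITE))
```

T1 passes only because of inference. T2 fails because the ontology says nothing about tomato, which is exactly the kind of gap a failed authoring test is meant to expose.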
  
A Protégé Plug-in for Authoring Tests (based on the TrOWL reasoner)
Loading the Manifest File
• A manifest file specifies queries and expected results
• Running the reasoner to get the results for each test
• Clicking on a test to show the expected and actual results
Compute Justifications for Errors Related to Failed Tests
• with the justification plug-in (and reasoners such as TrOWL)
Modify the Ontology
• so that CheeseTopping is no longer disjoint with VegetableTopping
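The justify-then-repair cycle of the last two slides can be sketched on a toy axiom set. A hypothetical class CheeseyVegetableTopping is declared a subclass of both CheeseTopping and VegetableTopping, so it becomes unsatisfiable once those two are disjoint. The checker here looks only at told subclass and disjointness axioms, which is far weaker than OWL reasoning. The justification is found with a simple black-box "shrink" step (drop every axiom that is not needed for the error). This is a generic technique, not the algorithm of the justification plug-in or TrOWL.

```python
from typing import List, Set, Tuple

Axiom = Tuple[str, str, str]  # ("sub", C, D) means C is a subclass of D; ("disjoint", A, B)

AXIOMS: List[Axiom] = [
    ("sub", "CheeseyVegetableTopping", "CheeseTopping"),    # hypothetical class
    ("sub", "CheeseyVegetableTopping", "VegetableTopping"),
    ("sub", "CheeseTopping", "PizzaTopping"),
    ("sub", "VegetableTopping", "PizzaTopping"),
    ("disjoint", "CheeseTopping", "VegetableTopping"),
]

def superclasses(cls: str, axioms: List[Axiom]) -> Set[str]:
    """Reflexive-transitive closure over the told subclass axioms."""
    result, frontier = {cls}, [cls]
    while frontier:
        c = frontier.pop()
        for kind, sub, sup in axioms:
            if kind == "sub" and sub == c and sup not in result:
                result.add(sup)
                frontier.append(sup)
    return result

def unsatisfiable(cls: str, axioms: List[Axiom]) -> bool:
    """cls is unsatisfiable if it inherits from two classes declared disjoint."""
    sups = superclasses(cls, axioms)
    return any(kind == "disjoint" and a in sups and b in sups for kind, a, b in axioms)

def justification(cls: str, axioms: List[Axiom]) -> List[Axiom]:
    """Black-box shrink: remove each axiom whose removal keeps cls unsatisfiable.
    The remaining axioms form a minimal explanation of the error."""
    assert unsatisfiable(cls, axioms)
    just = list(axioms)
    for ax in list(just):
        trial = [a for a in just if a != ax]
        if unsatisfiable(cls, trial):
            just = trial
    return just

# Repair as on the slide: drop the disjointness between the two topping classes.
REPAIRED = [a for a in AXIOMS if a != ("disjoint", "CheeseTopping", "VegetableTopping")]
```

The justification keeps the two subclass axioms and the disjointness axiom. Removing the disjointness axiom, as the slide suggests, makes the class satisfiable again.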
  
Key Issue (to be revisited after the Entity Disambiguation part)
• Understanding the intention of ontology authors
  • How to generate authoring tests?
  • How to judge the quality of the authoring tests?
Entity Recognition and Disambiguation
• Challenge revisited:
  • There are different ways to refer to the same entity: e.g. "The President of the U.S." and "Barack Obama"
  • The same name can refer to different entities
• The contextual hypothesis is used in many existing approaches:
  • terms with similar meanings are often used in similar contexts
  • the role of these contexts is typically played by already annotated documents (e.g. Wikipedia articles), which are used to train term classifiers
Alternative Context: Evidence Model
• Idea: semantic entities that may serve as disambiguation evidence for the scenario's target entities
Evidence Model Construction (Manual)
• The identification of target concepts whose instances we wish to disambiguate (e.g. locations).
• The determination of related concepts whose instances may serve as contextual disambiguation evidence.
  • For example, in texts that describe historical events, some concepts whose instances may act as location evidence are related locations, historical events, and historical groups and persons.
• The identification, for each pair of evidence and target concept, of the relation paths that link them.
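A minimal sketch of how such an evidence model could drive disambiguation follows. Each candidate target entity is scored by how many evidence entities found in the text reach it via one of the model's relation paths, and the highest score wins. The knowledge graph, the `ex:` IRIs and the paths are toy assumptions loosely modelled on the military-conflict example text used in the evaluation. This counting scheme is only an illustration, not the scoring used in the presented approach.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Toy knowledge graph (hypothetical IRIs and edges).
KG: Set[Triple] = {
    ("ex:Siege_of_Augusta", "ex:locatedIn", "ex:Augusta_GA"),
    ("ex:Fort_Cornwallis", "ex:locatedIn", "ex:Augusta_GA"),
    ("ex:Henry_Lee_III", "ex:participatedIn", "ex:Siege_of_Augusta"),
}

# Hypothetical evidence model: relation paths from evidence entities to target locations.
PATHS: List[Tuple[str, ...]] = [
    ("ex:locatedIn",),                      # event/fort -> location
    ("ex:participatedIn", "ex:locatedIn"),  # person -> event -> location
]

def follow(start: str, path: Tuple[str, ...], kg: Set[Triple]) -> Set[str]:
    """Entities reachable from start by following the properties in path, in order."""
    current = {start}
    for prop in path:
        current = {o for (s, p, o) in kg if p == prop and s in current}
    return current

def disambiguate(candidates: List[str], evidence: List[str], kg: Set[Triple]) -> Dict[str, int]:
    """Score each candidate by the number of evidence entities that reach it via some path."""
    return {
        cand: sum(1 for ev in evidence if any(cand in follow(ev, path, kg) for path in PATHS))
        for cand in candidates
    }

CANDIDATES = ["ex:Augusta_GA", "ex:Augusta_ME"]  # two readings of "Augusta"
EVIDENCE = ["ex:Siege_of_Augusta", "ex:Fort_Cornwallis", "ex:Henry_Lee_III"]
scores = disambiguate(CANDIDATES, EVIDENCE, KG)
best = max(scores, key=scores.get)
```

Unlike the contextual hypothesis, this approach needs no annotated training corpus, only a KG that links evidence entities to targets.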
Evidence-Target Paths
Term Extraction (Automatic)
Extraction is performed with Knowledge Tagger (from iSOCO), based on GATE.
Evaluation Results: Football Match Scenario
• 50 texts describing football matches.
• E.g. "It's the 70th minute of the game and after a magnificent pass by Pedro, Messi managed to beat Claudio Bravo. Barcelona now leads 1-0 against Real."
Evaluation Results: Military Conflict Scenario
• 50 historical texts describing military conflicts.
• E.g. "The Siege of Augusta was a significant battle of the American Revolution. Fought for control of Fort Cornwallis, a British fort near Augusta, the battle was a major victory for the Patriot forces of Lighthorse Harry Lee and a stunning reverse to the British and Loyalist forces in the South."
Future Work
• Fully automated construction of the disambiguation evidence model.
  • The challenge here is how to automatically identify the text's domain/topic.
• Combination with statistical methods for cases where the available domain semantic information is incomplete.
  • The challenge here is how to select the optimal ratio of ontological vs. statistical evidence.
• Development of a tool that enables users to dynamically build such models out of existing semantic data and use them for disambiguation purposes.
Issues in Test-Driven Ontology Authoring
1.  How to generate tests
2.  How to judge the quality of tests
  • why they are relevant
  • how to provide the correct expected answers
Requirement Driven?
• How about starting from requirements instead of tests?
[Diagram: Ontology Authoring Requirements → Ontology Authoring Tests → Test Results]
Requirement-Driven Ontology Authoring [Ren et al., 2014]
• Key questions
  • RQ1: what forms of requirements should we consider?
  • RQ2: how to generate authoring tests from requirements?
Competency Question
• Questions that people expect the constructed ontologies to answer
• Useful for novice users, since they are
  • in natural language
  • about domain knowledge
  • require little understanding of ontology technologies
• A typical CQ: Which pizza has some cheese topping?
RQ1: what forms of requirements should we consider?
Competency Questions (CQs) can be regarded as a functional requirement of the ontology.
RQ1': How are CQs formulated?
Key Idea 1: Identification of CQ Patterns
• A typical CQ: Which pizza has some cheese topping?
• Hypothesis: CQs usually have clear syntactic patterns
• Features and elements can be extracted:
  • "Which": Feature: type of question
  • "pizza": Element: class expression CE1
  • "has": Element: object property expression OPE; Feature: binary predicate
  • "cheese topping": Element: class expression CE2
• Pattern: CE1 OPE CE2
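A rough sketch of pattern-based CQ analysis follows. Three regular expressions recognise selection ("Which …"), boolean ("Can …") and counting ("How many …") questions of the shape CE1 OPE CE2. They extract the question-type feature and the three elements. The regexes are an illustrative assumption that handles only single-word CE1 and OPE. They are not the analysis used in the presented framework.

```python
import re
from typing import NamedTuple, Optional

class CQPattern(NamedTuple):
    question_type: str  # feature: "selection", "boolean" or "counting"
    ce1: str            # element: class expression CE1
    ope: str            # element: object property expression OPE
    ce2: str            # element: class expression CE2

# Hypothetical surface patterns; single-word CE1/OPE only, optional "some" before CE2.
_PATTERNS = [
    ("counting",  re.compile(r"^how many (?P<ce1>\w+) (?P<ope>\w+) (?:some )?(?P<ce2>[\w ]+?)\?$", re.I)),
    ("selection", re.compile(r"^which (?P<ce1>\w+) (?P<ope>\w+) (?:some )?(?P<ce2>[\w ]+?)\?$", re.I)),
    ("boolean",   re.compile(r"^(?:can|does|do|is) (?P<ce1>\w+) (?P<ope>\w+) (?:some )?(?P<ce2>[\w ]+?)\?$", re.I)),
]

def parse_cq(question: str) -> Optional[CQPattern]:
    """Return the CE1 OPE CE2 pattern of a CQ, or None if no pattern matches."""
    q = question.strip()
    for qtype, pattern in _PATTERNS:
        m = pattern.match(q)
        if m:
            return CQPattern(qtype, m["ce1"].lower(), m["ope"].lower(), m["ce2"].lower())
    return None

print(parse_cq("Which pizza has some cheese topping?"))
```

A CQ such as "Which vegetarian pizza has some cheese topping?" would be misparsed by these patterns. Real CQ analysis needs a proper parser or a richer pattern catalogue.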
  
Result 1: A Feature-based Framework for CQ Formulation
• Based on CQs collected from the Software Ontology Project (75 CQs) and Manchester OWL Workshops (70 CQs)
• Primary features → CQ archetypes
• Secondary features → CQ subtypes
Features and their values (divided on the slide into primary and secondary features):
Feature                        Values
Question Type                  Selection, Boolean, Counting
Element Visibility             Explicit, Implicit
Predicate Arity                Unary, Binary, N-ary
Relation Type                  Object, Datatype
Modifier                       Quantity, Numeric
Domain-Independent Element     Spatial, Temporal
Question Polarity              Positive, Negative
Result 2: Archetypes of CQ Patterns
Answerability of CQs
• A typical CQ: Which pizza has some cheese topping?
• Existing work focused on answering CQs directly
  • But is the answer meaningful?
• The ability to answer CQs meaningfully can be regarded as a functional requirement of the ontology
• What if the answer is an empty set? Possible scenarios:
  • Pizza does not exist
  • Cheese topping does not exist
  • Pizzas are not allowed to have cheese topping
  • The ontology has not been populated with any cheesy pizza yet
  • …
RQ2: how to generate authoring tests from requirements?
RQ2': How can we automatically test whether a CQ can be meaningfully answered?
Key Idea 2: Presuppositions of CQs
• A typical CQ: Which pizza has some cheese topping?
• A CQ comes with certain presuppositions
  • conditions that the speakers assume to be met
• A CQ can be meaningfully answered only when its presuppositions are satisfied:
  • Classes Pizza and CheeseTopping should occur in the ontology
  • Property has(Topping) should occur in the ontology
  • The ontology should allow Pizza to have CheeseTopping
  • The ontology should also allow Pizza to not have CheeseTopping
Jeff	
  Z.	
  Pan	
  (University	
  of	
  	
  Aberdeen)	
  
CQs and Authoring Tests
•  A typical CQ: Which pizza has some cheese topping? (CE1: Pizza; OPE: has(Topping); CE2: CheeseTopping)
•  Satisfiability of CQ presuppositions can be verified by authoring tests generated from the CQ's features and elements
  •  Classes Pizza, CheeseTopping should occur in the ontology
    •  [CE1], [CE2] should both occur in the class vocabulary
  •  Property has(Topping) should occur in the ontology
    •  [OPE] should occur in the property vocabulary
  •  The ontology should allow Pizza to have CheeseTopping
    •  [CE1] ⊓ ∃[OPE].[CE2] should be satisfiable
  •  The ontology should also allow Pizza to not have CheeseTopping
    •  [CE1] ⊓ ¬∃[OPE].[CE2] should be satisfiable
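A minimal Python sketch of generating these authoring tests from the three CQ elements. It writes the two satisfiability tests in Manchester syntax (`CE1 and (OPE some CE2)`, `CE1 and not (OPE some CE2)`), one way to phrase "the ontology should (also) allow Pizza to (not) have CheeseTopping" in OWL. The `AuthoringTest` record and its `kind` labels are hypothetical names chosen for illustration, and actually deciding satisfiability would still require an OWL reasoner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthoringTest:
    kind: str        # hypothetical labels: class/property occurrence or satisfiability
    expression: str  # an entity name or a Manchester-syntax class expression


def authoring_tests(ce1: str, ope: str, ce2: str) -> list[AuthoringTest]:
    """Derive authoring tests for a CQ of the shape 'Which [CE1] [OPE] some [CE2]?'."""
    return [
        AuthoringTest("class occurrence", ce1),
        AuthoringTest("class occurrence", ce2),
        AuthoringTest("property occurrence", ope),
        # The ontology should allow CE1 to have OPE some CE2 ...
        AuthoringTest("satisfiability", f"{ce1} and ({ope} some {ce2})"),
        # ... and should also allow CE1 not to have it.
        AuthoringTest("satisfiability", f"{ce1} and not ({ope} some {ce2})"),
    ]


for test in authoring_tests("Pizza", "hasTopping", "CheeseTopping"):
    print(f"{test.kind:20} {test.expression}")
```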
  
Result 3: Associate Presuppositions with Features
•  Features in a CQ are associated with the presuppositions of the CQ.
•  An example for the question type feature:
  •  Selection: Which pizza contains pork?
  •  Boolean: Can pizza contain pork?
  •  Counting: How many pizzas contain pork?
•  Presuppositions:
  •  Occurrence of “Pizza”, “Pork”, “contains”
  •  Some pizza can contain pork
  •  Some pizza can contain no pork
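Since presuppositions are attached to features such as the question type, a tool first has to recognise that feature. The Python sketch below is a hypothetical keyword heuristic built only from the three example questions on this slide (`Which` for selection, `Can` for Boolean, `How many` for counting); a real system would rely on its controlled-language grammar rather than on these regular expressions.

```python
import re
from typing import Optional

# Hypothetical surface cues taken from the slide's three example questions.
QUESTION_TYPES = [
    ("Counting", re.compile(r"^how\s+many\b", re.IGNORECASE)),
    ("Selection", re.compile(r"^(which|what)\b", re.IGNORECASE)),
    ("Boolean", re.compile(r"^(can|does|do|is|are)\b", re.IGNORECASE)),
]


def question_type(cq: str) -> Optional[str]:
    """Return the question type of a CQ, or None if no cue matches."""
    text = cq.strip()
    for name, pattern in QUESTION_TYPES:
        if pattern.match(text):
            return name
    return None


for cq in ["Which pizza contains pork?", "Can pizza contain pork?",
           "How many pizzas contain pork?"]:
    print(cq, "->", question_type(cq))
```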
  
Result 4: Formal Authoring Tests
•  All tests can be automated
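As a rough illustration of automating these tests, the Python sketch below runs occurrence and satisfiability tests against a deliberately tiny, hypothetical ontology model. Satisfiability is approximated only by looking for a directly asserted axiom that would contradict the test (`C SubClassOf: P some D` or `C SubClassOf: not (P some D)`); it ignores class hierarchies and all other inference, so in practice this check should be delegated to an OWL reasoner.

```python
from dataclasses import dataclass, field

Axiom = tuple[str, str, str]  # (C, P, D)


@dataclass
class ToyOntology:
    classes: set[str] = field(default_factory=set)
    properties: set[str] = field(default_factory=set)
    # (C, P, D) stands for the told axiom "C SubClassOf: P some D"
    existentials: set[Axiom] = field(default_factory=set)
    # (C, P, D) stands for the told axiom "C SubClassOf: not (P some D)"
    prohibitions: set[Axiom] = field(default_factory=set)


def run_authoring_tests(onto: ToyOntology, ce1: str, ope: str, ce2: str) -> dict[str, bool]:
    """Evaluate the five authoring tests of 'Which [CE1] [OPE] some [CE2]?'."""
    axiom = (ce1, ope, ce2)
    return {
        f"{ce1} occurs": ce1 in onto.classes,
        f"{ce2} occurs": ce2 in onto.classes,
        f"{ope} occurs": ope in onto.properties,
        # Approximation: only a directly told prohibition makes this unsatisfiable.
        f"{ce1} and ({ope} some {ce2}) is satisfiable": axiom not in onto.prohibitions,
        # Approximation: only a directly told existential makes this unsatisfiable.
        f"{ce1} and not ({ope} some {ce2}) is satisfiable": axiom not in onto.existentials,
    }


onto = ToyOntology(
    classes={"Pizza", "CheeseTopping"},
    properties={"hasTopping"},
    prohibitions={("Pizza", "hasTopping", "CheeseTopping")},
)
results = run_authoring_tests(onto, "Pizza", "hasTopping", "CheeseTopping")
for name, passed in results.items():
    print("PASS" if passed else "FAIL", name)
```

Here the toy ontology forbids cheese toppings on pizzas, so the first satisfiability test fails; this matches the "pizzas are not allowed to have cheese topping" scenario.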
  
[Figure: the WhatIf Gadget interface, with Class Hierarchy, Verbaliser, Competency Questions, User/System Dialogue History and User Input panes]
Input (Manchester Syntax)
1. User selects a speech act by clicking or selecting a shortcut.
2. We need to evaluate their usefulness.
3. Examples:
●  Class: Pizza SubClassOf: Food
●  Class: Fruit DisjointWith: Pizza
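To show what the system has to extract from such input, here is a minimal Python sketch that parses the two single-line examples above into (class, keyword, filler) tuples. It is a hypothetical regular-expression shortcut that only accepts named classes with `SubClassOf`, `DisjointWith` or `EquivalentTo`; full Manchester syntax (nested class expressions, multi-line frames) needs a proper parser, such as the Manchester syntax parser that ships with the OWL API.

```python
import re

# Hypothetical pattern: "Class: <Name> <Keyword>: <Name>" on a single line.
FRAME = re.compile(
    r"^Class:\s*(\w+)\s+(SubClassOf|DisjointWith|EquivalentTo):\s*(\w+)$"
)


def parse_frame(line: str) -> tuple[str, str, str]:
    """Split a one-line Manchester-syntax class frame into its three parts."""
    match = FRAME.match(line.strip())
    if match is None:
        raise ValueError(f"unsupported input: {line!r}")
    return match.group(1), match.group(2), match.group(3)


for line in ["Class: Pizza SubClassOf: Food", "Class: Fruit DisjointWith: Pizza"]:
    print(parse_frame(line))
```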
  	
  
Input (OWL Simplified English)
1. A set of restricted natural language patterns.
2. System recognises the speech act.
3. Capable of accepting Competency Questions.
4. Examples:
●  Which pizza has topping a tomato topping?
●  An apple is a fruit.
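Recognising the speech act can be pictured as matching each sentence against a small set of patterns. The Python sketch below is hypothetical and covers only the two example forms: a sentence that starts with a question word and ends in "?" is treated as a competency question, and "A/An X is a/an Y." is turned into a Manchester-syntax subclass axiom. The actual OWL Simplified English grammar is considerably richer.

```python
import re

# Hypothetical patterns covering only the two example sentence forms.
QUESTION = re.compile(r"^(which|what|how many|can|does|is|are)\b.*\?$", re.IGNORECASE)
SUBCLASS = re.compile(r"^an?\s+(\w+)\s+is\s+an?\s+(\w+)\.$", re.IGNORECASE)


def recognise(sentence: str) -> tuple[str, str]:
    """Return (speech act, payload) for one input sentence."""
    text = sentence.strip()
    if QUESTION.match(text):
        return "competency question", text
    match = SUBCLASS.match(text)
    if match:
        sub, sup = (word.capitalize() for word in match.groups())
        return "statement", f"Class: {sub} SubClassOf: {sup}"
    return "unrecognised", text


for sentence in ["Which pizza has topping a tomato topping?", "An apple is a fruit."]:
    print(recognise(sentence))
```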
  
Modelling User Goals (1)
1. Users can import or write their own CQs in OWL Simplified English.
2. Based on the inserted CQ, a list of Authoring Tests (ATs) will be generated.
3. A tree structure displays these CQs and ATs.
4. The system constantly monitors these CQs and ATs. Any change in the satisfiability of ATs:
a. will be reported by changing the icon of the AT in the tree (red/green represent fail/pass of each AT, respectively);
b. will be reported in the “history” pane.
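The monitoring behaviour in step 4 can be sketched as an observer that re-runs every authoring test after each ontology change, derives the red/green icon from the latest result, and appends a line to a history log whenever a result flips. This is a hypothetical Python model: the ontology is just a dictionary and the tests are plain predicates, standing in for reasoner-backed checks.

```python
from typing import Callable

Ontology = dict[str, set[str]]  # hypothetical stand-in, e.g. {"classes": {...}}
Check = Callable[[Ontology], bool]


class ATMonitor:
    """Re-runs authoring tests after every change and records status flips."""

    def __init__(self, tests: dict[str, Check]) -> None:
        self.tests = tests
        self.status: dict[str, bool] = {}
        self.history: list[str] = []

    def icon(self, name: str) -> str:
        # Green for pass; red for fail (or not yet run).
        return "green" if self.status.get(name) else "red"

    def on_change(self, ontology: Ontology) -> None:
        for name, check in self.tests.items():
            passed = check(ontology)
            previous = self.status.get(name)
            if previous is not None and previous != passed:
                before = "pass" if previous else "fail"
                after = "pass" if passed else "fail"
                self.history.append(f"{name}: {before} -> {after}")
            self.status[name] = passed


monitor = ATMonitor({
    "Pizza occurs": lambda o: "Pizza" in o["classes"],
    "CheeseTopping occurs": lambda o: "CheeseTopping" in o["classes"],
})
ontology: Ontology = {"classes": {"Pizza"}}
monitor.on_change(ontology)                  # first run: the CheeseTopping test fails
ontology["classes"].add("CheeseTopping")
monitor.on_change(ontology)                  # the failing test now passes
print(monitor.icon("CheeseTopping occurs"))  # green
print(monitor.history)                       # ['CheeseTopping occurs: fail -> pass']
```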
  	
  
Modelling User Goals (2)
CQ + AT hierarchical representation.
Icons represent the satisfiability state.
Written feedback presented to the user in the “history” pane.
Further Challenges
●  Maintaining a continuous and meaningful
interaction with the user
●  Generating a coherent and comprehensive
set of entailments in response to What-If
questions
❖ Content selection
❖ Grouping and aggregation
❖ Ordering
•  Data understanding
•  Data summarisation
•  Query generation
UNDERSTANDING KNOWLEDGE GRAPHS
Data Understanding: A Core Activity in Data Exploitation
•  Traditional focus in semantic web research: data understanding for machines and programs
•  More importantly: data understanding for humans
  •  Humans are the ultimate owners and consumers of data
  •  Systems such as knowledge graphs, Watson and Siri aim to help human users understand the contents, implications and applications of data
•  More than HCI: we want interesting and insightful data!
Semantic Datasets Are HARD to Understand
•  Non-expert users might not be familiar with RDF, OWL and SPARQL
  •  RDF(S) has 6 core documents
  •  OWL 2 has 6 core documents
  •  SPARQL 1.1 has 11 core documents
•  Users are unfamiliar with datasets
  •  that are too large to explore
  •  that are external to their organisation
  •  …
•  It is HARD for novice users to construct queries
Challenges of Data Understanding
•  Challenges
  •  Expressing needs (keywords/SPARQL)
  •  Describing datasets
  •  Retrieving only the relevant parts
•  9.96% SPARQL / 8.19% DUMP
Solution – Summary-based Profiling for LD
•  Key idea: building-block-based information space modelling
  •  Decomposing & Constructing
The Philosophy of Interpreting Information
•  Task: explain the data to human users
Entity Centric
Entity-centric View of RDF Data
Entity Description Block
Concrete to Abstract
Entity Description Pattern
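The two building blocks can be illustrated in a few lines of Python. In this hypothetical sketch, an entity description block is taken to be all triples with a given entity as subject, and its entity description pattern abstracts the block to the set of the entity's classes plus the set of other properties it uses; this is one plausible reading of the slides, not the authors' exact definition. Grouping entities that share a pattern is what makes the summary smaller than the data.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
Triple = tuple[str, str, str]
Pattern = tuple[frozenset[str], frozenset[str]]  # (classes, properties)


def description_blocks(triples: set[Triple]) -> dict[str, set[Triple]]:
    """Entity-centric view: group triples by their subject."""
    blocks: dict[str, set[Triple]] = defaultdict(set)
    for s, p, o in triples:
        blocks[s].add((s, p, o))
    return dict(blocks)


def description_pattern(block: set[Triple]) -> Pattern:
    """Abstract a block to its classes and its non-type properties."""
    classes = frozenset(o for _, p, o in block if p == RDF_TYPE)
    properties = frozenset(p for _, p, _ in block if p != RDF_TYPE)
    return classes, properties


# Hypothetical music-style data: two records by one artist.
triples: set[Triple] = {
    ("r1", RDF_TYPE, "Record"), ("r1", "title", '"First"'), ("r1", "maker", "a1"),
    ("r2", RDF_TYPE, "Record"), ("r2", "title", '"Second"'), ("r2", "maker", "a1"),
    ("a1", RDF_TYPE, "Artist"), ("a1", "name", '"Some Band"'),
}

patterns: dict[Pattern, list[str]] = defaultdict(list)
for entity, block in description_blocks(triples).items():
    patterns[description_pattern(block)].append(entity)

for (classes, properties), entities in patterns.items():
    print(sorted(classes), sorted(properties), sorted(entities))
```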
  
Data Summarisation – EDP Graph
•  Reveal schema-level information
  •  What concepts are there (nodes) and how are they related to each other (edges)?
•  Disclose individual-level distribution
  •  Statistics attached to nodes and edges
Jamendo dataset
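An EDP graph can be sketched by using each entity's description pattern as a node, counting how many entities share it, and adding an edge whenever a non-type property links an entity of one pattern to an entity of another, again with a count. The Python below is a hypothetical, self-contained sketch on toy data; its counts play the role of the node and edge statistics mentioned on the slide, and the pattern definition (classes plus properties) is an assumption.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"
Triple = tuple[str, str, str]


def edp_graph(triples: set[Triple]) -> tuple[Counter, Counter]:
    """Return (node counts, edge counts) of a simple EDP graph."""
    blocks: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for s, p, o in triples:
        blocks[s].add((p, o))

    def pattern(entity: str) -> tuple[frozenset, frozenset]:
        pairs = blocks[entity]
        classes = frozenset(o for p, o in pairs if p == RDF_TYPE)
        properties = frozenset(p for p, _ in pairs if p != RDF_TYPE)
        return classes, properties

    node_of = {entity: pattern(entity) for entity in blocks}
    nodes = Counter(node_of.values())  # number of entities per pattern
    edges: Counter = Counter()
    for s, p, o in triples:
        if p != RDF_TYPE and o in node_of:  # the object is itself a described entity
            edges[(node_of[s], p, node_of[o])] += 1
    return nodes, edges


def label(node: tuple[frozenset, frozenset]) -> str:
    """Readable node name such as 'Record{maker,title}'."""
    classes, properties = node
    return "/".join(sorted(classes)) + "{" + ",".join(sorted(properties)) + "}"


triples: set[Triple] = {
    ("r1", RDF_TYPE, "Record"), ("r1", "title", '"First"'), ("r1", "maker", "a1"),
    ("r2", RDF_TYPE, "Record"), ("r2", "title", '"Second"'), ("r2", "maker", "a1"),
    ("a1", RDF_TYPE, "Artist"), ("a1", "name", '"Some Band"'),
}
nodes, edges = edp_graph(triples)
for node, count in nodes.items():
    print(label(node), "instances:", count)
for (source, prop, target), count in edges.items():
    print(label(source), f"--{prop}-->", label(target), "links:", count)
```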
  
Understanding Data Redundancy
[Wu et al., 2014]
Related Paper at JIST2014
•  Graph Pattern based RDF Data Compression
   Jeff Z. Pan, Jose Manuel Gomez-Perez, Yuan Ren, Honghan Wu, Haofen Wang and Man Zhu
•  (Monday afternoon)
Understanding How Data Can Be Used
•  Given a knowledge graph, generate candidate insightful queries
  •  Manual generation / automatic generation
  •  Generation based on schema / actual data
  •  With / without user intervention
•  Our aim: automatic generation based on data, without user intervention
  •  Most friendly to new, novice users
  •  Complementary to inference (which is heavily based on schema)
Candidate Insightful Queries [Pan et al., 2013]
•  Graph patterns are summarisations that represent many subsets of the RDF graph
•  Pattern structure
  •  Structured knowledge that is difficult to express with a schema
  •  Such as star, chain, tree, loop
•  Correspondences between multiple graph patterns
  •  Strongly corresponding patterns (large overlap)
  •  Weakly corresponding patterns (little overlap)
  •  Exceptions
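One way to make "large" and "little" overlap concrete is to compare the sets of entities that two graph patterns match. The Python sketch below uses Jaccard similarity with a hypothetical 0.8 threshold for strong correspondence (the slides give neither a measure nor a threshold), and treats entities matched by only one of two strongly corresponding patterns as candidate exceptions.

```python
def correspondence(a: set[str], b: set[str], strong: float = 0.8) -> tuple[str, set[str]]:
    """Classify two patterns by the overlap of the entity sets they match.

    Returns the correspondence kind and, for strong correspondences, the
    entities matched by only one of the two patterns (candidate exceptions).
    """
    union = a | b
    if not union:
        return "none", set()
    jaccard = len(a & b) / len(union)
    if jaccard >= strong:
        return "strong", a ^ b
    if jaccard > 0:
        return "weak", set()
    return "none", set()


# Hypothetical entity sets matched by three patterns.
advised = {f"s{i}" for i in range(10)}   # s0..s9 have an advisor
enrolled = {f"s{i}" for i in range(9)}   # s0..s8 take a course
others = {"s0", "x1", "x2", "x3"}

print(correspondence(advised, enrolled))  # ('strong', {'s9'})
print(correspondence(advised, others))    # ('weak', set())
```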
  
	
  
Query Generation Framework
•  1. Data summarisation
  •  Significantly decreases the search space in rule mining
•  2. Data analytics
  •  First-order inductive learning
  •  Association rule mining
•  3. Query generation
  •  Exploiting the relations between queries and rules
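Steps 2 and 3 can be illustrated with association rule mining alone (first-order inductive learning is not shown). In this hypothetical Python sketch, each entity is summarised by the set of properties it uses, rules of the form "uses p ⇒ uses q" are kept when their support and confidence reach illustrative thresholds, and each rule with confidence below 1 is turned into a SPARQL 1.1 query that asks for its exceptions via `FILTER NOT EXISTS`. The `ex:` namespace, the thresholds and the data are assumptions.

```python
from itertools import permutations


def mine_rules(descriptions: dict[str, set[str]],
               min_support: float = 0.5,
               min_confidence: float = 0.7) -> list[tuple[str, str, float, float]]:
    """Mine rules 'entities using p also use q' as (p, q, support, confidence)."""
    n = len(descriptions)
    properties = set().union(*descriptions.values())
    rules = []
    for p, q in permutations(sorted(properties), 2):
        with_p = [e for e, props in descriptions.items() if p in props]
        if not with_p:
            continue
        with_both = [e for e in with_p if q in descriptions[e]]
        support = len(with_both) / n
        confidence = len(with_both) / len(with_p)
        if support >= min_support and confidence >= min_confidence:
            rules.append((p, q, support, confidence))
    return rules


def exception_query(p: str, q: str) -> str:
    """SPARQL query for entities that use p but not q, i.e. exceptions to p => q."""
    return (
        "PREFIX ex: <http://example.org/>\n"
        "SELECT ?x WHERE {\n"
        f"  ?x {p} ?v .\n"
        f"  FILTER NOT EXISTS {{ ?x {q} ?w }}\n"
        "}"
    )


descriptions = {
    "s1": {"ex:advisor", "ex:takesCourse"},
    "s2": {"ex:advisor", "ex:takesCourse"},
    "s3": {"ex:advisor", "ex:takesCourse"},
    "s4": {"ex:takesCourse"},
}
rules = mine_rules(descriptions)
for p, q, support, confidence in rules:
    print(f"{p} => {q} (support {support:.2f}, confidence {confidence:.2f})")
    if confidence < 1:
        print(exception_query(p, q))
```

On this toy data the rule "takes a course ⇒ has an advisor" holds for three of four students, and its exception query would return only `s4`; this is one way a mined rule can be turned into a candidate insightful query.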
  
Evaluation
Another Example
•  Given the university data set in LUBM, the following two queries have the same results (when no reasoning is applied)
Summary and Future Work
•  Take-home message
  •  Data summarisation and data analytics technologies not only help people find answers, but also help people ask questions!
•  Future work
  •  Integrate with application scenario background knowledge
  •  Integrate with reasoning
  •  Integrate with user preferences
OUTLOOK
Outlook of Knowledge Graphs: from an application’s point of view
Outlook
What knowledge graphs are good at:
Maintaining factual knowledge in a structured manner and answering queries about it
What knowledge graphs still need:
•  “How to…” knowledge in addition to “What is…” knowledge
•  Operations associated with the entities
JIST2014 Tutorial on Constructing and Understanding Knowledge Graphs
Thank you!
Questions?

 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge Graphs

  • 1. JIST2014 Tutorial on Linked Data and Knowledge Graphs: Constructing and Understanding Knowledge Graphs. Presenter: Jeff Z. Pan (University of Aberdeen). Contributors: Honghan Wu (University of Aberdeen), Yuan Ren (University of Aberdeen), Panos Alexopoulos (iSOCO).
  • 2. Agenda, Part I: Linked Data & Knowledge Graphs • 1:00pm–1:20pm Overview & Applications • 1:20pm–1:35pm Example Linked Data Knowledge Repositories • 1:35pm–1:45pm The Current Status of Linked Data: the Good, the Bad and the Ugly • 1:45pm–2:00pm Research Challenges
  • 3. Agenda, Part II: Methods & Techniques • 2:00pm–3:05pm Constructing Knowledge Graphs (coffee break 2:30pm–2:45pm) • 3:05pm–3:40pm Understanding Knowledge Graphs • 3:40pm–3:45pm Outlook
  • 4. Part I: Linked Data & Knowledge Graphs • Overview • Applications • Linked Data Knowledge Repositories • Knowledge Graph on Linked Data • Research Challenges
  • 5. Knowledge • What is knowledge? • Something that is known • Structured information • About certain aspects of the (real) world
  • 6. Semantic Networks: a semantic network is a graph structure for representing knowledge in patterns of interconnected nodes and arcs, • with nodes representing objects, concepts, or situations, and • arcs representing relationships.
  • 7. RDF: Standard for Directed Labelled Graph KBs for the Web • RDF is • a modern version of the semantic network, with formal syntax and semantics • a standard model for data interchange on the Web • RDF statements are subject-property-value triples: [my-chair colour tan .] [my-chair rdf:type chair .] [chair rdfs:subClassOf furniture .]
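The last two triples on this slide let a reasoner derive a fact that is never stated: my-chair is also of type furniture. A minimal Python sketch of that entailment, assuming a plain in-memory set of string triples rather than any particular RDF library, applies the two RDFS subclass rules (rdfs9 and rdfs11 of the RDF Semantics specification) until nothing new can be derived:

```python
Triple = tuple[str, str, str]


def rdfs_closure(triples: set[Triple]) -> set[Triple]:
    """Saturate a triple set with two RDFS entailment rules:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    inferred = set(triples)
    while True:
        sub = {(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"}
        new = {(a, "rdfs:subClassOf", c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, "rdf:type", b) for x, p, a in inferred if p == "rdf:type"
                for a2, b in sub if a == a2}
        if new <= inferred:  # fixpoint: nothing new was derived
            return inferred
        inferred |= new


graph = {
    ("my-chair", "colour", "tan"),
    ("my-chair", "rdf:type", "chair"),
    ("chair", "rdfs:subClassOf", "furniture"),
}
closed = rdfs_closure(graph)
print(("my-chair", "rdf:type", "furniture") in closed)  # True
```

Real RDFS has more rules (domain, range, subPropertyOf); only the two needed for the slide's example are shown.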
  • 8. Linked Data and Knowledge Graphs • Linked Data refers to (RDF) data published on the Web • whose meaning is explicitly defined with ontological (OWL) vocabulary • and which can be interlinked with external datasets • A knowledge graph is a set of interconnected typed entities and their attributes
  • 9. Knowledge Graph (KG) Services and Related Research Problems • KG construction: how to construct high-quality knowledge graphs? • Knowledge acquisition • Knowledge evaluation • KG understanding: how to make it easier to access and reuse knowledge? • for end users • for data engineers • KG reasoning: how to bridge the gap between the vocabulary used in the graphs and that used in queries • Scalability • Efficiency
  • 10. Applications of Knowledge Graphs: Summary of Entities, Faceted Fact, From Best to List, Entity Associations, Structured Queries, and Question Answering
  • 11. Entity Understanding: Things, Not Strings
  • 12. What is it? (Entity Understanding)
  • 13. Faceted Fact: Getting the Value of Some Attribute
  • 14. What is the time there? (Faceted Fact)
  • 15. From Best to List: Not Only the Best
  • 16. Give a List instead of the Best
  • 17. Entity Association: Show the Connections
  • 18. How are they connected? (Entity Association) Gong Cheng, Yanan Zhang, and Yuzhong Qu. Explass: Exploring Associations between Entities via Top-K Ontological Patterns and Facets. In Proc. of ISWC 2014, pp. 422–437. http://ws.nju.edu.cn/explass/
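Explass ranks associations by top-K ontological patterns; the sketch below does not reproduce that. It only shows the raw material such a system starts from, namely the relation paths linking two entities, found here by a bounded breadth-first search over a hypothetical toy graph (all identifiers are invented for illustration):

```python
from collections import deque


def find_paths(edges, start, goal, max_len=3):
    """Breadth-first enumeration of simple paths (at most max_len edges)
    between two entities. Edges may be walked in either direction; a
    leading '^' on a property marks an edge traversed backwards."""
    adjacency = {}
    for s, p, o in edges:
        adjacency.setdefault(s, []).append((p, o))
        adjacency.setdefault(o, []).append(("^" + p, s))
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) == max_len:
            continue
        visited = {start} | {step[2] for step in path}
        for prop, nxt in adjacency.get(node, []):
            if nxt not in visited:
                queue.append((nxt, path + [(node, prop, nxt)]))
    return paths


# Hypothetical toy graph; identifiers are illustrative only.
edges = [
    ("ex:Alice", "worksFor", "ex:Lab"),
    ("ex:Bob", "worksFor", "ex:Lab"),
    ("ex:Alice", "knows", "ex:Carol"),
    ("ex:Carol", "knows", "ex:Bob"),
]
for path in find_paths(edges, "ex:Alice", "ex:Bob"):
    print(path)
```

Because the search is breadth-first, shorter connections are reported before longer ones; grouping the paths by their sequence of properties would be a first step towards pattern-level summaries.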
  • 19. Structured Queries: Even When the Inputs Are Keywords
  • 20. From keywords to structured queries. Haofen Wang, Kang Zhang, Qiaoling Liu, Thanh Tran, and Yong Yu. Q2Semantic: A lightweight keyword interface to semantic search. In Proc. of ESWC 2008, pp. 584–598. Example: the keyword query “Capin SVG” means “find specifications about ‘SVG’ whose author’s name is ‘Capin’”.
  • 21. Question Answering: Compute Answers with the KG
  • 22. Question Answering. Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, and Philipp Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st International Conference on World Wide Web, pp. 639–648. ACM, 2012. Example question: “films starring Brad Pitt”
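Template-based question answering maps a question onto a query template whose slots are then filled with KG resources. The following is a much smaller sketch of that idea, not Unger et al.'s system: a single hand-written template, a hypothetical class lexicon and a toy graph (films and identifiers invented for illustration) show how the slots in “films starring Brad Pitt” become triple patterns:

```python
import re

# Hypothetical toy graph; identifiers and films are illustrative only.
GRAPH = {
    ("ex:FilmA", "rdf:type", "ex:Film"),
    ("ex:FilmA", "ex:starring", "ex:Person1"),
    ("ex:FilmB", "rdf:type", "ex:Film"),
    ("ex:FilmB", "ex:starring", "ex:Person2"),
    ("ex:Person1", "rdfs:label", "Brad Pitt"),
}

# One hand-written template: "<class noun> starring <name>".
TEMPLATE = re.compile(r"^(?P<noun>\w+) starring (?P<name>.+)$", re.IGNORECASE)
CLASS_LEXICON = {"films": "ex:Film", "movies": "ex:Film"}  # hypothetical lexicon


def answer(question):
    """Fill the template slots, then evaluate the resulting triple patterns:
    ?x rdf:type <class> . ?x ex:starring ?e . ?e rdfs:label <name> ."""
    m = TEMPLATE.match(question.strip())
    if m is None:
        return None  # question does not fit this template
    cls = CLASS_LEXICON.get(m["noun"].lower())
    people = {s for s, p, o in GRAPH if p == "rdfs:label" and o == m["name"]}
    return sorted(
        s for s, p, o in GRAPH
        if p == "rdf:type" and o == cls
        and any((s, "ex:starring", e) in GRAPH for e in people)
    )


print(answer("films starring Brad Pitt"))  # ['ex:FilmA']
```

A question that matches no template returns None rather than an empty answer, which keeps “not understood” distinct from “no results”.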
  • 23. Sample Linked Data Knowledge Repositories: DBpedia, Wikidata, GoodRelations
  • 24. DBpedia • A crowd-sourced community effort to extract structured information from Wikipedia • allows users to ask structured queries against Wikipedia • and to link different data sets on the Web to Wikipedia data
  • 25. DBpedia – the content. Entities and their attributes from Wikipedia infobox templates, categorisation information, images, geo-coordinates, etc. Classification schemas: • Wikipedia categories are represented using the SKOS vocabulary and DCMI terms • The YAGO classification is derived from the Wikipedia category system using WordNet • WordNet synset links were generated by manually relating Wikipedia infobox templates and WordNet synsets. The DBpedia 2014 release consists of 3 billion RDF triples.
  • 26. DBpedia – services • SPARQL endpoint: http://dbpedia.org/sparql • Query builders (e.g. the Leipzig query builder at http://querybuilder.dbpedia.org) • Public faceted web service interface • Dump downloads: • DBpedia dumps in 125 languages on the DBpedia download server • DBpedia Ontology
  • 27. DBpedia – use cases • Nucleus for the Web of Data • Revolutionise access to Wikipedia information, e.g. “Give me all cities in New Jersey with more than 10,000 inhabitants”
  • 28. Wikidata • A collaboratively edited knowledge base operated by the Wikimedia Foundation • Can be read and edited by both humans and machines • Acts as central storage for the structured data of its Wikimedia sister projects, including Wikipedia, Wikivoyage, Wikisource, and others
  • 29. Wikidata – the content. Wikidata is document-oriented, organised around topics. • Information is added to items by creating statements (key-value pairs)
  • 30. Wikidata – to the Linked Data Web (1). Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić. Introducing Wikidata to the Linked Data Web. In Proc. of ISWC 2014, pp. 50–65. Exporting statements as triples: • Faithful representations: with additional qualifiers and references • Simplified representations: without additional qualifiers and references
  • 31. Wikidata – to the Linked Data Web (2). Erxleben et al., ISWC 2014, pp. 50–65. Extracting schema information from Wikidata: • instance of (P31) → rdf:type and subclass of (P279) → rdfs:subClassOf • constraints on the use of properties → OWL axioms
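The two export styles differ in whether a statement's qualifiers and references survive. A minimal sketch, assuming a hypothetical Statement record and made-up identifiers (the ex: and stmt: prefixes and the "-value" suffix are not the vocabulary of the actual Wikidata exports, and prov:wasDerivedFrom is borrowed from W3C PROV-O as an assumption), contrasts the two and applies the P31/P279 schema mapping in the simplified form:

```python
from dataclasses import dataclass, field

# Wikidata properties the slide maps onto RDF(S) vocabulary.
SCHEMA_MAP = {"P31": "rdf:type", "P279": "rdfs:subClassOf"}


@dataclass
class Statement:
    item: str
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)
    references: list = field(default_factory=list)
    stmt_id: str = "s1"


def simplified(st):
    """One plain triple; qualifiers and references are dropped."""
    return [(st.item, SCHEMA_MAP.get(st.prop, st.prop), st.value)]


def faithful(st):
    """Route the statement through its own node so that qualifiers and
    references can be attached to it (naming scheme is hypothetical)."""
    node = f"stmt:{st.stmt_id}"
    triples = [(st.item, st.prop, node), (node, st.prop + "-value", st.value)]
    triples += [(node, q, v) for q, v in st.qualifiers.items()]
    triples += [(node, "prov:wasDerivedFrom", r) for r in st.references]
    return triples


st = Statement("ex:item1", "P31", "ex:classA",
               qualifiers={"ex:startTime": "2001"}, references=["ex:ref1"])
print(simplified(st))     # [('ex:item1', 'rdf:type', 'ex:classA')]
print(len(faithful(st)))  # 4
```

The trade-off is visible in the output: the simplified triple is directly usable by RDFS reasoners, while the faithful form keeps provenance at the cost of an extra node per statement.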
  • 32. Wikidata – use cases & data access. Use cases: • Information about sources helps support the notion of verifiability • Collecting structured data allows easy reuse of that data • Support for Wikimedia projects: reducing the workload in Wikipedia and increasing its quality • Support well beyond that: everyone can use Wikidata. Accessing the data: • MediaWiki Lua Scribunto interface • Wikibase API • RDF dumps: http://tools.wmflabs.org/wikidata-exports/rdf/exports/20141013/
  • 33. GoodRelations: GoodRelations is a lightweight ontology for annotating offerings and other aspects of e-commerce on the Web. [Slide credit: Martin Hepp]
  • 34. GoodRelations – use cases [Slide credit: Martin Hepp]
  • 35. GoodRelations – use cases (2) • Google, Bing, Yahoo, and Yandex will improve the rendering of your page directly in the search results • Rich snippets: search engines use your markup to augment the preview of your site • Targeted searching: the profile and preferences of the person behind the query
  • 36. GoodRelations – who is using it • Search engines • 10,000+ small and large shops • Publishers • Software, e.g. OpenLink (Virtuoso)
  • 37. Current Status of Online Linked Data: the Good, the Bad and the Ugly
  • 38. The Good: a flexible linked data ecosystem • RDF/OWL: flexible schema setting (schemaless → simple schema → rich schema) • Data linkage: universal unique IDs for data entities (URIs); shared vocabularies • Ontology mapping: schema mapping; instance mapping • Querying and reasoning techniques: SPARQL entailment regimes; distributed SPARQL endpoints
  • 39. The Good • Flexible linked data ecosystem • Facilities for sharing and linking knowledge in an open environment • Knowledge representation: various levels of expressive power • Services, tools, and approaches for knowledge generation, understanding, and consumption • Interlinked knowledge repositories across various domains
  • 40. The Bad • Knowledge quality (errors, provenance, quantifiers, freshness…) • Data protection (license, access control) • Data business model
  • 41. The Ugly • Excels at knowledge representation • but a large number of datasets are missing schema information • RDF is a triple-based model • but it is hard and time-consuming (even for Semantic Web geeks) to understand an RDF knowledge repository
  • 42. Research Challenges
  • 43. Research Challenges • KG construction • Ontology/schema construction • Data lifting • Quality evaluation • Understanding KGs • User understanding • Data understanding • Dynamic knowledge in KGs • Stream data/prediction • Belief revision • Intelligent services for KGs • Ontology reasoning (see my tutorial at ISWC2014) • Problem solving/workflow
  • 44. Challenges in Automatic Construction • Incompleteness of data: is the constructed schema generic enough to accommodate new data? • Inconsistency of data: what if data items conflict with each other? E.g. birthdates of people: some people may not have a birthdate asserted in the dataset; should the schema specify that every person has a birthdate? Some people may have different birthdates asserted in different datasets; should the schema specify that the birthdate is unique?
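Both schema questions can be informed by profiling the data first. A minimal sketch, assuming triples from each source are available as Python lists and using a hypothetical ex:birthDate property with invented records, reports who lacks a value (evidence against making the property mandatory) and who has conflicting values (evidence against declaring it unique):

```python
from collections import defaultdict


def profile_property(datasets, people, prop):
    """Before deciding whether the schema should make `prop` mandatory
    (every person has one) or unique (at most one value), check the data:
    who lacks a value, and who has conflicting values across datasets."""
    values = defaultdict(set)
    for triples in datasets:
        for subject, predicate, obj in triples:
            if predicate == prop:
                values[subject].add(obj)
    missing = sorted(person for person in people if not values.get(person))
    conflicts = {s: sorted(v) for s, v in values.items() if len(v) > 1}
    return missing, conflicts


# Hypothetical records from two sources.
dataset_a = [("ex:alice", "ex:birthDate", "1970-01-01"),
             ("ex:bob", "ex:birthDate", "1980-05-05")]
dataset_b = [("ex:alice", "ex:birthDate", "1971-01-01")]
missing, conflicts = profile_property([dataset_a, dataset_b],
                                      ["ex:alice", "ex:bob", "ex:carol"],
                                      "ex:birthDate")
print(missing)    # ['ex:carol']
print(conflicts)  # {'ex:alice': ['1970-01-01', '1971-01-01']}
```

Under an open-world reading a missing value is not a contradiction, so the first list only argues for caution; the second list shows exactly which facts a uniqueness axiom would turn into inconsistencies.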
  • 45. Challenges in Manual Construction • Expertise of ontology engineers: do the engineers have sufficient understanding of and experience with ontology technologies (RDF(S), OWL, SPARQL, RIF, etc.)? • Workload of ontology engineers: how much time does it take to manually construct a large ontology? E.g. SNOMED CT has about 400,000 concepts • Collaboration: when multiple ontology engineers work together, how can we make sure they have a consistent understanding of the ontology?
  • 46. Challenges for both Automatic and Manual Construction • Requirements and evaluation: how do we specify the requirements of ontology construction and test whether they have been fulfilled? • Expressiveness vs. efficiency: which knowledge representation should we use? Is it sufficient to describe the domain? Are efficient reasoning and query answering mechanisms and systems available? • Ontology reuse: do we have to construct everything from scratch? Is there an available ontology that partially covers the domain?
  • 47. Challenge in Data Lifting. Data lifting enriches unstructured data with structural annotations, thereby extracting the entities and their relations and properties for the knowledge graph. Key challenges: • Entity identification: certain entities can be hard to identify, e.g. movie titles • AVP (attribute-value pair) identification: an entity, an attribute and its value may be scattered across the text or dataset, making it hard to establish the relation
  • 48. Challenge in Entity Identification • There are different ways to identify the same entity: e.g. “The President of the U.S.” and “Barack Obama” • The same name can refer to different entities • People may use acronyms or abbreviations for entities: e.g. “K-Drive” is the acronym for the “Knowledge-driven Data Exploitation” project, not the drive labelled K on my computer • Natural language text may contain typos, and values may use different notations
  • 49. Challenge in Data Understanding • Users are unfamiliar with the content of knowledge graphs: • What is the vocabulary? • What is described by the knowledge graph? • How is the content organised? • How is it connected to the other datasets I have? • Users do not know how to exploit the knowledge graph: • Which queries can I ask this knowledge graph? • Which queries can be answered with this knowledge graph?
  • 50. Challenge in Knowledge Dynamics • Validity of knowledge: is a piece of information permanent or temporary? • Representation: e.g. how to represent the temporal dependency of knowledge, as in “George W. Bush was the president of the U.S. until Barack Obama became the president.” • Updating the knowledge graph: when and how do we retract a previously undetected mistake from the knowledge graph? Which knowledge should become obsolete after the current update? • Querying: how to query w.r.t. the temporal properties of knowledge, e.g. “Who was the last president of the U.S.?” • Predicting the dynamics: which changes are likely to occur given the history of the knowledge graph?
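One common way to make the Bush/Obama example queryable, assumed here rather than taken from the tutorial, is to attach a validity interval to each fact. The sketch stores the data as it stood at the time of the tutorial (2014); an update that ends a fact then closes its interval instead of deleting it, so historical questions remain answerable:

```python
from datetime import date

# Facts with validity intervals [start, end); end=None means "still valid".
# Data as of the tutorial (2014).
PRESIDENCY = [
    ("George W. Bush", date(2001, 1, 20), date(2009, 1, 20)),
    ("Barack Obama", date(2009, 1, 20), None),
]


def holder_at(facts, when):
    """Who held the role on a given date?"""
    for who, start, end in facts:
        if start <= when and (end is None or when < end):
            return who
    return None


def previous_holder(facts, when):
    """'Who was the last president?': latest holder whose term had ended by `when`."""
    ended = [(end, who) for who, _start, end in facts if end is not None and end <= when]
    return max(ended)[1] if ended else None


today = date(2014, 11, 1)
print(holder_at(PRESIDENCY, today))        # Barack Obama
print(previous_holder(PRESIDENCY, today))  # George W. Bush
```

Half-open intervals avoid the handover day belonging to both terms.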
  • 51. Challenge in Intelligent Services. The large amount of information in a knowledge graph, and its interconnections, can be used to provide intelligent services; e.g. reasoning can be used to discover hidden relations in a knowledge graph. Key challenges: • Efficiency of the services: knowledge graphs are usually accessed by multiple users in real time, so efficiency is crucial to the quality of service • Scalability of the services: knowledge graphs are usually large, and even basic reasoning services, e.g. transitive closure, can consume a large amount of time and resources
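To see why scalability bites even for transitive closure, the sketch below computes it with semi-naive evaluation, a standard Datalog technique rather than anything specific to this tutorial. The output size alone is quadratic: a simple chain of n nodes entails n(n-1)/2 pairs.

```python
def transitive_closure(pairs):
    """Semi-naive evaluation: each round extends only the pairs derived in
    the previous round by one base edge, instead of re-joining everything."""
    successors = {}
    for a, b in pairs:
        successors.setdefault(a, set()).add(b)
    closure = set(pairs)
    delta = set(pairs)
    while delta:
        new = {(a, c) for a, b in delta for c in successors.get(b, ())} - closure
        closure |= new
        delta = new
    return closure


# A chain of n nodes already yields n * (n - 1) / 2 pairs.
chain = [(i, i + 1) for i in range(100)]  # 101 nodes
print(len(transitive_closure(chain)))  # 5050
```

Semi-naive evaluation avoids recomputing old derivations, but it cannot avoid materialising the quadratic result, which is why large KGs often answer transitive queries on demand instead.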
  • 52. Agenda, Part II: Methods & Techniques • 2:00pm–3:05pm Constructing Knowledge Graphs (coffee break 2:30pm–2:45pm) • 3:05pm–3:40pm Understanding Knowledge Graphs • 3:40pm–3:45pm Outlook
  • 53. Constructing Knowledge Graphs • Test-Driven Ontology Construction • Methodology • A Protégé plug-in • Handling Entity Disambiguation • Approach • Some evaluation results • Bridging Requirements and Authoring Tests • Competency Questions as Informal Requirement Specification • Some evaluation results
  • 54. Uschold & King’s (1995) Methodology on Ontology Construction • Key steps: capturing, coding, integrating and evaluating/testing • Ontology evaluation/testing: • to make a technical judgment of the ontologies • w.r.t. a frame of reference • A frame of reference can be: • requirement specifications • competency questions • or the real world
  • 55. Ontology and Tests • Uschold & King’s methodology • Test the ontology after axioms are written • Test-driven ontology authoring • Write authoring tests before writing axioms • Writing authoring tests before axioms takes no more effort than writing them after axioms • Forces authors to think about requirements before writing axioms • Writing authoring tests first helps authors detect and remove errors sooner • Helps understand how good an existing/reused ontology is
  • 56. Gruninger & Fox’s (1995) Methodology. Key steps: 1. Motivating scenarios 2. Informal competency questions 3. FOL terminology (classes, properties, objects) 4. Formal competency questions (2 → 4?) 5. FOL axioms 6. Completeness theorem (defining the conditions under which the solutions to the questions are complete)
  • 57. The METHONTOLOGY (2003) Methodology • Key steps: 1. specification of requirements 2. terminology with tabular and/or graph notations 3. formalisation with a logic-based ontology language 4. maintenance (including evaluation/testing) • Ontology evaluation/testing: • checking consistency, completeness, redundancy
  • 58. The DKAP (2007) Methodology • Key steps: 1. determine the domain and scope 2. check availability of existing ontologies 3. collect and analyse data for knowledge extraction 4. develop an initial ontology 5. refine and validate the ontology • Ontology validation/testing: • consistency and accuracy checking
  • 59. Limitations of Existing Methodologies • Methodology level: • lack of details about the transitions • from requirements to tests • from requirements to terminology • from terminology to axioms • Tool level: • lack of tools to guide the above transitions
  • 60. An Approach to Test-Driven Ontology Authoring (presented in an invited talk at BMIR, Stanford University, June 2013) • An ontology contains not only OWL files but also a test suite • A test suite contains a set of tests as SPARQL 1.1 queries • not all requirements can be represented in SPARQL 1.1, though • Ontology reuse • check the associated test suite before reusing an ontology, to better understand the original intention • Collaborative ontology authoring • all authors agree upon a common test suite • each author can also have an extra test suite locally
  • 61. Authoring Tests: a test suite holds tests (Test 1, Test 2, …), each pairing a SPARQL 1.1 query with its expected results; a reasoner evaluates each query over the ontology, and the actual results are compared with the expected ones to decide pass/fail
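The test loop in this diagram can be sketched in a few lines. This is an illustrative stand-in, not the Protégé plug-in: single triple patterns replace SPARQL 1.1 queries, a toy subClassOf closure replaces a reasoner such as TrOWL, and the pizza-style class names and test names are hypothetical:

```python
def infer(graph):
    """Toy stand-in for a reasoner: transitive closure of rdfs:subClassOf."""
    g = set(graph)
    while True:
        sub = [(s, o) for s, p, o in g if p == "rdfs:subClassOf"]
        new = {(a, "rdfs:subClassOf", d) for a, b in sub for c, d in sub if b == c} - g
        if not new:
            return g
        g |= new


def select(graph, pattern, var):
    """Evaluate one triple pattern; terms starting with '?' are variables."""
    results = set()
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break  # constant does not match
        else:  # no break: every position matched
            results.add(binding[var])
    return results


def run_suite(ontology, suite):
    """Run each test against the reasoner's output; compare actual with expected."""
    graph = infer(ontology)
    report = {}
    for name, pattern, var, expected in suite:
        actual = select(graph, pattern, var)
        report[name] = ("pass" if actual == expected else "fail", actual)
    return report


ontology = {
    ("CheeseTopping", "rdfs:subClassOf", "PizzaTopping"),
    ("MozzarellaTopping", "rdfs:subClassOf", "CheeseTopping"),
    ("VegetableTopping", "rdfs:subClassOf", "PizzaTopping"),
}
suite = [
    ("all toppings", ("?x", "rdfs:subClassOf", "PizzaTopping"), "?x",
     {"CheeseTopping", "MozzarellaTopping", "VegetableTopping"}),
    ("vegetable toppings", ("?x", "rdfs:subClassOf", "VegetableTopping"), "?x",
     {"TomatoTopping"}),
]
for name, (status, actual) in run_suite(ontology, suite).items():
    print(name, status, sorted(actual))
```

The first test passes only because of inference (MozzarellaTopping is not asserted directly under PizzaTopping); the second fails because the expected class has not been authored yet, which is exactly the signal test-first authoring relies on.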
  • 62. A Protégé Plug-in for Authoring Tests (based on the TrOWL reasoner)
  • 63. Loading the Manifest File • A manifest file specifies queries and expected results • Running the reasoner to get the results for each test • Clicking on a test shows the expected and actual results
  • 64. Compute Justifications for Errors Related to Failed Tests • with the justification plug-in (and reasoners such as TrOWL)
  • 65. Modify the Ontology • so that CheeseTopping is no longer disjoint with VegetableTopping
  • 66. Key Issue (to be revisited after the Entity Disambiguation part) • Understanding the intention of ontology authors • How to generate authoring tests? • How to judge the quality of the authoring tests?
  • 67. Entity Recognition and Disambiguation • Challenge revisited: • There are different ways to identify the same entity: e.g. “The President of the U.S.” and “Barack Obama” • The same name can refer to different entities • The contextual hypothesis used in many existing approaches: • terms with similar meanings are often used in similar contexts • the role of these contexts is typically played by already annotated documents (e.g. Wikipedia articles), which are used to train term classifiers
  • 68. Alternative Context: Evidence Model • Idea: use semantic entities that may serve as disambiguation evidence for the scenario’s target entities
  • 69. Evidence Model Construction (Manual) • Identify the target concepts whose instances we wish to disambiguate (e.g. locations) • Determine the related concepts whose instances may serve as contextual disambiguation evidence • For example, in texts that describe historical events, concepts whose instances may act as location evidence include related locations, historical events, and historical groups and persons • Identify, for each pair of evidence and target concept, the relation paths that link them
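Once such a model exists, applying it can be as simple as counting overlaps. The sketch below assumes that scoring rule (the slides do not specify the scoring function), and its candidate and evidence identifiers are hypothetical, loosely modelled on the military-conflict example used in the evaluation:

```python
# Hypothetical evidence model: for each candidate referent of an ambiguous
# location name, the evidence entities linked to it via the chosen
# evidence-target relation paths (entries are illustrative only).
EVIDENCE = {
    "ex:Augusta_Georgia": {"ex:Fort_Cornwallis", "ex:American_Revolution", "ex:Henry_Lee"},
    "ex:Augusta_Maine": {"ex:Kennebec_River"},
}


def disambiguate(candidates, evidence_model, text_entities):
    """Score each candidate by how many entities recognised in the text are
    among its evidence entities; return the best candidate and all scores."""
    scores = {c: len(evidence_model.get(c, set()) & text_entities) for c in candidates}
    best = max(scores, key=scores.get)  # ties go to the first-listed candidate
    return best, scores


recognised = {"ex:Fort_Cornwallis", "ex:American_Revolution", "ex:Henry_Lee"}
best, scores = disambiguate(["ex:Augusta_Georgia", "ex:Augusta_Maine"], EVIDENCE, recognised)
print(best, scores)
```

Unlike a classifier trained on annotated documents, this needs no training corpus, only the KG relations chosen when the model is built; weighting evidence by path length or concept type would be a natural refinement.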
  • 70. Evidence-Target Paths
  • 71. Term Extraction (Automatic). Extraction is performed with Knowledge Tagger (from iSOCO), based on GATE.
  • 72. Evaluation Results: Football Match Scenario • 50 texts describing football matches • E.g. “It's the 70th minute of the game and after a magnificent pass by Pedro, Messi managed to beat Claudio Bravo. Barcelona now leads 1-0 against Real.”
  • 73. Evaluation Results: Military Conflict Scenario • 50 historical texts describing military conflicts • E.g. “The Siege of Augusta was a significant battle of the American Revolution. Fought for control of Fort Cornwallis, a British fort near Augusta, the battle was a major victory for the Patriot forces of Lighthorse Harry Lee and a stunning reverse to the British and Loyalist forces in the South.”
  • 74. Future Work • Fully automated construction of the disambiguation evidence model • The challenge here is how to automatically identify the text’s domain/topic • Combination with statistical methods for cases where the available domain semantic information is incomplete • The challenge here is how to select the optimal ratio of ontological vs. statistical evidence • Development of a tool that enables users to dynamically build such models out of existing semantic data and use them for disambiguation
  • 75. Issues in Test-Driven Ontology Authoring 1. How to generate tests 2. How to judge the quality of tests • why they are relevant • how to provide the correct expected answers
  • 76. Requirement Driven? • How about starting from requirements instead of tests? (Diagram: Requirements, Ontology Authoring Tests, Test Results, Ontology Authoring)
  • 77. Requirement-Driven Ontology Authoring [Ren et al., 2014] • Key questions • RQ1: what forms of requirements should we consider? • RQ2: how do we generate authoring tests from requirements?
  • 78. Competency Questions • Questions that people expect the constructed ontologies to answer • Useful for novice users • in natural language • about domain knowledge • require little understanding of ontology technologies • A typical CQ: Which pizza has some cheese topping?
  • 79. RQ1: what forms of requirements should we consider? Competency Questions (CQs) can be regarded as a functional requirement of the ontology. RQ1′: How are CQs formulated?
  • 80. Key Idea 1: Identification of CQ Patterns • A typical CQ: Which pizza has some cheese topping? • Hypothesis: CQs usually have clear syntactic patterns • Features and elements can be extracted: feature: type of question (“Which”); element: class expression CE1 (“pizza”); feature: binary predicate; element: object property expression OPE (“has”); element: class expression CE2 (“cheese topping”) • Resulting pattern: CE1 OPE CE2
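The pattern hypothesis can be illustrated with a single regular expression. This hypothetical sketch covers only one surface form, “Which CE1 OPE some CE2?”, whereas the tutorial's framework classifies CQs by several features; the regex and the CamelCase naming rule below are assumptions for illustration:

```python
import re

# One surface pattern for selection CQs: "Which <CE1> <OPE> some <CE2>?"
WHICH_SOME = re.compile(
    r"^which (?P<ce1>\w+(?: \w+)*?) (?P<ope>\w+) some (?P<ce2>\w+(?: \w+)*)\?$",
    re.IGNORECASE,
)


def to_class_name(phrase):
    """'cheese topping' -> 'CheeseTopping' (a naming convention assumed here)."""
    return "".join(word.capitalize() for word in phrase.split())


def parse_cq(question):
    """Extract the question type and the CE1/OPE/CE2 elements, or None."""
    m = WHICH_SOME.match(question.strip())
    if m is None:
        return None
    return {
        "question_type": "selection",
        "CE1": to_class_name(m["ce1"]),
        "OPE": m["ope"],
        "CE2": to_class_name(m["ce2"]),
    }


print(parse_cq("Which pizza has some cheese topping?"))
```

The lazy CE1 group lets multi-word subjects such as “vegetarian pizza” work: the regex extends CE1 one word at a time until “OPE some” can follow.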
  • 81. Result 1: A Feature-based Framework for CQ Formulation • Based on CQs collected from the Software Ontology Project (75 CQs) and Manchester OWL Workshops (70 CQs) • Primary features → CQ archetypes • Secondary features → CQ subtypes • Features and their values: Question Type (Selection, Boolean, Counting); Element Visibility (Explicit, Implicit); Predicate Arity (Unary, Binary, N-ary); Relation Type (Object, Datatype); Modifier (Quantity, Numeric); Domain-Independent Element (Spatial, Temporal); Question Polarity (Positive, Negative)
  • 82. Result 2: Archetypes of CQ Patterns
  • 83. Answerability of CQs • Existing work focused on answering CQs directly • But is the answer meaningful? • The ability to answer CQs meaningfully can be regarded as a functional requirement of the ontology • What if the answer to the typical CQ “Which pizza has some cheese topping?” is an empty set? • Possible scenarios: • Pizza does not exist • Cheese topping does not exist • Pizzas are not allowed to have cheese topping • The ontology has not been populated with any cheesy pizza yet • …
  • 84. RQ2: how do we generate authoring tests from requirements? RQ2′: How can we automatically test whether a CQ can be meaningfully answered?
  • 85. Key Idea 2: Presuppositions of CQs • A CQ comes with certain presuppositions • some conditions the speakers assume to be met • A CQ can be meaningfully answered only when its presuppositions are satisfied • For the typical CQ “Which pizza has some cheese topping?”: • Classes Pizza and CheeseTopping should occur in the ontology • Property has(Topping) should occur in the ontology • The ontology should allow Pizza to have CheeseTopping • The ontology should also allow Pizza to not have CheeseTopping
  • 86. CQs and Authoring Tests • A typical CQ: Which pizza has some cheese topping? (CE1 OPE CE2) • Satisfiability of CQ presuppositions can be verified by authoring tests generated from its features and elements • Classes Pizza and CheeseTopping should occur in the ontology → [CE1] and [CE2] should both occur in the class vocabulary • Property has(Topping) should occur in the ontology → [OPE] should occur in the property vocabulary • The ontology should allow Pizza to have CheeseTopping → [CE1] ⊓ ∃[OPE].[CE2] should be satisfiable • The ontology should also allow Pizza to not have CheeseTopping → [CE1] ⊓ ¬∃[OPE].[CE2] should be satisfiable
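Generating these tests is mechanical once the elements are known. The sketch below is a hypothetical helper, not the tutorial's tool: it evaluates the occurrence tests against a given vocabulary and emits the two presuppositions “Pizza may have CheeseTopping” and “Pizza may lack CheeseTopping” as Manchester-syntax class expressions to hand to an OWL reasoner (not included). The property name hasTopping is assumed, following the slide's has(Topping):

```python
def authoring_tests(ce1, ope, ce2, classes, properties):
    """Turn the presuppositions of 'Which CE1 OPE some CE2?' into checks.
    Occurrence tests are evaluated directly against the vocabulary; the two
    satisfiability tests are returned as Manchester-syntax class expressions
    for an OWL reasoner, which this sketch does not include."""
    occurrence = {
        f"class {ce1} occurs": ce1 in classes,
        f"class {ce2} occurs": ce2 in classes,
        f"property {ope} occurs": ope in properties,
    }
    satisfiability = [
        f"{ce1} and ({ope} some {ce2})",
        f"{ce1} and not ({ope} some {ce2})",
    ]
    return occurrence, satisfiability


occ, sat = authoring_tests("Pizza", "hasTopping", "CheeseTopping",
                           classes={"Pizza", "CheeseTopping"},
                           properties={"hasTopping"})
print(all(occ.values()))  # True
print(sat)
```

If either expression is unsatisfiable, the CQ would get a trivially empty (or trivially full) answer, which is the “not meaningfully answerable” case from the answerability slide.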
  • 87. Result 3: Associate Presuppositions with Features • Features in a CQ are associated with the presuppositions of the CQ • An example for the question-type feature: question types Selection (“Which pizza contains pork?”), Boolean (“Can pizza contain pork?”) and Counting (“How many pizzas contain pork?”); associated presuppositions: occurrence of “Pizza”, “Pork” and “contains”; “Some pizza can contain pork”; “Some pizza can contain no pork”
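Detecting the question-type feature can start from the opening words of a CQ. The cue lists below are assumptions for illustration, covering the three example questions on this slide; a real CQ analyser would need to handle far more phrasings:

```python
import re

# Hypothetical cue words for the three question types named on the slide.
# Order matters: "how many" is checked before the other cues.
QUESTION_TYPES = [
    ("counting", re.compile(r"^how many\b", re.IGNORECASE)),
    ("selection", re.compile(r"^(which|what|who)\b", re.IGNORECASE)),
    ("boolean", re.compile(r"^(can|does|do|is|are)\b", re.IGNORECASE)),
]


def question_type(cq):
    """Return 'counting', 'selection', 'boolean', or 'unknown'."""
    for label, cue in QUESTION_TYPES:
        if cue.search(cq.strip()):
            return label
    return "unknown"


for cq in ["Which pizza contains pork?", "Can pizza contain pork?",
           "How many pizzas contain pork?"]:
    print(cq, "->", question_type(cq))
```

Once the type is known, the presuppositions associated with that type can be looked up and turned into authoring tests.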
  • 88. Result 4: Formal Authoring Tests • All tests can be automated.
  • 89. [Screenshot: the WhatIf Gadget, with panels for Class Hierarchy, Verbaliser, Competency Questions, User/System Dialogue History and User Input]
  • 90. Input (Manchester Syntax) 1. The user selects a speech act by clicking or selecting a shortcut. 2. We need to evaluate their usefulness. 3. Examples: ● Class: Pizza SubClassOf: Food ● Class: Fruit DisjointWith: Pizza
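The two example frames can be illustrated with a deliberately tiny parser that accepts only the one-line forms shown and turns each into a (subject, axiom type, object) triple. This is a sketch under that restriction: full Manchester Syntax allows nested class expressions and many more frame keywords, and a real tool would use a complete parser such as the one shipped with the OWL API.

```python
import re

# Matches only the simple one-line frames shown on the slide:
#   Class: <Name> SubClassOf: <Name>
#   Class: <Name> DisjointWith: <Name>
FRAME = re.compile(r"^Class:\s*(\w+)\s+(SubClassOf|DisjointWith):\s*(\w+)\s*$")


def parse_frame(line: str) -> tuple[str, str, str]:
    """Turn a one-line frame into a (subject, axiom type, object) triple."""
    m = FRAME.match(line.strip())
    if m is None:
        raise ValueError(f"unsupported input: {line!r}")
    return m.group(1), m.group(2), m.group(3)
```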
  • 91. Input (OWL Simplified English) 1. A set of restricted natural-language patterns. 2. The system recognises the speech act. 3. Capable of accepting Competency Questions. 4. Examples: ● Which pizza has topping a tomato topping? ● An apple is a fruit.
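A hypothetical sketch of speech-act recognition over restricted patterns. The two regular expressions below are assumptions modelled loosely on the slide's examples, not the actual OWL Simplified English grammar: a sentence of the form "An X is a Y." is read as a subclass assertion, and a question is passed on as a candidate Competency Question.

```python
import re

# Hypothetical restricted patterns (NOT the real OWL Simplified English grammar).
SUBCLASS = re.compile(r"^(?:an?|every)\s+(\w+)\s+is\s+an?\s+(\w+)\.$", re.IGNORECASE)
QUESTION = re.compile(r"^(which|what|how many|can|is|are|does)\b.*\?$", re.IGNORECASE)


def recognise_speech_act(sentence: str) -> tuple[str, tuple[str, ...]]:
    """Classify a sentence and return the act with its extracted arguments."""
    s = sentence.strip()
    m = SUBCLASS.match(s)
    if m:
        # "An apple is a fruit." -> Class: Apple SubClassOf: Fruit
        return "assert_subclass", (m.group(1).capitalize(), m.group(2).capitalize())
    if QUESTION.match(s):
        # Questions are treated as candidate Competency Questions
        return "competency_question", (s,)
    return "unrecognised", (s,)
```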
  • 92. Modelling User Goals (1) 1. Users can import or write their own CQs in OWL Simplified English. 2. Based on the inserted CQs, a list of Authoring Tests (ATs) is generated. 3. A tree structure displays these CQs and ATs. 4. The system constantly monitors these CQs and ATs. Any change in the satisfiability of an AT: a. is reported by changing the AT's icon in the tree (red/green represent fail/pass of each AT, respectively); b. is reported in the "history" pane.
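The monitoring behaviour in step 4 can be sketched as a small state holder that remembers each AT's last result, maps it to a red/green icon and appends a message to a history list whenever the result flips. The class and method names are hypothetical, and the AT results are assumed to come from elsewhere (for example, a reasoner re-run after each ontology edit).

```python
from dataclasses import dataclass, field


@dataclass
class ATMonitor:
    """Tracks pass/fail of authoring tests (ATs) and logs every change."""
    status: dict[str, bool] = field(default_factory=dict)  # AT -> passed?
    history: list[str] = field(default_factory=list)       # messages for a "history" pane

    def icon(self, at: str) -> str:
        # Green for pass, red for fail (unknown ATs are shown as failing)
        return "green" if self.status.get(at) else "red"

    def update(self, results: dict[str, bool]) -> list[str]:
        """Record new results; return the ATs whose satisfiability changed."""
        changed = []
        for at, passed in results.items():
            previous = self.status.get(at)
            if previous is not None and previous != passed:
                changed.append(at)
                self.history.append(f"{at}: {'fail -> pass' if passed else 'pass -> fail'}")
            self.status[at] = passed
        return changed
```

In this sketch a newly seen AT is recorded without a history entry; reporting first results as well would be an equally reasonable design choice.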
  • 93. Modelling User Goals (2) • Hierarchical representation of CQs and ATs • Icons represent the satisfiability state • Written feedback is presented to the user in the "history" pane
  • 94. Further Challenges ● Maintaining a continuous and meaningful interaction with the user ● Generating a coherent and comprehensive set of entailments in response to What-If questions ❖ Content selection ❖ Grouping and aggregation ❖ Ordering
  • 95. UNDERSTANDING KNOWLEDGE GRAPHS • Data understanding • Data summarisation • Query generation
  • 96. Data Understanding: A Core Activity in Data Exploitation • Traditional focus in Semantic Web research: data understanding for machines and programs. • More importantly: data understanding for humans • Humans are the ultimate owners and consumers of data • Systems such as knowledge graphs, Watson, Siri, etc. aim to help human users understand the contents, implications and applications of data • More than HCI: we want interesting and insightful data!
  • 97. Semantic Datasets Are HARD to Understand • Non-expert users might not be familiar with RDF, OWL and SPARQL • RDF(S) has 6 core documents • OWL 2 has 6 core documents • SPARQL 1.1 has 11 core documents • Users are unfamiliar with datasets • that are too large to explore • that are external to their organisation • … • It is HARD for novice users to construct queries
  • 98. Challenges of Data Understanding • Challenges • Expressing needs (keywords/SPARQL) • Describing datasets • Only retrieving the relevant parts • 9.96% SPARQL / 8.19% DUMP
  • 99. Solution: Summary-based Profiling for LD • Key idea: building-block-based information space modelling • Decomposing & Constructing
  • 100. The Philosophy of Interpreting Information • Task: explain the data to human users • Entity-centric
  • 101. Entity-centric View of RDF Data • Entity Description Block
  • 102. Concrete to Abstract • Entity Description Pattern
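One way to make the concrete-to-abstract step tangible, under assumed definitions: here an Entity Description Block is taken to be all triples sharing a subject, and its Entity Description Pattern keeps only the entity's types and the set of other predicates it uses, abstracting away the entity itself and its object values. The slides do not give the precise definitions, so treat this as an illustration rather than the authors' method; triples are assumed to be (subject, predicate, object) strings with rdf:type written as the CURIE "rdf:type".

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # assumed CURIE form of rdf:type in the input triples


def entity_description_blocks(triples):
    """Assumed EDB: group (subject, predicate, object) triples by subject."""
    blocks = defaultdict(list)
    for s, p, o in triples:
        blocks[s].append((p, o))
    return dict(blocks)


def entity_description_pattern(block):
    """Assumed EDP: keep the entity's types and the set of its other predicates."""
    types = frozenset(o for p, o in block if p == RDF_TYPE)
    predicates = frozenset(p for p, _ in block if p != RDF_TYPE)
    return types, predicates


def count_patterns(triples):
    """Concrete to abstract: count how many entities share each pattern."""
    counts = defaultdict(int)
    for block in entity_description_blocks(triples).values():
        counts[entity_description_pattern(block)] += 1
    return dict(counts)
```

Two artists that each have a type and a name collapse into one pattern with count 2, which is what makes the abstract view much smaller than the concrete data.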
  • 103. Data Summarisation: EDP Graph • Reveal the schema-level information • What concepts are there (nodes) and how are they related to each other (edges)? • Disclose the individual-level distribution • Statistics attached to nodes and edges • [Figure: EDP graph of the Jamendo dataset]
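A simplified, class-level stand-in for such a summary graph, assuming triples are given as (subject, predicate, object) strings with rdf:type written as the CURIE "rdf:type". Nodes are classes annotated with instance counts, and edges are (class, predicate, class) links annotated with triple counts. In an actual EDP graph the nodes are entity description patterns rather than plain classes, so this sketch only illustrates the idea of attaching statistics to a schema-level view.

```python
from collections import Counter


def summary_graph(triples, rdf_type="rdf:type"):
    """Class-level summary of an RDF graph given as (s, p, o) string triples.

    Returns (nodes, edges): nodes counts instances per class, edges counts
    triples linking an instance of one class to an instance of another.
    """
    types = {}
    for s, p, o in triples:
        if p == rdf_type:
            types.setdefault(s, set()).add(o)
    nodes = Counter(c for classes in types.values() for c in classes)
    edges = Counter()
    for s, p, o in triples:
        # Skip type triples and links whose endpoints are not typed resources
        if p == rdf_type or s not in types or o not in types:
            continue
        for cs in types[s]:
            for co in types[o]:
                edges[(cs, p, co)] += 1
    return nodes, edges
```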
  • 104. Understanding Data Redundancy [Wu et al., 2014]
  • 105. Related Paper at JIST2014 • Graph Pattern based RDF Data Compression. Jeff Z. Pan, Jose Manuel Gomez-Perez, Yuan Ren, Honghan Wu, Haofen Wang and Man Zhu • (Monday afternoon)
  • 106. Understanding How Data Can Be Used • Given a knowledge graph, generate candidate insightful queries • Manual/automatic generation • Generation based on schema/actual data • With/without user intervention • Our aim: automatic generation based on data, without user intervention • Most friendly to new, novice users • Complementary to inference (which is heavily based on schema)
  • 107. Candidate Insightful Queries [Pan et al., 2013] • Graph patterns are summarisations that represent many subsets of the RDF graph • Pattern structure • Structured knowledge that is difficult to express with a schema • Such as star, chain, tree and loop patterns • Correspondences between multiple graph patterns • Strongly corresponding patterns (large overlap) • Weakly corresponding patterns (little overlap) • Exceptions
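The distinction between strongly and weakly corresponding patterns can be sketched by comparing the sets of entities two patterns cover. Jaccard similarity and the 0.8/0.2 thresholds are assumptions chosen for illustration; the slide does not specify the overlap measure. Under this reading, the few entities of one pattern that fall outside a strongly corresponding pattern are candidate exceptions.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between the entity sets covered by two graph patterns."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def correspondence(a: set, b: set, strong: float = 0.8, weak: float = 0.2) -> str:
    """Label a pair of patterns by overlap; thresholds are illustrative assumptions."""
    score = jaccard(a, b)
    if score >= strong:
        return "strong"
    if score == 0.0:
        return "none"
    if score <= weak:
        return "weak"
    return "intermediate"


def exceptions(a: set, b: set) -> set:
    """Entities covered by pattern a but not by a strongly corresponding pattern b."""
    return a - b if correspondence(a, b) == "strong" else set()
```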
  • 108. Query Generation Framework • 1. Data summarisation • Significantly decreases the search space in rule mining • 2. Data analytics • First-order inductive learning • Association rule mining • 3. Query generation • Exploiting the relations between queries and rules
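Steps 1 and 2 can be illustrated together: if entities are first summarised into distinct patterns with counts (step 1), association rules can be mined over those weighted patterns instead of over every entity (step 2). For step 3, a rule that holds with high but imperfect confidence suggests a query for its exceptions. Everything below is a hypothetical sketch: the function names, thresholds and SPARQL template are illustrative, and the generated query assumes the relevant prefixes are declared.

```python
from itertools import combinations


def rule_stats(weighted, lhs, rhs):
    """Support and confidence of lhs -> rhs over (itemset, count) pairs,
    e.g. one pair per summarised pattern with the number of entities it covers."""
    total = sum(n for _, n in weighted)
    n_lhs = sum(n for items, n in weighted if lhs in items)
    n_both = sum(n for items, n in weighted if lhs in items and rhs in items)
    support = n_both / total if total else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


def mine_rules(weighted, min_support=0.3, min_confidence=0.8):
    """Brute-force single-item rules that pass both thresholds."""
    items = sorted(set().union(*(items for items, _ in weighted)))
    rules = []
    for x, y in combinations(items, 2):
        for lhs, rhs in ((x, y), (y, x)):
            s, c = rule_stats(weighted, lhs, rhs)
            if s >= min_support and c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules


def exception_query(lhs, rhs):
    """Hypothetical template: entities that have lhs but lack rhs."""
    return (f"SELECT DISTINCT ?e WHERE {{ ?e {lhs} ?x . "
            f"FILTER NOT EXISTS {{ ?e {rhs} ?y }} }}")
```

Because the mining loop touches each distinct pattern once rather than each entity, its cost grows with the size of the summary, which is the point of running summarisation first.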
  • 109. Evaluation
  • 110. Another Example • Given the university dataset in LUBM, the following two queries have the same results (when no reasoning is applied)
  • 111. Summary and Future Work • Take-home message • Data summarisation and data analytics technologies not only help people find answers, but also help people ask questions! • Future work • Integrate with application-scenario background knowledge • Integrate with reasoning • Integrate with user preferences
  • 112. OUTLOOK • Outlook of Knowledge Graphs: from the application's point of view
  • 113. Outlook • What knowledge graphs are good at: maintaining factual knowledge in a structured manner and answering queries about it • What knowledge graphs still need: • "How to…" knowledge in addition to "What is…" knowledge • Operations associated with the entities
  • 114. JIST2014 Tutorial on Constructing and Understanding Knowledge Graphs • Thank you! Questions?