Talking to your Data: 
Natural Language Interfaces for a 
schema-less world 
André Freitas 
NLIWoD at ISWC 2014 
Riva del Garda
Outline
• Shift in the Database Landscape
• On Schema-agnosticism & Semantics
• Distributional Semantics to the Rescue
• Case Study: Treo QA System
• Living in a Schema-less World
• Take-away Message
Shift in the Database Landscape
3
Big Data (Data Variety)
• Vision: a more complete, data-based picture of the world for systems and users.
4
The Long Tail of Data Variety
[Figure labels: Data variety +; Data; Programs; Full knowledge; Full data coverage; Full automation; Data generation]
8
Very large and dynamic “schemas”
circa 2000: 10s-100s attributes
circa 2014: 1,000s-1,000,000s attributes
9
Semantic Heterogeneity
• Decentralized content generation.
• Multiple perspectives (conceptualizations) of reality.
• Ambiguity, vagueness, inconsistency.
10
The Long Tail of Data Variety
[Figure labels: Data variety +; Data; Programs; Full knowledge; Full data coverage; Full automation; Data consumption; Data generation]
11
Databases for a Complex World 
How do you query data at this scale? 
12
Schema-agnosticism
[Figure labels: Abstraction Layer; User]
13
First-level independence (Relational Model)
“… it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and representation and organization of data on the other”
Codd, 1970
Second-level independence (Schema-agnosticism)
14
On Schema-agnosticism & Semantics
15
Vocabulary Problem for Databases
Query: Who is the daughter of Bill Clinton married to?
Semantic Gap
Possible representations
Schema-agnostic query mechanisms
• Abstraction level differences
• Lexical variation
• Structural (compositional) differences
• Operational/functional differences
16
Robust Semantic Model
• Semantic intelligent behaviour is highly dependent on knowledge scale (commonsense, semantic)
Semantics = Formal meaning representation model (lots of data) + inference model
17
Robust Semantic Model
• Not scalable!
1st Hard problem: Acquisition
Semantics = Formal meaning representation model (lots of data) + inference model
18
Robust Semantic Model
• Not scalable!
2nd Hard problem: Consistency
Semantics = Formal meaning representation model (lots of data) + inference model
19
Semantics for a Complex World
• “Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.”
• “If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements.”
Baroni et al. 2013
[Figure labels: Formal World; Real World]
20
Distributional Semantic Models
• Semantic Model with low acquisition effort (automatically built from text)
  Simplification of the representation
• Enables the construction of comprehensive commonsense/semantic KBs
• What is the cost?
  Some level of noise (semantic best-effort)
21
Distributional Hypothesis
“Words occurring in similar (linguistic) contexts tend to be semantically similar”
• He filled the wampimuk with the substance, passed it around and we all drunk some
22
Distributional Semantic Models (DSMs)
“The dog barked in the park. The owner of the dog put him on the leash since he barked.”
contexts = nouns and verbs in the same sentence
[Figure labels: bark; dog; park; leash]
Context counts for “dog”:
bark : 2
park : 1
leash : 1
owner : 1
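The counting on this slide can be sketched in a few lines of Python. This is only an illustration, not the Treo implementation: the per-sentence word lists are hand-lemmatised here (an assumption; a real pipeline would use a POS tagger and a lemmatiser), and `context_vector` is a hypothetical helper name.

```python
from collections import Counter

# Hand-lemmatised nouns and verbs for each sentence of the example text.
# Assumption: in practice these would come from a POS tagger + lemmatiser.
sentences = [
    ["dog", "bark", "park"],
    ["owner", "dog", "put", "leash", "bark"],
]

def context_vector(target, sentences):
    """Count the words that occur in the same sentence as `target`."""
    counts = Counter()
    for words in sentences:
        if target in words:
            counts.update(w for w in words if w != target)
    return counts

dog = context_vector("dog", sentences)
print(dict(dog))
```

With these word lists, "dog" gets bark: 2, park: 1, leash: 1 and owner: 1, as on the slide. A strict reading of "nouns and verbs" also counts the verb "put" once, which the slide leaves out.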
24
Distributional Semantic Models (DSMs)
[Figure labels: Context; car; dog; bark; run; leash]
25
Semantic Similarity & Relatedness
Query: cat
[Figure labels: θ; dog; cat; car; bark; run; leash]
27
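The angle θ between two word vectors is commonly turned into a relatedness score through its cosine. The sketch below assumes sparse vectors stored as dicts; the counts are made-up toy numbers, and the "drive" dimension is not on the slide.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is related to nothing
    return dot / (norm_u * norm_v)

# Toy context counts (illustrative values only).
vectors = {
    "dog": {"bark": 2, "run": 2, "leash": 1},
    "car": {"run": 1, "drive": 3},
}
cat = {"run": 2, "leash": 1}

# Rank the stored words by relatedness to the query word "cat".
ranked = sorted(vectors, key=lambda w: cosine(cat, vectors[w]), reverse=True)
print(ranked)
```

A smaller angle gives a cosine closer to 1, so with these toy counts "dog" ranks above "car" for the query "cat".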
DSMs as Commonsense Reasoning 
[Figure labels: θ; car; dog; cat; bark; run; leash. Annotations: “Commonsense is here”; “Semantic Approximation is here”]
28
DSMs as Commonsense Reasoning 
[Figure labels: θ; car; dog; cat; bark; run; leash; ... vs.]
Semantic best-effort
Case Study: Treo QA System
30
Approach Overview
Query → Query Analysis → Query Features → Query Planner → Query Plan
Ƭ-Space: structured data + commonsense knowledge (distributional semantics over large-scale unstructured data)
Core semantic approximation & composition operations
31
Approach Overview
Query → Query Analysis → Query Features → Query Planner → Query Plan
Ƭ-Space: RDF data + commonsense knowledge (Explicit Semantic Analysis (ESA) over Wikipedia)
Core semantic approximation & composition operations
32
Ƭ-Space
[Figure labels: e; p; r]
33
Core Operations
[Figure labels: Search & Composition Operations; Query]
34
Does it work? 
35
Addressing the Vocabulary Problem for Databases (with Distributional Semantics)
Gaelic: direction 
36
Solution (Video) 
37
More Complex Queries (Video) 
38
Treo Answers Jeopardy Queries (Video) 
http://bit.ly/1hWcch9
39
Relevance
• Test Collection: QALD 2011.
• DBpedia.
Dataset (DBpedia + YAGO links): 45,767 predicates, 9,434,677 instances, more than 200,000 classes
40
Query Pre-Processing
(Question Analysis)
• Transform natural language queries into triple patterns.
“Who is the daughter of Bill Clinton married to?”
41
Query Pre-Processing
(Question Analysis)
• Step 1: POS Tagging
- Who/WP 
- is/VBZ 
- the/DT 
- daughter/NN 
- of/IN 
- Bill/NNP 
- Clinton/NNP 
- married/VBN 
- to/TO 
- ?/. 
42
Query Pre-Processing
(Question Analysis)
• Step 2: Core Entity Recognition
- Rule-based: POS Tag + TF/IDF
Who is the daughter of Bill Clinton married to?
Bill Clinton: (PROBABLY AN INSTANCE)
43
Query Pre-Processing
(Question Analysis)
• Step 3: Determine answer type
- Rule-based.
Who is the daughter of Bill Clinton married to?
Answer type: (PERSON)
44
Query Pre-Processing
(Question Analysis)
• Step 4: Dependency parsing
- dep(married-8, Who-1) 
- auxpass(married-8, is-2) 
- det(daughter-4, the-3) 
- nsubjpass(married-8, daughter-4) 
- prep(daughter-4, of-5) 
- nn(Clinton-7, Bill-6) 
- pobj(of-5, Clinton-7) 
- root(ROOT-0, married-8) 
- xcomp(married-8, to-9) 
45
Query Pre-Processing
(Question Analysis)
• Step 5: Determine Partial Ordered Dependency Structure (PODS)
  - Rule-based.
    • Remove stop words.
    • Merge words into entities.
    • Reorder structure from core entity position.
PODS: Bill Clinton (INSTANCE) | daughter | married to
ANSWER TYPE: Person
QUESTION FOCUS
Lower level of ambiguity, vagueness, synonymy
46
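The three PODS rules can be sketched over the POS-tagged question from Step 1. This is a simplified illustration, not the Treo rule set: the stop-word tag set, the merging of a verb with a following "to", and the reordering heuristic (core entity first, then the terms before it nearest-first, then the terms after it) are all assumptions, and `build_pods` is a hypothetical name. The real system reorders using the dependency parse from Step 4.

```python
# POS-tagged question, copied from the POS tagging step.
tagged = [
    ("Who", "WP"), ("is", "VBZ"), ("the", "DT"), ("daughter", "NN"),
    ("of", "IN"), ("Bill", "NNP"), ("Clinton", "NNP"),
    ("married", "VBN"), ("to", "TO"), ("?", "."),
]

# Assumed set of tags treated as stop words for this example.
STOP_TAGS = {"WP", "VBZ", "DT", "IN", "."}

def build_pods(tagged):
    # Rule 1: remove stop words.
    kept = [(w, t) for w, t in tagged if t not in STOP_TAGS]

    # Rule 2: merge words into multi-word terms (adjacent proper nouns,
    # and a verb followed by "to").
    terms = []
    for word, tag in kept:
        if terms and tag == "NNP" and terms[-1][1] == "NNP":
            terms[-1] = (terms[-1][0] + " " + word, "NNP")
        elif terms and tag == "TO" and terms[-1][1].startswith("VB"):
            terms[-1] = (terms[-1][0] + " " + word, terms[-1][1])
        else:
            terms.append((word, tag))

    # Rule 3: reorder starting from the core entity (first proper-noun term).
    core = next(i for i, (_, t) in enumerate(terms) if t == "NNP")
    ordered = [terms[core]] + terms[:core][::-1] + terms[core + 1:]
    return [w for w, _ in ordered]

pods = build_pods(tagged)
print(pods)  # ['Bill Clinton', 'daughter', 'married to']
```

Starting from the core entity means the least ambiguous term is resolved first, and each later term is interpreted relative to it.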
Question Analysis
Transform natural language queries into triple patterns
“Who is the daughter of Bill Clinton married to?”
PODS: Bill Clinton | daughter | married to
Query Features: (INSTANCE) (PREDICATE) (PREDICATE)
47
Query Plan 
Map query features into a query plan. 
A query plan contains a sequence of core operations. 
Query Features: (INSTANCE) (PREDICATE) (PREDICATE)
Query Plan:
• (1) INSTANCE SEARCH (Bill Clinton)
• (2) p1 <- SEARCH PREDICATE (Bill Clinton, daughter)
• (3) e1 <- NAVIGATE (Bill Clinton, p1)
• (4) p2 <- SEARCH PREDICATE (e1, married to)
• (5) e2 <- NAVIGATE (e1, p2)
48
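The mapping from query features to a query plan can be sketched as below. This is a minimal illustration under stated assumptions: it only handles the pattern on the slide (one instance followed by predicates), and the function name and the tuple encoding of operations are hypothetical.

```python
def build_query_plan(features):
    """Turn [(term, feature_type), ...] into a sequence of core operations."""
    (first_term, first_type), rest = features[0], features[1:]
    if first_type != "INSTANCE":
        raise ValueError("this sketch only handles plans that start at an instance")

    plan = [("INSTANCE_SEARCH", first_term)]
    for term, feature_type in rest:
        if feature_type != "PREDICATE":
            raise ValueError(f"unsupported feature type: {feature_type}")
        # Each predicate term is first matched against the pivot's
        # predicates, then followed to reach the next pivot entity.
        plan.append(("SEARCH_PREDICATE", term))
        plan.append(("NAVIGATE",))
    return plan

features = [
    ("Bill Clinton", "INSTANCE"),
    ("daughter", "PREDICATE"),
    ("married to", "PREDICATE"),
]
plan = build_query_plan(features)
for step in plan:
    print(step)
```

For the example features this produces five operations, in the same order as steps (1) to (5) of the query plan.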
Instance Search
Query: Bill Clinton daughter married to
Linked Data: :Bill_Clinton
49
Predicate Search
Query: Bill Clinton daughter married to
Linked Data: :Bill_Clinton (PIVOT ENTITY)
(ASSOCIATED TRIPLES)
:Bill_Clinton :child :Chelsea_Clinton
:Bill_Clinton :religion :Baptists
:Bill_Clinton :almaMater :Yale_Law_School
...
50
Predicate Search
Query: Bill Clinton daughter married to
Linked Data: :Bill_Clinton
Which properties are semantically related to ‘daughter’? (In the context of Bill Clinton)
:Bill_Clinton :child :Chelsea_Clinton
:Bill_Clinton :religion :Baptists
:Bill_Clinton :almaMater :Yale_Law_School
...
sem_rel(daughter,child)=0.054
sem_rel(daughter,religion)=0.004
sem_rel(daughter,alma mater)=0.001
52
Navigate
Query: Bill Clinton daughter married to
Linked Data:
:Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY)
54
Predicate Search
Query: Bill Clinton daughter married to
Linked Data:
:Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY)
:Chelsea_Clinton :spouse :Mark_Mezvinsky
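The whole walkthrough (instance search, predicate search, navigate, repeated from each new pivot) can be replayed over a toy triple list. This is a sketch, not the Treo system: the relatedness table stands in for a distributional measure such as ESA, the "daughter" scores are taken from the slides (reading the 0.004 score as the one for :religion, which is an assumption), the "married to" score is made up, and all function names are hypothetical.

```python
# Toy graph built from the triples shown in the walkthrough.
TRIPLES = [
    (":Bill_Clinton", ":child", ":Chelsea_Clinton"),
    (":Bill_Clinton", ":religion", ":Baptists"),
    (":Bill_Clinton", ":almaMater", ":Yale_Law_School"),
    (":Chelsea_Clinton", ":spouse", ":Mark_Mezvinsky"),
]

# Stand-in for a distributional semantic relatedness measure.
SEM_REL = {
    ("daughter", ":child"): 0.054,
    ("daughter", ":religion"): 0.004,   # assumed pairing, see lead-in
    ("daughter", ":almaMater"): 0.001,
    ("married to", ":spouse"): 0.05,    # hypothetical value
}

def instance_search(label):
    """Naive label match: 'Bill Clinton' -> ':Bill_Clinton'."""
    candidate = ":" + label.replace(" ", "_")
    subjects = {s for s, _, _ in TRIPLES}
    return candidate if candidate in subjects else None

def search_predicate(pivot, term):
    """Pick the pivot's predicate that is most related to the query term."""
    predicates = {p for s, p, _ in TRIPLES if s == pivot}
    return max(predicates, key=lambda p: SEM_REL.get((term, p), 0.0))

def navigate(pivot, predicate):
    """Follow `predicate` from `pivot` to reach the next pivot entity."""
    return next(o for s, p, o in TRIPLES if s == pivot and p == predicate)

e0 = instance_search("Bill Clinton")
p1 = search_predicate(e0, "daughter")
e1 = navigate(e0, p1)
p2 = search_predicate(e1, "married to")
e2 = navigate(e1, p2)
print(e0, p1, e1, p2, e2)
```

Because each predicate search only looks at the predicates attached to the current pivot, the candidate set stays small even when the dataset has tens of thousands of predicates.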
55
Results 
56
Core Principles
• Minimize the impact of Ambiguity, Vagueness, Synonymy with semantic pivoting.
• Semantic pivoting: Address the simplest matchings first (heuristics).
• Semantic Relatedness as a primitive semantic approximation operation.
• Distributional semantics as commonsense/semantic knowledge.
Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach, IUI 2014
Living in a Schema-less World
58
How do we build systems today? 
Structure the domain 
59
How do we build systems today? 
Generalize and encode some rules
How do we build systems today? 
Allow some constrained interaction 
Query is here 
61
Siloed Systems 
62
[Figure labels: Data variety +; Data; Full knowledge; Full data coverage; Full automation]
63
Linked Data: Datasets are easier to integrate and to consume (data model level). However, the semantic barrier for consumption is still there.
Distributional DBMS 
Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach, IUI 2014
Simplification of Information Extraction 
A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs, WoLE, 2012
Simplification of Information Extraction 
General Electric Company, or GE, is an American multinational conglomerate corporation incorporated in Schenectady, New York
69
Schema-agnostic programs 
Towards An Approximative Ontology-Agnostic Approach for Logic Programs, FOIKS 2014
Reasoning with Distributional Semantics 
A Distributional Semantics Approach for Selective Reasoning on Commonsense Graph 
Knowledge Bases, NLDB 2014
Take-away Message
• Existing semantic technologies can already address major data management problems.
• Multi-disciplinarity is one key (and NLI people are very good at it!):
- NLP + IR + Semantic Web + Databases
• Schema-agnosticism is a central property/functionality/goal!
• Distributional Semantics + semantics of structured data = schema-agnosticism
• Schema-agnosticism brings major impact for information systems.
• We can tame the long tail of data variety!
• The wave is just starting. Be a part of it!
75
Want to play with Distributional Semantics?
http://easy-esa.org 
76
Any Queries?

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Recently uploaded (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Talking to your Data: Natural Language Interfaces for a schema-less world (Keynote at NLIWoD, ISWC 2014)

  • 1. Talking to your Data: Natural Language Interfaces for a schema-less world André Freitas NLIWoD at ISWC 2014 Riva del Garda
  • 2. Outline  Shift in the Database Landscape  On Schema-agnosticism & Semantics  Distributional Semantics to the Help  Case Study: Treo QA System  Living in a Schema-less World  Take-away Message
  • 3. Shift in the Database Landscape 3
  • 4. Big Data (Data Variety)  Vision: More complete data-based picture of the world for systems and users. 4
  • 5. The Long Tail of Data Variety
  • 6. The Long Tail of Data Variety 6
  • 7. Data variety + Data Programs Full knowledge Full data coverage Full automation The Long Tail of Data Variety 7
  • 8. Data variety + Data Programs Full knowledge Full data coverage Full automation The Long Tail of Data Variety Data generation 8
  • 9. Very-large and dynamic “schemas” 10s-100s attributes 1,000s-1,000,000s attributes circa 2000 circa 2014 9
  • 10. Semantic Heterogeneity  Decentralized content generation.  Multiple perspectives (conceptualizations) of the reality.  Ambiguity, vagueness, inconsistency. 10
  • 11. Data variety + Data Programs Full knowledge Full data coverage Full automation The Long Tail of Data Variety Data consumption Data generation 11
  • 12. Databases for a Complex World How do you query data at this scale? 12
  • 14. First-level independency (Relational Model) “… it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and representation and organization of data on the other” Codd, 1970 Second-level independency (Schema-agnosticism) 14
  • 15. On Schema-agnosticism & semantics 15
  • 16. Vocabulary Problem for Databases Query: Who is the daughter of Bill Clinton married to? Semantic Gap Possible representations Schema-agnostic query mechanisms  Abstraction level differences  Lexical variation  Structural (compositional) differences  Operational/functional differences 16
  • 17. Robust Semantic Model  Semantic intelligent behaviour is highly dependent on knowledge scale (commonsense, semantic) Semantics = Formal meaning representation model (lots of data) + inference model 17
  • 18. Robust Semantic Model  Not scalable! 1st Hard problem: Acquisition Semantics = Formal meaning representation model (lots of data) + inference model 18
  • 19. Robust Semantic Model  Not scalable! 2nd Hard problem: Consistency Semantics = Formal meaning representation model (lots of data) + inference model 19
  • 20. Semantics for a Complex World  “Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.”  “If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements.” Formal World Real World Baroni et al. 2013 20
  • 21. Distributional Semantic Models  Semantic Model with low acquisition effort (automatically built from text) Simplification of the representation  Enables the construction of comprehensive commonsense/semantic KBs  What is the cost? Some level of noise (semantic best-effort) 21
  • 22. Distributional Hypothesis “Words occurring in similar (linguistic) contexts tend to be semantically similar”  He filled the wampimuk with the substance, passed it around and we all drunk some 22
  • 23. Distributional Semantic Models (DSMs) “The dog barked in the park. The owner of the dog put him on the leash since he barked.” contexts = nouns and verbs in the same sentence 23
  • 24. Distributional Semantic Models (DSMs) “The dog barked in the park. The owner of the dog put him on the leash since he barked.” bark dog park leash contexts = nouns and verbs in the same sentence bark : 2 park : 1 leash : 1 owner : 1 24
  • 25. Distributional Semantic Models (DSMs) Context car dog bark run leash 25
  • 26. Semantic Similarity & Relatedness dog car bark run leash 26 Query: cat
  • 27. Semantic Similarity & Relatedness θ dog cat car bark run leash 27 Query: cat
  • 28. DSMs as Commonsense Reasoning Commonsense is here θ car dog cat bark run leash Semantic Approximation is here 28
  • 29. DSMs as Commonsense Reasoning θ car dog cat bark run leash ... vs. Semantic best-effort
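The co-occurrence counting and angle-based comparison on the DSM slides above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ESA model used by Treo: the two example sentences are hand-lemmatized into nouns and verbs (a real pipeline would use a POS tagger), and the names `SENTENCES`, `cooccurrence_vectors` and `cosine` are hypothetical. Because the verb "put" is also a context word under the slide's definition, it shows up in the counts for "dog" even though the slide's figure does not list it.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hand-lemmatized nouns and verbs per sentence (assumption: a tagger would produce these).
SENTENCES = [
    ["dog", "bark", "park"],                   # "The dog barked in the park."
    ["owner", "dog", "put", "leash", "bark"],  # "The owner of the dog put him on the leash since he barked."
]

def cooccurrence_vectors(sentences):
    """Map each word to a Counter of the other words that share a sentence with it."""
    vectors = defaultdict(Counter)
    for words in sentences:
        for i, target in enumerate(words):
            for j, context in enumerate(words):
                if i != j:
                    vectors[target][context] += 1
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = cooccurrence_vectors(SENTENCES)
print(dict(vectors["dog"]))                       # context counts for "dog"
print(cosine(vectors["dog"], vectors["leash"]))   # relatedness as cos(theta)
```

A smaller angle (cosine closer to 1) means the two words occur in more similar contexts, which is the approximation the slides call semantic best-effort.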
  • 30. Case Study: Treo QA System 30
  • 31. Approach Overview Query Query Analysis Query Features Query Planner Ƭ-Space Large-scale unstructured data Query Plan Structured Data Commonsense knowledge Distributional semantics Core semantic approximation & composition operations 31
  • 32. Approach Overview Query Query Analysis Query Features Query Planner Ƭ-Space Wikipedia Query Plan RDF Data Explicit Semantic Analysis (ESA) Core semantic approximation & composition operations Commonsense knowledge 32
  • 33. Ƭ-Space e p r 33
  • 34. Core Operations Search & Composition Operations Query 34
  • 36. Addressing the Vocabulary Problem for Databases (with Distributional Semantics) Gaelic: direction 36
  • 38. More Complex Queries (Video) 38
  • 39. Treo Answers Jeopardy Queries (Video) 39 http://bit.ly/1hWcch9
  • 40. Relevance  Test Collection: QALD 2011.  DBpedia. Dataset (DBpedia + YAGO links): 45,767 predicates, 9,434,677 instances, more than 200,000 classes 40
  • 41. Query Pre-Processing (Question Analysis)  Transform natural language queries into triple patterns. “Who is the daughter of Bill Clinton married to?” 41
  • 42. Query Pre-Processing (Question Analysis)  Step 1: POS Tagging - Who/WP - is/VBZ - the/DT - daughter/NN - of/IN - Bill/NNP - Clinton/NNP - married/VBN - to/TO - ?/. 42
  • 43. Query Pre-Processing (Question Analysis)  Step 2: Core Entity Recognition - Rules-based: POS Tag + TF/IDF Who is the daughter of Bill Clinton married to? (PROBABLY AN INSTANCE) 43
  • 44. Query Pre-Processing (Question Analysis) Step 3: Determine answer type Rules-based. Who is the daughter of Bill Clinton married to? (PERSON) 44
  • 45. Query Pre-Processing (Question Analysis)  Step 4: Dependency parsing - dep(married-8, Who-1) - auxpass(married-8, is-2) - det(daughter-4, the-3) - nsubjpass(married-8, daughter-4) - prep(daughter-4, of-5) - nn(Clinton-7, Bill-6) - pobj(of-5, Clinton-7) - root(ROOT-0, married-8) - xcomp(married-8, to-9) 45
  • 46. Query Pre-Processing (Question Analysis)  Step 5: Determine Partial Ordered Dependency Structure (PODS) - Rules-based. • Remove stop words. • Merge words into entities. • Reorder structure from core entity position. Bill Clinton (INSTANCE) daughter married to. ANSWER TYPE: Person. QUESTION FOCUS. Lower level of ambiguity, vagueness, synonymy. 46
  • 47. Question Analysis Transform natural language queries into triple patterns “Who is the daughter of Bill Clinton married to?” Bill Clinton daughter married to PODS (INSTANCE) (PREDICATE) (PREDICATE) Query Features 47
  • 48. Query Plan Map query features into a query plan. A query plan contains a sequence of core operations. (INSTANCE) (PREDICATE) (PREDICATE) Query Features Query Plan  (1) INSTANCE SEARCH (Bill Clinton)  (2) p1 <- SEARCH PREDICATE (Bill Clinton, daughter)  (3) e1 <- NAVIGATE (Bill Clinton, p1)  (4) p2 <- SEARCH PREDICATE (e1, married to)  (5) e2 <- NAVIGATE (e1, p2) 48
  • 49. Instance Search Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: Instance Search 49
  • 50. Predicate Search Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child :Baptists :religion :almaMater :Yale_Law_School ... (PIVOT ENTITY) (ASSOCIATED TRIPLES) 50
  • 51. Predicate Search Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: Which properties are semantically related to ‘daughter’? :Chelsea_Clinton :child :Baptists :religion :almaMater :Yale_Law_School ... sem_rel(daughter,child)=0.054 sem_rel(daughter,religion)=0.004 sem_rel(daughter,alma mater)=0.001 51
  • 52. Predicate Search Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: Which properties are semantically related to ‘daughter’? (In the context of Bill Clinton) :Chelsea_Clinton :child :Baptists :religion :almaMater :Yale_Law_School ... sem_rel(daughter,child)=0.054 sem_rel(daughter,religion)=0.004 sem_rel(daughter,alma mater)=0.001 52
  • 53. Navigate Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child 53
  • 54. Navigate Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child (PIVOT ENTITY) 54
  • 55. Predicate Search Bill Clinton daughter married to :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child (PIVOT ENTITY) :Mark_Mezvinsky :spouse 55
  • 57. Core Principles  Minimize the impact of Ambiguity, Vagueness, Synonymy with semantic pivoting.  Semantic pivoting: Address the simplest matchings first (heuristics).  Semantic Relatedness as a primitive semantic approximation operation.  Distributional semantics as commonsense/semantic knowledge. Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach, IUI 2014
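The query plan walked through in the case study (instance search, then alternating predicate search and navigation from the current pivot entity) can be sketched as follows. This is a toy illustration, not Treo's implementation: the graph holds only the triples shown on the slides, `SEM_REL` is a hard-coded stand-in for an ESA-based relatedness measure, the "married to"/`:spouse` score is invented for the example, and all function names are hypothetical.

```python
# Toy Linked Data graph: entity -> {predicate: object}, mirroring the slide example.
GRAPH = {
    ":Bill_Clinton": {
        ":child": ":Chelsea_Clinton",
        ":religion": ":Baptists",
        ":almaMater": ":Yale_Law_School",
    },
    ":Chelsea_Clinton": {":spouse": ":Mark_Mezvinsky"},
}

# Stand-in relatedness scores (assumption: a real system computes these
# from a distributional model; the "married to" value is made up).
SEM_REL = {
    ("daughter", ":child"): 0.054,
    ("daughter", ":religion"): 0.004,
    ("daughter", ":almaMater"): 0.001,
    ("married to", ":spouse"): 0.05,
}

def sem_rel(term, predicate):
    """Look up the relatedness of a query term and a predicate (0.0 if unknown)."""
    return SEM_REL.get((term, predicate), 0.0)

def instance_search(label):
    """Resolve a label to an entity by naive normalisation (toy matching)."""
    uri = ":" + label.replace(" ", "_")
    return uri if uri in GRAPH else None

def search_predicate(entity, term):
    """Return the predicate of `entity` most related to `term`, or None."""
    candidates = GRAPH.get(entity, {})
    best = max(candidates, key=lambda p: sem_rel(term, p), default=None)
    if best is None or sem_rel(term, best) <= 0.0:
        return None
    return best

def navigate(entity, predicate):
    """Follow `predicate` from `entity` to the next pivot entity."""
    return GRAPH[entity][predicate]

def answer(core_entity, terms):
    """Run the plan: instance search, then (predicate search, navigate) per term."""
    pivot = instance_search(core_entity)
    for term in terms:
        if pivot is None:
            return None
        predicate = search_predicate(pivot, term)
        if predicate is None:
            return None
        pivot = navigate(pivot, predicate)
    return pivot

print(answer("Bill Clinton", ["daughter", "married to"]))
```

Resolving the core entity first keeps each later predicate search restricted to the triples of the current pivot, which is the semantic pivoting idea of addressing the simplest matchings first.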
  • 58. Living in a Schema-less World 58
  • 59. How do we build systems today? Structure the domain 59
  • 60. How do we build systems today? Generalize and encode some rules
  • 61. How do we build systems today? Allow some constrained interaction Query is here 61
  • 63. Data variety + Data Full knowledge Full data coverage Full automation 63
  • 64. Linked Data: Datasets are easier to integrate and to consume (data model level). However, the semantic barrier for consumption is still there
  • 65. Data variety + Data Full knowledge Full data coverage Full automation 65
  • 66. Distributional DBMS Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach, IUI 2014
  • 67. Data variety + Data Full knowledge Full data coverage Full automation 67
  • 68. Simplification of Information Extraction A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs, WoLE, 2012
  • 69. Simplification of Information Extraction General Electric Company, or GE , is an American multinational conglomerate corporation incorporated in Schenectady , New York 69
  • 70. Data variety + Data Full knowledge Full data coverage Full automation 70
  • 71. Schema-agnostic programs Towards An Approximative Ontology-Agnostic Approach for Logic Programs, FOIKS 2014
  • 72. Data variety + Data Full knowledge Full data coverage Full automation 72
  • 73. Reasoning with Distributional Semantics A Distributional Semantics Approach for Selective Reasoning on Commonsense Graph Knowledge Bases, NLDB 2014
  • 74. Data variety + Data Full knowledge Full data coverage Full automation 74
  • 75. Take-away Message  Existing semantic technologies can address major data management problems today  Multi-disciplinarity is one key (and NLI people are very good at it!): - NLP + IR + Semantic Web + Databases  Schema-agnosticism is a central property/functionality/goal!  Distributional Semantics + semantics of structured data = schema-agnosticism  Schema-agnosticism brings major impact for information systems.  We can tame the long tail of data variety!  The wave is just starting. Be a part of it! 75
  • 76. Want to play with Distributional Semantics? http://easy-esa.org 76