“30 are better than one”

 Query-Driven Hypothesis Generation for
  Answering Queries over NLP Graphs

   Chris Welty, Ken Barker, Lora Aroyo, Shilpa Arora


             Answering Conjunctive SPARQL Queries over NLP Graphs
Approach
•  the NLP process is not a one-shot deal
•  the query provides context for what the user is seeking
   and thus an opportunity to re-interpret the text
•  to decrease the cost of maintaining critical system DBs
   without changing the LSW, can we replace the human:
   can we build a machine reader for this?



[Figure: diagram with the labels NLP Stack, NLP Graphs, query, and re-interpret]

NLP Stack
•  Contains NER, CoRef, RelEx, entity disambiguation
   •  RelEx: SVM learner that outputs, for each pair of
      mentions in a sentence, a probability/confidence
      score for each known relation that the sentence
      expresses that relation between the pair
•  Run over target corpus producing NLP graph
   •  nodes are entities (clusters of mentions produced
      by coref)
   •  edges are type statements between entities and
      classes in the ontology, or relations detected
      between mentions of these entities in the corpus

NLP Graph
[Figure: mention-level view of two text snippets, "… Mr. X of India …" and "… in places like India, Iraq, …". Labels: citizenOf (Person, Country), coref between the snippets, citizenOf and subPlace (GPE, Country).]
NLP Graph
[Figure: the same example reduced to mentions. Labels: Mr. X and India with citizenOf (Person, Country); coref; India and Iraq with subPlace (GPE, Country).]
NLP Graph
[Figure: the resulting RDF graph. Nodes: Mr. X, India, Iraq, Person, GPE, Country. Edge labels: rdf:type, citizenOf, subPlaceOf, rdf:subClassOf.]
Relation Extraction by RelEx
•    RelEx: a set of SVM binary classifiers, one per relation
•    for each sentence in the corpus,
     •    for each pair of mentions in that sentence,
          •    for each known relation
               •    produce a probability that that pair is related by the
                    relation
•    NLP graphs are generated by selecting relations from RelEx output
     in two ways:
     •    Primary: takes only the top scoring relation between any
          mention pair above a confidence threshold
     •    Secondary: takes all relations between all mention pairs above
          a threshold
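
The two selection modes can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not the authors' implementation: the tuple layout, the function names, the scores, and the choice of `>=` for "above a threshold" are all assumptions.

```python
# Hypothetical RelEx output: (subject mention, relation, object mention, score).
# The scores are made up for illustration only.
relex_output = [
    ("Mr. X", "citizenOf", "India", 0.62),
    ("Mr. X", "locatedIn", "India", 0.15),
    ("Mr. X", "subPlaceOf", "India", 0.04),
    ("Mr. X", "causes", "India", 0.01),
]


def primary_graph(scored, threshold):
    """Keep only the top-scoring relation per mention pair, if it clears the threshold."""
    best = {}
    for subj, rel, obj, score in scored:
        pair = (subj, obj)
        if pair not in best or score > best[pair][3]:
            best[pair] = (subj, rel, obj, score)
    # Assumption: "above a threshold" is read as >=, so threshold 0 keeps everything.
    return [t for t in best.values() if t[3] >= threshold]


def secondary_graph(scored, threshold):
    """Keep every relation of every mention pair that clears the threshold."""
    return [t for t in scored if t[3] >= threshold]


print(primary_graph(relex_output, 0.2))
print(len(secondary_graph(relex_output, 0.0)))
```

With these toy scores the primary graph keeps a single edge for the pair, while the secondary graph at threshold 0 keeps all four candidate relations.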

RelEx Secondary Graph
[Figure: the RDF graph with several candidate relation labels shown together: causes, locatedIn, subPlaceOf, citizenOf. Nodes: Mr. X, India, Iraq, Person, GPE, Country. Other edge labels: rdf:type, subPlaceOf, rdf:subClassOf.]
Primary vs. Secondary

Graph           Precision (P)   Recall (R)   F-measure (F)
Primary @ 0.1   0.19            0.39         0.26
Primary @ 0.2   0.29            0.33         0.30
Secondary @ 0   0.01            0.95         0.02

Recall of max-F configuration
Conjunctive Queries
find all terrorist organizations that were agents of bombings
in Lebanon on October 23, 1983:

    SELECT ?t
    WHERE {
        ?t rdf:type mric:TerroristOrganization .
        ?b rdf:type mric:Bombing .
        ?b mric:mediatingAgent ?t .              # R = .65
        ?b mric:eventLocation mric:Lebanon .     # R = .09
        ?b mric:eventDate "1983-10-23" .         # R = .97
    }

R = .057 for the query as a whole
Problem with Conjunctive Queries
•  query recall: $\left[\prod_{k=1}^{n} \mathrm{Recall}(R_k)\right] \times \mathrm{Recall}_{\mathrm{coref}}$
•  Recall for an n-term query is $O(\mathrm{Recall}^n)$
•  for complex queries, recall becomes the
   dominating factor
•  in our experiments: query recall < .1 for n > 3
•  To get any particular correct answer, all NLP
   components had to get it right
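
As a quick check of how per-term recall compounds, the sketch below multiplies the three relation recalls annotated on the example query (.65, .09, .97). The function name and the default coreference recall of 1.0 are assumptions made for illustration; the slides do not give a coreference recall value.

```python
import math

# Per-relation recall values as annotated on the example query.
term_recall = {
    "mric:mediatingAgent": 0.65,
    "mric:eventLocation": 0.09,
    "mric:eventDate": 0.97,
}


def query_recall(recalls, coref_recall=1.0):
    """Product of the per-term recalls, times the coreference recall.

    coref_recall defaults to 1.0 here, which is an assumption, not a reported value.
    """
    return math.prod(recalls) * coref_recall


overall = query_recall(term_recall.values())
print(round(overall, 3))  # 0.057
```

A single weak term (here eventLocation at .09) is enough to pull the whole conjunctive query below .1.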
Hypothesis Generation
•  For queries of size N
   –  For each term H
       •  relax the query by removing H
       •  for each solution of the relaxed query
           –  bind the variables in H from the solution, forming a hypothesis

   –  If no solutions for size N-1 are found, then try for N-2



•  appropriate for queries that are almost answerable,
   e.g. missing one of the terms
•  biased towards generating more answers to queries,
   i.e. performs poorly on queries for which the corpus
   does not contain the answer
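
The relaxation loop above can be sketched over a toy set of triples. This is a minimal illustration under stated assumptions, not the authors' system: the `match` and `hypotheses` helpers, the tuple encoding of query terms, and the four graph triples are hypothetical (the identifiers mric:org-16 and mric:event-3 are borrowed from the example that follows in the slides).

```python
from itertools import combinations


def is_var(term):
    return term.startswith("?")


def match(patterns, graph, binding=None):
    """Yield every variable binding under which all patterns occur in graph."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for p, v in zip(first, triple):
            if is_var(p):
                if new.setdefault(p, v) != v:  # variable already bound elsewhere
                    ok = False
                    break
            elif p != v:  # constant does not match
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)


def hypotheses(query, graph, max_removed=2):
    """Drop k terms (k = 1, then 2); bind the dropped terms from each solution."""
    for k in range(1, max_removed + 1):
        found = set()
        for dropped in combinations(query, k):
            relaxed = [t for t in query if t not in dropped]
            for b in match(relaxed, graph):
                hyp = tuple(tuple(b.get(x, x) for x in term) for term in dropped)
                # In this sketch only fully bound triples count as hypotheses.
                if not any(is_var(x) for term in hyp for x in term):
                    found.add(hyp)
        if found:
            return found
    return set()


# Toy primary graph (assumed): everything is extracted except the event date.
graph = {
    ("mric:org-16", "rdf:type", "mric:TerroristOrganization"),
    ("mric:event-3", "rdf:type", "mric:Bombing"),
    ("mric:event-3", "mric:mediatingAgent", "mric:org-16"),
    ("mric:event-3", "mric:eventLocation", "mric:Lebanon"),
}
query = [
    ("?t", "rdf:type", "mric:TerroristOrganization"),
    ("?b", "rdf:type", "mric:Bombing"),
    ("?b", "mric:mediatingAgent", "?t"),
    ("?b", "mric:eventLocation", "mric:Lebanon"),
    ("?b", "mric:eventDate", "1983-10-23"),
]
print(hypotheses(query, graph))
```

On this toy graph the full query has no solution, and the only relaxation of size N-1 that succeeds is the one that drops the eventDate term, so the single hypothesis is that mric:event-3 took place on 1983-10-23.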
    SELECT ?t
    WHERE {
        ?t rdf:type mric:TerroristOrganization .
        ?b rdf:type mric:Bombing .
        ?b mric:mediatingAgent ?t .
        ?b mric:eventLocation mric:Lebanon .
        ?b mric:eventDate "1983-10-23" .
    }

[Figure: the query drawn as a graph. Nodes t and b have rdf:type edges to mric:TerroristOrganization and mric:Bombing; the other edge labels are mric:mediatingAgent (between b and t), mric:eventLocation (to mric:Lebanon), and mric:eventDate (to 1983-10-23).]

find all bombings by terrorist orgs in Lebanon
(hypothesize that the bombings were on 1983-10-23)
find all bombings by terrorist orgs in Lebanon

[Figure: a subgraph containing mric:org-16 and mric:event-3 is marked "This subgraph matches the relaxed query"; an mric:eventDate edge to 1983-10-23 is shown as the hypothesis.]

hypothesize that event-3 was on 1983-10-23
Hypothesis Validation
•  Once generated, a hypothesis must be validated
   –  gather evidence that it is true
   –  the probability of a triple being true increases when it
      completes a partially satisfied query


•  We utilize a stack of hypothesis checkers that provide
   –  confidence that a hypothesis holds
   –  provenance: a pointer to a span of text that supports it

•  Can be used to bound complex computational tasks
   –  e.g. formal reasoning/choosing between low-confidence extractions
   –  such tasks are made more tractable by using hypotheses as
      goals, e.g. a reasoner may be used effectively by constraining it to
      only the part of the graph connected to a hypothesis
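
One way to picture a stack of checkers that each return a confidence and a provenance pointer is sketched below. The `Evidence` type, the co-occurrence checker, the max-confidence combination rule, and the sample sentence are all assumptions for illustration; the slides do not say how the checkers are implemented or combined.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Evidence:
    confidence: float  # how strongly the checker supports the hypothesis
    provenance: str    # the span of text offered as support


Checker = Callable[[Triple], Optional[Evidence]]


def make_cooccurrence_checker(sentences: List[str], confidence: float) -> Checker:
    """Toy checker: supports a triple if subject and object share a sentence."""
    def check(triple: Triple) -> Optional[Evidence]:
        subj, _, obj = triple
        for s in sentences:
            if subj in s and obj in s:
                return Evidence(confidence, s)
        return None
    return check


def validate(hypothesis: Triple, checkers: List[Checker]) -> Optional[Evidence]:
    """Return the highest-confidence evidence from any checker, or None."""
    found = [ev for ev in (c(hypothesis) for c in checkers) if ev is not None]
    return max(found, key=lambda ev: ev.confidence, default=None)


# Made-up corpus sentence, for illustration only.
corpus = ["Mr. X is a citizen of India."]
checkers = [make_cooccurrence_checker(corpus, 0.3)]
print(validate(("Mr. X", "citizenOf", "India"), checkers))
print(validate(("Mr. X", "citizenOf", "Iraq"), checkers))
```

Because each checker only ever sees one hypothesis, an expensive checker (a reasoner, say) can restrict itself to the part of the graph around that triple.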
Secondary Graph for Validation
•  Hypotheses can be validated by looking for the
   tuple in the secondary graph
   •  a tuple will appear in SG if the subject and object entities
      occur in the same sentence somewhere in the corpus



•  With precision at .02, it is important to find a
   productive threshold for accepting hypotheses
   •  we conducted several experiments to find this threshold
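
Validation against the secondary graph then amounts to a thresholded lookup. The sketch below assumes the secondary graph is available as a mapping from triples to RelEx scores; the mapping, the scores, and the `accept` helper are hypothetical, and the default of .01 is the SG threshold reported as best on dev later in the slides.

```python
# Hypothetical secondary graph: candidate triple -> RelEx score (made-up values).
secondary = {
    ("mric:event-3", "mric:eventDate", "1983-10-23"): 0.04,
    ("mric:event-3", "mric:eventDate", "1983-10-24"): 0.004,
}


def accept(hypothesis, secondary_graph, sg_threshold=0.01):
    """Accept a hypothesis if the secondary graph holds it at or above the threshold."""
    score = secondary_graph.get(hypothesis)
    return score is not None and score >= sg_threshold


print(accept(("mric:event-3", "mric:eventDate", "1983-10-23"), secondary))
print(accept(("mric:event-3", "mric:eventDate", "1983-10-24"), secondary))
```

The threshold can be far lower than the one used for the primary graph because only triples that already complete a relaxed query are ever looked up.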
Experiments
•  3 for dev, 3 for test


•  each experiment compares query results from
   only PG to query results using the PG+SG for
   hypothesis validation


•  the three experiments compare performance
   at different primary graph thresholds
0-threshold primary graph with & without secondary graph
secondary graph: all@0

[Figure: plot; for a given PG threshold we vary the SG threshold for validated hypotheses (x-axis)]
.1-threshold primary graph with & without secondary graph
secondary graph: all@0

[Figure: plot; best performance point at .01 SG threshold. The red line indicates the PG threshold; the PG-only curve flattens below this threshold, as expected]
.2-threshold primary graph with & without secondary graph
secondary graph: all@0

[Figure: plot; best performance point at .01 SG threshold]

if a triple in the SG completes a query that is mostly answered by the PG,
it is very likely to be true

the best performing configuration for dev is .2 threshold PG with SG
hypotheses validated at .01 threshold
Performance




[Figure: performance plot]

at the chosen threshold, the approach significantly outperforms
the baseline on the same test set
Conclusions
•  the secondary graph can be exploited for getting answers

•  the probability that a relation is true between two entities
   increases significantly when that relation completes a query
   answer that is partially satisfied in the primary graph

•  able to target discarded interpretations when they will meet
   some user need

•  the NLP process is not a one-shot deal: the query provides
   context for what the user is seeking and thus an opportunity to
   re-interpret the text

Query-Driven Hypothesis Generation for Answering Queries over NLP Graphs, by Chris Welty, Ken Barker, Lora Aroyo, Shilpa Arora

  • 1. “30 are better than one” Query-Driven Hypothesis Generation for Answering Queries over NLP Graphs Chris Welty, Ken Barker, Lora Aroyo, Shilpa Arora Tex Answering Conjunctive SPARQL Queries over NLP Graphs
  • 2. Approach the NLP process is not a one-shot deal to decrease the cost of maintaining critical system DBs the query provides context without changing the LSW can we replace the human for what the user is seeking andcan we build a machine re-interpret the text thus an opportunity to reader for this NLP NLP Stack Graphs query re-interpret Answering Conjunctive SPARQL Queries over NLP Graphs
  • 3. NLP Stack •  Contains NER, CoRef, RelEx, entity disambiguation •  RelEx: SVM learner with output score: probabilities/ confidences for each known relation that the sentence expresses it between each pair of mentions •  Run over target corpus producing NLP graph •  nodes are entities (clusters of mentions produced by coref) •  edges are type statements between entities and classes in the ontology, or relations detected between mentions of these entities in the corpus Answering Conjunctive SPARQL Queries over NLP Graphs
  • 4. NLP Graph citizenOf Person Country citizenOf …  Mr.  X  of  India  …   coref …  in  places  like  India,  Iraq,  …   citizenOf GPE Country subPlace
  • 5. NLP Graph citizenOf Person Country Mr.  X   India   coref India   Iraq   GPE Country subPlace
  • 6. NLP Graph Mr. X rdf:type rdf:type citizenOf India Country India Person GPE rdf:type subPlaceOf rdf:type Iraq rdf:subClassOf
  • 7. Relation Extraction by RelEx •  RelEx: a set of SVM binary classifiers, one per relation •  for each sentence in the corpus, •  for each pair of mentions in that sentence, •  for each known relation •  produce a probability that that pair is related by the relation •  NLP graphs are generated by selecting relations from RelEx output in two ways: •  Primary: takes only the top scoring relation between any mention pair above a confidence threshold •  Secondary: takes all relations between all mention pairs above a threshold Answering Conjunctive SPARQL Queries over NLP Graphs
  • 8. RelEx Secondary Graph Mr. X rdf:type rdf:type causes locatedIn subPlaceOf citizenOf India Country India Person GPE rdf:type subPlaceOf rdf:type Iraq rdf:subClassOf
  • 9. Primary vs. Secondary P R F Primary @ 0.1 0.19 0.39 0.26 Primary @ 0.2 0.29 0.33 0.30 Secondary @ 0 0.01 0.95 0.02 Recall of max-F configuration
  • 10. Conjunctive Queries find all terrorist organizations that were agents of bombings in Lebanon on October 23, 1983:   SELECT  ?t   WHERE  {   ?t  rdf:type  mric:TerroristOrganization  .   ?b  rdf:type  mric:Bombing  .   R  =  .65   ?b  mric:mediatingAgent  ?t  .   R  =  .09   ?b  mric:eventLocation  mric:Lebanon  .   R  =  .97   ?b  mric:eventDate  "1983-­‐10-­‐23"  .      }   R  =  .057    
  • 11. Problem with Conjunctive Queries •  $\left[\prod_{k=1}^{n} \mathrm{Recall}(R_k)\right] \times \mathrm{Recall}_{\mathrm{coref}}$ •  Recall for an n-term query is $O(\mathrm{Recall}^n)$ •  for complex queries Recall becomes the dominating factor •  in our experiments: query recall < .1 for n > 3 •  To get any particular correct answer, all NLP components had to get it right
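A small worked example of the product formula, as a sketch. The function name `query_recall` and the default coref recall of 1.0 are assumptions for illustration; the three per-term values are the first three recall figures shown on slide 10.

```python
import math

def query_recall(term_recalls, coref_recall=1.0):
    """Recall of a conjunctive query: product of per-term recalls times coref recall.

    coref_recall defaults to 1.0 here only because the slides give no value for it.
    """
    return math.prod(term_recalls) * coref_recall

# The three per-term recall values shown on slide 10.
slide_10 = query_recall([0.65, 0.09, 0.97])

# Exponential decay: four terms at recall 0.5 each already fall below 0.1.
four_terms = query_recall([0.5] * 4)
```

Here `slide_10` rounds to .057, the fourth recall figure on slide 10, and `four_terms` is 0.0625, which is consistent with the observation that query recall drops below .1 once n exceeds 3.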
  • 12. Hypothesis Generation •  For queries of size N –  For each term H •  relax the query by removing the term H •  for each solution of the relaxed query –  bind the variables in H from the solution, forming a hypothesis –  If no solutions for size N-1 are found, then try for N-2 •  appropriate for queries that are almost answerable, e.g. missing one of the terms •  biased towards generating more answers to queries, and so performs poorly on queries for which the corpus does not contain the answer
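The relaxation loop above can be sketched over a toy triple set. Everything here is an assumption made for illustration: triples are plain Python tuples, variables are strings starting with "?", the matcher is a naive backtracking search, only the N-1 relaxation level is shown (the N-2 fallback is omitted), and removed terms whose variables stay unbound are skipped.

```python
def is_var(x):
    return isinstance(x, str) and x.startswith("?")

def match(terms, graph, binding=None):
    """Yield every variable binding under which all terms occur in graph."""
    binding = binding or {}
    if not terms:
        yield dict(binding)
        return
    first, rest = terms[0], terms[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for q, g in zip(first, triple):
            q = new.get(q, q) if is_var(q) else q  # substitute if already bound
            if is_var(q):
                new[q] = g
            elif q != g:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)

def generate_hypotheses(terms, graph):
    """Remove each term H in turn; bind H's variables from each relaxed solution."""
    hypotheses = set()
    for i, removed in enumerate(terms):
        relaxed = terms[:i] + terms[i + 1:]
        for solution in match(relaxed, graph):
            hyp = tuple(solution.get(x, x) for x in removed)
            if not any(is_var(x) for x in hyp):  # assumption: skip unbound terms
                hypotheses.add(hyp)
    return hypotheses

# Toy primary graph: everything except the event date was extracted.
graph = {
    ("org-16", "rdf:type", "TerroristOrganization"),
    ("event-3", "rdf:type", "Bombing"),
    ("event-3", "mediatingAgent", "org-16"),
    ("event-3", "eventLocation", "Lebanon"),
}
query = [
    ("?t", "rdf:type", "TerroristOrganization"),
    ("?b", "rdf:type", "Bombing"),
    ("?b", "mediatingAgent", "?t"),
    ("?b", "eventLocation", "Lebanon"),
    ("?b", "eventDate", "1983-10-23"),
]
hyps = generate_hypotheses(query, graph)
```

On this toy graph the full query has no solution, and the only relaxation that succeeds is the one that drops the eventDate term, so the single hypothesis produced is that event-3 took place on 1983-10-23.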
  • 13. [Figure: the query drawn as a graph; nodes t, b, mric:TerroristOrganization, mric:bombing, mric:Lebanon and 1983-10-23; edge labels rdf:type, mric:mediatingAgent, mric:eventLocation and mric:eventDate] SELECT ?t WHERE { ?t rdf:type mric:TerroristOrganization . ?b rdf:type mric:Bombing . ?b mric:mediatingAgent ?t . ?b mric:eventLocation mric:Lebanon . ?b mric:eventDate "1983-10-23" . } find all bombings by terrorist orgs in Lebanon (hypothesize that the bombings were on 1983-10-23)
  • 14. This subgraph matches the relaxed query (find all bombings by terrorist orgs in Lebanon): mric:org-16, mric:event-3 •  hypothesize that event-3 was on 1983-10-23, i.e. mric:event-3 mric:eventDate 1983-10-23
  • 15. Hypothesis Validation •  Once generated, a hypothesis must be validated –  gather evidence that it is true –  the probability of a triple being true increases •  We utilize a stack of hypothesis checkers that provide –  confidence that a hypothesis holds –  provenance: a pointer to a span of text that supports it •  Can be used to bound complex computational tasks –  e.g. formal reasoning or choosing between low-confidence extractions –  such tasks are made more tractable by using hypotheses as goals, e.g. a reasoner may be used effectively by constraining it to only the part of the graph connected to a hypothesis
  • 16. Secondary Graph for Validation •  Hypotheses can be validated by looking for the tuple in the secondary graph •  a tuple will appear in SG if the subject and object entities occur in the same sentence somewhere in the corpus •  With precision at .02, it is important to find a productive threshold for accepting hypotheses •  we conducted several experiments to find this threshold
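A minimal sketch of this validation step, assuming the secondary graph is available as a mapping from triples to RelEx scores. The dictionary, its scores, and the function name `validate` are hypothetical; the .01 threshold is the value reported as the best performing point on the later slides.

```python
def validate(hypotheses, secondary_scores, sg_threshold):
    """Accept a hypothesis if the secondary graph scores its triple above sg_threshold."""
    return {h for h in hypotheses if secondary_scores.get(h, 0.0) > sg_threshold}

# Hypothetical secondary-graph scores (invented for illustration).
secondary_scores = {
    ("event-3", "eventDate", "1983-10-23"): 0.03,
    ("event-7", "eventDate", "1983-10-23"): 0.004,
}
hypotheses = {
    ("event-3", "eventDate", "1983-10-23"),
    ("event-7", "eventDate", "1983-10-23"),
    ("event-9", "eventDate", "1983-10-23"),  # never seen in the secondary graph
}
accepted = validate(hypotheses, secondary_scores, sg_threshold=0.01)
```

Only the first hypothesis is accepted in this example: the second falls below the threshold, and the third has no supporting triple in the secondary graph at all.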
  • 17. Experiments •  3 for dev, 3 for test •  each experiment compares query results from only PG to query results using the PG+SG for hypothesis validation •  the three experiments compare performance at different primary graph thresholds
  • 18. 0-threshold primary graph, with & without secondary graph •  secondary graph: all@0 •  for a given PG threshold we vary the SG threshold for validated hypotheses (x-axis)
  • 19. .1-threshold primary graph, with & without secondary graph •  secondary graph: all@0 •  best performance point (.01 SG threshold) •  red line indicates the PG threshold; the PG-only curve flattens below this threshold, as expected
  • 20. .2-threshold primary graph, with & without secondary graph •  secondary graph: all@0 •  best performance point (.01 SG threshold) •  if a triple in the SG completes a query that is mostly answered by the PG, it is very likely to be true •  the best performing configuration for dev is .2 threshold PG with SG hypotheses validated at .01 threshold
  • 21. Performance •  at the chosen threshold, the approach significantly outperforms the baseline on the same test set
  • 22. Conclusions •  the secondary graph can be exploited for getting answers •  the probability that a relation is true between two entities increases significantly when that relation completes a query answer that is partially satisfied in the primary graph •  able to target discarded interpretations when they will meet some user need •  the NLP process is not a one-shot deal: the query provides context for what the user is seeking, and thus an opportunity to re-interpret the text