SlideShare a Scribd company logo
1 of 37
Biperpedia
AN ONTOLOGY FOR SEARCH APPLICATIONS
1
Present By
Dipen Shah
110420107064
Harsh Kevadia
110420107049
Nancy Sukhadia
110420107025
2
Introduction
 Search engines make significant efforts to recognize queries that
can be answered by structured data and invest heavily in creating
and maintaining high-precision databases.
 While these databases have a relatively wide coverage of entities,
the number of attributes they model (e.g., GDP, CAPITAL, ANTHEM)
is relatively small.
3
Introduction (Cont.)
 We describe Biperpedia, an ontology with 1.6M (class, attribute)
pairs and 67K distinct attribute names.
 Biperpedia extracts attributes from the query stream, and then uses
the best extractions to seed attribute extraction from text.
 For every attribute Biperpedia saves a set of synonyms and text
patterns in which it appears, thereby enabling it to recognize the
attribute in more contexts.
 In addition to a detailed analysis of the quality of Biperpedia, we
show that it can increase the number of Web tables whose
semantics we can recover by more than a factor of 4 compared
with Freebase(FREEBASE.COM).
4
Introduction (Cont.) 5
Introduction (Cont.)
 We describe Biperpedia, an ontology of binary attributes that
contains up to two orders of magnitude more attributes than
Freebase.
 An attribute in Biperpedia (see Figure 1) is a relationship between a
pair of entities (e.g., CAPITAL of countries), between an entity and a
value (e.g., COFFEE PRODUCTION), or between an entity and a
narrative (e.g., CULTURE).
 Biperpedia is concerned with attributes at the schema level.
 Extracting actual values for these attributes is a subject of a future
effort.
 Biperpedia is a best-effort ontology in the sense that not all the
attributes it contains are meaningful.
6
Introduction (Cont.)
 Biperpedia includes a set of constructs that facilitates query and
text understanding.
 In particular, Biperpedia attaches to every attribute a set of
common misspellings of the attribute, its synonyms (some which may
be approximate), other related attributes (even if the specific
relationship is not known), and common text phrases that mention
the attribute.
7
Agenda
 Section 2 : defines our problem Setting
 Section 3 : describes the architecture of Biperpedia.
 Section 4 : describes how we extract attributes from the query Stream
 Section 5 : describes how we extract additional attributes from text.
 Section 6 : describes how we merge the attribute extractions and enhance the
ontology with synonyms.
 Section 7 : evaluates the attribute quality.
 Section 8 : describes an algorithm for placing attributes in the hierarchy.
 Section 9 : describes how we use Biperpedia to improve our interpretation of
Web tables.
 Section 10 : describes related work
 Section 11 : concludes.
8
Problem Definition
 The goal of Biperpedia is to find schema-level attributes that can be
associated with classes of entities.
 For example, we want to discover CAPITAL, GDP(Gross domestic
product), LANGUAGES SPOKEN, and HISTORY as attributes of
COUNTRIES.
 Biperpedia is not concerned with the values of the attributes. That is,
we are not trying to find the specific GDP of a given country.
9
It Solve The Problem In Following
Steps:
 Name, domain class, and range:
 Synonyms and misspellings:
 Related attributes and mentions:
 Provenance:
 Differences from a traditional ontology:
 Evaluation:
10
The Biperpedia System
 The Biperpedia extraction pipeline is shown in Figure 2. At a high
level, the pipeline has two phases.
 In the first phase, we extract attribute candidates from multiple data
sources, and in the second phase we merge the extractions and
enhance the ontology by finding synonyms, related attributes, and
the best classes for attributes.
 The pipeline is implemented as a FlumeJava pipeline .(FlumeJava is
one type of java library)
11
Biperpedia Extraction Pipeline 12
Query Stream Extraction
 Find Candidate Attribute
 Reconcile to Freebase
 InstanceCount(C,A)
 QueryCount(C,A)
 Remove co-reference mentions
 Output attribute candidates
13
Extraction From Web Text
 Noun and Verb (Concept)
 Extraction via distant supervision
 Attribute classification
14
Extraction Via Distant Supervision
 Figure shows the yield of the top induced extraction patterns.
Although we induce more than 2500 patterns, we see that the top-
200 patterns account for more than 99% of the extractions.
15
Separation By Attribute Type 16
Attribute Classification
 Example
 min
𝑊 𝑖 𝑊 𝑇. 𝐹 𝑥𝑖, 𝑦𝑖 + 𝜆1 𝑊 + 𝜆2| 𝑊 |2
17
Synonym Detection
 For spell correction, we rely on the search engine. Given an
attribute A of a class C, we examine the spell corrections that the
search engine would propose for the query “C A”.
 If one of the corrections is an attribute A’ of C, then we deem A to
be a misspelling of A’.
 For example, given the attribute WRITTER of class BOOKS, the search
engine will propose that books writer is a spell correction of books
writter.
18
Attribute Quality
 DBPedia
 DBpedia is a crowd-sourced community effort to extract structured
information from Wikipedia and make this information available on the
Web.
 Experimental setting
 Overall quality
19
Experimental Setting 20
Overall Quality
 3 evaluators to determine whether an attribute is good or bad for
this class.
1. Rank by Query
2. Rank by Text
3. Precision (specifies the fraction of attributes that were labelled as good)
21
Finding The Best Class
 Biperpedia attaches attribute to every class in hierarchy.
 For more modular ontology or attribute that can contribute to
freebase, best class need to be found.
22
Example 23
Placement Algorithm
 How can we decide which can be best class for the attribute?
 The algorithm traverses, in a bottom up fashion, each tree of classes
for which A has been marked as relevant
 Equation:-
Squery(C, A) = InstanceCount(C, A)
Max A*{InstanceCount(C, A*)}
 Support(S) is the ratio between the number of instances of C that
have A and the maximal number of instances for any attribute of C.
24
(contd..)
 Which one to choose When there are several siblings with sufficient
support.
 Diversity Measure for the sibling.
𝐷(𝐶1, … , 𝐶 𝑛, 𝐴) =
1
𝑛−1 𝑖=1
𝑛
max
𝑗=1,..,𝑛
𝑆 𝐶 𝑗 ,𝐴 −𝑆(𝐶 𝑗,𝐴)
max
𝑗=1,..,𝑛
𝑆 𝐶 𝑗 ,𝐴
0 𝑓𝑜𝑟 𝑛 = 1
n>1,
25
Algorithm 26
Evaluation
 We can Check whether the assignment of the attribute is exact or
not.
Precision Measures:
 Mexact: ratio of number of exact assignments to all assignments.
 Mapprox: ratio of number of approximate assignments to all
assignments. Note that an approximate assignment is still valuable
because a human curator would only have to consider a small
neighbourhood of classes to find the exact match.
27
Results
 Best Result when Θ = 0.9
 Algorithm outperforms by more than 50%.
28
Interpreting WEB TABLES
 Biperpedia is useful if it can improve search applications.
 There are millions of high-quality HTML tables on the Web with very
diverse content.
 One of the major challenges with Web tables is to understand the
attributes that are represented in the tables.
29
Mapping Algorithm 30
Interpretation Quality
 The Representative column shows the number of tables for which at
least one correct representative attribute was found.
 The Overall (P/R) column shows the average precision/recall over all
mappings.
 The Avg. P/R per table columns compute the precision/recall per
table and then averages over all the tables.
31
Comparison with Freebase
 The first set of columns shows the number of mappings to Biperpedia
attributes, the number that were mapped to Freebase attributes,
and the ratio between them.
 The second set of columns show these numbers for mappings to
representative attributes.
32
Error Analysis
 Noisy token in the surrounding text and page title
 Incorrect string matching against column headers
 Table is too specific.
 Not enough information.
 Evaluator Disagreement.
 Biperpedia too small.
33
(Contd..) 34
Conclusion
Biperpedia, an ontology search application that extends Freebase
from query stream and Web text. It enables interpreting over a factor
of 4 more Web tables than is possible with Freebase. This algorithm can
be applied to any query stream with possibly different results.
35
References
 M. D. Adelfio and H. Samet. Schema extraction for tabular data on
the web. PVLDB, 2013.
 S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. G.
Ives. Dbpedia: A nucleus for a web of open data.
 M. J. Cafarella, A. Y. Halevy, D. Z. Wang, E. Wu, and Y. Zhang.
Webtables: exploring the power of tables on the web.
 A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka, and T. M.
Mitchell. Toward an architecture for never-ending language
learning.
 A. Doan, A. Y. Halevy, and Z. G. Ives. Principles of Data Integration.
Morgan Kaufmann, 2012.
36
Thank You
Q/A!
37

More Related Content

What's hot

Ap Power Point Chpt5
Ap Power Point Chpt5Ap Power Point Chpt5
Ap Power Point Chpt5dplunkett
 
Introduction to database-ER Model
Introduction to database-ER ModelIntroduction to database-ER Model
Introduction to database-ER ModelAjit Nayak
 
Data Types - Premetive and Non Premetive
Data Types - Premetive and Non Premetive Data Types - Premetive and Non Premetive
Data Types - Premetive and Non Premetive Raj Naik
 
Introduction To R Language
Introduction To R LanguageIntroduction To R Language
Introduction To R LanguageGaurang Dobariya
 
Ranking Objects by Exploiting Relationships: Computing Top-K over Aggregation
Ranking Objects by Exploiting Relationships: Computing Top-K over AggregationRanking Objects by Exploiting Relationships: Computing Top-K over Aggregation
Ranking Objects by Exploiting Relationships: Computing Top-K over AggregationJason Yang
 
Triple-Triple RDF Store with Greedy Graph based Grouping
Triple-Triple RDF Store with Greedy Graph based GroupingTriple-Triple RDF Store with Greedy Graph based Grouping
Triple-Triple RDF Store with Greedy Graph based GroupingVinoth Chandar
 
Ch 1 intriductions
Ch 1 intriductionsCh 1 intriductions
Ch 1 intriductionsirshad17
 
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...An Efficient Annotation of Search Results Based on Feature Ranking Approach f...
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...Computer Science Journals
 
Ap Power Point Chpt3
Ap Power Point Chpt3Ap Power Point Chpt3
Ap Power Point Chpt3dplunkett
 
Python Collections Tutorial | Edureka
Python Collections Tutorial | EdurekaPython Collections Tutorial | Edureka
Python Collections Tutorial | EdurekaEdureka!
 
Data Structure
Data Structure Data Structure
Data Structure Ibrahim MH
 
data structures and algorithms Unit 3
data structures and algorithms Unit 3data structures and algorithms Unit 3
data structures and algorithms Unit 3infanciaj
 

What's hot (20)

Ap Power Point Chpt5
Ap Power Point Chpt5Ap Power Point Chpt5
Ap Power Point Chpt5
 
C programming
C programmingC programming
C programming
 
Data structure
Data structureData structure
Data structure
 
Introduction to database-ER Model
Introduction to database-ER ModelIntroduction to database-ER Model
Introduction to database-ER Model
 
Lecture-05-DSA
Lecture-05-DSALecture-05-DSA
Lecture-05-DSA
 
Arrays in c_language
Arrays in c_language Arrays in c_language
Arrays in c_language
 
Data Types - Premetive and Non Premetive
Data Types - Premetive and Non Premetive Data Types - Premetive and Non Premetive
Data Types - Premetive and Non Premetive
 
Introduction To R Language
Introduction To R LanguageIntroduction To R Language
Introduction To R Language
 
Chapter 14
Chapter 14Chapter 14
Chapter 14
 
Ranking Objects by Exploiting Relationships: Computing Top-K over Aggregation
Ranking Objects by Exploiting Relationships: Computing Top-K over AggregationRanking Objects by Exploiting Relationships: Computing Top-K over Aggregation
Ranking Objects by Exploiting Relationships: Computing Top-K over Aggregation
 
Triple-Triple RDF Store with Greedy Graph based Grouping
Triple-Triple RDF Store with Greedy Graph based GroupingTriple-Triple RDF Store with Greedy Graph based Grouping
Triple-Triple RDF Store with Greedy Graph based Grouping
 
Ch 1 intriductions
Ch 1 intriductionsCh 1 intriductions
Ch 1 intriductions
 
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...An Efficient Annotation of Search Results Based on Feature Ranking Approach f...
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...
 
Data Structure
Data StructureData Structure
Data Structure
 
Ap Power Point Chpt3
Ap Power Point Chpt3Ap Power Point Chpt3
Ap Power Point Chpt3
 
Python Collections Tutorial | Edureka
Python Collections Tutorial | EdurekaPython Collections Tutorial | Edureka
Python Collections Tutorial | Edureka
 
Data types in python
Data types in pythonData types in python
Data types in python
 
Data Structure
Data Structure Data Structure
Data Structure
 
Java -lec-6
Java -lec-6Java -lec-6
Java -lec-6
 
data structures and algorithms Unit 3
data structures and algorithms Unit 3data structures and algorithms Unit 3
data structures and algorithms Unit 3
 

Similar to Biperpedia: An ontology of Search Application

International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
Mining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software ArtifactsMining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software ArtifactsPreetha Chatterjee
 
[Research] protocols and structures for inference a res tful api for machine...
[Research] protocols and structures for inference  a res tful api for machine...[Research] protocols and structures for inference  a res tful api for machine...
[Research] protocols and structures for inference a res tful api for machine...PAPIs.io
 
Extraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity MiningExtraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity Miningiosrjce
 
20100810
2010081020100810
20100810guanqoo
 
DATA BASE MANAGEMENT SYSTEM - SHORT NOTES
DATA BASE MANAGEMENT SYSTEM - SHORT NOTESDATA BASE MANAGEMENT SYSTEM - SHORT NOTES
DATA BASE MANAGEMENT SYSTEM - SHORT NOTESsuthi
 
Rozalia alik math3 (latest)
Rozalia alik  math3 (latest)Rozalia alik  math3 (latest)
Rozalia alik math3 (latest)Rozalia Alik
 
An Introduction to the DCMI Abstract Model
An Introduction to the DCMI Abstract ModelAn Introduction to the DCMI Abstract Model
An Introduction to the DCMI Abstract ModelEduserv Foundation
 
OOP, Networking, Linux/Unix
OOP, Networking, Linux/UnixOOP, Networking, Linux/Unix
OOP, Networking, Linux/UnixNovita Sari
 
Bca3020– data base management system(dbms)
Bca3020– data base management system(dbms)Bca3020– data base management system(dbms)
Bca3020– data base management system(dbms)smumbahelp
 
Relational data model
Relational data modelRelational data model
Relational data modelSURBHI SAROHA
 
A Recommender System for Refining Ekeko/X Transformation
A Recommender System for Refining Ekeko/X TransformationA Recommender System for Refining Ekeko/X Transformation
A Recommender System for Refining Ekeko/X TransformationCoen De Roover
 
Rozalia alik math3 (newone)
Rozalia alik  math3 (newone)Rozalia alik  math3 (newone)
Rozalia alik math3 (newone)Rozalia Alik
 

Similar to Biperpedia: An ontology of Search Application (20)

ppt
pptppt
ppt
 
TEXT CLUSTERING.doc
TEXT CLUSTERING.docTEXT CLUSTERING.doc
TEXT CLUSTERING.doc
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
final
finalfinal
final
 
Mining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software ArtifactsMining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software Artifacts
 
[Research] protocols and structures for inference a res tful api for machine...
[Research] protocols and structures for inference  a res tful api for machine...[Research] protocols and structures for inference  a res tful api for machine...
[Research] protocols and structures for inference a res tful api for machine...
 
Extraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity MiningExtraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity Mining
 
E017252831
E017252831E017252831
E017252831
 
20100810
2010081020100810
20100810
 
DATA BASE MANAGEMENT SYSTEM - SHORT NOTES
DATA BASE MANAGEMENT SYSTEM - SHORT NOTESDATA BASE MANAGEMENT SYSTEM - SHORT NOTES
DATA BASE MANAGEMENT SYSTEM - SHORT NOTES
 
Rozalia alik math3 (latest)
Rozalia alik  math3 (latest)Rozalia alik  math3 (latest)
Rozalia alik math3 (latest)
 
Generics collections
Generics collectionsGenerics collections
Generics collections
 
An Introduction to the DCMI Abstract Model
An Introduction to the DCMI Abstract ModelAn Introduction to the DCMI Abstract Model
An Introduction to the DCMI Abstract Model
 
OOP, Networking, Linux/Unix
OOP, Networking, Linux/UnixOOP, Networking, Linux/Unix
OOP, Networking, Linux/Unix
 
Bca3020– data base management system(dbms)
Bca3020– data base management system(dbms)Bca3020– data base management system(dbms)
Bca3020– data base management system(dbms)
 
Relational data model
Relational data modelRelational data model
Relational data model
 
A Recommender System for Refining Ekeko/X Transformation
A Recommender System for Refining Ekeko/X TransformationA Recommender System for Refining Ekeko/X Transformation
A Recommender System for Refining Ekeko/X Transformation
 
Lect1.pptx
Lect1.pptxLect1.pptx
Lect1.pptx
 
ch3
ch3ch3
ch3
 
Rozalia alik math3 (newone)
Rozalia alik  math3 (newone)Rozalia alik  math3 (newone)
Rozalia alik math3 (newone)
 

Recently uploaded

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 

Recently uploaded (20)

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

Biperpedia: An ontology of Search Application

  • 1. Biperpedia AN ONTOLOGY FOR SEARCH APPLICATIONS 1
  • 2. Present By Dipen Shah 110420107064 Harsh Kevadia 110420107049 Nancy Sukhadia 110420107025 2
  • 3. Introduction  Search engines make significant efforts to recognize queries that can be answered by structured data and invest heavily in creating and maintaining high-precision databases.  While these databases have a relatively wide coverage of entities, the number of attributes they model (e.g., GDP, CAPITAL, ANTHEM) is relatively small. 3
  • 4. Introduction (Cont.)  We describe Biperpedia, an ontology with 1.6M (class, attribute) pairs and 67K distinct attribute names.  Biperpedia extracts attributes from the query stream, and then uses the best extractions to seed attribute extraction from text.  For every attribute Biperpedia saves a set of synonyms and text patterns in which it appears, thereby enabling it to recognize the attribute in more contexts.  In addition to a detailed analysis of the quality of Biperpedia, we show that it can increase the number of Web tables whose semantics we can recover by more than a factor of 4 compared with Freebase(FREEBASE.COM). 4
  • 6. Introduction (Cont.)  We describe Biperpedia, an ontology of binary attributes that contains up to two orders of magnitude more attributes than Freebase.  An attribute in Biperpedia (see Figure 1) is a relationship between a pair of entities (e.g., CAPITAL of countries), between an entity and a value (e.g., COFFEE PRODUCTION), or between an entity and a narrative (e.g., CULTURE).  Biperpedia is concerned with attributes at the schema level.  Extracting actual values for these attributes is a subject of a future effort.  Biperpedia is a best-effort ontology in the sense that not all the attributes it contains are meaningful. 6
  • 7. Introduction (Cont.)  Biperpedia includes a set of constructs that facilitates query and text understanding.  In particular, Biperpedia attaches to every attribute a set of common misspellings of the attribute, its synonyms (some which may be approximate), other related attributes (even if the specific relationship is not known), and common text phrases that mention the attribute. 7
  • 8. Agenda  Section 2 : defines our problem Setting  Section 3 : describes the architecture of Biperpedia.  Section 4 : describes how we extract attributes from the query Stream  Section 5 : describes how we extract additional attributes from text.  Section 6 : describes how we merge the attribute extractions and enhance the ontology with synonyms.  Section 7 : evaluates the attribute quality.  Section 8 : describes an algorithm for placing attributes in the hierarchy.  Section 9 : describes how we use Biperpedia to improve our interpretation of Web tables.  Section 10 : describes related work  Section 11 : concludes. 8
  • 9. Problem Definition  The goal of Biperpedia is to find schema-level attributes that can be associated with classes of entities.  For example, we want to discover CAPITAL, GDP(Gross domestic product), LANGUAGES SPOKEN, and HISTORY as attributes of COUNTRIES.  Biperpedia is not concerned with the values of the attributes. That is, we are not trying to find the specific GDP of a given country. 9
  • 10. It Solve The Problem In Following Steps:  Name, domain class, and range:  Synonyms and misspellings:  Related attributes and mentions:  Provenance:  Differences from a traditional ontology:  Evaluation: 10
  • 11. The Biperpedia System  The Biperpedia extraction pipeline is shown in Figure 2. At a high level, the pipeline has two phases.  In the first phase, we extract attribute candidates from multiple data sources, and in the second phase we merge the extractions and enhance the ontology by finding synonyms, related attributes, and the best classes for attributes.  The pipeline is implemented as a FlumeJava pipeline .(FlumeJava is one type of java library) 11
  • 13. Query Stream Extraction  Find Candidate Attribute  Reconcile to Freebase  InstanceCount(C,A)  QueryCount(C,A)  Remove co-reference mentions  Output attribute candidates 13
  • 14. Extraction From Web Text  Noun and Verb (Concept)  Extraction via distant supervision  Attribute classification 14
  • 15. Extraction Via Distant Supervision  Figure shows the yield of the top induced extraction patterns. Although we induce more than 2500 patterns, we see that the top- 200 patterns account for more than 99% of the extractions. 15
  • 17. Attribute Classification  Example  min 𝑊 𝑖 𝑊 𝑇. 𝐹 𝑥𝑖, 𝑦𝑖 + 𝜆1 𝑊 + 𝜆2| 𝑊 |2 17
  • 18. Synonym Detection  For spell correction, we rely on the search engine. Given an attribute A of a class C, we examine the spell corrections that the search engine would propose for the query “C A”.  If one of the corrections is an attribute A’ of C, then we deem A to be a misspelling of A’.  For example, given the attribute WRITTER of class BOOKS, the search engine will propose that books writer is a spell correction of books writter. 18
  • 19. Attribute Quality  DBPedia  DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.  Experimental setting  Overall quality 19
  • 21. Overall Quality  3 evaluators to determine whether an attribute is good or bad for this class. 1. Rank by Query 2. Rank by Text 3. Precision (specifies the fraction of attributes that were labelled as good) 21
  • 22. Finding The Best Class  Biperpedia attaches attribute to every class in hierarchy.  For more modular ontology or attribute that can contribute to freebase, best class need to be found. 22
  • 24. Placement Algorithm  How can we decide which can be best class for the attribute?  The algorithm traverses, in a bottom up fashion, each tree of classes for which A has been marked as relevant  Equation:- Squery(C, A) = InstanceCount(C, A) Max A*{InstanceCount(C, A*)}  Support(S) is the ratio between the number of instances of C that have A and the maximal number of instances for any attribute of C. 24
  • 25. (contd..)  Which one to choose When there are several siblings with sufficient support.  Diversity Measure for the sibling. 𝐷(𝐶1, … , 𝐶 𝑛, 𝐴) = 1 𝑛−1 𝑖=1 𝑛 max 𝑗=1,..,𝑛 𝑆 𝐶 𝑗 ,𝐴 −𝑆(𝐶 𝑗,𝐴) max 𝑗=1,..,𝑛 𝑆 𝐶 𝑗 ,𝐴 0 𝑓𝑜𝑟 𝑛 = 1 n>1, 25
  • 27. Evaluation  We can Check whether the assignment of the attribute is exact or not. Precision Measures:  Mexact: ratio of number of exact assignments to all assignments.  Mapprox: ratio of number of approximate assignments to all assignments. Note that an approximate assignment is still valuable because a human curator would only have to consider a small neighbourhood of classes to find the exact match. 27
  • 28. Results  Best Result when Θ = 0.9  Algorithm outperforms by more than 50%. 28
  • 29. Interpreting WEB TABLES  Biperpedia is useful if it can improve search applications.  There are millions of high-quality HTML tables on the Web with very diverse content.  One of the major challenges with Web tables is to understand the attributes that are represented in the tables. 29
  • 31. Interpretation Quality  The Representative column shows the number of tables for which at least one correct representative attribute was found.  The Overall (P/R) column shows the average precision/recall over all mappings.  The Avg. P/R per table columns compute the precision/recall per table and then averages over all the tables. 31
  • 32. Comparison with Freebase  The first set of columns shows the number of mappings to Biperpedia attributes, the number that were mapped to Freebase attributes, and the ratio between them.  The second set of columns show these numbers for mappings to representative attributes. 32
  • 33. Error Analysis  Noisy token in the surrounding text and page title  Incorrect string matching against column headers  Table is too specific.  Not enough information.  Evaluator Disagreement.  Biperpedia too small. 33
  • 35. Conclusion Biperpedia, an ontology search application that extends Freebase from query stream and Web text. It enables interpreting over a factor of 4 more Web tables than is possible with Freebase. This algorithm can be applied to any query stream with possibly different results. 35
  • 36. References  M. D. Adelfio and H. Samet. Schema extraction for tabular data on the web. PVLDB, 2013.  S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. G. Ives. Dbpedia: A nucleus for a web of open data.  M. J. Cafarella, A. Y. Halevy, D. Z. Wang, E. Wu, and Y. Zhang. Webtables: exploring the power of tables on the web.  A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka, and T. M. Mitchell. Toward an architecture for never-ending language learning.  A. Doan, A. Y. Halevy, and Z. G. Ives. Principles of Data Integration. Morgan Kaufmann, 2012. 36