Semantic Search Engines
Published in: Technology, Education
  • Transcript

    • 1. Semantic Search Engines based on Data Integration Systems Chapter 13
    • 2. Agenda <ul><li>Semantic Search Engines : Motivation </li></ul><ul><li>Semantic Search Engines : Ingredients </li></ul><ul><li>The SEWASIE project </li></ul><ul><ul><li>Architecture of the SEWASIE system </li></ul></ul><ul><ul><li>Building the SEWASIE system ontology </li></ul></ul><ul><ul><li>Querying the SEWASIE system </li></ul></ul><ul><li>An architectural evolution of SEWASIE : WISDOM </li></ul><ul><li>Conclusion and Future Work </li></ul>
    • 3. Motivation <ul><li>Semantic Search Engines try to augment and improve traditional Web Search Engines by using not just words, but concepts and logical relationships. </li></ul><ul><li>Ingredients for developing Semantic Search Engines with good performance: </li></ul><ul><ul><li>Data Integration Systems , </li></ul></ul><ul><ul><li>Domain Ontologies , and </li></ul></ul><ul><ul><li>Peer-to-Peer architectures </li></ul></ul><ul><li>We will provide empirical evidence for our hypothesis: we will describe two projects, SEWASIE and WISDOM, which rely on these architectural features and developed key semantic search functionalities. </li></ul><ul><li>They both exploit the MOMIS data integration system. </li></ul>
    • 4. Ingredients for Semantic Search Engines (1) <ul><li>Data Integration Systems </li></ul><ul><ul><li>Data Integration : combining data residing at different autonomous sources and providing the user with a unified view of these data. </li></ul></ul><ul><ul><li>Data Integration Systems are characterized by a wrapper/mediator architecture based on a Global Virtual Schema ( Global Virtual View - GVV ) and a set of data sources: the data sources contain the real data, while the GVV provides a reconciled, integrated, and virtual view of the underlying sources. </li></ul></ul><ul><li>Domain Ontologies </li></ul><ul><ul><li>In the Semantic Web , data is associated with descriptions that have a formal semantics, defined in terms of ontologies. </li></ul></ul><ul><ul><li>Given a set of data sources related to a domain, data integration provides a GVV that is a conceptualization ( domain ontology ) describing the involved sources. </li></ul></ul>
    • 5. Ingredients for Semantic Search Engines (2) <ul><li>Peer-to-Peer architectures </li></ul><ul><ul><li>Schema based P2P networks : combine approaches from P2P as well as from the data integration and semantic web research areas. Such networks build upon peers that use metadata (ontologies) to describe their contents and semantic mappings among concepts of different peers’ ontologies. </li></ul></ul><ul><ul><li>Peer Data Management Systems : each node can be a data source, a mediator system, or both; a mediator node performs the semantic integration of a set of information sources to derive a global schema of the acquired information. </li></ul></ul><ul><ul><li>Super-peer networks : metadata for a small group of peers is centralized onto a single super-peer; a super-peer is a node that acts as a centralized server to a subset of clients. </li></ul></ul><ul><ul><li>Semantic overlay clustering approach : aims at creating logical layers above the physical network topology, by matching semantic information provided by peers to clusters of nodes. </li></ul></ul>
    • 6. MOMIS <ul><li>The MOMIS (Mediator envirOnment for Multiple Information Sources) is a framework to perform information extraction and integration from both structured and semistructured data sources. (www.dbgroup.unimo.it/Momis) </li></ul><ul><li>Information integration is performed in a semi-automatic way, by exploiting the knowledge in a Common Thesaurus and descriptions of source schemas with a combination of clustering techniques and Description Logics. </li></ul><ul><ul><li>An object-oriented language, with an underlying Description Logic, called ODLI3 , is introduced for information extraction </li></ul></ul><ul><ul><li>The integration process gives rise to a virtual integrated view of the underlying sources: it is thus possible to synthesize a domain ontology ( GVV ) of a set of data sources related to a domain. </li></ul></ul><ul><ul><li>MOMIS follows a Global-As-View (GAV) approach where the GVV and the mappings among the local sources and the GVV are defined in a semi-automatic way. </li></ul></ul>
    • 7. SEWASIE <ul><li>SEWASIE - SEmantic Webs and AgentS in Integrated Economies (www.sewasie.org) is a project funded by the EU on the Semantic Web action line (2002-2005) </li></ul><ul><li>In SEWASIE the schema-based and super-peer network approaches are combined into a schema-based super-peer network organized as a two-level architecture: </li></ul><ul><ul><li>Peer level : a peer contains a data integration system, which integrates heterogeneous data sources into an ontology composed of an annotated GVV and Mappings to the source schemas. </li></ul></ul><ul><ul><li>Super-peer level : a super-peer contains a data integration system, which integrates the GVVs of its peers into an ontology composed of a GVV of the peers' GVVs and Mappings to the GVVs of its peers. </li></ul></ul><ul><li>A novel approach for defining the ontology of the super-peer and querying the peer network is introduced. </li></ul><ul><li>The search engine fully exploits agent technology </li></ul>
    • 8. WISDOM <ul><li>WISDOM - Web Intelligent Search based on DOMain ontologies (www.dbgroup.unimo.it/wisdom) is an Italian MIUR-PRIN project (2004-2006) </li></ul><ul><li>WISDOM is based on an overlay network of semantic peers, where each peer contains a mediator-based integration system. Its key features are a distributed architecture based on the P2P paradigm and the adoption of domain ontologies . </li></ul><ul><li>Two levels of integration of information sources: </li></ul><ul><ul><li>Lower Level - Strong integration : a semantic peer contains a data integration system, which integrates heterogeneous data sources into a domain ontology composed of an annotated GVV and Mappings to the data source schemas. </li></ul></ul><ul><ul><li>Upper Level - Loose integration : a network of peers with semantic mappings among the ontologies of a set of semantic peers. When a query is posed against one given peer, it is suitably propagated towards other peers along the network of mappings. </li></ul></ul>
    • 9. Agenda <ul><li>Semantic Search Engines : Motivation </li></ul><ul><li>Semantic Search Engines : Ingredients </li></ul><ul><li>The SEWASIE project </li></ul><ul><ul><li>Architecture of the SEWASIE system </li></ul></ul><ul><ul><li>Building the SEWASIE system ontology </li></ul></ul><ul><ul><li>Querying the SEWASIE system </li></ul></ul><ul><li>An architectural evolution of SEWASIE : WISDOM </li></ul><ul><li>Conclusion and Future Work </li></ul>
    • 10. The SEWASIE architecture [Diagram: Query Tool Interface and OLAP Tool; Query Agents; Brokering Agents (BA), each with a BA Ontology; Monitoring Agent (MA); SINodes, each with an Ontology Builder, a Query Manager, an Ontology and a Metadata Repository, and wrappers over structured databases (RDBs), semi-structured data (XML) and unstructured text documents (HTML); all connected by the SEWASIE interconnection infrastructure]
    • 11. SEWASIE - Goal <ul><li>We propose a novel approach (implemented in SEWASIE) for querying a super-peer within a schema-based super-peer network focusing on querying a single BA </li></ul><ul><li>We have two different levels of mappings: </li></ul><ul><ul><li>The first mapping ( m1 ) is at the BA level and maps several GVVs of SINodes to the GVV of the BA; </li></ul></ul><ul><ul><li>the second mapping ( m2 ) is within an SINode and maps the data sources into the GVV of an SINode. </li></ul></ul><ul><li>Query answering can be carried out in terms of two reformulation steps </li></ul><ul><ul><li>Reformulation w.r.t. the BA ontology (mapping m1 ); </li></ul></ul><ul><ul><li>Reformulation w.r.t. the SINode ontology (mapping m2 ). </li></ul></ul>
    • 12. The two different levels of mapping
    • 13. The two-level data integration system <ul><li>An Integration System IS = (GVV,N,M) is constituted by: </li></ul><ul><ul><li>A GVV , which is a schema in ODLI3, with is-a relationships and both key and foreign key constraints. </li></ul></ul><ul><ul><li>A set N of local sources ; each local source has an ODLI3 schema. </li></ul></ul><ul><ul><li>A set M of GAV mapping assertions between GVV and N, where each assertion associates to an element g in GVV a query qN over the schemas of a set of local sources in N. </li></ul></ul><ul><ul><li>For each global class C of the GVV we define: </li></ul></ul><ul><ul><ul><li>a (possibly empty) set of local classes, denoted by L(C), belonging to the local sources in N . </li></ul></ul></ul><ul><ul><ul><li>a conjunctive query qN over L(C). </li></ul></ul></ul><ul><li>A SEWASIE system is constituted by: </li></ul><ul><ul><li>A set of SINodes SN = {SN1, SN2, . . . , SNn} , where each SINode is an IS = (GVV,N,M) , with N a set of data sources. </li></ul></ul><ul><ul><li>A Brokering Agent BA , which is an IS = (GVV,N,M) where N = SN , i.e., the sources of BA are the SINodes. </li></ul></ul>
    • 14. Integration System: Semantics <ul><li>SEWASIE: GVV contains integrity constraints, and sources are considered sound (but not necessarily complete). </li></ul><ul><ul><li>When the global schema contains integrity constraints, even of simple forms, the semantics of the data integration system is best described in terms of a set of databases, rather than a single one, and this implies that query processing is intimately connected to the notion of querying incomplete databases . </li></ul></ul><ul><li>Traditional data integration systems ( MOMIS ) follow one of the following strategies: they either express the global schema as a set of plain relations without integrity constraints, or they consider the sources as exact, as opposed to sound. </li></ul><ul><li>[Calvanese et al - KR2004] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. What to ask to a peer: Ontology-based query reformulation. KR 2004 </li></ul>
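The definition on slide 13 can be pictured as a small recursive data structure. The following is only a minimal sketch: the class name `IntegrationSystem`, its fields, and the source names (`rdb_a`, `xml_b`, and so on) are hypothetical illustrations, not part of MOMIS or SEWASIE.

```python
from dataclasses import dataclass, field


@dataclass
class IntegrationSystem:
    """IS = (GVV, N, M): global classes, local sources, GAV mapping assertions."""
    name: str
    gvv: set            # names of the global classes in the GVV
    sources: list       # N: source names, or nested IntegrationSystem objects
    mappings: dict = field(default_factory=dict)  # M: global class C -> L(C)

    def local_classes(self, global_class):
        # L(C) is allowed to be empty, as in the definition.
        return self.mappings.get(global_class, [])


# Hypothetical SINodes, each integrating one made-up data source.
sn1 = IntegrationSystem("SN1", {"company"}, ["rdb_a"], {"company": ["rdb_a.firm"]})
sn2 = IntegrationSystem("SN2", {"company"}, ["xml_b"], {"company": ["xml_b.company"]})

# The Brokering Agent is itself an IS whose sources are the SINodes (N = SN).
ba = IntegrationSystem(
    "BA", {"Company"}, [sn1, sn2], {"Company": ["SN1.company", "SN2.company"]}
)
print(ba.local_classes("Company"))
```

The point of the sketch is that the BA and the SINodes share one shape, which is why the two reformulation steps can be treated alike.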
    • 15. Agenda <ul><li>Semantic Search Engines : Motivation </li></ul><ul><li>Semantic Search Engines : Ingredients </li></ul><ul><li>The SEWASIE project </li></ul><ul><ul><li>Architecture of the SEWASIE system </li></ul></ul><ul><ul><li>Building the SEWASIE system ontology </li></ul></ul><ul><ul><li>Querying the SEWASIE system </li></ul></ul><ul><li>An architectural evolution of SEWASIE : WISDOM </li></ul><ul><li>Conclusion and Future Work </li></ul>
    • 16. Building the SEWASIE system ontology (GVV) <ul><li>MOMIS/SEWASIE allows the integration designer to build a GVV semi-automatically, starting from a set of local sources, by means of the Ontology Builder . The integration process exploits: </li></ul><ul><ul><li>Schema derived relationships </li></ul></ul><ul><ul><li>Lexicon derived relationships </li></ul></ul><ul><ul><li>Description Logics techniques for generating new relationships </li></ul></ul><ul><ul><li>Clustering techniques for grouping similar contents of local sources </li></ul></ul><ul><li>The process gives rise to a Mapping Table (MT) for each global class C of GVV, whose columns represent the local classes L(C) belonging to C and whose rows represent the global attributes of C. </li></ul><ul><ul><li>An element MT[GA][LC] represents the set of local attributes of LC which are mapped onto the global attribute GA. </li></ul></ul>
    • 17. Overview of the GVV-generation process [Diagram: structured (RDB) and semi-structured (XML) sources are wrapped into ODLI3 local schemas 1 … N; automatic/manual and semi-automatic annotation (WNEditor, synsets); Common Thesaurus generation from schema derived, lexicon derived, inferred and user supplied relationships; clusters generation; GVV generation producing the global schema (ODLI3) and the mapping tables]
    • 18. Example of mapping table
    • 19. Building the Mappings: qN definition <ul><li>GAV mappings : for each global class C of the GVV we must define a query qN over the local classes of C. </li></ul><ul><li>Starting from the Mapping Table of C, the integration designer, supported by the Ontology Builder graphical interface, can implicitly define qN by: </li></ul><ul><ul><li>using and extending the Mapping Table with </li></ul></ul><ul><ul><ul><li>Data Conversion Functions from local to global attributes </li></ul></ul></ul><ul><ul><ul><li>Join Conditions among pairs of local classes belonging to C </li></ul></ul></ul><ul><ul><ul><li>Resolution Functions for global attributes to solve data conflicts of local attribute values. </li></ul></ul></ul><ul><ul><li>using and extending the Full Disjunction operator, which has been recognized as providing a natural semantics for data merging queries </li></ul></ul>
    • 20. Building the Mappings: an example Mapping Table for BA-GVV.Company (global attribute: SN1.company | SN2.company): COMPANY_ID: COMPANY_ID | COMPANY_ID, COUNTRY_ID; CAPITAL_STOCK: CAPITAL_STOCK | (none); SUBCONTRACTOR: (none) | SUBCONTRACTOR; REGION: REGION | REGION; ADDRESS: ADDRESS | ADDRESS; ... Join Conditions (join attribute): on (T_SN1.COMPANY_ID = T_SN2.COMPANY_ID). Resolution Functions: Precedence(SN1,SN2) for ADDRESS. Full Disjunction: Select COMPANY_ID, precedence(T_SN1.ADDRESS, T_SN2.ADDRESS) as Address, T_SN2.SUBCONTRACTOR, … from T_SN1 full join T_SN2 on (T_SN1.COMPANY_ID = T_SN2.COMPANY_ID)
    • 21. Data Conversion Functions <ul><li>The designer defines how local attributes are mapped onto the global attribute GA by means of Data Conversion Functions: </li></ul><ul><ul><li>for each non-null element MT[GA][L], a Data Conversion Function, denoted by MTF[GA][L], is defined; it represents how the local attributes of L are mapped into the global attribute GA. </li></ul></ul><ul><ul><li>MTF[GA][L] is a function executable by the local source L. For example, for relational sources, MTF[GA][L] is an SQL value expression; the following defaults hold: </li></ul></ul><ul><li>T(L) denotes L transformed by the Data Conversion Function; the schema of T(L) is composed of the global attributes GA such that MT[GA][L] is not null. </li></ul>
    • 22. Join Conditions <ul><li>Object Identification : Merging data from different sources requires different representations of the same real world object to be identified. </li></ul><ul><li>Join Conditions : used to identify instances of the same object, and to fuse them, among pairs of local classes belonging to the same global class. </li></ul><ul><li>Given two local classes L1 and L2 belonging to C, a Join Condition between L1 and L2, denoted with JC(L1,L2), is an expression over L1.Ai and L2.Aj where Ai (Aj) are global attributes with a non-null mapping in L1 (L2). </li></ul><ul><li>As an example, for BA-GVV.Company the designer can define JC(SN1.Company,SN2.Company) : SN1.Company.COMPANY_ID = SN2.Company.COMPANY_ID </li></ul>
    • 23. Resolution Functions <ul><li>The fusion of data coming from different sources has to take into account the problem of inconsistent information among sources. </li></ul><ul><li>MOMIS/SEWASIE adopts Resolution Functions . </li></ul><ul><ul><li>A Resolution Function may be defined for each global attribute mapping onto local attributes coming from several sources, to solve data conflicts due to different local attribute values. </li></ul></ul><ul><ul><li>Homogeneous Attributes : if there are no data conflicts for a global attribute mapped onto more than one source, the attribute can be declared a Homogeneous Attribute. </li></ul></ul><ul><ul><li>As an example, in BA-GVV.Company, we define all the global attributes as Homogeneous Attributes except for Address, where we use a precedence function : </li></ul></ul><ul><ul><li>SN1.Company.ADDRESS has a higher precedence than SN2.Company.ADDRESS </li></ul></ul>
    • 24. Full Disjunction <ul><li>Full Disjunction (FD) [Galindo-Legaria - SIGMOD 1994] and [Rajaraman, Ullman - PODS 1996] “computing the natural outer-join of many relations preserving all possible connections among facts” </li></ul><ul><li>Given a global class C composed of L1,L2, ..., Ln we consider </li></ul><ul><ul><li>FD(T(L1), T(L2), . . . , T(Ln)) </li></ul></ul><ul><ul><li>computed on the basis of the Join Conditions </li></ul></ul><ul><ul><li>where T(L) denotes L transformed by the Data Conversion Function, i.e., the full disjunction operator is applied after data conversion. </li></ul></ul>
    • 25. Full Disjunction Computation (1/2) <ul><li>[Rajaraman, Ullman - PODS 1996] : There is a natural outerjoin sequence producing FD if and only if the set of relation schemes forms a connected, acyclic hypergraph. (With two relations, FD corresponds to the full (outer) join.) </li></ul><ul><li>A Global Class C with more than 2 local classes is a cyclic hypergraph, so a new method is needed. </li></ul><ul><li>Moreover, we consider the requirement that qN has to contain a unique tuple merging all the tuples representing the same real world object. </li></ul>Example with n = 3 : local classes L1, L2, L3 connected pairwise by JC(L1,L2), JC(L1,L3), JC(L2,L3).
    • 26. Full Disjunction Computation (2/2) <ul><li>The computation of FD is performed assuming: </li></ul><ul><ul><li>each L contains a key, </li></ul></ul><ul><ul><li>all the join conditions are on key attributes, </li></ul></ul><ul><ul><li>all the join attributes are mapped into the same set of global attributes (K). </li></ul></ul><ul><li>It can be demonstrated that: </li></ul><ul><ul><li>K is a key of C, and </li></ul></ul><ul><ul><li>FD can be computed as (FDExpr): </li></ul></ul><ul><li>T(L1) full join T(L2) on (JC(L1,L2)) full join T(L3) on (JC(L1,L3) OR JC(L2,L3)) ... full join T(Ln) on (JC(L1,Ln) OR JC(L2,Ln) OR ... OR JC(Ln-1,Ln)) </li></ul>
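To make T(L) on slide 21 concrete, here is a minimal sketch in which Python callables stand in for the SQL value expressions MTF[GA][L]. The conversion functions shown (identity and upper-casing) and the sample row are assumptions made for illustration; the slides do not state which conversions the designer actually used.

```python
def transform(rows, mtf_for_l):
    """T(L): rewrite local rows into the global attributes mapped for L.

    mtf_for_l maps each global attribute GA with a non-null MT[GA][L]
    to a conversion function applied to one local row.
    """
    return [{ga: convert(row) for ga, convert in mtf_for_l.items()} for row in rows]


# Hypothetical conversion functions for the local class SN2.company.
mtf_sn2 = {
    "COMPANY_ID": lambda r: r["COMPANY_ID"],
    "REGION": lambda r: r["REGION"].upper(),   # assumed normalisation
    "ADDRESS": lambda r: r["ADDRESS"],
}

local_rows = [
    {"COMPANY_ID": 7, "REGION": "Veneto", "ADDRESS": "Via Roma 1", "FAX": "n/a"}
]
t_sn2 = transform(local_rows, mtf_sn2)
print(t_sn2)
```

Attributes without a mapping (here the made-up `FAX`) simply do not appear in the schema of T(L).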
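A resolution function of the precedence kind described on slide 23 can be sketched as follows. The behaviour assumed here (return the value of the highest-priority source that is not null) is a plausible reading of "precedence", not a documented MOMIS definition, and the sample addresses are invented.

```python
def precedence(*values_by_priority):
    """Return the first non-null value; arguments are ordered by source priority."""
    for value in values_by_priority:
        if value is not None:
            return value
    return None


# SN1.Company.ADDRESS has higher precedence than SN2.Company.ADDRESS,
# so the SN1 value is passed first.
address = precedence("Via Roma 1", "V. Roma 1, Padova")
fallback = precedence(None, "Corso Italia 5")
print(address, "|", fallback)
```

For a Homogeneous Attribute the same function can be reused, since any non-null value is acceptable when the sources agree.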
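Under the assumptions of slide 26 (every local class has a key, and all join conditions are equalities on attributes mapped to the same global key K), the result of FDExpr has one tuple per distinct key value. The sketch below computes that outcome directly by grouping on the key rather than by running the chain of full joins; the relation names, the key name `K`, and the data are hypothetical.

```python
def full_disjunction(relations, key):
    """Merge, per key value, the tuples contributed by every local class.

    relations maps a local class name to its rows (already converted, i.e. T(L)).
    Each attribute keeps one value per source so that resolution functions
    can be applied afterwards.
    """
    merged = {}
    for source, rows in relations.items():
        for row in rows:
            slot = merged.setdefault(row[key], {})
            for attr, value in row.items():
                slot.setdefault(attr, {})[source] = value
    return merged


relations = {
    "L1": [{"K": 1, "A": "a1"}, {"K": 2, "A": "a2"}],
    "L2": [{"K": 2, "A": "b2"}, {"K": 3, "B": "b3"}],
    "L3": [{"K": 1, "C": "c1"}, {"K": 3, "C": "c3"}],
}
fd = full_disjunction(relations, "K")
print(sorted(fd))
```

With n = 3 every pair of local classes contributes a connection, and no tuple is lost even when a key appears in only two of the three classes.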
    • 27. Agenda <ul><li>Semantic Search Engines : Motivation </li></ul><ul><li>Semantic Search Engines : Ingredients </li></ul><ul><li>The SEWASIE project </li></ul><ul><ul><li>Architecture of the SEWASIE system </li></ul></ul><ul><ul><li>Building the SEWASIE system ontology </li></ul></ul><ul><ul><li>Querying the SEWASIE system </li></ul></ul><ul><li>An architectural evolution of SEWASIE : WISDOM </li></ul><ul><li>Conclusion and Future Work </li></ul>
    • 28. Querying the SEWASIE system <ul><li>A SEWASIE system is a two-level data integration system : </li></ul><ul><ul><li>Mapping m1, among the SINode-GVVs and the BA-GVV; </li></ul></ul><ul><ul><li>Mapping m2, among the source schemas and the SINode-GVV. </li></ul></ul><ul><li>Halevy et al [VLDB2003] showed that, in general, the mapping from the data source schemas to the BA-GVV is not simply the composition of m1 and m2; Fagin et al [SIGMOD2004] showed that second order logic is needed to express composition. </li></ul><ul><li>Calvanese et al [KR2004] proved that if m1 and m2 are GAV mappings, the mapping is indeed the composition of m1 and m2; this implies that query answering can be carried out in terms of two reformulation steps: </li></ul><ul><ul><li>w.r.t. the BA-ontology (BA-GVV + mapping m1); </li></ul></ul><ul><ul><li>w.r.t. the SINode-ontology (SINode-GVV + mapping m2). </li></ul></ul><ul><li>These reformulation steps are similar: in the following we will discuss the reformulation w.r.t. the BA-ontology </li></ul>
    • 29. Query Reformulation <ul><li>Query expansion ( [ Calvanese et al - KR2004 ]) </li></ul><ul><ul><li>The query on the BA-GVV is expanded by taking into account the constraints in the BA-GVV: all constraints in the ontology are “compiled in” the expansion, so that the expanded query ( EXPQuery ) can be processed by ignoring constraints – this is the first technique of this kind in the data integration literature, as all other approaches to GAV data integration are based on just unfolding (which is an incomplete technique in our case) </li></ul></ul><ul><ul><li>Subqueries ( EXPAtoms ) are extracted from EXPQuery . An EXPAtom is a Single Class Query , i.e., a query on a single Global Class of the BA-GVV. </li></ul></ul><ul><li>Query unfolding (for single class queries) </li></ul><ul><ul><li>Each EXPAtom is unfolded by considering the mappings in the BA-Ontology, so that it is rewritten w.r.t. the SINode-GVVs . </li></ul></ul><ul><ul><li>In the following we will discuss the unfolding process of an EXPAtom by taking into account the new approach to define qN. </li></ul></ul>
    • 30. Query unfolding <ul><li>Given a global class C of the BA-GVV, with local classes L1, L2, ..., Ln, we consider a Single Global Query ( SGQ ) Q over C: Q = select <Q_select-list> from C where <Q_condition> </li></ul><ul><li><Q_condition> is a Boolean expression of atomic constraints: </li></ul><ul><li>(GA1 op value) or (GA1 op GA2), where GA1 and GA2 are attributes of C. Example: EXPATOM = SELECT NAME,CAPITAL_STOCK,REGION,ADDRESS,SUBCONTRACTOR </li></ul><ul><li>FROM company </li></ul><ul><li>WHERE CAPITAL_STOCK>50 AND REGION LIKE 'VENETO' AND SUBCONTRACTOR LIKE 'yes' </li></ul><ul><li>The output of the query unfolding process is </li></ul><ul><ul><li>a set of SCQs ( FDAtoms ) over the SINode GVVs: FDAtom = select <select-list> from SINode.C where <condition>, where C is a Global Class of the SINode-GVV. </li></ul></ul><ul><ul><li>the FDExpr which computes the Full Disjunction of the FDAtoms </li></ul></ul><ul><ul><li>the resolution functions of the attributes in <select-list> </li></ul></ul><ul><li>The query unfolding process is made up of the following steps: (1) Atomic constraint mapping; (2) Select-list computation </li></ul>
    • 31. Query unfolding : Atomic constraint mapping <ul><li>Each atomic constraint of Q is rewritten into one that can be supported by the local class. </li></ul><ul><ul><li>The atomic constraint mapping is performed on the basis of the mapping functions defined in the Mapping Table. The atomic constraint mapping depends on the definition of the Resolution Functions for global attributes. </li></ul></ul><ul><ul><li>Non Homogeneous Attributes : For example, if we use the AVG function as resolution function for GA, the constraint (GA = value) cannot be pushed down to the local sources: because the AVG function has to be calculated at the global level, the constraint may be globally true but locally false. </li></ul></ul><ul><ul><li>In this case, the constraint is mapped as true in the local class. </li></ul></ul><ul><ul><li>Homogeneous Attributes : An atomic constraint (GA op value) is mapped onto the local class L as follows: (MTF[GA][L] op value) if MT[GA][L] is not null and the op operator is supported in L; true otherwise. </li></ul></ul>
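The rule on slide 31 can be written as one small function. This is a sketch under stated assumptions: `mtf` holds the conversion expressions as plain strings, `homogeneous` and `supported_ops` are hypothetical inputs describing the global class and the local source, and the attribute layout follows the Company example of slides 20 and 32.

```python
def map_constraint(ga, op, value, local_class, mtf, homogeneous, supported_ops):
    """Rewrite the atomic constraint (GA op value) for one local class.

    Returns the literal 'true' when the constraint cannot be pushed down:
    GA is not homogeneous, MT[GA][L] is null, or L does not support op.
    """
    expr = mtf.get(ga, {}).get(local_class)  # None means MT[GA][L] is null
    if ga in homogeneous and expr is not None and op in supported_ops:
        return f"({expr}) {op} ({value})"
    return "true"


MTF = {
    "CAPITAL_STOCK": {"SN1.company": "CAPITAL_STOCK"},
    "REGION": {"SN1.company": "REGION", "SN2.company": "REGION"},
    "SUBCONTRACTOR": {"SN2.company": "SUBCONTRACTOR"},
    "ADDRESS": {"SN1.company": "ADDRESS", "SN2.company": "ADDRESS"},
}
HOMOGENEOUS = {"CAPITAL_STOCK", "REGION", "SUBCONTRACTOR"}  # ADDRESS uses precedence
OPS = {">", "<", "=", "like"}

pushed = map_constraint("CAPITAL_STOCK", ">", "50", "SN1.company", MTF, HOMOGENEOUS, OPS)
not_mapped = map_constraint("CAPITAL_STOCK", ">", "50", "SN2.company", MTF, HOMOGENEOUS, OPS)
print(pushed, "|", not_mapped)
```

Mapping a constraint to `true` only widens the local answer, so no relevant tuple is discarded before fusion.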
    • 32. Query unfolding : Select-list computation <ul><li>The select-list of a FDAtom over the local class L is computed by considering the union of </li></ul><ul><ul><li>the attributes in <Q_select-list> with a not null mapping in L </li></ul></ul><ul><ul><li>the set of attributes used to express the join conditions for L </li></ul></ul><ul><ul><li>the attributes in <Q_condition> with a not null mapping in L </li></ul></ul><ul><li>For example, the set of FDAtoms for expatom is : </li></ul><ul><ul><li>FDATOM1 : </li></ul></ul><ul><ul><li>SELECT COMPANY_ID, NAME, REGION, ADDRESS, CAPITAL_STOCK </li></ul></ul><ul><ul><li>FROM SN1.company </li></ul></ul><ul><ul><li>WHERE ((CAPITAL_STOCK) > (50) and (REGION) like (’VENETO’)) </li></ul></ul><ul><ul><li>FDATOM2 : </li></ul></ul><ul><ul><li>SELECT COMPANY_ID, COUNTRY_ID, NAME, REGION, ADDRESS, SUBCONTRACTOR </li></ul></ul><ul><ul><li>FROM SN2.company </li></ul></ul><ul><ul><li>WHERE (REGION) like (’VENETO’) and (SUBCONTRACTOR) like (’yes’) </li></ul></ul>
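The select-list rule on slide 32 is a union of three attribute sets, which the following sketch reproduces for the EXPATOM example. The function name is hypothetical, and the sets of attributes mapped in SN1.company and SN2.company are read off the two FDAtoms shown on the slide.

```python
def fdatom_select_list(q_select, q_condition_attrs, join_attrs, mapped_attrs):
    """Attributes that the FDAtom over one local class L must return."""
    mapped = set(mapped_attrs)
    return (set(q_select) & mapped) | set(join_attrs) | (set(q_condition_attrs) & mapped)


q_select = ["NAME", "CAPITAL_STOCK", "REGION", "ADDRESS", "SUBCONTRACTOR"]
q_condition_attrs = ["CAPITAL_STOCK", "REGION", "SUBCONTRACTOR"]

sn1 = fdatom_select_list(
    q_select, q_condition_attrs,
    join_attrs=["COMPANY_ID"],
    mapped_attrs=["COMPANY_ID", "NAME", "REGION", "ADDRESS", "CAPITAL_STOCK"],
)
sn2 = fdatom_select_list(
    q_select, q_condition_attrs,
    join_attrs=["COMPANY_ID", "COUNTRY_ID"],
    mapped_attrs=["COMPANY_ID", "COUNTRY_ID", "NAME", "REGION", "ADDRESS", "SUBCONTRACTOR"],
)
print(sorted(sn1))
print(sorted(sn2))
```

The join attributes are always included, even when the user did not ask for them, because the Full Disjunction needs them to match tuples across SINodes.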
    • 33. Query unfolding : FDExpr and Resolution Functions <ul><li>The FDExpr to compute the FD of FDAtom1 and FDAtom2 is: FDATOM1 full join FDATOM2 on (FDATOM1.COMPANY_ID=FDATOM2.COMPANY_ID) </li></ul><ul><li>The unfolded query is then obtained by applying to each query attribute of FDExpr, the related Resolution Function: </li></ul><ul><ul><li>for Homogeneous Attributes (e.g. REGION) one of the related values is taken; </li></ul></ul><ul><ul><li>for non Homogeneous Attributes (e.g. ADDRESS) the related Resolution Function is applied. </li></ul></ul><ul><li>After the query reformulation process, we have to consider query processing techniques to evaluate queries over our two-level data integration system. In the following we show the agent-based prototype developed for the SEWASIE Query Management . </li></ul>
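The last step of slide 33 (full join of the two FDAtom answers on COMPANY_ID, then one Resolution Function per attribute) can be sketched in plain Python. The answer rows are invented, `first_non_null` is an assumed implementation of both the precedence function and the "take one of the values" rule for Homogeneous Attributes, and the key is assumed to be unique and non-null in each answer.

```python
def first_non_null(*values):
    """Precedence-style resolution: the first non-null argument wins."""
    return next((v for v in values if v is not None), None)


def fuse(answers1, answers2, key, resolvers):
    """Full join of two FDAtom answers on key, then per-attribute resolution."""
    by_key1 = {row[key]: row for row in answers1}
    by_key2 = {row[key]: row for row in answers2}
    fused = []
    for k in sorted(by_key1.keys() | by_key2.keys()):
        r1, r2 = by_key1.get(k, {}), by_key2.get(k, {})
        row = {}
        for attr in sorted(set(r1) | set(r2)):
            resolve = resolvers.get(attr, first_non_null)
            row[attr] = resolve(r1.get(attr), r2.get(attr))
        fused.append(row)
    return fused


fdatom1 = [{"COMPANY_ID": 1, "ADDRESS": "Via Roma 1", "REGION": "VENETO"}]
fdatom2 = [
    {"COMPANY_ID": 1, "ADDRESS": "V. Roma", "REGION": "VENETO", "SUBCONTRACTOR": "yes"},
    {"COMPANY_ID": 2, "ADDRESS": "Corso Italia 5", "REGION": "VENETO", "SUBCONTRACTOR": "yes"},
]
# ADDRESS is the only non-homogeneous attribute: SN1 takes precedence over SN2.
result = fuse(fdatom1, fdatom2, "COMPANY_ID", {"ADDRESS": first_non_null})
print(result)
```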
    • 34. SEWASIE Query Management: functional architecture [Diagram: the End User Query Tool sends a query to the Query Agent; in the Brokering Agent, the Play Maker, the EXPANDER and the UNFOLDER use the BA Ontology (BA-GVV and mapping, through the Map Keeper) to turn the query into the expanded query (EXPQuery), its ExpAtoms, and their unfolding (FDExpr, FDAtoms, ResFunctions); the Query Agent sends the FDAtoms to SINodeAgent1 and SINodeAgent2, collects the answers to the FDAtoms in SEWASIE_DB, and performs EXECUTION + FUSION + FINAL RESULT to return the query result]
    • 35. SEWASIE Query Management: EXPANDER [Diagram: SEWASIE query management architecture, as in slide 34] scq1: SELECT CATEGORY_ID FROM Mould_Making scq2: SELECT NAME,COMPANY_ID,CAPITAL_STOCK, REGION,SUBCONTRACTOR,ADDRESS FROM company WHERE CAPITAL_STOCK > 50 AND REGION LIKE 'VENETO' AND SUBCONTRACTOR LIKE 'yes' scq3: ... Expanded Query: EXPQuery: SELECT r2.NAME,r2.ADDRESS,r2.NATION FROM scq1 r1,scq2 r2,scq3 r3 WHERE r1.CATEGORY_ID=r3.CATEGORY_ID AND r2.COMPANY_ID=r3.COMPANY_ID UNION SELECT r2.NAME,r2.ADDRESS,r2.NATION FROM scq4 r1,scq2 r2,scq3 r3 WHERE … UNION …
    • 36. SEWASIE Query Management: UNFOLDER [Diagram: SEWASIE query management architecture, as in slide 34] scq2: SELECT NAME,COMPANY_ID,CAPITAL_STOCK, REGION,SUBCONTRACTOR,ADDRESS FROM company WHERE CAPITAL_STOCK > 50 AND REGION LIKE 'VENETO' AND SUBCONTRACTOR LIKE 'yes' <ul><li>FDAtom2: </li></ul><ul><li>SELECT COMPANY_ID,COUNTRY_ID,NAME, REGION,ADDRESS, SUBCONTRACTOR FROM company WHERE ((REGION) like ('VENETO') and (SUBCONTRACTOR) like ('yes')) </li></ul><ul><li>FDAtom1: </li></ul><ul><ul><li>... </li></ul></ul>Full Disjunction: FDQuery: SELECT * FROM FDAtom1 FULL OUTER JOIN FDAtom2 ON (FDAtom1.COMPANY_ID = FDAtom2.COMPANY_ID) Resolution Function: precedence ( ${SI-NMAgent2.company.ADDRESS}, ${SI-NMAgent1.company.ADDRESS})
    • 37. [Diagram: SEWASIE query management architecture, as in slide 34] <ul><li>The Query Agent – coordination of query processing </li></ul><ul><ul><li>Accepts the query from the End User Query Tool, interacts with both the BA and the SINode Agents, and returns the result to the End User Query Tool </li></ul></ul>
    • 38. The Query Agent : EXECUTION [Diagram: SEWASIE query management architecture, as in slide 34, with FDAtoms sent to the SINode Agents and their answers returned] <ul><li>EXECUTION For each FDAtom (Parallel Execution): </li></ul><ul><ul><li>INPUT : FDAtom </li></ul></ul><ul><ul><li>MESSAGES : from QA to SINode Agent </li></ul></ul><ul><ul><li>OUTPUT : a table storing the FDAtom result in the SEWASIE_DB </li></ul></ul>
    • 39. The Query Agent : FUSION [Diagram: SEWASIE query management architecture, as in slide 34] <ul><li>FUSION For each EXPAtom (Parallel Execution): </li></ul><ul><ul><li>INPUT : FDAtoms, FDQuery, Resolution Functions </li></ul></ul><ul><ul><ul><li>Execution of FDQuery (Full Disjunction of the FDAtoms) </li></ul></ul></ul><ul><ul><ul><li>Application of the Resolution Functions on the result of the previous action </li></ul></ul></ul><ul><ul><li>OUTPUT : a view storing the EXPAtom result in the SEWASIE_DB </li></ul></ul>
    • 40. The Query Agent : FINAL RESULT [Diagram: SEWASIE query management architecture, as in slide 34] <ul><li>FINAL RESULT </li></ul><ul><ul><li>INPUT : Output of the FUSION step </li></ul></ul><ul><ul><ul><li>Execution of the Expanded Query </li></ul></ul></ul><ul><ul><li>OUTPUT : Final Query result view stored in the SEWASIE_DB </li></ul></ul>
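The EXECUTION step on slide 38 sends every FDAtom to its SINode Agent in parallel and stores one result table per FDAtom. The sketch below imitates this with a thread pool; the agents are stub callables and the dictionary standing in for SEWASIE_DB is an assumption, since the slides do not describe the real agent messaging API.

```python
from concurrent.futures import ThreadPoolExecutor


def execute_fdatoms(fdatoms, agents):
    """Send each FDAtom to its SINode agent in parallel; return one table per node."""
    with ThreadPoolExecutor() as pool:
        futures = {node: pool.submit(agents[node], sql) for node, sql in fdatoms.items()}
        return {node: future.result() for node, future in futures.items()}


def stub_agent(rows):
    # Hypothetical stand-in for an SINode Agent: ignores the SQL, returns fixed rows.
    return lambda sql: rows


agents = {
    "SN1": stub_agent([{"COMPANY_ID": 1}]),
    "SN2": stub_agent([{"COMPANY_ID": 1}, {"COMPANY_ID": 2}]),
}
fdatoms = {
    "SN1": "SELECT COMPANY_ID FROM SN1.company",
    "SN2": "SELECT COMPANY_ID FROM SN2.company",
}
sewasie_db = execute_fdatoms(fdatoms, agents)  # plays the role of SEWASIE_DB
print(sewasie_db)
```

The FUSION step would then read these tables, run the FDQuery over them, and apply the Resolution Functions.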
    • 41. Querying SEWASIE: Interface (available at www.sewasie.org)
    • 42. Querying SEWASIE: Interface (available at www.sewasie.org)
    • 43. Querying SEWASIE: Interface (available at www.sewasie.org)
    • 44. Querying SEWASIE: Interface (available at www.sewasie.org)
    • 45. Querying SEWASIE: Interface (available at www.sewasie.org)
    • 46. Querying SEWASIE: Interface (available at www.sewasie.org)
    • 47. Querying SEWASIE: Interface (available at www.sewasie.org)
    • 48. Agenda <ul><li>Semantic Search Engines : Motivation </li></ul><ul><li>Semantic Search Engines : Ingredients </li></ul><ul><li>The SEWASIE project </li></ul><ul><ul><li>Architecture of the SEWASIE system </li></ul></ul><ul><ul><li>Building the SEWASIE system ontology </li></ul></ul><ul><ul><li>Querying the SEWASIE system </li></ul></ul><ul><li>An architectural evolution of SEWASIE : WISDOM </li></ul><ul><li>Conclusion and Future Work </li></ul>
    • 49. WISDOM: Semantic peer and Wrappers <ul><li>Every information source S_ij is associated with a wrapper W_ij , whose goal is to make the data access method transparent to the upper layers. </li></ul><ul><li>A wrapper offers a logical schema S_ij against which the upper layers can pose queries. </li></ul>[Diagram: Web source, Wrapper, Data source schema, Global Virtual View]
    • 50. WISDOM: Semantic peer network
    • 51. Peer-to-Peer Mapping and Query Processing <ul><li>A semantic peer-to-peer mapping , denoted M i,j , is a relationship between the ontology Ont i of the semantic peer P i , and the ontology Ont j of the semantic peer P j . </li></ul><ul><li>By means of p2p mappings, a query at a peer can be ideally extended to each peer for which a mapping is defined. </li></ul><ul><li>It is not always convenient to propagate a query to any peer for which a mapping exists. </li></ul>We associate every peer-to-peer mapping with a content summary. Given a pair of semantic peers for which it exists a peer-to-peer mapping, the content summary associated with such a mapping provides quantitative information about the extension of the concepts in the source ontology that can be found through the mapping in the target semantic peer.
    • 52. Wrapping Large Web Sites <ul><li>A large number of Web sites contain highly structured regions. These sites represent rich and up-to-date information sources, which could be used to populate WISDOM semantic peers. </li></ul><ul><li>Several researchers have recently developed techniques to automatically infer web wrappers (programs that extract data from HTML pages). Many web sites contain large collections of structurally similar pages. </li></ul><ul><li>The main problems, which significantly affect the scalability of the wrapper approach, are how to identify the structured regions of the target site and how to collect the sample pages to feed the wrapper generation process. </li></ul><ul><ul><li>Based on a model of the site we can infer a library of wrappers. The model, together with the wrappers, can then be used to continuously extract data from the target web site. </li></ul></ul>
    • 53. Wrapping Large Web Sites [Figure labels: Web site, Site model] Given a large web site composed of thousands of interconnected pages, we aim to produce a model that describes, at the intensional level, the structure of the site.
    • 54. Query Processing : formulation <ul><li>To help the user formulate queries, a graphical user interface is provided that allows queries to be specified with respect to the ontology of the peer the user is connected to (“target ontology”). </li></ul><ul><li>Besides specifying conditions that objects have to satisfy, a user query might also include preferences. </li></ul><ul><li>The result of a query Q with a preference specification pref is the set of objects, reachable from the target peer by navigating its mappings, that best comply with pref. </li></ul>
    • 55. Query rewriting and peer selection <ul><li>Content summaries (CS) address the problem of selecting only the relevant peers </li></ul><ul><li>A CS is a synopsis of the source peer contents </li></ul><ul><ul><li>In its simplest form a CS includes the cardinalities, in the source peer extension, of the concepts in the target ontology. This is recursively extended to also include information on the extensions that can be found by navigating the network through the source peer </li></ul></ul><ul><li>The output is a set of “ranked rewritings” R1,...,Rm for the original query Q, with rewriting R1 being considered the “most promising” one to return relevant results. </li></ul>At Web scale [giving a complete answer to every query] is unfeasible and query execution must move to a probabilistic world of evidence accumulation and away from exact answers.
    • 56. Query Processing : execution <ul><li>The approach to query execution is inspired by work on joining ranked inputs, which has been applied to databases and information retrieval systems </li></ul><ul><ul><li>We have a set of data sources, each one ranking objects according to a specific local criterion; we wish to determine the overall best objects: those objects that are ranked highest with respect to a global criterion </li></ul></ul><ul><li>In a network of peers things get more complex, and the techniques are extended to deal with this increased complexity </li></ul><ul><li>A query posed against the GVV retrieves data from the integrated sources according to the GAV strategy </li></ul><ul><li>Queries are unfolded by taking into account the view Qn </li></ul><ul><li>The results of the subqueries on the local schemas are integrated and reconciled into a global answer on the basis of Qn </li></ul>
    • 57. Conclusion <ul><li>We discussed some ingredients for developing Semantic Search Engines based on Data Integration Systems and peer-to-peer architectures. </li></ul><ul><li>SEWASIE </li></ul><ul><ul><li>Techniques for building a (super-)peer ontology (GVV and Mappings) </li></ul></ul><ul><ul><li>Techniques for querying a super-peer </li></ul></ul><ul><li>WISDOM </li></ul><ul><ul><li>Semantic peer network </li></ul></ul><ul><ul><li>Peer-to-Peer Mapping and Query Processing (content summary) </li></ul></ul><ul><ul><li>Query rewriting and peer selection </li></ul></ul>
    • 58. Future Work <ul><li>SEWASIE </li></ul><ul><ul><li>To manage the evolution of a super-peer ontology </li></ul></ul><ul><ul><li>Managing the evolution of a peer ontology that integrates a set of peers is an important feature in a peer-to-peer network, where peers can appear and disappear very frequently. </li></ul></ul><ul><ul><li>To investigate efficient query processing techniques to evaluate queries over two-level data integration systems. </li></ul></ul><ul><li>WISDOM </li></ul><ul><ul><li>To improve the previous proposal by providing a framework for building an ontology customized for a set of information sources and annotating the sources according to the built ontology. </li></ul></ul>
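
Slide 55 describes a content summary as, in its simplest form, the cardinalities of the target-ontology concepts reachable through a mapping, and uses it to select only the relevant peers. The following is a minimal sketch of that idea, not the WISDOM implementation: the function name `rank_peers`, the dictionary layout, the example concepts, and the scoring rule (the smallest cardinality among the queried concepts) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: for each peer reachable through a mapping, the
# estimated number of instances of each target-ontology concept.
ContentSummary = Dict[str, int]


def rank_peers(query_concepts: List[str],
               summaries: Dict[str, ContentSummary]) -> List[Tuple[str, int]]:
    """Rank peers for a query; peers that cannot contribute are dropped."""
    if not query_concepts:
        return []
    ranked = []
    for peer, summary in summaries.items():
        # Assumed scoring rule: a peer is only as promising as its
        # scarcest queried concept; a missing concept counts as 0.
        score = min(summary.get(concept, 0) for concept in query_concepts)
        if score > 0:
            ranked.append((peer, score))
    # Highest score first; ties broken by peer name for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Made-up summaries for three peers.
summaries = {
    "P1": {"Hotel": 120, "City": 40},
    "P2": {"Hotel": 15, "City": 300},
    "P3": {"Hotel": 0, "City": 900},
}
ranking = rank_peers(["Hotel", "City"], summaries)
```

With these made-up numbers, P3 is excluded because it holds no instances of one queried concept, so the query would not be propagated to it. A real content summary would also fold in the extensions reachable further along the network, as the slide notes.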

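
Slide 56 refers to techniques for joining ranked inputs: each source ranks objects by a local criterion, and the goal is to find the best objects under a global criterion. One well-known technique of this kind is the threshold algorithm for top-k queries, sketched below. The slides do not say which algorithm WISDOM uses, so this is only an illustration. It assumes non-negative scores, a global score equal to the sum of local scores, and that an object absent from a source contributes 0; the name `top_k` and the sample data are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

RankedSource = List[Tuple[str, float]]  # sorted by descending local score


def top_k(sources: List[RankedSource], k: int) -> List[Tuple[str, float]]:
    """Return the k objects with the highest summed score."""
    if k <= 0:
        return []
    # Random-access index for each source: object -> local score.
    lookup = [dict(source) for source in sources]
    seen: Dict[str, float] = {}
    max_depth = max((len(source) for source in sources), default=0)
    for depth in range(max_depth):
        # Upper bound on the global score of any object not yet seen.
        threshold = 0.0
        for source in sources:
            if depth < len(source):
                obj, score = source[depth]
                threshold += score
                if obj not in seen:
                    # Fetch the object's score from every source.
                    seen[obj] = sum(idx.get(obj, 0.0) for idx in lookup)
        best = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # Stop early once k seen objects already reach the bound.
        if len(best) == k and best[-1][1] >= threshold:
            return best
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Two made-up sources ranking the same three objects.
source_a = [("o1", 9.0), ("o2", 8.0), ("o3", 1.0)]
source_b = [("o2", 7.0), ("o3", 6.0), ("o1", 2.0)]
result = top_k([source_a, source_b], 2)
```

The early-stop test is what makes ranked joins attractive: the sources are read only as deep as needed to guarantee that no unseen object can displace the current top k.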